redict/tests/unit/cluster.tcl
Meir Shpilraien (Spielrein) 885f6b5ceb
Redis Function Libraries (#10004)
# Redis Function Libraries

This PR implements Redis Functions Libraries as describe on: https://github.com/redis/redis/issues/9906.

Libraries purpose is to provide a better code sharing between functions by allowing to create multiple
functions in a single command. Functions that were created together can safely share code between
each other without worrying about compatibility issues and versioning.

Creating a new library is done using 'FUNCTION LOAD' command (full API is described below)

This PR introduces a new struct called libraryInfo, libraryInfo holds information about a library:
* name - name of the library
* engine - engine used to create the library
* code - library code
* description - library description
* functions - the functions exposed by the library

When Redis gets the `FUNCTION LOAD` command it creates a new empty libraryInfo.
Redis passes the `CODE` to the relevant engine alongside the empty libraryInfo.
As a result, the engine will create one or more functions by calling 'libraryCreateFunction'.
The new funcion will be added to the newly created libraryInfo. So far Everything is happening
locally on the libraryInfo so it is easy to abort the operation (in case of an error) by simply
freeing the libraryInfo. After the library info is fully constructed we start the joining phase by
which we will join the new library to the other libraries currently exist on Redis.
The joining phase make sure there is no function collision and add the library to the
librariesCtx (renamed from functionCtx). LibrariesCtx is used all around the code in the exact
same way as functionCtx was used (with respect to RDB loading, replicatio, ...).
The only difference is that apart from function dictionary (maps function name to functionInfo
object), the librariesCtx contains also a libraries dictionary that maps library name to libraryInfo object.

## New API
### FUNCTION LOAD
`FUNCTION LOAD <ENGINE> <LIBRARY NAME> [REPLACE] [DESCRIPTION <DESCRIPTION>] <CODE>`
Create a new library with the given parameters:
* ENGINE - REPLACE Engine name to use to create the library.
* LIBRARY NAME - The new library name.
* REPLACE - If the library already exists, replace it.
* DESCRIPTION - Library description.
* CODE - Library code.

Return "OK" on success, or error on the following cases:
* Library name already taken and REPLACE was not used
* Name collision with another existing library (even if replace was uses)
* Library registration failed by the engine (usually compilation error)

## Changed API
### FUNCTION LIST
`FUNCTION LIST [LIBRARYNAME <LIBRARY NAME PATTERN>] [WITHCODE]`
Command was modified to also allow getting libraries code (so `FUNCTION INFO` command is no longer
needed and removed). In addition the command gets an option argument, `LIBRARYNAME` allows you to
only get libraries that match the given `LIBRARYNAME` pattern. By default, it returns all libraries.

### INFO MEMORY
Added number of libraries to `INFO MEMORY`

### Commands flags
`DENYOOM` flag was set on `FUNCTION LOAD` and `FUNCTION RESTORE`. We consider those commands
as commands that add new data to the dateset (functions are data) and so we want to disallows
to run those commands on OOM.

## Removed API
* FUNCTION CREATE - Decided on https://github.com/redis/redis/issues/9906
* FUNCTION INFO - Decided on https://github.com/redis/redis/issues/9899

## Lua engine changes
When the Lua engine gets the code given on `FUNCTION LOAD` command, it immediately runs it, we call
this run the loading run. Loading run is not a usual script run, it is not possible to invoke any
Redis command from within the load run.
Instead there is a new API provided by `library` object. The new API's: 
* `redis.log` - behave the same as `redis.log`
* `redis.register_function` - register a new function to the library

The loading run purpose is to register functions using the new `redis.register_function` API.
Any attempt to use any other API will result in an error. In addition, the load run is has a time
limit of 500ms, error is raise on timeout and the entire operation is aborted.

### `redis.register_function`
`redis.register_function(<function_name>, <callback>, [<description>])`
This new API allows users to register a new function that will be linked to the newly created library.
This API can only be called during the load run (see definition above). Any attempt to use it outside
of the load run will result in an error.
The parameters pass to the API are:
* function_name - Function name (must be a Lua string)
* callback - Lua function object that will be called when the function is invokes using fcall/fcall_ro
* description - Function description, optional (must be a Lua string).

### Example
The following example creates a library called `lib` with 2 functions, `f1` and `f1`, returns 1 and 2 respectively:
```
local function f1(keys, args)
    return 1
end

local function f2(keys, args)
    return 2
end

redis.register_function('f1', f1)
redis.register_function('f2', f2)
```

Notice: Unlike `eval`, functions inside a library get the KEYS and ARGV as arguments to the
functions and not as global.

### Technical Details

On the load run we only want the user to be able to call a white list on API's. This way, in
the future, if new API's will be added, the new API's will not be available to the load run
unless specifically added to this white list. We put the while list on the `library` object and
make sure the `library` object is only available to the load run by using [lua_setfenv](https://www.lua.org/manual/5.1/manual.html#lua_setfenv) API. This API allows us to set
the `globals` of a function (and all the function it creates). Before starting the load run we
create a new fresh Lua table (call it `g`) that only contains the `library` API (we make sure
to set global protection on this table just like the general global protection already exists
today), then we use [lua_setfenv](https://www.lua.org/manual/5.1/manual.html#lua_setfenv)
to set `g` as the global table of the load run. After the load run finished we update `g`
metatable and set `__index` and `__newindex` functions to be `_G` (Lua default globals),
we also pop out the `library` object as we do not need it anymore.
This way, any function that was created on the load run (and will be invoke using `fcall`) will
see the default globals as it expected to see them and will not have the `library` API anymore.

An important outcome of this new approach is that now we can achieve a distinct global table
for each library (it is not yet like that but it is very easy to achieve it now). In the future we can
decide to remove global protection because global on different libraries will not collide or we
can chose to give different API to different libraries base on some configuration or input.

Notice that this technique was meant to prevent errors and was not meant to prevent malicious
user from exploit it. For example, the load run can still save the `library` object on some local
variable and then using in `fcall` context. To prevent such a malicious use, the C code also make
sure it is running in the right context and if not raise an error.
2022-01-06 13:39:38 +02:00

225 lines
7.0 KiB
Tcl

# Primitive tests on cluster-enabled redis using redis-cli
source tests/support/cli.tcl
proc cluster_info {r field} {
if {[regexp "^$field:(.*?)\r\n" [$r cluster info] _ value]} {
set _ $value
}
}
# Provide easy access to CLUSTER INFO properties. Same semantic as "proc s".
proc csi {args} {
set level 0
if {[string is integer [lindex $args 0]]} {
set level [lindex $args 0]
set args [lrange $args 1 end]
}
cluster_info [srv $level "client"] [lindex $args 0]
}
# make sure the test infra won't use SELECT
set ::singledb 1
# cluster creation is complicated with TLS, and the current tests don't really need that coverage
tags {tls:skip external:skip cluster} {
# start three servers
set base_conf [list cluster-enabled yes cluster-node-timeout 1]
start_server [list overrides $base_conf] {
start_server [list overrides $base_conf] {
start_server [list overrides $base_conf] {
set node1 [srv 0 client]
set node2 [srv -1 client]
set node3 [srv -2 client]
set node3_pid [srv -2 pid]
test {Create 3 node cluster} {
exec src/redis-cli --cluster-yes --cluster create \
127.0.0.1:[srv 0 port] \
127.0.0.1:[srv -1 port] \
127.0.0.1:[srv -2 port]
wait_for_condition 1000 50 {
[csi 0 cluster_state] eq {ok} &&
[csi -1 cluster_state] eq {ok} &&
[csi -2 cluster_state] eq {ok}
} else {
fail "Cluster doesn't stabilize"
}
}
test "Run blocking command on cluster node3" {
# key9184688 is mapped to slot 10923 (first slot of node 3)
set node3_rd [redis_deferring_client -2]
$node3_rd brpop key9184688 0
$node3_rd flush
wait_for_condition 50 100 {
[s -2 blocked_clients] eq {1}
} else {
fail "Client not blocked"
}
}
test "Perform a Resharding" {
exec src/redis-cli --cluster-yes --cluster reshard 127.0.0.1:[srv -2 port] \
--cluster-to [$node1 cluster myid] \
--cluster-from [$node3 cluster myid] \
--cluster-slots 1
}
test "Verify command got unblocked after resharding" {
# this (read) will wait for the node3 to realize the new topology
assert_error {*MOVED*} {$node3_rd read}
# verify there are no blocked clients
assert_equal [s 0 blocked_clients] {0}
assert_equal [s -1 blocked_clients] {0}
assert_equal [s -2 blocked_clients] {0}
}
test "Wait for cluster to be stable" {
wait_for_condition 1000 50 {
[catch {exec src/redis-cli --cluster \
check 127.0.0.1:[srv 0 port] \
}] == 0
} else {
fail "Cluster doesn't stabilize"
}
}
test "Sanity test push cmd after resharding" {
assert_error {*MOVED*} {$node3 lpush key9184688 v1}
set node1_rd [redis_deferring_client 0]
$node1_rd brpop key9184688 0
$node1_rd flush
wait_for_condition 50 100 {
[s 0 blocked_clients] eq {1}
} else {
puts "Client not blocked"
puts "read from blocked client: [$node1_rd read]"
fail "Client not blocked"
}
$node1 lpush key9184688 v2
assert_equal {key9184688 v2} [$node1_rd read]
}
$node1_rd close
$node3_rd close
test "Run blocking command again on cluster node1" {
$node1 del key9184688
# key9184688 is mapped to slot 10923 which has been moved to node1
set node1_rd [redis_deferring_client 0]
$node1_rd brpop key9184688 0
$node1_rd flush
wait_for_condition 50 100 {
[s 0 blocked_clients] eq {1}
} else {
fail "Client not blocked"
}
}
test "Kill a cluster node and wait for fail state" {
# kill node3 in cluster
exec kill -SIGSTOP $node3_pid
wait_for_condition 1000 50 {
[csi 0 cluster_state] eq {fail} &&
[csi -1 cluster_state] eq {fail}
} else {
fail "Cluster doesn't fail"
}
}
test "Verify command got unblocked after cluster failure" {
assert_error {*CLUSTERDOWN*} {$node1_rd read}
# verify there are no blocked clients
assert_equal [s 0 blocked_clients] {0}
assert_equal [s -1 blocked_clients] {0}
}
exec kill -SIGCONT $node3_pid
$node1_rd close
# stop three servers
}
}
}
# Test redis-cli -- cluster create, add-node, call.
# Test that functions are propagated on add-node
start_server [list overrides $base_conf] {
start_server [list overrides $base_conf] {
start_server [list overrides $base_conf] {
start_server [list overrides $base_conf] {
start_server [list overrides $base_conf] {
set node4_rd [redis_client -3]
set node5_rd [redis_client -4]
test {Functions are added to new node on redis-cli cluster add-node} {
exec src/redis-cli --cluster-yes --cluster create \
127.0.0.1:[srv 0 port] \
127.0.0.1:[srv -1 port] \
127.0.0.1:[srv -2 port]
wait_for_condition 1000 50 {
[csi 0 cluster_state] eq {ok} &&
[csi -1 cluster_state] eq {ok} &&
[csi -2 cluster_state] eq {ok}
} else {
fail "Cluster doesn't stabilize"
}
# upload a function to all the cluster
exec src/redis-cli --cluster-yes --cluster call 127.0.0.1:[srv 0 port] \
FUNCTION LOAD LUA TEST {redis.register_function('test', function() return 'hello' end)}
# adding node to the cluster
exec src/redis-cli --cluster-yes --cluster add-node \
127.0.0.1:[srv -3 port] \
127.0.0.1:[srv 0 port]
wait_for_condition 1000 50 {
[csi 0 cluster_state] eq {ok} &&
[csi -1 cluster_state] eq {ok} &&
[csi -2 cluster_state] eq {ok} &&
[csi -3 cluster_state] eq {ok}
} else {
fail "Cluster doesn't stabilize"
}
# make sure 'test' function was added to the new node
assert_equal {{library_name TEST engine LUA description {} functions {{name test description {}}}}} [$node4_rd FUNCTION LIST]
# add function to node 5
assert_equal {OK} [$node5_rd FUNCTION LOAD LUA TEST {redis.register_function('test', function() return 'hello' end)}]
# make sure functions was added to node 5
assert_equal {{library_name TEST engine LUA description {} functions {{name test description {}}}}} [$node5_rd FUNCTION LIST]
# adding node 5 to the cluster should failed because it already contains the 'test' function
catch {
exec src/redis-cli --cluster-yes --cluster add-node \
127.0.0.1:[srv -4 port] \
127.0.0.1:[srv 0 port]
} e
assert_match {*node already contains functions*} $e
}
# stop 5 servers
}
}
}
}
}
} ;# tags