Commit Graph

1660 Commits

Author SHA1 Message Date
Oran Agra
3204a03574
Move doc metadata from COMMAND to COMMAND DOCS (#10056)
Syntax:
`COMMAND DOCS [<command name> ...]`

Background:
Apparently old version of hiredis (and thus also redis-cli) can't
support more than 7 levels of multi-bulk nesting.

The solution is to move all the doc related metadata from COMMAND to a
new COMMAND DOCS sub-command.

The new DOCS sub-command returns a map of commands (not an array like in COMMAND),
And the same goes for the `subcommands` field inside it (also contains a map)

Besides that, the remaining new fields of COMMAND (hints, key-specs, and
sub-commands), are placed in the outer array rather than a nested map.
this was done mainly for consistency with the old format.

Other changes:
---
* Allow COMMAND INFO with no arguments, which returns all commands, so that we can some day deprecated
  the plain COMMAND (no args)

* Reduce the amount of deferred replies from both COMMAND and COMMAND
  DOCS, especially in the inner loops, since these create many small
  reply objects, which lead to many small write syscalls and many small
  TCP packets.
  To make this easier, when populating the command table, we count the
  history, args, and hints so we later know their size in advance.
  Additionally, the movablekeys flag was moved into the flags register.
* Update generate-commands-json.py to take the data from both command, it
  now executes redis-cli directly, instead of taking input from stdin.
* Sub-commands in both COMMAND (and COMMAND INFO), and also COMMAND DOCS,
  show their full name. i.e. CONFIG 
*   GET will be shown as `config|get` rather than just `get`.
  This will be visible both when asking for `COMMAND INFO config` and COMMAND INFO config|get`, but is
  especially important for the later.
  i.e. imagine someone doing `COMMAND INFO slowlog|get config|get` not being able to distinguish between the two
  items in the array response.
2022-01-11 17:16:16 +02:00
Binbin
39feee8e3a
LPOP/RPOP with count against non existing list return null array (#10095)
It used to return `$-1` in RESP2, now we will return `*-1`.
This is a bug in redis 6.2 when COUNT was added, the `COUNT`
option was introduced in #8179. Fix #10089.

the documentation of [LPOP](https://redis.io/commands/lpop) says
```
When called without the count argument:
Bulk string reply: the value of the first element, or nil when key does not exist.

When called with the count argument:
Array reply: list of popped elements, or nil when key does not exist.
```
2022-01-11 14:26:13 +02:00
Madelyn Olson
d0949b7c5c
Fix timing issue with cluster hostname test (#10086) 2022-01-10 16:21:05 -08:00
chenyang8094
bd46a2abf4
Support whitespace characters in appendfilename, and ban them in appenddirname (#10049)
1. Ban whitespace characters in `appenddirname`   
2. Handle the case where `appendfilename` contains spaces (for backwards compatibility)
2022-01-10 09:09:39 +02:00
Madelyn Olson
e8e02f900c
Changed latency histogram output to omit trailing 0s and periods (#10075)
Changed latency percentile output to omit trailing 0s and periods
2022-01-09 17:04:18 -08:00
Binbin
a84c964d37
Fix crash when error [sub]command name contains | (#10082)
The following error commands will crash redis-server:
```
> get|
Error: Server closed the connection
> get|set
Error: Server closed the connection
> get|other
```

The reason is in #9504, we use `lookupCommandBySds` for find the
container command. And it split the command (argv[0]) with `|`.
If we input something like `get|other`, after the split, `get`
will become a valid command name, pass the `ERR unknown command`
check, and finally crash in `addReplySubcommandSyntaxError`

In this case we do not need to split the command name with `|`
and just look in the commands dict to find if `argv[0]` is a
container command.

So this commit introduce a new function call `isContainerCommandBySds`
that it will return true if a command name is a container command.

Also with the old code, there is a incorrect error message:
```
> config|get set
(error) ERR Unknown subcommand or wrong number of arguments for 'set'. Try CONFIG|GET HELP.
```

The crash was reported in #10070.
2022-01-09 13:06:51 +02:00
guybe7
7cd6a64d2f
lpGetInteger returns int64_t, avoid overflow (#10068)
Fix #9410

Crucial for the ms and sequence deltas, but I changed all
calls, just in case (e.g. "flags")

Before this commit:
`ms_delta` and `seq_delta` could have overflown, causing `currid` to be wrong,
which in turn would cause `streamTrim` to trim the entire rax node (see new test)
2022-01-07 15:31:05 +02:00
Meir Shpilraien (Spielrein)
885f6b5ceb
Redis Function Libraries (#10004)
# Redis Function Libraries

This PR implements Redis Functions Libraries as describe on: https://github.com/redis/redis/issues/9906.

Libraries purpose is to provide a better code sharing between functions by allowing to create multiple
functions in a single command. Functions that were created together can safely share code between
each other without worrying about compatibility issues and versioning.

Creating a new library is done using 'FUNCTION LOAD' command (full API is described below)

This PR introduces a new struct called libraryInfo, libraryInfo holds information about a library:
* name - name of the library
* engine - engine used to create the library
* code - library code
* description - library description
* functions - the functions exposed by the library

When Redis gets the `FUNCTION LOAD` command it creates a new empty libraryInfo.
Redis passes the `CODE` to the relevant engine alongside the empty libraryInfo.
As a result, the engine will create one or more functions by calling 'libraryCreateFunction'.
The new funcion will be added to the newly created libraryInfo. So far Everything is happening
locally on the libraryInfo so it is easy to abort the operation (in case of an error) by simply
freeing the libraryInfo. After the library info is fully constructed we start the joining phase by
which we will join the new library to the other libraries currently exist on Redis.
The joining phase make sure there is no function collision and add the library to the
librariesCtx (renamed from functionCtx). LibrariesCtx is used all around the code in the exact
same way as functionCtx was used (with respect to RDB loading, replicatio, ...).
The only difference is that apart from function dictionary (maps function name to functionInfo
object), the librariesCtx contains also a libraries dictionary that maps library name to libraryInfo object.

## New API
### FUNCTION LOAD
`FUNCTION LOAD <ENGINE> <LIBRARY NAME> [REPLACE] [DESCRIPTION <DESCRIPTION>] <CODE>`
Create a new library with the given parameters:
* ENGINE - REPLACE Engine name to use to create the library.
* LIBRARY NAME - The new library name.
* REPLACE - If the library already exists, replace it.
* DESCRIPTION - Library description.
* CODE - Library code.

Return "OK" on success, or error on the following cases:
* Library name already taken and REPLACE was not used
* Name collision with another existing library (even if replace was uses)
* Library registration failed by the engine (usually compilation error)

## Changed API
### FUNCTION LIST
`FUNCTION LIST [LIBRARYNAME <LIBRARY NAME PATTERN>] [WITHCODE]`
Command was modified to also allow getting libraries code (so `FUNCTION INFO` command is no longer
needed and removed). In addition the command gets an option argument, `LIBRARYNAME` allows you to
only get libraries that match the given `LIBRARYNAME` pattern. By default, it returns all libraries.

### INFO MEMORY
Added number of libraries to `INFO MEMORY`

### Commands flags
`DENYOOM` flag was set on `FUNCTION LOAD` and `FUNCTION RESTORE`. We consider those commands
as commands that add new data to the dateset (functions are data) and so we want to disallows
to run those commands on OOM.

## Removed API
* FUNCTION CREATE - Decided on https://github.com/redis/redis/issues/9906
* FUNCTION INFO - Decided on https://github.com/redis/redis/issues/9899

## Lua engine changes
When the Lua engine gets the code given on `FUNCTION LOAD` command, it immediately runs it, we call
this run the loading run. Loading run is not a usual script run, it is not possible to invoke any
Redis command from within the load run.
Instead there is a new API provided by `library` object. The new API's: 
* `redis.log` - behave the same as `redis.log`
* `redis.register_function` - register a new function to the library

The loading run purpose is to register functions using the new `redis.register_function` API.
Any attempt to use any other API will result in an error. In addition, the load run is has a time
limit of 500ms, error is raise on timeout and the entire operation is aborted.

### `redis.register_function`
`redis.register_function(<function_name>, <callback>, [<description>])`
This new API allows users to register a new function that will be linked to the newly created library.
This API can only be called during the load run (see definition above). Any attempt to use it outside
of the load run will result in an error.
The parameters pass to the API are:
* function_name - Function name (must be a Lua string)
* callback - Lua function object that will be called when the function is invokes using fcall/fcall_ro
* description - Function description, optional (must be a Lua string).

### Example
The following example creates a library called `lib` with 2 functions, `f1` and `f1`, returns 1 and 2 respectively:
```
local function f1(keys, args)
    return 1
end

local function f2(keys, args)
    return 2
end

redis.register_function('f1', f1)
redis.register_function('f2', f2)
```

Notice: Unlike `eval`, functions inside a library get the KEYS and ARGV as arguments to the
functions and not as global.

### Technical Details

On the load run we only want the user to be able to call a white list on API's. This way, in
the future, if new API's will be added, the new API's will not be available to the load run
unless specifically added to this white list. We put the while list on the `library` object and
make sure the `library` object is only available to the load run by using [lua_setfenv](https://www.lua.org/manual/5.1/manual.html#lua_setfenv) API. This API allows us to set
the `globals` of a function (and all the function it creates). Before starting the load run we
create a new fresh Lua table (call it `g`) that only contains the `library` API (we make sure
to set global protection on this table just like the general global protection already exists
today), then we use [lua_setfenv](https://www.lua.org/manual/5.1/manual.html#lua_setfenv)
to set `g` as the global table of the load run. After the load run finished we update `g`
metatable and set `__index` and `__newindex` functions to be `_G` (Lua default globals),
we also pop out the `library` object as we do not need it anymore.
This way, any function that was created on the load run (and will be invoke using `fcall`) will
see the default globals as it expected to see them and will not have the `library` API anymore.

An important outcome of this new approach is that now we can achieve a distinct global table
for each library (it is not yet like that but it is very easy to achieve it now). In the future we can
decide to remove global protection because global on different libraries will not collide or we
can chose to give different API to different libraries base on some configuration or input.

Notice that this technique was meant to prevent errors and was not meant to prevent malicious
user from exploit it. For example, the load run can still save the `library` object on some local
variable and then using in `fcall` context. To prevent such a malicious use, the C code also make
sure it is running in the right context and if not raise an error.
2022-01-06 13:39:38 +02:00
Ozan Tezcan
568c2e039b
Set errno to EEXIST in redisFork() if child process exists (#10059)
Callers of redisFork() are logging `strerror(errno)` on failure.
`errno` is not set when there is already a child process, causing printing
current value of errno which was set before `redisFork()` call. 

Setting errno to EEXIST on this failure to provide more meaningful error message.
2022-01-06 09:54:21 +02:00
filipe oliveira
5dd15443ac
Added INFO LATENCYSTATS section: latency by percentile distribution/latency by cumulative distribution of latencies (#9462)
# Short description

The Redis extended latency stats track per command latencies and enables:
- exporting the per-command percentile distribution via the `INFO LATENCYSTATS` command.
  **( percentile distribution is not mergeable between cluster nodes ).**
- exporting the per-command cumulative latency distributions via the `LATENCY HISTOGRAM` command.
  Using the cumulative distribution of latencies we can merge several stats from different cluster nodes
  to calculate aggregate metrics .

By default, the extended latency monitoring is enabled since the overhead of keeping track of the
command latency is very small.
 
If you don't want to track extended latency metrics, you can easily disable it at runtime using the command:
 - `CONFIG SET latency-tracking no`

By default, the exported latency percentiles are the p50, p99, and p999.
You can alter them at runtime using the command:
- `CONFIG SET latency-tracking-info-percentiles "0.0 50.0 100.0"`


## Some details:
- The total size per histogram should sit around 40 KiB. We only allocate those 40KiB when a command
  was called for the first time.
- With regards to the WRITE overhead As seen below, there is no measurable overhead on the achievable
  ops/sec or full latency spectrum on the client. Including also the measured redis-benchmark for unstable
  vs this branch. 
- We track from 1 nanosecond to 1 second ( everything above 1 second is considered +Inf )

## `INFO LATENCYSTATS` exposition format

   - Format: `latency_percentiles_usec_<CMDNAME>:p0=XX,p50....` 

## `LATENCY HISTOGRAM [command ...]` exposition format

Return a cumulative distribution of latencies in the format of a histogram for the specified command names.

The histogram is composed of a map of time buckets:
- Each representing a latency range, between 1 nanosecond and roughly 1 second.
- Each bucket covers twice the previous bucket's range.
- Empty buckets are not printed.
- Everything above 1 sec is considered +Inf.
- At max there will be log2(1000000000)=30 buckets

We reply a map for each command in the format:
`<command name> : { `calls`: <total command calls> , `histogram` : { <bucket 1> : latency , < bucket 2> : latency, ...  } }`

Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-05 14:01:05 +02:00
sundb
4d3c4cfac7
Show the elapsed time of single test and speed up some tests (#10058)
Following #10038.

This PR introduces two changes.
1. Show the elapsed time of a single test in the test output, in order to have a more
detailed understanding of the changes in test run time.

2. Speedup two tests related to `key-load-delay` configuration.
other tests do not seem to be affected by #10003.
2022-01-05 13:49:01 +02:00
Ozan Tezcan
d1b5b63872
Fix typo in multi test (#10054) 2022-01-05 10:16:04 +02:00
Binbin
b7f9e9ae39
Add tests for blocking XREAD[GROUP] when the stream ran dry (#10035)
The purpose of this commit is to add some tests to
cover #5299, which was fixed in #5300 but without tests.

This commit should close #5306 and #5299.
2022-01-04 21:48:49 +02:00
guybe7
ac84b1cd82
Ban snapshot-creating commands and other admin commands from transactions (#10015)
Creating fork (or even a foreground SAVE) during a transaction breaks the atomicity of the transaction.
In addition to that, it could mess up the propagated transaction to the AOF file.

This change blocks SAVE, PSYNC, SYNC and SHUTDOWN from being executed inside MULTI-EXEC.
It does that by adding a command flag, so that modules can flag their commands with that flag too.

Besides it changes BGSAVE, BGREWRITEAOF, and CONFIG SET appendonly, to turn the
scheduled flag instead of forking righ taway.

Other changes:
* expose `protected`, `no-async-loading`, and `no_multi` flags in COMMAND command
* add a test to validate propagation of FLUSHALL inside a transaction.
* add a test to validate how CONFIG SET that errors reacts in a transaction

Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 13:37:47 +02:00
zhaozhao.zz
2e1979a21e
use startEvictionTimeProc() in config set maxmemory (#10019)
This would mean that the effects of `CONFIG SET maxmemory` may not be visible once the command returns.
That could anyway happen since incremental eviction was added in redis 6.2 (see #7653)

We do this to fix one of the propagation bugs about eviction see #9890 and #10014.
2022-01-04 13:08:10 +02:00
chenyang8094
87789fae0b
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.

The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)

The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
  it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
  one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
  incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
  `appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
  It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
  we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
  delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
  period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
  manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
  `aof-load-truncated` is enabled.

Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-03 19:14:13 +02:00
Meir Shpilraien (Spielrein)
78a62c0124
Fix OOM error not raised of functions (#10048)
OOM Error did not raise on functions due to a bug.
Added test to verify the fix.
2022-01-03 19:04:29 +02:00
Madelyn Olson
5460c10047
Implement clusterbus message extensions and cluster hostname support (#9530)
Implement the ability for cluster nodes to advertise their location with extension messages.
2022-01-02 19:48:29 -08:00
Harkrishn Patro
9f8885760b
Sharded pubsub implementation (#8621)
This commit implements a sharded pubsub implementation based off of shard channels.

Co-authored-by: Harkrishn Patro <harkrisp@amazon.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
2022-01-02 16:54:47 -08:00
Binbin
b8ba942ac2
Add DUMP RESTORE tests for redis-cli -x and -X options (#10041)
This commit adds DUMP RESTORES tests for the -x and -X options.
I wanted to add it in #9980 which introduce the -X option, but
back then i failed due to some errors (related to redis-cli call).
2022-01-02 13:58:22 +02:00
Viktor Söderqvist
45a155bd0f
Wait for replicas when shutting down (#9872)
To avoid data loss, this commit adds a grace period for lagging replicas to
catch up the replication offset.

Done:

* Wait for replicas when shutdown is triggered by SIGTERM and SIGINT.

* Wait for replicas when shutdown is triggered by the SHUTDOWN command. A new
  blocked client type BLOCKED_SHUTDOWN is introduced, allowing multiple clients
  to call SHUTDOWN in parallel.
  Note that they don't expect a response unless an error happens and shutdown is aborted.

* Log warning for each replica lagging behind when finishing shutdown.

* CLIENT_PAUSE_WRITE while waiting for replicas.

* Configurable grace period 'shutdown-timeout' in seconds (default 10).

* New flags for the SHUTDOWN command:

    - NOW disables the grace period for lagging replicas.

    - FORCE ignores errors writing the RDB or AOF files which would normally
      prevent a shutdown.

    - ABORT cancels ongoing shutdown. Can't be combined with other flags.

* New field in the output of the INFO command: 'shutdown_in_milliseconds'. The
  value is the remaining maximum time to wait for lagging replicas before
  finishing the shutdown. This field is present in the Server section **only**
  during shutdown.

Not directly related:

* When shutting down, if there is an AOF saving child, it is killed **even** if AOF
  is disabled. This can happen if BGREWRITEAOF is used when AOF is off.

* Client pause now has end time and type (WRITE or ALL) per purpose. The
  different pause purposes are *CLIENT PAUSE command*, *failover* and
  *shutdown*. If clients are unpaused for one purpose, it doesn't affect client
  pause for other purposes. For example, the CLIENT UNPAUSE command doesn't
  affect client pause initiated by the failover or shutdown procedures. A completed
  failover or a failed shutdown doesn't unpause clients paused by the CLIENT
  PAUSE command.

Notes:

* DEBUG RESTART doesn't wait for replicas.

* We already have a warning logged when a replica disconnects. This means that
  if any replica connection is lost during the shutdown, it is either logged as
  disconnected or as lagging at the time of exit.

Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-02 09:50:15 +02:00
yoav-steinberg
1bf6d6f11e
Generate RDB with Functions only via redis-cli --functions-rdb (#9968)
This is needed in order to ease the deployment of functions for ephemeral cases, where user
needs to spin up a server with functions pre-loaded.

#### Details:

* Added `--functions-rdb` option to _redis-cli_.
* Functions only rdb via `REPLCONF rdb-filter-only functions`. This is a placeholder for a space
  separated inclusion filter for the RDB. In the future can be `REPLCONF rdb-filter-only
  "functions db:3 key-patten:user*"` and a complementing `rdb-filter-exclude` `REPLCONF`
  can also be added.
* Handle "slave requirements" specification to RDB saving code so we can use the same RDB
  when different slaves express the same requirements (like functions-only) and not share the
  RDB when their requirements differ. This is currently just a flags `int`, but can be extended to
  a more complex structure with various filter fields.
* make sure to support filters only in diskless replication mode (not to override the persistence file),
  we do that by forcing diskless (even if disabled by config)

other changes:
* some refactoring in rdb.c (extract portion of a big function to a sub-function)
* rdb_key_save_delay used in AOFRW too
* sendChildInfo takes the number of updated keys (incremental, rather than absolute)

Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-02 09:39:01 +02:00
sundb
888e92eb57
Fix a valgrind test failure due to slowly shutdown (#10038)
This pr is mainly to solve the problem that redis process cannot be exited normally, due to changes in #10003.
When a test uses the `key-load-delay` config to delay loading, but does not reset it at the end of the test, will lead to server wait for the loading to reach the event
loop (once in 2mb) before actually shutting down.
2022-01-01 17:45:13 +02:00
Viktor Söderqvist
e4b3a257ee
Modules: Mark all APIs non-experimental (#9983)
These exist for quite some time, and are no longer experimental
2021-12-30 12:17:22 +02:00
Binbin
4836ae32c7
redis-cli: Add -X option and extend --cluster call take arg from stdin (#9980)
There are two changes in this commit:

1. Add -X option to redis-cli.
Currently `-x` can only be used to provide the last argument,
so you can do `redis-cli dump keyname > key.dump`,
and then do `redis-cli -x restore keyname 0 < key.dump`.

But what if you want to add the replace argument (which comes last?).
oran suggested adding such usage:
`redis-cli -X <tag> restore keyname <tag> replace < key.dump`

i.e. you're able to provide a string in the arguments that's gonna be
substituted with the content from stdin.

Note that the tag name should not conflict with others non-replaced args.
And the -x and -X options are conflicting.

Some usages:
```
[root]# echo mypasswd | src/redis-cli -X passwd_tag mset username myname password passwd_tag                                                   OK
[root]# echo username > username.txt
[root]# head -c -1 username.txt | src/redis-cli -X name_tag mget name_tag password
1) "myname"
2) "mypasswd\n"
```

2. Handle the combination of both `-x` and `--cluster` or `-X` and `--cluster`
Extend the broadcast option to receive the last arg or <tag> arg from the stdin.

Now we can use `redis-cli -x --cluster call <host>:<port> cmd`,
or `redis-cli -X <tag> --cluster call <host>:<port> cmd <tag>`.
(support part of #9899)
2021-12-30 12:10:04 +02:00
Ozan Tezcan
b0c06e904a
Fixed typo in test tag (for needs:debug) (#10021) 2021-12-28 16:23:02 +02:00
guybe7
266d95066d
Remove incomplete fix of a broader problem (#10013)
Preventing COFIG SET maxmemory from propagating is just the tip of the iceberg.
Module that performs a write operation in a notification can cause any
command to be propagated, based on server.dirty

We need to come up with a better solution.
2021-12-28 10:19:58 +02:00
chenyang8094
af0b50f83a
Tests: don't rely on the response of MEMORY USAGE when mem_allocator is not jemalloc (#10010)
It turns out that libc malloc can return an allocation of a different size on requests of the same size.
this means that matching MEMORY USAGE of one key to another copy of the same data can fail.

Solution:
Keep running the test that calls MEMORY USAGE, but ignore the response.
We do that by introducing a new utility function to get the memory usage, which always returns 1
when the allocator is not jemalloc.

Other changes:
Some formatting for datatype2.tcl

Co-authored-by: Oran Agra <oran@redislabs.com>
2021-12-27 21:37:21 +02:00
Itamar Haber
f810510bb2
Adds utils/gen-commands-json.py (#9958)
Following #9656, this script generates a "commands.json" file from the output
of the new COMMAND. The output of this script is used in redis/redis-doc#1714
and by redis/redis-io#259. This also converts a couple of rogue dashes (in 
'key-specs' and 'multiple-token' flags) to underscores (continues #9959).
2021-12-27 19:31:13 +02:00
chenyang8094
317464a386
Fix failing test due to recent change in transaction propagation (#10006)
PR #9890 may have introduced a problem.
There are tests that use MULTI-EXEC to make sure two BGSAVE / BGREWRITEAOF are executed together.
But now it's not valid to run run commands that create a snapshot inside a transaction (gonna be blocked soon)
This PR modifies the test not to rely on MULTI-EXEC.

Co-authored-by: Oran Agra <oran@redislabs.com>
2021-12-27 15:18:17 +02:00
guybe7
0f15e025e6
Fix race in propagation test (#10012)
There's a race between testing DBSIZE and the thread starting.
If the thread hadn't started by the time we checked DBISZE, no
keys will have been evicted.
The correct way is to check the evicted_keys stat.
2021-12-27 12:31:24 +02:00
Binbin
e84ccc3f56
santize dump payload: fix carsh when zset with NAN score (#10002)
`zslInsert` with a NAN score will crash the server.
This one found by the `corrupt-dump-fuzzer`.
2021-12-26 11:40:11 +02:00
Meir Shpilraien (Spielrein)
365cbf46a7
Add FUNCTION DUMP and RESTORE. (#9938)
Follow the conclusions to support Functions in redis cluster (#9899)

Added 2 new FUNCTION sub-commands:
1. `FUNCTION DUMP` - dump a binary payload representation of all the functions.
2. `FUNCTION RESTORE <PAYLOAD> [FLUSH|APPEND|REPLACE]` - give the binary payload extracted
   using `FUNCTION DUMP`, restore all the functions on the given payload. Restore policy can be given to
   control how to handle existing functions (default is APPEND):
   * FLUSH: delete all existing functions.
   * APPEND: appends the restored functions to the existing functions. On collision, abort.
   * REPLACE: appends the restored functions to the existing functions. On collision,
     replace the old function with the new function.

Modify `redis-cli --cluster add-node` to use `FUNCTION DUMP` to get existing functions from
one of the nodes in the cluster, and `FUNCTION RESTORE` to load the same set of functions
to the new node. `redis-cli` will execute this step before sending the `CLUSTER MEET` command
to the new node. If `FUNCTION DUMP` returns an error, assume the current Redis version do not
support functions and skip `FUNCTION RESTORE`. If `FUNCTION RESTORE` fails, abort and do not send
the `CLUSTER MEET` command. If the new node already contains functions (before the `FUNCTION RESTORE`
is sent), abort and do not add the node to the cluster. Test was added to verify
`redis-cli --cluster add-node` works as expected.
2021-12-26 09:03:37 +02:00
Meir Shpilraien (Spielrein)
08ff606b0b
Changed fuction name to be case insensitive. (#9984)
Use case insensitive string comparison for function names (like we do for commands and configs)
In addition, add verification that the functions only use the following characters: [a-zA-Z0-9_]
2021-12-26 08:37:24 +02:00
guybe7
7ac213079c
Sort out mess around propagation and MULTI/EXEC (#9890)
The mess:
Some parts use alsoPropagate for late propagation, others using an immediate one (propagate()),
causing edge cases, ugly/hacky code, and the tendency for bugs

The basic idea is that all commands are propagated via alsoPropagate (i.e. added to a list) and the
top-most call() is responsible for going over that list and actually propagating them (and wrapping
them in MULTI/EXEC if there's more than one command). This is done in the new function,
propagatePendingCommands.

Callers to propagatePendingCommands:
1. top-most call() (we want all nested call()s to add to the also_propagate array and just the top-most
   one to propagate them) - via `afterCommand`
2. handleClientsBlockedOnKeys: it is out of call() context and it may propagate stuff - via `afterCommand`. 
3. handleClientsBlockedOnKeys edge case: if the looked-up key is already expired, we will propagate the
   expire but will not unblock any client so `afterCommand` isn't called. in that case, we have to propagate
   the deletion explicitly.
4. cron stuff: active-expire and eviction may also propagate stuff
5. modules: the module API allows to propagate stuff from just about anywhere (timers, keyspace notifications,
   threads). I could have tried to catch all the out-of-call-context places but it seemed easier to handle it in one
   place: when we free the context. in the spirit of what was done in call(), only the top-most freeing of a module
   context may cause propagation.
6. modules: when using a thread-safe ctx it's not clear when/if the ctx will be freed. we do know that the module
   must lock the GIL before calling RM_Replicate/RM_Call so we propagate the pending commands when
   releasing the GIL.

A "known limitation", which were actually a bug, was fixed because of this commit (see propagate.tcl):
   When using a mix of RM_Call with `!` and RM_Replicate, the command would propagate out-of-order:
   first all the commands from RM_Call, and then the ones from RM_Replicate

Another thing worth mentioning is that if, in the past, a client would issue a MULTI/EXEC with just one
write command the server would blindly propagate the MULTI/EXEC too, even though it's redundant.
not anymore.

This commit renames propagate() to propagateNow() in order to cause conflicts in pending PRs.
propagatePendingCommands is the only caller of propagateNow, which is now a static, internal helper function.

Optimizations:
1. alsoPropagate will not add stuff to also_propagate if there's no AOF and replicas
2. alsoPropagate reallocs also_propagagte exponentially, to save calls to memmove

Bugfixes:
1. CONFIG SET can create evictions, sending notifications which can cause to dirty++ with modules.
   we need to prevent it from propagating to AOF/replicas
2. We need to set current_client in RM_Call. buggy scenario:
   - CONFIG SET maxmemory, eviction notifications, module hook calls RM_Call
   - assertion in lookupKey crashes, because current_client has CONFIG SET, which isn't CMD_WRITE
3. minor: in eviction, call propagateDeletion after notification, like active-expire and all commands
   (we always send a notification before propagating the command)
2021-12-23 00:03:48 +02:00
Oran Agra
b7567394e1
resolve replication test timing sensitivity - 2nd attempt (#9988)
issue started failing after #9878 was merged (made an exiting test more sensitive)
looks like #9982 didn't help, tested this one and it seems to work better.

this commit does two things:
1. reduce the extra delay i added earlier and instead add more keys, the effect no duration
   of replication is the same, but the intervals in which the server is responsive to the tcl client is higher.
2. improve the test infra to print context when assert_error fails.
2021-12-22 23:37:12 +02:00
Oran Agra
e33e0295bb
resolve replication test timing sensitivity (#9982)
issue started failing after #9878 was merged (made an exiting test more sensitive)
2021-12-22 16:05:53 +02:00
Oran Agra
41e6e05dee
Allow most CONFIG SET during loading, block some commands in async-loading (#9878)
## background
Till now CONFIG SET was blocked during loading.
(In the not so distant past, GET was disallowed too)

We recently (not released yet) added an async-loading mode, see #9323,
and during that time it'll serve CONFIG SET and any other command.
And now we realized (#9770) that some configs, and commands are dangerous
during async-loading.

## changes
* Allow most CONFIG SET during loading (both on async-loading and normal loading)
* Allow CONFIG REWRITE and CONFIG RESETSTAT during loading
* Block a few config during loading (`appendonly`, `repl-diskless-load`, and `dir`)
* Block a few commands during loading (list below)

## the blocked commands:
* SAVE - obviously we don't wanna start a foregreound save during loading 8-)
* BGSAVE - we don't mind to schedule one, but we don't wanna fork now
* BGREWRITEAOF - we don't mind to schedule one, but we don't wanna fork now
* MODULE - we obviously don't wanna unload a module during replication / rdb loading
  (MODULE HELP and MODULE LIST are not blocked)
* SYNC / PSYNC - we're in the middle of RDB loading from master, must not allow sync
  requests now.
* REPLICAOF / SLAVEOF - we're in the middle of replicating, maybe it makes sense to let
  the user abort it, but he couldn't do that so far, i don't wanna take any risk of bugs due to odd state.
* CLUSTER - only allow [HELP, SLOTS, NODES, INFO, MYID, LINKS, KEYSLOT, COUNTKEYSINSLOT,
  GETKEYSINSLOT, RESET, REPLICAS, COUNT_FAILURE_REPORTS], for others, preserve the status quo

## other fixes
* processEventsWhileBlocked had an issue when being nested, this could happen with a busy script
  during async loading (new), but also in a busy script during AOF loading (old). this lead to a crash in
  the scenario described in #6988
2021-12-22 14:11:16 +02:00
zhugezy
ad55fbaabb
Shorten timeouts of CLIENT PAUSE to avoid hanging when tests fail. (#9975)
If a test fails at `wait_for_blocked_clients_count` after the `PAUSE` command,
It won't send `UNPAUSE` to server, leading to the server hanging until timeout,
which is bad and hard to debug sometimes when developing.
This PR tries to fix this.

Timeout in `CLIENT PAUSE` shortened from 1e5 seconds(extremely long) to 50~100 seconds.
2021-12-22 12:06:29 +02:00
Meir Shpilraien (Spielrein)
3bcf108416
Change FUNCTION CREATE, DELETE and FLUSH to be WRITE commands instead of MAY_REPLICATE. (#9953)
The issue with MAY_REPLICATE is that all automatic mechanisms to handle
write commands will not work. This require have a special treatment for:
* Not allow those commands to be executed on RO replica.
* Allow those commands to be executed on RO replica from primary connection.
* Allow those commands to be executed on the RO replica from AOF.

By setting those commands as WRITE commands we are getting all those properties from Redis.
Test was added to verify that those properties work as expected.

In addition, rearrange when and where functions are flushed. Before this PR functions were
flushed manually on `rdbLoadRio` and cleaned manually on failure. This contradicts the
assumptions that functions are data and need to be created/deleted alongside with the
data. A side effect of this, for example, `debug reload noflush` did not flush the data but
did flush the functions, `debug loadaof` flush the data but not the functions.
This PR move functions deletion into `emptyDb`. `emptyDb` (renamed to `emptyData`) will
now accept an additional flag, `NOFUNCTIONS` which specifically indicate that we do not
want to flush the functions (on all other cases, functions will be flushed). Used the new flag
on FLUSHALL and FLUSHDB only! Tests were added to `debug reload` and `debug loadaof`
to verify that functions behave the same as the data.

Notice that because now functions will be deleted along side with the data we can not allow
`CLUSTER RESET` to be called from within a function (it will cause the function to be released
while running), this PR adds `NO_SCRIPT` flag to `CLUSTER RESET`  so it will not be possible
to be called from within a function. The other cluster commands are allowed from within a
function (there are use-cases that uses `GETKEYSINSLOT` to iterate over all the keys on a
given slot). Tests was added to verify `CLUSTER RESET` is denied from within a script.

Another small change on this PR is that `RDBFLAGS_ALLOW_DUP` is also applicable on functions.
When loading functions, if this flag is set, we will replace old functions with new ones on collisions.
2021-12-21 16:13:29 +02:00
zhugezy
1b0968df46
Remove EVAL script verbatim replication, propagation, and deterministic execution logic (#9812)
# Background

The main goal of this PR is to remove relevant logics on Lua script verbatim replication,
only keeping effects replication logic, which has been set as default since Redis 5.0.
As a result, Lua in Redis 7.0 would be acting the same as Redis 6.0 with default
configuration from users' point of view.

There are lots of reasons to remove verbatim replication.
Antirez has listed some of the benefits in Issue #5292:

>1. No longer need to explain to users side effects into scripts.
    They can do whatever they want.
>2. No need for a cache about scripts that we sent or not to the slaves.
>3. No need to sort the output of certain commands inside scripts
    (SMEMBERS and others): this both simplifies and gains speed.
>4. No need to store scripts inside the RDB file in order to startup correctly.
>5. No problems about evicting keys during the script execution.

When looking back at Redis 5.0, antirez and core team decided to set the config
`lua-replicate-commands yes` by default instead of removing verbatim replication
directly, in case some bad situations happened. 3 years later now before Redis 7.0,
it's time to remove it formally.

# Changes

- configuration for lua-replicate-commands removed
  - created config file stub for backward compatibility
- Replication script cache removed
  - this is useless under script effects replication
  - relevant statistics also removed
- script persistence in RDB files is also removed
- Propagation of SCRIPT LOAD and SCRIPT FLUSH to replica / AOF removed
- Deterministic execution logic in scripts removed (i.e. don't run write commands
  after random ones, and sorting output of commands with random order)
  - the flags indicating which commands have non-deterministic results are kept as hints to clients.
- `redis.replicate_commands()` & `redis.set_repl()` changed
  - now `redis.replicate_commands()` does nothing and return an 1
  - ...and then `redis.set_repl()` can be issued before `redis.replicate_commands()` now
- Relevant TCL cases adjusted
- DEBUG lua-always-replicate-commands removed

# Other changes
- Fix a recent bug comparing CLIENT_ID_AOF to original_client->flags instead of id. (introduced in #9780)

Co-authored-by: Oran Agra <oran@redislabs.com>
2021-12-21 08:32:42 +02:00
Binbin
febc3f63b2
Fix recent daily CI test failures (#9966)
Recent PRs have introduced some failures, this commit
try to fix these CI failures. Here are the changes:

1. Enable debug-command in sentinel test.
```
Master reboot in very short time: ERR DEBUG command not allowed. If the
enable-debug-command option is set to "local", you can run it from a
local connection, otherwise you need to set this option in the
configuration file, and then restart the server.
```

2. Enable protected-config in sentinel test.
```
SDOWN is triggered by misconfigured instance replying with errors: ERR
CONFIG SET failed (possibly related to argument 'dir') - can't set
protected config
```

3. Enable debug-command in cluster test.
```
Verify slaves consistency: ERR DEBUG command not allowed. If the
enable-debug-command option is set to "local", you can run it from a
local connection, otherwise you need to set this option in the
configuration file, and then restart the server.
```

4. quicklist fill should be signed int.
The reason for the modification is to eliminate the warning.
Modify `int fill: QL_FILL_BITS` to `signed int fill: QL_FILL_BITS`

The first three were introduced at #9920 (same issue).
And the last one was introduced at #9962.
2021-12-20 12:31:13 +02:00
Oran Agra
6add1b7217
Add external test that runs without debug command (#9964)
- add needs:debug flag for some tests
- disable "save" in external tests (speedup?)
- use debug_digest proc instead of debug command directly so it can be skipped
- use OBJECT ENCODING instead of DEBUG OBJECT to get encoding
- add a proc for OBJECT REFCOUNT so it can be skipped
- move a bunch of tests in latency_monitor tests to happen later so that latency monitor has some values in it
- add missing close_replication_stream calls
- make sure to close the temp client if DEBUG LOG fails
2021-12-19 17:41:51 +02:00
YaacovHazan
ae2f5b7b2e
Protected configs and sensitive commands (#9920)
Block sensitive configs and commands by default.

* `enable-protected-configs` - block modification of configs with the new `PROTECTED_CONFIG` flag.
   Currently we add this flag to `dbfilename`, and `dir` configs,
   all of which are non-mutable configs that can set a file redis will write to.
* `enable-debug-command` - block the `DEBUG` command
* `enable-module-command` - block the `MODULE` command

These have a default value set to `no`, so that these features are not
exposed by default to client connections, and can only be set by modifying the config file.

Users can change each of these to either `yes` (allow all access), or `local` (allow access from
local TCP connections and unix domain connections)

Note that this is a **breaking change** (specifically the part about MODULE command being disabled by default).
I.e. we don't consider DEBUG command being blocked as an issue (people shouldn't have been using it),
and the few configs we protected are unlikely to have been set at runtime anyway.
On the other hand, it's likely to assume some users who use modules, load them from the config file anyway.
Note that's the whole point of this PR, for redis to be more secure by default and reduce the attack surface on
innocent users, so secure defaults will necessarily mean a breaking change.
2021-12-19 10:46:16 +02:00
guybe7
5df070ba39
COMMAND: Use underscores instead of hyphens in attributes (#9959)
some languages can build a json-like object by parsing a textual json,
but it works poorly when attributes contain hyphens

example in JS:
```
let j = JSON.parse(json)
j['key-name'] <- works
j.key-name <= illegal syntax
```
2021-12-18 09:00:42 +02:00
ny0312
792afb4432
Introduce memory management on cluster link buffers (#9774)
Introduce memory management on cluster link buffers:
 * Introduce a new `cluster-link-sendbuf-limit` config that caps memory usage of cluster bus link send buffers.
 * Introduce a new `CLUSTER LINKS` command that displays current TCP links to/from peers.
 * Introduce a new `mem_cluster_links` field under `INFO` command output, which displays the overall memory usage by all current cluster links.
 * Introduce a new `total_cluster_links_buffer_limit_exceeded` field under `CLUSTER INFO` command output, which displays the accumulated count of cluster links freed due to `cluster-link-sendbuf-limit`.
2021-12-16 21:56:59 -08:00
Meir Shpilraien (Spielrein)
687210f155
Add FUNCTION FLUSH command to flush all functions (#9936)
Added `FUNCTION FLUSH` command. The new sub-command allows delete all the functions.
An optional `[SYNC|ASYNC]` argument can be given to control whether or not to flush the
functions synchronously or asynchronously. if not given the default flush mode is chosen by
`lazyfree-lazy-user-flush` configuration values.

Add the missing `functions.tcl` test to the list of tests that are executed in test_helper.tcl,
and call FUNCTION FLUSH in between servers in external mode
2021-12-16 17:58:25 +02:00
yoav-steinberg
70ff26b454
Multiparam config get. (#9914)
Support doing `CONFIG GET <x> <y> <z>`, each of them can also be
a pattern with wildcards.

This avoids duplicates in the result by looping over the configs and for
each once checking all the patterns, once a match is found for a pattern
we move on to the next config.
2021-12-16 09:01:13 +02:00
guybe7
867816003e
Auto-generate the command table from JSON files (#9656)
Delete the hardcoded command table and replace it with an auto-generated table, based
on a JSON file that describes the commands (each command must have a JSON file).

These JSON files are the SSOT of everything there is to know about Redis commands,
and it is reflected fully in COMMAND INFO.

These JSON files are used to generate commands.c (using a python script), which is then
committed to the repo and compiled.

The purpose is:
* Clients and proxies will be able to get much more info from redis, instead of relying on hard coded logic.
* drop the dependency between Redis-user and the commands.json in redis-doc.
* delete help.h and have redis-cli learn everything it needs to know just by issuing COMMAND (will be
  done in a separate PR)
* redis.io should stop using commands.json and learn everything from Redis (ultimately one of the release
  artifacts should be a large JSON, containing all the information about all of the commands, which will be
  generated from COMMAND's reply)
* the byproduct of this is:
  * module commands will be able to provide that info and possibly be more of a first-class citizens
  * in theory, one may be able to generate a redis client library for a strictly typed language, by using this info.

### Interface changes

#### COMMAND INFO's reply change (and arg-less COMMAND)

Before this commit the reply at index 7 contained the key-specs list
and reply at index 8 contained the sub-commands list (Both unreleased).
Now, reply at index 7 is a map of:
- summary - short command description
- since - debut version
- group - command group
- complexity - complexity string
- doc-flags - flags used for documentation (e.g. "deprecated")
- deprecated-since - if deprecated, from which version?
- replaced-by - if deprecated, which command replaced it?
- history - a list of (version, what-changed) tuples
- hints - a list of strings, meant to provide hints for clients/proxies. see https://github.com/redis/redis/issues/9876
- arguments - an array of arguments. each element is a map, with the possibility of nesting (sub-arguments)
- key-specs - an array of keys specs (already in unstable, just changed location)
- subcommands - a list of sub-commands (already in unstable, just changed location)
- reply-schema - will be added in the future (see https://github.com/redis/redis/issues/9845)

more details on these can be found in https://github.com/redis/redis-doc/pull/1697

only the first three fields are mandatory 

#### API changes (unreleased API obviously)

now they take RedisModuleCommand opaque pointer instead of looking up the command by name

- RM_CreateSubcommand
- RM_AddCommandKeySpec
- RM_SetCommandKeySpecBeginSearchIndex
- RM_SetCommandKeySpecBeginSearchKeyword
- RM_SetCommandKeySpecFindKeysRange
- RM_SetCommandKeySpecFindKeysKeynum

Currently, we did not add module API to provide additional information about their commands because
we couldn't agree on how the API should look like, see https://github.com/redis/redis/issues/9944.

### Somehow related changes
1. Literals should be in uppercase while placeholder in lowercase. Now all the GEO* command
   will be documented with M|KM|FT|MI and can take both lowercase and uppercase

### Unrelated changes
1. Bugfix: no_madaory_keys was absent in COMMAND's reply
2. expose CMD_MODULE as "module" via COMMAND
3. have a dedicated uint64 for ACL categories (instead of having them in the same uint64 as command flags)

Co-authored-by: Itamar Haber <itamar@garantiadata.com>
2021-12-15 21:23:15 +02:00
Wen Hui
a09bc5045b
Error message improvement for CONFIG SET command (#9924)
When CONFIG SET fails, print the name of the config that failed.
This is helpful since config set is now variadic.

however, there are cases where several configs have the same apply
function, and we can't be sure which one of them caused the failure.
2021-12-15 09:46:32 +02:00