This change sets a low limit for multibulk and bulk length in the
protocol for unauthenticated connections, so that they can't easily
cause redis to allocate massive amounts of memory by sending just a few
characters on the network.
The new limits are 10 arguments of 16kb each (instead of 1m of 512mb)
Note that this breaks compatibility because in the past doing:
DECRBY x -9223372036854775808
would succeed (and create an invalid result) and now this returns an error.
Remove hard coded multi-bulk limit (was 1,048,576), new limit is INT_MAX.
When client sends an m-bulk that's higher than 1024, we initially only allocate
the argv array for 1024 arguments, and gradually grow that allocation as arguments
are received.
Fixing CI test issues introduced in #8687
- valgrind warnings in readQueryFromClient when client was freed by processInputBuffer
- adding DEBUG pause-cron for tests not to be time dependent.
- skipping a test that depends on socket buffers / events not compatible with TLS
- making sure client got subscribed by not using deferring client
### Description
A mechanism for disconnecting clients when the sum of all connected clients is above a
configured limit. This prevents eviction or OOM caused by accumulated used memory
between all clients. It's a complimentary mechanism to the `client-output-buffer-limit`
mechanism which takes into account not only a single client and not only output buffers
but rather all memory used by all clients.
#### Design
The general design is as following:
* We track memory usage of each client, taking into account all memory used by the
client (query buffer, output buffer, parsed arguments, etc...). This is kept up to date
after reading from the socket, after processing commands and after writing to the socket.
* Based on the used memory we sort all clients into buckets. Each bucket contains all
clients using up up to x2 memory of the clients in the bucket below it. For example up
to 1m clients, up to 2m clients, up to 4m clients, ...
* Before processing a command and before sleep we check if we're over the configured
limit. If we are we start disconnecting clients from larger buckets downwards until we're
under the limit.
#### Config
`maxmemory-clients` max memory all clients are allowed to consume, above this threshold
we disconnect clients.
This config can either be set to 0 (meaning no limit), a size in bytes (possibly with MB/GB
suffix), or as a percentage of `maxmemory` by using the `%` suffix (e.g. setting it to `10%`
would mean 10% of `maxmemory`).
#### Important code changes
* During the development I encountered yet more situations where our io-threads access
global vars. And needed to fix them. I also had to handle keeps the clients sorted into the
memory buckets (which are global) while their memory usage changes in the io-thread.
To achieve this I decided to simplify how we check if we're in an io-thread and make it
much more explicit. I removed the `CLIENT_PENDING_READ` flag used for checking
if the client is in an io-thread (it wasn't used for anything else) and just used the global
`io_threads_op` variable the same way to check during writes.
* I optimized the cleanup of the client from the `clients_pending_read` list on client freeing.
We now store a pointer in the `client` struct to this list so we don't need to search in it
(`pending_read_list_node`).
* Added `evicted_clients` stat to `INFO` command.
* Added `CLIENT NO-EVICT ON|OFF` sub command to exclude a specific client from the
client eviction mechanism. Added corrosponding 'e' flag in the client info string.
* Added `multi-mem` field in the client info string to show how much memory is used up
by buffered multi commands.
* Client `tot-mem` now accounts for buffered multi-commands, pubsub patterns and
channels (partially), tracking prefixes (partially).
* CLIENT_CLOSE_ASAP flag is now handled in a new `beforeNextClient()` function so
clients will be disconnected between processing different clients and not only before sleep.
This new function can be used in the future for work we want to do outside the command
processing loop but don't want to wait for all clients to be processed before we get to it.
Specifically I wanted to handle output-buffer-limit related closing before we process client
eviction in case the two race with each other.
* Added a `DEBUG CLIENT-EVICTION` command to print out info about the client eviction
buckets.
* Each client now holds a pointer to the client eviction memory usage bucket it belongs to
and listNode to itself in that bucket for quick removal.
* Global `io_threads_op` variable now can contain a `IO_THREADS_OP_IDLE` value
indicating no io-threading is currently being executed.
* In order to track memory used by each clients in real-time we can't rely on updating
these stats in `clientsCron()` alone anymore. So now I call `updateClientMemUsage()`
(used to be `clientsCronTrackClientsMemUsage()`) after command processing, after
writing data to pubsub clients, after writing the output buffer and after reading from the
socket (and maybe other places too). The function is written to be fast.
* Clients are evicted if needed (with appropriate log line) in `beforeSleep()` and before
processing a command (before performing oom-checks and key-eviction).
* All clients memory usage buckets are grouped as follows:
* All clients using less than 64k.
* 64K..128K
* 128K..256K
* ...
* 2G..4G
* All clients using 4g and up.
* Added client-eviction.tcl with a bunch of tests for the new mechanism.
* Extended maxmemory.tcl to test the interaction between maxmemory and
maxmemory-clients settings.
* Added an option to flag a numeric configuration variable as a "percent", this means that
if we encounter a '%' after the number in the config file (or config set command) we
consider it as valid. Such a number is store internally as a negative value. This way an
integer value can be interpreted as either a percent (negative) or absolute value (positive).
This is useful for example if some numeric configuration can optionally be set to a percentage
of something else.
Co-authored-by: Oran Agra <oran@redislabs.com>
This commit introduced a new flag to the RM_Call:
'C' - Check if the command can be executed according to the ACLs associated with it.
Also, three new API's added to check if a command, key, or channel can be executed or accessed
by a user, according to the ACLs associated with it.
- RM_ACLCheckCommandPerm
- RM_ACLCheckKeyPerm
- RM_ACLCheckChannelPerm
The user for these API's is a RedisModuleUser object, that for a Module user returned by the RM_CreateModuleUser API, or for a general ACL user can be retrieved by these two new API's:
- RM_GetCurrentUserName - Retrieve the user name of the client connection behind the current context.
- RM_GetModuleUserFromUserName - Get a RedisModuleUser from a user name
As a result of getting a RedisModuleUser from name, it can now also access the general ACL users (not just ones created by the module).
This mean the already existing API RM_SetModuleUserACL(), can be used to change the ACL rules for such users.
This is similar to the recent addition of LMPOP/BLMPOP (#9373), but zset.
Syntax for the new ZMPOP command:
`ZMPOP numkeys [<key> ...] MIN|MAX [COUNT count]`
Syntax for the new BZMPOP command:
`BZMPOP timeout numkeys [<key> ...] MIN|MAX [COUNT count]`
Some background:
- ZPOPMIN/ZPOPMAX take only one key, and can return multiple elements.
- BZPOPMIN/BZPOPMAX take multiple keys, but return only one element from just one key.
- ZMPOP/BZMPOP can take multiple keys, and can return multiple elements from just one key.
Note that ZMPOP/BZMPOP can take multiple keys, it eventually operates on just on key.
And it will propagate as ZPOPMIN or ZPOPMAX with the COUNT option.
As new commands, if we can not pop any elements, the response like:
- ZMPOP: Return a NIL in both RESP2 and RESP3, unlike ZPOPMIN/ZPOPMAX return emptyarray.
- BZMPOP: Return a NIL in both RESP2 and RESP3 when timeout is reached, like BZPOPMIN/BZPOPMAX.
For the normal response is nested arrays in RESP2 and RESP3:
```
ZMPOP/BZMPOP
1) keyname
2) 1) 1) member1
2) score1
2) 1) member2
2) score2
In RESP2:
1) "myzset"
2) 1) 1) "three"
2) "3"
2) 1) "two"
2) "2"
In RESP3:
1) "myzset"
2) 1) 1) "three"
2) (double) 3
2) 1) "two"
2) (double) 2
```
i've seen this CI failure a couple of times on MacOS:
*** [err]: lazy free a stream with all types of metadata in tests/unit/lazyfree.tcl
lazyfree isn't done
only reason i can think of is that 500ms is sometimes not enough on slow systems.
Implements the [LIMIT limit] variant of SINTERCARD/ZINTERCARD.
Now with the LIMIT, we can stop the searching when cardinality
reaching the limit, and return the cardinality ASAP.
Note that in SINTERCARD, the old synatx was: `SINTERCARD key [key ...]`
In order to add a optional parameter, we must break the old synatx.
So the new syntax of SINTERCARD will be consistent with ZINTERCARD.
New syntax: `SINTERCARD numkeys key [key ...] [LIMIT limit]`.
Note that this means that SINTERCARD has a different syntax than
SINTER and SINTERSTORE (taking numkeys argument)
As for ZINTERCARD, we can easily add a optional parameter to it.
New syntax: `ZINTERCARD numkeys key [key ...] [LIMIT limit]`
Fix#7297
The problem:
Today, there is no way for a client library or app to know the key name indexes for commands such as
ZUNIONSTORE/EVAL and others with "numkeys", since COMMAND INFO returns no useful info for them.
For cluster-aware redis clients, this requires to 'patch' the client library code specifically for each of these commands or to
resolve each execution of these commands with COMMAND GETKEYS.
The solution:
Introducing key specs other than the legacy "range" (first,last,step)
The 8th element of the command info array, if exists, holds an array of key specs. The array may be empty, which indicates
the command doesn't take any key arguments or may contain one or more key-specs, each one may leads to the discovery
of 0 or more key arguments.
A client library that doesn't support this key-spec feature will keep using the first,last,step and movablekeys flag which will
obviously remain unchanged.
A client that supports this key-specs feature needs only to look at the key-specs array. If it finds an unrecognized spec, it
must resort to using COMMAND GETKEYS if it wishes to get all key name arguments, but if all it needs is one key in order
to know which cluster node to use, then maybe another spec (if the command has several) can supply that, and there's no
need to use GETKEYS.
Each spec is an array of arguments, first one is the spec name, the second is an array of flags, and the third is an array
containing details about the spec (specific meaning for each spec type)
The initial flags we support are "read" and "write" indicating if the keys that this key-spec finds are used for read or for write.
clients should ignore any unfamiliar flags.
In order to easily find the positions of keys in a given array of args we introduce keys specs. There are two logical steps of
key specs:
1. `start_search`: Given an array of args, indicate where we should start searching for keys
2. `find_keys`: Given the output of start_search and an array of args, indicate all possible indices of keys.
### start_search step specs
- `index`: specify an argument index explicitly
- `index`: 0 based index (1 means the first command argument)
- `keyword`: specify a string to match in `argv`. We should start searching for keys just after the keyword appears.
- `keyword`: the string to search for
- `start_search`: an index from which to start the keyword search (can be negative, which means to search from the end)
Examples:
- `SET` has start_search of type `index` with value `1`
- `XREAD` has start_search of type `keyword` with value `[“STREAMS”,1]`
- `MIGRATE` has start_search of type `keyword` with value `[“KEYS”,-2]`
### find_keys step specs
- `range`: specify `[count, step, limit]`.
- `lastkey`: index of the last key. relative to the index returned from begin_search. -1 indicating till the last argument, -2 one before the last
- `step`: how many args should we skip after finding a key, in order to find the next one
- `limit`: if count is -1, we use limit to stop the search by a factor. 0 and 1 mean no limit. 2 means ½ of the remaining args, 3 means ⅓, and so on.
- “keynum”: specify `[keynum_index, first_key_index, step]`.
- `keynum_index`: is relative to the return of the `start_search` spec.
- `first_key_index`: is relative to `keynum_index`.
- `step`: how many args should we skip after finding a key, in order to find the next one
Examples:
- `SET` has `range` of `[0,1,0]`
- `MSET` has `range` of `[-1,2,0]`
- `XREAD` has `range` of `[-1,1,2]`
- `ZUNION` has `start_search` of type `index` with value `1` and `find_keys` of type `keynum` with value `[0,1,1]`
- `AI.DAGRUN` has `start_search` of type `keyword` with value `[“LOAD“,1]` and `find_keys` of type `keynum` with value
`[0,1,1]` (see https://oss.redislabs.com/redisai/master/commands/#aidagrun)
Note: this solution is not perfect as the module writers can come up with anything, but at least we will be able to find the key
args of the vast majority of commands.
If one of the above specs can’t describe the key positions, the module writer can always fall back to the `getkeys-api` option.
Some keys cannot be found easily (`KEYS` in `MIGRATE`: Imagine the argument for `AUTH` is the string “KEYS” - we will
start searching in the wrong index).
The guarantee is that the specs may be incomplete (`incomplete` will be specified in the spec to denote that) but we never
report false information (assuming the command syntax is correct).
For `MIGRATE` we start searching from the end - `startfrom=-1` - and if one of the keys is actually called "keys" we will
report only a subset of all keys - hence the `incomplete` flag.
Some `incomplete` specs can be completely empty (i.e. UNKNOWN begin_search) which should tell the client that
COMMAND GETKEYS (or any other way to get the keys) must be used (Example: For `SORT` there is no way to describe
the STORE keyword spec, as the word "store" can appear anywhere in the command).
We will expose these key specs in the `COMMAND` command so that clients can learn, on startup, where the keys are for
all commands instead of holding hardcoded tables or use `COMMAND GETKEYS` in runtime.
Comments:
1. Redis doesn't internally use the new specs, they are only used for COMMAND output.
2. In order to support the current COMMAND INFO format (reply array indices 4, 5, 6) we created a synthetic range, called
legacy_range, that, if possible, is built according to the new specs.
3. Redis currently uses only getkeys_proc or the legacy_range to get the keys indices (in COMMAND GETKEYS for
example).
"incomplete" specs:
the command we have issues with are MIGRATE, STRALGO, and SORT
for MIGRATE, because the token KEYS, if exists, must be the last token, we can search in reverse. it one of the keys is
actually the string "keys" will return just a subset of the keys (hence, it's "incomplete")
for SORT and STRALGO we can use this heuristic (the keys can be anywhere in the command) and therefore we added a
key spec that is both "incomplete" and of "unknown type"
if a client encounters an "incomplete" spec it means that it must find a different way (either COMMAND GETKEYS or have
its own parser) to retrieve the keys.
please note that all commands, apart from the three mentioned above, have "complete" key specs
List functions operating on elements by index:
* RM_ListGet
* RM_ListSet
* RM_ListInsert
* RM_ListDelete
Iteration is done using a simple for loop over indices.
The index based functions use an internal iterator as an optimization.
This is explained in the docs:
```
* Many of the list functions access elements by index. Since a list is in
* essence a doubly-linked list, accessing elements by index is generally an
* O(N) operation. However, if elements are accessed sequentially or with
* indices close together, the functions are optimized to seek the index from
* the previous index, rather than seeking from the ends of the list.
*
* This enables iteration to be done efficiently using a simple for loop:
*
* long n = RM_ValueLength(key);
* for (long i = 0; i < n; i++) {
* RedisModuleString *elem = RedisModule_ListGet(key, i);
* // Do stuff...
* }
```
Make bitpos/bitcount support bit index:
```
BITPOS key bit [start [end [BIT|BYTE]]]
BITCOUNT key [start end [BIT|BYTE]]
```
The default behavior is `BYTE`, so these commands are still compatible with old.
Part two of implementing #8702 (zset), after #8887.
## Description of the feature
Replaced all uses of ziplist with listpack in t_zset, and optimized some of the code to optimize performance.
## Rdb format changes
New `RDB_TYPE_ZSET_LISTPACK` rdb type.
## Rdb loading improvements:
1) Pre-expansion of dict for validation of duplicate data for listpack and ziplist.
2) Simplifying the release of empty key objects when RDB loading.
3) Unify ziplist and listpack data verify methods for zset and hash, and move code to rdb.c.
## Interface changes
1) New `zset-max-listpack-entries` config is an alias for `zset-max-ziplist-entries` (same with `zset-max-listpack-value`).
2) OBJECT ENCODING will return listpack instead of ziplist.
## Listpack improvements:
1) Add `lpDeleteRange` and `lpDeleteRangeWithEntry` functions to delete a range of entries from listpack.
2) Improve the performance of `lpCompare`, converting from string to integer is faster than converting from integer to string.
3) Replace `snprintf` with `ll2string` to improve performance in converting numbers to strings in `lpGet()`.
## Zset improvements:
1) Improve the performance of `zzlFind` method, use `lpFind` instead of `lpCompare` in a loop.
2) Use `lpDeleteRangeWithEntry` instead of `lpDelete` twice to delete a element of zset.
## Tests
1) Add some unittests for `lpDeleteRange` and `lpDeleteRangeWithEntry` function.
2) Add zset RDB loading test.
3) Add benchmark test for `lpCompare` and `ziplsitCompare`.
4) Add empty listpack zset corrupt dump test.
Throw an error when a user is provided multiple times on the command line instead of silently throwing one of them away.
Remove unneeded validation for validating users on ACL load.
We want to add COUNT option for BLPOP.
But we can't do it without breaking compatibility due to the command arguments syntax.
So this commit introduce two new commands.
Syntax for the new LMPOP command:
`LMPOP numkeys [<key> ...] LEFT|RIGHT [COUNT count]`
Syntax for the new BLMPOP command:
`BLMPOP timeout numkeys [<key> ...] LEFT|RIGHT [COUNT count]`
Some background:
- LPOP takes one key, and can return multiple elements.
- BLPOP takes multiple keys, but returns one element from just one key.
- LMPOP can take multiple keys and return multiple elements from just one key.
Note that LMPOP/BLMPOP can take multiple keys, it eventually operates on just one key.
And it will propagate as LPOP or RPOP with the COUNT option.
As a new command, it still return NIL if we can't pop any elements.
For the normal response is nested arrays in RESP2 and RESP3, like:
```
LMPOP/BLMPOP
1) keyname
2) 1) element1
2) element2
```
I.e. unlike BLPOP that returns a key name and one element so it uses a flat array,
and LPOP that returns multiple elements with no key name, and again uses a flat array,
this one has to return a nested array, and it does for for both RESP2 and RESP3 (like SCAN does)
Some discuss can see: #766#8824
When a replica paused, it would not apply any commands event the command comes from master, if we feed the non-applied command to replication stream, the replication offset would be wrong, and data would be lost after failover(since replica's `master_repl_offset` grows but command is not applied).
To fix it, here are the changes:
* Don't update replica's replication offset or propagate commands to sub-replicas when it's paused in `commandProcessed`.
* Show `slave_read_repl_offset` in info reply.
* Add an assert to make sure master client should never be blocked unless pause or module (some modules may use block way to do background (parallel) processing and forward original block module command to the replica, it's not a good way but it can work, so the assert excludes module now, but someday in future all modules should rewrite block command to propagate like what `BLPOP` does).
This one follow #9313 and goes deeper (validation of config file parsing)
Move the check/update logic to a new updateClientOutputBufferLimit
function. So that it can be used in CONFIG SET and config file parsing.
In old way, we always increase server.dirty in BITSET and BITFIELD SET.
Even the command doesn't really change anything. This commit make
sure BITSET and BITFIELD SET only increase dirty when the value changed.
Because of that, if the value not changed, some others implications:
- Avoid adding useless AOF
- Reduce replication traffic
- Will not trigger keyspace notifications (setbit)
- Will not invalidate WATCH
- Will not sent the invalidation message to the tracking client
Part one of implementing #8702 (taking hashes first before other types)
## Description of the feature
1. Change ziplist encoded hash objects to listpack encoding.
2. Convert existing ziplists on RDB loading time. an O(n) operation.
## Rdb format changes
1. Add RDB_TYPE_HASH_LISTPACK rdb type.
2. Bump RDB_VERSION to 10
## Interface changes
1. New `hash-max-listpack-entries` config is an alias for `hash-max-ziplist-entries` (same with `hash-max-listpack-value`)
2. OBJECT ENCODING will return `listpack` instead of `ziplist`
## Listpack improvements:
1. Support direct insert, replace integer element (rather than convert back and forth from string)
3. Add more listpack capabilities to match the ziplist ones (like `lpFind`, `lpRandomPairs` and such)
4. Optimize element length fetching, avoid multiple calculations
5. Use inline to avoid function call overhead.
## Tests
1. Add a new test to the RDB load time conversion
2. Adding the listpack unit tests. (based on the one in ziplist.c)
3. Add a few "corrupt payload: fuzzer findings" tests, and slightly modify existing ones.
Co-authored-by: Oran Agra <oran@redislabs.com>
## Current state
1. Lua has its own parser that handles parsing `reds.call` replies and translates them
to Lua objects that can be used by the user Lua code. The parser partially handles
resp3 (missing big number, verbatim, attribute, ...)
2. Modules have their own parser that handles parsing `RM_Call` replies and translates
them to RedisModuleCallReply objects. The parser does not support resp3.
In addition, in the future, we want to add Redis Function (#8693) that will probably
support more languages. At some point maintaining so many parsers will stop
scaling (bug fixes and protocol changes will need to be applied on all of them).
We will probably end up with different parsers that support different parts of the
resp protocol (like we already have today with Lua and modules)
## PR Changes
This PR attempt to unified the reply parsing of Lua and modules (and in the future
Redis Function) by introducing a new parser unit (`resp_parser.c`). The new parser
handles parsing the reply and calls different callbacks to allow the users (another
unit that uses the parser, i.e, Lua, modules, or Redis Function) to analyze the reply.
### Lua API Additions
The code that handles reply parsing on `scripting.c` was removed. Instead, it uses
the resp_parser to parse and create a Lua object out of the reply. As mentioned
above the Lua parser did not handle parsing big numbers, verbatim, and attribute.
The new parser can handle those and so Lua also gets it for free.
Those are translated to Lua objects in the following way:
1. Big Number - Lua table `{'big_number':'<str representation for big number>'}`
2. Verbatim - Lua table `{'verbatim_string':{'format':'<verbatim format>', 'string':'<verbatim string value>'}}`
3. Attribute - currently ignored and not expose to the Lua parser, another issue will be open to decide how to expose it.
Tests were added to check resp3 reply parsing on Lua
### Modules API Additions
The reply parsing code on `module.c` was also removed and the new resp_parser is used instead.
In addition, the RedisModuleCallReply was also extracted to a separate unit located on `call_reply.c`
(in the future, this unit will also be used by Redis Function). A nice side effect of unified parsing is
that modules now also support resp3. Resp3 can be enabled by giving `3` as a parameter to the
fmt argument of `RM_Call`. It is also possible to give `0`, which will indicate an auto mode. i.e, Redis
will automatically chose the reply protocol base on the current client set on the RedisModuleCtx
(this mode will mostly be used when the module want to pass the reply to the client as is).
In addition, the following RedisModuleAPI were added to allow analyzing resp3 replies:
* New RedisModuleCallReply types:
* `REDISMODULE_REPLY_MAP`
* `REDISMODULE_REPLY_SET`
* `REDISMODULE_REPLY_BOOL`
* `REDISMODULE_REPLY_DOUBLE`
* `REDISMODULE_REPLY_BIG_NUMBER`
* `REDISMODULE_REPLY_VERBATIM_STRING`
* `REDISMODULE_REPLY_ATTRIBUTE`
* New RedisModuleAPI:
* `RedisModule_CallReplyDouble` - getting double value from resp3 double reply
* `RedisModule_CallReplyBool` - getting boolean value from resp3 boolean reply
* `RedisModule_CallReplyBigNumber` - getting big number value from resp3 big number reply
* `RedisModule_CallReplyVerbatim` - getting format and value from resp3 verbatim reply
* `RedisModule_CallReplySetElement` - getting element from resp3 set reply
* `RedisModule_CallReplyMapElement` - getting key and value from resp3 map reply
* `RedisModule_CallReplyAttribute` - getting a reply attribute
* `RedisModule_CallReplyAttributeElement` - getting key and value from resp3 attribute reply
* New context flags:
* `REDISMODULE_CTX_FLAGS_RESP3` - indicate that the client is using resp3
Tests were added to check the new RedisModuleAPI
### Modules API Changes
* RM_ReplyWithCallReply might return REDISMODULE_ERR if the given CallReply is in resp3
but the client expects resp2. This is not a breaking change because in order to get a resp3
CallReply one needs to specifically specify `3` as a parameter to the fmt argument of
`RM_Call` (as mentioned above).
Tests were added to check this change
### More small Additions
* Added `debug set-disable-deny-scripts` that allows to turn on and off the commands no-script
flag protection. This is used by the Lua resp3 tests so it will be possible to run `debug protocol`
and check the resp3 parsing code.
Co-authored-by: Oran Agra <oran@redislabs.com>
Co-authored-by: Yossi Gottlieb <yossigo@gmail.com>
Add SINTERCARD and ZINTERCARD commands that are similar to
ZINTER and SINTER but only return the cardinality with minimum
processing and memory overheads.
Co-authored-by: Oran Agra <oran@redislabs.com>
Add NX, XX, GT, and LT flags to EXPIRE, PEXPIRE, EXPIREAT, PEXAPIREAT.
- NX - only modify the TTL if no TTL is currently set
- XX - only modify the TTL if there is a TTL currently set
- GT - only increase the TTL (considering non-volatile keys as infinite expire time)
- LT - only decrease the TTL (considering non-volatile keys as infinite expire time)
return value of the command is 0 when the operation was skipped due to one of these flags.
Signed-off-by: Ning Sun <sunng@protonmail.com>
Fixes:
- When a consumer is created as a side effect, redis didn't issue a keyspace notification,
nor incremented the server.dirty (affects periodic snapshots).
this was a bug in XREADGROUP, XCLAIM, and XAUTOCLAIM.
- When attempting to delete a non-existent consumer, don't issue a keyspace notification
and don't increment server.dirty
this was a bug in XGROUP DELCONSUMER
Other changes:
- Changed streamLookupConsumer() to always only do lookup consumer (never do implicit creation),
Its last seen time is updated unless the SLC_NO_REFRESH flag is specified.
- Added streamCreateConsumer() to create a new consumer. When the creation is successful,
it will notify and dirty++ unless the SCC_NO_NOTIFY or SCC_NO_DIRTIFY flags is specified.
- Changed streamDelConsumer() to always only do delete consumer.
- Added keyspace notifications tests about stream events.
With an empty src key, we need to deal with two situations:
1. non-STORE: We should return emptyarray.
2. STORE: Try to delete the store key and return 0.
This applies to both GEOSEARCHSTORE (new to v6.2), and
also GEORADIUS STORE (which was broken since forever)
This pr try to fix#9261. i.e. both STORE variants would have behaved
like the non-STORE variants when the source key was missing,
returning an empty array and not deleting the destination key,
instead of returning 0, and deleting the destination key.
Also add more tests for some commands.
- GEORADIUS: wrong type src key, non existing src key, empty search,
store with non existing src key, store with empty search
- GEORADIUSBYMEMBER: wrong type src key, non existing src key,
non existing member, store with non existing src key
- GEOSEARCH: wrong type src key, non existing src key, empty search,
frommember with non existing member
- GEOSEARCHSTORE: wrong type key, non existing src key,
fromlonlat with empty search, frommember with non existing member
Co-authored-by: Oran Agra <oran@redislabs.com>
GETBIT, SETBIT may access wrong address because of wrap.
BITCOUNT and BITPOS may return wrapped results.
BITFIELD may access the wrong address but also allocate insufficient memory and segfault (see CVE-2021-32761).
This commit uses `uint64_t` or `long long` instead of `size_t`.
related https://github.com/redis/redis/pull/8096
At 32bit platform:
> setbit bit 4294967295 1
(integer) 0
> config set proto-max-bulk-len 536870913
OK
> append bit "\xFF"
(integer) 536870913
> getbit bit 4294967296
(integer) 0
When the bit index is larger than 4294967295, size_t can't hold bit index. In the past, `proto-max-bulk-len` is limit to 536870912, so there is no problem.
After this commit, bit position is stored in `uint64_t` or `long long`. So when `proto-max-bulk-len > 536870912`, 32bit platforms can still be correct.
For 64bit platform, this problem still exists. The major reason is bit pos 8 times of byte pos. When proto-max-bulk-len is very larger, bit pos may overflow.
But at 64bit platform, we don't have so long string. So this bug may never happen.
Additionally this commit add a test cost `512MB` memory which is tag as `large-memory`. Make freebsd ci and valgrind ci ignore this test.
- promote the code in DEBUG PROTOCOL to addReplyBigNum
- DEBUG PROTOCOL ATTRIB skips the attribute when client is RESP2
- networking.c addReply for push and attributes generate assertion when
called on a RESP2 client, anything else would produce a broken
protocol that clients can't handle.
There are two issues fixed in this commit:
1. we want to fail the EXEC command in case there is a watched key that's logically
expired but not yet deleted by active expire or lazy expire.
2. we saw that currently cache time is update in every `call()` (including nested calls),
this time is being also being use for the isKeyExpired comparison, we want to update
the cache time only in the first call (execCommand)
Co-authored-by: Oran Agra <oran@redislabs.com>
due to a copy-paste bug, it used to reply with null response rather than empty array.
this commit includes new tests that are looking at the RESP response directly in
order to be able to tell the difference between them.
Co-authored-by: Oran Agra <oran@redislabs.com>
Modules that use background threads with thread safe contexts are likely
to use RM_BlockClient() without a timeout function, because they do not
set up a timeout.
Before this commit, `CLIENT UNBLOCK` would result with a crash as the
`NULL` timeout callback is called. Beyond just crashing, this is also
logically wrong as it may throw the module into an unexpected client
state.
This commits makes `CLIENT UNBLOCK` on such clients behave the same as
any other client that is not in a blocked state and therefore cannot be
unblocked.
Return a bad score when used with negative count (or count of 1), and non-ziplist encoded zset.
Also add test to validate the return value and cover the issue.
In the past, the first bind address that was explicitly specified was
also used to bind outgoing connections. This could result with some
problems. For example: on some systems using `bind 127.0.0.1` would
result with outgoing connections also binding to `127.0.0.1` and failing
to connect to remote addresses.
With the recent change to the way `bind` is handled, this presented
other issues:
* The default first bind address is '*' which is not a valid address.
* We make no distinction between user-supplied config that is identical
to the default, and the default config.
This commit addresses both these issues by introducing an explicit
configuration parameter to control the bind address on outgoing
connections.
* Specifying an empty `bind ""` configuration prevents Redis from listening on any TCP port. Before this commit, such configuration was not accepted.
* Using `CONFIG GET bind` will always return an explicit configuration value. Before this commit, if a bind address was not specified the returned value was empty (which was an anomaly).
Another behavior change is that modifying the `bind` configuration to a non-default value will NO LONGER DISABLE protected-mode implicitly.
Previously, passing 0 for newlen would not truncate the string at all.
This adds handling of this case, freeing the old string and creating a new empty string.
Other changes:
- Move `src/modules/testmodule.c` to `tests/modules/basics.c`
- Introduce that basic test into the test suite
- Add tests to cover StringTruncate
- Add `test-modules` build target for the main makefile
- Extend `distclean` build target to clean modules too
The `Tracking gets notification of expired keys` test in tracking.tcl
used to hung in valgrind CI quite a lot.
It turns out the reason is that with valgrind and a busy machine, the
server cron active expire cycle could easily run in the same event loop
as the command that created `mykey`, so that when they key got expired,
there were two change events to broadcast, one that set the key and one
that expired it, but since we used raxTryInsert, the client that was
associated with the "last" change was the one that created the key, so
the NOLOOP filtered that event.
This commit adds a test that reproduces the problem by using lazy expire
in a multi-exec which makes sure the key expires in the same event loop
as the one that added it.
Fix test failure which introduced by #9003.
The following case will occur when querybuf expansion will allocate memory equal to (16*1024)k.
1) make use ```CFLAGS=-DNO_MALLOC_USABLE_SIZE```.
2) ```malloc``` will not allocate more under ```alpine```.
Create new module type enhanced callbacks: mem_usage2, free_effort2, unlink2, copy2.
These will be given a context point from which the module can obtain the key name and database id.
In addition the digest and defrag context can now be used to obtain the key name and database id.
When using RESP3, ZPOPMAX/ZPOPMIN should return nested arrays for consistency
with other commands (e.g. ZRANGE).
We do that only when COUNT argument is present (similarly to how LPOP behaves).
for reasoning see https://github.com/redis/redis/issues/8824#issuecomment-855427955
This is a breaking change only when RESP3 is used, and COUNT argument is present!
The initialize memory of `querybuf` is `PROTO_IOBUF_LEN(1024*16) * 2` (due to sdsMakeRoomFor being greedy), under `jemalloc`, the allocated memory will be 40k.
This will most likely result in the `querybuf` being resized when call `clientsCronResizeQueryBuffer` unless the client requests it fast enough.
Note that this bug existed even before #7875, since the condition for resizing includes the sds headers (32k+6).
## Changes
1. Use non-greedy sdsMakeRoomFor when allocating the initial query buffer (of 16k).
1. Also use non-greedy allocation when working with BIG_ARG (we won't use that extra space anyway)
2. in case we did use a greedy allocation, read as much as we can into the buffer we got (including internal frag), to reduce system calls.
3. introduce a dedicated constant for the shrinking (same value as before)
3. Add test for querybuf.
4. improve a maxmemory test by ignoring the effect of replica query buffers (can accumulate many ACKs on slow env)
5. improve a maxmemory by disabling slowlog (it will cause slight memory growth on slow env).
SINTERSTORE would have deleted the dest key right away,
even when later on it is bound to fail on an (WRONGTYPE) error.
With this change it first picks up all the input keys, and only later
delete the dest key if one is empty.
Also add more tests for some commands.
Mainly focus on
- `wrong type error`:
expand test case (base on sinter bug) in non-store variant
add tests for store variant (although it exists in non-store variant, i think it would be better to have same tests)
- the dstkey result when we meet `non-exist key (empty set)` in *store
sdiff:
- improve test case about wrong type error (the one we found in sinter, although it is safe in sdiff)
- add test about using non-exist key (treat it like an empty set)
sdiffstore:
- according to sdiff test case, also add some tests about `wrong type error` and `non-exist key`
- the different is that in sdiffstore, we will consider the `dstkey` result
sunion/sunionstore add more tests (same as above)
sinter/sinterstore also same as above ...
The root cause is that one test (`5 keys in, 5 keys out`) is leaking a volatile key
that can expire while another later test(`All TTL in commands are propagated
as absolute timestamp in replication stream`) is running.
Such leaked expiration injects an unexpected `DEL` command into the
replication command during the later test, causing it to fail.
The fixes are two fold:
1. Plug the leak in the first test.
2. Add FLUSHALL to the later test, to avoid future interference from other tests.
This PR adds a spell checker CI action that will fail future PRs if they introduce typos and spelling mistakes.
This spell checker is based on blacklist of common spelling mistakes, so it will not catch everything,
but at least it is also unlikely to cause false positives.
Besides that, the PR also fixes many spelling mistakes and types, not all are a result of the spell checker we use.
Here's a summary of other changes:
1. Scanned the entire source code and fixes all sorts of typos and spelling mistakes (including missing or extra spaces).
2. Outdated function / variable / argument names in comments
3. Fix outdated keyspace masks error log when we check `config.notify-keyspace-events` in loadServerConfigFromString.
4. Trim the white space at the end of line in `module.c`. Check: https://github.com/redis/redis/pull/7751
5. Some outdated https link URLs.
6. Fix some outdated comment. Such as:
- In README: about the rdb, we used to said create a `thread`, change to `process`
- dbRandomKey function coment (about the dictGetRandomKey, change to dictGetFairRandomKey)
- notifyKeyspaceEvent fucntion comment (add type arg)
- Some others minor fix in comment (Most of them are incorrectly quoted by variable names)
7. Modified the error log so that users can easily distinguish between TCP and TLS in `changeBindAddr`
This commit revives the improves the ability to run the test suite against
external servers, instead of launching and managing `redis-server` processes as
part of the test fixture.
This capability existed in the past, using the `--host` and `--port` options.
However, it was quite limited and mostly useful when running a specific tests.
Attempting to run larger chunks of the test suite experienced many issues:
* Many tests depend on being able to start and control `redis-server` themselves,
and there's no clear distinction between external server compatible and other
tests.
* Cluster mode is not supported (resulting with `CROSSSLOT` errors).
This PR cleans up many things and makes it possible to run the entire test suite
against an external server. It also provides more fine grained controls to
handle cases where the external server supports a subset of the Redis commands,
limited number of databases, cluster mode, etc.
The tests directory now contains a `README.md` file that describes how this
works.
This commit also includes additional cleanups and fixes:
* Tests can now be tagged.
* Tag-based selection is now unified across `start_server`, `tags` and `test`.
* More information is provided about skipped or ignored tests.
* Repeated patterns in tests have been extracted to common procedures, both at a
global level and on a per-test file basis.
* Cleaned up some cases where test setup was based on a previous test executing
(a major anti-pattern that repeats itself in many places).
* Cleaned up some cases where test teardown was not part of a test (in the
future we should have dedicated teardown code that executes even when tests
fail).
* Fixed some tests that were flaky running on external servers.
Till now GET and NX were mutually exclusive.
This change make their combination mean a "Get or Set" command.
If the key exists it returns the old value and avoids setting,
and if it does't exist it returns nil and sets it to the new value (possibly with expiry time)
The decision to stop trimming due to LIMIT in XADD and XTRIM was after the limit was reached.
i.e. the code was deleting **at least** that count of records (from the LIMIT argument's perspective, not the MAXLEN),
instead of **up to** that count of records.
see #9046
The test that was merged yesterday fails with valgrind and freebsd CI
that are too slow, and 10 seconds apparently passed between the time the
command was sent to redis and the time it was actually executed.
```
*** [err]: All TTL in commands are propagated as absolute timestamp in replication stream in tests/unit/expire.tcl
Expected 'del a' to match 'set foo1 bar PXAT *' (context: type source line 778 file /home/runner/work/redis/redis/tests/test_helper.tcl cmd {assert_match [lindex $patterns $j] [read_from_replication_stream $s]} proc ::assert_replication_stream level 1)
```
Till now, on replica full-sync we used to transfer absolute time for TTL,
however when a command arrived (EXPIRE or EXPIREAT),
we used to propagate it as is to replicas (possibly with relative time),
but always translate it to EXPIREAT (absolute time) to AOF.
This commit changes that and will always use absolute time for propagation.
see discussion in #8433
Furthermore, we Introduce new commands: `EXPIRETIME/PEXPIRETIME`
that allow extracting the absolute TTL time from a key.
When test stop 'load handler' by killing the process that generating the load,
some commands that already in the input buffer, still might be processed by the server.
This may cause some instability in tests, that count on that no more commands
processed after we stop the `load handler'
In this commit, new proc 'wait_load_handlers_disconnected' added, to verify that no more
cammands from any 'load handler' prossesed, by checking that the clients who
genreate the load is disconnceted.
Also, replacing check of dbsize with wait_for_ofs_sync before comparing debug digest, as
it would fail in case the last key the workload wrote was an overridden key (not a new one).
Affected tests
Race fix:
- failover command to specific replica works
- Connect multiple replicas at the same time (issue #141), master diskless=$mdl, replica diskless=$sdl
- AOF rewrite during write load: RDB preamble=$rdbpre
Cleanup and speedup:
- Test replication with blocking lists and sorted sets operations
- Test replication with parallel clients writing in different DBs
- Test replication partial resync: $descr (diskless: $mdl, $sdl, reconnect: $reconnect
I recently saw this failure:
[err]: lazy free a stream with all types of metadata in tests/unit/lazyfree.tcl
Expected '2' to be equal to '1' (context: type eval line 23 cmd {assert_equal [s lazyfreed_objects] 1} proc ::test)
The only explanation for such a thing is that the async flushdb wasn't
done before we did the resetstat
When client breached the output buffer soft limit but then went idle,
we didn't disconnect on soft limit timeout, now we do.
Note this also resolves some sporadic test failures in due to Linux
buffering data which caused tests to fail if during the test we went
back under the soft COB limit.
Co-authored-by: Oran Agra <oran@redislabs.com>
Co-authored-by: sundb <sundbcn@gmail.com>
Adding a new type mask for key space notification, REDISMODULE_NOTIFY_MODULE, to enable unique notifications from commands on REDISMODULE_KEYTYPE_MODULE type keys (which is currently unsupported).
Modules can subscribe to a module key keyspace notification by RM_SubscribeToKeyspaceEvents,
and clients by notify-keyspace-events of redis.conf or via the CONFIG SET, with the characters 'd' or 'A'
(REDISMODULE_NOTIFY_MODULE type mask is part of the '**A**ll' notation for key space notifications).
Refactor: move some pubsub test infra from pubsub.tcl to util.tcl to be re-used by other tests.
Before this commit using RM_Call without "!" could cause the master
to lazy-expire a key (delete it) but without replicating to replicas.
This could cause the replica's memory usage to gradually grow and
could also cause consistency issues if the master and replica have
a clock diff.
This bug was introduced in #8617
Added a test which demonstrates that scenario.
In the initial release of Redis 6.2 setting a user to only allow pubsub access to
a specific channel, and doing ACL SAVE, resulted in an assertion when
ACL LOAD was used. This was later changed by #8723 (not yet released),
but still not properly resolved (now it errors instead of crash).
The problem is that the server that generates an ACL file, doesn't know what
would be the setting of the acl-pubsub-default config in the server that will load it.
so ACL SAVE needs to always start with resetchannels directive.
This should still be compatible with old acl files (from redis 6.0), and ones from earlier
versions of 6.2 that didn't mess with channels.
Co-authored-by: Harkrishn Patro <harkrisp@amazon.com>
Co-authored-by: Oran Agra <oran@redislabs.com>
The tail size of c->reply is 16kb, but in the test only publish a
few chars each time, due to a change in #8699, the obuf limit
is now checked a new memory allocation is made, so this test
would have sometimes failed to trigger a soft limit disconnection
in time.
The solution is to write bigger payloads to the output buffer, but
still limit their rate (not more than 100k/s).
Fix out of range error messages to be clearer (avoid mentioning 9223372036854775807)
* Fix XAUTOCLAIM COUNT option confusing error msg
* Fix other RPOP and alike error message to mention positive
Background:
Redis 6.2 added ACL control for pubsub channels (#7993), which were supposed
to be permissive by default to retain compatibility with redis 6.0 ACL.
But due to a bug, only newly created users got this `acl-pubsub-default` applied,
while overwritten (updated) users got reset to `resetchannels` (denied).
Since the "default" user exists before loading the config file,
any ACL change to it, results in an update / overwrite.
So when a "default" user is loaded from config file or include ACL
file with no channels related rules, the user will not have any
permissions to any channels. But other users will have default
permissions to any channels.
When upgraded from 6.0 with config rewrite, this will lead to
"default" user channels permissions lost.
When users are loaded from include file, then call "acl load", users
will also lost channels permissions.
Similarly, the `reset` ACL rule, would have reset the user to be denied
access to any channels, ignoring `acl-pubsub-default` and breaking
compatibility with redis 6.0.
The implication of this fix is that it regains compatibility with redis 6.0,
but breaks compatibility with redis 6.2.0 and 2.0.1. e.g. after the upgrade,
the default user will regain access to pubsub channels.
Other changes:
Additionally this commit rename server.acl_pubusub_default to
server.acl_pubsub_default and fix typo in acl tests.
This command used to return the last scanned entry id as the cursor,
instead of the next one to be scanned.
so in the next call, the user could / should have sent `(cursor` and not
just `cursor` if he wanted to avoid scanning the same record twice.
Scanning the record twice would look odd if someone is checking what
exactly was scanned, but it also has a side effect of incrementing the
delivery count twice.
If GT/LT fails the operation we need to reply with
nill (like failure due to NX).
Other changes:
Add the missing $encoding suffix to many zset tests
Note: there's a behavior change just in case of INCR + GT/LT that fails.
The old code was replying with the wrong (rejected) score, and now it'll reply with nil.
Note that that's anyway a corner case so this "behavior change" shouldn't have too much affect.
Using GT/LT with INCR has a predictable result even before we run the command
(INCR GT will only only / always fail if the increment is negative).
Problem:
Currently, when performing random distribution verification, we determine
the probability of each element occurring in the sum, but the probability is
only an estimate, these tests had rare sporadic failures, and we cannot verify
what the probability of failure will be.
Solution:
Using the chi-square distribution instead of the original random distribution
validation makes the test more reasonable and easier to find problems.
the bug was also discussed in #8716, and was solved in #8719, but incompletely:
when the server is started, and the save option is default, if you issue the " config set save "" "
to change the save option, and then issue the “config rewrite” command, the " save "" " won't be saved.
'processCommandAndResetClient' returns 1 if client is dead. It does it
by checking if serve.current_client is NULL. On script timeout, Redis will re-enter
'processCommandAndResetClient' and when finish we will set server.current_client
to NULL. This will cause later to falsely return 1 and think that the client that
sent the timed-out script is dead (Redis to stop reading from the client buffer).
Add publish channel permissions check in processCommand.
processCommand didn't check publish channel permissions, so we can
queue a publish command in a transaction. But when exec the transaction,
it will fail with -NOPERM.
We also union keys/commands/channels permissions check togegher in
ACLCheckAllPerm. Remove pubsubCheckACLPermissionsOrReply in
publishCommand/subscribeCommand/psubscribeCommand. Always
check permissions in processCommand/execCommand/
luaRedisGenericCommand.
* SLOWLOG didn't record anything for blocked commands because the client
was reset and argv was already empty. there was a fix for this issue
specifically for modules, now it works for all blocked clients.
* The original command argv (before being re-written) was also reset
before adding the slowlog on behalf of the blocked command.
* Latency monitor is now updated regardless of the slowlog flags of the
command or its execution (their purpose is to hide sensitive info from
the slowlog, not hide the fact the latency happened).
* Latency monitor now uses real_cmd rather than c->cmd (which may be
different if the command got re-written, e.g. GEOADD)
Changes:
* Unify shared code between slowlog insertion in call() and
updateStatsOnUnblock(), hopefully prevent future bugs from happening
due to the later being overlooked.
* Reset CLIENT_PREVENT_LOGGING in resetClient rather than after command
processing.
* Add a test for SLOWLOG and BLPOP
Notes:
- real_cmd == c->lastcmd, except inside MULTI and Lua.
- blocked commands never happen in these cases (MULTI / Lua)
- real_cmd == c->cmd, except for when the command is rewritten (e.g.
GEOADD)
- blocked commands (currently) are never rewritten
- other than the command's CLIENT_PREVENT_LOGGING, and the
execution flag CLIENT_PREVENT_LOGGING, other cases that we want to
avoid slowlog are on AOF loading (specifically CMD_CALL_SLOWLOG will
be off when executed from execCommand that runs from an AOF)
pcall function runs another LUA function in protected mode, this means
that any error will be caught by this function and will not stop the LUA
execution. The script kill mechanism uses error to stop the running script.
Scripts that uses pcall can catch the error raise by the script kill mechanism,
this will cause a script like this to be unkillable:
local f = function()
while 1 do
redis.call('ping')
end
end
while 1 do
pcall(f)
end
The fix is, when we want to kill the script, we set the hook function to be invoked
after each line. This will promise that the execution will get another
error before it is able to enter the pcall function again.
1. moduleReplicateMultiIfNeeded should use server.in_eval like
moduleHandlePropagationAfterCommandCallback
2. server.in_eval could have been set to 1 and not reset back
to 0 (a lot of missed early-exits after in_eval is already 1)
Note: The new assertions in processCommand cover (2) and I added
two module tests to cover (1)
Implications:
If an EVAL that failed (and thus left server.in_eval=1) runs before a module
command that replicates, the replication stream will contain MULTI (because
moduleReplicateMultiIfNeeded used to check server.lua_caller which is NULL
at this point) but not EXEC (because server.in_eval==1)
This only affects modules as module.c the only user of server.in_eval.
Affects versions 6.2.0, 6.2.1
Bug 1:
When a module ctx is freed moduleHandlePropagationAfterCommandCallback
is called and handles propagation. We want to prevent it from propagating
commands that were not replicated by the same context. Example:
1. module1.foo does: RM_Replicate(cmd1); RM_Call(cmd2); RM_Replicate(cmd3)
2. RM_Replicate(cmd1) propagates MULTI and adds cmd1 to also_propagagte
3. RM_Call(cmd2) create a new ctx, calls call() and destroys the ctx.
4. moduleHandlePropagationAfterCommandCallback is called, calling
alsoPropagates EXEC (Note: EXEC is still not written to socket),
setting server.in_trnsaction = 0
5. RM_Replicate(cmd3) is called, propagagting yet another MULTI (now
we have nested MULTI calls, which is no good) and then cmd3
We must prevent RM_Call(cmd2) from resetting server.in_transaction.
REDISMODULE_CTX_MULTI_EMITTED was revived for that purpose.
Bug 2:
Fix issues with nested RM_Call where some have '!' and some don't.
Example:
1. module1.foo does RM_Call of module2.bar without replication (i.e. no '!')
2. module2.bar internally calls RM_Call of INCR with '!'
3. at the end of module1.foo we call RM_ReplicateVerbatim
We want the replica/AOF to see only module1.foo and not the INCR from module2.bar
Introduced a global replication_allowed flag inside RM_Call to determine
whether we need to replicate or not (even if '!' was specified)
Other changes:
Split beforePropagateMultiOrExec to beforePropagateMulti afterPropagateExec
just for better readability
It seems like non-Linux sockets may be less greedy, resulting with more
transient client output buffers.
Haven't proven this but empirically when stressing this test on
non-Linux tends to exhibit increased mem_clients_normal values.
Add ability to modify port, tls-port and bind configurations by CONFIG SET command.
To simplify the code and make it cleaner, a new structure
added, socketFds, which contains the file descriptors array and its counter,
and used for TCP, TLS and Cluster sockets file descriptors.
Because when the RM_Call is invoked. It will create a faker client.
The point is client connection is NULL, so server will crash in connGetInfo
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
A single client pointer is added in the server struct. This is
initialized by the first RM_Call() and reused for every subsequent
RM_Call() except if it's already in use, which means that it's not
used for (recursive) module calls to modules. For these, a new
"fake" client is created each time.
Other changes:
* Avoid allocating a dict iterator in pubsubUnsubscribeAllChannels
when not needed
This validation was only done for sub-commands and not for commands.
These would have been valid (not produce any error)
ACL SETUSER bob +@all +client
ACL SETUSER bob +client +client
so no reason for this one to fail:
ACL SETUSER bob +client +client|id
One example why this is needed is that pfdebug wasn't part of the @hyperloglog
group and now it is. so something like:
acl setuser user1 +@hyperloglog +pfdebug|test
would have succeeded in early 6.0.x, and fail in 6.2 RC3
Co-authored-by: Harkrishn Patro <harkrisp@amazon.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Oran Agra <oran@redislabs.com>
When redis responds with tracking-redir-broken push message (RESP3),
it was responding with a broken protocol: an array of 3 elements, but only
pushes 2 elements.
Some bugs in the test make this pass. Read the push reply
will consume an extra reply, because the reply length is 3, but there
are only two elements, so the next reply will be treated as third
element. So the test is corrected too.
Other changes:
* checkPrefixCollisionsOrReply success should return 1 instead of -1,
this bug didn't have any implications.
* improve client tracking tests to validate more of the response it reads.
Respond with error if expire time overflows from positive to negative of vice versa.
* `SETEX`, `SET EX`, `GETEX` etc would have already error on negative value,
but now they would also error on overflows (i.e. when the input was positive but
after the manipulation it becomes negative, which would have passed before)
* `EXPIRE` and `EXPIREAT` was ok taking negative values (would implicitly delete
the key), we keep that, but we do error if the user provided a value that changes
sign when manipulated (except the case of changing sign when `basetime` is added)
Signed-off-by: Gnanesh <gnaneshkunal@outlook.com>
Co-authored-by: Oran Agra <oran@redislabs.com>
There are two tests in other.tcl that were dependant of the sha1 package
import which meant that they didn't usually run.
The reason it was like that was that prior to the creation of DEBUG
DIGEST, the test suite used to have an equivalent function, but that's
no longer the case and this dependency isn't needed.
The other change is to revert config changes done by the test before the
test suite continues. can be useful if using `--host` to run multiple
units against the same server
The added flag affects the return value of RM_HashSet() to include
the number of inserted fields, in addition to updated and deleted
fields.
errno is set on errors, tests are added and documentation updated.
- removes time sensitive checks from block on background tests during leak checks.
- fix uninitialized variable on RedisModuleBlockedClient() when calling
RM_BlockedClientMeasureTimeEnd() without RM_BlockedClientMeasureTimeStart()
The test failed from time to time on Github actions.
We think it's possible that on the module's blocking timeout
time tracking test, the timeout is happening prior we issue the
RedisModule_BlockedClientMeasureTimeStart(bc) on the
background thread. If that is the case one possible solution
is to increase the timeout.
Increasing to 200ms to 500ms to see if nightly stops failing.
Without this fix, RM_ZsetRem can leave empty sorted sets which are
not allowed to exist.
Removing from a sorted set while iterating seems to work (while
inserting causes failed assetions). RM_ZsetRangeEndReached is
modified to return 1 if the key doesn't exist, to terminate
iteration when the last element has been removed.
Changes to HRANDFIELD and ZRANDMEMBER:
* Fix risk of OOM panic when client query a very big negative count (avoid allocating huge temporary buffer).
* Fix uneven random distribution in HRANDFIELD with negative count (wasn't using dictGetFairRandomKey).
* Add tests to check an even random distribution (HRANDFIELD, SRANDMEMBER, ZRANDMEMBER).
Co-authored-by: Oran Agra <oran@redislabs.com>
Fix errors of GEOSEARCH bybox search due to:
1. projection of the box to a trapezoid (when the meter box is converted to long / lat it's no longer a box).
2. width and height mismatch
Changes:
- New GEOSEARCH point in rectangle algorithm
- Fix GEOSEARCH bybox width and height mismatch bug
- Add GEOSEARCH bybox testing to the existing "GEOADD + GEORANGE randomized test"
- Add new fuzzy test to stress test the bybox corners and edges
- Add some tests for edge cases of the bybox algorithm
Co-authored-by: Oran Agra <oran@redislabs.com>
This commit enables tracking time of the background tasks and on replies,
opening the door for properly tracking commands that rely on blocking / background
work via the slowlog, latency history, and commandstats.
Some notes:
- The time spent blocked waiting for key changes, or blocked on synchronous
replication is not accounted for.
- **This commit does not affect latency tracking of commands that are non-blocking
or do not have background work.** ( meaning that it all stays the same with exception to
`BZPOPMIN`,`BZPOPMAX`,`BRPOP`,`BLPOP`, etc... and module's commands that rely
on background threads ).
- Specifically for latency history command we've added a new event class named
`command-unblocking` that will enable latency monitoring on commands that spawn
background threads to do the work.
- For blocking commands we're now considering the total time of a command as the
time spent on call() + the time spent on replying when unblocked.
- For Modules commands that rely on background threads we're now considering the
total time of a command as the time spent on call (main thread) + the time spent on
the background thread ( if marked within `RedisModule_MeasureTimeStart()` and
`RedisModule_MeasureTimeEnd()` ) + the time spent on replying (main thread)
To test for this feature we've added a `unit/moduleapi/blockonbackground` test that relies on
a module that blocks the client and sleeps on the background for a given time.
- check blocked command that uses RedisModule_MeasureTimeStart() is tracking background time
- check blocked command that uses RedisModule_MeasureTimeStart() is tracking background time even in timeout
- check blocked command with multiple calls RedisModule_MeasureTimeStart() is tracking the total background time
- check blocked command without calling RedisModule_MeasureTimeStart() is not reporting background time
New commands:
`HRANDFIELD [<count> [WITHVALUES]]`
`ZRANDMEMBER [<count> [WITHSCORES]]`
Algorithms are similar to the one in SRANDMEMBER.
Both return a simple bulk response when no arguments are given, and an array otherwise.
In case values/scores are requested, RESP2 returns a long array, and RESP3 a nested array.
note: in all 3 commands, the only option that also provides random order is the one with negative count.
Changes to SRANDMEMBER
* Optimization when count is 1, we can use the more efficient algorithm of non-unique random
* optimization: work with sds strings rather than robj
Other changes:
* zzlGetScore: when zset needs to convert string to double, we use safer memcpy (in
case the buffer is too small)
* Solve a "bug" in SRANDMEMBER test: it intended to test a positive count (case 3 or
case 4) and by accident used a negative count
Co-authored-by: xinluton <xinluton@qq.com>
Co-authored-by: Oran Agra <oran@redislabs.com>
APIs added for these stream operations: add, delete, iterate and
trim (by ID or maxlength). The functions are prefixed by RM_Stream.
* RM_StreamAdd
* RM_StreamDelete
* RM_StreamIteratorStart
* RM_StreamIteratorStop
* RM_StreamIteratorNextID
* RM_StreamIteratorNextField
* RM_StreamIteratorDelete
* RM_StreamTrimByLength
* RM_StreamTrimByID
The type RedisModuleStreamID is added and functions for converting
from and to RedisModuleString.
* RM_CreateStringFromStreamID
* RM_StringToStreamID
Whenever the stream functions return REDISMODULE_ERR, errno is set to
provide additional error information.
Refactoring: The zset iterator fields in the RedisModuleKey struct
are wrapped in a union, to allow the same space to be used for type-
specific info for streams and allow future use for other key types.
if option `set-proc-title' is no, then do nothing for proc title.
The reason has been explained long ago, see following:
We update redis to 2.8.8, then found there are some side effect when
redis always change the process title.
We run several slave instance on one computer, and all these salves
listen on unix socket only, then ps will show:
1 S redis 18036 1 0 80 0 - 56130 ep_pol 14:02 ? 00:00:31 /usr/sbin/redis-server *:0
1 S redis 23949 1 0 80 0 - 11074 ep_pol 15:41 ? 00:00:00 /usr/sbin/redis-server *:0
for redis 2.6 the output of ps is like following:
1 S redis 18036 1 0 80 0 - 56130 ep_pol 14:02 ? 00:00:31 /usr/sbin/redis-server /etc/redis/a.conf
1 S redis 23949 1 0 80 0 - 11074 ep_pol 15:41 ? 00:00:00 /usr/sbin/redis-server /etc/redis/b.conf
Later is more informational in our case. The situation
is worse when we manage the config and process running
state by salt. Salt check the process by running "ps |
grep SIG" (for Gentoo System) to check the running
state, where SIG is the string to search for when
looking for the service process with ps. Previously, we
define sig as "/usr/sbin/redis-server
/etc/redis/a.conf". Since the ps output is identical for
our case, so we have no way to check the state of
specified redis instance.
So, for our case, we prefer the old behavior, i.e, do
not change the process title for the main redis process.
Or add an option such as "set-proc-title [yes|no]" to
control this behavior.
Co-authored-by: Yossi Gottlieb <yossigo@gmail.com>
Co-authored-by: Oran Agra <oran@redislabs.com>
This commit introduces two new command and two options for an existing command
GETEX <key> [PERSIST][EX seconds][PX milliseconds] [EXAT seconds-timestamp]
[PXAT milliseconds-timestamp]
The getexCommand() function implements extended options and variants of the GET
command. Unlike GET command this command is not read-only. Only one of the options
can be used at a given time.
1. PERSIST removes any TTL associated with the key.
2. EX Set expiry TTL in seconds.
3. PX Set expiry TTL in milliseconds.
4. EXAT Same like EX instead of specifying the number of seconds representing the
TTL (time to live), it takes an absolute Unix timestamp
5. PXAT Same like PX instead of specifying the number of milliseconds representing the
TTL (time to live), it takes an absolute Unix timestamp
Command would return either the bulk string, error or nil.
GETDEL <key>
Would delete the key after getting.
SET key value [NX] [XX] [KEEPTTL] [GET] [EX <seconds>] [PX <milliseconds>]
[EXAT <seconds-timestamp>][PXAT <milliseconds-timestamp>]
Two new options added here are EXAT and PXAT
Key implementation notes
- `SET` with `PX/EX/EXAT/PXAT` is always translated to `PXAT` in `AOF`. When relative time is
specified (`PX/EX`), replication will always use `PX`.
- `setexCommand` and `psetexCommand` would no longer need translation in `feedAppendOnlyFile`
as they are modified to invoke `setGenericCommand ` with appropriate flags which will take care of
correct AOF translation.
- `GETEX` without any optional argument behaves like `GET`.
- `GETEX` command is never propagated, It is either propagated as `PEXPIRE[AT], or PERSIST`.
- `GETDEL` command is propagated as `DEL`
- Combined the validation for `SET` and `GETEX` arguments.
- Test cases to validate AOF/Replication propagation
It was confusing as to why these don't return a map type.
the reason is that order matters, so we need to make sure the client
library knows to respect it.
Added comments in the implementation and tests to cover it.
some tests use attach_to_replication_stream to watch what's propagated
to replicas, but in some cases the periodic ping may slip in and fail
the test.
we disable that ping by setting the period to once an hour (tests should
not run for that long).
other change is so that the next time this oom-score-adj test fails,
we'll see the value (assert_equals prints it)
1. Valgrind leak in a recent change in a module api test
2. Increase treshold of a RESTORE TTL test
3. Change assertions to use assert_range which prints the values
BLPOP and other blocking list commands can only block on empty keys
and LPUSH only wakes up clients when the list is created.
Using the module API, it's possible to block on a non-empty key.
Unblocking a client blocked on a non-empty list (or zset) can only
be done using RedisModule_SignalKeyAsReady(). This commit tests it.
the test was misleading because the module would actually woke up on a wrong type and
re-blocked, while the test name suggests the module doesn't not wake up at all on a wrong type..
i changed the name of the test + added verification that indeed the module wakes up and gets
re-blocked after it understand it's the wrong type
This commit adds tests to make sure that relative and absolute expire commands
are propagated as is to replicas and stop any future attempt to change that without
a proper discussion. see #8327 and #5171
Additionally it slightly improve the AOF test that tests the opposite (always
propagating absolute times), by covering more commands, and shaving 2
seconds from the test time.
This was a regression from #7625 (only in 6.2 RC2).
This makes it possible again to implement blocking list and zset
commands using the modules API.
This commit also includes a test case for the reverse: A module
unblocks a client blocked on BLPOP by inserting elements using
RedisModule_ListPush(). This already works, but it was untested.
This adds basic coverage to IO threads by running the cluster and few selected Redis test suite tests with the IO threads enabled.
Also provides some necessary additional improvements to the test suite:
* Add --config to sentinel/cluster tests for arbitrary configuration.
* Fix --tags whitelisting which was broken.
* Add a `network` tag to some tests that are more network intensive. This is work in progress and more tests should be properly tagged in the future.
* Adds ASYNC and SYNC arguments to SCRIPT FLUSH
* Adds SYNC argument to FLUSHDB and FLUSHALL
* Adds new config to control the default behavior of FLUSHDB, FLUSHALL and SCRIPT FLUASH.
the new behavior is as follows:
* FLUSH[ALL|DB],SCRIPT FLUSH: Determine sync or async according to the
value of lazyfree-lazy-user-flush.
* FLUSH[ALL|DB],SCRIPT FLUSH ASYNC: Always flushes the database in an async manner.
* FLUSH[ALL|DB],SCRIPT FLUSH SYNC: Always flushes the database in a sync manner.
This fixes three issues:
1. Using debug SLEEP was impacting the subsequent test, and causing it to pass reliably even though it should have failed. There was exactly 5 seconds of artificial pause (after 1000, wait 3000, wait 1000) between the debug sleep 5 and when we needed to unblock the client in the subsequent test. Now the test properly makes sure the client is unblocked, and the subsequent test is fixed.
2. Minor, the client pause types were using & comparisons instead of ==, since it was previously a flag.
3. Test is faster now that some of the hand wavy time is removed.
This PR adds another trimming strategy to XADD and XTRIM named MINID
(complements the existing MAXLEN).
It also adds a new LIMIT argument that allows incremental trimming by repeated
calls (rather than all at once).
This provides the ability to trim all records older than a certain ID (which makes it
possible for the user to trim by age too).
Example:
XTRIM mystream MINID ~ 1608540753 will trim entries with id < 1608540753,
but might not trim all (because of the ~ modifier)
The purpose is to ease the use of streams. many users use streams as logs and
the common case is wanting a log
of the last X seconds rather than a log that contains maximum X entries (new
MINID vs existing MAXLEN)
The new LIMIT modifier is only supported when the trim strategy uses ~.
i.e. when the user asked for exact trimming, it all happens in one go (no
possibility for incremental trimming).
However, when ~ is provided, we trim full rax nodes, up to the limit number
of records.
The default limit is 100*stream_node_max_entries (used when LIMIT is not
provided).
I.e. this is a behavior change (even if the existing MAXLEN strategy is used).
An explicit limit of 0 means unlimited (but note that it's not the default).
Other changes:
Refactor arg parsing code for XADD and XTRIM to use common code.
The defragger works well on these systems, but the tests and their
thresholds are not adjusted for these big pages, so the defragger isn't
able to get down the fragmentation to the levels the test expects and it
fails on "defrag didn't stop".
Randomly choosing 8k as the threshold for the skipping
Fixes#8265 (which had 65k pages)
Add ZRANGESTORE command, and improve ZSTORE command to deprecated Z[REV]RANGE[BYSCORE|BYLEX].
Syntax for the new ZRANGESTORE command:
ZRANGESTORE [BYSCORE | BYLEX] [REV] [LIMIT offset count]
New syntax for ZRANGE:
ZRANGE [BYSCORE | BYLEX] [REV] [WITHSCORES] [LIMIT offset count]
Old syntax for ZRANGE:
ZRANGE [WITHSCORES]
Other ZRANGE commands remain unchanged.
The implementation uses common code for all of these, by utilizing a consumer interface that in one
command response to the client, and in the other command stores a zset key.
Co-authored-by: Oran Agra <oran@redislabs.com>
New command: XAUTOCLAIM <key> <group> <consumer> <min-idle-time> <start> [COUNT <count>] [JUSTID]
The purpose is to claim entries from a stale consumer without the usual
XPENDING+XCLAIM combo which takes two round trips.
The syntax for XAUTOCLAIM is similar to scan: A cursor is returned (streamID)
by each call and should be used as start for the next call. 0-0 means the scan is complete.
This PR extends the deferred reply mechanism for any bulk string (not just counts)
This PR carries some unrelated test code changes:
- Renames the term "client" into "consumer" in the stream-cgroups test
- And also changes DEBUG SLEEP into "after"
Co-authored-by: Oran Agra <oran@redislabs.com>
When a Lua script returns a map to redis (a feature which was added in
redis 6 together with RESP3), it would have returned the value first and
the key second.
If the client was using RESP2, it was getting them out of order, and if
the client was in RESP3, it was getting a map of value => key.
This was happening regardless of the Lua script using redis.setresp(3)
or not.
This also affects a case where the script was returning a map which it got
from from redis by doing something like: redis.setresp(3); return redis.call()
This fix is a breaking change for redis 6.0 users who happened to rely
on the wrong order (either ones that used redis.setresp(3), or ones that
returned a map explicitly).
This commit also includes other two changes in the tests:
1. The test suite now handles RESP3 maps as dicts rather than nested
lists
2. Remove some redundant (duplicate) tests from tracking.tcl
This PR not only fixes the problem that swapdb does not make the
transaction fail, but also optimizes the FLUSHALL and FLUSHDB command to
set the CLIENT_DIRTY_CAS flag to avoid unnecessary traversal of clients.
FLUSHDB was changed to first iterate on all watched keys, and then on the
clients watching each key.
Instead of iterating though all clients, and for each iterate on watched keys.
Co-authored-by: Oran Agra <oran@redislabs.com>
New command flags similar to what SADD already has.
Co-authored-by: huangwei03 <huangwei03@kuaishou.com>
Co-authored-by: Itamar Haber <itamar@redislabs.com>
Co-authored-by: Oran Agra <oran@redislabs.com>
This Commit pushes forward the observability on overall error statistics and command statistics within redis-server:
It extends INFO COMMANDSTATS to have
- failed_calls in - so we can keep track of errors that happen from the command itself, broken by command.
- rejected_calls - so we can keep track of errors that were triggered outside the commmand processing per se
Adds a new section to INFO, named ERRORSTATS that enables keeping track of the different errors that
occur within redis ( within processCommand and call ) based on the reply Error Prefix ( The first word
after the "-", up to the first space ).
This commit also fixes RM_ReplyWithError so that it can be correctly identified as an error reply.
Adds: `L/RPOP <key> [count]`
Implements no. 2 of the following strategies:
1. Loop on listTypePop - this would result in multiple calls for memory freeing and allocating (see 769167a079)
2. Iterate the range to build the reply, then call quickListDelRange - this requires two iterations and **is the current choice**
3. Refactor quicklist to have a pop variant of quickListDelRange - probably optimal but more complex
Also:
* There's a historical check for NULL after calling listTypePop that was converted to an assert.
* This refactors common logic shared between LRANGE and the new form of LPOP/RPOP into addListRangeReply (adds test for b/w compat)
* Consequently, it may have made sense to have `LRANGE l -1 -2` and `LRANGE l 9 0` be legit and return a reverse reply. Due to historical reasons that would be, however, a breaking change.
* Added minimal comments to existing commands to adhere to the style, make core dev life easier and get commit karma, naturally.
In the distant history there was only the read flag for commands, and whatever
command that didn't have the read flag was a write one.
Then we added the write flag, but some portions of the code still used !read
Also some commands that don't work on the keyspace at all, still have the read
flag.
Changes in this commit:
1. remove the read-only flag from TIME, ECHO, ROLE and LASTSAVE
2. EXEC command used to decides if it should propagate a MULTI by looking at
the command flags (!read & !admin).
When i was about to change it to look at the write flag instead, i realized
that this would cause it not to propagate a MULTI for PUBLISH, EVAL, and
SCRIPT, all 3 are not marked as either a read command or a write one (as
they should), but all 3 are calling forceCommandPropagation.
So instead of introducing a new flag to denote a command that "writes" but
not into the keyspace, and still needs propagation, i decided to rely on
the forceCommandPropagation, and just fix the code to propagate MULTI when
needed rather than depending on the command flags at all.
The implication of my change then is that now it won't decide to propagate
MULTI when it sees one of these: SELECT, PING, INFO, COMMAND, TIME and
other commands which are neither read nor write.
3. Changing getNodeByQuery and clusterRedirectBlockedClientIfNeeded in
cluster.c to look at !write rather than read flag.
This should have no implications, since these code paths are only reachable
for commands which access keys, and these are always marked as either read
or write.
This commit improve MULTI propagation tests, for modules and a bunch of
other special cases, all of which used to pass already before that commit.
the only one that test change that uncovered a change of behavior is the
one that DELs a non-existing key, it used to propagate an empty
multi-exec block, and no longer does.
Additionally the older defrag tests are using an obsolete way to check
if the defragger is suuported (the error no longer contains "DISABLED").
this doesn't usually makes a difference since these tests are completely
skipped if the allocator is not jemalloc, but that would fail if the
allocator is a jemalloc that doesn't support defrag.
Add a new set of defrag functions that take a defrag context and allow
defragmenting memory blocks and RedisModuleStrings.
Modules can register a defrag callback which will be invoked when the
defrag process handles globals.
Modules with custom data types can also register a datatype-specific
defrag callback which is invoked for keys that require defragmentation.
The callback and associated functions support both one-step and
multi-step options, depending on the complexity of the key as exposed by
the free_effort callback.
Add commands to query geospatial data with bounding box.
Two new commands that replace the existing 4 GEORADIUS* commands.
GEOSEARCH key [FROMMEMBER member] [FROMLOC long lat] [BYRADIUS radius
unit] [BYBOX width height unit] [WITHCORD] [WITHDIST] [WITHASH] [COUNT
count] [ASC|DESC]
GEOSEARCHSTORE dest_key src_key [FROMMEMBER member] [FROMLOC long lat]
[BYRADIUS radius unit] [BYBOX width height unit] [WITHCORD] [WITHDIST]
[WITHASH] [COUNT count] [ASC|DESC] [STOREDIST]
- Add two types of CIRCULAR_TYPE and RECTANGLE_TYPE to achieve different searches
- Judge whether the point is within the rectangle, refer to:
geohashGetDistanceIfInRectangle
This adds a new `tls-client-cert-file` and `tls-client-key-file`
configuration directives which make it possible to use different
certificates for the TLS-server and TLS-client functions of Redis.
This is an optional directive. If it is not specified the `tls-cert-file`
and `tls-key-file` directives are used for TLS client functions as well.
Also, `utils/gen-test-certs.sh` now creates additional server-only and client-only certs and will skip intensive operations if target files already exist.
This adds a copy callback for module data types, in order to make
modules compatible with the new COPY command.
The callback is optional and COPY will fail for keys with data types
that do not implement it.
Module blocked clients cache the response in a temporary client,
the reply list in this client would be affected by the recent fix
in #7202, but when the reply is later copied into the real client,
it would have bypassed all the checks for output buffer limit, which
would have resulted in both: responding with a partial response to
the client, and also not disconnecting it at all.
c4fdf09c0 added a test that now fails with valgrind
it fails for two resons:
1) the test samples the used memory and then limits the maxmemory to
that value, but it turns out this is not atomic and on slow machines
the background cron process that clean out old query buffers reduces
the memory so that the setting doesn't cause eviction.
2) the dbsize was tested late, after reading some invalidation messages
by that time more and more keys got evicted, partially draining the
db. this is not the focus of this fix (still a known limitation)
when the same consumer re-claim an entry that it already has, there's
no need to remove-and-insert if it's the same rax.
we do need to update the idle time though.
this commit only improves efficiency (doesn't change behavior).
* Add CLIENT INFO subcommand.
The output is identical to CLIENT LIST but provides a single line for
the current client only.
* Add CLIENT LIST ID [id...].
Co-authored-by: Itamar Haber <itamar@redislabs.com>
The test creates keys with various encodings, DUMP them, corrupt the payload
and RESTORES it.
It utilizes the recently added use-exit-on-panic config to distinguish between
asserts and segfaults.
If the restore succeeds, it runs random commands on the key to attempt to
trigger a crash.
It runs in two modes, one with deep sanitation enabled and one without.
In the first one we don't expect any assertions or segfaults, in the second one
we expect assertions, but no segfaults.
We also check for leaks and invalid reads using valgrind, and if we find them
we print the commands that lead to that issue.
Changes in the code (other than the test):
- Replace a few NPD (null pointer deference) flows and division by zero with an
assertion, so that it doesn't fail the test. (since we set the server to use
`exit` rather than `abort` on assertion).
- Fix quite a lot of flows in rdb.c that could have lead to memory leaks in
RESTORE command (since it now responds with an error rather than panic)
- Add a DEBUG flag for SET-SKIP-CHECKSUM-VALIDATION so that the test don't need
to bother with faking a valid checksum
- Remove a pile of code in serverLogObjectDebugInfo which is actually unsafe to
run in the crash report (see comments in the code)
- fix a missing boundary check in lzf_decompress
test suite infra improvements:
- be able to run valgrind checks before the process terminates
- rotate log files when restarting servers
- improve stream rdb encoding test to include more types of stream metadata
- add test to cover various ziplist encoding entries (although it does
look like the stress test above it is able to find some too
- add another test for ziplist encoding for hash with full sanitization
- add similar ziplist encoding tests for list
When client tracking is enabled signalModifiedKey can increase memory usage,
this can cause the loop in performEvictions to keep running since it was measuring
the memory usage impact of signalModifiedKey.
The section that measures the memory impact of the eviction should be just on dbDelete,
excluding keyspace notification, client tracking, and propagation to AOF and replicas.
This resolves part of the problem described in #8069
p.s. fix took 1 minute, test took about 3 hours to write.
One way this was happening is when a module issued an RM_Call which would inject MULTI.
If the module command that does that was itself issued by something else that already did
added MULTI (e.g. another module, or a Lua script), it would have caused nested MULTI.
In fact the MULTI state in the client or the MULTI_EMITTED flag in the context isn't
the right indication that we need to propagate MULTI or not, because on a nested calls
(possibly a module action called by a keyspace event of another module action), these
flags aren't retained / reflected.
instead there's now a global propagate_in_transaction flag for that.
in addition to that, we now have a global in_eval and in_exec flags, to serve the flags
of RM_GetContextFlags, since their dependence on the current client is wrong for the same
reasons mentioned above.
As we know, redis may reject user's requests or evict some keys if
used memory is over maxmemory. Dictionaries expanding may make
things worse, some big dictionaries, such as main db and expires dict,
may eat huge memory at once for allocating a new big hash table and be
far more than maxmemory after expanding.
There are related issues: #4213#4583
More details, when expand dict in redis, we will allocate a new big
ht[1] that generally is double of ht[0], The size of ht[1] will be
very big if ht[0] already is big. For db dict, if we have more than
64 million keys, we need to cost 1GB for ht[1] when dict expands.
If the sum of used memory and new hash table of dict needed exceeds
maxmemory, we shouldn't allow the dict to expand. Because, if we
enable keys eviction, we still couldn't add much more keys after
eviction and rehashing, what's worse, redis will keep less keys when
redis only remains a little memory for storing new hash table instead
of users' data. Moreover users can't write data in redis if disable
keys eviction.
What this commit changed ?
Add a new member function expandAllowed for dict type, it provide a way
for caller to allow expand or not. We expose two parameters for this
function: more memory needed for expanding and dict current load factor,
users can implement a function to make a decision by them.
For main db dict and expires dict type, these dictionaries may be very
big and cost huge memory for expanding, so we implement a judgement
function: we can stop dict to expand provisionally if used memory will
be over maxmemory after dict expands, but to guarantee the performance
of redis, we still allow dict to expand if dict load factor exceeds the
safe load factor.
Add test cases to verify we don't allow main db to expand when left
memory is not enough, so that avoid keys eviction.
Other changes:
For new hash table size when expand. Before this commit, the size is
that double used of dict and later _dictNextPower. Actually we aim to
control a dict load factor between 0.5 and 1.0. Now we replace *2 with
+1, since the first check is that used >= size, the outcome of before
will usually be the same as _dictNextPower(used+1). The only case where
it'll differ is when dict_can_resize is false during fork, so that later
the _dictNextPower(used*2) will cause the dict to jump to *4 (i.e.
_dictNextPower(1025*2) will return 4096).
Fix rehash test cases due to changing algorithm of new hash table size
when expand.
SELECT used to read the index into a `long` variable, and then pass it to a function
that takes an `int`, possibly causing an overflow before the range check.
Now all these commands use better and cleaner range check, and that also results in
a slight change of the error response in case of an invalid database index.
SELECT:
in the past it would have returned either `-ERR invalid DB index` (if not a number),
or `-ERR DB index is out of range` (if not between 1..16 or alike).
now it'll return either `-ERR value is out of range` (if not a number), or
`-ERR value is out of range, value must between -2147483648 and 2147483647`
(if not in the range for an int), or `-ERR DB index is out of range`
(if not between 0..16 or alike)
MOVE:
in the past it would only fail with `-ERR index out of range` no matter the reason.
now return the same errors as the new ones for SELECT mentioned above.
(i.e. unlike for SELECT even for a value like 17 we changed the error message)
COPY:
doesn't really matter how it behaved in the past (new command), new behavior is
like the above two.
Fixes#7923.
This PR appropriates the special `&` symbol (because `@` and `*` are taken),
followed by a literal value or pattern for describing the Pub/Sub patterns that
an ACL user can interact with. It is similar to the existing key patterns
mechanism in function (additive) and implementation (copy-pasta). It also adds
the allchannels and resetchannels ACL keywords, naturally.
The default user is given allchannels permissions, whereas new users get
whatever is defined by the acl-pubsub-default configuration directive. For
backward compatibility in 6.2, the default of this directive is allchannels but
this is likely to be changed to resetchannels in the next major version for
stronger default security settings.
Unless allchannels is set for the user, channel access permissions are checked
as follows :
* Calls to both PUBLISH and SUBSCRIBE will fail unless a pattern matching the
argumentative channel name(s) exists for the user.
* Calls to PSUBSCRIBE will fail unless the pattern(s) provided as an argument
literally exist(s) in the user's list.
Such failures are logged to the ACL log.
Runtime changes to channel permissions for a user with existing subscribing
clients cause said clients to disconnect unless the new permissions permit the
connections to continue. Note, however, that PSUBSCRIBErs' patterns are matched
literally, so given the change bar:* -> b*, pattern subscribers to bar:* will be
disconnected.
Notes/questions:
* UNSUBSCRIBE, PUNSUBSCRIBE and PUBSUB remain unprotected due to lack of reasons
for touching them.
The bug was introduced by #5021 which only attempted avoid EXIST on an
already expired key from returning 1 on a replica.
Before that commit, dbExists was used instead of
lookupKeyRead (which had an undesired effect to "touch" the LRU/LFU)
Other than that, this commit fixes OBJECT to also come empty handed on
expired keys in replica.
And DEBUG DIGEST-VALUE to behave like DEBUG OBJECT (get the data from
the key regardless of it's expired state)
Blocking command should not be used with MULTI, LUA, and RM_Call. This is because,
the caller, who executes the command in this context, expects a reply.
Today, LUA and MULTI have a special (and different) treatment to blocking commands:
LUA - Most commands are marked with no-script flag which are checked when executing
and command from LUA, commands that are not marked (like XREAD) verify that their
blocking mode is not used inside LUA (by checking the CLIENT_LUA client flag).
MULTI - Command that is going to block, first verify that the client is not inside
multi (by checking the CLIENT_MULTI client flag). If the client is inside multi, they
return a result which is a match to the empty key with no timeout (for example blpop
inside MULTI will act as lpop)
For modules that perform RM_Call with blocking command, the returned results type is
REDISMODULE_REPLY_UNKNOWN and the caller can not really know what happened.
Disadvantages of the current state are:
No unified approach, LUA, MULTI, and RM_Call, each has a different treatment
Module can not safely execute blocking command (and get reply or error).
Though It is true that modules are not like LUA or MULTI and should be smarter not
to execute blocking commands on RM_Call, sometimes you want to execute a command base
on client input (for example if you create a module that provides a new scripting
language like javascript or python).
While modules (on modules command) can check for REDISMODULE_CTX_FLAGS_LUA or
REDISMODULE_CTX_FLAGS_MULTI to know not to block the client, there is no way to
check if the command came from another module using RM_Call. So there is no way
for a module to know not to block another module RM_Call execution.
This commit adds a way to unify the treatment for blocking clients by introducing
a new CLIENT_DENY_BLOCKING client flag. On LUA, MULTI, and RM_Call the new flag
turned on to signify that the client should not be blocked. A blocking command
verifies that the flag is turned off before blocking. If a blocking command sees
that the CLIENT_DENY_BLOCKING flag is on, it's not blocking and return results
which are matches to empty key with no timeout (as MULTI does today).
The new flag is checked on the following commands:
List blocking commands: BLPOP, BRPOP, BRPOPLPUSH, BLMOVE,
Zset blocking commands: BZPOPMIN, BZPOPMAX
Stream blocking commands: XREAD, XREADGROUP
SUBSCRIBE, PSUBSCRIBE, MONITOR
In addition, the new flag is turned on inside the AOF client, we do not want to
block the AOF client to prevent deadlocks and commands ordering issues (and there
is also an existing assert in the code that verifies it).
To keep backward compatibility on LUA, all the no-script flags on existing commands
were kept untouched. In addition, a LUA special treatment on XREAD and XREADGROUP was kept.
To keep backward compatibility on MULTI (which today allows SUBSCRIBE, and PSUBSCRIBE).
We added a special treatment on those commands to allow executing them on MULTI.
The only backward compatibility issue that this PR introduces is that now MONITOR
is not allowed inside MULTI.
Tests were added to verify blocking commands are not blocking the client on LUA, MULTI,
or RM_Call. Tests were added to verify the module can check for CLIENT_DENY_BLOCKING flag.
Co-authored-by: Oran Agra <oran@redislabs.com>
Co-authored-by: Itamar Haber <itamar@redislabs.com>
ZREVRANGEBYSCORE key max min [WITHSCORES] [LIMIT offset count]
When the offset is too large, the query is very slow. Especially when the offset is greater than the length of zset it is easy to determine whether the offset is greater than the length of zset at first, and If it exceed the length of zset, then return directly.
Co-authored-by: Oran Agra <oran@redislabs.com>
Syntax:
COPY <key> <new-key> [DB <dest-db>] [REPLACE]
No support for module keys yet.
Co-authored-by: tmgauss
Co-authored-by: Itamar Haber <itamar@redislabs.com>
Co-authored-by: Oran Agra <oran@redislabs.com>
Add two optional callbacks to the RedisModuleTypeMethods structure, which is `free_effort`
and `unlink`. the `free_effort` callback indicates the effort required to free a module memory.
Currently, if the effort exceeds LAZYFREE_THRESHOLD, the module memory may be released
asynchronously. the `unlink` callback indicates the key has been removed from the DB by redis, and
may soon be freed by a background thread.
Add `lazyfreed_objects` info field, which represents the number of objects that have been
lazyfreed since redis was started.
Add `RM_GetTypeMethodVersion` API, which return the current redis-server runtime value of
`REDISMODULE_TYPE_METHOD_VERSION`. You can use that when calling `RM_CreateDataType` to know
which fields of RedisModuleTypeMethods are gonna be supported and which will be ignored.
- Add ZDIFF and ZDIFFSTORE which work similarly to SDIFF and SDIFFSTORE
- Make sure the new WITHSCORES argument that was added for ZUNION isn't considered valid for ZUNIONSTORE
Co-authored-by: Oran Agra <oran@redislabs.com>
Test support for the new map, null and push message types. Map objects are parsed as a list of lists of key value pairs.
for instance: user => john password => 123
will be parsed to the following TCL list:
{{user john} {password 123}}
Also added the following tests:
Redirection still works with RESP3
Able to use a RESP3 client as a redirection client
No duplicate invalidation messages when turning BCAST mode on after normal tracking
Server is able to evacuate enough keys when num of keys surpasses limit by more than defined initial effort
Different clients using different protocols can track the same key
OPTOUT tests
OPTIN tests
Clients can redirect to the same connection
tracking-redir-broken test
HELLO 3 checks
Invalidation messages still work when using RESP3, with and without redirection
Switching to RESP3 doesn't disturb previous tracked keys
Tracking info is correct
Flushall and flushdb produce invalidation messages
These tests achieve 100% line coverage for tracking.c using lcov.
Perform full reset of all client connection states, is if the client was
disconnected and re-connected. This affects:
* MULTI state
* Watched keys
* MONITOR mode
* Pub/Sub subscription
* ACL/Authenticated state
* Client tracking state
* Cluster read-only/asking state
* RESP version (reset to 2)
* Selected database
* CLIENT REPLY state
The response is +RESET to make it easily distinguishable from other
responses.
Co-authored-by: Oran Agra <oran@redislabs.com>
Co-authored-by: Itamar Haber <itamar@redislabs.com>
Not disabling save, slower systems begun background save that did not
complete in time, resulting with SAVE failing with "ERR Background save
already in progress".
In redisFork(), we don't set child pid, so updateDictResizePolicy()
doesn't take effect, that isn't friendly for copy-on-write.
The bug was introduced this in redis 6.0: 56258c6
This wrong behavior was backed by a test, and also documentation, and dates back to 2010.
But it makes no sense to anyone involved so it was decided to change that.
Note that 20eeddf (invalidate watch on expire on access) was released in 6.0 RC2
and 2d1968f released in in 6.0.0 GA (invalidate watch when key is evicted).
both of which do similar changes.
introduces a NOMKSTREAM option for xadd command, this would be useful for some
use cases when we do not want to create new stream by default:
XADD key [MAXLEN [~|=] <count>] [NOMKSTREAM] <ID or *> [field value] [field value]
* Introduce a new API's: RM_GetContextFlagsAll, and
RM_GetKeyspaceNotificationFlagsAll that will return the
full flags mask of each feature. The module writer can
check base on this value if the Flags he needs are
supported or not.
* For each flag, introduce a new value on redismodule.h,
this value represents the LAST value and should be there
as a reminder to update it when a new value is added,
also it will be used in the code to calculate the full
flags mask (assuming flags are incrementally increasing).
In addition, stated that the module writer should not use
the LAST flag directly and he should use the GetFlagAll API's.
* Introduce a new API: RM_IsSubEventSupported, that returns for a given
event and subevent, whether or not the subevent supported.
* Introduce a new macro RMAPI_FUNC_SUPPORTED(func) that returns whether
or not a function API is supported by comparing it to NULL.
* Introduce a new API: int RM_GetServerVersion();, that will return the
current Redis version in the format 0x00MMmmpp; e.g. 0x00060008;
* Changed unstable version from 999.999.999 to 255.255.255
Co-authored-by: Oran Agra <oran@redislabs.com>
Co-authored-by: Yossi Gottlieb <yossigo@gmail.com>