redict

mirror of https://codeberg.org/redict/redict.git synced 2025-01-22 16:18:28 -05:00

Author	SHA1	Message	Date
sundb	8aad2ac352	Add missing lua_pop in luaGetFromRegistry (#11097 ) This PR mainly has the following four changes: 1. Add missing lua_pop in `luaGetFromRegistry`. This bug affects `redis.register_function`, where `luaGetFromRegistry` in `luaRegisterFunction` will return null when we call `redis.register_function` nested, e.g. ``` FUNCTION LOAD "#!lua name=mylib \n local lib=redis \n lib.register_function('f2', function(keys, args) lib.register_function('f1', function () end) end)" fcall f2 0 ``` But since we exit when luaGetFromRegistry returns null, it does not cause the stack to grow indefinitely. 2. When getting `REGISTRY_RUN_CTX_NAME` from the registry, use `serverAssert` instead of error return. Since none of these lua functions are registered at the time of function load, scriptRunCtx will never be NULL. 3. Add `serverAssert` for `luaLdbLineHook`, `luaEngineLoadHook`. 4. Remove `luaGetFromRegistry` from `redis_math_random` and `redis_math_randomseed`, it looks like they are redundant.	2022-08-14 11:50:18 +03:00
DarrenJiang13	44859a41ee	fix the client type in trackingInvalidateKey() (#11052 ) Fix bug with scripts ignoring client tracking NOLOOP and send an invalidation message anyway.	2022-08-10 11:58:54 +03:00
Valentino Geron	3270f2d54e	Tests: improve skip tags around resp3 (#11090 ) some skip tags were missing on some tests avoid using HELLO if denytags has resp3 (target server may not support it) Co-authored-by: Valentino Geron <valentino@redis.com>	2022-08-07 16:32:31 +03:00
Huang Zhw	ec5034a2e3	acl: bitfield with get and set|incrby can be executed with readonly permission (#11086 ) `bitfield` with `get` may not be readonly. ``` 127.0.0.1:6384> acl setuser hello on nopass %R~* +@all OK 127.0.0.1:6384> auth hello 1 OK 127.0.0.1:6384> bitfield hello set i8 0 1 (error) NOPERM this user has no permissions to access one of the keys used as arguments 127.0.0.1:6384> bitfield hello set i8 0 1 get i8 0 1) (integer) 0 2) (integer) 1 ``` Co-authored-by: Oran Agra <oran@redislabs.com>	2022-08-07 09:21:19 +03:00
Binbin	6a7dd00cdd	Re-enable aof-race integration tests (#10972 ) This is the history of aof-race related changes: 1. added in `3aa4b00970` 2. disabled in `dcdfd005a0` 3. enabled in `5c63922691` 4. disabled in `53a2af3941` This PR refreshes the aof-race test, re-enable it. Closes #10971	2022-08-04 11:13:29 +03:00
Valentino Geron	dcafee55a5	Fix acl tests to support `--singledb` flag (#11077 ) * some of the tests don't clean the keys they use * marked tests with `{singledb:skip}` if they use SELECT Co-authored-by: Valentino Geron <valentino@redis.com>	2022-08-03 12:11:32 +03:00
Wen Hui	beb9746a9f	Fix function load error message (#10964 ) Update error messages for function load	2022-08-02 18:19:53 -07:00
Binbin	9f0f533bc8	Solve usleep compilation warning in keyspace_events.c (#11073 ) There is a -Wimplicit-function-declaration warning in here: ``` keyspace_events.c: In function ‘KeySpace_NotificationGeneric’: keyspace_events.c:67:9: warning: implicit declaration of function ‘usleep’; did you mean ‘sleep’? [-Wimplicit-function-declaration] 67 | usleep(1); | ^~~~~~ | sleep ```	2022-08-02 18:00:11 +03:00
Binbin	e13b681874	Tests (cluster / sentinel): add --stop and --loop options (#11070 ) --stop: Blocks once the first test fails. --loop: Execute the specified set of tests forever. It is useful when we debug some test failures.	2022-08-01 10:12:27 +03:00
Huang Zhw	61451b02cb	tracking pending invalidation message of flushdb sent by (#11068 ) trackingHandlePendingKeyInvalidations should use proto.	2022-07-31 16:14:39 +03:00
Binbin	e7144693e2	Fix bgsaveerr issue in psync wrong offset test (#11043 ) The kill above is sometimes successful and sometimes already too late. The PING in the psync wrong offset test got rejected by bgsaveerr because lastbgsave_status is C_ERR. In theory, using diskless can avoid PING being affected, because when the replica is dropped, we will kill the child with SIGUSR1, and this will not affect lastbgsave_status. Anyway, this kill is not particularly needed here, dropping the kill is the best option, since we do have the waitForBgsave, so just let it take care of the bgsave. No need for fast termination.	2022-07-27 14:58:25 +03:00
guybe7	45c99d7092	Adds RM_Microseconds and RM_CachedMicroseconds (#11016 ) RM_Microseconds Return the wall-clock Unix time, in microseconds RM_CachedMicroseconds Returns a cached copy of the Unix time, in microseconds. It is updated in the server cron job and before executing a command. It is useful for complex call stacks, such as a command causing a key space notification, causing a module to execute a RedisModule_Call, causing another notification, etc. It makes sense that all these callbacks would use the same clock.	2022-07-27 14:40:05 +03:00
Huang Zhw	6f0a27e38e	When client tracking is on, invalidation message of flushdb in a (#11038 ) When FLUSHDB / FLUSHALL / SWAPDB is inside MULTI / EXEC, the client side tracking invalidation message was interleaved with transaction response.	2022-07-26 13:28:37 +03:00
Meir Shpilraien (Spielrein)	020e046b42	Fix #11030 , use lua_rawget to avoid triggering metatables and crash. (#11032 ) Fix #11030, use lua_rawget to avoid triggering metatables. #11030 shows how returning `_G` from the Lua script (either function or eval) causes the Lua interpreter to panic and the Redis process to exit with error code 1. Though returning `_G` only panics on Redis 7 and 6.2.7, the underlying issue exists on older versions as well (6.0 and 6.2). The underlying issue is returning a table with a metatable such that the metatable raises an error. The following example demonstrates the issue: ``` 127.0.0.1:6379> eval "local a = {}; setmetatable(a,{__index=function() foo() end}) return a" 0 Error: Server closed the connection ``` ``` PANIC: unprotected error in call to Lua API (user_script:1: Script attempted to access nonexistent global variable 'foo') ``` The Lua panic happened because when returning the result to the client, Redis needs to introspect the returned table and transform the table into a resp. In order to scan the table, Redis uses the `lua_gettable` api, which might trigger the metatable (if it exists) and might raise an error. This code is not running inside `pcall` (Lua protected call), so raising an error causes Lua to panic and exit. Notice that this is not a crash, it's a Lua panic that exits with error code 1. Returning `_G` panics on Redis 7 and 6.2.7 because on those versions `_G` has a metatable that raises an error when trying to fetch a nonexistent key. ### Solution Instead of using `lua_gettable`, which might raise an error and cause the issue, use `lua_rawget`, which simply returns the value from the table without triggering any metatable logic. This is promised not to raise an error. The downside of this solution is that it might be considered a breaking change, if someone relies on a metatable in the returned value. 
An alternative solution is to wrap this entire logic with `pcall` (Lua protected call); this alternative requires a much bigger refactoring. ### Back Porting The same fix will work on older versions as well (6.2, 6.0). Notice that on those versions, the issue can cause Redis to crash if inside the metatable logic there is an attempt to access Redis (`redis.call`). On 7.0, there is no crash and the `redis.call` is executed as if it was done from inside the script itself. ### Tests Tests were added to verify the fix	2022-07-26 10:33:50 +03:00
Viktor Söderqvist	5032de50f2	Gossip forgotten nodes on `CLUSTER FORGET` (#10869 ) Gossip the cluster node blacklist in ping and pong messages. This means that CLUSTER FORGET doesn't need to be sent to all nodes in a cluster. It can be sent to one or more nodes and then be propagated to the rest of them. For each blacklisted node, its node id and its remaining blacklist TTL is gossiped in a cluster bus ping extension (introduced in #9530).	2022-07-26 10:28:13 +03:00
Binbin	5ce64ab010	Fix timing issue in cluster test (#11008 ) A timing issue like this was reported in freebsd daily CI: ``` *** [err]: Sanity test push cmd after resharding in tests/unit/cluster/cli.tcl Expected 'CLUSTERDOWN The cluster is down' to match 'MOVED' ``` We additionally wait for each node to reach a consensus on the cluster state in wait_for_condition to avoid the cluster down error. The fix is just like #10495, quoting madolson's comment: Cluster check just verifies that the config state is self-consistent, waiting for cluster_state to be okay is an independent check that all the nodes actually believe each other are healthy. At the same time I noticed that unit/moduleapi/cluster.tcl has the exact same test, which may have the same problem, so I also modified it.	2022-07-18 20:35:13 -07:00
Oran Agra	2825b6057b	Fix heap overflow corruption in XAUTOCLAIM (CVE-2022-31144) (#11002 ) The temporary array for deleted entries reply of XAUTOCLAIM was insufficient, but also in fact the COUNT argument should be used to control the size of the reply, so instead of terminating the loop by only counting the claimed entries, we'll count deleted entries as well. Fix #10968 Addresses CVE-2022-31144	2022-07-18 11:36:19 +03:00
ranshid	eacca729a5	Avoid using unsafe C functions (#10932 ) replace use of: sprintf --> snprintf strcpy/strncpy --> redis_strlcpy strcat/strncat --> redis_strlcat why are we making this change? Much of the code uses some unsafe variants or deprecated buffer handling functions. While most cases are probably not presenting any issue on the known path, programming errors and unterminated strings might lead to potential buffer overflows which are not covered by tests. As part of this PR we change 1. added implementation for redis_strlcpy and redis_strlcat based on the strl implementation: https://linux.die.net/man/3/strl 2. change all occurrences of use of sprintf with use of snprintf 3. change occurrences of use of strcpy/strncpy with redis_strlcpy 4. change occurrences of use of strcat/strncat with redis_strlcat 5. change the behavior of ll2string/ull2string/ld2string so that it will always place null termination ('\0') on the output buffer in the first index. this was done in order to make the use of these functions more safe in cases where the user will not check the output returned by them (for example in rdbRemoveTempFile) 6. we added a compiler directive to issue a deprecation error in case a use of sprintf/strcpy/strcat is found during compilation which will result in error during compile time. However keep in mind that since the deprecation attribute is not supported on all compilers, this is expected to fail during push workflows. NOTE: this is only an initial milestone. We might also consider using the *_s implementation provided by the C11 Extensions (however not yet widely supported). I would also suggest to start looking at static code analyzers to track unsafe use cases. For example LLVM clang checker supports security.insecureAPI.DeprecatedOrUnsafeBufferHandling which can help locate unsafe function usage. 
https://clang.llvm.org/docs/analyzer/checkers.html#security-insecureapi-deprecatedorunsafebufferhandling-c The main reason not to onboard it at this stage is that the alternative expected by clang is to use the C11 extensions which are not always supported by stdlib.	2022-07-18 10:56:26 +03:00
Madelyn Olson	3abdec9969	Fix cluster hostnames test causing failover while running valgrind (#10991 ) In the newly added cluster hostnames test, the primary is failing over during the reboot for valgrind so we are validating the wrong node. This change just sets the replica to prevent taking over, which seems to fix the test. We could have also set the timeout higher, but it slows down the test.	2022-07-17 09:57:34 +03:00
Valentino Geron	847cdca151	Add an option to specify multiple skip files using `--skipfile` (#10975 ) `--skipfile` can be repeated. For example: ./runtests --skipfile file1.txt --skipfile file2.txt Co-authored-by: Valentino Geron <valentino@redis.com>	2022-07-17 08:47:35 +03:00
Oran Agra	599e59ebc5	Avoid valgrind fishy value warning on corrupt restore payloads (#10937 ) The corrupt dump fuzzer uncovered a valgrind warning saying: ``` ==76370== Argument 'size' of function malloc has a fishy (possibly negative) value: -3744781444216323815 ``` This allocation would have failed (returning NULL) and been handled properly by redis (even before this change), but we also want to silence the valgrind warnings (which are checking that casting to ssize_t produces a non-negative value). The solution i opted for is to explicitly fail these allocations (returning NULL), before even reaching `malloc` (which would have failed and returned NULL too). The implication is that we will not be able to support a single allocation of more than 2GB on a 32bit system (which i don't think is a realistic scenario). i.e. i do think we could be facing cases where redis consumes more than 2gb on a 32bit system, but not in a single allocation. The byproduct of this, is that i dropped the overflow assertions, since these will now lead to the same OOM panic we have for failed allocations.	2022-07-13 09:14:38 +03:00
Madelyn Olson	8a4e3bcd8d	Cluster test improvements (#10920 ) * Restructured testing to allow running cluster tests easily as part of the normal testing	2022-07-12 10:41:29 -07:00
Binbin	693acc0114	Trying to fix cluster test (#10963 ) #10942 broke the new test added in #10449 ``` Testing unit: 29-slot-migration-response.tcl Cluster Join and auto-discovery test: FAILED: Cluster failed to join into a full mesh. ``` It looks like we need to wait for the cluster in 28 to become stable.	2022-07-11 15:21:35 +03:00
Binbin	35e8ae3eb5	Add cluster-port support to redis-cli --cluster (#10344 ) In #9389, we add a new `cluster-port` config and make cluster bus port configurable, and currently redis-cli --cluster create/add-node doesn't support with a configurable `cluster-port` instance. Because redis-cli uses the old way (port + 10000) to send the `CLUSTER MEET` command. Now we add this support on redis-cli `--cluster`, note we don't need to explicitly pass in the `cluster-port` parameter, we can get the real `cluster-port` of the node in `clusterManagerNodeLoadInfo`, so the `--cluster create` and `--cluster add-node` interfaces have not changed. We will use the `cluster-port` when we are doing `CLUSTER MEET`, also note that `CLUSTER MEET` bus-port parameter was added in 4.0, so if the bus_port (the one in redis-cli) is 0, or equal (port + 10000), we just call `CLUSTER MEET` with 2 arguments, using the old form. Co-authored-by: Madelyn Olson <34459052+madolson@users.noreply.github.com>	2022-07-11 11:23:31 +03:00
Madelyn Olson	e6a1b2ea95	Fix crash during handshake and cluster shards call (#10942 ) * Fix an engine crash when there are nodes in handshaking and a user calls cluster shards	2022-07-10 22:00:44 -07:00
Wen Hui	f620e6ac73	Add tests for error messages during slot migrations (#10449 ) * Add tests for error messages during slot migrations Co-authored-by: Ubuntu <lucas.guang.yang1@huawei.com> Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>	2022-07-04 10:31:12 -05:00
Harkrishn Patro	0ab885a685	Account sharded pubsub channels memory consumption (#10925 ) Account sharded pubsub channels memory consumption in client memory usage computation to accurately evict client based on the set threshold for `maxmemory-clients`.	2022-07-04 09:18:57 +03:00
Yossi Gottlieb	69d5576832	Fix TLS tests on newer tcl-tls/OpenSSL. (#10910 ) Before this commit, TLS tests on Ubuntu 22.04 would fail as dropped connections result with an ECONNABORTED error thrown instead of an empty read.	2022-07-03 13:34:14 +03:00
Binbin	35e836c26d	Add SENTINEL command flag to CLIENT/COMMANDS subcommands (#10904 ) This was harmless because we marked the parent command with SENTINEL flag. So the populateCommandTable was ok. And we also don't show the flag (SENTINEL and ONLY-SENTINEL) in COMMAND INFO. In this PR, we also add the same CMD_SENTINEL and CMD_ONLY_SENTINEL flags check when populating the sub-commands, so that in the future it'll be possible to add some sub-commands to sentinel or sentinel-only but not others.	2022-06-30 16:32:40 +03:00
Wen Hui	51da5c3dde	Fix CLUSTER RESET command argument number issue (#10898 ) Fix regression of CLUSTER RESET command in redis 7.0. cluster reset command format is: CLUSTER RESET [HARD | SOFT] According to the cluster reset command doc and codes, the third argument is optional, so the arity in json file should be -2 instead of 3. Add test to verify future regressions with RESET and RESET SOFT that were not covered. Co-authored-by: Ubuntu <lucas.guang.yang1@huawei.com> Co-authored-by: Oran Agra <oran@redislabs.com> Co-authored-by: Binbin <binloveplay1314@qq.com>	2022-06-29 08:17:00 +03:00
jonnyomerredis	35c2ee8716	Add sharded pubsub keychannel count to client info (#10895 ) When calling CLIENT INFO/LIST, and in various debug prints, Redis is printing the number of pubsub channels / patterns the client is subscribed to. With the addition of sharded pubsub, it would be useful to print the number of keychannels the client is subscribed to as well.	2022-06-28 10:11:17 +03:00
Viktor Söderqvist	6af021007a	Add missing REDISMODULE_CLIENTINFO_INITIALIZER (#10885 ) The module API docs mentions this macro, but it was not defined (so no one could have used it). Instead of adding it as is, we decided to add a _V1 macro, so that if / when we some day extend this struct, modules that use this API and don't need the extra fields, will still use the old version and still be compatible with older redis version (despite being compiled with newer redismodule.h)	2022-06-27 08:29:05 +03:00
RinChanNOW!	2854637385	Support conversion between `RedisModuleString` and `unsigned long long` (#10889 ) Since the ranges of `unsigned long long` and `long long` are different, we cannot read an `unsigned long long` integer from a `RedisModuleString` by `RedisModule_StringToLongLong` . So I added two new Redis Module APIs to support the conversion between these two types: * `RedisModule_StringToULongLong` * `RedisModule_CreateStringFromULongLong` Signed-off-by: RinChanNOWWW <hzy427@gmail.com>	2022-06-26 15:02:52 +03:00
Binbin	d443e312ad	redis-server command line arguments allow passing config name and value in the same arg (#10866 ) This commit has two topics. ## Passing config name and value in the same arg In #10660 (Redis 7.0.1), when we supported the config values that can start with `--` prefix (one of the two topics of that PR), we broke another pattern: `redis-server redis.config "name value"`, passing both config name and its value in the same arg, see #10865 This wasn't an intended change (i.e. we didn't realize this pattern used to work). Although this is a wrong usage, we still like to fix it. Now we support something like: ``` src/redis-server redis.conf "--maxmemory '700mb'" "--maxmemory-policy volatile-lru" --proc-title-template --my--title--template --loglevel verbose ``` ## Changes around --save Also in this PR, we undo the breaking change we made in #10660 on purpose. 1. `redis-server redis.conf --save --loglevel verbose` (missing `save` argument before another argument). In 7.0.1, it was throwing a wrong arg error. Now it will work and reset the save, similar to how it used to be in 7.0.0 and 6.2.x. 2. `redis-server redis.conf --loglevel verbose --save` (missing `save` argument as last argument). In 6.2, it did not reset the save, which was a bug (inconsistent with the previous bullet). Now we will make it work and reset the save as well (a bug fix).	2022-06-26 14:36:39 +03:00
Viktor Söderqvist	6272ca609e	Add RM_SetClientNameById and RM_GetClientNameById (#10839 ) Adding Module APIs to let the module read and set the client name of an arbitrary connection.	2022-06-26 14:34:59 +03:00
judeng	d2405b9b6b	fix benchmark failure in daily test with TLS (#10896 ) The new test added in #10891 can fail with a different error. see comment in networking.c saying ```c /* That's a best effort error message, don't check write errors. * Note that for TLS connections, no handshake was done yet so nothing * is written and the connection will just drop. */ ```	2022-06-23 18:19:36 +03:00
judeng	49876158cc	fix redis-benchmark's bug: check if clients are created successfully in idle mode (#10891 ) my maxclients config: ``` redis-cli config get maxclients 1) "maxclients" 2) "4064" ``` Before this bug was fixed, creating 4065 clients appeared to be successful, but only 4064 were actually created: ``` ./redis-benchmark -c 4065 -I Creating 4065 idle connections and waiting forever (Ctrl+C when done) clients: 4065 ``` now: ``` ./redis-benchmark -c 4065 -I Creating 4065 idle connections and waiting forever (Ctrl+C when done) Error from server: ERR max number of clients reached ./redis-benchmark -c 4064 -I Creating 4064 idle connections and waiting forever (Ctrl+C when done) clients: 4064 ```	2022-06-22 19:30:22 +03:00
Meir Shpilraien (Spielrein)	61baabd8d5	Fix crash on RM_Call with script mode. (#10886 ) The PR fixes 2 issues: ### RM_Call crash on script mode `RM_Call` can potentially be called from a background thread where `server.current_client` is not set. In such a case we get a crash on `NULL` dereference. The fix is to check first if `server.current_client` is `NULL`; if it is, we should verify disk errors and readonly replica as we do for any normal client (no masters nor AOF). ### RM_Call block OOM commands when not needed Again, `RM_Call` can be executed on a background thread using a `ThreadSafeCtx`. In such a case `server.pre_command_oom_state` can be irrelevant and should not be considered when checking the OOM state. This causes OOM commands to be blocked when not necessarily needed. In such a case, check the actual used memory (and not the cached value). Notice that in order to know if the cached value can be used, we check that the ctx that was used on the `RM_Call` is a ThreadSafeCtx. A module writer can potentially abuse the API and use ThreadSafeCtx on the main thread. We consider this an API misuse.	2022-06-21 10:01:13 +03:00
Moti Cohen	4c72a09b78	Fix sentinel acl change test. Timing issue. (#10868 ) Co-authored-by: moticless <moticless@github.com>	2022-06-19 09:45:16 +03:00
Oran Agra	2189100383	optimize zset conversion on large ZRANGESTORE (#10789 ) when we know the size of the zset we're gonna store in advance, we can check if it's greater than the listpack encoding threshold, in which case we can create a skiplist from the get go, and avoid converting the listpack to skiplist later after it was already populated.	2022-06-14 21:12:45 +03:00
Oran Agra	8ef4f1dbad	Script that made modification will not break with unexpected NOREPLICAS error (#10855 ) If a script made a modification and then was interrupted for taking too long. there's a chance redis will detect that a replica dropped and would like to reject write commands with NOREPLICAS due to insufficient good replicas. returning an error on a command in this case breaks the script atomicity. The same could in theory happen with READONLY, MISCONF, but i don't think these state changes can happen during script execution.	2022-06-14 21:09:50 +03:00
Oran Agra	ffa0077041	Allow ECHO in loading and stale modes (#10853 ) I noticed that scripting.tcl uses INFO from within a script and thought it's an overkill and concluded it's nicer to use another CMD_STALE command, decided to use ECHO, and then noticed it's not at all allowed in stale mode. probably overlooked at #6843	2022-06-14 08:48:08 +03:00
Binbin	92fb4f4f61	Fixed SET and BITFIELD commands being wrongly marked movablekeys (#10837 ) The SET and BITFIELD command were added `get_keys_function` in #10148, causing them to be wrongly marked movablekeys in `populateCommandMovableKeys`. This was an unintended side effect introduced in #10148 (7.0 RC1) which could cause some clients an extra round trip for these commands in cluster mode. Since we define movablekeys as a way to determine if the legacy range [first, last, step] doesn't find all keys, then we need a completely different approach. The right approach should be to check if the legacy range covers all key-specs, and if none of the key-specs have the INCOMPLETE flag. This way, we don't need to look at getkeys_proc of VARIABLE_FLAG at all. Probably with the exception of modules, who may still not be using key-specs. In this PR, we removed `populateCommandMovableKeys` and put its logic in `populateCommandLegacyRangeSpec`. In order to properly serve both old and new modules, we must probably keep relying CMD_MODULE_GETKEYS, but do that only for modules that don't declare key-specs. For ones that do, we need to take the same approach we take with native redis commands. This approach was proposed by Oran. Fixes #10833 Co-authored-by: Oran Agra <oran@redislabs.com>	2022-06-12 08:22:18 +03:00
Christian Krieg	032619b82b	Fixing test to consider statically linked binaries (#10835 ) The test calls `ldd` on `redis-server` in order to find out whether the binary was linked against `libmusl`; However, `ldd` returns a value different from `0` when statically linking the binaries against libc-musl, because `redis-server` is not a dynamic executable (as given by the exception thrown by the failing test), and `make test` terminates with an error: $ ldd src/redis-server not a dynamic executable $ echo $? 1 This commit fixes the test by ignoring such failures. Co-authored-by: Yossi Gottlieb <yossigo@gmail.com>	2022-06-09 12:59:33 +03:00
Petr Vaněk	f22bfe86b6	Update musl libc detection pattern (#10826 ) This change fixes failing `integration/logging.tcl` test in Gentoo with musl libc, where `ldd` returns ``` libc.so => /lib/ld-musl-x86_64.so.1 (0x7f9d5f171000) ``` unlike Alpine's ``` libc.musl-x86_64.so.1 => /lib/ld-musl-x86_64.so.1 (0x7f82cfa16000) ``` The solution is to extend matching pattern introduced in #8532.	2022-06-07 18:47:01 +03:00
zhaozhao.zz	a18c91d642	rewrite alias config to original name (#10811 ) Redis 7 adds some new alias configs like `hash-max-listpack-entries` alias `hash-max-ziplist-entries`. If a config file contains both the real name and the alias like this: ``` hash-max-listpack-entries 20 hash-max-ziplist-entries 20 ``` after setting `hash-max-listpack-entries` to 100 and running `config rewrite`, the config file becomes: ``` hash-max-listpack-entries 100 hash-max-ziplist-entries 20 ``` we can see that the alias config is not modified, and users will get the wrong config after restart. 6.0 and 6.2 don't have this bug, since they only have the `slave` word alias. Co-authored-by: Oran Agra <oran@redislabs.com>	2022-06-02 14:03:47 +03:00
zhugezy	cf3323dba4	Fix bugs in CONFIG REWRITE, omitting rename-command and include lines, and inserting comments around module and acl configs (#10761 ) A regression from #10285 (redis 7.0). CONFIG REWRITE would put lines with: `include`, `rename-command`, `user`, `loadmodule`, and any module specific config in a comment. For ACL `user`, `loadmodule` and module specific configs would be re-inserted at the end (instead of updating existing lines), so the only implication is a messy config file full of comments. But for `rename-command` and `include`, the implication would be that they're now missing, so a server restart would lose them. Co-authored-by: Oran Agra <oran@redislabs.com>	2022-06-02 08:36:55 +03:00
Oran Agra	df55861838	Expose script flags to processCommand for better handling (#10744 ) The important part is that read-only scripts (not just EVAL_RO and FCALL_RO, but also ones with `no-writes` executed by normal EVAL or FCALL), will now be permitted to run during CLIENT PAUSE WRITE (unlike before where only the _RO commands would be processed). Other than that, some errors like OOM, READONLY, MASTERDOWN are now handled by processCommand, rather than by the command itself; this affects the error string (and even the error code in some cases), and command stats. Besides that, now the `may-replicate` commands, PFCOUNT and PUBLISH, will be considered `write` commands in scripts and will be blocked in all read-only scripts just like other write commands. They'll also be blocked in EVAL_RO (i.e. even for scripts without the `no-writes` shebang flag). This commit also hides the `may_replicate` flag from the COMMAND command output. this is a breaking change. background about may_replicate: We don't want to expose a no-may-replicate flag or alike to scripts, since we consider the may-replicate thing an internal concern of redis, that we may some day get rid of. In fact, the may-replicate flag was initially introduced to flag EVAL (since we didn't know what it's gonna do ahead of execution, before function-flags existed), and PUBLISH and PFCOUNT, both of which because they have side effects which may some day be fixed differently. code changes: The changes in eval.c are mostly code re-ordering: - evalCalcFunctionName is extracted out of evalGenericCommand - evalExtractShebangFlags is extracted out of luaCreateFunction - evalGetCommandFlags is new code	2022-06-01 14:09:40 +03:00
Oran Agra	b2061de2e7	Fix broken protocol in MISCONF error, RM_Yield bugs, RM_Call(EVAL) OOM check bug, and new RM_Call checks. (#10786 ) * Fix broken protocol when redis can't persist to RDB (general commands, not modules), excessive newline. regression of #10372 (7.0 RC3) * Fix broken protocol when Redis can't persist to AOF (modules and scripts), missing newline. * Fix bug in OOM check of EVAL scripts called from RM_Call. set the cached OOM state for scripts before executing module commands too, so that it can serve scripts that are executed by modules. i.e. in the past EVAL executed by RM_Call could have either falsely failed or falsely succeeded because of a wrong cached OOM state flag. * Fix bugs with RM_Yield: 1. SHUTDOWN should only accept the NOSAVE mode 2. Avoid eviction during yield command processing. 3. Avoid processing master client commands while yielding from another client * Add two more checks to RM_Call script mode. 1. READONLY You can't write against a read only replica 2. MASTERDOWN Link with MASTER is down and `replica-serve-stale-data` is set to `no` * Add new RM_Call flag to let redis automatically refuse `deny-oom` commands while over the memory limit. * Add tests to cover various errors from Scripts, Modules, Modules calling scripts, and Modules calling commands in script mode. Add tests: * Looks like the MISCONF error was completely uncovered by the tests, add tests for it, including from scripts, and modules * Add tests for NOREPLICAS from scripts * Add tests for the various errors in module RM_Call, including RM_Call that calls EVAL, and RM_call in "eval mode". that includes: NOREPLICAS, READONLY, MASTERDOWN, MISCONF	2022-06-01 13:04:22 +03:00
Harkrishn Patro	4065b4f27e	Sharded pubsub publish messagebulk as smessage (#10792 ) To easily distinguish between sharded channel message and a global channel message, introducing `smessage` (instead of `message`) as message bulk for sharded channel publish message. This is gonna be a breaking change in 7.0.1! Background: Sharded pubsub was introduced in redis 7.0, but after the release we quickly realized that it's problematic that the client can't distinguish between normal (global) pubsub messages and sharded ones. This is important because the same connection can subscribe to both, but messages sent to one pubsub system are not propagated to the other (they're completely separate), so if one connection is used to subscribe to both, we need to assist the client library to know which message it got so it can forward it to the correct callback.	2022-05-31 08:03:59 +03:00
Madelyn Olson	ed29d634b3	Add readonly flag to EVAL_RO, EVALSHA_RO and FCALL_RO (#10728 ) * Add readonly flag to EVAL_RO, EVALSHA_RO and FCALL_RO * Require users to explicitly declare @scripting to get access to lua scripting.	2022-05-29 23:42:56 -07:00
Binbin	1013cbeae2	Fix sentinel disconnect test timing issue after auth-pass change (#10784 ) There is a timing issue reported in test-sanitizer-address (gcc): ``` Sentinels (re)connection following SENTINEL SET mymaster auth-pass: FAILED: Expected to be disconnected from master due to wrong password ``` The reason we reach it, is because the test is fast enough to modify auth-pass and test sentinel connection status with the server, before its scheduled operation got the chance to update connection status with the server. We need to wait for `sentinelTimer` to kick in, and then update the connection status. Replace condition with wait_for_condition on the check. Fix just like #10480 did	2022-05-29 08:38:38 +03:00
Vitaly	6461f09f43	Fix ZRANGESTORE crash when zset_max_listpack_entries is 0 (#10767 ) When `zrangestore` is called, a container destination object is created. Before this PR we used to create a listpack based object even if `zset-max-ziplist-entries` or the equivalent `zset-max-listpack-entries` was set to 0. This triggered immediate conversion of the listpack into a skiplist in `zrangestore`, which hits an assertion resulting in an engine crash. Added a TCL test that reproduces this issue.	2022-05-27 22:34:00 +03:00
Binbin	6f7c1a8ce6	Fix outdated comment about flags in moduleCreateArgvFromUserFormat (#10781 ) Clearly more than one flag exists, also fixed some typos. Fixes #10776	2022-05-26 17:34:17 +03:00
Valentino Geron	9eb97b5d94	Fix regex support in --only, --skipfile and --skiptest (#10741 ) The regex support was added in: * https://github.com/redis/redis/pull/9352 * https://github.com/redis/redis/pull/9555 * https://github.com/redis/redis/pull/10212 These commits break backward compatibility with older versions. This fix keeps the test suite infra compatible with old versions by default. However, if you want regex, the string must start with `/`	2022-05-25 18:25:38 +03:00
Binbin	450c88f368	Fix BZMPOP gets unblocked by non-key args and returns them (#10764 ) This bug was introduced in #9484 (7.0.0). It caused BZMPOP to block on non-key arguments. For example, `bzmpop 0 1 myzset min count 10` would additionally block on these arguments as if they were keys (all except the first and the last argument) and could return their values: - 0: timeout value - 1: numkeys value - min: min/max token - count: count token	2022-05-23 14:15:54 +03:00
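To make the intended behaviour concrete, here is a minimal sketch of which BZMPOP arguments are keys, assuming the documented syntax `BZMPOP timeout numkeys key [key ...] <MIN | MAX> [COUNT count]`. The helper `bzmpop_key_range` is hypothetical and is not the key-lookup code in Redis; it only shows that the timeout, numkeys, MIN/MAX and COUNT tokens fall outside the key range.

```c
#include <stdlib.h>

/* Hypothetical helper (not Redis source): compute the argv index range that
 * holds the keys of a BZMPOP call. argv[0] is the command name, argv[1] the
 * timeout and argv[2] numkeys, so keys start at index 3.
 * Returns 0 on success, -1 if the arguments are malformed. */
int bzmpop_key_range(int argc, const char **argv, int *first, int *last) {
    if (argc < 5) return -1;             /* cmd, timeout, numkeys, key, MIN|MAX */
    char *end;
    long numkeys = strtol(argv[2], &end, 10);
    if (end == argv[2] || *end != '\0' || numkeys <= 0) return -1;
    if (numkeys > argc - 4) return -1;   /* keys plus the MIN|MAX token must fit */
    *first = 3;
    *last = 3 + (int)numkeys - 1;
    return 0;
}
```

For `bzmpop 0 1 myzset min count 10` this yields the single index 3 (`myzset`), so none of `0`, `1`, `min` or `count` is treated as a key.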
Oran Agra	b0e18f804d	Scripts that declare the `no-writes` flag are implicitly `allow-oom` too. (#10699 ) Scripts that have the `no-writes` flag cannot execute write commands, and since all `deny-oom` commands are write commands, we now act as if the `allow-oom` flag is implicitly set for scripts that set the `no-writes` flag. This also implicitly means that the EVAL_RO and FCALL_RO commands can never fail with an OOM error. Note about a bug that's no longer relevant: There was an issue with EVAL_RO using shebang not being blocked correctly in OOM state: When an EVAL script declares a shebang, it was by default not allowed to run in OOM state. But this depends on a flag that is updated before the command is executed, which was not updated in case of the `_RO` variants. The result is that if the previous cached state was outdated (either true or false), the script would either unjustly fail with OOM, or be unjustly allowed to run despite the OOM state. It doesn't affect scripts without a shebang since these depend on the actual commands they run, and since these are only read commands, they don't care for that cached oom state flag. It did affect scripts with shebang and no allow-oom flag, but after the change in this PR, scripts that are run with eval_ro would implicitly have that flag so again the cached state doesn't matter. p.s. this isn't a breaking change since all it does is allow scripts to run when they should / could rather than blocking them.	2022-05-22 16:02:59 +03:00
Wen Hui	135998ed8d	Update comments on command args, and a misleading error reply (#10645 ) Updated the comments for: info command lmpopCommand and blmpopCommand sinterGenericCommand Fix the missing "key" words in the srandmemberCommand function For LPOS command, when rank is 0, prompt user that rank could be positive number or negative number, and add a test for it	2022-05-13 17:55:49 +03:00
Binbin	586a16ad79	Fix race in module fork kill test (#10717 ) The purpose of the test is to kill the child while it is running. From the last two lines we can see the child exits before being killed. ``` - Module fork started pid: 56998 * <fork> fork child started - Killing running module fork child: 56998 * <fork> fork child exiting signal-handler (1652267501) Received SIGUSR1 in child, exiting now. ``` In this commit, we pass an argument to `fork.create` indicating how long it should sleep. For the fork kill test, we use a longer time to avoid the child exiting before being killed. Other changes: use wait_for_condition instead of hardcoded `after 250`. Unify the test for failing fork with the one for killing it (save time)	2022-05-12 20:10:38 +03:00
Binbin	bfbb15f75d	redis-server command line arguments support take one bulk string with spaces for MULTI_ARG configs parsing. And allow options value to use the -- prefix (#10660 ) ## Take one bulk string with spaces for MULTI_ARG configs parsing Currently redis-server looks for arguments that start with `--`, and anything in between them is considered arguments for the config. like: `src/redis-server --shutdown-on-sigint nosave force now --port 6380` MULTI_ARG configs behave differently for CONFIG command, vs the command line argument for redis-server. i.e. CONFIG command takes one bulk string with spaces in it, while the command line takes an argv array with multiple values. In this PR, in config.c, if `argc > 1` we can take them as is, and if the config is a `MULTI_ARG` and `argc == 1`, we will split it by spaces. So both of these will be the same: ``` redis-server --shutdown-on-sigint nosave force now --shutdown-on-sigterm nosave force redis-server --shutdown-on-sigint nosave "force now" --shutdown-on-sigterm nosave force redis-server --shutdown-on-sigint nosave "force now" --shutdown-on-sigterm "nosave force" ``` ## Allow options value to use the `--` prefix Currently it decides to switch to the next config, as soon as it sees `--`, even if there was not a single value provided yet to the last config, this makes it impossible to define a config value that has `--` prefix in it. For instance, if we want to set the logfile to `--my--log--file`, like `redis-server --logfile --my--log--file --loglevel verbose`, current code will handle that incorrectly. In this PR, now we allow a config value that has `--` prefix in it. But note that something like `redis-server --some-config --config-value1 --config-value2 --loglevel debug` would not work, because if you want to pass a value to a config starting with `--`, it can only be a single value. 
like: `redis-server --some-config "--config-value1 --config-value2" --loglevel debug` An example (using `--` prefix config value): ``` redis-server --logfile --my--log--file --loglevel verbose redis-cli config get logfile loglevel 1) "loglevel" 2) "verbose" 3) "logfile" 4) "--my--log--file" ``` ### Potentially breaking change `redis-server --save --loglevel verbose` used to work the same as `redis-server --save "" --loglevel verbose` now, it'll error!	2022-05-11 11:33:35 +03:00
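The MULTI_ARG rule described in this commit (take the values as is when `argc > 1`, split on spaces when `argc == 1`) can be sketched as follows. This is an illustration only, not the code in config.c; the function name `normalize_multi_arg` and the caller-provided scratch buffer are assumptions made for the example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not the config.c implementation): produce the list of
 * values for a MULTI_ARG config. With more than one argument the values are
 * used as is; a single argument is copied into `scratch` and split on spaces.
 * Returns the number of values written to `out`, or 0 on error. */
size_t normalize_multi_arg(int argc, char **argv, char *scratch,
                           size_t scratch_len, char **out, size_t out_cap) {
    size_t n = 0;
    if (argc > 1) {
        for (int i = 0; i < argc && n < out_cap; i++) out[n++] = argv[i];
        return n;
    }
    if (argc != 1 || strlen(argv[0]) >= scratch_len) return 0;
    strcpy(scratch, argv[0]);            /* split a private copy, not argv */
    char *p = scratch;
    while (*p != '\0' && n < out_cap) {
        while (*p == ' ') p++;           /* skip runs of spaces */
        if (*p == '\0') break;
        out[n++] = p;                    /* start of the next value */
        p += strcspn(p, " ");
        if (*p == ' ') *p++ = '\0';      /* terminate the value in place */
    }
    return n;
}
```

Under this sketch, the argv form `nosave force now` and the single string `"nosave force now"` both yield the same three values.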
Binbin	783b210db4	FLUSHDB and FLUSHALL add call forceCommandPropagation / FLUSHALL reset dirty counter to 0 if we enable save (#10691 ) ## FLUSHALL We used to restore the dirty counter after `rdbSave` zeroed it if we enable save. Otherwise FLUSHALL will not be replicated nor put into the AOF. And then we do increment it again below. Without that extra dirty++, when db was already empty, FLUSHALL will not be replicated nor put into the AOF. We now gonna replace all that dirty counter magic with a call to forceCommandPropagation (REPL and AOF), instead of all the messing around with the dirty counter. Added tests to cover three part (dirty counter, REPL, AOF). One benefit other than cleaner code is that the `rdb_changes_since_last_save` is correct in this case. ## FLUSHDB FLUSHDB was not replicated nor put into the AOF when db was already empty. Unlike DEL on a non-existing key, FLUSHDB always does something, and that's to call the module hook. So basically FLUSHDB is never a NOP, and thus it should always be propagated. Not doing that, could mean that if a module does something in that hook, and wants to avoid issues of that hook being missing on the replica if the db is empty, it'll need to do complicated things. So now FLUSHDB add call forceCommandPropagation, we will always propagate FLUSHDB. Always propagating FLUSHDB seems like a safe approach that shouldn't have any drawbacks (other than looking odd) This was mentioned in #8972 ## Test section: We actually found it while solving a race condition in the BGSAVE test (other.tcl). It was found in extra_ci Daily Arm64 (test-libc-malloc). ``` [exception]: Executing test client: ERR Background save already in progress. ERR Background save already in progress ``` It look like `r flushdb` trigger (schedule) a bgsave right after `waitForBgsave r` and before `r save`. Changing flushdb to flushall, FLUSHALL will do a foreground save and then set the dirty counter to 0.	2022-05-11 11:21:16 +03:00
Meir Shpilraien (Spielrein)	442e73ea09	Fix #10705 , avoid relinking the same library twice. (#10706 ) Set `old_li` to NULL to avoid linking it again on error. Before the fix, loading an already existing library would cause the existing library to be added again. This causes no harm other than wrong statistics. The statistics that are affected by the issue are: * `libraries_count` and `functions_count` returned by `function stats` command * `used_memory_functions` returned on `info memory` command * `functions.caches` returned on `memory stats` command	2022-05-10 11:47:45 +03:00
Oran Agra	2bcd890d8a	Fix --save command line regression in redis 7.0.0 (#10690 ) Unintentional change in #9644 (since RC1) meant that an empty `--save ""` config from command line wouldn't have cleared any setting from the config file. Added tests to cover that, and improved test infra to take additional command line args for redis-server	2022-05-09 13:37:49 +03:00
Oran Agra	eb915a82a5	Bug fixes for enum configs with overlapping bit flags (module API) (#10661 ) If we want to support bits that can be overlapping, we need to make sure that: 1. we don't use the same bit for two return values. 2. values should be sorted so that preferred ones (matching more bits) come first.	2022-05-09 13:36:53 +03:00
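The two rules above can be illustrated with a small decoder that turns an integer back into keyword names. This is a sketch under assumptions, not the Redis implementation: `describe_flags` and the sample table are hypothetical, and the table is assumed to be pre-sorted so that entries covering more bits come first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not Redis source): write the space-separated names
 * whose bits are all present in `value`. Because the table is assumed to be
 * sorted with wider masks first, a combined name wins over its parts, and
 * clearing the matched bits ensures no bit is reported twice.
 * Returns 0 on success, -1 if bits are left over or `buf` is too small. */
int describe_flags(int value, const char **names, const int *values,
                   size_t count, char *buf, size_t buflen) {
    assert(buf != NULL && buflen > 0);
    size_t used = 0;
    int remaining = value;
    buf[0] = '\0';
    for (size_t i = 0; i < count; i++) {
        int v = values[i];
        if (v == 0 || (remaining & v) != v) continue;   /* not fully set */
        int n = snprintf(buf + used, buflen - used, "%s%s",
                         used ? " " : "", names[i]);
        if (n < 0 || (size_t)n >= buflen - used) return -1;
        used += (size_t)n;
        remaining &= ~v;                                /* consume these bits */
    }
    return remaining == 0 ? 0 : -1;
}
```

With a sample table `{"all", 7}, {"rw", 3}, {"r", 1}, {"w", 2}, {"x", 4}`, the value 7 is reported as `all` rather than as its three component names, which is the point of sorting wider masks first.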
Lu JJ	87131a5fa6	fast path when SDIFF command has the same key as the first key (#10663 ) When user uses the same input key for SDIFF as the first one, the result must be empty, so we don't need to process the elements to test. This method is like the one done in zset's `zsetChooseDiffAlgorithm` Co-authored-by: Oran Agra <oran@redislabs.com>	2022-05-02 16:18:11 +03:00
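The fast path rests on the identity A \ A = the empty set: if any later key names the same set as the first key, every element of the first set is subtracted. A minimal sketch of that check, using a hypothetical helper that compares key names only (the real command works on set objects):

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not Redis source): return 1 when the SDIFF result is
 * known to be empty because one of the subtracted keys is the same key as
 * the first one, 0 otherwise. keys[0] is the set being subtracted from. */
int sdiff_is_trivially_empty(const char **keys, size_t numkeys) {
    for (size_t i = 1; i < numkeys; i++) {
        if (strcmp(keys[0], keys[i]) == 0) return 1;
    }
    return 0;
}
```

When the helper returns 1, a caller could reply with an empty set without iterating over any elements.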
meir	efa162bcd7	Protect any table which is reachable from globals and added globals white list. The white list is done by setting a metatable on the global table before initializing any library. The metatable sets the `__newindex` field to a function that checks the white list before adding the field to the table. Fields which are not on the white list are simply ignored. After the initialization phase is done we protect the global table and each table that might be reachable from the global table. For each table we also protect the table's metatable if it exists.	2022-04-27 00:37:40 +03:00
meir	3731580b6b	Protect globals of both evals scripts and functions. Use the new `lua_enablereadonlytable` Lua API to protect the global tables of both evals scripts and functions. For eval scripts, the implementation is easy: we simply call `lua_enablereadonlytable` on the global table to turn it into a readonly table. For functions it's more complicated: we want to be able to switch globals between load run and function run. To achieve this, we create a new empty table that acts as the globals table for functions, and we control the actual globals using metatable manipulation. Notice that even if the user gets a pointer to the original tables, all the tables are set to be readonly (using `lua_enablereadonlytable` Lua API) so he can not change them. The following illustration better explains the solution: ``` Global table {} <- global table metatable {.__index = __real_globals__} ``` The `__real_globals__` is set depending on the run context (function load or function call). Why is this solution needed, and why is it not enough to simply switch globals? When we run in the context of function load and create our functions, our functions get the current globals that were set when they were created. Replacing the globals after the creation will not affect them. This is why this trick is mandatory.	2022-04-27 00:37:40 +03:00
Oran Agra	8192625458	Add module API flag for using enum configs as bit flags (#10643 ) Enables registration of an enum config that'll let the user pass multiple keywords that will be combined with `|` as flags into the integer config value. ``` const char *enum_vals[] = {"none", "one", "two", "three"}; const int int_vals[] = {0, 1, 2, 4}; if (RedisModule_RegisterEnumConfig(ctx, "flags", 3, REDISMODULE_CONFIG_DEFAULT | REDISMODULE_CONFIG_BITFLAGS, enum_vals, int_vals, 4, getFlagsConfigCommand, setFlagsConfigCommand, NULL, NULL) == REDISMODULE_ERR) { return REDISMODULE_ERR; } ``` doing: `config set moduleconfigs.flags "two three"` will result in 6 being passed to `setFlagsConfigCommand`.	2022-04-26 20:29:20 +03:00
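The arithmetic behind "two three" becoming 6 can be sketched in plain C without the module API. The helper name `combine_enum_flags` is hypothetical; it only mirrors the outcome described in this commit (OR-ing the integer mapped to each keyword) and is not Redis's config parser.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of the Redis module API): OR together the
 * integers mapped to the space-separated keywords in `input`.
 * Returns 0 on success and stores the result in `out`; returns -1 when a
 * keyword is not in the table. */
int combine_enum_flags(const char *input, const char **names,
                       const int *values, size_t count, int *out) {
    assert(input != NULL && out != NULL);
    int result = 0;
    const char *p = input;
    while (*p != '\0') {
        while (*p == ' ') p++;               /* skip separators */
        if (*p == '\0') break;
        size_t len = strcspn(p, " ");        /* length of this keyword */
        size_t i;
        for (i = 0; i < count; i++) {
            if (strlen(names[i]) == len && strncmp(names[i], p, len) == 0) {
                result |= values[i];
                break;
            }
        }
        if (i == count) return -1;           /* unknown keyword */
        p += len;
    }
    *out = result;
    return 0;
}
```

With the table from the commit message (`"two"` mapped to 2 and `"three"` mapped to 4), the input `"two three"` gives `2 | 4`, which is 6.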
chenyang8094	46ec6ad98e	Fix bug when AOF enabled after startup. put the new incr file in the manifest only when AOFRW is done. (#10616 ) Changes: - When AOF is enabled after startup, the data accumulated during `AOF_WAIT_REWRITE` will only be stored in a temp INCR AOF file. Only after the first AOFRW is successful, we will add it to manifest file. Before this fix, the manifest referred to the temp file which could cause a restart during that time to load it without its base. - Add `aof_rewrites_consecutive_failures` info field for aofrw limiting implementation. Now we can guarantee that these behaviors of MP-AOF are the same as before (past redis releases): - When AOF is enabled after startup, the data accumulated during `AOF_WAIT_REWRITE` will only be stored in a visible place. Only after the first AOFRW is successful, we will add it to manifest file. - When disabling AOF, we did not delete the AOF file in the past so there's no need to change that behavior now (yet). - When toggling AOF off and then on (could be as part of a full-sync), a crash or restart before the first rewrite is completed, would result with the previous version being loaded (might not be right thing, but that's what we always had).	2022-04-26 16:31:19 +03:00
Eduardo Semprebon	3a1d14259d	Allow configuring signaled shutdown flags (#10594 ) The SHUTDOWN command has various flags to change its default behavior, but in some cases establishing a connection to redis is complicated and it's easier for the management software to use signals. However, so far the signals could only trigger the default shutdown behavior. Here we introduce the option to control shutdown arguments for SIGTERM and SIGINT. New config options: `shutdown-on-sigint [nosave | save] [now] [force]` `shutdown-on-sigterm [nosave | save] [now] [force]` Implementation: Support MULTI_ARG_CONFIG on createEnumConfig to support multiple enums to be applied as bit flags. Co-authored-by: Oran Agra <oran@redislabs.com>	2022-04-26 14:34:04 +03:00
Madelyn Olson	6fa8e4f7af	Set replicas to panic on disk errors, and optionally panic on replication errors (#10504 ) * Till now, replicas that were unable to persist, would still execute the commands they got from the master, now they'll panic by default, and we add a new `replica-ignore-disk-errors` config to change that. * Till now, when a command failed on a replica or AOF-loading, it only logged a warning and a stat, we add a new `propagation-error-behavior` config to allow panicking in that state (may become the default one day) Note that commands that fail on the replica can either indicate a bug that could cause data inconsistency between the replica and the master, or they could be in some cases (specifically in previous versions), a result of a command (e.g. EVAL) that failed on the master, but still had to be propagated to fail on the replica as well.	2022-04-26 13:25:33 +03:00
Madelyn Olson	efcd1bf394	By default prevent cross slot operations in functions and scripts with # (#10615 ) Adds the `allow-cross-slot-keys` flag to Eval scripts and Functions to allow scripts to access keys from multiple slots. The default behavior is now that they are not allowed to do that (unlike before). This is a breaking change for 7.0 release candidates (to be part of 7.0.0), but not for previous redis releases since EVAL without shebang isn't doing this check. Note that the check is done on both the keys declared by the EVAL / FCALL command arguments, and also the ones used by the script when making a `redis.call`. A note about the implementation, there seems to have been some confusion about allowing access to non local keys. I thought I missed something in our wider conversation, but Redis scripts do block access to non-local keys. So the issue was just about cross slots being accessed.	2022-04-26 12:09:21 +03:00
guybe7	df787764e3	Fix regression not aborting transaction on error, and re-edit some error responses (#10612 ) 1. Disk error and slave count checks didn't flag the transactions or count correctly in command stats (regression from #10372 , 7.0 RC3) 2. RM_Call will reply the same way Redis does, in case of non-existing command or arity error 3. RM_WrongArity will consider the full command name 4. Use lowercase 'u' in "unknown subcommand" (to align with "unknown command") Followup work of #10127	2022-04-25 13:08:13 +03:00
guybe7	21e39ec461	Test: RM_Call from within "expired" notification (#10613 ) This case is interesting because it originates from cron, rather than from another command. The idea came from looking at #9890 and #10573, and I was wondering if RM_Call would work properly when `server.current_client == NULL`	2022-04-25 13:05:06 +03:00
Yossi Gottlieb	bd823c7fa3	Run large-memory tests as solo. (#10626 ) This avoids random memory spikes and enables --large-memory tests to run on moderately sized systems.	2022-04-24 17:29:35 +03:00
Binbin	a6b3ce28a8	Fix timing issue in slowlog redact test (#10614 ) * Fix timing issue in slowlog redact test This test failed once in my daily CI (test-sanitizer-address (clang)) ``` *** [err]: SLOWLOG - Some commands can redact sensitive fields in tests/unit/slowlog.tcl Expected 'migrate 127.0.0.1 25649 key 9 5000 AUTH2 (redacted) (redacted)' to match '* key 9 5000 AUTH (redacted)' (context: type eval line 12 cmd {assert_match {* key 9 5000 AUTH (redacted)} [lindex [lindex [r slowlog get] 1] 3]} proc ::test) ``` The reason is that with slowlog-log-slower-than 10000, slowlog get will have a chance to exceed 10ms. Change slowlog-log-slower-than from 10000 to -1, disable it. Also handles a same potentially problematic test above. This is actually the same timing issue as #10432. But also avoid repeated calls to `SLOWLOG GET`	2022-04-24 12:16:30 +03:00
David CARLIER	aba2865c86	Add socket-mark-id support for marking sockets. (#10349 ) Add a configuration option to attach an operating system-specific identifier to Redis sockets, supporting advanced network configurations using iptables (Linux) or ipfw (FreeBSD).	2022-04-20 09:29:37 +03:00
Oran Agra	a1c85eebf4	Tests: improve skip tags around maxmemory and resp3 (#10597 ) some skip tags were missing on some tests.	2022-04-19 14:57:28 +03:00
sundb	1a93804645	Return 0 when config set out-of-range oom-score-adj-values (#10601 ) When oom-score-adj-values is out of range, setConfigOOMScoreAdjValuesOption should return 0, not -1, otherwise it will be considered as success.	2022-04-19 11:31:15 +03:00
Oran Agra	7d1ad6ca96	Fix RM_Yield bug processing future commands of the current client. (#10573 ) RM_Yield was missing a call to protectClient to prevent redis from processing future commands of the yielding client. Adding tests that fail without this fix. This would be complicated to solve since nested calls to RM_Call used to replace the current_client variable with the module temp client. It looks like it's no longer necessary to do that, since it was added back in #9890 to solve two issues, both already gone: 1. call to CONFIG SET maxmemory could trigger a module hook calling RM_Call. although this specific issue is gone, arguably other hooks like keyspace notification, can do the same. 2. an assertion in lookupKey that checks the current command of the current client, introduced in #9572 and removed in #10248	2022-04-18 14:56:00 +03:00
guybe7	f49ff156ec	Add RM_PublishMessageShard (#10543 ) since PUBLISH and SPUBLISH use different dictionaries for channels and clients, and we already have an API for PUBLISH, it only makes sense to have one for SPUBLISH Add test coverage and unifying some test infrastructure.	2022-04-17 15:43:22 +03:00
Meir Shpilraien (Spielrein)	789c94fece	Added test to verify loading Lua binary payload is not possible (#10583 ) The tests verify that loading a binary payload to the Lua interpreter raises an error. The Lua code modification was done here: `fdf9d45509` which forces the Lua interpreter to always use the text parser.	2022-04-17 15:28:50 +03:00
guybe7	fe1c096b18	Add RM_MallocSizeString, RM_MallocSizeDict (#10542 ) Add APIs to allow modules to compute the memory consumption of opaque objects owned by redis. Without these, the mem_usage callbacks of module data types are useless in many cases. Other changes: Fix streamRadixTreeMemoryUsage to include the size of the rax structure itself	2022-04-17 08:31:57 +03:00
Madelyn Olson	effa707e9d	Fix incorrect error code for eval scripts and fix test error checking (#10575 ) By the convention of errors, there is supposed to be a space between the code and the name. While looking at some lua stuff I noticed that interpreter errors were not adding the space, so some clients will try to map the detailed error message into the error. We have tests that hit this condition, but they were just checking that the string "starts" with ERR. I updated some other tests with similar incorrect string checking. This isn't complete though, as there are other ways we check for ERR I didn't fix. Produces some fun output like: ``` # Errorstats errorstat_ERR:count=1 errorstat_ERRuser_script_1_:count=1 ```	2022-04-14 11:18:32 +03:00
Oran Agra	95050f2683	solve corrupt dump fuzzer crash in streams (#10579 ) we had a panic in streamLastValidID when the stream metadata said it's not empty, but the rax is empty.	2022-04-14 08:29:35 +03:00
Luke Palmer	bb7891f080	Keyspace event for new keys (#10512 ) Add an optional keyspace event when new keys are added to the db. This is useful for applications where clients need to be aware of the redis keyspace. Such an application can SCAN once at startup and then listen for "new" events (plus others associated with DEL, RENAME, etc).	2022-04-13 11:36:38 +03:00
Madelyn Olson	8bd01a07ae	Allow specifying ACL reason for module log entry (#10559 ) Allow specifying an ACL log reason, which is shown in the log. Right now it always shows "unknown", which is a little bit cryptic. This is a breaking change, but this API was added as part of 7 so it seems ok to stabilize it still.	2022-04-11 22:16:17 -07:00
guybe7	719db14ec7	COMMAND DOCS shows module name, where applicable (#10544 ) Add field to COMMAND DOCS response to denote the name of the module that added that command. COMMAND LIST can filter by module, but if you get the full commands list, you may still wanna know which command belongs to which module. The alternative would be to do MODULE LIST, and then multiple calls to COMMAND LIST	2022-04-10 11:41:31 +03:00
Oran Agra	451531f1c8	Fix RM_Yield bug (#10548 ) The bug was when using REDISMODULE_YIELD_FLAG_CLIENTS. in that case we would have only set the CLIENTS type flag in server.busy_module_yield_flags and then clear that flag when exiting RM_Yield, so we would never call unblockPostponedClients when the context is destroyed. This didn't really have any actual implication, which is why the tests couldn't (and still can't) find that since the bug only happens when using CLIENT, but in this case we won't have any clients to un-postpone i.e. clients will get rejected with BUSY error, rather than being postponed. Unrelated: * Adding tests for nested contexts, just in case. * Avoid nested RM_Yield calls	2022-04-07 11:52:28 +03:00
Lu JJ	f110de4b23	Fix the bug that caused hash encoding errors when using hincrbyfloat or hincrby commands (#10479 ) Fixed a bug that used the `hincrbyfloat` or `hincrby` commands to make the field or value exceed the `hash_max_listpack_value` but did not change the object encoding of the hash structure. Add a length check for field and value, check the length of value first, if the length of value does not exceed `hash_max_listpack_value` then check the length of field. If the length of field or value is too long, it will reduce the efficiency of listpack, and the object encoding will become hashtable after AOF restart, so this is also to keep the same before and after AOF restart.	2022-04-05 21:45:45 +03:00
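The length check described in this commit (value first, then field) can be sketched as a small decision function. The names `hash_encoding` and `encoding_after_incr` are hypothetical, and the limit is passed in as a parameter standing for `hash_max_listpack_value`; this is not the code in t_hash.c.

```c
#include <stddef.h>

/* Hypothetical types and helper (not Redis source). */
typedef enum { ENC_LISTPACK, ENC_HASHTABLE } hash_encoding;

/* Decide the encoding a listpack-encoded hash should have after an increment
 * wrote `value_len` bytes under a field of `field_len` bytes. Following the
 * order in the commit message, the value length is checked first and the
 * field length only if the value fits. */
hash_encoding encoding_after_incr(size_t field_len, size_t value_len,
                                  size_t max_listpack_value) {
    if (value_len > max_listpack_value) return ENC_HASHTABLE;
    if (field_len > max_listpack_value) return ENC_HASHTABLE;
    return ENC_LISTPACK;
}
```

With a limit of 64, a 70-byte result from `hincrbyfloat` would move the hash to the hashtable encoding immediately rather than only after an AOF restart.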
judeng	8a7049d363	use $^ instead of $< for linker in module makefile (#10530 )	2022-04-05 17:08:27 +03:00
Moti Cohen	e342bedc83	Stabilize Sentinel tests - refine failover-timeout & tilt-period (#10518 ) Once in a while Sentinel experiences a TILT period or a leader election failure cycle. The problem is that the default timeouts are too big, and once this happens it breaks our tests. Suggesting: - Reducing failover-timeout from 20 to 10sec (actually it is multiplied by 2 and reach 40sec of timeout) - Modify tilt-period from default of 30sec to 5sec. When a TILT period happens it might lead to failover in our tests, and might also cause a failover failure cycle. Sentinel tests should `wait_for_condition` up to 50 seconds, where needed, to be stable in case of a single TILT period or failover failure cycle. In addition relax timing configuration for "manual failover" Sentinel test (was modified several months ago as part of an effort to reduce tests runtime)	2022-04-05 17:07:59 +03:00
Meir Shpilraien (Spielrein)	ae020e3d56	Functions: Move library meta data to be part of the library payload. (#10500 ) ## Move library meta data to be part of the library payload. Following the discussion on https://github.com/redis/redis/issues/10429 and the intention to add (in the future) library versioning support, we believe that the entire library metadata (like name and engine) should be part of the library payload and not provided by the `FUNCTION LOAD` command. The reasoning behind this is that the programmer who developed the library should be the one who set those values (name, engine, and in the future also version). It is not the responsibility of the admin who load the library into the database. The PR moves all the library metadata (engine and function name) to be part of the library payload. The metadata needs to be provided on the first line of the payload using the shebang format (`#!<engine> name=<name>`), example: ```lua #!lua name=test redis.register_function('foo', function() return 1 end) ``` The above script will run on the Lua engine and will create a library called `test`. ## API Changes (compare to 7.0 rc2) * `FUNCTION LOAD` command was change and now it simply gets the library payload and extract the engine and name from the payload. In addition, the command will now return the function name which can later be used on `FUNCTION DELETE` and `FUNCTION LIST`. * The description field was completely removed from`FUNCTION LOAD`, and `FUNCTION LIST` ## Breaking Changes (compare to 7.0 rc2) * Library description was removed (we can re-add it in the future either as part of the shebang line or an additional line). * Loading an AOF file that was generated by either 7.0 rc1 or 7.0 rc2 will fail because the old command syntax is invalid. 
## Notes * Loading an RDB file that was generated by rc1 / rc2 is supported, Redis will automatically add the shebang to the libraries payloads (we can probably delete that code after 7.0.3 or so since there's no need to keep supporting upgrades from an RC build).	2022-04-05 10:27:24 +03:00
judeng	9578b67e0e	delete obsolete REDISMODULE_EXPERIMENTAL_API define in module demos (#10527 ) This macro was recently removed from redismodule.h, so no longer needed.	2022-04-05 08:21:41 +03:00
Meir Shpilraien (Spielrein)	047b609335	Fix #10508 , on error, pop function and error handler from Lua stack. (#10519 ) If, for some reason, Redis decides not to execute the script, we need to pop the function and error handler from Lua stack. Otherwise, eventually the Lua stack will explode. Relevant only for 7.0-rc1 and 7.0-rc2.	2022-04-04 10:58:59 +03:00
Moti Cohen	37beb5e67e	Fix sentinel ACL test. Timing issue. (#10510 ) Fix by replacing in test blind sleep with wait_for_condition(). Co-authored-by: moticless <moticless@github.com>	2022-04-03 10:56:15 +03:00
Viktor Söderqvist	b53c7f2c0b	Turn into replica on SETSLOT (#10489 ) * Fix race condition where node loses its last slot and turns into replica When a node has lost its last slot and finds out from the SETSLOT command before the cluster bus PONG from the new owner arrives. In this case, the node didn't turn itself into a replica of the new slot owner. This commit adds the same logic to the SETSLOT command as already exists for the cluster bus PONG processing. * Revert "Fix new / failing cluster slot migration test (#10482)" This reverts commit `0b21ef8d49`. In this test, the old slot owner finds out that it has lost its last slot in a nondeterministic way. Either the cluster bus PONG from the new slot owner and sometimes in a SETSLOT command from redis-cli. In both cases, the result should be the same and the old owner should turn itself into a replica of the new slot owner.	2022-04-02 14:58:07 -07:00
sundb	b8eb2a7340	Fix failing moduleconfigs tests and memory leak (#10501 ) Fix global `strval` not reset to NULL after being freed, causing a crash on alpine (most likely because the dynamic library loader doesn't init globals on reload) By the way, fix the memory leak of using `RedisModule_Free` to free `RedisModuleString`, and add a corresponding test.	2022-03-31 15:26:10 +03:00
Madelyn Olson	e81bd15e99	Prevent replica failover during manual takeover test (#10499 ) During 11-manual-takeover.tcl, if the killing of the instances happens too slowly, one of the replicas might be able to promote itself. I'm not sure why it was slow, but it was observed taking 6 seconds which is enough time to do an election. I was able to verify the error locally by adding a small delay (1 second) during ASAN CI. A fix is just to disable automated failover until all the nodes are confirmed dead.	2022-03-31 08:15:00 +03:00
Binbin	a3075ca4fe	Fix cluster slot migration test (#10495 ) Fix three timing issues in the test	2022-03-30 20:14:21 -07:00
Binbin	6075f50663	Move restart_killed_instances and verify_sentinel_auto_discovery to utils (#10497 ) Create a utils.tcl in sentinel/tests/includes, and move two procs to it. Allow sentinel test 08-hostname-conf run on its own.	2022-03-30 20:42:51 +03:00
Nick Chun	bda9d74dad	Module Configurations (#10285 ) This feature adds the ability to add four different types (Bool, Numeric, String, Enum) of configurations to a module to be accessed via the redis config file, and the CONFIG command. Configuration Names: We impose a restriction that a module configuration always starts with the module name and contains a '.' followed by the config name. If a module passes "config1" as the name to a register function, it will be registered as MODULENAME.config1. Configuration Persistence: Module Configurations exist only as long as a module is loaded. If a module is unloaded, the configurations are removed. There is now also a minimal core API for removal of standardConfig objects from configs by name. Get and Set Callbacks: Storage of config values is owned by the module that registers them, and provides callbacks for Redis to access and manipulate the values. This is exposed through a GET and SET callback. The get callback returns a typed value of the config to redis. The callback takes the name of the configuration, and also a privdata pointer. Note that these only take the CONFIGNAME portion of the config, not the entire MODULENAME.CONFIGNAME. ``` typedef RedisModuleString * (RedisModuleConfigGetStringFunc)(const char name, void privdata); typedef long long (RedisModuleConfigGetNumericFunc)(const char name, void privdata); typedef int (RedisModuleConfigGetBoolFunc)(const char name, void privdata); typedef int (RedisModuleConfigGetEnumFunc)(const char name, void privdata); ``` Configs must also must specify a set callback, i.e. what to do on a CONFIG SET XYZ 123 or when loading configurations from cli/.conf file matching these typedefs. name is again just the CONFIGNAME portion, val is the parsed value from the core, privdata is the registration time privdata pointer, and err is for providing errors to a client. 
``` typedef int (*RedisModuleConfigSetStringFunc)(const char *name, RedisModuleString *val, void *privdata, RedisModuleString **err); typedef int (*RedisModuleConfigSetNumericFunc)(const char *name, long long val, void *privdata, RedisModuleString **err); typedef int (*RedisModuleConfigSetBoolFunc)(const char *name, int val, void *privdata, RedisModuleString **err); typedef int (*RedisModuleConfigSetEnumFunc)(const char *name, int val, void *privdata, RedisModuleString **err); ``` Modules can also specify an optional apply callback that will be called after value(s) have been set via CONFIG SET: ``` typedef int (*RedisModuleConfigApplyFunc)(RedisModuleCtx *ctx, void *privdata, RedisModuleString **err); ``` Flags: We expose 7 new flags to the module, which are used as part of the config registration. ``` #define REDISMODULE_CONFIG_MODIFIABLE 0 /* This is the default for a module config. */ #define REDISMODULE_CONFIG_IMMUTABLE (1ULL<<0) /* Can this value only be set at startup? */ #define REDISMODULE_CONFIG_SENSITIVE (1ULL<<1) /* Does this value contain sensitive information */ #define REDISMODULE_CONFIG_HIDDEN (1ULL<<4) /* This config is hidden in `config get <pattern>` (used for tests/debugging) */ #define REDISMODULE_CONFIG_PROTECTED (1ULL<<5) /* Becomes immutable if enable-protected-configs is enabled. */ #define REDISMODULE_CONFIG_DENY_LOADING (1ULL<<6) /* This config is forbidden during loading. */ 
/* Numeric Specific Configs */ #define REDISMODULE_CONFIG_MEMORY (1ULL<<7) /* Indicates if this value can be set as a memory value */ ``` Module Registration APIs: ``` int (*RedisModule_RegisterBoolConfig)(RedisModuleCtx *ctx, char *name, int default_val, unsigned int flags, RedisModuleConfigGetBoolFunc getfn, RedisModuleConfigSetBoolFunc setfn, RedisModuleConfigApplyFunc applyfn, void *privdata); int (*RedisModule_RegisterNumericConfig)(RedisModuleCtx *ctx, const char *name, long long default_val, unsigned int flags, long long min, long long max, RedisModuleConfigGetNumericFunc getfn, RedisModuleConfigSetNumericFunc setfn, RedisModuleConfigApplyFunc applyfn, void *privdata); int (*RedisModule_RegisterStringConfig)(RedisModuleCtx *ctx, const char *name, const char *default_val, unsigned int flags, RedisModuleConfigGetStringFunc getfn, RedisModuleConfigSetStringFunc setfn, RedisModuleConfigApplyFunc applyfn, void *privdata); int (*RedisModule_RegisterEnumConfig)(RedisModuleCtx *ctx, const char *name, int default_val, unsigned int flags, const char **enum_values, const int *int_values, int num_enum_vals, RedisModuleConfigGetEnumFunc getfn, RedisModuleConfigSetEnumFunc setfn, RedisModuleConfigApplyFunc applyfn, void *privdata); int (*RedisModule_LoadConfigs)(RedisModuleCtx *ctx); ``` The module name will be auto appended along with a "." to the front of the name of the config. 
What RM_Register[...]Config does: A RedisModule struct now keeps a list of ModuleConfig objects which look like: ``` typedef struct ModuleConfig { sds name; /* Name of config without the module name appended to the front */ void *privdata; /* Optional data passed into the module config callbacks */ union get_fn { /* The get callback specified by the module */ RedisModuleConfigGetStringFunc get_string; RedisModuleConfigGetNumericFunc get_numeric; RedisModuleConfigGetBoolFunc get_bool; RedisModuleConfigGetEnumFunc get_enum; } get_fn; union set_fn { /* The set callback specified by the module */ RedisModuleConfigSetStringFunc set_string; RedisModuleConfigSetNumericFunc set_numeric; RedisModuleConfigSetBoolFunc set_bool; RedisModuleConfigSetEnumFunc set_enum; } set_fn; RedisModuleConfigApplyFunc apply_fn; RedisModule *module; } ModuleConfig; ``` It also registers a standardConfig in the configs array, with a pointer to the ModuleConfig object associated with it. What happens on a CONFIG GET/SET MODULENAME.MODULECONFIG: For CONFIG SET, we do the same parsing as is done in config.c and pass that as the argument to the module set callback. For CONFIG GET, we call the module get callback and return that value to config.c to return to a client. CONFIG REWRITE: Starting up a server with module configurations in a .conf file but no module load directive will fail. The flip side is also true, specifying a module load and a bunch of module configurations will load those configurations in using the module defined set callbacks on a RM_LoadConfigs call. Configs being rewritten works the same way as it does for standard configs, as the module has the ability to specify a default value. If a module is unloaded with configurations specified in the .conf file those configurations will be commented out from the .conf file on the next config rewrite. 
RM_LoadConfigs: `RedisModule_LoadConfigs(RedisModuleCtx *ctx);` This last API is used to make configs available within the onLoad() after they have been registered. The expected usage is that a module will register all of its configs, then call LoadConfigs to trigger all of the set callbacks, and then can error out if any of them were malformed. LoadConfigs will attempt to set all configs registered to either a .conf file argument/loadex argument or their default value if an argument is not specified. LoadConfigs is a required function if configs are registered. Also note that LoadConfigs does not call the apply callbacks, but a module can do that directly after the LoadConfigs call. New Command: MODULE LOADEX [CONFIG NAME VALUE] [ARGS ...]: This command provides the ability to provide startup context information to a module. LOADEX stands for "load extended" similar to GETEX. Note that provided config names need the full MODULENAME.MODULECONFIG name. Any additional arguments a module might want are intended to be specified after ARGS. Everything after ARGS is passed to onLoad as RedisModuleString **argv. Co-authored-by: Madelyn Olson <madelyneolson@gmail.com> Co-authored-by: Madelyn Olson <matolson@amazon.com> Co-authored-by: sundb <sundbcn@gmail.com> Co-authored-by: Madelyn Olson <34459052+madolson@users.noreply.github.com> Co-authored-by: Oran Agra <oran@redislabs.com> Co-authored-by: Yossi Gottlieb <yossigo@gmail.com>	2022-03-30 15:47:06 +03:00
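The naming rule in the Module Configurations commit above (a module that registers "config1" ends up with MODULENAME.config1, while callbacks only see the CONFIGNAME portion) can be illustrated with a small standalone sketch. Both helper names below are hypothetical and are not part of the Redis module API; the real prefixing happens inside the server.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not a Redis API): builds "MODULENAME.CONFIGNAME"
 * into out. Returns 0 on success, -1 if an argument is empty or the
 * result does not fit in outlen bytes (including the terminating NUL). */
int build_full_config_name(const char *module, const char *config,
                           char *out, size_t outlen) {
    if (!module || !config || !*module || !*config) return -1;
    int n = snprintf(out, outlen, "%s.%s", module, config);
    if (n < 0 || (size_t)n >= outlen) return -1;
    return 0;
}

/* Hypothetical helper (not a Redis API): returns the CONFIGNAME portion
 * of "MODULENAME.CONFIGNAME", or NULL if full does not start with
 * module followed by a '.'. */
const char *strip_module_prefix(const char *full, const char *module) {
    size_t mlen = strlen(module);
    if (strncmp(full, module, mlen) != 0 || full[mlen] != '.') return NULL;
    return full + mlen + 1;
}
```

With these, `build_full_config_name("mymodule", "config1", ...)` yields the name a user would pass to CONFIG SET, and `strip_module_prefix` recovers the short name a get/set callback would receive.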
Oran Agra	14b198868f	introduce MAX_D2STRING_CHARS instead of 128 const (#10487 ) There are a few places that use a hard coded const of 128 to allocate a buffer for d2string. Replace these with a clear macro. Note that in theory, converting a double into a string could take nearly 400 chars, but since d2string uses `%g` and not `%f`, it won't exceed some 40 chars. unrelated: restore some changes to auto generated commands.c that got accidentally reverted in #10293	2022-03-28 18:35:56 +03:00
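The size argument in the MAX_D2STRING_CHARS commit above can be checked with a short sketch that measures how many characters `%g`-style and `%f`-style formatting need for extreme doubles. The `%.17g` precision is an assumption made for illustration (the commit message only says `%g`), and both function names are hypothetical.

```c
#include <float.h>
#include <stdio.h>

/* Number of characters needed to print v with "%.17g" (assumed format),
 * not counting the terminating NUL. */
int g_format_len(double v) {
    return snprintf(NULL, 0, "%.17g", v);
}

/* Number of characters needed to print v with "%f",
 * not counting the terminating NUL. */
int f_format_len(double v) {
    return snprintf(NULL, 0, "%f", v);
}
```

Calling `g_format_len(-DBL_MAX)` stays well under 40, while `f_format_len(-DBL_MAX)` needs several hundred characters, which is why a small fixed buffer is sufficient only for the `%g` family.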
Moti Cohen	63f77698cf	Fix sentinel test SDOWN is triggered by non-responding instance (#10484 ) A timing issue: the debug sleep on the master isn't long enough to ensure that the master is down and to let the test identify it. Replaced the code with suspending the PID until master-is-down is verified.	2022-03-28 12:40:52 +03:00
Oran Agra	0b21ef8d49	Fix new / failing cluster slot migration test (#10482 ) #10381 fixed an issue in `redis-cli --cluster reshard` that used to fail it (redis-cli) because of a race condition. the race condition is / was that when moving the last slot from a node, sometimes the PONG messages delivering the configuration change arrive to that node before the SETSLOT arrives to it, and it becomes a replica. other times the SETSLOT arrives first, and then PONG doesn't demote it. however, the PR also added a new test that suffers from exactly the same race condition, and the tests started failing a lot. The fact is (if i understand it correctly), that this test (the one being deleted here), isn't related to the fix that PR made (which was to fix redis-cli). The race condition in the cluster code still happens, and as long as we don't solve it, there's no reason to test it. For now, even if my understandings are wrong, i'm gonna delete that failing test, since as far as i understand, #10381 didn't introduce any new risks for that matter (which are gonna be compromised by removing this check), this race existed since forever, and still exists, and the fact that redis-cli is now immune to it is still being tested. Additional work should be carried out to fix it, and i leave it for other PRs to handle.	2022-03-27 18:39:19 +03:00
Moti Cohen	37d761ba29	Fix Sentinel reconnect test following ACL change (#10480 ) Replace condition with wait_for_condition On "Verify sentinel that restarted failed to reconnect master after ACL change" The reason we reach it, is because the test is fast enough to modify ACL and test sentinel connection status with the server - before its scheduled operation got the chance to update connection status with the server: ``` /* Perform scheduled operations for the specified Redis instance. */ void sentinelHandleRedisInstance(sentinelRedisInstance *ri) { /* ========== MONITORING HALF ============ */ /* Every kind of instance */ sentinelReconnectInstance(ri); ```	2022-03-27 17:56:21 +03:00
zhaozhao.zz	78bef6e1fe	optimize(remove) usage of client's pending_querybuf (#10413 ) To remove `pending_querybuf`, the key point is reusing `querybuf`, which means the master client's `querybuf` is not only used to parse commands, but also to proxy them to sub-replicas. 1. add a new variable `repl_applied` for the master client to record how much data has been applied (propagated via `replicationFeedStreamFromMasterStream()`) but not yet trimmed from `querybuf`. 2. don't sdsrange `querybuf` in `commandProcessed()`, we trim it to `repl_applied` after the whole replication pipeline is processed to avoid fragmented `sdsrange`. And here are some scenarios in which we cannot trim to `qb_pos`: * we don't receive a complete command from master * master client blocked because of client pause * IO threads operate read, master client flagged with CLIENT_PENDING_COMMAND In these scenarios, `qb_pos` points to the part of the current command or the beginning of the next command, and the current command is not applied yet, so `repl_applied` is not equal to `qb_pos`. Some other notes: * Do not do big arg optimization on master client, since we can only sdsrange `querybuf` after data is sent to replicas. * Set `qb_pos` and `repl_applied` to 0 when `freeClient` in `replicationCacheMaster`. * Rewrite `processPendingCommandsAndResetClient` to `processPendingCommandAndInputBuffer`, let `processInputBuffer` be called successively after `processCommandAndResetClient`.	2022-03-25 10:45:40 +08:00
Meir Shpilraien (Spielrein)	f3855a0930	Add new RM_Call flags for script mode, no writes, and error replies. (#10372 ) The PR extends RM_Call with 3 new capabilities using new flags that are given to RM_Call as part of the `fmt` argument. It aims to assist modules that are getting a list of commands to be executed from the user (not hard coded as part of the module logic), think of a module that implements a new scripting language... * `S` - Run the command in script mode, this means that it will raise an error if a command which is not allowed inside a script (flagged with the `deny-script` flag) is invoked (like SHUTDOWN). In addition, in script mode, write commands are not allowed if there are not enough good replicas (as configured with `min-replicas-to-write`) and/or a disk error happened. * `W` - no writes mode, Redis will reject any command that is marked with the `write` flag. Again can be useful to modules that implement a new scripting language and want to prevent any write commands. * `E` - Return errors as RedisModuleCallReply. Today the errors that happened before the command was invoked (like unknown commands or acl error) return a NULL reply and set errno. This might be missing important information about the failure and it is also impossible to just pass the error to the user using RM_ReplyWithCallReply. This new flag allows you to get a RedisModuleCallReply object with the relevant error message and treat it as if it was an error that was raised by the command invocation. Tests were added to verify the new code paths. In addition small refactoring was done to share some code between modules, scripts, and `processCommand` function: 1. `getAclErrorMessage` was added to `acl.c` to unify the log message extraction from the acl result 2. `checkGoodReplicasStatus` was added to `replication.c` to check the status of good replicas. It is used on `scriptVerifyWriteCommandAllow`, `RM_Call`, and `processCommand`. 3. 
`writeCommandsGetDiskErrorMessage` was added to `server.c` to get the error message on persistence failure. Again it is used on `scriptVerifyWriteCommandAllow`, `RM_Call`, and `processCommand`.	2022-03-22 14:13:28 +02:00
guybe7	e82c1aedea	BITSET and BITFIELD SET should propagate even if just length changed (#10459 ) Bug introduced in #9403, caused inconsistency between master and replica in case just the length (i.e. set a high-index bit to 0) changed.	2022-03-21 11:33:27 +02:00
Meir Shpilraien (Spielrein)	2f9cdcd733	Increase function tests timeout (#10458 ) Increase function tests timeout to avoid false failures on slow systems.	2022-03-21 11:00:27 +02:00
Madelyn Olson	557222d1e0	Fix timing issue in shards test and fix displayed TLS port (#10450 )	2022-03-20 22:08:40 -07:00
郭伟光	fae5b1a19d	unblockClient: avoid to reset client when the client was shutdown-blocked (#10440 ) fix #10439. see https://github.com/redis/redis/pull/9872 When executing SHUTDOWN we pause the client so we can un-pause it if the shutdown fails. this could happen during the timeout, if the shutdown is aborted, but could also happen from within the initial `call()` to shutdown, if the rdb save fails. in that case when we return to `call()`, we'll crash if `c->cmd` has been set to NULL. The call stack is: ``` unblockClient(c) replyToClientsBlockedOnShutdown() cancelShutdown() finishShutdown() prepareForShutdown() shutdownCommand() ``` what's special about SHUTDOWN in that respect is that it can be paused, and then un-paused before the original `call()` returns. tests were added for both failed shutdown, and a followup successful one.	2022-03-20 15:18:53 +02:00
sundb	b9656adbd9	Restore ::singledb after cluster test (#10441 ) When ::singledb is 0, we will use db 9 for the test db. Since ::singledb is set to 1 in the cluster-related tests, but not restored, some subsequent tests associated with db 9 will fail.	2022-03-18 14:10:24 +02:00
Madelyn Olson	e8771efda9	Fixed incorrect parsing of hostname information from nodes.conf (#10435 )	2022-03-16 14:07:24 -07:00
Viktor Söderqvist	69017fa232	Fix redis-cli CLUSTER SETSLOT race conditions (#10381 ) After migrating a slot, send CLUSTER SETSLOT NODE to the destination node first to make sure the slot isn't left without an owner in case the destination node crashes before it is set as new owner. When informing the source node, it can happen that the destination node has already informed it and if the source node has lost its last slot, it has already turned itself into a replica. Redis-cli should ignore this error in this case.	2022-03-16 10:11:38 -07:00
Binbin	61b7e5916d	Fix module redact test for valgrind (#10432 ) The new module redact test will fail with valgrind: ``` [err]: modules can redact arguments in tests/unit/moduleapi/auth.tcl Expected 'slowlog reset' to be equal to 'auth.redact 1 (redacted) 3 (redacted)' (context: type eval line 12 cmd {assert_equal {slowlog reset} [lindex [lindex [r slowlog get] 2] 3]} proc ::test) ``` The reason is that with `slowlog-log-slower-than 10000`, `slowlog get` will have a chance to exceed 10ms. Made two changes to avoid failure: 1. change `slowlog-log-slower-than` from 10000 to -1, to disable it. 2. assert to use the previous execution result. In theory, the second one can actually be left unchanged, but i think it will be better if it is changed.	2022-03-16 08:53:57 +02:00
Harkrishn Patro	45ccae89bb	Add new cluster shards command (#10293 ) Implement a new cluster shards command, which provides a flexible and extensible API for topology discovery. Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>	2022-03-15 18:24:40 -07:00
Madelyn Olson	416c9ac2ef	Add module API for redacting command arguments (#10425 ) Add module API for redacting client commands	2022-03-15 18:21:13 -07:00
ranshid	1078e30c5f	make sort/ro commands validate external keys access patterns (#10106 ) (#10340 ) Currently the sort and sort_ro can access external keys via `GET` and `BY` in order to make sure the user cannot violate the authorization ACL rules, the decision is to reject external keys access patterns unless ACL allows SORT full access to all keys. I.e. for backwards compatibility, SORT with GET/BY keeps working, but if ACL has restrictions to certain keys, these features get permission denied. ### Implemented solution We have discussed several potential solutions and decided to only allow the GET and BY arguments when the user has all key permissions with the SORT command. The reasons being that SORT with GET or BY is problematic anyway, for instance it is not supported in cluster mode since it doesn't declare keys, and we're not sure the combination of that feature with ACL key restriction is really required. HOWEVER If in the fullness of time we will identify a real need for fine grain access support for SORT, we would implement the complete solution which is the alternative described below. ### Alternative (Completion solution): Check sort ACL rules after executing it and before committing output (either via store or to COB). it would require making several changes to the sort command itself. and would potentially cause performance degradation since we will have to collect all the get keys instead of just applying them to a temp array and then scan the access keys against the ACL selectors. This solution can include an optimization to avoid the overheads of collecting the key names, in case the ACL rules grant SORT full key-access, or if the ACL key pattern literal matches the one used in GET/BY. 
It would also mean that authorization would be O(nlogn) since we will have to complete most of the command execution before we can perform verification Co-authored-by: Madelyn Olson <madelyneolson@gmail.com> Co-authored-by: Oran Agra <oran@redislabs.com>	2022-03-15 17:14:53 +02:00
Binbin	871fa12fec	Sentinel: fix reconnect test timing issue (#10424 ) We need to wait for `sentinelTimer` to kick in, and then trigger the reconnect. As for another change, it is better to call `server_set_password` before calling SENTINEL SET auth-pass. Fixes problem introduced in #10400	2022-03-14 11:13:14 +02:00
Moti Cohen	a6bf509810	Sentinel: fix no reconnect after auth-pass is changed (#10400 ) When updating SENTINEL with master’s new password (command: `SENTINEL SET mymaster auth-pass some-new-password`), sentinel might still keep the old connection and avoid reconnecting with the new password. This is because of wrong logic that traces the last ping (pong) time to servers. In fact it worked fine until `8631e64` changed the condition to send ping. To resolve it with minimal risk, let’s disconnect master and replicas once changing password/user. Based on earlier work of yz1509.	2022-03-13 10:13:47 +02:00
ranshid	11b071a22b	ACL DRYRUN does not validate the verified command args. (#10405 ) As a result we segfault when parsing and matching the command keys.	2022-03-10 10:08:41 +02:00
zhugezy	a26cab9dd6	set "disable-thp" config immutable (#10409 ) It's confusing for this config to be modifiable since it only takes effect on startup	2022-03-10 09:52:49 +02:00
蔡相跃	24da71e507	Fix typo "the the" (#10399 )	2022-03-09 13:55:17 +02:00
guybe7	2a2954086a	XREADGROUP: Unblock client if stream is deleted (#10306 ) Deleting a stream while a client is blocked on XREADGROUP should unblock the client. The idea is that a client blocked via XREADGROUP is different from any other blocking type in the sense that it depends on the existence of both the key and the group. Even if the key is deleted and then revived with XADD it won't help any clients blocked on XREADGROUP because the group no longer exists, so they would fail with -NOGROUP anyway. The conclusion is that it's better to unblock these clients (with error) upon the deletion of the key, rather than waiting for the first XADD. Other changes: 1. Slightly optimize all `serveClientsBlockedOn` functions by checking `server.blocked_clients_by_type` 2. All `serveClientsBlockedOn` functions now use a list iterator rather than looking at `listFirst`, relying on `unblockClient` to delete the head of the list. Before this commit, only `serveClientsBlockedOnStreams` used to work like that. 3. bugfix: CLIENT UNBLOCK ERROR should work even if the command doesn't have a timeout_callback (only relevant to module commands)	2022-03-08 17:10:36 +02:00
zhaozhao.zz	728e62523e	script should not allow may-replicate commands when client pause write (#10364 ) In some special commands like eval_ro / fcall_ro we allow no-writes commands. But may-replicate commands are no-writes too, which leads to a crash when client pause write:	2022-03-08 16:53:11 +02:00
Shaya Potter	23f03e7965	Modules: Add REDISMODULE_EVENT_CONFIG (#10311 ) Add a new REDISMODULE_EVENT_CONFIG event type for notifying modules when Redis configuration changes.	2022-03-07 17:37:57 +02:00
Binbin	45d83fb2d4	Fix timing issue in rehash test (#10388 ) `Expected 'table size: 4096' to match 'table size: 8192'` This test failed once on daily macOS, the reason is that the bgsave has not stopped after the kill and `after 200`. So there is a child process and no rehash triggered. This commit uses `waitForBgsave` to wait for it to finish.	2022-03-07 13:44:07 +02:00
Yossi Gottlieb	6740e1753d	Fix redis-cli test issues on tcl8.5. (#10386 ) Apparently using `\x` produces different results between tclsh 8.5 and 8.6, whereas `\u` is more consistent.	2022-03-06 13:02:35 +02:00
Yuta Hongo	e3ef73dc2a	redis-cli: Better --json Unicode support and --quoted-json (#10286 ) Normally, `redis-cli` escapes non-printable data received from Redis, using a custom scheme (which is also used to handle quoted input). When using `--json` this is not desired as it is not compatible with RFC 7159, which specifies JSON strings are assumed to be Unicode and how they should be escaped. This commit changes `--json` to follow RFC 7159, which means that properly encoded Unicode strings in Redis will result with a valid Unicode JSON. However, this introduces a new problem with `--json` and data that is not valid Unicode (e.g., random binary data, text that follows other encoding, etc.). To address this, we add `--quoted-json` which produces JSON strings that follow the original redis-cli quoting scheme. For example, a value that consists of only null (0x00) bytes will show up as: * `"\u0000\u0000\u0000"` when using `--json` * `"\\x00\\x00\\x00"` when using `--quoted-json`	2022-03-05 21:25:52 +02:00
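The difference between the two output modes in the redis-cli commit above can be sketched with a simplified escaper. This is not the redis-cli implementation: the function name `escape_bytes` is hypothetical, and the sketch does not validate UTF-8 or reproduce every special case of the redis-cli quoting scheme. It only shows how the same bytes end up as `\uXXXX` escapes in one mode and as JSON-escaped `\xXX` sequences in the other.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical sketch: writes in[0..len) to out as a double-quoted string.
 * quoted == 0: JSON-style, control bytes, '"' and '\' become \u00XX,
 *              all other bytes are copied unchanged.
 * quoted != 0: quoted-json-style, every byte that is not plain printable
 *              ASCII becomes a backslash-escaped \xXX (i.e. "\\xXX").
 * Returns the string length, or -1 if out is too small. */
int escape_bytes(const unsigned char *in, size_t len, int quoted,
                 char *out, size_t outlen) {
    size_t pos = 0;
    if (outlen < 3) return -1;
    out[pos++] = '"';
    for (size_t i = 0; i < len; i++) {
        unsigned int b = in[i];
        size_t room = outlen - pos;
        int special = (b < 0x20 || b == '"' || b == '\\');
        int n;
        if (quoted && (special || b >= 0x7f))
            n = snprintf(out + pos, room, "\\\\x%02x", b);
        else if (!quoted && special)
            n = snprintf(out + pos, room, "\\u%04x", b);
        else
            n = snprintf(out + pos, room, "%c", (int)b);
        if (n < 0 || (size_t)n >= room) return -1;
        pos += (size_t)n;
    }
    if (outlen - pos < 2) return -1;
    out[pos++] = '"';
    out[pos] = '\0';
    return (int)pos;
}
```

For a value of three null bytes, the first mode produces `"\u0000\u0000\u0000"` and the second produces `"\\x00\\x00\\x00"`, matching the example in the commit message.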
ranshid	9b15dd288e	Introduce debug command to disable reply buffer resizing (#10360 ) In order to resolve some flaky tests which rely heavily on examining the memory footprint, we introduce the following fixes: # Fix in client-eviction test - by @yoav-steinberg Sometimes the libc allocator can use different size client struct allocations. this may cause unexpected memory calculations to fail the test. # Introduce new DEBUG command for disabling reply buffer resizing In order to eliminate reply buffer resizing during specific tests, we introduced the ability to disable (and enable) the resizing cron job Co-authored-by: yoav-steinberg <yoav@redislabs.com>	2022-03-01 14:40:29 +02:00
Harkrishn Patro	21aabab401	Fix acl dryrun to return the tested common permission error. (#10359 )	2022-02-28 20:26:58 -08:00
ranshid	5860fa3d9c	deflake client-eviction test "evict clients only until below limit" (#10354 ) After introducing #9822, we need to prevent client reply buffer shrinking to maintain correct client memory math. Add needs:debug, which was missing on one test. Co-authored-by: Oran Agra <oran@redislabs.com>	2022-02-28 11:32:42 +02:00
Meir Shpilraien (Spielrein)	aa856b39f2	Sort out the mess around Lua error messages and error stats (#10329 ) This PR fixes 2 issues on Lua scripting: * Server error reply statistics (some errors were counted twice). * Error code and error strings returning from scripts (error code was missing / misplaced). ## Statistics a Lua script user is considered part of the user application, a sophisticated transaction, so we want to count an error even if handled silently by the script, but when it is propagated outwards from the script we don't wanna count it twice. on the other hand, if the script decides to throw an error on its own (using `redis.error_reply`), we wanna count that too. Besides, we do count the `calls` in command statistics for the commands the script calls, so we should certainly also count `failed_calls`. So when a simple `eval "return redis.call('set','x','y')" 0` fails, it should count the failed call to both SET and EVAL, but the `errorstats` and `total_error_replies` should be counted only once. The PR changes the error object that is raised on errors. Instead of raising a simple Lua string, Redis will raise a Lua table in the following format: ``` { err='<error message (including error code)>', source='<User source file name>', line='<line where the error happened>', ignore_error_stats_update=true/false, } ``` The `luaPushError` function was modified to construct the new error table as described above. The `luaRaiseError` was renamed to `luaError` and now simply calls `lua_error` to raise the table on the top of the Lua stack as the error object. The reason is that since its functionality is changed, in case some Redis branch / fork uses it, it's better to have a compilation error than a bug. The `source` and `line` fields are enriched by the error handler (if possible) and the `ignore_error_stats_update` is optional and if it's not present then the default value is `false`. 
If `ignore_error_stats_update` is true, the error will not be counted on the error stats. When parsing a Redis call reply, each error is translated to a Lua table in the format described above and the `ignore_error_stats_update` field is set to `true` so we will not count errors twice (we counted this error when we invoked the command). The changes in this PR might have been considered a breaking change for users that used the Lua `pcall` function. Before, the error was a string and now it's a table. To keep backward compatibility the PR overrides the `pcall` implementation, extracts the error message from the error table and returns it. Example of the error stats update: ``` 127.0.0.1:6379> lpush l 1 (integer) 2 127.0.0.1:6379> eval "return redis.call('get', 'l')" 0 (error) WRONGTYPE Operation against a key holding the wrong kind of value. script: e471b73f1ef44774987ab00bdf51f21fd9f7974a, on @user_script:1. 127.0.0.1:6379> info Errorstats # Errorstats errorstat_WRONGTYPE:count=1 127.0.0.1:6379> info commandstats # Commandstats cmdstat_eval:calls=1,usec=341,usec_per_call=341.00,rejected_calls=0,failed_calls=1 cmdstat_info:calls=1,usec=35,usec_per_call=35.00,rejected_calls=0,failed_calls=0 cmdstat_lpush:calls=1,usec=14,usec_per_call=14.00,rejected_calls=0,failed_calls=0 cmdstat_get:calls=1,usec=10,usec_per_call=10.00,rejected_calls=0,failed_calls=1 ``` ## error message We can now construct the error message (sent as a reply to the user) from the error table, so this solves issues where the error message was malformed and the error code appeared in the middle of the error message: ```diff 127.0.0.1:6379> eval "return redis.call('set','x','y')" 0 -(error) ERR Error running script (call to 71e6319f97b0fe8bdfa1c5df3ce4489946dda479): @user_script:1: OOM command not allowed when used memory > 'maxmemory'. +(error) OOM command not allowed when used memory > 'maxmemory' @user_script:1. 
Error running script (call to 71e6319f97b0fe8bdfa1c5df3ce4489946dda479) ``` ```diff 127.0.0.1:6379> eval "redis.call('get', 'l')" 0 -(error) ERR Error running script (call to f_8a705cfb9fb09515bfe57ca2bd84a5caee2cbbd1): @user_script:1: WRONGTYPE Operation against a key holding the wrong kind of value +(error) WRONGTYPE Operation against a key holding the wrong kind of value script: 8a705cfb9fb09515bfe57ca2bd84a5caee2cbbd1, on @user_script:1. ``` Notice that `redis.pcall` was not changed: ``` 127.0.0.1:6379> eval "return redis.pcall('get', 'l')" 0 (error) WRONGTYPE Operation against a key holding the wrong kind of value ``` ## other notes Notice that some commands (like GEOADD) change the cmd variable on the client stats so we can not count on it to update the command stats. In order to be able to update those stats correctly we needed to promote the `realcmd` variable to be located on the client struct. Tests were added and modified to verify the changes. Related PR's: #10279, #10218, #10278, #10309 Co-authored-by: Oran Agra <oran@redislabs.com>	2022-02-27 13:40:57 +02:00
Itamar Haber	c81c7f51c3	Add stream consumer group lag tracking and reporting (#9127 ) Adds the ability to track the lag of a consumer group (CG), that is, the number of entries yet-to-be-delivered from the stream. The proposed constant-time solution is in the spirit of "best-effort." Partially addresses #8737. ## Description of approach We add a new "entries_added" property to the stream. This starts at 0 for a new stream and is incremented by 1 with every `XADD`. It is essentially an all-time counter of the entries added to the stream. Given the stream's length and this counter value, we can trivially find the logical "entries_added" counter of the first ID if and only if the stream is contiguous. A fragmented stream contains one or more tombstones generated by `XDEL`s. The new "xdel_max_id" stream property tracks the latest tombstone. The CG also tracks its last delivered ID as an "entries_read" counter and increments it independently when delivering new messages, unless this read counter is invalid (-1 means invalid offset). When the CG's counter is available, the reported lag is the difference between added and read counters. Lastly, this also adds a "first_id" field to the stream structure in order to make looking it up cheaper in most cases. ## Limitations There are two cases in which the mechanism isn't able to track the lag. In these cases, `XINFO` replies with `null` in the "lag" field. The first case is when a CG is created with an arbitrary last delivered ID, that isn't "0-0", nor the first or the last entries of the stream. In this case, it is impossible to obtain a valid read counter (short of an O(N) operation). The second case is when there are one or more tombstones fragmenting the stream's entries range. In both cases, given enough time and assuming that the consumers are active (reading and lacking) and advancing, the CG should be able to catch up with the tip of the stream and report zero lag. 
Once that's achieved, lag tracking would resume as normal (until the next tombstone is set). ## API changes * `XGROUP CREATE` added with the optional named argument `[ENTRIESREAD entries-read]` for explicitly specifying the new CG's counter. * `XGROUP SETID` added with an optional positional argument `[ENTRIESREAD entries-read]` for specifying the CG's counter. * `XINFO` reports the maximal tombstone ID, the recorded first entry ID, and total number of entries added to the stream. * `XINFO` reports the current lag and logical read counter of CGs. * `XSETID` is an internal command that's used in replication/aof. It has been added with the optional positional arguments `[ENTRIESADDED entries-added] [MAXDELETEDID max-deleted-entry-id]` for propagating the CG's offset and maximal tombstone ID of the stream. ## The generic unsolved problem The current stream implementation doesn't provide an efficient way to obtain the approximate/exact size of a range of entries. While it could've been nice to have that ability (#5813) in general, let alone specifically in the context of CGs, the risk and complexities involved in such implementation are in all likelihood prohibitive. ## A refactoring note The `streamGetEdgeID` has been refactored to accommodate both the existing seek of any entry as well as seeking non-deleted entries (the addition of the `skip_tombstones` argument). Furthermore, this refactoring also migrated the seek logic to use the `streamIterator` (rather than `raxIterator`) that was, in turn, extended with the `skip_tombstones` Boolean struct field to control the emission of these. Co-authored-by: Guy Benoish <guy.benoish@redislabs.com> Co-authored-by: Oran Agra <oran@redislabs.com>	2022-02-23 22:34:58 +02:00
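The lag rule from the consumer group commit above (lag is the difference between the added and read counters, and is reported as null when the read counter is invalid or tombstones fragment the range) can be written as a short sketch. The function name, the `fragmented` parameter and the constant name are hypothetical simplifications; the real stream code derives the fragmentation condition from "xdel_max_id" and the entry IDs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed sentinel for an invalid read counter, per the commit text (-1). */
#define ENTRIES_READ_INVALID (-1)

/* Hypothetical sketch: computes the consumer group lag.
 * entries_added: all-time counter of entries added to the stream.
 * entries_read:  the group's logical read counter, or ENTRIES_READ_INVALID.
 * fragmented:    true if tombstones fragment the range still to be read.
 * Returns true and stores the lag in *lag when it can be computed;
 * returns false when the lag should be reported as null. */
bool cg_lag(int64_t entries_added, int64_t entries_read,
            bool fragmented, int64_t *lag) {
    if (entries_read == ENTRIES_READ_INVALID || fragmented) return false;
    *lag = entries_added - entries_read;
    return true;
}
```

A group that has read 7 of 10 added entries on a contiguous stream gets a lag of 3; a group created at an arbitrary ID has no valid counter, so the sketch returns false and the caller would reply with null.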
Binbin	488aecb3ab	Fix timing issue in EXEC fail on lazy expired WATCHed key test (#10332 ) The test fails on slow machines (valgrind or FreeBSD) because, since #10256, when WATCH is called on a key that's already logically expired, we add an `expired` flag and skip the key in the `isWatchedKeyExpired` check. We need to increase the expiration time so that the key has not yet logically expired when WATCH is called. Also added retries to make sure it doesn't fail. I suppose 100ms is enough in valgrind, tested locally, no need to retry.	2022-02-23 08:47:16 +02:00
Viktor Söderqvist	e9ae03787e	Delete key doesn't dirty client who watched stale key (#10256 ) When WATCH is called on a key that's already logically expired, avoid discarding the transaction when the key is actually deleted. When WATCH is called, a flag is stored if the key is already expired at the time of watch. The expired key is not deleted, only checked. When a key is "touched", if it is deleted and it was already expired when a client watched it, the client is not marked as dirty. Co-authored-by: Oran Agra <oran@redislabs.com> Co-authored-by: zhaozhao.zz <zhaozhao.zz@alibaba-inc.com>	2022-02-22 12:09:46 +02:00
ranshid	47c51d0c78	introduce dynamic client reply buffer size - save memory on idle clients (#9822 ) In the current implementation, a simple idle client that serves no traffic still uses ~17Kb of memory. This is mainly due to a fixed-size reply buffer, currently set to 16kb. We have encountered some cases in which the server operates in a low-memory environment. In such cases, a user who wishes to create large connection pools to support potential burst periods will exhaust a large amount of memory to maintain connected idle clients. Some users may choose to "sacrifice" performance in order to save memory. This commit introduces a dynamic mechanism to shrink and expand the client reply buffer based on the periodically observed peak. The algorithm works as follows: 1. each time a client reply buffer has been fully written, the last recorded peak is updated: new peak = MAX( last peak, current written size) 2. during clients cron we check for each client if the last observed peak was: a. matching the current buffer size - in which case we expand (resize) the buffer size by 100% b. less than half the buffer size - in which case we shrink the buffer size by 50% 3. In any case we will not resize the buffer in case: a. the current buffer peak is less than the current buffer usable size and higher than 1/2 the current buffer usable size b. the value of (current buffer usable size/2) is less than 1Kib c. the value of (current buffer usable size*2) is larger than 16Kib 4. the peak value is reset to the current buffer position once every 5* seconds. We maintain a new field in the client structure (buf_peak_last_reset_time) which is used to keep track of how much time has passed since the last buffer peak reset. 
### Interface changes: CLIENT LIST - now contains 2 new extra fields: rbs= < the current size in bytes of the client reply buffer > rbp=< the current value in bytes of the last observed buffer peak position > INFO STATS - now contains 2 new statistics: reply_buffer_shrinks = < total number of buffer shrinks performed > reply_buffer_expends = < total number of buffer expands performed > Co-authored-by: Oran Agra <oran@redislabs.com> Co-authored-by: Yoav Steinberg <yoav@redislabs.com>	2022-02-22 11:19:38 +02:00
Madelyn Olson	71204f9632	Implemented module getchannels api and renamed channel keyspec (#10299 ) This implements the following main pieces of functionality: * Renames key spec "CHANNEL" to be "NOT_KEY", and update the documentation to indicate it's for cluster routing and not for any other key related purpose. * Add the getchannels-api, so that modules can now define commands that are subject to ACL channel permission checks. * Add 4 new flags that describe how a module interacts with a command (SUBSCRIBE, PUBLISH, UNSUBSCRIBE, and PATTERN). They are all technically composable, however not sure how a command could both subscribe and unsubscribe from a command at once, but didn't see a reason to add explicit validation there. * Add two new module apis RM_ChannelAtPosWithFlags and RM_IsChannelsPositionRequest to duplicate the functionality provided by the keys position APIs. * The RM_ACLCheckChannelPermissions (only released in 7.0 RC1) was changed to take flags rather than a boolean literal. * The RM_ACLCheckKeyPermissions (only released in 7.0 RC1) was changed to take flags corresponding to keyspecs instead of custom permission flags. These keyspec flags mimic the flags for ACLCheckChannelPermissions.	2022-02-22 11:00:03 +02:00
YaacovHazan	65e4bce0e7	fix return value of loadAppendOnlyFiles (#10295 ) Make sure the status returned from loading multiple AOF files reflects the overall result, not just that of the last file. When one of the AOF files loaded successfully but the last AOF file was empty, loadAppendOnlyFiles would return AOF_EMPTY. This commit changes this behavior and returns AOF_OK in that case. This can happen, for example, when loading an old AOF file and no more commands are processed: the manifest file will include a base AOF file with data and an empty incr AOF file. Co-authored-by: chenyang8094 <chenyang8094@users.noreply.github.com> Co-authored-by: Oran Agra <oran@redislabs.com>	2022-02-22 08:59:23 +02:00
Oran Agra	fad0b0d2a6	Fix error stats and failed command stats for blocked clients (#10309 ) This is a followup work for #10278, and a discussion about #10279 The changes: - fix failed_calls in command stats for blocked clients that got error. including CLIENT UNBLOCK, and module replying an error from a thread. - fix latency stats for XREADGROUP that failed with -NOGROUP Theory behind which errors should be counted: - error stats represents errors returned to the user, so an error handled by a module should not be counted. - total error counter should be the same. - command stats represents execution of commands (even with RM_Call, and if they fail or get rejected it counts these calls in commandstats, so it should also count failed_calls) Some thoughts about Scripts: for scripts it could be different since they're part of user code, not the infra (not an extension to redis) we certainly want commandstats to contain all calls and errors a simple script is like multi-exec transaction so an error inside it should be counted in error stats a script that replies with an error to the user (using redis.error_reply) should also be counted in error stats but then the problem is that a plain `return redis.call("SET")` should not be counted twice (once for the SET and once for EVAL) so that's something left to be resolved in #10279	2022-02-21 11:20:41 +02:00
yoav-steinberg	b59bb9b476	Fix script active defrag test (#10318 ) This includes two fixes: * We forgot to count non-key reallocs in defragmentation stats. * Fix the script defrag tests to make dict entries less significant in fragmentation by making the scripts larger. This ensures active defrag will complete and reach the desired results. Some inherent fragmentation might exist in dict entries, which we need to ignore. This led to occasional CI failures.	2022-02-21 09:37:25 +02:00
qetu3790	b2d393b990	Fix geo search bounding box check causing missing results (#10018 ) Consider the following example: 1. geoadd k1 -0.15307903289794921875 85 n1 0.3515625 85.00019260486917005437 n2. 2. geodist k1 n1 n2 returns "4891.9380" 3. but GEORADIUSBYMEMBER k1 n1 4891.94 m only returns n1. n2 is in the bounding box but outside the search area. So we let the search area contain the bounding box to get n2. Co-authored-by: Binbin <binloveplay1314@qq.com>	2022-02-21 08:06:58 +02:00
chenyang8094	a50aa29bde	Adapt redis-check-aof tool for Multi Part Aof (#10061 ) Modifications of this PR: 1. Support the verification of `Multi Part AOF`, while still maintaining support for the old-style `AOF/RDB-preamble`. `redis-check-aof` will automatically choose which mode to use according to the incoming file format. `Usage: redis-check-aof [--fix\|--truncate-to-timestamp $timestamp] <AOF/manifest>` 2. Refactor part of the code to make it easier to understand 3. Currently only supports truncate (`--fix` or `--truncate-to-timestamp`) the last AOF file (may be `BASE` or `INCR`) The reasons for 3 above: - for `--fix`: Only the last AOF may be truncated, this is guaranteed by redis - for `--truncate-to-timestamp`: Normally, we only have `BASE` + `INCR` files at most, and `BASE` cannot be truncated(It only contains a timestamp annotation at the beginning of the file), so only `INCR` can be truncated. If we have a `BASE+INCR1+INCR2` file (meaning we have an interrupted AOFRW), Only `INCR2` files can be truncated at this time. If we still insist on truncate `INCR1`, we need to manually delete `INCR2` and update the manifest file, then re-run `redis-check-aof` - If we want to support truncate any file, we need to add very complicated code to support the atomic modification of multiple file deletion and update manifest, I think this is unnecessary	2022-02-17 08:13:28 +02:00
YaacovHazan	e6478cfd10	fix "Connect multiple replicas at the same time" test (#10294 ) In order to make sure no more commands are processed, we wait for the 'load handlers' to disconnect. The test mistakenly waited on the (last) slave instead of the master.	2022-02-14 08:46:58 +02:00
Oran Agra	b099889a3a	Fix and improve module error reply statistics (#10278 ) This PR handles several aspects 1. Calls to RM_ReplyWithError from thread safe contexts don't violate thread safety. 2. Errors returning from RM_Call to the module aren't counted in the statistics (they might be handled silently by the module) 3. When a module propagates a reply it got from RM_Call to its client, then the error statistics are counted. This is done by: 1. When appending an error reply to the output buffer, we avoid updating the global error statistics, instead we cache that error in a deferred list in the client struct. 2. When creating a RedisModuleCallReply object, the deferred error list is moved from the client into that object. 3. when a module calls RM_ReplyWithCallReply we copy the deferred replies to the dest client (if that's a real client, then that's when the error statistics are updated to the server) Note about RM_ReplyWithCallReply: if the original reply had an array with errors, and the module replied with just a portion of the original reply, and not the entire reply, the errors are currently not propagated and the error stats will not get propagated. Fix #10180	2022-02-13 18:37:32 +02:00
Binbin	62c8be28ee	Regression test for sync psync crash (#10288 ) Added regression tests for #10020 / #10081 / #10243. The above PRs fixed some crashes due to an assertion, see function `clientHasPendingReplies` (introduced in #9166). This commit adds some tests to cover the above scenario. These tests will all fail in #9166; although the crashes are fixed now, there is value in adding these tests to cover and verify the changes. They can also cover #8868 (verify the logs). Other changes: 1. Reduces the wait time in `waitForBgsave` and `waitForBgrewriteaof` from 1s to 50ms, which should reduce the time for some tests. 2. Improve the test infra to print context when `assert_match` fails. 3. Improve the test infra to print `$error` when `assert_error` fails. ``` Expected an error matching 'ERR' but got 'OK' (context: type eval line 4 cmd {assert_error "ERR" {r set a b}} proc ::test) ```	2022-02-13 09:52:38 +02:00
yoav-steinberg	2eb9b19612	Fix Eval scripts defrag (broken in 7.0 RC1) (#10271 ) Remove the scripts defragger because it has been broken since #10126 (released in 7.0 RC1). It would crash the server if the defragger started in a server that contains eval scripts. In #10126 the global `lua_script` dict became a dict to a custom `luaScript` struct with an internal `robj` in it instead of a generic `sds` -> `robj` dict. This means we need custom code to defrag it, and since scripts should never really cause much fragmentation it makes more sense to simply remove the defrag code for scripts.	2022-02-11 21:58:05 +02:00
sundb	5f0119ca91	Fix duplicate module options define (#10284 ) The bug was introduced by #9323 (released in 7.0 RC1). The defines `REDISMODULE_OPTIONS_HANDLE_REPL_ASYNC_LOAD` and `REDISMODULE_OPTION_NO_IMPLICIT_SIGNAL_MODIFIED` have the same value. This results in skipping `signalModifiedKey()` after `RM_CloseKey()` if the module has set the `REDISMODULE_OPTIONS_HANDLE_REPL_ASYNC_LOAD` option. The implication is missing WATCH and client side tracking invalidations. Other changes: - add `no-implicit-signal-modified` to the options in INFO modules Co-authored-by: Oran Agra <oran@redislabs.com>	2022-02-11 20:15:52 +02:00
Harkrishn Patro	a5d17f0b6c	Check target node is a primary during cluster setslot. (#10277 )	2022-02-10 23:14:27 -08:00
Wen Hui	64e1e7e207	Add AUTH arity test (#10266 ) Add test for AUTH with too many arguments	2022-02-09 22:09:20 +02:00
Binbin	beb94c901e	Fix INFO SENTINEL memory leak (#10268 ) * Fix INFO SENTINEL memory leak Introduced in #6891 * remove the copy-paste sentence	2022-02-09 07:33:24 +02:00
Wen Hui	2e1bc942aa	Make INFO command variadic (#6891 ) This is an enhancement for the INFO command. Previously INFO only supported one argument for selecting an info section; a user who wanted more categories had to either run INFO all / default or call INFO multiple times. Description of the feature The goal of adding this feature is to let the user retrieve multiple categories via the INFO command, and still avoid emitting the same section twice. A use case for this is Redis Sentinel, which periodically calls the INFO command to refresh info from monitored Master/Slaves; only the Server and Replication categories are used for parsing information. If the INFO command can return just the categories that the client side needs, it can save a lot of client-side parsing time as well as network bandwidth. Implementation To share code between redis, sentinel, and other users of INFO (DEBUG and modules), we have a new `genInfoSectionDict` function that returns a dict and some boolean flags (e.g. `all`) to the caller (built from user input). Sentinel later purges unwanted sections from that, and then it is forwarded to the info `genRedisInfoString`. Usage Examples INFO Server Replication INFO CPU Memory INFO default commandstats Co-authored-by: Oran Agra <oran@redislabs.com>	2022-02-08 13:14:42 +02:00
yoav-steinberg	b76016a948	Consistent errors returned from EVAL scripts (#10218 ) This PR handles inconsistencies in errors returned from lua scripts. Details of the problem can be found in #10165. ### Changes - Remove double stack trace. It's enough that a stack trace is automatically added by the engine's error handler, see `d0bc4fff18/src/function_lua.c (L472-L485)` and `d0bc4fff18/src/eval.c (L243-L255)` - Make sure all errors are preceded with an error code. Passing a simple string to `luaPushError()` will prepend it with a generic `ERR` error code. - Make sure lua error table doesn't include a RESP `-` error status. Lua stores redis errors as a lua table with a single `err` field and a string. When the string is translated back to RESP we add a `-` to it. See `d0bc4fff18/src/script_lua.c (L510-L517)` So there's no need to store it in the lua table. ### Before & After ```diff --- <unnamed> +++ <unnamed> @@ -1,14 +1,14 @@ 1: config set maxmemory 1 2: +OK 3: eval "return redis.call('set','x','y')" 0 - 4: -ERR Error running script (call to 71e6319f97b0fe8bdfa1c5df3ce4489946dda479): @user_script:1: @user_script: 1: -OOM command not allowed when used memory > 'maxmemory'. + 4: -ERR Error running script (call to 71e6319f97b0fe8bdfa1c5df3ce4489946dda479): @user_script:1: OOM command not allowed when used memory > 'maxmemory'. 5: eval "return redis.pcall('set','x','y')" 0 - 6: -@user_script: 1: -OOM command not allowed when used memory > 'maxmemory'. + 6: -OOM command not allowed when used memory > 'maxmemory'. 7: eval "return redis.call('select',99)" 0 8: -ERR Error running script (call to 4ad5abfc50bbccb484223905f9a16f09cd043ba8): @user_script:1: ERR DB index is out of range 9: eval "return redis.pcall('select',99)" 0 10: -ERR DB index is out of range 11: eval_ro "return redis.call('set','x','y')" 0 -12: -ERR Error running script (call to 71e6319f97b0fe8bdfa1c5df3ce4489946dda479): @user_script:1: @user_script: 1: Write commands are not allowed from read-only scripts. 
+12: -ERR Error running script (call to 71e6319f97b0fe8bdfa1c5df3ce4489946dda479): @user_script:1: ERR Write commands are not allowed from read-only scripts. 13: eval_ro "return redis.pcall('set','x','y')" 0 -14: -@user_script: 1: Write commands are not allowed from read-only scripts. +14: -ERR Write commands are not allowed from read-only scripts. ```	2022-02-08 11:44:40 +02:00
guybe7	3c3e6cc1c7	X[AUTO]CLAIM should skip deleted entries (#10227 ) Fix #7021 #8924 #10198 # Intro Before this commit X[AUTO]CLAIM used to transfer deleted entries from one PEL to another, but reply with "nil" for every such entry (instead of the entry id). The idea (for XCLAIM) was that the caller could see this "nil", realize the entry no longer exists, and XACK it in order to remove it from PEL. The main problem with that approach is that it assumes there's a correlation between the index of the "id" arguments and the array indices, which there isn't (in case some of the input IDs to XCLAIM never existed/read): ``` 127.0.0.1:6379> XADD x 1 f1 v1 "1-0" 127.0.0.1:6379> XADD x 2 f1 v1 "2-0" 127.0.0.1:6379> XADD x 3 f1 v1 "3-0" 127.0.0.1:6379> XGROUP CREATE x grp 0 OK 127.0.0.1:6379> XREADGROUP GROUP grp Alice COUNT 2 STREAMS x > 1) 1) "x" 2) 1) 1) "1-0" 2) 1) "f1" 2) "v1" 2) 1) "2-0" 2) 1) "f1" 2) "v1" 127.0.0.1:6379> XDEL x 1 2 (integer) 2 127.0.0.1:6379> XCLAIM x grp Bob 0 0-99 1-0 1-99 2-0 1) (nil) 2) (nil) ``` # Changes Now, X[AUTO]CLAIM acts in the following way: 1. If one tries to claim a deleted entry, we delete it from the PEL we found it in (and the group PEL too). So de facto, such an entry is not claimed, just cleared from the PEL (since it doesn't exist in the stream anyway) 2. since we never claim deleted entries, X[AUTO]CLAIM will never return "nil" instead of an entry. 3. add a new element to XAUTOCLAIM's response (see below) # Knowing which entries were cleared from the PEL The caller may want to log any entries that were found in a PEL but deleted from the stream itself (it would suggest that there might be a bug in the application: trimming the stream while some entries were not yet processed by the consumers) ## XCLAIM the set {XCLAIM input ids} - {XCLAIM returned ids} contains all the entry ids that were not claimed which means they were deleted (assuming the input contains only entries from some PEL). 
The user doesn't need to XACK them because XCLAIM had already deleted them from the source PEL. ## XAUTOCLAIM XAUTOCLAIM has a new element added to its reply: it's an array of all the deleted stream IDs it stumbled upon. This is somewhat of a breaking change since X[AUTO]CLAIM used to be able to reply with "nil" and now it can't... But since it was undocumented (and generally a bad idea to rely on it, as explained above) the breakage is not that bad.	2022-02-08 10:20:09 +02:00
Oran Agra	66be30f7fc	Handle key-spec flags with modules (#10237 ) - add COMMAND GETKEYSANDFLAGS sub-command - add RM_KeyAtPosWithFlags and GetCommandKeysWithFlags - RM_KeyAtPos and RM_CreateCommand set flags requiring full access for keys - RM_CreateCommand set VARIABLE_FLAGS - expose `variable_flags` flag in COMMAND INFO key-specs - getKeysFromCommandWithSpecs prefers key-specs over getkeys-api - add tests for all of these	2022-02-08 10:01:35 +02:00
Binbin	7f4cca11dc	COMMAND DOCS avoid adding summary/since if they don't exist (#10252 ) If summary or since is empty, we used to return NULL in COMMAND DOCS. Currently all redis commands will have these two fields. But not for module command, summary and since are optional for RM_SetCommandInfo. With the change in #10043, if a module command doesn't have the summary or since, redis-cli will crash (see #10250). In this commit, COMMAND DOCS avoid adding summary or since when they are missing.	2022-02-07 19:57:50 +02:00
yoav-steinberg	9dfeda58ed	acl check api for functions and eval (#10220 ) Changes: 1. Adds the `redis.acl_check_cmd()` api to lua scripts. It can be used to check if the current user has permissions to execute a given command. The new function receives the command to check as an argument exactly like `redis.call()` receives the command to execute as an argument. 2. In the PR I unified the code used to convert lua arguments to redis argv arguments from both the new `redis.acl_check_cmd()` API and the `redis.[p]call()` API. This cleans up potential duplicate code. 3. While doing the refactoring in 2 I noticed there's an optimization to reduce allocation calls when parsing lua arguments into an `argv` array in the `redis.[p]call()` implementation. These optimizations were introduced years ago in `48c49c4851` and `4f686555ce`. It is unclear why this was added. The original commit message claims a 4% performance increase which I couldn't recreate and might not be worth it even if it did recreate. This PR removes that optimization. 
Following are details of the benchmark I did that couldn't reveal any performance improvements due to this optimization: ``` benchmark 1: src/redis-benchmark -P 500 -n 10000000 eval 'return redis.call("ping")' 0 benchmark 2: src/redis-benchmark -P 500 -r 1000 -n 1000000 eval 'return redis.call("mset","k1__rand_int__","v1__rand_int__","k2__rand_int__","v2__rand_int__","k3__rand_int__","v3__rand_int__","k4__rand_int__","v4__rand_int__")' 0 benchmark 3: src/redis-benchmark -P 500 -r 1000 -n 100000 eval "for i=1,100,1 do redis.call('set','kk'..i,'vv'..__rand_int__) end return redis.call('get','kk5')" 0 benchmark 4: src/redis-benchmark -P 500 -r 1000 -n 1000000 eval 'return redis.call("mset","k1__rand_int__","v1__rand_int__","k2__rand_int__","v2__rand_int__","k3__rand_int__","v3__rand_int__","k4__rand_int__","v4__rand_int__xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx")' ``` I ran the benchmark on this branch with and without commit 68b71680a4d3bb8f0509e06578a9f15d05b92a47 Results in requests per second: cmd \| without optimization \| without optimization 2nd run \| with original optimization \| with original optimization 2nd run -- \| -- \| -- \| -- \| -- 1 \| 461233.34 \| 477395.31 \| 471098.16 \| 469946.91 2 \| 34774.14 \| 35469.8 \| 35149.38 \| 34464.93 3 \| 6390.59 \| 6281.41 \| 6146.28 \| 6464.12 4 \| 28005.71 \| \| 27965.77 \| As you can see, different use cases showed identical or negligible performance differences. 
So finally I decided to chuck the original optimization and simplify the code.	2022-02-07 08:04:01 +02:00
Oran Agra	98b3f52599	add test suite infra to test RESP3 attributes (#10247 ) So far we only tested attributes using readraw. Now the resp parser caches them, so that after getting the reply, you can query them if you want.	2022-02-07 00:10:05 +02:00
Wen Hui	6ebb679f06	Add tests for ACL command error cases (#10183 )	2022-02-06 07:58:28 +02:00
Jason Elbaum	5b17909c4f	redis-cli generates command help tables from the results of COMMAND (#10043 ) This is a followup to #9656 and implements the following step mentioned in that PR: * When possible, extract all the help and completion tips from COMMAND DOCS (Redis 7.0 and up) * If COMMAND DOCS fails, use the static help.h compiled into redis-cli. * Supplement additional command names from COMMAND (pre-Redis 7.0) The last step is needed to add module command and other non-standard commands. This PR does not change the interactive hinting mechanism, which still uses only the param strings to provide somewhat unreliable and inconsistent command hints (see #8084). That task is left for a future PR. Co-authored-by: Oran Agra <oran@redislabs.com>	2022-02-05 16:54:16 +02:00
Viktor Söderqvist	0a82fe8447	Command info module API (#10108 ) Adds RM_SetCommandInfo, allowing modules to provide the following command info: * summary * complexity * since * history * hints * arity * key specs * args This information affects the output of `COMMAND`, `COMMAND INFO` and `COMMAND DOCS`, Cluster, ACL and is used to filter commands with the wrong number of arguments before the call reaches the module code. The recently added API functions for key specs (never released) are removed. A minimalist example would look like so: ```c RedisModuleCommand *mycmd = RedisModule_GetCommand(ctx,"mymodule.mycommand"); RedisModuleCommandInfo mycmd_info = { .version = REDISMODULE_COMMAND_INFO_VERSION, .arity = -5, .summary = "some description", }; if (RedisModule_SetCommandInfo(mycmd, &mycmd_info) == REDISMODULE_ERR) return REDISMODULE_ERR; ``` Notes: * All the provided information (including strings) is copied, not keeping references to the API input data. * The version field is actually a static struct that contains the sizes of the structs used in arrays, so we can extend these in the future and old versions will still be able to take the part they can support.	2022-02-04 21:09:36 +02:00
Binbin	d7fcb3c5a1	Fix SENTINEL SET config rewrite test (#10232 ) Change the sentinel config file to a directory in SENTINEL SET test. So it will now fail on the `rename` in `rewriteConfigOverwriteFile`. The test used to set the sentinel config file permissions to `000` to simulate failure. But it fails on centos7 / freebsd / alpine. (introduced in #10151) Other changes: 1. More error messages after the config rewrite failure. 2. Modify arg name `force_all` in `rewriteConfig` to `force_write`. (was renamed in #9304) 3. Fix a typo in debug quicklist-packed-threshold, then -> than. (#9357)	2022-02-04 11:39:51 +02:00
Binbin	d2fde2f655	Fix cluster tests failing due to subcommand names (#10231 ) Introduced in #10128	2022-02-04 11:32:30 +02:00
Wen Hui	65ef543f8c	Sentinel: return an error if configuration save fails (#10151 ) When performing `SENTINEL SET`, Sentinel updates the local configuration file. Before this commit, failure to update the file would still result in an `+OK` reply. Now, a `-ERR Failed to save config file` error will be returned. Co-authored-by: Yossi Gottlieb <yossigo@gmail.com>	2022-02-03 13:20:35 +02:00
Vo Trong Phuc	53c43fcc84	Add check min-slave-* feature when evaluating Lua scripts and Functions (#10160 ) Add check enough good slaves for write command when evaluating scripts. This check is made before the script is executed, if we have function flags, and per redis command if we don't. Co-authored-by: Phuc. Vo Trong <phucvt@vng.com.vn> Co-authored-by: Oran Agra <oran@redislabs.com> Co-authored-by: Meir Shpilraien (Spielrein) <meir@redis.com>	2022-02-03 11:57:51 +02:00
Wen Hui	c9e1602f90	Add test case to improve code coverage for Addslotsrange and Delslotsrange command (#10128 ) add more test cases for addslotsrange and delslotsrange	2022-02-02 18:22:46 -08:00
郭伟光	6b5b3ca414	forbid module to unload when it holds ongoing timer (#10187 ) This is done to avoid a crash when the timer fires after the module was unloaded. Or memory leaks in case we wanted to just ignore the timer. It'll cause the MODULE UNLOAD command to return with an error Co-authored-by: sundb <sundbcn@gmail.com>	2022-02-01 14:54:11 +02:00
Meir Shpilraien (Spielrein)	6ca97da0fc	Fix wrong version calculation on Redis Function tests. (#10217 )	2022-01-31 12:49:57 +02:00
Madelyn Olson	8b1cda7568	Change replica migration tests to use continuous slots to improve speed (#10215 )	2022-01-30 22:44:32 -08:00
Oran Agra	d364ede59c	Revert the attempt to fix cluster rebalance test (#10207 ) (#10212 ) It seems that fix didn't really solve the problem with ASAN, and also introduced issues with other CI runs. unrelated: - make runtest-cluster able to take multiple --single arguments	2022-01-31 01:47:58 +02:00
Harkrishn Patro	a43b6922d1	Set default channel permission to resetchannels for 7.0 (#10181 ) For backwards compatibility in 6.x, channels default permission was set to `allchannels` however with 7.0, we should modify it and the default value should be `resetchannels` for better security posture. Also, with selectors in ACL, a client doesn't have to set channel rules everytime and by default the value will be `resetchannels`. Before this change ``` 127.0.0.1:6379> acl list 1) "user default on nopass ~* &* +@all" 127.0.0.1:6379> acl setuser hp on nopass +@all ~* OK 127.0.0.1:6379> acl list 1) "user default on nopass ~* &* +@all" 2) "user hp on nopass ~* &* +@all" 127.0.0.1:6379> acl setuser hp1 on nopass -@all (%R~sales) OK 127.0.0.1:6379> acl list 1) "user default on nopass ~ &* +@all" 2) "user hp on nopass ~* &* +@all" 3) "user hp1 on nopass &* -@all (%R~sales* &* -@all)" ``` After this change ``` 127.0.0.1:6379> acl list 1) "user default on nopass ~* &* +@all" 127.0.0.1:6379> acl setuser hp on nopass +@all ~* OK 127.0.0.1:6379> acl list 1) "user default on nopass ~* &* +@all" 2) "user hp on nopass ~* resetchannels +@all" 127.0.0.1:6379> acl setuser hp1 on nopass -@all (%R~sales) OK 127.0.0.1:6379> acl list 1) "user default on nopass ~ &* +@all" 2) "user hp on nopass ~* resetchannels +@all" 3) "user hp1 on nopass resetchannels -@all (%R~sales* resetchannels -@all)" ```	2022-01-30 12:02:55 +02:00
Oran Agra	be0d293354	fix cluster rebalance test race (#10207 ) Try to fix the rebalance cluster test that's failing with ASAN daily: Looks like `redis-cli --cluster rebalance` gets `ERR Please use SETSLOT only with masters` in `clusterManagerMoveSlot()`. it happens when `12-replica-migration-2.tcl` is run with ASAN in GH Actions. in `Resharding all the master #0 slots away from it` So the fix (assuming i got it right) is to call `redis-cli --cluster check` before `--cluster rebalance`. p.s. it looks like a few other checks in these tests needed that wait, added them too. Other changes: * in instances.tcl, make sure to catch tcl test crashes and let the rest of the code proceed, so that if there was a redis crash, we'll find it and print it too. * redis-cli, try to make sure it prints an error instead of silently exiting. specifically about redis-cli: 1. clusterManagerMoveSlot used to print an error, only if the caller also asked for it (should be the other way around). 2. clusterManagerCommandReshard asked for an error, but didn't use it (probably tried to avoid the double print). 3. clusterManagerCommandRebalance didn't ask for the error, now it does. 4. making sure that other places in clusterManagerCommandRebalance print something before exiting with an error.	2022-01-30 11:30:19 +02:00
Binbin	d616925835	Allow SET without GET arg on write-only ACL. Allow BITFIELD GET on read-only ACL (#10148 ) SET is a R+W command, because it can also do `GET` on the data. SET without GET is a write-only command. SET with GET is a read+write command. In #9974, we added ACL to let users define write-only access. So when the user uses SET with GET option, and the user doesn't have the READ permission on the key, we need to reject it, but we rather not reject users with write-only permissions from using the SET command when they don't use GET. In this commit, we add a `getkeys_proc` function to control key flags in SET command. We also add a new key spec flag (VARIABLE_FLAGS) means that some keys might have different flags depending on arguments. We also handle BITFIELD command, add a `bitfieldGetKeys` function. BITFIELD GET is a READ ONLY command. BITFIELD SET or BITFIELD INCR are READ WRITE commands. Other changes: 1. SET GET was added in 6.2, add the missing since in set.json 2. Added tests to cover the changes in acl-v2.tcl 3. Fix some typos in server.h and cleanups in acl-v2.tcl Co-authored-by: Madelyn Olson <madelyneolson@gmail.com> Co-authored-by: Oran Agra <oran@redislabs.com>	2022-01-26 21:03:21 +02:00
Oran Agra	795ea011ba	Solve race in a BGSAVE test (#10190 ) This PR attempts to solve two problems that sometimes happen under valgrind: `ERR Background save already in progress` and `not bgsave not aborted`. The test used to populate the database with DEBUG, which didn't increment the dirty counter, so it couldn't trigger an automatic bgsave. Then it used a manual bgsave and aborted it (when it got aborted it populated the dirty counter), and then it tried to do another bgsave. That other bgsave could have failed if the automatic one had already started.	2022-01-26 19:46:02 +02:00
Oran Agra	da48a81290	solve race in expiration test (#10192 ) Failed on a non-valgrind run, on this line: ``` assert_equal 0 [$slave exists k] ``` The condition in `keyIsExpired` is `now > when`, so if the test is really fast, it may reach EXISTS exactly 1000 milliseconds after the expiration was set, when the key isn't yet gone.	2022-01-26 19:45:31 +02:00
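The boundary case in the commit above can be illustrated with a small sketch. The function below is a hypothetical Python model of the strict `now > when` comparison quoted in the message, not the actual `keyIsExpired` code, and the timestamps are made-up values.

```python
def key_is_expired(now_ms: int, when_ms: int) -> bool:
    """Hypothetical model of the strict `now > when` check."""
    return now_ms > when_ms


set_at_ms = 10_000            # arbitrary time at which the TTL was set
when_ms = set_at_ms + 1000    # key is due to expire 1000 ms later

# Exactly 1000 ms after the TTL was set, the key still counts as live.
print(key_is_expired(when_ms, when_ms))
# One millisecond later it counts as expired.
print(key_is_expired(when_ms + 1, when_ms))
```

A test that checks for the key at exactly the expiry timestamp can therefore still see it, which matches the race described in the message.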
Binbin	7fdcada67b	Fix unused variable warning in subcommand.c (#10184 ) Forgot to handle it in #10135.	2022-01-26 10:21:51 +02:00
Madelyn Olson	f6b76e50ad	Change expression to look for at least one limit exceeded (#10173 ) This is an attempt to fix some of the issues with the cluster mode tests we are seeing in the daily run. The test incrementally adds a bunch of publish messages, expecting that eventually one of them will overflow. The test stops one of the processes, so it expects that just that one Redis node will overflow. I think the test is flaky because under certain circumstances multiple links are getting disconnected, not just the one that is stalled.	2022-01-26 09:59:53 +02:00
Meir Shpilraien (Spielrein)	5a38ccc253	Added engine stats to FUNCTION STATS command. (#10179 ) Added the following statistics (per engine) to FUNCTION STATS command: * number of functions * number of libraries Output example: ``` > FUNCTION stats 1) "running_script" 2) (nil) 3) "engines" 4) 1) "LUA" 2) 1) "libraries_count" 2) (integer) 1 3) "functions_count" 4) (integer) 1 ``` To collect the stats, added a new dictionary to libraries_ctx that contains for each engine, the engine statistics representing the current libraries_ctx. Update the stats on: 1. Link library to libraries_ctx 2. Unlink library from libraries_ctx 3. Flushing libraries_ctx	2022-01-25 15:50:14 +02:00
Madelyn Olson	823da54361	Improve testing and update flags around commands without ACL keyspec flags (#10167 ) This PR aims to improve the flags associated with some commands and adds various tests around these cases. Specifically, it's concerned with commands which declare keys but have no ACL flags (think `EXISTS`); the user needs either read or write permission to access this type of key. This change is primarily concerned around commands in three categories: # General keyspace commands These commands are agnostic to the underlying data outside of side channel attacks, so they are not marked as ACCESS. * TOUCH * EXISTS * TYPE * OBJECT 'all subcommands' Note that TOUCH is not a write command, it could be a side effect of either a read or a write command. # Length and cardinality commands These commands are NOT marked as ACCESS since they don't return actual user strings, just metadata. * LLEN * STRLEN * SCARD * HSTRLEN # Container has member commands These commands return information about the existence or metadata about the key. These commands are NOT marked as ACCESS since the check of membership is used widely in write commands e.g. the response of HSET. * SISMEMBER * HEXISTS # Intersection cardinality commands These commands are marked as ACCESS since they process data to compute the result. * PFCOUNT * ZCOUNT * ZINTERCARD * SINTERCARD	2022-01-25 09:55:30 +02:00
Madelyn Olson	c275010fff	Correctly handle minimum arity checks in scripts (#10171 ) Correctly handle variable arity checks in scripts	2022-01-24 22:08:57 -08:00
Oran Agra	0343fe9fa5	re-fix EVAL timeout test (#10169 ) The edit i made to #10098 before merging was broken. (INFO is forbidden in LOADING)	2022-01-24 23:01:08 +02:00
chenyang8094	fa60049648	Fix EVAL timeout test failed on freebsd (#10098 ) * Refactor EVAL timeout test * since the test used r config set appendonly yes which generates a rewrite, it missed its purpose * Fix the bug that start_server returns before redis is ready, which matters when multiple tests share the same dir. * Elapsed time tracking no longer needed Co-authored-by: Oran Agra <oran@redislabs.com>	2022-01-24 22:31:35 +02:00
yoav-steinberg	7eadc5ee70	Support function flags in script EVAL via shebang header (#10126 ) In #10025 we added a mechanism for flagging certain properties for Redis Functions. This led us to think we'd like to "port" this mechanism to Redis Scripts (`EVAL`) as well. One good reason for this, other than the added functionality, is that it addresses the poor behavior we currently have in `EVAL` in case the script performs a (non DENY_OOM) write operation during OOM state. See #8478 (And a previous attempt to handle it via #10093) for details. Note that in Redis Functions all write operations (including DEL) will return an error during OOM state unless the function is flagged as `allow-oom` in which case no OOM checking is performed at all. This PR: - Enables setting `EVAL` (and `SCRIPT LOAD`) script flags as defined in #10025. - Provides a syntactical framework via [shebang](https://en.wikipedia.org/wiki/Shebang_(Unix)) for additional script annotations and even engine selection (instead of just lua) for scripts. - Provides backwards compatibility so scripts without the new annotations will behave as they did before. - Appropriate tests. - Changes `EVAL[SHA]/_RO` to be flagged as `STALE` commands. This makes it possible to flag individual scripts as `allow-stale` or not flag them as such. In backwards compatibility mode these commands will return the `MASTERDOWN` error as before. - Changes `SCRIPT LOAD` to be flagged as a `STALE` command. This is mainly to make it logically compatible with the change to `EVAL` in the previous point. It enables loading a script on a stale server, which is technically okay since it doesn't relate directly to the server's dataset. Running the script does, but that won't work unless the script is explicitly marked as `allow-stale`. Note that even though the LUA syntax doesn't support hash tag comments, `.lua` files do support a shebang tag on the top so they can be executed on Unix systems like any shell script. 
LUA's `luaL_loadfile` handles this as part of the LUA library. In the case of `luaL_loadbuffer`, which is what Redis uses, I needed to fix the input script in case of a shebang manually. I did this the same way `luaL_loadfile` does, by replacing the first line with a single line feed character.	2022-01-24 16:50:02 +02:00
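The first-line replacement described above can be sketched as follows. This is only an illustrative Python model of the idea (replace the shebang line with a single line feed so that line numbers in later error messages stay the same); `blank_shebang_line` is a hypothetical helper name, and the real handling lives in the server's C code.

```python
def blank_shebang_line(script: str) -> str:
    """Replace a leading shebang line with a single line feed.

    Hypothetical helper: scripts without a shebang are returned unchanged.
    """
    if not script.startswith("#!"):
        return script
    _first_line, newline, rest = script.partition("\n")
    # Keep one "\n" in place of the first line so later lines keep
    # their original line numbers.
    return "\n" + rest if newline else "\n"


source = "#!lua\nreturn 1"
cleaned = blank_shebang_line(source)
# The body still sits on line 2, as it did in the original script.
print(cleaned.split("\n").index("return 1") + 1)
```

Keeping the line feed rather than deleting the line outright is what preserves line numbering for anything reported against the script.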
Viktor Söderqvist	857dc5bacd	Disable keyspec module API in 7.0 RC1 (#10135 ) The keyspec API is not yet released and there is a plan to change it in #10108, which is going to be included in RC2. Therefore, we hide it in RC1 to avoid introducing a breaking change in RC2. Co-authored-by: Oran Agra <oran@redislabs.com>	2022-01-24 15:02:07 +02:00
chenyang8094	6dc3f09cb9	Fix AOFRW limit test occasional failures on slower machines (#10164 )	2022-01-24 14:55:24 +02:00
ny0312	b40a9ba5fd	Fix flaky cluster tests in 24-links.tcl (#10157 ) * Fix flaky cluster test "Disconnect link when send buffer limit reached" * Fix flaky cluster test "Each node has two links with each peer" Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>	2022-01-23 17:28:32 -08:00
Binbin	7e5ded2ad0	Fix timing issue in sentinel CKQUORUM test (#10036 ) A test failure was reported in Daily CI (test-centos7-tls). `CKQUORUM detects failover authorization cannot be reached`. ``` CKQUORUM detects failover authorization cannot be reached: FAILED: Expected 'invalid command name "OK 4 usable Sentinels. Quorum and failover authorization can be reached"' to match 'NOQUORUM' ``` It seems that the current sentinel does not confirm that the other sentinels are actually `down` before checking the quorum. It takes at least 3 seconds on my machine, and we can see there will be a timing issue with the hard-coded `after 5000`. In this commit, we check the response of `SENTINEL SENTINELS mymaster` to ensure that other sentinels are actually `down` in the view of the current sentinel. This solves the timing issue caused by the sentinel monitor mechanism.	2022-01-23 13:54:50 +02:00
Binbin	23325c135f	sub-command support for ACL CAT and COMMAND LIST. redisCommand always stores fullname (#10127 ) Summary of changes: 1. Rename `redisCommand->name` to `redisCommand->declared_name`, it is a const char * for native commands and SDS for module commands. 2. Store the [sub]command fullname in `redisCommand->fullname` (sds). 3. List subcommands in `ACL CAT` 4. List subcommands in `COMMAND LIST` 5. `moduleUnregisterCommands` now will also free the module subcommands. 6. RM_GetCurrentCommandName returns full command name Other changes: 1. Add `addReplyErrorArity` and `addReplyErrorExpireTime` 2. Remove `getFullCommandName` function that now is useless. 3. Some cleanups about `fullname` since now it is SDS. 4. Delete `populateSingleCommand` function from server.h that is useless. 5. Added tests to cover this change. 6. Add some module unload tests and fix the leaks 7. Make error messages uniform, make sure they always contain the full command name and that it's quoted. 8. Fixes some typos see the history in #9504, fixes #10124 Co-authored-by: Oran Agra <oran@redislabs.com> Co-authored-by: guybe7 <guy.benoish@redislabs.com>	2022-01-23 10:05:06 +02:00
guybe7	a6fd2a46d1	Improved handling of subcommands (don't allow ACL on first-arg of a sub-command) (#10147 ) Recently we added extensive support for sub-commands in redis 7.0, this meant that the old ACL mechanism for sub-commands wasn't needed, or actually was improved (to handle both include and exclude control, like for commands), but only for real sub-commands. The old mechanism in ACL was renamed to first-arg, and was able to match the first argument of any command (including sub-commands). We now realized that we might want to completely delete that first-arg feature some day, so the first step was not to give it new capabilities in 7.0 that it didn't have before. Changes: 1. ACL: Block the first-arg mechanism on subcommands (we keep it in non-subcommands for backward compatibility) 2. COMMAND: When looking up a command, insist the command name doesn't contain extra words. Example: When a user issues `GET key` we want `lookupCommand` to return `getCommand` but if COMMAND calls `lookupCommand` with `get|key` we want it to fail. Other changes: 1. ACLSetUser: prevent a redundant command lookup	2022-01-22 14:09:40 +02:00
Madelyn Olson	55c81f2cd3	ACL V2 - Selectors and key based permissions (#9974 ) * Implemented selectors which provide multiple different sets of permissions to users * Implemented key based permissions * Added a new ACL dry-run command to test permissions before execution * Updated module APIs to support checking key based permissions Co-authored-by: Oran Agra <oran@redislabs.com>	2022-01-20 13:05:27 -08:00
perryitay	c4b788230c	Adding module api for processing commands during busy jobs and allow flagging the commands that should be handled at this status (#9963 ) Some modules might perform a long-running logic in different stages of Redis lifetime, for example: * command execution * RDB loading * thread safe context During this long-running logic Redis is not responsive. This PR offers 1. An API to process events while a busy command is running (`RM_Yield`) 2. A new flag (`ALLOW_BUSY`) to mark the commands that should be handled during busy jobs which can also be used by modules (`allow-busy`) 3. In slow commands and thread safe contexts, this flag will start rejecting commands with -BUSY only after `busy-reply-threshold` 4. During loading (`rdb_load` callback), it'll process events right away (not wait for `busy-reply-threshold`), but either way, the processing is throttled to the server hz rate. 5. Allow modules to Yield to redis background tasks, but not to client commands * rename `script-time-limit` to `busy-reply-threshold` (an alias to the pre-7.0 `lua-time-limit`) Co-authored-by: Oran Agra <oran@redislabs.com>	2022-01-20 09:05:53 +02:00
Meir Shpilraien (Spielrein)	c556c57e5e	Added AOF rewrite support for functions. (#10141 ) The Functions PR was merged without AOF rewrite support because we thought this feature was going to be removed in Redis 7. Tests were added in aofrw.tcl. Other existing aofrw tests were slow due to an unwanted rdb-key-save-delay. Co-authored-by: Oran Agra <oran@redislabs.com>	2022-01-19 21:21:42 +02:00
Wen Hui	68a8d0b46d	Add sentinel config set test case (#10114 )	2022-01-19 11:57:51 +02:00
Ozan Tezcan	72e1b5de4d	Fix replica count check in migration tests. (#10140 ) Tests were not using loop index as node id, checking replica count of the same node over and over.	2022-01-19 11:36:24 +02:00
Ozan Tezcan	1af0a2c5ae	Fix eventloop module test for valgrind (#10139 ) was eating too much memory, and taking too long with valgrind	2022-01-19 09:13:51 +02:00
Oran Agra	eef9c6b0ee	New detailed key-spec flags (RO, RW, OW, RM, ACCESS, UPDATE, INSERT, DELETE) (#10122 ) The new ACL key based permissions in #9974 require the key-specs (#8324) to have more explicit flags rather than just READ and WRITE. See discussion in #10040 This PR defines two groups of flags: One about how redis internally handles the key (mutually-exclusive). The other is about the logical operation done from the user's point of view (3 mutually exclusive write flags, and one read flag, all optional). In both groups, if we can't explicitly flag something as explicit read-only, delete-only, or insert-only, we flag it as `RW` or `UPDATE`. here's the definition from the code: ``` /* Key-spec flags * * -------------- */ /* The following refer what the command actually does with the value or metadata * of the key, and not necessarily the user data or how it affects it. * Each key-spec may must have exaclty one of these. Any operation that's not * distinctly deletion, overwrite or read-only would be marked as RW. */ #define CMD_KEY_RO (1ULL<<0) /* Read-Only - Reads the value of the key, but * doesn't necessarily returns it. */ #define CMD_KEY_RW (1ULL<<1) /* Read-Write - Modifies the data stored in the * value of the key or its metadata. */ #define CMD_KEY_OW (1ULL<<2) /* Overwrite - Overwrites the data stored in * the value of the key. */ #define CMD_KEY_RM (1ULL<<3) /* Deletes the key. */ /* The follwing refer to user data inside the value of the key, not the metadata * like LRU, type, cardinality. It refers to the logical operation on the user's * data (actual input strings / TTL), being used / returned / copied / changed, * It doesn't refer to modification or returning of metadata (like type, count, * presence of data). Any write that's not INSERT or DELETE, would be an UPADTE. * Each key-spec may have one of the writes with or without access, or none: */ #define CMD_KEY_ACCESS (1ULL<<4) /* Returns, copies or uses the user data from * the value of the key. 
*/ #define CMD_KEY_UPDATE (1ULL<<5) /* Updates data to the value, new value may * depend on the old value. */ #define CMD_KEY_INSERT (1ULL<<6) /* Adds data to the value with no chance of, * modification or deletion of existing data. */ #define CMD_KEY_DELETE (1ULL<<7) /* Explicitly deletes some content * from the value of the key. */ ``` Unrelated changes: - generate-command-code.py is only compatible with python3 (modified the shebang) - generate-command-code.py print file on json parsing error - rename `shard_channel` key-spec flag to just `channel`. - add INCOMPLETE flag in input spec of SORT and SORT_RO	2022-01-18 16:00:00 +02:00
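The two flag groups described above can be modelled with a short sketch. The bit values mirror the `#define`s quoted in the commit message; the function `check_key_spec_flags` and the two mask names are hypothetical, written here only to illustrate the stated rules (exactly one flag from the first group, at most one write flag from the second, with ACCESS optional). It is not the validation code Redis uses.

```python
# Bit positions copied from the quoted definitions.
CMD_KEY_RO = 1 << 0
CMD_KEY_RW = 1 << 1
CMD_KEY_OW = 1 << 2
CMD_KEY_RM = 1 << 3
CMD_KEY_ACCESS = 1 << 4
CMD_KEY_UPDATE = 1 << 5
CMD_KEY_INSERT = 1 << 6
CMD_KEY_DELETE = 1 << 7

# Hypothetical helper masks for the two groups.
HANDLING_MASK = CMD_KEY_RO | CMD_KEY_RW | CMD_KEY_OW | CMD_KEY_RM
WRITE_MASK = CMD_KEY_UPDATE | CMD_KEY_INSERT | CMD_KEY_DELETE


def check_key_spec_flags(flags: int) -> bool:
    """Return True if `flags` satisfies both mutual-exclusion rules."""
    handling = flags & HANDLING_MASK
    writes = flags & WRITE_MASK
    # Exactly one bit set: non-zero and a power of two.
    exactly_one_handling = handling != 0 and (handling & (handling - 1)) == 0
    # Zero or one bit set.
    at_most_one_write = (writes & (writes - 1)) == 0
    return exactly_one_handling and at_most_one_write


print(check_key_spec_flags(CMD_KEY_RO | CMD_KEY_ACCESS))
print(check_key_spec_flags(CMD_KEY_RO | CMD_KEY_RW))
```

The first combination passes both rules, while the second sets two flags from the mutually exclusive first group and is rejected.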
Wang Yuan	d697daa7a5	Use const char pointer in redismodule.h as far as possible (#10064 ) When I used C++ to develop a redis module. i used `string.data()` as the second parameter `ele` of `RedisModule_DigestAddStringBuffer`, but there is a warning, since we never change the `ele`, i think we should use `const char` for it. This PR adds const to just a handful of module APIs that required it, all not very widely used. The implication is a breaking change in terms of compilation error that's easy to resolve, and no ABI impact. The affected APIs are around Digest, Info injection, and Cluster bus messages.	2022-01-18 15:55:20 +02:00
Ozan Tezcan	99ab4236af	Add event loop support to the module API (#10001 ) Modules can now register sockets/pipe to the Redis main thread event loop and do network operations asynchronously. Previously, modules had to maintain an event loop and another thread for asynchronous network operations. Also, if a module is calling API functions after doing some network operations, it had to synchronize its event loop thread's access with Redis main thread by locking the GIL, causing contention on the lock. After this commit, no synchronization is needed as module can operate in Redis main thread context. So, this commit may improve the performance for some use cases. Added three functions to the module API: * RedisModule_EventLoopAdd(int fd, int mask, RedisModuleEventLoopFunc func, void *user_data) * RedisModule_EventLoopDel(int fd, int mask) * RedisModule_EventLoopAddOneShot(RedisModuleEventLoopOneShotFunc func, void *user_data) - This function can be called from other threads to trigger callback on Redis main thread. Callback will be triggered only once. If Redis main thread is sleeping, this call will wake up the Redis main thread. Event loop callbacks are called by Redis main thread after locking the GIL. Inside callbacks, modules can operate as if they are holding the GIL. Added REDISMODULE_EVENT_EVENTLOOP event with two subevents: * REDISMODULE_SUBEVENT_EVENTLOOP_BEFORE_SLEEP * REDISMODULE_SUBEVENT_EVENTLOOP_AFTER_SLEEP These events are for modules that want to participate in the before and after sleep action. e.g. it might be useful to implement batching: read data from the network, write all to a file in one go on the BEFORE_SLEEP event.	2022-01-18 13:10:07 +02:00
Yossi Gottlieb	25e6d4d459	Fix additional AOF filename issues. (#10110 ) This extends the previous fix (#10049) to address any form of non-printable or whitespace character (including newlines, quotes, non-printables, etc.) Also, removes the limitation on appenddirname, to align with the way filenames are handled elsewhere in Redis.	2022-01-18 12:52:27 +02:00
Meir Shpilraien (Spielrein)	51f9bed3dd	Fix `FUNCTION LOAD` ignores unknown parameter. (#10131 ) Following discussion on: https://github.com/redis/redis/issues/9899#issuecomment-1014689385 Raise an error if an unknown parameter is given to `FUNCTION LOAD`. Before the fix: ``` 127.0.0.1:6379> function load LUA lib2 foo bar "local function test1() return 5 end redis.register_function('test1', test1)" OK ``` After the fix: ``` 127.0.0.1:6379> function load LUA lib2 foo bar "local function test1() return 5 end redis.register_function('test1', test1)" (error) ERR Unkowns option given: foo ```	2022-01-18 10:29:52 +02:00
Oran Agra	ae89958972	Set repl-diskless-sync to yes by default, add repl-diskless-sync-max-replicas (#10092 ) 1. enable diskless replication by default 2. add a new config named repl-diskless-sync-max-replicas that enables replication to start before the full repl-diskless-sync-delay was reached. 3. put replica online sooner on the master (see below) 4. test suite uses repl-diskless-sync-delay of 0 to be faster 5. a few tests that use multiple replicas on a pre-populated master are now using the new repl-diskless-sync-max-replicas 6. fix possible timing issues in a few cluster tests (see below) put replica online sooner on the master ---------------------------------------------------- there were two tests that failed because they needed the master to realize that the replica is online, but the test code was actually only waiting for the replica to realize it's online, and in diskless it could have been before the master realized it. changes include two things: 1. the tests wait on the right thing 2. issues in the master, putting the replica online in two steps. the master used to put the replica online in 2 steps. the first step was to mark it as online, and the second step was to enable the write event (only after getting ACK), but in fact the first step didn't contain some of the tasks to put it online (like updating good slave count, and sending the module event). this meant that if a test was waiting to see that the replica is online from the point of view of the master, and then confirm that the module got an event, or that the master has enough good replicas, it could fail due to timing issues. so now the full effect of putting the replica online happens at once, and only the part about enabling the writes is delayed till the ACK. fix cluster tests -------------------- I added some code to wait for the replica to sync and avoid race conditions. later realized the sentinel and cluster tests were using the original 5 seconds delay, so changed it to 0. 
this means the other changes are probably not needed, but i suppose they're still better (avoid race conditions)	2022-01-17 14:11:11 +02:00
zhaozhao.zz	90916f16a5	show subcommands latencystats (#10103 ) since `info commandstats` already shows sub-commands, we should do the same in `info latencystats`. similarly, the LATENCY HISTOGRAM command now shows sub-commands (with their full name) when: * asking for all commands * asking for a specific container command * asking for a specific sub-command Co-authored-by: Oran Agra <oran@redislabs.com>	2022-01-17 12:32:32 +02:00
Binbin	26ef5132a6	Fix timing issue in PSETEX/PEXPIRE sub-second expire tests (#10121 ) These two tests have a high probability of failure on MacOS, or take many retries to succeed. Keys often expire before we can access them. So this time we try to avoid this by reducing the time of the first `after`, or removing the first `after`. The results of doing `20/81` and `0/101` are: - PEXPIRE (20/81): 1069/1949 - PEXPIREAT (20/81): 1093/1949 - PEXPIRE (0/101): 31936 / 31936 - PEXPIREAT (0/101): 31936 / 31936 The first number is the number of times that the test succeeded without any retries. The second number is the total number of executions. And we can see that `0/101` doesn't even need any extra retries. It also reduces the time required for testing. So in the end we chose `0/100`, i.e. remove the first `after`. As for `PEXPIREAT`, there is no failure, but we still changed it together, using `0/201`, after 2W tests, none of them failed.	2022-01-17 10:42:13 +02:00
sundb	32e7b46a17	Fix quicklist node not being recompressed correctly after inserting a new node before or after it (#10120 ) ### Describe Fix crash found by CI, Introduced by #9849. When we do any operation on the quicklist, we should make sure that all nodes of the quicklist should not be in the recompressed state. ### Issues This PR fixes two issues with incorrect recompression. 1. The current quicklist node is full and the previous node isn't full, the current node is not recompressed correctly after inserting elements into the previous node. 2. The current quicklist node is full and the next node isn't full, the current node is not recompressed correctly after inserting elements into the next node. ### Test Add two tests to cover incorrect compression issues. ### Other Fix unittest test failure caused by assertion introduced by #9849.	2022-01-16 08:54:40 +02:00
Binbin	440d28091b	Fix function no-cluster flag test (#10115 ) Fixes cluster test introduced in #10066. ``` Function no-cluster flag: ERR Error registering functions: @user_function: 1: wrong number of arguments to redis.register_function ```	2022-01-15 09:13:53 +02:00
Meir Shpilraien (Spielrein)	4db4b43417	Function Flags support (no-writes, no-cluster, allow-state, allow-oom) (#10066 ) # Redis Functions Flags Following the discussion on #10025 Added Functions Flags support. The PR is divided into 2 sections: * Add named argument support to `redis.register_function` API. * Add support for function flags ## `redis.register_function` named argument support The first part of the PR adds support for named arguments on `redis.register_function`, example: ``` redis.register_function{ function_name='f1', callback=function() return 'hello' end, description='some desc' } ``` Positional arguments are also kept, which means that it is still possible to write: ``` redis.register_function('f1', function() return 'hello' end) ``` But notice that it is no longer possible to pass the optional description argument in the positional argument version. The positional version was changed to allow passing only the mandatory arguments (function name and callback). To pass more arguments the user must use the named argument version. As with positional arguments, `function_name` and `callback` are mandatory and an error will be raised if those are missing. Also, an error will be raised if an unknown argument name is given or an argument type is wrong. Tests were added to verify the new syntax. ## Functions Flags The second part of the PR is adding functions flags support. 
Flags are given to Redis when the engine calls `functionLibCreateFunction`. Supported flags are: * `no-writes` - indicates that the function performs no writes, which means that it is OK to run it on: * read-only replica * Using FCALL_RO * If disk error detected It will not be possible to run a function in those situations unless the function turns on the `no-writes` flag * `allow-oom` - indicates that it's OK to run the function even if Redis is in OOM state. If the function does not turn on this flag it will not be possible to run it if OOM is reached (even if the function declares `no-writes` and even if `fcall_ro` is used). If this flag is set, any command will be allowed on OOM (even those that are marked with CMD_DENYOOM). The assumption is that this flag is for advanced users who know its meaning and understand what they are doing, and Redis trusts them not to increase the memory usage. (e.g. it could be an INCR or a modification on an existing key, or a DEL command) * `allow-stale` - indicates that it's OK to run the function on a stale replica, in this case we will also make sure the function only performs `stale` commands and raise an error if not. * `no-cluster` - indicates that running the function is disallowed if cluster is enabled. Default behaviour of functions (if no flags are given): 1. Allow functions to read and write 2. Do not run functions on OOM 3. Do not run functions on stale replica 4. Allow functions on cluster ### Lua API for functions flags On the Lua engine, it is possible to give functions flags as the `flags` named argument: ``` redis.register_function{function_name='f1', callback=function() return 1 end, flags={'no-writes', 'allow-oom'}, description='description'} ``` The function flags argument must be a Lua table that contains all the requested flags. The following will result in an error: * Unknown flag * Wrong flag type Default behaviour is the same as if no flags are used. 
Tests were added to verify all flags functionality ## Additional changes * mark FCALL and FCALL_RO with CMD_STALE flag (unlike EVAL), so that they can run if the function was registered with the `allow-stale` flag. * Verify `CMD_STALE` on `scriptCall` (`redis.call`), so it will not be possible to call commands from script while stale unless the command is marked with the `CMD_STALE` flags. so that even if the function is allowed while stale we do not allow it to bypass the `CMD_STALE` flag of commands. * Flags section was added to `FUNCTION LIST` command to provide the set of flags for each function: ``` > FUNCTION list withcode 1) 1) "library_name" 2) "test" 3) "engine" 4) "LUA" 5) "description" 6) (nil) 7) "functions" 8) 1) 1) "name" 2) "f1" 3) "description" 4) (nil) 5) "flags" 6) (empty array) 9) "library_code" 10) "redis.register_function{function_name='f1', callback=function() return 1 end}" ``` * Added API to get Redis version from within a script, The redis version can be provided using: 1. `redis.REDIS_VERSION` - string representation of the redis version in the format of MAJOR.MINOR.PATCH 2. `redis.REDIS_VERSION_NUM` - number representation of the redis version in the format of `0x00MMmmpp` (`MM` - major, `mm` - minor, `pp` - patch). The number version can be used to check if a version is greater or less than another version. The string version can be used to return to the user or print as logs. This new API is provided to eval scripts and functions, it is also possible to use this API during the functions loading phase.	2022-01-14 14:02:02 +02:00
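The `0x00MMmmpp` layout described for `redis.REDIS_VERSION_NUM` packs one byte each for major, minor and patch, which is why plain numeric comparison orders versions correctly. The sketch below models that packing in Python; `encode_version` and `decode_version` are hypothetical helper names used only for illustration, and the version triples are example inputs.

```python
def encode_version(major: int, minor: int, patch: int) -> int:
    """Pack a version as 0x00MMmmpp (one byte per component)."""
    return (major << 16) | (minor << 8) | patch


def decode_version(num: int) -> tuple:
    """Unpack a 0x00MMmmpp number back into (major, minor, patch)."""
    return (num >> 16) & 0xFF, (num >> 8) & 0xFF, num & 0xFF


v7 = encode_version(7, 0, 0)
v62 = encode_version(6, 2, 6)
print(hex(v7))
print(v62 < v7)
print(decode_version(v7))
```

Because the major component occupies the most significant byte, comparing two packed numbers gives the same answer as comparing the (major, minor, patch) triples lexicographically.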
Binbin	56a802057e	Fix kill aof rewrite child test (#10107 ) The dbs doesn't have any keys, `rdb-key-save-delay` config has no effect that cause the rewrite to complete. It was introduced in #10015.	2022-01-13 12:38:41 +02:00
Ozan Tezcan	f41cc87088	Added RM_MonotonicMicroseconds() API to provide monotonic time function (#10101 ) Added RM_MonotonicMicroseconds(). Modules can use monotonic timestamp counter for measurements.	2022-01-13 11:36:03 +02:00
chenyang8094	e9bff7978a	Always create base AOF file when redis start from empty. (#10102 ) Force create a BASE file (use a foreground `rewriteAppendOnlyFile`) when redis starts from an empty data set and `appendonly` is yes. The reasoning is that normally, after redis is running for some time, and the AOF has gone through a few rewrites, there's always a base rdb file. and the scenario where the base file is missing, is kinda rare (happens only at empty startup), so this change normalizes it. But more importantly, there are or could be some complex modules that are started with some configuration, when they create persistence they write that configuration to RDB AUX fields, so that they can always know with which configuration the persistence file they're loading was created (could be critical). there is (was) one scenario in which they could load their persisted data, and that configuration was missing, and this change fixes it. Add a new module event: REDISMODULE_SUBEVENT_PERSISTENCE_SYNC_AOF_START, similar to REDISMODULE_SUBEVENT_PERSISTENCE_AOF_START which is async. Co-authored-by: Oran Agra <oran@redislabs.com>	2022-01-13 08:49:26 +02:00
Binbin	20c33fe6a8	Show subcommand full name in error log / ACL LOG (#10105 ) Use `getFullCommandName` to get the full name of the command. It can also get the full name of the subcommand, like "script|help". Before: ``` > SCRIPT HELP (error) NOPERM this user has no permissions to run the 'help' command or its subcommand > ACL LOG 7) "object" 8) "help" ``` After: ``` > SCRIPT HELP (error) NOPERM this user has no permissions to run the 'script|help' command > ACL LOG 7) "object" 8) "script|help" ``` Fix #10094	2022-01-12 20:05:14 +02:00
Binbin	e22146b07a	Add script tests to cover keys with expiration time set (#10096 ) This commit adds tests in which script calls access keys that have an expiration time set. There was no test case for this part before. See #10080 There is also a test that covers #1525: we block the time so that the key can not expire in the middle of the script execution. Other changes: 1. Delete `evalTimeSnapshot` and just use `scriptTimeSnapshot` in its place. 2. Some cleanups to scripting.tcl. 3. better names for tests that run in a loop to make them distinguishable	2022-01-11 22:43:18 +02:00
Oran Agra	3204a03574	Move doc metadata from COMMAND to COMMAND DOCS (#10056 ) Syntax: `COMMAND DOCS [<command name> ...]` Background: Apparently old versions of hiredis (and thus also redis-cli) can't support more than 7 levels of multi-bulk nesting. The solution is to move all the doc related metadata from COMMAND to a new COMMAND DOCS sub-command. The new DOCS sub-command returns a map of commands (not an array like in COMMAND), And the same goes for the `subcommands` field inside it (also contains a map) Besides that, the remaining new fields of COMMAND (hints, key-specs, and sub-commands), are placed in the outer array rather than a nested map. this was done mainly for consistency with the old format. Other changes: --- * Allow COMMAND INFO with no arguments, which returns all commands, so that we can some day deprecate the plain COMMAND (no args) * Reduce the amount of deferred replies from both COMMAND and COMMAND DOCS, especially in the inner loops, since these create many small reply objects, which lead to many small write syscalls and many small TCP packets. To make this easier, when populating the command table, we count the history, args, and hints so we later know their size in advance. Additionally, the movablekeys flag was moved into the flags register. * Update generate-commands-json.py to take the data from both commands, it now executes redis-cli directly, instead of taking input from stdin. * Sub-commands in both COMMAND (and COMMAND INFO), and also COMMAND DOCS, show their full name. i.e. CONFIG GET will be shown as `config|get` rather than just `get`. This will be visible both when asking for `COMMAND INFO config` and `COMMAND INFO config|get`, but is especially important for the latter. i.e. imagine someone doing `COMMAND INFO slowlog|get config|get` not being able to distinguish between the two items in the array response.	2022-01-11 17:16:16 +02:00
Binbin	39feee8e3a	LPOP/RPOP with count against non existing list return null array (#10095 ) It used to return `$-1` in RESP2, now we will return `*-1`. This is a bug in redis 6.2 when COUNT was added, the `COUNT` option was introduced in #8179. Fix #10089. the documentation of [LPOP](https://redis.io/commands/lpop) says ``` When called without the count argument: Bulk string reply: the value of the first element, or nil when key does not exist. When called with the count argument: Array reply: list of popped elements, or nil when key does not exist. ```	2022-01-11 14:26:13 +02:00
Madelyn Olson	d0949b7c5c	Fix timing issue with cluster hostname test (#10086 )	2022-01-10 16:21:05 -08:00
chenyang8094	bd46a2abf4	Support whitespace characters in appendfilename, and ban them in appenddirname (#10049 ) 1. Ban whitespace characters in `appenddirname` 2. Handle the case where `appendfilename` contains spaces (for backwards compatibility)	2022-01-10 09:09:39 +02:00
Madelyn Olson	e8e02f900c	Changed latency histogram output to omit trailing 0s and periods (#10075 ) Changed latency percentile output to omit trailing 0s and periods	2022-01-09 17:04:18 -08:00
Binbin	a84c964d37	Fix crash when error [sub]command name contains \| (#10082 ) The following error commands will crash redis-server: ``` > get\| Error: Server closed the connection > get\|set Error: Server closed the connection > get\|other ``` The reason is that in #9504 we use `lookupCommandBySds` to find the container command, and it splits the command name (argv[0]) on `\|`. If we input something like `get\|other`, after the split, `get` will become a valid command name, pass the `ERR unknown command` check, and finally crash in `addReplySubcommandSyntaxError`. In this case we do not need to split the command name on `\|`; we just look in the commands dict to find whether `argv[0]` is a container command. So this commit introduces a new function called `isContainerCommandBySds` that returns true if a command name is a container command. Also, with the old code, there is an incorrect error message: ``` > config\|get set (error) ERR Unknown subcommand or wrong number of arguments for 'set'. Try CONFIG\|GET HELP. ``` The crash was reported in #10070.	2022-01-09 13:06:51 +02:00
guybe7	7cd6a64d2f	lpGetInteger returns int64_t, avoid overflow (#10068 ) Fix #9410 Crucial for the ms and sequence deltas, but I changed all calls, just in case (e.g. "flags") Before this commit: `ms_delta` and `seq_delta` could have overflown, causing `currid` to be wrong, which in turn would cause `streamTrim` to trim the entire rax node (see new test)	2022-01-07 15:31:05 +02:00
Meir Shpilraien (Spielrein)	885f6b5ceb	Redis Function Libraries (#10004 ) # Redis Function Libraries This PR implements Redis Functions Libraries as described in: https://github.com/redis/redis/issues/9906. The purpose of libraries is to provide better code sharing between functions by allowing multiple functions to be created in a single command. Functions that were created together can safely share code with each other without worrying about compatibility issues and versioning. Creating a new library is done using the 'FUNCTION LOAD' command (the full API is described below). This PR introduces a new struct called libraryInfo, which holds information about a library: * name - name of the library * engine - engine used to create the library * code - library code * description - library description * functions - the functions exposed by the library When Redis gets the `FUNCTION LOAD` command it creates a new empty libraryInfo. Redis passes the `CODE` to the relevant engine alongside the empty libraryInfo. As a result, the engine will create one or more functions by calling 'libraryCreateFunction'. The new function will be added to the newly created libraryInfo. So far everything is happening locally on the libraryInfo, so it is easy to abort the operation (in case of an error) by simply freeing the libraryInfo. After the library info is fully constructed we start the joining phase, in which we join the new library to the other libraries that currently exist on Redis. The joining phase makes sure there is no function collision and adds the library to the librariesCtx (renamed from functionCtx). LibrariesCtx is used all around the code in the exact same way as functionCtx was used (with respect to RDB loading, replication, ...). The only difference is that apart from the function dictionary (which maps function name to functionInfo object), the librariesCtx also contains a libraries dictionary that maps library name to libraryInfo object. 
## New API ### FUNCTION LOAD `FUNCTION LOAD <ENGINE> <LIBRARY NAME> [REPLACE] [DESCRIPTION <DESCRIPTION>] <CODE>` Create a new library with the given parameters: * ENGINE - Engine name to use to create the library. * LIBRARY NAME - The new library name. * REPLACE - If the library already exists, replace it. * DESCRIPTION - Library description. * CODE - Library code. Return "OK" on success, or an error in the following cases: * Library name already taken and REPLACE was not used * Name collision with another existing library (even if REPLACE was used) * Library registration failed by the engine (usually a compilation error) ## Changed API ### FUNCTION LIST `FUNCTION LIST [LIBRARYNAME <LIBRARY NAME PATTERN>] [WITHCODE]` The command was modified to also allow getting the libraries' code (so the `FUNCTION INFO` command is no longer needed and was removed). In addition, the command gets an optional argument, `LIBRARYNAME`, which allows you to get only the libraries that match the given `LIBRARYNAME` pattern. By default, it returns all libraries. ### INFO MEMORY Added number of libraries to `INFO MEMORY` ### Commands flags The `DENYOOM` flag was set on `FUNCTION LOAD` and `FUNCTION RESTORE`. We consider those commands as commands that add new data to the dataset (functions are data), and so we want to disallow running those commands on OOM. ## Removed API * FUNCTION CREATE - Decided on https://github.com/redis/redis/issues/9906 * FUNCTION INFO - Decided on https://github.com/redis/redis/issues/9899 ## Lua engine changes When the Lua engine gets the code given on the `FUNCTION LOAD` command, it immediately runs it; we call this run the loading run. The loading run is not a usual script run: it is not possible to invoke any Redis command from within the load run. Instead there is a new API provided by the `library` object. 
The new APIs: * `redis.log` - behaves the same as `redis.log` * `redis.register_function` - register a new function to the library The purpose of the loading run is to register functions using the new `redis.register_function` API. Any attempt to use any other API will result in an error. In addition, the load run has a time limit of 500ms; an error is raised on timeout and the entire operation is aborted. ### `redis.register_function` `redis.register_function(<function_name>, <callback>, [<description>])` This new API allows users to register a new function that will be linked to the newly created library. This API can only be called during the load run (see definition above). Any attempt to use it outside of the load run will result in an error. The parameters passed to the API are: * function_name - Function name (must be a Lua string) * callback - Lua function object that will be called when the function is invoked using fcall/fcall_ro * description - Function description, optional (must be a Lua string). ### Example The following example creates a library called `lib` with 2 functions, `f1` and `f2`, which return 1 and 2 respectively: ``` local function f1(keys, args) return 1 end local function f2(keys, args) return 2 end redis.register_function('f1', f1) redis.register_function('f2', f2) ``` Notice: Unlike `eval`, functions inside a library get the KEYS and ARGV as arguments to the functions and not as globals. ### Technical Details On the load run we only want the user to be able to call a white list of APIs. This way, in the future, if new APIs are added, they will not be available to the load run unless specifically added to this white list. We put the white list on the `library` object and make sure the `library` object is only available to the load run by using the [lua_setfenv](https://www.lua.org/manual/5.1/manual.html#lua_setfenv) API. This API allows us to set the `globals` of a function (and all the functions it creates). 
Before starting the load run we create a fresh Lua table (call it `g`) that only contains the `library` API (we make sure to set global protection on this table, just like the general global protection that already exists today), then we use [lua_setfenv](https://www.lua.org/manual/5.1/manual.html#lua_setfenv) to set `g` as the global table of the load run. After the load run has finished we update the `g` metatable and set the `__index` and `__newindex` functions to be `_G` (Lua default globals); we also pop out the `library` object as we do not need it anymore. This way, any function that was created on the load run (and will be invoked using `fcall`) will see the default globals as it expects to see them and will not have the `library` API anymore. An important outcome of this new approach is that now we can achieve a distinct global table for each library (it is not yet like that, but it is very easy to achieve now). In the future we can decide to remove global protection, because globals on different libraries will not collide, or we can choose to give a different API to different libraries based on some configuration or input. Notice that this technique was meant to prevent errors and was not meant to prevent a malicious user from exploiting it. For example, the load run can still save the `library` object in some local variable and then use it in `fcall` context. To prevent such malicious use, the C code also makes sure it is running in the right context and, if not, raises an error.	2022-01-06 13:39:38 +02:00
Ozan Tezcan	568c2e039b	Set errno to EEXIST in redisFork() if child process exists (#10059 ) Callers of redisFork() are logging `strerror(errno)` on failure. `errno` is not set when there is already a child process, causing printing current value of errno which was set before `redisFork()` call. Setting errno to EEXIST on this failure to provide more meaningful error message.	2022-01-06 09:54:21 +02:00
filipe oliveira	5dd15443ac	Added INFO LATENCYSTATS section: latency by percentile distribution/latency by cumulative distribution of latencies (#9462 ) # Short description The Redis extended latency stats track per command latencies and enables: - exporting the per-command percentile distribution via the `INFO LATENCYSTATS` command. ( percentile distribution is not mergeable between cluster nodes ). - exporting the per-command cumulative latency distributions via the `LATENCY HISTOGRAM` command. Using the cumulative distribution of latencies we can merge several stats from different cluster nodes to calculate aggregate metrics . By default, the extended latency monitoring is enabled since the overhead of keeping track of the command latency is very small. If you don't want to track extended latency metrics, you can easily disable it at runtime using the command: - `CONFIG SET latency-tracking no` By default, the exported latency percentiles are the p50, p99, and p999. You can alter them at runtime using the command: - `CONFIG SET latency-tracking-info-percentiles "0.0 50.0 100.0"` ## Some details: - The total size per histogram should sit around 40 KiB. We only allocate those 40KiB when a command was called for the first time. - With regards to the WRITE overhead As seen below, there is no measurable overhead on the achievable ops/sec or full latency spectrum on the client. Including also the measured redis-benchmark for unstable vs this branch. - We track from 1 nanosecond to 1 second ( everything above 1 second is considered +Inf ) ## `INFO LATENCYSTATS` exposition format - Format: `latency_percentiles_usec_<CMDNAME>:p0=XX,p50....` ## `LATENCY HISTOGRAM [command ...]` exposition format Return a cumulative distribution of latencies in the format of a histogram for the specified command names. The histogram is composed of a map of time buckets: - Each representing a latency range, between 1 nanosecond and roughly 1 second. 
- Each bucket covers twice the previous bucket's range. - Empty buckets are not printed. - Everything above 1 sec is considered +Inf. - At max there will be log2(1000000000)=30 buckets We reply a map for each command in the format: `<command name> : { `calls`: <total command calls> , `histogram` : { <bucket 1> : latency , < bucket 2> : latency, ... } }` Co-authored-by: Oran Agra <oran@redislabs.com>	2022-01-05 14:01:05 +02:00
sundb	4d3c4cfac7	Show the elapsed time of single test and speed up some tests (#10058 ) Following #10038. This PR introduces two changes. 1. Show the elapsed time of a single test in the test output, in order to have a more detailed understanding of the changes in test run time. 2. Speedup two tests related to `key-load-delay` configuration. other tests do not seem to be affected by #10003.	2022-01-05 13:49:01 +02:00
Ozan Tezcan	d1b5b63872	Fix typo in multi test (#10054 )	2022-01-05 10:16:04 +02:00
Binbin	b7f9e9ae39	Add tests for blocking XREAD[GROUP] when the stream ran dry (#10035 ) The purpose of this commit is to add some tests to cover #5299, which was fixed in #5300 but without tests. This commit should close #5306 and #5299.	2022-01-04 21:48:49 +02:00
guybe7	ac84b1cd82	Ban snapshot-creating commands and other admin commands from transactions (#10015 ) Creating a fork (or even a foreground SAVE) during a transaction breaks the atomicity of the transaction. In addition to that, it could mess up the propagated transaction to the AOF file. This change blocks SAVE, PSYNC, SYNC and SHUTDOWN from being executed inside MULTI-EXEC. It does that by adding a command flag, so that modules can flag their commands with that flag too. Besides, it changes BGSAVE, BGREWRITEAOF, and CONFIG SET appendonly, to turn the scheduled flag instead of forking right away. Other changes: * expose `protected`, `no-async-loading`, and `no_multi` flags in COMMAND command * add a test to validate propagation of FLUSHALL inside a transaction. * add a test to validate how CONFIG SET that errors reacts in a transaction Co-authored-by: Oran Agra <oran@redislabs.com>	2022-01-04 13:37:47 +02:00
zhaozhao.zz	2e1979a21e	use startEvictionTimeProc() in config set maxmemory (#10019 ) This would mean that the effects of `CONFIG SET maxmemory` may not be visible once the command returns. That could anyway happen since incremental eviction was added in redis 6.2 (see #7653) We do this to fix one of the propagation bugs about eviction see #9890 and #10014.	2022-01-04 13:08:10 +02:00
chenyang8094	87789fae0b	Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788 ) Implement Multi-Part AOF mechanism to avoid overheads during AOFRW. Introducing a folder with multiple AOF files tracked by a manifest file. The main issues with the original AOFRW mechanism are: * buffering of commands that are processed during rewrite (consuming a lot of RAM) * freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it. * double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files) The main modifications of this PR: 1. Remove the AOF rewrite buffer and related code. 2. Divide the AOF into multiple files, classified into two types. One is the `BASE` type, which represents the full amount of data (maybe in AOF or RDB format) after each AOFRW; there is only one `BASE` file at most. The second is the `INCR` type, of which there may be more than one; these files represent the incremental commands since the last AOFRW. 3. Use an AOF manifest file to record and manage the AOF files mentioned above. 4. The original configuration of `appendfilename` will be the base part of the new file name, for example: `appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof` 5. Add manifest-related TCL tests, and modify some existing tests that depend on the `appendfilename` 6. Remove the `aof_rewrite_buffer_length` field in info. 7. Add `aof-disable-auto-gc` configuration. By default we automatically delete HISTORY type AOFs. It also gives users the opportunity to preserve the history AOFs. It is just for testing use now. 8. Add an AOFRW limiting measure. When the number of AOFRW failures reaches the threshold (3 times now), we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be delayed by 2 minutes. The next is 4, 8, 16; the maximum delay is 60 minutes (1 hour). During the limit period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately. 
9. Support upgrade (load) data from old version redis. 10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and manifest file will be placed in this directory. 11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if `aof-load-truncated` is enabled. Co-authored-by: Oran Agra <oran@redislabs.com>	2022-01-03 19:14:13 +02:00
Meir Shpilraien (Spielrein)	78a62c0124	Fix OOM error not raised of functions (#10048 ) OOM Error did not raise on functions due to a bug. Added test to verify the fix.	2022-01-03 19:04:29 +02:00
Madelyn Olson	5460c10047	Implement clusterbus message extensions and cluster hostname support (#9530 ) Implement the ability for cluster nodes to advertise their location with extension messages.	2022-01-02 19:48:29 -08:00
Harkrishn Patro	9f8885760b	Sharded pubsub implementation (#8621 ) This commit implements a sharded pubsub implementation based off of shard channels. Co-authored-by: Harkrishn Patro <harkrisp@amazon.com> Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>	2022-01-02 16:54:47 -08:00
Binbin	b8ba942ac2	Add DUMP RESTORE tests for redis-cli -x and -X options (#10041 ) This commit adds DUMP RESTORE tests for the -x and -X options. I wanted to add them in #9980, which introduced the -X option, but back then I failed due to some errors (related to the redis-cli call).	2022-01-02 13:58:22 +02:00
Viktor Söderqvist	45a155bd0f	Wait for replicas when shutting down (#9872 ) To avoid data loss, this commit adds a grace period for lagging replicas to catch up the replication offset. Done: * Wait for replicas when shutdown is triggered by SIGTERM and SIGINT. * Wait for replicas when shutdown is triggered by the SHUTDOWN command. A new blocked client type BLOCKED_SHUTDOWN is introduced, allowing multiple clients to call SHUTDOWN in parallel. Note that they don't expect a response unless an error happens and shutdown is aborted. * Log warning for each replica lagging behind when finishing shutdown. * CLIENT_PAUSE_WRITE while waiting for replicas. * Configurable grace period 'shutdown-timeout' in seconds (default 10). * New flags for the SHUTDOWN command: - NOW disables the grace period for lagging replicas. - FORCE ignores errors writing the RDB or AOF files which would normally prevent a shutdown. - ABORT cancels ongoing shutdown. Can't be combined with other flags. * New field in the output of the INFO command: 'shutdown_in_milliseconds'. The value is the remaining maximum time to wait for lagging replicas before finishing the shutdown. This field is present in the Server section only during shutdown. Not directly related: * When shutting down, if there is an AOF saving child, it is killed even if AOF is disabled. This can happen if BGREWRITEAOF is used when AOF is off. * Client pause now has end time and type (WRITE or ALL) per purpose. The different pause purposes are CLIENT PAUSE command, failover and shutdown. If clients are unpaused for one purpose, it doesn't affect client pause for other purposes. For example, the CLIENT UNPAUSE command doesn't affect client pause initiated by the failover or shutdown procedures. A completed failover or a failed shutdown doesn't unpause clients paused by the CLIENT PAUSE command. Notes: * DEBUG RESTART doesn't wait for replicas. * We already have a warning logged when a replica disconnects. 
This means that if any replica connection is lost during the shutdown, it is either logged as disconnected or as lagging at the time of exit. Co-authored-by: Oran Agra <oran@redislabs.com>	2022-01-02 09:50:15 +02:00
yoav-steinberg	1bf6d6f11e	Generate RDB with Functions only via redis-cli --functions-rdb (#9968 ) This is needed in order to ease the deployment of functions for ephemeral cases, where user needs to spin up a server with functions pre-loaded. #### Details: * Added `--functions-rdb` option to _redis-cli_. * Functions only rdb via `REPLCONF rdb-filter-only functions`. This is a placeholder for a space separated inclusion filter for the RDB. In the future can be `REPLCONF rdb-filter-only "functions db:3 key-patten:user"` and a complementing `rdb-filter-exclude` `REPLCONF` can also be added. Handle "slave requirements" specification to RDB saving code so we can use the same RDB when different slaves express the same requirements (like functions-only) and not share the RDB when their requirements differ. This is currently just a flags `int`, but can be extended to a more complex structure with various filter fields. * make sure to support filters only in diskless replication mode (not to override the persistence file), we do that by forcing diskless (even if disabled by config) other changes: * some refactoring in rdb.c (extract portion of a big function to a sub-function) * rdb_key_save_delay used in AOFRW too * sendChildInfo takes the number of updated keys (incremental, rather than absolute) Co-authored-by: Oran Agra <oran@redislabs.com>	2022-01-02 09:39:01 +02:00
sundb	888e92eb57	Fix a valgrind test failure due to slowly shutdown (#10038 ) This pr is mainly to solve the problem that redis process cannot be exited normally, due to changes in #10003. When a test uses the `key-load-delay` config to delay loading, but does not reset it at the end of the test, will lead to server wait for the loading to reach the event loop (once in 2mb) before actually shutting down.	2022-01-01 17:45:13 +02:00
Viktor Söderqvist	e4b3a257ee	Modules: Mark all APIs non-experimental (#9983 ) These exist for quite some time, and are no longer experimental	2021-12-30 12:17:22 +02:00
Binbin	4836ae32c7	redis-cli: Add -X option and extend --cluster call take arg from stdin (#9980 ) There are two changes in this commit: 1. Add -X option to redis-cli. Currently `-x` can only be used to provide the last argument, so you can do `redis-cli dump keyname > key.dump`, and then do `redis-cli -x restore keyname 0 < key.dump`. But what if you want to add the replace argument (which comes last?). oran suggested adding such usage: `redis-cli -X <tag> restore keyname <tag> replace < key.dump` i.e. you're able to provide a string in the arguments that's gonna be substituted with the content from stdin. Note that the tag name should not conflict with others non-replaced args. And the -x and -X options are conflicting. Some usages: ``` [root]# echo mypasswd \| src/redis-cli -X passwd_tag mset username myname password passwd_tag OK [root]# echo username > username.txt [root]# head -c -1 username.txt \| src/redis-cli -X name_tag mget name_tag password 1) "myname" 2) "mypasswd\n" ``` 2. Handle the combination of both `-x` and `--cluster` or `-X` and `--cluster` Extend the broadcast option to receive the last arg or <tag> arg from the stdin. Now we can use `redis-cli -x --cluster call <host>:<port> cmd`, or `redis-cli -X <tag> --cluster call <host>:<port> cmd <tag>`. (support part of #9899)	2021-12-30 12:10:04 +02:00
Ozan Tezcan	b0c06e904a	Fixed typo in test tag (for needs:debug) (#10021 )	2021-12-28 16:23:02 +02:00
guybe7	266d95066d	Remove incomplete fix of a broader problem (#10013 ) Preventing CONFIG SET maxmemory from propagating is just the tip of the iceberg. A module that performs a write operation in a notification can cause any command to be propagated, based on server.dirty. We need to come up with a better solution.	2021-12-28 10:19:58 +02:00
chenyang8094	af0b50f83a	Tests: don't rely on the response of MEMORY USAGE when mem_allocator is not jemalloc (#10010 ) It turns out that libc malloc can return an allocation of a different size on requests of the same size. this means that matching MEMORY USAGE of one key to another copy of the same data can fail. Solution: Keep running the test that calls MEMORY USAGE, but ignore the response. We do that by introducing a new utility function to get the memory usage, which always returns 1 when the allocator is not jemalloc. Other changes: Some formatting for datatype2.tcl Co-authored-by: Oran Agra <oran@redislabs.com>	2021-12-27 21:37:21 +02:00
Itamar Haber	f810510bb2	Adds utils/gen-commands-json.py (#9958 ) Following #9656, this script generates a "commands.json" file from the output of the new COMMAND. The output of this script is used in redis/redis-doc#1714 and by redis/redis-io#259. This also converts a couple of rogue dashes (in 'key-specs' and 'multiple-token' flags) to underscores (continues #9959).	2021-12-27 19:31:13 +02:00
chenyang8094	317464a386	Fix failing test due to recent change in transaction propagation (#10006 ) PR #9890 may have introduced a problem. There are tests that use MULTI-EXEC to make sure two BGSAVE / BGREWRITEAOF are executed together. But now it's not valid to run commands that create a snapshot inside a transaction (gonna be blocked soon). This PR modifies the test not to rely on MULTI-EXEC. Co-authored-by: Oran Agra <oran@redislabs.com>	2021-12-27 15:18:17 +02:00
guybe7	0f15e025e6	Fix race in propagation test (#10012 ) There's a race between testing DBSIZE and the thread starting. If the thread hadn't started by the time we checked DBSIZE, no keys will have been evicted. The correct way is to check the evicted_keys stat.	2021-12-27 12:31:24 +02:00
Binbin	e84ccc3f56	sanitize dump payload: fix crash when zset with NAN score (#10002 ) `zslInsert` with a NAN score will crash the server. This one was found by the `corrupt-dump-fuzzer`.	2021-12-26 11:40:11 +02:00
Meir Shpilraien (Spielrein)	365cbf46a7	Add FUNCTION DUMP and RESTORE. (#9938 ) Follow the conclusions to support Functions in redis cluster (#9899) Added 2 new FUNCTION sub-commands: 1. `FUNCTION DUMP` - dump a binary payload representation of all the functions. 2. `FUNCTION RESTORE <PAYLOAD> [FLUSH\|APPEND\|REPLACE]` - give the binary payload extracted using `FUNCTION DUMP`, restore all the functions on the given payload. Restore policy can be given to control how to handle existing functions (default is APPEND): * FLUSH: delete all existing functions. * APPEND: appends the restored functions to the existing functions. On collision, abort. * REPLACE: appends the restored functions to the existing functions. On collision, replace the old function with the new function. Modify `redis-cli --cluster add-node` to use `FUNCTION DUMP` to get existing functions from one of the nodes in the cluster, and `FUNCTION RESTORE` to load the same set of functions to the new node. `redis-cli` will execute this step before sending the `CLUSTER MEET` command to the new node. If `FUNCTION DUMP` returns an error, assume the current Redis version do not support functions and skip `FUNCTION RESTORE`. If `FUNCTION RESTORE` fails, abort and do not send the `CLUSTER MEET` command. If the new node already contains functions (before the `FUNCTION RESTORE` is sent), abort and do not add the node to the cluster. Test was added to verify `redis-cli --cluster add-node` works as expected.	2021-12-26 09:03:37 +02:00
Meir Shpilraien (Spielrein)	08ff606b0b	Changed function name to be case insensitive. (#9984 ) Use case insensitive string comparison for function names (like we do for commands and configs). In addition, add verification that the functions only use the following characters: [a-zA-Z0-9_]	2021-12-26 08:37:24 +02:00
guybe7	7ac213079c	Sort out mess around propagation and MULTI/EXEC (#9890 ) The mess: Some parts use alsoPropagate for late propagation, others using an immediate one (propagate()), causing edge cases, ugly/hacky code, and the tendency for bugs The basic idea is that all commands are propagated via alsoPropagate (i.e. added to a list) and the top-most call() is responsible for going over that list and actually propagating them (and wrapping them in MULTI/EXEC if there's more than one command). This is done in the new function, propagatePendingCommands. Callers to propagatePendingCommands: 1. top-most call() (we want all nested call()s to add to the also_propagate array and just the top-most one to propagate them) - via `afterCommand` 2. handleClientsBlockedOnKeys: it is out of call() context and it may propagate stuff - via `afterCommand`. 3. handleClientsBlockedOnKeys edge case: if the looked-up key is already expired, we will propagate the expire but will not unblock any client so `afterCommand` isn't called. in that case, we have to propagate the deletion explicitly. 4. cron stuff: active-expire and eviction may also propagate stuff 5. modules: the module API allows to propagate stuff from just about anywhere (timers, keyspace notifications, threads). I could have tried to catch all the out-of-call-context places but it seemed easier to handle it in one place: when we free the context. in the spirit of what was done in call(), only the top-most freeing of a module context may cause propagation. 6. modules: when using a thread-safe ctx it's not clear when/if the ctx will be freed. we do know that the module must lock the GIL before calling RM_Replicate/RM_Call so we propagate the pending commands when releasing the GIL. 
A "known limitation", which was actually a bug, was fixed because of this commit (see propagate.tcl): When using a mix of RM_Call with `!` and RM_Replicate, the command would propagate out-of-order: first all the commands from RM_Call, and then the ones from RM_Replicate. Another thing worth mentioning is that if, in the past, a client would issue a MULTI/EXEC with just one write command, the server would blindly propagate the MULTI/EXEC too, even though it's redundant. Not anymore. This commit renames propagate() to propagateNow() in order to cause conflicts in pending PRs. propagatePendingCommands is the only caller of propagateNow, which is now a static, internal helper function. Optimizations: 1. alsoPropagate will not add stuff to also_propagate if there's no AOF and replicas 2. alsoPropagate reallocs also_propagate exponentially, to save calls to memmove Bugfixes: 1. CONFIG SET can create evictions, sending notifications which can cause dirty++ with modules. We need to prevent it from propagating to AOF/replicas 2. We need to set current_client in RM_Call. Buggy scenario: - CONFIG SET maxmemory, eviction notifications, module hook calls RM_Call - assertion in lookupKey crashes, because current_client has CONFIG SET, which isn't CMD_WRITE 3. minor: in eviction, call propagateDeletion after notification, like active-expire and all commands (we always send a notification before propagating the command)	2021-12-23 00:03:48 +02:00
Oran Agra	b7567394e1	resolve replication test timing sensitivity - 2nd attempt (#9988 ) The issue started failing after #9878 was merged (it made an existing test more sensitive). It looks like #9982 didn't help; I tested this one and it seems to work better. This commit does two things: 1. reduce the extra delay I added earlier and instead add more keys; the effect on the duration of replication is the same, but the server is responsive to the tcl client at more frequent intervals. 2. improve the test infra to print context when assert_error fails.	2021-12-22 23:37:12 +02:00
Oran Agra	e33e0295bb	resolve replication test timing sensitivity (#9982 ) The issue started failing after #9878 was merged (it made an existing test more sensitive)	2021-12-22 16:05:53 +02:00
Oran Agra	41e6e05dee	Allow most CONFIG SET during loading, block some commands in async-loading (#9878 ) ## background Till now CONFIG SET was blocked during loading. (In the not so distant past, GET was disallowed too) We recently (not released yet) added an async-loading mode, see #9323, and during that time it'll serve CONFIG SET and any other command. And now we realized (#9770) that some configs and commands are dangerous during async-loading. ## changes * Allow most CONFIG SET during loading (both on async-loading and normal loading) * Allow CONFIG REWRITE and CONFIG RESETSTAT during loading * Block a few configs during loading (`appendonly`, `repl-diskless-load`, and `dir`) * Block a few commands during loading (list below) ## the blocked commands: * SAVE - obviously we don't wanna start a foreground save during loading 8-) * BGSAVE - we don't mind to schedule one, but we don't wanna fork now * BGREWRITEAOF - we don't mind to schedule one, but we don't wanna fork now * MODULE - we obviously don't wanna unload a module during replication / rdb loading (MODULE HELP and MODULE LIST are not blocked) * SYNC / PSYNC - we're in the middle of RDB loading from master, must not allow sync requests now. * REPLICAOF / SLAVEOF - we're in the middle of replicating, maybe it makes sense to let the user abort it, but he couldn't do that so far, i don't wanna take any risk of bugs due to odd state. * CLUSTER - only allow [HELP, SLOTS, NODES, INFO, MYID, LINKS, KEYSLOT, COUNTKEYSINSLOT, GETKEYSINSLOT, RESET, REPLICAS, COUNT_FAILURE_REPORTS], for others, preserve the status quo ## other fixes * processEventsWhileBlocked had an issue when being nested; this could happen with a busy script during async loading (new), but also in a busy script during AOF loading (old). This led to a crash in the scenario described in #6988	2021-12-22 14:11:16 +02:00


2072 Commits