redict

mirror of https://codeberg.org/redict/redict.git synced 2025-01-23 08:38:27 -05:00

Author	SHA1	Message	Date
Ozan Tezcan	d1b5b63872	Fix typo in multi test (#10054 )	2022-01-05 10:16:04 +02:00
Binbin	b7f9e9ae39	Add tests for blocking XREAD[GROUP] when the stream ran dry (#10035 ) The purpose of this commit is to add some tests to cover #5299, which was fixed in #5300 but without tests. This commit should close #5306 and #5299.	2022-01-04 21:48:49 +02:00
guybe7	ac84b1cd82	Ban snapshot-creating commands and other admin commands from transactions (#10015 ) Creating a fork (or even a foreground SAVE) during a transaction breaks the atomicity of the transaction. In addition to that, it could mess up the propagated transaction to the AOF file. This change blocks SAVE, PSYNC, SYNC and SHUTDOWN from being executed inside MULTI-EXEC. It does that by adding a command flag, so that modules can flag their commands with that flag too. Besides, it changes BGSAVE, BGREWRITEAOF, and CONFIG SET appendonly to set the scheduled flag instead of forking right away. Other changes: * expose `protected`, `no-async-loading`, and `no_multi` flags in COMMAND command * add a test to validate propagation of FLUSHALL inside a transaction. * add a test to validate how CONFIG SET that errors reacts in a transaction Co-authored-by: Oran Agra <oran@redislabs.com>	2022-01-04 13:37:47 +02:00
zhaozhao.zz	2e1979a21e	use startEvictionTimeProc() in config set maxmemory (#10019 ) This would mean that the effects of `CONFIG SET maxmemory` may not be visible once the command returns. That could anyway happen since incremental eviction was added in redis 6.2 (see #7653) We do this to fix one of the propagation bugs about eviction see #9890 and #10014.	2022-01-04 13:08:10 +02:00
chenyang8094	87789fae0b	Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788 ) Implement Multi-Part AOF mechanism to avoid overheads during AOFRW. Introducing a folder with multiple AOF files tracked by a manifest file. The main issues with the original AOFRW mechanism are: * buffering of commands that are processed during rewrite (consuming a lot of RAM) * freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it. * double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files) The main modifications of this PR: 1. Remove the AOF rewrite buffer and related code. 2. Divide the AOF into multiple files, they are classified as two types, one is the `BASE` type, it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the incremental commands since the last AOFRW. 3. Use an AOF manifest file to record and manage these AOF files mentioned above. 4. The original configuration of `appendfilename` will be the base part of the new file name, for example: `appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof` 5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename` 6. Remove the `aof_rewrite_buffer_length` field in info. 7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs. It also gives users the opportunity to preserve the history AOFs. just for testing use now. 8. Add AOFRW limiting measure. When the number of AOFRW failures reaches the threshold (3 times now), we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately. 
9. Support upgrade (load) data from old version redis. 10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and manifest file will be placed in this directory. 11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if `aof-load-truncated` is enabled. Co-authored-by: Oran Agra <oran@redislabs.com>	2022-01-03 19:14:13 +02:00
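The AOFRW limiting measure in item 8 of #9788 can be sketched as a small backoff function. This is only an illustration in Python, not the Redis C code: the function name and parameters are hypothetical, and the exact off-by-one (whether the third failure itself triggers the 1-minute delay) is an assumption read from the commit message.

```python
def aofrw_delay_minutes(consecutive_failures, threshold=3, max_delay=60):
    """Delay (in minutes) before the next automatic AOFRW.

    Assumption: below the threshold there is no delay; from the threshold
    on, the delay doubles per failure (1, 2, 4, 8, 16, ...) and is capped
    at max_delay, as described in the commit message.
    """
    if consecutive_failures < threshold:
        return 0
    return min(max_delay, 2 ** (consecutive_failures - threshold))


if __name__ == "__main__":
    for failures in range(1, 11):
        print(failures, aofrw_delay_minutes(failures))
```

Per the commit message, a manual `bgrewriteaof` is not subject to this delay; the sketch only models the automatic case.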
Meir Shpilraien (Spielrein)	78a62c0124	Fix OOM error not raised of functions (#10048 ) OOM Error did not raise on functions due to a bug. Added test to verify the fix.	2022-01-03 19:04:29 +02:00
Madelyn Olson	5460c10047	Implement clusterbus message extensions and cluster hostname support (#9530 ) Implement the ability for cluster nodes to advertise their location with extension messages.	2022-01-02 19:48:29 -08:00
Harkrishn Patro	9f8885760b	Sharded pubsub implementation (#8621 ) This commit implements a sharded pubsub implementation based off of shard channels. Co-authored-by: Harkrishn Patro <harkrisp@amazon.com> Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>	2022-01-02 16:54:47 -08:00
Binbin	b8ba942ac2	Add DUMP RESTORE tests for redis-cli -x and -X options (#10041 ) This commit adds DUMP RESTORE tests for the -x and -X options. I wanted to add them in #9980, which introduced the -X option, but back then I failed due to some errors (related to the redis-cli call).	2022-01-02 13:58:22 +02:00
Viktor Söderqvist	45a155bd0f	Wait for replicas when shutting down (#9872 ) To avoid data loss, this commit adds a grace period for lagging replicas to catch up the replication offset. Done: * Wait for replicas when shutdown is triggered by SIGTERM and SIGINT. * Wait for replicas when shutdown is triggered by the SHUTDOWN command. A new blocked client type BLOCKED_SHUTDOWN is introduced, allowing multiple clients to call SHUTDOWN in parallel. Note that they don't expect a response unless an error happens and shutdown is aborted. * Log warning for each replica lagging behind when finishing shutdown. * CLIENT_PAUSE_WRITE while waiting for replicas. * Configurable grace period 'shutdown-timeout' in seconds (default 10). * New flags for the SHUTDOWN command: - NOW disables the grace period for lagging replicas. - FORCE ignores errors writing the RDB or AOF files which would normally prevent a shutdown. - ABORT cancels ongoing shutdown. Can't be combined with other flags. * New field in the output of the INFO command: 'shutdown_in_milliseconds'. The value is the remaining maximum time to wait for lagging replicas before finishing the shutdown. This field is present in the Server section only during shutdown. Not directly related: * When shutting down, if there is an AOF saving child, it is killed even if AOF is disabled. This can happen if BGREWRITEAOF is used when AOF is off. * Client pause now has end time and type (WRITE or ALL) per purpose. The different pause purposes are CLIENT PAUSE command, failover and shutdown. If clients are unpaused for one purpose, it doesn't affect client pause for other purposes. For example, the CLIENT UNPAUSE command doesn't affect client pause initiated by the failover or shutdown procedures. A completed failover or a failed shutdown doesn't unpause clients paused by the CLIENT PAUSE command. Notes: * DEBUG RESTART doesn't wait for replicas. * We already have a warning logged when a replica disconnects. 
This means that if any replica connection is lost during the shutdown, it is either logged as disconnected or as lagging at the time of exit. Co-authored-by: Oran Agra <oran@redislabs.com>	2022-01-02 09:50:15 +02:00
yoav-steinberg	1bf6d6f11e	Generate RDB with Functions only via redis-cli --functions-rdb (#9968 ) This is needed in order to ease the deployment of functions for ephemeral cases, where user needs to spin up a server with functions pre-loaded. #### Details: * Added `--functions-rdb` option to _redis-cli_. * Functions only rdb via `REPLCONF rdb-filter-only functions`. This is a placeholder for a space separated inclusion filter for the RDB. In the future can be `REPLCONF rdb-filter-only "functions db:3 key-patten:user"` and a complementing `rdb-filter-exclude` `REPLCONF` can also be added. Handle "slave requirements" specification to RDB saving code so we can use the same RDB when different slaves express the same requirements (like functions-only) and not share the RDB when their requirements differ. This is currently just a flags `int`, but can be extended to a more complex structure with various filter fields. * make sure to support filters only in diskless replication mode (not to override the persistence file), we do that by forcing diskless (even if disabled by config) other changes: * some refactoring in rdb.c (extract portion of a big function to a sub-function) * rdb_key_save_delay used in AOFRW too * sendChildInfo takes the number of updated keys (incremental, rather than absolute) Co-authored-by: Oran Agra <oran@redislabs.com>	2022-01-02 09:39:01 +02:00
sundb	888e92eb57	Fix a valgrind test failure due to slow shutdown (#10038 ) This PR mainly solves the problem that the redis process cannot exit normally, due to changes in #10003. When a test uses the `key-load-delay` config to delay loading, but does not reset it at the end of the test, the server waits for the loading to reach the event loop (once in 2mb) before actually shutting down.	2022-01-01 17:45:13 +02:00
Viktor Söderqvist	e4b3a257ee	Modules: Mark all APIs non-experimental (#9983 ) These exist for quite some time, and are no longer experimental	2021-12-30 12:17:22 +02:00
Binbin	4836ae32c7	redis-cli: Add -X option and extend --cluster call take arg from stdin (#9980 ) There are two changes in this commit: 1. Add -X option to redis-cli. Currently `-x` can only be used to provide the last argument, so you can do `redis-cli dump keyname > key.dump`, and then do `redis-cli -x restore keyname 0 < key.dump`. But what if you want to add the replace argument (which comes last?). oran suggested adding such usage: `redis-cli -X <tag> restore keyname <tag> replace < key.dump` i.e. you're able to provide a string in the arguments that's gonna be substituted with the content from stdin. Note that the tag name should not conflict with other non-replaced args. And the -x and -X options are conflicting. Some usages: ``` [root]# echo mypasswd | src/redis-cli -X passwd_tag mset username myname password passwd_tag OK [root]# echo username > username.txt [root]# head -c -1 username.txt | src/redis-cli -X name_tag mget name_tag password 1) "myname" 2) "mypasswd\n" ``` 2. Handle the combination of both `-x` and `--cluster` or `-X` and `--cluster` Extend the broadcast option to receive the last arg or <tag> arg from the stdin. Now we can use `redis-cli -x --cluster call <host>:<port> cmd`, or `redis-cli -X <tag> --cluster call <host>:<port> cmd <tag>`. (support part of #9899)	2021-12-30 12:10:04 +02:00
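The `-X <tag>` substitution described in #9980 can be pictured with the following Python sketch. It is not redis-cli's implementation: `substitute_tag` is a hypothetical helper, and it assumes that only arguments exactly equal to the tag are replaced.

```python
def substitute_tag(args, tag, stdin_data):
    """Replace every argument equal to `tag` with the data read from stdin.

    Assumption: matching is by exact equality, which is why the tag must
    not collide with any argument that should stay untouched.
    """
    return [stdin_data if arg == tag else arg for arg in args]


if __name__ == "__main__":
    argv = ["restore", "keyname", "0", "<tag>", "replace"]
    print(substitute_tag(argv, "<tag>", "<contents of key.dump>"))
```

This also shows why `-x` alone is not enough here: the payload has to land before the trailing `replace` argument rather than at the end.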
Ozan Tezcan	b0c06e904a	Fixed typo in test tag (for needs:debug) (#10021 )	2021-12-28 16:23:02 +02:00
guybe7	266d95066d	Remove incomplete fix of a broader problem (#10013 ) Preventing CONFIG SET maxmemory from propagating is just the tip of the iceberg. A module that performs a write operation in a notification can cause any command to be propagated, based on server.dirty. We need to come up with a better solution.	2021-12-28 10:19:58 +02:00
chenyang8094	af0b50f83a	Tests: don't rely on the response of MEMORY USAGE when mem_allocator is not jemalloc (#10010 ) It turns out that libc malloc can return an allocation of a different size on requests of the same size. this means that matching MEMORY USAGE of one key to another copy of the same data can fail. Solution: Keep running the test that calls MEMORY USAGE, but ignore the response. We do that by introducing a new utility function to get the memory usage, which always returns 1 when the allocator is not jemalloc. Other changes: Some formatting for datatype2.tcl Co-authored-by: Oran Agra <oran@redislabs.com>	2021-12-27 21:37:21 +02:00
Itamar Haber	f810510bb2	Adds utils/gen-commands-json.py (#9958 ) Following #9656, this script generates a "commands.json" file from the output of the new COMMAND. The output of this script is used in redis/redis-doc#1714 and by redis/redis-io#259. This also converts a couple of rogue dashes (in 'key-specs' and 'multiple-token' flags) to underscores (continues #9959).	2021-12-27 19:31:13 +02:00
chenyang8094	317464a386	Fix failing test due to recent change in transaction propagation (#10006 ) PR #9890 may have introduced a problem. There are tests that use MULTI-EXEC to make sure two BGSAVE / BGREWRITEAOF are executed together. But now it's not valid to run commands that create a snapshot inside a transaction (gonna be blocked soon) This PR modifies the test not to rely on MULTI-EXEC. Co-authored-by: Oran Agra <oran@redislabs.com>	2021-12-27 15:18:17 +02:00
guybe7	0f15e025e6	Fix race in propagation test (#10012 ) There's a race between testing DBSIZE and the thread starting. If the thread hadn't started by the time we checked DBSIZE, no keys will have been evicted. The correct way is to check the evicted_keys stat.	2021-12-27 12:31:24 +02:00
Binbin	e84ccc3f56	sanitize dump payload: fix crash when zset with NAN score (#10002 ) `zslInsert` with a NAN score will crash the server. This one was found by the `corrupt-dump-fuzzer`.	2021-12-26 11:40:11 +02:00
Meir Shpilraien (Spielrein)	365cbf46a7	Add FUNCTION DUMP and RESTORE. (#9938 ) Follow the conclusions to support Functions in redis cluster (#9899) Added 2 new FUNCTION sub-commands: 1. `FUNCTION DUMP` - dump a binary payload representation of all the functions. 2. `FUNCTION RESTORE <PAYLOAD> [FLUSH|APPEND|REPLACE]` - given the binary payload extracted using `FUNCTION DUMP`, restore all the functions in the given payload. Restore policy can be given to control how to handle existing functions (default is APPEND): * FLUSH: delete all existing functions. * APPEND: appends the restored functions to the existing functions. On collision, abort. * REPLACE: appends the restored functions to the existing functions. On collision, replace the old function with the new function. Modify `redis-cli --cluster add-node` to use `FUNCTION DUMP` to get existing functions from one of the nodes in the cluster, and `FUNCTION RESTORE` to load the same set of functions to the new node. `redis-cli` will execute this step before sending the `CLUSTER MEET` command to the new node. If `FUNCTION DUMP` returns an error, assume the current Redis version does not support functions and skip `FUNCTION RESTORE`. If `FUNCTION RESTORE` fails, abort and do not send the `CLUSTER MEET` command. If the new node already contains functions (before the `FUNCTION RESTORE` is sent), abort and do not add the node to the cluster. Test was added to verify `redis-cli --cluster add-node` works as expected.	2021-12-26 09:03:37 +02:00
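The three `FUNCTION RESTORE` policies from #9938 can be modelled on plain dictionaries. This is a rough Python sketch under stated assumptions: `restore_functions` is a hypothetical helper, functions are reduced to a name-to-body mapping, and "abort" is modelled as raising without modifying the existing set.

```python
def restore_functions(existing, payload, policy="APPEND"):
    """Return the function set after applying a restore policy.

    existing, payload: dicts mapping function name -> function body.
    FLUSH   drops all existing functions, then loads the payload.
    APPEND  adds the payload; any name collision aborts (raises KeyError).
    REPLACE adds the payload; on collision the payload version wins.
    """
    policy = policy.upper()
    if policy not in ("FLUSH", "APPEND", "REPLACE"):
        raise ValueError("unknown restore policy: " + policy)
    if policy == "FLUSH":
        return dict(payload)
    if policy == "APPEND":
        collisions = sorted(set(existing) & set(payload))
        if collisions:
            raise KeyError("functions already exist: " + ", ".join(collisions))
    merged = dict(existing)
    merged.update(payload)
    return merged
```

Returning a new dictionary keeps the abort case simple: when APPEND hits a collision, the caller's existing set is left as it was.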
Meir Shpilraien (Spielrein)	08ff606b0b	Changed function name to be case insensitive. (#9984 ) Use case insensitive string comparison for function names (like we do for commands and configs). In addition, add verification that the functions only use the following characters: [a-zA-Z0-9_]	2021-12-26 08:37:24 +02:00
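The two rules from #9984 (restricted character set, case-insensitive comparison) could look like this in Python. Both helper names are hypothetical, and rejecting the empty name is an assumption; the commit message only lists the allowed characters.

```python
import re

# Allowed characters per the commit message: [a-zA-Z0-9_].
# Assumption: at least one character is required.
_FUNCTION_NAME_RE = re.compile(r"[a-zA-Z0-9_]+")


def is_valid_function_name(name):
    """True if `name` consists only of the allowed characters."""
    return _FUNCTION_NAME_RE.fullmatch(name) is not None


def same_function_name(a, b):
    """Case-insensitive comparison, as done for commands and configs."""
    return a.lower() == b.lower()
```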
guybe7	7ac213079c	Sort out mess around propagation and MULTI/EXEC (#9890 ) The mess: Some parts use alsoPropagate for late propagation, others using an immediate one (propagate()), causing edge cases, ugly/hacky code, and the tendency for bugs The basic idea is that all commands are propagated via alsoPropagate (i.e. added to a list) and the top-most call() is responsible for going over that list and actually propagating them (and wrapping them in MULTI/EXEC if there's more than one command). This is done in the new function, propagatePendingCommands. Callers to propagatePendingCommands: 1. top-most call() (we want all nested call()s to add to the also_propagate array and just the top-most one to propagate them) - via `afterCommand` 2. handleClientsBlockedOnKeys: it is out of call() context and it may propagate stuff - via `afterCommand`. 3. handleClientsBlockedOnKeys edge case: if the looked-up key is already expired, we will propagate the expire but will not unblock any client so `afterCommand` isn't called. in that case, we have to propagate the deletion explicitly. 4. cron stuff: active-expire and eviction may also propagate stuff 5. modules: the module API allows to propagate stuff from just about anywhere (timers, keyspace notifications, threads). I could have tried to catch all the out-of-call-context places but it seemed easier to handle it in one place: when we free the context. in the spirit of what was done in call(), only the top-most freeing of a module context may cause propagation. 6. modules: when using a thread-safe ctx it's not clear when/if the ctx will be freed. we do know that the module must lock the GIL before calling RM_Replicate/RM_Call so we propagate the pending commands when releasing the GIL. 
A "known limitation", which were actually a bug, was fixed because of this commit (see propagate.tcl): When using a mix of RM_Call with `!` and RM_Replicate, the command would propagate out-of-order: first all the commands from RM_Call, and then the ones from RM_Replicate Another thing worth mentioning is that if, in the past, a client would issue a MULTI/EXEC with just one write command the server would blindly propagate the MULTI/EXEC too, even though it's redundant. not anymore. This commit renames propagate() to propagateNow() in order to cause conflicts in pending PRs. propagatePendingCommands is the only caller of propagateNow, which is now a static, internal helper function. Optimizations: 1. alsoPropagate will not add stuff to also_propagate if there's no AOF and replicas 2. alsoPropagate reallocs also_propagagte exponentially, to save calls to memmove Bugfixes: 1. CONFIG SET can create evictions, sending notifications which can cause to dirty++ with modules. we need to prevent it from propagating to AOF/replicas 2. We need to set current_client in RM_Call. buggy scenario: - CONFIG SET maxmemory, eviction notifications, module hook calls RM_Call - assertion in lookupKey crashes, because current_client has CONFIG SET, which isn't CMD_WRITE 3. minor: in eviction, call propagateDeletion after notification, like active-expire and all commands (we always send a notification before propagating the command)	2021-12-23 00:03:48 +02:00
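The basic idea of #9890 (every command is queued via alsoPropagate, and only the top-most call() flushes the queue, wrapping it in MULTI/EXEC when it holds more than one command) can be sketched in Python. This is a toy model, not the server code: the `Propagator` class, its nesting counter, and the `sent` list standing in for the AOF/replica stream are all hypothetical.

```python
class Propagator:
    """Toy model of deferred propagation with nested call()s."""

    def __init__(self):
        self.pending = []  # plays the role of the also_propagate array
        self.depth = 0     # nesting level of call()
        self.sent = []     # stand-in for what reaches AOF / replicas

    def also_propagate(self, command):
        # Commands are only queued here, never sent immediately.
        self.pending.append(command)

    def call(self, command_fn):
        self.depth += 1
        command_fn(self)
        self.depth -= 1
        if self.depth == 0:  # only the top-most call() propagates
            self.propagate_pending()

    def propagate_pending(self):
        if not self.pending:
            return
        commands, self.pending = self.pending, []
        if len(commands) > 1:  # a single command needs no MULTI/EXEC
            commands = [["MULTI"]] + commands + [["EXEC"]]
        self.sent.extend(commands)
```

A nested `call()` that queues a second command yields one MULTI/EXEC block once the outer call returns, while a lone write is sent without the wrapper, matching the "not anymore" remark about redundant MULTI/EXEC.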
Oran Agra	b7567394e1	resolve replication test timing sensitivity - 2nd attempt (#9988 ) issue started failing after #9878 was merged (made an existing test more sensitive) looks like #9982 didn't help, tested this one and it seems to work better. this commit does two things: 1. reduce the extra delay I added earlier and instead add more keys, the effect on the duration of replication is the same, but the intervals in which the server is responsive to the tcl client is higher. 2. improve the test infra to print context when assert_error fails.	2021-12-22 23:37:12 +02:00
Oran Agra	e33e0295bb	resolve replication test timing sensitivity (#9982 ) issue started failing after #9878 was merged (made an existing test more sensitive)	2021-12-22 16:05:53 +02:00
Oran Agra	41e6e05dee	Allow most CONFIG SET during loading, block some commands in async-loading (#9878 ) ## background Till now CONFIG SET was blocked during loading. (In the not so distant past, GET was disallowed too) We recently (not released yet) added an async-loading mode, see #9323, and during that time it'll serve CONFIG SET and any other command. And now we realized (#9770) that some configs, and commands are dangerous during async-loading. ## changes * Allow most CONFIG SET during loading (both on async-loading and normal loading) * Allow CONFIG REWRITE and CONFIG RESETSTAT during loading * Block a few configs during loading (`appendonly`, `repl-diskless-load`, and `dir`) * Block a few commands during loading (list below) ## the blocked commands: * SAVE - obviously we don't wanna start a foreground save during loading 8-) * BGSAVE - we don't mind to schedule one, but we don't wanna fork now * BGREWRITEAOF - we don't mind to schedule one, but we don't wanna fork now * MODULE - we obviously don't wanna unload a module during replication / rdb loading (MODULE HELP and MODULE LIST are not blocked) * SYNC / PSYNC - we're in the middle of RDB loading from master, must not allow sync requests now. * REPLICAOF / SLAVEOF - we're in the middle of replicating, maybe it makes sense to let the user abort it, but he couldn't do that so far, i don't wanna take any risk of bugs due to odd state. * CLUSTER - only allow [HELP, SLOTS, NODES, INFO, MYID, LINKS, KEYSLOT, COUNTKEYSINSLOT, GETKEYSINSLOT, RESET, REPLICAS, COUNT_FAILURE_REPORTS], for others, preserve the status quo ## other fixes * processEventsWhileBlocked had an issue when being nested, this could happen with a busy script during async loading (new), but also in a busy script during AOF loading (old). this led to a crash in the scenario described in #6988	2021-12-22 14:11:16 +02:00
zhugezy	ad55fbaabb	Shorten timeouts of CLIENT PAUSE to avoid hanging when tests fail. (#9975 ) If a test fails at `wait_for_blocked_clients_count` after the `PAUSE` command, It won't send `UNPAUSE` to server, leading to the server hanging until timeout, which is bad and hard to debug sometimes when developing. This PR tries to fix this. Timeout in `CLIENT PAUSE` shortened from 1e5 seconds(extremely long) to 50~100 seconds.	2021-12-22 12:06:29 +02:00
Meir Shpilraien (Spielrein)	3bcf108416	Change FUNCTION CREATE, DELETE and FLUSH to be WRITE commands instead of MAY_REPLICATE. (#9953 ) The issue with MAY_REPLICATE is that all automatic mechanisms to handle write commands will not work. This require have a special treatment for: * Not allow those commands to be executed on RO replica. * Allow those commands to be executed on RO replica from primary connection. * Allow those commands to be executed on the RO replica from AOF. By setting those commands as WRITE commands we are getting all those properties from Redis. Test was added to verify that those properties work as expected. In addition, rearrange when and where functions are flushed. Before this PR functions were flushed manually on `rdbLoadRio` and cleaned manually on failure. This contradicts the assumptions that functions are data and need to be created/deleted alongside with the data. A side effect of this, for example, `debug reload noflush` did not flush the data but did flush the functions, `debug loadaof` flush the data but not the functions. This PR move functions deletion into `emptyDb`. `emptyDb` (renamed to `emptyData`) will now accept an additional flag, `NOFUNCTIONS` which specifically indicate that we do not want to flush the functions (on all other cases, functions will be flushed). Used the new flag on FLUSHALL and FLUSHDB only! Tests were added to `debug reload` and `debug loadaof` to verify that functions behave the same as the data. Notice that because now functions will be deleted along side with the data we can not allow `CLUSTER RESET` to be called from within a function (it will cause the function to be released while running), this PR adds `NO_SCRIPT` flag to `CLUSTER RESET` so it will not be possible to be called from within a function. The other cluster commands are allowed from within a function (there are use-cases that uses `GETKEYSINSLOT` to iterate over all the keys on a given slot). 
Tests was added to verify `CLUSTER RESET` is denied from within a script. Another small change on this PR is that `RDBFLAGS_ALLOW_DUP` is also applicable on functions. When loading functions, if this flag is set, we will replace old functions with new ones on collisions.	2021-12-21 16:13:29 +02:00
zhugezy	1b0968df46	Remove EVAL script verbatim replication, propagation, and deterministic execution logic (#9812 ) # Background The main goal of this PR is to remove relevant logics on Lua script verbatim replication, only keeping effects replication logic, which has been set as default since Redis 5.0. As a result, Lua in Redis 7.0 would be acting the same as Redis 6.0 with default configuration from users' point of view. There are lots of reasons to remove verbatim replication. Antirez has listed some of the benefits in Issue #5292: >1. No longer need to explain to users side effects into scripts. They can do whatever they want. >2. No need for a cache about scripts that we sent or not to the slaves. >3. No need to sort the output of certain commands inside scripts (SMEMBERS and others): this both simplifies and gains speed. >4. No need to store scripts inside the RDB file in order to startup correctly. >5. No problems about evicting keys during the script execution. When looking back at Redis 5.0, antirez and core team decided to set the config `lua-replicate-commands yes` by default instead of removing verbatim replication directly, in case some bad situations happened. 3 years later now before Redis 7.0, it's time to remove it formally. # Changes - configuration for lua-replicate-commands removed - created config file stub for backward compatibility - Replication script cache removed - this is useless under script effects replication - relevant statistics also removed - script persistence in RDB files is also removed - Propagation of SCRIPT LOAD and SCRIPT FLUSH to replica / AOF removed - Deterministic execution logic in scripts removed (i.e. don't run write commands after random ones, and sorting output of commands with random order) - the flags indicating which commands have non-deterministic results are kept as hints to clients. 
- `redis.replicate_commands()` & `redis.set_repl()` changed - now `redis.replicate_commands()` does nothing and return an 1 - ...and then `redis.set_repl()` can be issued before `redis.replicate_commands()` now - Relevant TCL cases adjusted - DEBUG lua-always-replicate-commands removed # Other changes - Fix a recent bug comparing CLIENT_ID_AOF to original_client->flags instead of id. (introduced in #9780) Co-authored-by: Oran Agra <oran@redislabs.com>	2021-12-21 08:32:42 +02:00
Binbin	febc3f63b2	Fix recent daily CI test failures (#9966 ) Recent PRs have introduced some failures, this commit try to fix these CI failures. Here are the changes: 1. Enable debug-command in sentinel test. ``` Master reboot in very short time: ERR DEBUG command not allowed. If the enable-debug-command option is set to "local", you can run it from a local connection, otherwise you need to set this option in the configuration file, and then restart the server. ``` 2. Enable protected-config in sentinel test. ``` SDOWN is triggered by misconfigured instance replying with errors: ERR CONFIG SET failed (possibly related to argument 'dir') - can't set protected config ``` 3. Enable debug-command in cluster test. ``` Verify slaves consistency: ERR DEBUG command not allowed. If the enable-debug-command option is set to "local", you can run it from a local connection, otherwise you need to set this option in the configuration file, and then restart the server. ``` 4. quicklist fill should be signed int. The reason for the modification is to eliminate the warning. Modify `int fill: QL_FILL_BITS` to `signed int fill: QL_FILL_BITS` The first three were introduced at #9920 (same issue). And the last one was introduced at #9962.	2021-12-20 12:31:13 +02:00
Oran Agra	6add1b7217	Add external test that runs without debug command (#9964 ) - add needs:debug flag for some tests - disable "save" in external tests (speedup?) - use debug_digest proc instead of debug command directly so it can be skipped - use OBJECT ENCODING instead of DEBUG OBJECT to get encoding - add a proc for OBJECT REFCOUNT so it can be skipped - move a bunch of tests in latency_monitor tests to happen later so that latency monitor has some values in it - add missing close_replication_stream calls - make sure to close the temp client if DEBUG LOG fails	2021-12-19 17:41:51 +02:00
YaacovHazan	ae2f5b7b2e	Protected configs and sensitive commands (#9920 ) Block sensitive configs and commands by default. * `enable-protected-configs` - block modification of configs with the new `PROTECTED_CONFIG` flag. Currently we add this flag to `dbfilename`, and `dir` configs, all of which are non-mutable configs that can set a file redis will write to. * `enable-debug-command` - block the `DEBUG` command * `enable-module-command` - block the `MODULE` command These have a default value set to `no`, so that these features are not exposed by default to client connections, and can only be set by modifying the config file. Users can change each of these to either `yes` (allow all access), or `local` (allow access from local TCP connections and unix domain connections) Note that this is a breaking change (specifically the part about MODULE command being disabled by default). I.e. we don't consider DEBUG command being blocked as an issue (people shouldn't have been using it), and the few configs we protected are unlikely to have been set at runtime anyway. On the other hand, it's likely to assume some users who use modules, load them from the config file anyway. Note that's the whole point of this PR, for redis to be more secure by default and reduce the attack surface on innocent users, so secure defaults will necessarily mean a breaking change.	2021-12-19 10:46:16 +02:00
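The three values accepted by `enable-protected-configs`, `enable-debug-command` and `enable-module-command` in #9920 amount to a small access check. The Python sketch below is only illustrative: `is_allowed` is a hypothetical helper, and how a connection is classified as local (local TCP or unix domain socket, per the commit message) is left to the caller.

```python
def is_allowed(setting, from_local_connection):
    """Decide whether a protected operation may run.

    setting: "no" (the default, always blocked), "yes" (always allowed),
    or "local" (allowed only for local TCP / unix domain connections).
    """
    if setting == "yes":
        return True
    if setting == "local":
        return from_local_connection
    if setting == "no":
        return False
    raise ValueError("expected 'yes', 'no' or 'local', got: " + repr(setting))
```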
guybe7	5df070ba39	COMMAND: Use underscores instead of hyphens in attributes (#9959 ) some languages can build a json-like object by parsing a textual json, but it works poorly when attributes contain hyphens example in JS: ``` let j = JSON.parse(json) j['key-name'] <- works j.key-name <= illegal syntax ```	2021-12-18 09:00:42 +02:00
ny0312	792afb4432	Introduce memory management on cluster link buffers (#9774 ) Introduce memory management on cluster link buffers: * Introduce a new `cluster-link-sendbuf-limit` config that caps memory usage of cluster bus link send buffers. * Introduce a new `CLUSTER LINKS` command that displays current TCP links to/from peers. * Introduce a new `mem_cluster_links` field under `INFO` command output, which displays the overall memory usage by all current cluster links. * Introduce a new `total_cluster_links_buffer_limit_exceeded` field under `CLUSTER INFO` command output, which displays the accumulated count of cluster links freed due to `cluster-link-sendbuf-limit`.	2021-12-16 21:56:59 -08:00
Meir Shpilraien (Spielrein)	687210f155	Add FUNCTION FLUSH command to flush all functions (#9936 ) Added `FUNCTION FLUSH` command. The new sub-command allows deleting all the functions. An optional `[SYNC|ASYNC]` argument can be given to control whether to flush the functions synchronously or asynchronously. If not given, the default flush mode is chosen by the `lazyfree-lazy-user-flush` configuration value. Add the missing `functions.tcl` test to the list of tests that are executed in test_helper.tcl, and call FUNCTION FLUSH in between servers in external mode	2021-12-16 17:58:25 +02:00
yoav-steinberg	70ff26b454	Multiparam config get. (#9914 ) Support doing `CONFIG GET <x> <y> <z>`, each of them can also be a pattern with wildcards. This avoids duplicates in the result by looping over the configs and, for each one, checking all the patterns; once a match is found for a pattern we move on to the next config.	2021-12-16 09:01:13 +02:00
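The duplicate-avoiding loop from #9914 (outer loop over configs, inner loop over patterns, stop at the first match) can be sketched in Python. `config_get` here is a hypothetical helper, and `fnmatch.fnmatchcase` is used as a stand-in for Redis's own glob matcher, whose rules (including case handling) may differ.

```python
from fnmatch import fnmatchcase


def config_get(configs, patterns):
    """Return (name, value) pairs for configs matching any pattern.

    Looping over configs first and breaking on the first matching
    pattern guarantees each config appears at most once, even when
    several patterns match it.
    """
    result = []
    for name, value in configs.items():
        for pattern in patterns:
            if fnmatchcase(name, pattern):
                result.append((name, value))
                break  # move on to the next config
    return result


if __name__ == "__main__":
    configs = {"maxmemory": "0", "maxmemory-policy": "noeviction", "port": "6379"}
    print(config_get(configs, ["maxmemory*", "*policy", "port"]))
```

Had the loops been nested the other way around (patterns outside, configs inside), `maxmemory-policy` would be reported twice for the patterns above.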
guybe7	867816003e	Auto-generate the command table from JSON files (#9656 ) Delete the hardcoded command table and replace it with an auto-generated table, based on a JSON file that describes the commands (each command must have a JSON file). These JSON files are the SSOT of everything there is to know about Redis commands, and it is reflected fully in COMMAND INFO. These JSON files are used to generate commands.c (using a python script), which is then committed to the repo and compiled. The purpose is: * Clients and proxies will be able to get much more info from redis, instead of relying on hard coded logic. * drop the dependency between Redis-user and the commands.json in redis-doc. * delete help.h and have redis-cli learn everything it needs to know just by issuing COMMAND (will be done in a separate PR) * redis.io should stop using commands.json and learn everything from Redis (ultimately one of the release artifacts should be a large JSON, containing all the information about all of the commands, which will be generated from COMMAND's reply) * the byproduct of this is: * module commands will be able to provide that info and possibly be more of a first-class citizens * in theory, one may be able to generate a redis client library for a strictly typed language, by using this info. ### Interface changes #### COMMAND INFO's reply change (and arg-less COMMAND) Before this commit the reply at index 7 contained the key-specs list and reply at index 8 contained the sub-commands list (Both unreleased). Now, reply at index 7 is a map of: - summary - short command description - since - debut version - group - command group - complexity - complexity string - doc-flags - flags used for documentation (e.g. "deprecated") - deprecated-since - if deprecated, from which version? - replaced-by - if deprecated, which command replaced it? - history - a list of (version, what-changed) tuples - hints - a list of strings, meant to provide hints for clients/proxies. 
see https://github.com/redis/redis/issues/9876 - arguments - an array of arguments. each element is a map, with the possibility of nesting (sub-arguments) - key-specs - an array of keys specs (already in unstable, just changed location) - subcommands - a list of sub-commands (already in unstable, just changed location) - reply-schema - will be added in the future (see https://github.com/redis/redis/issues/9845) more details on these can be found in https://github.com/redis/redis-doc/pull/1697 only the first three fields are mandatory #### API changes (unreleased API obviously) now they take RedisModuleCommand opaque pointer instead of looking up the command by name - RM_CreateSubcommand - RM_AddCommandKeySpec - RM_SetCommandKeySpecBeginSearchIndex - RM_SetCommandKeySpecBeginSearchKeyword - RM_SetCommandKeySpecFindKeysRange - RM_SetCommandKeySpecFindKeysKeynum Currently, we did not add module API to provide additional information about their commands because we couldn't agree on how the API should look like, see https://github.com/redis/redis/issues/9944. ### Somehow related changes 1. Literals should be in uppercase while placeholder in lowercase. Now all the GEO* command will be documented with M\|KM\|FT\|MI and can take both lowercase and uppercase ### Unrelated changes 1. Bugfix: no_madaory_keys was absent in COMMAND's reply 2. expose CMD_MODULE as "module" via COMMAND 3. have a dedicated uint64 for ACL categories (instead of having them in the same uint64 as command flags) Co-authored-by: Itamar Haber <itamar@garantiadata.com>	2021-12-15 21:23:15 +02:00
Wen Hui	a09bc5045b	Error message improvement for CONFIG SET command (#9924 ) When CONFIG SET fails, print the name of the config that failed. This is helpful since config set is now variadic. However, there are cases where several configs share the same apply function, and we can't be sure which one of them caused the failure.	2021-12-15 09:46:32 +02:00
yoav-steinberg	c7dc17fc0f	Fix possible int overflow when hashing an sds. (#9916 ) This caused a crash when adding elements larger than 2GB to a set (same goes for hash keys). See #8455. Details: * The fix makes the dict hash functions receive a `size_t` instead of an `int`. In practice the dict hash functions call siphash which receives a `size_t` and the callers to the hash function pass a `size_t` to it so the fix is trivial. * The issue was recreated by attempting to add a >2gb value to a set. Appropriate tests were added where I create a set with large elements and check basic functionality on it (SADD, SCARD, SPOP, etc...). * When I added the tests I also refactored a bit all the tests code which is run under the `--large-memory` flag. This removed code duplication for the test framework's `write_big_bulk` and `write_big_bulk` code and also takes care of not allocating the test framework's helper huge string used by these tests when not run under `--large-memory`. * I also added the _violations.tcl_ unit tests to be part of the entire test suite and cleaned up non-relevant list-related tests that were in there. This was done in this PR because most of the _violations_ tests are "large memory" tests.	2021-12-13 21:16:25 +02:00
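Why an `int` length breaks above 2GB can be illustrated with a small sketch. It assumes a 32-bit C `int` (modelled here with `ctypes.c_int32`); the helper names are hypothetical and nothing below is taken from the Redis sources.

```python
import ctypes


def as_c_int(n):
    """Value of n after being squeezed into a 32-bit signed C int."""
    # ctypes integer types do no overflow checking: the value is truncated.
    return ctypes.c_int32(n).value


def as_size_t(n):
    """Value of n after being stored in a C size_t."""
    return ctypes.c_size_t(n).value


length = 3 * 1024 ** 3  # a 3 GiB element length
truncated = as_c_int(length)
preserved = as_size_t(length)
```

With these inputs `truncated` is negative while `preserved` equals the original length, which is the difference the `int` to `size_t` change relies on.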
Madelyn Olson	c40d23b89f	Redact ACL SETUSER arguments if the user has spaces (#9935 )	2021-12-13 08:39:04 -08:00
Binbin	b93ccee451	Fix timing issue in stream blocking tests (#9927 ) A test failure was reported in Daily CI (FreeBSD). `XREAD: XADD + DEL should not awake client` ``` *** [err]: XREAD: XADD + DEL should not awake client in tests/unit/type/stream.tcl Expected [lindex 0 0] eq {s1} (context: type eval line 11 cmd {assert {[lindex $res 0 0] eq {s1}}} proc ::test) ``` It seems that `r` is executed before `rd` enters the blocking state, so `rd` ends up getting an empty reply by timeout. We use `wait_for_blocked_clients_count` to wait for the blocking client to be ready and avoid this situation. Also fixed other test cases that may have the same issue.	2021-12-10 20:35:51 +02:00
sundb	7f0fae947a	Sanitize dump payload: fix crash when stream with duplicate consumers (#9918 ) When rdb creates a consumer without determining whether it exists in advance, it may return NULL and crash if it encounters corrupt data with duplicate consumers.	2021-12-08 18:11:57 +02:00
yoav-steinberg	07b1326073	Hide hidden configs from `config get` patterns. (#9888 ) Added `HIDDEN_CONFIG` to hide debug / dev / testing configs from CONFIG GET when it is used with a wildcard. These are not documented in redis.conf so now CONFIG GET only works when they are explicitly specified. The current configs are: ``` key-load-delay loading-process-events-interval-bytes rdb-key-save-delay use-exit-on-panic watchdog-period ```	2021-12-08 12:44:10 +02:00
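A minimal sketch of the hidden-config rule from the commit above: a hidden config is returned only when an argument names it exactly, never through a wildcard. The `HIDDEN` set below lists two of the names from the commit message; `visible_config_get` and the sample values are hypothetical, and `fnmatchcase` again stands in for Redis's glob matcher.

```python
from fnmatch import fnmatchcase

# Two of the hidden config names listed in the commit message.
HIDDEN = {"key-load-delay", "watchdog-period"}


def visible_config_get(configs, patterns, hidden=HIDDEN):
    """CONFIG GET-like lookup that hides debug configs from wildcards."""
    result = {}
    for name, value in configs.items():
        for pattern in patterns:
            if name in hidden:
                matched = pattern == name  # explicit name only
            else:
                matched = fnmatchcase(name, pattern)
            if matched:
                result[name] = value
                break
    return result


# Hypothetical values, for illustration only.
configs = {"port": "6379", "watchdog-period": "0", "key-load-delay": "0"}
wildcard_reply = visible_config_get(configs, ["*"])
explicit_reply = visible_config_get(configs, ["watchdog-period"])
```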
leishiao	08ed44d722	improvement of a test in unit/pause.tcl (#9868 ) Co-authored-by: xiaolei <xiaolei@91jkys.com>	2021-12-07 17:41:11 -08:00
yoav-steinberg	1736fa4d22	Don't write oom score adj to proc unless we're managing it. (#9904 ) When disabling redis oom-score-adj management we restore the base value read before enabling oom-score-adj management. This fixes an issue introduced in #9748 where updating `oom-score-adj-values` while `oom-score-adj` was set to `no` would write the base oom score adj value read on startup to `/proc`. This is a bug since while `oom-score-adj` is disabled we should never write to proc and let external processes manage it. Added appropriate tests.	2021-12-07 16:05:51 +02:00
Binbin	b947049f85	Fix timing issue in logging.tcl with FreeBSD (#9910 ) A test failure was reported in Daily CI. `Crash report generated on SIGABRT` with FreeBSD. ``` *** [err]: Crash report generated on SIGABRT in tests/integration/logging.tcl Expected [string match crashed by signal ### Starting...(logs) in tests/integration/logging.tcl] ``` It looks like `tail -1000` was executed too early, before all the crash logs were printed. We can give it a few more chances by using `wait_for_log_messages`. Other changes: 1. In `Server is able to generate a stack trace on selected systems`, use `wait_for_log_messages` to reduce the lines of code. And if it fails, there are more detailed logs that can be printed. 2. In `Crash report generated on DEBUG SEGFAULT`, we also use `wait_for_log_messages` to avoid possible timing issues.	2021-12-07 12:02:58 +02:00
sundb	1808618f5d	Sanitize dump payload: fix invalid listpack entry start with EOF (#9889 ) When an invalid listpack entry starts with EOF, we will skip it when we verify it in the loop.	2021-12-04 16:43:08 +02:00
Oran Agra	64f6159646	Merge Redis Functions PR (#9780 ) # Redis Function This PR added the Redis Functions capabilities that were suggested on #8693. The PR also introduce a big refactoring to the current Lua implementation (i.e `scripting.c`). The main purpose of the refactoring is to have better code sharing between the Lua implementation that exists today on Redis (`scripting.c`) and the new Lua engine that is introduced on this PR. The refactoring includes code movements and file name changes as well as some logic changes that need to be carefully reviewed. To make the review easier, the PR was split into multiple commits. Each commit is deeply described later on but the main concept is that some commits are just moving code around without making any logical changes, those commits are less likely to cause any issues or regressions and can be reviewed fast. Other commits, which perform code and logic changes, need to be reviewed carefully, but those commits were created after the code movements so it's pretty easy to see what was changed. To sum up, it is highly recommended to review this PR commit by commit as it will be easier to see the changes, it is also recommended to read each commit description (written below) to understand what was changed on the commit and whether or not it's just a huge code movement or a logic changes. ## Terminology Currently, the terminology in Redis is not clearly defined. Scripts refer to Lua scripts and eval also refers only to Lua. Introducing Redis Function requires redefining those terms to be able to clearly understand what is been discussed on each context. * eval - legacy Lua script implementation. * Function - new scripting implementation (currently implemented in Lua but in the future, it might be other languages like javascript). * Engine - the component that is responsible for executing functions. 
* Script - Function or legacy Lua (executed with `eval` or `evalsha`) ## Refactoring New Structure Today, the entire scripting logic is located on `scripting.c`. This logic can be split into 3 main groups: 1. Script management - responsible for storing the scripts that were sent to Redis and retrieving them when they need to be run (base on the script sha on the current implementation). 2. Script invocation - invoke the script given on `eval` or `evalsha` command (this part includes finding the relevant script, preparing the arguments, ..) 3. Interact back with Redis (command invocation) Those 3 groups are tightly coupled on `scripting.c`. Redis Functions also need to use those groups logics, for example, to interact back with Redis or to execute Lua code. The refactoring attempts to split those 3 groups and define APIs so that we can reuse the code both on legacy Lua scripts and Redis Functions. In order to do so we define the following units: 1. script.c: responsible for interaction with Redis from within a script. 2. script_lua.c: responsible to execute Lua code, uses `script.c` to interact with Redis from within the Lua code. 3. function_lua.c: contains the Lua engine implementation, uses `script_lua.c` to execute the Lua code. 4. functions.c: Contains Redis Functions implementation (`FUNCTION` command,), uses `functions_lua.c` if the function it wants to invoke needs the Lua engine. 4. eval.c: the original `scripting.c` contains the Lua legacy implementation and was refactored to use `script_lua.c` to invoke the Lua code. ## Commits breakdown Notice: Some small commits are omitted from this list as they are small and insignificant (for example build fixes) ### First commit - code movements This commit rename `scripting.c` -> `eval.c` and introduce the new `script_lua.c` unit. 
The commit moves relevant code from `eval.c` (`scripting.c`) to `script_lua.c`, the purpose of moving the code is so that later we will be able to re-use the code on the Lua engine (`function_lua.c`). The commit only moves the code without modifying even a single line, so there is a very low risk of breaking anything and it also makes it much easier to see the changes on the following commits. Because the commit does not change the code (only moves it), it does not compile. But we do not care about it as the only purpose here is to make the review processes simpler. ### Second commit - move legacy Lua variables into `eval.c` Today, all Lua-related variables are located on the server struct. The commit attempt to identify those variable and take them out from the server struct, leaving only script related variables (variables that later need to be used also by engines) The following variable where renamed and left on the server struct: * lua_caller -> script_caller * lua_time_limit -> script_time_limit * lua_timedout -> script_timedout * lua_oom -> script_oom * lua_disable_deny_script -> script_disable_deny_script * in_eval -> in_script The following variables where moved to lctx under eval.c * lua * lua_client * lua_cur_script * lua_scripts * lua_scripts_mem * lua_replicate_commands * lua_write_dirty * lua_random_dirty * lua_multi_emitted * lua_repl * lua_kill * lua_time_start * lua_time_snapshot This commit is in a low risk of introducing any issues and it is just moving variables around and not changing any logic. ### Third commit - introducing script unit This commit introduces the `script.c` unit. Its purpose (as described above) is to provide an API for scripts to interact with Redis. Interaction includes mostly executing commands, but also other functionalities. The interaction is done using a `ScriptRunCtx` object that needs to be created by the user and initialized using `scriptPrepareForRun`. A detailed list of functionalities expose by the unit: 1. 
Calling commands (including all the validation checks such as acl, cluster, read only run, ...) 2. Set Resp 3. Set Replication method (AOF/REPLICATION/NONE) 4. Call Redis back on long-running scripts to allow Redis to reply to clients and perform script kill The commit introduces the new unit and uses it on eval commands to interact with Redis. ### Fourth commit - Moved functionality of invoke Lua code to `script_lua.c` This commit moves the logic of invoking the Lua code into `script_lua.c` so later it can be used also by Lua engine (`function_lua.c`). The code is located on `callFunction` function and assumes the Lua function is already located on the top of the Lua stack. This commit also changes `eval.c` to use the new functionality to invoke Lua code. ### Fifth commit - Added Redis Functions unit (`functions.c`) and Lua engine (`function_lua.c`) Added Redis Functions unit under `functions.c`, included: 1. FUNCTION command: * FUNCTION CREATE * FUNCTION CALL * FUNCTION DELETE * FUNCTION KILL * FUNCTION INFO * FUNCTION STATS 2. Register engines In addition, this commit introduces the first engine that uses the Redis Functions capabilities, the Lua engine (`function_lua.c`) ## API Changes ### `lua-time-limit` configuration was renamed to `script-time-limit` (keep `lua-time-limit` as alias for backward compatibility). ### Error log changes When integrating with Redis from within a Lua script, the `Lua` term was removed from all the error messages and instead we write only `script`. For example: `Wrong number of args calling Redis command From Lua script` -> `Wrong number of args calling Redis command From script` ### `info memory` changes: Before stating all the changes made to memory stats we will try to explain the reason behind them and what we want to see on those metrics: * memory metrics should show both totals (for all scripting frameworks), as well as a breakdown per framework / vm. * The totals metrics should have "human" metrics while the breakdown shouldn't. 
* We did try to maintain backward compatibility in some way, that said we did make some repurpose to existing metrics where it looks reasonable. * We separate between memory used by the script framework (part of redis's used_memory), and memory used by the VM (not part of redis's used_memory) A full breakdown of `info memory` changes: * `used_memory_lua` and `used_memory_lua_human` was deprecated, `used_memory_vm_eval` has the same meaning as `used_memory_lua` * `used_memory_scripts` was renamed to `used_memory_scripts_eval` * `used_memory_scripts` and `used_memory_scripts_human` were repurposed and now return the total memory used by functions and eval (not including vm memory, only code cache, and structs). * `used_memory_vm_function` was added and represents the total memory used by functions vm's * `used_memory_functions` was added and represents the total memory by functions (not including vm memory, only code cache, and structs) * `used_memory_vm_total` and `used_memory_vm_total_human` was added and represents the total memory used by vm's (functions and eval combined) ### `functions.caches` `functions.caches` field was added to `memory stats`, representing the memory used by engines that are not functions (this memory includes data structures like dictionaries, arrays, ...) ## New API ### FUNCTION CREATE Usage: FUNCTION CREATE `ENGINE` `NAME` `[REPLACE]` `[DESC <DESCRIPTION>]` `<CODE>` * `ENGINE` - The name of the engine to use to create the script. * `NAME` - the name of the function that can be used later to call the function using `FUNCTION CALL` command. * `REPLACE` - if given, replace the given function with the existing function (if exists). * `DESCRIPTION` - optional argument describing the function and what it does * `CODE` - function code. The command will return `OK` if created successfully or error in the following cases: * The given engine name does not exist * The function name is already taken and `REPLACE` was not used. 
* The given function failed on the compilation. ### FCALL and FCALL_RO Usage: FCALL/FCALL_RO `NAME` `NUM_KEYS key1 key2` … ` arg1 arg2` Call and execute the function specified by `NAME`. The function will receive all arguments given after `NUM_KEYS`. The return value from the function will be returned to the user as a result. * `NAME` - Name of the function to run. * The rest is as today with EVALSHA command. The command will return an error in the following cases: * `NAME` does not exist * The function itself returned an error. The `FCALL_RO` is equivalent to `EVAL_RO` and allows only read-only commands to be invoked from the script. ### FUNCTION DELETE Usage: FUNCTION DELETE `NAME` Delete a function identified by `NAME`. Return `OK` on success or error on one of the following: * The given function does not exist ### FUNCTION INFO Usage: FUNCTION INFO `NAME` [WITHCODE] Return information about a function by function name: * Function name * Engine name * Description * Raw code (only if WITHCODE argument is given) ### FUNCTION LIST Usage: FUNCTION LIST Return general information about all the functions: * Function name * Engine name * Description ### FUNCTION STATS Usage: FUNCTION STATS Return information about the current running function: * Function name * Command that was used to invoke the function * Duration in MS that the function is already running If no function is currently running, this section is just a RESP nil. Additionally, return a list of all the available engines. ### FUNCTION KILL Usage: `FUNCTION KILL` Kill the currently executing function. The command will fail if the function already initiated a write command. ## Notes Note: Function creation/deletion is replicated to AOF but AOFRW is not implemented since it's going to be removed: #9794	2021-12-02 21:41:58 +02:00
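The usage strings in the commit above can be turned into argument vectors as sketched below. This follows the syntax as written in this commit message only; the syntax in released versions may differ. The helper names, the function name `myget`, and the one-line Lua body are hypothetical.

```python
def function_create_args(engine, name, code, replace=False, description=None):
    """Build: FUNCTION CREATE ENGINE NAME [REPLACE] [DESC <DESCRIPTION>] <CODE>."""
    args = ["FUNCTION", "CREATE", engine, name]
    if replace:
        args.append("REPLACE")
    if description is not None:
        args += ["DESC", description]
    args.append(code)  # the code is always the last argument
    return args


def fcall_args(name, keys, args, read_only=False):
    """Build: FCALL/FCALL_RO NAME NUM_KEYS key1 key2 ... arg1 arg2."""
    command = "FCALL_RO" if read_only else "FCALL"
    return [command, name, str(len(keys)), *keys, *args]


# Hypothetical function body and names, for illustration only.
create = function_create_args(
    "LUA", "myget", "return redis.call('GET', KEYS[1])",
    replace=True, description="read one key",
)
call = fcall_args("myget", ["k1"], [], read_only=True)
```

A client library would send each list as one command; the key count precedes the keys in the same way as with `EVALSHA`.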
meir@redislabs.com	cbd463175f	Redis Functions - Added redis function unit and Lua engine Redis function unit is located inside functions.c and contains Redis Function implementation: 1. FUNCTION commands: * FUNCTION CREATE * FCALL * FCALL_RO * FUNCTION DELETE * FUNCTION KILL * FUNCTION INFO 2. Register engine In addition, this commit introduce the first engine that uses the Redis Function capabilities, the Lua engine.	2021-12-02 19:35:52 +02:00
Binbin	e57a4db5d7	Fix CONFIG SET test failures in MacOS/FreeBSD (#9881 ) After the introduction of `Multiparam config set` in #9748, two test cases failed. ``` [exception]: Executing test client: ERR Config set failed - Failed to set current oom_score_adj. Check server logs.. ERR Config set failed - Failed to set current oom_score_adj. Check server logs. ``` The `CONFIG sanity` test failed on the `config set oom-score-adj-values`, which is a "special" config that does not catch no-op changes. It then updates `oom-score-adj`, which is not supported on MacOS. We solve it by adding `oom-score` to the `skip_configs` list. ``` ** [err]: CONFIG SET rollback on apply error in tests/unit/introspection.tcl Expected an error but nothing was caught ``` The `CONFIG SET rollback on apply error` test failed on the `config set port $used_port`. In theory, it should throw the error `Unable to listen on this port*`, but it did not on MacOS. We solve it by adding `-myaddr 127.0.0.1` to the socket call.	2021-12-02 18:18:18 +02:00
meir@redislabs.com	fc731bc67f	Redis Functions - Introduce script unit. Script unit is a new unit located on script.c. Its purpose is to provide an API for functions (and eval) to interact with Redis. Interaction includes mostly executing commands, but also functionalities like calling Redis back on long scripts or checking if the script was killed. The interaction is done using a scriptRunCtx object that needs to be created by the user and initialized using scriptPrepareForRun. Detailed list of functionalities exposed by the unit: 1. Calling commands (including all the validation checks such as acl, cluster, read only run, ...) 2. Set Resp 3. Set Replication method (AOF/REPLICATION/NONE) 4. Call Redis back on long running scripts to allow Redis to reply to clients and perform script kill The commit introduces the new unit and uses it on eval commands to interact with Redis.	2021-12-01 23:54:23 +02:00
yoav-steinberg	0e5b813ef9	Multiparam config set (#9748 ) We can now do: `config set maxmemory 10m repl-backlog-size 5m` ## Basic algorithm to support "transaction like" config sets: 1. Backup all relevant current values (via get). 2. Run "verify" and "set" on everything, if we fail run "restore". 3. Run "apply" on everything (optional optimization: skip functions already run). If we fail run "restore". 4. Return success. ### restore 1. Run set on everything in backup. If we fail log it and continue (this puts us in an undefined state but we decided it's better than the alternative of panicking). This indicates either a bug or some unsupported external state. 2. Run apply on everything in backup (optimization: skip functions already run). If we fail log it (see comment above). 3. Return error. ## Implementation/design changes: * Apply function are idempotent (have no effect if they are run more than once for the same config). * No indication in set functions if we're reading the config or running from the `CONFIG SET` command (removed `update` argument). * Set function should set some config variable and assume an (optional) apply function will use that later to apply. If we know this setting can be safely applied immediately and can always be reverted and doesn't depend on any other configuration we can apply immediately from within the set function (and not store the setting anywhere). This is the case of this `dir` config, for example, which has no apply function. No apply function is need also in the case that setting the variable in the `server` struct is all that needs to be done to make the configuration take effect. Note that the original concept of `update_fn`, which received the old and new values was removed and replaced by the optional apply function. * Apply functions use settings written to the `server` struct and don't receive any inputs. 
* I take care that for the generic (non-special) configs if there's no change I avoid calling the setter (possible optimization: avoid calling the apply function as well). * Passing the same config parameter more than once to `config set` will fail. You can't do `config set my-setting value1 my-setting value2`. Note that getting `save` in the context of the conf file parsing to work here as before was a pain. The conf file supports an aggregate `save` definition, where each `save` line is added to the server's save params. This is unlike any other line in the config file where each line overwrites any previous configuration. Since we now support passing multiple save params in a single line (see top comments about `save` in https://github.com/redis/redis/pull/9644) we should deprecate the aggregate nature of this config line and perhaps reduce this ugly code in the future.	2021-12-01 10:15:11 +02:00
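The "transaction like" algorithm from the commit above (backup, verify and set, apply, restore on failure) can be sketched roughly as follows. This is a simplified Python model rather than the server code: `ConfigError`, `config_set_many`, the `handlers` layout, and the simulated busy port are all assumptions made for illustration.

```python
class ConfigError(Exception):
    pass


def config_set_many(store, handlers, updates):
    """Set several configs atomically; handlers[name] may hold 'verify'/'apply'."""
    names = [name for name, _ in updates]
    if len(set(names)) != len(names):
        raise ConfigError("duplicate parameter")
    backup = {name: store[name] for name in names}  # step 1: backup

    def run_apply():
        done = set()  # each apply function runs at most once
        for name in names:
            apply_fn = handlers[name].get("apply")
            if apply_fn is not None and apply_fn not in done:
                done.add(apply_fn)
                if not apply_fn(store):
                    return False
        return True

    def restore():
        store.update(backup)
        run_apply()  # best effort; a failure here would only be logged

    for name, value in updates:  # step 2: verify and set
        verify = handlers[name].get("verify")
        if verify is not None and not verify(value):
            restore()
            raise ConfigError("invalid value for " + name)
        store[name] = value
    if not run_apply():  # step 3: apply
        restore()
        raise ConfigError("apply failed")
    return "OK"  # step 4


# Hypothetical handlers: applying port 7000 fails, as if it were in use.
handlers = {
    "maxmemory": {},
    "port": {"verify": lambda v: 0 < v < 65536,
             "apply": lambda s: s["port"] != 7000},
}
store = {"maxmemory": 0, "port": 6379}
first = config_set_many(store, handlers, [("maxmemory", 100), ("port", 6380)])
try:
    config_set_many(store, handlers, [("maxmemory", 200), ("port", 7000)])
    rolled_back = False
except ConfigError:
    rolled_back = True
```

After the failed second call the store holds the values from the first call again, including `maxmemory`, which itself had no apply step.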
Itamar Haber	21aa1d4b91	Adds auto-seq-only-generation via `XADD ... <ms>-*` (#9217 ) Adds the ability to autogenerate the sequence part of the millisecond-only explicit ID specified for `XADD`. This is useful in case added entries have an externally-provided timestamp without sub-millisecond resolution.	2021-11-30 19:56:39 +02:00
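One way to picture the `<ms>-*` form is the sketch below, which picks the sequence part for a caller-supplied millisecond value. The rule shown (restart at 0 for a newer millisecond, increment within the same millisecond, reject an older one) is an assumption about the behaviour, not a copy of the implementation, and it ignores sequence overflow. `next_stream_id` is a hypothetical helper.

```python
def next_stream_id(last_id, ms):
    """Choose the sequence part for an explicit millisecond timestamp.

    last_id is the (ms, seq) pair of the stream's top entry.
    """
    last_ms, last_seq = last_id
    if ms > last_ms:
        return (ms, 0)  # first entry in a new millisecond
    if ms == last_ms:
        return (ms, last_seq + 1)  # same millisecond: bump the sequence
    raise ValueError("ID would be smaller than the stream's top entry")


same_ms = next_stream_id((5, 3), 5)
newer_ms = next_stream_id((5, 3), 6)
```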
Wen Hui	2afa41f628	Sentinel master reboot fix (#9438 ) Add master-reboot-down-after-period as a configurable parameter, to make it possible to trigger a failover from a master that is responding with `-LOADING` for a long time after being restarted.	2021-11-30 18:46:15 +02:00
Meir Shpilraien (Spielrein)	b8e82d205b	Swap '\r\n' with spaces when returning a big number reply from Lua script. (#9870 ) The issue can only happen with a bad Lua script that claims to return a big number while actually returning data which is not a big number (contains chars that are not digits). Such a thing will not cause an issue unless the big number value contains `\r\n`, in which case it breaks the resp3 structure. The fix replaces all the appearances of '\r\n' with spaces. Such an issue can also happen on simple string or error replies but those already handle it the same way this PR does (replace `\r\n` with spaces). Other reply types are not vulnerable to this issue because they do not rely on free text that is terminated with `\r\n` (either they contain the bulk length, like a string reply, or they are typed replies that cannot inject free text, like boolean or number). The issue only exists on the unstable branch; big number reply on Lua script was not yet added to any official release.	2021-11-30 12:27:05 +02:00
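The effect of the fix above can be sketched as below: line terminators inside the payload are replaced with spaces before the reply is framed, so only the final terminator remains. Replacing each CR and LF character individually is an assumption about the exact replacement, and both helpers are hypothetical.

```python
def sanitize_line_payload(payload):
    """Replace CR and LF so the payload cannot end the reply line early."""
    return payload.replace("\r", " ").replace("\n", " ")


def big_number_reply(payload):
    """Frame a RESP3 big number: '(' + digits + CRLF."""
    return "(" + sanitize_line_payload(payload) + "\r\n"


# A misbehaving script tries to smuggle a second reply into the payload.
reply = big_number_reply("123\r\n+OK")
```

The framed reply contains exactly one `\r\n`, at the end, so the smuggled `+OK` stays inside the payload instead of being parsed as a separate reply.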
Binbin	3119a3aeb5	Fix CLIENT KILL kill all clients with id 0 (#9853 ) * Fix CLIENT KILL kill all clients with id 0 or with skipme CLIENT KILL with ID argument should only kill the client with the provided ID. In old code, CLIENT KILL with id 0 will kill all the connected clients. Co-authored-by: Ofir Luzon <ofirluzon@gmail.com>	2021-11-29 13:35:36 -08:00
leishiao	d56ded89c5	improvement of a blocking xread test (#9859 ) This test relies on that `XREAD BLOCK 20000 STREAMS s1{t} s2{t} s3{t} $ $ $` is executed by redis before `XADD s2{t} * new abcd1234`. A ` wait_for_blocked_client` is needed between the two to ensure the order, otherwise `XADD s2{t} * new abcd1234` might be executed first due to network delay causing a test failure. Co-authored-by: xiaolei <xiaolei@91jkys.com>	2021-11-29 09:57:21 +02:00
sundb	494ee2f1fc	Fix abnormal compression due to out-of-control recompress (#9849 ) This pr is following #9779 . ## Description of the feature Now when we turn on the `list-compress-depth` configuration, the list will compress the ziplist between `[list-compress-depth, -list-compress-depth]`. When we need to use the compressed data, we will first decompress it, then use it, and finally compress it again. It's controlled by `quicklistNode->recompress`, which is designed to avoid the need to re-traverse the entire quicklist for compression after each decompression; we only need to recompress the quicklistNode being used. In order to ensure the correctness of recompressing, we should normally let quicklistDecompressNodeForUse and quicklistCompress appear in pairs, otherwise, it may lead to the head and tail being compressed or the middle ziplist not being compressed correctly, which is exactly the problem this pr needs to solve. ## Solution 1. Reset `quicklistIter` after insert and replace. The quicklist node will be compressed in `quicklistInsertAfter`, `quicklistInsertBefore`, `quicklistReplaceAtIndex`, so we can safely reset the quicklistIter to avoid it being used again 2. `quicklistIndex` will return an iterator that can be used to recompress the current node after use. ## Test 1. In the `Stress Tester for #3343-Similar Errors` test, when the server crashes or when `valgrind` or `asan` error is detected, print violating commands. 2. Add a crash test due to wrongly recompressing after `lrem`. 3. Remove `insert before with 0 elements` and `insert after with 0 elements`. Now we forbid any operation on a NULL quicklistIter.	2021-11-29 07:57:01 +02:00
Binbin	8759c1e14b	Improve stability in some blocking command tests (#9856 ) In order to test the situation where multiple clients are blocked, we set up multiple clients to execute some blocking commands. These tests depend on the order of command processing. Those tests are based on the assumption that the command sent first will be executed by the server first, which does not hold under some network delays. This commit ensures orderly execution of commands by waiting and checking the number of blocked clients each time. Fix #9850	2021-11-28 15:37:35 +02:00
Meir Shpilraien (Spielrein)	6b0b04f1b2	Clean Lua stack before parsing call reply to avoid crash on a call with many arguments (#9809 ) This commit `0f8b634cd` (CVE-2021-32626 released in 6.2.6, 6.0.16, 5.0.14) fixes an invalid memory write issue by using `lua_checkstack` API to make sure the Lua stack does not overflow. This fix was added in 3 places: 1. `luaReplyToRedisReply` 2. `ldbRedis` 3. `redisProtocolToLuaType` On the first 2 functions, `lua_checkstack` is handled gracefully while the last is handled with an assert and a statement that this situation cannot happen (only with a misbehaving module): > the Redis reply might be deep enough to explode the LUA stack (notice that currently there is no such command in Redis that returns such a nested reply, but modules might do it) The issue that was discovered is that user arguments are also considered part of the stack, and so the following script (for example) makes the assertion reachable: ``` local a = {} for i=1,7999 do a[i] = 1 end return redis.call("lpush", "l", unpack(a)) ``` This is a regression because such a script would have worked before and now it crashes Redis. The solution is to clear the function arguments from the Lua stack which makes the original assumption true and the assertion unreachable.	2021-11-28 11:59:39 +02:00
Viktor Söderqvist	acf3495eb8	Sort out the mess around writable replicas and lookupKeyRead/Write (#9572 ) Writable replicas now no longer use the values of expired keys. Expired keys are deleted when lookupKeyWrite() is used, even on a writable replica. Previously, writable replicas could use the value of an expired key in write commands such as INCR, SUNIONSTORE, etc.. This commit also sorts out the mess around the functions lookupKeyRead() and lookupKeyWrite() so they now indicate what we intend to do with the key and are not affected by the command calling them. Multi-key commands like SUNIONSTORE, ZUNIONSTORE, COPY and SORT with the store option now use lookupKeyRead() for the keys they're reading from (which will not allow reading from logically expired keys). This commit also fixes a bug where PFCOUNT could return a value of an expired key. Test modules commands have their readonly and write flags updated to correctly reflect their lookups for reading or writing. Modules are not required to correctly reflect this in their command flags, but this change is made for consistency since the tests serve as usage examples. Fixes #6842. Fixes #7475.	2021-11-28 11:26:28 +02:00
sundb	4d8700786e	Fix COMMAND GETKEYS on LCS (#9852 ) Remove lcsGetKeys to clean up the remaining STRALGO after #9733. i.e. it still used a getkeys_proc which was still looking for the KEYS or STRINGS arguments	2021-11-28 09:02:38 +02:00
sundb	4512905961	Replace ziplist with listpack in quicklist (#9740 ) Part three of implementing #8702, following #8887 and #9366 . ## Description of the feature 1. Replace the ziplist container of quicklist with listpack. 2. Convert existing quicklist ziplists on RDB loading time. an O(n) operation. ## Interface changes 1. New `list-max-listpack-size` config is an alias for `list-max-ziplist-size`. 2. Replace `debug ziplist` command with `debug listpack`. ## Internal changes 1. Add `lpMerge` to merge two listpacks . (same as `ziplistMerge`) 2. Add `lpRepr` to print info of listpack which is used in debugCommand and `quicklistRepr`. (same as `ziplistRepr`) 3. Replace `QUICKLIST_NODE_CONTAINER_ZIPLIST` with `QUICKLIST_NODE_CONTAINER_PACKED`(following #9357 ). It represent that a quicklistNode is a packed node, as opposed to a plain node. 4. Remove `createZiplistObject` method, which is never used. 5. Calculate listpack entry size using overhead overestimation in `quicklistAllowInsert`. We prefer an overestimation, which would at worse lead to a few bytes below the lowest limit of 4k. ## Improvements 1. Calling `lpShrinkToFit` after converting Ziplist to listpack, which was missed at #9366. 2. Optimize `quicklistAppendPlainNode` to avoid memcpy data. ## Bugfix 1. Fix crash in `quicklistRepr` when ziplist is compressed, introduced from #9366. ## Test 1. Add unittest for `lpMerge`. 2. Modify the old quicklist ziplist corrupt dump test. Co-authored-by: Oran Agra <oran@redislabs.com>	2021-11-24 13:34:13 +02:00
Binbin	fb4f7be22c	Wait for `async_loading` to stop in `short read` test (#9841 ) Since #9323, when `repl-diskless-load` is enabled and set to `swapdb`, if the master replication ID hasn't changed, we can load the data-set asynchronously and serve read commands during the full resync. In the `diskless loading short read` test, after a successful load, we wait for the loading to stop and continue the for loop. After the introduction of `async_loading`, we also need to check it. Otherwise the next loop will start too soon, which may trigger a timing issue.	2021-11-24 12:46:43 +02:00
Binbin	9273d09dd4	Add tests to cover EXPIRE overflow fix (#9839 ) In #8287, some overflow checks were added. But when `when *= 1000` overflows, it becomes a positive number, and the check is not able to catch it. The key is added with a short expiration time and is deleted a few seconds later. #9601 checks for the overflow after the `*=` and returns an error first, avoiding this situation. This commit adds some tests to cover those code paths. Found in #9825; this closes it.	2021-11-24 09:39:23 +02:00
Oran Agra	a3a014294f	fix invalid read on corrupt ziplist (#9831 ) If the last bytes in ziplist are corrupt and we decode from tail to head, we may reach slightly outside the ziplist.	2021-11-23 14:56:52 +02:00
guybe7	b161cff5f9	QUIT is a command, HOST: and POST are not (#9798 ) Some people complain that QUIT is missing from help/command table. Not appearing in COMMAND command, command stats, ACL, etc. and instead, there's a hack in processCommand with a comment that looks outdated. Note that it is [documented](https://redis.io/commands/quit) At the same time, HOST: and POST are there in the command table although these are not real commands. They would appear in the COMMAND command, and even in commandstats. Other changes: 1. Initialize the static logged_time static var in securityWarningCommand 2. add `no-auth` flag to RESET so it can always be executed.	2021-11-23 10:38:25 +02:00
Oran Agra	f07dedf73f	Fix invalid access in lpFind on corrupted listpack (#9819 ) Issue found by corrupt-dump-fuzzer test with ASAN. The problem was that lpSkip and lpGetWithSize could read the next listpack entry without validating that it's in range. Similarly even the memcmp in lpFind could do that and possibly crash on segfault and now they'll crash on assert first. The naive fix of using lpAssertValidEntry every time, resulted in 30% degradation in the lpFind benchmark of the unit test. The final fix with the condition at the bottom has no performance implications.	2021-11-22 15:30:00 +02:00
Oran Agra	f00a8ad93c	fix string escaping in corrupt-dump test to support TCL8.5 (#9824 ) TCL8.5 can't handle cases where part of the string is escaped and part of it isn't, if there's a single char that needs escaping, we need to escape the whole string.	2021-11-22 12:30:06 +02:00
Binbin	698b577413	Fix timing issue in sub-second expires test (#9821 ) The `PEXPIRE/PSETEX/PEXPIREAT can set sub-second expires` test is a very time sensitive test, it used to occasionally fail on MacOS. It performs three internal tests in a loop; as long as one fails, it tries to execute them again in the next loop. oranagra suggested that we can split it into three individual tests, so that if one fails, we do not need to retry the others. And maybe it will increase the chances of success dramatically. Each is executed 500 times, and the number of retries is collected: ``` PSETEX, total: 500, sum: 745, min: 0, max: 13, avg: 1.49 PEXPIRE, total: 500, sum: 575, min: 0, max: 16, avg: 1.15 PEXPIREAT, total: 500, sum: 0, min: 0, max: 0, avg: 0.0 ALL(old_way), total: 500, sum: 8090, min: 0, max: 138, avg: 16.18 ``` And we can see the threshold is very low. Splitting the test also makes the code easier to maintain. Co-authored-by: Oran Agra <oran@redislabs.com>	2021-11-22 08:39:27 +02:00
Oran Agra	183b90a625	Fix false positive leak reported by GCC ASAN (#9816 ) Leak found by the corrupt-dump-fuzzer when using GCC ASAN, which seems to falsely report leaks on pointers kept only on the stack when calling exit. Instead we now use _exit on panic / assert to skip these leak checks. Additionally, check for sanitizer warnings in the corrupt-dump-fuzzer between iterations, so that when something is found we know which test to relate it to (and it prints the reproduction command list)	2021-11-21 18:47:10 +02:00
Oran Agra	1417648469	Prevent LCS from allocating temp memory over proto-max-bulk-len (#9817 ) LCS can allocate an immense amount of memory (sizes of two inputs multiplied by each other). In the past this caused some possible security issues due to overflows, which we solved and also added use of `trymalloc` to return "Insufficient memory" instead of OOM panic zmalloc. But in case overcommit is enabled, it could be that we won't get the OOM panic, and zmalloc will succeed, and then we can get OOM killed by the kernel. The solution here is to prevent LCS from allocating transient memory that's bigger than `proto-max-bulk-len` config. This config is not directly related to transient memory, but using a hard coded value as well as introducing a specific config seems wrong. This comes to solve an error in the corrupt-dump-fuzzer test that started in the daily CI see #9799	2021-11-21 14:30:20 +02:00
Oran Agra	d4e7ffb38c	Improve active defrag in jemalloc 5.2 (#9778 ) Background: Following the upgrade to jemalloc 5.2, there was a test that used to be flaky and started failing consistently (on 32bit), so we disabled it (see #9645). This is a test that i introduced in #7289 when i attempted to solve a rare stagnation problem, and it later turned out i failed to solve it, and what's more i added a test that caused it to be not so rare, and as i mentioned, now in jemalloc 5.2 it became consistent on 32bit. Stagnation can happen when all the slabs of the bin are equally utilized, so the decision to move an allocation from a relatively empty slab to a relatively full one, will never happen, and in that test all the slabs are at 50% utilization, so the defragger could just keep scanning the keyspace and not move anything. What this PR changes: * First, finally in jemalloc 5.2 we have the count of non-full slabs, so when we compare the utilization of the current slab, we can compare it to the average utilization of the non-full slabs in our bin, instead of the total average of our bin. this takes the full slabs out of the game, since they're not candidates for migration (neither source nor target). * Secondly, We add some 12% (100/8) to the decision to defrag an allocation, this is the part that aims to avoid stagnation, and it's especially important since the above mentioned change can get us closer to stagnation. * Thirdly, since jemalloc 5.2 adds sharded bins, we take into account all shards (something that's missing from the original PR that merged it), this isn't expected to make any difference since anyway there should be just one shard. How this was benchmarked. What i did was run the memefficiency test unit with `--verbose` and compare the defragger hits and misses the tests reported. 
At first, when i took into consideration only the non-full slabs, it got a lot worse (i got into stagnation, or just got a lot of misses and a lot of hits), but when i added the 10% i got back to results that were slightly better than the ones of the jemalloc 5.1 branch. i.e. full defragmentation was achieved with fewer hits (relocations), and fewer misses (keyspace scans).	2021-11-21 13:35:39 +02:00
Yossi Gottlieb	366d5101d3	Tests: add a few missing needs:debug tags. (#9806 )	2021-11-18 23:01:56 +02:00
perryitay	0c10f0e1c0	Fix crashes when list-compress-depth is used. (#9779 ) Recently we started using list-compress-depth in tests (was completely untested till now). Turns out this triggered test failures with the external mode, since the tests left the setting enabled and then it was used in other tests (specifically the fuzzer named "Stress tester for #3343-alike bugs"). This PR fixes the issue of the `recompress` flag being left set by mistake, which caused the code to later compress the head or tail nodes (which should never be compressed). The solution is to reset the recompress flag when it should have been (when it was decided not to compress). Additionally we're adding some assertions and improving the tests in order to catch other similar bugs.	2021-11-18 18:09:30 +02:00
Eduardo Semprebon	1a255e3150	Reject PING with MASTERDOWN when replica-serve-stale-data=no (#9757 ) Currently PING returns a different status when the server is not serving data, for example when `LOADING` or `BUSY`. But the same was not true for `MASTERDOWN`. This commit makes PING reply with `MASTERDOWN` when replica-serve-stale-data=no and the link to the master is down.	2021-11-18 10:53:17 +02:00
guybe7	af7489886d	Obliterate STRALGO! add LCS (which only works on keys) (#9799 ) Drop the STRALGO command, now LCS is a command of its own and it only works on keys (not input strings). The motivation is that STRALGO's syntax was really messed-up... - assumes all (future) string algorithms will take similar arguments - mixes command that takes keys and one that doesn't in the same command. - make it nearly impossible to expose the right key spec in COMMAND INFO (issues cluster clients) - hard for cluster clients to determine the key names (firstkey, lastkey, etc) - hard for ACL / flags (is it a read command?) This is a breaking change.	2021-11-18 10:47:49 +02:00
Binbin	91e77a0cfb	Fixes ZPOPMIN/ZPOPMAX wrong replies when count is 0 with non-zset (#9711 ) Moves ZPOP ... 0 fast exit path after type check to reply with WRONGTYPE. In the past it will return an empty array. Also now count is not allowed to be negative. see #9680 before: ``` 127.0.0.1:6379> set zset str OK 127.0.0.1:6379> zpopmin zset 0 (empty array) 127.0.0.1:6379> zpopmin zset -1 (empty array) ``` after: ``` 127.0.0.1:6379> set zset str OK 127.0.0.1:6379> zpopmin zset 0 (error) WRONGTYPE Operation against a key holding the wrong kind of value 127.0.0.1:6379> zpopmin zset -1 (error) ERR value is out of range, must be positive ```	2021-11-18 10:13:16 +02:00
sundb	985430b4fc	Change lzf to handle values larger than UINT32_MAX (#9776 ) Redis supports inserting data over 4GB into string (and recently for lists too, see #9357), but the LZF compression used in RDB files (see `rdbcompression` config) and in quicklist (see `list-compress-depth` config) does not support compressing/decompressing data over UINT32_MAX, which results in a corrupt rdb after compression. Internal changes: 1. Modify the `unsigned int` parameter of `lzf_compress/lzf_decompress` to `size_t`. 2. Modify the variable types in `lzf_compress` involving offsets and lengths to `size_t`. 3. Set LZF_USE_OFFSETS to 0. When LZF_USE_OFFSETS is 1, lzf stores offsets into `LZF_HSLOT` (32bit). Even on 64-bit, `LZF_USE_OFFSETS` defaults to 1, because lzf assumes that it only compresses and decompresses data smaller than UINT32_MAX. But now that we need lzf to support 64-bit, leaving `LZF_USE_OFFSETS` on would make it impossible to store 64-bit offsets or pointers. BTW, disabling LZF_USE_OFFSETS also brings a few performance improvements. Tests: 1. Add test for compress/decompress of strings larger than UINT32_MAX. 2. Add unittest for compress/decompress quicklistNode.	2021-11-16 13:12:25 +02:00
yoav-steinberg	e968d9ac58	Connection leak in external tests. (#9777 ) Two issues: 1. In many tests we simply forgot to close the connections we created, which doesn't matter for normal tests where the server is killed, but creates a leak on external server tests. 2. When calling `start_server` on external test we create a fresh connection instead of really starting a new server, but never clean it at the end.	2021-11-15 11:07:43 +02:00
Binbin	174eedce44	Tune expire test threshold. (#9775 ) I have seen this CI failure twice on MacOS: *** [err]: PEXPIRE/PSETEX/PEXPIREAT can set sub-second expires in tests/unit/expire.tcl Expected 'somevalue {} somevalue {} somevalue {}' to equal or match '{} {} {} {} somevalue {}' I did some loop test in my own daily CI, the results show that is not particularly stable. Change the threshold from 30 to 50.	2021-11-13 07:55:48 +02:00
Ozan Tezcan	b91d8b289b	Add sanitizer support and clean up sanitizer findings (#9601 ) - Added sanitizer support. `address`, `undefined` and `thread` sanitizers are available. - To build Redis with desired sanitizer : `make SANITIZER=undefined` - There were some sanitizer findings, cleaned up codebase - Added tests with address and undefined behavior sanitizers to daily CI. - Added tests with address sanitizer to the per-PR CI (smoke out mem leaks sooner). Basically, there are three types of issues : 1- Unaligned load/store : Most probably, this issue may cause a crash on a platform that does not support unaligned access. Redis does unaligned access only on supported platforms. 2- Signed integer overflow. Although, signed overflow issue can be problematic time to time and change how compiler generates code, current findings mostly about signed shift or simple addition overflow. For most platforms Redis can be compiled for, this wouldn't cause any issue as far as I can tell (checked generated code on godbolt.org). 3 -Minor leak (redis-cli), use-after-free(just before calling exit()); UB means nothing guaranteed and risky to reason about program behavior but I don't think any of the fixes here worth backporting. As sanitizers are now part of the CI, preventing new issues will be the real benefit.	2021-11-11 13:51:33 +02:00
yoav-steinberg	cd6b3d558b	Archive external redis log in external tests (#9765 ) On test failure store the external redis server logs as CI artifacts so we can review them. Write test name to server log for external server tests. This is attempted and silently fails in case the external server doesn't support it. Note that in non-external server mode we use a more robust method of writing to the log which doesn't depend on the server actually running/working. This isn't possible for external servers and is required for some complex tests which are skipped in external mode anyway. Cleanup: remove dup code.	2021-11-11 13:04:02 +02:00
Oran Agra	0927a0dd24	Try solving test timeout on freebsd CI (#9768 ) First, avoid using --accurate on the freebsd CI, we only care about systematic issues there due to being different platform, but not accuracy Secondly, when looking at the test which timed out it seems silly and outdated: - it used KEYS to attempt to trigger lazy expiry, but KEYS doesn't do that anymore. - it used some hard coded sleeps rather than waiting for things to happen and exiting ASAP	2021-11-10 19:39:26 +02:00
Oran Agra	978eadbad4	Increase test timeout in valgrind runs (#9767 ) We saw some tests sporadically time out on valgrind (namely the ones from #9323). Increasing valgrind timeout from 20 mins to 40 mins in CI. And fixing an outdated help message.	2021-11-10 19:38:58 +02:00
YaacovHazan	03406fcb6c	fix short timeout in replication short read tests (#9763 ) In both tests, "diskless loading short read" and "diskless loading short read with module", the timeout of waiting for the replica to respond to a short read and log it, is too short. Also, add --dump-logs in runtest-moduleapi for valgrind runs.	2021-11-09 22:37:18 +02:00
chendianqiang	a527c3e814	Test suite - use server socket to optimize port detection (#9663 ) Optimized port detection for tcl, use 'socket -server' instead of 'socket' to rule out ports in TIME_WAIT Co-authored-by: chendianqiang <chendianqiang@meituan.com> Co-authored-by: Oran Agra <oran@redislabs.com>	2021-11-07 13:53:57 +02:00
Eduardo Semprebon	91d0c758e5	Replica keep serving data during repl-diskless-load=swapdb for better availability (#9323 ) For diskless replication in swapdb mode, considering we already spend replica memory having a backup of current db to restore in case of failure, we can have the following benefits by instead swapping database only in case we succeeded in transferring db from master: - Avoid `LOADING` response during failed and successful synchronization for cases where the replica is already up and running with data. - Faster total time of diskless replication, because now we're moving from Transfer + Flush + Load time to Transfer + Load only. Flushing the tempDb is done asynchronously after swapping. - This could be implemented also for disk replication with similar benefits if consumers are willing to spend the extra memory usage. General notes: - The concept of `backupDb` becomes `tempDb` for clarity. - Async loading mode will only kick in if the replica is syncing from a master that has the same repl-id the one it had before. i.e. the data it's getting belongs to a different time of the same timeline. - New property in INFO: `async_loading` to differentiate from the blocking loading - Slot to Key mapping is now a field of `redisDb` as it's more natural to access it from both server.db and the tempDb that is passed around. - Because this is affecting replicas only, we assume that if they are not readonly and write commands during replication, they are lost after SYNC same way as before, but we're still denying CONFIG SET here anyways to avoid complications. 
Considerations for review: - We have many cases where server.loading flag is used and even though I tried my best, there may be cases where async_loading should be checked as well and cases where it shouldn't (would require very good understanding of whole code) - Several places that had different behavior depending on the loading flag were actually meant to just handle commands coming from the AOF client differently than ones coming from real clients, changed to check CLIENT_ID_AOF instead. Additional for Release Notes - Bugfix - server.dirty was not incremented for any kind of diskless replication, as a result it wouldn't contribute to triggering the next database SAVE - New flag for RM_GetContextFlags module API: REDISMODULE_CTX_FLAGS_ASYNC_LOADING - Deprecated RedisModuleEvent_ReplBackup. Starting from Redis 7.0, we don't fire this event. Instead, we have the new RedisModuleEvent_ReplAsyncLoad holding 3 sub-events: STARTED, ABORTED and COMPLETED. - New module flag REDISMODULE_OPTIONS_HANDLE_REPL_ASYNC_LOAD for RedisModule_SetModuleOptions to allow modules to declare they support the diskless replication with async loading (when absent, we fall back to disk-based loading). Co-authored-by: Eduardo Semprebon <edus@saxobank.com> Co-authored-by: Oran Agra <oran@redislabs.com>	2021-11-04 10:46:50 +02:00
Itamar Haber	06dd202a05	Fixes LPOP/RPOP wrong replies when count is 0 (#9692 ) Introduced in #8179, this fixes the command's replies in the 0 count edge case. [BREAKING] changes the reply type when count is 0 to an empty array (instead of nil) Moves LPOP ... 0 fast exit path after type check to reply with WRONGTYPE	2021-11-04 09:43:08 +02:00
menwen	ccf8a651f3	Retry when a blocked connection system call is interrupted by a signal (#9629 ) When repl-diskless-load is enabled, the connection is set to the blocking state. The connection may be interrupted by a signal during a system call. This would have resulted in a disconnection and possibly a reconnection loop. Co-authored-by: Oran Agra <oran@redislabs.com>	2021-11-04 09:09:28 +02:00
Oran Agra	d04f306931	Fix race condition in cluster test 22-replica-in-sync (#9721 ) there was a chance that by the time the assertion is executed, the replica already manages to reconnect. now we make sure the replica is unable to re-connect to the master. additionally, we wait for some gossip from the disconnected replica, to see that it doesn't mess things up. unrelated: fix a typo when trying to exhaust the backlog, one that didn't have any harmful implications Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>	2021-11-04 08:44:18 +02:00
perryitay	f27083a4a8	Add support for list type to store elements larger than 4GB (#9357 ) Redis lists are stored in quicklist, which is currently a linked list of ziplists. Ziplists are limited to storing elements no larger than 4GB, so when bigger items are added they're getting truncated. This PR changes quicklists so that they're capable of storing large items in quicklist nodes that are plain string buffers rather than ziplist. As part of the PR there were a few other changes in redis: 1. new DEBUG sub-commands: - QUICKLIST-PACKED-THRESHOLD - set the threshold for the node type to be plain or ziplist. default (1GB) - QUICKLIST <key> - Shows low level info about the quicklist encoding of <key> 2. rdb format change: - A new type was added - RDB_TYPE_LIST_QUICKLIST_2 . - container type (packed / plain) was added to the beginning of the rdb object (before the actual node list). 3. testing: - Tests that require over 100MB are skipped by default. a new flag was added to 'runtest' to run the large memory tests (not used by default) Co-authored-by: sundb <sundbcn@gmail.com> Co-authored-by: Oran Agra <oran@redislabs.com>	2021-11-03 20:47:18 +02:00
guybe7	f11a2d4dd7	Fix COMMAND GETKEYS on EVAL without keys (#9733 ) Add new no-mandatory-keys flag to support COMMAND GETKEYS of commands which have no mandatory keys. In the past we would have got this error: ``` 127.0.0.1:6379> command getkeys eval "return 1" 0 (error) ERR Invalid arguments specified for command ```	2021-11-03 14:38:26 +02:00
Oran Agra	d25dc08932	Solve issues with tracking test in external mode (#9726 ) The issue was that setting maxmemory to used_memory and expecting eviction is insufficient, since we need to take mem_not_counted_for_evict into consideration. This test got broken by #9166	2021-11-02 16:07:51 -07:00
Oran Agra	87321deb3f	attempt to fix tracking test issue with external tests due to lazy free (#9722 ) The External tests started failing recently for unclear reason: ``` *** [err]: Tracking invalidation message of eviction keys should be before response in tests/unit/tracking.tcl Expected '0' to be equal to 'invalidate volatile-key' (context: type eval line 21 cmd {assert_equal $res {invalidate volatile-key}} proc ::test) ``` I suspect the issue is that the used_memory sample is taken while a lazy free is still being processed.	2021-11-02 16:42:53 +02:00
menwen	d5ca72e38b	fix defrag test looking at the wrong latency metric (#9723 ) the latency event was renamed in #7726, and the outcome was that the test was ineffective (unable to measure the max latency, always seeing 0)	2021-11-02 15:52:56 +02:00
Binbin	58a1d16ff6	Fix timing issue in replication test (#9719 ) So it looks like sampling set loglines [count_log_lines -2] was executed too late, and the replication managed to complete before that. ``` *** [err]: diskless no replicas drop during rdb pipe in tests/integration/replication.tcl log message of '"Diskless rdb transfer, done reading from pipe, 2 replicas still up"' not found in ./tests/tmp/server.6124.69/stdout after line: 52 till line: 52 ``` Changes: 1. when we search the master log file, we start to search from before we sent the REPLICAOF command, to prevent a race in which the replication completed before we sampled the log line count. 2. we don't need to sample the replica loglines since it's a fresh replica that's just been started, so the message we're looking for is the first occurrence in the log, we can start the search from 0. Co-authored-by: Oran Agra <oran@redislabs.com>	2021-11-02 10:32:01 +02:00
Binbin	cea7809cea	Fix race condition in psync2-pingoff test (#9712 ) Test failed on freebsd: ``` *** [err]: Make the old master a replica of the new one and check conditions in tests/integration/psync2-pingoff.tcl Expected '162' to be equal to '176' (context: type eval line 18 cmd {assert_equal [status $R(0) master_repl_offset] [status $R(1) master_repl_offset]} proc ::test) ``` There are two possible race conditions in the test. 1. The code waits for sync_full to increment, and assumes that means the master did the fork. But in fact there are cases the master will increment that sync_full counter (after replica asks for sync), but will see that there's already a fork running and will delay the fork creation. In this case the INCR will be executed before the fork happens, so it'll not be in the command stream. Solve that by waiting for `master_link_status: up` on the replica before the INCR. 2. The repl-ping-replica-period is still high (1 second), so there's a chance the master will send an additional PING between the two calls to INFO (the line that fails is the one that samples INFO from both servers). So there's a chance one of them will have an incremented offset due to PING and the other won't have it yet. In theory we can wait for the repl_offset to match, but then we risk facing a situation where that race will hide an offset mis-match. so instead, i think we should just change repl-ping-replica-period to prevent further pings from being pushed. Co-authored-by: Oran Agra <oran@redislabs.com>	2021-11-01 16:07:08 +02:00
Oran Agra	f1f3cceb50	fix valgrind issues with long double module test (#9709 ) The module test in reply.tcl was introduced by #8521 but didn't run until recently (see #9639) and then it started failing with valgrind. This is because valgrind uses 64 bit long double (unlike most other platforms that have at least 80 bits). But besides valgrind, the tests were also incompatible with ARM32, which also uses 64 bit long doubles. We now use an appropriate value to avoid issues with either valgrind or ARM32. In all the double tests, i use 3.141, which is safe since addReplyDouble uses `%.17Lg` which is able to represent this value without adding any digits due to precision loss. In the long double, since we use `%.17Lf` in ld2string, it preserves 17 digits after the decimal point, rather than 17 significant digits (like in `%.17Lg`). So to make these similar, i use a value lower than 1 (no digits left of the period). Lastly, we have the same issue with TCL (no long doubles) so we read raw protocol in that test. Note that the only error before this fix (in both valgrind and ARM32) is this: ``` *** [err]: RM_ReplyWithLongDouble: a float reply in tests/unit/moduleapi/reply.tcl Expected '3.141' to be equal to '3.14100000000000001' (context: type eval line 2 cmd {assert_equal 3.141 [r rw.longdouble 3.141]} proc ::test) ``` so the changes to debug.c and scripting.tcl aren't really needed, but i consider them a cleanup (i.e. scripting.c validated a different constant than the one that's sent to it from debug.c). Another unrelated change is to add the RESP version to the repeated tests in reply.tcl	2021-11-01 13:41:35 +02:00


1699 Commits