redict

mirror of https://codeberg.org/redict/redict.git synced 2025-01-23 00:28:26 -05:00

Author	SHA1	Message	Date
Oran Agra	1417648469	Prevent LCS from allocating temp memory over proto-max-bulk-len (#9817) LCS can allocate an immense amount of memory (sizes of two inputs multiplied by each other). In the past this caused some possible security issues due to overflows, which we solved and also added use of `trymalloc` to return "Insufficient memory" instead of OOM panic zmalloc. But in case overcommit is enabled, it could be that we won't get the OOM panic, and zmalloc will succeed, and then we can get OOM killed by the kernel. The solution here is to prevent LCS from allocating transient memory that's bigger than `proto-max-bulk-len` config. This config is not directly related to transient memory, but using a hard coded value as well as introducing a specific config seems wrong. This comes to solve an error in the corrupt-dump-fuzzer test that started in the daily CI see #9799	2021-11-21 14:30:20 +02:00
Ozan Tezcan	b91d8b289b	Add sanitizer support and clean up sanitizer findings (#9601 ) - Added sanitizer support. `address`, `undefined` and `thread` sanitizers are available. - To build Redis with desired sanitizer : `make SANITIZER=undefined` - There were some sanitizer findings, cleaned up codebase - Added tests with address and undefined behavior sanitizers to daily CI. - Added tests with address sanitizer to the per-PR CI (smoke out mem leaks sooner). Basically, there are three types of issues : 1- Unaligned load/store : Most probably, this issue may cause a crash on a platform that does not support unaligned access. Redis does unaligned access only on supported platforms. 2- Signed integer overflow. Although, signed overflow issue can be problematic time to time and change how compiler generates code, current findings mostly about signed shift or simple addition overflow. For most platforms Redis can be compiled for, this wouldn't cause any issue as far as I can tell (checked generated code on godbolt.org). 3 -Minor leak (redis-cli), use-after-free(just before calling exit()); UB means nothing guaranteed and risky to reason about program behavior but I don't think any of the fixes here worth backporting. As sanitizers are now part of the CI, preventing new issues will be the real benefit.	2021-11-11 13:51:33 +02:00
Oran Agra	0927a0dd24	Try solving test timeout on freebsd CI (#9768 ) First, avoid using --accurate on the freebsd CI, we only care about systematic issues there due to being different platform, but not accuracy Secondly, when looking at the test which timed out it seems silly and outdated: - it used KEYS to attempt to trigger lazy expiry, but KEYS doesn't do that anymore. - it used some hard coded sleeps rather than waiting for things to happen and exiting ASAP	2021-11-10 19:39:26 +02:00
YaacovHazan	03406fcb6c	fix short timeout in replication short read tests (#9763 ) In both tests, "diskless loading short read" and "diskless loading short read with module", the timeout of waiting for the replica to respond to a short read and log it, is too short. Also, add --dump-logs in runtest-moduleapi for valgrind runs.	2021-11-09 22:37:18 +02:00
Eduardo Semprebon	91d0c758e5	Replica keep serving data during repl-diskless-load=swapdb for better availability (#9323 ) For diskless replication in swapdb mode, considering we already spend replica memory having a backup of current db to restore in case of failure, we can have the following benefits by instead swapping database only in case we succeeded in transferring db from master: - Avoid `LOADING` response during failed and successful synchronization for cases where the replica is already up and running with data. - Faster total time of diskless replication, because now we're moving from Transfer + Flush + Load time to Transfer + Load only. Flushing the tempDb is done asynchronously after swapping. - This could be implemented also for disk replication with similar benefits if consumers are willing to spend the extra memory usage. General notes: - The concept of `backupDb` becomes `tempDb` for clarity. - Async loading mode will only kick in if the replica is syncing from a master that has the same repl-id the one it had before. i.e. the data it's getting belongs to a different time of the same timeline. - New property in INFO: `async_loading` to differentiate from the blocking loading - Slot to Key mapping is now a field of `redisDb` as it's more natural to access it from both server.db and the tempDb that is passed around. - Because this is affecting replicas only, we assume that if they are not readonly and write commands during replication, they are lost after SYNC same way as before, but we're still denying CONFIG SET here anyways to avoid complications. 
Considerations for review: - We have many cases where server.loading flag is used and even though I tried my best, there may be cases where async_loading should be checked as well and cases where it shouldn't (would require very good understanding of whole code) - Several places that had different behavior depending on the loading flag where actually meant to just handle commands coming from the AOF client differently than ones coming from real clients, changed to check CLIENT_ID_AOF instead. Additional for Release Notes - Bugfix - server.dirty was not incremented for any kind of diskless replication, as effect it wouldn't contribute on triggering next database SAVE - New flag for RM_GetContextFlags module API: REDISMODULE_CTX_FLAGS_ASYNC_LOADING - Deprecated RedisModuleEvent_ReplBackup. Starting from Redis 7.0, we don't fire this event. Instead, we have the new RedisModuleEvent_ReplAsyncLoad holding 3 sub-events: STARTED, ABORTED and COMPLETED. - New module flag REDISMODULE_OPTIONS_HANDLE_REPL_ASYNC_LOAD for RedisModule_SetModuleOptions to allow modules to declare they support the diskless replication with async loading (when absent, we fall back to disk-based loading). Co-authored-by: Eduardo Semprebon <edus@saxobank.com> Co-authored-by: Oran Agra <oran@redislabs.com>	2021-11-04 10:46:50 +02:00
menwen	ccf8a651f3	Retry when a blocked connection system call is interrupted by a signal (#9629 ) When repl-diskless-load is enabled, the connection is set to the blocking state. The connection may be interrupted by a signal during a system call. This would have resulted in a disconnection and possibly a reconnection loop. Co-authored-by: Oran Agra <oran@redislabs.com>	2021-11-04 09:09:28 +02:00
perryitay	f27083a4a8	Add support for list type to store elements larger than 4GB (#9357) Redis lists are stored in quicklist, which is currently a linked list of ziplists. Ziplists are limited to storing elements no larger than 4GB, so when bigger items are added they're getting truncated. This PR changes quicklists so that they're capable of storing large items in quicklist nodes that are plain string buffers rather than ziplist. As part of the PR there were a few other changes in redis: 1. new DEBUG sub-commands: - QUICKLIST-PACKED-THRESHOLD - set the threshold for the node type to be plain or ziplist. default (1GB) - QUICKLIST <key> - Shows low level info about the quicklist encoding of <key> 2. rdb format change: - A new type was added - RDB_TYPE_LIST_QUICKLIST_2. - container type (packed / plain) was added to the beginning of the rdb object (before the actual node list). 3. testing: - Tests that require over 100MB will be skipped by default. a new flag was added to 'runtest' to run the large memory tests (not used by default) Co-authored-by: sundb <sundbcn@gmail.com> Co-authored-by: Oran Agra <oran@redislabs.com>	2021-11-03 20:47:18 +02:00
Binbin	58a1d16ff6	Fix timing issue in replication test (#9719) So it looks like sampling set loglines [count_log_lines -2] was executed too late, and the replication managed to complete before that. ``` *** [err]: diskless no replicas drop during rdb pipe in tests/integration/replication.tcl log message of '"Diskless rdb transfer, done reading from pipe, 2 replicas still up"' not found in ./tests/tmp/server.6124.69/stdout after line: 52 till line: 52 ``` Changes: 1. when we search the master log file, we start to search from before we sent the REPLICAOF command, to prevent a race in which the replication completed before we sampled the log line count. 2. we don't need to sample the replica loglines since it's a fresh replica that's just been started, so the message we're looking for is the first occurrence in the log, we can start searching from 0. Co-authored-by: Oran Agra <oran@redislabs.com>	2021-11-02 10:32:01 +02:00
Binbin	cea7809cea	Fix race condition in psync2-pingoff test (#9712 ) Test failed on freebsd: ``` *** [err]: Make the old master a replica of the new one and check conditions in tests/integration/psync2-pingoff.tcl Expected '162' to be equal to '176' (context: type eval line 18 cmd {assert_equal [status $R(0) master_repl_offset] [status $R(1) master_repl_offset]} proc ::test) ``` There are two possible race conditions in the test. 1. The code waits for sync_full to increment, and assumes that means the master did the fork. But in fact there are cases the master will increment that sync_full counter (after replica asks for sync), but will see that there's already a fork running and will delay the fork creation. In this case the INCR will be executed before the fork happens, so it'll not be in the command stream. Solve that by waiting for `master_link_status: up` on the replica before the INCR. 2. The repl-ping-replica-period is still high (1 second), so there's a chance the master will send an additional PING between the two calls to INFO (the line that fails is the one that samples INFO from both servers). So there's a chance one of them will have an incremented offset due to PING and the other won't have it yet. In theory we can wait for the repl_offset to match, but then we risk facing a situation where that race will hide an offset mis-match. so instead, i think we should just change repl-ping-replica-period to prevent further pings from being pushed. Co-authored-by: Oran Agra <oran@redislabs.com>	2021-11-01 16:07:08 +02:00
Wang Yuan	68886de085	Fix timing issue in replication buffer test (#9697 ) Introduced in #9166	2021-10-29 08:04:12 +03:00
Wang Yuan	37dc2f13b4	Fix not waiting for data loading to complete in AOF tests (#9683 ) Fix timing issue of a new test introduced in #9326	2021-10-26 14:08:09 +03:00
Wang Yuan	9ec3294b97	Add timestamp annotations in AOF (#9326) Add timestamp annotation in AOF, one part of #9325. Enabled with the new `aof-timestamp-enabled` config option. Timestamp annotation format is "#TS:${timestamp}\r\n". "TS" is short for timestamp and this method could save extra bytes in AOF. We can use timestamp annotation for some special functions. - know the executing time of commands - restore data to a specific point-in-time (by using redis-check-rdb to truncate the file)	2021-10-25 13:08:34 +03:00
Wang Yuan	c1718f9d86	Replication backlog and replicas use one global shared replication buffer (#9166) ## Background For redis master, one replica uses one copy of replication buffer, that is a big waste of memory, more replicas more waste, and allocate/free memory for every reply list also cost much. If we set client-output-buffer-limit small and write traffic is heavy, master may disconnect with replicas and can't finish synchronization with replica. If we set client-output-buffer-limit big, master may be OOM when there are many replicas that separately keep much memory. Because replication buffers of different replica client are the same, one simple idea is that all replicas only use one replication buffer, that will effectively save memory. Since replication backlog content is the same as replicas' output buffer, now we can discard replication backlog memory and use global shared replication buffer to implement replication backlog mechanism. ## Implementation I create one global "replication buffer" which contains content of replication stream. The structure of "replication buffer" is similar to the reply list that exists in every client. But the node of list is `replBufBlock`, which has `id, repl_offset, refcount` fields.
```c
/* Replication buffer blocks is the list of replBufBlock.
 *
 * +--------------+       +--------------+       +--------------+
 * | refcount = 1 |  ...  | refcount = 0 |  ...  | refcount = 2 |
 * +--------------+       +--------------+       +--------------+
 *      |                                            /       \
 *      |                                           /         \
 *      |                                          /           \
 *  Repl Backlog                               Replica_A    Replica_B
 *
 * Each replica or replication backlog increments only the refcount of the
 * 'ref_repl_buf_node' which it points to. So when replica walks to the next
 * node, it should first increase the next node's refcount, and when we trim
 * the replication buffer nodes, we remove node always from the head node which
 * refcount is 0. If the refcount of the head node is not 0, we must stop
 * trimming and never iterate the next node. */

/* Similar with 'clientReplyBlock', it is used for shared buffers between
 * all replica clients and replication backlog. */
typedef struct replBufBlock {
    int refcount;           /* Number of replicas or repl backlog using. */
    long long id;           /* The unique incremental number. */
    long long repl_offset;  /* Start replication offset of the block. */
    size_t size, used;
    char buf[];
} replBufBlock;
```
So now when we feed replication stream into replication backlog and all replicas, we only need to feed stream into replication buffer `feedReplicationBuffer`. In this function, we set some fields of replication backlog and replicas to references of the global replication buffer blocks. And we also need to check replicas' output buffer limit to free if exceeding `client-output-buffer-limit`, and trim replication backlog if exceeding `repl-backlog-size`. When sending reply to replicas, we also need to iterate replication buffer blocks and send its content, when totally sending one block for replica, we decrease current node count and increase the next current node count, and then free the block which reference is 0 from the head of replication buffer blocks. Since now we use linked list to manage replication backlog, it may cost much time for iterating all linked list nodes to find corresponding replication buffer node. So we create a rax tree to store some nodes for index, but to avoid rax tree occupying too much memory, i record one per 64 nodes for index. Currently, to make partial resynchronization as possible as much, we always let replication backlog as the last reference of replication buffer blocks, backlog size may exceeds our setting if slow replicas that reference vast replication buffer blocks, and this method doesn't increase memory usage since they share replication buffer.
To avoid freezing server for freeing unreferenced replication buffer blocks when we need to trim backlog for exceeding backlog size setting, we trim backlog incrementally (free 64 blocks per call now), and make it faster in `beforeSleep` (free 640 blocks). ### Other changes - `mem_total_replication_buffers`: we add this field in INFO command, it means the total memory of replication buffers used. - `mem_clients_slaves`: now even replica is slow to replicate, and its output buffer memory is not 0, but it still may be 0, since replication backlog and replicas share one global replication buffer, only if replication buffer memory is more than the repl backlog setting size, we consider the excess as replicas' memory. Otherwise, we think replication buffer memory is the consumption of repl backlog. - Key eviction Since all replicas and replication backlog share global replication buffer, we think only the part of exceeding backlog size the extra separate consumption of replicas. Because we trim backlog incrementally in the background, backlog size may exceeds our setting if slow replicas that reference vast replication buffer blocks disconnect. To avoid massive eviction loop, we don't count the delayed freed replication backlog into used memory even if there are no replicas, i.e. we also regard this memory as replicas's memory. - `client-output-buffer-limit` check for replica clients It doesn't make sense to set the replica clients output buffer limit lower than the repl-backlog-size config (partial sync will succeed and then replica will get disconnected). Such a configuration is ignored (the size of repl-backlog-size will be used). This doesn't have memory consumption implications since the replica client will share the backlog buffers memory. 
- Drop replication backlog after loading data if needed We always create replication backlog if server is a master, we need it because we put DELs in it when loading expired keys in RDB, but if RDB doesn't have replication info or there is no rdb, it is not possible to support partial resynchronization, to avoid extra memory of replication backlog, we drop it. - Multi IO threads Since all replicas and replication backlog use global replication buffer, if I/O threads are enabled, to guarantee data accessing thread safe, we must let main thread handle sending the output buffer to all replicas. But before, other IO threads could handle sending output buffer of all replicas. ## Other optimizations This solution resolve some other problem: - When replicas disconnect with master since of out of output buffer limit, releasing the output buffer of replicas may freeze server if we set big `client-output-buffer-limit` for replicas, but now, it doesn't cause freezing. - This implementation may mitigate reply list copy cost time(also freezes server) when one replication has huge reply buffer and another replica can copy buffer for full synchronization. now, we just copy reference info, it is very light. - If we set replication backlog size big, it also may cost much time to copy replication backlog into replica's output buffer. But this commit eliminates this problem. - Resizing replication backlog size doesn't empty current replication backlog content.	2021-10-25 09:24:31 +03:00
Oran Agra	276b460ea9	Attempt to fix a valgrind test failure due to timing (#9643 ) in the past few days i've seen two failures in the valgrind daily test. *** [err]: slave fails full sync and diskless load swapdb recovers it in tests/integration/replication.tcl Replica didn't get into loading mode can't reproduce it, but i'm hoping it's just too slow (to start loading within 5 seconds)	2021-10-18 10:45:45 +03:00
YaacovHazan	5becb7c9c6	improve the stability and correctness of "Test child sending info" (#9562 ) Since we measure the COW size in this test by changing some keys and reading the reported COW size, we need to ensure that the "dismiss mechanism" (#8974) will not free memory and reduce the COW size. For that, this commit changes the size of the keys to 512B (less than a page). and because some keys may fall into the same page, we are modifying ten keys on each iteration and check for at least 50% change in the COW size.	2021-10-04 10:32:26 +03:00
Oran Agra	5a4ab7c7d2	Fix stream sanitization for non-int first value (#9553) This was recently broken in #9321 when we validated stream IDs to be integers but did that after stepping to the next record instead of before.	2021-09-26 18:46:22 +03:00
Binbin	14d6abd8e9	Add ZMPOP/BZMPOP commands. (#9484) This is similar to the recent addition of LMPOP/BLMPOP (#9373), but zset. Syntax for the new ZMPOP command: `ZMPOP numkeys [<key> ...] MIN|MAX [COUNT count]` Syntax for the new BZMPOP command: `BZMPOP timeout numkeys [<key> ...] MIN|MAX [COUNT count]` Some background: - ZPOPMIN/ZPOPMAX take only one key, and can return multiple elements. - BZPOPMIN/BZPOPMAX take multiple keys, but return only one element from just one key. - ZMPOP/BZMPOP can take multiple keys, and can return multiple elements from just one key. Note that although ZMPOP/BZMPOP can take multiple keys, it eventually operates on just one key. And it will propagate as ZPOPMIN or ZPOPMAX with the COUNT option. As new commands, if we can not pop any elements, the response is like: - ZMPOP: Return a NIL in both RESP2 and RESP3, unlike ZPOPMIN/ZPOPMAX which return emptyarray. - BZMPOP: Return a NIL in both RESP2 and RESP3 when timeout is reached, like BZPOPMIN/BZPOPMAX. The normal response is nested arrays in RESP2 and RESP3: ``` ZMPOP/BZMPOP 1) keyname 2) 1) 1) member1 2) score1 2) 1) member2 2) score2 In RESP2: 1) "myzset" 2) 1) 1) "three" 2) "3" 2) 1) "two" 2) "2" In RESP3: 1) "myzset" 2) 1) 1) "three" 2) (double) 3 2) 1) "two" 2) (double) 2 ```	2021-09-23 08:34:40 +03:00
Oran Agra	16be742b08	fix replication test failure, probing the wrong log file (#9513 )	2021-09-19 12:07:04 +03:00
filipe oliveira	b5a879e1c2	Added URI support to redis-benchmark (cli and benchmark share the same uri-parsing methods) (#9314 ) - Add `-u <uri>` command line option to support `redis://` URI scheme. - included server connection information object (`struct cliConnInfo`), used to describe an ip:port pair, db num user input, and user:pass to avoid a large number of function arguments. - Using sds on connection info strings for redis-benchmark/redis-cli Co-authored-by: yoav-steinberg <yoav@monfort.co.il>	2021-09-14 19:45:06 +03:00
zhaozhao.zz	794442b130	PSYNC2: make partial sync possible after master reboot (#8015 ) The main idea is how to allow a master to load replication info from RDB file when rebooting, if master can load replication info it means that replicas may have the chance to psync with master, it can save much traffic. The key point is we need guarantee safety and consistency, so there are two differences between master and replica: 1. master would load the replication info as secondary ID and offset, in case other masters have the same replid. 2. when master loading RDB, it would propagate expired keys as DEL command to replication backlog, then replica can receive these commands to delete stale keys. p.s. the expired keys when RDB loading is useful for users, so we show it as `rdb_last_load_keys_expired` and `rdb_last_load_keys_loaded` in info persistence. Moreover, after load replication info, master should update `no_replica_time` in case loading RDB cost too long time.	2021-09-13 15:39:11 +08:00
sundb	3ca6972ecd	Replace all usage of ziplist with listpack for t_zset (#9366 ) Part two of implementing #8702 (zset), after #8887. ## Description of the feature Replaced all uses of ziplist with listpack in t_zset, and optimized some of the code to optimize performance. ## Rdb format changes New `RDB_TYPE_ZSET_LISTPACK` rdb type. ## Rdb loading improvements: 1) Pre-expansion of dict for validation of duplicate data for listpack and ziplist. 2) Simplifying the release of empty key objects when RDB loading. 3) Unify ziplist and listpack data verify methods for zset and hash, and move code to rdb.c. ## Interface changes 1) New `zset-max-listpack-entries` config is an alias for `zset-max-ziplist-entries` (same with `zset-max-listpack-value`). 2) OBJECT ENCODING will return listpack instead of ziplist. ## Listpack improvements: 1) Add `lpDeleteRange` and `lpDeleteRangeWithEntry` functions to delete a range of entries from listpack. 2) Improve the performance of `lpCompare`, converting from string to integer is faster than converting from integer to string. 3) Replace `snprintf` with `ll2string` to improve performance in converting numbers to strings in `lpGet()`. ## Zset improvements: 1) Improve the performance of `zzlFind` method, use `lpFind` instead of `lpCompare` in a loop. 2) Use `lpDeleteRangeWithEntry` instead of `lpDelete` twice to delete a element of zset. ## Tests 1) Add some unittests for `lpDeleteRange` and `lpDeleteRangeWithEntry` function. 2) Add zset RDB loading test. 3) Add benchmark test for `lpCompare` and `ziplsitCompare`. 4) Add empty listpack zset corrupt dump test.	2021-09-09 18:18:53 +03:00
Binbin	c50af0aeba	Add LMPOP/BLMPOP commands. (#9373) We want to add COUNT option for BLPOP. But we can't do it without breaking compatibility due to the command arguments syntax. So this commit introduces two new commands. Syntax for the new LMPOP command: `LMPOP numkeys [<key> ...] LEFT|RIGHT [COUNT count]` Syntax for the new BLMPOP command: `BLMPOP timeout numkeys [<key> ...] LEFT|RIGHT [COUNT count]` Some background: - LPOP takes one key, and can return multiple elements. - BLPOP takes multiple keys, but returns one element from just one key. - LMPOP can take multiple keys and return multiple elements from just one key. Note that although LMPOP/BLMPOP can take multiple keys, it eventually operates on just one key. And it will propagate as LPOP or RPOP with the COUNT option. As a new command, it still returns NIL if we can't pop any elements. The normal response is nested arrays in RESP2 and RESP3, like: ``` LMPOP/BLMPOP 1) keyname 2) 1) element1 2) element2 ``` I.e. unlike BLPOP that returns a key name and one element so it uses a flat array, and LPOP that returns multiple elements with no key name, and again uses a flat array, this one has to return a nested array, and it does so for both RESP2 and RESP3 (like SCAN does). For some discussion, see: #766 #8824	2021-09-09 12:02:33 +03:00
Wang Yuan	cee3d67f50	Delay to discard cached master when full synchronization (#9398 ) * Delay to discard cache master when full synchronization * Don't disconnect with replicas before loading transferred RDB when full sync Previously, once replica need to start full synchronization with master, it will discard cached master whatever full synchronization is failed or not. Now we discard cached master only when transferring RDB is finished and start to change data space, this make replica could start partial resynchronization with another new master if new master is failed during full synchronization.	2021-09-09 11:32:29 +03:00
Viktor Söderqvist	547c3405d4	Optimize quicklistIndex to seek from the nearest end (#9454) Until now, giving a negative index seeks from the end of a list and a positive seeks from the beginning. This change makes it seek from the nearest end, regardless of the sign of the given index. quicklistIndex is used by all list commands which operate by index. LINDEX key 999999 in a list of 1M elements is greatly optimized by this change. Latency is cut by 75%. LINDEX key -1000000 in a list of 1M elements, likewise. LRANGE key -1 -1 is affected by this, since LRANGE converts the indices to positive numbers before seeking. The tests for corrupt dumps are updated to make sure the corrupt data is seeked in the same direction as before.	2021-09-06 09:12:38 +03:00
Viktor Söderqvist	97dcf95cc8	redis-benchmark: improved help and warnings (#9419 ) 1. The output of --help: * On the Usage line, just write [OPTIONS] [COMMAND ARGS...] instead listing only a few arbitrary options and no command. * For --cluster, describe that if the command is supplied on the command line, the key must contain "{tag}". Otherwise, the command will not be sent to the right cluster node. * For -r, add a note that if -r is omitted, all commands in a benchmark will use the same key. Also align the description. * For -t, describe that -t is ignored if a command is supplied on the command line. 2. Print a warning if -t is present when a specific command is supplied. 3. Print all warnings and errors to stderr. 4. Remove -e from calls in redis-benchmark test suite.	2021-08-29 14:31:08 +03:00
sundb	492d8d0961	Sanitize dump payload: fix double free after insert dup nodekey to stream rax and returns 0 (#9399 )	2021-08-20 10:37:45 +03:00
Yossi Gottlieb	1d9c8d61d8	Skip OOM-related tests on incompatible platforms. (#9386 ) We only run OOM related tests on x86_64 and aarch64, as jemalloc on other platforms (notably s390x) may actually succeed very large allocations. As a result the test may hang for a very long time at the cleanup phase, iterating as many as 2^61 hash table slots.	2021-08-18 16:00:22 +03:00
sundb	02fd76b97c	Replace all usage of ziplist with listpack for t_hash (#8887 ) Part one of implementing #8702 (taking hashes first before other types) ## Description of the feature 1. Change ziplist encoded hash objects to listpack encoding. 2. Convert existing ziplists on RDB loading time. an O(n) operation. ## Rdb format changes 1. Add RDB_TYPE_HASH_LISTPACK rdb type. 2. Bump RDB_VERSION to 10 ## Interface changes 1. New `hash-max-listpack-entries` config is an alias for `hash-max-ziplist-entries` (same with `hash-max-listpack-value`) 2. OBJECT ENCODING will return `listpack` instead of `ziplist` ## Listpack improvements: 1. Support direct insert, replace integer element (rather than convert back and forth from string) 3. Add more listpack capabilities to match the ziplist ones (like `lpFind`, `lpRandomPairs` and such) 4. Optimize element length fetching, avoid multiple calculations 5. Use inline to avoid function call overhead. ## Tests 1. Add a new test to the RDB load time conversion 2. Adding the listpack unit tests. (based on the one in ziplist.c) 3. Add a few "corrupt payload: fuzzer findings" tests, and slightly modify existing ones. Co-authored-by: Oran Agra <oran@redislabs.com>	2021-08-10 09:18:49 +03:00
sundb	cbda492909	Sanitize dump payload: handle remaining empty key when RDB loading and restore command (#9349) This commit mainly fixes empty keys due to RDB loading and restore command, which was omitted in #9297. 1) When loading quicklist, if all the ziplists in the quicklist are empty, NULL will be returned. If only some of the ziplists are empty, then we will skip the empty ziplists silently. 2) When loading hash zipmap, if zipmap is empty, sanitization check will fail. 3) When loading hash ziplist, if ziplist is empty, NULL will be returned. 4) Add RDB loading test with sanitize.	2021-08-09 17:13:46 +03:00
Qu Chen	e8eeba7bee	Allow master to replicate command longer than replica's query buffer limit (#9340 ) Replication client no longer checks incoming command length against the client-query-buffer-limit. This makes the master able to replicate commands longer than replica's configured client-query-buffer-limit	2021-08-08 17:34:11 -07:00
Oran Agra	3f3f678a47	corrupt-dump-fuzzer test, avoid creating junk keys (#9302 ) The execution of the RPOPLPUSH command by the fuzzer created junk keys, that were later being selected by RANDOMKEY and modified. This also meant that lists were statistically tested more than other files. Fix the fuzzer not to pass junk key names to RPOPLPUSH, and add a check that detects that new keys are not added by the fuzzer to detect future similar issues.	2021-08-05 22:57:05 +03:00
Oran Agra	0c90370e6d	Improvements to corrupt payload sanitization (#9321 ) Recently we found two issues in the fuzzer tester: #9302 #9285 After fixing them, more problems surfaced and this PR (as well as #9297) aims to fix them. Here's a list of the fixes - Prevent an overflow when allocating a dict hashtable - Prevent OOM when attempting to allocate a huge string - Prevent a few invalid accesses in listpack - Improve sanitization of listpack first entry - Validate integrity of stream consumer groups PEL - Validate integrity of stream listpack entry IDs - Validate ziplist tail followed by extra data which start with 0xff Co-authored-by: sundb <sundbcn@gmail.com>	2021-08-05 22:56:14 +03:00
sundb	8ea777a6a0	Sanitize dump payload: fix empty keys when RDB loading and restore command (#9297 ) When we load rdb or restore command, if we encounter a length of 0, it will result in the creation of an empty key. This could either be a corrupt payload, or a result of a bug (see #8453 ) This PR mainly fixes the following: 1) When restore command will return `Bad data format` error. 2) When loading RDB, we will silently discard the key. Co-authored-by: Oran Agra <oran@redislabs.com>	2021-08-05 22:42:20 +03:00
Binbin	d0244bfc3d	Make sure execute SLAVEOF command in the right order in psync2 test. (#9316 ) The psync2 test has failed several times recently. In #9159 we only solved half of the problem. i.e. reordering of the replica that's already connected to the newly promoted master. Consider this scenario: 0 slaveof 2 1 slaveof 2 3 slaveof 2 4 slaveof 1 0 slaveof no one, became a new master got a new replid 2 slaveof 0, partial resync and got the new replid 3 reconnect 2, inherit the new replid 3 slaveof 4, use the new replid and got a full resync And another scenario: 1 slaveof 3 2 slaveof 4 3 slaveof 0 4 slaveof 0 4 slaveof no one, became a new master got a new replid 2 reconnect 4, inherit the new replid 2 slaveof 1, use the new replid and got a full resync So maybe we should reattach replicas in the right order. i.e. In the above example, if it would have reattached 1, 3 and 0 to the new chain formed by 4 before trying to attach 2 to 1, it would succeed. This commit break the SLAVEOF loop into two loops. (ideas from oran) First loop that uses random to decide who replicates from who. Second loop that does the actual SLAVEOF command. In the second loop, we make sure to execute it in the right order, and after each SLAVEOF, wait for it to be connected before we proceed. Co-authored-by: Oran Agra <oran@redislabs.com>	2021-08-05 11:26:09 +03:00
Viktor Söderqvist	1c59567a7f	redis-cli ASK redirect test: Add retry loop to fix timing issue (#9315 )	2021-08-05 08:20:30 +03:00
Wang Yuan	d4bca53cd9	Use madvise(MADV_DONTNEED) to release memory to reduce COW (#8974 ) ## Background As we know, after `fork`, a process copies pages when it writes to them (CoW) while the other process still keeps the old pages, so together they cost more memory. In Redis, we saw memory consumption grow substantially while the fork child was serializing key/values, sometimes even causing OOM. But we found that the fork child does not need to keep some of the memory that the parent process may write or update; for example, the child will never again access a key-value it has already serialized, but users may update it in the parent process. So we think COW can be reduced if the child process releases memory it no longer needs. ## Implementation To release a key-value in the child process, one might think of calling `decrRefCount` to free the memory, but I found that the fork child still uses a lot of memory even when we don't write any data to Redis, and it takes much more time, which slows down bgsave. This may be because the memory allocator doesn't really release memory to the OS, and it may modify some internal data for the free operation, especially when freeing small objects. Moreover, CoW is page-based, so a simple approach is to free only memory chunks that are no smaller than the kernel page size. madvise(MADV_DONTNEED) can quickly release the pages of a specified region to the OS, bypassing the memory allocator; the allocator still considers this memory in use and doesn't change its internal data. There are some buffers we can release in the fork child process: - Serialized key-values: the fork child process never accesses serialized key-values again, so we try to free them. Because we can only release large chunks of memory, and iterating all items/members/fields/entries of a complex data type is time-consuming, we iterate them and try to release them only when the average size of an item/member/field/entry is more than the OS page size. 
- Replication backlog: because the replication backlog is a circular buffer, it changes quickly if Redis has heavy write traffic, but the fork child process doesn't need to access it. - Client buffers: if clients send requests while the fork child process exists, the clients' buffers also change frequently. This memory includes the client query buffer, output buffer, and the memory used by the client struct. To get the child process's peak private dirty memory, we need to count peak memory instead of last used memory, because the child process may continue to release memory (since COW used to only grow till now, the last was equivalent to the peak). Also we're adding a new `current_cow_peak` info variable (to complement the existing `current_cow_size`) Co-authored-by: Oran Agra <oran@redislabs.com>	2021-08-04 23:01:46 +03:00
Oran Agra	52df350fe5	Skip new redis-cli ASK test in TLS mode (#9312 )	2021-08-03 13:19:03 -07:00
Huang Zhw	cf61ad14cc	When redis-cli received ASK, it didn't handle it (#8930 ) When redis-cli received ASK, it did the string matching incorrectly and didn't handle it. When we access a slot which is in migrating state, the server may return ASK. After redirecting to the new node, we need to send an ASKING command before retrying the command. In this PR, after redis-cli receives ASK, we send an ASKING command before sending the original command after reconnecting. Other changes: * Make redis-cli -s and -c (unix socket and cluster mode) incompatible with one another. * When sending a command fails, we avoid the 2nd reconnect retry and just print the error info. Users will decide what to do next. See #9277. * Add a test faking two redis nodes in TCL to just send ASK and OK in redis protocol to test ASK behavior. Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech> Co-authored-by: Oran Agra <oran@redislabs.com>	2021-08-02 14:59:08 +03:00
Yossi Gottlieb	68b8b45cd5	Tests: avoid short reads on redis-cli output. (#9301 ) In some cases large replies on slow systems may only be partially read by the test suite, resulting in parsing errors. This fix is still timing sensitive but should greatly reduce the chances of this happening.	2021-08-01 15:07:27 +03:00
sundb	3db0f1a284	Fix missing check for sanitize_dump in corrupt-dump-fuzzer test (#9285 ) this means the assertion that checks that when deep sanitization is enabled, there are no crashes, was missing.	2021-07-29 11:53:21 +03:00
Mikhail Fesenko	1eb4baa5b8	Direct redis-cli repl prints to stderr, because --rdb can print to stdout. fflush stdout after responses (#9136 ) 1. redis-cli can output --rdb data to stdout, but redis-cli also writes some messages to stdout, which would corrupt the RDB. 2. Make redis-cli flush stdout when printing a reply. This was needed in order to fix a hang in the redis-cli test that uses --replica. Note that printf does flush when there's a newline, but fwrite does not. 3. Fix the redis-cli --replica test, which used to pass previously because it didn't really care what it read, and because redis-cli used printf to print these other things to stdout. 4. Improve the redis-cli --replica test to run with both diskless and disk-based. Co-authored-by: Oran Agra <oran@redislabs.com> Co-authored-by: Viktor Söderqvist <viktor@zuiderkwast.se>	2021-07-07 08:26:26 +03:00
Binbin	1d5aa37d68	Fix timing issue in psync2 test. (#9159 ) *** [err]: PSYNC2: total sum of full synchronizations is exactly 4 in tests/integration/psync2.tcl Expected 5 == 4 (context: type eval line 8 cmd {assert {$sum == 4}} proc::test) Sometimes the test got an unexpected full sync: after a replica switched to master, and before the new master's change propagated the new replid to all replicas, a replica attempted to sync with it using a wrong replid and triggered a full resync. Consider this scenario: 1 slaveof 4, full resync; 0 slaveof 4, full resync; 2 slaveof 0, full resync; 3 slaveof 1, full resync; 1 slaveof no one, replid changed; 3 reconnect 1, did a partial resync and got the new replid. Before 2 inherits the new replid: 3 slaveof 2; 3 tries to do a partial resync with 2, but their replication ids are inconsistent, so a full resync happens. :) A special thank you to oran for helping me with this test case. Co-authored-by: Oran Agra <oran@redislabs.com>	2021-06-30 09:18:10 +03:00
Oran Agra	d0819d618e	solve test timing issues in replication tests (#9121 ) # replication-3.tcl had a test timeout failure with valgrind on daily CI: ``` * [err]: SLAVE can reload "lua" AUX RDB fields of duplicated scripts in tests/integration/replication-3.tcl Replication not started. ``` replication took more than 70 seconds. https://github.com/redis/redis/runs/2854037905?check_suite_focus=true on my machine it takes only about 30, but i can see how 50 seconds isn't enough. # replication.tcl loading was over too quickly in freebsd daily CI: ``` * [err]: slave fails full sync and diskless load swapdb recovers it in tests/integration/replication.tcl Expected '0' to be equal to '1' (context: type eval line 44 cmd {assert_equal [s -1 loading] 1} proc ::start_server) ``` # rdb.tcl loading was over too quickly. increase the time loading takes, and decrease the amount of work we try to achieve in that time.	2021-06-22 11:10:11 +03:00
YaacovHazan	1677efb9da	cleanup around loadAppendOnlyFile (#9012 ) Today when we load the AOF on startup, loadAppendOnlyFile checks whether the file can be opened for reading. This check is redundant (dead code), as we open the AOF file for writing in initServer, so the file always exists by the time loadAppendOnlyFile runs. In this commit: - remove all the exit(1) calls from loadAppendOnlyFile, as it is the caller's responsibility to decide what to do in case of failure. - move the opening of the AOF file for writing to after it is loaded. - avoid returning -ERR in DEBUG LOADAOF when the AOF exists but is empty	2021-06-14 10:38:08 +03:00
Binbin	0bfccc55e2	Fixed some typos, add a spell check ci and others minor fix (#8890 ) This PR adds a spell checker CI action that will fail future PRs if they introduce typos and spelling mistakes. This spell checker is based on a blacklist of common spelling mistakes, so it will not catch everything, but at least it is also unlikely to cause false positives. Besides that, the PR also fixes many spelling mistakes and typos, not all of which are a result of the spell checker we use. Here's a summary of other changes: 1. Scanned the entire source code and fixed all sorts of typos and spelling mistakes (including missing or extra spaces). 2. Outdated function / variable / argument names in comments 3. Fix outdated keyspace masks error log when we check `config.notify-keyspace-events` in loadServerConfigFromString. 4. Trim the white space at the end of line in `module.c`. Check: https://github.com/redis/redis/pull/7751 5. Some outdated https link URLs. 6. Fix some outdated comments. Such as: - In README: about the rdb, we used to say it creates a `thread`; changed to `process` - dbRandomKey function comment (about dictGetRandomKey, changed to dictGetFairRandomKey) - notifyKeyspaceEvent function comment (add type arg) - Some other minor fixes in comments (most of them are variable names that were quoted incorrectly) 7. Modified the error log so that users can easily distinguish between TCP and TLS in `changeBindAddr`	2021-06-10 15:39:33 +03:00
Yossi Gottlieb	8a86bca5ed	Improve test suite to handle external servers better. (#9033 ) This commit revives and improves the ability to run the test suite against external servers, instead of launching and managing `redis-server` processes as part of the test fixture. This capability existed in the past, using the `--host` and `--port` options. However, it was quite limited and mostly useful when running specific tests. Attempting to run larger chunks of the test suite ran into many issues: * Many tests depend on being able to start and control `redis-server` themselves, and there's no clear distinction between external server compatible and other tests. * Cluster mode is not supported (resulting in `CROSSSLOT` errors). This PR cleans up many things and makes it possible to run the entire test suite against an external server. It also provides more fine grained controls to handle cases where the external server supports a subset of the Redis commands, limited number of databases, cluster mode, etc. The tests directory now contains a `README.md` file that describes how this works. This commit also includes additional cleanups and fixes: * Tests can now be tagged. * Tag-based selection is now unified across `start_server`, `tags` and `test`. * More information is provided about skipped or ignored tests. * Repeated patterns in tests have been extracted to common procedures, both at a global level and on a per-test file basis. * Cleaned up some cases where test setup was based on a previous test executing (a major anti-pattern that repeats itself in many places). * Cleaned up some cases where test teardown was not part of a test (in the future we should have dedicated teardown code that executes even when tests fail). * Fixed some tests that were flaky running on external servers.	2021-06-09 15:13:24 +03:00
Oran Agra	b512dfe794	tests: add details when test fails on malformed info (#9042 )	2021-06-03 20:34:54 +03:00
ny0312	53d1acd598	Always replicate time-to-live(TTL) as absolute timestamps in milliseconds (#8474 ) Until now, on replica full-sync we transferred absolute time for TTL. However, when a command arrived (EXPIRE or EXPIREAT), we propagated it as is to replicas (possibly with relative time), but always translated it to EXPIREAT (absolute time) for the AOF. This commit changes that and will always use absolute time for propagation. See discussion in #8433. Furthermore, we introduce new commands: `EXPIRETIME/PEXPIRETIME` that allow extracting the absolute TTL time from a key.	2021-05-30 09:20:32 +03:00
YaacovHazan	501d775583	unregister AE_READABLE from the read pipe in backgroundSaveDoneHandlerSocket (#8991 ) In diskless replication, we create a read pipe for the RDB, between the child and the parent. When we close this pipe (fd), the read handler also needs to be removed from the event loop (if it is still registered). Otherwise, the next time we use the same fd, the registration will fail (panic), because we will use EPOLL_CTL_MOD (the fd is still registered in the event loop) on an fd that was already removed from epoll_ctl	2021-05-26 14:51:53 +03:00
YaacovHazan	32a2584e07	stabilize tests that involved with load handlers (#8967 ) When a test stops a 'load handler' by killing the process that generates the load, some commands that are already in the input buffer might still be processed by the server. This may cause some instability in tests that count on no more commands being processed after we stop the 'load handler'. In this commit, a new proc 'wait_load_handlers_disconnected' is added, to verify that no more commands from any 'load handler' are processed, by checking that the clients that generate the load are disconnected. Also, replacing the check of dbsize with wait_for_ofs_sync before comparing debug digest, as it would fail in case the last key the workload wrote was an overridden key (not a new one). Affected tests Race fix: - failover command to specific replica works - Connect multiple replicas at the same time (issue #141), master diskless=$mdl, replica diskless=$sdl - AOF rewrite during write load: RDB preamble=$rdbpre Cleanup and speedup: - Test replication with blocking lists and sorted sets operations - Test replication with parallel clients writing in different DBs - Test replication partial resync: $descr (diskless: $mdl, $sdl, reconnect: $reconnect)	2021-05-20 15:29:43 +03:00


255 Commits