redict

mirror of https://codeberg.org/redict/redict.git synced 2025-01-23 08:38:27 -05:00

Author	SHA1	Message	Date
Vadim Pushtaev	f069d09ee0	Use utils/create-cluster from any dir (#9766 ) Co-authored-by: Vadim Pushtaev <v.pushtaev@corp.mail.ru>	2021-11-11 08:33:09 +02:00
Oran Agra	0927a0dd24	Try solving test timeout on freebsd CI (#9768 ) First, avoid using --accurate on the freebsd CI, we only care about systematic issues there due to being different platform, but not accuracy Secondly, when looking at the test which timed out it seems silly and outdated: - it used KEYS to attempt to trigger lazy expiry, but KEYS doesn't do that anymore. - it used some hard coded sleeps rather than waiting for things to happen and exiting ASAP	2021-11-10 19:39:26 +02:00
Oran Agra	978eadbad4	Increase test timeout in valgrind runs (#9767 ) We saw some tests sporadically time out on valgrind (namely the ones from #9323). Increasing valgrind timeout from 20 mins to 40 mins in CI. And fixing an outdated help message.	2021-11-10 19:38:58 +02:00
YaacovHazan	03406fcb6c	fix short timeout in replication short read tests (#9763 ) In both tests, "diskless loading short read" and "diskless loading short read with module", the timeout of waiting for the replica to respond to a short read and log it, is too short. Also, add --dump-logs in runtest-moduleapi for valgrind runs.	2021-11-09 22:37:18 +02:00
Jim Brunner	cd0f710be9	fix minor memory leak in rewriteConfigSds (#9762 )	2021-11-09 10:35:22 -08:00
Eduardo Semprebon	c22d3684ba	Fix diskless load handling on broken EOF marker (#9752 ) During diskless replication, the check for broken EOF mark is misplaced and should be earlier. Now we do not swap db, we do proper cleanup and correctly raise module events on this kind of failure. This issue existed prior to #9323, but before, the side effect was not restoring backup and not raising the correct module events on this failure.	2021-11-09 11:46:10 +02:00
Wen Hui	2ce29e032b	Sentinel tls memory leak (#9753 ) There was a memory leak when tls is used in Sentinels. The memory leak is noticed when some of the replicas are offline.	2021-11-08 22:23:31 +02:00
Yossi Gottlieb	a1aba4bf75	Fix EINTR test failures. (#9751 ) * Clean up EINTR handling so EINTR will not change connection state to begin with. * On TLS, catch EINTR and return it as-is before going through OpenSSL error handling (which seems to not distinguish it from EAGAIN).	2021-11-08 16:09:33 +02:00
Huang Zhw	48d870aed1	Move config from clusterCron to config update (#9580 )	2021-11-07 18:56:03 -08:00
chendianqiang	a527c3e814	Test suite - user server socket to optimize port detection (#9663 ) Optimized port detection for tcl, use 'socket -server' instead of 'socket' to rule out port on TIME_WAIT Co-authored-by: chendianqiang <chendianqiang@meituan.com> Co-authored-by: Oran Agra <oran@redislabs.com>	2021-11-07 13:53:57 +02:00
yoav-steinberg	79ac57561f	Refactor config.c for generic setter interface (#9644 ) This refactors all `CONFIG SET`s and conf file loading arguments go through the generic config handling interface. Refactoring changes: - All config params go through the `standardConfig` interface (some stuff which is only related to the config file and not the `CONFIG` command still has special handling for rewrite/config file parsing, `loadmodule`, for example.) . - Added `MULTI_ARG_CONFIG` flag for configs to signify they receive a variable number of arguments instead of a single argument. This is used to break up space separated arguments to `CONFIG SET` so the generic setter interface can pass multiple arguments to the setter function. When parsing the config file we also break up anything after the config name into multiple arguments to the setter function. Interface changes: - A side effect of the above interface is that the `bind` argument in the config file can be empty (no argument at all) this is treated the same as passing an single empty string argument (same as `save` already used to work). - Support rewrite and setting `watchdog-period` from config file (was only supported by the CONFIG command till now). - Another side effect is that the `save T X` config argument now supports multiple Time-Changes pairs in a single line like its `CONFIG SET` counterpart. So in the config file you can either do: ``` save 3600 1 save 600 10 ``` or do ``` save 3600 1 600 10 ``` Co-authored-by: Bjorn Svensson <bjorn.a.svensson@est.tech>	2021-11-07 13:40:08 +02:00
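The multi-argument `save` form described in the commit above can be illustrated with a small sketch. It assumes the arguments after the config name have already been split on whitespace; `save_param` and `parse_save_args` are hypothetical names for illustration, not the functions in config.c, and the validation rules (seconds at least 1, changes non-negative) are assumptions.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical pair type: "after <seconds> seconds if at least <changes>
 * writes happened". Not the struct used by Redis itself. */
typedef struct {
    long seconds;
    long changes;
} save_param;

/* Parse one base-10 integer; returns 0 on success, -1 on malformed input. */
static int parse_long(const char *s, long *out) {
    char *end;
    long v = strtol(s, &end, 10);
    if (end == s || *end != '\0') return -1;
    *out = v;
    return 0;
}

/* Turn "3600 1 600 10" (already split into argv) into pairs.
 * Returns the number of pairs, or -1 if the arguments are not
 * well-formed pairs or do not fit in 'out'. */
int parse_save_args(int argc, const char **argv, save_param *out, int max_out) {
    assert(out != NULL);
    if (argc % 2 != 0) return -1;          /* a time without a change count */
    int pairs = argc / 2;
    if (pairs > max_out) return -1;
    for (int i = 0; i < pairs; i++) {
        long s, c;
        if (parse_long(argv[2 * i], &s) != 0 || s < 1) return -1;
        if (parse_long(argv[2 * i + 1], &c) != 0 || c < 0) return -1;
        out[i].seconds = s;
        out[i].changes = c;
    }
    return pairs;
}
```

With this shape, one line carrying `3600 1 600 10` and two lines carrying `3600 1` and `600 10` go through the same setter and yield the same pairs.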
Binbin	ddb508c0a2	Fixing import of sys/errno (#9739 )	2021-11-05 07:50:25 +02:00
Eduardo Semprebon	91d0c758e5	Replica keep serving data during repl-diskless-load=swapdb for better availability (#9323 ) For diskless replication in swapdb mode, considering we already spend replica memory having a backup of current db to restore in case of failure, we can have the following benefits by instead swapping database only in case we succeeded in transferring db from master: - Avoid `LOADING` response during failed and successful synchronization for cases where the replica is already up and running with data. - Faster total time of diskless replication, because now we're moving from Transfer + Flush + Load time to Transfer + Load only. Flushing the tempDb is done asynchronously after swapping. - This could be implemented also for disk replication with similar benefits if consumers are willing to spend the extra memory usage. General notes: - The concept of `backupDb` becomes `tempDb` for clarity. - Async loading mode will only kick in if the replica is syncing from a master that has the same repl-id the one it had before. i.e. the data it's getting belongs to a different time of the same timeline. - New property in INFO: `async_loading` to differentiate from the blocking loading - Slot to Key mapping is now a field of `redisDb` as it's more natural to access it from both server.db and the tempDb that is passed around. - Because this is affecting replicas only, we assume that if they are not readonly and write commands during replication, they are lost after SYNC same way as before, but we're still denying CONFIG SET here anyways to avoid complications. 
Considerations for review: - We have many cases where server.loading flag is used and even though I tried my best, there may be cases where async_loading should be checked as well and cases where it shouldn't (would require very good understanding of whole code) - Several places that had different behavior depending on the loading flag where actually meant to just handle commands coming from the AOF client differently than ones coming from real clients, changed to check CLIENT_ID_AOF instead. Additional for Release Notes - Bugfix - server.dirty was not incremented for any kind of diskless replication, as effect it wouldn't contribute on triggering next database SAVE - New flag for RM_GetContextFlags module API: REDISMODULE_CTX_FLAGS_ASYNC_LOADING - Deprecated RedisModuleEvent_ReplBackup. Starting from Redis 7.0, we don't fire this event. Instead, we have the new RedisModuleEvent_ReplAsyncLoad holding 3 sub-events: STARTED, ABORTED and COMPLETED. - New module flag REDISMODULE_OPTIONS_HANDLE_REPL_ASYNC_LOAD for RedisModule_SetModuleOptions to allow modules to declare they support the diskless replication with async loading (when absent, we fall back to disk-based loading). Co-authored-by: Eduardo Semprebon <edus@saxobank.com> Co-authored-by: Oran Agra <oran@redislabs.com>	2021-11-04 10:46:50 +02:00
Itamar Haber	06dd202a05	Fixes LPOP/RPOP wrong replies when count is 0 (#9692 ) Introduced in #8179, this fixes the command's replies in the 0 count edge case. [BREAKING] changes the reply type when count is 0 to an empty array (instead of nil) Moves LPOP ... 0 fast exit path after type check to reply with WRONGTYPE	2021-11-04 09:43:08 +02:00
menwen	ccf8a651f3	Retry when a blocked connection system call is interrupted by a signal (#9629 ) When repl-diskless-load is enabled, the connection is set to the blocking state. The connection may be interrupted by a signal during a system call. This would have resulted in a disconnection and possibly a reconnection loop. Co-authored-by: Oran Agra <oran@redislabs.com>	2021-11-04 09:09:28 +02:00
Oran Agra	d04f306931	Fix race condition in cluster test 22-replica-in-sync (#9721 ) there was a chance that by the time the assertion is executed, the replica already manages to reconnect. now we make sure the replica is unable to re-connect to the master. additionally, we wait for some gossip from the disconnected replica, to see that it doesn't mess things up. unrelated: fix a typo when trying to exhaust the backlog, one that didn't have any harmful implications Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>	2021-11-04 08:44:18 +02:00
perryitay	f27083a4a8	Add support for list type to store elements larger than 4GB (#9357 ) Redis lists are stored in quicklist, which is currently a linked list of ziplists. Ziplists are limited to storing elements no larger than 4GB, so when bigger items are added they're getting truncated. This PR changes quicklists so that they're capable of storing large items in quicklist nodes that are plain string buffers rather than ziplist. As part of the PR there were a few other changes in redis: 1. new DEBUG sub-commands: - QUICKLIST-PACKED-THRESHOLD - set the threshold for the node type to be plain or ziplist. default (1GB) - QUICKLIST <key> - Shows low level info about the quicklist encoding of <key> 2. rdb format change: - A new type was added - RDB_TYPE_LIST_QUICKLIST_2 . - container type (packed / plain) was added to the beginning of the rdb object (before the actual node list). 3. testing: - Tests that require over 100MB will be skipped by default. a new flag was added to 'runtest' to run the large memory tests (not used by default) Co-authored-by: sundb <sundbcn@gmail.com> Co-authored-by: Oran Agra <oran@redislabs.com>	2021-11-03 20:47:18 +02:00
guybe7	f11a2d4dd7	Fix COMMAND GETKEYS on EVAL without keys (#9733 ) Add new no-mandatory-keys flag to support COMMAND GETKEYS of commands which have no mandatory keys. In the past we would have got this error: ``` 127.0.0.1:6379> command getkeys eval "return 1" 0 (error) ERR Invalid arguments specified for command ```	2021-11-03 14:38:26 +02:00
perryitay	77d3c6bff3	fix: lookupKey on SETNX and SETXX only once (#9640 ) When using SETNX and SETXX we could end up doing key lookup twice. This presents a small inefficiency price. Also once we have statistics of write hit and miss they'll be wrong (recording the same key hit twice)	2021-11-03 14:12:33 +02:00
Oran Agra	d25dc08932	Solve issues with tracking test in external mode (#9726 ) The issue was that setting maxmemory to used_memory and expecting eviction is insufficient, since we need to take mem_not_counted_for_evict into consideration. This test got broken by #9166	2021-11-02 16:07:51 -07:00
yiyuaner	78025c4a26	Add checks for overflow in redis-check-aof and loadAppendOnlyFile (#9669 ) Co-authored-by: guoyiyuan <guoyiyuan@sbrella.com>	2021-11-02 17:03:07 +02:00
Oran Agra	87321deb3f	attempt to fix tracking test issue with external tests due to lazy free (#9722 ) The External tests started failing recently for unclear reason: ``` *** [err]: Tracking invalidation message of eviction keys should be before response in tests/unit/tracking.tcl Expected '0' to be equal to 'invalidate volatile-key' (context: type eval line 21 cmd {assert_equal $res {invalidate volatile-key}} proc ::test) ``` I suspect the issue is that the used_memory sample is taken while a lazy free is still being processed.	2021-11-02 16:42:53 +02:00
menwen	d5ca72e38b	fix defrag test looking at the wrong latency metric (#9723 ) the latency event was renamed in #7726, and the outcome was that the test was ineffective (unable to measure the max latency, always seeing 0)	2021-11-02 15:52:56 +02:00
Wang Yuan	526cbb5cff	Fix not updating backlog histlen when trimming repl backlog (#9713 ) Since the loop in incrementalTrimReplicationBacklog checks the size of histlen, we cannot afford to update it only when the loop exits, this may cause deleting much more replication blocks, and replication backlog may be less than setting size. introduce in #9166 Co-authored-by: sundb <sundbcn@gmail.com>	2021-11-02 11:04:11 +02:00
zhaozhao.zz	d08f0552ee	rebuild replication backlog index when master restart (#9720 ) After PR #9166 , replication backlog is not a real block of memory, just contains a reference points to replication buffer's block and the blocks index (to accelerate search offset when partial sync), so we need update both replication buffer's block's offset and replication backlog blocks index's offset when master restart from RDB, since the `server.master_repl_offset` is changed. The implications of this bug was just a slow search, but not a replication failure.	2021-11-02 10:53:52 +02:00
Binbin	58a1d16ff6	Fix timing issue in replication test (#9719 ) So it looks like sampling set loglines [count_log_lines -2] was executed too late, and the replication managed to complete before that. ``` *** [err]: diskless no replicas drop during rdb pipe in tests/integration/replication.tcl log message of '"Diskless rdb transfer, done reading from pipe, 2 replicas still up"' not found in ./tests/tmp/server.6124.69/stdout after line: 52 till line: 52 ``` Changes: 1. when we search the master log file, we start to search from before we sent the REPLICAOF command, to prevent a race in which the replication completed before we sampled the log line count. 2. we don't need to sample the replica loglines since it's a fresh replica that's just been started, so the message we're looking for is the first occurrence in the log, we can start search from 0. Co-authored-by: Oran Agra <oran@redislabs.com>	2021-11-02 10:32:01 +02:00
Binbin	cea7809cea	Fix race condition in psync2-pingoff test (#9712 ) Test failed on freebsd: ``` *** [err]: Make the old master a replica of the new one and check conditions in tests/integration/psync2-pingoff.tcl Expected '162' to be equal to '176' (context: type eval line 18 cmd {assert_equal [status $R(0) master_repl_offset] [status $R(1) master_repl_offset]} proc ::test) ``` There are two possible race conditions in the test. 1. The code waits for sync_full to increment, and assumes that means the master did the fork. But in fact there are cases the master will increment that sync_full counter (after replica asks for sync), but will see that there's already a fork running and will delay the fork creation. In this case the INCR will be executed before the fork happens, so it'll not be in the command stream. Solve that by waiting for `master_link_status: up` on the replica before the INCR. 2. The repl-ping-replica-period is still high (1 second), so there's a chance the master will send an additional PING between the two calls to INFO (the line that fails is the one that samples INFO from both servers). So there's a chance one of them will have an incremented offset due to PING and the other won't have it yet. In theory we can wait for the repl_offset to match, but then we risk facing a situation where that race will hide an offset mis-match. so instead, i think we should just change repl-ping-replica-period to prevent further pings from being pushed. Co-authored-by: Oran Agra <oran@redislabs.com>	2021-11-01 16:07:08 +02:00
Oran Agra	f1f3cceb50	fix valgrind issues with long double module test (#9709 ) The module test in reply.tcl was introduced by #8521 but didn't run until recently (see #9639) and then it started failing with valgrind. This is because valgrind uses 64 bit long double (unlike most other platforms that have at least 80 bits) But besides valgrind, the tests were also incompatible with ARM32, which also uses 64 bit long doubles. We now use appropriate value to avoid issues with either valgrind or ARM32 In all the double tests, i use 3.141, which is safe since addReplyDouble uses `%.17Lg` which is able to represent this value without adding any digits due to precision loss. In the long double, since we use `%.17Lf` in ld2string, it preserves 17 significant digits, rather than 17 digits after the decimal point (like in `%.17Lg`). So to make these similar, i use value lower than 1 (no digits left of the period) Lastly, we have the same issue with TCL (no long doubles) so we read raw protocol in that test. Note that the only error before this fix (in both valgrind and ARM32) is this: ``` *** [err]: RM_ReplyWithLongDouble: a float reply in tests/unit/moduleapi/reply.tcl Expected '3.141' to be equal to '3.14100000000000001' (context: type eval line 2 cmd {assert_equal 3.141 [r rw.longdouble 3.141]} proc ::test) ``` so the changes to debug.c and scripting.tcl aren't really needed, but i consider them a cleanup (i.e. scripting.c validated a different constant than the one that's sent to it from debug.c). Another unrelated change is to add the RESP version to the repeated tests in reply.tcl	2021-11-01 13:41:35 +02:00
罗泽轩	155c291006	Remove duplicate SET_OP_XX definitions in t_set.c. (#4326 ) These definitions already exist in server.h.	2021-11-01 11:09:29 +02:00
Oran Agra	48d54265ce	Fix failing cluster tests (#9707 ) Fix failures introduced by #9695 which was an attempt to solve failures introduced by #9679. And alternative to #9703 (i didn't like the extra argument to kill_instance). Reverting #9695. Instead of stopping AOF on all terminations, stop it only on the two which need it. Do it as part of the test rather than the infra (it was add that kill_instance used `R` to communicate to the instance) Note that the original purpose of these tests was to trigger a crash, but that upsets valgrind so in redis 6.2 i changed it to use SIGTERM, so i now rename the tests (remove "kill" and "crash"). Also add some colors to failures, and the word "FAILED" so that it's searchable. And solve a semi-related race condition in 14-consistency-check.tcl	2021-10-31 19:22:21 +02:00
Yossi Gottlieb	f26e90be0c	Use 'gcc' instead of 'ld' to link test modules. (#9710 ) This solves several problems in a more elegant way: * No need to explicitly use `-lc` on x86_64 when building with `-m32`. * Avoids issues with undefined floating point emulation funcs on ARM.	2021-10-31 16:25:57 +02:00
Binbin	033578839b	Fix multiple COUNT in LMPOP/BLMPOP/ZMPOP/BZMPOP (#9701 ) The previous code did not check whether COUNT is set. So we can use `lmpop 2 key1 key2 left count 1 count 2`. This situation can occur in LMPOP/BLMPOP/ZMPOP/BZMPOP commands. LMPOP/BLMPOP introduced in #9373, ZMPOP/BZMPOP introduced in #9484.	2021-10-31 16:10:29 +02:00
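The duplicate-`COUNT` problem in the commit above comes down to remembering whether the option was already seen while scanning the trailing arguments. The following is a minimal sketch of that idea, not the code from the Redis sources: `parse_count_option` is a hypothetical helper, and the default of 1 when `COUNT` is absent is stated here as an assumption of the sketch.

```c
#include <assert.h>
#include <stdlib.h>
#include <strings.h>   /* strcasecmp (POSIX) */

/* Scan the optional arguments that follow LEFT/RIGHT (or MIN/MAX).
 * Accepts at most one "COUNT <positive integer>".
 * Returns 0 and stores the count on success, -1 on a syntax error,
 * which includes a repeated COUNT. */
int parse_count_option(int argc, const char **argv, long *count) {
    assert(count != NULL);
    long parsed = -1;                      /* -1 means "COUNT not seen yet" */
    for (int i = 0; i < argc; i++) {
        int more = (argc - 1) - i;         /* arguments left after argv[i] */
        if (parsed == -1 && more >= 1 && strcasecmp(argv[i], "COUNT") == 0) {
            char *end;
            long v = strtol(argv[i + 1], &end, 10);
            if (end == argv[i + 1] || *end != '\0' || v <= 0) return -1;
            parsed = v;
            i++;                           /* skip the value we just consumed */
        } else {
            return -1;                     /* unknown token or second COUNT */
        }
    }
    *count = (parsed == -1) ? 1 : parsed;  /* assumed default when absent */
    return 0;
}
```

Under this check, the trailing arguments `count 1 count 2` are rejected on the second `count`, because `parsed` is no longer -1 when it is reached.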
lijinliang	215b909c1f	fix typo in db.c: synchroneus -> synchronous(2 places) (#9702 ) Co-authored-by: lijinliang <lijl@newdt.cn>	2021-10-31 16:01:54 +02:00
Rafi Einstein	734cde7e38	Fix memory leak when there's a read error of module aux data from rdb. (#9705 )	2021-10-31 15:59:48 +02:00
Wang Yuan	68886de085	Fix timing issue in replication buffer test (#9697 ) Introduced in #9166	2021-10-29 08:04:12 +03:00
Oran Agra	22a778c880	fix failing cluster tests (#9695 ) When stopping an instance in the cluster tests, disable appendonly first, so that SIGTERM won't be ignored. Recently in #9679 i change the test infra to use SIGSEGV to kill servers that refuse the SIGTERM rather than do SIGKILL directly. This surfaced an issue that i've added in #7725 which changed SIGKILL to SIGTERM (to resolve valgrind issues). So the current situation in the past months was that sometimes servers refused the SIGTERM and waited 10 seconds for the SIGKILL, and this commit resolves that (faster termination).	2021-10-28 12:16:27 +03:00
yoav-steinberg	b7afac6bc2	Remove update-jemalloc.sh, it's not needed anymore (#9690 ) We now use git subtree for deps/jemalloc, updating jemalloc is detailed in deps/README.md	2021-10-27 14:48:08 +03:00
guybe7	975f51fe16	Add new SLOTSRANGE to subcommands table (#9689 )	2021-10-27 10:44:14 +03:00
Wen Hui	5fb4adba65	New Cluster Command: CLUSTER DELSLOTSRANGE and CLUSTER ADDSLOTSRANGE (#9445 )	2021-10-26 21:44:33 -07:00
Wang Yuan	37dc2f13b4	Fix not waiting for data loading to complete in AOF tests (#9683 ) Fix timing issue of a new test introduced in #9326	2021-10-26 14:08:09 +03:00
Wen Hui	43b22f17dc	Sentinel: don't log auth-pass value for better security (#9652 )	2021-10-26 13:13:12 +03:00
Oran Agra	37559ca79f	Fix race condition in lazy free test (#9682 ) The first test exited before all the memory was reclaimed, so when the second test sampled used_memory, it was too early.	2021-10-26 13:02:31 +03:00
Oran Agra	665e428435	Testsuite: attempt to find / avoid valgrind warnings of killed processes (#9679 ) I recently started seeing a lot of empty valgrind reports in the daily CI. i.e. prints showing valgrind header but no leak report, which causes the tests to fail https://github.com/redis/redis/runs/3991335416?check_suite_focus=true This commit change 2 things: * first, considering valgrind is just slow, we used to give processes 60 seconds timeout on shutdown instead of 10 seconds we give normally. this commit changes that to 120. * secondly, when we reach the timeout, we first try to use SIGSEGV so that maybe we'll get a stack trace indicating where redis is hang, and we only resort to SIGKILL if double that time passed. note that if there are indeed hang processes, we will normally not see that in the non-valgrind runs, since the tests didn't use to detect any failure in that case, and now they will since `crashlog_from_file` is run after `kill_server`.	2021-10-26 08:34:30 +03:00
Wang Yuan	9ec3294b97	Add timestamp annotations in AOF (#9326 ) Add timestamp annotation in AOF, one part of #9325. Enabled with the new `aof-timestamp-enabled` config option. Timestamp annotation format is "#TS:${timestamp}\r\n". "TS" is short for timestamp and this method could save extra bytes in AOF. We can use timestamp annotation for some special functions. - know the executing time of commands - restore data to a specific point-in-time (by using redis-check-rdb to truncate the file)	2021-10-25 13:08:34 +03:00
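The annotation format quoted in the commit above, `#TS:${timestamp}\r\n`, is simple enough to sketch as a writer and a reader. Both helpers below are hypothetical and written only for illustration; they are not taken from aof.c, and treating the timestamp as a plain base-10 integer is an assumption.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Write "#TS:<timestamp>\r\n" into buf.
 * Returns the number of bytes written (excluding the NUL terminator),
 * or -1 if buf is too small. */
int format_ts_annotation(char *buf, size_t buflen, long long ts) {
    assert(buf != NULL);
    int n = snprintf(buf, buflen, "#TS:%lld\r\n", ts);
    if (n < 0 || (size_t)n >= buflen) return -1;
    return n;
}

/* Recognize a line of the form "#TS:<digits>\r\n".
 * Returns 1 and stores the timestamp on a match, 0 otherwise, so a
 * reader can tell annotations apart from ordinary command data. */
int parse_ts_annotation(const char *line, long long *ts) {
    assert(line != NULL && ts != NULL);
    if (strncmp(line, "#TS:", 4) != 0) return 0;
    char *end;
    long long v = strtoll(line + 4, &end, 10);
    if (end == line + 4) return 0;                  /* no digits at all */
    if (end[0] != '\r' || end[1] != '\n') return 0; /* must end with CRLF */
    *ts = v;
    return 1;
}
```

Because RESP commands in an AOF start with `*`, a line starting with `#` cannot be mistaken for a command, which is what lets a point-in-time tool find the first annotation past a target timestamp and cut the file there.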
Oran Agra	085615af97	Improve code doc of allowed_firstargs following #9504 (#9674 ) Improve code doc for allowed_firstargs (used to be allowed_commands before #9504. I don't think the text in the code needs to refer to the history (it's not there just for backwards compatibility). instead it should just describe what it does.	2021-10-25 13:01:25 +03:00
Guy Korland	6cf6c36937	Replace deprecated REDISMODULE_POSTPONED_ARRAY_LEN in module tests and examples (#9677 ) REDISMODULE_POSTPONED_ARRAY_LEN is deprecated, use REDISMODULE_POSTPONED_LEN instead	2021-10-25 12:00:43 +03:00
Itamar Haber	00362f2a94	Removes admin acl category from CLIENT TRACKINGINFO (#9662 ) overlooked in #9504	2021-10-25 11:33:37 +03:00
Shaya Potter	12ce2c3925	Add RM_ReplyWithBigNumber module API (#9639 ) Let modules use additional type of RESP3 response (unused by redis so far) Also fix tests that where introduced in #8521 but didn't actually run. Co-authored-by: Oran Agra <oran@redislabs.com>	2021-10-25 11:31:20 +03:00
Wang Yuan	c1718f9d86	Replication backlog and replicas use one global shared replication buffer (#9166 ) ## Background For redis master, one replica uses one copy of replication buffer, that is a big waste of memory, more replicas more waste, and allocate/free memory for every reply list also cost much. If we set client-output-buffer-limit small and write traffic is heavy, master may disconnect with replicas and can't finish synchronization with replica. If we set client-output-buffer-limit big, master may be OOM when there are many replicas that separately keep much memory. Because replication buffers of different replica client are the same, one simple idea is that all replicas only use one replication buffer, that will effectively save memory. Since replication backlog content is the same as replicas' output buffer, now we can discard replication backlog memory and use global shared replication buffer to implement replication backlog mechanism. ## Implementation I create one global "replication buffer" which contains content of replication stream. The structure of "replication buffer" is similar to the reply list that exists in every client. But the node of list is `replBufBlock`, which has `id, repl_offset, refcount` fields.
```c
/* Replication buffer blocks is the list of replBufBlock.
 *
 * +--------------+       +--------------+       +--------------+
 * | refcount = 1 |  ...  | refcount = 0 |  ...  | refcount = 2 |
 * +--------------+       +--------------+       +--------------+
 *      |                                            /       \
 *      |                                           /         \
 *      |                                          /           \
 *  Repl Backlog                               Replia_A    Replia_B
 *
 * Each replica or replication backlog increments only the refcount of the
 * 'ref_repl_buf_node' which it points to. So when replica walks to the next
 * node, it should first increase the next node's refcount, and when we trim
 * the replication buffer nodes, we remove node always from the head node which
 * refcount is 0. If the refcount of the head node is not 0, we must stop
 * trimming and never iterate the next node. */

/* Similar with 'clientReplyBlock', it is used for shared buffers between
 * all replica clients and replication backlog. */
typedef struct replBufBlock {
    int refcount;           /* Number of replicas or repl backlog using. */
    long long id;           /* The unique incremental number. */
    long long repl_offset;  /* Start replication offset of the block. */
    size_t size, used;
    char buf[];
} replBufBlock;
```
So now when we feed replication stream into replication backlog and all replicas, we only need to feed stream into replication buffer `feedReplicationBuffer`. In this function, we set some fields of replication backlog and replicas to references of the global replication buffer blocks. And we also need to check replicas' output buffer limit to free if exceeding `client-output-buffer-limit`, and trim replication backlog if exceeding `repl-backlog-size`. When sending reply to replicas, we also need to iterate replication buffer blocks and send its content, when totally sending one block for replica, we decrease current node count and increase the next current node count, and then free the block which reference is 0 from the head of replication buffer blocks. Since now we use linked list to manage replication backlog, it may cost much time for iterating all linked list nodes to find corresponding replication buffer node. So we create a rax tree to store some nodes for index, but to avoid rax tree occupying too much memory, i record one per 64 nodes for index. Currently, to make partial resynchronization as possible as much, we always let replication backlog as the last reference of replication buffer blocks, backlog size may exceeds our setting if slow replicas that reference vast replication buffer blocks, and this method doesn't increase memory usage since they share replication buffer.
To avoid freezing server for freeing unreferenced replication buffer blocks when we need to trim backlog for exceeding backlog size setting, we trim backlog incrementally (free 64 blocks per call now), and make it faster in `beforeSleep` (free 640 blocks). ### Other changes - `mem_total_replication_buffers`: we add this field in INFO command, it means the total memory of replication buffers used. - `mem_clients_slaves`: now even replica is slow to replicate, and its output buffer memory is not 0, but it still may be 0, since replication backlog and replicas share one global replication buffer, only if replication buffer memory is more than the repl backlog setting size, we consider the excess as replicas' memory. Otherwise, we think replication buffer memory is the consumption of repl backlog. - Key eviction Since all replicas and replication backlog share global replication buffer, we think only the part of exceeding backlog size the extra separate consumption of replicas. Because we trim backlog incrementally in the background, backlog size may exceeds our setting if slow replicas that reference vast replication buffer blocks disconnect. To avoid massive eviction loop, we don't count the delayed freed replication backlog into used memory even if there are no replicas, i.e. we also regard this memory as replicas's memory. - `client-output-buffer-limit` check for replica clients It doesn't make sense to set the replica clients output buffer limit lower than the repl-backlog-size config (partial sync will succeed and then replica will get disconnected). Such a configuration is ignored (the size of repl-backlog-size will be used). This doesn't have memory consumption implications since the replica client will share the backlog buffers memory. 
- Drop replication backlog after loading data if needed We always create replication backlog if server is a master, we need it because we put DELs in it when loading expired keys in RDB, but if RDB doesn't have replication info or there is no rdb, it is not possible to support partial resynchronization, to avoid extra memory of replication backlog, we drop it. - Multi IO threads Since all replicas and replication backlog use global replication buffer, if I/O threads are enabled, to guarantee data accessing thread safe, we must let main thread handle sending the output buffer to all replicas. But before, other IO threads could handle sending output buffer of all replicas. ## Other optimizations This solution resolve some other problem: - When replicas disconnect with master since of out of output buffer limit, releasing the output buffer of replicas may freeze server if we set big `client-output-buffer-limit` for replicas, but now, it doesn't cause freezing. - This implementation may mitigate reply list copy cost time(also freezes server) when one replication has huge reply buffer and another replica can copy buffer for full synchronization. now, we just copy reference info, it is very light. - If we set replication backlog size big, it also may cost much time to copy replication backlog into replica's output buffer. But this commit eliminates this problem. - Resizing replication backlog size doesn't empty current replication backlog content.	2021-10-25 09:24:31 +03:00
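The refcount rule from the commit above (take the reference on the next block before dropping the current one, and trim only from the head while the head is unreferenced) can be sketched with a plain singly linked list. This is an illustration under simplifying assumptions, not the Redis implementation: `block`, `append_block`, `reader_advance` and `trim_head` are hypothetical names, and the payload, offsets and rax index are left out.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified stand-in for a shared replication buffer block. */
typedef struct block {
    int refcount;          /* readers (replicas or backlog) pointing here */
    long long id;          /* incremental block id */
    struct block *next;
} block;

/* Allocate a block and link it after 'tail' (which may be NULL). */
block *append_block(block *tail, long long id) {
    block *b = calloc(1, sizeof(*b));
    if (b == NULL) return NULL;
    b->id = id;
    if (tail != NULL) tail->next = b;
    return b;
}

/* Move a reader from 'cur' to the following block. The next block is
 * referenced first so it can never be seen with refcount 0 while a
 * reader is about to use it. */
block *reader_advance(block *cur) {
    assert(cur != NULL && cur->next != NULL);
    cur->next->refcount++;
    cur->refcount--;
    return cur->next;
}

/* Free unreferenced blocks starting at the head. Stops at the first
 * block whose refcount is not 0 and never looks past it.
 * Returns the new head; '*freed' receives the number of blocks freed. */
block *trim_head(block *head, int *freed) {
    int n = 0;
    while (head != NULL && head->refcount == 0) {
        block *next = head->next;
        free(head);
        head = next;
        n++;
    }
    if (freed != NULL) *freed = n;
    return head;
}
```

Stopping at the first referenced block is what keeps trimming cheap: a slow reader pins only its own block and everything after it, so no scan of the whole list is needed.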
Oran Agra	6b297cd646	Improve errno reporting on fork and fopen rdbLoad failures (#9649 ) I moved a bunch of stats in redisFork to be executed only on successful fork, since they seem wrong to be done when it failed. I guess when fork fails it does that immediately, no latency spike.	2021-10-24 16:52:44 +03:00


10874 Commits