Sometimes we need to make a fast judgement about why Redis is suddenly
taking more memory. One of the reasons is that the main DB's dicts are
doing rehashing.
We can use `MEMORY STATS` to monitor the overhead memory of each DB, but
it lacks a total sum that would show an overall trend. So this PR adds
the total overhead of all DBs to the `INFO MEMORY` section, together with
the total count of rehashing DB dicts, providing some intuitive metrics
about main dict rehashing.
This PR adds the following metrics to `INFO MEMORY`:
* `mem_overhead_db_hashtable_rehashing` - only the size of ht[0] in
dictionaries we're rehashing (i.e. the memory that is going to be
released soon)

and similar ones to `MEMORY STATS`:
* `overhead.db.hashtable.lut` (complements the existing
`overhead.hashtable.main` and `overhead.hashtable.expires`, which also
count the `dictEntry` structs)
* `overhead.db.hashtable.rehashing` - temporary rehashing overhead.
* `db.dict.rehashing.count` - number of top level dictionaries being
rehashed.
---------
Co-authored-by: zhaozhao.zz <zhaozhao.zz@alibaba-inc.com>
Co-authored-by: Oran Agra <oran@redislabs.com>
In XREADGROUP ACK, because streamPropagateXCLAIM does not propagate
entries-read, entries-read will be inconsistent between master and
replicas.
I.e. if no entries were claimed, it would have propagated correctly, but
if some
were claimed, then the entries-read field would be inconsistent on the
replica.
The fix, suggested by guybe7, is to call streamPropagateGroupID
unconditionally, so that we normalize entries_read on the replicas.
In the past, we would only set propagate_last_id when NOACK was
specified. And in #9127, XCLAIM did not propagate entries_read in ACK,
which would cause entries_read to be inconsistent between master and
replicas.
Another approach is to add another arg to XCLAIM and let it propagate
entries_read, but we decided not to use it, because we want minimal
damage in case there's an old target and a new source (in the worst case
scenario, the new source doesn't recognize XGROUP SETID ... ENTRIES READ
and the lag is lost; if we change XCLAIM, the damage is much more
severe).
In this patch, if the user uses XREADGROUP .. COUNT 1 there will now be
the additional overhead of MULTI, EXEC and XGROUP SETID. We assume the
extra commands in the COUNT 1 case (a 4x factor, changing from one
XCLAIM to MULTI+XCLAIM+XGROUP SETID+EXEC) are probably ok, since reading
just one entry is in any case very inefficient (a client round trip per
record), so we're hoping it's not a common case.
Issue was introduced in #9127.
Even if we have SCRIPT FLUSH ASYNC now, when there are a lot of
lua scripts, SCRIPT FLUSH ASYNC will still block the main thread.
This is because lua_close is executed in the main thread, and the lua
heap needs to release a lot of memory.
In this PR, we take the current lua instance on lctx.lua and call
lua_close on it in a background thread, to close it in an async way.
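A minimal sketch of that hand-off, assuming the bio lazy-free job API is
used for the background work (the actual patch may wire this differently):
```c
/* Sketch only (assumes bioCreateLazyFreeJob() is used; the real patch may
 * route the work differently). Free the detached Lua state on a background
 * thread so lua_close() doesn't block the main thread. */
static void freeLuaStateInBackground(void *args[]) {
    lua_State *lua = args[0];
    lua_close(lua); /* releases the whole Lua heap off the main thread */
}

static void scriptingReleaseAsync(lua_State *lua) {
    /* Queue the close on the bio lazy-free thread instead of calling
     * lua_close() synchronously from SCRIPT FLUSH ASYNC. */
    bioCreateLazyFreeJob(freeLuaStateInBackground, 1, lua);
}
```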
This is MeirShpilraien's idea.
The --count option for redis-cli has been released in redis 7.2.
https://github.com/redis/redis/pull/12042
But I found in the code that some logic was missing for using this
'count' option.
```
static redisReply *sendScan(unsigned long long *it) {
    redisReply *reply;

    if (config.pattern)
        reply = redisCommand(context, "SCAN %llu MATCH %b COUNT %d",
            *it, config.pattern, sdslen(config.pattern), config.count);
    else
        reply = redisCommand(context,"SCAN %llu",*it);
```
The intention was to be able to use the scan count.
But in this case, --count will only be applied when a 'pattern' is
declared.
So I fixed it simply, so that it works properly even if the --pattern
option is not used.
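A sketch of the fix (not necessarily the exact patch), applying COUNT in
the no-pattern branch as well:
```c
/* Sketch of the fix: also pass COUNT when no MATCH pattern is given. */
if (config.pattern)
    reply = redisCommand(context, "SCAN %llu MATCH %b COUNT %d",
        *it, config.pattern, sdslen(config.pattern), config.count);
else
    reply = redisCommand(context, "SCAN %llu COUNT %d", *it, config.count);
```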
I tested it with the time command several times, and I could see it
works as intended with this commit.
Examples of the test results are below:
```
# unstable build
time(./redis-cli -a $AUTH -p $PORT -h $HOST --scan >/dev/null 2>/dev/null)
real 0m1.287s
user 0m0.011s
sys 0m0.022s
# count is not applied
time(./redis-cli -a $AUTH -p $PORT -h $HOST --scan --count 1000 >/dev/null 2>/dev/null)
real 0m1.117s
user 0m0.011s
sys 0m0.020s
# count is applied with --pattern
time(./redis-cli -a $AUTH -p $PORT -h $HOST --scan --count 1000 --pattern "hash:*" >/dev/null 2>/dev/null)
real 0m0.045s
user 0m0.002s
sys 0m0.002s
```
```
# fix-redis-cli-scan-count build
time(./redis-cli -a $AUTH -p $PORT -h $HOST --scan >/dev/null 2>/dev/null)
real 0m1.084s
user 0m0.008s
sys 0m0.024s
# count is applied even if --pattern is not declared
time(./redis-cli -a $AUTH -p $PORT -h $HOST --scan --count 1000 >/dev/null 2>/dev/null)
real 0m0.043s
user 0m0.000s
sys 0m0.004s
# of course this also applied
time(./redis-cli -a $AUTH -p $PORT -h $HOST --scan --count 1000 --pattern "hash:*" >/dev/null 2>/dev/null)
real 0m0.031s
user 0m0.002s
sys 0m0.002s
```
Thanks a lot.
If we call `DEL` on expired keys, the keys may be deleted in
`expireIfNeeded`, and we don't need to call `dbSyncDelete` or
`dbAsyncDelete` afterwards, which would repeat the deletion process
(i.e. find the keys in the main db again).
In this PR, I refine the return values of `expireIfNeeded` to indicate
whether we have deleted the expired key, to avoid the potentially
redundant deletion logic in `delGenericCommand`. Besides, because both
KEY_EXPIRED and KEY_DELETED are non-zero, this PR won't affect other
functions calling `expireIfNeeded`.
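For illustration, the intended pattern in `delGenericCommand` might look
roughly like this (a sketch; the enum values come from the description
above, other details are assumed):
```c
/* Sketch: skip the explicit delete when expireIfNeeded() already removed
 * the key. KEY_EXPIRED and KEY_DELETED are both non-zero, so existing
 * truthiness checks in other callers keep working. */
keyStatus status = expireIfNeeded(c->db, c->argv[j], 0);
if (status == KEY_DELETED) {
    numdel++;   /* already deleted inside expireIfNeeded() */
    continue;   /* no need for dbSyncDelete()/dbAsyncDelete() */
}
```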
I also ran a performance test. I first disabled active expiration with
`debug set-active-expire 0` and wrote 1 million keys with a 1ms TTL.
Then I repeatedly deleted 100 expired keys in one `DEL`. The results are
as follows, showing that this PR can improve performance by about 10% in
this situation.
**unstable**
```
Summary:
throughput summary: 10080.65 requests per second
latency summary (msec):
avg min p50 p95 p99 max
0.953 0.136 0.959 1.215 1.335 2.247
```
**This PR**
```
Summary:
throughput summary: 11074.20 requests per second
latency summary (msec):
avg min p50 p95 p99 max
0.865 0.128 0.879 1.055 1.175 2.159
```
---------
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Co-authored-by: Oran Agra <oran@redislabs.com>
In #8554, we added a MALLOC_MIN_SIZE to use a minimum allocation
size when using malloc(0). However, we did not update the size
when malloc_size is missing.
When malloc_size exists, we record the size that was allocated
instead of the size that was requested. This works with both
jemalloc and libc malloc (the change in #8554 doesn't break this).
When malloc_size is missing, we allocate extra size_t bytes and
store the requested size in them. In that case, the requested size
is probably different from the allocated size anyway (the change
in #8554 doesn't conceptually change that).
So we have room for improvement, since in this case we are aware
of the extra bytes we asked for, just as we're aware of the extra
size_t bytes we asked for.
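For context, a rough sketch of the no-malloc_size allocation path being
described (names and details assumed, not the actual zmalloc code):
```c
/* Sketch: without malloc_size(), a size_t header is prepended so the
 * requested size can be recovered later for memory accounting. */
#include <stdlib.h>
#include <string.h>

#define MALLOC_MIN_SIZE_EXAMPLE 1            /* stand-in for MALLOC_MIN_SIZE */
#define PREFIX_SIZE_EXAMPLE (sizeof(size_t)) /* the extra header bytes */

static void *zmalloc_sketch(size_t size) {
    if (size == 0) size = MALLOC_MIN_SIZE_EXAMPLE; /* avoid malloc(0) */
    char *ptr = malloc(size + PREFIX_SIZE_EXAMPLE);
    if (!ptr) return NULL;
    memcpy(ptr, &size, sizeof(size));   /* remember the requested size */
    return ptr + PREFIX_SIZE_EXAMPLE;   /* hand out the payload area */
}
```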
In addition, some cleanup was done:
1. fixed some outdated comments.
2. test cleanups
Implement #12699
This PR exposes the Lua os.clock() API for getting the elapsed time of
Lua code execution.
Usage:
```lua
local start = os.clock()
-- ... do something ...
local elapsed = os.clock() - start
```
---------
Co-authored-by: Meir Shpilraien (Spielrein) <meir@redis.com>
Co-authored-by: Madelyn Olson <34459052+madolson@users.noreply.github.com>
Following #12568
In issue #9357, when inserting an element larger than 1GB, we currently
store it in a plain node instead of a listpack.
Presently, when we insert an element that exceeds the maximum size of a
packed node, it cannot be accommodated in any other nodes, thus ending
up isolated like a large element.
I.e. it's a node with only one element, but it's listpack encoded rather
than a plain buffer.
This PR lowers the threshold for considering an element as 'large' from
1GB to the maximum size of a node.
While this change doesn't completely resolve the bug mentioned in the
previous PR, it does mitigate its potential impact.
As a result of this change, we can now only use LSET to replace an
element with another element that falls below the maximum size
threshold.
In the worst-case scenario, with a fill of -5, the largest packed node
we can create is 2GB (32k * 64k):
* 32k: The smallest element in a listpack is 2 bytes, which allows us to
store up to 32k elements.
* 64k: This is the maximum size for a single quicklist node.
## Others
To fully fix #9357, we need more work. As discussed in #12568, when we
insert an element into a quicklistNode, it may be created in a new node,
put into another node, or merged, and we can't correctly delete the node
that was supposed to be deleted.
I'm not sure it's worth it, since it involves a lot of modifications.
Implement #12963
## Changes
1. Large bins don't have external fragmentation, or are at least
non-defraggable, so we should ignore the effect of large bins when
measuring fragmentation and only measure the fragmentation of small
bins. This affects both the allocator_frag* metrics and the
active-defrag trigger.
2. Add INFO metrics for `muzzy` memory, which is memory returned to
the OS but still shown as RSS until the OS reclaims it.
---------
Co-authored-by: Oran Agra <oran@redislabs.com>
Recently I saw in CI that reply-schemas-validator fails here:
```
Failed validating 'minimum' in schema[1]['properties']['groups']['items']['properties']['consumers']['items']['properties']['active-time']:
{'description': 'Last time this consumer was active (successful '
'reading/claiming).',
'minimum': 0,
'type': 'integer'}
On instance['groups'][0]['consumers'][0]['active-time']:
-1729380548878722639
```
The reason is that in the fuzzer, we may restore a corrupted active-time,
which will cause the reply schema CI to fail.
The fuzzer can corrupt the state in many places, which can result in
bugs that mess up the reply, so we decided to skip logreqres.
Also, seen-time is the same type as active-time, so the minimum is added
to it as well.
---------
Co-authored-by: Oran Agra <oran@redislabs.com>
There is a timing issue in the test: the close may arrive late, or
freeClientAsync will free the client in an async way, which will
lead to errors in the watching_clients statistics, since we only
unwatch all keys when we truly freeClient.
Add a wait here to avoid this problem. Also fixed some outdated
comments I saw. The test was introduced in #12966.
We can see that the elapsed time here happens to equal busy_time_limit,
causing the test to fail:
```
[err]: RM_Call from blocked client in tests/unit/moduleapi/blockedclient.tcl
Expected '50' to be more than '50' (context: type eval line 26 cmd {assert_morethan [expr [clock clicks -milliseconds]-$start] $busy_time_limit} proc ::test)
```
It is reasonable for them to be equal, so equality is now accepted here.
It should be noted that in the previous `Busy module command` test,
we already used assert_morethan_equal, so this one was probably just
missed at the time.
Currently redis uses the O3 optimization level, which removes the frame
pointer from the target binary.
Long ago, when gcc optimized at O1 and above, the frame pointer was
omitted by default to improve performance. This saves the RBP register
and reduces push/pop instructions. But it makes it difficult for us to
observe the running state of the program. For example, the perf tool
cannot be used effectively, and neither can modern eBPF tools such as
bcc/memleak.
Add readme about the command json folder, what it does, and who should
(not) use it.
see discussion
https://github.com/redis/redis/issues/9359#issuecomment-1936420698
---------
Co-authored-by: Oran Agra <oran@redislabs.com>
Co-authored-by: Binbin <binloveplay1314@qq.com>
In the `databasesCron()`, the time consumed by
`kvstoreIncrementallyRehash()` is used to calculate the exit condition.
However, within `kvstoreIncrementallyRehash()`, the loop first checks
for timeout before performing rehashing. Therefore, the time for the
last rehash isn't accounted for, making the consumed time inaccurate. We
need to precisely calculate all the time spent on rehashing.
Additionally, the time allocated to `kvstoreIncrementallyRehash()`
should be the remaining time, which is
`INCREMENTAL_REHASHING_THRESHOLD_US` minus the already consumed
`elapsed_us`.
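A sketch of the intended accounting (function signature assumed, not the
exact patch):
```c
/* Sketch: give each kvstore only the time budget that is still left, and
 * count the time it actually spent (including the last rehash step). */
uint64_t elapsed_us = 0;
for (int i = 0; i < server.dbnum; i++) {
    elapsed_us += kvstoreIncrementallyRehash(server.db[i].keys,
        INCREMENTAL_REHASHING_THRESHOLD_US - elapsed_us);
    if (elapsed_us >= INCREMENTAL_REHASHING_THRESHOLD_US) break;

    elapsed_us += kvstoreIncrementallyRehash(server.db[i].expires,
        INCREMENTAL_REHASHING_THRESHOLD_US - elapsed_us);
    if (elapsed_us >= INCREMENTAL_REHASHING_THRESHOLD_US) break;
}
```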
Currently aof_last_fsync uses a low resolution unixtime, which is really
bad: it checks if the absolute number of (full) seconds changed by one,
and depending on which side of the second barrier it falls, we can get
very different results.
This PR changes the resolution to use milliseconds instead of complete
seconds.
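An illustrative before/after of the check (field names assumed, not the
exact patch):
```c
/* Sketch: with whole seconds, "now_sec > last_fsync_sec" can mean anything
 * from just over 0ms to just under 2s since the last fsync; with
 * milliseconds the interval is always close to 1000ms. */
#include <stdint.h>

static int everysecFsyncDue(int64_t now_ms, int64_t aof_last_fsync_ms) {
    return now_ms - aof_last_fsync_ms >= 1000;
}
```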
In cases where the event loop cycles are short and rapid (e.g. running
many fast commands with a short pipeline, or a high `hz` config), this
change will not make much difference, since either way we'll be quick to
detect that we're on a "new second", and it's likely that these fsyncs
will always be executed close to the second switch barrier.
But in cases of rare or slow event loop cycles (e.g. either slow
commands, or a very low rate of traffic to redis, and a low `hz`), it
could easily be that with the old code, in some cases we'll have over
1.5 seconds between fsyncs, and in others less than 0.5.
see discussion in #8612
This PR also handles aof_flush_postponed_start; the damage there is
smaller since the threshold is 2 seconds, not 1.
---------
Co-authored-by: Oran Agra <oran@redislabs.com>
Redis has some special commands that mark the client's state, such as
`subscribe` and `blpop`, which mark the client as `CLIENT_PUBSUB` or
`CLIENT_BLOCKED`, and we have metrics for the special use cases.
However, there are also other special commands, like `WATCH`, which,
although they do not have a specific flag, should also be considered
stateful client types. For stateful clients, in many scenarios the
connections cannot be shared in a "connection pool", meaning a
connection pool cannot be used. For example, whenever the `WATCH`
command is executed, a new connection is required to put the client into
the "watch state", because the watched keys are stored in the client.
If different business logic requires watching different keys, separate
connections must be used; otherwise, there will be contamination. This
also means that if a user's business heavily relies on the `WATCH`
command, a large number of connections will be required.
Recently we have encountered this situation in our platform, where some
users consume a significant number of connections when using Redis
because of `WATCH`.
I hope we can have a way to observe these special use cases and special
client connections. Here I add a few monitoring metrics:
1. `watching_clients` in `INFO` reply: The number of clients currently
in the "watching" state.
2. `total_watched_keys` in `INFO` reply: The total number of keys being
watched.
3. `watch` in `CLIENT LIST` reply: The number of keys each client is
currently watching.
Usually, the probability that a dict exists is much greater than the
probability that it does not exist. In kvstoreDictAddRaw, we will call
kvstoreGetDict multiple times. Based on this assumption, we change
createDictIfNeeded into something like a get-or-create function:
```
before:
dict exist: 2 kvstoreGetDict
dict non-exist: 2 kvstoreGetDict
after:
dict exist: 1 kvstoreGetDict
dict non-exist: 3 kvstoreGetDict
```
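For illustration, the get-or-create shape could look like this (a sketch;
`createDict` is a hypothetical helper and the real function names may
differ):
```c
/* Sketch: try the cheap lookup first; only fall back to creating the dict
 * when it doesn't exist yet (the rare case). */
static dict *createDictIfNeeded(kvstore *kvs, int didx) {
    dict *d = kvstoreGetDict(kvs, didx);  /* one lookup in the common case */
    if (d) return d;
    createDict(kvs, didx);                /* hypothetical creation helper */
    return kvstoreGetDict(kvs, didx);     /* fetch the newly created dict */
}
```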
A possible 3% performance improvement was observed.
In addition, some typos/comments I saw have been cleaned up.
In low memory situations, sending a big number of arguments (sets)
may cause OOM panic. Use ztrycalloc, like we do on LCS and XAUTOCLAIM,
and fail gracefully.
This change affects the following commands: ZUNION, ZINTER, ZDIFF,
ZUNIONSTORE, ZINTERSTORE, ZDIFFSTORE, ZINTERCARD.
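For illustration, the graceful-failure pattern looks roughly like this
(a sketch; the surrounding variables and the error message are assumed):
```c
/* Sketch: try to allocate and reply with an error instead of OOM-panicking. */
zsetopsrc *src = ztrycalloc(sizeof(zsetopsrc) * setnum);
if (src == NULL) {
    addReplyError(c, "insufficient memory to allocate transient memory");
    return;
}
```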
These tests have all failed in daily CI:
```
*** [err]: Blocking XREADGROUP for stream key that has clients blocked on stream - reprocessing command in tests/unit/type/stream-cgroups.tcl
Expected '1101' to be between to '1000' and '1100' (context: type eval line 23 cmd {assert_range [expr $end-$start] 1000 1100} proc ::test)
*** [err]: BLPOP unblock but the key is expired and then block again - reprocessing command in tests/unit/type/list.tcl
Expected '1101' to be between to '1000' and '1100' (context: type eval line 23 cmd {assert_range [expr $end-$start] 1000 1100} proc ::test)
*** [err]: BZPOPMIN unblock but the key is expired and then block again - reprocessing command in tests/unit/type/zset.tcl
Expected '1103' to be between to '1000' and '1100' (context: type eval line 23 cmd {assert_range [expr $end-$start] 1000 1100} proc ::test)
```
Increase the range to avoid failures, and improve the comment to be
clearer.
The tests were introduced in #13004.
The receiver does not update any of its cluster state based on gossip
about itself. This commit explicitly avoids sending or processing gossip
about the receiver.
Currently cluster bus gossips include 10% of nodes in the cluster with a
minimum of 3 nodes. For up to 30 node clusters, this commit makes sure
that 1/3 of the gossip (1 out of 3 gossips) is never discarded. This
should help with relatively faster convergence of cluster state in
general.
Following the changes introduced by 8cd62f82c, the dbExpandExpires used
the db_size instead of expires_size.
Co-authored-by: YaacovHazan <yaacov.hazan@redislabs.com>
Following the changes introduced by 8cd62f82c, the kvstoreDictExpand for
the expires kvstore used the slot_size instead of expires_slot_size.
Co-authored-by: YaacovHazan <yaacov.hazan@redislabs.com>
This test fails occasionally:
```
*** [err]: CLIENT KILL maxAGE will kill old clients in tests/unit/introspection.tcl
Expected 2 == 1 (context: type eval line 14 cmd {assert {$res == 1}} proc ::test)
```
This test is very likely to produce a false positive if the execution
time takes longer than the max age; for example, if the execution time
between sleep and kill exceeds 1s, rd2 will also be killed due to
the max age.
The test can adjust the order of execution statements to increase
the probability of passing, but this would still be a timing issue
on some slow machines, so we decided to give it a few more chances.
The test was introduced in #12299.
Failed CI:
https://github.com/redis/redis/actions/runs/7837608438/job/21387609715
## Why defragment tests only failed under 32-bit
First of all, under 32-bit jemalloc will allocate more small bins and
fewer large bins, which will also lead to more external fragmentation;
therefore, the fragmentation ratio is higher in 32-bit than in 64-bit,
so the defragment tests (`Active defrag eval scripts: cluster` and
`Active defrag big keys: cluster`) always fail in 32-bit.
## Why defragment tests only failed with cluster
The following is the result of the `Active defrag eval scripts: cluster`
test.
1) Before #11695, the fragmentation ratio is 3.11%.
2) After #11695, the fragmentation ratio grew to 4.58%.
Since we are using per-slot dictionaries to manage slots, we only
defragment the contents of these dictionaries (keys, values), but not
the dictionaries' own struct and ht_table, which means that frequent
shrinking and expanding of the dictionaries will create more fragments.
3) After #12850 and #12948, in cluster mode, a large number of cluster
slot dicts will be shrunk, creating additional fragmentation, and the
dictionaries will not be defragged.
## Solution
* Add defragmentation of the per-slot dictionary's own structures, dict
struct and ht_table.
## Other change
* Increase floating point print precision of `frags` and `rss` in debug
logs for defrag
---------
Co-authored-by: Oran Agra <oran@redislabs.com>
The test fails here and there:
```
*** [err]: expire scan should skip dictionaries with lot's of empty buckets in tests/unit/expire.tcl
scan didn't handle slot skipping logic.
```
There are two cases:
1. In the case of passing the test, we use a child process to avoid the
dict resize, but it can not completely prevent it, since in dictDelete
we still have a chance to trigger the resize (hitting the force ratio).
The reason why our test passed before is that the expire dict is still
in the rehashing process, so in dictDelete, dictShrinkIfNeeded can not
trigger the resize.
2. In the case of failing the test, the expire dict has finished the
rehashing, so in the last dictDelete, dictShrinkIfNeeded triggers the
dict resize since it hits the force ratio, and the skipping logic fails.
This PR adds a new DEBUG command to disable the dict resize.
We forgot to call quicklistSetOptions after createQuicklistObject; in
the sort store scenario, we would create a quicklist with the default
fill or compress options.
This PR adds fill and depth parameters to createQuicklistObject, so the
options no longer need to be set after creating the quicklist.
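A sketch of the intended call site in the SORT STORE path (config field
names assumed):
```c
/* Sketch: create the stored list with the configured fill/compress options
 * instead of the defaults, so quicklistSetOptions() isn't needed afterwards. */
robj *sobj = createQuicklistObject(server.list_max_listpack_size,
                                   server.list_compress_depth);
```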
This closes #12871.
release notes:
> Fix lists created by SORT STORE to respect list compression and
packing configs.
Fix two crashes introduced by #12955.
When a quicklist node can't be inserted and split, we eventually merge
the current node with its neighboring nodes after inserting, and
compress the current node and its siblings.
1. When the current node is merged with another node, the current node
may become invalid and can no longer be used.
   Solution: let `_quicklistMergeNodes()` return the merged nodes.
2. If the current node is an LZF quicklist node, its recompress will be
1. If the split node can be merged with a sibling node to become head or
tail, recompress may cause the head and tail to be compressed, which is
not allowed.
   Solution: always set recompress to 0 after merging.
After the fix for #13033, the address sanitizer reports this
heap-use-after-free error. When the pubsubshard_channels dict becomes
empty, we will delete the dict, and dictReleaseIterator will call
dictResetIterator, which uses the dict, so we trigger the error.
This PR introduces a new struct, kvstoreDictIterator, to wrap
dictIterator, and replaces the original dict iterator with the new
kvstore dict iterator.
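Roughly, the wrapper looks like this (field names assumed):
```c
/* Sketch: the iterator remembers which kvstore/dict index it belongs to, so
 * releasing it can be handled safely even if the underlying dict was deleted
 * when it became empty. */
typedef struct kvstoreDictIterator {
    kvstore *kvs;     /* owner kvstore */
    long long didx;   /* index of the dict being iterated */
    dictIterator di;  /* embedded plain dict iterator */
} kvstoreDictIterator;
```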
---------
Co-authored-by: Oran Agra <oran@redislabs.com>
Co-authored-by: guybe7 <guy.benoish@redislabs.com>
The reason is the same as #13016: in #12819, in cron, in addition to
trying to shrink, we will also try to expand. The dict was expanded by
cron before we trigger the bgsave, since we have enough keys (4096) to
hit the ratio. Before the bgsave, we now only add 4095 keys to avoid
this issue.
Fix #12864
The main reason for this crash is that when replacing an element of a
quicklist packed node with the lpReplace() method, if the final size is
larger than 4GB, lpReplace() will fail and return NULL, causing
`node->entry` to be incorrectly set to NULL.
Since the inserted data is not a large element, we can't just replace it
like a large element (first quicklistInsertAfter() and then
quicklistDelIndex()), because the current node may be merged and
invalidated in quicklistInsertAfter().
The solution of this PR:
When the replacement fails (the listpack would exceed 4GB), split the
current node, create a new node to put in the middle, and try to merge
them. This is the same as inserting a large element.
In the worst case, its size will not exceed 4GB.
This commit addresses a problem in connSocketBlockingConnect where
different types of connection failures, including timeouts and other
errors, were not consistently handled. Previously, the function did not
return C_ERR immediately after detecting a connection failure, which
could lead to inconsistent states and misinterpretation of the
connection status.
With this update, connSocketBlockingConnect now correctly returns C_ERR
upon encountering any connection error, ensuring that all types of
connection failures are handled consistently and the behavior of the
function aligns with expected outcomes in case of connection issues.
Closes #12900
After #12822, when pubsubshard_channels becomes empty, kvstoreDictDelete
will delete the dict (this is currently the only place that deletes
dicts that become empty), and in the next loop iteration we will make an
invalid call to dictNext.
Now, after the dict becomes empty, we break out of the loop without
calling dictNext.
Currently, once active defrag starts, we can not adjust
active_defrag_running downwards. This is because active_defrag_running
is dynamically computed based on the fragmentation, and we think we
should not lower the effort when the fragmentation drops.
However, we need to note that active_defrag_running is also dynamically
computed based on configurations. In this case, we are not respecting
cycle-min or cycle-max. Some people may realize halfway through that
defrag consumes a lot and want to adjust it.
Previously we could only turn off activedefrag and then turn it on again
to adjust active_defrag_running downwards. So in this PR, when an active
defrag configuration change is made, we re-compute it.
These configuration items are:
- active-defrag-cycle-min
- active-defrag-cycle-max
- active-defrag-threshold-upper
Since now a DB in cluster mode is divided into 16384 dicts, here
we directly check kvstoreDictSize instead of kvstoreSize, which
has a higher probability of letting us skip the lookup.
The other change is a cleanup: obviously kvstoreGetHash should be
applied to the db->expires dicts.
When the dict is NULL, we also need to advance resize_cursor, otherwise
it will keep doing useless continues here, and there is no way to resize
the other dicts behind it.
Introduced in #12822.
---------
Co-authored-by: Oran Agra <oran@redislabs.com>
# Description
Gather most of the scattered `redisDb`-related code from the per-slot
dict PR (#11695) and turn it to a new data structure, `kvstore`. i.e.
it's a class that represents an array of dictionaries.
# Motivation
The main motivation is code cleanliness, the idea of using an array of
dictionaries is very well-suited to becoming a self-contained data
structure.
This allowed cleaning some ugly code, among others: loops that run twice
on the main dict and expires dict, and duplicate code for allocating and
releasing this data structure.
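For a rough feel of the resulting API (function names taken from elsewhere
in this changelog; exact signatures are assumed):
```c
/* Sketch: one kvstore replaces the old per-slot dict* arrays. */
kvstore *keys = kvstoreCreate(&dbDictType, slot_count_bits, flags);
dict *d = kvstoreGetDict(keys, didx);            /* per-index dict, may be NULL */
unsigned long one = kvstoreDictSize(keys, didx); /* size of a single dict */
unsigned long all = kvstoreSize(keys);           /* total size across all dicts */
kvstoreRelease(keys);
```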
# Notes
1. This PR reverts the part of https://github.com/redis/redis/pull/12848
where the `rehashing` list is global (handling rehashing `dict`s is
under the responsibility of `kvstore`, and should not be managed by the
server)
2. This PR also replaces the type of `server.pubsubshard_channels` from
`dict**` to `kvstore` (original PR:
https://github.com/redis/redis/pull/12804). After that was done,
server.pubsub_channels was also chosen to be a `kvstore` (with only one
`dict`, which seems odd) just to make the code cleaner by making it the
same type as `server.pubsubshard_channels`, see
`pubsubtype.serverPubSubChannels`
3. the keys and expires kvstores are currently configured to allocate
the individual dicts only when the first key is added (unlike before,
when they were allocated in advance), but they won't release them when
the last key is deleted.
Worth mentioning that due to the recent change the reply of DEBUG
HTSTATS changed, in case no keys were ever added to the db.
before:
```
127.0.0.1:6379> DEBUG htstats 9
[Dictionary HT]
Hash table 0 stats (main hash table):
No stats available for empty dictionaries
[Expires HT]
Hash table 0 stats (main hash table):
No stats available for empty dictionaries
```
after:
```
127.0.0.1:6379> DEBUG htstats 9
[Dictionary HT]
[Expires HT]
```
When db->expires_cursor==0, it means the DB is done with the scanning,
and we should exit the loop to avoid useless scanning.
It is easy to hit the active expire timeout in the modified test.
For example, let's assume that there is only 1 expired key in the
DB, and the size / buckets ratio is less than 1%, which means that
we will skip it in isExpiryDictValidForSamplingCb, and the return
value of expires_cursor is 0.
Because `data.sampled == 0` is always true, `repeat` is also always
true, so we keep scanning the DB, but every time it is skipped by the
previous judgment (expires_cursor = 0), until the timelimit is finally
exhausted.
The JSON file lacks the following structural API changes:
- GEORADIUSBYMEMBER: add the ANY option for COUNT since 6.2.0.
- GEORADIUSBYMEMBER_RO: add the ANY option for COUNT since 6.2.0.
- GEORADIUS_RO: Added support for uppercase unit names since 7.0.0.
- GEORADIUSBYMEMBER_RO: Added support for uppercase unit names since
7.0.0.
---------
Signed-off-by: daz-3ux <daz-3ux@proton.me>
Co-authored-by: bodong.ybd <bodong.ybd@alibaba-inc.com>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Co-authored-by: yangpengda.333 <yangpengda.333@bytedance.com>
Co-authored-by: Oran Agra <oran@redislabs.com>
Currently, we compute `db->avg_ttl` after each short `dbScan` sweep (a
few buckets without checking the time limit). But after each `dbScan`
sweep, we don't have much data, and this makes db->avg_ttl less
precise. For example, even if we scan the whole db, we can't get the
exact avg_ttl because we split up the data.
i.e. because of the running average, if we issue 16 calls to scan, we'll
give lower weight to the first one, and higher weight to the last one.
I think we should defer calculating `db->avg_ttl` until we complete more
of the db iteration (the time-limit check or the beginning of iterating
the next db), because then we have more sample data from this db and can
get a more accurate result. In the best case, if we scan the whole db,
we can get the exact avg_ttl.
In this PR, we postpone the avg_ttl calculation until the time-limit
check or the iteration of the next db, so we can accumulate more data
and get a more precise avg_ttl.
Note that we still need to make sure to decay the old TTLs at the same
speed as before, which is why we want to run the decay mechanism several
times, or use the Pow formula, see the comment in the code.
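To illustrate why (assuming the usual 1/50 smoothing factor; not the exact
patch):
```c
/* Sketch: applying the per-sweep running average N times with the same
 * sample, avg = avg*49/50 + sample/50, is equivalent to one deferred
 * update using pow(), which keeps the old decay speed. */
#include <math.h>

static double deferredAvgTtl(double avg, double sample, int sweeps) {
    double decay = pow(49.0 / 50.0, sweeps);
    return avg * decay + sample * (1.0 - decay);
}
```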
In my experiments, this PR can improve accuracy by 89% or 52% in
different workloads.
Co-authored-by: Oran Agra <oran@redislabs.com>
In Redis, an RDB is mainly produced in three scenarios:
- backup, such as `bgsave` and `save` command
- full sync in replication
- aof rewrite if `aof-use-rdb-preamble` is yes
We also have some RDB flags to identify the purpose of rdb saving.
```C
/* flags on the purpose of rdb save or load */
#define RDBFLAGS_NONE 0 /* No special RDB loading. */
#define RDBFLAGS_AOF_PREAMBLE (1<<0) /* Load/save the RDB as AOF preamble. */
#define RDBFLAGS_REPLICATION (1<<1) /* Load/save for SYNC. */
```
But currently, it seems that these flags and the purposes of rdb saving
don't exactly match. I found this in `rdbSaveRioWithEOFMark`, which calls
`startSaving` with `RDBFLAGS_REPLICATION` but `rdbSaveRio` with
`RDBFLAGS_NONE`.
```C
int rdbSaveRioWithEOFMark(int req, rio *rdb, int *error, rdbSaveInfo *rsi) {
    char eofmark[RDB_EOF_MARK_SIZE];

    startSaving(RDBFLAGS_REPLICATION);
    getRandomHexChars(eofmark,RDB_EOF_MARK_SIZE);
    if (error) *error = 0;
    if (rioWrite(rdb,"$EOF:",5) == 0) goto werr;
    if (rioWrite(rdb,eofmark,RDB_EOF_MARK_SIZE) == 0) goto werr;
    if (rioWrite(rdb,"\r\n",2) == 0) goto werr;
    if (rdbSaveRio(req,rdb,error,RDBFLAGS_NONE,rsi) == C_ERR) goto werr;
    if (rioWrite(rdb,eofmark,RDB_EOF_MARK_SIZE) == 0) goto werr;
    stopSaving(1);
    return C_OK;

werr: /* Write error. */
    /* Set 'error' only if not already set by rdbSaveRio() call. */
    if (error && *error == 0) *error = errno;
    stopSaving(0);
    return C_ERR;
}
```
In this PR, I refine the purpose of rdb saving with accurate flags.
CI reported this failure:
```
*** [err]: Don't rehash if used memory exceeds maxmemory after rehash in tests/unit/maxmemory.tcl
Expected '4098' to equal or match '4002'
WARNING: the new maxmemory value set via CONFIG SET (1176088) is smaller than the current memory usage (1231083)
```
It can be seen from the log that used_memory changed before we set
maxmemory.
The reason is that in #12819, in cron, in addition to trying to shrink,
we will also try to expand. The dict was expanded by cron before we set
maxmemory, causing the test to fail.
Before setting maxmemory, we now only add 4095 keys to avoid triggering
a resize.
When we use a timer to unblock a client in a module, if the timer
period and the block timeout are very close, they will unblock the
client in the same event loop, and it will trigger the assertion.
The reason is that in moduleBlockedClientTimedOut we protect
against re-processing, so we don't actually call updateStatsOnUnblock
(see #12817), which means we are not able to reset c->duration.
Moreover, unblockClientOnTimeout() didn't realize that bc had
been unblocked. We add a function to the module to determine if bc
is blocked, and then use it in unblockClientOnTimeout() to return early.
There is the stack:
```
beforeSleep
blockedBeforeSleep
handleBlockedClientsTimeout
checkBlockedClientTimeout
unblockClientOnTimeout
unblockClient
resetClient
-- assertion, crash the server
'c->duration == 0' is not true
```
The block timeout is passed in the test case, but we do not pass
in the timeout_callback, and it will crash when unblocking. For this
case, in moduleBlockedClientTimedOut we now check timeout_callback.
There is the stack:
```
beforeSleep
blockedBeforeSleep
handleBlockedClientsTimeout
checkBlockedClientTimeout
unblockClientOnTimeout
replyToBlockedClientTimedOut
moduleBlockedClientTimedOut
-- timeout_callback is NULL, invalidFunctionWasCalled
bc->timeout_callback(&ctx,(void**)c->argv,c->argc);
```
Add a way to HSCAN a hash key, and get only the field names.
Command syntax is now:
```
HSCAN key cursor [MATCH pattern] [COUNT count] [NOVALUES]
```
when `NOVALUES` is on, the command will only return keys in the hash.
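For example (a hypothetical session):
```
127.0.0.1:6379> HSET h f1 v1 f2 v2
(integer) 2
127.0.0.1:6379> HSCAN h 0 NOVALUES
1) "0"
2) 1) "f1"
   2) "f2"
```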
---------
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>