redict

mirror of https://codeberg.org/redict/redict.git synced 2025-01-22 08:08:53 -05:00

Author	SHA1	Message	Date
guybe7	c2a4b78491	WAITAOF: Update fsynced_reploff_pending even if there's nothing to fsync (#12622 ) The problem is that WAITAOF could hang in case commands were propagated only to replicas. This can happen if a module uses RM_Call with the REDISMODULE_ARGV_NO_AOF flag. In that case, master_repl_offset would increase, but there would be nothing to fsync, so in the absence of other traffic, fsynced_reploff_pending would stay static, and WAITAOF could hang. This commit updates fsynced_reploff_pending to the latest offset in flushAppendOnlyFile in case there's nothing to fsync, i.e. in case it's behind because of the above-mentioned case, it'll be refreshed and release the WAITAOF. Other changes: Fix a race in wait.tcl (client getting blocked vs. the fsync thread)	2023-09-28 17:19:20 +03:00
guybe7	bfa3931a04	WAITAOF: Update fsynced_reploff_pending just before starting the initial AOFRW fork (#12620 ) If we set `fsynced_reploff_pending` in `startAppendOnly`, and the fork doesn't start immediately (e.g. there's another fork active at the time), any subsequent commands will increment `server.master_repl_offset`, but will not cause an fsync (given they were executed before the fork started, they just ended up in the RDB part of it). Therefore, any WAITAOF will wait on the new master_repl_offset, but it will time out because no fsync will be executed. Release notes: ``` WAITAOF could time out in the absence of write traffic in case a new AOF is created and an AOFRW can't immediately start. This can happen when the appendonly config is changed at runtime, but also after FLUSHALL, and replica full sync. ```	2023-09-28 17:05:53 +03:00
Viktor Söderqvist	f924bebd83	Rewrite huge printf calls to smaller ones for readability (#12257 ) In a long printf call with many placeholders, it's hard to see which argument belongs to which placeholder. The long printf-like calls in the INFO and CLIENT commands are rewritten into pairs of (format, argument). These pairs are then rewritten to a single call with a long format string and a long list of arguments, using a macro called FMTARGS. The file `fmtargs.h` is added to the repo. Co-authored-by: Madelyn Olson <34459052+madolson@users.noreply.github.com>	2023-09-28 09:21:23 +03:00
Binbin	9fe63bdc80	Dump server logs when corrupt fuzzer reports crash (#12612 ) Recently we found some signal crashes, but were unable to reproduce them. It is a good idea to dump the server logs when a failure happens.	2023-09-27 09:08:18 +03:00
Sankar	8cdeddc81c	Clear owner_not_claiming_slot bit for the slot in clusterDelSlot (#12564 ) Clear owner_not_claiming_slot bit for the slot in clusterDelSlot to keep it consistent with slot ownership information.	2023-09-26 14:03:27 -07:00
Nir Rattner	24187ed8e3	Fix overflow calculation for next timer event (#12474 ) The `retval` variable is defined as an `int`, so with 4 bytes, it cannot properly represent microsecond values greater than the equivalent of about 35 minutes. This bug shouldn't impact standard Redis behavior because Redis doesn't have timer events that are scheduled as far as 35 minutes out, but it may affect custom Redis modules which interact with the event timers via the RM_CreateTimer API. The impact is that `usUntilEarliestTimer` may return 0 for as long as `retval` is scaled to an overflowing value. While `usUntilEarliestTimer` continues to return `0`, `aeApiPoll` will have a zero timeout, and so Redis will use significantly more CPU iterating through its event loop without pause. For timers scheduled far enough into the future, Redis will cycle between ~35 minute periods of high CPU usage and ~35 minute periods of standard CPU usage.	2023-09-24 13:31:12 +03:00
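The ~35 minute figure in the commit above follows from the range of a 32-bit `int`: `INT_MAX` microseconds is roughly 2147 seconds. The following is a minimal sketch of that arithmetic, not the Redis code; `us_until` and `exceeds_int` are hypothetical helper names, and a 32-bit `int` is assumed.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper (not the Redis function): microseconds until a timer
 * fires, kept in a 64-bit type so long delays cannot overflow. */
static int64_t us_until(int64_t now_us, int64_t when_us) {
    assert(now_us >= 0 && when_us >= 0);
    return when_us > now_us ? when_us - now_us : 0;
}

/* Returns 1 when a microsecond delay does not fit in an int.
 * With a 32-bit int (an assumption here) the limit is about 35.8 minutes. */
static int exceeds_int(int64_t delay_us) {
    return delay_us > INT_MAX;
}
```

Under these assumptions a 35 minute delay still fits in an `int`, while a 36 minute delay does not, which is why only far-future timers were affected.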
meiravgri	cc2be63997	Print stack trace from all threads in crash report (#12453 ) In this PR we are adding the functionality to collect all the process's threads' backtraces. ## Changes made in this PR ### introduce threads mngr API The threads mngr API which has 2 abilities: * `ThreadsManager_init() `- register to SIGUSR2. called on the server start-up. * ` ThreadsManager_runOnThreads()` - receives a list of a pid_t and a callback, tells every thread in the list to invoke the callback, and returns the output collected by each invocation. Elaborating atomicvar API * `atomicIncrGet(var,newvalue_var,count) `-- Increment and get the atomic counter new value * `atomicFlagGetSet` -- Get and set the atomic counter value to 1 ### Always set SIGALRM handler SIGALRM handler prints the process's stacktrace to the log file. Up until now, it was set only if the `server.watchdog_period` > 0. This can be also useful if debugging is needed. However, in situations where the server can't get requests, (a deadlock, for example) we weren't able to change the signal handler. To make it available at run time we set SIGALRM handler on server startup. The signal handler name was changed to a more general `sigalrmSignalHandler`. ### Print all the process' threads' stacktraces `logStackTrace()` now calls `writeStacktraces()`, instead of logging the current thread stacktrace. `writeStacktraces()`: * On Linux systems we use the threads manager API to collect the backtraces of all the process' threads. To get the `tids` list (threads ids) we read the `/proc/<redis-server-pid>/tasks` file which includes a list of directories. Each directory name corresponds to one tid (including the main thread). For each thread, we also need to check if it can get the signal from the threads manager (meaning it is not blocking/ignoring that signal). We send the threads manager this tids list and `collect_stacktrace_data()` callback, which collects the thread's backtrace addresses, its name, and tid. 
* On other systems, the behavior remained as it was (writing only the current thread stacktrace to the log file). ## compatibility notes 1. The threads mngr API is only supported in linux. 2. glibc earlier than 2.3 We use `syscall(SYS_gettid)` and `syscall(SYS_tgkill...)` because their dedicated alternatives (`gettid()` and `tgkill`) were added in glibc 2.3. ## Output example Each thread backtrace will have the following format: `<tid> <thread_name> [additional_info]` * tid: as read from the `/proc/<redis-server-pid>/tasks` file * thread_name: the thread name as it is registered in the os/ * additional_info: Sometimes we want to add specific information about one of the threads. Currently, it is only used to mark the thread that handles the backtraces collection by adding "". In case of a crash, this also indicates which thread caused the crash. The handling thread won't necessarily appear first. ``` ------ STACK TRACE ------ EIP: /lib/aarch64-linux-gnu/libc.so.6(epoll_pwait+0x9c)[0xffffb9295ebc] 67089 redis-server linux-vdso.so.1(__kernel_rt_sigreturn+0x0)[0xffffb9437790] /lib/aarch64-linux-gnu/libc.so.6(epoll_pwait+0x9c)[0xffffb9295ebc] redis-server :6379(+0x75e0c)[0xaaaac2fe5e0c] redis-server :6379(aeProcessEvents+0x18c)[0xaaaac2fe6c00] redis-server :6379(aeMain+0x24)[0xaaaac2fe7038] redis-server :6379(main+0xe0c)[0xaaaac3001afc] /lib/aarch64-linux-gnu/libc.so.6(+0x273fc)[0xffffb91d73fc] /lib/aarch64-linux-gnu/libc.so.6(__libc_start_main+0x98)[0xffffb91d74cc] redis-server :6379(_start+0x30)[0xaaaac2fe0370] 67093 bio_lazy_free /lib/aarch64-linux-gnu/libc.so.6(+0x79dfc)[0xffffb9229dfc] /lib/aarch64-linux-gnu/libc.so.6(pthread_cond_wait+0x208)[0xffffb922c8fc] redis-server :6379(bioProcessBackgroundJobs+0x174)[0xaaaac30976e8] /lib/aarch64-linux-gnu/libc.so.6(+0x7d5c8)[0xffffb922d5c8] /lib/aarch64-linux-gnu/libc.so.6(+0xe5d1c)[0xffffb9295d1c] 67091 bio_close_file /lib/aarch64-linux-gnu/libc.so.6(+0x79dfc)[0xffffb9229dfc] 
/lib/aarch64-linux-gnu/libc.so.6(pthread_cond_wait+0x208)[0xffffb922c8fc] redis-server :6379(bioProcessBackgroundJobs+0x174)[0xaaaac30976e8] /lib/aarch64-linux-gnu/libc.so.6(+0x7d5c8)[0xffffb922d5c8] /lib/aarch64-linux-gnu/libc.so.6(+0xe5d1c)[0xffffb9295d1c] 67092 bio_aof /lib/aarch64-linux-gnu/libc.so.6(+0x79dfc)[0xffffb9229dfc] /lib/aarch64-linux-gnu/libc.so.6(pthread_cond_wait+0x208)[0xffffb922c8fc] redis-server :6379(bioProcessBackgroundJobs+0x174)[0xaaaac30976e8] /lib/aarch64-linux-gnu/libc.so.6(+0x7d5c8)[0xffffb922d5c8] /lib/aarch64-linux-gnu/libc.so.6(+0xe5d1c)[0xffffb9295d1c] 67089:signal-handler (1693824528) -------- ```	2023-09-24 09:47:23 +03:00
Chen Tianjie	2aad03fa39	Use server.current_client to decide whether cluster commands should return TLS info. (#12569 ) Starting with a change in #12233 (released in 7.2), CLUSTER commands use the client's connection to decide whether to return the TLS port or the non-TLS port, but commands called by a Lua script or a module's RM_Call don't have a real client with a connection, and would currently be regarded as non-TLS connections. We can use server.current_client instead when it is available. When it is not (a module calls commands without a real client), we may see this as undefined behavior, and return null or the default port (currently in this PR it returns the default port, judged by server.tls_cluster).	2023-09-21 18:41:32 +03:00
Binbin	4031a18732	Fix that slot return in CLUSTER SHARDS should be integer (#12561 ) An unintentional change was introduced in #10536: we used to use addReplyLongLong and now it is addReplyBulkLonglong. Revert it back to the previous behavior.	2023-09-09 23:33:00 -07:00
Binbin	96e9dec419	Bump codespell from 2.2.4 to 2.2.5 (#12557 ) and adjustments.	2023-09-08 16:10:17 +03:00
nihohit	90e9fc387c	Update command tips on more admin / configuration commands (#12545 ) Updated the command tips for ACL SAVE / SETUSER / DELUSER, CLIENT SETNAME / SETINFO, and LATENCY RESET. The tips now match CONFIG SET, since there's a similar behavior for all of these commands - the user expects to update the various configurations & states on all nodes, not only on a single, random node. For LATENCY RESET the response tip is now agg_sum. Co-authored-by: Shachar Langbeheim <shachlan@amazon.com>	2023-09-04 21:30:42 +03:00
secwall	a2046c1eb1	Check shard_id pointer validity in updateShardId (#12538 ) When connecting between a 7.0 and 7.2 cluster, the 7.0 cluster will not populate the shard_id field, which is expected on the 7.2 cluster. This is not intended behavior, as the 7.2 cluster is supposed to use a temporary shard_id while the node is in the upgrading state, but it wasn't being correctly set in this case.	2023-09-02 20:14:48 -07:00
alonre24	044e29dd34	redis-benchmark - add the support for binary strings (#9414 ) Recently, the option of sending an argument from stdin using the `-x` flag was added to redis-benchmark (this option is available in redis-cli as well). However, using the `-x` option for sending blobs that contain null characters doesn't work as expected: the argument is trimmed at the first occurrence of `\X00` (unlike in redis-cli). This PR aims to fix this issue and add support for every binary string input, by sending argument lengths to `redisFormatCommandArgv` when processing the redis-benchmark command, so we won't treat the arguments as C-strings. Additionally, we add simple test coverage for `-x` (without binary strings), and also remove an excessive server started in tests, and make sure to select db 0 so that `r` and the benchmark work on the same db. Co-authored-by: Oran Agra <oran@redislabs.com>	2023-09-02 15:37:04 +03:00
Binbin	4ba144a4eb	Add logreqres:skip flag to new INFO obuf limit test (#12537 ) The new test added in #12476 causes reply-schemas-validator to fail. When doing `catch {r get key}`, the req-res output is: ``` 3 get 3 key 12 __argv_end__ $100000 aaaaaaaaaaaaaaaaaaaa...4 info 5 stats 12 __argv_end__ =1670 txt:# Stats ... ``` We can see that the line after `$100000` has a 4 at the end, which breaks the req-res-log-validator script since the format is wrong. The reason, I guess, is that after the client reconnects (after hitting the output buf limit), we do not add newlines, but append args directly. Since obuf-limits.tcl is doing the same thing and has the logreqres:skip flag, this PR follows it.	2023-09-01 14:15:11 +03:00
Roshan Khatri	49f7d173b4	Remove unnecessary use of sds and mem copy in module.c (#12533 ) Found that in moduleConfigValidityCheck and isModuleConfigNameRegistered, sds is not required. This also allowed to remove unnecessary memcopy from some of the config registering APIs.	2023-08-31 14:08:05 -07:00
icy17	370d38016f	Fix potential crash on failed OpenSSL init (#12447 )	2023-08-31 22:45:36 +03:00
Chen Tianjie	b26e8e3213	Optimize ZRANGE offset location from linear search to skiplist jump. (#12450 ) ZRANGE BYSCORE/BYLEX with the [LIMIT offset count] option was using every level in the skiplist to jump to the first/last node in range, but only used level[0] in the skiplist to locate the node at offset, resulting in sub-optimal performance using LIMIT: ``` while (ln && offset--) { if (reverse) { ln = ln->backward; } else { ln = ln->level[0].forward; } } ``` It could be slow when offset is very big. We can get the total rank of the offset location and use the skiplist to jump to it. It is an improvement from O(offset) to O(log rank). Below shows how this is implemented (if the offset is positive): Use the skiplist to search for the first element in the range, record its rank `rank_0`, so we can have the rank of the target node `rank_t`. Meanwhile we record the last node we visited which has zsl->level-1 levels and its rank `rank_1`. Then we start from the zsl->level-1 node, use the skiplist to go forward `rank_t-rank_1` nodes to reach the target node. It is very similar when the offset is reversed. Note that if `rank_t` is very close to `rank_0`, we just start from the first element in range and go node by node; this is for the case when the zsl->level-1 node is too far away and it is quicker to reach the target node by node. Here is a test using a randomly generated zset including 10000 elements (with different positive scores), doing a benchmark which compares how fast the `ZRANGE` command is executed before and after the optimization. The start score is set to 0 and the count is set to 1 to make sure that most of the time is spent on locating the offset. 
``` memtier_benchmark -h 127.0.0.1 -p 6379 --command="zrange test 0 +inf byscore limit <offset> 1" ``` \| offset \| QPS(unstable) \| QPS(optimized) \| \|--------\|--------\|--------\| \| 10 \| 73386.02 \| 74819.82 \| \| 1000 \| 48084.96 \| 73177.73 \| \| 2000 \| 31156.79 \| 72805.83 \| \| 5000 \| 10954.83 \| 71218.21 \| With the result above, we can see that the original code is greatly slowed down when offset gets bigger, and with the optimization the speed is almost not affected. Similar results are generated when testing reversed offset: ``` memtier_benchmark -h 127.0.0.1 -p 6379 --command="zrange test +inf 0 byscore rev limit <offset> 1" ``` \| offset \| QPS(unstable) \| QPS(optimized) \| \|--------\|--------\|--------\| \| 10 \| 74505.14 \| 71653.67 \| \| 1000 \| 46829.25 \| 72842.75 \| \| 2000 \| 28985.48 \| 73669.01 \| \| 5000 \| 11066.22 \| 73963.45 \| And the same conclusion is drawn from the tests of ZRANGE BYLEX.	2023-08-31 14:42:08 +03:00
Binbin	9ce8c54d74	Update sort_ro reply_schema to mention the null reply (#12534 ) Also added a test to cover this case, so this can cover the reply schemas check.	2023-08-31 06:36:35 +03:00
Roshan Khatri	7519960527	Allows modules to declare new ACL categories. (#12486 ) This PR adds a new Module API `int RM_AddACLCategory(RedisModuleCtx *ctx, const char *category_name)` to add a new ACL command category. Here, we initialize the ACLCommandCategories array by allocating space for 64 categories and duplicate the 21 default categories from the predefined array 'ACLDefaultCommandCategories' into the ACLCommandCategories array during ACL initialization. Valid ACL category names can only contain alphanumeric characters, underscores, and dashes. When called, the API checks for the onload flag, category name validity, and for a duplicate category name if present. If the conditions are satisfied, the API adds the new category to the trailing end of the ACLCommandCategories array and assigns the acl_categories flag bit according to the index at which the category is added. If any error is encountered, the errno is set accordingly by the API. --------- Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>	2023-08-30 13:01:24 -07:00
bodong.ybd	b59f53efb3	Fix sort_ro get-keys function return wrong key number (#12522 ) Before: ``` 127.0.0.1:6379> command getkeys sort_ro key (empty array) 127.0.0.1:6379> ``` After: ``` 127.0.0.1:6379> command getkeys sort_ro key 1) "key" 127.0.0.1:6379> ```	2023-08-30 22:00:02 +03:00
Chen Tianjie	e3d4b30d09	Add two stats to count client input and output buffer oom. (#12476 ) Add these INFO metrics: * client_query_buffer_limit_disconnections * client_output_buffer_limit_disconnections Sometimes it is useful to monitor whether clients reach the size limit of the query buffer and output buffer, to decide whether we need to adjust the buffer size limit or reduce the client query payload.	2023-08-30 21:51:14 +03:00
nihohit	4b281ce519	Align CONFIG RESETSTAT/REWRITE tips with SET. (#12530 ) Since the three commands have similar behavior (change config, return OK), the tips that govern how they should behave should be similar. Co-authored-by: Shachar Langbeheim <shachlan@amazon.com>	2023-08-30 21:49:02 +03:00
Binbin	e792653753	Add printing for LATENCY related tests (#12514 ) This test failed several times: ``` *** [err]: LATENCY GRAPH can output the event graph in tests/unit/latency-monitor.tcl Expected '478' to be more than or equal to '500' (context: type eval line 8 cmd {assert_morethan_equal $high 500} proc ::test) ``` Not sure why, adding some verbose printing that'll print the command result on the next time.	2023-08-27 11:42:55 +03:00
Danilo Bargen	a6eff389b5	redis.conf: Add data loss warning to "appendonly" (#12506 ) Warn against editing the config file and restarting the server, which will attempt to load an AOF file and disregard the RDB. Co-authored-by: Oran Agra <oran@redislabs.com>	2023-08-22 18:15:47 +03:00
Binbin	1407ac1f3e	BITCOUNT and BITPOS with non-existing key and illegal arguments should return error, not 0 (#11734 ) BITCOUNT and BITPOS with a non-existing key return 0 even if the arguments are invalid. Before this commit: ``` > flushall OK > bitcount s 0 (integer) 0 > bitpos s 0 0 1 hello (integer) 0 > set s 1 OK > bitcount s 0 (error) ERR syntax error > bitpos s 0 0 1 hello (error) ERR syntax error ``` The reason is that we checked for a non-existing key before parameter checking and returned early. This PR fixes it, and after this commit: ``` > flushall OK > bitcount s 0 (error) ERR syntax error > bitpos s 0 0 1 hello (error) ERR syntax error ``` BITPOS also gets the same fix as #12394: check for wrong arguments before checking for the key. ``` > lpush mylist a b c (integer) 3 > bitpos mylist 1 a b (error) WRONGTYPE Operation against a key holding the wrong kind of value ```	2023-08-21 19:48:30 +03:00
Wen Hui	45d3310694	BITCOUNT: check for argument, before checking for key (#12394 ) Generally, in any command we first check the arguments and then check whether the key exists. Some examples are ``` 127.0.0.1:6379> getrange no-key invalid1 invalid2 (error) ERR value is not an integer or out of range 127.0.0.1:6379> setbit no-key 1 invalid (error) ERR bit is not an integer or out of range 127.0.0.1:6379> xrange no-key invalid1 invalid2 (error) ERR Invalid stream ID specified as stream command argument ``` Before change ``` bitcount no-key invalid1 invalid2 0 ``` After change ``` bitcount no-key invalid1 invalid2 (error) ERR value is not an integer or out of range ```	2023-08-21 12:53:46 +03:00
Binbin	c98a28a848	Fix LREM count LONG_MIN overflow minor issue (#12465 ) Limit the range of LREM count to -LONG_MAX ~ LONG_MAX. Before the fix, passing -LONG_MAX would cause an overflow and would effectively be the same as passing 0. (Because this condition `toremove && removed == toremove `can never be satisfied). This is a minor fix as it shouldn't really affect users, more like a cleanup.	2023-08-21 12:50:41 +03:00
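The range limit in the LREM commit above guards against negating a value whose absolute value is not representable as a `long`. A minimal sketch of such a check follows; it is not the Redis code, and `count_in_range` and `to_remove` are hypothetical names.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical validator: accept only counts whose negation is representable.
 * LONG_MIN is rejected because -LONG_MIN overflows a long. */
static int count_in_range(long count) {
    return count >= -LONG_MAX;
}

/* Number of elements to remove for a validated count.
 * A result of 0 stands for "remove all matches" in this sketch. */
static long to_remove(long count) {
    assert(count_in_range(count));
    return count < 0 ? -count : count;
}
```

Validating first keeps the negation well defined for every accepted input.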
Yves LeBras	16988208bd	config.memkeys init for consistency (#12505 ) Initializing `memkeys` to 0 for consistency and clarity. the config struct is anyway zeroed, but other fields are explicitly initialized.	2023-08-21 08:17:07 +03:00
Wen Hui	e532c95dfc	Added tests for Client commands (#10276 ) Our test suite was missing coverage for some client sub-commands. The goal of this PR is to add test coverage for the following commands: Client caching Client kill Client no-evict Client pause Client reply Client tracking Client setname At the very least, this is useful to make sure there are no leaks and crashes in these code paths.	2023-08-20 19:17:51 +03:00
meiravgri	fe47c2027b	Signal handler attributes (#12426 ) The purpose of this PR is to make the crash report process thread-safe. Main changes include: 1. `setupSigSegvHandler()` is introduced to initialize the signal handler. This function first initializes the signal handler mutex (if not initialized yet) and then registers the process to the signal handler. 2. sigsegvHandler flags : SA_NODEFER - don't add the signal to the process signal mask. We use this flag because we want to be able to handle a second call to the signal manually. removed SA_RESETHAND: this flag resets the signal handler function upon the first entrance to the registered function. The reason to use this flag is to protect from recursively entering the signal handler by the same thread. But, it also means that if a second thread crashes while handling a signal, the process will be terminated immediately and we won't get the crash report. In this PR we discard this flag. The purpose of the signal handler guard described below is to solve the above issues. 3. Add a signal handler lock with ERRORCHECK attributes. The lock's purpose is to ensure that only one thread generates a crash report. Once a second thread enters the signal handler it will be blocked. We use the ERRORCHECK lock in order to protect from a possible deadlock in case the thread handling the crash gets a signal. In the latter scenario, we log what we have collected until the handler crashed. At the end of the crash report we reset the signal handler to SIG_DFL, with no flags, and rethrow the signal to generate a core dump (if enabled) and exit the process. During the work on this PR we wanted to understand the historical reasons for how a crash is handled. With respect to the choice of the flag, we believe SA_RESETHAND was not added for any specific purpose. SA_ONSTACK, which is removed here from bugReportEnd(), was originally also set in the initial registration to the signal handler, but removed in `3ada43e73`. 
In addition, it was removed from another location in `deee2c1ef` with the following description, which is also relevant to why it should be removed from bugReportEnd: > it seems to be some valgrind bug with SA_ONSTACK. > SA_ONSTACK seems unneeded since WD is not recursive (SA_NODEFER was removed), > also, not sure if it's even valid without a call to sigaltstack()	2023-08-20 19:16:45 +03:00
Binbin	44cc0fcb9d	redis-cli --stat take dbnum value from CONFIG GET to output total keys (#12279 ) In the past we hardcoded it to 20, causing it to not count keys for more databases.	2023-08-16 10:54:37 +03:00
Tyler Bream (Event pipeline)	ac6bc5d1a8	redis-cli: Fix print of keys per cluster host when over int max (#11698 ) When running cluster info, and the number of keys overflows the integer value, the summary no longer makes sense. This fixes it by using an appropriate type to handle values over the max int value.	2023-08-16 10:48:49 +03:00
WangYu	17904780ae	skip the rehashed entries in dictNext (#12386 ) If the dict is rehashing, the entries at the head of table[0] are moved to table[1] and all entries in `table[0][0:rehashidx]` are NULL. `dictNext` starts looking for a non-NULL entry from table 0 index 0, and the first call of `dictNext` on a rehashing dict will iterate many times to skip those NULL entries. We can easily skip those entries by setting `iter->index` as `iter->d->rehashidx` when the dict is rehashing and it's the first call of dictNext (`iter->index == -1 && iter->table == 0`). Co-authored-by: sundb <sundbcn@gmail.com>	2023-08-16 10:45:26 +03:00
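The idea in the dictNext commit above can be sketched on a toy bucket array. This is a simplified stand-in, not the Redis `dict`; `toy_table`, `toy_iter`, and `toy_next` are hypothetical names, and the `steps` counter exists only to show how many buckets were inspected.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for a rehashing table (not the real Redis dict):
 * buckets before rehashidx have already been moved and are NULL. */
typedef struct {
    const char **buckets;
    long size;
    long rehashidx; /* -1 when no rehash is in progress */
} toy_table;

typedef struct {
    toy_table *t;
    long index; /* -1 before the first call */
    long steps; /* buckets inspected, for illustration only */
} toy_iter;

/* Returns the next non-NULL bucket, or NULL when the table is exhausted. */
static const char *toy_next(toy_iter *it) {
    assert(it != NULL && it->t != NULL);
    /* First call on a rehashing table: jump over the emptied prefix. */
    if (it->index == -1 && it->t->rehashidx > 0)
        it->index = it->t->rehashidx - 1;
    while (++it->index < it->t->size) {
        it->steps++;
        if (it->t->buckets[it->index] != NULL)
            return it->t->buckets[it->index];
    }
    return NULL;
}
```

With `rehashidx` set to 3, the first call inspects a single bucket instead of walking the three emptied ones first.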
Wen Hui	965dc90b72	change return type to be consistent (#12479 ) Currently the return type of the rdbSaveMillisecondTime and rdbSaveDoubleValue APIs is int, but they return the value directly from the rdbWriteRaw function, which has a return type of ssize_t. As this may overflow an int, the return type is changed to ssize_t.	2023-08-16 10:38:59 +03:00
Oran Agra	2b8cde71bb	Update supported version list. (#12488 ) Add 7.2, drop 6.0 as per https://redis.io/docs/about/releases/ Also replace a few occurrences of the `’` char with the standard `'`	2023-08-16 08:36:40 +03:00
Binbin	f4549d1cf4	Fix CLUSTER REPLICAS time complexity, should be O(N) (#12477 ) We iterate over all replicas to get the result, the time complexity should be O(N), like CLUSTER NODES complexity is O(N).	2023-08-14 20:57:55 -07:00
Madelyn Olson	7c179f9bf4	Fixed a bug where sequential matching ACL rules weren't compressed (#12472 ) When a new ACL rule was added, an attempt was made to remove any "overlapping" rules. However, when a match was found, the search was not resumed at the right location, but instead after the original position of the original command. For example, if the current rules were `-config +config\|get` and a rule `+config` was added, it would identify that `-config` was matched, but it would skip over `+config\|get`, leaving the compacted rule `-config +config`. This would be evaluated safely, but looks weird. This bug can only be triggered with subcommands, since that is the only way to have sequential matching rules. Resolves #12470. This is also only present in 7.2. I think there was also a minor risk of removing another valid rule, since it would start the search of the next command at an arbitrary point. I couldn't find a valid offset that would have caused a match using any of the existing commands that have subcommands with another command.	2023-08-10 09:58:53 +03:00
Binbin	6abfda54c3	Fix flaky SENTINEL RESET test (#12437 ) After SENTINEL RESET, sometimes the sentinel can sense the master again, causing the test to fail. Here we give it a few more chances.	2023-08-10 08:58:52 +03:00
zhaozhao.zz	1b6bdff48d	optimize the check of kill pubsub clients after modifying ACL rules (#12457 ) if there are no subscribers, we can ignore the operation	2023-08-05 10:00:54 +03:00
zhaozhao.zz	8226f39fb2	do not call handleClientsBlockedOnKeys inside yielding command (#12459 ) Fix the assertion when a busy script (timeout) signals ready keys (like LPUSH), and then an arbitrary client's `allow-busy` command steps into `handleClientsBlockedOnKeys` and tries to wake up clients blocked on keys (like BLPOP). Reproduction process: 1. start a redis with aof `./redis-server --appendonly yes` 2. exec blpop `127.0.0.1:6379> blpop a 0` 3. use another client to call a busy script that pushes to the blocked key `127.0.0.1:6379> eval "redis.call('lpush','a','b') while(1) do end" 0` 4. use a new client to call an allow-busy command like auth `127.0.0.1:6379> auth a` BTW, this issue also breaks the atomicity of the script. This bug has been around for many years; the old versions only have the atomicity problem, and only 7.0/7.2 have the assertion problem. Co-authored-by: Oran Agra <oran@redislabs.com>	2023-08-05 09:52:03 +03:00
sundb	da9c2804a5	Avoid mostly harmless integer overflow in cjson (#12456 ) This PR mainly fixes a possible integer overflow in `json_append_string()`. When we use `cjson.encode()` to encode a string larger than 2GB, with specific compilation flags, an integer overflow may occur leading to truncation, resulting in the part of the string beyond 2GB not being encoded. On the other hand, this overflow doesn't cause any out-of-range read or write or segmentation fault. 1) using -O0 for lua_cjson (`make LUA_DEBUG=yes`) In this case, `i` will overflow and lead to truncation. When `i` reaches `INT_MAX+1` and overflows to INT_MIN, when compared to len, `i` (1000000..00) is expanded to a 64-bit signed integer (1111111.....000000). At this point i will be greater than len and jump out of the loop, so `for (i = 0; i < len; i++)` will loop up to 2^31 times, and the part larger than 2GB will be truncated. ```asm `i` => -0x24(%rbp) <+253>: addl $0x1,-0x24(%rbp) ; overflow if i is larger than 2^31 <+257>: mov -0x24(%rbp),%eax <+260>: movslq %eax,%rdx ; move a 32-bit value with sign extension into a 64-bit signed <+263>: mov -0x20(%rbp),%rax <+267>: cmp %rax,%rdx ; check `i < len` <+270>: jb 0x212600 <json_append_string+148> ``` 2) using -O2/-O3 for lua_cjson (`make LUA_DEBUG=no`, the default) In this case, because signed integer overflow is undefined behavior, `i` will not overflow. `i` will be optimized by the compiler and use 64-bit registers for all subsequent instructions. ```asm <+180>: add $0x1,%rbx ; Using 64-bit register `rbx` for i++ <+184>: lea 0x1(%rdx),%rsi <+188>: mov %rsi,0x10(%rbp) <+192>: mov %al,(%rcx,%rdx,1) <+195>: cmp %rbx,(%rsp) ; check `i < len` <+199>: ja 0x20b63a <json_append_string+154> ``` 3) using 32bit Because `strbuf_ensure_empty_length()` preallocates memory of length (len * 6 + 2), in 32-bit `cjson.encode()` can only handle strings smaller than ((2 ^ 32) - 3 ) / 6. So 32bit is not affected. Also change `i` in `strbuf_append_string()` to `size_t`. 
Since its second argument `str` is taken from the `char2escape` string array which is never larger than 6, so `strbuf_append_string()` is not at risk of overflow (the bug was unreachable).	2023-08-05 07:57:06 +03:00
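The fix described in the cjson commit above amounts to giving the loop index the same width as the length it is compared against. The following sketch illustrates that pattern only; `escaped_len` is a hypothetical function, not part of lua_cjson, and it escapes just `"` and `\` for brevity.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical escaper (not the lua_cjson code): counts the output bytes
 * needed when '"' and '\\' are each escaped with a leading backslash. */
static size_t escaped_len(const char *s, size_t len) {
    assert(s != NULL || len == 0);
    size_t out = 0;
    /* The index is a size_t, the same width as len, so it cannot wrap
     * before reaching len the way a 32-bit int index could. */
    for (size_t i = 0; i < len; i++)
        out += (s[i] == '"' || s[i] == '\\') ? 2 : 1;
    return out;
}
```

Because the index type matches the length type, the loop condition stays correct for any input length the platform can address.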
Binbin	7af9f4b36e	Fix GEOHASH / GEODIST / GEOPOS time complexity, should be O(1) (#12445 ) GEOHASH / GEODIST / GEOPOS use zsetScore to get the score, in skiplist encoding, we use dictFind to get the score, which is O(1), same as ZSCORE command. It is not clear why these commands had O(Log(N)), and O(N) until now.	2023-08-05 07:29:24 +03:00
Meir Shpilraien (Spielrein)	2ee1bbb53b	Ensure that the function load timeout is disabled during loading from RDB/AOF and on replicas. (#12451 ) When loading a function from either RDB/AOF or a replica, it is essential not to fail on timeout errors. The loading time may vary due to various factors, such as hardware specifications or the system's workload during the loading process. Once a function has been successfully loaded, it should be allowed to load from persistence or on replicas without encountering a timeout failure. To maintain a clear separation between the engine and Redis internals, the implementation refrains from directly checking the state of Redis within the engine itself. Instead, the engine receives the desired timeout as part of the library creation and duly respects this timeout value. If Redis wishes to disable any timeout, it can simply send a value of 0.	2023-08-02 11:43:31 +03:00
zhaozhao.zz	90ab91f00b	fix false success and a memory leak for ACL selector with bad parenthesis combination (#12452 ) When doing merge selector, we should check whether the merge has started (i.e., whether open_bracket_start is -1) every time. Otherwise, encountering an illegal selector pattern could succeed and also cause memory leaks, for example: ``` acl setuser test1 (+PING (+SELECT (+DEL ) ``` The above would leak memory and succeed with only DEL being applied, and would now error after the fix. Co-authored-by: Oran Agra <oran@redislabs.com>	2023-08-02 10:46:06 +03:00
Diego Lopez Recas	b653c759cd	Fix race condition in tests/unit/auth.tcl (#12444 ) Changing the masterauth while turning into a replica is racy. Turn into replica after changing the masterauth instead.	2023-08-01 18:03:33 +03:00
DarrenJiang13	6abb3c4038	change log match to line match in tcl sanitizer_errors_from_file. (#12446 ) In the tcl foreach loop, the function should compare line rather than the whole file.	2023-07-30 08:48:29 +03:00
Harkrishn Patro	42985b00ea	Test coverage for incr/decr operation on robj encoding type optimization (#12435 ) Additional test coverage for incr/decr operations. An integer could be present in raw encoding format due to an operation like append. An incr/decr operation following it optimizes the string to int encoding format.	2023-07-25 16:43:31 -07:00
zhaozhao.zz	01eb939a06	update monitor client's memory and evict correctly (#12420 ) A bug introduced in #11657 (7.2 RC1) causes client-eviction (#8687) and INFO to have inaccurate memory usage metrics of MONITOR clients. Because the type in `c->type` and the type in `getClientType()` are confusing (in the latter, `CLIENT_TYPE_NORMAL` not `CLIENT_TYPE_SLAVE`), the comment we wrote in `updateClientMemUsageAndBucket` was wrong, and in fact that function didn't skip monitor clients. And since it doesn't skip monitor clients, it was wrong to delete the call for it from `replicationFeedMonitors` (it wasn't a NOP). That deletion could mean that the monitor client memory usage is not always up to date (updated less frequently, but still a candidate for client eviction).	2023-07-25 16:10:38 +03:00
nihohit	9f512017aa	Update request/response policies. (#12417 ) Changing the response and request policy of a few commands, see https://redis.io/docs/reference/command-tips 1. RANDOMKEY used to have no response policy, which means that when sent to multiple shards, the responses should be aggregated. This normally applies to commands that return arrays, but since RANDOMKEY replies with a simple string, it actually requires a SPECIAL response policy (for the client to select just one). 2. SCAN used to have no response policy, but although the key names part of the response can be aggregated, the cursor part certainly can't. 3. MSETNX had a request policy of MULTI_SHARD and response policy of AGG_MIN, but in fact the contract with MSETNX is that when one key exists, it returns 0 and doesn't set any key. Routing it to multiple shards would mean that if one failed and another succeeded, its atomicity is broken and it's impossible to return a valid response to the caller. Co-authored-by: Shachar Langbeheim <shachlan@amazon.com> Co-authored-by: Oran Agra <oran@redislabs.com>	2023-07-25 10:21:23 +03:00
Harkrishn Patro	34b95f752c	Add test case for APPEND command usage on integer value (#12429 ) Add test coverage to validate the object encoding update on APPEND command usage on an integer value	2023-07-24 18:25:50 -07:00


12055 Commits