redict

mirror of https://codeberg.org/redict/redict.git synced 2025-01-22 16:18:28 -05:00

Author	SHA1	Message	Date
Binbin	2dd5c3a180	Minor fix to print, set to str (#11934 ) * Minor fix to print, set to str `{commands_filename}` the extra {} actually make it become a Set, and the output print was like this: ``` Processing json files... Linking container command to subcommands... Checking all commands... Generating {'commands'}.c... All done, exiting. ``` Introduced in #11920 * more fix	2023-03-17 14:20:54 +02:00
Meir Shpilraien (Spielrein)	d0da0a6a3f	Support for RM_Call on blocking commands (#11568 ) Allow running blocking commands from within a module using `RM_Call`. Today, when `RM_Call` is used, the fake client that is used to run command is marked with `CLIENT_DENY_BLOCKING` flag. This flag tells the command that it is not allowed to block the client and in case it needs to block, it must fallback to some alternative (either return error or perform some default behavior). For example, `BLPOP` fallback to simple `LPOP` if it is not allowed to block. All the commands must respect the `CLIENT_DENY_BLOCKING` flag (including module commands). When the command invocation finished, Redis asserts that the client was not blocked. This PR introduces the ability to call blocking command using `RM_Call` by passing a callback that will be called when the client will get unblocked. In order to do that, the user must explicitly say that he allow to perform blocking command by passing a new format specifier argument, `K`, to the `RM_Call` function. This new flag will tell Redis that it is allow to run blocking command and block the client. In case the command got blocked, Redis will return a new type of call reply (`REDISMODULE_REPLY_PROMISE`). This call reply indicates that the command got blocked and the user can set the on_unblocked handler using `RM_CallReplyPromiseSetUnblockHandler`. When clients gets unblocked, it eventually reaches `processUnblockedClients` function. This is where we check if the client is a fake module client and if it is, we call the unblock callback instead of performing the usual unblock operations. Notice: `RM_CallReplyPromiseSetUnblockHandler` must be called atomically along side the command invocation (without releasing the Redis lock in between). In addition, unlike other CallReply types, the promise call reply must be released by the module when the Redis GIL is acquired. The module can abort the execution on the blocking command (if it was not yet executed) using `RM_CallReplyPromiseAbort`. the API will return `REDISMODULE_OK` on success and `REDISMODULE_ERR` if the operation is already executed. Notice that in case of misbehave module, Abort might finished successfully but the operation will not really be aborted. This can only happened if the module do not respect the disconnect callback of the blocked client. For pure Redis commands this can not happened. ### Atomicity Guarantees The API promise that the unblock handler will run atomically as an execution unit. This means that all the operation performed on the unblock handler will be wrapped with a multi exec transaction when replicated to the replica and AOF. The API do not grantee any other atomicity properties such as when the unblock handler will be called. This gives us the flexibility to strengthen the grantees (or not) in the future if we will decide that we need a better guarantees. That said, the implementation does provide a better guarantees when performing pure Redis blocking command like `BLPOP`. In this case the unblock handler will run atomically with the operation that got unblocked (for example, in case of `BLPOP`, the unblock handler will run atomically with the `LPOP` operation that run when the command got unblocked). This is an implementation detail that might be change in the future and the module writer should not count on that. ### Calling blocking commands while running on script mode (`S`) `RM_Call` script mode (`S`) was introduced on #0372. It is used for usecases where the command that was invoked on `RM_Call` comes from a user input and we want to make sure the user will not run dangerous commands like `shutdown`. Some command, such as `BLPOP`, are marked with `NO_SCRIPT` flag, which means they will not be allowed on script mode. Those commands are marked with `NO_SCRIPT` just because they are blocking commands and not because they are dangerous. Now that we can run blocking commands on RM_Call, there is no real reason not to allow such commands on script mode. The underline problem is that the `NO_SCRIPT` flag is abused to also mark some of the blocking commands (notice that those commands know not to block the client if it is not allowed to do so, and have a fallback logic to such cases. So even if those commands were not marked with `NO_SCRIPT` flag, it would not harm Redis, and today we can already run those commands within multi exec). In addition, not all blocking commands are marked with `NO_SCRIPT` flag, for example `blmpop` are not marked and can run from within a script. Those facts shows that there are some ambiguity about the meaning of the `NO_SCRIPT` flag, and its not fully clear where it should be use. The PR suggest that blocking commands should not be marked with `NO_SCRIPT` flag, those commands should handle `CLIENT_DENY_BLOCKING` flag and only block when it's safe (like they already does today). To achieve that, the PR removes the `NO_SCRIPT` flag from the following commands: * `blmove` * `blpop` * `brpop` * `brpoplpush` * `bzpopmax` * `bzpopmin` * `wait` This might be considered a breaking change as now, on scripts, instead of getting `command is not allowed from script` error, the user will get some fallback behavior base on the command implementation. That said, the change matches the behavior of scripts and multi exec with respect to those commands and allow running them on `RM_Call` even when script mode is used. ### Additional RedisModule API and changes * `RM_BlockClientSetPrivateData` - Set private data on the blocked client without the need to unblock the client. This allows up to set the promise CallReply as the private data of the blocked client and abort it if the client gets disconnected. * `RM_BlockClientGetPrivateData` - Return the current private data set on a blocked client. We need it so we will have access to this private data on the disconnect callback. * On RM_Call, the returned reply will be added to the auto memory context only if auto memory is enabled, this allows us to keep the call reply for longer time then the context lifetime and does not force an unneeded borrow relationship between the CallReply and the RedisModuleContext.	2023-03-16 14:04:31 +02:00
Binbin	484b73a842	Fix usleep compilation warning in auth.c (#11925 ) There is a -Wimplicit-function-declaration warning in here: ``` auth.c: In function ‘AuthBlock_ThreadMain’: auth.c:116:5: warning: implicit declaration of function ‘usleep’; did you mean ‘sleep’? [-Wimplicit-function-declaration] 116 \| usleep(500000); \| ^~~~~~ \| sleep ```	2023-03-16 11:24:52 +02:00
Binbin	0b159b34ea	Bump codespell to 2.2.4, fix typos and outupdated comments (#11911 ) Fix some seen typos and wrong comments.	2023-03-16 08:50:32 +02:00
KarthikSubbarao	f8a5a4f70c	Custom authentication for Modules (#11659 ) This change adds new module callbacks that can override the default password based authentication associated with ACLs. With this, Modules can register auth callbacks through which they can implement their own Authentication logic. When `AUTH` and `HELLO AUTH ...` commands are used, Module based authentication is attempted and then normal password based authentication is attempted if needed. The new Module APIs added in this PR are - `RM_RegisterCustomAuthCallback` and `RM_BlockClientOnAuth` and `RedisModule_ACLAddLogEntryByUserName `. Module based authentication will be attempted for all Redis users (created through the ACL SETUSER cmd or through Module APIs) even if the Redis user does not exist at the time of the command. This gives a chance for the Module to create the RedisModule user and then authenticate via the RedisModule API - from the custom auth callback. For the AUTH command, we will support both variations - `AUTH <username> <password>` and `AUTH <password>`. In case of the `AUTH <password>` variation, the custom auth callbacks are triggered with “default” as the username and password as what is provided. ### RedisModule_RegisterCustomAuthCallback ``` void RM_RegisterCustomAuthCallback(RedisModuleCtx ctx, RedisModuleCustomAuthCallback cb) { ``` This API registers a callback to execute to prior to normal password based authentication. Multiple callbacks can be registered across different modules. These callbacks are responsible for either handling the authentication, each authenticating the user or explicitly denying, or deferring it to other authentication mechanisms. Callbacks are triggered in the order they were registered. When a Module is unloaded, all the auth callbacks registered by it are unregistered. The callbacks are attempted, in the order of most recently registered callbacks, when the AUTH/HELLO (with AUTH field is provided) commands are called. The callbacks will be called with a module context along with a username and a password, and are expected to take one of the following actions: (1) Authenticate - Use the RM_Authenticate API successfully and return `REDISMODULE_AUTH_HANDLED`. This will immediately end the auth chain as successful and add the OK reply. (2) Block a client on authentication - Use the `RM_BlockClientOnAuth` API and return `REDISMODULE_AUTH_HANDLED`. Here, the client will be blocked until the `RM_UnblockClient `API is used which will trigger the auth reply callback (provided earlier through the `RM_BlockClientOnAuth`). In this reply callback, the Module should authenticate, deny or skip handling authentication. (3) Deny Authentication - Return `REDISMODULE_AUTH_HANDLED` without authenticating or blocking the client. Optionally, `err` can be set to a custom error message. This will immediately end the auth chain as unsuccessful and add the ERR reply. (4) Skip handling Authentication - Return `REDISMODULE_AUTH_NOT_HANDLED` without blocking the client. This will allow the engine to attempt the next custom auth callback. If none of the callbacks authenticate or deny auth, then password based auth is attempted and will authenticate or add failure logs and reply to the clients accordingly. ### RedisModule_BlockClientOnAuth ``` RedisModuleBlockedClient RM_BlockClientOnAuth(RedisModuleCtx ctx, RedisModuleCustomAuthCallback reply_callback, void (free_privdata)(RedisModuleCtx,void)) ``` This API can only be used from a Module from the custom auth callback. If a client is not in the middle of custom module based authentication, ERROR is returned. Otherwise, the client is blocked and the `RedisModule_BlockedClient` is returned similar to the `RedisModule_BlockClient` API. ### RedisModule_ACLAddLogEntryByUserName ``` int RM_ACLAddLogEntryByUserName(RedisModuleCtx ctx, RedisModuleString username, RedisModuleString object, RedisModuleACLLogEntryReason reason) ``` Adds a new entry in the ACL log with the `username` RedisModuleString provided. This simplifies the Module usage because now, developers do not need to create a Module User just to add an error ACL Log entry. Aside from accepting username (RedisModuleString) instead of a RedisModuleUser, it is the same as the existing `RedisModule_ACLAddLogEntry` API. ### Breaking changes - HELLO command - Clients can now only set the client name and RESP protocol from the `HELLO` command if they are authenticated. Also, we now finish command arg validation first and return early with a ERR reply if any arg is invalid. This is to avoid mutating the client name / RESP from a command that would have failed on invalid arguments. ### Notable behaviors - Module unblocking - Now, we will not allow Modules to block the client from inside the context of a reply callback (triggered from the Module unblock flow `moduleHandleBlockedClients`). --------- Co-authored-by: Madelyn Olson <34459052+madolson@users.noreply.github.com>	2023-03-15 15:18:42 -07:00
Binbin	58285a6e92	Fix WAITAOF mix-use last_offset and last_numreplicas (#11922 ) There be a situation that satisfies WAIT, and then wrongly unblock WAITAOF because we mix-use last_offset and last_numreplicas. We update last_offset and last_numreplicas only when the condition matches. i.e. output of either replicationCountAOFAcksByOffset or replicationCountAcksByOffset is right. In this case, we need to have separate last_ variables for each of them. Added a last_aof_offset and last_aof_numreplicas for WAITAOF. WAITAOF was added in #11713. Found while coding #11917. A Test was added to validate that case.	2023-03-15 18:16:16 +02:00
Ozan Tezcan	72f5aad014	Use older string format to support earlier python versions (#11920 ) Redis build runs `utils/generate-command-code.py` if there is a change in `src/commands/*.json` files. In https://github.com/redis/redis/pull/10273, we used f-string format in this script. f-string feature was introduced in python3.6. If a system has an earlier python version, build might fail. Added some changes to make that script compatible with earlier python versions.	2023-03-15 13:55:15 +02:00
Binbin	70b2c4f5fd	Fix WAITAOF reply when using last_offset and last_numreplicas (#11917 ) WAITAOF wad added in #11713, its return is an array. But forget to handle WAITAOF in last_offset and last_numreplicas, causing WAITAOF to return a WAIT like reply. Tests was added to validate that case (both WAIT and WAITAOF). This PR also refactored processClientsWaitingReplicas a bit for better maintainability and readability.	2023-03-15 11:07:04 +02:00
Kaige Ye	5360350e4a	cleanup NBSP characters in comments (#10555 ) Replace NBSP character (0xC2 0xA0) with space (0x20). Looks like that was originally added due to misconfigured editor which seems to have been fixed by now.	2023-03-15 11:05:42 +02:00
Slava Koyfman	9344f654c6	Implementing the WAITAOF command (issue #10505 ) (#11713 ) Implementing the WAITAOF functionality which would allow the user to block until a specified number of Redises have fsynced all previous write commands to the AOF. Syntax: `WAITAOF <num_local> <num_replicas> <timeout>` Response: Array containing two elements: num_local, num_replicas num_local is always either 0 or 1 representing the local AOF on the master. num_replicas is the number of replicas that acknowledged the a replication offset of the last write being fsynced to the AOF. Returns an error when called on replicas, or when called with non-zero num_local on a master with AOF disabled, in all other cases the response just contains number of fsync copies. Main changes: * Added code to keep track of replication offsets that are confirmed to have been fsynced to disk. * Keep advancing master_repl_offset even when replication is disabled (and there's no replication backlog, only if there's an AOF enabled). This way we can use this command and it's mechanisms even when replication is disabled. * Extend REPLCONF ACK to `REPLCONF ACK <ofs> FACK <ofs>`, the FACK will be appended only if there's an AOF on the replica, and already ignored on old masters (thus backwards compatible) * WAIT now no longer wait for the replication offset after your last command, but rather the replication offset after your last write (or read command that caused propagation, e.g. lazy expiry). Unrelated changes: * WAIT command respects CLIENT_DENY_BLOCKING (not just CLIENT_MULTI) Implementation details: * Add an atomic var named `fsynced_reploff_pending` that's updated (usually by the bio thread) and later copied to the main `fsynced_reploff` variable (only if the AOF base file exists). I.e. during the initial AOF rewrite it will not be used as the fsynced offset since the AOF base is still missing. * Replace close+fsync bio job with new BIO_CLOSE_AOF (AOF specific) job that will also update fsync offset the field. * Handle all AOF jobs (BIO_CLOSE_AOF, BIO_AOF_FSYNC) in the same bio worker thread, to impose ordering on their execution. This solves a race condition where a job could set `fsynced_reploff_pending` to a higher value than another pending fsync job, resulting in indicating an offset for which parts of the data have not yet actually been fsynced. Imposing an ordering on the jobs guarantees that fsync jobs are executed in increasing order of replication offset. * Drain bio jobs when switching `appendfsync` to "always" This should prevent a write race between updates to `fsynced_reploff_pending` in the main thread (`flushAppendOnlyFile` when set to ALWAYS fsync), and those done in the bio thread. * Drain the pending fsync when starting over a new AOF to avoid race conditions with the previous AOF offsets overriding the new one (e.g. after switching to replicate from a new master). * Make sure to update the fsynced offset at the end of the initial AOF rewrite. a must in case there are no additional writes that trigger a periodic fsync, specifically for a replica that does a full sync. Limitations: It is possible to write a module and a Lua script that propagate to the AOF and doesn't propagate to the replication stream. see REDISMODULE_ARGV_NO_REPLICAS and luaRedisSetReplCommand. These features are incompatible with the WAITAOF command, and can result in two bad cases. The scenario is that the user executes command that only propagates to AOF, and then immediately issues a WAITAOF, and there's no further writes on the replication stream after that. 1. if the the last thing that happened on the replication stream is a PING (which increased the replication offset but won't trigger an fsync on the replica), then the client would hang forever (will wait for an fack that the replica will never send sine it doesn't trigger any fsyncs). 2. if the last thing that happened is a write command that got propagated properly, then WAITAOF will be released immediately, without waiting for an fsync (since the offset didn't change) Refactoring: * Plumbing to allow bio worker to handle multiple job types This introduces infrastructure necessary to allow BIO workers to not have a 1-1 mapping of worker to job-type. This allows in the future to assign multiple job types to a single worker, either as a performance/resource optimization, or as a way of enforcing ordering between specific classes of jobs. Co-authored-by: Oran Agra <oran@redislabs.com>	2023-03-14 20:26:21 +02:00
Binbin	7997874f4d	Fix tail->repl_offset update in feedReplicationBuffer (#11905 ) In #11666, we added a while loop and will split a big reply node to multiple nodes. The update of tail->repl_offset may be wrong. Like before #11666, we would have created at most one new reply node, and now we will create multiple nodes if it is a big reply node. Now we are creating more than one node, and the tail->repl_offset of all the nodes except the last one are incorrect. Because we update master_repl_offset at the beginning, and then use it to update the tail->repl_offset. This would have lead to an assertion during PSYNC, a test was added to validate that case. Besides that, the calculation of size was adjusted to fix tests that failed due to a combination of a very low backlog size, and some thresholds of that get violated because of the relatively high overhead of replBufBlock. So now if the backlog size / 16 is too small, we'll take PROTO_REPLY_CHUNK_BYTES instead. Co-authored-by: Oran Agra <oran@redislabs.com>	2023-03-13 16:12:29 +02:00
xbasel	7be7834e65	Large blocks of replica client output buffer could lead to psync loops and unnecessary memory usage (#11666 ) This can happen when a key almost equal or larger than the client output buffer limit of the replica is written. Example: 1. DB is empty 2. Backlog size is 1 MB 3. Client out put buffer limit is 2 MB 4. Client writes a 3 MB key 5. The shared replication buffer will have a single node which contains the key written above, and it exceeds the backlog size. At this point the client output buffer usage calculation will report the replica buffer to be 3 MB (or more) even after sending all the data to the replica. The primary drops the replica connection for exceeding the limits, the replica reconnects and successfully executes partial sync but the primary will drop the connection again because the buffer usage is still 3 MB. This happens over and over. To mitigate the problem, this fix limits the maximum size of a single backlog node to be (repl_backlog_size/16). This way a single node can't exceed the limits of the COB (the COB has to be larger than the backlog). It also means that if the backlog has some excessive data it can't trim, it would be at most about 6% overuse. other notes: 1. a loop was added in feedReplicationBuffer which caused a massive LOC change due to indentation, the actual changes are just the `min(max` and the loop. 3. an unrelated change in an existing test to speed up a server termination which took 10 seconds. Co-authored-by: Oran Agra <oran@redislabs.com>	2023-03-12 19:47:06 +02:00
Binbin	08cd3bf292	redis-cli reads specified number of replies for UNSUBSCRIBE/PUNSUBSCRIBE/SUNSUBSCRIBE (#11047 ) In unsubscribe related commands, we need to read the specified number of replies according to the number of parameters. These commands may return multiple RESP replies, and currently redis-cli only tries to read only one reply. Fixes #11046, this redis-cli bug seems to be there forever. Note that the [UN]SUBSCRIBE command response is a bit awkward see: https://github.com/redis/redis-doc/pull/2327	2023-03-12 18:08:03 +02:00
Binbin	416842e6c0	Fix the bug that CLIENT REPLY OFF\|SKIP cannot receive push notifications (#11875 ) This bug seems to be there forever, CLIENT REPLY OFF\|SKIP will mark the client with CLIENT_REPLY_OFF or CLIENT_REPLY_SKIP flags. With these flags, prepareClientToWrite called by addReply* will return C_ERR directly. So the client can't receive the Pub/Sub messages and any other push notifications, e.g client side tracking. In this PR, we adding a CLIENT_PUSHING flag, disables the reply silencing flags. When adding push replies, set the flag, after the reply, clear the flag. Then add the flag check in prepareClientToWrite. Fixes #11874 Note, the SUBSCRIBE command response is a bit awkward, see https://github.com/redis/redis-doc/pull/2327 Co-authored-by: Oran Agra <oran@redislabs.com>	2023-03-12 17:50:44 +02:00
Binbin	4e7eb16ae7	Fix race in sentinel manual failover test (#11900 ) In #9408, we added some SENTINEL DEBUG to reduce default timeouts and allow tests to execute faster. The change in 05-manual.tcl may cause a race that SENTINEL FAILOVER response with a NOGOODSLAVE: ``` Manual failover works: FAILED: Expected NOGOODSLAVE No suitable replica to promote eq "OK" (context: type eval line 6 cmd {assert {$reply eq "OK"}} proc ::test) (Jumping to next unit after error) FAILED: caught an error in the test assertion:Expected NOGOODSLAVE No suitable replica to promote eq "OK" (context: type eval line 6 cmd {assert {$reply eq "OK"}} proc ::test) ``` The reason is that the info-period value was reduced in #9408 (the default value is 10000), and then manual failover was performed immediately, but the INFO may not exchanged between the sentinel and replicas, causing the sentinel to skip all the replicas in sentinelSelectSlave (Because replica's info_refresh is not updated, see the code snippet below), then return a NOGOODSLAVE, break the test. Code snippet from sentinelSelectSlave: ``` while((de = dictNext(di)) != NULL) { sentinelRedisInstance slave = dictGetVal(de); mstime_t info_validity_time; if (master->flags & SRI_S_DOWN) info_validity_time = sentinel_ping_period5; else info_validity_time = sentinel_info_period*3; if (mstime() - slave->info_refresh > info_validity_time) continue; } ``` By adding a wait_for_condition, we have the opportunity to let sentinel update the info_period of the replicas.	2023-03-12 13:25:10 +02:00
guybe7	4ba47d2d21	Add reply_schema to command json files (internal for now) (#10273 ) Work in progress towards implementing a reply schema as part of COMMAND DOCS, see #9845 Since ironing the details of the reply schema of each and every command can take a long time, we would like to merge this PR when the infrastructure is ready, and let this mature in the unstable branch. Meanwhile the changes of this PR are internal, they are part of the repo, but do not affect the produced build. ### Background In #9656 we add a lot of information about Redis commands, but we are missing information about the replies ### Motivation 1. Documentation. This is the primary goal. 2. It should be possible, based on the output of COMMAND, to be able to generate client code in typed languages. In order to do that, we need Redis to tell us, in detail, what each reply looks like. 3. We would like to build a fuzzer that verifies the reply structure (for now we use the existing testsuite, see the "Testing" section) ### Schema The idea is to supply some sort of schema for the various replies of each command. The schema will describe the conceptual structure of the reply (for generated clients), as defined in RESP3. Note that the reply structure itself may change, depending on the arguments (e.g. `XINFO STREAM`, with and without the `FULL` modifier) We decided to use the standard json-schema (see https://json-schema.org/) as the reply-schema. Example for `BZPOPMIN`: ``` "reply_schema": { "oneOf": [ { "description": "Timeout reached and no elements were popped.", "type": "null" }, { "description": "The keyname, popped member, and its score.", "type": "array", "minItems": 3, "maxItems": 3, "items": [ { "description": "Keyname", "type": "string" }, { "description": "Member", "type": "string" }, { "description": "Score", "type": "number" } ] } ] } ``` #### Notes 1. It is ok that some commands' reply structure depends on the arguments and it's the caller's responsibility to know which is the relevant one. this comes after looking at other request-reply systems like OpenAPI, where the reply schema can also be oneOf and the caller is responsible to know which schema is the relevant one. 2. The reply schemas will describe RESP3 replies only. even though RESP3 is structured, we want to use reply schema for documentation (and possibly to create a fuzzer that validates the replies) 3. For documentation, the description field will include an explanation of the scenario in which the reply is sent, including any relation to arguments. for example, for `ZRANGE`'s two schemas we will need to state that one is with `WITHSCORES` and the other is without. 4. For documentation, there will be another optional field "notes" in which we will add a short description of the representation in RESP2, in case it's not trivial (RESP3's `ZRANGE`'s nested array vs. RESP2's flat array, for example) Given the above: 1. We can generate the "return" section of all commands in [redis-doc](https://redis.io/commands/) (given that "description" and "notes" are comprehensive enough) 2. We can generate a client in a strongly typed language (but the return type could be a conceptual `union` and the caller needs to know which schema is relevant). see the section below for RESP2 support. 3. We can create a fuzzer for RESP3. ### Limitations (because we are using the standard json-schema) The problem is that Redis' replies are more diverse than what the json format allows. This means that, when we convert the reply to a json (in order to validate the schema against it), we lose information (see the "Testing" section below). The other option would have been to extend the standard json-schema (and json format) to include stuff like sets, bulk-strings, error-string, etc. but that would mean also extending the schema-validator - and that seemed like too much work, so we decided to compromise. Examples: 1. We cannot tell the difference between an "array" and a "set" 2. We cannot tell the difference between simple-string and bulk-string 3. we cannot verify true uniqueness of items in commands like ZRANGE: json-schema doesn't cover the case of two identical members with different scores (e.g. `[["m1",6],["m1",7]]`) because `uniqueItems` compares (member,score) tuples and not just the member name. ### Testing This commit includes some changes inside Redis in order to verify the schemas (existing and future ones) are indeed correct (i.e. describe the actual response of Redis). To do that, we added a debugging feature to Redis that causes it to produce a log of all the commands it executed and their replies. For that, Redis needs to be compiled with `-DLOG_REQ_RES` and run with `--reg-res-logfile <file> --client-default-resp 3` (the testsuite already does that if you run it with `--log-req-res --force-resp3`) You should run the testsuite with the above args (and `--dont-clean`) in order to make Redis generate `.reqres` files (same dir as the `stdout` files) which contain request-response pairs. These files are later on processed by `./utils/req-res-log-validator.py` which does: 1. Goes over req-res files, generated by redis-servers, spawned by the testsuite (see logreqres.c) 2. For each request-response pair, it validates the response against the request's reply_schema (obtained from the extended COMMAND DOCS) 5. In order to get good coverage of the Redis commands, and all their different replies, we chose to use the existing redis test suite, rather than attempt to write a fuzzer. #### Notes about RESP2 1. We will not be able to use the testing tool to verify RESP2 replies (we are ok with that, it's time to accept RESP3 as the future RESP) 2. Since the majority of the test suite is using RESP2, and we want the server to reply with RESP3 so that we can validate it, we will need to know how to convert the actual reply to the one expected. - number and boolean are always strings in RESP2 so the conversion is easy - objects (maps) are always a flat array in RESP2 - others (nested array in RESP3's `ZRANGE` and others) will need some special per-command handling (so the client will not be totally auto-generated) Example for ZRANGE: ``` "reply_schema": { "anyOf": [ { "description": "A list of member elements", "type": "array", "uniqueItems": true, "items": { "type": "string" } }, { "description": "Members and their scores. Returned in case `WITHSCORES` was used.", "notes": "In RESP2 this is returned as a flat array", "type": "array", "uniqueItems": true, "items": { "type": "array", "minItems": 2, "maxItems": 2, "items": [ { "description": "Member", "type": "string" }, { "description": "Score", "type": "number" } ] } } ] } ``` ### Other changes 1. Some tests that behave differently depending on the RESP are now being tested for both RESP, regardless of the special log-req-res mode ("Pub/Sub PING" for example) 2. Update the history field of CLIENT LIST 3. Added basic tests for commands that were not covered at all by the testsuite ### TODO - [x] (maybe a different PR) add a "condition" field to anyOf/oneOf schemas that refers to args. e.g. when `SET` return NULL, the condition is `arguments.get\|\|arguments.condition`, for `OK` the condition is `!arguments.get`, and for `string` the condition is `arguments.get` - https://github.com/redis/redis/issues/11896 - [x] (maybe a different PR) also run `runtest-cluster` in the req-res logging mode - [x] add the new tests to GH actions (i.e. compile with `-DLOG_REQ_RES`, run the tests, and run the validator) - [x] (maybe a different PR) figure out a way to warn about (sub)schemas that are uncovered by the output of the tests - https://github.com/redis/redis/issues/11897 - [x] (probably a separate PR) add all missing schemas - [x] check why "SDOWN is triggered by misconfigured instance replying with errors" fails with --log-req-res - [x] move the response transformers to their own file (run both regular, cluster, and sentinel tests - need to fight with the tcl including mechanism a bit) - [x] issue: module API - https://github.com/redis/redis/issues/11898 - [x] (probably a separate PR): improve schemas: add `required` to `object`s - https://github.com/redis/redis/issues/11899 Co-authored-by: Ozan Tezcan <ozantezcan@gmail.com> Co-authored-by: Hanna Fadida <hanna.fadida@redislabs.com> Co-authored-by: Oran Agra <oran@redislabs.com> Co-authored-by: Shaya Potter <shaya@redislabs.com>	2023-03-11 10:14:16 +02:00
Binbin	c46d68d6d2	Fix Uninitialised value error in createSparklineSequence (LATENCY GRAPH) (#11892 ) This was exposed by a new LATENCY GRAPH valgrind test. There are no security implications, fix by initializing these members.	2023-03-09 12:05:50 +02:00
Binbin	a7c9e5053a	Fix test and improve assert_replication_stream print the whole stream (#11793 ) This PR has two parts: 1. Fix flaky test case, the previous tests set a lot of volatile keys, it injects an unexpected DEL command into the replication stream during the later test, causing it to fail. Add a flushall to avoid it. 2. Improve assert_replication_stream, now it can print the whole stream rather than just the failing line.	2023-03-08 22:39:54 +02:00
Binbin	312654d5be	Fix misleading error message in XREADGROUP (#11799 ) XREADGROUP can output a misleading error message regarding use of the $ special ID. Here is the example (with some newlines): ``` redis> xreadgroup group workers worker1 count 1 streams mystream (error) ERR Unbalanced XREAD list of streams: for each stream key an ID or '$' must be specified. redis> xreadgroup group workers worker1 count 1 streams mystream $ (error) ERR The $ ID is meaningless in the context of XREADGROUP: you want to read the history of this consumer by specifying a proper ID, or use the > ID to get new messages. The $ ID would just return an empty result set. redis> xreadgroup group workers worker1 count 1 streams mystream > 1) 1) "mystream" 2) 1) 1) "1673544607848-0" 2) 1) "n" 2) "1" ``` Note that XREADGROUP first returns an error with the following problems in it: - Command name in the error should be XREADGROUP not XREAD. - It recommends using $ as an option for a stream ID, then when you try this (see second XREADGROUP command above), it errors telling you that `$` doesn't make sense in this context even though the previous error message told you to use it Suggest that the command name be fixed in the first message, and the second part error message be amended not to talk about using `$` but `>` instead, this works, see the third and final XREADGROUP example above. Fixes #11730, commit message took from simonprickett. Co-authored-by: Simon Prickett <simon@redislabs.com>	2023-03-08 11:57:32 +02:00
ranshid	4988b92850	Fix an issue when module decides to unblock a client which is blocked on keys (#11832 ) Currently (starting at #11012) When a module is blocked on keys it sets the CLIENT_PENDING_COMMAND flag. However in case the module decides to unblock the client not via the regular flow (eg timeout, key signal or CLIENT UNBLOCK command) it will attempt to reprocess the module command and potentially blocked again. This fix remove the CLIENT_PENDING_COMMAND flag in case blockedForKeys is issued from module context.	2023-03-08 10:08:54 +02:00
Madelyn Olson	2bb29e4aa3	Always compact nodes in stream listpacks after creating new nodes (#11885 ) This change attempts to alleviate a minor memory usage degradation for Redis 6.2 and onwards when using rather large objects (~2k) in streams. Introduced in #6281, we pre-allocate the head nodes of a stream to be 4kb, to limit the amount of unnecessary initial reallocations that are done. However, if we only ever allocate one object because 2 objects exceeds the max_stream_entry_size, we never actually shrink it to fit the single item. This can lead to a lot of excessive memory usage. For smaller item sizes this becomes less of an issue, as the overhead decreases as the items become smaller in size. This commit also changes the MEMORY USAGE of streams, since it was reporting the lpBytes instead of the allocated size. This introduced an observability issue when diagnosing the memory issue, since Redis reported the same amount of used bytes pre and post change, even though the new implementation allocated more memory.	2023-03-07 15:06:53 -08:00
Binbin	9958ab8b2c	Solve race in CLIENT NO-TOUCH lru test (#11883 ) I've seen it fail here (test-centos7-tls-module-no-tls and test-freebsd): ``` *** [err]: Operations in no-touch mode do not alter the last access time of a key in tests/unit/introspection-2.tcl Expected '244296' to be more than '244296' (context: type eval line 12 cmd {assert_morethan $newlru $oldlru} proc ::test) ``` Our LRU_CLOCK_RESOLUTION value is 1000ms, and default hz is 10, so if the test is really fast, or the timing is just right, newlru will be the same as oldlru. We fixed this by changing `after 1000` to `after 1100`.	2023-03-07 15:27:09 +02:00
sundb	3fba3ccd96	Skip test for sdsRemoveFreeSpace when mem_allocator is not jemalloc (#11878 ) Test `trim on SET with big value` (introduced from #11817) fails under mac m1 with libc mem_allocator. The reason is that malloc(33000) will allocate 65536 bytes(>42000). This test still passes under ubuntu with libc mem_allocator. ``` *** [err]: trim on SET with big value in tests/unit/type/string.tcl Expected [r memory usage key] < 42000 (context: type source line 471 file /Users/iospack/data/redis_fork/tests/unit/type/string.tcl cmd {assert {[r memory usage key] < 42000}} proc ::test) ``` simple test under mac m1 with libc mem_allocator: ```c void *p = zmalloc(33000); printf("malloc size: %zu\n", zmalloc_size(p)); # output malloc size: 65536 ```	2023-03-07 09:06:58 +02:00
某10	3a90ea998c	Add GNUC minor version check for redis_unreachable (#11882 ) __builtin_unreachable is added for the first time in GCC 4.5	2023-03-05 15:28:50 +02:00
Binbin	bfe50a30ed	Increase the threshold of the AOF loading defrag test (#11871 ) This test is very sensitive and fragile. It often fails in Daily, in most cases, it failed in test-ubuntu-32bit (the AOF loading one), with the range in (31, 40): ``` [err]: Active defrag in tests/unit/memefficiency.tcl Expected 38 <= 30 (context: type eval line 113 cmd {assert {$max_latency <= 30}} proc ::test) ``` The AOF loading part isn't tightly fixed to the cron hz. It calls processEventsWhileBlocked once in every 1024 command calls. ``` /* Serve the clients from time to time */ if (!(loops++ % 1024)) { off_t progress_delta = ftello(fp) - last_progress_report_size; loadingIncrProgress(progress_delta); last_progress_report_size += progress_delta; processEventsWhileBlocked(); processModuleLoadingProgressEvent(1); } ``` In this case, we can either decrease the 1024 or increase the threshold of just the AOF part of that test. Considering the test machines are sometimes slow, and all sort of quirks could happen (which do not indicate a bug), and we've already set to 30, we suppose we can set it a little bit higher, set it to 40. We can have this instead of adding another testing config (we can add it when we really need it). Fixes #11868	2023-03-04 12:54:36 +02:00
SkyperTHC	bb57d4ec75	Dont COMMANDS DOCS if not TTY (not interactive) (#11850 ) Avoiding initializing the interactive help and the excessive call to the COMMAND command when using redis-cli with pipe. e.g. ``` echo PING \| redis-cli ```	2023-03-03 10:28:55 +02:00
uriyage	9d336ac398	Try to trim strings only when applicable (#11817 ) As `sdsRemoveFreeSpace` have an impact on performance even if it is a no-op (see details at #11508). Only call the function when there is a possibility that the string contains free space. * For strings coming from the network, it's only if they're bigger than PROTO_MBULK_BIG_ARG * For strings coming from scripts, it's only if they're smaller than LUA_CMD_OBJCACHE_MAX_LEN * For strings coming from modules, it could be anything. Co-authored-by: Oran Agra <oran@redislabs.com> Co-authored-by: sundb <sundbcn@gmail.com>	2023-02-28 19:38:58 +02:00
Oran Agra	b1939b052a	Integer Overflow in RAND commands can lead to assertion (CVE-2023-25155) (#11857 ) Issue happens when passing a negative long value that greater than the max positive value that the long can store.	2023-02-28 15:15:46 +02:00
Oran Agra	dcbfcb916c	String pattern matching had exponential time complexity on pathological patterns (CVE-2022-36021) (#11858 ) Authenticated users can use string matching commands with a specially crafted pattern to trigger a denial-of-service attack on Redis, causing it to hang and consume 100% CPU time. Co-authored-by: Tom Levy <tomlevy93@gmail.com>	2023-02-28 15:15:26 +02:00
ranshid	18017df7c1	Fix possible memory corruption in FLUSHALL when a client watches more than one key (#11854 ) Avoid calling unwatchAllKeys when running touchAllWatchedKeysInDb (which was unnecessary) This can potentially lead to use-after-free and memory corruption when the next entry pointer held by the watched keys iterator is freed when unwatching all keys of a specific client. found with address sanitizer, added a test which will not always fail (depending on the random dict hashing seed) problem introduced in #9829 (Reids 7.0) Co-authored-by: Oran Agra <oran@redislabs.com>	2023-02-28 12:02:55 +02:00
ranshid	4972760b67	assert in case resize output buffer will attempt to shrink too much (#11839 ) Currently there is no BUG. However during some internal code changes I found that it can happen (for example in case new code will not update the buf_peak) which can currently lead to memory overrun which is much harder to detect and root cause. Why did I please the assert here? The reason is to be able to have the buf_peak value without the risk of it being overriden by the peak_reset	2023-02-26 11:54:29 +02:00
Oran Agra	c8226ae378	Try to solve valgrind CI test error with client-eviction test (#11822 ) The test sporadically failed with valgrind trying to match `no client named obuf-client1 found*` in the log it looks like `obuf-client1` was indeed dropped, so i'm guessing it's because CLIENT LIST was processed first.	2023-02-23 13:36:31 +02:00
Binbin	61acf515bc	Add missing since filed for new CLIENT NO-TOUCH command (#11829 ) CLIENT NO-TOUCH added in #11483, but forgot to add the since field in the JSON file. This PR adds the since field to it with a value of 7.2.0	2023-02-23 10:56:52 +02:00
Chen Tianjie	897c3d522c	Add CLIENT NO-TOUCH for clients to run commands without affecting LRU/LFU of keys (#11483 ) When no-touch mode is enabled, the client will not touch LRU/LFU of the keys it accesses, except when executing command `TOUCH`. This allows inspecting or modifying the key-space without affecting their eviction. Changes: - A command `CLIENT NO-TOUCH ON\|OFF` to switch on and off this mode. - A client flag `#define CLIENT_NOTOUCH (1ULL<<45)`, which can be shown with `CLIENT INFO`, by the letter "T" in the "flags" field. - Clear `NO-TOUCH` flag in `clearClientConnectionState`, which is used by `RESET` command and resetting temp clients used by modules. - Also clear `NO-EVICT` flag in `clearClientConnectionState`, this might have been an oversight, spotted by @madolson. - A test using `DEBUG OBJECT` command to verify that LRU stat is not touched when no-touch mode is on. Co-authored-by: chentianjie <chentianjie@alibaba-inc.com> Co-authored-by: Madelyn Olson <34459052+madolson@users.noreply.github.com> Co-authored-by: sundb <sundbcn@gmail.com>	2023-02-23 09:07:49 +02:00
Binbin	cd58af4d7f	Speed up test: client evicted due to client tracking prefixes (#11823 ) We noticed that `client evicted due to client tracking prefixes` takes over 200 seconds with valgrind. We combine three prefixes in each command, this will probably save us half the testing time. Before: normal: 3508ms, valgrind: 289503ms -> 290s With three prefixes, normal: 1500ms, valgrind: 135742ms -> 136s Since we did not actually count the memory usage of all prefixes, see getClientMemoryUsage, so we can not use larger prefixes to speed up the test here. Also this PR cleaned up some spaces (IDE jobs) and typos.	2023-02-21 18:58:55 +02:00
Madelyn Olson	dca5927ac8	Prevent Redis from crashing from key tracking invalidations (#11814 ) There is a built in limit to client side tracking keys, which when exceeded will invalidate keys. This occurs in two places, one in the server cron and other before executing a command. If it happens in the second scenario, the invalidations will be queued for later since current client is set. This queue is never drained if a command is not executed (through call) such as a multi-exec command getting queued. This results in a later server assert crashing.	2023-02-21 08:14:41 -08:00
M Sazzadul Hoque	4cc2b0dc1a	Fix HELLO error message command syntax suggestion (#11809 ) A simple HELLO command to a password protected Redis server replies with an error with another command suggestion. This omits protocol version from HELLO command arguments which causes another error. This PR adds the protocol version in the command suggestion.	2023-02-21 15:05:58 +02:00
judeng	40659c3424	add test case and comments for active expiry in the writeable replica (#11789 ) This test case is to cover a edge scenario: when a writable replica enabled AOF at the same time, active expiry keys which was created in writable replicas should propagate to the AOF file, and some versions might crash (fixed by #11615). For details, please refer to #11778	2023-02-20 10:23:25 +02:00
Oran Agra	3ac835777c	Stablize page reclaim CI test (#11818 ) stabilize the test introduced in #11248 * remove random aspect of the test by using DEBUG POPULATE instead of redis-benchmark * disable rdbcompression, so that the rdb file is always about 1GB. when fadvise was disabled, i get about 1GB in the page cace when enabled i get less than 200KB so for now, i'll keep the 500kb threshold.	2023-02-19 18:38:07 +02:00
Binbin	521e54f551	Demoting some of the non-warning messages to notice (#10715 ) We have cases where we print information (might be important but by no means an error indicator) with the LL_WARNING level. Demoting these to LL_NOTICE: - oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo - User requested shutdown... This is also true for cases that we encounter a rare but normal situation. Demoting to LL_NOTICE. Examples: - AOF was enabled but there is already another background operation. An AOF background was scheduled to start when possible. - Connection with master lost. base on yoav-steinberg's https://github.com/redis/redis/pull/10650#issuecomment-1112280554 and yossigo's https://github.com/redis/redis/pull/10650#pullrequestreview-967677676	2023-02-19 16:33:19 +02:00
Oran Agra	5b61b0dc6d	skip new page cache reclame unit test when running in valgrind (#11808 ) the new test is incompatible with valgrind. added a new `--valgrind` argument to `redis-server tests` mode, which will cause that test to be skipped..	2023-02-16 10:50:58 +02:00
Oran Agra	233abbbe03	Cleanup around script_caller, fix tracking of scripts and ACL logging for RM_Call (#11770 ) * Make it clear that current_client is the root client that was called by external connection * add executing_client which is the client that runs the current command (can be a module or a script) * Remove script_caller that was used for commands that have CLIENT_SCRIPT to get the client that called the script. in most cases, that's the current_client, and in others (when being called from a module), it could be an intermediate client when we actually want the original one used by the external connection. bugfixes: * RM_Call with C flag should log ACL errors with the requested user rather than the one used by the original client, this also solves a crash when RM_Call is used with C flag from a detached thread safe context. * addACLLogEntry would have logged info about the script_caller, but in case the script was issued by a module command we actually want the current_client. the exception is when RM_Call is called from a timer event, in which case we don't have a current_client. behavior changes: * client side tracking for scripts now tracks the keys that are read by the script instead of the keys that are declared by the caller for EVAL other changes: * Log both current_client and executing_client in the crash log. * remove prepareLuaClient and resetLuaClient, being dead code that was forgotten. * remove scriptTimeSnapshot and snapshot_time and instead add cmd_time_snapshot that serves all commands and is reset only when execution nesting starts. * remove code to propagate CLIENT_FORCE_REPL from the executed command to the script caller since scripts aren't propagated anyway these days and anyway this flag wouldn't have had an effect since CLIENT_PREVENT_PROP is added by scriptResetRun. * fix a module GIL violation issue in afterSleep that was introduced in #10300 (unreleased)	2023-02-16 08:07:35 +02:00
zhaozhao.zz	a35e08370a	correct cluster inbound link keepalive time (#11785 )	2023-02-16 11:21:17 +08:00
Binbin	7d5382c0ff	Remove wrong code in list pot timeout test (#11805 ) In #9373, actually need to replace `$rd $pop blist1{t} blist2{t} 1` with `bpop_command_two_key $rd $pop blist1{t} blist2{t} 1` but forgot to delete the latter. This doesn't affect the test, because the later assert_error "WRONGTYPE" is expected (and right). And if we read $rd again, it will get the wrong result, like 'ERR unknown command 'BLMPOP_LEFT' \| 'BLMPOP_RIGHT'	2023-02-15 07:46:56 +02:00
Wen Hui	a705184522	Update codes (#11804 ) In this PR, we use function pointer *isPresent replace the variable "present" in auxFieldHandler, so that in the future, when we have more aux fields, we could decide if the aux field is displayed or not.	2023-02-14 13:47:55 -08:00
guybe7	9483ab0b8e	Minor changes around the blockonkeys test module (#11803 ) All of the POP commands must not decr length below 0. So, get_fsl will delete the key if the length is 0 (unless the caller wished to create if doesn't exist) Other: 1. Use REDISMODULE_WRITE where needed (POP commands) 2. Use wait_for_blokced_clients in test Unrelated: Use quotes instead of curly braces in zset.tcl, for variable expansion	2023-02-14 20:06:30 +02:00
guybe7	fd82bccd0e	SCAN/RANDOMKEY and lazy-expire (#11788 ) Starting from Redis 7.0 (#9890) we started wrapping everything a command propagates with MULTI/EXEC. The problem is that both SCAN and RANDOMKEY can lazy-expire arbitrary keys (similar behavior to active-expire), and put DELs in a transaction. Fix: When these commands are called without a parent exec-unit (e.g. not in EVAL or MULTI) we avoid wrapping their DELs in a transaction (for the same reasons active-expire and eviction avoids a transaction) This PR adds a per-command flag that indicates that the command may touch arbitrary keys (not the ones in the arguments), and uses that flag to avoid the MULTI-EXEC. For now, this flag is internal, since we're considering other solutions for the future. Note for cluster mode: if SCAN/RANDOMKEY is inside EVAL/MULTI it can still cause the same situation (as it always did), but it won't cause a CROSSSLOT because replicas and AOF do not perform slot checks. The problem with the above is mainly for 3rd party ecosystem tools that propagate commands from master to master, or feed an AOF file with redis-cli into a master. This PR aims to fix the regression in redis 7.0, and we opened #11792 to try to handle the bigger problem with lazy expire better for another release.	2023-02-14 09:33:21 +02:00
Tian	7dae142a2e	Reclaim page cache of RDB file (#11248 ) # Background The RDB file is usually generated and used once and seldom used again, but the content would reside in page cache until OS evicts it. A potential problem is that once the free memory exhausts, the OS have to reclaim some memory from page cache or swap anonymous page out, which may result in a jitters to the Redis service. Supposing an exact scenario, a high-capacity machine hosts many redis instances, and we're upgrading the Redis together. The page cache in host machine increases as RDBs are generated. Once the free memory drop into low watermark(which is more likely to happen in older Linux kernel like 3.10, before [watermark_scale_factor](https://lore.kernel.org/lkml/1455813719-2395-1-git-send-email-hannes@cmpxchg.org/) is introduced, the `low watermark` is linear to `min watermark`, and there'is not too much buffer space for `kswapd` to be wake up to reclaim memory), a `direct reclaim` happens, which means the process would stall to wait for memory allocation. # What the PR does The PR introduces a capability to reclaim the cache when the RDB is operated. Generally there're two cases, read and write the RDB. For read it's a little messy to address the incremental reclaim, so the reclaim is done in one go in background after the load is finished to avoid blocking the work thread. For write, incremental reclaim amortizes the work of reclaim so no need to put it into background, and the peak watermark of cache can be reduced in this way. Two cases are addresses specially, replication and restart, for both of which the cache is leveraged to speed up the processing, so the reclaim is postponed to a right time. To do this, a flag is added to`rdbSave` and `rdbLoad` to control whether the cache need to be kept, with the default value false. # Something deserve noting 1. Though `posix_fadvise` is the POSIX standard, but only few platform support it, e.g. Linux, FreeBSD 10.0. 2. In Linux `posix_fadvise` only take effect on writeback-ed pages, so a `sync`(or `fsync`, `fdatasync`) is needed to flush the dirty page before `posix_fadvise` if we reclaim write cache. # About test A unit test is added to verify the effect of `posix_fadvise`. In integration test overall cache increase is checked, as well as the cache backed by RDB as a specific TCL test is executed in isolated Github action job.	2023-02-12 09:23:29 +02:00
Meir Shpilraien (Spielrein)	5c3938d5cc	Match REDISMODULE_OPEN_KEY_* flags to LOOKUP_* flags (#11772 ) The PR adds support for the following flags on RedisModule_OpenKey: * REDISMODULE_OPEN_KEY_NONOTIFY - Don't trigger keyspace event on key misses. * REDISMODULE_OPEN_KEY_NOSTATS - Don't update keyspace hits/misses counters. * REDISMODULE_OPEN_KEY_NOEXPIRE - Avoid deleting lazy expired keys. * REDISMODULE_OPEN_KEY_NOEFFECTS - Avoid any effects from fetching the key In addition, added `RM_GetOpenKeyModesAll`, which returns the mask of all supported OpenKey modes. This allows the module to check, in runtime, which OpenKey modes are supported by the current Redis instance.	2023-02-09 14:59:05 +02:00
Binbin	66bed3f220	When DEBUG LOADAOF fails, return an error instead of exiting (#11790 ) Return an error when loadAppendOnlyFiles fails instead of exiting. DEBUF LOADAOF command is only meant to be used by the test suite, and only by tests that generated an AOF file first. So this change is ok (considering that the caller is likely to catch this error and die). This actually revert part of the code in #9012, and now DEBUG LOADAOF behaves the same as DEBUG RELOAD (returns an error when the load fails). Plus remove a `after 2000` in a test, which can save times (looks like copy paste error).	2023-02-09 07:57:19 +02:00

1 2 3 4 5 ...

11622 Commits