redict

mirror of https://codeberg.org/redict/redict.git synced 2025-01-22 08:08:53 -05:00

Author	SHA1	Message	Date
Drew DeVault	50ee0f5be8	all: let's go LGPL over GPL. Based on feedback from interested parties	2024-03-21 20:11:44 +01:00
Drew DeVault	5a20af0e76	all: use REUSE for license management	2024-03-21 14:30:47 +01:00
Drew DeVault	e8abf37673	Working test suite under the Redict name	2024-03-21 13:55:33 +01:00
Chen Tianjie	8527959598	Replace slots_to_channels radix tree with slot specific dictionaries for shard channels. (#12804) We have already replaced the `slots_to_keys` radix tree with a key->slot linked list (#9356), and then replaced the list with slot-specific dictionaries for keys (#11695). Shard channels behave just like keys in many ways, and we also need a slots->channels mapping, which is currently still implemented with a radix tree. So we should split `server.pubsubshard_channels` into 16384 dicts and drop the radix tree, just as we did for DBs. Some benefits (basically the benefits of what we've done to DBs): 1. Optimize counting channels in a slot. This is currently used only when removing the channels in a slot, but it is potentially more useful: sometimes we need to know how many channels there are in a specific slot when doing slot migration. Counting is currently implemented by traversing the radix tree; with this PR it becomes a simple call to `dictSize`, going from O(n) to O(1). 2. The radix tree in the cluster is removed, so shard channel names no longer require additional storage, which saves memory. 3. Potentially useful in slot migration, as shard channels are logically split by slot, making it easier to migrate, remove, or add them as a whole. 4. Avoids rehashing a big dict when there is a large number of channels. Drawbacks: 1. Takes more memory than the radix tree when there are relatively few shard channels. What this PR does: 1. In cluster mode, split `server.pubsubshard_channels` into 16384 dicts; in standalone mode, still use only one dict. 2. Drop the `slots_to_channels` radix tree. 3. To save memory (addressing the drawback above), all 16384 dicts are created lazily: a dict is initialized only when a channel is about to be inserted into it, and it deletes itself when all of its channels are deleted. 4. Use `server.shard_channel_count` to keep track of the number of all shard channels.
--------- Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>	2023-12-27 17:40:45 +08:00
Binbin	3d9c427f8c	Fix timing issue in CLUSTER SLAVE / REPLICAS consistent test (#12774) CI reported that this test failed because the node processed a PING/PONG while the commands were being processed, resulting in a ping_sent or pong_received mismatch. Use MULTI to avoid the timing issue. The test was introduced in #12224.	2023-11-19 11:09:33 +02:00
Harkrishn Patro	b784c5375e	Unsubscribe all clients from replica for shard channel if the master ownership changes (#12577)	2023-10-12 20:48:27 -07:00
Sankar	1190f25ca7	Process loss of slot ownership in cluster bus (#12344) When a node no longer owns a slot, it clears the bit corresponding to the slot in the cluster bus messages. The receiving nodes currently don't record the fact that the sender stopped claiming a slot until some other node in the cluster starts claiming it. This can cause a slot to go missing during slot migration when the migration races with the addition of new shards or a failover. This fix forces the receiving nodes to process the loss of ownership to avoid spreading wrong information.	2023-07-05 17:46:23 -07:00
Harkrishn Patro	a9e32767f7	Allow cluster slots/shards api to respond during loading (#12269) It would be helpful for clients to get cluster slots/shards information while a node is failing over and loading data.	2023-06-13 18:16:32 +03:00
Binbin	ec5721d6ca	Add dummy CLUSTER SLAVES call tests to fix reply ci (#12224) In #12166, we removed a call to CLUSTER SLAVES, which then caused the reply-schemas CI to fail: ``` WARNING! The following commands were not hit at all: cluster|slaves ERROR! at least one command was not hit by the tests ``` Because we already have command output that covers CLUSTER REPLICAS elsewhere, here we simply add some dummy tests to fix the CI.	2023-05-24 09:28:38 +03:00
Ping Xie	4c74dd986f	Exclude aux fields from "cluster nodes" and "cluster replicas" output (#12166) This commit excludes aux fields from the output of the `cluster nodes` and `cluster replicas` commands. We may decide to re-introduce them in some form or another in the future, but not in v7.2.	2023-05-23 18:32:37 +03:00
Binbin	20533cc1d7	Tests: Do not save an RDB by default and add a SIGTERM default AOFRW test (#12064) In order to speed up tests, avoid saving an RDB (most notably on shutdown), except for tests that explicitly test the RDB mechanism. In addition, use `shutdown-on-sigterm force` to prevent shutdown from failing in case the server is in the middle of the initial AOFRW. Also add a test that checks that `shutdown-on-sigterm default` refuses shutdown if there's an initial AOFRW. Co-authored-by: Guy Benoish <guy.benoish@redislabs.com>	2023-04-18 16:14:26 +03:00
guybe7	4ba47d2d21	Add reply_schema to command json files (internal for now) (#10273) Work in progress towards implementing a reply schema as part of COMMAND DOCS, see #9845. Since ironing out the details of the reply schema of each and every command can take a long time, we would like to merge this PR when the infrastructure is ready, and let it mature in the unstable branch. Meanwhile, the changes in this PR are internal: they are part of the repo, but do not affect the produced build. ### Background In #9656 we added a lot of information about Redis commands, but we are missing information about the replies. ### Motivation 1. Documentation. This is the primary goal. 2. It should be possible, based on the output of COMMAND, to generate client code in typed languages. In order to do that, we need Redis to tell us, in detail, what each reply looks like. 3. We would like to build a fuzzer that verifies the reply structure (for now we use the existing test suite, see the "Testing" section). ### Schema The idea is to supply some sort of schema for the various replies of each command. The schema will describe the conceptual structure of the reply (for generated clients), as defined in RESP3. Note that the reply structure itself may change depending on the arguments (e.g. `XINFO STREAM`, with and without the `FULL` modifier). We decided to use the standard json-schema (see https://json-schema.org/) as the reply-schema. Example for `BZPOPMIN`: ``` "reply_schema": { "oneOf": [ { "description": "Timeout reached and no elements were popped.", "type": "null" }, { "description": "The keyname, popped member, and its score.", "type": "array", "minItems": 3, "maxItems": 3, "items": [ { "description": "Keyname", "type": "string" }, { "description": "Member", "type": "string" }, { "description": "Score", "type": "number" } ] } ] } ``` #### Notes 1.
It is OK that some commands' reply structure depends on the arguments; it's the caller's responsibility to know which one is relevant. This follows other request-reply systems like OpenAPI, where the reply schema can also be oneOf and the caller is responsible for knowing which schema is relevant. 2. The reply schemas will describe RESP3 replies only. Even though RESP3 is structured, we want to use the reply schema for documentation (and possibly to create a fuzzer that validates the replies). 3. For documentation, the description field will include an explanation of the scenario in which the reply is sent, including any relation to arguments. For example, for `ZRANGE`'s two schemas we will need to state that one is with `WITHSCORES` and the other is without. 4. For documentation, there will be another optional field "notes" in which we will add a short description of the representation in RESP2, in case it's not trivial (RESP3's `ZRANGE`'s nested array vs. RESP2's flat array, for example). Given the above: 1. We can generate the "return" section of all commands in [redis-doc](https://redis.io/commands/) (given that "description" and "notes" are comprehensive enough). 2. We can generate a client in a strongly typed language (but the return type could be a conceptual `union` and the caller needs to know which schema is relevant). See the section below for RESP2 support. 3. We can create a fuzzer for RESP3. ### Limitations (because we are using the standard json-schema) The problem is that Redis' replies are more diverse than what the json format allows. This means that, when we convert the reply to json (in order to validate the schema against it), we lose information (see the "Testing" section below). The other option would have been to extend the standard json-schema (and json format) to include stuff like sets, bulk-strings, error-string, etc.
but that would also mean extending the schema-validator, and that seemed like too much work, so we decided to compromise. Examples: 1. We cannot tell the difference between an "array" and a "set". 2. We cannot tell the difference between simple-string and bulk-string. 3. We cannot verify true uniqueness of items in commands like ZRANGE: json-schema doesn't cover the case of two identical members with different scores (e.g. `[["m1",6],["m1",7]]`) because `uniqueItems` compares (member,score) tuples and not just the member name. ### Testing This commit includes some changes inside Redis in order to verify that the schemas (existing and future ones) are indeed correct (i.e. that they describe the actual responses of Redis). To do that, we added a debugging feature to Redis that causes it to produce a log of all the commands it executed and their replies. For that, Redis needs to be compiled with `-DLOG_REQ_RES` and run with `--req-res-logfile <file> --client-default-resp 3` (the test suite already does that if you run it with `--log-req-res --force-resp3`). You should run the test suite with the above args (and `--dont-clean`) in order to make Redis generate `.reqres` files (in the same dir as the `stdout` files) which contain request-response pairs. These files are later processed by `./utils/req-res-log-validator.py`, which: 1. Goes over the req-res files generated by the redis-servers spawned by the test suite (see logreqres.c). 2. For each request-response pair, validates the response against the request's reply_schema (obtained from the extended COMMAND DOCS). 3. In order to get good coverage of the Redis commands and all their different replies, we chose to use the existing redis test suite rather than attempt to write a fuzzer. #### Notes about RESP2 1. We will not be able to use the testing tool to verify RESP2 replies (we are OK with that; it's time to accept RESP3 as the future RESP). 2.
Since the majority of the test suite is using RESP2, and we want the server to reply with RESP3 so that we can validate it, we will need to know how to convert the actual reply to the expected one. - numbers and booleans are always strings in RESP2, so the conversion is easy - objects (maps) are always a flat array in RESP2 - others (nested arrays in RESP3's `ZRANGE` and others) will need some special per-command handling (so the client will not be totally auto-generated) Example for ZRANGE: ``` "reply_schema": { "anyOf": [ { "description": "A list of member elements", "type": "array", "uniqueItems": true, "items": { "type": "string" } }, { "description": "Members and their scores. Returned in case `WITHSCORES` was used.", "notes": "In RESP2 this is returned as a flat array", "type": "array", "uniqueItems": true, "items": { "type": "array", "minItems": 2, "maxItems": 2, "items": [ { "description": "Member", "type": "string" }, { "description": "Score", "type": "number" } ] } } ] } ``` ### Other changes 1. Some tests that behave differently depending on the RESP version are now tested with both versions, regardless of the special log-req-res mode ("Pub/Sub PING" for example). 2. Update the history field of CLIENT LIST. 3. Added basic tests for commands that were not covered at all by the test suite. ### TODO - [x] (maybe a different PR) add a "condition" field to anyOf/oneOf schemas that refers to args. e.g. when `SET` returns NULL, the condition is `arguments.get||arguments.condition`, for `OK` the condition is `!arguments.get`, and for `string` the condition is `arguments.get` - https://github.com/redis/redis/issues/11896 - [x] (maybe a different PR) also run `runtest-cluster` in the req-res logging mode - [x] add the new tests to GH actions (i.e.
compile with `-DLOG_REQ_RES`, run the tests, and run the validator) - [x] (maybe a different PR) figure out a way to warn about (sub)schemas that are uncovered by the output of the tests - https://github.com/redis/redis/issues/11897 - [x] (probably a separate PR) add all missing schemas - [x] check why "SDOWN is triggered by misconfigured instance replying with errors" fails with --log-req-res - [x] move the response transformers to their own file (run both regular, cluster, and sentinel tests - need to fight with the tcl including mechanism a bit) - [x] issue: module API - https://github.com/redis/redis/issues/11898 - [x] (probably a separate PR): improve schemas: add `required` to `object`s - https://github.com/redis/redis/issues/11899 Co-authored-by: Ozan Tezcan <ozantezcan@gmail.com> Co-authored-by: Hanna Fadida <hanna.fadida@redislabs.com> Co-authored-by: Oran Agra <oran@redislabs.com> Co-authored-by: Shaya Potter <shaya@redislabs.com>	2023-03-11 10:14:16 +02:00
Madelyn Olson	7379d22196	Harden init-tests for cluster tests (#11635) Attempt to harden cluster init-tests by doing two things: * Retry up to 3 times to join the cluster. Cluster meet is entirely idempotent, so it should stabilize if we missed a node. * Validate that the connection is actually established, not just that the node exists in the cluster list. Nodes can exist in handshake but might later get dropped.	2022-12-22 17:37:00 -08:00
Ping Xie	203b12e41f	Introduce Shard IDs to logically group nodes in cluster mode (#10536) Introduce Shard IDs to logically group nodes in cluster mode. 1. Added a new "shard_id" field to "cluster nodes" output and nodes.conf after "hostname". 2. Added a new PING extension to propagate "shard_id". 3. Handled upgrade from pre-7.2 releases automatically. 4. Refactored PING extension assembling/parsing logic. Behavior of Shard IDs: Replicas will always follow the shards of their reported primaries. If a primary updates its shard ID, the replica will follow. (This need not follow for cluster v2.) This is not an expected use case.	2022-11-16 19:24:18 -08:00
Brennan	47c493e070	Re-design cluster link send buffer to improve memory management (#11343) Re-design cluster link send queue to improve memory management	2022-11-01 19:26:44 -07:00
chendianqiang	e42d98ed27	Correctly handle scripts with shebang (not read-only) on a cluster replica (#11223) EVAL scripts are by default not considered `write` commands, so they were allowed on a replica. But when a shebang is added, they become `write` commands (unless the `no-writes` flag is added). With this change we'll handle them as write commands, and reply with MOVED instead of READONLY when they are executed on a redis cluster replica. Co-authored-by: chendianqiang <chendianqiang@meituan.com>	2022-09-05 16:59:14 +03:00
Madelyn Olson	8a4e3bcd8d	Cluster test improvements (#10920) * Restructured testing to allow running cluster tests easily as part of the normal testing	2022-07-12 10:41:29 -07:00
Binbin	693acc0114	Trying to fix cluster test (#10963) #10942 broke the new test added in #10449 ``` Testing unit: 29-slot-migration-response.tcl Cluster Join and auto-discovery test: FAILED: Cluster failed to join into a full mesh. ``` It looks like we need to wait for the cluster in 28 to become stable.	2022-07-11 15:21:35 +03:00
Madelyn Olson	e6a1b2ea95	Fix crash during handshake and cluster shards call (#10942) * Fix an engine crash when there are nodes in handshake and a user calls cluster shards	2022-07-10 22:00:44 -07:00
Wen Hui	f620e6ac73	Add tests for error messages during slot migrations (#10449) * Add tests for error messages during slot migrations Co-authored-by: Ubuntu <lucas.guang.yang1@huawei.com> Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>	2022-07-04 10:31:12 -05:00
Wen Hui	51da5c3dde	Fix CLUSTER RESET command argument number issue (#10898) Fix a regression of the CLUSTER RESET command in redis 7.0. The cluster reset command format is: CLUSTER RESET [HARD | SOFT]. According to the cluster reset command doc and code, the third argument is optional, so the arity in the json file should be -2 instead of 3. Add tests for RESET and RESET SOFT, which were not covered, to catch future regressions. Co-authored-by: Ubuntu <lucas.guang.yang1@huawei.com> Co-authored-by: Oran Agra <oran@redislabs.com> Co-authored-by: Binbin <binloveplay1314@qq.com>	2022-06-29 08:17:00 +03:00
Harkrishn Patro	4065b4f27e	Sharded pubsub publish messagebulk as smessage (#10792) To make it easy to distinguish between a sharded channel message and a global channel message, introduce `smessage` (instead of `message`) as the message bulk for sharded channel publish messages. This is a breaking change in 7.0.1! Background: Sharded pubsub was introduced in redis 7.0, but after the release we quickly realized that it's problematic that the client can't distinguish between normal (global) pubsub messages and sharded ones. This is important because the same connection can subscribe to both, but messages sent to one pubsub system are not propagated to the other (they're completely separate). So if one connection is used to subscribe to both, we need to help the client library know which message it got, so it can forward it to the correct callback.	2022-05-31 08:03:59 +03:00
Meir Shpilraien (Spielrein)	ae020e3d56	Functions: Move library meta data to be part of the library payload. (#10500) ## Move library meta data to be part of the library payload. Following the discussion on https://github.com/redis/redis/issues/10429 and the intention to add library versioning support in the future, we believe that the entire library metadata (like name and engine) should be part of the library payload and not provided by the `FUNCTION LOAD` command. The reasoning is that the programmer who developed the library should be the one who sets those values (name, engine, and in the future also version). It is not the responsibility of the admin who loads the library into the database. The PR moves all the library metadata (engine and library name) into the library payload. The metadata needs to be provided on the first line of the payload using the shebang format (`#!<engine> name=<name>`), for example: ```lua #!lua name=test redis.register_function('foo', function() return 1 end) ``` The above script will run on the Lua engine and will create a library called `test`. ## API Changes (compared to 7.0 rc2) * The `FUNCTION LOAD` command was changed: it now simply gets the library payload and extracts the engine and name from it. In addition, the command now returns the library name, which can later be used with `FUNCTION DELETE` and `FUNCTION LIST`. * The description field was completely removed from `FUNCTION LOAD` and `FUNCTION LIST`. ## Breaking Changes (compared to 7.0 rc2) * Library description was removed (we can re-add it in the future either as part of the shebang line or as an additional line). * Loading an AOF file that was generated by either 7.0 rc1 or 7.0 rc2 will fail because the old command syntax is invalid.
## Notes * Loading an RDB file that was generated by rc1 / rc2 is supported, Redis will automatically add the shebang to the libraries payloads (we can probably delete that code after 7.0.3 or so since there's no need to keep supporting upgrades from an RC build).	2022-04-05 10:27:24 +03:00
Viktor Söderqvist	b53c7f2c0b	Turn into replica on SETSLOT (#10489) * Fix race condition where node loses its last slot and turns into replica. The race occurs when a node has lost its last slot and finds out from the SETSLOT command before the cluster bus PONG from the new owner arrives. In this case, the node didn't turn itself into a replica of the new slot owner. This commit adds the same logic to the SETSLOT command as already exists for the cluster bus PONG processing. * Revert "Fix new / failing cluster slot migration test (#10482)" This reverts commit `0b21ef8d49`. In this test, the old slot owner finds out that it has lost its last slot in a nondeterministic way: sometimes from the cluster bus PONG from the new slot owner and sometimes from a SETSLOT command from redis-cli. In both cases, the result should be the same: the old owner should turn itself into a replica of the new slot owner.	2022-04-02 14:58:07 -07:00
Madelyn Olson	e81bd15e99	Prevent replica failover during manual takeover test (#10499) During 11-manual-takeover.tcl, if the killing of the instances happens too slowly, one of the replicas might be able to promote itself. I'm not sure why it was slow, but it was observed taking 6 seconds, which is enough time to hold an election. I was able to verify the error locally by adding a small delay (1 second) during ASAN CI. The fix is simply to disable automated failover until all the nodes are confirmed dead.	2022-03-31 08:15:00 +03:00
Madelyn Olson	557222d1e0	Fix timing issue in shards test and fix displayed TLS port (#10450)	2022-03-20 22:08:40 -07:00
Madelyn Olson	e8771efda9	Fixed incorrect parsing of hostname information from nodes.conf (#10435)	2022-03-16 14:07:24 -07:00
Harkrishn Patro	45ccae89bb	Add new cluster shards command (#10293) Implement a new cluster shards command, which provides a flexible and extensible API for topology discovery. Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>	2022-03-15 18:24:40 -07:00
Harkrishn Patro	a5d17f0b6c	Check target node is a primary during cluster setslot. (#10277)	2022-02-10 23:14:27 -08:00
Binbin	d2fde2f655	Fix cluster tests failing due to subcommand names (#10231) Introduced in #10128.	2022-02-04 11:32:30 +02:00
Wen Hui	c9e1602f90	Add test case to improve code coverage for Addslotsrange and Delslotsrange command (#10128) Add more test cases for addslotsrange and delslotsrange.	2022-02-02 18:22:46 -08:00
Madelyn Olson	8b1cda7568	Change replica migration tests to use continuous slots to improve speed (#10215)	2022-01-30 22:44:32 -08:00
Oran Agra	d364ede59c	Revert the attempt to fix cluster rebalance test (#10207) (#10212) It seems that the fix didn't really solve the problem with ASAN, and it also introduced issues with other CI runs. Unrelated: - make runtest-cluster able to take multiple --single arguments	2022-01-31 01:47:58 +02:00
Oran Agra	be0d293354	fix cluster rebalance test race (#10207) Try to fix the rebalance cluster test that's failing daily with ASAN: it looks like `redis-cli --cluster rebalance` gets `ERR Please use SETSLOT only with masters` in `clusterManagerMoveSlot()`. It happens when `12-replica-migration-2.tcl` is run with ASAN in GH Actions, in `Resharding all the master #0 slots away from it`. So the fix (assuming I got it right) is to call `redis-cli --cluster check` before `--cluster rebalance`. P.S. It looks like a few other checks in these tests needed that wait, so I added them too. Other changes: * In instances.tcl, make sure to catch Tcl test crashes and let the rest of the code proceed, so that if there was a redis crash, we'll find it and print it too. * In redis-cli, try to make sure it prints an error instead of silently exiting. Specifically, in redis-cli: 1. clusterManagerMoveSlot used to print an error only if the caller also asked for it (it should be the other way around). 2. clusterManagerCommandReshard asked for an error but didn't use it (probably to avoid printing it twice). 3. clusterManagerCommandRebalance didn't ask for the error; now it does. 4. Make sure that other places in clusterManagerCommandRebalance print something before exiting with an error.	2022-01-30 11:30:19 +02:00
Madelyn Olson	f6b76e50ad	Change expression to look for at least one limit exceeded (#10173) This is an attempt to fix some of the issues with the cluster mode tests we are seeing in the daily run. The test incrementally adds a bunch of publish messages, expecting that eventually one of them will overflow. The test stops one of the processes, so it expects that just that one Redis node will overflow. I think the test is flaky because under certain circumstances multiple links get disconnected, not just the one that is stalled.	2022-01-26 09:59:53 +02:00
yoav-steinberg	7eadc5ee70	Support function flags in script EVAL via shebang header (#10126) In #10025 we added a mechanism for flagging certain properties of Redis Functions. This led us to think we'd like to "port" this mechanism to Redis Scripts (`EVAL`) as well. One good reason for this, besides the added functionality, is that it addresses the poor behavior we currently have in `EVAL` when the script performs a (non DENY_OOM) write operation during OOM state. See #8478 (and a previous attempt to handle it via #10093) for details. Note that in Redis Functions all write operations (including DEL) will return an error during OOM state unless the function is flagged as `allow-oom`, in which case no OOM checking is performed at all. This PR: - Enables setting `EVAL` (and `SCRIPT LOAD`) script flags as defined in #10025. - Provides a syntactical framework via [shebang](https://en.wikipedia.org/wiki/Shebang_(Unix)) for additional script annotations and even engine selection (instead of just Lua) for scripts. - Provides backwards compatibility, so scripts without the new annotations will behave as they did before. - Appropriate tests. - Changes `EVAL[SHA]/_RO` to be flagged as `STALE` commands. This makes it possible to flag individual scripts as `allow-stale` or not. In backwards compatibility mode these commands will return the `MASTERDOWN` error as before. - Changes `SCRIPT LOAD` to be flagged as a `STALE` command. This is mainly to make it logically compatible with the change to `EVAL` in the previous point. It enables loading a script on a stale server, which is technically okay since loading doesn't relate directly to the server's dataset. Running the script does, but that won't work unless the script is explicitly marked as `allow-stale`. Note that even though Lua syntax doesn't support hash tag comments, `.lua` files do support a shebang tag at the top so they can be executed on Unix systems like any shell script.
LUA's `luaL_loadfile` handles this as part of the LUA library. In the case of `luaL_loadbuffer`, which is what Redis uses, I needed to fix the input script in case of a shebang manually. I did this the same way `luaL_loadfile` does, by replacing the first line with a single line feed character.	2022-01-24 16:50:02 +02:00
ny0312	b40a9ba5fd	Fix flaky cluster tests in 24-links.tcl (#10157) * Fix flaky cluster test "Disconnect link when send buffer limit reached" * Fix flaky cluster test "Each node has two links with each peer" Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>	2022-01-23 17:28:32 -08:00
Ozan Tezcan	72e1b5de4d	Fix replica count check in migration tests. (#10140) Tests were not using the loop index as the node id, so they checked the replica count of the same node over and over.	2022-01-19 11:36:24 +02:00
Oran Agra	ae89958972	Set repl-diskless-sync to yes by default, add repl-diskless-sync-max-replicas (#10092) 1. Enable diskless replication by default. 2. Add a new config named repl-diskless-sync-max-replicas that enables replication to start before the full repl-diskless-sync-delay is reached. 3. Put the replica online sooner on the master (see below). 4. The test suite uses a repl-diskless-sync-delay of 0 to be faster. 5. A few tests that use multiple replicas on a pre-populated master now use the new repl-diskless-sync-max-replicas. 6. Fix possible timing issues in a few cluster tests (see below). Put replica online sooner on the master ---------------------------------------------------- There were two tests that failed because they needed the master to realize that the replica is online, but the test code was only waiting for the replica to realize it's online, and with diskless replication that could happen before the master realized it. The changes include two things: 1. The tests now wait on the right thing. 2. An issue in the master, which put the replica online in two steps. The first step was to mark it as online, and the second step was to enable the write event (only after getting an ACK), but in fact the first step didn't contain some of the tasks needed to put it online (like updating the good slave count and sending the module event). This meant that if a test waited to see that the replica is online from the point of view of the master, and then checked that the module got an event, or that the master has enough good replicas, it could fail due to timing issues. So now the full effect of putting the replica online happens at once, and only the part about enabling writes is delayed until the ACK. Fix cluster tests -------------------- I added some code to wait for the replica to sync and avoid race conditions. Later I realized the sentinel and cluster tests were using the original 5-second delay, so I changed it to 0.
This means the other changes are probably not needed, but I suppose they're still better (they avoid race conditions).	2022-01-17 14:11:11 +02:00
Binbin	440d28091b	Fix function no-cluster flag test (#10115) Fixes cluster test introduced in #10066. ``` Function no-cluster flag: ERR Error registering functions: @user_function: 1: wrong number of arguments to redis.register_function ```	2022-01-15 09:13:53 +02:00
Meir Shpilraien (Spielrein)	4db4b43417	Function Flags support (no-writes, no-cluster, allow-stale, allow-oom) (#10066) # Redis Functions Flags Following the discussion on #10025, added Functions Flags support. The PR is divided into 2 sections: * Add named argument support to the `redis.register_function` API. * Add support for function flags ## `redis.register_function` named argument support The first part of the PR adds support for named arguments to `redis.register_function`, for example: ``` redis.register_function{ function_name='f1', callback=function() return 'hello' end, description='some desc' } ``` The positional arguments are also kept, which means that it is still possible to write: ``` redis.register_function('f1', function() return 'hello' end) ``` But notice that it is no longer possible to pass the optional description argument in the positional version. The positional version was changed to accept only the mandatory arguments (function name and callback). To pass more arguments the user must use the named argument version. As with positional arguments, `function_name` and `callback` are mandatory and an error will be raised if they are missing. Also, an error will be raised if an unknown argument name is given or an argument has the wrong type. Tests were added to verify the new syntax. ## Functions Flags The second part of the PR adds functions flags support.
Flags are given to Redis when the engine calls `functionLibCreateFunction`. Supported flags are: * `no-writes` - indicates the function performs no writes, which means that it is OK to run it: * on a read-only replica * using FCALL_RO * if a disk error is detected. It will not be possible to run a function in those situations unless the function turns on the `no-writes` flag. * `allow-oom` - indicates that it's OK to run the function even if Redis is in OOM state. If the function does not turn on this flag, it will not be possible to run it once OOM is reached (even if the function declares `no-writes` and even if `fcall_ro` is used). If this flag is set, any command will be allowed on OOM (even those marked with CMD_DENYOOM). The assumption is that this flag is for advanced users who know its meaning and understand what they are doing, and Redis trusts them not to increase the memory usage (e.g. it could be an INCR or a modification of an existing key, or a DEL command). * `allow-stale` - indicates that it's OK to run the function on a stale replica; in this case we will also make sure the function only performs `stale` commands and raise an error if not. * `no-cluster` - indicates to disallow running the function if cluster is enabled. Default behavior of functions (if no flags are given): 1. Allow functions to read and write 2. Do not run functions on OOM 3. Do not run functions on stale replica 4. Allow functions on cluster ### Lua API for functions flags On the Lua engine, it is possible to give function flags as the `flags` named argument: ``` redis.register_function{function_name='f1', callback=function() return 1 end, flags={'no-writes', 'allow-oom'}, description='description'} ``` The function flags argument must be a Lua table that contains all the requested flags. The following will result in an error: * Unknown flag * Wrong flag type Default behavior is the same as if no flags are used.
Tests were added to verify all flag functionality ## Additional changes * Mark FCALL and FCALL_RO with the CMD_STALE flag (unlike EVAL), so that they can run if the function was registered with the `allow-stale` flag. * Verify `CMD_STALE` on `scriptCall` (`redis.call`), so it is not possible to call commands from a script while stale unless the command is marked with the `CMD_STALE` flag. This way, even if the function is allowed while stale, it cannot bypass the `CMD_STALE` flag of commands. * A flags section was added to the `FUNCTION LIST` output to show the set of flags for each function: ``` > FUNCTION list withcode 1) 1) "library_name" 2) "test" 3) "engine" 4) "LUA" 5) "description" 6) (nil) 7) "functions" 8) 1) 1) "name" 2) "f1" 3) "description" 4) (nil) 5) "flags" 6) (empty array) 9) "library_code" 10) "redis.register_function{function_name='f1', callback=function() return 1 end}" ``` * Added an API to get the Redis version from within a script. The Redis version can be obtained using: 1. `redis.REDIS_VERSION` - string representation of the Redis version in the format MAJOR.MINOR.PATCH 2. `redis.REDIS_VERSION_NUM` - number representation of the Redis version in the format `0x00MMmmpp` (`MM` - major, `mm` - minor, `pp` - patch). The number version can be used to check whether a version is greater or less than another version. The string version can be used to return to the user or to print in logs. This new API is provided to eval scripts and functions; it is also possible to use it during the functions loading phase.	2022-01-14 14:02:02 +02:00
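A minimal sketch of the `0x00MMmmpp` layout behind `redis.REDIS_VERSION_NUM`, written in C rather than Lua. `version_num` and `version_str` are hypothetical helpers for illustration, not part of the Redict source. Because major, minor and patch occupy successively less significant bytes, comparing two packed numbers with `<` or `>` orders the versions correctly, which is what makes the numeric form suitable for version checks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Pack MAJOR.MINOR.PATCH into the 0x00MMmmpp layout: one byte each for
 * major, minor and patch, with the most significant byte left at zero. */
static uint32_t version_num(unsigned major, unsigned minor, unsigned patch) {
    assert(major <= 0xFF && minor <= 0xFF && patch <= 0xFF);
    return ((uint32_t)major << 16) | ((uint32_t)minor << 8) | (uint32_t)patch;
}

/* Unpack a 0x00MMmmpp number back into the "MAJOR.MINOR.PATCH" string form. */
static void version_str(uint32_t num, char *buf, size_t len) {
    snprintf(buf, len, "%u.%u.%u",
             (unsigned)((num >> 16) & 0xFF),
             (unsigned)((num >> 8) & 0xFF),
             (unsigned)(num & 0xFF));
}
```

For example, 7.0.0 packs to `0x00070000` and 6.2.6 to `0x00060206`, so a plain numeric comparison tells that 6.2.6 is older.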
Madelyn Olson	d0949b7c5c	Fix timing issue with cluster hostname test (#10086 )	2022-01-10 16:21:05 -08:00
Madelyn Olson	5460c10047	Implement clusterbus message extensions and cluster hostname support (#9530 ) Implement the ability for cluster nodes to advertise their location with extension messages.	2022-01-02 19:48:29 -08:00
Harkrishn Patro	9f8885760b	Sharded pubsub implementation (#8621 ) This commit implements sharded pubsub based on shard channels. Co-authored-by: Harkrishn Patro <harkrisp@amazon.com> Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>	2022-01-02 16:54:47 -08:00
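Shard channels, like keys, are assigned to one of 16384 hash slots. The sketch below shows that mapping as described in the Redis Cluster specification (CRC16 in its XMODEM variant, modulo 16384, with `{hash tag}` support). It is an illustrative standalone version; `crc16`, `hash_slot` and `hash_slot_cstr` are hypothetical names, not the functions changed by this commit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* CRC16, XMODEM variant (polynomial 0x1021, initial value 0), bit by bit. */
static uint16_t crc16(const char *buf, size_t len) {
    uint16_t crc = 0;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)((unsigned char)buf[i] << 8);
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x1021)
                                 : (uint16_t)(crc << 1);
    }
    return crc;
}

/* Map a key or shard channel name to a slot in [0, 16383]. If the name
 * contains a non-empty {...} section, only the text inside the first such
 * braces is hashed, so names sharing a tag share a slot. */
static unsigned hash_slot(const char *name, size_t len) {
    size_t s, e;
    for (s = 0; s < len && name[s] != '{'; s++)
        ;
    if (s == len)
        return crc16(name, len) & 16383; /* no '{' at all */
    for (e = s + 1; e < len && name[e] != '}'; e++)
        ;
    if (e == len || e == s + 1)
        return crc16(name, len) & 16383; /* no closing '}' or empty tag */
    return crc16(name + s + 1, e - s - 1) & 16383;
}

/* Convenience wrapper for NUL-terminated names. */
static unsigned hash_slot_cstr(const char *name) {
    return hash_slot(name, strlen(name));
}
```

Channel names that share a `{tag}` therefore map to the same slot and are served by the same shard.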
Binbin	febc3f63b2	Fix recent daily CI test failures (#9966 ) Recent PRs have introduced some failures; this commit tries to fix these CI failures. Here are the changes: 1. Enable debug-command in sentinel test. ``` Master reboot in very short time: ERR DEBUG command not allowed. If the enable-debug-command option is set to "local", you can run it from a local connection, otherwise you need to set this option in the configuration file, and then restart the server. ``` 2. Enable protected-config in sentinel test. ``` SDOWN is triggered by misconfigured instance replying with errors: ERR CONFIG SET failed (possibly related to argument 'dir') - can't set protected config ``` 3. Enable debug-command in cluster test. ``` Verify slaves consistency: ERR DEBUG command not allowed. If the enable-debug-command option is set to "local", you can run it from a local connection, otherwise you need to set this option in the configuration file, and then restart the server. ``` 4. quicklist fill should be a signed int. The reason for the modification is to eliminate the warning. Modify `int fill: QL_FILL_BITS` to `signed int fill: QL_FILL_BITS` The first three were introduced in #9920 (same issue), and the last one in #9962.	2021-12-20 12:31:13 +02:00
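A related C detail shows why spelling out `signed` in point 4 matters: for bit-fields, whether plain `int` means `signed int` or `unsigned int` is implementation-defined, so a field that must hold negative values should be declared `signed int` explicitly. Quicklist uses negative fill values as size-based limits, so the field has to round-trip values such as -2. The struct below is a hypothetical stand-in for the quicklist header, and `FILL_BITS` is an assumed width, not the real `QL_FILL_BITS` value.

```c
#include <assert.h>

/* Assumed width for illustration only; not necessarily QL_FILL_BITS. */
#define FILL_BITS 16

/* Hypothetical stand-in for a quicklist header. A plain `int fill : N`
 * may be signed or unsigned depending on the compiler; spelling out
 * `signed int` guarantees that negative fill values survive. */
struct ql_header {
    signed int fill : FILL_BITS;
    unsigned int other_flags : 8; /* hypothetical neighbouring field */
};

/* Store a fill value in the bit-field and read it back. */
static int roundtrip_fill(int value) {
    struct ql_header h = {0};
    h.fill = value;
    return h.fill;
}
```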
ny0312	792afb4432	Introduce memory management on cluster link buffers (#9774 ) Introduce memory management on cluster link buffers: * Introduce a new `cluster-link-sendbuf-limit` config that caps memory usage of cluster bus link send buffers. * Introduce a new `CLUSTER LINKS` command that displays current TCP links to/from peers. * Introduce a new `mem_cluster_links` field under `INFO` command output, which displays the overall memory usage by all current cluster links. * Introduce a new `total_cluster_links_buffer_limit_exceeded` field under `CLUSTER INFO` command output, which displays the accumulated count of cluster links freed due to `cluster-link-sendbuf-limit`.	2021-12-16 21:56:59 -08:00
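A simplified model of the send-buffer cap, offered as a sketch rather than Redict's actual logic. It assumes that a link is freed as soon as its queued bytes exceed `cluster-link-sendbuf-limit`, that a limit of 0 disables the cap, and that the two counters play the roles of `mem_cluster_links` and `total_cluster_links_buffer_limit_exceeded`. `link_t`, `cluster_stats_t` and `link_queue` are hypothetical, and the message does not say when the real server performs the check.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified cluster bus link (not Redict's clusterLink). */
typedef struct {
    size_t send_buf_bytes; /* bytes queued but not yet written */
    int freed;             /* set once the link has been dropped */
} link_t;

typedef struct {
    size_t sendbuf_limit;              /* assumed: 0 means no limit */
    size_t mem_links;                  /* role of mem_cluster_links */
    unsigned long long limit_exceeded; /* role of total_cluster_links_buffer_limit_exceeded */
} cluster_stats_t;

/* Queue n bytes on a link and drop the link if its buffer exceeds the cap. */
static void link_queue(cluster_stats_t *st, link_t *l, size_t n) {
    if (l->freed)
        return;
    l->send_buf_bytes += n;
    st->mem_links += n;
    if (st->sendbuf_limit != 0 && l->send_buf_bytes > st->sendbuf_limit) {
        st->mem_links -= l->send_buf_bytes; /* memory released with the link */
        l->send_buf_bytes = 0;
        l->freed = 1;
        st->limit_exceeded++;
    }
}
```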
Eduardo Semprebon	91d0c758e5	Replica keeps serving data during repl-diskless-load=swapdb for better availability (#9323 ) For diskless replication in swapdb mode, considering we already spend replica memory keeping a backup of the current db to restore in case of failure, we can gain the following benefits by instead swapping the database only once we have succeeded in transferring the db from the master: - Avoid `LOADING` responses during failed and successful synchronization for cases where the replica is already up and running with data. - Faster total time of diskless replication, because we go from Transfer + Flush + Load time to Transfer + Load only. Flushing the tempDb is done asynchronously after swapping. - This could also be implemented for disk-based replication with similar benefits if consumers are willing to spend the extra memory. General notes: - The concept of `backupDb` becomes `tempDb` for clarity. - Async loading mode will only kick in if the replica is syncing from a master that has the same repl-id as the one it had before, i.e. the data it is getting belongs to a different point in the same timeline. - New property in INFO: `async_loading`, to differentiate it from blocking loading. - Slot-to-key mapping is now a field of `redisDb`, as it is more natural to access it from both server.db and the tempDb that is passed around. - Because this affects replicas only, we assume that if they are not read-only and receive write commands during replication, those writes are lost after SYNC the same way as before, but we still deny CONFIG SET here anyway to avoid complications.
Considerations for review: - We have many cases where the server.loading flag is used, and even though I tried my best, there may be cases where async_loading should be checked as well and cases where it shouldn't (this would require a very good understanding of the whole code). - Several places that had different behavior depending on the loading flag were actually meant to just handle commands coming from the AOF client differently than ones coming from real clients; these were changed to check CLIENT_ID_AOF instead. Additional for Release Notes - Bugfix - server.dirty was not incremented for any kind of diskless replication; as an effect, it wouldn't contribute to triggering the next database SAVE - New flag for RM_GetContextFlags module API: REDISMODULE_CTX_FLAGS_ASYNC_LOADING - Deprecated RedisModuleEvent_ReplBackup. Starting from Redis 7.0, we don't fire this event. Instead, we have the new RedisModuleEvent_ReplAsyncLoad holding 3 sub-events: STARTED, ABORTED and COMPLETED. - New module flag REDISMODULE_OPTIONS_HANDLE_REPL_ASYNC_LOAD for RedisModule_SetModuleOptions to allow modules to declare they support diskless replication with async loading (when absent, we fall back to disk-based loading). Co-authored-by: Eduardo Semprebon <edus@saxobank.com> Co-authored-by: Oran Agra <oran@redislabs.com>	2021-11-04 10:46:50 +02:00
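The core idea of the swapdb change, serving reads from the current db while loading into `tempDb` and swapping only on success, can be sketched as follows. Every type and function here (`db_t`, `replica_t`, `can_load_async`, `swapdb_load`) is hypothetical, the actual transfer and loading are elided, and the returned pointer stands for the db the real server would flush asynchronously.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins; the real server works with redisDb arrays. */
typedef struct { const char *label; } db_t;

typedef struct {
    db_t *serving;     /* db answering client reads during the sync */
    int async_loading; /* plays the role of the INFO async_loading field */
} replica_t;

/* Async loading only kicks in when the master's repl-id matches the
 * replica's previous one, i.e. the data belongs to the same timeline. */
static int can_load_async(const char *my_replid, const char *master_replid) {
    return strcmp(my_replid, master_replid) == 0;
}

/* Load into tmp while r->serving keeps answering reads. `ok` stands in for
 * whether the transfer and load from the master succeeded. Returns the db
 * to release afterwards. */
static db_t *swapdb_load(replica_t *r, db_t *tmp, int ok) {
    r->async_loading = 1;
    /* ... transfer + load into tmp would happen here ... */
    r->async_loading = 0;
    if (!ok)
        return tmp;       /* failure: discard tmp, old data stays live */
    db_t *old = r->serving;
    r->serving = tmp;     /* success: swap, then release the old db */
    return old;
}
```

In both outcomes the replica never has to answer `LOADING`: on failure it simply keeps the data set it already had.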
Oran Agra	d04f306931	Fix race condition in cluster test 22-replica-in-sync (#9721 ) there was a chance that by the time the assertion is executed, the replica already manages to reconnect. now we make sure the replica is unable to re-connect to the master. additionally, we wait for some gossip from the disconnected replica, to see that it doesn't mess things up. unrelated: fix a typo when trying to exhaust the backlog, one that didn't have any harmful implications Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>	2021-11-04 08:44:18 +02:00
Oran Agra	48d54265ce	Fix failing cluster tests (#9707 ) Fix failures introduced by #9695, which was an attempt to solve failures introduced by #9679. An alternative to #9703 (I didn't like the extra argument to kill_instance). Reverting #9695. Instead of stopping AOF on all terminations, stop it only on the two which need it. Do it as part of the test rather than the infra (it was odd that kill_instance used `R` to communicate with the instance). Note that the original purpose of these tests was to trigger a crash, but that upsets valgrind, so in redis 6.2 I changed it to use SIGTERM; I now rename the tests (removing "kill" and "crash"). Also add some colors to failures, and the word "FAILED" so that it's searchable. And solve a semi-related race condition in 14-consistency-check.tcl	2021-10-31 19:22:21 +02:00
Wen Hui	5fb4adba65	New Cluster Command: CLUSTER DELSLOTSRANGE and CLUSTER ADDSLOTSRANGE (#9445 )	2021-10-26 21:44:33 -07:00
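`CLUSTER ADDSLOTSRANGE` and `CLUSTER DELSLOTSRANGE` take one or more inclusive `start-slot end-slot` pairs instead of a list of individual slots. Below is a minimal sketch of validating such arguments with hypothetical `parse_slot` and `mark_slot_ranges` helpers. Rejecting a slot that appears in two ranges, and leaving `out` untouched on any error, are choices made by this sketch, not documented behaviour of the commands.

```c
#include <assert.h>
#include <stdlib.h>

#define NUM_SLOTS 16384

/* Parse a slot number; returns -1 unless it is an integer in [0, 16383]. */
static int parse_slot(const char *s) {
    char *end;
    long v = strtol(s, &end, 10);
    if (end == s || *end != '\0' || v < 0 || v >= NUM_SLOTS)
        return -1;
    return (int)v;
}

/* Validate `start end [start end ...]` pairs (inclusive ranges) and, only if
 * all are valid, mark each covered slot in out. Returns the number of slots
 * covered, or -1 for an odd argument count, a bad number, start > end, or a
 * slot covered twice. */
static int mark_slot_ranges(int argc, const char **argv, unsigned char *out) {
    unsigned char seen[NUM_SLOTS] = {0};
    int marked = 0;
    if (argc == 0 || argc % 2 != 0)
        return -1;
    for (int i = 0; i < argc; i += 2) {
        int start = parse_slot(argv[i]), end = parse_slot(argv[i + 1]);
        if (start < 0 || end < 0 || start > end)
            return -1;
        for (int s = start; s <= end; s++) {
            if (seen[s])
                return -1; /* slot listed in two ranges */
            seen[s] = 1;
            marked++;
        }
    }
    for (int s = 0; s < NUM_SLOTS; s++) /* apply only after full validation */
        if (seen[s])
            out[s] = 1;
    return marked;
}
```

For example, the arguments `1 5 10 15` cover slots 1 to 5 and 10 to 15, i.e. 11 slots.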

152 Commits