redict

mirror of https://codeberg.org/redict/redict.git synced 2025-01-23 16:48:27 -05:00

Author	SHA1	Message	Date
Oran Agra	16be742b08	fix replication test failure, probing the wrong log file (#9513 )	2021-09-19 12:07:04 +03:00
Wang Yuan	cee3d67f50	Delay to discard cached master when full synchronization (#9398 ) * Delay to discard cache master when full synchronization * Don't disconnect with replicas before loading transferred RDB when full sync Previously, once replica need to start full synchronization with master, it will discard cached master whatever full synchronization is failed or not. Now we discard cached master only when transferring RDB is finished and start to change data space, this make replica could start partial resynchronization with another new master if new master is failed during full synchronization.	2021-09-09 11:32:29 +03:00
Oran Agra	d0819d618e	solve test timing issues in replication tests (#9121 ) # replication-3.tcl had a test timeout failure with valgrind on daily CI: ``` * [err]: SLAVE can reload "lua" AUX RDB fields of duplicated scripts in tests/integration/replication-3.tcl Replication not started. ``` replication took more than 70 seconds. https://github.com/redis/redis/runs/2854037905?check_suite_focus=true on my machine it takes only about 30, but i can see how 50 seconds isn't enough. # replication.tcl loading was over too quickly in freebsd daily CI: ``` * [err]: slave fails full sync and diskless load swapdb recovers it in tests/integration/replication.tcl Expected '0' to be equal to '1' (context: type eval line 44 cmd {assert_equal [s -1 loading] 1} proc ::start_server) ``` # rdb.tcl loading was over too quickly. increase the time loading takes, and decrease the amount of work we try to achieve in that time.	2021-06-22 11:10:11 +03:00
Binbin	0bfccc55e2	Fixed some typos, add a spell check ci and others minor fix (#8890 ) This PR adds a spell checker CI action that will fail future PRs if they introduce typos and spelling mistakes. This spell checker is based on blacklist of common spelling mistakes, so it will not catch everything, but at least it is also unlikely to cause false positives. Besides that, the PR also fixes many spelling mistakes and types, not all are a result of the spell checker we use. Here's a summary of other changes: 1. Scanned the entire source code and fixes all sorts of typos and spelling mistakes (including missing or extra spaces). 2. Outdated function / variable / argument names in comments 3. Fix outdated keyspace masks error log when we check `config.notify-keyspace-events` in loadServerConfigFromString. 4. Trim the white space at the end of line in `module.c`. Check: https://github.com/redis/redis/pull/7751 5. Some outdated https link URLs. 6. Fix some outdated comment. Such as: - In README: about the rdb, we used to said create a `thread`, change to `process` - dbRandomKey function coment (about the dictGetRandomKey, change to dictGetFairRandomKey) - notifyKeyspaceEvent fucntion comment (add type arg) - Some others minor fix in comment (Most of them are incorrectly quoted by variable names) 7. Modified the error log so that users can easily distinguish between TCP and TLS in `changeBindAddr`	2021-06-10 15:39:33 +03:00
Yossi Gottlieb	8a86bca5ed	Improve test suite to handle external servers better. (#9033 ) This commit revives the improves the ability to run the test suite against external servers, instead of launching and managing `redis-server` processes as part of the test fixture. This capability existed in the past, using the `--host` and `--port` options. However, it was quite limited and mostly useful when running a specific tests. Attempting to run larger chunks of the test suite experienced many issues: * Many tests depend on being able to start and control `redis-server` themselves, and there's no clear distinction between external server compatible and other tests. * Cluster mode is not supported (resulting with `CROSSSLOT` errors). This PR cleans up many things and makes it possible to run the entire test suite against an external server. It also provides more fine grained controls to handle cases where the external server supports a subset of the Redis commands, limited number of databases, cluster mode, etc. The tests directory now contains a `README.md` file that describes how this works. This commit also includes additional cleanups and fixes: * Tests can now be tagged. * Tag-based selection is now unified across `start_server`, `tags` and `test`. * More information is provided about skipped or ignored tests. * Repeated patterns in tests have been extracted to common procedures, both at a global level and on a per-test file basis. * Cleaned up some cases where test setup was based on a previous test executing (a major anti-pattern that repeats itself in many places). * Cleaned up some cases where test teardown was not part of a test (in the future we should have dedicated teardown code that executes even when tests fail). * Fixed some tests that were flaky running on external servers.	2021-06-09 15:13:24 +03:00
YaacovHazan	501d775583	unregister AE_READABLE from the read pipe in backgroundSaveDoneHandlerSocket (#8991 ) In diskless replication, we create a read pipe for the RDB, between the child and the parent. When we close this pipe (fd), the read handler also needs to be removed from the event loop (if it still registered). Otherwise, next time we will use the same fd, the registration will be fail (panic), because we will use EPOLL_CTL_MOD (the fd still register in the event loop), on fd that already removed from epoll_ctl	2021-05-26 14:51:53 +03:00
YaacovHazan	32a2584e07	stabilize tests that involved with load handlers (#8967 ) When test stop 'load handler' by killing the process that generating the load, some commands that already in the input buffer, still might be processed by the server. This may cause some instability in tests, that count on that no more commands processed after we stop the `load handler' In this commit, new proc 'wait_load_handlers_disconnected' added, to verify that no more cammands from any 'load handler' prossesed, by checking that the clients who genreate the load is disconnceted. Also, replacing check of dbsize with wait_for_ofs_sync before comparing debug digest, as it would fail in case the last key the workload wrote was an overridden key (not a new one). Affected tests Race fix: - failover command to specific replica works - Connect multiple replicas at the same time (issue #141), master diskless=$mdl, replica diskless=$sdl - AOF rewrite during write load: RDB preamble=$rdbpre Cleanup and speedup: - Test replication with blocking lists and sorted sets operations - Test replication with parallel clients writing in different DBs - Test replication partial resync: $descr (diskless: $mdl, $sdl, reconnect: $reconnect	2021-05-20 15:29:43 +03:00
Oran Agra	8458baf6a9	longer timeout in replication test (#8963 ) the test normally passes. but we saw one failure in a valgrind run in github actions	2021-05-18 17:13:59 +03:00
Oran Agra	a9897b0084	Fix timing of new replication test (#8807 ) In github actions CI with valgrind, i saw that even the fast replica (one that wasn't paused), didn't get to complete the replication fast enough, and ended up getting disconnected by timeout. Additionally, due to a typo in uname, we didn't get to actually run the CPU efficiency part of the test.	2021-04-18 15:12:34 +03:00
guybe7	d63d02601f	Add a timeout mechanism for replicas stuck in fullsync (#8762 ) Starting redis 6.0 (part of the TLS feature), diskless master uses pipe from the fork child so that the parent is the one sending data to the replicas. This mechanism has an issue in which a hung replica will cause the master to wait for it to read the data sent to it forever, thus preventing the fork child from terminating and preventing the creations of any other forks. This PR adds a timeout mechanism, much like the ACK-based timeout, we disconnect replicas that aren't reading the RDB file fast enough.	2021-04-15 17:18:51 +03:00
Qu Chen	7de6451818	Properly initialize variable to make valgrind happy in checkChildrenDone(). Removed usage for the obsolete wait3() and wait4() in favor of waitpid(), and properly check for the exit status code. (#8666 )	2021-03-24 08:41:05 -07:00
Oran Agra	a7c02b19bf	Fix race in replication test (#8679 ) Since redis 6.2, redis immediately tries to connect to the master, not waiting for replication cron. in the slow freebsd CI, this test failed and master_link_status was already "up" when INFO was called.	2021-03-22 10:50:39 +02:00
Yossi Gottlieb	522d93607a	Add io-thread daily CI tests. (#8232 ) This adds basic coverage to IO threads by running the cluster and few selected Redis test suite tests with the IO threads enabled. Also provides some necessary additional improvements to the test suite: * Add --config to sentinel/cluster tests for arbitrary configuration. * Fix --tags whitelisting which was broken. * Add a `network` tag to some tests that are more network intensive. This is work in progress and more tests should be properly tagged in the future.	2021-01-17 15:48:48 +02:00
Oran Agra	9122379abc	Propagate GETSET and SET-GET as SET (#7957 ) - Generates a more backwards compatible command stream - Slightly more efficient execution in replica/AOF - Add a test for coverage	2020-11-03 14:56:57 +02:00
Wang Yuan	dc899c4c88	Fix timing dependence in replication tcl tests (#7969 ) Remove 'fork child $pid' log in replication.tcl	2020-10-27 09:36:42 +02:00
Yossi Gottlieb	843a13e88f	Add a --no-latency tests flag. (#7939 ) Useful for running tests on systems which may be way slower than usual.	2020-10-22 11:10:53 +03:00
Felipe Machado	c3f9e01794	Adds new pop-push commands (LMOVE, BLMOVE) (#6929 ) Adding [B]LMOVE <src> <dst> RIGHT\|LEFT RIGHT\|LEFT. deprecating [B]RPOPLPUSH. Note that when receiving a BRPOPLPUSH we'll still propagate an RPOPLPUSH, but on BLMOVE RIGHT LEFT we'll propagate an LMOVE improvement to existing tests - Replace "after 1000" with "wait_for_condition" when wait for clients to block/unblock. - Add a pre-existing element to target list on basic tests so that we can check if the new element was added to the correct side of the list. - check command stats on the replica to make sure the right command was replicated Co-authored-by: Oran Agra <oran@redislabs.com>	2020-10-08 08:33:17 +03:00
Wang Yuan	1bb5794a1f	Kill disk-based fork child when all replicas drop and 'save' is not enabled (#7819 ) When all replicas waiting for a bgsave get disconnected (possibly due to output buffer limit), It may be good to kill the bgsave child. in diskless replication it already happens, but in disk-based, the child may still serve some purpose (for persistence). By killing the child, we prevent it from eating COW memory in vain, and we also allow a new child fork sooner for the next full synchronization or bgsave. We do that only if rdb persistence wasn't enabled in the configuration. Btw, now, rdbRemoveTempFile in killRDBChild won't block server, so we can killRDBChild safely.	2020-09-22 09:47:58 +03:00
Oran Agra	573246f73c	if diskless repl child is killed, make sure to reap the pid (#7742 ) Starting redis 6.0 and the changes we made to the diskless master to be suitable for TLS, I made the master avoid reaping (wait3) the pid of the child until we know all replicas are done reading their rdb. I did that in order to avoid a state where the rdb_child_pid is -1 but we don't yet want to start another fork (still busy serving that data to replicas). It turns out that the solution used so far was problematic in case the fork child was being killed (e.g. by the kernel OOM killer), in that case there's a chance that we currently disabled the read event on the rdb pipe, since we're waiting for a replica to become writable again. and in that scenario the master would have never realized the child exited, and the replica will remain hung too. Note that there's no mechanism to detect a hung replica while it's in rdb transfer state. The solution here is to add another pipe which is used by the parent to tell the child it is safe to exit. this mean that when the child exits, for whatever reason, it is safe to reap it. Besides that, i'm re-introducing an adjustment to REPLCONF ACK which was part of #6271 (Accelerate diskless master connections) but was dropped when that PR was rebased after the TLS fork/pipe changes (`5a47794`). Now that RdbPipeCleanup no longer calls checkChildrenDone, and the ACK has chance to detect that the child exited, it should be the one to call it so that we don't have to wait for cron (server.hz) to do that.	2020-09-06 16:43:57 +03:00
Oran Agra	c17e597d05	Accelerate diskless master connections, and general re-connections (#6271 ) Diskless master has some inherent latencies. 1) fork starts with delay from cron rather than immediately 2) replica is put online only after an ACK. but the ACK was sent only once a second. 3) but even if it would arrive immediately, it will not register in case cron didn't yet detect that the fork is done. Besides that, when a replica disconnects, it doesn't immediately attempts to re-connect, it waits for replication cron (one per second). in case it was already online, it may be important to try to re-connect as soon as possible, so that the backlog at the master doesn't vanish. In case it disconnected during rdb transfer, one can argue that it's not very important to re-connect immediately, but this is needed for the "diskless loading short read" test to be able to run 100 iterations in 5 seconds, rather than 3 (waiting for replication cron re-connection) changes in this commit: 1) sync command starts a fork immediately if no sync_delay is configured 2) replica sends REPLCONF ACK when done reading the rdb (rather than on 1s cron) 3) when a replica unexpectedly disconnets, it immediately tries to re-connect rather than waiting 1s 4) when when a child exits, if there is another replica waiting, we spawn a new one right away, instead of waiting for 1s replicationCron. 5) added a call to connectWithMaster from replicationSetMaster. which is called from the REPLICAOF command but also in 3 places in cluster.c, in all of these the connection attempt will now be immediate instead of delayed by 1 second. side note: we can add a call to rdbPipeReadHandler in replconfCommand when getting a REPLCONF ACK from the replica to solve a race where the replica got the entire rdb and EOF marker before we detected that the pipe was closed. in the test i did see this race happens in one about of some 300 runs, but i concluded that this race is unlikely in real life (where the replica is on another host and we're more likely to first detect the pipe was closed. the test runs 100 iterations in 3 seconds, so in some cases it'll take 4 seconds instead (waiting for another REPLCONF ACK). Removing unneeded startBgsaveForReplication from updateSlavesWaitingForBgsave Now that CheckChildrenDone is calling the new replicationStartPendingFork (extracted from serverCron) there's actually no need to call startBgsaveForReplication from updateSlavesWaitingForBgsave anymore, since as soon as updateSlavesWaitingForBgsave returns, CheckChildrenDone is calling replicationStartPendingFork that handles that anyway. The code in updateSlavesWaitingForBgsave had a bug in which it ignored repl-diskless-sync-delay, but removing that code shows that this bug was hiding another bug, which is that the max_idle should have used >= and not >, this one second delay has a big impact on my new test.	2020-08-06 16:53:06 +03:00
Oran Agra	109b5ccdcd	Fix failing tests due to issues with wait_for_log_message (#7572 ) - the test now waits for specific set of log messages rather than wait for timeout looking for just one message. - we don't wanna sample the current length of the log after an action, due to a race, we need to start the search from the line number of the last message we where waiting for. - when attempting to trigger a full sync, use multi-exec to avoid a race where the replica manages to re-connect before we completed the set of actions that should force a full sync. - fix verify_log_message which was broken and unused	2020-07-28 11:15:29 +03:00
Oran Agra	8e76e13472	stabilize tests that look for log lines (#7367 ) tests were sensitive to additional log lines appearing in the log causing the search to come empty handed. instead of just looking for the n last log lines, capture the log lines before performing the action, and then search from that offset.	2020-07-10 08:28:22 +03:00
Oran Agra	75c11d7fec	fix valgrind test failure in replication test in `b4416280c` i added more keys to that test to make it run longer but in valgrind this now means the test times out, give valgrind more time.	2020-05-18 10:26:53 +03:00
Oran Agra	357aace895	add regression test for the race in #7205 with the original version of 6.0.0, this test detects an excessive full sync. with the fix in `1a7cd2c0e`, this test detects memory corruption, especially when using libc allocator with or without valgrind.	2020-05-17 18:26:02 +03:00
Oran Agra	b4416280cf	fix unstable replication test this test which has coverage for varoius flows of diskless master was failing randomly from time to time. the failure was: [err]: diskless all replicas drop during rdb pipe in tests/integration/replication.tcl log message of 'Diskless rdb transfer, last replica dropped, killing fork child' not found what seemed to have happened is that the master didn't detect that all replicas dropped by the time the replication ended, it thought that one replica is still connected. now the test takes a few seconds longer but it seems stable.	2020-05-12 08:59:09 +03:00
zhaozhao.zz	58554396d6	incrbyfloat: fix issue #5256 ttl lost after propagate	2019-12-18 15:44:51 +08:00
Yossi Gottlieb	0db3b0a0ff	Merge remote-tracking branch 'upstream/unstable' into tls	2019-10-16 17:08:07 +03:00
Daniel Dai	98600c9a11	update typo	2019-10-09 14:15:31 -04:00
Yossi Gottlieb	61733ded14	TLS: Configuration options. Add configuration options for TLS protocol versions, ciphers/cipher suites selection, etc.	2019-10-07 21:07:27 +03:00
Oran Agra	5a47794606	diskless replication rdb transfer uses pipe, and writes to sockets form the parent process. misc: - handle SSL_has_pending by iterating though these in beforeSleep, and setting timeout of 0 to aeProcessEvents - fix issue with epoll signaling EPOLLHUP and EPOLLERR only to the write handlers. (needed to detect the rdb pipe was closed) - add key-load-delay config for testing - trim connShutdown which is no longer needed - rioFdsetWrite -> rioFdWrite - simplified since there's no longer need to write to multiple FDs - don't detect rdb child exited (don't call wait3) until we detect the pipe is closed - Cleanup bad optimization from rio.c, add another one	2019-10-07 21:06:30 +03:00
Yossi Gottlieb	b087dd1db6	TLS: Connections refactoring and TLS support. * Introduce a connection abstraction layer for all socket operations and integrate it across the code base. * Provide an optional TLS connections implementation based on OpenSSL. * Pull a newer version of hiredis with TLS support. * Tests, redis-cli updates for TLS support.	2019-10-07 21:06:13 +03:00
Oran Agra	c56b4ddc6f	prevent diskless replica from terminating on short read now that replica can read rdb directly from the socket, it should avoid exiting on short read and instead try to re-sync. this commit tries to have minimal effects on non-diskless rdb reading. and includes a test that tries to trigger this scenario on various read cases.	2019-07-17 16:46:22 +02:00
Oran Agra	2de544cfcc	diskless replication on slave side (don't store rdb to file), plus some other related fixes The implementation of the diskless replication was currently diskless only on the master side. The slave side was still storing the received rdb file to the disk before loading it back in and parsing it. This commit adds two modes to load rdb directly from socket: 1) when-empty 2) using "swapdb" the third mode of using diskless slave by flushdb is risky and currently not included. other changes: -------------- distinguish between aof configuration and state so that we can re-enable aof only when sync eventually succeeds (and not when exiting from readSyncBulkPayload after a failed attempt) also a CONFIG GET and INFO during rdb loading would have lied When loading rdb from the network, don't kill the server on short read (that can be a network error) Fix rdb check when performed on preamble AOF tests: run replication tests for diskless slave too make replication test a bit more aggressive Add test for diskless load swapdb	2019-07-08 15:37:48 +03:00
antirez	009a929269	Remove debugging printf from replication.tcl test.	2018-12-12 11:55:30 +01:00
antirez	4cf8fdbbd3	Slave removal: remove slave from integration tests descriptions.	2018-09-11 15:32:28 +02:00
antirez	febe102bf6	Test: processing of master stream in slave -BUSY state. See #5297.	2018-08-31 16:45:02 +02:00
antirez	5f0fef5eb9	Regression test for issue #2813 .	2015-10-15 11:23:15 +02:00
antirez	37260bc3be	Test: regression for issue #2473 .	2015-03-27 12:10:46 +01:00
antirez	8a09e12906	Attempt to prevent false positives in replication test.	2014-11-24 11:54:56 +01:00
antirez	d6797d34c0	Diskless replication tested with the multiple slaves consistency test.	2014-10-24 09:49:26 +02:00
Matt Stancliff	1cedebb799	Remove trailing spaces from tests	2014-09-29 06:49:08 -04:00
antirez	e01195e90d	Test: AOF rewrite during write load.	2014-07-10 11:25:12 +02:00
antirez	1f0c0df4a8	Fixed assert conditional in ROLE command test.	2014-06-26 22:13:46 +02:00
antirez	1206bdf13f	Basic tests for the ROLE command.	2014-06-23 09:08:51 +02:00
antirez	d64d2e21c9	Make tests compatible with new INFO replication output.	2013-05-30 11:43:43 +02:00
Johan Bergström	1154283577	Use `info nameofexectuable` to find current executable	2013-01-24 09:37:18 +11:00
antirez	7eb850ef0e	A reimplementation of blocking operation internals. Redis provides support for blocking operations such as BLPOP or BRPOP. This operations are identical to normal LPOP and RPOP operations as long as there are elements in the target list, but if the list is empty they block waiting for new data to arrive to the list. All the clients blocked waiting for th same list are served in a FIFO way, so the first that blocked is the first to be served when there is more data pushed by another client into the list. The previous implementation of blocking operations was conceived to serve clients in the context of push operations. For for instance: 1) There is a client "A" blocked on list "foo". 2) The client "B" performs `LPUSH foo somevalue`. 3) The client "A" is served in the context of the "B" LPUSH, synchronously. Processing things in a synchronous way was useful as if "A" pushes a value that is served by "B", from the point of view of the database is a NOP (no operation) thing, that is, nothing is replicated, nothing is written in the AOF file, and so forth. However later we implemented two things: 1) Variadic LPUSH that could add multiple values to a list in the context of a single call. 2) BRPOPLPUSH that was a version of BRPOP that also provided a "PUSH" side effect when receiving data. This forced us to make the synchronous implementation more complex. If client "B" is waiting for data, and "A" pushes three elemnents in a single call, we needed to propagate an LPUSH with a missing argument in the AOF and replication link. We also needed to make sure to replicate the LPUSH side of BRPOPLPUSH, but only if in turn did not happened to serve another blocking client into another list ;) This were complex but with a few of mutually recursive functions everything worked as expected... until one day we introduced scripting in Redis. Scripting + synchronous blocking operations = Issue #614. Basically you can't "rewrite" a script to have just a partial effect on the replicas and AOF file if the script happened to serve a few blocked clients. The solution to all this problems, implemented by this commit, is to change the way we serve blocked clients. Instead of serving the blocked clients synchronously, in the context of the command performing the PUSH operation, it is now an asynchronous and iterative process: 1) If a key that has clients blocked waiting for data is the subject of a list push operation, We simply mark keys as "ready" and put it into a queue. 2) Every command pushing stuff on lists, as a variadic LPUSH, a script, or whatever it is, is replicated verbatim without any rewriting. 3) Every time a Redis command, a MULTI/EXEC block, or a script, completed its execution, we run the list of keys ready to serve blocked clients (as more data arrived), and process this list serving the blocked clients. 4) As a result of "3" maybe more keys are ready again for other clients (as a result of BRPOPLPUSH we may have push operations), so we iterate back to step "3" if it's needed. The new code has a much simpler semantics, and a simpler to understand implementation, with the disadvantage of not being able to "optmize out" a PUSH+BPOP as a No OP. This commit will be tested with care before the final merge, more tests will be added likely.	2012-09-17 10:26:46 +02:00
antirez	d9241b35e5	Properly wait the slave to sync with master in BRPOPLPUSH test.	2012-04-30 10:55:03 +02:00
antirez	2d4b55214f	A more lightweight implementation of issue 141 regression test.	2012-04-29 17:16:44 +02:00
antirez	28ccb53008	Redis test: More reliable BRPOPLPUSH replication test. Now it uses the new wait_for_condition testing primitive. Also wait_for_condition implementation was fixed in this commit to properly escape the expr command and its argument.	2012-04-26 11:25:13 +02:00

1 2

65 Commits