redict/tests
Oran Agra c17e597d05
Accelerate diskless master connections, and general re-connections (#6271)
Diskless master has some inherent latencies.
1) fork starts with delay from cron rather than immediately
2) the replica is put online only after an ACK, but the ACK
   was sent only once a second.
3) even if the ACK arrived immediately, it would not
   register if cron had not yet detected that the fork is done.

Besides that, when a replica disconnects, it doesn't immediately
attempt to re-connect; it waits for replication cron (once per second).
In case it was already online, it may be important to try to re-connect
as soon as possible, so that the backlog at the master doesn't vanish.

In case it disconnected during rdb transfer, one can argue that it's
not very important to re-connect immediately, but this is needed for the
"diskless loading short read" test to be able to run 100 iterations in 5
seconds, rather than 3 (waiting for replication cron re-connection)

changes in this commit:
1) sync command starts a fork immediately if no sync_delay is configured
2) replica sends REPLCONF ACK when done reading the rdb (rather than on 1s cron)
3) when a replica unexpectedly disconnects, it immediately tries to
   re-connect rather than waiting 1s
4) when a child exits, if there is another replica waiting, we spawn a new
   child right away, instead of waiting for the 1s replicationCron.
5) added a call to connectWithMaster from replicationSetMaster, which is called
   from the REPLICAOF command but also in 3 places in cluster.c, in all of
   these the connection attempt will now be immediate instead of delayed by 1
   second.
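
The latency that changes 3) and 5) remove can be sketched with a small model.
This is an illustration only, assuming a cron that fires on whole multiples of
its period; the function names are hypothetical and are not taken from the
server code.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative model only; these names are hypothetical, not server code. */

/* Next time (in ms) a periodic cron fires strictly after now_ms. */
long next_cron_tick_ms(long now_ms, long period_ms) {
    assert(now_ms >= 0 && period_ms > 0);
    return (now_ms / period_ms + 1) * period_ms;
}

/* How long a replica stays disconnected before it tries to re-connect.
 * immediate == true models the behaviour after this commit,
 * immediate == false models waiting for the next cron tick. */
long reconnect_wait_ms(long disconnect_ms, long period_ms, bool immediate) {
    if (immediate) return 0;
    return next_cron_tick_ms(disconnect_ms, period_ms) - disconnect_ms;
}
```

In this model a replica that disconnects 250 ms after a tick of a 1000 ms cron
waits 750 ms before re-connecting, and 0 ms when the attempt is immediate.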

side note:
we can add a call to rdbPipeReadHandler in replconfCommand when getting
a REPLCONF ACK from the replica to solve a race where the replica got
the entire rdb and EOF marker before we detected that the pipe was
closed.
in the test i did see this race happen in about one out of some 300 runs,
but i concluded that this race is unlikely in real life (where the
replica is on another host and we're more likely to first detect the
pipe was closed).
the test runs 100 iterations in 3 seconds, so in some cases it'll take 4
seconds instead (waiting for another REPLCONF ACK).

Removing unneeded startBgsaveForReplication from updateSlavesWaitingForBgsave
Now that CheckChildrenDone is calling the new replicationStartPendingFork
(extracted from serverCron) there's actually no need to call
startBgsaveForReplication from updateSlavesWaitingForBgsave anymore,
since as soon as updateSlavesWaitingForBgsave returns, CheckChildrenDone is
calling replicationStartPendingFork that handles that anyway.
The code in updateSlavesWaitingForBgsave had a bug in which it ignored
repl-diskless-sync-delay, but removing that code shows that this bug was
hiding another bug: the max_idle check should have used >= and not >.
The resulting one second delay has a big impact on my new test.
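
A minimal sketch of the >= versus > difference, under stated assumptions: the
helper below is hypothetical (it is not the body of
replicationStartPendingFork), max_idle is assumed to be the longest idle time
of a waiting replica in whole seconds, and a non-diskless sync is assumed to
start without any delay.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: should a fork for waiting replicas start now?
 * strict == true reproduces the old '>' comparison,
 * strict == false uses the corrected '>='. */
bool should_start_fork(int waiting, bool diskless, long max_idle,
                       long sync_delay, bool strict) {
    assert(waiting >= 0 && max_idle >= 0 && sync_delay >= 0);
    if (waiting == 0) return false;   /* nobody is waiting for a sync */
    if (!diskless) return true;       /* assumption: delay applies to diskless only */
    return strict ? (max_idle > sync_delay) : (max_idle >= sync_delay);
}
```

With a delay of 0 and a replica that has just arrived (max_idle of 0), the >
form returns false and the fork only starts once max_idle reaches 1, while the
>= form starts it right away.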
2020-08-06 16:53:06 +03:00
assets Fix test "server is up" detection after logging changes. 2016-12-19 16:49:58 +01:00
cluster Fix tests/cluster/cluster.tcl about wrong usage of lrange. (#6702) 2020-08-04 18:00:58 +03:00
helpers revert an accidental test code change done as part of the tls project 2019-12-01 16:10:09 +02:00
integration Accelerate diskless master connections, and general re-connections (#6271) 2020-08-06 16:53:06 +03:00
modules This PR introduces a new loaded keyspace event (#7536) 2020-07-23 12:38:51 +03:00
sentinel TLS: Configuration options. 2019-10-07 21:07:27 +03:00
support Add a ZMSCORE command returning an array of scores. (#7593) 2020-08-04 17:49:33 +03:00
tmp minor fixes to the new test suite, html doc updated 2010-05-14 18:48:33 +02:00
unit Accelerate diskless master connections, and general re-connections (#6271) 2020-08-06 16:53:06 +03:00
instances.tcl Fix out of update help info in tcl tests. (#7516) 2020-07-14 11:35:04 +03:00
test_helper.tcl runtest --stop pause stops before terminating the redis server (#7513) 2020-07-13 16:09:08 +03:00