Fix timing issue in replication test (#9719)

So it looks like sampling set loglines [count_log_lines -2] was
executed too late, and the replication managed to complete before that.

```
*** [err]: diskless no replicas drop during rdb pipe in tests/integration/replication.tcl
log message of '"*Diskless rdb transfer, done reading from pipe, 2 replicas still up*"' not found in ./tests/tmp/server.6124.69/stdout after line: 52 till line: 52
```

Changes:
1. when we search the master log file, we start to search from before we sent the REPLICAOF
  command, to prevent a race in which the replication completed before we sampled the log line count.
2. we don't need to sample the replica loglines sine it's a fresh resplica that's just been started, so the message
  we're looking for is the first occurrence in the log, we can start search from 0.

Co-authored-by: Oran Agra <oran@redislabs.com>
This commit is contained in:
Binbin 2021-11-02 16:32:01 +08:00 committed by GitHub
parent cea7809cea
commit 58a1d16ff6
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23

View File

@ -614,7 +614,7 @@ start_server {tags {"repl external:skip"}} {
# start replication
# it's enough for just one replica to be slow, and have it's write handler enabled
# so that the whole rdb generation process is bound to that
set loglines [count_log_lines -1]
set loglines [count_log_lines -2]
[lindex $replicas 0] config set repl-diskless-load swapdb
[lindex $replicas 0] config set key-load-delay 100 ;# 20k keys and 100 microseconds sleep means at least 2 seconds
[lindex $replicas 0] replicaof $master_host $master_port
@ -622,7 +622,7 @@ start_server {tags {"repl external:skip"}} {
# wait for the replicas to start reading the rdb
# using the log file since the replica only responds to INFO once in 2mb
wait_for_log_messages -1 {"*Loading DB in memory*"} $loglines 800 10
wait_for_log_messages -1 {"*Loading DB in memory*"} 0 800 10
if {$measure_time} {
set master_statfile "/proc/$master_pid/stat"
@ -638,7 +638,6 @@ start_server {tags {"repl external:skip"}} {
$master incr $all_drop
# disconnect replicas depending on the current test
set loglines [count_log_lines -2]
if {$all_drop == "all" || $all_drop == "fast"} {
exec kill [srv 0 pid]
set replicas_alive [lreplace $replicas_alive 1 1]