# redict/tests/unit/shutdown.tcl

start_server {tags {"shutdown external:skip"}} {
    test {Temp rdb will be deleted if we use bg_unlink when shutdown} {
        for {set i 0} {$i < 20} {incr i} {
            r set $i $i
        }
        r config set rdb-key-save-delay 10000000
        # Child is dumping rdb
        r bgsave
        wait_for_condition 1000 10 {
            [s rdb_bgsave_in_progress] eq 1
        } else {
            fail "bgsave did not start in time"
        }
        after 100 ;# give the child a bit of time for the file to be created
        set dir [lindex [r config get dir] 1]
        set child_pid [get_child_pid 0]
        set temp_rdb [file join $dir temp-${child_pid}.rdb]
        # Temp rdb must exist at this point
        assert {[file exists $temp_rdb]}
        catch {r shutdown nosave}
        # Make sure the server was killed
        catch {set rd [redis_deferring_client]} e
        assert_match {*connection refused*} $e
        # Temp rdb file must be deleted
        assert {![file exists $temp_rdb]}
    }
}
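
# Illustrative sketch only, not part of the test suite. The calls to
# `wait_for_condition 1000 10 {...} else {...}` above read as "retry the
# condition up to 1000 times, sleeping 10 ms between attempts". A minimal
# version of that polling pattern, assuming this reading is right, could look
# like the proc below. `poll_until` is a hypothetical name; the framework's
# real helper may differ in signature and error handling.
proc poll_until {maxtries delay_ms condition} {
    for {set i 0} {$i < $maxtries} {incr i} {
        # Evaluate the condition in the caller's scope so that it can
        # reference the caller's variables and commands.
        if {[uplevel 1 [list expr $condition]]} {
            return 1
        }
        after $delay_ms
    }
    # The condition never became true within maxtries attempts.
    return 0
}
# Hypothetical usage: if {![poll_until 50 10 {[file exists $temp_rdb]}]} { ... }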
start_server {tags {"shutdown external:skip"} overrides {save {900 1}}} {
    test {SHUTDOWN ABORT can cancel SIGTERM} {
        r debug pause-cron 1
        set pid [s process_id]
        exec kill -SIGTERM $pid
        after 10 ;# Give the signal handler some time to run
        r shutdown abort
        verify_log_message 0 "*Shutdown manually aborted*" 0
        r debug pause-cron 0
        r ping
    } {PONG}
    test {Temp rdb will be deleted in signal handle} {
        for {set i 0} {$i < 20} {incr i} {
            r set $i $i
        }
        # Dumping the rdb will take about 2s (20 keys * 100ms per key)
        r config set rdb-key-save-delay 100000
        set pid [s process_id]
        set temp_rdb [file join [lindex [r config get dir] 1] temp-${pid}.rdb]
        # Trigger a shutdown which will save an rdb
        exec kill -SIGINT $pid
        # Wait for creation of temp rdb
        wait_for_condition 50 10 {
            [file exists $temp_rdb]
        } else {
            fail "Can't trigger rdb save on shutdown"
        }
        # Insist on immediate shutdown, temp rdb file must be deleted
        exec kill -SIGINT $pid
        # Wait for the rdb file to be deleted
        wait_for_condition 50 10 {
            ![file exists $temp_rdb]
        } else {
            fail "Temp rdb was not deleted after the second SIGINT"
        }
    }
}
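
# Illustrative sketch only, not part of the test suite. The "2s" figure in
# the test above follows from the key count and the per-key delay, assuming
# `rdb-key-save-delay` is expressed in microseconds per key (consistent with
# 100000 being described as 100ms). `expected_dump_ms` is a hypothetical
# helper name introduced here for the arithmetic.
proc expected_dump_ms {num_keys delay_us} {
    # Total delay in microseconds, converted to milliseconds.
    return [expr {$num_keys * $delay_us / 1000}]
}
# expected_dump_ms 20 100000 returns 2000, i.e. 2 seconds.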
start_server {tags {"shutdown external:skip"} overrides {save {900 1}}} {
    set pid [s process_id]
    set dump_rdb [file join [lindex [r config get dir] 1] dump.rdb]
    test {RDB save will be failed in shutdown} {
        for {set i 0} {$i < 20} {incr i} {
            r set $i $i
        }
        # Create a directory named 'dump.rdb' so that renaming the temp rdb
        # onto it fails, which eventually makes the rdb save fail.
        if {[file exists $dump_rdb]} {
            exec rm -f $dump_rdb
        }
        exec mkdir -p $dump_rdb
    }
    test {SHUTDOWN will abort if rdb save failed on signal} {
        # Trigger a shutdown which will save an rdb
        exec kill -SIGINT $pid
        wait_for_log_messages 0 {"*Error trying to save the DB, can't exit*"} 0 100 10
    }
    test {SHUTDOWN will abort if rdb save failed on shutdown command} {
        catch {r shutdown} err
        assert_match {*Errors trying to SHUTDOWN*} $err
        # Make sure the server is still alive
        assert_equal [r ping] {PONG}
    }
    test {SHUTDOWN can proceed if shutdown command was with nosave} {
        catch {r shutdown nosave}
        wait_for_log_messages 0 {"*ready to exit, bye bye*"} 0 100 10
    }
    test {Clean up rdb same named folder} {
        exec rm -r $dump_rdb
    }
}
start_server {tags {"shutdown external:skip"} overrides {appendonly no}} {
    test {SHUTDOWN SIGTERM will abort if there's an initial AOFRW - default} {
        r config set shutdown-on-sigterm default
        r config set rdb-key-save-delay 10000000
        for {set i 0} {$i < 10} {incr i} {
            r set $i $i
        }
        r config set appendonly yes
        wait_for_condition 1000 10 {
            [s aof_rewrite_in_progress] eq 1
        } else {
            fail "aof rewrite did not start in time"
        }
        set pid [s process_id]
        exec kill -SIGTERM $pid
        wait_for_log_messages 0 {"*Writing initial AOF, can't exit*"} 0 1000 10
        r config set shutdown-on-sigterm force
    }
}