redict/tests/unit/oom-score-adj.tcl

78 lines
2.6 KiB
Tcl
Raw Normal View History

set system_name [string tolower [exec uname -s]]
set user_id [exec id -u]
if {$system_name eq {linux}} {
Improve test suite to handle external servers better. (#9033) This commit revives the improves the ability to run the test suite against external servers, instead of launching and managing `redis-server` processes as part of the test fixture. This capability existed in the past, using the `--host` and `--port` options. However, it was quite limited and mostly useful when running a specific tests. Attempting to run larger chunks of the test suite experienced many issues: * Many tests depend on being able to start and control `redis-server` themselves, and there's no clear distinction between external server compatible and other tests. * Cluster mode is not supported (resulting with `CROSSSLOT` errors). This PR cleans up many things and makes it possible to run the entire test suite against an external server. It also provides more fine grained controls to handle cases where the external server supports a subset of the Redis commands, limited number of databases, cluster mode, etc. The tests directory now contains a `README.md` file that describes how this works. This commit also includes additional cleanups and fixes: * Tests can now be tagged. * Tag-based selection is now unified across `start_server`, `tags` and `test`. * More information is provided about skipped or ignored tests. * Repeated patterns in tests have been extracted to common procedures, both at a global level and on a per-test file basis. * Cleaned up some cases where test setup was based on a previous test executing (a major anti-pattern that repeats itself in many places). * Cleaned up some cases where test teardown was not part of a test (in the future we should have dedicated teardown code that executes even when tests fail). * Fixed some tests that were flaky running on external servers.
2021-06-09 08:13:24 -04:00
start_server {tags {"oom-score-adj external:skip"}} {
proc get_oom_score_adj {{pid ""}} {
if {$pid == ""} {
set pid [srv 0 pid]
}
set fd [open "/proc/$pid/oom_score_adj" "r"]
set val [gets $fd]
close $fd
return $val
}
test {CONFIG SET oom-score-adj works as expected} {
set base [get_oom_score_adj]
# Enable oom-score-adj, check defaults
r config set oom-score-adj-values "10 20 30"
r config set oom-score-adj yes
assert {[get_oom_score_adj] == [expr $base + 10]}
# Modify current class
r config set oom-score-adj-values "15 20 30"
assert {[get_oom_score_adj] == [expr $base + 15]}
# Check replica class
r replicaof localhost 1
assert {[get_oom_score_adj] == [expr $base + 20]}
r replicaof no one
assert {[get_oom_score_adj] == [expr $base + 15]}
# Check child process
r set key-a value-a
r config set rdb-key-save-delay 1000000
r bgsave
if diskless repl child is killed, make sure to reap the pid (#7742) Starting redis 6.0 and the changes we made to the diskless master to be suitable for TLS, I made the master avoid reaping (wait3) the pid of the child until we know all replicas are done reading their rdb. I did that in order to avoid a state where the rdb_child_pid is -1 but we don't yet want to start another fork (still busy serving that data to replicas). It turns out that the solution used so far was problematic in case the fork child was being killed (e.g. by the kernel OOM killer), in that case there's a chance that we currently disabled the read event on the rdb pipe, since we're waiting for a replica to become writable again. and in that scenario the master would have never realized the child exited, and the replica will remain hung too. Note that there's no mechanism to detect a hung replica while it's in rdb transfer state. The solution here is to add another pipe which is used by the parent to tell the child it is safe to exit. this mean that when the child exits, for whatever reason, it is safe to reap it. Besides that, i'm re-introducing an adjustment to REPLCONF ACK which was part of #6271 (Accelerate diskless master connections) but was dropped when that PR was rebased after the TLS fork/pipe changes (5a47794). Now that RdbPipeCleanup no longer calls checkChildrenDone, and the ACK has chance to detect that the child exited, it should be the one to call it so that we don't have to wait for cron (server.hz) to do that.
2020-09-06 09:43:57 -04:00
set child_pid [get_child_pid 0]
# Wait until background child process to setOOMScoreAdj success.
wait_for_condition 100 10 {
[get_oom_score_adj $child_pid] == [expr $base + 30]
} else {
fail "Set oom-score-adj of background child process is not ok"
}
}
# Failed oom-score-adj tests can only run unprivileged
if {$user_id != 0} {
test {CONFIG SET oom-score-adj handles configuration failures} {
# Bad config
r config set oom-score-adj no
r config set oom-score-adj-values "-1000 -1000 -1000"
# Make sure it fails
catch {r config set oom-score-adj yes} e
assert_match {*Failed to set*} $e
# Make sure it remains off
assert {[r config get oom-score-adj] == "oom-score-adj no"}
# Fix config
r config set oom-score-adj-values "0 100 100"
r config set oom-score-adj yes
# Make sure it fails
catch {r config set oom-score-adj-values "-1000 -1000 -1000"} e
assert_match {*Failed*} $e
# Make sure previous values remain
assert {[r config get oom-score-adj-values] == {oom-score-adj-values {0 100 100}}}
}
}
}
}