redict/tests/unit/oom-score-adj.tcl

set system_name [string tolower [exec uname -s]]
set user_id [exec id -u]

if {$system_name eq {linux}} {
    start_server {tags {"oom-score-adj"}} {
        proc get_oom_score_adj {{pid ""}} {
            if {$pid == ""} {
                set pid [srv 0 pid]
            }
            set fd [open "/proc/$pid/oom_score_adj" "r"]
            set val [gets $fd]
            close $fd

            return $val
        }

        test {CONFIG SET oom-score-adj works as expected} {
            set base [get_oom_score_adj]

            # Enable oom-score-adj, check defaults
            r config set oom-score-adj-values "10 20 30"
            r config set oom-score-adj yes
            assert {[get_oom_score_adj] == [expr {$base + 10}]}

            # Modify current class
            r config set oom-score-adj-values "15 20 30"
            assert {[get_oom_score_adj] == [expr {$base + 15}]}

            # Check replica class
            r replicaof localhost 1
            assert {[get_oom_score_adj] == [expr {$base + 20}]}
            r replicaof no one
            assert {[get_oom_score_adj] == [expr {$base + 15}]}

            # Check child process
            r set key-a value-a
            r config set rdb-key-save-delay 1000000
            r bgsave
            set child_pid [get_child_pid 0]
            # Wait until setOOMScoreAdj has taken effect in the background child process.
            wait_for_condition 100 10 {
                [get_oom_score_adj $child_pid] == [expr {$base + 30}]
            } else {
                fail "Set oom-score-adj of background child process is not ok"
            }
        }

        # The failure tests can only run unprivileged
        if {$user_id != 0} {
            test {CONFIG SET oom-score-adj handles configuration failures} {
                # Bad config
                r config set oom-score-adj no
                r config set oom-score-adj-values "-1000 -1000 -1000"

                # Make sure enabling it fails
                catch {r config set oom-score-adj yes} e
                assert_match {*Failed to set*} $e

                # Make sure it remains off
                assert {[r config get oom-score-adj] == "oom-score-adj no"}

                # Fix config
                r config set oom-score-adj-values "0 100 100"
                r config set oom-score-adj yes

                # Make sure setting bad values fails
                catch {r config set oom-score-adj-values "-1000 -1000 -1000"} e
                assert_match {*Failed*} $e

                # Make sure previous values remain
                assert {[r config get oom-score-adj-values] == {oom-score-adj-values {0 100 100}}}
            }
        }
    }
}