redict/tests/unit/latency-monitor.tcl

190 lines
6.3 KiB
Tcl
Raw Normal View History

Improve test suite to handle external servers better. (#9033) This commit revives the improves the ability to run the test suite against external servers, instead of launching and managing `redis-server` processes as part of the test fixture. This capability existed in the past, using the `--host` and `--port` options. However, it was quite limited and mostly useful when running a specific tests. Attempting to run larger chunks of the test suite experienced many issues: * Many tests depend on being able to start and control `redis-server` themselves, and there's no clear distinction between external server compatible and other tests. * Cluster mode is not supported (resulting with `CROSSSLOT` errors). This PR cleans up many things and makes it possible to run the entire test suite against an external server. It also provides more fine grained controls to handle cases where the external server supports a subset of the Redis commands, limited number of databases, cluster mode, etc. The tests directory now contains a `README.md` file that describes how this works. This commit also includes additional cleanups and fixes: * Tests can now be tagged. * Tag-based selection is now unified across `start_server`, `tags` and `test`. * More information is provided about skipped or ignored tests. * Repeated patterns in tests have been extracted to common procedures, both at a global level and on a per-test file basis. * Cleaned up some cases where test setup was based on a previous test executing (a major anti-pattern that repeats itself in many places). * Cleaned up some cases where test teardown was not part of a test (in the future we should have dedicated teardown code that executes even when tests fail). * Fixed some tests that were flaky running on external servers.
2021-06-09 15:13:24 +03:00
start_server {tags {"latency-monitor needs:latency"}} {
2014-07-10 16:23:48 +02:00
# Set a threshold high enough to avoid spurious latency events.
r config set latency-monitor-threshold 200
r latency reset
Added INFO LATENCYSTATS section: latency by percentile distribution/latency by cumulative distribution of latencies (#9462) # Short description The Redis extended latency stats track per command latencies and enables: - exporting the per-command percentile distribution via the `INFO LATENCYSTATS` command. **( percentile distribution is not mergeable between cluster nodes ).** - exporting the per-command cumulative latency distributions via the `LATENCY HISTOGRAM` command. Using the cumulative distribution of latencies we can merge several stats from different cluster nodes to calculate aggregate metrics . By default, the extended latency monitoring is enabled since the overhead of keeping track of the command latency is very small. If you don't want to track extended latency metrics, you can easily disable it at runtime using the command: - `CONFIG SET latency-tracking no` By default, the exported latency percentiles are the p50, p99, and p999. You can alter them at runtime using the command: - `CONFIG SET latency-tracking-info-percentiles "0.0 50.0 100.0"` ## Some details: - The total size per histogram should sit around 40 KiB. We only allocate those 40KiB when a command was called for the first time. - With regards to the WRITE overhead As seen below, there is no measurable overhead on the achievable ops/sec or full latency spectrum on the client. Including also the measured redis-benchmark for unstable vs this branch. - We track from 1 nanosecond to 1 second ( everything above 1 second is considered +Inf ) ## `INFO LATENCYSTATS` exposition format - Format: `latency_percentiles_usec_<CMDNAME>:p0=XX,p50....` ## `LATENCY HISTOGRAM [command ...]` exposition format Return a cumulative distribution of latencies in the format of a histogram for the specified command names. The histogram is composed of a map of time buckets: - Each representing a latency range, between 1 nanosecond and roughly 1 second. - Each bucket covers twice the previous bucket's range. - Empty buckets are not printed. - Everything above 1 sec is considered +Inf. - At max there will be log2(1000000000)=30 buckets We reply a map for each command in the format: `<command name> : { `calls`: <total command calls> , `histogram` : { <bucket 1> : latency , < bucket 2> : latency, ... } }` Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-05 12:01:05 +00:00
test {LATENCY HISTOGRAM with empty histogram} {
r config resetstat
set histo [dict create {*}[r latency histogram]]
# Config resetstat is recorded
assert_equal [dict size $histo] 1
assert_match {*config|resetstat*} $histo
Added INFO LATENCYSTATS section: latency by percentile distribution/latency by cumulative distribution of latencies (#9462) # Short description The Redis extended latency stats track per command latencies and enables: - exporting the per-command percentile distribution via the `INFO LATENCYSTATS` command. **( percentile distribution is not mergeable between cluster nodes ).** - exporting the per-command cumulative latency distributions via the `LATENCY HISTOGRAM` command. Using the cumulative distribution of latencies we can merge several stats from different cluster nodes to calculate aggregate metrics . By default, the extended latency monitoring is enabled since the overhead of keeping track of the command latency is very small. If you don't want to track extended latency metrics, you can easily disable it at runtime using the command: - `CONFIG SET latency-tracking no` By default, the exported latency percentiles are the p50, p99, and p999. You can alter them at runtime using the command: - `CONFIG SET latency-tracking-info-percentiles "0.0 50.0 100.0"` ## Some details: - The total size per histogram should sit around 40 KiB. We only allocate those 40KiB when a command was called for the first time. - With regards to the WRITE overhead As seen below, there is no measurable overhead on the achievable ops/sec or full latency spectrum on the client. Including also the measured redis-benchmark for unstable vs this branch. - We track from 1 nanosecond to 1 second ( everything above 1 second is considered +Inf ) ## `INFO LATENCYSTATS` exposition format - Format: `latency_percentiles_usec_<CMDNAME>:p0=XX,p50....` ## `LATENCY HISTOGRAM [command ...]` exposition format Return a cumulative distribution of latencies in the format of a histogram for the specified command names. The histogram is composed of a map of time buckets: - Each representing a latency range, between 1 nanosecond and roughly 1 second. - Each bucket covers twice the previous bucket's range. - Empty buckets are not printed. - Everything above 1 sec is considered +Inf. - At max there will be log2(1000000000)=30 buckets We reply a map for each command in the format: `<command name> : { `calls`: <total command calls> , `histogram` : { <bucket 1> : latency , < bucket 2> : latency, ... } }` Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-05 12:01:05 +00:00
}
test {LATENCY HISTOGRAM all commands} {
r config resetstat
r set a b
r set c d
set histo [dict create {*}[r latency histogram]]
assert_match {calls 2 histogram_usec *} [dict get $histo set]
assert_match {calls 1 histogram_usec *} [dict get $histo "config|resetstat"]
}
test {LATENCY HISTOGRAM sub commands} {
r config resetstat
r client id
r client list
# parent command reply with its sub commands
set histo [dict create {*}[r latency histogram client]]
assert {[dict size $histo] == 2}
assert_match {calls 1 histogram_usec *} [dict get $histo "client|id"]
assert_match {calls 1 histogram_usec *} [dict get $histo "client|list"]
# explicitly ask for one sub-command
set histo [dict create {*}[r latency histogram "client|id"]]
assert {[dict size $histo] == 1}
assert_match {calls 1 histogram_usec *} [dict get $histo "client|id"]
Added INFO LATENCYSTATS section: latency by percentile distribution/latency by cumulative distribution of latencies (#9462) # Short description The Redis extended latency stats track per command latencies and enables: - exporting the per-command percentile distribution via the `INFO LATENCYSTATS` command. **( percentile distribution is not mergeable between cluster nodes ).** - exporting the per-command cumulative latency distributions via the `LATENCY HISTOGRAM` command. Using the cumulative distribution of latencies we can merge several stats from different cluster nodes to calculate aggregate metrics . By default, the extended latency monitoring is enabled since the overhead of keeping track of the command latency is very small. If you don't want to track extended latency metrics, you can easily disable it at runtime using the command: - `CONFIG SET latency-tracking no` By default, the exported latency percentiles are the p50, p99, and p999. You can alter them at runtime using the command: - `CONFIG SET latency-tracking-info-percentiles "0.0 50.0 100.0"` ## Some details: - The total size per histogram should sit around 40 KiB. We only allocate those 40KiB when a command was called for the first time. - With regards to the WRITE overhead As seen below, there is no measurable overhead on the achievable ops/sec or full latency spectrum on the client. Including also the measured redis-benchmark for unstable vs this branch. - We track from 1 nanosecond to 1 second ( everything above 1 second is considered +Inf ) ## `INFO LATENCYSTATS` exposition format - Format: `latency_percentiles_usec_<CMDNAME>:p0=XX,p50....` ## `LATENCY HISTOGRAM [command ...]` exposition format Return a cumulative distribution of latencies in the format of a histogram for the specified command names. The histogram is composed of a map of time buckets: - Each representing a latency range, between 1 nanosecond and roughly 1 second. - Each bucket covers twice the previous bucket's range. - Empty buckets are not printed. - Everything above 1 sec is considered +Inf. - At max there will be log2(1000000000)=30 buckets We reply a map for each command in the format: `<command name> : { `calls`: <total command calls> , `histogram` : { <bucket 1> : latency , < bucket 2> : latency, ... } }` Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-05 12:01:05 +00:00
}
test {LATENCY HISTOGRAM with a subset of commands} {
r config resetstat
r set a b
r set c d
r get a
r hset f k v
r hgetall f
set histo [dict create {*}[r latency histogram set hset]]
assert_match {calls 2 histogram_usec *} [dict get $histo set]
assert_match {calls 1 histogram_usec *} [dict get $histo hset]
assert_equal [dict size $histo] 2
set histo [dict create {*}[r latency histogram hgetall get zadd]]
assert_match {calls 1 histogram_usec *} [dict get $histo hgetall]
assert_match {calls 1 histogram_usec *} [dict get $histo get]
assert_equal [dict size $histo] 2
Added INFO LATENCYSTATS section: latency by percentile distribution/latency by cumulative distribution of latencies (#9462) # Short description The Redis extended latency stats track per command latencies and enables: - exporting the per-command percentile distribution via the `INFO LATENCYSTATS` command. **( percentile distribution is not mergeable between cluster nodes ).** - exporting the per-command cumulative latency distributions via the `LATENCY HISTOGRAM` command. Using the cumulative distribution of latencies we can merge several stats from different cluster nodes to calculate aggregate metrics . By default, the extended latency monitoring is enabled since the overhead of keeping track of the command latency is very small. If you don't want to track extended latency metrics, you can easily disable it at runtime using the command: - `CONFIG SET latency-tracking no` By default, the exported latency percentiles are the p50, p99, and p999. You can alter them at runtime using the command: - `CONFIG SET latency-tracking-info-percentiles "0.0 50.0 100.0"` ## Some details: - The total size per histogram should sit around 40 KiB. We only allocate those 40KiB when a command was called for the first time. - With regards to the WRITE overhead As seen below, there is no measurable overhead on the achievable ops/sec or full latency spectrum on the client. Including also the measured redis-benchmark for unstable vs this branch. - We track from 1 nanosecond to 1 second ( everything above 1 second is considered +Inf ) ## `INFO LATENCYSTATS` exposition format - Format: `latency_percentiles_usec_<CMDNAME>:p0=XX,p50....` ## `LATENCY HISTOGRAM [command ...]` exposition format Return a cumulative distribution of latencies in the format of a histogram for the specified command names. The histogram is composed of a map of time buckets: - Each representing a latency range, between 1 nanosecond and roughly 1 second. - Each bucket covers twice the previous bucket's range. - Empty buckets are not printed. - Everything above 1 sec is considered +Inf. - At max there will be log2(1000000000)=30 buckets We reply a map for each command in the format: `<command name> : { `calls`: <total command calls> , `histogram` : { <bucket 1> : latency , < bucket 2> : latency, ... } }` Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-05 12:01:05 +00:00
}
test {LATENCY HISTOGRAM command} {
r config resetstat
r set a b
r get a
assert {[llength [r latency histogram set get]] == 4}
}
test {LATENCY HISTOGRAM with wrong command name skips the invalid one} {
r config resetstat
assert {[llength [r latency histogram blabla]] == 0}
assert {[llength [r latency histogram blabla blabla2 set get]] == 0}
r set a b
r get a
assert_match {calls 1 histogram_usec *} [lindex [r latency histogram blabla blabla2 set get] 1]
assert_match {calls 1 histogram_usec *} [lindex [r latency histogram blabla blabla2 set get] 3]
assert {[string length [r latency histogram blabla set get]] > 0}
}
tags {"needs:debug"} {
set old_threshold_value [lindex [r config get latency-monitor-threshold] 1]
2014-07-10 16:23:48 +02:00
test {Test latency events logging} {
r config set latency-monitor-threshold 200
r latency reset
2014-07-10 16:23:48 +02:00
r debug sleep 0.3
after 1100
r debug sleep 0.4
after 1100
r debug sleep 0.5
r config set latency-monitor-threshold 0
2014-07-10 16:23:48 +02:00
assert {[r latency history command] >= 3}
}
2014-07-10 16:23:48 +02:00
test {LATENCY HISTORY output is ok} {
set res [r latency history command]
if {$::verbose} {
puts "LATENCY HISTORY data:"
puts $res
}
2014-07-10 16:23:48 +02:00
set min 250
set max 450
foreach event $res {
2014-07-10 16:23:48 +02:00
lassign $event time latency
if {!$::no_latency} {
assert {$latency >= $min && $latency <= $max}
}
2014-07-10 16:23:48 +02:00
incr min 100
incr max 100
set last_time $time ; # Used in the next test
}
}
test {LATENCY LATEST output is ok} {
set res [r latency latest]
if {$::verbose} {
puts "LATENCY LATEST data:"
puts $res
}
foreach event $res {
2014-07-10 16:23:48 +02:00
lassign $event eventname time latency max
assert {$eventname eq "command"}
if {!$::no_latency} {
assert {$max >= 450 & $max <= 650}
assert {$time == $last_time}
}
2014-07-10 16:23:48 +02:00
break
}
}
test {LATENCY GRAPH can output the event graph} {
set res [r latency graph command]
if {$::verbose} {
puts "LATENCY GRAPH data:"
puts $res
}
assert_match {*command*high*low*} $res
# These numbers are taken from the "Test latency events logging" test.
# (debug sleep 0.3) and (debug sleep 0.5), using range to prevent timing issue.
regexp "command - high (.*?) ms, low (.*?) ms" $res -> high low
assert_morethan_equal $high 500
assert_morethan_equal $low 300
}
r config set latency-monitor-threshold $old_threshold_value
} ;# tag
test {LATENCY of expire events are correctly collected} {
r config set latency-monitor-threshold 20
r flushdb
if {$::valgrind} {set count 100000} else {set count 1000000}
r eval {
local i = 0
while (i < tonumber(ARGV[1])) do
redis.call('sadd',KEYS[1],i)
i = i+1
end
} 1 mybigkey $count
r pexpire mybigkey 50
wait_for_condition 5 100 {
[r dbsize] == 0
} else {
fail "key wasn't expired"
}
assert_match {*expire-cycle*} [r latency latest]
test {LATENCY GRAPH can output the expire event graph} {
assert_match {*expire-cycle*high*low*} [r latency graph expire-cycle]
}
r config set latency-monitor-threshold 200
}
test {LATENCY HISTORY / RESET with wrong event name is fine} {
assert {[llength [r latency history blabla]] == 0}
assert {[r latency reset blabla] == 0}
}
test {LATENCY DOCTOR produces some output} {
assert {[string length [r latency doctor]] > 0}
}
test {LATENCY RESET is able to reset events} {
assert {[r latency reset] > 0}
assert {[r latency latest] eq {}}
}
test {LATENCY HELP should not have unexpected options} {
catch {r LATENCY help xxx} e
assert_match "*wrong number of arguments for 'latency|help' command" $e
}
2014-07-10 16:23:48 +02:00
}