# Short description
The Redis extended latency stats track per-command latencies and enable:
- exporting the per-command percentile distribution via the `INFO LATENCYSTATS` command
**(the percentile distribution is not mergeable between cluster nodes).**
- exporting the per-command cumulative latency distributions via the `LATENCY HISTOGRAM` command.
Using the cumulative distribution of latencies, we can merge stats from different cluster nodes
to calculate aggregate metrics.
By default, the extended latency monitoring is enabled since the overhead of keeping track of the
command latency is very small.
If you don't want to track extended latency metrics, you can easily disable it at runtime using the command:
- `CONFIG SET latency-tracking no`
By default, the exported latency percentiles are the p50, p99, and p999.
You can alter them at runtime using the command:
- `CONFIG SET latency-tracking-info-percentiles "0.0 50.0 100.0"`
## Some details:
- The total size per histogram should sit around 40 KiB. We only allocate those 40 KiB when a command
is called for the first time.
- Regarding the write overhead: as seen below, there is no measurable overhead on the achievable
ops/sec or on the full latency spectrum observed at the client. The redis-benchmark measurements for
unstable vs. this branch are also included.
- We track from 1 nanosecond to 1 second (everything above 1 second is considered +Inf).
## `INFO LATENCYSTATS` exposition format
- Format: `latency_percentiles_usec_<CMDNAME>:p0=XX,p50....`
## `LATENCY HISTOGRAM [command ...]` exposition format
Returns a cumulative distribution of latencies, in the form of a histogram, for the specified command names.
The histogram is composed of a map of time buckets:
- Each bucket represents a latency range between 1 nanosecond and roughly 1 second.
- Each bucket covers twice the previous bucket's range.
- Empty buckets are not printed.
- Everything above 1 sec is considered +Inf.
- There will be at most log2(1,000,000,000) ≈ 30 buckets.
We reply with a map for each command in the format:
`<command name>: { calls: <total command calls>, histogram: { <bucket 1>: latency, <bucket 2>: latency, ... } }`
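For clarity, here is a minimal sketch (illustrative only, not the server's implementation) that enumerates the bucket upper bounds described above, each double the previous one, from 1 nanosecond up to roughly 1 second:

```c
#include <stdio.h>
#include <stdint.h>

/* Illustrative only: print the latency bucket upper bounds described above.
 * Each bucket covers twice the previous bucket's range; anything above
 * roughly 1 second falls into the +Inf bucket. */
int main(void) {
    int buckets = 0;
    for (uint64_t upper_ns = 1; upper_ns < 1000000000ULL; upper_ns *= 2) {
        buckets++;
        printf("bucket %2d: <= %llu ns\n", buckets, (unsigned long long)upper_ns);
    }
    printf("total: %d buckets, plus +Inf above ~1 second\n", buckets);
    return 0;
}
```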
Co-authored-by: Oran Agra <oran@redislabs.com>
The msgpack lib missed using lua_checkstack, and so in rare
cases could overflow the stack by at most 2 elements. This is a
violation of the Lua C API. Notice that Lua allocates
5 additional elements on top of lua->stack_last,
so Redis does not access invalid memory, but it is still an
API violation and we should avoid it.
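As a reminder of what the Lua C API expects, here is a minimal, hypothetical sketch (not the msgpack code itself): space must be reserved with luaL_checkstack()/lua_checkstack() before pushing values onto the stack.

```c
#include <lua.h>
#include <lauxlib.h>

/* Hypothetical helper: set table[key] = value for the table at the top of
 * the stack. The luaL_checkstack() call reserves room for the 2 elements we
 * are about to push; omitting it is the kind of C API violation described
 * above. */
static void push_pair(lua_State *L, const char *key, lua_Number value) {
    luaL_checkstack(L, 2, "out of Lua stack space");
    lua_pushstring(L, key);
    lua_pushnumber(L, value);
    lua_rawset(L, -3); /* assumes the target table sits below the pushed pair */
}
```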
This PR also adds a new Lua compilation option. The new
option can be enabled using an environment variable called
LUA_DEBUG. If set to `yes` (the default is `no`), Lua will be
compiled without optimizations and with debug symbols (`-O0 -g`).
In addition, in this new mode, Lua will be compiled with the
`-DLUA_USE_APICHECK` flag, which enables extended Lua C API
validations.
In addition, LUA_DEBUG=yes is set on the daily valgrind flow so we
will be able to catch Lua C API violations in the future.
Background:
Following the upgrade to jemalloc 5.2, a test that used to be flaky
started failing consistently (on 32bit), so we disabled it (see #9645).
This is a test that I introduced in #7289 when I attempted to solve a rare stagnation
problem, and it later turned out I failed to solve it; what's more, I added a test that
made it not so rare, and as mentioned, in jemalloc 5.2 it became consistent on 32bit.
Stagnation can happen when all the slabs of the bin are equally utilized, so the decision
to move an allocation from a relatively empty slab to a relatively full one will never
happen, and in that test all the slabs are at 50% utilization, so the defragger could just
keep scanning the keyspace and not move anything.
What this PR changes:
* First, in jemalloc 5.2 we finally have the count of non-full slabs, so when we evaluate
the utilization of the current slab, we can compare it to the average utilization of the non-full
slabs in our bin, instead of the total average of our bin. This takes the full slabs out of the game,
since they're not candidates for migration (neither source nor target).
* Secondly, we add some 12% (100/8) to the decision to defrag an allocation; this is the part
that aims to avoid stagnation, and it's especially important since the above-mentioned change
can get us closer to stagnation (see the sketch after this list).
* Thirdly, since jemalloc 5.2 adds sharded bins, we take all shards into account (something
that was missing from the original PR that merged it). This isn't expected to make any difference
since there should anyway be just one shard.
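A rough sketch of the decision described above (names and exact arithmetic are illustrative, not the actual jemalloc/Redis code): an allocation is a defrag candidate only when its slab is meaningfully emptier than the average of the other non-full slabs of its bin, with cross-multiplication used to avoid precision loss.

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative decision helper (all names are hypothetical).
 *   curr_regs     - regions in use in the slab holding the allocation
 *   nonfull_regs  - regions in use across all non-full slabs of the bin
 *                   (all shards), including the current slab
 *   nonfull_slabs - number of non-full slabs of the bin (all shards) */
static bool should_defrag(size_t curr_regs, size_t nonfull_regs,
                          size_t nonfull_slabs) {
    if (nonfull_slabs <= 1) return false;
    /* Exclude the current slab from the average so it can't drag it down. */
    size_t other_regs  = nonfull_regs - curr_regs;
    size_t other_slabs = nonfull_slabs - 1;
    /* Defrag only if curr_util * (1 + 1/8) is below the average utilization
     * of the other non-full slabs; the ~12% margin is what avoids stagnation
     * when all slabs are equally utilized. Cross-multiplied to avoid
     * integer-division precision loss. */
    return (unsigned long long)curr_regs * 9 * other_slabs <
           (unsigned long long)other_regs * 8;
}
```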
How this was benchmarked:
I ran the memefficiency test unit with `--verbose` and compared the defragger hits
and misses the tests reported.
At first, when I took into consideration only the non-full slabs, it got a lot worse (I got into
stagnation, or just got a lot of misses and a lot of hits), but when I added the 10% I got back
to results that were slightly better than the ones of the jemalloc 5.1 branch, i.e. full defragmentation
was achieved with fewer hits (relocations) and fewer misses (keyspace scans).
Upgraded to jemalloc 5.2.1 from 5.1.0.
Cherry-picked all relevant fixes (by diffing our 5.1.0 against upstream 5.1.0 and finding relevant commits).
Details of what was done:
[cherry-picked] fd7d51c 2021-05-03 Resolve nonsense static analysis warnings (Oran Agra)
[cherry-picked] 448c435 2020-09-29 Fix compilation warnings in Lua and jemalloc dependencies (#7785) (YoongHM)
[skipped - already in upstream] 9216b96 2020-09-21 Fix compilation warning in jemalloc's malloc_vsnprintf (#7789) (YoongHM)
[cherry-picked] 88d71f4 2020-05-20 fix a rare active defrag edge case bug leading to stagnation (Oran Agra)
[skipped - already in upstream] 2fec7d9 2019-05-30 Jemalloc: Avoid blocking on background thread lock for stats.
[cherry-picked] 920158e 2018-07-11 Active defrag fixes for 32bit builds (again) (Oran Agra)
[cherry-picked] e8099ca 2018-06-26 add defrag hint support into jemalloc 5 (Oran Agra)
[re-done] 4e729fc 2018-05-24 Generate configure for Jemalloc. (antirez)
Additionally had to do this:
7727cc2 2021-10-10 Fix defrag to support sharded bins in arena (added in v5.2.1) (Yoav Steinberg)
When reviewing, please look at all commits except the first one, which just replaces the 5.1.0 sources with 5.2.1.
Also, I think we should merge this without squashing, to preserve the changes we made to jemalloc.
- The `u` argument for `ar` is ignored (and generates warnings since `D` became the default).
All it does is avoid updating unchanged objects (it shouldn't have any impact on our build).
- Enable `LUA_USE_MKSTEMP` to force the use of `mkstemp()` instead of `tmpname()` (which is dead
code in redis anyway).
- Remove the unused variable `c` in `f_parser()`.
- Remove the misleadingly indented space in `luaL_loadfile()` and `addfield()`.
Co-authored-by: Oran Agra <oran@redislabs.com>
There's a rare case which leads to stagnation in the defragger, causing
it to keep scanning the keyspace and do nothing (not moving any
allocation). This happens when all the allocator slabs of a certain bin
have the same % utilization, but the slab from which new allocations are
made has a lower utilization.
This commit fixes it by removing the current slab from the overall
average utilization of the bin, eliminating any precision loss in
the utilization calculation, and moving the defrag decision to
reside inside jemalloc.
It also adds a test that consistently reproduces this issue.
The redis-cli command line tool and redis-sentinel service may be vulnerable
to integer overflow when parsing specially crafted large multi-bulk network
replies. This is a result of a vulnerability in the underlying hiredis
library which does not perform an overflow check before calling the calloc()
heap allocation function.
This issue only impacts systems with heap allocators that do not perform their
own overflow checks. Most modern systems do and are therefore not likely to
be affected. Furthermore, by default redis-sentinel uses the jemalloc allocator
which is also not vulnerable.
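For context, the class of fix involved is the standard pre-allocation overflow check; a generic, hedged illustration (not the exact hiredis patch):

```c
#include <stdint.h>
#include <stdlib.h>

/* Generic illustration: refuse array allocations whose element count times
 * element size would wrap around SIZE_MAX, instead of letting the
 * multiplication overflow before the allocator sees it. */
static void *checked_array_alloc(size_t nmemb, size_t size) {
    if (size != 0 && nmemb > SIZE_MAX / size) return NULL;
    return calloc(nmemb, size);
}
```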
Co-authored-by: Yossi Gottlieb <yossigo@gmail.com>
For many long strings that share a prefix extending beyond the
hashing limit, there will be many hash collisions, which results in
performance degradation when using commands like KEYS.
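To illustrate the failure mode (a hedged sketch; `prefix_hash` and its 32-byte limit are hypothetical, not the actual hash function): if the hash only consumes characters up to a fixed limit, any two keys sharing a prefix longer than that limit fall into the same bucket.

```c
#include <stdio.h>
#include <string.h>

#define LIMIT 32  /* hypothetical hashing limit */

/* Hypothetical hash that only looks at the first LIMIT characters. */
static unsigned int prefix_hash(const char *s, size_t len) {
    unsigned int h = 5381;
    size_t n = len < LIMIT ? len : LIMIT;
    for (size_t i = 0; i < n; i++) h = h * 33 + (unsigned char)s[i];
    return h;
}

int main(void) {
    /* Both keys share a prefix longer than LIMIT, so they collide. */
    const char *a = "user:0000000000000000000000000000:profile";
    const char *b = "user:0000000000000000000000000000:session";
    printf("%u %u\n", prefix_hash(a, strlen(a)), prefix_hash(b, strlen(b)));
    return 0;
}
```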
This is considered a safer approach, as it prevents a race condition that
could lead to chmod being executed on a different file.
Not a major risk, but CodeQL alerted on this, so it makes sense to fix it.
Change `val` to `unsigned char` before it is tested.
The fix is identical to the one made in upstream jemalloc.
The warning is:
src/malloc_io.c: In function ‘malloc_vsnprintf’:
src/malloc_io.c:369:2: warning: case label value exceeds maximum value for type
369 | case '?' | 0x80: \
| ^~~~
src/malloc_io.c:581:5: note: in expansion of macro ‘GET_ARG_NUMERIC’
581 | GET_ARG_NUMERIC(val, 'p');
| ^~~~~~~~~~~~~~~
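For reference, a minimal illustration of the warning's cause and the shape of the fix (not jemalloc's code): `'?' | 0x80` is 0xBF, which exceeds the range of a signed `char`, so the value being switched on must be treated as `unsigned char`.

```c
#include <stdio.h>

/* Illustrative only: mirrors the shape of the fix, not jemalloc's code. */
static void classify(int val) {
    unsigned char uc = (unsigned char)val;  /* test as unsigned char */
    switch (uc) {
    case '?' | 0x80:  /* 0xBF: in range for unsigned char, not signed char */
        puts("extended");
        break;
    default:
        puts("plain");
        break;
    }
}

int main(void) {
    classify('?' | 0x80);
    return 0;
}
```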
This fixes the issue described in CVE-2014-5461. At this time we cannot
confirm that the original issue has a real impact on Redis, but it is
included as an extra safety measure.
A first step toward enabling a consistent, full percentile analysis of query latency, so that we can fully understand the performance and stability characteristics of the redis-server system being measured. It also improves the instantaneous reported metrics and the CSV output format.
* Introduce a connection abstraction layer for all socket operations and
integrate it across the code base.
* Provide an optional TLS connections implementation based on OpenSSL.
* Pull a newer version of hiredis with TLS support.
* Tests, redis-cli updates for TLS support.
Background threads may run for a long time, especially when the # of dirty pages
is high. Avoid blocking stats calls because of this (which may cause latency
spikes).
see https://github.com/jemalloc/jemalloc/issues/1502
cherry picked from commit 1a71533511027dbe3f9d989659efeec446915d6b