Every time when accept a connection, we add the client to `server.clients` list's tail, but in `clientsCron` we rotate the tail to head at first, and then process the head. It means that the "new" client would be processed before "old" client, moreover if connections established and then freed frequently, the "old" client may have no chance to be processed.
To fix it, we need take a fair way to iterate the list, that is take the current head and process, and then rotate the head to tail, thus we can make sure all clients could be processed step by step.
p.s. client has `client_list_node` pointer, we don't need put the current client to head to avoid O(N) when remove it.
Since we do report the RDB error in below:
```
serverLog(LL_WARNING,"Error trying to save the DB, can't exit.");
if (server.supervised_mode == SUPERVISED_SYSTEMD)
redisCommunicateSystemd("STATUS=Error trying to save the DB, can't exit.\n");
goto error;
```
This may be an overlook in #6052
The code in aeProcessEvent was testing AE_DONT_WAIT flag at the wrong time.
The flag is set by by beforeSleep, but was was tested before calling beforeSleep,
which would result in aeProcessEvent waiting when it shouldn't have, impacting TLS's HasPendingData.
Co-authored-by: Oran Agra <oran@redislabs.com>
1. Check for missing schema only after the docs contain sentinel commands
2. The ignore-list in the C file contain only commands that cannot have a reply
schema. The one in the py file is an extension of that list
3. Temp: skipsentinel commands don't have a schema or test coverage yet,
add them to the py list
Solve CI error introduced by #12018
We currently do not allow the use of bool type in redis project.
We didn't catch it in script.c because we included hdr_histogram.h in server.h
Removing it (but still having it in some c files) reducing
the chance to miss the usage of bool type in the future and catch it
in compiling stage.
It also removes the dependency on hdr_histogram for every unit
that includes server.h
Minor test case addition for DECR and DECRBY.
Currently DECR and DECRBY do not have test case coverage for the
scenarios where they run on a non-existing key.
1. it's a bad idea to print these errors and exit after daemonization (if we'll exit,
the failure may not be detected)
2. it's not nice to exit (or even do the check that uses `fork`) after modules already
started (could create threads, or do some changes)
Some sentinel subcommands are missing the reply_schema in the json file,
so add the proper reply_schema part in json file as sentinel replicas commands.
The schema validator was skipping coverage test for sentinel commands, this was initially
done just in order to focus on redis commands and leave sentinel coverage for later,
so this check is now removed.
sentinel commands that were missing reply schema:
* sentinel masters
* sentinel myid
* sentinel sentinels <master-name>
* sentinel slaves (deprecated) <master-name>
In order to speed up tests, avoid saving an RDB (mostly notable on shutdown),
except for tests that explicitly test the RDB mechanism
In addition, use `shutdown-on-sigterm force` to prevetn shutdown from failing
in case the server is in the middle of the initial AOFRW
Also a a test that checks that the `shutdown-on-sigterm default` is to refuse
shutdown if there's an initial AOFRW
Co-authored-by: Guy Benoish <guy.benoish@redislabs.com>
This PR is to fix the compilation warnings and errors generated by the latest
complier toolchain, and to add a new runner of the latest toolchain for daily CI.
## Fix various compilation warnings and errors
1) jemalloc.c
COMPILER: clang-14 with FORTIFY_SOURCE
WARNING:
```
src/jemalloc.c:1028:7: warning: suspicious concatenation of string literals in an array initialization; did you mean to separate the elements with a comma? [-Wstring-concatenation]
"/etc/malloc.conf",
^
src/jemalloc.c:1027:3: note: place parentheses around the string literal to silence warning
"\"name\" of the file referenced by the symbolic link named "
^
```
REASON: the compiler to alert developers to potential issues with string concatenation
that may miss a comma,
just like #9534 which misses a comma.
SOLUTION: use `()` to tell the compiler that these two line strings are continuous.
2) config.h
COMPILER: clang-14 with FORTIFY_SOURCE
WARNING:
```
In file included from quicklist.c:36:
./config.h:319:76: warning: attribute declaration must precede definition [-Wignored-attributes]
char *strcat(char *restrict dest, const char *restrict src) __attribute__((deprecated("please avoid use of unsafe C functions. prefer use of redis_strlcat instead")));
```
REASON: Enabling _FORTIFY_SOURCE will cause the compiler to use `strcpy()` with check,
it results in a deprecated attribute declaration after including <features.h>.
SOLUTION: move the deprecated attribute declaration from config.h to fmacro.h before "#include <features.h>".
3) networking.c
COMPILER: GCC-12
WARNING:
```
networking.c: In function ‘addReplyDouble.part.0’:
networking.c:876:21: warning: writing 1 byte into a region of size 0 [-Wstringop-overflow=]
876 | dbuf[start] = '$';
| ^
networking.c:868:14: note: at offset -5 into destination object ‘dbuf’ of size 5152
868 | char dbuf[MAX_LONG_DOUBLE_CHARS+32];
| ^
networking.c:876:21: warning: writing 1 byte into a region of size 0 [-Wstringop-overflow=]
876 | dbuf[start] = '$';
| ^
networking.c:868:14: note: at offset -6 into destination object ‘dbuf’ of size 5152
868 | char dbuf[MAX_LONG_DOUBLE_CHARS+32];
```
REASON: GCC-12 predicts that digits10() may return 9 or 10 through `return 9 + (v >= 1000000000UL)`.
SOLUTION: add an assert to let the compiler know the possible length;
4) redis-cli.c & redis-benchmark.c
COMPILER: clang-14 with FORTIFY_SOURCE
WARNING:
```
redis-benchmark.c:1621:2: warning: embedding a directive within macro arguments has undefined behavior [-Wembedded-directive] #ifdef USE_OPENSSL
redis-cli.c:3015:2: warning: embedding a directive within macro arguments has undefined behavior [-Wembedded-directive] #ifdef USE_OPENSSL
```
REASON: when _FORTIFY_SOURCE is enabled, the compiler will use the print() with
check, which is a macro. this may result in the use of directives within the macro, which
is undefined behavior.
SOLUTION: move the directives-related code out of `print()`.
5) server.c
COMPILER: gcc-13 with FORTIFY_SOURCE
WARNING:
```
In function 'lookupCommandLogic',
inlined from 'lookupCommandBySdsLogic' at server.c:3139:32:
server.c:3102:66: error: '*(robj **)argv' may be used uninitialized [-Werror=maybe-uninitialized]
3102 | struct redisCommand *base_cmd = dictFetchValue(commands, argv[0]->ptr);
| ~~~~^~~
```
REASON: The compiler thinks that the `argc` returned by `sdssplitlen()` could be 0,
resulting in an empty array of size 0 being passed to lookupCommandLogic.
this should be a false positive, `argc` can't be 0 when strings are not NULL.
SOLUTION: add an assert to let the compiler know that `argc` is positive.
6) sha1.c
COMPILER: gcc-12
WARNING:
```
In function ‘SHA1Update’,
inlined from ‘SHA1Final’ at sha1.c:195:5:
sha1.c:152:13: warning: ‘SHA1Transform’ reading 64 bytes from a region of size 0 [-Wstringop-overread]
152 | SHA1Transform(context->state, &data[i]);
| ^
sha1.c:152:13: note: referencing argument 2 of type ‘const unsigned char[64]’
sha1.c: In function ‘SHA1Final’:
sha1.c:56:6: note: in a call to function ‘SHA1Transform’
56 | void SHA1Transform(uint32_t state[5], const unsigned char buffer[64])
| ^
In function ‘SHA1Update’,
inlined from ‘SHA1Final’ at sha1.c:198:9:
sha1.c:152:13: warning: ‘SHA1Transform’ reading 64 bytes from a region of size 0 [-Wstringop-overread]
152 | SHA1Transform(context->state, &data[i]);
| ^
sha1.c:152:13: note: referencing argument 2 of type ‘const unsigned char[64]’
sha1.c: In function ‘SHA1Final’:
sha1.c:56:6: note: in a call to function ‘SHA1Transform’
56 | void SHA1Transform(uint32_t state[5], const unsigned char buffer[64])
```
REASON: due to the bug[https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80922], when
enable LTO, gcc-12 will not see `diagnostic ignored "-Wstringop-overread"`, resulting in a warning.
SOLUTION: temporarily set SHA1Update to noinline to avoid compiler warnings due
to LTO being enabled until the above gcc bug is fixed.
7) zmalloc.h
COMPILER: GCC-12
WARNING:
```
In function ‘memset’,
inlined from ‘moduleCreateContext’ at module.c:877:5,
inlined from ‘RM_GetDetachedThreadSafeContext’ at module.c:8410:5:
/usr/include/x86_64-linux-gnu/bits/string_fortified.h:59:10: warning: ‘__builtin_memset’ writing 104 bytes into a region of size 0 overflows the destination [-Wstringop-overflow=]
59 | return __builtin___memset_chk (__dest, __ch, __len,
```
REASON: due to the GCC-12 bug [https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96503],
GCC-12 cannot see alloc_size, which causes GCC to think that the actual size of memory
is 0 when checking with __glibc_objsize0().
SOLUTION: temporarily set malloc-related interfaces to `noinline` to avoid compiler warnings
due to LTO being enabled until the above gcc bug is fixed.
## Other changes
1) Fixed `ps -p [pid]` doesn't output `<defunct>` when using procps 4.x causing `replication
child dies when parent is killed - diskless` test to fail.
2) Add a new fortify CI with GCC-13 and ubuntu-lunar docker image.
Minor test case addition.
Currently GETRANGE command does not have the test case coverage for the scenarios:
An error is returned when key exists but of different type
Added missing test cases for getrange command.
The nightly tests showed that the recent PR #12022 caused random failures
in aof.tcl on checking RDB preamble inside an AOF file.
Root cause:
When checking RDB preamble in an AOF file, what's passed into redis_check_rdb is
aof_filename, not aof_filepath. The newly introduced isFifo function does not check return
status of the stat call and hence uses the uninitailized stat_p object.
Fix:
1. Fix isFifo by checking stat call's return code.
2. Pass aof_filepath instead of aof_filename to redis_check_rdb.
3. move the FIFO check to rdb.c since the limitation is the re-opening of the file, and not
anything specific about redis-check-rdb.
this pr fix two wrongs:
1. When client’s querybuf is pre-allocated for a fat argv, we need to update the
querybuf_peak of the client immediately to completely avoid the unexpected
shrinking of querybuf in the next clientCron (before data arrives to set the peak).
2. the protocol's bulklen does not include `\r\n`, but the allocation and the data we
read does. so in `clientsCronResizeQueryBuffer`, the `resize` or `querybuf_peak`
should add these 2 bytes.
the first bug is likely to hit us on large payloads over slow connections, in which case
transferring the payload can take longer and a cron event will be triggered (specifically
if there are not a lot of clients)
When loading RDB over the named piped, redis_check_rdb() is hung at fopen, because fopen blocks until another process opens the FIFO for writing. The fix is to check if RDB is FIFO. If yes, return an error.
There is are some missing test cases for SUBSTR command.
These might already be covered by GETRANGE, but no harm in adding them since they're simple.
Added 3 test case.
* start > stop
* start and stop both greater than string length
* when no key is present.
Currently LINSERT command does not have the test case coverage for following scenarios.
1. When key does not exist, it is considered an empty list and no operation is performed.
2. An error is returned when key exists but does not hold a list value.
Added above two missing test cases for linsert command.
In daily.yml, if the input suggests we don't run the full testsuite,
do not pass --fail-commands-not-all-hit to the validator.
This fixes the first point in #11954. Credit goes to the comment
on the open issue for GH actions: actions/runner#409
Also improve prints to show the dispatch arguments in every job.
* Add RM_ReplyWithErrorFormat that can support format
Reply with the error create from a printf format and arguments.
If the error code is already passed in the string 'fmt', the error
code provided is used, otherwise the string "-ERR " for the generic
error code is automatically added.
The usage is, for example:
RedisModule_ReplyWithErrorFormat(ctx, "An error: %s", "foo");
RedisModule_ReplyWithErrorFormat(ctx, "-WRONGTYPE Wrong Type: %s", "foo");
The function always returns REDISMODULE_OK.
The MacOS CI in github actions often hangs without any logs. GH argues that
it's due to resource utilization, either running out of disk space, memory, or CPU
starvation, and thus the runner is terminated.
This PR contains multiple attempts to resolve this:
1. introducing pause_process instead of SIGSTOP, which waits for the process
to stop before resuming the test, possibly resolving race conditions in some tests,
this was a suspect since there was one test that could result in an infinite loop in that
case, in practice this didn't help, but still a good idea to keep.
2. disable the `save` config in many tests that don't need it, specifically ones that use
heavy writes and could create large files.
3. change the `populate` proc to use short pipeline rather than an infinite one.
4. use `--clients 1` in the macos CI so that we don't risk running multiple resource
demanding tests in parallel.
5. enable `--verbose` to be repeated to elevate verbosity and print more info to stdout
when a test or a server starts.
We do have ZREMRANGEBYLEX tests, but it is a stress test
marked with slow tag and then skipped in reply-schemas daily.
In the past, we were able to succeed on a daily, i guess
it was because there were some random command executions,
such as corrupt-dump-fuzzy, which might call it.
These test examples are taken from ZRANGEBYLEX basics test.
## Issue
When we use GCC-12 later or clang 9.0 later to build with `-D_FORTIFY_SOURCE=3`,
we can see the following buffer overflow:
```
=== REDIS BUG REPORT START: Cut & paste starting from here ===
6263:M 06 Apr 2023 08:59:12.915 # Redis 255.255.255 crashed by signal: 6, si_code: -6
6263:M 06 Apr 2023 08:59:12.915 # Crashed running the instruction at: 0x7f03d59efa7c
------ STACK TRACE ------
EIP:
/lib/x86_64-linux-gnu/libc.so.6(pthread_kill+0x12c)[0x7f03d59efa7c]
Backtrace:
/lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x7f03d599b520]
/lib/x86_64-linux-gnu/libc.so.6(pthread_kill+0x12c)[0x7f03d59efa7c]
/lib/x86_64-linux-gnu/libc.so.6(raise+0x16)[0x7f03d599b476]
/lib/x86_64-linux-gnu/libc.so.6(abort+0xd3)[0x7f03d59817f3]
/lib/x86_64-linux-gnu/libc.so.6(+0x896f6)[0x7f03d59e26f6]
/lib/x86_64-linux-gnu/libc.so.6(__fortify_fail+0x2a)[0x7f03d5a8f76a]
/lib/x86_64-linux-gnu/libc.so.6(+0x1350c6)[0x7f03d5a8e0c6]
src/redis-server 127.0.0.1:25111(+0xd5e80)[0x557cddd3be80]
src/redis-server 127.0.0.1:25111(feedReplicationBufferWithObject+0x78)[0x557cddd3c768]
src/redis-server 127.0.0.1:25111(replicationFeedSlaves+0x1a4)[0x557cddd3cbc4]
src/redis-server 127.0.0.1:25111(+0x8721a)[0x557cddced21a]
src/redis-server 127.0.0.1:25111(call+0x47a)[0x557cddcf38ea]
src/redis-server 127.0.0.1:25111(processCommand+0xbf4)[0x557cddcf4aa4]
src/redis-server 127.0.0.1:25111(processInputBuffer+0xe6)[0x557cddd22216]
src/redis-server 127.0.0.1:25111(readQueryFromClient+0x3a8)[0x557cddd22898]
src/redis-server 127.0.0.1:25111(+0x1b9134)[0x557cdde1f134]
src/redis-server 127.0.0.1:25111(aeMain+0x119)[0x557cddce5349]
src/redis-server 127.0.0.1:25111(main+0x466)[0x557cddcd6716]
/lib/x86_64-linux-gnu/libc.so.6(+0x29d90)[0x7f03d5982d90]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80)[0x7f03d5982e40]
src/redis-server 127.0.0.1:25111(_start+0x25)[0x557cddcd7025]
```
The main reason is that when FORTIFY_SOURCE is enabled, GCC or clang will enhance some
common functions, such as `strcpy`, `memcpy`, `fgets`, etc, so that they can detect buffer
overflow errors and stop program execution, thus improving the safety of the program.
We use `zmalloc_usable_size()` everywhere to use memory blocks, but that is an abuse since the
malloc_usable_size() isn't meant for this kind of use, it is for diagnostics only. That is also why the
behavior is flaky when built with _FORTIFY_SOURCE, the compiler can sense that we reach outside
the allocated block and SIGABRT.
### Solution
If we need to use the additional memory we got, we need to use a dummy realloc with `alloc_size` attribute
and no inlining, (see `extend_to_usable`) to let the compiler see the large of memory we need to use.
This can either be an implicit call inside `z*usable` that returns the size, so that the caller doesn't have any
other worry, or it can be a normal zmalloc call which means that if the caller wants to use
zmalloc_usable_size it must also use extend_to_usable.
### Changes
This PR does the following:
1) rename the current z[try]malloc_usable family to z[try]malloc_internal and don't expose them to users outside zmalloc.c,
2) expose a new set of `z[*]_usable` family that use z[*]_internal and `extend_to_usable()` implicitly, the caller gets the
size of the allocation and it is safe to use.
3) go over all the users of `zmalloc_usable_size` and convert them to use the `z[*]_usable` family if possible.
4) in the places where the caller can't use `z[*]_usable` and store the real size, and must still rely on zmalloc_usable_size,
we still make sure that the allocation used `z[*]_usable` (which has a call to `extend_to_usable()`) and ignores the
returning size, this way a later call to `zmalloc_usable_size` is still safe.
[4] was done for module.c and listpack.c, all the others places (sds, reply proto list, replication backlog, client->buf)
are using [3].
Co-authored-by: Oran Agra <oran@redislabs.com>
Add `RM_RdbLoad()` and `RM_RdbSave()` to load/save RDB files from the module API.
In our use case, we have our clustering implementation as a module. As part of this
implementation, the module needs to trigger RDB save operation at specific points.
Also, this module delivers RDB files to other nodes (not using Redis' replication).
When a node receives an RDB file, it should be able to load the RDB. Currently,
there is no module API to save/load RDB files.
This PR adds four new APIs:
```c
RedisModuleRdbStream *RM_RdbStreamCreateFromFile(const char *filename);
void RM_RdbStreamFree(RedisModuleRdbStream *stream);
int RM_RdbLoad(RedisModuleCtx *ctx, RedisModuleRdbStream *stream, int flags);
int RM_RdbSave(RedisModuleCtx *ctx, RedisModuleRdbStream *stream, int flags);
```
The first step is to create a `RedisModuleRdbStream` object. This PR provides a function to
create RedisModuleRdbStream from the filename. (You can load/save RDB with the filename).
In the future, this API can be extended if needed:
e.g., `RM_RdbStreamCreateFromFd()`, `RM_RdbStreamCreateFromSocket()` to save/load
RDB from an `fd` or a `socket`.
Usage:
```c
/* Save RDB */
RedisModuleRdbStream *stream = RedisModule_RdbStreamCreateFromFile("example.rdb");
RedisModule_RdbSave(ctx, stream, 0);
RedisModule_RdbStreamFree(stream);
/* Load RDB */
RedisModuleRdbStream *stream = RedisModule_RdbStreamCreateFromFile("example.rdb");
RedisModule_RdbLoad(ctx, stream, 0);
RedisModule_RdbStreamFree(stream);
```
This test produces 1GB of data and moves it around, and was expecting less
than 500kb to be present in the system page cache.
It sometimes fails with up to some 6mb in the page cache (0 in the actual RDB files),
increasing the threshold. It looks like some background tasks in the container are
occupying the page cache.
It is safe to ignore the above since we also explicitly check the pages of our dump.rdb
are not cached (matching `vmtouch -v` to `0%`).
An additional fix is to match ` 0%` (add space), so that we don't successfully match `10%`.
details in https://github.com/redis/redis/pull/11818
Redis supports syslog integration via these directives, documented in redis.conf.
While these directives are not documented in sentinel.conf, they do work with Redis-Sentinel.
It took me a while to realize this, adding them to make it clear.
Fix the following config file error
```
*** FATAL CONFIG FILE ERROR (Redis 6.2.7) ***
Reading the configuration file, at line 152
>>> 'sentinel known-replica XXXX 127.0.0.1 5001'
Duplicate hostname and port for replica.
```
that is happening when a user uses the legacy key "known-slave" in
the config file and a config rewrite occurs. The config rewrite logic won't
replace the old line "sentinel known-slave XXXX 127.0.0.1 5001" and
would add a new line with "sentinel known-replica XXXX 127.0.0.1 5001"
which results in the error above "Duplicate hostname and port for replica."
example:
Current sentinal.conf
```
...
sentinel known-slave XXXX 127.0.0.1 5001
sentinel example-random-option X
...
```
after the config rewrite logic runs:
```
....
sentinel known-slave XXXX 127.0.0.1 5001
sentinel example-random-option X
# Generated by CONFIG REWRITE
sentinel known-replica XXXX 127.0.0.1 5001
```
This bug only exists in Redis versions >=6.2 because prior to that it was hidden
by the effects of this bug https://github.com/redis/redis/issues/5388 that was fixed
in https://github.com/redis/redis/pull/8271 and was released in versions >=6.2
In cluster mode, when a node restart as a replica, it doesn't immediately
sync with the master, replication is enabled in clusterCron. It means that
sometime server.masterhost is NULL and we wrongly judge it in beforeSleep.
In this case, we may trigger a fast activeExpireCycle in beforeSleep, but the
node's flag is actually a replica, that can lead to data inconsistency. In this
PR, we use iAmMaster to replace the `server.masterhost == NULL`
This is an overlook in #7001, and more discussion in #11783.
In the Redis 7.0 and newer version,
config set command support multiply `<parameter> <value>` pairs, thus the previous
sensitive command condition does not apply anymore
For example:
The command:
**config set maxmemory 1GB masteruser aa** will be written to redis_cli historyfile
In this PR, we update the condition for these sensitive commands
config set masteruser <username>
config set masterauth <master-password>
config set requirepass foobared
The existing logic for killing pub-sub clients did not handle the `allchannels`
permission correctly. For example, if you:
ACL SETUSER foo allchannels
Have a client authenticate as the user `foo` and subscribe to a channel, and then:
ACL SETUSER foo resetchannels
The subscribed client would not be disconnected, though new clients under that user
would be blocked from subscribing to any channels.
This was caused by an incomplete optimization in `ACLKillPubsubClientsIfNeeded`
checking whether the new channel permissions were a strict superset of the old ones.
Now that the command argument specs are available at runtime (#9656), this PR addresses
#8084 by implementing a complete solution for command-line hinting in `redis-cli`.
It correctly handles nearly every case in Redis's complex command argument definitions, including
`BLOCK` and `ONEOF` arguments, reordering of optional arguments, and repeated arguments
(even when followed by mandatory arguments). It also validates numerically-typed arguments.
It may not correctly handle all possible combinations of those, but overall it is quite robust.
Arguments are only matched after the space bar is typed, so partial word matching is not
supported - that proved to be more confusing than helpful. When the user's current input
cannot be matched against the argument specs, hinting is disabled.
Partial support has been implemented for legacy (pre-7.0) servers that do not support
`COMMAND DOCS`, by falling back to a statically-compiled command argument table.
On startup, if the server does not support `COMMAND DOCS`, `redis-cli` will now issue
an `INFO SERVER` command to retrieve the server version (unless `HELLO` has already
been sent, in which case the server version will be extracted from the reply to `HELLO`).
The server version will be used to filter the commands and arguments in the command table,
removing those not supported by that version of the server. However, the static table only
includes core Redis commands, so with a legacy server hinting will not be supported for
module commands. The auto generated help.h and the scripts that generates it are gone.
Command and argument tables for the server and CLI use different structs, due primarily
to the need to support different runtime data. In order to generate code for both, macros
have been added to `commands.def` (previously `commands.c`) to make it possible to
configure the code generation differently for different use cases (one linked with redis-server,
and one with redis-cli).
Also adding a basic testing framework for the command hints based on new (undocumented)
command line options to `redis-cli`: `--test_hint 'INPUT'` prints out the command-line hint for
a given input string, and `--test_hint_file <filename>` runs a suite of test cases for the hinting
mechanism. The test suite is in `tests/assets/test_cli_hint_suite.txt`, and it is run from
`tests/integration/redis-cli.tcl`.
Co-authored-by: Oran Agra <oran@redislabs.com>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
In #11012, we changed the way command durations were computed to handle the same command being executed multiple times. This commit fixes some misses from that commit.
* Wait commands were not correctly reporting their duration if the timeout was reached.
* Multi/scripts/and modules with RM_Call were not properly resetting the duration between inner calls, leading to them reporting cumulative duration.
* When a blocked client is freed, the call and duration are always discarded.
This commit also adds an assert if the duration is not properly reset, potentially indicating that a report to call statistics was missed. The assert potentially be removed in the future, as it's mainly intended to detect misses in tests.
This is an attempt to normalize/formalize command summaries.
Main actions performed:
* Starts with the continuation of the phrase "The XXXX command, when called, ..." for user commands.
* Starts with "An internal command...", "A container command...", etc... when applicable.
* Always uses periods.
* Refrains from referring to other commands. If this is needed, backquotes should be used for command names.
* Tries to be very clear about the data type when applicable.
* Tries to mention additional effects, e.g. "The key is created if it doesn't exist" and "The set is deleted if the last member is removed."
* Prefers being terse over verbose.
* Tries to be consistent.
This PR fix several unrelated bugs that were discovered by the same set of tests
(WAITAOF tests in #11713), could make the `WAITAOF` test hang.
The change in `backgroundRewriteDoneHandler` is about MP-AOF.
That leftover / old code assumes that we started a new AOF file just now
(when we have a new base into which we're gonna incrementally write), but
the fact is that with MP-AOF, the fork done handler doesn't really affect the
incremental file being maintained by the parent process, there's no reason to
re-issue `SELECT`, and no reason to update any of the fsync variables in that flow.
This should have been deleted with MP-AOF (introduced in #9788, 7.0).
The damage is that the update to `aof_fsync_offset` will cause us to miss an fsync
in `flushAppendOnlyFile`, that happens if we stop write commands in `AOF_FSYNC_EVERYSEC`
while an AOFRW is in progress. This caused a new `WAITAOF` test to sometime hang forever.
Also because of MP-AOF, we needed to change `aof_fsync_offset` to `aof_last_incr_fsync_offset`
and match it to `aof_last_incr_size` in `flushAppendOnlyFile`. This is because in the past we compared
`aof_fsync_offset` and `aof_current_size`, but with MP-AOF it could be the total AOF file will be
smaller after AOFRW, and the (already existing) incr file still has data that needs to be fsynced.
The change in `flushAppendOnlyFile`, about the `AOF_FSYNC_ALWAYS`, it is follow #6053
(the details is in #5985), we also check `AOF_FSYNC_ALWAYS` to handle a case where
appendfsync is changed from everysec to always while there is data that's written but not yet fsynced.
Starting with the recent #11926 Makefile specifies `-flto=auto` which is unsupported on clang.
Additionally, detecting clang correctly requires actually running it, since on MacOS gcc can be an alias for clang.
This test fails sporadically:
```
*** [err]: Migrate the last slot away from a node using redis-cli in tests/unit/cluster/cli.tcl
cluster size did not reach a consistent size 4
```
I guess the time (5s) of wait_for_cluster_size is not enough,
usually, the waiting time for our other tests for cluster
consistency is 50s, so also changing it to 50s.
Since we remove the COMMAND COUNT call in sentinel test in #11950,
reply-schemas-validator started reporting this error:
```
WARNING! The following commands were not hit at all:
command|count
ERROR! at least one command was not hit by the tests
```
This PR add a COMMAND COUNT test to cover it and also fix some
typos in req-res-log-validator.py
The reply schema validator is failing since the recent changes to introspection.tcl that use the RESET command, this happens because this test forces RESP3, but RESET command didn't respect that and set back RESP2.
The sanity check test intention was to detect that when a command is
added to sentinel it is on purpose. This test is easily broken, like
CLIENT SETINFO introduced by #11758.
We replace it with a test that validates that a few specific commands
are either there or missing (to test the infrastructure works correctly).