This PR introduces a couple of changes to improve cluster test stability:
1. Increase the cluster node timeout to 3 seconds, which is similar to the
normal cluster tests, but introduce a new mechanism to increase the ping
period so that the tests are still fast. This new config is a debug config.
2. Set `cluster-replica-no-failover yes` on a wider array of tests which are
sensitive to failovers. This was occurring on the ARM CI.
There is a race condition in the test:
```
*** [err]: redis-cli --cluster add-node with cluster-port in tests/unit/cluster/cli.tcl
Expected '5' to be equal to '4' {assert_equal 5 [CI 0 cluster_known_nodes]} proc ::test)
```
When using cli to add node, there can potentially be a race condition
in which all nodes presenting cluster state o.k even though the added
node did not yet meet all cluster nodes.
This comment and the fix were taken from #11221. Also apply it in several
other similar places.
Optimization update from -O2 to -O3 -flto gives up to 5% performance gain
in 'redis-benchmarks-spec-client-runner' tests geomean where GCC 9.4.0 is used for build
* Fix for false-positive warning in bitops.c
Warning appeared with O3, on CentOS during inlininig procedure
* Fixed unitialized streamID within streamTrim() (#1)
Co-authored-by: filipe oliveira <filipecosta.90@gmail.com>
- fix `the the` typo
- `LPOPRPUSH` does not exist, should be `RPOPLPUSH`
- `CLUSTER GETKEYINSLOT` 's time complexity should be O(N)
- `there bytes` should be `three bytes`, this closes#11266
- `slave` word to `replica` in log, modified the front and missed the back
- remove useless aofReadDiffFromParent in server.h
- `trackingHandlePendingKeyInvalidations` method adds a void parameter
- BITOP: turn argument `operation` from string to oneof
- CLIENT KILL: turn argument `skipme` from string to oneof
- COMMAND GETKEYS / GETKEYSANDFLAGS: change arguments to optional, and change arity to -3 (fixes regression in redis 7.0)
- CLIENT PAUSE: this command was added in v3.0.0
Better redis-cli hints for commands that take other commands as arguments.
```
command getkeysandflags hello [protover [AUTH username password]]
acl dryrun user hello [protover [AUTH username password]]
```
If Redis crashes due to calling an invalid function pointer,
the `backtrace` function will try to dereference this invalid pointer
which will cause a crash inside the crash report and will kill
the processes without having all the crash report information.
Example:
```
=== REDIS BUG REPORT START: Cut & paste starting from here ===
198672:M 19 Sep 2022 18:06:12.936 # Redis 255.255.255 crashed by signal: 11, si_code: 1
198672:M 19 Sep 2022 18:06:12.936 # Accessing address: 0x1
198672:M 19 Sep 2022 18:06:12.936 # Crashed running the instruction at: 0x1
// here the processes is crashing
```
This PR tries to fix this crash be:
1. Identify the issue when it happened.
2. Replace the invalid pointer with a pointer to some dummy function
so that `backtrace` will not crash.
I identification is done by comparing `eip` to `info->si_addr`, if they
are the same we know that the crash happened on the same address it tries to
accesses and we can conclude that it tries to call and invalid function pointer.
To replace the invalid pointer we introduce a new function, `setMcontextEip`,
which is very similar to `getMcontextEip` and it knows to set the Eip for the
different supported OS's. After printing the trace we retrieve the old `Eip` value.
Mainly fix two minor bug
1. When handle BL*POP/BLMOVE commands with blocked clients, we should increment server.dirty.
2. `listPopRangeAndReplyWithKey()` in `serveClientBlockedOnList()` should not repeat calling
`signalModifiedKey()` which has been called when an element was pushed on the list.
(was skipped in all bpop commands, other than blmpop)
Other optimization
add `signal` param for `listElementsRemoved` to skip `signalModifiedKey()` to unify all pop operation.
Unifying all pop operations also prepares us for #11303, so that we can avoid having to deal with the
conversion from quicklist to listpack() in various places when the list shrinks.
The original idea behind auto-setting the default (first,last,step) spec was to use
the most "open" flags when the user didn't provide any key-spec flags information.
While the above idea is a good approach, it really makes no sense to set
CMD_KEY_VARIABLE_FLAGS if the user didn't provide the getkeys-api flag:
in this case there's not way to retrieve these variable flags, so what's the point?
Internally in redis there was code to ignore this already, so this fix doesn't change
redis's behavior, it only affects the output of COMMAND command.
SunOS/Solaris and its relatives don't support the flock() function.
While "redis" has been excluding setting up the lock using flock() on the cluster
configuration file when compiling under Solaris, it was still using flock() in the
unlock call while shutting down.
This pull request eliminates the flock() call also in the unlocking stage
for Oracle Solaris and its relatives.
Fix compilation regression from #10912
If a command gets an OOM response and then if we set maxmemory to zero
to disable the limit, server.pre_command_oom_state never gets updated
and it stays true. As RM_Call() calls with "respect deny-oom" flag checks
server.pre_command_oom_state, all calls will fail with OOM.
Added server.maxmemory check in RM_Call() to process deny-oom flag
only if maxmemory is configured.
* Fix CLUSTER SHARDS showing empty hostname
In #10290, we changed clusterNode hostname from `char*`
to `sds`, and the old `node->hostname` was changed to
`sdslen(node->hostname)!=0`.
But in `addNodeDetailsToShardReply` it is missing.
It results in the return of an empty string hostname
in CLUSTER SHARDS command if it unavailable.
Like this (note that we listed it as optional in the doc):
```
9) "hostname"
10) ""
```
Adds a number of user management/ACL validaiton/command execution functions to improve a
Redis module's ability to enforce ACLs correctly and easily.
* RM_SetContextUser - sets a RedisModuleUser on the context, which RM_Call will use to both
validate ACLs (if requested and set) as well as assign to the client so that scripts executed via
RM_Call will have proper ACL validation.
* RM_SetModuleUserACLString - Enables one to pass an entire ACL string, not just a single OP
and have it applied to the user
* RM_GetModuleUserACLString - returns a stringified version of the user's ACL (same format as dump
and list). Contains an optimization to cache the stringified version until the underlying ACL is modified.
* Slightly re-purpose the "C" flag to RM_Call from just being about ACL check before calling the
command, to actually running the command with the right user, so that it also affects commands
inside EVAL scripts. see #11231
Executing an XAUTOCLAIM command on a stream key in a specific state, with a
specially crafted COUNT argument may cause an integer overflow, a subsequent
heap overflow, and potentially lead to remote code execution.
The problem affects Redis versions 7.0.0 or newer.
The bug is that the the server keeps on sending newlines to the client.
As a result, the receiver might not find the EOF marker since it searches
for it only on the end of each payload it reads from the socket.
The but only affects `CLIENT_REPL_RDBONLY`.
This affects `redis-cli --rdb` (depending on timing)
The fixed consist of two steps:
1. The `CLIENT_REPL_RDBONLY` should be closed ASAP (we cannot
always call to `freeClient` so we use `freeClientAsync`)
2. Add new replication state `SLAVE_STATE_RDB_TRANSMITTED`
Starting from 6.2, after ACL SETUSER user reset, the user
will carry the sanitize-payload flag. It was added in #7807,
and then ACL SETUSER reset is inconsistent with default
newly created user which missing sanitize-payload flag.
Same as `off` and `on` these two bits are mutually exclusive,
the default created user needs to have sanitize-payload flag.
Adds USER_FLAG_SANITIZE_PAYLOAD flag to ACLCreateUser.
Note that the bug don't have any real implications,
since the code in rdb.c (rdbLoadObject) checks for
`USER_FLAG_SANITIZE_PAYLOAD_SKIP`, so the fact that
`USER_FLAG_SANITIZE_PAYLOAD` is missing doesn't really matters.
Added tests to make sure it won't be broken in the future,
and updated the comment in ACLSetUser and redis.conf
When using `INFO ALL <section>`, when `section` is a specific module section.
Redis will not print the additional section(s).
The fix in this case, will search the modules info sections if the user provided additional sections to `ALL`.
Co-authored-by: Oran Agra <oran@redislabs.com>
This bug was introduced in #10344 (7.0.3), and it breaks the
redis-cli --cluster create usage in #10436 (7.0 RC3).
At the same time, the cluster-port support introduced in #10344
cannot use the DNS lookup brought by #10436.
This PR mainly deals with 2 crashes introduced in #9357,
and fix the QUICKLIST-PACKED-THRESHOLD mess in external test mode.
1. Fix crash due to deleting an entry from a compress quicklistNode
When inserting a large element, we need to create a new quicklistNode first,
and then delete its previous element, if the node where the deleted element is
located is compressed, it will cause a crash.
Now add `dont_compress` to quicklistNode, if we want to use a quicklistNode
after some operation, we can use this flag like following:
```c
node->dont_compress = 1; /* Prevent to be compressed */
some_operation(node); /* This operation might try to compress this node */
some_other_operation(node); /* We can use this node without decompress it */
node->dont_compress = 0; /* Re-able compression */
quicklistCompressNode(node);
```
Perhaps in the future, we could just disable the current entry from being
compressed during the iterator loop, but that would require more work.
2. Fix crash due to wrongly split quicklist
before #9357, the offset param of _quicklistSplitNode() will not negative.
For now, when offset is negative, the split extent will be wrong.
following example:
```c
int orig_start = after ? offset + 1 : 0;
int orig_extent = after ? -1 : offset;
int new_start = after ? 0 : offset;
int new_extent = after ? offset + 1 : -1;
# offset: -2, after: 1, node->count: 2
# current wrong range: [-1,-1] [0,-1]
# correct range: [1,-1] [0, 1]
```
Because only `_quicklistInsert()` splits the quicklistNode and only
`quicklistInsertAfter()`, `quicklistInsertBefore()` call _quicklistInsert(),
so `quicklistReplaceEntry()` and `listTypeInsert()` might occur this crash.
But the iterator of `listTypeInsert()` is alway from head to tail(iter->offset is
always positive), so it is not affected.
The final conclusion is this crash only occur when we insert a large element
with negative index into a list, that affects `LSET` command and `RM_ListSet`
module api.
3. In external test mode, we need to restore quicklist packed threshold after
when the end of test.
4. Show `node->count` in quicklistRepr().
5. Add new tcl proc `config_get_set` to support restoring config in tests.
This bug is introduced in #7653. (Redis 6.2.0)
When `server.maxmemory_eviction_tenacity` is 100, `eviction_time_limit_us` is
`ULONG_MAX`, and if we cannot find the best key to delete (e.g. maxmemory-policy
is `volatile-lru` and all keys with ttl have been evicted), in `cant_free` redis will sleep
forever if some items are being freed in the lazyfree thread.
Just noticed that there are some inaccurate, or at least confusing information about `repl-diskless-load` in `redis.conf`
It shouldn't scare away users willing to spend the extra memory.
`may mean that we have to flush the contents of the current database before the full rdb was received.`: this is likely related to the time when there was an option `always`, where content on replica was flushed before loading from master.
For the stream data type, some commands, such as **XGROUP CREATE, XGROUP DESTROY, XGROUP CREATECONSUMER,
XGROUP DELCONSUMER and XINFO CONSUMERS** use groupname and consumername in the command description;
However, for the commands **XREADGROUP GROUP, XPENDING, XACK , XCLAIM and XAUTOCLAIM** use term "group and consumer", clients could be confused.
This PR goal is to unify all the commands to groupname and consumername.
When using cli to add node, there can potentially be a race condition in
which all nodes presenting cluster state o.k even though the added node
did not yet meet all cluster nodes.
this adds another utility function to wait until all cluster nodes see the same cluster size
When Redis is built without TLS support, connectionTypeTls() function
keeps searching connection type as cached connection type is NULL.
Added another variable to track if we cached the connection type to
prevent search after the first time.
Noticed a log warning message is printed repeatedly by connectionTypeTls.
Co-authored-by: zhenwei pi <pizhenwei@bytedance.com>
Co-authored-by: Oran Agra <oran@redislabs.com>
EVAL scripts are by default not considered `write` commands, so they were allowed on a replica.
But when adding a shebang, they become `write` command (unless the `no-writes` flag is added).
With this change we'll handle them as write commands, and reply with MOVED instead of
READONLY when executed on a redis cluster replica.
Co-authored-by: chendianqiang <chendianqiang@meituan.com>
Add a new "D" flag to RM_Call which runs whatever verification the user requests,
but returns before the actual execution of the command.
It automatically enables returning error messages as CallReply objects to distinguish
success (NULL) from failure (CallReply returned).
micro optimizations, giving the hints that the returned addresses
are guaranteed to be unique. The alloc_size attribute gives an extra hint
about the source of the size, useful mostly for calloc-like calls or when there
are extra arguments.
When RM_Call was used with `M` (reject OOM), `W` (reject writes),
as well as `S` (rejecting stale or write commands in "Script mode"),
it would have only checked the command flags, but not the declared
script flag in case it's a command that runs a script.
Refactoring: extracts out similar code in server.c's processCommand
to be usable in RM_Call as well.
Bugfix:
with the scenario if we force assigned a slot to other master,
old master will lose the slot ownership, then old master will
call the function delKeysInSlot() to delete all keys which in
the slot. These delete operations should replicate to replicas,
avoid the data divergence issue in master and replicas.
Additionally, in this case, we now call:
* signalModifiedKey (to invalidate WATCH)
* moduleNotifyKeyspaceEvent (key space notification for modules)
* dirty++ (to signal that the persistence file may be outdated)
Co-authored-by: weimeng <weimeng@didiglobal.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
Check the validity of the value before performing the create operation,
prevents new data from being generated even if the request fails to execute.
Co-authored-by: Oran Agra <oran@redislabs.com>
Co-authored-by: chendianqiang <chendianqiang@meituan.com>
Co-authored-by: Binbin <binloveplay1314@qq.com>
TLDR: the CLUSTER command originally had the `random` flag,
so all the sub-commands initially got that new flag, but in fact many
of them don't need it.
The only effect of this change is on the output of COMMAND INFO.
* Remove redundant array bio_pending[]. Value at index i identically reflects the
length of list bio_jobs[i]. Better use listLength() instead and discard this array.
(no critical section issues to concern about).
changed returned value of bioPendingJobsOfType() from "long long" to "long".
Remove unused API. Maybe we will use this API later.
Redis 7.0 has #9890 which added an assertion when the propagation queue
was not flushed and we got to beforeSleep.
But it turns out that when processCommands calls getNodeByQuery and
decides to reject the command, it can lead to a key that was lazy
expired and is deleted without later flushing the propagation queue.
This change prevents lazy expiry from deleting the key at this stage
(not as part of a command being processed in `call`)