This PR purpose is to make the crash report process thread safe.
main changes include:
1. `setupSigSegvHandler()` is introduced to initialize the signal handler.
This function first initializes the signal handler mutex (if not initialized yet)
and then registers the process to the signal handler.
2. **sigsegvHandler** flags :
SA_NODEFER - don't add the signal to the process signal mask. We use this
flag because we want to be able to handle a second call to the signal manually.
removed SA_RESETHAND: this flag resets the signal handler function upon the first
entrance to the registered function. The reason to use this flag is to protect from
recursively entering the signal handler by the same thread. But, it also means
that if a second thread crashes while handling a signal, the process will be
terminated immediately and we won't get the crash report.
In this PR we discard this flag. The signal handler guard described below purpose
is to solve the above issues.
3. Add a **signal handler lock** with ERRORCHECK attributes.
The lock's purpose is to ensure that only one thread generates a crash report.
Once a second thread enters the signal handler it will be blocked.
We use the ERRORCHECK lock in order to protect from possible deadlock in
case the thread handling the crash gets a signal. In the latest scenario, we log
what we have collected until the handler crashed.
At the end of the crash report we reset the signal handler SIG_DFL, with no flags, and
rethrow the signal to generate a core dump (if enabled) and exit the process.
During the work on this PR we wanted to understand the historical reasons for
how crash is handled.
With respect to the choice of the flag, we believe the **SA_RESETHAND** was not
added for any specific purpose.
**SA_ONSTACK** which is removed here from bugReportEnd(), was originally also
set in the initial registration to signal handler, but removed in 3ada43e73. In addition,
it was removed from another location in deee2c1ef with the following description,
which is also relevant to why it should be removed from bugReportEnd:
> it seems to be some valgrind bug with SA_ONSTACK.
> SA_ONSTACK seems unneeded since WD is not recursive (SA_NODEFER was removed),
> also, not sure if it's even valid without a call to sigaltstack()
When running cluster info, and the number of keys overflows the integer
value, the summary no longer makes sense. This fixes by using an appropriate
type to handle values over the max int value.
If dict is rehashing, the entries in the head of table[0] is moved to table[1]
and all entries in `table[0][0:rehashidx]` is NULL.
`dictNext` start looking for non-NULL entry from table 0 index 0, and the first call
of `dictNext` on a rehashing dict will Iterate many times to skip those NULL entries.
We can easily skip those entries by setting `iter->index` as `iter->d->rehashidx` when
dict is rehashing and it's the first call of dictNext (`iter->index == -1 && iter->table == 0`).
Co-authored-by: sundb <sundbcn@gmail.com>
Currently rdbSaveMillisecondTime, rdbSaveDoubleValue api's return type is
int but they return the value directly from rdbWriteRaw function which has the
return type of ssize_t. As this may cause overflow to int so changed to ssize_t.
When adding a new ACL rule was added, an attempt was made to remove
any "overlapping" rules. However, there when a match was found, the search
was not resumed at the right location, but instead after the original position of
the original command.
For example, if the current rules were `-config +config|get` and a rule `+config`
was added. It would identify that `-config` was matched, but it would skip over
`+config|get`, leaving the compacted rule `-config +config`. This would be evaluated
safely, but looks weird.
This bug can only be triggered with subcommands, since that is the only way to
have sequential matching rules. Resolves#12470. This is also only present in 7.2.
I think there was also a minor risk of removing another valid rule, since it would start
the search of the next command at an arbitrary point. I couldn't find a valid offset that
would have cause a match using any of the existing commands that have subcommands
with another command.
Fix the assertion when a busy script (timeout) signal ready keys (like LPUSH),
and then an arbitrary client's `allow-busy` command steps into `handleClientsBlockedOnKeys`
try wake up clients blocked on keys (like BLPOP).
Reproduction process:
1. start a redis with aof
`./redis-server --appendonly yes`
2. exec blpop
`127.0.0.1:6379> blpop a 0`
3. use another client call a busy script and this script push the blocked key
`127.0.0.1:6379> eval "redis.call('lpush','a','b') while(1) do end" 0`
4. user a new client call an allow-busy command like auth
`127.0.0.1:6379> auth a`
BTW, this issue also break the atomicity of script.
This bug has been around for many years, the old versions only have the
atomic problem, only 7.0/7.2 has the assertion problem.
Co-authored-by: Oran Agra <oran@redislabs.com>
This PR mainly fixes a possible integer overflow in `json_append_string()`.
When we use `cjson.encoding()` to encode a string larger than 2GB, at specific
compilation flags, an integer overflow may occur leading to truncation, resulting
in the part of the string larger than 2GB not being encoded.
On the other hand, this overflow doesn't cause any read or write out-of-range or segment fault.
1) using -O0 for lua_cjson (`make LUA_DEBUG=yes`)
In this case, `i` will overflow and leads to truncation.
When `i` reaches `INT_MAX+1` and overflows to INT_MIN, when compared to
len, `i` (1000000..00) is expanded to 64 bits signed integer (1111111.....000000) .
At this point i will be greater than len and jump out of the loop, so `for (i = 0; i < len; i++)`
will loop up to 2^31 times, and the part of larger than 2GB will be truncated.
```asm
`i` => -0x24(%rbp)
<+253>: addl $0x1,-0x24(%rbp) ; overflow if i large than 2^31
<+257>: mov -0x24(%rbp),%eax
<+260>: movslq %eax,%rdx ; move a 32-bit value with sign extension into a 64-bit signed
<+263>: mov -0x20(%rbp),%rax
<+267>: cmp %rax,%rdx ; check `i < len`
<+270>: jb 0x212600 <json_append_string+148>
```
2) using -O2/-O3 for lua_cjson (`make LUA_DEBUG=no`, **the default**)
In this case, because singed integer overflow is an undefined behavior, `i` will not overflow.
`i` will be optimized by the compiler and use 64-bit registers for all subsequent instructions.
```asm
<+180>: add $0x1,%rbx ; Using 64-bit register `rbx` for i++
<+184>: lea 0x1(%rdx),%rsi
<+188>: mov %rsi,0x10(%rbp)
<+192>: mov %al,(%rcx,%rdx,1)
<+195>: cmp %rbx,(%rsp) ; check `i < len`
<+199>: ja 0x20b63a <json_append_string+154>
```
3) using 32bit
Because `strbuf_ensure_empty_length()` preallocates memory of length (len * 6 + 2),
in 32-bit `cjson.encode()` can only handle strings smaller than ((2 ^ 32) - 3 ) / 6.
So 32bit is not affected.
Also change `i` in `strbuf_append_string()` to `size_t`.
Since its second argument `str` is taken from the `char2escape` string array which is never
larger than 6, so `strbuf_append_string()` is not at risk of overflow (the bug was unreachable).
GEOHASH / GEODIST / GEOPOS use zsetScore to get the score, in skiplist encoding,
we use dictFind to get the score, which is O(1), same as ZSCORE command.
It is not clear why these commands had O(Log(N)), and O(N) until now.
When loading a function from either RDB/AOF or a replica, it is essential not to
fail on timeout errors. The loading time may vary due to various factors, such as
hardware specifications or the system's workload during the loading process.
Once a function has been successfully loaded, it should be allowed to load from
persistence or on replicas without encountering a timeout failure.
To maintain a clear separation between the engine and Redis internals, the
implementation refrains from directly checking the state of Redis within the
engine itself. Instead, the engine receives the desired timeout as part of the
library creation and duly respects this timeout value. If Redis wishes to disable
any timeout, it can simply send a value of 0.
When doing merge selector, we should check whether the merge
has started (i.e., whether open_bracket_start is -1) every time.
Otherwise, encountering an illegal selector pattern could succeed
and also cause memory leaks, for example:
```
acl setuser test1 (+PING (+SELECT (+DEL )
```
The above would leak memory and succeed with only DEL being applied,
and would now error after the fix.
Co-authored-by: Oran Agra <oran@redislabs.com>
Additional test coverage for incr/decr operation.
integer number could be present in raw encoding format due to operation like append. A incr/decr operation following it optimize the string to int encoding format.
A bug introduced in #11657 (7.2 RC1), causes client-eviction (#8687)
and INFO to have inaccurate memory usage metrics of MONITOR clients.
Because the type in `c->type` and the type in `getClientType()` are confusing
(in the later, `CLIENT_TYPE_NORMAL` not `CLIENT_TYPE_SLAVE`), the comment
we wrote in `updateClientMemUsageAndBucket` was wrong, and in fact that function
didn't skip monitor clients.
And since it doesn't skip monitor clients, it was wrong to delete the call for it from
`replicationFeedMonitors` (it wasn't a NOP).
That deletion could mean that the monitor client memory usage is not always up to
date (updated less frequently, but still a candidate for client eviction).
changing the response and request policy of a few commands,
see https://redis.io/docs/reference/command-tips
1. RANDOMKEY used to have no response policy, which means
that when sent to multiple shards, the responses should be aggregated.
this normally applies to commands that return arrays, but since RANDOMKEY
replies with a simple string, it actually requires a SPECIAL response policy
(for the client to select just one)
2. SCAN used to have no response policy, but although the key names part of
the response can be aggregated, the cursor part certainly can't.
3. MSETNX had a request policy of MULTI_SHARD and response policy of AGG_MIN,
but in fact the contract with MSETNX is that when one key exists, it returns 0
and doesn't set any key, routing it to multiple shards would mean that if one failed
and another succeeded, it's atomicity is broken and it's impossible to return a valid
response to the caller.
Co-authored-by: Shachar Langbeheim <shachlan@amazon.com>
Co-authored-by: Oran Agra <oran@redislabs.com>
When the redis server cluster running on cluster-preferred-endpoint-type unknown-endpoint mode, and receive a request that should be redirected to another redis server node, it does not reply the hostip, but a empty host like MOVED 3999 :6381.
The redis-cli would try to connect to an address without a host, which cause the issue:
```
127.0.0.1:7002> set bar bar
-> Redirected to slot [5061] located at :7000
Could not connect to Redis at :7000: No address associated with hostname
Could not connect to Redis at :7000: No address associated with hostname
not connected> exit
```
In this case, the redis-cli should use the previous hostip when there's no host provided by the server.
---------
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Co-authored-by: Madelyn Olson <madelynolson@gmail.com>
According to the format shown in https://redis.io/commands/cluster-nodes/
```
<ip:port@cport[,hostname[,auxiliary_field=value]*]>
```
when there is no hostname, and the auxiliary fields are hidden, the cluster topology should be
```
<ip:port@cport>
```
However in the code we always print the hostname even when it is an empty string, leaving an unnecessary comma tailing after cport, which is weird and conflicts with the doc.
```
94ca2f6cf85228a49fde7b738ee1209de7bee325 127.0.0.1:6379@16379, myself,master - 0 0 0 connected 0-16383
```
60e5075d4 Version 1.2.0 (#1202)
adef139a7 Remove support in deprecated TLS versions 1.0 and 1.1 (#1205)
d543baba6 Install major version symlink of shared objects.
052f99ab2 Ensure functionality without _MSC_VER definition
ded32c7d1 Add a test for the TCP_USER_TIMEOUT option. (#1192)
5cbd1f296 Add -Werror as a default. (#1193)
af1445638 CI: Update homebrew Redis version. (#1191)
6de326e87 Fix a typo in b6a052f.
git-subtree-dir: deps/hiredis
git-subtree-split: 60e5075d4ac77424809f855ba3e398df7aacefe8
* Fix integer overflows due to using wrong integer size.
* Add assertions / panic when overflow still happens.
* Deletion of dead code to avoid need to maintain it
* Some changes are not because of bugs, but rather paranoia.
* Improve cmsgpack and cjson test coverage.
Co-authored-by: Yossi Gottlieb <yossigo@gmail.com>
valgrind report a Uninitialised warning:
```
==25508== Uninitialised value was created by a heap allocation
==25508== at 0x4848899: malloc (in
/usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==25508== by 0x1A35A1: ztrymalloc_usable_internal (zmalloc.c:117)
==25508== by 0x1A368D: zmalloc (zmalloc.c:145)
==25508== by 0x21FDEA: clusterInit (cluster.c:973)
==25508== by 0x19DC09: main (server.c:7306)
```
Introduced in #12344
This is an addition to #12380, to prevent potential bugs when collecting
keys from multiple commands in the future.
Note that this function also resets numkeys in some cases.
Process loss of slot ownership in cluster bus
When a node no longer owns a slot, it clears the bit corresponding
to the slot in the cluster bus messages. The receiving nodes
currently don't record the fact that the sender stopped claiming
a slot until some other node in the cluster starts claiming the slot.
This can cause a slot to go missing during slot migration when subjected
to inopportune race with addition of new shards or a failover.
This fix forces the receiving nodes to process the loss of ownership
to avoid spreading wrong information.
The test fails on freebsd CI:
```
*** [err]: stats: eventloop metrics in tests/unit/info.tcl
Expected '31777' to be less than '16183' (context: type eval line 17 cmd
{assert_lessthan $el_sum2 [expr $el_sum1+10000] } proc ::test)
```
The test added in #11963, fails on freebsd CI which is slow,
increase tollerance and also add some verbose logs, now we can
see these logs in verbose mode (for better views):
```
eventloop metrics cycle1: 12, cycle2: 15
eventloop metrics el_sum1: 315, el_sum2: 411
eventloop metrics cmd_sum1: 126, cmd_sum2: 137
[ok]: stats: eventloop metrics (111 ms)
instantaneous metrics instantaneous_eventloop_cycles_per_sec: 8
instantaneous metrics instantaneous_eventloop_duration_usec: 55
[ok]: stats: instantaneous metrics (1603 ms)
[ok]: stats: debug metrics (112 ms)
```
When getKeysUsingKeySpecs processes a command with more than one key-spec,
and called with a total of more than 256 keys, it'll call getKeysPrepareResult again,
but since numkeys isn't updated, getKeysPrepareResult will not bother to copy key
names from the old result (leaving these slots uninitialized). Furthermore, it did not
consider the keys it already found when allocating more space.
Co-authored-by: Oran Agra <oran@redislabs.com>
The negative offset check was added in #9052, we realized
that this is a non-mandatory breaking change and we would
like to add it only in 8.0.
This reverts PR #9052, will be re-introduced later in 8.0.
Optimized the performance of the SCAN command in a few ways:
1. Move the key filtering (by MATCH pattern) in the scan callback,
so as to avoid collecting them for later filtering.
2. Reduce a many memory allocations and copying (use a reference
to the original sds, instead of creating an robj, an excessive 2 mallocs
and one string duplication)
3. Compare TYPE filter directly (as integers), instead of inefficient string
compare per key.
4. fixed a small bug: when scan zset and hash types, maxiterations uses
a more accurate number to avoid wrong double maxiterations.
Changes **postponed** for a later version (8.0):
1. Prepare to move the TYPE filtering to the scan callback as well. this was
put on hold since it has side effects that can be considered a breaking
change, which is that we will not attempt to do lazy expire (delete) a key
that was filtered by not matching the TYPE (changing it would mean TYPE filter
starts behaving the same as MATCH filter already does in that respect).
2. when the specified key TYPE filter is an unknown type, server will reply a error
immediately instead of doing a full scan that comes back empty handed.
Benchmark result:
For different scenarios, we obtained about 30% or more performance improvement.
Co-authored-by: Oran Agra <oran@redislabs.com>
Added missing O3 flag to linking stage in default option "-O3 -flto".
Flags doesn't lead to significant changes in performance:
- +0.21% in geomean for all benchmarks on ICX bare-metal (256 cpus)
- +0.33% in geomean for all benchmarks on m6i.2xlarge (16 cpus)
Checked on redis from Mar'30 (commit 1f76bb17dd ). Comparison file is attached.
aof-disable-auto-gc was created for testing purposes,
to check if certain AOF files were actually generated
and if they were deletedcorrectly during testing.
So hiding it, see #12249 for more discussion.
Originally, when "tls-cluster" is enabled, `port` is set to TLS port. In order to support non-TLS clients, `pport` is used to propagate TCP port across cluster nodes. However when "tls-cluster" is disabled, `port` is set to TCP port, and `pport` is not used, which means the cluster cannot provide TLS service unless "tls-cluster" is on.
```
typedef struct {
// ...
uint16_t port; /* Latest known clients port (TLS or plain). */
uint16_t pport; /* Latest known clients plaintext port. Only used if the main clients port is for TLS. */
// ...
} clusterNode;
```
```
typedef struct {
// ...
uint16_t port; /* TCP base port number. */
uint16_t pport; /* Sender TCP plaintext port, if base port is TLS */
// ...
} clusterMsg;
```
This PR renames `port` and `pport` in `clusterNode` to `tcp_port` and `tls_port`, to record both ports no matter "tls-cluster" is enabled or disabled.
This allows to provide TLS service to clients when "tls-cluster" is disabled: when displaying cluster topology, or giving `MOVED` error, server can provide TLS or TCP port according to client's connection type, no matter what type of connection cluster bus is using.
For backwards compatibility, `port` and `pport` in `clusterMsg` are preserved, when "tls-cluster" is enabled, `port` is set to TLS port and `pport` is set to TCP port, when "tls-cluster" is disabled, `port` is set to TCP port and `pport` is set to TLS port (instead of 0).
Also, in the nodes.conf file, a new aux field displaying an extra port is added to complete the persisted info. We may have `tls_port=xxxxx` or `tcp_port=xxxxx` in the aux field, to complete the cluster topology, while the other port is stored in the normal `<ip>:<port>` field. The format is shown below.
```
<node-id> <ip>:<tcp_port>@<cport>,<hostname>,shard-id=...,tls-port=6379 myself,master - 0 0 0 connected 0-1000
```
Or we can switch the position of two ports, both can be correctly resolved.
```
<node-id> <ip>:<tls_port>@<cport>,<hostname>,shard-id=...,tcp-port=6379 myself,master - 0 0 0 connected 0-1000
```
The new blockedBeforeSleep was added in #12337, it breaks the order in 2ecb5ed.
This may be related to #2288, quoted from comment in #2288:
```
Moreover the clusterBeforeSleep() call was misplaced at the end of the chain of the
beforeSleep() call in redis.c. It should be at the top, before processing un blocking
clients. This is exactly the reason in the specific instance of the bug as reported,
why the state was not updated in time before clients served.
```
blocking RM_Call was introduced on: #11568, It allows a module to perform
blocking commands and get the reply asynchronously.If the command gets
block, a special promise CallReply is returned that allow to set the unblock
handler. The unblock handler will be called when the command invocation
finish and it gets, as input, the command real reply.
The issue was that the real CallReply was created using a stack allocated
RedisModuleCtx which is no longer available after the unblock handler finishes.
So if the module keeps the CallReply after the unblock handler finished, the
CallReply holds a pointer to invalid memory and will try to access it when the
CallReply will be released.
The solution is to create the CallReply with a NULL context to make it totally
detached and can be freed freely when the module wants.
Test was added to cover this case, running the test with valgrind before the
fix shows the use after free error. With the fix, there are no valgrind errors.
unrelated: adding a missing `$rd close` in many tests in that file.
Apart from adding the missing coverage, this PR also adds `blockedBeforeSleep`
that gathers all block-related functions from `beforeSleep`
The order inside `blockedBeforeSleep` is different: now `handleClientsBlockedOnKeys`
(which may unblock clients) is called before `processUnblockedClients` (which handles
unblocked clients).
It makes sense to have this order.
There are no visible effects of the wrong ordering, except some cleanups of the now-unblocked
client would have happen in the next `beforeSleep` (will now happen in the current one)
The reason we even got into it is because i triggers an assertion in logresreq.c (breaking
the assumption that `unblockClient` is called **before** actually flushing the reply to the socket):
`handleClientsBlockedOnKeys` is called, then it calls `moduleUnblockClientOnKey`, which calls
`moduleUnblockClient`, which adds the client to `moduleUnblockedClients` back to `beforeSleep`,
we call `handleClientsWithPendingWritesUsingThreads`, it writes the data of buf to the client, so
`client->bufpos` became 0
On the next `beforeSleep`, we call `moduleHandleBlockedClients`, which calls `unblockClient`,
which calls `reqresAppendResponse`, triggering the assert. (because the `bufpos` is 0) - see https://github.com/redis/redis/pull/12301#discussion_r1226386716