Commit Graph

12085 Commits

Author SHA1 Message Date
Viktor Söderqvist
8d675950e6
Don't crash when adding a forgotten node to blacklist twice (#12702)
Add a defensive checks to prevent double freeing a node from the cluster blacklist.
2023-10-31 07:20:06 -07:00
erpeng
4bbb2b0152
Optimize CPU cache efficiency on dict while it's being rehashed (#5692)
when find a key  ,if redis is rehashing, currently we should lookup both tables (ht[0] and ht[1]).
if we use the key's index comparing to the rehashidx,if index < rehashidx,then  we can conclude: 
1. it is rehashing(rehashidx is -1 if it is not rehashing) 
2. we can't find key in ht[0] so just continue to find key in ht[1]

The possible performance gain here, is not the looping over the linked list (which is empty),
but rather the lookup in the table (which could be a cache miss).
---------

Co-authored-by: zhangshihua003 <zhangshihua003@ke.com>
Co-authored-by: sundb <sundbcn@gmail.com>
Co-authored-by: Oran Agra <oran@redislabs.com>
Co-authored-by: judeng <abc3844@126.com>
2023-10-31 09:57:26 +02:00
Roshan Khatri
f7fa481156
Optimize finding the slot for a given key count in a fenwick tree (#12704)
This PR optimizes the time complexity of findSlotByKeyIndex from O(log^2(N)) to O(log(N)) by using the 
tree structure of binary index tree to find a slot in one search of the index.
2023-10-27 17:15:19 -07:00
Harkrishn Patro
4145d628b4
Reduce dbBuckets operation time complexity from O(N) to O(1) (#12697)
As part of #11695 independent dictionaries were introduced per slot.
Time complexity to discover total no. of buckets across all dictionaries
increased to O(N) with straightforward implementation of iterating over
all dictionaries and adding dictBuckets of each.

To optimize the time complexity, we could maintain a global counter at
db level to keep track of the count of buckets and update it on the start
and end of rehashing.

---------

Co-authored-by: Roshan Khatri <rvkhatri@amazon.com>
2023-10-27 22:05:40 +03:00
Roshan Khatri
7d68208a6e
Reset later item flag after defrag later is done (#12694)
Fixing issues described in #12672, started after #11695
Related to #12674

Fixes the `defrag didn't stop' issue.

In some cases of how the keys were stored in memory
defrag_later_item_in_progress was not getting reset once we finish
defragging the later items and we move to the next slot. This stopped
the scan to happen in the later slots and did not get
2023-10-27 13:56:15 +03:00
Oran Agra
ba900f6cb8
Fix fd leak causing deleted files to remain open and eat disk space (#12693)
This was introduced in v7.2 by #11248
2023-10-25 20:54:02 +03:00
Binbin
372ea21875
Update comment around propagateDeletion (#12687)
Fix some outdated comments and add comment for moduleNotifyKeyspaceEvent
we added in #11084 since it seems a bit implicit.

---------

Co-authored-by: Oran Agra <oran@redislabs.com>
2023-10-24 13:10:03 +03:00
Harkrishn Patro
3fac869f02
Fix test, disable expiration until empty buckets are formed (#12689)
Test failure on freebsd CI:
```
*** [err]: expire scan should skip dictionaries with lot's of empty buckets in tests/unit/expire.tcl
  scan didn't handle slot skipping logic.
```

Observation:

expiry of keys might happen before the empty buckets are formed and won't help with the expiry skip logic validation.

Solution:

Disable expiration until the empty buckets are formed.
2023-10-24 11:29:40 +03:00
Harkrishn Patro
26eb4ce397
Fix defrag test (#12674)
Fixing issues started after #11695 when the defrag tests are being executed in cluster mode too.
For some reason, it looks like the defragmentation is over too quickly, before the test is able to
detect that it's running.
so now instead of waiting to see that it's active, we wait to see that it did some work
```
[err]: Active defrag big list: cluster in tests/unit/memefficiency.tcl
defrag not started.
[err]: Active defrag big keys: cluster in tests/unit/memefficiency.tcl
defrag didn't stop.
```
2023-10-22 11:56:45 +03:00
Harkrishn Patro
becd50d0da
Disable flaky defrag tests affecting daily run (#12672)
Temporarily disabling few of the defrag tests in cluster mode to make the daily run stable:

Active defrag eval scripts
Active defrag big keys
Active defrag big list
Active defrag edge case
2023-10-19 21:12:58 +03:00
Harkrishn Patro
f3bf8485d8
Fix resize hash table dictionary iterator (#12660)
Dictionary iterator logic in the `tryResizeHashTables` method is picking the next
(incorrect) dictionary while the cursor is at a given slot. This could lead to some
dictionary/slot getting skipped from resizing.

Also stabilize the test.

problem introduced recently in #11695
2023-10-19 13:58:32 +03:00
Oran Agra
03345ddc7f
Fix issue of listen before chmod on Unix sockets (CVE-2023-45145) (#12671)
Before this commit, Unix socket setup performed chmod(2) on the socket
file after calling listen(2). Depending on what umask is used, this
could leave the file with the wrong permissions for a short period of
time. As a result, another process could exploit this race condition and
establish a connection that would otherwise not be possible.

We now make sure the socket permissions are set up prior to calling
listen(2).

(cherry picked from commit 1119ecae6fd8796fa337df2212f09173ab6c7b0a)

Co-authored-by: Yossi Gottlieb <yossigo@gmail.com>
2023-10-18 14:00:00 +03:00
sundb
3c734b8e9d
Add new compilation CI for macos-11 and macos-13 (#12666)
As discussed in #12611
Add a build CI for macox 11 and 13 to avoid compatibility breakage introduced by future macos sdk versions.
2023-10-18 13:25:52 +03:00
meiravgri
d27c7413a9
remove heap allocations from signal handlers. (#12655)
Using heap allocation during signal handlers is unsafe.
This PR purpose is to replace all the heap allocations done within the signal
handlers raised upon server crash and assertions.
These were added in #12453.

writeStacktraces(): allocates the stacktraces output array on the calling thread's
stack and assigns the address to a global variable.
It calls `ThreadsManager_runOnThreads()` that invokes `collect_stacktrace_data()`
by each thread: each thread writes to a different location in the above array to allow
sync writes.

get_ready_to_signal_threads_tids(): instead of allocating the `tids` array, it receives it
as a fixed size array parameter, allocated on on the stack of the calling function, and
returns the number of valid threads. The array size is hard-coded to 50.

`ThreadsManager_runOnThreads():` To avoid the outputs array allocation, the
**callback signature** was changed. Now it should return void. This function return type
has also changed to int - returns 1 if successful, and 0 otherwise.

Other unsafe calls will be handled in following PRs
2023-10-16 17:21:49 +03:00
Vitaly
0270abda82
Replace cluster metadata with slot specific dictionaries (#11695)
This is an implementation of https://github.com/redis/redis/issues/10589 that eliminates 16 bytes per entry in cluster mode, that are currently used to create a linked list between entries in the same slot.  Main idea is splitting main dictionary into 16k smaller dictionaries (one per slot), so we can perform all slot specific operations, such as iteration, without any additional info in the `dictEntry`. For Redis cluster, the expectation is that there will be a larger number of keys, so the fixed overhead of 16k dictionaries will be The expire dictionary is also split up so that each slot is logically decoupled, so that in subsequent revisions we will be able to atomically flush a slot of data.

## Important changes
* Incremental rehashing - one big change here is that it's not one, but rather up to 16k dictionaries that can be rehashing at the same time, in order to keep track of them, we introduce a separate queue for dictionaries that are rehashing. Also instead of rehashing a single dictionary, cron job will now try to rehash as many as it can in 1ms.
* getRandomKey - now needs to not only select a random key, from the random bucket, but also needs to select a random dictionary. Fairness is a major concern here, as it's possible that keys can be unevenly distributed across the slots. In order to address this search we introduced binary index tree). With that data structure we are able to efficiently find a random slot using binary search in O(log^2(slot count)) time.
* Iteration efficiency - when iterating dictionary with a lot of empty slots, we want to skip them efficiently. We can do this using same binary index that is used for random key selection, this index allows us to find a slot for a specific key index. For example if there are 10 keys in the slot 0, then we can quickly find a slot that contains 11th key using binary search on top of the binary index tree.
* scan API - in order to perform a scan across the entire DB, the cursor now needs to not only save position within the dictionary but also the slot id. In this change we append slot id into LSB of the cursor so it can be passed around between client and the server. This has interesting side effect, now you'll be able to start scanning specific slot by simply providing slot id as a cursor value. The plan is to not document this as defined behavior, however. It's also worth nothing the SCAN API is now technically incompatible with previous versions, although practically we don't believe it's an issue.
* Checksum calculation optimizations - During command execution, we know that all of the keys are from the same slot (outside of a few notable exceptions such as cross slot scripts and modules). We don't want to compute the checksum multiple multiple times, hence we are relying on cached slot id in the client during the command executions. All operations that access random keys, either should pass in the known slot or recompute the slot. 
* Slot info in RDB - in order to resize individual dictionaries correctly, while loading RDB, it's not enough to know total number of keys (of course we could approximate number of keys per slot, but it won't be precise). To address this issue, we've added additional metadata into RDB that contains number of keys in each slot, which can be used as a hint during loading.
* DB size - besides `DBSIZE` API, we need to know size of the DB in many places want, in order to avoid scanning all dictionaries and summing up their sizes in a loop, we've introduced a new field into `redisDb` that keeps track of `key_count`. This way we can keep DBSIZE operation O(1). This is also kept for O(1) expires computation as well.

## Performance
This change improves SET performance in cluster mode by ~5%, most of the gains come from us not having to maintain linked lists for keys in slot, non-cluster mode has same performance. For workloads that rely on evictions, the performance is similar because of the extra overhead for finding keys to evict. 

RDB loading performance is slightly reduced, as the slot of each key needs to be computed during the load.

## Interface changes
* Removed `overhead.hashtable.slot-to-keys` to `MEMORY STATS`
* Scan API will now require 64 bits to store the cursor, even on 32 bit systems, as the slot information will be stored.
* New RDB version to support the new op code for SLOT information. 

---------

Co-authored-by: Vitaly Arbuzov <arvit@amazon.com>
Co-authored-by: Harkrishn Patro <harkrisp@amazon.com>
Co-authored-by: Roshan Khatri <rvkhatri@amazon.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Oran Agra <oran@redislabs.com>
2023-10-14 23:58:26 -07:00
Oran Agra
f0c1c730d4
test suite: clean server pids after server crashed (#12639)
when a server in the test suite crashes and is restarted by redstart_server, we didn't clean it's pid from the list.
we can see that when the corrupt-dump-fuzzer hangs, it has a long list of servers to lean, but in fact they're all already dead.
2023-10-13 16:28:52 +03:00
Harkrishn Patro
b784c5375e
Unsubscribe all clients from replica for shard channel if the master ownership changes (#12577)
Unsubscribe all clients from replica for shard channel if the master ownership changes
2023-10-12 20:48:27 -07:00
Ye Lin Aung
b705049a7a
Replace emptyDb() with new emptyData() (#12646)
The function was renamed, but the comments were outdated.
2023-10-12 15:34:08 +03:00
zhaozhao.zz
77a65e82b2
support XREAD[GROUP] with BLOCK option in scripts (#12596)
In #11568 we removed the NOSCRIPT flag from commands and keep the BLOCKING flag.
Aiming to allow them in scripts and let them implicitly behave in the non-blocking way.

In that sense, the old behavior was to allow LPOP and reject BLPOP, and the new behavior,
is to allow BLPOP too, and fail it only in case it ends up blocking.
So likewise, so far we allowed XREAD and rejected XREAD BLOCK, and we will now allow
that too, and only reject it if it ends up blocking.
2023-10-12 10:54:50 +03:00
Binbin
e5ef161374
Fix crash when running rebalance command in a mixed cluster of 7.0 and 7.2 (#12604)
In #10536, we introduced the assert, some older versions of servers
(like 7.0) doesn't gossip shard_id, so we will not add the node to
cluster->shards, and node->shard_id is filled in randomly and may not
be found here.

It causes that if we add a 7.2 node to a 7.0 cluster and allocate slots
to the 7.2 node, the 7.2 node will crash when it hits this assert. Somehow
like #12538.

In this PR, we remove the assert and replace it with an unconditional removal.
2023-10-11 22:15:25 -07:00
Binbin
4de4fcf280
Fix redis-cli pubsub_mode and connect minor prompt / crash issue (#12571)
When entering pubsub mode and using the redis-cli only
connect command, we need to reset pubsub_mode because
we switch to a different connection.

This will affect the prompt when the connection is successful,
and redis-cli will crash when the connect fails:
```
127.0.0.1:6379> subscribe ch
1) "subscribe"
2) "ch"
3) (integer) 1
127.0.0.1:6379(subscribed mode)> connect 127.0.0.1 6380
127.0.0.1:6380(subscribed mode)> ping
PONG
127.0.0.1:6380(subscribed mode)> connect a b
Could not connect to Redis at a:0: Name or service not known
Segmentation fault
```
2023-10-11 10:45:38 +03:00
Binbin
8d92f7f2b7
Support NO ONE block in REPLICAOF command json (#12633)
The current commands.json doesn't mention the special NO ONE arguments.
This change is also applied to SLAVEOF
2023-10-10 11:10:40 +03:00
Oran Agra
b810384c62
dump server longs on hang corrupt dump fuzzer test
recently there are some incidents of hanged tests in the CI
when we try to reproduce them, we get an assertion, not a hang.
maybe the server logs will reveal some info.
2023-10-08 16:19:31 +03:00
Jachin
a2b0701d2c
Fix compile on macOS 13 (#12611)
Use the __MAC_OS_X_VERSION_MIN_REQUIRED macro to detect the
macOS system version instead of using MAC_OS_X_VERSION_10_6.

From MacOSX14.0.sdk, the default definitions of MAC_OS_X_VERSION_xxx have
been removed in usr/include/AvailabilityMacros.h. It includes AvailabilityVersions.h,
where the following condition must be met:
`#if (!defined(_POSIX_C_SOURCE) && !defined(_XOPEN_SOURCE)) || defined(_DARWIN_C_SOURCE)`
Only then will MAC_OS_X_VERSION_xxx be defined.
However, in the project, _DARWIN_C_SOURCE is not defined, which leads to the
loss of the definition for MAC_OS_X_VERSION_10_6.
2023-10-08 11:12:50 +03:00
Oran Agra
fe37e4fc87
Cleanup nested module keyspace notifications (#12630)
Recently we added a way for the module to declare that it wishes to
receive nested KSN, by setting ALLOW_NESTED_KEYSPACE_NOTIFICATIONS.
but it looks like this flow has a bug, clearing the `active` member
when it was previously set. however, since nesting is permitted,
this bug has no implications, since regardless of the active member,
the notification is permitted.
2023-10-05 13:50:17 +03:00
YaacovHazan
2cf50ddbad
Fix 'load corrupted rdb with no CRC' test (#12629)
After the change in #12626 (2e0f6724e), the is_alive proc gets pid and not
server config.

This PR aligns it in 'load corrupted rdb with no CRC' test.
2023-10-03 11:09:25 +03:00
Madelyn Olson
31c3172d9b
Better standardize around assertions (#12539)
We use the C standard assert() in various places in the codebase, which requires NDEBUG to be undefined. We introduced the redisassert.h file in order to allow low level files to access the assert that maps to serverPanic, but this was only applied tactically and is not available broadly.

This PR removes all usage of the standard library asserts and replaces them with an assert that maps to serverPanic. It makes us immune to accidentally setting the NDEBUG flag preventing assertions. I also marked marked the server asserts as "likely" to not execute. I spot checked various points in the code, and it didn't change the code layout on my x86 mac, but it is more consistent with redisassert.h and seems more correct overall.
2023-10-02 18:58:44 -07:00
Madelyn Olson
9d31768cbb
Fix a couple of tabs that caused misindentation (#12541)
Fixed some usages of tabs which caused weird indentation in the code. Tried to find all of the places so their was one PR. I ignored all of the usages of tabs which don't really affect readability.
2023-10-02 16:44:09 -07:00
meiravgri
4ba9e18ef0
fix crash in crash-report and other improvements (#12623)
## Crash fix
### Current behavior
We might crash if we fail to collect some of the threads' output. If it exceeds timeout for example.

The threads mngr API guarantees that the output array length will be `tids_len`, however, some
indices can be NULL, in case it fails to collect some of the threads' outputs.

When we use the threads mngr to collect the threads' stacktraces, we rely on this and skip NULL
entries. Since the output array was allocated with malloc, instead of NULL, it contained garbage,
so we got a segmentation fault when trying to read this garbage. (in debug.c:writeStacktraces() )

### fix
Allocate the global output array with zcalloc.

### To reproduce the bug, you'll have to change the code:
**in threadsmngr:ThreadsManager_runOnThreads():**
make sure the g_output_array allocation is initialized with garbage and not 0s 
(add `memset(g_output_array, 2, sizeof(void*) * tids_len);` below the allocation).

Force one of the threads to write to the array:
add a global var: `static redisAtomic size_t return_now = 0;` 
add to `invoke_callback()` before writing to the output array:
```
    size_t i_return;
    atomicGetIncr(return_now, i_return, 1);
    if(i_return == 1) return;
```
compile, start the server with `--enable-debug-command local` and run `redis-cli debug assert`
The assertion triggers the the stacktrace collection. 
Expect to get 2 prints of the stack trace - since we get the segmentation fault after we return from
the threads mngr, it can be safely triggered again.

## Added global variables r/w lock in ThreadsManager
To avoid a situation where the main thread runs `ThreadsManager_cleanups` while threads are still
invoking the signal handler, we use a r/w lock.
For cleanups, we will acquire the write lock.
The threads will acquire the read lock to enable them to write simultaneously.
If we fail to acquire the read lock, it means cleanups are in progress and we return immediately.
After acquiring the lock we can safely check that the global output array wasn't nullified and proceed
to write to it.
This way we ensure the threads are not modifying the global variables/ trying to write to the output
array after they were zeroed/nullified/destroyed(the semaphore).

## other minor logging change
1. removed logging if the semaphore times out because the threads can still write to the output array
  after this check. Instead, we print the total number of printed stacktraces compared to the exacted
  number (len_tids).
2. use noinline attribute to make sure the uplevel number of ignored stack trace entries stays correct.
3. improve testing

Co-authored-by: Oran Agra <oran@redislabs.com>
2023-10-02 20:02:02 +03:00
YaacovHazan
2e0f6724e0
Stabilization and improvements around aof tests (#12626)
In some tests, the code manually searches for a log message, and it
uses tail -1 with a delay of 1 second, which can miss the expected line.

Also, because the aof tests use start_server_aof and not start_server,
the test name doesn't log into the server log.

To fix the above, I made the following changes:
- Change the start_server_aof to wrap the start_server.
  This will add the created aof server to the servers list, and make
  srv() and wait_for_log_messages() available for the tests.

- Introduce a new option for start_server.
  'wait_ready' - an option to let the caller start the test code without
  waiting for the server to be ready. useful for tests on a server that
  is expected to exit on startup.

- Create a new start_server_aof_ex.
  The new proc also accept options as argument and make use of the
  new 'short_life' option for tests that are expected to exit on startup
  because of some error in the aof file(s).

Because of the above, I had to change many lines and replace every
local srv variable (a server config) usage with the srv().
2023-10-02 08:20:53 +03:00
guybe7
c2a4b78491
WAITAOF: Update fsynced_reploff_pending even if there's nothing to fsync (#12622)
The problem is that WAITAOF could have hang in case commands were
propagated only to replicas.
This can happen if a module uses RM_Call with the REDISMODULE_ARGV_NO_AOF flag.
In that case, master_repl_offset would increase, but there would be nothing to fsync, so
in the absence of other traffic, fsynced_reploff_pending would stay the static, and WAITAOF can hang.

This commit updates fsynced_reploff_pending to the latest offset in flushAppendOnlyFile in case
there's nothing to fsync. i.e. in case it's behind because of the above mentions case it'll be refreshed
and release the WAITAOF.

Other changes:
Fix a race in wait.tcl (client getting blocked vs. the fsync thread)
2023-09-28 17:19:20 +03:00
guybe7
bfa3931a04
WAITAOF: Update fsynced_reploff_pending just before starting the initial AOFRW fork (#12620)
If we set `fsynced_reploff_pending` in `startAppendOnly`, and the fork doesn't start
immediately (e.g. there's another fork active at the time), any subsequent commands
will increment `server.master_repl_offset`, but will not cause a fsync (given they were
executed before the fork started, they just ended up in the RDB part of it)
Therefore, any WAITAOF will wait on the new master_repl_offset, but it will time out
because no fsync will be executed.

Release notes:
```
WAITAOF could timeout in the absence of write traffic in case a new AOF is created and
an AOFRW can't immediately start.
This can happen by the appendonly config is changed at runtime, but also after FLUSHALL,
and replica full sync.
```
2023-09-28 17:05:53 +03:00
Viktor Söderqvist
f924bebd83
Rewrite huge printf calls to smaller ones for readability (#12257)
In a long printf call with many placeholders, it's hard to see which argument
belongs to which placeholder.

The long printf-like calls in the INFO and CLIENT commands are rewritten into
pairs of (format, argument). These pairs are then rewritten to a single call with
a long format string and a long list of arguments, using a macro called FMTARGS.

The file `fmtargs.h` is added to the repo.

Co-authored-by: Madelyn Olson <34459052+madolson@users.noreply.github.com>
2023-09-28 09:21:23 +03:00
Binbin
9fe63bdc80
Dump server logs when corrupt fuzzer reports crash (#12612)
Recently we found some signal crashes, but unable to reproduce them.
It is a good idea to dump the server logs when a failure happens.
2023-09-27 09:08:18 +03:00
Sankar
8cdeddc81c
Clear owner_not_claiming_slot bit for the slot in clusterDelSlot (#12564)
Clear owner_not_claiming_slot bit for the slot in clusterDelSlot to keep it
consistent with slot ownership information.
2023-09-26 14:03:27 -07:00
Nir Rattner
24187ed8e3
Fix overflow calculation for next timer event (#12474)
The `retval` variable is defined as an `int`, so with 4 bytes, it cannot properly represent
microsecond values greater than the equivalent of about 35 minutes. 

This bug shouldn't impact standard Redis behavior because Redis doesn't have timer
events that are scheduled as far as 35 minutes out, but it may affect custom Redis modules
which interact with the event timers via the RM_CreateTimer API.

The impact is that `usUntilEarliestTimer` may return 0 for as long as `retval` is scaled to
an overflowing value. While `usUntilEarliestTimer` continues to return `0`, `aeApiPoll`
will have a zero timeout, and so Redis will use significantly more CPU iterating through
its event loop without pause. For timers scheduled far enough into the future, Redis will
cycle between ~35 minute periods of high CPU usage and ~35 minute periods of standard
CPU usage.
2023-09-24 13:31:12 +03:00
meiravgri
cc2be63997
Print stack trace from all threads in crash report (#12453)
In this PR we are adding the functionality to collect all the process's threads' backtraces.

## Changes made in this PR

### **introduce threads mngr API**
The **threads mngr API** which has 2 abilities:
* `ThreadsManager_init() `- register to SIGUSR2. called on the server start-up.
* ` ThreadsManager_runOnThreads()` - receives a list of a pid_t and a callback, tells every
  thread in the list to invoke the callback, and returns the output collected by each invocation.
**Elaborating atomicvar API**
* `atomicIncrGet(var,newvalue_var,count) `-- Increment and get the atomic counter new value
* `atomicFlagGetSet` -- Get and set the atomic counter value to 1

### **Always set SIGALRM handler**
SIGALRM handler prints the process's stacktrace to the log file. Up until now, it was set only if the
`server.watchdog_period` > 0. This can be also useful if debugging is needed. However, in situations
where the server can't get requests, (a deadlock, for example) we weren't able to change the signal handler.
To make it available at run time we set SIGALRM handler on server startup. The signal handler name was
changed to a more general `sigalrmSignalHandler`.

### **Print all the process' threads' stacktraces**

`logStackTrace()` now calls `writeStacktraces()`, instead of logging the current thread stacktrace.
`writeStacktraces()`:
* On Linux systems we use the threads manager API to collect the backtraces of all the process' threads.
  To get the `tids` list (threads ids) we read the `/proc/<redis-server-pid>/tasks` file which includes a list of directories.
  Each directory name corresponds to one tid (including the main thread). For each thread, we also need to check if it
  can get the signal from the threads manager (meaning it is not blocking/ignoring that signal). We send the threads
  manager this tids list and `collect_stacktrace_data()` callback, which collects the thread's backtrace addresses,
  its name, and tid.
* On other systems, the behavior remained as it was (writing only the current thread stacktrace to the log file).

## compatibility notes
1. **The threads mngr API is only supported in linux.** 
2. glibc earlier than 2.3 We use `syscall(SYS_gettid)` and `syscall(SYS_tgkill...)` because their dedicated
  alternatives (`gettid()` and `tgkill`) were added in glibc 2.3.

## Output example

Each thread backtrace will have the following format:
`<tid> <thread_name> [additional_info]`
* **tid**: as read from the `/proc/<redis-server-pid>/tasks` file
* **thread_name**: the tread name as it is registered in the os/
* **additional_info**: Sometimes we want to add specific information about one of the threads. currently.
  it is only used to mark the thread that handles the backtraces collection by adding "*".
  In case of crash - this also indicates which thread caused the crash. The handling thread in won't
  necessarily appear first.

```
------ STACK TRACE ------
EIP:
/lib/aarch64-linux-gnu/libc.so.6(epoll_pwait+0x9c)[0xffffb9295ebc]

67089 redis-server *
linux-vdso.so.1(__kernel_rt_sigreturn+0x0)[0xffffb9437790]
/lib/aarch64-linux-gnu/libc.so.6(epoll_pwait+0x9c)[0xffffb9295ebc]
redis-server *:6379(+0x75e0c)[0xaaaac2fe5e0c]
redis-server *:6379(aeProcessEvents+0x18c)[0xaaaac2fe6c00]
redis-server *:6379(aeMain+0x24)[0xaaaac2fe7038]
redis-server *:6379(main+0xe0c)[0xaaaac3001afc]
/lib/aarch64-linux-gnu/libc.so.6(+0x273fc)[0xffffb91d73fc]
/lib/aarch64-linux-gnu/libc.so.6(__libc_start_main+0x98)[0xffffb91d74cc]
redis-server *:6379(_start+0x30)[0xaaaac2fe0370]

67093 bio_lazy_free
/lib/aarch64-linux-gnu/libc.so.6(+0x79dfc)[0xffffb9229dfc]
/lib/aarch64-linux-gnu/libc.so.6(pthread_cond_wait+0x208)[0xffffb922c8fc]
redis-server *:6379(bioProcessBackgroundJobs+0x174)[0xaaaac30976e8]
/lib/aarch64-linux-gnu/libc.so.6(+0x7d5c8)[0xffffb922d5c8]
/lib/aarch64-linux-gnu/libc.so.6(+0xe5d1c)[0xffffb9295d1c]

67091 bio_close_file
/lib/aarch64-linux-gnu/libc.so.6(+0x79dfc)[0xffffb9229dfc]
/lib/aarch64-linux-gnu/libc.so.6(pthread_cond_wait+0x208)[0xffffb922c8fc]
redis-server *:6379(bioProcessBackgroundJobs+0x174)[0xaaaac30976e8]
/lib/aarch64-linux-gnu/libc.so.6(+0x7d5c8)[0xffffb922d5c8]
/lib/aarch64-linux-gnu/libc.so.6(+0xe5d1c)[0xffffb9295d1c]

67092 bio_aof
/lib/aarch64-linux-gnu/libc.so.6(+0x79dfc)[0xffffb9229dfc]
/lib/aarch64-linux-gnu/libc.so.6(pthread_cond_wait+0x208)[0xffffb922c8fc]
redis-server *:6379(bioProcessBackgroundJobs+0x174)[0xaaaac30976e8]
/lib/aarch64-linux-gnu/libc.so.6(+0x7d5c8)[0xffffb922d5c8]
/lib/aarch64-linux-gnu/libc.so.6(+0xe5d1c)[0xffffb9295d1c]
67089:signal-handler (1693824528) --------
```
2023-09-24 09:47:23 +03:00
Chen Tianjie
2aad03fa39
Use server.current_client to decide whether cluster commands should return TLS info. (#12569)
Starting a change in #12233 (released in 7.2), CLUSTER commands use client's
connection to decide whether to return TLS port or non-TLS port, but commands
called by Lua script and module's RM_Call don't have a real client with connection,
and would currently be regarded as non-TLS connections.

We can use server.current_client instead when it is available. When it is not (module calls
commands without a real client), we may see this as an undefined behavior, and return null
or default port (currently in this PR it returns default port, judged by server.tls_cluster).
2023-09-21 18:41:32 +03:00
Binbin
4031a18732
Fix that slot return in CLUSTER SHARDS should be integer (#12561)
An unintentional change was introduced in #10536, we used
to use addReplyLongLong and now it is addReplyBulkLonglong, 
revert it back the previous behavior.
2023-09-09 23:33:00 -07:00
Binbin
96e9dec419
Bump codespell from 2.2.4 to 2.2.5 (#12557)
and adjustments.
2023-09-08 16:10:17 +03:00
nihohit
90e9fc387c
Update command tips on more admin / configuration commands (#12545)
Updated the command tips for ACL SAVE / SETUSER / DELUSER, CLIENT SETNAME / SETINFO, and LATENCY RESET.
The tips now match CONFIG SET, since there's a similar behavior for all of these commands - the
user expects to update the various configurations & states on all nodes, not only on a single, random node.
For LATENCY RESET the response tip is now agg_sum.

Co-authored-by: Shachar Langbeheim <shachlan@amazon.com>
2023-09-04 21:30:42 +03:00
secwall
a2046c1eb1
Check shard_id pointer validity in updateShardId (#12538)
When connecting between a 7.0 and 7.2 cluster, the 7.0 cluster will not populate the shard_id field, which is expect on the 7.2 cluster. This is not intended behavior, as the 7.2 cluster is supposed to use a temporary shard_id while the node is in the upgrading state, but it wasn't being correctly set in this case.
2023-09-02 20:14:48 -07:00
alonre24
044e29dd34
redis-benchmark - add the support for binary strings (#9414)
Recently, the option of sending an argument from stdin using `-x` flag
was added to redis-benchmark (this option is available in redis-cli as well).
However, using the `-x` option for sending a blobs that contains null-characters
doesn't work as expected - the argument is trimmed in the first occurrence of
`\X00` (unlike in redis-cli).  
This PR aims to fix this issue and add the support for every binary string input,
by sending arguments length to `redisFormatCommandArgv` when processing
redis-benchmark command, so we won't treat the arguments as C-strings.

Additionally, we add a simple test coverage for `-x` (without binary strings, and
also remove an excessive server started in tests, and make sure to select db 0
so that `r` and the benchmark work on the same db.

Co-authored-by: Oran Agra <oran@redislabs.com>
2023-09-02 15:37:04 +03:00
Binbin
4ba144a4eb
Add logreqres:skip flag to new INFO obuf limit test (#12537)
The new test added in #12476 causes reply-schemas-validator to fail.
When doing `catch {r get key}`, the req-res output is:
```
3
get
3
key
12
__argv_end__
$100000
aaaaaaaaaaaaaaaaaaaa...4
info
5
stats
12
__argv_end__
=1670
txt:# Stats
...
```

And we can see the link after `$100000`, there is a 4 in the last,
it break the req-res-log-validator script since the format is wrong.

The reason i guess is after the client reconnection (after the output
buf limit), we will not add newlines, but append args directly.
Since obuf-limits.tcl is doing the same thing, and it had the logreqres:skip
flag, so this PR is following it.
2023-09-01 14:15:11 +03:00
Roshan Khatri
49f7d173b4
Remove unnecessary use of sds and mem copy in module.c (#12533)
Found that in moduleConfigValidityCheck and isModuleConfigNameRegistered, sds is not required. This also allowed to remove unnecessary memcopy from some of the config registering APIs.
2023-08-31 14:08:05 -07:00
icy17
370d38016f
Fix potential crash on failed OpenSSL init (#12447) 2023-08-31 22:45:36 +03:00
Chen Tianjie
b26e8e3213
Optimize ZRANGE offset location from linear search to skiplist jump. (#12450)
ZRANGE BYSCORE/BYLEX with [LIMIT offset count] option was
using every level in skiplist to jump to the first/last node in range,
but only use level[0] in skiplist to locate the node at offset, resulting
in sub-optimal performance using LIMIT:
```
while (ln && offset--) {
    if (reverse) {
        ln = ln->backward;
    } else {
        ln = ln->level[0].forward;
    }
}
```
It could be slow when offset is very big. We can get the total rank of
the offset location and use skiplist to jump to it. It is an improvement
from O(offset) to O(log rank).

Below shows how this is implemented (if the offset is positve):

Use the skiplist to seach for the first element in the range, record its
rank `rank_0`, so we can have the rank of the target node `rank_t`.
Meanwhile we record the last node we visited which has zsl->level-1
levels and its rank `rank_1`. Then we start from the zsl->level-1 node,
use skiplist to go forward `rank_t-rank_1` nodes to reach the target node.

It is very similiar when the offset is reversed.

Note that if `rank_t` is very close to `rank_0`, we just start from the first
element in range and go node by node, this for the case when zsl->level-1
node is to far away and it is quicker to reach the target node by node.

Here is a test using a random generated zset including 10000 elements
(with different positive scores), doing a bench mark which compares how
fast the `ZRANGE` command is exucuted before and after the optimization. 

The start score is set to 0 and the count is set to 1 to make sure that
most of the time is spent on locating the offset.
```
memtier_benchmark -h 127.0.0.1 -p 6379 --command="zrange test 0 +inf byscore limit <offset> 1"
```
| offset | QPS(unstable) | QPS(optimized) |
|--------|--------|--------|
| 10 | 73386.02 | 74819.82 |
| 1000 | 48084.96 | 73177.73 |
| 2000 | 31156.79 | 72805.83 |
| 5000 | 10954.83 | 71218.21 |

With the result above, we can see that the original code is greatly
slowed down when offset gets bigger, and with the optimization the
speed is almost not affected.

Similiar results are generated when testing reversed offset:
```
memtier_benchmark -h 127.0.0.1 -p 6379 --command="zrange test +inf 0 byscore rev limit <offset> 1"
```
| offset | QPS(unstable) | QPS(optimized) |
|--------|--------|--------|
| 10 | 74505.14 | 71653.67 |
| 1000 | 46829.25 | 72842.75 |
| 2000 | 28985.48 | 73669.01 |
| 5000 | 11066.22 | 73963.45 | 

And the same conclusion is drawn from the tests of ZRANGE BYLEX.
2023-08-31 14:42:08 +03:00
Binbin
9ce8c54d74
Update sort_ro reply_schema to mention the null reply (#12534)
Also added a test to cover this case, so this can
cover the reply schemas check.
2023-08-31 06:36:35 +03:00
Roshan Khatri
7519960527
Allows modules to declare new ACL categories. (#12486)
This PR adds a new Module API int RM_AddACLCategory(RedisModuleCtx *ctx, const char *category_name) to add a new ACL command category.

Here, we initialize the ACLCommandCategories array by allocating space for 64 categories and duplicate the 21 default categories from the predefined array 'ACLDefaultCommandCategories' into the ACLCommandCategories array while ACL initialization. Valid ACL category names can only contain alphanumeric characters, underscores, and dashes.

The API when called, checks for the onload flag, category name validity, and for duplicate category name if present. If the conditions are satisfied, the API adds the new category to the trailing end of the ACLCommandCategories array and assigns the acl_categories flag bit according to the index at which the category is added.

If any error is encountered the errno is set accordingly by the API.

---------

Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
2023-08-30 13:01:24 -07:00
bodong.ybd
b59f53efb3
Fix sort_ro get-keys function return wrong key number (#12522)
Before:
```
127.0.0.1:6379> command getkeys sort_ro key
(empty array)
127.0.0.1:6379>
```
After:
```
127.0.0.1:6379> command getkeys sort_ro key
1) "key"
127.0.0.1:6379>
```
2023-08-30 22:00:02 +03:00