Commit Graph

883 Commits

Author SHA1 Message Date
Roshan Khatri
88e83e517b
Add DEBUG_ASSERTIONS option to custom assert (#12667)
This PR introduces a new macro, serverAssertWithInfoDebug, to do complex assertions only for debugging. The main intention is to allow running complex operations during tests without impacting runtime performance. This assertion is enabled when setting DEBUG_ASSERTIONS.

The DEBUG_ASSERTIONS flag is set for the daily and CI variants of `test-sanitizer-address`.
2023-11-11 20:31:34 -08:00
Viktor Söderqvist
8878817d89
Optimize SCAN with MATCH when pattern implies cluster slot (#12536)
Optimize the performance of SCAN commands when a match pattern can only contain keys from a 
single slot in cluster mode. This can happen when the pattern contains a hash tag before any 
wildcard matchers or when the key contains no matchers.
2023-11-01 00:06:49 -07:00
Chen Tianjie
e9f312e087
Change stat_client_qbuf_limit_disconnections to atomic. (#12711)
In #12476 server.stat_client_qbuf_limit_disconnections was added. It is written in readQueryFromClient, which may be called by multiple threads when io-threads and io-threads-do-reads are turned on. Somehow we missed to make it an atomic variable.
2023-11-01 10:57:24 +08:00
Harkrishn Patro
4145d628b4
Reduce dbBuckets operation time complexity from O(N) to O(1) (#12697)
As part of #11695 independent dictionaries were introduced per slot.
Time complexity to discover total no. of buckets across all dictionaries
increased to O(N) with straightforward implementation of iterating over
all dictionaries and adding dictBuckets of each.

To optimize the time complexity, we could maintain a global counter at
db level to keep track of the count of buckets and update it on the start
and end of rehashing.

---------

Co-authored-by: Roshan Khatri <rvkhatri@amazon.com>
2023-10-27 22:05:40 +03:00
Harkrishn Patro
f3bf8485d8
Fix resize hash table dictionary iterator (#12660)
Dictionary iterator logic in the `tryResizeHashTables` method is picking the next
(incorrect) dictionary while the cursor is at a given slot. This could lead to some
dictionary/slot getting skipped from resizing.

Also stabilize the test.

problem introduced recently in #11695
2023-10-19 13:58:32 +03:00
Vitaly
0270abda82
Replace cluster metadata with slot specific dictionaries (#11695)
This is an implementation of https://github.com/redis/redis/issues/10589 that eliminates 16 bytes per entry in cluster mode, that are currently used to create a linked list between entries in the same slot.  Main idea is splitting main dictionary into 16k smaller dictionaries (one per slot), so we can perform all slot specific operations, such as iteration, without any additional info in the `dictEntry`. For Redis cluster, the expectation is that there will be a larger number of keys, so the fixed overhead of 16k dictionaries will be The expire dictionary is also split up so that each slot is logically decoupled, so that in subsequent revisions we will be able to atomically flush a slot of data.

## Important changes
* Incremental rehashing - one big change here is that it's not one, but rather up to 16k dictionaries that can be rehashing at the same time, in order to keep track of them, we introduce a separate queue for dictionaries that are rehashing. Also instead of rehashing a single dictionary, cron job will now try to rehash as many as it can in 1ms.
* getRandomKey - now needs to not only select a random key, from the random bucket, but also needs to select a random dictionary. Fairness is a major concern here, as it's possible that keys can be unevenly distributed across the slots. In order to address this search we introduced binary index tree). With that data structure we are able to efficiently find a random slot using binary search in O(log^2(slot count)) time.
* Iteration efficiency - when iterating dictionary with a lot of empty slots, we want to skip them efficiently. We can do this using same binary index that is used for random key selection, this index allows us to find a slot for a specific key index. For example if there are 10 keys in the slot 0, then we can quickly find a slot that contains 11th key using binary search on top of the binary index tree.
* scan API - in order to perform a scan across the entire DB, the cursor now needs to not only save position within the dictionary but also the slot id. In this change we append slot id into LSB of the cursor so it can be passed around between client and the server. This has interesting side effect, now you'll be able to start scanning specific slot by simply providing slot id as a cursor value. The plan is to not document this as defined behavior, however. It's also worth nothing the SCAN API is now technically incompatible with previous versions, although practically we don't believe it's an issue.
* Checksum calculation optimizations - During command execution, we know that all of the keys are from the same slot (outside of a few notable exceptions such as cross slot scripts and modules). We don't want to compute the checksum multiple multiple times, hence we are relying on cached slot id in the client during the command executions. All operations that access random keys, either should pass in the known slot or recompute the slot. 
* Slot info in RDB - in order to resize individual dictionaries correctly, while loading RDB, it's not enough to know total number of keys (of course we could approximate number of keys per slot, but it won't be precise). To address this issue, we've added additional metadata into RDB that contains number of keys in each slot, which can be used as a hint during loading.
* DB size - besides `DBSIZE` API, we need to know size of the DB in many places want, in order to avoid scanning all dictionaries and summing up their sizes in a loop, we've introduced a new field into `redisDb` that keeps track of `key_count`. This way we can keep DBSIZE operation O(1). This is also kept for O(1) expires computation as well.

## Performance
This change improves SET performance in cluster mode by ~5%, most of the gains come from us not having to maintain linked lists for keys in slot, non-cluster mode has same performance. For workloads that rely on evictions, the performance is similar because of the extra overhead for finding keys to evict. 

RDB loading performance is slightly reduced, as the slot of each key needs to be computed during the load.

## Interface changes
* Removed `overhead.hashtable.slot-to-keys` to `MEMORY STATS`
* Scan API will now require 64 bits to store the cursor, even on 32 bit systems, as the slot information will be stored.
* New RDB version to support the new op code for SLOT information. 

---------

Co-authored-by: Vitaly Arbuzov <arvit@amazon.com>
Co-authored-by: Harkrishn Patro <harkrisp@amazon.com>
Co-authored-by: Roshan Khatri <rvkhatri@amazon.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Oran Agra <oran@redislabs.com>
2023-10-14 23:58:26 -07:00
Madelyn Olson
31c3172d9b
Better standardize around assertions (#12539)
We use the C standard assert() in various places in the codebase, which requires NDEBUG to be undefined. We introduced the redisassert.h file in order to allow low level files to access the assert that maps to serverPanic, but this was only applied tactically and is not available broadly.

This PR removes all usage of the standard library asserts and replaces them with an assert that maps to serverPanic. It makes us immune to accidentally setting the NDEBUG flag preventing assertions. I also marked marked the server asserts as "likely" to not execute. I spot checked various points in the code, and it didn't change the code layout on my x86 mac, but it is more consistent with redisassert.h and seems more correct overall.
2023-10-02 18:58:44 -07:00
meiravgri
cc2be63997
Print stack trace from all threads in crash report (#12453)
In this PR we are adding the functionality to collect all the process's threads' backtraces.

## Changes made in this PR

### **introduce threads mngr API**
The **threads mngr API** which has 2 abilities:
* `ThreadsManager_init() `- register to SIGUSR2. called on the server start-up.
* ` ThreadsManager_runOnThreads()` - receives a list of a pid_t and a callback, tells every
  thread in the list to invoke the callback, and returns the output collected by each invocation.
**Elaborating atomicvar API**
* `atomicIncrGet(var,newvalue_var,count) `-- Increment and get the atomic counter new value
* `atomicFlagGetSet` -- Get and set the atomic counter value to 1

### **Always set SIGALRM handler**
SIGALRM handler prints the process's stacktrace to the log file. Up until now, it was set only if the
`server.watchdog_period` > 0. This can be also useful if debugging is needed. However, in situations
where the server can't get requests, (a deadlock, for example) we weren't able to change the signal handler.
To make it available at run time we set SIGALRM handler on server startup. The signal handler name was
changed to a more general `sigalrmSignalHandler`.

### **Print all the process' threads' stacktraces**

`logStackTrace()` now calls `writeStacktraces()`, instead of logging the current thread stacktrace.
`writeStacktraces()`:
* On Linux systems we use the threads manager API to collect the backtraces of all the process' threads.
  To get the `tids` list (threads ids) we read the `/proc/<redis-server-pid>/tasks` file which includes a list of directories.
  Each directory name corresponds to one tid (including the main thread). For each thread, we also need to check if it
  can get the signal from the threads manager (meaning it is not blocking/ignoring that signal). We send the threads
  manager this tids list and `collect_stacktrace_data()` callback, which collects the thread's backtrace addresses,
  its name, and tid.
* On other systems, the behavior remained as it was (writing only the current thread stacktrace to the log file).

## compatibility notes
1. **The threads mngr API is only supported in linux.** 
2. glibc earlier than 2.3 We use `syscall(SYS_gettid)` and `syscall(SYS_tgkill...)` because their dedicated
  alternatives (`gettid()` and `tgkill`) were added in glibc 2.3.

## Output example

Each thread backtrace will have the following format:
`<tid> <thread_name> [additional_info]`
* **tid**: as read from the `/proc/<redis-server-pid>/tasks` file
* **thread_name**: the tread name as it is registered in the os/
* **additional_info**: Sometimes we want to add specific information about one of the threads. currently.
  it is only used to mark the thread that handles the backtraces collection by adding "*".
  In case of crash - this also indicates which thread caused the crash. The handling thread in won't
  necessarily appear first.

```
------ STACK TRACE ------
EIP:
/lib/aarch64-linux-gnu/libc.so.6(epoll_pwait+0x9c)[0xffffb9295ebc]

67089 redis-server *
linux-vdso.so.1(__kernel_rt_sigreturn+0x0)[0xffffb9437790]
/lib/aarch64-linux-gnu/libc.so.6(epoll_pwait+0x9c)[0xffffb9295ebc]
redis-server *:6379(+0x75e0c)[0xaaaac2fe5e0c]
redis-server *:6379(aeProcessEvents+0x18c)[0xaaaac2fe6c00]
redis-server *:6379(aeMain+0x24)[0xaaaac2fe7038]
redis-server *:6379(main+0xe0c)[0xaaaac3001afc]
/lib/aarch64-linux-gnu/libc.so.6(+0x273fc)[0xffffb91d73fc]
/lib/aarch64-linux-gnu/libc.so.6(__libc_start_main+0x98)[0xffffb91d74cc]
redis-server *:6379(_start+0x30)[0xaaaac2fe0370]

67093 bio_lazy_free
/lib/aarch64-linux-gnu/libc.so.6(+0x79dfc)[0xffffb9229dfc]
/lib/aarch64-linux-gnu/libc.so.6(pthread_cond_wait+0x208)[0xffffb922c8fc]
redis-server *:6379(bioProcessBackgroundJobs+0x174)[0xaaaac30976e8]
/lib/aarch64-linux-gnu/libc.so.6(+0x7d5c8)[0xffffb922d5c8]
/lib/aarch64-linux-gnu/libc.so.6(+0xe5d1c)[0xffffb9295d1c]

67091 bio_close_file
/lib/aarch64-linux-gnu/libc.so.6(+0x79dfc)[0xffffb9229dfc]
/lib/aarch64-linux-gnu/libc.so.6(pthread_cond_wait+0x208)[0xffffb922c8fc]
redis-server *:6379(bioProcessBackgroundJobs+0x174)[0xaaaac30976e8]
/lib/aarch64-linux-gnu/libc.so.6(+0x7d5c8)[0xffffb922d5c8]
/lib/aarch64-linux-gnu/libc.so.6(+0xe5d1c)[0xffffb9295d1c]

67092 bio_aof
/lib/aarch64-linux-gnu/libc.so.6(+0x79dfc)[0xffffb9229dfc]
/lib/aarch64-linux-gnu/libc.so.6(pthread_cond_wait+0x208)[0xffffb922c8fc]
redis-server *:6379(bioProcessBackgroundJobs+0x174)[0xaaaac30976e8]
/lib/aarch64-linux-gnu/libc.so.6(+0x7d5c8)[0xffffb922d5c8]
/lib/aarch64-linux-gnu/libc.so.6(+0xe5d1c)[0xffffb9295d1c]
67089:signal-handler (1693824528) --------
```
2023-09-24 09:47:23 +03:00
Binbin
96e9dec419
Bump codespell from 2.2.4 to 2.2.5 (#12557)
and adjustments.
2023-09-08 16:10:17 +03:00
Chen Tianjie
b26e8e3213
Optimize ZRANGE offset location from linear search to skiplist jump. (#12450)
ZRANGE BYSCORE/BYLEX with [LIMIT offset count] option was
using every level in skiplist to jump to the first/last node in range,
but only use level[0] in skiplist to locate the node at offset, resulting
in sub-optimal performance using LIMIT:
```
while (ln && offset--) {
    if (reverse) {
        ln = ln->backward;
    } else {
        ln = ln->level[0].forward;
    }
}
```
It could be slow when offset is very big. We can get the total rank of
the offset location and use skiplist to jump to it. It is an improvement
from O(offset) to O(log rank).

Below shows how this is implemented (if the offset is positve):

Use the skiplist to seach for the first element in the range, record its
rank `rank_0`, so we can have the rank of the target node `rank_t`.
Meanwhile we record the last node we visited which has zsl->level-1
levels and its rank `rank_1`. Then we start from the zsl->level-1 node,
use skiplist to go forward `rank_t-rank_1` nodes to reach the target node.

It is very similiar when the offset is reversed.

Note that if `rank_t` is very close to `rank_0`, we just start from the first
element in range and go node by node, this for the case when zsl->level-1
node is to far away and it is quicker to reach the target node by node.

Here is a test using a random generated zset including 10000 elements
(with different positive scores), doing a bench mark which compares how
fast the `ZRANGE` command is exucuted before and after the optimization. 

The start score is set to 0 and the count is set to 1 to make sure that
most of the time is spent on locating the offset.
```
memtier_benchmark -h 127.0.0.1 -p 6379 --command="zrange test 0 +inf byscore limit <offset> 1"
```
| offset | QPS(unstable) | QPS(optimized) |
|--------|--------|--------|
| 10 | 73386.02 | 74819.82 |
| 1000 | 48084.96 | 73177.73 |
| 2000 | 31156.79 | 72805.83 |
| 5000 | 10954.83 | 71218.21 |

With the result above, we can see that the original code is greatly
slowed down when offset gets bigger, and with the optimization the
speed is almost not affected.

Similiar results are generated when testing reversed offset:
```
memtier_benchmark -h 127.0.0.1 -p 6379 --command="zrange test +inf 0 byscore rev limit <offset> 1"
```
| offset | QPS(unstable) | QPS(optimized) |
|--------|--------|--------|
| 10 | 74505.14 | 71653.67 |
| 1000 | 46829.25 | 72842.75 |
| 2000 | 28985.48 | 73669.01 |
| 5000 | 11066.22 | 73963.45 | 

And the same conclusion is drawn from the tests of ZRANGE BYLEX.
2023-08-31 14:42:08 +03:00
Roshan Khatri
7519960527
Allows modules to declare new ACL categories. (#12486)
This PR adds a new Module API int RM_AddACLCategory(RedisModuleCtx *ctx, const char *category_name) to add a new ACL command category.

Here, we initialize the ACLCommandCategories array by allocating space for 64 categories and duplicate the 21 default categories from the predefined array 'ACLDefaultCommandCategories' into the ACLCommandCategories array while ACL initialization. Valid ACL category names can only contain alphanumeric characters, underscores, and dashes.

The API when called, checks for the onload flag, category name validity, and for duplicate category name if present. If the conditions are satisfied, the API adds the new category to the trailing end of the ACLCommandCategories array and assigns the acl_categories flag bit according to the index at which the category is added.

If any error is encountered the errno is set accordingly by the API.

---------

Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
2023-08-30 13:01:24 -07:00
Chen Tianjie
e3d4b30d09
Add two stats to count client input and output buffer oom. (#12476)
Add these INFO metrics:
* client_query_buffer_limit_disconnections
* client_output_buffer_limit_disconnections

Sometimes it is useful to monitor whether clients reaches size limit of
query buffer and output buffer, to decide whether we need to adjust the
buffer size limit or reduce client query payload.
2023-08-30 21:51:14 +03:00
meiravgri
fe47c2027b
Signal handler attributes (#12426)
This PR purpose is to make the crash report process thread safe.
main changes include:

1. `setupSigSegvHandler()` is introduced to initialize the signal handler.
This function first initializes the signal handler mutex (if not initialized yet)
and then registers the process to the signal handler. 

2. **sigsegvHandler** flags :
SA_NODEFER - don't add the signal to the process signal mask. We use this
flag because we want to be able to handle a second call to the signal manually.
removed SA_RESETHAND: this flag resets the signal handler function upon the first
entrance to the registered function. The reason to use this flag is to protect from
recursively entering the signal handler by the same thread. But, it also means
that if a second thread crashes while handling a signal, the process will be
terminated immediately and we won't get the crash report.
In this PR we discard this flag. The signal handler guard described below purpose
is to solve the above issues.

3. Add a **signal handler lock** with ERRORCHECK attributes. 
The lock's purpose is to ensure that only one thread generates a crash report.
Once a second thread enters the signal handler it will be blocked.
We use the ERRORCHECK lock in order to protect from possible deadlock in
case the thread handling the crash gets a signal. In the latest scenario, we log
what we have collected until the handler crashed.

At the end of the crash report we reset the signal handler SIG_DFL, with no flags, and
rethrow the signal to generate a core dump (if enabled) and exit the process.

During the work on this PR we wanted to understand the historical reasons for
how crash is handled.
With respect to the choice of the flag, we believe the **SA_RESETHAND** was not
added for any specific purpose.
**SA_ONSTACK** which is removed here from bugReportEnd(), was originally also
set in the initial registration to signal handler, but removed in 3ada43e73. In addition,
it was removed from another location in deee2c1ef with the following description,
which is also relevant to why it should be removed from bugReportEnd:

> it seems to be some valgrind bug with SA_ONSTACK.
> SA_ONSTACK seems unneeded since WD is not recursive (SA_NODEFER was removed),
> also, not sure if it's even valid without a call to sigaltstack()
2023-08-20 19:16:45 +03:00
judeng
07ed0eafa9
improve performance for scan command when matching pattern or data type (#12209)
Optimized the performance of the SCAN command in a few ways:
1. Move the key filtering (by MATCH pattern) in the scan callback,
  so as to avoid collecting them for later filtering.
2. Reduce a many memory allocations and copying (use a reference
  to the original sds, instead of creating an robj, an excessive 2 mallocs
  and one string duplication)
3. Compare TYPE filter directly (as integers), instead of inefficient string
  compare per key.
4. fixed a small bug: when scan zset and hash types, maxiterations uses
  a more accurate number to avoid wrong double maxiterations.

Changes **postponed** for a later version (8.0):
1. Prepare to move the TYPE filtering to the scan callback as well. this was
  put on hold since it has side effects that can be considered a breaking
  change, which is that we will not attempt to do lazy expire (delete) a key
  that was filtered by not matching the TYPE (changing it would mean TYPE filter
  starts behaving the same as MATCH filter already does in that respect). 
2. when the specified key TYPE filter is an unknown type, server will reply a error
  immediately instead of doing a full scan that comes back empty handed. 

Benchmark result:
For different scenarios, we obtained about 30% or more performance improvement.

Co-authored-by: Oran Agra <oran@redislabs.com>
2023-06-27 16:43:46 +03:00
guybe7
3230199920
Modules: Unblock from within a timer coverage (#12337)
Apart from adding the missing coverage, this PR also adds `blockedBeforeSleep`
that gathers all block-related functions from `beforeSleep`

The order inside `blockedBeforeSleep` is different: now `handleClientsBlockedOnKeys`
(which may unblock clients) is called before `processUnblockedClients` (which handles
unblocked clients).
It makes sense to have this order.

There are no visible effects of the wrong ordering, except some cleanups of the now-unblocked
client would have  happen in the next `beforeSleep` (will now happen in the current one)

The reason we even got into it is because i triggers an assertion in logresreq.c (breaking
the assumption that `unblockClient` is called **before** actually flushing the reply to the socket):
`handleClientsBlockedOnKeys` is called, then it calls `moduleUnblockClientOnKey`, which calls
`moduleUnblockClient`, which adds the client to `moduleUnblockedClients` back to `beforeSleep`,
we call `handleClientsWithPendingWritesUsingThreads`, it writes the data of buf to the client, so
`client->bufpos` became 0
On the next `beforeSleep`, we call `moduleHandleBlockedClients`, which calls `unblockClient`,
which calls `reqresAppendResponse`, triggering the assert. (because the `bufpos` is 0) - see https://github.com/redis/redis/pull/12301#discussion_r1226386716
2023-06-22 23:15:16 +03:00
Oran Agra
8ad8f0f9d8
Fix broken protocol when PUBLISH emits local push inside MULTI (#12326)
When a connection that's subscribe to a channel emits PUBLISH inside MULTI-EXEC,
the push notification messes up the EXEC response.

e.g. MULTI, PING, PUSH foo bar, PING, EXEC
the EXEC's response will contain: PONG, {message foo bar}, 1. and the second PONG
will be delivered outside the EXEC's response.

Additionally, this PR changes the order of responses in case of a plain PUBLISH (when
the current client also subscribed to it), by delivering the push after the command's
response instead of before it.
This also affects modules calling RM_PublishMessage in a similar way, so that we don't
run the risk of getting that push mixed together with the module command's response.
2023-06-20 20:41:41 +03:00
judeng
93708c7f6a
use embedded string object and more efficient ll2string for long long value convert to string (#12250)
A value of type long long is always less than 21 bytes when convert to a
string, so always meets the conditions for using embedded string object
which can always get memory reduction and performance gain (less calls
to the heap allocator).
Additionally, for the conversion of longlong type to sds, we also use a faster
algorithm (the one in util.c instead of the one that used to be in sds.c). 

For the DECR command on 32-bit Redis, we get about a 5.7% performance
improvement. There will also be some performance gains for some commands
that heavily use sdscatfmt to convert numbers, such as INFO.

Co-authored-by: Oran Agra <oran@redislabs.com>
2023-06-20 15:14:44 +03:00
Binbin
b510624978
Optimize PSUBSCRIBE and PUNSUBSCRIBE from O(N*M) to O(N) (#12298)
In the original implementation, the time complexity of the commands
is actually O(N*M), where N is the number of patterns the client is
already subscribed and M is the number of patterns to subscribe to.
The docs are all wrong about this.

Specifically, because the original client->pubsub_patterns is a list,
so we need to do listSearchKey which is O(N). In this PR, we change it
to a dict, so the search becomes O(1).

At the same time, both pubsub_channels and pubsubshard_channels are dicts.
Changing pubsub_patterns to a dictionary improves the readability and
maintainability of the code.
2023-06-19 16:31:18 +03:00
Wen Hui
070453eef3
Cluster human readable nodename feature (#9564)
This PR adds a human readable name to a node in clusters that are visible as part of error logs. This is useful so that admins and operators of Redis cluster have better visibility into failures without having to cross-reference the generated ID with some logical identifier (such as pod-ID or EC2 instance ID). This is mentioned in #8948. Specific nodenames can be set by using the variable cluster-announce-human-nodename. The nodename is gossiped using the clusterbus extension in #9530.

Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
2023-06-17 21:16:51 -07:00
Oran Agra
c2f1815bcb
Avoid trying to trim string loaded from RDB file. (#12241)
This is a followup fix for #11817
2023-05-30 10:43:25 +03:00
Oran Agra
2764dc3768
Optimize MSETNX to avoid double lookup (#11944)
This is a redo of #11594 which got reverted in #11940
It improves performance by avoiding double lookup of the the key.
2023-05-28 10:58:29 +03:00
judeng
d71478a889
postpone the initialization of oject's lru&lfu until it is added to the db as a value object (#11626)
This pr can get two performance benefits:
1. Stop redundant initialization when most robj objects are created
2. LRU_CLOCK will no longer be called in io threads, so we can avoid the `atomicGet`

Another code optimization:
deleted the redundant judgment in dbSetValue, no matter in LFU or LRU, the lru field inold
robj is always the freshest (it is always updated in lookupkey), so we don't need to judge if in LFU
2023-05-24 09:40:11 +03:00
zhaozhao.zz
07ea220419
add a new loglevel 'nothing' to disable logging (#12133)
Users can record logs of different levels by setting the `loglevel`.
However, sometimes there are many logs even at the warning level,
which can affect the performance of Redis.

For example, when a user accesses the tls-port using a non-encrypted link,
Redis will log lots of "# Error accepting a client connection: ...".

We can provide the ability to disable logging so that users can temporarily turn
off logging and turn it back on after the problem is resolved.
2023-05-23 18:30:44 +03:00
Binbin
006ab26c37
Optimize HRANDFIELD and ZRANDMEMBER case 3 when listpack encoded (#12205)
Optimized HRANDFIELD and ZRANDMEMBER commands as in #8444,
CASE 3 under listpack encoding. Boost optimization to CASE 2.5. 

CASE 2.5 listpack only. Sampling unique elements, in non-random order.
Listpack encoded hashes / zsets are meant to be relatively small, so
HRANDFIELD_SUB_STRATEGY_MUL / ZRANDMEMBER_SUB_STRATEGY_MUL
isn't necessary and we rather not make copies of the entries. Instead, we
emit them directly to the output buffer.

Simple benchmarks shows it provides some 400% improvement in HRANDFIELD
and ZRANGESTORE both in CASE 3.

Unrelated changes: remove useless setTypeRandomElements and fix a typo.
2023-05-22 15:48:32 +03:00
YaacovHazan
49845c24b1
fix CMD_CALL_FROM_MODULE value (#12195)
CMD_CALL_FROM_MODULE overlapped CMD_CALL_REPROCESSING,
changing its value to (1<<3)
2023-05-18 09:23:21 +03:00
Chen Tianjie
29ca87955e
Add basic eventloop latency measurement. (#11963)
The measured latency(duration) includes the list below, which can be shown by `INFO STATS`.
```
eventloop_cycles  // ever increasing counter
eventloop_duration_sum // cumulative duration of eventloop in microseconds
eventloop_duration_cmd_sum  // cumulative duration of executing commands in microseconds
instantaneous_eventloop_cycles_per_sec  // average eventloop count per second in recent 1.6s
instantaneous_eventloop_duration_usec  // average single eventloop duration in recent 1.6s
```

Also added some experimental metrics, which are shown only when `INFO DEBUG` is called.
This section isn't included in the default INFO, or even in `INFO ALL` and the fields in this section
can change in the future without considering backwards compatibility.
```
eventloop_duration_aof_sum  // cumulative duration of writing AOF
eventloop_duration_cron_sum  // cumulative duration cron jobs (serverCron, beforeSleep excluding IO and AOF)
eventloop_cmd_per_cycle_max  // max number of commands executed in one eventloop
eventloop_duration_max  // max duration of one eventloop
```

All of these are being reset by CONFIG RESETSTAT
2023-05-12 20:13:15 +03:00
Madelyn Olson
a129a60181
Minor performance improvement to SADD and HSET (#12019)
For sets and hashes that will eventually be stored as the hash encoding, it's much faster to immediately convert them to their hash encoding and then perform the insertions since it avoids the O(N) search and frequent reallocations. This change checks the number of arguments in the incoming command, and converts the data-structure if the number of new entries exceeds the listpack-max-entries configuration. This can cause us to over-allocate memory if their are duplicate entries in the input, which is unexpected.

unstable

Summary:
  throughput summary: 805.54 requests per second
  latency summary (msec):
          avg       min       p50       p95       p99       max
       61.908    25.680    68.351    73.279    75.967    79.295
hset-improvement

Summary:
  throughput summary: 4701.46 requests per second
  latency summary (msec):
          avg       min       p50       p95       p99       max
       10.546     0.832    11.959    12.471    13.119    14.967
2023-05-08 16:11:20 -07:00
Madelyn Olson
5e3be1be09
Remove prototypes with empty declarations (#12020)
Technically declaring a prototype with an empty declaration has been deprecated since the early days of C, but we never got a warning for it. C2x will apparently be introducing a breaking change if you are using this type of declarator, so Clang 15 has started issuing a warning with -pedantic. Although not apparently a problem for any of the compiler we build on, if feels like the right thing is to properly adhere to the C standard and use (void).
2023-05-02 17:31:32 -07:00
YaacovHazan
74959a0f73
Misuse of bool in redis (#12077)
We currently do not allow the use of bool type in redis project.

We didn't catch it in script.c because we included hdr_histogram.h in server.h

Removing it (but still having it in some c files) reducing
the chance to miss the usage of bool type in the future and catch it
in compiling stage.

It also removes the dependency on hdr_histogram for every unit
that includes server.h
2023-04-20 09:49:19 +03:00
Binbin
bfec2d700b
Add RM_ReplyWithErrorFormat that can support format (#11923)
* Add RM_ReplyWithErrorFormat that can support format

Reply with the error create from a printf format and arguments.

If the error code is already passed in the string 'fmt', the error
code provided is used, otherwise the string "-ERR " for the generic
error code is automatically added.

The usage is, for example:
    RedisModule_ReplyWithErrorFormat(ctx, "An error: %s", "foo");
    RedisModule_ReplyWithErrorFormat(ctx, "-WRONGTYPE Wrong Type: %s", "foo");

The function always returns REDISMODULE_OK.
2023-04-12 10:11:29 +03:00
Subhi Al Hasan
74b29985ce
check for known-slave in sentinel rewrite config (#11775)
Fix the following config file error

```
*** FATAL CONFIG FILE ERROR (Redis 6.2.7) ***
Reading the configuration file, at line 152
>>> 'sentinel known-replica XXXX 127.0.0.1 5001'
Duplicate hostname and port for replica.
```


that is happening when a user uses the legacy key "known-slave" in
the config file and a config rewrite occurs. The config rewrite logic won't
replace the old  line "sentinel known-slave XXXX 127.0.0.1 5001" and
would add a new line with "sentinel known-replica XXXX 127.0.0.1 5001"
which results in the error above "Duplicate hostname and port for replica."

example:

Current sentinal.conf
```
...

sentinel known-slave XXXX 127.0.0.1 5001
sentinel example-random-option X
...
```
after the config rewrite logic runs:
```
....
sentinel known-slave XXXX 127.0.0.1 5001
sentinel example-random-option X

# Generated by CONFIG REWRITE
sentinel known-replica XXXX 127.0.0.1 5001
```

This bug only exists in Redis versions >=6.2 because prior to that it was hidden
by the effects of this bug https://github.com/redis/redis/issues/5388 that was fixed
in https://github.com/redis/redis/pull/8271 and was released in versions >=6.2
2023-04-04 11:53:57 +03:00
Jason Elbaum
1f76bb17dd
Reimplement cli hints based on command arg docs (#10515)
Now that the command argument specs are available at runtime (#9656), this PR addresses
#8084 by implementing a complete solution for command-line hinting in `redis-cli`.

It correctly handles nearly every case in Redis's complex command argument definitions, including
`BLOCK` and `ONEOF` arguments, reordering of optional arguments, and repeated arguments
(even when followed by mandatory arguments). It also validates numerically-typed arguments.
It may not correctly handle all possible combinations of those, but overall it is quite robust.

Arguments are only matched after the space bar is typed, so partial word matching is not
supported - that proved to be more confusing than helpful. When the user's current input
cannot be matched against the argument specs, hinting is disabled.

Partial support has been implemented for legacy (pre-7.0) servers that do not support
`COMMAND DOCS`, by falling back to a statically-compiled command argument table.
On startup, if the server does not support `COMMAND DOCS`, `redis-cli` will now issue
an `INFO SERVER` command to retrieve the server version (unless `HELLO` has already
been sent, in which case the server version will be extracted from the reply to `HELLO`).
The server version will be used to filter the commands and arguments in the command table,
removing those not supported by that version of the server. However, the static table only
includes core Redis commands, so with a legacy server hinting will not be supported for
module commands. The auto generated help.h and the scripts that generates it are gone.

Command and argument tables for the server and CLI use different structs, due primarily
to the need to support different runtime data. In order to generate code for both, macros
have been added to `commands.def` (previously `commands.c`) to make it possible to
configure the code generation differently for different use cases (one linked with redis-server,
and one with redis-cli).

Also adding a basic testing framework for the command hints based on new (undocumented)
command line options to `redis-cli`: `--test_hint 'INPUT'` prints out the command-line hint for
a given input string, and `--test_hint_file <filename>` runs a suite of test cases for the hinting
mechanism. The test suite is in `tests/assets/test_cli_hint_suite.txt`, and it is run from
`tests/integration/redis-cli.tcl`.

Co-authored-by: Oran Agra <oran@redislabs.com>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2023-03-30 19:03:56 +03:00
Binbin
cb17178658
Fix fork done handler wrongly update fsync metrics and enhance AOF_ FSYNC_ALWAYS (#11973)
This PR fix several unrelated bugs that were discovered by the same set of tests
(WAITAOF tests in #11713), could make the `WAITAOF` test hang. 

The change in `backgroundRewriteDoneHandler` is about MP-AOF.
That leftover / old code assumes that we started a new AOF file just now
(when we have a new base into which we're gonna incrementally write), but
the fact is that with MP-AOF, the fork done handler doesn't really affect the
incremental file being maintained by the parent process, there's no reason to
re-issue `SELECT`, and no reason to update any of the fsync variables in that flow.
This should have been deleted with MP-AOF (introduced in #9788, 7.0).
The damage is that the update to `aof_fsync_offset` will cause us to miss an fsync
in `flushAppendOnlyFile`, that happens if we stop write commands in `AOF_FSYNC_EVERYSEC`
while an AOFRW is in progress. This caused a new `WAITAOF` test to sometime hang forever.

Also because of MP-AOF, we needed to change `aof_fsync_offset` to `aof_last_incr_fsync_offset`
and match it to `aof_last_incr_size` in `flushAppendOnlyFile`. This is because in the past we compared
`aof_fsync_offset` and `aof_current_size`, but with MP-AOF it could be the total AOF file will be
smaller after AOFRW, and the (already existing) incr file still has data that needs to be fsynced.

The change in `flushAppendOnlyFile`, about the `AOF_FSYNC_ALWAYS`, it is follow #6053
(the details is in #5985), we also check `AOF_FSYNC_ALWAYS` to handle a case where
appendfsync is changed from everysec to always while there is data that's written but not yet fsynced.
2023-03-29 15:17:05 +03:00
Igor Malinovskiy
c3b9f2fbd9
Allow clients to report name and version (#11758)
This PR allows clients to send information about the client library to redis
to be displayed in CLIENT LIST and CLIENT INFO.

Currently supports:
`CLIENT [lib-name | lib-ver] <value>`
Client libraries are expected to pipeline these right after AUTH, and ignore
the failure in case they're talking to an older version of redis.

These will be shown in CLIENT LIST and CLIENT INFO as:
* `lib-name` - meant to hold the client library name.
* `lib-ver` - meant to hold the client library version.

The values cannot contain spaces, newlines and any wild ASCII characters,
but all other normal chars are accepted, e.g `.`, `=` etc (same as CLIENT NAME).

The RESET command does NOT clear these, but they can be cleared to the
default by sending a command with a blank string.

Co-authored-by: Oran Agra <oran@redislabs.com>
2023-03-22 08:17:20 +02:00
Roshan Khatri
6948dacaf6
Module commands to have ACL categories. (#11708)
This allows modules to register commands to existing ACL categories and blocks the creation of [sub]commands, datatypes and registering the configs outside of the OnLoad function.

For allowing modules to register commands to existing ACL categories,
This PR implements a new API int RM_SetCommandACLCategories() which takes a pointer to a RedisModuleCommand and a C string aclflags containing the set of space separated ACL categories.
Example, 'write slow' marks the command as part of the write and slow ACL categories.

The C string aclflags is tokenized by implementing a helper function categoryFlagsFromString(). Theses tokens are matched and the corresponding ACL categories flags are set by a helper function matchAclCategoriesFlags. The helper function categoryFlagsFromString() returns the corresponding categories_flags or returns -1 if some token not processed correctly.

If the module contains commands which are registered to existing ACL categories, the number of [sub]commands are tracked by num_commands_with_acl_categories in struct RedisModule. Further, the allowed command bit-map of the existing users are recomputed from the command_rules list, by implementing a function called ACLRecomputeCommandBitsFromCommandRulesAllUsers() for the existing users to have access to the module commands on runtime.

## Breaking change
This change requires that registering commands and subcommands only occur during a modules "OnLoad" function, in order to allow efficient recompilation of ACL bits. We also chose to block registering configs and types, since we believe it's only valid for those to be created during onLoad. We check for this onload flag in struct RedisModule to check if the call is made from the OnLoad function.

Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
2023-03-21 10:07:11 -07:00
Meir Shpilraien (Spielrein)
d0da0a6a3f
Support for RM_Call on blocking commands (#11568)
Allow running blocking commands from within a module using `RM_Call`.

Today, when `RM_Call` is used, the fake client that is used to run command
is marked with `CLIENT_DENY_BLOCKING` flag. This flag tells the command
that it is not allowed to block the client and in case it needs to block, it must
fallback to some alternative (either return error or perform some default behavior).
For example, `BLPOP` fallback to simple `LPOP` if it is not allowed to block.

All the commands must respect the `CLIENT_DENY_BLOCKING` flag (including
module commands). When the command invocation finished, Redis asserts that
the client was not blocked.

This PR introduces the ability to call blocking command using `RM_Call` by
passing a callback that will be called when the client will get unblocked.
In order to do that, the user must explicitly say that he allow to perform blocking
command by passing a new format specifier argument, `K`, to the `RM_Call`
function. This new flag will tell Redis that it is allow to run blocking command
and block the client. In case the command got blocked, Redis will return a new
type of call reply (`REDISMODULE_REPLY_PROMISE`). This call reply indicates
that the command got blocked and the user can set the on_unblocked handler using
`RM_CallReplyPromiseSetUnblockHandler`.

When clients gets unblocked, it eventually reaches `processUnblockedClients` function.
This is where we check if the client is a fake module client and if it is, we call the unblock
callback instead of performing the usual unblock operations.

**Notice**: `RM_CallReplyPromiseSetUnblockHandler` must be called atomically
along side the command invocation (without releasing the Redis lock in between).
In addition, unlike other CallReply types, the promise call reply must be released
by the module when the Redis GIL is acquired.

The module can abort the execution on the blocking command (if it was not yet
executed) using `RM_CallReplyPromiseAbort`. the API will return `REDISMODULE_OK`
on success and `REDISMODULE_ERR` if the operation is already executed.
**Notice** that in case of misbehave module, Abort might finished successfully but the
operation will not really be aborted. This can only happened if the module do not respect
the disconnect callback of the blocked client. 
For pure Redis commands this can not happened.

### Atomicity Guarantees

The API promise that the unblock handler will run atomically as an execution unit.
This means that all the operation performed on the unblock handler will be wrapped
with a multi exec transaction when replicated to the replica and AOF.
The API **do not** grantee any other atomicity properties such as when the unblock
handler will be called. This gives us the flexibility to strengthen the grantees (or not)
in the future if we will decide that we need a better guarantees.

That said, the implementation **does** provide a better guarantees when performing
pure Redis blocking command like `BLPOP`. In this case the unblock handler will run
atomically with the operation that got unblocked (for example, in case of `BLPOP`, the
unblock handler will run atomically with the `LPOP` operation that run when the command
got unblocked). This is an implementation detail that might be change in the future and the
module writer should not count on that.

### Calling blocking commands while running on script mode (`S`)

`RM_Call` script mode (`S`) was introduced on #0372. It is used for usecases where the
command that was invoked on `RM_Call` comes from a user input and we want to make
sure the user will not run dangerous commands like `shutdown`. Some command, such
as `BLPOP`, are marked with `NO_SCRIPT` flag, which means they will not be allowed on
script mode. Those commands are marked with  `NO_SCRIPT` just because they are
blocking commands and not because they are dangerous. Now that we can run blocking
commands on RM_Call, there is no real reason not to allow such commands on script mode.

The underline problem is that the `NO_SCRIPT` flag is abused to also mark some of the
blocking commands (notice that those commands know not to block the client if it is not
allowed to do so, and have a fallback logic to such cases. So even if those commands
were not marked with `NO_SCRIPT` flag, it would not harm Redis, and today we can
already run those commands within multi exec).

In addition, not all blocking commands are marked with `NO_SCRIPT` flag, for example
`blmpop` are not marked and can run from within a script.

Those facts shows that there are some ambiguity about the meaning of the `NO_SCRIPT`
flag, and its not fully clear where it should be use.

The PR suggest that blocking commands should not be marked with `NO_SCRIPT` flag,
those commands should handle `CLIENT_DENY_BLOCKING` flag and only block when
it's safe (like they already does today). To achieve that, the PR removes the `NO_SCRIPT`
flag from the following commands:
* `blmove`
* `blpop`
* `brpop`
* `brpoplpush`
* `bzpopmax`
* `bzpopmin`
* `wait`

This might be considered a breaking change as now, on scripts, instead of getting
`command is not allowed from script` error, the user will get some fallback behavior
base on the command implementation. That said, the change matches the behavior
of scripts and multi exec with respect to those commands and allow running them on
`RM_Call` even when script mode is used.

### Additional RedisModule API and changes

* `RM_BlockClientSetPrivateData` - Set private data on the blocked client without the
  need to unblock the client. This allows up to set the promise CallReply as the private
  data of the blocked client and abort it if the client gets disconnected.
* `RM_BlockClientGetPrivateData` - Return the current private data set on a blocked client.
  We need it so we will have access to this private data on the disconnect callback.
* On RM_Call, the returned reply will be added to the auto memory context only if auto
  memory is enabled, this allows us to keep the call reply for longer time then the context
  lifetime and does not force an unneeded borrow relationship between the CallReply and
  the RedisModuleContext.
2023-03-16 14:04:31 +02:00
Binbin
0b159b34ea
Bump codespell to 2.2.4, fix typos and outupdated comments (#11911)
Fix some seen typos and wrong comments.
2023-03-16 08:50:32 +02:00
KarthikSubbarao
f8a5a4f70c
Custom authentication for Modules (#11659)
This change adds new module callbacks that can override the default password based authentication associated with ACLs. With this, Modules can register auth callbacks through which they can implement their own Authentication logic. When `AUTH` and `HELLO AUTH ...` commands are used, Module based authentication is attempted and then normal password based authentication is attempted if needed.
The new Module APIs added in this PR are - `RM_RegisterCustomAuthCallback` and `RM_BlockClientOnAuth` and `RedisModule_ACLAddLogEntryByUserName `.

Module based authentication will be attempted for all Redis users (created through the ACL SETUSER cmd or through Module APIs) even if the Redis user does not exist at the time of the command. This gives a chance for the Module to create the RedisModule user and then authenticate via the RedisModule API - from the custom auth callback.

For the AUTH command, we will support both variations - `AUTH <username> <password>` and `AUTH <password>`. In case of the `AUTH <password>` variation, the custom auth callbacks are triggered with “default” as the username and password as what is provided.


### RedisModule_RegisterCustomAuthCallback
```
void RM_RegisterCustomAuthCallback(RedisModuleCtx *ctx, RedisModuleCustomAuthCallback cb) {
```
This API registers a callback to execute to prior to normal password based authentication. Multiple callbacks can be registered across different modules. These callbacks are responsible for either handling the authentication, each authenticating the user or explicitly denying, or deferring it to other authentication mechanisms. Callbacks are triggered in the order they were registered. When a Module is unloaded, all the auth callbacks registered by it are unregistered. The callbacks are attempted, in the order of most recently registered callbacks, when the AUTH/HELLO (with AUTH field is provided) commands are called. The callbacks will be called with a module context along with a username and a password, and are expected to take one of the following actions:

 (1) Authenticate - Use the RM_Authenticate* API successfully and return `REDISMODULE_AUTH_HANDLED`. This will immediately end the auth chain as successful and add the OK reply.
(2) Block a client on authentication - Use the `RM_BlockClientOnAuth` API and return `REDISMODULE_AUTH_HANDLED`. Here, the client will be blocked until the `RM_UnblockClient `API is used which will trigger the auth reply callback (provided earlier through the `RM_BlockClientOnAuth`). In this reply callback, the Module should authenticate, deny or skip handling authentication.
(3) Deny Authentication - Return `REDISMODULE_AUTH_HANDLED` without authenticating or blocking the client. Optionally, `err` can be set to a custom error message. This will immediately end the auth chain as unsuccessful and add the ERR reply.
(4) Skip handling Authentication - Return `REDISMODULE_AUTH_NOT_HANDLED` without blocking the client. This will allow the engine to attempt the next custom auth callback.

If none of the callbacks authenticate or deny auth, then password based auth is attempted and will authenticate or add failure logs and reply to the clients accordingly.

### RedisModule_BlockClientOnAuth
```
RedisModuleBlockedClient *RM_BlockClientOnAuth(RedisModuleCtx *ctx, RedisModuleCustomAuthCallback reply_callback,
                                               void (*free_privdata)(RedisModuleCtx*,void*))
```
This API can only be used from a Module from the custom auth callback. If a client is not in the middle of custom module based authentication, ERROR is returned. Otherwise, the client is blocked and the `RedisModule_BlockedClient` is returned similar to the `RedisModule_BlockClient` API.

### RedisModule_ACLAddLogEntryByUserName
```
int RM_ACLAddLogEntryByUserName(RedisModuleCtx *ctx, RedisModuleString *username, RedisModuleString *object, RedisModuleACLLogEntryReason reason)
```
Adds a new entry in the ACL log with the `username` RedisModuleString provided. This simplifies the Module usage because now, developers do not need to create a Module User just to add an error ACL Log entry. Aside from accepting username (RedisModuleString) instead of a RedisModuleUser, it is the same as the existing `RedisModule_ACLAddLogEntry` API.


### Breaking changes
- HELLO command - Clients can now only set the client name and RESP protocol from the `HELLO` command if they are authenticated. Also, we now finish command arg validation first and return early with a ERR reply if any arg is invalid. This is to avoid mutating the client name / RESP from a command that would have failed on invalid arguments.

### Notable behaviors
- Module unblocking - Now, we will not allow Modules to block the client from inside the context of a reply callback (triggered from the Module unblock flow `moduleHandleBlockedClients`).

---------

Co-authored-by: Madelyn Olson <34459052+madolson@users.noreply.github.com>
2023-03-15 15:18:42 -07:00
Binbin
58285a6e92
Fix WAITAOF mix-use last_offset and last_numreplicas (#11922)
There be a situation that satisfies WAIT, and then wrongly unblock
WAITAOF because we mix-use last_offset and last_numreplicas.

We update last_offset and last_numreplicas only when the condition
matches. i.e. output of either replicationCountAOFAcksByOffset or
replicationCountAcksByOffset is right.

In this case, we need to have separate last_ variables for each of
them. Added a last_aof_offset and last_aof_numreplicas for WAITAOF.

WAITAOF was added in #11713. Found while coding #11917.
A Test was added to validate that case.
2023-03-15 18:16:16 +02:00
Slava Koyfman
9344f654c6
Implementing the WAITAOF command (issue #10505) (#11713)
Implementing the WAITAOF functionality which would allow the user to
block until a specified number of Redises have fsynced all previous write
commands to the AOF.

Syntax: `WAITAOF <num_local> <num_replicas> <timeout>`
Response: Array containing two elements: num_local, num_replicas
num_local is always either 0 or 1 representing the local AOF on the master.
num_replicas is the number of replicas that acknowledged the a replication
offset of the last write being fsynced to the AOF.

Returns an error when called on replicas, or when called with non-zero
num_local on a master with AOF disabled, in all other cases the response
just contains number of fsync copies.

Main changes:
* Added code to keep track of replication offsets that are confirmed to have
  been fsynced to disk.
* Keep advancing master_repl_offset even when replication is disabled (and
  there's no replication backlog, only if there's an AOF enabled).
  This way we can use this command and it's mechanisms even when replication
  is disabled.
* Extend REPLCONF ACK to `REPLCONF ACK <ofs> FACK <ofs>`, the FACK
  will be appended only if there's an AOF on the replica, and already ignored on
  old masters (thus backwards compatible)
* WAIT now no longer wait for the replication offset after your last command, but
  rather the replication offset after your last write (or read command that caused
  propagation, e.g. lazy expiry).

Unrelated changes:
* WAIT command respects CLIENT_DENY_BLOCKING (not just CLIENT_MULTI)

Implementation details:
* Add an atomic var named `fsynced_reploff_pending` that's updated
  (usually by the bio thread) and later copied to the main `fsynced_reploff`
  variable (only if the AOF base file exists).
  I.e. during the initial AOF rewrite it will not be used as the fsynced offset
  since the AOF base is still missing.
* Replace close+fsync bio job with new BIO_CLOSE_AOF (AOF specific)
  job that will also update fsync offset the field.
* Handle all AOF jobs (BIO_CLOSE_AOF, BIO_AOF_FSYNC) in the same bio
  worker thread, to impose ordering on their execution. This solves a
  race condition where a job could set `fsynced_reploff_pending` to a higher
  value than another pending fsync job, resulting in indicating an offset
  for which parts of the data have not yet actually been fsynced.
  Imposing an ordering on the jobs guarantees that fsync jobs are executed
  in increasing order of replication offset.
* Drain bio jobs when switching `appendfsync` to "always"
  This should prevent a write race between updates to `fsynced_reploff_pending`
  in the main thread (`flushAppendOnlyFile` when set to ALWAYS fsync), and
  those done in the bio thread.
* Drain the pending fsync when starting over a new AOF to avoid race conditions
  with the previous AOF offsets overriding the new one (e.g. after switching to
  replicate from a new master).
* Make sure to update the fsynced offset at the end of the initial AOF rewrite.
  a must in case there are no additional writes that trigger a periodic fsync,
  specifically for a replica that does a full sync.

Limitations:
It is possible to write a module and a Lua script that propagate to the AOF and doesn't
propagate to the replication stream. see REDISMODULE_ARGV_NO_REPLICAS and luaRedisSetReplCommand.
These features are incompatible with the WAITAOF command, and can result
in two bad cases. The scenario is that the user executes command that only
propagates to AOF, and then immediately
issues a WAITAOF, and there's no further writes on the replication stream after that.
1. if the the last thing that happened on the replication stream is a PING
  (which increased the replication offset but won't trigger an fsync on the replica),
  then the client would hang forever (will wait for an fack that the replica will never
  send sine it doesn't trigger any fsyncs).
2. if the last thing that happened is a write command that got propagated properly,
  then WAITAOF will be released immediately, without waiting for an fsync (since
  the offset didn't change)

Refactoring:
* Plumbing to allow bio worker to handle multiple job types
  This introduces infrastructure necessary to allow BIO workers to
  not have a 1-1 mapping of worker to job-type. This allows in the
  future to assign multiple job types to a single worker, either as
  a performance/resource optimization, or as a way of enforcing
  ordering between specific classes of jobs.

Co-authored-by: Oran Agra <oran@redislabs.com>
2023-03-14 20:26:21 +02:00
Binbin
7997874f4d
Fix tail->repl_offset update in feedReplicationBuffer (#11905)
In #11666, we added a while loop and will split a big reply
node to multiple nodes. The update of tail->repl_offset may
be wrong. Like before #11666, we would have created at most
one new reply node, and now we will create multiple nodes if
it is a big reply node.

Now we are creating more than one node, and the tail->repl_offset
of all the nodes except the last one are incorrect. Because we
update master_repl_offset at the beginning, and then use it to
update the tail->repl_offset. This would have lead to an assertion
during PSYNC, a test was added to validate that case.

Besides that, the calculation of size was adjusted to fix
tests that failed due to a combination of a very low backlog size,
and some thresholds of that get violated because of the relatively
high overhead of replBufBlock. So now if the backlog size / 16 is too
small, we'll take PROTO_REPLY_CHUNK_BYTES instead.

Co-authored-by: Oran Agra <oran@redislabs.com>
2023-03-13 16:12:29 +02:00
Binbin
416842e6c0
Fix the bug that CLIENT REPLY OFF|SKIP cannot receive push notifications (#11875)
This bug seems to be there forever, CLIENT REPLY OFF|SKIP will
mark the client with CLIENT_REPLY_OFF or CLIENT_REPLY_SKIP flags.
With these flags, prepareClientToWrite called by addReply* will
return C_ERR directly. So the client can't receive the Pub/Sub
messages and any other push notifications, e.g client side tracking.

In this PR, we adding a CLIENT_PUSHING flag, disables the reply
silencing flags. When adding push replies, set the flag, after the reply,
clear the flag. Then add the flag check in prepareClientToWrite.

Fixes #11874

Note, the SUBSCRIBE command response is a bit awkward,
see https://github.com/redis/redis-doc/pull/2327

Co-authored-by: Oran Agra <oran@redislabs.com>
2023-03-12 17:50:44 +02:00
guybe7
4ba47d2d21
Add reply_schema to command json files (internal for now) (#10273)
Work in progress towards implementing a reply schema as part of COMMAND DOCS, see #9845
Since ironing the details of the reply schema of each and every command can take a long time, we
would like to merge this PR when the infrastructure is ready, and let this mature in the unstable branch.
Meanwhile the changes of this PR are internal, they are part of the repo, but do not affect the produced build.

### Background
In #9656 we add a lot of information about Redis commands, but we are missing information about the replies

### Motivation
1. Documentation. This is the primary goal.
2. It should be possible, based on the output of COMMAND, to be able to generate client code in typed
  languages. In order to do that, we need Redis to tell us, in detail, what each reply looks like.
3. We would like to build a fuzzer that verifies the reply structure (for now we use the existing
  testsuite, see the "Testing" section)

### Schema
The idea is to supply some sort of schema for the various replies of each command.
The schema will describe the conceptual structure of the reply (for generated clients), as defined in RESP3.
Note that the reply structure itself may change, depending on the arguments (e.g. `XINFO STREAM`, with
and without the `FULL` modifier)
We decided to use the standard json-schema (see https://json-schema.org/) as the reply-schema.

Example for `BZPOPMIN`:
```
"reply_schema": {
    "oneOf": [
        {
            "description": "Timeout reached and no elements were popped.",
            "type": "null"
        },
        {
            "description": "The keyname, popped member, and its score.",
            "type": "array",
            "minItems": 3,
            "maxItems": 3,
            "items": [
                {
                    "description": "Keyname",
                    "type": "string"
                },
                {
                    "description": "Member",
                    "type": "string"
                },
                {
                    "description": "Score",
                    "type": "number"
                }
            ]
        }
    ]
}
```

#### Notes
1.  It is ok that some commands' reply structure depends on the arguments and it's the caller's responsibility
  to know which is the relevant one. this comes after looking at other request-reply systems like OpenAPI,
  where the reply schema can also be oneOf and the caller is responsible to know which schema is the relevant one.
2. The reply schemas will describe RESP3 replies only. even though RESP3 is structured, we want to use reply
  schema for documentation (and possibly to create a fuzzer that validates the replies)
3. For documentation, the description field will include an explanation of the scenario in which the reply is sent,
  including any relation to arguments. for example, for `ZRANGE`'s two schemas we will need to state that one
  is with `WITHSCORES` and the other is without.
4. For documentation, there will be another optional field "notes" in which we will add a short description of
  the representation in RESP2, in case it's not trivial (RESP3's `ZRANGE`'s nested array vs. RESP2's flat
  array, for example)

Given the above:
1. We can generate the "return" section of all commands in [redis-doc](https://redis.io/commands/)
  (given that "description" and "notes" are comprehensive enough)
2. We can generate a client in a strongly typed language (but the return type could be a conceptual
  `union` and the caller needs to know which schema is relevant). see the section below for RESP2 support.
3. We can create a fuzzer for RESP3.

### Limitations (because we are using the standard json-schema)
The problem is that Redis' replies are more diverse than what the json format allows. This means that,
when we convert the reply to a json (in order to validate the schema against it), we lose information (see
the "Testing" section below).
The other option would have been to extend the standard json-schema (and json format) to include stuff
like sets, bulk-strings, error-string, etc. but that would mean also extending the schema-validator - and that
seemed like too much work, so we decided to compromise.

Examples:
1. We cannot tell the difference between an "array" and a "set"
2. We cannot tell the difference between simple-string and bulk-string
3. we cannot verify true uniqueness of items in commands like ZRANGE: json-schema doesn't cover the
  case of two identical members with different scores (e.g. `[["m1",6],["m1",7]]`) because `uniqueItems`
  compares (member,score) tuples and not just the member name. 

### Testing
This commit includes some changes inside Redis in order to verify the schemas (existing and future ones)
are indeed correct (i.e. describe the actual response of Redis).
To do that, we added a debugging feature to Redis that causes it to produce a log of all the commands
it executed and their replies.
For that, Redis needs to be compiled with `-DLOG_REQ_RES` and run with
`--reg-res-logfile <file> --client-default-resp 3` (the testsuite already does that if you run it with
`--log-req-res --force-resp3`)
You should run the testsuite with the above args (and `--dont-clean`) in order to make Redis generate
`.reqres` files (same dir as the `stdout` files) which contain request-response pairs.
These files are later on processed by `./utils/req-res-log-validator.py` which does:
1. Goes over req-res files, generated by redis-servers, spawned by the testsuite (see logreqres.c)
2. For each request-response pair, it validates the response against the request's reply_schema
  (obtained from the extended COMMAND DOCS)
5. In order to get good coverage of the Redis commands, and all their different replies, we chose to use
  the existing redis test suite, rather than attempt to write a fuzzer.

#### Notes about RESP2
1. We will not be able to use the testing tool to verify RESP2 replies (we are ok with that, it's time to
  accept RESP3 as the future RESP)
2. Since the majority of the test suite is using RESP2, and we want the server to reply with RESP3
  so that we can validate it, we will need to know how to convert the actual reply to the one expected.
   - number and boolean are always strings in RESP2 so the conversion is easy
   - objects (maps) are always a flat array in RESP2
   - others (nested array in RESP3's `ZRANGE` and others) will need some special per-command
     handling (so the client will not be totally auto-generated)

Example for ZRANGE:
```
"reply_schema": {
    "anyOf": [
        {
            "description": "A list of member elements",
            "type": "array",
            "uniqueItems": true,
            "items": {
                "type": "string"
            }
        },
        {
            "description": "Members and their scores. Returned in case `WITHSCORES` was used.",
            "notes": "In RESP2 this is returned as a flat array",
            "type": "array",
            "uniqueItems": true,
            "items": {
                "type": "array",
                "minItems": 2,
                "maxItems": 2,
                "items": [
                    {
                        "description": "Member",
                        "type": "string"
                    },
                    {
                        "description": "Score",
                        "type": "number"
                    }
                ]
            }
        }
    ]
}
```

### Other changes
1. Some tests that behave differently depending on the RESP are now being tested for both RESP,
  regardless of the special log-req-res mode ("Pub/Sub PING" for example)
2. Update the history field of CLIENT LIST
3. Added basic tests for commands that were not covered at all by the testsuite

### TODO

- [x] (maybe a different PR) add a "condition" field to anyOf/oneOf schemas that refers to args. e.g.
  when `SET` return NULL, the condition is `arguments.get||arguments.condition`, for `OK` the condition
  is `!arguments.get`, and for `string` the condition is `arguments.get` - https://github.com/redis/redis/issues/11896
- [x] (maybe a different PR) also run `runtest-cluster` in the req-res logging mode
- [x] add the new tests to GH actions (i.e. compile with `-DLOG_REQ_RES`, run the tests, and run the validator)
- [x] (maybe a different PR) figure out a way to warn about (sub)schemas that are uncovered by the output
  of the tests - https://github.com/redis/redis/issues/11897
- [x] (probably a separate PR) add all missing schemas
- [x] check why "SDOWN is triggered by misconfigured instance replying with errors" fails with --log-req-res
- [x] move the response transformers to their own file (run both regular, cluster, and sentinel tests - need to
  fight with the tcl including mechanism a bit)
- [x] issue: module API - https://github.com/redis/redis/issues/11898
- [x] (probably a separate PR): improve schemas: add `required` to `object`s - https://github.com/redis/redis/issues/11899

Co-authored-by: Ozan Tezcan <ozantezcan@gmail.com>
Co-authored-by: Hanna Fadida <hanna.fadida@redislabs.com>
Co-authored-by: Oran Agra <oran@redislabs.com>
Co-authored-by: Shaya Potter <shaya@redislabs.com>
2023-03-11 10:14:16 +02:00
uriyage
9d336ac398
Try to trim strings only when applicable (#11817)
As `sdsRemoveFreeSpace` have an impact on performance even if it is a no-op (see details at #11508). 
Only call the function when there is a possibility that the string contains free space.
* For strings coming from the network, it's only if they're bigger than PROTO_MBULK_BIG_ARG
* For strings coming from scripts, it's only if they're smaller than LUA_CMD_OBJCACHE_MAX_LEN
* For strings coming from modules, it could be anything.

Co-authored-by: Oran Agra <oran@redislabs.com>
Co-authored-by: sundb <sundbcn@gmail.com>
2023-02-28 19:38:58 +02:00
Chen Tianjie
897c3d522c
Add CLIENT NO-TOUCH for clients to run commands without affecting LRU/LFU of keys (#11483)
When no-touch mode is enabled, the client will not touch LRU/LFU of the
keys it accesses, except when executing command `TOUCH`.
This allows inspecting or modifying the key-space without affecting their eviction.

Changes:
- A command `CLIENT NO-TOUCH ON|OFF` to switch on and off this mode.
- A client flag `#define CLIENT_NOTOUCH (1ULL<<45)`, which can be shown
  with `CLIENT INFO`, by the letter "T" in the "flags" field.
- Clear `NO-TOUCH` flag in `clearClientConnectionState`, which is used by `RESET`
  command and resetting temp clients used by modules.
- Also clear `NO-EVICT` flag in `clearClientConnectionState`, this might have been an
  oversight, spotted by @madolson.
- A test using `DEBUG OBJECT` command to verify that LRU stat is not touched when
  no-touch mode is on.
 

Co-authored-by: chentianjie <chentianjie@alibaba-inc.com>
Co-authored-by: Madelyn Olson <34459052+madolson@users.noreply.github.com>
Co-authored-by: sundb <sundbcn@gmail.com>
2023-02-23 09:07:49 +02:00
Oran Agra
233abbbe03
Cleanup around script_caller, fix tracking of scripts and ACL logging for RM_Call (#11770)
* Make it clear that current_client is the root client that was called by
  external connection
* add executing_client which is the client that runs the current command
  (can be a module or a script)
* Remove script_caller that was used for commands that have CLIENT_SCRIPT
  to get the client that called the script. in most cases, that's the current_client,
  and in others (when being called from a module), it could be an intermediate
  client when we actually want the original one used by the external connection.

bugfixes:
* RM_Call with C flag should log ACL errors with the requested user rather than
  the one used by the original client, this also solves a crash when RM_Call is used
  with C flag from a detached thread safe context.
* addACLLogEntry would have logged info about the script_caller, but in case the
  script was issued by a module command we actually want the current_client. the
  exception is when RM_Call is called from a timer event, in which case we don't
  have a current_client.

behavior changes:
* client side tracking for scripts now tracks the keys that are read by the script
  instead of the keys that are declared by the caller for EVAL

other changes:
* Log both current_client and executing_client in the crash log.
* remove prepareLuaClient and resetLuaClient, being dead code that was forgotten.
* remove scriptTimeSnapshot and snapshot_time and instead add cmd_time_snapshot
  that serves all commands and is reset only when execution nesting starts.
* remove code to propagate CLIENT_FORCE_REPL from the executed command
  to the script caller since scripts aren't propagated anyway these days and anyway
  this flag wouldn't have had an effect since CLIENT_PREVENT_PROP is added by scriptResetRun.
* fix a module GIL violation issue in afterSleep that was introduced in #10300 (unreleased)
2023-02-16 08:07:35 +02:00
guybe7
fd82bccd0e
SCAN/RANDOMKEY and lazy-expire (#11788)
Starting from Redis 7.0 (#9890) we started wrapping everything a command
 propagates with MULTI/EXEC. The problem is that both SCAN and RANDOMKEY can
lazy-expire arbitrary keys (similar behavior to active-expire), and put DELs in a transaction.

Fix: When these commands are called without a parent exec-unit (e.g. not in EVAL or
MULTI) we avoid wrapping their DELs in a transaction (for the same reasons active-expire
and eviction avoids a transaction)

This PR adds a per-command flag that indicates that the command may touch arbitrary
keys (not the ones in the arguments), and uses that flag to avoid the MULTI-EXEC.
For now, this flag is internal, since we're considering other solutions for the future.

Note for cluster mode: if SCAN/RANDOMKEY is inside EVAL/MULTI it can still cause the
same situation (as it always did), but it won't cause a CROSSSLOT because replicas and AOF
do not perform slot checks.
The problem with the above is mainly for 3rd party ecosystem tools that propagate commands
from master to master, or feed an AOF file with redis-cli into a master.
This PR aims to fix the regression in redis 7.0, and we opened #11792 to try to handle the
bigger problem with lazy expire better for another release.
2023-02-14 09:33:21 +02:00
Meir Shpilraien (Spielrein)
5c3938d5cc
Match REDISMODULE_OPEN_KEY_* flags to LOOKUP_* flags (#11772)
The PR adds support for the following flags on RedisModule_OpenKey:

* REDISMODULE_OPEN_KEY_NONOTIFY - Don't trigger keyspace event on key misses.
* REDISMODULE_OPEN_KEY_NOSTATS - Don't update keyspace hits/misses counters.
* REDISMODULE_OPEN_KEY_NOEXPIRE - Avoid deleting lazy expired keys.
* REDISMODULE_OPEN_KEY_NOEFFECTS - Avoid any effects from fetching the key

In addition, added `RM_GetOpenKeyModesAll`, which returns the mask of all
supported OpenKey modes. This allows the module to check, in runtime, which
OpenKey modes are supported by the current Redis instance.
2023-02-09 14:59:05 +02:00
Viktor Söderqvist
2bbc89196a Move stat_active_defrag_hits increment to activeDefragAlloc
instead of passing it around to every defrag function
2023-01-11 10:25:20 +01:00
zhenwei pi
dec529f4be
Introduce .is_local method for connection layer (#11672)
Introduce .is_local method to connection, and implement for TCP/TLS/
Unix socket, also drop 'int islocalClient(client *c)'. Then we can
hide the detail into the specific connection types.
Uplayer tests a connection is local or not by abstract method only.

Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>

Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
2023-01-04 10:52:56 +02:00