Commit Graph

1611 Commits

Author SHA1 Message Date
Eduardo Semprebon
91d0c758e5
Replica keep serving data during repl-diskless-load=swapdb for better availability (#9323)
For diskless replication in swapdb mode, considering we already spend replica memory
having a backup of current db to restore in case of failure, we can have the following benefits
by instead swapping database only in case we succeeded in transferring db from master:

- Avoid `LOADING` response during failed and successful synchronization for cases where the
  replica is already up and running with data.
- Faster total time of diskless replication, because now we're moving from Transfer + Flush + Load
  time to Transfer + Load only. Flushing the tempDb is done asynchronously after swapping.
- This could be implemented also for disk replication with similar benefits if consumers are willing
  to spend the extra memory usage.

General notes:
- The concept of `backupDb` becomes `tempDb` for clarity.
- Async loading mode will only kick in if the replica is syncing from a master that has the same
  repl-id the one it had before. i.e. the data it's getting belongs to a different time of the same timeline. 
- New property in INFO: `async_loading` to differentiate from the blocking loading
- Slot to Key mapping is now a field of `redisDb` as it's more natural to access it from both server.db
  and the tempDb that is passed around.
- Because this is affecting replicas only, we assume that if they are not readonly and write commands
  during replication, they are lost after SYNC same way as before, but we're still denying CONFIG SET
  here anyways to avoid complications.

Considerations for review:
- We have many cases where server.loading flag is used and even though I tried my best, there may
  be cases where async_loading should be checked as well and cases where it shouldn't (would require
  very good understanding of whole code)
- Several places that had different behavior depending on the loading flag where actually meant to just
  handle commands coming from the AOF client differently than ones coming from real clients, changed
  to check CLIENT_ID_AOF instead.

**Additional for Release Notes**
- Bugfix - server.dirty was not incremented for any kind of diskless replication, as effect it wouldn't
  contribute on triggering next database SAVE
- New flag for RM_GetContextFlags module API: REDISMODULE_CTX_FLAGS_ASYNC_LOADING
- Deprecated RedisModuleEvent_ReplBackup. Starting from Redis 7.0, we don't fire this event.
  Instead, we have the new RedisModuleEvent_ReplAsyncLoad holding 3 sub-events: STARTED,
  ABORTED and COMPLETED.
- New module flag REDISMODULE_OPTIONS_HANDLE_REPL_ASYNC_LOAD for RedisModule_SetModuleOptions
  to allow modules to declare they support the diskless replication with async loading (when absent, we fall
  back to disk-based loading).

Co-authored-by: Eduardo Semprebon <edus@saxobank.com>
Co-authored-by: Oran Agra <oran@redislabs.com>
2021-11-04 10:46:50 +02:00
Itamar Haber
06dd202a05
Fixes LPOP/RPOP wrong replies when count is 0 (#9692)
Introduced in #8179, this fixes the command's replies in the 0 count edge case.
[BREAKING] changes the reply type when count is 0 to an empty array (instead of nil)
Moves LPOP ... 0 fast exit path after type check to reply with WRONGTYPE
2021-11-04 09:43:08 +02:00
menwen
ccf8a651f3
Retry when a blocked connection system call is interrupted by a signal (#9629)
When repl-diskless-load is enabled, the connection is set to the blocking state.
The connection may be interrupted by a signal during a system call.
This would have resulted in a disconnection and possibly a reconnection loop.

Co-authored-by: Oran Agra <oran@redislabs.com>
2021-11-04 09:09:28 +02:00
Oran Agra
d04f306931
Fix race condition in cluster test 22-replica-in-sync (#9721)
there was a chance that by the time the assertion is executed,
the replica already manages to reconnect.

now we make sure the replica is unable to re-connect to the master.

additionally, we wait for some gossip from the disconnected replica,
to see that it doesn't mess things up.

unrelated: fix a typo when trying to exhaust the backlog, one that
didn't have any harmful implications

Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
2021-11-04 08:44:18 +02:00
perryitay
f27083a4a8
Add support for list type to store elements larger than 4GB (#9357)
Redis lists are stored in quicklist, which is currently a linked list of ziplists.
Ziplists are limited to storing elements no larger than 4GB, so when bigger
items are added they're getting truncated.
This PR changes quicklists so that they're capable of storing large items
in quicklist nodes that are plain string buffers rather than ziplist.

As part of the PR there were few other changes in redis: 
1. new DEBUG sub-commands: 
   - QUICKLIST-PACKED-THRESHOLD - set the threshold of for the node type to
     be plan or ziplist. default (1GB)
   - QUICKLIST <key> - Shows low level info about the quicklist encoding of <key>
2. rdb format change:
   - A new type was added - RDB_TYPE_LIST_QUICKLIST_2 . 
   - container type (packed / plain) was added to the beginning of the rdb object
     (before the actual node list).
3. testing:
   - Tests that requires over 100MB will be by default skipped. a new flag was
     added to 'runtest' to run the large memory tests (not used by default)

Co-authored-by: sundb <sundbcn@gmail.com>
Co-authored-by: Oran Agra <oran@redislabs.com>
2021-11-03 20:47:18 +02:00
guybe7
f11a2d4dd7
Fix COMMAND GETKEYS on EVAL without keys (#9733)
Add new no-mandatory-keys flag to support COMMAND GETKEYS of commands
which have no mandatory keys.

In the past we would have got this error:
```
127.0.0.1:6379> command getkeys eval "return 1" 0
(error) ERR Invalid arguments specified for command
```
2021-11-03 14:38:26 +02:00
Oran Agra
d25dc08932
Solve issues with tracking test in external mode (#9726)
The issue was that setting maxmemory to used_memory and expecting
eviction is insufficient, since we need to take
mem_not_counted_for_evict into consideration.

This test got broken by #9166
2021-11-02 16:07:51 -07:00
Oran Agra
87321deb3f
attempt to fix tracking test issue with external tests due to lazy free (#9722)
The External tests started failing recently for unclear reason:
```
*** [err]: Tracking invalidation message of eviction keys should be before response in tests/unit/tracking.tcl
Expected '0' to be equal to 'invalidate volatile-key' (context: type eval line 21 cmd {assert_equal $res {invalidate volatile-key}} proc ::test)
```

I suspect the issue is that the used_memory sample is taken while a lazy free is still being processed.
2021-11-02 16:42:53 +02:00
menwen
d5ca72e38b
fix defrag test looking at the wrong latency metric (#9723)
the latency event was renamed in #7726, and the outcome was that the test was
ineffective (unable to measure the max latency, always seeing 0)
2021-11-02 15:52:56 +02:00
Binbin
58a1d16ff6
Fix timing issue in replication test (#9719)
So it looks like sampling set loglines [count_log_lines -2] was
executed too late, and the replication managed to complete before that.

```
*** [err]: diskless no replicas drop during rdb pipe in tests/integration/replication.tcl
log message of '"*Diskless rdb transfer, done reading from pipe, 2 replicas still up*"' not found in ./tests/tmp/server.6124.69/stdout after line: 52 till line: 52
```

Changes:
1. when we search the master log file, we start to search from before we sent the REPLICAOF
  command, to prevent a race in which the replication completed before we sampled the log line count.
2. we don't need to sample the replica loglines sine it's a fresh resplica that's just been started, so the message
  we're looking for is the first occurrence in the log, we can start search from 0.

Co-authored-by: Oran Agra <oran@redislabs.com>
2021-11-02 10:32:01 +02:00
Binbin
cea7809cea
Fix race condition in psync2-pingoff test (#9712)
Test failed on freebsd:
```
*** [err]: Make the old master a replica of the new one and check conditions in tests/integration/psync2-pingoff.tcl
Expected '162' to be equal to '176' (context: type eval line 18 cmd {assert_equal [status $R(0) master_repl_offset] [status $R(1) master_repl_offset]} proc ::test)
```

There are two possible race conditions in the test.

1. The code waits for sync_full to increment, and assumes that means the
master did the fork. But in fact there are cases the master will increment
that sync_full counter (after replica asks for sync), but will see that
there's already a fork running and will delay the fork creation.

In this case the INCR will be executed before the fork happens, so it'll
not be in the command stream. Solve that by waiting for `master_link_status: up`
on the replica before the INCR.

2. The repl-ping-replica-period is still high (1 second), so there's a chance the
master will send an additional PING between the two calls to INFO (the line that
fails is the one that samples INFO from both servers). So there's a chance one of
them will have an incremented offset due to PING and the other won't have it yet.

In theory we can wait for the repl_offset to match, but then we risk facing a
situation where that race will hide an offset mis-match. so instead, i think we
should just change repl-ping-replica-period to prevent further pings from being pushed.

Co-authored-by: Oran Agra <oran@redislabs.com>
2021-11-01 16:07:08 +02:00
Oran Agra
f1f3cceb50
fix valgrind issues with long double module test (#9709)
The module test in reply.tcl was introduced by #8521 but didn't run until recently (see #9639)
and then it started failing with valgrind.
This is because valgrind uses 64 bit long double (unlike most other platforms that have at least 80 bits)
But besides valgrind, the tests where also incompatible with ARM32, which also uses 64 bit long doubles.

We now use appropriate value to avoid issues with either valgrind or ARM32

In all the double tests, i use 3.141, which is safe since since addReplyDouble uses
`%.17Lg` which is able to represent this value without adding any digits due to precision loss. 

In the long double, since we use `%.17Lf` in ld2string, it preserves 17 significant
digits, rather than 17 digit after the decimal point (like in `%.17Lg`).
So to make these similar, i use value lower than 1 (no digits left of
the period)

Lastly, we have the same issue with TCL (no long doubles) so we read
raw protocol in that test.

Note that the only error before this fix (in both valgrind and ARM32 is this:
```
*** [err]: RM_ReplyWithLongDouble: a float reply in tests/unit/moduleapi/reply.tcl
Expected '3.141' to be equal to '3.14100000000000001' (context: type eval line 2 cmd {assert_equal 3.141 [r rw.longdouble 3.141]} proc ::test)
```
so the changes to debug.c and scripting.tcl aren't really needed, but i consider them a cleanup
(i.e. scripting.c validated a different constant than the one that's sent to it from debug.c).

Another unrelated change is to add the RESP version to the repeated tests in reply.tcl
2021-11-01 13:41:35 +02:00
Oran Agra
48d54265ce
Fix failing cluster tests (#9707)
Fix failures introduced by #9695 which was an attempt to solve failures introduced by #9679.
And alternative to #9703 (i didn't like the extra argument to kill_instance).

Reverting #9695.
Instead of stopping AOF on all terminations, stop it only on the two which need it.
Do it as part of the test rather than the infra (it was add that kill_instance used `R`
to communicate to the instance)

Note that the original purpose of these tests was to trigger a crash, but that upsets
valgrind so in redis 6.2 i changed it to use SIGTERM, so i now rename the tests
(remove "kill" and "crash").

Also add some colors to failures, and the word "FAILED" so that it's searchable.

And solve a semi-related race condition in 14-consistency-check.tcl
2021-10-31 19:22:21 +02:00
Yossi Gottlieb
f26e90be0c
Use 'gcc' instead of 'ld' to link test modules. (#9710)
This solves several problems in a more elegant way:

* No need to explicitly use `-lc` on x86_64 when building with `-m32`.
* Avoids issues with undefined floating point emulation funcs on ARM.
2021-10-31 16:25:57 +02:00
Binbin
033578839b
Fix multiple COUNT in LMPOP/BLMPOP/ZMPOP/BZMPOP (#9701)
The previous code did not check whether COUNT is set.
So we can use `lmpop 2 key1 key2 left count 1 count 2`.

This situation can occur in LMPOP/BLMPOP/ZMPOP/BZMPOP commands.
LMPOP/BLMPOP introduced in #9373, ZMPOP/BZMPOP introduced in #9484.
2021-10-31 16:10:29 +02:00
Wang Yuan
68886de085
Fix timing issue in replication buffer test (#9697)
Introduced in #9166
2021-10-29 08:04:12 +03:00
Oran Agra
22a778c880
fix failing cluster tests (#9695)
When stopping an instance in the cluster tests, disable appendonly first, so that SIGTERM won't be ignored.

Recently in #9679 i change the test infra to use SIGSEGV to kill servers that refuse
the SIGTERM rather than do SIGKILL directly.

This surfaced an issue that i've added in #7725 which changed SIGKILL to SIGTERM (to resolve valgrind issues).
So the current situation in the past months was that sometimes servers refused the
SIGTERM and waited 10 seconds for the SIGKILL, and this commit resolves that (faster termination).
2021-10-28 12:16:27 +03:00
Wen Hui
5fb4adba65
New Cluster Command: CLUSTER DELSLOTSRANGE and CLUSTER ADDSLOTSRANGE (#9445) 2021-10-26 21:44:33 -07:00
Wang Yuan
37dc2f13b4
Fix not waiting for data loading to complete in AOF tests (#9683)
Fix timing issue of a new test introduced in #9326
2021-10-26 14:08:09 +03:00
Oran Agra
37559ca79f
Fix race condition in lazy free test (#9682)
The first test exited before all the memory was reclaimed, so when the second test
sampled used_memory, it was too early.
2021-10-26 13:02:31 +03:00
Oran Agra
665e428435
Testsuite: attempt to find / avoid valgrind warnings of killed processes (#9679)
I recently started seeing a lot of empty valgrind reports in the daily CI.
i.e. prints showing valgrind header but no leak report, which causes the tests to fail
https://github.com/redis/redis/runs/3991335416?check_suite_focus=true

This commit change 2 things:
* first, considering valgrind is just slow, we used to give processes 60 seconds timeout on shutdown
  instead of 10 seconds we give normally. this commit changes that to 120.
* secondly, when we reach the timeout, we first try to use SIGSEGV so that maybe we'll get a stack
  trace indicating where redis is hang, and we only resort to SIGKILL if double that time passed.

note that if there are indeed hang processes, we will normally not see that in the non-valgrind runs,
since the tests didn't use to detect any failure in that case, and now they will since `crashlog_from_file`
is run after `kill_server`.
2021-10-26 08:34:30 +03:00
Wang Yuan
9ec3294b97
Add timestamp annotations in AOF (#9326)
Add timestamp annotation in AOF, one part of #9325.

Enabled with the new `aof-timestamp-enabled` config option.

Timestamp annotation format is "#TS:${timestamp}\r\n"."
TS" is short of timestamp and this method could save extra bytes in AOF.

We can use timestamp annotation for some special functions. 
- know the executing time of commands
- restore data to a specific point-in-time (by using redis-check-rdb to truncate the file)
2021-10-25 13:08:34 +03:00
Guy Korland
6cf6c36937
Replace deprecated REDISMODULE_POSTPONED_ARRAY_LEN in module tests and examples (#9677)
REDISMODULE_POSTPONED_ARRAY_LEN is deprecated, use REDISMODULE_POSTPONED_LEN instead
2021-10-25 12:00:43 +03:00
Shaya Potter
12ce2c3925
Add RM_ReplyWithBigNumber module API (#9639)
Let modules use additional type of RESP3 response (unused by redis so far)
Also fix tests that where introduced in #8521 but didn't actually run.

Co-authored-by: Oran Agra <oran@redislabs.com>
2021-10-25 11:31:20 +03:00
Wang Yuan
c1718f9d86
Replication backlog and replicas use one global shared replication buffer (#9166)
## Background
For redis master, one replica uses one copy of replication buffer, that is a big waste of memory,
more replicas more waste, and allocate/free memory for every reply list also cost much.
If we set client-output-buffer-limit small and write traffic is heavy, master may disconnect with
replicas and can't finish synchronization with replica. If we set  client-output-buffer-limit big,
master may be OOM when there are many replicas that separately keep much memory.
Because replication buffers of different replica client are the same, one simple idea is that
all replicas only use one replication buffer, that will effectively save memory.

Since replication backlog content is the same as replicas' output buffer, now we
can discard replication backlog memory and use global shared replication buffer
to implement replication backlog mechanism.

## Implementation
I create one global "replication buffer" which contains content of replication stream.
The structure of "replication buffer" is similar to the reply list that exists in every client.
But the node of list is `replBufBlock`, which has `id, repl_offset, refcount` fields.
```c
/* Replication buffer blocks is the list of replBufBlock.
 *
 * +--------------+       +--------------+       +--------------+
 * | refcount = 1 |  ...  | refcount = 0 |  ...  | refcount = 2 |
 * +--------------+       +--------------+       +--------------+
 *      |                                            /       \
 *      |                                           /         \
 *      |                                          /           \
 *  Repl Backlog                               Replia_A      Replia_B
 * 
 * Each replica or replication backlog increments only the refcount of the
 * 'ref_repl_buf_node' which it points to. So when replica walks to the next
 * node, it should first increase the next node's refcount, and when we trim
 * the replication buffer nodes, we remove node always from the head node which
 * refcount is 0. If the refcount of the head node is not 0, we must stop
 * trimming and never iterate the next node. */

/* Similar with 'clientReplyBlock', it is used for shared buffers between
 * all replica clients and replication backlog. */
typedef struct replBufBlock {
    int refcount;           /* Number of replicas or repl backlog using. */
    long long id;           /* The unique incremental number. */
    long long repl_offset;  /* Start replication offset of the block. */
    size_t size, used;
    char buf[];
} replBufBlock;
```
So now when we feed replication stream into replication backlog and all replicas, we only need
to feed stream into replication buffer `feedReplicationBuffer`. In this function, we set some fields of
replication backlog and replicas to references of the global replication buffer blocks. And we also
need to check replicas' output buffer limit to free if exceeding `client-output-buffer-limit`, and trim
replication backlog if exceeding `repl-backlog-size`.

When sending reply to replicas, we also need to iterate replication buffer blocks and send its
content, when totally sending one block for replica, we decrease current node count and
increase the next current node count, and then free the block which reference is 0 from the
head of replication buffer blocks.

Since now we use linked list to manage replication backlog, it may cost much time for iterating
all linked list nodes to find corresponding replication buffer node. So we create a rax tree to
store some nodes  for index, but to avoid rax tree occupying too much memory, i record
one per 64 nodes for index.

Currently, to make partial resynchronization as possible as much, we always let replication
backlog as the last reference of replication buffer blocks, backlog size may exceeds our setting
if slow replicas that reference vast replication buffer blocks, and this method doesn't increase
memory usage since they share replication buffer. To avoid freezing server for freeing unreferenced
replication buffer blocks when we need to trim backlog for exceeding backlog size setting,
we trim backlog incrementally (free 64 blocks per call now), and make it faster in
`beforeSleep` (free 640 blocks).

### Other changes
- `mem_total_replication_buffers`: we add this field in INFO command, it means the total
  memory of replication buffers used.
- `mem_clients_slaves`:  now even replica is slow to replicate, and its output buffer memory
  is not 0, but it still may be 0, since replication backlog and replicas share one global replication
  buffer, only if replication buffer memory is more than the repl backlog setting size, we consider
  the excess as replicas' memory. Otherwise, we think replication buffer memory is the consumption
  of repl backlog.
- Key eviction
  Since all replicas and replication backlog share global replication buffer, we think only the
  part of exceeding backlog size the extra separate consumption of replicas.
  Because we trim backlog incrementally in the background, backlog size may exceeds our
  setting if slow replicas that reference vast replication buffer blocks disconnect.
  To avoid massive eviction loop, we don't count the delayed freed replication backlog into
  used memory even if there are no replicas, i.e. we also regard this memory as replicas's memory.
- `client-output-buffer-limit` check for replica clients
  It doesn't make sense to set the replica clients output buffer limit lower than the repl-backlog-size
  config (partial sync will succeed and then replica will get disconnected). Such a configuration is
  ignored (the size of repl-backlog-size will be used). This doesn't have memory consumption
  implications since the replica client will share the backlog buffers memory.
- Drop replication backlog after loading data if needed
  We always create replication backlog if server is a master, we need it because we put DELs in
  it when loading expired keys in RDB, but if RDB doesn't have replication info or there is no rdb,
  it is not possible to support partial resynchronization, to avoid extra memory of replication backlog,
  we drop it.
- Multi IO threads
 Since all replicas and replication backlog use global replication buffer,  if I/O threads are enabled,
  to guarantee data accessing thread safe, we must let main thread handle sending the output buffer
  to all replicas. But before, other IO threads could handle sending output buffer of all replicas.

## Other optimizations
This solution resolve some other problem:
- When replicas disconnect with master since of out of output buffer limit, releasing the output
  buffer of replicas may freeze server if we set big `client-output-buffer-limit` for replicas, but now,
  it doesn't cause freezing.
- This implementation may mitigate reply list copy cost time(also freezes server) when one replication
  has huge reply buffer and another replica can copy buffer for full synchronization. now, we just copy
  reference info, it is very light.
- If we set replication backlog size big, it also may cost much time to copy replication backlog into
  replica's output buffer. But this commit eliminates this problem.
- Resizing replication backlog size doesn't empty current replication backlog content.
2021-10-25 09:24:31 +03:00
Itamar Haber
48e4d77099
Fixes CLUSTER COUNTKEYSINSLOT (#9672)
Introduced via typo in #9504. 
Also adds a sanity test for coverage.
2021-10-24 12:32:53 +03:00
Shaya Potter
cf860df599
Fix module blocked clients RESP version (#9634)
Before this commit, module blocked clients did not carry through the original RESP version, resulting with RESP3 clients receiving unexpected RESP2 replies.
2021-10-21 14:01:10 +03:00
Yossi Gottlieb
8bf4c2e38c
Fix test modules build issue on OS X 11. (#9658) 2021-10-20 21:01:30 +03:00
Oran Agra
7d6744c739
fix new cluster tests issues (#9657)
Following #9483 the daily CI exposed a few problems.

* The cluster creation code (uses redis-cli) is complicated to test with TLS enabled.
  for now i'm just skipping them since the tests we run there don't really need that kind of coverage
* cluster port binding failures
  note that `find_available_port` already looks for a free cluster port
  but the code in `wait_server_started` couldn't detect the failure of binding
  (the text it greps for wasn't found in the log)
2021-10-20 15:40:28 +03:00
guybe7
43e736f79b
Treat subcommands as commands (#9504)
## Intro

The purpose is to allow having different flags/ACL categories for
subcommands (Example: CONFIG GET is ok-loading but CONFIG SET isn't)

We create a small command table for every command that has subcommands
and each subcommand has its own flags, etc. (same as a "regular" command)

This commit also unites the Redis and the Sentinel command tables

## Affected commands

CONFIG
Used to have "admin ok-loading ok-stale no-script"
Changes:
1. Dropped "ok-loading" in all except GET (this doesn't change behavior since
there were checks in the code doing that)

XINFO
Used to have "read-only random"
Changes:
1. Dropped "random" in all except CONSUMERS

XGROUP
Used to have "write use-memory"
Changes:
1. Dropped "use-memory" in all except CREATE and CREATECONSUMER

COMMAND
No changes.

MEMORY
Used to have "random read-only"
Changes:
1. Dropped "random" in PURGE and USAGE

ACL
Used to have "admin no-script ok-loading ok-stale"
Changes:
1. Dropped "admin" in WHOAMI, GENPASS, and CAT

LATENCY
No changes.

MODULE
No changes.

SLOWLOG
Used to have "admin random ok-loading ok-stale"
Changes:
1. Dropped "random" in RESET

OBJECT
Used to have "read-only random"
Changes:
1. Dropped "random" in ENCODING and REFCOUNT

SCRIPT
Used to have "may-replicate no-script"
Changes:
1. Dropped "may-replicate" in all except FLUSH and LOAD

CLIENT
Used to have "admin no-script random ok-loading ok-stale"
Changes:
1. Dropped "random" in all except INFO and LIST
2. Dropped "admin" in ID, TRACKING, CACHING, GETREDIR, INFO, SETNAME, GETNAME, and REPLY

STRALGO
No changes.

PUBSUB
No changes.

CLUSTER
Changes:
1. Dropped "admin in countkeysinslots, getkeysinslot, info, nodes, keyslot, myid, and slots

SENTINEL
No changes.

(note that DEBUG also fits, but we decided not to convert it since it's for
debugging and anyway undocumented)

## New sub-command
This commit adds another element to the per-command output of COMMAND,
describing the list of subcommands, if any (in the same structure as "regular" commands)
Also, it adds a new subcommand:
```
COMMAND LIST [FILTERBY (MODULE <module-name>|ACLCAT <cat>|PATTERN <pattern>)]
```
which returns a set of all commands (unless filters), but excluding subcommands.

## Module API
A new module API, RM_CreateSubcommand, was added, in order to allow
module writer to define subcommands

## ACL changes:
1. Now, that each subcommand is actually a command, each has its own ACL id.
2. The old mechanism of allowed_subcommands is redundant
(blocking/allowing a subcommand is the same as blocking/allowing a regular command),
but we had to keep it, to support the widespread usage of allowed_subcommands
to block commands with certain args, that aren't subcommands (e.g. "-select +select|0").
3. I have renamed allowed_subcommands to allowed_firstargs to emphasize the difference.
4. Because subcommands are commands in ACL too, you can now use "-" to block subcommands
(e.g. "+client -client|kill"), which wasn't possible in the past.
5. It is also possible to use the allowed_firstargs mechanism with subcommand.
For example: `+config -config|set +config|set|loglevel` will block all CONFIG SET except
for setting the log level.
6. All of the ACL changes above required some amount of refactoring.

## Misc
1. There are two approaches: Either each subcommand has its own function or all
   subcommands use the same function, determining what to do according to argv[0].
   For now, I took the former approaches only with CONFIG and COMMAND,
   while other commands use the latter approach (for smaller blamelog diff).
2. Deleted memoryGetKeys: It is no longer needed because MEMORY USAGE now uses the "range" key spec.
4. Bugfix: GETNAME was missing from CLIENT's help message.
5. Sentinel and Redis now use the same table, with the same function pointer.
   Some commands have a different implementation in Sentinel, so we redirect
   them (these are ROLE, PUBLISH, and INFO).
6. Command stats now show the stats per subcommand (e.g. instead of stats just
   for "config" you will have stats for "config|set", "config|get", etc.)
7. It is now possible to use COMMAND directly on subcommands:
   COMMAND INFO CONFIG|GET (The pipeline syntax was inspired from ACL, and
   can be used in functions lookupCommandBySds and lookupCommandByCString)
8. STRALGO is now a container command (has "help")

## Breaking changes:
1. Command stats now show the stats per subcommand (see (5) above)
2021-10-20 11:52:57 +03:00
qetu3790
4962c5526d
Release clients blocked on module commands in cluster resharding and down state (#9483)
Prevent clients from being blocked forever in cluster when they block with their own module command
and the hash slot is migrated to another master at the same time.
These will get a redirection message when unblocked.
Also, release clients blocked on module commands when cluster is down (same as other blocked clients)

This commit adds basic tests for the main (non-cluster) redis test infra that test the cluster.
This was done because the cluster test infra can't handle some common test features,
but most importantly we only build the test modules with the non-cluster test suite.

note that rather than really supporting cluster operations by the test infra, it was added (as dup code)
in two files, one for module tests and one for non-modules tests, maybe in the future we'll refactor that.

Co-authored-by: Oran Agra <oran@redislabs.com>
2021-10-19 11:50:37 +03:00
Wen Hui
1c2b5f5318
Make Cluster-bus port configurable with new cluster-port config (#9389)
Make Cluster-bus port configurable with new cluster-port config
2021-10-18 22:28:27 -07:00
Viktor Söderqvist
b7f2a1a217
Add RedisModule_KeyExists (#9600)
The LRU of the key is not touched. Locically expired keys are
logically not existing, so they're treated as such.
2021-10-18 22:21:19 +03:00
yoav-steinberg
81095b1bd9
Skip Active-defrag edge case test until we fix it. (#9645)
Test started failing consistently in 32bit builds after upgrading to jemalloc 5.2.1 (#9623).
2021-10-18 13:28:52 +03:00
Oran Agra
276b460ea9
Attempt to fix a valgrind test failure due to timing (#9643)
in the past few days i've seen two failures in the valgrind daily test.

*** [err]: slave fails full sync and diskless load swapdb recovers it in tests/integration/replication.tcl
Replica didn't get into loading mode

can't reproduce it, but i'm hoping it's just too slow (to start loading within 5 seconds)
2021-10-18 10:45:45 +03:00
Hanna Fadida
61bb044156
Modify mem_usage2 module callback to enable to take sample_size argument (#9612)
This is useful for approximating size computation of complex module types.
Note that the mem_usage2 callback is new and has not been released yet, which is why we can modify it.
2021-10-17 17:31:06 +03:00
Yossi Gottlieb
6d5a911707
Fix daily failures due to macos-latest change. (#9637)
* Fix test modules linking on macOS 11.x.
* Use macOS 10.x for FreeBSD VM as VirtualBox is not yet supported on
  11.
2021-10-17 00:07:27 +03:00
Madelyn Olson
a6b5d518a9
Improved the reliability of cluster replica sync tests (#9628)
Improved the reliability of cluster replica sync tests
2021-10-13 00:06:53 -07:00
Bjorn Svensson
54d01e363a
Move config cluster-config-file to generic configs (#9597) 2021-10-07 22:32:40 -07:00
yoav-steinberg
834e8843de
obuf based eviction tests run until eviction occurs (#9611)
obuf based eviction tests run until eviction occurs instead of assuming a certain
amount of writes will fill the obuf enough for eviction to occur.
This handles the kernel buffering written data and emptying the obuf even though
no one actualy reads from it.

The tests have a new timeout of 20sec: if the test doesn't pass after 20 sec it'll fail.
Hopefully this enough for our slow CI targets.

This also eliminates the need to skip some tests in TLS.
2021-10-07 15:43:48 +03:00
Huang Zhw
fd135f3e2d
Make tracking invalidation messages always after command's reply (#9422)
Tracking invalidation messages were sometimes sent in inconsistent order,
before the command's reply rather than after.
In addition to that, they were sometimes embedded inside other commands
responses, like MULTI-EXEC and MGET.
2021-10-07 15:13:42 +03:00
GutovskyMaria
d98d1ad574
Hide empty and loading replicas from CLUSTER SLOTS responses (#9287)
Hide empty and loading replicas from CLUSTER SLOTS responses
2021-10-06 22:22:27 -07:00
yoav-steinberg
123cc1a1bc
Test fails when flushdb triggers a bgsave (#9535)
Flush db and *then* wait for the bgsave to complete.
2021-10-06 11:50:47 +03:00
yoav-steinberg
897c7bddf5
Attempt to fix rare pubsub oubuf maxmemory eviction test failure (#9603)
* Reduce delay between publishes to allow less time to write the obufs.
* More subscribed clients to buffer more data per publish.
* Make sure main connection isn't evicted (it has a large qbuf).
2021-10-05 18:00:19 +03:00
yoav-steinberg
83478e6102
argv mem leak during multi command execution. (#9598)
Changes in #9528 lead to memory leak if the command implementation
used rewriteClientCommandArgument inside MULTI-EXEC.

Adding an explicit test for that case since the test that uncovered it
didn't specifically target this scenario
2021-10-05 12:17:36 +03:00
Meir Shpilraien (Spielrein)
0f8b634cd5
Fix invalid memory write on lua stack overflow (CVE-2021-32626) (#9591)
When LUA call our C code, by default, the LUA stack has room for 10
elements. In most cases, this is more than enough but sometimes it's not
and the caller must verify the LUA stack size before he pushes elements.

On 3 places in the code, there was no verification of the LUA stack size.
On specific inputs this missing verification could have lead to invalid
memory write:
1. On 'luaReplyToRedisReply', one might return a nested reply that will
   explode the LUA stack.
2. On 'redisProtocolToLuaType', the Redis reply might be deep enough
   to explode the LUA stack (notice that currently there is no such
   command in Redis that returns such a nested reply, but modules might
   do it)
3. On 'ldbRedis', one might give a command with enough arguments to
   explode the LUA stack (all the arguments will be pushed to the LUA
   stack)

This commit is solving all those 3 issues by calling 'lua_checkstack' and
verify that there is enough room in the LUA stack to push elements. In
case 'lua_checkstack' returns an error (there is not enough room in the
LUA stack and it's not possible to increase the stack), we will do the
following:
1. On 'luaReplyToRedisReply', we will return an error to the user.
2. On 'redisProtocolToLuaType' we will exit with panic (we assume this
   scenario is rare because it can only happen with a module).
3. On 'ldbRedis', we return an error.
2021-10-04 15:17:50 +03:00
Oran Agra
b0ca3be2bb
Fix protocol parsing on 'ldbReplParseCommand' (CVE-2021-32672) (#9590)
The protocol parsing on 'ldbReplParseCommand' (LUA debugging)
Assumed protocol correctness. This means that if the following
is given:
*1
$100
test
The parser will try to read additional 94 unallocated bytes after
the client buffer.
This commit fixes this issue by validating that there are actually enough
bytes to read. It also limits the amount of data that can be sent by
the debugger client to 1M so the client will not be able to explode
the memory.

Co-authored-by: meir@redislabs.com <meir@redislabs.com>
2021-10-04 12:14:12 +03:00
Oran Agra
c5e6a6204c
Fix ziplist and listpack overflows and truncations (CVE-2021-32627, CVE-2021-32628) (#9589)
- fix possible heap corruption in ziplist and listpack resulting by trying to
  allocate more than the maximum size of 4GB.
- prevent ziplist (hash and zset) from reaching size of above 1GB, will be
  converted to HT encoding, that's not a useful size.
- prevent listpack (stream) from reaching size of above 1GB.
- XADD will start a new listpack if the new record may cause the previous
  listpack to grow over 1GB.
- XADD will respond with an error if a single stream record is over 1GB
- List type (ziplist in quicklist) was truncating strings that were over 4GB,
  now it'll respond with an error.

Co-authored-by: sundb <sundbcn@gmail.com>
2021-10-04 12:11:02 +03:00
Oran Agra
fba15850e5
Prevent unauthenticated client from easily consuming lots of memory (CVE-2021-32675) (#9588)
This change sets a low limit for multibulk and bulk length in the
protocol for unauthenticated connections, so that they can't easily
cause redis to allocate massive amounts of memory by sending just a few
characters on the network.
The new limits are 10 arguments of 16kb each (instead of 1m of 512mb)
2021-10-04 12:10:31 +03:00
YaacovHazan
5becb7c9c6
improve the stability and correctness of "Test child sending info" (#9562)
Since we measure the COW size in this test by changing some keys and reading
the reported COW size, we need to ensure that the "dismiss mechanism" (#8974)
will not free memory and reduce the COW size.

For that, this commit changes the size of the keys to 512B (less than a page).
and because some keys may fall into the same page, we are modifying ten keys
on each iteration and check for at least 50% change in the COW size.
2021-10-04 10:32:26 +03:00
yoav-steinberg
6f4f31f167
decrby LLONG_MIN caused nagation overflow. (#9577)
Note that this breaks compatibility because in the past doing:
DECRBY x -9223372036854775808
would succeed (and create an invalid result) and now this returns an error.
2021-10-03 09:38:05 +03:00
yoav-steinberg
93e8534713
Remove argument count limit, dynamically grow argv. (#9528)
Remove hard coded multi-bulk limit (was 1,048,576), new limit is INT_MAX.
When client sends an m-bulk that's higher than 1024, we initially only allocate
the argv array for 1024 arguments, and gradually grow that allocation as arguments
are received.
2021-10-03 09:13:09 +03:00
Hanna Fadida
ffafb434fb
Modules: add RM_LoadDataTypeFromStringEncver (#9537)
adding an advanced api to enable loading data that was sereialized with a specific encoding version
2021-09-30 11:21:32 +03:00
yoav-steinberg
d715655f16
verbose debug print in test to debug rare CI failure. (#9563) 2021-09-29 17:10:05 +03:00
Oran Agra
5a4ab7c7d2
Fix stream sanitization for non-int first value (#9553)
This was recently broken in #9321 when we validated stream IDs to be
integers but did that after to the stepping next record instead of before.
2021-09-26 18:46:22 +03:00
yoav-steinberg
6600253046
Client eviction ci issues (#9549)
Fixing CI test issues introduced in #8687
- valgrind warnings in readQueryFromClient when client was freed by processInputBuffer
- adding DEBUG pause-cron for tests not to be time dependent.
- skipping a test that depends on socket buffers / events not compatible with TLS
- making sure client got subscribed by not using deferring client
2021-09-26 17:45:02 +03:00
Yossi Gottlieb
0af7fe2cab
Add --skipfile and --skiptest regex support. (#9555)
Empty patterns are not considered and skipped.
Also, improve help text.
2021-09-26 15:12:37 +03:00
Huang Zhw
bdecbd30df
Fix test randstring, compare string and int is wrong. (#9544)
This will cause the generated string containing "\".
Fixes a broken change in #8687
2021-09-24 16:58:38 +03:00
Yossi Gottlieb
bebc7f8470
Add RM_TrimStringAllocation(). (#9540)
This commit makes it possible to explicitly trim the allocation of a
RedisModuleString.

Currently, Redis automatically trims strings that have been retained by
a module command when it returns. However, this is not thread safe and
may result with corruption in threaded modules.

Supporting explicit trimming offers a backwards compatible workaround to
this problem.
2021-09-23 15:00:37 +03:00
yoav-steinberg
2753429c99
Client eviction (#8687)
### Description
A mechanism for disconnecting clients when the sum of all connected clients is above a
configured limit. This prevents eviction or OOM caused by accumulated used memory
between all clients. It's a complimentary mechanism to the `client-output-buffer-limit`
mechanism which takes into account not only a single client and not only output buffers
but rather all memory used by all clients.

#### Design
The general design is as following:
* We track memory usage of each client, taking into account all memory used by the
  client (query buffer, output buffer, parsed arguments, etc...). This is kept up to date
  after reading from the socket, after processing commands and after writing to the socket.
* Based on the used memory we sort all clients into buckets. Each bucket contains all
  clients using up up to x2 memory of the clients in the bucket below it. For example up
  to 1m clients, up to 2m clients, up to 4m clients, ...
* Before processing a command and before sleep we check if we're over the configured
  limit. If we are we start disconnecting clients from larger buckets downwards until we're
  under the limit.

#### Config
`maxmemory-clients` max memory all clients are allowed to consume, above this threshold
we disconnect clients.
This config can either be set to 0 (meaning no limit), a size in bytes (possibly with MB/GB
suffix), or as a percentage of `maxmemory` by using the `%` suffix (e.g. setting it to `10%`
would mean 10% of `maxmemory`).

#### Important code changes
* During the development I encountered yet more situations where our io-threads access
  global vars. And needed to fix them. I also had to handle keeps the clients sorted into the
  memory buckets (which are global) while their memory usage changes in the io-thread.
  To achieve this I decided to simplify how we check if we're in an io-thread and make it
  much more explicit. I removed the `CLIENT_PENDING_READ` flag used for checking
  if the client is in an io-thread (it wasn't used for anything else) and just used the global
  `io_threads_op` variable the same way to check during writes.
* I optimized the cleanup of the client from the `clients_pending_read` list on client freeing.
  We now store a pointer in the `client` struct to this list so we don't need to search in it
  (`pending_read_list_node`).
* Added `evicted_clients` stat to `INFO` command.
* Added `CLIENT NO-EVICT ON|OFF` sub command to exclude a specific client from the
  client eviction mechanism. Added corrosponding 'e' flag in the client info string.
* Added `multi-mem` field in the client info string to show how much memory is used up
  by buffered multi commands.
* Client `tot-mem` now accounts for buffered multi-commands, pubsub patterns and
  channels (partially), tracking prefixes (partially).
* CLIENT_CLOSE_ASAP flag is now handled in a new `beforeNextClient()` function so
  clients will be disconnected between processing different clients and not only before sleep.
  This new function can be used in the future for work we want to do outside the command
  processing loop but don't want to wait for all clients to be processed before we get to it.
  Specifically I wanted to handle output-buffer-limit related closing before we process client
  eviction in case the two race with each other.
* Added a `DEBUG CLIENT-EVICTION` command to print out info about the client eviction
  buckets.
* Each client now holds a pointer to the client eviction memory usage bucket it belongs to
  and listNode to itself in that bucket for quick removal.
* Global `io_threads_op` variable now can contain a `IO_THREADS_OP_IDLE` value
  indicating no io-threading is currently being executed.
* In order to track memory used by each clients in real-time we can't rely on updating
  these stats in `clientsCron()` alone anymore. So now I call `updateClientMemUsage()`
  (used to be `clientsCronTrackClientsMemUsage()`) after command processing, after
  writing data to pubsub clients, after writing the output buffer and after reading from the
  socket (and maybe other places too). The function is written to be fast.
* Clients are evicted if needed (with appropriate log line) in `beforeSleep()` and before
  processing a command (before performing oom-checks and key-eviction).
* All clients memory usage buckets are grouped as follows:
  * All clients using less than 64k.
  * 64K..128K
  * 128K..256K
  * ...
  * 2G..4G
  * All clients using 4g and up.
* Added client-eviction.tcl with a bunch of tests for the new mechanism.
* Extended maxmemory.tcl to test the interaction between maxmemory and
  maxmemory-clients settings.
* Added an option to flag a numeric configuration variable as a "percent", this means that
  if we encounter a '%' after the number in the config file (or config set command) we
  consider it as valid. Such a number is store internally as a negative value. This way an
  integer value can be interpreted as either a percent (negative) or absolute value (positive).
  This is useful for example if some numeric configuration can optionally be set to a percentage
  of something else.

Co-authored-by: Oran Agra <oran@redislabs.com>
2021-09-23 14:02:16 +03:00
YaacovHazan
a56d4533b7
Adding ACL support for modules (#9309)
This commit introduced a new flag to the RM_Call:
'C' - Check if the command can be executed according to the ACLs associated with it.

Also, three new API's added to check if a command, key, or channel can be executed or accessed
by a user, according to the ACLs associated with it.
- RM_ACLCheckCommandPerm
- RM_ACLCheckKeyPerm
- RM_ACLCheckChannelPerm

The user for these API's is a RedisModuleUser object, that for a Module user returned by the RM_CreateModuleUser API, or for a general ACL user can be retrieved by these two new API's:
- RM_GetCurrentUserName - Retrieve the user name of the client connection behind the current context.
- RM_GetModuleUserFromUserName - Get a RedisModuleUser from a user name

As a result of getting a RedisModuleUser from name, it can now also access the general ACL users (not just ones created by the module).
This mean the already existing API RM_SetModuleUserACL(), can be used to change the ACL rules for such users.
2021-09-23 08:52:56 +03:00
Binbin
14d6abd8e9
Add ZMPOP/BZMPOP commands. (#9484)
This is similar to the recent addition of LMPOP/BLMPOP (#9373), but zset.

Syntax for the new ZMPOP command:
`ZMPOP numkeys [<key> ...] MIN|MAX [COUNT count]`

Syntax for the new BZMPOP command:
`BZMPOP timeout numkeys [<key> ...] MIN|MAX [COUNT count]`

Some background:
- ZPOPMIN/ZPOPMAX take only one key, and can return multiple elements.
- BZPOPMIN/BZPOPMAX take multiple keys, but return only one element from just one key.
- ZMPOP/BZMPOP can take multiple keys, and can return multiple elements from just one key.

Note that ZMPOP/BZMPOP can take multiple keys, it eventually operates on just on key.
And it will propagate as ZPOPMIN or ZPOPMAX with the COUNT option.

As new commands, if we can not pop any elements, the response like:
- ZMPOP: Return a NIL in both RESP2 and RESP3, unlike ZPOPMIN/ZPOPMAX return emptyarray.
- BZMPOP: Return a NIL in both RESP2 and RESP3 when timeout is reached, like BZPOPMIN/BZPOPMAX.

For the normal response is nested arrays in RESP2 and RESP3:
```
ZMPOP/BZMPOP
1) keyname
2) 1) 1) member1
      2) score1
   2) 1) member2
      2) score2

In RESP2:
1) "myzset"
2) 1) 1) "three"
      2) "3"
   2) 1) "two"
      2) "2"

In RESP3:
1) "myzset"
2) 1) 1) "three"
      2) (double) 3
   2) 1) "two"
      2) (double) 2
```
2021-09-23 08:34:40 +03:00
Oran Agra
5f7789d329
tune lazyfree test timeout (#9527)
i've seen this CI failure a couple of times on MacOS:

*** [err]: lazy free a stream with all types of metadata in tests/unit/lazyfree.tcl
lazyfree isn't done

only reason i can think of is that 500ms is sometimes not enough on slow systems.
2021-09-22 09:48:44 +03:00
Oran Agra
16be742b08
fix replication test failure, probing the wrong log file (#9513) 2021-09-19 12:07:04 +03:00
Binbin
f898a9e97d
Adds limit to SINTERCARD/ZINTERCARD. (#9425)
Implements the [LIMIT limit] variant of SINTERCARD/ZINTERCARD.
Now with the LIMIT, we can stop the searching when cardinality
reaching the limit, and return the cardinality ASAP.

Note that in SINTERCARD, the old synatx was: `SINTERCARD key [key ...]`
In order to add a optional parameter, we must break the old synatx.
So the new syntax of SINTERCARD will be consistent with ZINTERCARD.
New syntax: `SINTERCARD numkeys key [key ...] [LIMIT limit]`.

Note that this means that SINTERCARD has a different syntax than
SINTER and SINTERSTORE (taking numkeys argument)

As for ZINTERCARD, we can easily add a optional parameter to it.
New syntax: `ZINTERCARD numkeys key [key ...] [LIMIT limit]`
2021-09-16 14:07:08 +03:00
Wen Hui
53ad5627b7
Sentinel: Fix failed daily tests, due to race condition (#9501) 2021-09-15 13:39:50 +03:00
guybe7
03fcc211de
A better approach for COMMAND INFO for movablekeys commands (#8324)
Fix #7297

The problem:

Today, there is no way for a client library or app to know the key name indexes for commands such as
ZUNIONSTORE/EVAL and others with "numkeys", since COMMAND INFO returns no useful info for them.

For cluster-aware redis clients, this requires to 'patch' the client library code specifically for each of these commands or to
resolve each execution of these commands with COMMAND GETKEYS.

The solution:

Introducing key specs other than the legacy "range" (first,last,step)

The 8th element of the command info array, if exists, holds an array of key specs. The array may be empty, which indicates
the command doesn't take any key arguments or may contain one or more key-specs, each one may leads to the discovery
of 0 or more key arguments.

A client library that doesn't support this key-spec feature will keep using the first,last,step and movablekeys flag which will
obviously remain unchanged.

A client that supports this key-specs feature needs only to look at the key-specs array. If it finds an unrecognized spec, it
must resort to using COMMAND GETKEYS if it wishes to get all key name arguments, but if all it needs is one key in order
to know which cluster node to use, then maybe another spec (if the command has several) can supply that, and there's no
need to use GETKEYS.

Each spec is an array of arguments, first one is the spec name, the second is an array of flags, and the third is an array
containing details about the spec (specific meaning for each spec type)
The initial flags we support are "read" and "write" indicating if the keys that this key-spec finds are used for read or for write.
clients should ignore any unfamiliar flags.

In order to easily find the positions of keys in a given array of args we introduce keys specs. There are two logical steps of
key specs:
1. `start_search`: Given an array of args, indicate where we should start searching for keys
2. `find_keys`: Given the output of start_search and an array of args, indicate all possible indices of keys.

### start_search step specs
- `index`: specify an argument index explicitly
  - `index`: 0 based index (1 means the first command argument)
- `keyword`: specify a string to match in `argv`. We should start searching for keys just after the keyword appears.
  - `keyword`: the string to search for
  - `start_search`: an index from which to start the keyword search (can be negative, which means to search from the end)

Examples:
- `SET` has start_search of type `index` with value `1`
- `XREAD` has start_search of type `keyword` with value `[“STREAMS”,1]`
- `MIGRATE` has start_search of type `keyword` with value `[“KEYS”,-2]`

### find_keys step specs
- `range`: specify `[count, step, limit]`.
  - `lastkey`: index of the last key. relative to the index returned from begin_search. -1 indicating till the last argument, -2 one before the last
  - `step`: how many args should we skip after finding a key, in order to find the next one
  - `limit`: if count is -1, we use limit to stop the search by a factor. 0 and 1 mean no limit. 2 means ½ of the remaining args, 3 means ⅓, and so on.
- “keynum”: specify `[keynum_index, first_key_index, step]`.
  - `keynum_index`: is relative to the return of the `start_search` spec.
  - `first_key_index`: is relative to `keynum_index`.
  - `step`: how many args should we skip after finding a key, in order to find the next one

Examples:
- `SET` has `range` of `[0,1,0]`
- `MSET` has `range` of `[-1,2,0]`
- `XREAD` has `range` of `[-1,1,2]`
- `ZUNION` has `start_search` of type `index` with value `1` and `find_keys` of type `keynum` with value `[0,1,1]`
- `AI.DAGRUN` has `start_search` of type `keyword` with value `[“LOAD“,1]` and `find_keys` of type `keynum` with value
  `[0,1,1]` (see https://oss.redislabs.com/redisai/master/commands/#aidagrun)

Note: this solution is not perfect as the module writers can come up with anything, but at least we will be able to find the key
args of the vast majority of commands.
If one of the above specs can’t describe the key positions, the module writer can always fall back to the `getkeys-api` option.

Some keys cannot be found easily (`KEYS` in `MIGRATE`: Imagine the argument for `AUTH` is the string “KEYS” - we will
start searching in the wrong index). 
The guarantee is that the specs may be incomplete (`incomplete` will be specified in the spec to denote that) but we never
report false information (assuming the command syntax is correct).
For `MIGRATE` we start searching from the end - `startfrom=-1` - and if one of the keys is actually called "keys" we will
report only a subset of all keys - hence the `incomplete` flag.
Some `incomplete` specs can be completely empty (i.e. UNKNOWN begin_search) which should tell the client that
COMMAND GETKEYS (or any other way to get the keys) must be used (Example: For `SORT` there is no way to describe
the STORE keyword spec, as the word "store" can appear anywhere in the command).

We will expose these key specs in the `COMMAND` command so that clients can learn, on startup, where the keys are for
all commands instead of holding hardcoded tables or use `COMMAND GETKEYS` in runtime.

Comments:
1. Redis doesn't internally use the new specs, they are only used for COMMAND output.
2. In order to support the current COMMAND INFO format (reply array indices 4, 5, 6) we created a synthetic range, called
   legacy_range, that, if possible, is built according to the new specs.
3. Redis currently uses only getkeys_proc or the legacy_range to get the keys indices (in COMMAND GETKEYS for
   example).

"incomplete" specs:
the command we have issues with are MIGRATE, STRALGO, and SORT
for MIGRATE, because the token KEYS, if exists, must be the last token, we can search in reverse. it one of the keys is
actually the string "keys" will return just a subset of the keys (hence, it's "incomplete")
for SORT and STRALGO we can use this heuristic (the keys can be anywhere in the command) and therefore we added a
key spec that is both "incomplete" and of "unknown type"

if a client encounters an "incomplete" spec it means that it must find a different way (either COMMAND GETKEYS or have
its own parser) to retrieve the keys.
please note that all commands, apart from the three mentioned above, have "complete" key specs
2021-09-15 11:10:29 +03:00
filipe oliveira
b5a879e1c2
Added URI support to redis-benchmark (cli and benchmark share the same uri-parsing methods) (#9314)
- Add `-u <uri>` command line option to support `redis://` URI scheme.
- included server connection information object (`struct cliConnInfo`),
  used to describe an ip:port pair, db num user input, and user:pass to
  avoid a large number of function arguments.
- Using sds on connection info strings for redis-benchmark/redis-cli

Co-authored-by: yoav-steinberg <yoav@monfort.co.il>
2021-09-14 19:45:06 +03:00
Viktor Söderqvist
ea36d4de17
Modules: Add remaining list API functions (#8439)
List functions operating on elements by index:

* RM_ListGet
* RM_ListSet
* RM_ListInsert
* RM_ListDelete

Iteration is done using a simple for loop over indices.
The index based functions use an internal iterator as an optimization.
This is explained in the docs:

```
 * Many of the list functions access elements by index. Since a list is in
 * essence a doubly-linked list, accessing elements by index is generally an
 * O(N) operation. However, if elements are accessed sequentially or with
 * indices close together, the functions are optimized to seek the index from
 * the previous index, rather than seeking from the ends of the list.
 *
 * This enables iteration to be done efficiently using a simple for loop:
 *
 *     long n = RM_ValueLength(key);
 *     for (long i = 0; i < n; i++) {
 *         RedisModuleString *elem = RedisModule_ListGet(key, i);
 *         // Do stuff...
 *     }
```
2021-09-14 17:48:06 +03:00
sundb
1376d83363
Fix memory leak due to missing freeCallback in blockonbackground moduleapi test (#9499)
Before #9497, before redis-server was shut down, we did not manually shut down all the clients,
which would have prevented valgrind from detecting a memory leak in the client's argc.
2021-09-14 15:14:09 +03:00
yoav-steinberg
4c7827588d
Fixed leaked client for "start_server" when running in --loop (#9497)
* On `kill_server` make sure we close the default `"client"` connection.
* Don't reconnect when trying to execute the client's `close` command.
* On `restart_server` make sure to remove the (closed) default `"client"` after killing the old server.
2021-09-13 18:16:47 +03:00
zhaozhao.zz
794442b130
PSYNC2: make partial sync possible after master reboot (#8015)
The main idea is how to allow a master to load replication info from RDB file when rebooting, if master can load replication info it means that replicas may have the chance to psync with master, it can save much traffic.

The key point is we need guarantee safety and consistency, so there
are two differences between master and replica:

1. master would load the replication info as secondary ID and
   offset, in case other masters have the same replid.
2. when master loading RDB, it would propagate expired keys as DEL
   command to replication backlog, then replica can receive these
   commands to delete stale keys.
   p.s. the expired keys when RDB loading is useful for users, so
   we show it as `rdb_last_load_keys_expired` and `rdb_last_load_keys_loaded` in info persistence.

Moreover, after load replication info, master should update
`no_replica_time` in case loading RDB cost too long time.
2021-09-13 15:39:11 +08:00
Huang Zhw
75dd230994
bitpos/bitcount add bit index (#9324)
Make bitpos/bitcount support bit index:

```
BITPOS key bit [start [end [BIT|BYTE]]]
BITCOUNT key [start end [BIT|BYTE]]
```

The default behavior is `BYTE`, so these commands are still compatible with old.
2021-09-12 11:31:22 +03:00
Meir Shpilraien (Spielrein)
05e6b97bed
Fix RedisModule_Call tests on 32bit (#9481) 2021-09-09 23:03:02 +03:00
sundb
3ca6972ecd
Replace all usage of ziplist with listpack for t_zset (#9366)
Part two of implementing #8702 (zset), after #8887.

## Description of the feature
Replaced all uses of ziplist with listpack in t_zset, and optimized some of the code to optimize performance.

## Rdb format changes
New `RDB_TYPE_ZSET_LISTPACK` rdb type.

## Rdb loading improvements:
1) Pre-expansion of dict for validation of duplicate data for listpack and ziplist.
2) Simplifying the release of empty key objects when RDB loading.
3) Unify ziplist and listpack data verify methods for zset and hash, and move code to rdb.c.

## Interface changes
1) New `zset-max-listpack-entries` config is an alias for `zset-max-ziplist-entries` (same with `zset-max-listpack-value`).
2) OBJECT ENCODING will return listpack instead of ziplist.

## Listpack improvements:
1) Add `lpDeleteRange` and `lpDeleteRangeWithEntry` functions to delete a range of entries from listpack.
2) Improve the performance of `lpCompare`, converting from string to integer is faster than converting from integer to string.
3) Replace `snprintf` with `ll2string` to improve performance in converting numbers to strings in `lpGet()`.

## Zset improvements:
1) Improve the performance of `zzlFind` method, use `lpFind` instead of `lpCompare` in a loop.
2) Use `lpDeleteRangeWithEntry` instead of `lpDelete` twice to delete a element of zset.

## Tests
1) Add some unittests for `lpDeleteRange` and `lpDeleteRangeWithEntry` function.
2) Add zset RDB loading test.
3) Add benchmark test for `lpCompare` and `ziplsitCompare`.
4) Add empty listpack zset corrupt dump test.
2021-09-09 18:18:53 +03:00
Madelyn Olson
86b0de5c41
Remove redundant validation and prevent duplicate users during ACL load (#9330)
Throw an error when a user is provided multiple times on the command line instead of silently throwing one of them away.
Remove unneeded validation for validating users on ACL load.
2021-09-09 07:40:33 -07:00
Binbin
c50af0aeba
Add LMPOP/BLMPOP commands. (#9373)
We want to add COUNT option for BLPOP.
But we can't do it without breaking compatibility due to the command arguments syntax.
So this commit introduce two new commands.

Syntax for the new LMPOP command:
`LMPOP numkeys [<key> ...] LEFT|RIGHT [COUNT count]`

Syntax for the new BLMPOP command:
`BLMPOP timeout numkeys [<key> ...] LEFT|RIGHT [COUNT count]`

Some background:
- LPOP takes one key, and can return multiple elements.
- BLPOP takes multiple keys, but returns one element from just one key.
- LMPOP can take multiple keys and return multiple elements from just one key.

Note that LMPOP/BLMPOP  can take multiple keys, it eventually operates on just one key.
And it will propagate as LPOP or RPOP with the COUNT option.

As a new command, it still return NIL if we can't pop any elements.
For the normal response is nested arrays in RESP2 and RESP3, like:
```
LMPOP/BLMPOP 
1) keyname
2) 1) element1
   2) element2
```
I.e. unlike BLPOP that returns a key name and one element so it uses a flat array,
and LPOP that returns multiple elements with no key name, and again uses a flat array,
this one has to return a nested array, and it does for for both RESP2 and RESP3 (like SCAN does)

Some discuss can see: #766 #8824
2021-09-09 12:02:33 +03:00
Wang Yuan
cee3d67f50
Delay to discard cached master when full synchronization (#9398)
* Delay to discard cache master when full synchronization
* Don't disconnect with replicas before loading transferred RDB when full sync

Previously, once replica need to start full synchronization with master,
it will discard cached master whatever full synchronization is failed or
not. 
Now we discard cached master only when transferring RDB is finished
and start to change data space, this make replica could start partial
resynchronization with another new master if new master is failed
during full synchronization.
2021-09-09 11:32:29 +03:00
chenyang8094
bc0c22fabc
Fix callReplyParseCollection memleak when use AutoMemory (#9446)
When parsing an array type reply, ctx will be lost when recursively parsing its
elements, which will cause a memory leak in automemory mode.

This is a result of the changes in #9202

Add test for callReplyParseCollection fix
2021-09-09 11:03:05 +03:00
zhaozhao.zz
1b83353dc3
Fix wrong offset when replica pause (#9448)
When a replica paused, it would not apply any commands event the command comes from master, if we feed the non-applied command to replication stream, the replication offset would be wrong, and data would be lost after failover(since replica's `master_repl_offset` grows but command is not applied).

To fix it, here are the changes:
* Don't update replica's replication offset or propagate commands to sub-replicas when it's paused in `commandProcessed`.
* Show `slave_read_repl_offset` in info reply.
* Add an assert to make sure master client should never be blocked unless pause or module (some modules may use block way to do background (parallel) processing and forward original block module command to the replica, it's not a good way but it can work, so the assert excludes module now, but someday in future all modules should rewrite block command to propagate like what `BLPOP` does).
2021-09-08 16:07:25 +08:00
Viktor Söderqvist
547c3405d4
Optimize quicklistIndex to seek from the nearest end (#9454)
Until now, giving a negative index seeks from the end of a list and a
positive seeks from the beginning. This change makes it seek from
the nearest end, regardless of the sign of the given index.

quicklistIndex is used by all list commands which operate by index.

LINDEX key 999999 in a list if 1M elements is greately optimized by
this change. Latency is cut by 75%.

LINDEX key -1000000 in a list of 1M elements, likewise.

LRANGE key -1 -1 is affected by this, since LRANGE converts the
indices to positive numbers before seeking.

The tests for corrupt dumps are updated to make sure the corrup
data is seeked in the same direction as before.
2021-09-06 09:12:38 +03:00
Wen Hui
763fd09416
Speed up sentinel tests (#9408)
Use sentinel debug to reduce default timeouts and allow tests to execute faster.
2021-09-05 13:26:29 +03:00
Madelyn Olson
8b8f05c86c
Add test verifying PUBSUB NUMPAT behavior (#9209) 2021-09-03 15:52:39 -07:00
Oran Agra
1e7ad894d2
Tune timeout of active defrag test (#9426)
Failed on Raspberry Pi 3b where that single test took about 170 seconds
2021-08-30 12:39:09 +03:00
Binbin
aefbc23451
Better error handling for updateClientOutputBufferLimit. (#9308)
This one follow #9313 and goes deeper (validation of config file parsing)

Move the check/update logic to a new updateClientOutputBufferLimit
function. So that it can be used in CONFIG SET and config file parsing.
2021-08-29 15:03:05 +03:00
Viktor Söderqvist
97dcf95cc8
redis-benchmark: improved help and warnings (#9419)
1. The output of --help:

  * On the Usage line, just write [OPTIONS] [COMMAND ARGS...] instead listing
    only a few arbitrary options and no command.
  * For --cluster, describe that if the command is supplied on the command line,
    the key must contain "{tag}". Otherwise, the command will not be sent to the
    right cluster node.
  * For -r, add a note that if -r is omitted, all commands in a benchmark will
    use the same key. Also align the description.
  * For -t, describe that -t is ignored if a command is supplied on the command
    line.

2. Print a warning if -t is present when a specific command is supplied.

3. Print all warnings and errors to stderr.

4. Remove -e from calls in redis-benchmark test suite.
2021-08-29 14:31:08 +03:00
Binbin
0835f596b8
BITSET and BITFIELD SET only propagate command when the value changed. (#9403)
In old way, we always increase server.dirty in BITSET and BITFIELD SET.
Even the command doesn't really change anything. This commit make 
sure BITSET and BITFIELD SET only increase dirty when the value changed.

Because of that, if the value not changed, some others implications:
- Avoid adding useless AOF
- Reduce replication traffic
- Will not trigger keyspace notifications (setbit)
- Will not invalidate WATCH
- Will not sent the invalidation message to the tracking client
2021-08-22 10:20:53 +03:00
Viktor Söderqvist
8f59c1ecae
Let CONFIG GET * show both replicaof and its alias (#9395) 2021-08-21 19:43:18 -07:00
sundb
492d8d0961
Sanitize dump payload: fix double free after insert dup nodekey to stream rax and returns 0 (#9399) 2021-08-20 10:37:45 +03:00
Yossi Gottlieb
1d9c8d61d8
Skip OOM-related tests on incompatible platforms. (#9386)
We only run OOM related tests on x86_64 and aarch64, as jemalloc on other
platforms (notably s390x) may actually succeed very large allocations. As
a result the test may hang for a very long time at the cleanup phase,
iterating as many as 2^61 hash table slots.
2021-08-18 16:00:22 +03:00
yoav-steinberg
19bc83716c
support regex in "--only" in runtest (#9352) 2021-08-10 14:28:24 +03:00
sundb
02fd76b97c
Replace all usage of ziplist with listpack for t_hash (#8887)
Part one of implementing #8702 (taking hashes first before other types)

## Description of the feature
1. Change ziplist encoded hash objects to listpack encoding.
2. Convert existing ziplists on RDB loading time. an O(n) operation.

## Rdb format changes
1. Add RDB_TYPE_HASH_LISTPACK rdb type.
2. Bump RDB_VERSION to 10

## Interface changes
1. New `hash-max-listpack-entries` config is an alias for `hash-max-ziplist-entries` (same with `hash-max-listpack-value`)
2. OBJECT ENCODING will return `listpack` instead of `ziplist`

## Listpack improvements:
1. Support direct insert, replace integer element (rather than convert back and forth from string)
3. Add more listpack capabilities to match the ziplist ones (like `lpFind`, `lpRandomPairs` and such)
4. Optimize element length fetching, avoid multiple calculations
5. Use inline to avoid function call overhead.

## Tests
1. Add a new test to the RDB load time conversion
2. Adding the listpack unit tests. (based on the one in ziplist.c)
3. Add a few "corrupt payload: fuzzer findings" tests, and slightly modify existing ones.

Co-authored-by: Oran Agra <oran@redislabs.com>
2021-08-10 09:18:49 +03:00
sundb
cbda492909
Sanitize dump payload: handle remaining empty key when RDB loading and restore command (#9349)
This commit mainly fixes empty keys due to RDB loading and restore command,
which was omitted in #9297.

1) When loading quicklsit, if all the ziplists in the quicklist are empty, NULL will be returned.
    If only some of the ziplists are empty, then we will skip the empty ziplists silently.
2) When loading hash zipmap, if zipmap is empty, sanitization check will fail.
3) When loading hash ziplist, if ziplist is empty, NULL will be returned.
4) Add RDB loading test with sanitize.
2021-08-09 17:13:46 +03:00
Eduardo Semprebon
d3356bf614
Add SORT_RO command (#9299)
Add a readonly variant of the STORE command, so it can be used on
read-only workloads (replica, ACL, etc)
2021-08-09 09:40:29 +03:00
Qu Chen
e8eeba7bee
Allow master to replicate command longer than replica's query buffer limit (#9340)
Replication client no longer checks incoming command length against the client-query-buffer-limit. This makes the master able to replicate commands longer than replica's configured client-query-buffer-limit
2021-08-08 17:34:11 -07:00
DarrenJiang13
43eb0ce3bf
[BUGFIX] Add some missed error statistics (#9328)
add error counting for some missed behaviors.
2021-08-06 19:27:24 -07:00
yoav-steinberg
0a9377535b
Ignore resize threshold on idle qbuf resizing (#9322)
Also update qbuf tests to verify both idle and peak based resizing logic.
And delete unused function: getClientsMaxBuffers
2021-08-06 20:50:34 +03:00
Oran Agra
3f3f678a47
corrupt-dump-fuzzer test, avoid creating junk keys (#9302)
The execution of the RPOPLPUSH command by the fuzzer created junk keys,
that were later being selected by RANDOMKEY and modified.
This also meant that lists were statistically tested more than other
files.

Fix the fuzzer not to pass junk key names to RPOPLPUSH, and add a check
that detects that new keys are not added by the fuzzer to detect future
similar issues.
2021-08-05 22:57:05 +03:00
Oran Agra
0c90370e6d
Improvements to corrupt payload sanitization (#9321)
Recently we found two issues in the fuzzer tester: #9302 #9285
After fixing them, more problems surfaced and this PR (as well as #9297) aims to fix them.

Here's a list of the fixes
- Prevent an overflow when allocating a dict hashtable
- Prevent OOM when attempting to allocate a huge string
- Prevent a few invalid accesses in listpack
- Improve sanitization of listpack first entry
- Validate integrity of stream consumer groups PEL
- Validate integrity of stream listpack entry IDs
- Validate ziplist tail followed by extra data which start with 0xff

Co-authored-by: sundb <sundbcn@gmail.com>
2021-08-05 22:56:14 +03:00
sundb
8ea777a6a0
Sanitize dump payload: fix empty keys when RDB loading and restore command (#9297)
When we load rdb or restore command, if we encounter a length of 0, it will result in the creation of an empty key.
This could either be a corrupt payload, or a result of a bug (see #8453 )

This PR mainly fixes the following:
1) When restore command will return `Bad data format` error.
2) When loading RDB, we will silently discard the key.

Co-authored-by: Oran Agra <oran@redislabs.com>
2021-08-05 22:42:20 +03:00