Commit Graph

7909 Commits

Author SHA1 Message Date
Wen Hui
2ce29e032b
Sentinel tls memory leak (#9753)
There was a memory leak when tls is used in Sentinels.
The memory leak is noticed when some of the replicas are offline.
2021-11-08 22:23:31 +02:00
Yossi Gottlieb
a1aba4bf75
Fix EINTR test failures. (#9751)
* Clean up EINTR handling so EINTR will not change connection state to begin with.
* On TLS, catch EINTR and return it as-is before going through OpenSSL error handling (which seems to not distinguish it from EAGAIN).
2021-11-08 16:09:33 +02:00
Huang Zhw
48d870aed1
Move config from clusterCron to config update (#9580) 2021-11-07 18:56:03 -08:00
yoav-steinberg
79ac57561f
Refactor config.c for generic setter interface (#9644)
This refactors all `CONFIG SET`s and conf file loading arguments go through
the generic config handling interface.

Refactoring changes:
- All config params go through the `standardConfig` interface (some stuff which
  is only related to the config file and not the `CONFIG` command still has special
  handling for rewrite/config file parsing, `loadmodule`, for example.) .
- Added `MULTI_ARG_CONFIG` flag for configs to signify they receive a variable
  number of arguments instead of a single argument. This is used to break up space
  separated arguments to `CONFIG SET` so the generic setter interface can pass
  multiple arguments to the setter function. When parsing the config file we also break
  up anything after the config name into multiple arguments to the setter function.

Interface changes:
- A side effect of the above interface is that the `bind` argument in the config file can
  be empty (no argument at all) this is treated the same as passing an single empty
  string argument (same as `save` already used to work).
- Support rewrite and setting `watchdog-period` from config file (was only supported
  by the CONFIG command till now).
- Another side effect is that the `save T X` config argument now supports multiple
  Time-Changes pairs in a single line like its `CONFIG SET` counterpart. So in the
  config file you can either do:
  ```
  save 3600 1
  save 600 10
  ```
  or do
  ```
  save 3600 1 600 10
  ```

Co-authored-by: Bjorn Svensson <bjorn.a.svensson@est.tech>
2021-11-07 13:40:08 +02:00
Binbin
ddb508c0a2
Fixing import of sys/errno (#9739) 2021-11-05 07:50:25 +02:00
Eduardo Semprebon
91d0c758e5
Replica keep serving data during repl-diskless-load=swapdb for better availability (#9323)
For diskless replication in swapdb mode, considering we already spend replica memory
having a backup of current db to restore in case of failure, we can have the following benefits
by instead swapping database only in case we succeeded in transferring db from master:

- Avoid `LOADING` response during failed and successful synchronization for cases where the
  replica is already up and running with data.
- Faster total time of diskless replication, because now we're moving from Transfer + Flush + Load
  time to Transfer + Load only. Flushing the tempDb is done asynchronously after swapping.
- This could be implemented also for disk replication with similar benefits if consumers are willing
  to spend the extra memory usage.

General notes:
- The concept of `backupDb` becomes `tempDb` for clarity.
- Async loading mode will only kick in if the replica is syncing from a master that has the same
  repl-id the one it had before. i.e. the data it's getting belongs to a different time of the same timeline. 
- New property in INFO: `async_loading` to differentiate from the blocking loading
- Slot to Key mapping is now a field of `redisDb` as it's more natural to access it from both server.db
  and the tempDb that is passed around.
- Because this is affecting replicas only, we assume that if they are not readonly and write commands
  during replication, they are lost after SYNC same way as before, but we're still denying CONFIG SET
  here anyways to avoid complications.

Considerations for review:
- We have many cases where server.loading flag is used and even though I tried my best, there may
  be cases where async_loading should be checked as well and cases where it shouldn't (would require
  very good understanding of whole code)
- Several places that had different behavior depending on the loading flag where actually meant to just
  handle commands coming from the AOF client differently than ones coming from real clients, changed
  to check CLIENT_ID_AOF instead.

**Additional for Release Notes**
- Bugfix - server.dirty was not incremented for any kind of diskless replication, as effect it wouldn't
  contribute on triggering next database SAVE
- New flag for RM_GetContextFlags module API: REDISMODULE_CTX_FLAGS_ASYNC_LOADING
- Deprecated RedisModuleEvent_ReplBackup. Starting from Redis 7.0, we don't fire this event.
  Instead, we have the new RedisModuleEvent_ReplAsyncLoad holding 3 sub-events: STARTED,
  ABORTED and COMPLETED.
- New module flag REDISMODULE_OPTIONS_HANDLE_REPL_ASYNC_LOAD for RedisModule_SetModuleOptions
  to allow modules to declare they support the diskless replication with async loading (when absent, we fall
  back to disk-based loading).

Co-authored-by: Eduardo Semprebon <edus@saxobank.com>
Co-authored-by: Oran Agra <oran@redislabs.com>
2021-11-04 10:46:50 +02:00
Itamar Haber
06dd202a05
Fixes LPOP/RPOP wrong replies when count is 0 (#9692)
Introduced in #8179, this fixes the command's replies in the 0 count edge case.
[BREAKING] changes the reply type when count is 0 to an empty array (instead of nil)
Moves LPOP ... 0 fast exit path after type check to reply with WRONGTYPE
2021-11-04 09:43:08 +02:00
menwen
ccf8a651f3
Retry when a blocked connection system call is interrupted by a signal (#9629)
When repl-diskless-load is enabled, the connection is set to the blocking state.
The connection may be interrupted by a signal during a system call.
This would have resulted in a disconnection and possibly a reconnection loop.

Co-authored-by: Oran Agra <oran@redislabs.com>
2021-11-04 09:09:28 +02:00
perryitay
f27083a4a8
Add support for list type to store elements larger than 4GB (#9357)
Redis lists are stored in quicklist, which is currently a linked list of ziplists.
Ziplists are limited to storing elements no larger than 4GB, so when bigger
items are added they're getting truncated.
This PR changes quicklists so that they're capable of storing large items
in quicklist nodes that are plain string buffers rather than ziplist.

As part of the PR there were few other changes in redis: 
1. new DEBUG sub-commands: 
   - QUICKLIST-PACKED-THRESHOLD - set the threshold of for the node type to
     be plan or ziplist. default (1GB)
   - QUICKLIST <key> - Shows low level info about the quicklist encoding of <key>
2. rdb format change:
   - A new type was added - RDB_TYPE_LIST_QUICKLIST_2 . 
   - container type (packed / plain) was added to the beginning of the rdb object
     (before the actual node list).
3. testing:
   - Tests that requires over 100MB will be by default skipped. a new flag was
     added to 'runtest' to run the large memory tests (not used by default)

Co-authored-by: sundb <sundbcn@gmail.com>
Co-authored-by: Oran Agra <oran@redislabs.com>
2021-11-03 20:47:18 +02:00
guybe7
f11a2d4dd7
Fix COMMAND GETKEYS on EVAL without keys (#9733)
Add new no-mandatory-keys flag to support COMMAND GETKEYS of commands
which have no mandatory keys.

In the past we would have got this error:
```
127.0.0.1:6379> command getkeys eval "return 1" 0
(error) ERR Invalid arguments specified for command
```
2021-11-03 14:38:26 +02:00
perryitay
77d3c6bff3
fix: lookupKey on SETNX and SETXX only once (#9640)
When using SETNX and SETXX we could end up doing key lookup twice.
This presents a small inefficiency price.
Also once we have statistics of write hit and miss they'll be wrong (recording the same key hit twice)
2021-11-03 14:12:33 +02:00
yiyuaner
78025c4a26
Add checks for overflow in redis-check-aof and loadAppendOnlyFile (#9669)
Co-authored-by: guoyiyuan <guoyiyuan@sbrella.com>
2021-11-02 17:03:07 +02:00
Wang Yuan
526cbb5cff
Fix not updating backlog histlen when trimming repl backlog (#9713)
Since the loop in incrementalTrimReplicationBacklog checks the size of histlen,
we cannot afford to update it only when the loop exits, this may cause deleting
much more replication blocks, and replication backlog may be less than setting size.

introduce in #9166 

Co-authored-by: sundb <sundbcn@gmail.com>
2021-11-02 11:04:11 +02:00
zhaozhao.zz
d08f0552ee
rebuild replication backlog index when master restart (#9720)
After PR #9166 , replication backlog is not a real block of memory, just contains a
reference points to replication buffer's block and the blocks index (to accelerate
search offset when partial sync), so we need update both replication buffer's block's
offset and replication backlog blocks index's offset when master restart from RDB,
since the `server.master_repl_offset` is changed.
The implications of this bug was just a slow search, but not a replication failure.
2021-11-02 10:53:52 +02:00
Oran Agra
f1f3cceb50
fix valgrind issues with long double module test (#9709)
The module test in reply.tcl was introduced by #8521 but didn't run until recently (see #9639)
and then it started failing with valgrind.
This is because valgrind uses 64 bit long double (unlike most other platforms that have at least 80 bits)
But besides valgrind, the tests where also incompatible with ARM32, which also uses 64 bit long doubles.

We now use appropriate value to avoid issues with either valgrind or ARM32

In all the double tests, i use 3.141, which is safe since since addReplyDouble uses
`%.17Lg` which is able to represent this value without adding any digits due to precision loss. 

In the long double, since we use `%.17Lf` in ld2string, it preserves 17 significant
digits, rather than 17 digit after the decimal point (like in `%.17Lg`).
So to make these similar, i use value lower than 1 (no digits left of
the period)

Lastly, we have the same issue with TCL (no long doubles) so we read
raw protocol in that test.

Note that the only error before this fix (in both valgrind and ARM32 is this:
```
*** [err]: RM_ReplyWithLongDouble: a float reply in tests/unit/moduleapi/reply.tcl
Expected '3.141' to be equal to '3.14100000000000001' (context: type eval line 2 cmd {assert_equal 3.141 [r rw.longdouble 3.141]} proc ::test)
```
so the changes to debug.c and scripting.tcl aren't really needed, but i consider them a cleanup
(i.e. scripting.c validated a different constant than the one that's sent to it from debug.c).

Another unrelated change is to add the RESP version to the repeated tests in reply.tcl
2021-11-01 13:41:35 +02:00
罗泽轩
155c291006
Remove duplicate SET_OP_XX definitions in t_set.c. (#4326)
These definitions already exist in server.h.
2021-11-01 11:09:29 +02:00
Binbin
033578839b
Fix multiple COUNT in LMPOP/BLMPOP/ZMPOP/BZMPOP (#9701)
The previous code did not check whether COUNT is set.
So we can use `lmpop 2 key1 key2 left count 1 count 2`.

This situation can occur in LMPOP/BLMPOP/ZMPOP/BZMPOP commands.
LMPOP/BLMPOP introduced in #9373, ZMPOP/BZMPOP introduced in #9484.
2021-10-31 16:10:29 +02:00
lijinliang
215b909c1f
fix typo in db.c: synchroneus -> synchronous(2 places) (#9702)
Co-authored-by: lijinliang <lijl@newdt.cn>
2021-10-31 16:01:54 +02:00
Rafi Einstein
734cde7e38
Fix memory leak when there's a read error of module aux data from rdb. (#9705) 2021-10-31 15:59:48 +02:00
guybe7
975f51fe16
Add new SLOTSRANGE to subcommands table (#9689) 2021-10-27 10:44:14 +03:00
Wen Hui
5fb4adba65
New Cluster Command: CLUSTER DELSLOTSRANGE and CLUSTER ADDSLOTSRANGE (#9445) 2021-10-26 21:44:33 -07:00
Wen Hui
43b22f17dc
Sentinel: don't log auth-pass value for better security (#9652) 2021-10-26 13:13:12 +03:00
Wang Yuan
9ec3294b97
Add timestamp annotations in AOF (#9326)
Add timestamp annotation in AOF, one part of #9325.

Enabled with the new `aof-timestamp-enabled` config option.

Timestamp annotation format is "#TS:${timestamp}\r\n"."
TS" is short of timestamp and this method could save extra bytes in AOF.

We can use timestamp annotation for some special functions. 
- know the executing time of commands
- restore data to a specific point-in-time (by using redis-check-rdb to truncate the file)
2021-10-25 13:08:34 +03:00
Oran Agra
085615af97
Improve code doc of allowed_firstargs following #9504 (#9674)
Improve code doc for allowed_firstargs (used to be allowed_commands before #9504.
I don't think the text in the code needs to refer to the history (it's not there just for backwards compatibility).
instead it should just describe what it does.
2021-10-25 13:01:25 +03:00
Guy Korland
6cf6c36937
Replace deprecated REDISMODULE_POSTPONED_ARRAY_LEN in module tests and examples (#9677)
REDISMODULE_POSTPONED_ARRAY_LEN is deprecated, use REDISMODULE_POSTPONED_LEN instead
2021-10-25 12:00:43 +03:00
Itamar Haber
00362f2a94
Removes admin acl category from CLIENT TRACKINGINFO (#9662)
overlooked in #9504
2021-10-25 11:33:37 +03:00
Shaya Potter
12ce2c3925
Add RM_ReplyWithBigNumber module API (#9639)
Let modules use additional type of RESP3 response (unused by redis so far)
Also fix tests that where introduced in #8521 but didn't actually run.

Co-authored-by: Oran Agra <oran@redislabs.com>
2021-10-25 11:31:20 +03:00
Wang Yuan
c1718f9d86
Replication backlog and replicas use one global shared replication buffer (#9166)
## Background
For redis master, one replica uses one copy of replication buffer, that is a big waste of memory,
more replicas more waste, and allocate/free memory for every reply list also cost much.
If we set client-output-buffer-limit small and write traffic is heavy, master may disconnect with
replicas and can't finish synchronization with replica. If we set  client-output-buffer-limit big,
master may be OOM when there are many replicas that separately keep much memory.
Because replication buffers of different replica client are the same, one simple idea is that
all replicas only use one replication buffer, that will effectively save memory.

Since replication backlog content is the same as replicas' output buffer, now we
can discard replication backlog memory and use global shared replication buffer
to implement replication backlog mechanism.

## Implementation
I create one global "replication buffer" which contains content of replication stream.
The structure of "replication buffer" is similar to the reply list that exists in every client.
But the node of list is `replBufBlock`, which has `id, repl_offset, refcount` fields.
```c
/* Replication buffer blocks is the list of replBufBlock.
 *
 * +--------------+       +--------------+       +--------------+
 * | refcount = 1 |  ...  | refcount = 0 |  ...  | refcount = 2 |
 * +--------------+       +--------------+       +--------------+
 *      |                                            /       \
 *      |                                           /         \
 *      |                                          /           \
 *  Repl Backlog                               Replia_A      Replia_B
 * 
 * Each replica or replication backlog increments only the refcount of the
 * 'ref_repl_buf_node' which it points to. So when replica walks to the next
 * node, it should first increase the next node's refcount, and when we trim
 * the replication buffer nodes, we remove node always from the head node which
 * refcount is 0. If the refcount of the head node is not 0, we must stop
 * trimming and never iterate the next node. */

/* Similar with 'clientReplyBlock', it is used for shared buffers between
 * all replica clients and replication backlog. */
typedef struct replBufBlock {
    int refcount;           /* Number of replicas or repl backlog using. */
    long long id;           /* The unique incremental number. */
    long long repl_offset;  /* Start replication offset of the block. */
    size_t size, used;
    char buf[];
} replBufBlock;
```
So now when we feed replication stream into replication backlog and all replicas, we only need
to feed stream into replication buffer `feedReplicationBuffer`. In this function, we set some fields of
replication backlog and replicas to references of the global replication buffer blocks. And we also
need to check replicas' output buffer limit to free if exceeding `client-output-buffer-limit`, and trim
replication backlog if exceeding `repl-backlog-size`.

When sending reply to replicas, we also need to iterate replication buffer blocks and send its
content, when totally sending one block for replica, we decrease current node count and
increase the next current node count, and then free the block which reference is 0 from the
head of replication buffer blocks.

Since now we use linked list to manage replication backlog, it may cost much time for iterating
all linked list nodes to find corresponding replication buffer node. So we create a rax tree to
store some nodes  for index, but to avoid rax tree occupying too much memory, i record
one per 64 nodes for index.

Currently, to make partial resynchronization as possible as much, we always let replication
backlog as the last reference of replication buffer blocks, backlog size may exceeds our setting
if slow replicas that reference vast replication buffer blocks, and this method doesn't increase
memory usage since they share replication buffer. To avoid freezing server for freeing unreferenced
replication buffer blocks when we need to trim backlog for exceeding backlog size setting,
we trim backlog incrementally (free 64 blocks per call now), and make it faster in
`beforeSleep` (free 640 blocks).

### Other changes
- `mem_total_replication_buffers`: we add this field in INFO command, it means the total
  memory of replication buffers used.
- `mem_clients_slaves`:  now even replica is slow to replicate, and its output buffer memory
  is not 0, but it still may be 0, since replication backlog and replicas share one global replication
  buffer, only if replication buffer memory is more than the repl backlog setting size, we consider
  the excess as replicas' memory. Otherwise, we think replication buffer memory is the consumption
  of repl backlog.
- Key eviction
  Since all replicas and replication backlog share global replication buffer, we think only the
  part of exceeding backlog size the extra separate consumption of replicas.
  Because we trim backlog incrementally in the background, backlog size may exceeds our
  setting if slow replicas that reference vast replication buffer blocks disconnect.
  To avoid massive eviction loop, we don't count the delayed freed replication backlog into
  used memory even if there are no replicas, i.e. we also regard this memory as replicas's memory.
- `client-output-buffer-limit` check for replica clients
  It doesn't make sense to set the replica clients output buffer limit lower than the repl-backlog-size
  config (partial sync will succeed and then replica will get disconnected). Such a configuration is
  ignored (the size of repl-backlog-size will be used). This doesn't have memory consumption
  implications since the replica client will share the backlog buffers memory.
- Drop replication backlog after loading data if needed
  We always create replication backlog if server is a master, we need it because we put DELs in
  it when loading expired keys in RDB, but if RDB doesn't have replication info or there is no rdb,
  it is not possible to support partial resynchronization, to avoid extra memory of replication backlog,
  we drop it.
- Multi IO threads
 Since all replicas and replication backlog use global replication buffer,  if I/O threads are enabled,
  to guarantee data accessing thread safe, we must let main thread handle sending the output buffer
  to all replicas. But before, other IO threads could handle sending output buffer of all replicas.

## Other optimizations
This solution resolve some other problem:
- When replicas disconnect with master since of out of output buffer limit, releasing the output
  buffer of replicas may freeze server if we set big `client-output-buffer-limit` for replicas, but now,
  it doesn't cause freezing.
- This implementation may mitigate reply list copy cost time(also freezes server) when one replication
  has huge reply buffer and another replica can copy buffer for full synchronization. now, we just copy
  reference info, it is very light.
- If we set replication backlog size big, it also may cost much time to copy replication backlog into
  replica's output buffer. But this commit eliminates this problem.
- Resizing replication backlog size doesn't empty current replication backlog content.
2021-10-25 09:24:31 +03:00
Oran Agra
6b297cd646
Improve errno reporting on fork and fopen rdbLoad failures (#9649)
I moved a bunch of stats in redisFork to be executed only on successful
fork, since they seem wrong to be done when it failed.
I guess when fork fails it does that immediately, no latency spike.
2021-10-24 16:52:44 +03:00
Itamar Haber
48e4d77099
Fixes CLUSTER COUNTKEYSINSLOT (#9672)
Introduced via typo in #9504. 
Also adds a sanity test for coverage.
2021-10-24 12:32:53 +03:00
Shaya Potter
cf860df599
Fix module blocked clients RESP version (#9634)
Before this commit, module blocked clients did not carry through the original RESP version, resulting with RESP3 clients receiving unexpected RESP2 replies.
2021-10-21 14:01:10 +03:00
guybe7
8f745da159
Fix sentinel commands, ACL dictIter leak (#9661) 2021-10-21 12:50:58 +03:00
Oran Agra
7d6744c739
fix new cluster tests issues (#9657)
Following #9483 the daily CI exposed a few problems.

* The cluster creation code (uses redis-cli) is complicated to test with TLS enabled.
  for now i'm just skipping them since the tests we run there don't really need that kind of coverage
* cluster port binding failures
  note that `find_available_port` already looks for a free cluster port
  but the code in `wait_server_started` couldn't detect the failure of binding
  (the text it greps for wasn't found in the log)
2021-10-20 15:40:28 +03:00
guybe7
43e736f79b
Treat subcommands as commands (#9504)
## Intro

The purpose is to allow having different flags/ACL categories for
subcommands (Example: CONFIG GET is ok-loading but CONFIG SET isn't)

We create a small command table for every command that has subcommands
and each subcommand has its own flags, etc. (same as a "regular" command)

This commit also unites the Redis and the Sentinel command tables

## Affected commands

CONFIG
Used to have "admin ok-loading ok-stale no-script"
Changes:
1. Dropped "ok-loading" in all except GET (this doesn't change behavior since
there were checks in the code doing that)

XINFO
Used to have "read-only random"
Changes:
1. Dropped "random" in all except CONSUMERS

XGROUP
Used to have "write use-memory"
Changes:
1. Dropped "use-memory" in all except CREATE and CREATECONSUMER

COMMAND
No changes.

MEMORY
Used to have "random read-only"
Changes:
1. Dropped "random" in PURGE and USAGE

ACL
Used to have "admin no-script ok-loading ok-stale"
Changes:
1. Dropped "admin" in WHOAMI, GENPASS, and CAT

LATENCY
No changes.

MODULE
No changes.

SLOWLOG
Used to have "admin random ok-loading ok-stale"
Changes:
1. Dropped "random" in RESET

OBJECT
Used to have "read-only random"
Changes:
1. Dropped "random" in ENCODING and REFCOUNT

SCRIPT
Used to have "may-replicate no-script"
Changes:
1. Dropped "may-replicate" in all except FLUSH and LOAD

CLIENT
Used to have "admin no-script random ok-loading ok-stale"
Changes:
1. Dropped "random" in all except INFO and LIST
2. Dropped "admin" in ID, TRACKING, CACHING, GETREDIR, INFO, SETNAME, GETNAME, and REPLY

STRALGO
No changes.

PUBSUB
No changes.

CLUSTER
Changes:
1. Dropped "admin in countkeysinslots, getkeysinslot, info, nodes, keyslot, myid, and slots

SENTINEL
No changes.

(note that DEBUG also fits, but we decided not to convert it since it's for
debugging and anyway undocumented)

## New sub-command
This commit adds another element to the per-command output of COMMAND,
describing the list of subcommands, if any (in the same structure as "regular" commands)
Also, it adds a new subcommand:
```
COMMAND LIST [FILTERBY (MODULE <module-name>|ACLCAT <cat>|PATTERN <pattern>)]
```
which returns a set of all commands (unless filters), but excluding subcommands.

## Module API
A new module API, RM_CreateSubcommand, was added, in order to allow
module writer to define subcommands

## ACL changes:
1. Now, that each subcommand is actually a command, each has its own ACL id.
2. The old mechanism of allowed_subcommands is redundant
(blocking/allowing a subcommand is the same as blocking/allowing a regular command),
but we had to keep it, to support the widespread usage of allowed_subcommands
to block commands with certain args, that aren't subcommands (e.g. "-select +select|0").
3. I have renamed allowed_subcommands to allowed_firstargs to emphasize the difference.
4. Because subcommands are commands in ACL too, you can now use "-" to block subcommands
(e.g. "+client -client|kill"), which wasn't possible in the past.
5. It is also possible to use the allowed_firstargs mechanism with subcommand.
For example: `+config -config|set +config|set|loglevel` will block all CONFIG SET except
for setting the log level.
6. All of the ACL changes above required some amount of refactoring.

## Misc
1. There are two approaches: Either each subcommand has its own function or all
   subcommands use the same function, determining what to do according to argv[0].
   For now, I took the former approaches only with CONFIG and COMMAND,
   while other commands use the latter approach (for smaller blamelog diff).
2. Deleted memoryGetKeys: It is no longer needed because MEMORY USAGE now uses the "range" key spec.
4. Bugfix: GETNAME was missing from CLIENT's help message.
5. Sentinel and Redis now use the same table, with the same function pointer.
   Some commands have a different implementation in Sentinel, so we redirect
   them (these are ROLE, PUBLISH, and INFO).
6. Command stats now show the stats per subcommand (e.g. instead of stats just
   for "config" you will have stats for "config|set", "config|get", etc.)
7. It is now possible to use COMMAND directly on subcommands:
   COMMAND INFO CONFIG|GET (The pipeline syntax was inspired from ACL, and
   can be used in functions lookupCommandBySds and lookupCommandByCString)
8. STRALGO is now a container command (has "help")

## Breaking changes:
1. Command stats now show the stats per subcommand (see (5) above)
2021-10-20 11:52:57 +03:00
qetu3790
4962c5526d
Release clients blocked on module commands in cluster resharding and down state (#9483)
Prevent clients from being blocked forever in cluster when they block with their own module command
and the hash slot is migrated to another master at the same time.
These will get a redirection message when unblocked.
Also, release clients blocked on module commands when cluster is down (same as other blocked clients)

This commit adds basic tests for the main (non-cluster) redis test infra that test the cluster.
This was done because the cluster test infra can't handle some common test features,
but most importantly we only build the test modules with the non-cluster test suite.

note that rather than really supporting cluster operations by the test infra, it was added (as dup code)
in two files, one for module tests and one for non-modules tests, maybe in the future we'll refactor that.

Co-authored-by: Oran Agra <oran@redislabs.com>
2021-10-19 11:50:37 +03:00
Bjorn Svensson
c9fabc2ef0
Move config unixsocketperm to generic configs (#9607)
Since the size of mode_t is platform dependant we handle the
`unixsocketperm` configuration as a generic int type.
mode_t is either an unsigned int or unsigned short (macOS) and
the range-limits allows for a simple cast to a mode_t.
2021-10-18 23:58:52 -07:00
Wen Hui
1c2b5f5318
Make Cluster-bus port configurable with new cluster-port config (#9389)
Make Cluster-bus port configurable with new cluster-port config
2021-10-18 22:28:27 -07:00
Viktor Söderqvist
b7f2a1a217
Add RedisModule_KeyExists (#9600)
The LRU of the key is not touched. Locically expired keys are
logically not existing, so they're treated as such.
2021-10-18 22:21:19 +03:00
DarrenJiang13
aa6deff01e
add missed error counting (#9646)
* add: add missed error counting in sentinel.c and cluster.c
2021-10-18 15:53:10 +03:00
七飒
afd8c4e007
sdstrim remove excessive check (#4045)
there is no need to compare the value of ep and sp
```
    sp = start = s;
    // the only way that make ep > sp is sdslen(s) == 0
    // so when ep > sp,must exist ep-sp == -1
    ep = end = s+sdslen(s)-1;
    while(sp <= end && strchr(cset, *sp)) sp++;
    while(ep > sp && strchr(cset, *ep)) ep--;
    // -1 + 1 already equals 0
    len = (sp > ep) ? 0 : ((ep-sp)+1);
```

Signed-off-by: Bo Cai <charpty@gmail.com>
2021-10-17 20:37:52 +03:00
Ilya Shipitsin
94fded4f4f
Code cleanup, resolve an issue identified by cppcheck (#4373)
[src/bitops.c:512] -> [src/bitops.c:507]: (warning) Either the condition 'if(o&&o->encoding==1)' is redundant or there is possible null pointer dereference: o.

This function has checks for `o` to be null or non-null, so it is odd that it accesses it first..
2021-10-17 18:48:15 +03:00
Hanna Fadida
61bb044156
Modify mem_usage2 module callback to enable to take sample_size argument (#9612)
This is useful for approximating size computation of complex module types.
Note that the mem_usage2 callback is new and has not been released yet, which is why we can modify it.
2021-10-17 17:31:06 +03:00
Oran Agra
e7864a2b70
fix typos in module doc / header (#9641)
the RedisModule_ReplyWithPush prototype was merged by mistake (no such API yet)
2021-10-17 17:15:27 +03:00
guoxiang1996
3c9e5271c6
Use fcntl(fd,F_FULLFSYNC) instead of fsync on OSX, improve power failure safety (#9545)
On MacOS calling fsync does not guarantee the cache on the disk itself is flushed.
2021-10-15 08:44:25 +03:00
Shaya Potter
24b67d5520
Add RM_ReplyWithVerbatimStringType that takes an ext/type arg (#9632)
Verbatim Stings in RESP3 have a type/extension.
The existing redismoule reply function, hard coded it to "txt".
2021-10-14 09:53:46 +03:00
Ofir Luzon
49d26a9658
Add sleep interval to redis-cli --scan option (#3751)
Adding -i option (sleep interval) of repeat and bigkeys to redis-cli --scan.
When the keyspace contains many already expired keys scanning the
dataset with redis-cli --scan can impact the performance

Co-authored-by: Oran Agra <oran@redislabs.com>
2021-10-13 16:54:35 +03:00
Madelyn Olson
a6b5d518a9
Improved the reliability of cluster replica sync tests (#9628)
Improved the reliability of cluster replica sync tests
2021-10-13 00:06:53 -07:00
Ning Xie
075ac34545
Fix redis-cli SCAN sleep interval for big/hot keys (could have been skipped) (#9624)
bigkeys sleep is defined each 100 scanned keys, and it is checked it only between scan cycles.
In cases that scan does not return exactly 10 keys it will never sleep.
In addition the comment was sleep each 100 SCANs but it was 100 scanned keys.
2021-10-12 23:00:49 +03:00
yoav-steinberg
252981914f
XADD - skip rewrite the id arg if it was given and is valid. (#9599)
When calling `XADD` with a predefined id (instead of `*`) there's no need to run
the code which replaces the supplied id with itself. Only when we pass a wildcard
id we need to do this.
For apps which always supply their own id this is a slight optimization.
2021-10-11 13:09:18 +03:00
zhaozhao.zz
484a1ad67e
master client should ignore proto_max_bulk_len in bitops (#9626) 2021-10-11 13:58:42 +08:00
menwen
7ff7536e2c
Delete unused 'time' fields from struct bio_job (#9622)
looks like this field was never actually used and the call to time() is excessive.
2021-10-10 08:17:54 +03:00
Bjorn Svensson
b874c6f1fc
Move config logfile to generic config (#9592)
Move config `logfile` to generic configs
2021-10-07 22:33:08 -07:00
Bjorn Svensson
54d01e363a
Move config cluster-config-file to generic configs (#9597) 2021-10-07 22:32:40 -07:00
Huang Zhw
fd135f3e2d
Make tracking invalidation messages always after command's reply (#9422)
Tracking invalidation messages were sometimes sent in inconsistent order,
before the command's reply rather than after.
In addition to that, they were sometimes embedded inside other commands
responses, like MULTI-EXEC and MGET.
2021-10-07 15:13:42 +03:00
GutovskyMaria
d98d1ad574
Hide empty and loading replicas from CLUSTER SLOTS responses (#9287)
Hide empty and loading replicas from CLUSTER SLOTS responses
2021-10-06 22:22:27 -07:00
Andy Pan
2391aefd82
Implement anetPipe() to combine creating pipe and setting flags (#9511)
Implement createPipe() to combine creating pipe and setting flags, also reduce
system calls by prioritizing pipe2() over pipe().

Without createPipe(), we have to call pipe() to create a pipe and then call some
functions (like anetCloexec() and anetNonBlock()) of anet.c to set flags respectively,
which leads to some extra system calls, now we can leverage pipe2() to combine
them and make the process of creating pipe more convergent in createPipe().

Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Co-authored-by: Oran Agra <oran@redislabs.com>
2021-10-06 16:08:13 +03:00
yoav-steinberg
5725088ff2
Avoid argv memcpy when queuing a multi command. (#9602)
When queuing a multi command we duplicated the argv (meaning an alloc
and a memcpy). This isn't needed since we can use the previously allocated
argv and just reset the client objects argv to NULL. This should saves some
memory and is a minor optimization in heavy MULTI/EXEC traffic, especially
if there are lots of arguments.
2021-10-06 11:39:09 +03:00
Meir Shpilraien (Spielrein)
4fb39b6700
Added module-acquire-GIL latency stats (#9608)
The new value indicates how long Redis wait to
acquire the GIL after sleep. This can help identify
problems where a module perform some background
operation for a long time (with the GIL held) and
blocks the Redis main thread.
2021-10-06 11:33:01 +03:00
tzongw
f5160ed0aa
improve latency when a client is unblocked by module timer (#9593)
Scenario:
1. client block on command `XREAD BLOCK 0 STREAMS mystream  $`
2. in a module, calling `XADD mystream * field value` via lua from a timer callback
3. client will receive response after some latency up to 100ms

Reason:
When `XADD` signal the key `mystream` as ready, `beforeSleep` in next eventloop will call
`handleClientsBlockedOnKeys` to unblock the client and add pending data to write but not
actually install a write handler, so next redis will block in `aeApiPoll` up to 100ms given `hz`
config as default 10, pending data will be sent in another next eventloop by
`handleClientsWithPendingWritesUsingThreads`.

Calling `handleClientsBlockedOnKeys` before `handleClientsWithPendingWritesUsingThreads`
in `beforeSleep` solves the problem.
2021-10-06 10:15:03 +03:00
yoav-steinberg
83478e6102
argv mem leak during multi command execution. (#9598)
Changes in #9528 lead to memory leak if the command implementation
used rewriteClientCommandArgument inside MULTI-EXEC.

Adding an explicit test for that case since the test that uncovered it
didn't specifically target this scenario
2021-10-05 12:17:36 +03:00
Meir Shpilraien (Spielrein)
0f8b634cd5
Fix invalid memory write on lua stack overflow (CVE-2021-32626) (#9591)
When LUA call our C code, by default, the LUA stack has room for 10
elements. In most cases, this is more than enough but sometimes it's not
and the caller must verify the LUA stack size before he pushes elements.

On 3 places in the code, there was no verification of the LUA stack size.
On specific inputs this missing verification could have lead to invalid
memory write:
1. On 'luaReplyToRedisReply', one might return a nested reply that will
   explode the LUA stack.
2. On 'redisProtocolToLuaType', the Redis reply might be deep enough
   to explode the LUA stack (notice that currently there is no such
   command in Redis that returns such a nested reply, but modules might
   do it)
3. On 'ldbRedis', one might give a command with enough arguments to
   explode the LUA stack (all the arguments will be pushed to the LUA
   stack)

This commit is solving all those 3 issues by calling 'lua_checkstack' and
verify that there is enough room in the LUA stack to push elements. In
case 'lua_checkstack' returns an error (there is not enough room in the
LUA stack and it's not possible to increase the stack), we will do the
following:
1. On 'luaReplyToRedisReply', we will return an error to the user.
2. On 'redisProtocolToLuaType' we will exit with panic (we assume this
   scenario is rare because it can only happen with a module).
3. On 'ldbRedis', we return an error.
2021-10-04 15:17:50 +03:00
Oran Agra
9e3dca8bef
Fix mem leak in loading AOF, introduced by #9528 (#9582)
Recently merged PR introduced a leak when loading AOF files.
This was because argv_len wasn't set, so rewriteClientCommandArgument
would shrink the argv array and updating argc to a small value.
2021-10-04 12:17:22 +03:00
Oran Agra
b0ca3be2bb
Fix protocol parsing on 'ldbReplParseCommand' (CVE-2021-32672) (#9590)
The protocol parsing on 'ldbReplParseCommand' (LUA debugging)
Assumed protocol correctness. This means that if the following
is given:
*1
$100
test
The parser will try to read additional 94 unallocated bytes after
the client buffer.
This commit fixes this issue by validating that there are actually enough
bytes to read. It also limits the amount of data that can be sent by
the debugger client to 1M so the client will not be able to explode
the memory.

Co-authored-by: meir@redislabs.com <meir@redislabs.com>
2021-10-04 12:14:12 +03:00
Oran Agra
c5e6a6204c
Fix ziplist and listpack overflows and truncations (CVE-2021-32627, CVE-2021-32628) (#9589)
- fix possible heap corruption in ziplist and listpack resulting by trying to
  allocate more than the maximum size of 4GB.
- prevent ziplist (hash and zset) from reaching size of above 1GB, will be
  converted to HT encoding, that's not a useful size.
- prevent listpack (stream) from reaching size of above 1GB.
- XADD will start a new listpack if the new record may cause the previous
  listpack to grow over 1GB.
- XADD will respond with an error if a single stream record is over 1GB
- List type (ziplist in quicklist) was truncating strings that were over 4GB,
  now it'll respond with an error.

Co-authored-by: sundb <sundbcn@gmail.com>
2021-10-04 12:11:02 +03:00
Oran Agra
fba15850e5
Prevent unauthenticated client from easily consuming lots of memory (CVE-2021-32675) (#9588)
This change sets a low limit for multibulk and bulk length in the
protocol for unauthenticated connections, so that they can't easily
cause redis to allocate massive amounts of memory by sending just a few
characters on the network.
The new limits are 10 arguments of 16kb each (instead of 1m of 512mb)
2021-10-04 12:10:31 +03:00
Oran Agra
7cb89a5a1c
Fix Integer overflow issue with intsets (CVE-2021-32687) (#9586)
The vulnerability involves changing the default set-max-intset-entries
configuration parameter to a very large value and constructing specially
crafted commands to manipulate sets
2021-10-04 12:09:25 +03:00
yiyuaner
24cc0b984d
Fix integer overflow in _sdsMakeRoomFor (CVE-2021-41099) (#9558)
The existing overflow checks handled the greedy growing, but didn't handle
a case where the addition of the header size is what causes the overflow.
2021-10-04 11:11:09 +03:00
yoav-steinberg
6f4f31f167
decrby LLONG_MIN caused nagation overflow. (#9577)
Note that this breaks compatibility because in the past doing:
DECRBY x -9223372036854775808
would succeed (and create an invalid result) and now this returns an error.
2021-10-03 09:38:05 +03:00
yoav-steinberg
93e8534713
Remove argument count limit, dynamically grow argv. (#9528)
Remove hard coded multi-bulk limit (was 1,048,576), new limit is INT_MAX.
When client sends an m-bulk that's higher than 1024, we initially only allocate
the argv array for 1024 arguments, and gradually grow that allocation as arguments
are received.
2021-10-03 09:13:09 +03:00
Binbin
dd3ac97ffe
Cleanup typos, incorrect comments, and fixed small memory leak in redis-cli (#9153)
1. Remove forward declarations from header files to functions that do not exist:
hmsetCommand and rdbSaveTime.
2. Minor phrasing fixes in #9519
3. Add missing sdsfree(title) and fix typo in redis-benchmark.
4. Modify some error comments in some zset commands.
5. Fix copy-paste bug comment in syncWithMaster about `ip-address`.
2021-10-02 22:19:33 -07:00
Viktor Söderqvist
9a3bd07e9f
Unify dbSyncDelete and dbAsyncDelete (#9573)
Just a cleanup to make the code easier to maintain and reduce the risk of something being overlooked.
2021-10-01 15:49:33 +03:00
Yunier Pérez
12e4f31d94
Allow to override OPENSSL_PREFIX (#9567)
While the original issue was on Linux, this should work for other
platforms as well.
2021-09-30 15:51:19 +03:00
Hanna Fadida
ffafb434fb
Modules: add RM_LoadDataTypeFromStringEncver (#9537)
adding an advanced api to enable loading data that was sereialized with a specific encoding version
2021-09-30 11:21:32 +03:00
Wen Hui
2c38caa176
adding missing error check for fstat (#9532) 2021-09-28 21:10:33 -07:00
Ozan Tezcan
4be2dd6ab9
Use __common__ attribute in redismodule.h for Clang C builds (#9541) 2021-09-27 09:51:33 +03:00
Oran Agra
5a4ab7c7d2
Fix stream sanitization for non-int first value (#9553)
This was recently broken in #9321 when we validated stream IDs to be
integers but did that after to the stepping next record instead of before.
2021-09-26 18:46:22 +03:00
yoav-steinberg
6600253046
Client eviction ci issues (#9549)
Fixing CI test issues introduced in #8687
- valgrind warnings in readQueryFromClient when client was freed by processInputBuffer
- adding DEBUG pause-cron for tests not to be time dependent.
- skipping a test that depends on socket buffers / events not compatible with TLS
- making sure client got subscribed by not using deferring client
2021-09-26 17:45:02 +03:00
chenyang8094
7c1f9ef503
Fix obtain the AOF file length error when load AOF (#9510)
this was a regression from #9012 (not released yet)
2021-09-26 10:24:52 +03:00
Ozan Tezcan
3ff56a6dde
Fix crash due to free() call for a string literal in redis-benchmark (#9546) 2021-09-24 22:03:19 +03:00
sundb
9967a53f4c
Use dictGetFairRandomKey() for HRANDFIELD,SRANDMEMBER,ZRANDMEMBER (#9538)
In the `HRANDFIELD`, `SRANDMEMBER` and `ZRANDMEMBER` commands,
There are some strategies that could in some rare cases return an unfair random.
these cases are where s small dict happens be be hashed unevenly.

Specifically when `count*ZRANDMEMBER_SUB_STRATEGY_MUL > size`,
using `dictGetRandomKey` to randomize from a dict will result in an unfair random result.
2021-09-24 17:36:26 +03:00
Huang Zhw
f0fab99d6f
Minor optimize getMaxmemoryState, when server.maxmemory is not set (#9533)
Minor optimize getMaxmemoryState, when server.maxmemory is not set,
don't count AOF and replicas buffers.

Co-authored-by: Viktor Söderqvist <viktor@zuiderkwast.se>
2021-09-23 17:12:11 +03:00
Yossi Gottlieb
bebc7f8470
Add RM_TrimStringAllocation(). (#9540)
This commit makes it possible to explicitly trim the allocation of a
RedisModuleString.

Currently, Redis automatically trims strings that have been retained by
a module command when it returns. However, this is not thread safe and
may result with corruption in threaded modules.

Supporting explicit trimming offers a backwards compatible workaround to
this problem.
2021-09-23 15:00:37 +03:00
yoav-steinberg
2753429c99
Client eviction (#8687)
### Description
A mechanism for disconnecting clients when the sum of all connected clients is above a
configured limit. This prevents eviction or OOM caused by accumulated used memory
between all clients. It's a complimentary mechanism to the `client-output-buffer-limit`
mechanism which takes into account not only a single client and not only output buffers
but rather all memory used by all clients.

#### Design
The general design is as following:
* We track memory usage of each client, taking into account all memory used by the
  client (query buffer, output buffer, parsed arguments, etc...). This is kept up to date
  after reading from the socket, after processing commands and after writing to the socket.
* Based on the used memory we sort all clients into buckets. Each bucket contains all
  clients using up up to x2 memory of the clients in the bucket below it. For example up
  to 1m clients, up to 2m clients, up to 4m clients, ...
* Before processing a command and before sleep we check if we're over the configured
  limit. If we are we start disconnecting clients from larger buckets downwards until we're
  under the limit.

#### Config
`maxmemory-clients` max memory all clients are allowed to consume, above this threshold
we disconnect clients.
This config can either be set to 0 (meaning no limit), a size in bytes (possibly with MB/GB
suffix), or as a percentage of `maxmemory` by using the `%` suffix (e.g. setting it to `10%`
would mean 10% of `maxmemory`).

#### Important code changes
* During the development I encountered yet more situations where our io-threads access
  global vars. And needed to fix them. I also had to handle keeps the clients sorted into the
  memory buckets (which are global) while their memory usage changes in the io-thread.
  To achieve this I decided to simplify how we check if we're in an io-thread and make it
  much more explicit. I removed the `CLIENT_PENDING_READ` flag used for checking
  if the client is in an io-thread (it wasn't used for anything else) and just used the global
  `io_threads_op` variable the same way to check during writes.
* I optimized the cleanup of the client from the `clients_pending_read` list on client freeing.
  We now store a pointer in the `client` struct to this list so we don't need to search in it
  (`pending_read_list_node`).
* Added `evicted_clients` stat to `INFO` command.
* Added `CLIENT NO-EVICT ON|OFF` sub command to exclude a specific client from the
  client eviction mechanism. Added corrosponding 'e' flag in the client info string.
* Added `multi-mem` field in the client info string to show how much memory is used up
  by buffered multi commands.
* Client `tot-mem` now accounts for buffered multi-commands, pubsub patterns and
  channels (partially), tracking prefixes (partially).
* CLIENT_CLOSE_ASAP flag is now handled in a new `beforeNextClient()` function so
  clients will be disconnected between processing different clients and not only before sleep.
  This new function can be used in the future for work we want to do outside the command
  processing loop but don't want to wait for all clients to be processed before we get to it.
  Specifically I wanted to handle output-buffer-limit related closing before we process client
  eviction in case the two race with each other.
* Added a `DEBUG CLIENT-EVICTION` command to print out info about the client eviction
  buckets.
* Each client now holds a pointer to the client eviction memory usage bucket it belongs to
  and listNode to itself in that bucket for quick removal.
* Global `io_threads_op` variable now can contain a `IO_THREADS_OP_IDLE` value
  indicating no io-threading is currently being executed.
* In order to track memory used by each clients in real-time we can't rely on updating
  these stats in `clientsCron()` alone anymore. So now I call `updateClientMemUsage()`
  (used to be `clientsCronTrackClientsMemUsage()`) after command processing, after
  writing data to pubsub clients, after writing the output buffer and after reading from the
  socket (and maybe other places too). The function is written to be fast.
* Clients are evicted if needed (with appropriate log line) in `beforeSleep()` and before
  processing a command (before performing oom-checks and key-eviction).
* All clients memory usage buckets are grouped as follows:
  * All clients using less than 64k.
  * 64K..128K
  * 128K..256K
  * ...
  * 2G..4G
  * All clients using 4g and up.
* Added client-eviction.tcl with a bunch of tests for the new mechanism.
* Extended maxmemory.tcl to test the interaction between maxmemory and
  maxmemory-clients settings.
* Added an option to flag a numeric configuration variable as a "percent", this means that
  if we encounter a '%' after the number in the config file (or config set command) we
  consider it as valid. Such a number is store internally as a negative value. This way an
  integer value can be interpreted as either a percent (negative) or absolute value (positive).
  This is useful for example if some numeric configuration can optionally be set to a percentage
  of something else.

Co-authored-by: Oran Agra <oran@redislabs.com>
2021-09-23 14:02:16 +03:00
YaacovHazan
a56d4533b7
Adding ACL support for modules (#9309)
This commit introduced a new flag to the RM_Call:
'C' - Check if the command can be executed according to the ACLs associated with it.

Also, three new API's added to check if a command, key, or channel can be executed or accessed
by a user, according to the ACLs associated with it.
- RM_ACLCheckCommandPerm
- RM_ACLCheckKeyPerm
- RM_ACLCheckChannelPerm

The user for these API's is a RedisModuleUser object, that for a Module user returned by the RM_CreateModuleUser API, or for a general ACL user can be retrieved by these two new API's:
- RM_GetCurrentUserName - Retrieve the user name of the client connection behind the current context.
- RM_GetModuleUserFromUserName - Get a RedisModuleUser from a user name

As a result of getting a RedisModuleUser from name, it can now also access the general ACL users (not just ones created by the module).
This mean the already existing API RM_SetModuleUserACL(), can be used to change the ACL rules for such users.
2021-09-23 08:52:56 +03:00
Binbin
14d6abd8e9
Add ZMPOP/BZMPOP commands. (#9484)
This is similar to the recent addition of LMPOP/BLMPOP (#9373), but zset.

Syntax for the new ZMPOP command:
`ZMPOP numkeys [<key> ...] MIN|MAX [COUNT count]`

Syntax for the new BZMPOP command:
`BZMPOP timeout numkeys [<key> ...] MIN|MAX [COUNT count]`

Some background:
- ZPOPMIN/ZPOPMAX take only one key, and can return multiple elements.
- BZPOPMIN/BZPOPMAX take multiple keys, but return only one element from just one key.
- ZMPOP/BZMPOP can take multiple keys, and can return multiple elements from just one key.

Note that ZMPOP/BZMPOP can take multiple keys, it eventually operates on just on key.
And it will propagate as ZPOPMIN or ZPOPMAX with the COUNT option.

As new commands, if we can not pop any elements, the response like:
- ZMPOP: Return a NIL in both RESP2 and RESP3, unlike ZPOPMIN/ZPOPMAX return emptyarray.
- BZMPOP: Return a NIL in both RESP2 and RESP3 when timeout is reached, like BZPOPMIN/BZPOPMAX.

For the normal response is nested arrays in RESP2 and RESP3:
```
ZMPOP/BZMPOP
1) keyname
2) 1) 1) member1
      2) score1
   2) 1) member2
      2) score2

In RESP2:
1) "myzset"
2) 1) 1) "three"
      2) "3"
   2) 1) "two"
      2) "2"

In RESP3:
1) "myzset"
2) 1) 1) "three"
      2) (double) 3
   2) 1) "two"
      2) (double) 2
```
2021-09-23 08:34:40 +03:00
chenyang8094
59c9716f96
Fix redis-check-rdb rdb_type_string initialization error (#9534) 2021-09-22 10:40:31 +03:00
Wen Hui
572abee6f1
Add const for relevant method parameters in redis-benchmark (#9516) 2021-09-20 09:32:33 -07:00
郭伟光
67fcbdf8a1
replace redis-trib to redis-cli since it is no longer available (#9525) 2021-09-20 09:30:22 -07:00
Wen Hui
7c376398b1
CLIENT LIST / INFO show resp version (#9508) 2021-09-19 12:13:46 +03:00
Binbin
f898a9e97d
Adds limit to SINTERCARD/ZINTERCARD. (#9425)
Implements the [LIMIT limit] variant of SINTERCARD/ZINTERCARD.
Now with the LIMIT, we can stop the searching when cardinality
reaching the limit, and return the cardinality ASAP.

Note that in SINTERCARD, the old synatx was: `SINTERCARD key [key ...]`
In order to add a optional parameter, we must break the old synatx.
So the new syntax of SINTERCARD will be consistent with ZINTERCARD.
New syntax: `SINTERCARD numkeys key [key ...] [LIMIT limit]`.

Note that this means that SINTERCARD has a different syntax than
SINTER and SINTERSTORE (taking numkeys argument)

As for ZINTERCARD, we can easily add a optional parameter to it.
New syntax: `ZINTERCARD numkeys key [key ...] [LIMIT limit]`
2021-09-16 14:07:08 +03:00
guybe7
08f4e1335c
createSharedObjects: zopomin and zpopmax apeear twice (#9505)
Introduced by https://github.com/redis/redis/pull/9502
2021-09-15 15:29:35 +03:00
guybe7
7759ec7c43
Cleanup: propagate and alsoPropagate do not need redisCommand (#9502)
The `cmd` argument was completely unused, and all the code that bothered to pass it was unnecessary.
This is a prepartion for a future commit that treats subcommands as commands
2021-09-15 12:53:42 +03:00
guybe7
03fcc211de
A better approach for COMMAND INFO for movablekeys commands (#8324)
Fix #7297

The problem:

Today, there is no way for a client library or app to know the key name indexes for commands such as
ZUNIONSTORE/EVAL and others with "numkeys", since COMMAND INFO returns no useful info for them.

For cluster-aware redis clients, this requires to 'patch' the client library code specifically for each of these commands or to
resolve each execution of these commands with COMMAND GETKEYS.

The solution:

Introducing key specs other than the legacy "range" (first,last,step)

The 8th element of the command info array, if exists, holds an array of key specs. The array may be empty, which indicates
the command doesn't take any key arguments or may contain one or more key-specs, each one may leads to the discovery
of 0 or more key arguments.

A client library that doesn't support this key-spec feature will keep using the first,last,step and movablekeys flag which will
obviously remain unchanged.

A client that supports this key-specs feature needs only to look at the key-specs array. If it finds an unrecognized spec, it
must resort to using COMMAND GETKEYS if it wishes to get all key name arguments, but if all it needs is one key in order
to know which cluster node to use, then maybe another spec (if the command has several) can supply that, and there's no
need to use GETKEYS.

Each spec is an array of arguments, first one is the spec name, the second is an array of flags, and the third is an array
containing details about the spec (specific meaning for each spec type)
The initial flags we support are "read" and "write" indicating if the keys that this key-spec finds are used for read or for write.
clients should ignore any unfamiliar flags.

In order to easily find the positions of keys in a given array of args we introduce keys specs. There are two logical steps of
key specs:
1. `start_search`: Given an array of args, indicate where we should start searching for keys
2. `find_keys`: Given the output of start_search and an array of args, indicate all possible indices of keys.

### start_search step specs
- `index`: specify an argument index explicitly
  - `index`: 0 based index (1 means the first command argument)
- `keyword`: specify a string to match in `argv`. We should start searching for keys just after the keyword appears.
  - `keyword`: the string to search for
  - `start_search`: an index from which to start the keyword search (can be negative, which means to search from the end)

Examples:
- `SET` has start_search of type `index` with value `1`
- `XREAD` has start_search of type `keyword` with value `[“STREAMS”,1]`
- `MIGRATE` has start_search of type `keyword` with value `[“KEYS”,-2]`

### find_keys step specs
- `range`: specify `[count, step, limit]`.
  - `lastkey`: index of the last key. relative to the index returned from begin_search. -1 indicating till the last argument, -2 one before the last
  - `step`: how many args should we skip after finding a key, in order to find the next one
  - `limit`: if count is -1, we use limit to stop the search by a factor. 0 and 1 mean no limit. 2 means ½ of the remaining args, 3 means ⅓, and so on.
- “keynum”: specify `[keynum_index, first_key_index, step]`.
  - `keynum_index`: is relative to the return of the `start_search` spec.
  - `first_key_index`: is relative to `keynum_index`.
  - `step`: how many args should we skip after finding a key, in order to find the next one

Examples:
- `SET` has `range` of `[0,1,0]`
- `MSET` has `range` of `[-1,2,0]`
- `XREAD` has `range` of `[-1,1,2]`
- `ZUNION` has `start_search` of type `index` with value `1` and `find_keys` of type `keynum` with value `[0,1,1]`
- `AI.DAGRUN` has `start_search` of type `keyword` with value `[“LOAD“,1]` and `find_keys` of type `keynum` with value
  `[0,1,1]` (see https://oss.redislabs.com/redisai/master/commands/#aidagrun)

Note: this solution is not perfect as the module writers can come up with anything, but at least we will be able to find the key
args of the vast majority of commands.
If one of the above specs can’t describe the key positions, the module writer can always fall back to the `getkeys-api` option.

Some keys cannot be found easily (`KEYS` in `MIGRATE`: Imagine the argument for `AUTH` is the string “KEYS” - we will
start searching in the wrong index). 
The guarantee is that the specs may be incomplete (`incomplete` will be specified in the spec to denote that) but we never
report false information (assuming the command syntax is correct).
For `MIGRATE` we start searching from the end - `startfrom=-1` - and if one of the keys is actually called "keys" we will
report only a subset of all keys - hence the `incomplete` flag.
Some `incomplete` specs can be completely empty (i.e. UNKNOWN begin_search) which should tell the client that
COMMAND GETKEYS (or any other way to get the keys) must be used (Example: For `SORT` there is no way to describe
the STORE keyword spec, as the word "store" can appear anywhere in the command).

We will expose these key specs in the `COMMAND` command so that clients can learn, on startup, where the keys are for
all commands instead of holding hardcoded tables or use `COMMAND GETKEYS` in runtime.

Comments:
1. Redis doesn't internally use the new specs, they are only used for COMMAND output.
2. In order to support the current COMMAND INFO format (reply array indices 4, 5, 6) we created a synthetic range, called
   legacy_range, that, if possible, is built according to the new specs.
3. Redis currently uses only getkeys_proc or the legacy_range to get the keys indices (in COMMAND GETKEYS for
   example).

"incomplete" specs:
the command we have issues with are MIGRATE, STRALGO, and SORT
for MIGRATE, because the token KEYS, if exists, must be the last token, we can search in reverse. it one of the keys is
actually the string "keys" will return just a subset of the keys (hence, it's "incomplete")
for SORT and STRALGO we can use this heuristic (the keys can be anywhere in the command) and therefore we added a
key spec that is both "incomplete" and of "unknown type"

if a client encounters an "incomplete" spec it means that it must find a different way (either COMMAND GETKEYS or have
its own parser) to retrieve the keys.
please note that all commands, apart from the three mentioned above, have "complete" key specs
2021-09-15 11:10:29 +03:00
filipe oliveira
b5a879e1c2
Added URI support to redis-benchmark (cli and benchmark share the same uri-parsing methods) (#9314)
- Add `-u <uri>` command line option to support `redis://` URI scheme.
- included server connection information object (`struct cliConnInfo`),
  used to describe an ip:port pair, db num user input, and user:pass to
  avoid a large number of function arguments.
- Using sds on connection info strings for redis-benchmark/redis-cli

Co-authored-by: yoav-steinberg <yoav@monfort.co.il>
2021-09-14 19:45:06 +03:00
Viktor Söderqvist
ea36d4de17
Modules: Add remaining list API functions (#8439)
List functions operating on elements by index:

* RM_ListGet
* RM_ListSet
* RM_ListInsert
* RM_ListDelete

Iteration is done using a simple for loop over indices.
The index based functions use an internal iterator as an optimization.
This is explained in the docs:

```
 * Many of the list functions access elements by index. Since a list is in
 * essence a doubly-linked list, accessing elements by index is generally an
 * O(N) operation. However, if elements are accessed sequentially or with
 * indices close together, the functions are optimized to seek the index from
 * the previous index, rather than seeking from the ends of the list.
 *
 * This enables iteration to be done efficiently using a simple for loop:
 *
 *     long n = RM_ValueLength(key);
 *     for (long i = 0; i < n; i++) {
 *         RedisModuleString *elem = RedisModule_ListGet(key, i);
 *         // Do stuff...
 *     }
```
2021-09-14 17:48:06 +03:00
zhaozhao.zz
794442b130
PSYNC2: make partial sync possible after master reboot (#8015)
The main idea is how to allow a master to load replication info from RDB file when rebooting, if master can load replication info it means that replicas may have the chance to psync with master, it can save much traffic.

The key point is we need guarantee safety and consistency, so there
are two differences between master and replica:

1. master would load the replication info as secondary ID and
   offset, in case other masters have the same replid.
2. when master loading RDB, it would propagate expired keys as DEL
   command to replication backlog, then replica can receive these
   commands to delete stale keys.
   p.s. the expired keys when RDB loading is useful for users, so
   we show it as `rdb_last_load_keys_expired` and `rdb_last_load_keys_loaded` in info persistence.

Moreover, after load replication info, master should update
`no_replica_time` in case loading RDB cost too long time.
2021-09-13 15:39:11 +08:00
Huang Zhw
75dd230994
bitpos/bitcount add bit index (#9324)
Make bitpos/bitcount support bit index:

```
BITPOS key bit [start [end [BIT|BYTE]]]
BITCOUNT key [start end [BIT|BYTE]]
```

The default behavior is `BYTE`, so these commands are still compatible with old.
2021-09-12 11:31:22 +03:00
David CARLIER
418c2e7931
TLS build fix on OpenBSD when built with LibreSSL. (#9486) 2021-09-11 22:54:09 +03:00
zhaozhao.zz
d7fa44f4da
init client pause value in more appropriate place (#9479) 2021-09-10 14:02:45 +08:00
sundb
3ca6972ecd
Replace all usage of ziplist with listpack for t_zset (#9366)
Part two of implementing #8702 (zset), after #8887.

## Description of the feature
Replaced all uses of ziplist with listpack in t_zset, and optimized some of the code to optimize performance.

## Rdb format changes
New `RDB_TYPE_ZSET_LISTPACK` rdb type.

## Rdb loading improvements:
1) Pre-expansion of dict for validation of duplicate data for listpack and ziplist.
2) Simplifying the release of empty key objects when RDB loading.
3) Unify ziplist and listpack data verify methods for zset and hash, and move code to rdb.c.

## Interface changes
1) New `zset-max-listpack-entries` config is an alias for `zset-max-ziplist-entries` (same with `zset-max-listpack-value`).
2) OBJECT ENCODING will return listpack instead of ziplist.

## Listpack improvements:
1) Add `lpDeleteRange` and `lpDeleteRangeWithEntry` functions to delete a range of entries from listpack.
2) Improve the performance of `lpCompare`, converting from string to integer is faster than converting from integer to string.
3) Replace `snprintf` with `ll2string` to improve performance in converting numbers to strings in `lpGet()`.

## Zset improvements:
1) Improve the performance of `zzlFind` method, use `lpFind` instead of `lpCompare` in a loop.
2) Use `lpDeleteRangeWithEntry` instead of `lpDelete` twice to delete a element of zset.

## Tests
1) Add some unittests for `lpDeleteRange` and `lpDeleteRangeWithEntry` function.
2) Add zset RDB loading test.
3) Add benchmark test for `lpCompare` and `ziplsitCompare`.
4) Add empty listpack zset corrupt dump test.
2021-09-09 18:18:53 +03:00
Madelyn Olson
86b0de5c41
Remove redundant validation and prevent duplicate users during ACL load (#9330)
Throw an error when a user is provided multiple times on the command line instead of silently throwing one of them away.
Remove unneeded validation for validating users on ACL load.
2021-09-09 07:40:33 -07:00
yancz2000
47c001dde6
Add make test-cluster option (#9478)
Add make test-cluster option
2021-09-09 06:52:21 -07:00
yvette903
f560531d5b
Fix: client pause uses an old timeout (#9477)
A write request may be paused unexpectedly because `server.client_pause_end_time` is old.

**Recreate this:**
redis-cli -p 6379
127.0.0.1:6379> client pause 500000000 write
OK
127.0.0.1:6379> client unpause
OK
127.0.0.1:6379> client pause 10000 write
OK
127.0.0.1:6379> set key value

The write request `set key value` is paused util  the timeout of 500000000 milliseconds was reached.

**Fix:**
reset `server.client_pause_end_time` = 0 in `unpauseClients`
2021-09-09 13:44:48 +03:00
Binbin
c50af0aeba
Add LMPOP/BLMPOP commands. (#9373)
We want to add COUNT option for BLPOP.
But we can't do it without breaking compatibility due to the command arguments syntax.
So this commit introduce two new commands.

Syntax for the new LMPOP command:
`LMPOP numkeys [<key> ...] LEFT|RIGHT [COUNT count]`

Syntax for the new BLMPOP command:
`BLMPOP timeout numkeys [<key> ...] LEFT|RIGHT [COUNT count]`

Some background:
- LPOP takes one key, and can return multiple elements.
- BLPOP takes multiple keys, but returns one element from just one key.
- LMPOP can take multiple keys and return multiple elements from just one key.

Note that LMPOP/BLMPOP  can take multiple keys, it eventually operates on just one key.
And it will propagate as LPOP or RPOP with the COUNT option.

As a new command, it still return NIL if we can't pop any elements.
For the normal response is nested arrays in RESP2 and RESP3, like:
```
LMPOP/BLMPOP 
1) keyname
2) 1) element1
   2) element2
```
I.e. unlike BLPOP that returns a key name and one element so it uses a flat array,
and LPOP that returns multiple elements with no key name, and again uses a flat array,
this one has to return a nested array, and it does for for both RESP2 and RESP3 (like SCAN does)

Some discuss can see: #766 #8824
2021-09-09 12:02:33 +03:00
Huang Zhw
216f168b2b
Add INFO total_active_defrag_time and current_active_defrag_time (#9377)
Add two INFO metrics:
```
total_active_defrag_time:12345
current_active_defrag_time:456
```
`current_active_defrag_time` if greater than 0, means how much time has
passed since active defrag started running. If active defrag stops, this metric is reset to 0.
`total_active_defrag_time` means total time the fragmentation
was over the defrag threshold since the server started.

This is a followup PR for #9031
2021-09-09 11:38:10 +03:00
Wang Yuan
cee3d67f50
Delay to discard cached master when full synchronization (#9398)
* Delay to discard cache master when full synchronization
* Don't disconnect with replicas before loading transferred RDB when full sync

Previously, once replica need to start full synchronization with master,
it will discard cached master whatever full synchronization is failed or
not. 
Now we discard cached master only when transferring RDB is finished
and start to change data space, this make replica could start partial
resynchronization with another new master if new master is failed
during full synchronization.
2021-09-09 11:32:29 +03:00
chenyang8094
bc0c22fabc
Fix callReplyParseCollection memleak when use AutoMemory (#9446)
When parsing an array type reply, ctx will be lost when recursively parsing its
elements, which will cause a memory leak in automemory mode.

This is a result of the changes in #9202

Add test for callReplyParseCollection fix
2021-09-09 11:03:05 +03:00
chenyang8094
7a0e668560
Add stdlib.h for RedisModule_Assert (#9470) 2021-09-08 23:22:01 +03:00
zhaozhao.zz
1b83353dc3
Fix wrong offset when replica pause (#9448)
When a replica paused, it would not apply any commands event the command comes from master, if we feed the non-applied command to replication stream, the replication offset would be wrong, and data would be lost after failover(since replica's `master_repl_offset` grows but command is not applied).

To fix it, here are the changes:
* Don't update replica's replication offset or propagate commands to sub-replicas when it's paused in `commandProcessed`.
* Show `slave_read_repl_offset` in info reply.
* Add an assert to make sure master client should never be blocked unless pause or module (some modules may use block way to do background (parallel) processing and forward original block module command to the replica, it's not a good way but it can work, so the assert excludes module now, but someday in future all modules should rewrite block command to propagate like what `BLPOP` does).
2021-09-08 16:07:25 +08:00
Viktor Söderqvist
547c3405d4
Optimize quicklistIndex to seek from the nearest end (#9454)
Until now, giving a negative index seeks from the end of a list and a
positive seeks from the beginning. This change makes it seek from
the nearest end, regardless of the sign of the given index.

quicklistIndex is used by all list commands which operate by index.

LINDEX key 999999 in a list if 1M elements is greately optimized by
this change. Latency is cut by 75%.

LINDEX key -1000000 in a list of 1M elements, likewise.

LRANGE key -1 -1 is affected by this, since LRANGE converts the
indices to positive numbers before seeking.

The tests for corrupt dumps are updated to make sure the corrup
data is seeked in the same direction as before.
2021-09-06 09:12:38 +03:00
guybe7
6aa2285e32
Fix two minor bugs (MIGRATE key args and getKeysUsingCommandTable) (#9455)
1. MIGRATE has a potnetial key arg in argv[3]. It should be reflected in the command table.
2. getKeysUsingCommandTable should never free getKeysResult, it is always freed by the caller)
   The reason we never encountered this double-free bug is that almost always getKeysResult
   uses the statis buffer and doesn't allocate a new one.
2021-09-02 17:19:27 +03:00
sundb
306a5ccd2d
Fix the timing of read and write events under kqueue (#9416)
Normally we execute the read event first and then the write event.
When the barrier is set, we will do it reverse.
However, under `kqueue`, if an `fd` has both read and write events,
reading the event using `kevent` will generate two events, which will
result in uncontrolled read and write timing.

This also means that the guarantees of AOF `appendfsync` = `always` are
not met on MacOS without this fix.

The main change to this pr is to cache the events already obtained when reading
them, so that if the same `fd` occurs again, only the mask in the cache is updated,
rather than a new event is generated.

This was exposed by the following test failure on MacOS:
```
*** [err]: AOF fsync always barrier issue in tests/integration/aof.tcl
Expected 544 != 544 (context: type eval line 26 cmd {assert {$size1 != $size2}} proc ::test)
```
2021-09-02 11:07:51 +03:00
Viktor Söderqvist
f24c63a292
Slot-to-keys using dict entry metadata (#9356)
* Enhance dict to support arbitrary metadata carried in dictEntry

Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>

* Rewrite slot-to-keys mapping to linked lists using dict entry metadata

This is a memory enhancement for Redis Cluster.

The radix tree slots_to_keys (which duplicates all key names prefixed with their
slot number) is replaced with a linked list for each slot. The dict entries of
the same cluster slot form a linked list and the pointers are stored as metadata
in each dict entry of the main DB dict.

This commit also moves the slot-to-key API from db.c to cluster.c.

Co-authored-by: Jim Brunner <brunnerj@amazon.com>
2021-08-30 23:25:36 -07:00
Wang Yuan
9a0c0617f1
Use sync_file_range to optimize fsync if possible (#9409)
We implement incremental data sync in rio.c by call fsync, on slow disk, that may cost a lot of time,
sync_file_range could provide async fsync, so we could serialize key/value and sync file data at the same time.

> one tip for sync_file_range usage: http://lkml.iu.edu/hypermail/linux/kernel/1005.2/01845.html

Additionally, this change avoids a single large write to be used, which can result in a mass of dirty
pages in the kernel (increasing the risk of someone else's write to block).

On HDD, current solution could reduce approximate half of dumping RDB time,
this PR costs 50s for dump 7.7G rdb but unstable branch costs 93s.
On NVME SSD, this PR can't reduce much time,  this PR costs 40s, unstable branch costs 48s.

Moreover, I find calling data sync every 4MB is better than 32MB.
2021-08-30 10:24:53 +03:00
Binbin
aefbc23451
Better error handling for updateClientOutputBufferLimit. (#9308)
This one follow #9313 and goes deeper (validation of config file parsing)

Move the check/update logic to a new updateClientOutputBufferLimit
function. So that it can be used in CONFIG SET and config file parsing.
2021-08-29 15:03:05 +03:00
Viktor Söderqvist
97dcf95cc8
redis-benchmark: improved help and warnings (#9419)
1. The output of --help:

  * On the Usage line, just write [OPTIONS] [COMMAND ARGS...] instead listing
    only a few arbitrary options and no command.
  * For --cluster, describe that if the command is supplied on the command line,
    the key must contain "{tag}". Otherwise, the command will not be sent to the
    right cluster node.
  * For -r, add a note that if -r is omitted, all commands in a benchmark will
    use the same key. Also align the description.
  * For -t, describe that -t is ignored if a command is supplied on the command
    line.

2. Print a warning if -t is present when a specific command is supplied.

3. Print all warnings and errors to stderr.

4. Remove -e from calls in redis-benchmark test suite.
2021-08-29 14:31:08 +03:00
Huang Zhw
b375f5919e
redis-benchmark: make show throughput in only one thread. (#9146)
In multipe threads mode, every thread output throughput info. This
may cause some problems:
- Bug in https://github.com/redis/redis/pull/8615;
- The show throughput is called too frequently;
- showThroughput which updates shared variable lacks synchronization
mechanism.

This commit also reverts changes in #8615 and changes time event
interval to macro.
2021-08-25 14:58:35 +03:00
Garen Chan
945a83d406
Fix boundary problem of adjusting open files limit. (#5722)
When `decr_step` is greater than `oldlimit`, the final `bestlimit` may be invalid.

    For example, oldlimit = 10, decr_step = 16.
    Current bestlimit = 15 and setrlimit() failed. Since bestlimit  is less than decr_step , then exit the loop.
    The final bestlimit is larger than oldlimit but is invalid.

Note that this only matters if the system fd limit is below 16, so unlikely to have any actual effect.
2021-08-24 22:54:21 +03:00
Wen Hui
641780a9c6
config memory limits: handle values larger than (signed) LLONG_MAX (#9313)
This aims to solve the issue in CONFIG SET maxmemory can only set maxmemory to up
to 9223372036854775807 (2^63) while the maxmemory should be ULLONG.
Added a memtoull function to convert a string representing an amount of memory
into the number of bytes (similar to memtoll but for ull). Also added ull2string to
convert a ULLong to string (Similar to ll2string).
2021-08-23 21:00:40 +03:00
Viktor Söderqvist
74590f8345
redis-cli: Assert > 0 before dividing, to silence warning by tool (#9396)
Also make sure function can't return NULL by another assert.
2021-08-22 13:57:18 +03:00
Binbin
0835f596b8
BITSET and BITFIELD SET only propagate command when the value changed. (#9403)
In old way, we always increase server.dirty in BITSET and BITFIELD SET.
Even the command doesn't really change anything. This commit make 
sure BITSET and BITFIELD SET only increase dirty when the value changed.

Because of that, if the value not changed, some others implications:
- Avoid adding useless AOF
- Reduce replication traffic
- Will not trigger keyspace notifications (setbit)
- Will not invalidate WATCH
- Will not sent the invalidation message to the tracking client
2021-08-22 10:20:53 +03:00
Viktor Söderqvist
8f59c1ecae
Let CONFIG GET * show both replicaof and its alias (#9395) 2021-08-21 19:43:18 -07:00
sundb
492d8d0961
Sanitize dump payload: fix double free after insert dup nodekey to stream rax and returns 0 (#9399) 2021-08-20 10:37:45 +03:00
yoav-steinberg
0e8d469f82
More generic crash report for unsupported archs (#9385)
Following compilation warnings on s390x.
2021-08-18 15:46:11 +03:00
Yossi Gottlieb
fe359cbfc2
Fix: don't assume char is unsigned. (#9375)
On systems that have unsigned char by default (s390x, arm), redis-server
could crash as soon as it populates the command table.
2021-08-15 21:37:44 +03:00
Wang Yuan
8edc3cd62c
Fix the wrong detection of sync_file_range system call (#9371)
If we want to check `defined(SYNC_FILE_RANGE_WAIT_BEFORE)`, we should include fcntl.h.
otherwise, SYNC_FILE_RANGE_WAIT_BEFORE is not defined, and there is alway not `sync_file_range` system call.
Introduced by #8532
2021-08-14 23:52:44 +03:00
Madelyn Olson
0cf2df84d4
Added additional validation for cluster SETSLOT (#9360) 2021-08-12 14:59:17 -07:00
Madelyn Olson
2402f5a7a1
Update cluster debug log to include human readable packet type (#9361) 2021-08-12 14:50:09 -07:00
Yossi Gottlieb
1221f7cd5e
Improve setup operations order after fork. (#9365)
The order of setting things up follows some reasoning: Setup signal
handlers first because a signal could fire at any time. Adjust OOM score
before everything else to assist the OOM killer if memory resources are
low.

The trigger for this is a valgrind test failure which resulted with the
child catching a SIGUSR1 before initializing the handler.
2021-08-12 14:31:12 +03:00
sundb
5705cec68e
Fix missing dismiss hash listpack memory due to ziplist->listpack migration (#9353) 2021-08-10 16:54:19 +03:00
Huang Zhw
82528d2678
Redis-cli monitor and pubsub can be aborted with Ctrl+C, keeping the cli alive (#9347)
Abort cli blocking modes with SIGINT without exiting the cli.

Co-authored-by: charsyam <charsyam@gmail.com>
2021-08-10 15:03:49 +03:00
DarrenJiang13
8ab33c18e4
fix a compilation error around madvise when make with jemalloc on MacOS (#9350)
We only use MADV_DONTNEED on Linux, that's were it was tested.
2021-08-10 11:32:27 +03:00
Meir Shpilraien (Spielrein)
8f8117f78e
Format fixes and naming. SentReplyOnKeyMiss -> addReplyOrErrorObject (#9346)
Following the comments on #8659, this PR fix some formatting
and naming issues.
2021-08-10 10:19:21 +03:00
sundb
02fd76b97c
Replace all usage of ziplist with listpack for t_hash (#8887)
Part one of implementing #8702 (taking hashes first before other types)

## Description of the feature
1. Change ziplist encoded hash objects to listpack encoding.
2. Convert existing ziplists on RDB loading time. an O(n) operation.

## Rdb format changes
1. Add RDB_TYPE_HASH_LISTPACK rdb type.
2. Bump RDB_VERSION to 10

## Interface changes
1. New `hash-max-listpack-entries` config is an alias for `hash-max-ziplist-entries` (same with `hash-max-listpack-value`)
2. OBJECT ENCODING will return `listpack` instead of `ziplist`

## Listpack improvements:
1. Support direct insert, replace integer element (rather than convert back and forth from string)
3. Add more listpack capabilities to match the ziplist ones (like `lpFind`, `lpRandomPairs` and such)
4. Optimize element length fetching, avoid multiple calculations
5. Use inline to avoid function call overhead.

## Tests
1. Add a new test to the RDB load time conversion
2. Adding the listpack unit tests. (based on the one in ziplist.c)
3. Add a few "corrupt payload: fuzzer findings" tests, and slightly modify existing ones.

Co-authored-by: Oran Agra <oran@redislabs.com>
2021-08-10 09:18:49 +03:00
sundb
cbda492909
Sanitize dump payload: handle remaining empty key when RDB loading and restore command (#9349)
This commit mainly fixes empty keys due to RDB loading and restore command,
which was omitted in #9297.

1) When loading quicklsit, if all the ziplists in the quicklist are empty, NULL will be returned.
    If only some of the ziplists are empty, then we will skip the empty ziplists silently.
2) When loading hash zipmap, if zipmap is empty, sanitization check will fail.
3) When loading hash ziplist, if ziplist is empty, NULL will be returned.
4) Add RDB loading test with sanitize.
2021-08-09 17:13:46 +03:00
Qu Chen
0b643e930d
Cleanup: createAOFClient uses createClient to avoid overlooked mismatches (#9338)
AOF fake client creation (createAOFClient) was doing similar work as createClient,
with some minor differences, most of which unintended, this was dangerous and
meant that many changes to createClient should have always been reflected to aof.c

This cleanup changes createAOFClient to call createClient with NULL, like we
do in module.c and elsewhere.
2021-08-09 11:03:59 +03:00
Eduardo Semprebon
d3356bf614
Add SORT_RO command (#9299)
Add a readonly variant of the STORE command, so it can be used on
read-only workloads (replica, ACL, etc)
2021-08-09 09:40:29 +03:00
Qu Chen
e8eeba7bee
Allow master to replicate command longer than replica's query buffer limit (#9340)
Replication client no longer checks incoming command length against the client-query-buffer-limit. This makes the master able to replicate commands longer than replica's configured client-query-buffer-limit
2021-08-08 17:34:11 -07:00
Yossi Gottlieb
3307958bd0
Propagate OPENSSL_PREFIX to hiredis. (#9345) 2021-08-08 18:30:17 +03:00
Binbin
e1dc979054
sds.c: Fix potential overflow in sdsll2str. (#8910)
Fixes an undefined behavior, same way as our `ll2string` does.
2021-08-08 14:30:47 +03:00
Binbin
563ba7a3f0
Fix the wrong method used in quicklistTest. (#8951)
The test try to test `insert before 1 element`, but it use quicklist
InsertAfter, a copy-paste typo.

The commit also add an assert to verify results in some tests
to make sure it is as expected.
2021-08-08 09:03:52 +03:00
DarrenJiang13
43eb0ce3bf
[BUGFIX] Add some missed error statistics (#9328)
add error counting for some missed behaviors.
2021-08-06 19:27:24 -07:00
yoav-steinberg
0a9377535b
Ignore resize threshold on idle qbuf resizing (#9322)
Also update qbuf tests to verify both idle and peak based resizing logic.
And delete unused function: getClientsMaxBuffers
2021-08-06 20:50:34 +03:00
Oran Agra
0c90370e6d
Improvements to corrupt payload sanitization (#9321)
Recently we found two issues in the fuzzer tester: #9302 #9285
After fixing them, more problems surfaced and this PR (as well as #9297) aims to fix them.

Here's a list of the fixes
- Prevent an overflow when allocating a dict hashtable
- Prevent OOM when attempting to allocate a huge string
- Prevent a few invalid accesses in listpack
- Improve sanitization of listpack first entry
- Validate integrity of stream consumer groups PEL
- Validate integrity of stream listpack entry IDs
- Validate ziplist tail followed by extra data which start with 0xff

Co-authored-by: sundb <sundbcn@gmail.com>
2021-08-05 22:56:14 +03:00
sundb
8ea777a6a0
Sanitize dump payload: fix empty keys when RDB loading and restore command (#9297)
When we load rdb or restore command, if we encounter a length of 0, it will result in the creation of an empty key.
This could either be a corrupt payload, or a result of a bug (see #8453 )

This PR mainly fixes the following:
1) When restore command will return `Bad data format` error.
2) When loading RDB, we will silently discard the key.

Co-authored-by: Oran Agra <oran@redislabs.com>
2021-08-05 22:42:20 +03:00
Madelyn Olson
39a4a44d7d
Add debug config flag to print certain config values on engine crash (#9304)
Add debug config flag to print certain config values on engine crash
2021-08-05 11:59:12 -07:00
Wen Hui
63e2a6d212
Add sentinel debug option command (#9291)
This makes it possible to tune many parameters that were previously hard coded.
We don't intend these to be user configurable, but only used by tests to accelerate certain conditions which would otherwise take a long time and slow down the test suite.

Co-authored-by: Lucas Guang Yang <l84193800@china.huawei.com>
2021-08-05 11:12:55 +03:00
menwen
ca559819f7
Add latency monitor sample when key is deleted via lazy expire (#9317)
Fix that there is no sample latency after the key expires via expireIfNeeded().
Some refactoring for shared code.
2021-08-05 11:09:24 +03:00
yoav-steinberg
d32f8641ed
fix dict access broken by #9228 (#9319) 2021-08-05 09:02:30 +03:00
yoav-steinberg
5e908a290c
dict struct memory optimizations (#9228)
Reduce dict struct memory overhead
on 64bit dict size goes down from jemalloc's 96 byte bin to its 56 byte bin.

summary of changes:
- Remove `privdata` from callbacks and dict creation. (this affects many files, see "Interface change" below).
- Meld `dictht` struct into the `dict` struct to eliminate struct padding. (this affects just dict.c and defrag.c)
- Eliminate the `sizemask` field, can be calculated from size when needed.
- Convert the `size` field into `size_exp` (exponent), utilizes one byte instead of 8.

Interface change: pass dict pointer to dict type call back functions.
This is instead of passing the removed privdata field. In the future if
we'd like to have private data in the callbacks we can extract it from
the dict type. We can extend dictType to include a custom dict struct
allocator and use it to allocate more data at the end of the dict
struct. This data can then be used to store private data later acccessed
by the callbacks.
2021-08-05 08:25:58 +03:00
Wang Yuan
d4bca53cd9
Use madvise(MADV_DONTNEED) to release memory to reduce COW (#8974)
## Backgroud
As we know, after `fork`, one process will copy pages when writing data to these
pages(CoW), and another process still keep old pages, they totally cost more memory.
For redis, we suffered that redis consumed much memory when the fork child is serializing
key/values, even that maybe cause OOM.

But actually we find, in redis fork child process, the child process don't need to keep some
memory and parent process may write or update that, for example, child process will never
access the key-value that is serialized but users may update it in parent process.
So we think it may reduce COW if the child process release memory that it is not needed.

## Implementation
For releasing key value in child process, we may think we call `decrRefCount` to free memory,
but i find the fork child process still use much memory when we don't write any data to redis,
and it costs much more time that slows down bgsave. Maybe because memory allocator doesn't
really release memory to OS, and it may modify some inner data for this free operation, especially
when we free small objects.

Moreover, CoW is based on  pages, so it is a easy way that we only free the memory bulk that is
not less than kernel page size. madvise(MADV_DONTNEED) can quickly release specified region
pages to OS bypassing memory allocator, and allocator still consider that this memory still is used
and don't change its inner data.

There are some buffers we can release in the fork child process:
- **Serialized key-values**
  the fork child process never access serialized key-values, so we try to free them.
  Because we only can release big bulk memory, and it is time consumed to iterate all
  items/members/fields/entries of complex data type. So we decide to iterate them and
  try to release them only when their average size of item/member/field/entry is more
  than page size of OS.
- **Replication backlog**
  Because replication backlog is a cycle buffer, it will be changed quickly if redis has heavy
  write traffic, but in fork child process, we don't need to access that.
- **Client buffers**
  If clients have requests during having the fork child process, clients' buffer also be changed
  frequently. The memory includes client query buffer, output buffer, and client struct used memory.

To get child process peak private dirty memory, we need to count peak memory instead
of last used memory, because the child process may continue to release memory (since
COW used to only grow till now, the last was equivalent to the peak).
Also we're adding a new `current_cow_peak` info variable (to complement the existing
`current_cow_size`)

Co-authored-by: Oran Agra <oran@redislabs.com>
2021-08-04 23:01:46 +03:00
Meir Shpilraien (Spielrein)
2237131e15
Unified Lua and modules reply parsing and added RESP3 support to RM_Call (#9202)
## Current state
1. Lua has its own parser that handles parsing `reds.call` replies and translates them
  to Lua objects that can be used by the user Lua code. The parser partially handles
  resp3 (missing big number, verbatim, attribute, ...)
2. Modules have their own parser that handles parsing `RM_Call` replies and translates
  them to RedisModuleCallReply objects. The parser does not support resp3.

In addition, in the future, we want to add Redis Function (#8693) that will probably
support more languages. At some point maintaining so many parsers will stop
scaling (bug fixes and protocol changes will need to be applied on all of them).
We will probably end up with different parsers that support different parts of the
resp protocol (like we already have today with Lua and modules)

## PR Changes
This PR attempt to unified the reply parsing of Lua and modules (and in the future
Redis Function) by introducing a new parser unit (`resp_parser.c`). The new parser
handles parsing the reply and calls different callbacks to allow the users (another
unit that uses the parser, i.e, Lua, modules, or Redis Function) to analyze the reply.

### Lua API Additions
The code that handles reply parsing on `scripting.c` was removed. Instead, it uses
the resp_parser to parse and create a Lua object out of the reply. As mentioned
above the Lua parser did not handle parsing big numbers, verbatim, and attribute.
The new parser can handle those and so Lua also gets it for free.
Those are translated to Lua objects in the following way:
1. Big Number - Lua table `{'big_number':'<str representation for big number>'}`
2. Verbatim - Lua table `{'verbatim_string':{'format':'<verbatim format>', 'string':'<verbatim string value>'}}`
3. Attribute - currently ignored and not expose to the Lua parser, another issue will be open to decide how to expose it.

Tests were added to check resp3 reply parsing on Lua

### Modules API Additions
The reply parsing code on `module.c` was also removed and the new resp_parser is used instead.
In addition, the RedisModuleCallReply was also extracted to a separate unit located on `call_reply.c`
(in the future, this unit will also be used by Redis Function). A nice side effect of unified parsing is
that modules now also support resp3. Resp3 can be enabled by giving `3` as a parameter to the
fmt argument of `RM_Call`. It is also possible to give `0`, which will indicate an auto mode. i.e, Redis
will automatically chose the reply protocol base on the current client set on the RedisModuleCtx
(this mode will mostly be used when the module want to pass the reply to the client as is).
In addition, the following RedisModuleAPI were added to allow analyzing resp3 replies:

* New RedisModuleCallReply types:
   * `REDISMODULE_REPLY_MAP`
   * `REDISMODULE_REPLY_SET`
   * `REDISMODULE_REPLY_BOOL`
   * `REDISMODULE_REPLY_DOUBLE`
   * `REDISMODULE_REPLY_BIG_NUMBER`
   * `REDISMODULE_REPLY_VERBATIM_STRING`
   * `REDISMODULE_REPLY_ATTRIBUTE`

* New RedisModuleAPI:
   * `RedisModule_CallReplyDouble` - getting double value from resp3 double reply
   * `RedisModule_CallReplyBool` - getting boolean value from resp3 boolean reply
   * `RedisModule_CallReplyBigNumber` - getting big number value from resp3 big number reply
   * `RedisModule_CallReplyVerbatim` - getting format and value from resp3 verbatim reply
   * `RedisModule_CallReplySetElement` - getting element from resp3 set reply
   * `RedisModule_CallReplyMapElement` - getting key and value from resp3 map reply
   * `RedisModule_CallReplyAttribute` - getting a reply attribute
   * `RedisModule_CallReplyAttributeElement` - getting key and value from resp3 attribute reply
   
* New context flags:
   * `REDISMODULE_CTX_FLAGS_RESP3` - indicate that the client is using resp3

Tests were added to check the new RedisModuleAPI

### Modules API Changes
* RM_ReplyWithCallReply might return REDISMODULE_ERR if the given CallReply is in resp3
  but the client expects resp2. This is not a breaking change because in order to get a resp3
  CallReply one needs to specifically specify `3` as a parameter to the fmt argument of
  `RM_Call` (as mentioned above).

Tests were added to check this change

### More small Additions
* Added `debug set-disable-deny-scripts` that allows to turn on and off the commands no-script
flag protection. This is used by the Lua resp3 tests so it will be possible to run `debug protocol`
and check the resp3 parsing code.

Co-authored-by: Oran Agra <oran@redislabs.com>
Co-authored-by: Yossi Gottlieb <yossigo@gmail.com>
2021-08-04 16:28:07 +03:00
sundb
b4eda14257
Fix head and tail check with negative offset in _quicklistInsert (#9311)
Some background:
This fixes a problem that used to be dead code till now,
but became alive (only in the unit tests, not in redis) when #9113 got merged.
The problem it fixes doesn't actually cause any significant harm,
but that PR also added a test that fails verification because of that.
This test was merged with that problem due to human error, we didn't run it
on the last modified version before merging.
The fix in this PR existed in #8641 (closed because it's just dead code)
and #4674 (still pending but has other changes in it).

Now to the actual fix:
On quicklist insertion, if the insertion offset is -1 or `-(quicklist->count)`,
we can insert into the head of the next node rather than the tail of the
current node. this is especially important when the current node is full,
and adding anything to it will cause it to be split (or be over it's fill limit setting).

The bug was that the code attempted to determine that we're adding to
the tail of the current node by matching `offset == node->count` when in
fact it should have been `offset == node->count-1` (so it never entered that `if`).
and also that since we take negative offsets too, we can also match `-1`.
same applies for the head, i.e. `0` and `-count`.

The bug will cause the code to attempt inserting into the current node (thinking
we have to insert into the middle of the node rather than head or tail), and
in case the current node is full it'll have to be split (something that also
happens in valid cases).
On top of that, since it calls _quicklistSplitNode with an edge case, it'll actually
split the node in a way that all the entries fall into one split, and 0 into the other,
and then still insert the new entry into the first one, causing it to be populated
beyond it's intended fill limit.

This problem does not create any bug in redis, because the existing code does
not iterate from tail to head, and the offset never has a negative value when insert.

The other change this PR makes in the test code is just for some coverage,
insertion at index 0 is tested a lot, so it's nice to test some negative offsets too.
2021-08-04 11:19:48 +03:00
filipe oliveira
4aa927d16d
Enabled -x option (Read last argument from STDIN) on redis-benchmark (#9130)
Add the -x option (Read last argument from STDIN) on redis-benchmark.

Other changes:
To be able to use the code from redis-cli some helper methods were moved to cli_common.(h|c)

Co-authored-by: Oran Agra <oran@redislabs.com>
2021-08-03 12:21:29 +03:00
Jonah H. Harris
432c92d8df
Add SINTERCARD/ZINTERCARD Commands (#8946)
Add SINTERCARD and ZINTERCARD commands that are similar to
ZINTER and SINTER but only return the cardinality with minimum
processing and memory overheads.

Co-authored-by: Oran Agra <oran@redislabs.com>
2021-08-03 11:45:27 +03:00
Ariel Shtul
bdbf5eedae
Module api support for RESP3 (#8521)
Add new Module APS for RESP3 responses:
- RM_ReplyWithMap
- RM_ReplyWithSet
- RM_ReplyWithAttribute
- RM_ReplySetMapLength
- RM_ReplySetSetLength
- RM_ReplySetAttributeLength
- RM_ReplyWithBool

Deprecate REDISMODULE_POSTPONED_ARRAY_LEN in favor of a generic REDISMODULE_POSTPONED_LEN

Improve documentation
Add tests

Co-authored-by: Guy Benoish <guy.benoish@redislabs.com>
Co-authored-by: Oran Agra <oran@redislabs.com>
2021-08-03 11:37:19 +03:00
Huang Zhw
cf61ad14cc
When redis-cli received ASK, it didn't handle it (#8930)
When redis-cli received ASK, it used string matching wrong and didn't
handle it. 

When we access a slot which is in migrating state, it maybe
return ASK. After redirect to the new node, we need send ASKING
command before retry the command.  In this PR after redis-cli receives 
ASK, we send a ASKING command before send the origin command 
after reconnecting.

Other changes:
* Make redis-cli -u and -c (unix socket and cluster mode) incompatible 
  with one another.
* When send command fails, we avoid the 2nd reconnect retry and just
  print the error info. Users will decide how to do next. 
  See #9277.
* Add a test faking two redis nodes in TCL to just send ASK and OK in 
  redis protocol to test ASK behavior. 

Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Co-authored-by: Oran Agra <oran@redislabs.com>
2021-08-02 14:59:08 +03:00
Binbin
4000cb7d34
Modify some error logs printing level. (#9306)
1. In sendBulkToSlave, we used LL_VERBOSE in the past, changed to
LL_WARNING. (all the other places that do freeClient(slave) use LL_WARNING)
2. The old style LOG_WARNING, chang it to LL_WARNING. Introduced in an
old pr (#1690).
2021-08-02 11:18:35 +03:00
Ning Sun
f74af0e61d
Add NX/XX/GT/LT options to EXPIRE command group (#2795)
Add NX, XX, GT, and LT flags to EXPIRE, PEXPIRE, EXPIREAT, PEXAPIREAT.
- NX - only modify the TTL if no TTL is currently set 
- XX - only modify the TTL if there is a TTL currently set 
- GT - only increase the TTL (considering non-volatile keys as infinite expire time)
- LT - only decrease the TTL (considering non-volatile keys as infinite expire time)
return value of the command is 0 when the operation was skipped due to one of these flags.

Signed-off-by: Ning Sun <sunng@protonmail.com>
2021-08-02 08:57:49 +03:00
menwen
82c3158ad5
Fix if consumer is created as a side effect without notify and dirty++ (#9263)
Fixes:
- When a consumer is created as a side effect, redis didn't issue a keyspace notification,
  nor incremented the server.dirty (affects periodic snapshots).
  this was a bug in XREADGROUP, XCLAIM, and XAUTOCLAIM.
- When attempting to delete a non-existent consumer, don't issue a keyspace notification
  and don't increment server.dirty
  this was a bug in XGROUP DELCONSUMER

Other changes:
- Changed streamLookupConsumer() to always only do lookup consumer (never do implicit creation),
  Its last seen time is updated unless the SLC_NO_REFRESH flag is specified.
- Added streamCreateConsumer() to create a new consumer. When the creation is successful,
  it will notify and dirty++ unless the SCC_NO_NOTIFY or SCC_NO_DIRTIFY flags is specified.
- Changed streamDelConsumer() to always only do delete consumer.
- Added keyspace notifications tests about stream events.
2021-08-02 08:31:33 +03:00
cmemory
27a68a4d1b
improve quicklist insert head/tail while head/tail node is full. (#9113)
In _quicklistInsert when `at_head` / `at_tail` is true, but `prev` / `next` is NULL,
the code was reaching the last if-else block at the bottom of the function,
and would have unnecessarily executed _quicklistSplitNode, instead of just creating a new node.
This was because the penultimate if-else was checking `node->next && full_next`.
but in fact it was unnecessary to check if `node->next` exists, if we're gonna create one anyway,
we only care that it's not full, or doesn't exist, so the condition could have been changed to `!node->next || full_next`.

Instead, this PR makes a small refactory to negate `full_next` to a more meaningful variable
`avail_next` that indicates that the next node is available for pushing additional elements or
not (this would be true only if it exists and it is non-full)
2021-08-02 08:30:25 +03:00
SmartKeyerror
8ea18fa8d2
Fix copy-paste typo in t_list.c comment (#9305) 2021-08-02 08:06:36 +03:00
Binbin
86555ae0f7
GEO* STORE with empty src key delete the dest key and return 0, not empty array (#9271)
With an empty src key, we need to deal with two situations:
1. non-STORE: We should return emptyarray.
2. STORE: Try to delete the store key and return 0.

This applies to both GEOSEARCHSTORE (new to v6.2), and
also GEORADIUS STORE (which was broken since forever)

This pr try to fix #9261. i.e. both STORE variants would have behaved
like the non-STORE variants when the source key was missing,
returning an empty array and not deleting the destination key,
instead of returning 0, and deleting the destination key.

Also add more tests for some commands.
- GEORADIUS: wrong type src key, non existing src key, empty search,
  store with non existing src key, store with empty search
- GEORADIUSBYMEMBER: wrong type src key, non existing src key,
  non existing member, store with non existing src key
- GEOSEARCH: wrong type src key, non existing src key, empty search,
  frommember with non existing member
- GEOSEARCHSTORE: wrong type key, non existing src key,
  fromlonlat with empty search, frommember with non existing member

Co-authored-by: Oran Agra <oran@redislabs.com>
2021-08-01 19:32:24 +03:00
Guy Korland
1483f5aa9b
Remove const from CommandFilterArgGet result (#9247)
Co-authored-by: Yossi Gottlieb <yossigo@gmail.com>
2021-08-01 11:29:32 +03:00
Quinn Klassen
8647addf93
Free unused capacity in the cluster send buffer. (#9255)
* Free unused capacity in the cluster send buffer.
* Refactor cluster cron to include a dedicated loop for node based cron jobs
2021-07-29 17:31:51 -07:00
Ewg-c
a403816405
Minor refactoring for rioConnRead and adding errno (#9280)
minor refactoring for rioConnRead and adding errno
2021-07-29 17:29:23 -07:00
Wen Hui
db41536454
Remove duplicate zero-port sentinels (#9240)
The issue is that when a sentinel with the same address and IP is turned on with a different runid, its port is set to 0 but it is still present in the dictionary master->sentinels which contain all the sentinels for a master.

This causes a problem when we do INFO SENTINEL because it takes the size of the dictionary of sentinels. This might also cause a problem for failover if enough sentinels have their port set to 0 since the number of voters in failover is also determined by the size of the dictionary of sentinels.

This commits removes the sentinels with the port set to zero from the dictionary of sentinels.
Fixes #8786
2021-07-29 12:32:28 +03:00
Oran Agra
b458b2999b
Fix LRU blue moon bug in RESTORE, RDB loading, module API (#9279)
The `lru_clock` and `lru` bits in `robj` save the least significant 24 bits of the unixtime (seconds since 1/1/1970),
and wrap around every 194 days.
The `objectSetLRUOrLFU` function, which is used in RESTORE with IDLETIME argument, and also in replica
or master loading an RDB that contains LRU, and by a module API had a bug that's triggered when that happens.

The scenario was that the idle time that came from the user, let's say RESTORE command is about 1000 seconds
(e.g. in the `RESTORE can set LRU` test we have), and the current `lru_clock` just wrapped around and is less than
1000 (i.e. a period of 1000 seconds once in some 6 months), the expression in that function would produce a negative
value and the code (and comment) specified that the best way to solve that is push the idle time backwards into the
past by 3 months. i.e. an idle time of 3 months instead of 1000 seconds.

instead, the right thing to do is to unwrap it, and put it near LRU_CLOCK_MAX. since now `lru_clock` is smaller than
`obj->lru` it will be unwrapped again by `estimateObjectIdleTime`.

bug was introduced by 052e03495f, but the code before it also seemed wrong.
2021-07-29 12:11:29 +03:00
Huang Zhw
17511df59b
Add INFO stat total_eviction_exceeded_time and current_eviction_exceeded_time (#9031)
Add two INFO metrics:
```
total_eviction_exceeded_time:69734
current_eviction_exceeded_time:10230
```
`current_eviction_exceeded_time` if greater than 0, means how much time current used memory is greater than `maxmemory`. And we are still over the maxmemory. If used memory is below `maxmemory`, this metric is reset to 0.
`total_eviction_exceeded_time` means total time used memory is greater than `maxmemory` since server startup. 
The units of these two metrics are ms.

Co-authored-by: Oran Agra <oran@redislabs.com>
2021-07-26 10:07:20 +03:00
Oran Agra
9ca5e8c547
fix zslGetRank bug in dead-code (#9246)
This fixes an issue with zslGetRank which will happen only if the
skiplist data stracture is added two entries with the same element name,
this can't happen in redis zsets (we use dict), but in theory this is a
bug in the underlaying skiplist code.

Fixes #3081 and #4032

Co-authored-by: minjian.cai <cmjgithub@163.com>
2021-07-22 13:40:00 +03:00
Huang Zhw
71d452876e
On 32 bit platform, the bit position of GETBIT/SETBIT/BITFIELD/BITCOUNT,BITPOS may overflow (see CVE-2021-32761) (#9191)
GETBIT, SETBIT may access wrong address because of wrap.
BITCOUNT and BITPOS may return wrapped results.
BITFIELD may access the wrong address but also allocate insufficient memory and segfault (see CVE-2021-32761).

This commit uses `uint64_t` or `long long` instead of `size_t`.
related https://github.com/redis/redis/pull/8096

At 32bit platform:
> setbit bit 4294967295 1
(integer) 0
> config set proto-max-bulk-len 536870913
OK
> append bit "\xFF"
(integer) 536870913
> getbit bit 4294967296
(integer) 0

When the bit index is larger than 4294967295, size_t can't hold bit index. In the past,  `proto-max-bulk-len` is limit to 536870912, so there is no problem.

After this commit, bit position is stored in `uint64_t` or `long long`. So when `proto-max-bulk-len > 536870912`, 32bit platforms can still be correct.

For 64bit platform, this problem still exists. The major reason is bit pos 8 times of byte pos. When proto-max-bulk-len is very larger, bit pos may overflow.
But at 64bit platform, we don't have so long string. So this bug may never happen.

Additionally this commit add a test cost `512MB` memory which is tag as `large-memory`. Make freebsd ci and valgrind ci ignore this test.
2021-07-21 16:25:19 +03:00
Oran Agra
32e61ee295
Fix ACL category for SELECT, WAIT, ROLE, LASTSAVE, READONLY, READWRITE, ASKING (#9208)
- SELECT and WAIT don't read or write from the keyspace (unlike DEL, EXISTS, EXPIRE, DBSIZE, KEYS, etc).
they're more similar to AUTH and HELLO (and maybe PING and COMMAND).
they only affect the current connection, not the server state, so they should be `@connection`, not `@keyspace`

- ROLE, like LASTSAVE is `@admin` (and `@dangerous` like INFO) 

- ASKING, READONLY, READWRITE are `@connection` too (not `@keyspace`)

- Additionally, i'm now documenting the exact meaning of each ACL category so it's clearer which commands belong where.
2021-07-20 21:48:43 +03:00
Huang Zhw
1895e134a7
Fix missing separator in module info line (usedby and using lists) (#9241)
Fix module info genModulesInfoStringRenderModulesList lack separator when there's more than one module in the list.

Co-authored-by: Oran Agra <oran@redislabs.com>
2021-07-19 11:10:25 +03:00
Huang Zhw
d54c9086c2
Remove testmodule in src/modules/Makefile. (#9250)
src/modules make failed. As in #3718 testmodule.c was removed. But the makefile was not updated
2021-07-19 08:23:25 +03:00
Binbin
e7ccdeebc9
Fix rank overflow in zslInsert with more than 2B entries. (#9249)
If there are more than 2B entries in a zset.
The calculated span will overflow.
2021-07-18 16:47:33 +03:00
Erik Dubbelboer
ee4bdf10ee
Fix format strings serverLogObjectDebugInfo to use unsigned (#2927)
This doesn't have any real impact, just a cleanup.
2021-07-18 15:27:42 +03:00
Paul Kulchenko
8a73cd1f9a Update Lua debugging help message commands for consistency. (#2946) 2021-07-18 14:11:09 +03:00
Binbin
11dc4e59b3
SMOVE only notify dstset when the addition is successful. (#9244)
in case dest key already contains the member, the dest key isn't modified, so the command shouldn't invalidate watch.
2021-07-17 09:54:06 +03:00
qetu3790
f03af47a34
Set TCP keepalive on inbound clusterbus connections (#9230)
Set TCP keepalive on inbound clusterbus connections to prevent memory leak
2021-07-15 20:40:25 -07:00
ZEEKLING
334a9789f1
fix timeout spell error (#9150) 2021-07-15 16:01:11 -07:00
Oran Agra
6a5bac309e
Test infra, handle RESP3 attributes and big-numbers and bools (#9235)
- promote the code in DEBUG PROTOCOL to addReplyBigNum
- DEBUG PROTOCOL ATTRIB skips the attribute when client is RESP2
- networking.c addReply for push and attributes generate assertion when
  called on a RESP2 client, anything else would produce a broken
  protocol that clients can't handle.
2021-07-14 19:14:31 +03:00
Yossi Gottlieb
277e4dc203
Fix compatibility with OpenSSL 1.1.0. (#9233) 2021-07-14 12:32:13 +03:00
gourav
7ac42ab7b3
Return -1 when connTLSWrite fails to write due to socket being closed (#9222) 2021-07-14 10:46:33 +03:00
Binbin
d0f36eeda0
Add missing comma in memory/xgroup command help message. (#9210)
This would have resulted in missing newline in the help message
2021-07-13 18:16:05 +03:00
perryitay
ac8b1df885
Fail EXEC command in case a watched key is expired (#9194)
There are two issues fixed in this commit: 
1. we want to fail the EXEC command in case there is a watched key that's logically
   expired but not yet deleted by active expire or lazy expire.
2. we saw that currently cache time is update in every `call()` (including nested calls),
   this time is being also being use for the isKeyExpired comparison, we want to update
   the cache time only in the first call (execCommand)

Co-authored-by: Oran Agra <oran@redislabs.com>
2021-07-11 13:17:23 +03:00
Huang Zhw
cb961d8c8e
Do not install a file event to send data to rewrite child when parent stop sending diff to child in aof rewrite. (#8767)
In aof rewrite, when parent stop sending data to child, if there is
new rewrite data, aofChildWriteDiffData write event will be installed.
Then this event is issued and deletes the file event without do anyting.
This will happen over and over again until aof rewrite finish.

This bug used to waste a few system calls per excessive wake-up
(epoll_ctl and epoll_wait) per cycle, each cycle triggered by receiving
a write command from a client.
2021-07-11 13:00:17 +03:00
Binbin
fe5d325107
Add test for ziplist (nextdiff == -4 && reqlen < 4) (#9218)
The if judgement `nextdiff == -4 && reqlen < 4` in __ziplistInsert.
It's strange, but it's useful. Without it there will be problems during
chain update.
Till now these lines didn't have coverage in the tests, and there was
a question if they are at all needed (#7170)
2021-07-11 11:26:31 +03:00
Shogo Hayashi
350c6306b4
fix explanation of sha256 (#9220) 2021-07-10 10:04:54 -05:00
Huang Zhw
9a22457e8f
Start redis-check-aof/redis-check-rdb only if executable name contains (#9215)
redis-check-aof/redis-check-rdb.

Related to #9176. Before this commit, redis-server starts as
redis-check-aof/redis-check-rdb if the directory it is started from
contains the string redis-check-aof/redis-check-rdb. We check the
executable name instead of directory.
2021-07-09 21:45:58 +03:00
Mikhail Fesenko
1eb4baa5b8
Direct redis-cli repl prints to stderr, because --rdb can print to stdout. fflush stdout after responses (#9136)
1. redis-cli can output --rdb data to stdout
   but redis-cli also write some messages to stdout which will mess up the rdb.

2. Make redis-cli flush stdout when printing a reply
  This was needed in order to fix a hung in redis-cli test that uses
  --replica.
   Note that printf does flush when there's a newline, but fwrite does not.

3. fix the redis-cli --replica test which used to pass previously
   because it didn't really care what it read, and because redis-cli
   used printf to print these other things to stdout.

4. improve redis-cli --replica test to run with both diskless and disk-based.

Co-authored-by: Oran Agra <oran@redislabs.com>
Co-authored-by: Viktor Söderqvist <viktor@zuiderkwast.se>
2021-07-07 08:26:26 +03:00
Binbin
3a26f61fb3
Add range check for master port in replicaof. (#9201)
So that we can avoid commands that are obviously wrong.

Also unified with loadServerConfigFromString
because we also checked the range.
2021-07-06 10:46:10 +03:00
zhaozhao.zz
5ea3c3db15
a little optimization in touchAllWatchedKeysInDb (#9200)
touch keys only if the key is in one of the dbs,
in case rewind the list when the key doesn't exist.
2021-07-06 15:01:37 +08:00
Omer Shadmi
f06d782f5a
Avoid exiting to allow diskless loading to recover from RDB short read on module AUX data (#9199)
Currently a replica is able to recover from a short read (when diskless loading
is enabled) and avoid crashing/exiting, replying to the master and then the rdb
could be sent again by the master for another load attempt by the replica.
There were a few scenarios that were not behaving similarly, such as when
there is no end-of-file marker, or when module aux data failed to load, which
should be allowed to occur due to a short read.
2021-07-06 08:21:17 +03:00
uriyage
67732bacb1
redis-cli: reset dbnum and tx prompt state after RESET (#9096) 2021-07-05 10:55:55 +03:00
Binbin
a418a2d3fc
hrandfield and zrandmember with count should return emptyarray when key does not exist. (#9178)
due to a copy-paste bug, it used to reply with null response rather than empty array.
this commit includes new tests that are looking at the RESP response directly in
order to be able to tell the difference between them.

Co-authored-by: Oran Agra <oran@redislabs.com>
2021-07-05 10:41:57 +03:00
Oran Agra
5a3de81925
Use accept4 on linux instead of fcntl to make a client socket non-blocking (#9177)
This reduces system calls on linux when a new connection is made / accepted.

Changes:
* Add the SOCK_CLOEXEC option to the accept4() call
  This  ensure that a fork/exec call does not leak a file descriptor.
* Move anetCloexec and connNonBlock info anetGenericAccept
* Moving connNonBlock from accept handlers to anetGenericAccept

Moving connNonBlock from createClient, is safe because createClient is
used in the following ways:
1. without a connection (fake client)
2. on an accepted connection (see above)
3. creating the master client by using connConnect (see below)

The third case, can either use anetTcpNonBlockConnect, or connTLSConnect
which is by default non-blocking.

Co-authored-by: Rajiv Kurian <geetasen@gmail.com>
Co-authored-by: Oran Agra <oran@redislabs.com>
Co-authored-by: Yoav Steinberg <yoav@redislabs.com>
2021-07-05 10:34:20 +03:00
Oran Agra
ec582cc7ad Query buffer shrinking improvements (#5013)
when tracking the peak, don't reset the peak to 0, reset it to the
maximum of the current used, and the planned to be used by the current
arg.

when shrining, split the two separate conditions.
the idle time shrinking will remove all free space.
but the peak based shrinking will keep room for the current arg.

when we resize due to a peak (rahter than idle time), don't trim all
unused space, let the qbuf keep a size that's sufficient for the
currently process bulklen, and the current peak.

Co-authored-by: sundb <sundbcn@gmail.com>
Co-authored-by: yoav-steinberg <yoav@monfort.co.il>
2021-07-05 09:30:16 +03:00
zhaozhao.zz
2248eaacac resize query buffer more accurately
1. querybuf_peak has not been updated correctly in readQueryFromClient.
2. qbuf shrinking uses sdsalloc instead of sdsAllocSize

see more details in issue #4983
2021-07-05 09:30:16 +03:00
Madelyn Olson
8f59f131e5
Update incrDecrCommand to use addReplyLongLong (#9188)
Update incrDecrCommand to use addReplyLongLong
2021-07-03 10:51:53 -05:00
Yossi Gottlieb
aa139e2f02
Fix CLIENT UNBLOCK crashing modules. (#9167)
Modules that use background threads with thread safe contexts are likely
to use RM_BlockClient() without a timeout function, because they do not
set up a timeout.

Before this commit, `CLIENT UNBLOCK` would result with a crash as the
`NULL` timeout callback is called. Beyond just crashing, this is also
logically wrong as it may throw the module into an unexpected client
state.

This commits makes `CLIENT UNBLOCK` on such clients behave the same as
any other client that is not in a blocked state and therefore cannot be
unblocked.
2021-07-01 17:11:27 +03:00
Oran Agra
de9bae21ef
Fix bug in sdscatfmt when % is the last format char (#9173)
For the sdscatfmt function in sds.c, when the parameter fmt ended up with '%',
the behavior is undefined. This commit fix this bug.

Co-authored-by: stafuc <stafuc@gmail.com>
2021-07-01 08:19:04 +03:00
Wang Yuan
16e04ed944
Don't start in sentinel mode if only the folder name contains redis-sentinel (#9176)
Before this commit, redis-server starts in sentinel mode if the first startup
argument has the string redis-sentinel, so redis also starts in sentinel mode
if the directory it was started from contains the string redis-sentinel.
Now we check the executable name instead of directory.

Some examples:
1. Execute ./redis-sentinel/redis/src/redis-sentinel, starts in sentinel mode.
2. Execute ./redis-sentinel/redis/src/redis-server, starts in server mode,
   but before, redis will start in sentinel mode.
3. Execute ./redis-sentinel/redis/src/redis-server --sentinel, of course, like
   before, starts in sentinel mode.
2021-07-01 08:05:18 +03:00
ZhaolongLi
6476c8e856
Fix range issues in default value of LIMIT argument to XADD and XTRIM (#9147)
This seems to be an unimportant bug that was accidentally generated. If the user does not specify limit in streamParseAddOrTrimArgsOrReply, the initial value of args->limit is 100 * server.stream_node_max_entries, which may lead to out of bounds, and then the default function of limit in xadd becomes invalid (this failure occurs in streamTrim).

Additionally, provide sane default for args->limit in case stream_node_max_entries is set to 0.

Co-authored-by: lizhaolong.lzl <lizhaolong.lzl@B-54MPMD6R-0221.local>
Co-authored-by: Oran Agra <oran@redislabs.com>
Co-authored-by: guybe7 <guy.benoish@redislabs.com>
2021-06-30 16:55:09 +03:00
Mikhail Fesenko
74fe15b360
redis-cli --rdb: fix broken fsync/ftruncate for stdout (#9135)
A change in redis 6.2 caused redis-cli --rdb that's directed to stdout to fail because fsync fails.
This commit avoids doing ftruncate (fails with a warning) and fsync (fails with an error) when the
output file is `-`, and adds the missing documentation that `-` means stdout.

Co-authored-by: Oran Agra <oran@redislabs.com>
Co-authored-by: Wang Yuan <wangyuancode@163.com>
2021-06-30 16:49:54 +03:00
Rob Snyder
eaa52719a3
Fix ziplist length updates on bigendian platforms (#2080)
Adds call to intrev16ifbe to ensure ZIPLIST_LENGTH is compared correctly
2021-06-30 16:46:06 +03:00
Huang Zhw
70dac4b436
Make redis-cli --help/-h output to stdout and exit with 0. (#9124)
If user executes redis-cli --help, the content should be output to stdout and exits with 0.
2021-06-30 14:56:44 +03:00
luvine
4278c45c91
HSETNX can lead coding type change even when skipped due to NX (#4615)
1. Add one key-value pair to myhash, which the length of key and value both less than hash-max-ziplist-value, for example:
>hset myhash key value
2. Then execute the following command
>hsetnx myhash key value1 (the length greater than hash-max-ziplist-value)
3. This will add nothing, but the code type of "myhash" changed from ziplist to dict even there are only one key-value pair in "myhash", and both of them less than hash-max-ziplist-value.
2021-06-30 12:14:11 +03:00
ZhaolongLi
cf1d49adbf
Simplify logic in raxSeek, eliminate it->key reassembly on gt and lt (#9115)
In the original version, the operation of traversing the stack only seems to
reconstruct the key that does not contain the current node.

But in fact We have got the matched length and splitpos in the key in the
raxlowwalk, so I think we can simplify the logic of this part.

Co-authored-by: lizhaolong.lzl <lizhaolong.lzl@B-54MPMD6R-0221.local>
2021-06-30 08:21:19 +03:00
Leibale Eidelman
95274f1f8a
fix ZRANGESTORE - should return 0 when src points to an empty key (#9089)
mistakenly it used to return an empty array rather than 0.

Co-authored-by: Oran Agra <oran@redislabs.com>
2021-06-29 16:38:10 +03:00
Tsonglew
7913d34d7c
Improve doc comment about AE_DONT_WAIT (#9165)
Co-authored-by: yoav-steinberg <yoav@monfort.co.il>
2021-06-29 14:37:02 +03:00
guybe7
4434cebbb3
Include sizeof(struct stream) in objectComputeSize (#9164)
Affects MEMORY USAGE
2021-06-29 14:34:18 +03:00
Binbin
4bc5a8324d
ZRANDMEMBER WITHSCORES with negative COUNT may return bad score (#9162)
Return a bad score when used with negative count (or count of 1), and non-ziplist encoded zset.
Also add test to validate the return value and cover the issue.
2021-06-29 10:14:28 +03:00
Wang Yuan
4fa3e23092
Remove unnecessary replication backlog memory copy (#9157)
in the past, the reply list was a list of sds objects, so this didn't have any overhead,
but now addReplySds just copies the data from the sds and frees it, so there's no
need to make a copy of the buffer before copying again.
this reduces an excessive allocation and free and a memcpy.
2021-06-28 09:43:40 +03:00
Yossi Gottlieb
1071430875
Corrections about the new protected-mode usage. (#9143) 2021-06-27 11:34:48 +03:00
Yossi Gottlieb
f233c4c59d
Add bind-source-addr configuration argument. (#9142)
In the past, the first bind address that was explicitly specified was
also used to bind outgoing connections. This could result with some
problems. For example: on some systems using `bind 127.0.0.1` would
result with outgoing connections also binding to `127.0.0.1` and failing
to connect to remote addresses.

With the recent change to the way `bind` is handled, this presented
other issues:

* The default first bind address is '*' which is not a valid address.
* We make no distinction between user-supplied config that is identical
to the default, and the default config.

This commit addresses both these issues by introducing an explicit
configuration parameter to control the bind address on outgoing
connections.
2021-06-24 19:48:18 +03:00
Huang Zhw
d1a21e0292
Clean redis-benchmark Throughput output. (#9139)
some leftovers from print are visible when the new line is printed.
2021-06-24 18:04:19 +03:00
ZhaolongLi
89ae353748
change streamAppendItem to use raxEOF instead of raxNext (#9138)
The call to raxNext didn't really progress in the rax, since we were already on the last item.
instead, all it does is check that it is indeed a valid item, so the new code clearer.
2021-06-24 15:08:26 +03:00
Oran Agra
ae418eca24
Adjustments to recent RM_StringTruncate fix (#3718) (#9125)
- Introduce a new sdssubstr api as a building block for sdsrange.
  The API of sdsrange is many times hard to work with and also has
  corner case that cause bugs. sdsrange is easy to work with and also
  simplifies the implementation of sdsrange.
- Revert the fix to RM_StringTruncate and just use sdssubstr instead of
  sdsrange.
- Solve valgrind warnings from the new tests introduced by the previous
  PR.
2021-06-22 17:22:40 +03:00
Yossi Gottlieb
07b0d144ce
Improve bind and protected-mode config handling. (#9034)
* Specifying an empty `bind ""` configuration prevents Redis from listening on any TCP port. Before this commit, such configuration was not accepted.
* Using `CONFIG GET bind` will always return an explicit configuration value. Before this commit, if a bind address was not specified the returned value was empty (which was an anomaly).

Another behavior change is that modifying the `bind` configuration to a non-default value will NO LONGER DISABLE protected-mode implicitly.
2021-06-22 12:50:17 +03:00
Evan
1ccf2ca2f4
modules: Add newlen == 0 handling to RM_StringTruncate (#3717) (#3718)
Previously, passing 0 for newlen would not truncate the string at all.
This adds handling of this case, freeing the old string and creating a new empty string.

Other changes:
- Move `src/modules/testmodule.c` to `tests/modules/basics.c`
- Introduce that basic test into the test suite
- Add tests to cover StringTruncate
- Add `test-modules` build target for the main makefile
- Extend `distclean` build target to clean modules too
2021-06-22 12:26:48 +03:00
Binbin
bf92000e2d
Remove extra semicolon (#9117)
Remove extra semicolon.
2021-06-21 22:53:28 -07:00
Wen Hui
81d5f05b6e
add tilt mode since in sentinel info (#9000)
Sentinel shows tilt mode boolean in info, now it'll show an indication of how long ago it started.
2021-06-22 07:37:46 +03:00
Oran Agra
9b564b525d
Fix race in client side tracking (#9116)
The `Tracking gets notification of expired keys` test in tracking.tcl
used to hung in valgrind CI quite a lot.

It turns out the reason is that with valgrind and a busy machine, the
server cron active expire cycle could easily run in the same event loop
as the command that created `mykey`, so that when they key got expired,
there were two change events to broadcast, one that set the key and one
that expired it, but since we used raxTryInsert, the client that was
associated with the "last" change was the one that created the key, so
the NOLOOP filtered that event.

This commit adds a test that reproduces the problem by using lazy expire
in a multi-exec which makes sure the key expires in the same event loop
as the one that added it.
2021-06-22 07:35:59 +03:00
Binbin
f004073b54
Check cluster_enabled in readwriteCommand just like readonlyCommand. (#9015) 2021-06-21 14:02:44 -07:00
Maxim Galushka
96bb078577
redis-cli: support for REDIS_REPLY_SET in CSV and RAW output. (#7338)
Fixes #6792. Added support of REDIS_REPLY_SET in raw and csv output of `./redis-cli`

Test:

run commands to test:
  ./redis-cli -3 --csv COMMAND
  ./redis-cli -3 --raw COMMAND

Now they are returning resuts, were failing with: "Unknown reply type: 10" before the change.
2021-06-21 11:01:31 +03:00
Binbin
1fabb89a41
Make CONFIG GET OOM-score-adj-values case insensitive. (#9114) 2021-06-20 12:32:02 +03:00
SUN
4c52aa9faa
chdir to dir before fopen logfile in loadServerConfigFromString() (#6741)
Open the log file only after parsing the entire config file, so that it's
location isn't dependent on the order of configs (`dir` and `logfile`).
Also solves the problem of creating multiple log files if the `logfile`
directive appears many times in the config file.
2021-06-20 11:34:04 +03:00
Binbin
561c69c285
Improve the debug help command message (#9098)
cleanups:
1: Re-introduce debug leak subcommand in help text.
Mistankenly deleted in https://github.com/redis/redis/pull/5531

2: Formatted the text.
Some text lacks commas resulting in no line breaks.

3: Supplementary debug restart command descriptions of delay arg.
2021-06-20 09:46:27 +03:00
Binbin
89152f8e41
Use loaderr label to handle error. (#9111)
So that we can easily see the lines of the config.
Also unified with other error handling.
2021-06-20 08:04:12 +03:00
gourav
6248127c43
Use exitFromChild to exit from child process in linuxMadvFreeForkBugCheck (#9101) 2021-06-17 13:37:38 -07:00
sundb
1a22445d30
Make readQueryFromClient more aggressive when reading big arg again (Followup for #9003) (#9100)
Due to the change in #9003, a long-standing bug was raised under `valgrind`.
This bug can cause the master-slave sync to take a very long time, causing the `pendingquerybuf.tcl` test to fail.
This problem does not only occur in master-slave sync, it is triggered when the big arg is greater than 32k.
step:
```sh
dd if=/dev/zero of=bigfile bs=1M count=32
./src/redis-cli -x hset a a < bigfile
```

1) Make room for querybuf in processMultibulkBuffer, now the alloc of querybuf will be more than 32k.
2) If this happens to trigger the `clientsCronResizeQueryBuffer`, querybuf will be resized to 0.
3) Finally, in readQueryFromClient, we expand the querybuf non-greedily, from 0 to 32k.
    Old code, make room for querybuf is greedy, so it only needs 11 times to expand to 32M(16k*(2^11)),
    but now we need 2048(32*1024/16) times to reach it, due to the slow allocation under valgrind that exposed the problem.

The fix for the excessive shrinking of the query buf to 0, will be handled in #5013 (that other change on it's own can fix failing test too), but the fix in this PR will also fix the failing test.

The fix in this PR will makes the reading in `readQueryFromClient` more aggressive when working on a big arg (so that it is in par with the same code in `processMultibulkBuffer` (i.e. the two calls to `sdsMakeRoomForNonGreedy` should both use the bulk size).
In the code before this fix the one in readQueryFromClient always has `readlen = PROTO_IOBUF_LEN`
2021-06-17 21:49:15 +03:00
DarrenJiang13
ada60d8003
Improve objectComputeSize() with allocator fragmentaiton. (#9095)
This commit improve MEMORY USAGE command to include internal fragmentation overheads of:
1. EMBSTR encoded strings
2. ziplist encoded zsets and hashes
3. List type nodes
2021-06-17 13:30:37 +03:00
Sam Bortman
c2b93ff83f
Support glob pattern matching for config include files (#8980)
This will allow distros to use an "include conf.d/*.conf" statement in the default configuration file
which will facilitate customization across upgrades/downgrades.

The change itself is trivial: instead of opening an individual file, the glob call creates a vector of files to open, and each file is opened in turn, and its content is added to the configuration.
2021-06-16 21:59:38 +03:00
yoav-steinberg
362786c58a
Remove gopher protocol support. (#9057)
Gopher support was added mainly because it was simple (trivial to add).
But apparently even something that was trivial at the time, does cause complications
down the line when adding more features.
We recently ran into a few issues with io-threads conflicting with the gopher support.
We had to either complicate the code further in order to solve them, or drop gopher.
AFAIK it's completely unused, so we wanna chuck it, rather than keep supporting it.
2021-06-16 09:47:25 +03:00
chenyang8094
e0cd3ad0de
Enhance mem_usage/free_effort/unlink/copy callbacks and add GetDbFromIO api. (#8999)
Create new module type enhanced callbacks: mem_usage2, free_effort2, unlink2, copy2.
These will be given a context point from which the module can obtain the key name and database id.
In addition the digest and defrag context can now be used to obtain the key name and database id.
2021-06-16 09:45:49 +03:00
Jason Elbaum
7f342020dc
Change return value type for ZPOPMAX/MIN in RESP3 (#8981)
When using RESP3, ZPOPMAX/ZPOPMIN should return nested arrays for consistency
with other commands (e.g. ZRANGE).

We do that only when COUNT argument is present (similarly to how LPOP behaves).
for reasoning see https://github.com/redis/redis/issues/8824#issuecomment-855427955

This is a breaking change only when RESP3 is used, and COUNT argument is present!
2021-06-16 09:29:57 +03:00
Uri Shachar
c7e502a07b
Cleaning up the cluster interface by moving almost all related declar… (#9080)
* Cleaning up the cluster interface by moving almost all related declarations into cluster.h
(no logic change -- just moving declarations/definitions around)

This initial effort leaves two items out of scope - the configuration parsing into the server
struct and the internals exposed by the clusterNode struct.

* Remove unneeded declarations of dictSds*
Ideally all the dictSds functionality would move from server.c into a dedicated module
so we can avoid the duplication in redis-benchmark/cli

* Move crc16 back into server.h, will be moved out once we create a seperate header file for
hashing functions
2021-06-15 20:35:13 -07:00
sundb
e5d8a5eb85
Fix the wrong reisze of querybuf (#9003)
The initialize memory of `querybuf` is `PROTO_IOBUF_LEN(1024*16) * 2` (due to sdsMakeRoomFor being greedy), under `jemalloc`, the allocated memory will be 40k.
This will most likely result in the `querybuf` being resized when call `clientsCronResizeQueryBuffer` unless the client requests it fast enough.

Note that this bug existed even before #7875, since the condition for resizing includes the sds headers (32k+6).

## Changes
1. Use non-greedy sdsMakeRoomFor when allocating the initial query buffer (of 16k).
1. Also use non-greedy allocation when working with BIG_ARG (we won't use that extra space anyway)
2. in case we did use a greedy allocation, read as much as we can into the buffer we got (including internal frag), to reduce system calls.
3. introduce a dedicated constant for the shrinking (same value as before)
3. Add test for querybuf.
4. improve a maxmemory test by ignoring the effect of replica query buffers (can accumulate many ACKs on slow env)
5. improve a maxmemory by disabling slowlog (it will cause slight memory growth on slow env).
2021-06-15 14:46:19 +03:00
Binbin
eb15c456c9
Check CLIENT_DIRTY_CAS flag before process transaction. (#9086)
Do not queue command in an already aborted MULTI state.
We can detect an error (watched key).
So in queueMultiCommand, we also can return early.
Like we deal with `CLIENT_DIRTY_EXEC`.
2021-06-15 14:36:04 +03:00
DarrenJiang13
678e67b58e
presize hashtable to avoid rehashing when hashTypeConvertZiplist() (#8943) 2021-06-15 12:52:36 +03:00
Binbin
b109977301
Fix XINFO help for unexpected options. (#9075)
Small cleanup and consistency.
2021-06-15 10:01:11 +03:00
yvette903
096c5fd5d2
Fix coredump after Client Unpause command when threaded I/O is enabled (#9041)
Fix crash when using io-threads-do-reads and issuing CLIENT PAUSE and
CLIENT UNPAUSE.
This issue was introduced in redis 6.2 together with the FAILOVER command.
2021-06-15 09:21:52 +03:00
Binbin
7900b48bc7
slowlog get command supports passing in -1 to get all logs. (#9018)
This was already the case before this commit, but it wasn't clear / intended in the code, now it does.
2021-06-14 16:46:45 +03:00
ZhaolongLi
dee378d76d
Simplify a redundant condition in computeDefragCycles (#9076)
Co-authored-by: lizhaolong.lzl <lizhaolong.lzl@B-54MPMD6R-0221.local>
2021-06-14 11:48:15 +03:00
ZhaolongLi
688cdb05b4
Remove dead code in defrag.c (ziplist encoded list) (#9074)
Co-authored-by: lizhaolong.lzl <lizhaolong.lzl@B-54MPMD6R-0221.local>
2021-06-14 10:55:17 +03:00
YaacovHazan
1677efb9da
cleanup around loadAppendOnlyFile (#9012)
Today when we load the AOF on startup, the loadAppendOnlyFile checks if
the file is openning for reading.
This check is redundent (dead code) as we open the AOF file for writing at initServer,
and the file will always be existing for the loadAppendOnlyFile.

In this commit:
- remove all the exit(1) from loadAppendOnlyFile, as it is the caller
  responsibility to decide what to do in case of failure.
- move the opening of the AOF file for writing, to be after we loading it.
- avoid return -ERR in DEBUG LOADAOF, when the AOF is existing but empty
2021-06-14 10:38:08 +03:00
fishsalter
2727a2cec0
msetnx reply error (#9061) 2021-06-13 20:35:23 -07:00
Binbin
b8a5da80c4
Fix accidental deletion of sinterstore command when we meet wrong type error. (#9032)
SINTERSTORE would have deleted the dest key right away,
even when later on it is bound to fail on an (WRONGTYPE) error.

With this change it first picks up all the input keys, and only later
delete the dest key if one is empty.

Also add more tests for some commands.
Mainly focus on 
- `wrong type error`: 
	expand test case (base on sinter bug) in non-store variant
	add tests for store variant (although it exists in non-store variant, i think it would be better to have same tests)
- the dstkey result when we meet `non-exist key (empty set)` in *store

sdiff:
- improve test case about wrong type error (the one we found in sinter, although it is safe in sdiff)
- add test about using non-exist key (treat it like an empty set)
sdiffstore:
- according to sdiff test case, also add some tests about `wrong type error` and `non-exist key`
- the different is that in sdiffstore, we will consider the `dstkey` result

sunion/sunionstore add more tests (same as above)

sinter/sinterstore also same as above ...
2021-06-13 10:53:46 +03:00
Binbin
5517f3624d
Remove duplicate dbid lookup in performEvictions. (#9063)
Minor code cleanup.
2021-06-13 09:31:19 +03:00
ny0312
fb140a1bff
Fix flaky test case for absolute TTL replication (#9069)
The root cause is that one test (`5 keys in, 5 keys out`) is leaking a volatile key
that can expire while another later test(`All TTL in commands are propagated
as absolute timestamp in replication stream`) is running.
Such leaked expiration injects an unexpected `DEL` command into the
replication command during the later test, causing it to fail.

The fixes are two fold:
1. Plug the leak in the first test.
2. Add FLUSHALL to the later test, to avoid future interference from other tests.
2021-06-13 08:42:20 +03:00