Commit Graph

759 Commits

Author SHA1 Message Date
Ozan Tezcan
6790d848c5
Reuse temporary client objects for blocked clients by module (#9940)
Added a pool for temporary client objects to reuse in module operations.
By reusing temporary clients, we are avoiding expensive createClient()/freeClient()
calls and improving performance of RM_BlockClient() and  RM_GetThreadSafeContext() calls. 

This commit contains two optimizations: 

1 - RM_BlockClient() and RM_GetThreadSafeContext() calls create temporary clients and they are freed in
RM_UnblockClient() and RM_FreeThreadSafeContext() calls respectively. Creating/destroying client object
takes quite time. To avoid that, added a pool of temporary clients. Pool expands when more clients are needed.
Also, added a cron function to shrink the pool and free unused clients after some time. Pool starts with zero
clients in it. It does not have max size and can grow unbounded as we need it. We will keep minimum of 8
temporary clients in the pool once created. Keeping small amount of clients to avoid client allocation costs
if temporary clients are required after some idle period.

2 - After unblocking a client (RM_UnblockClient()), one byte is written to pipe to wake up Redis main thread.
If there are many clients that will be unblocked, each operation requires one write() call which is quite expensive.
Changed code to avoid subsequent calls if possible. 

There are a few more places that need temporary client objects (e.g RM_Call()). These are now using the same
temporary client pool to make things more centralized.
2022-01-11 19:00:56 +02:00
Oran Agra
3204a03574
Move doc metadata from COMMAND to COMMAND DOCS (#10056)
Syntax:
`COMMAND DOCS [<command name> ...]`

Background:
Apparently old version of hiredis (and thus also redis-cli) can't
support more than 7 levels of multi-bulk nesting.

The solution is to move all the doc related metadata from COMMAND to a
new COMMAND DOCS sub-command.

The new DOCS sub-command returns a map of commands (not an array like in COMMAND),
And the same goes for the `subcommands` field inside it (also contains a map)

Besides that, the remaining new fields of COMMAND (hints, key-specs, and
sub-commands), are placed in the outer array rather than a nested map.
this was done mainly for consistency with the old format.

Other changes:
---
* Allow COMMAND INFO with no arguments, which returns all commands, so that we can some day deprecated
  the plain COMMAND (no args)

* Reduce the amount of deferred replies from both COMMAND and COMMAND
  DOCS, especially in the inner loops, since these create many small
  reply objects, which lead to many small write syscalls and many small
  TCP packets.
  To make this easier, when populating the command table, we count the
  history, args, and hints so we later know their size in advance.
  Additionally, the movablekeys flag was moved into the flags register.
* Update generate-commands-json.py to take the data from both command, it
  now executes redis-cli directly, instead of taking input from stdin.
* Sub-commands in both COMMAND (and COMMAND INFO), and also COMMAND DOCS,
  show their full name. i.e. CONFIG 
*   GET will be shown as `config|get` rather than just `get`.
  This will be visible both when asking for `COMMAND INFO config` and COMMAND INFO config|get`, but is
  especially important for the later.
  i.e. imagine someone doing `COMMAND INFO slowlog|get config|get` not being able to distinguish between the two
  items in the array response.
2022-01-11 17:16:16 +02:00
Binbin
a1ae260e8a
Make sure replicas don't write their own replies to the replication link (#10081)
The following steps will crash redis-server:
```
[root]# cat crash
PSYNC replicationid -1
SLOWLOG GET
GET key
[root]# nc 127.0.0.1 6379 < crash
```

This one following #10020 and the crash was reported in #10076.

Other changes about the output info:
1. Cmd with a full name by using `getFullCommandName`, now it will print the right
   subcommand name like `slowlog|get`.
2. Print the full client info by using `catClientInfoString`, the info is also valuable.:
2022-01-10 08:21:16 +02:00
Meir Shpilraien (Spielrein)
885f6b5ceb
Redis Function Libraries (#10004)
# Redis Function Libraries

This PR implements Redis Functions Libraries as describe on: https://github.com/redis/redis/issues/9906.

Libraries purpose is to provide a better code sharing between functions by allowing to create multiple
functions in a single command. Functions that were created together can safely share code between
each other without worrying about compatibility issues and versioning.

Creating a new library is done using 'FUNCTION LOAD' command (full API is described below)

This PR introduces a new struct called libraryInfo, libraryInfo holds information about a library:
* name - name of the library
* engine - engine used to create the library
* code - library code
* description - library description
* functions - the functions exposed by the library

When Redis gets the `FUNCTION LOAD` command it creates a new empty libraryInfo.
Redis passes the `CODE` to the relevant engine alongside the empty libraryInfo.
As a result, the engine will create one or more functions by calling 'libraryCreateFunction'.
The new funcion will be added to the newly created libraryInfo. So far Everything is happening
locally on the libraryInfo so it is easy to abort the operation (in case of an error) by simply
freeing the libraryInfo. After the library info is fully constructed we start the joining phase by
which we will join the new library to the other libraries currently exist on Redis.
The joining phase make sure there is no function collision and add the library to the
librariesCtx (renamed from functionCtx). LibrariesCtx is used all around the code in the exact
same way as functionCtx was used (with respect to RDB loading, replicatio, ...).
The only difference is that apart from function dictionary (maps function name to functionInfo
object), the librariesCtx contains also a libraries dictionary that maps library name to libraryInfo object.

## New API
### FUNCTION LOAD
`FUNCTION LOAD <ENGINE> <LIBRARY NAME> [REPLACE] [DESCRIPTION <DESCRIPTION>] <CODE>`
Create a new library with the given parameters:
* ENGINE - REPLACE Engine name to use to create the library.
* LIBRARY NAME - The new library name.
* REPLACE - If the library already exists, replace it.
* DESCRIPTION - Library description.
* CODE - Library code.

Return "OK" on success, or error on the following cases:
* Library name already taken and REPLACE was not used
* Name collision with another existing library (even if replace was uses)
* Library registration failed by the engine (usually compilation error)

## Changed API
### FUNCTION LIST
`FUNCTION LIST [LIBRARYNAME <LIBRARY NAME PATTERN>] [WITHCODE]`
Command was modified to also allow getting libraries code (so `FUNCTION INFO` command is no longer
needed and removed). In addition the command gets an option argument, `LIBRARYNAME` allows you to
only get libraries that match the given `LIBRARYNAME` pattern. By default, it returns all libraries.

### INFO MEMORY
Added number of libraries to `INFO MEMORY`

### Commands flags
`DENYOOM` flag was set on `FUNCTION LOAD` and `FUNCTION RESTORE`. We consider those commands
as commands that add new data to the dateset (functions are data) and so we want to disallows
to run those commands on OOM.

## Removed API
* FUNCTION CREATE - Decided on https://github.com/redis/redis/issues/9906
* FUNCTION INFO - Decided on https://github.com/redis/redis/issues/9899

## Lua engine changes
When the Lua engine gets the code given on `FUNCTION LOAD` command, it immediately runs it, we call
this run the loading run. Loading run is not a usual script run, it is not possible to invoke any
Redis command from within the load run.
Instead there is a new API provided by `library` object. The new API's: 
* `redis.log` - behave the same as `redis.log`
* `redis.register_function` - register a new function to the library

The loading run purpose is to register functions using the new `redis.register_function` API.
Any attempt to use any other API will result in an error. In addition, the load run is has a time
limit of 500ms, error is raise on timeout and the entire operation is aborted.

### `redis.register_function`
`redis.register_function(<function_name>, <callback>, [<description>])`
This new API allows users to register a new function that will be linked to the newly created library.
This API can only be called during the load run (see definition above). Any attempt to use it outside
of the load run will result in an error.
The parameters pass to the API are:
* function_name - Function name (must be a Lua string)
* callback - Lua function object that will be called when the function is invokes using fcall/fcall_ro
* description - Function description, optional (must be a Lua string).

### Example
The following example creates a library called `lib` with 2 functions, `f1` and `f1`, returns 1 and 2 respectively:
```
local function f1(keys, args)
    return 1
end

local function f2(keys, args)
    return 2
end

redis.register_function('f1', f1)
redis.register_function('f2', f2)
```

Notice: Unlike `eval`, functions inside a library get the KEYS and ARGV as arguments to the
functions and not as global.

### Technical Details

On the load run we only want the user to be able to call a white list on API's. This way, in
the future, if new API's will be added, the new API's will not be available to the load run
unless specifically added to this white list. We put the while list on the `library` object and
make sure the `library` object is only available to the load run by using [lua_setfenv](https://www.lua.org/manual/5.1/manual.html#lua_setfenv) API. This API allows us to set
the `globals` of a function (and all the function it creates). Before starting the load run we
create a new fresh Lua table (call it `g`) that only contains the `library` API (we make sure
to set global protection on this table just like the general global protection already exists
today), then we use [lua_setfenv](https://www.lua.org/manual/5.1/manual.html#lua_setfenv)
to set `g` as the global table of the load run. After the load run finished we update `g`
metatable and set `__index` and `__newindex` functions to be `_G` (Lua default globals),
we also pop out the `library` object as we do not need it anymore.
This way, any function that was created on the load run (and will be invoke using `fcall`) will
see the default globals as it expected to see them and will not have the `library` API anymore.

An important outcome of this new approach is that now we can achieve a distinct global table
for each library (it is not yet like that but it is very easy to achieve it now). In the future we can
decide to remove global protection because global on different libraries will not collide or we
can chose to give different API to different libraries base on some configuration or input.

Notice that this technique was meant to prevent errors and was not meant to prevent malicious
user from exploit it. For example, the load run can still save the `library` object on some local
variable and then using in `fcall` context. To prevent such a malicious use, the C code also make
sure it is running in the right context and if not raise an error.
2022-01-06 13:39:38 +02:00
filipe oliveira
5dd15443ac
Added INFO LATENCYSTATS section: latency by percentile distribution/latency by cumulative distribution of latencies (#9462)
# Short description

The Redis extended latency stats track per command latencies and enables:
- exporting the per-command percentile distribution via the `INFO LATENCYSTATS` command.
  **( percentile distribution is not mergeable between cluster nodes ).**
- exporting the per-command cumulative latency distributions via the `LATENCY HISTOGRAM` command.
  Using the cumulative distribution of latencies we can merge several stats from different cluster nodes
  to calculate aggregate metrics .

By default, the extended latency monitoring is enabled since the overhead of keeping track of the
command latency is very small.
 
If you don't want to track extended latency metrics, you can easily disable it at runtime using the command:
 - `CONFIG SET latency-tracking no`

By default, the exported latency percentiles are the p50, p99, and p999.
You can alter them at runtime using the command:
- `CONFIG SET latency-tracking-info-percentiles "0.0 50.0 100.0"`


## Some details:
- The total size per histogram should sit around 40 KiB. We only allocate those 40KiB when a command
  was called for the first time.
- With regards to the WRITE overhead As seen below, there is no measurable overhead on the achievable
  ops/sec or full latency spectrum on the client. Including also the measured redis-benchmark for unstable
  vs this branch. 
- We track from 1 nanosecond to 1 second ( everything above 1 second is considered +Inf )

## `INFO LATENCYSTATS` exposition format

   - Format: `latency_percentiles_usec_<CMDNAME>:p0=XX,p50....` 

## `LATENCY HISTOGRAM [command ...]` exposition format

Return a cumulative distribution of latencies in the format of a histogram for the specified command names.

The histogram is composed of a map of time buckets:
- Each representing a latency range, between 1 nanosecond and roughly 1 second.
- Each bucket covers twice the previous bucket's range.
- Empty buckets are not printed.
- Everything above 1 sec is considered +Inf.
- At max there will be log2(1000000000)=30 buckets

We reply a map for each command in the format:
`<command name> : { `calls`: <total command calls> , `histogram` : { <bucket 1> : latency , < bucket 2> : latency, ...  } }`

Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-05 14:01:05 +02:00
yoav-steinberg
65a7635793
redis-cli --replica reads dummy empty rdb instead of full snapshot (#10044)
This makes redis-cli --replica much faster and reduces COW/fork risks on server side.
This commit also improves the RDB filtering via REPLCONF rdb-filter-only to support no "include" specifiers at all.
2022-01-04 17:09:22 +02:00
Matthieu MOREL
d5a3b3f5ec
Setup dependabot for github-actions and codespell (#9857)
This sets up  dependabot to check weekly updates for pip and github-actions dependencies.
If it finds an update it will create a PR to update the dependency. More information can be found here

It includes the update of:

* vmactions/freebsd-vm from 0.1.4 to 0.1.5
* codespell from 2.0.0 to 2.1.0

Also includes spelling fixes found by the latest version of codespell.
Includes a dedicated .codespell folder so dependabot can read a requirements.txt file and every files dedicated to codespell can be grouped in the same place

Co-Authored-By: Matthieu MOREL <mmorel-35@users.noreply.github.com>
Co-Authored-By: MOREL Matthieu <matthieu.morel@cnp.fr>
2022-01-04 16:19:28 +02:00
guybe7
ac84b1cd82
Ban snapshot-creating commands and other admin commands from transactions (#10015)
Creating fork (or even a foreground SAVE) during a transaction breaks the atomicity of the transaction.
In addition to that, it could mess up the propagated transaction to the AOF file.

This change blocks SAVE, PSYNC, SYNC and SHUTDOWN from being executed inside MULTI-EXEC.
It does that by adding a command flag, so that modules can flag their commands with that flag too.

Besides it changes BGSAVE, BGREWRITEAOF, and CONFIG SET appendonly, to turn the
scheduled flag instead of forking righ taway.

Other changes:
* expose `protected`, `no-async-loading`, and `no_multi` flags in COMMAND command
* add a test to validate propagation of FLUSHALL inside a transaction.
* add a test to validate how CONFIG SET that errors reacts in a transaction

Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 13:37:47 +02:00
zhaozhao.zz
2e1979a21e
use startEvictionTimeProc() in config set maxmemory (#10019)
This would mean that the effects of `CONFIG SET maxmemory` may not be visible once the command returns.
That could anyway happen since incremental eviction was added in redis 6.2 (see #7653)

We do this to fix one of the propagation bugs about eviction see #9890 and #10014.
2022-01-04 13:08:10 +02:00
chenyang8094
87789fae0b
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.

The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)

The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
  it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
  one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
  incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
  `appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
  It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
  we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
  delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
  period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
  manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
  `aof-load-truncated` is enabled.

Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-03 19:14:13 +02:00
Madelyn Olson
5460c10047
Implement clusterbus message extensions and cluster hostname support (#9530)
Implement the ability for cluster nodes to advertise their location with extension messages.
2022-01-02 19:48:29 -08:00
Harkrishn Patro
9f8885760b
Sharded pubsub implementation (#8621)
This commit implements a sharded pubsub implementation based off of shard channels.

Co-authored-by: Harkrishn Patro <harkrisp@amazon.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
2022-01-02 16:54:47 -08:00
Joey from AWS
09c668f220
Report slot to keys map size in MEMORY STATS in cluster mode (#10017)
Report slot to keys map size in MEMORY STATS in cluster mode
Report dictMetadataSize in MEMORY USAGE command as well
2022-01-02 10:39:59 +02:00
Viktor Söderqvist
45a155bd0f
Wait for replicas when shutting down (#9872)
To avoid data loss, this commit adds a grace period for lagging replicas to
catch up the replication offset.

Done:

* Wait for replicas when shutdown is triggered by SIGTERM and SIGINT.

* Wait for replicas when shutdown is triggered by the SHUTDOWN command. A new
  blocked client type BLOCKED_SHUTDOWN is introduced, allowing multiple clients
  to call SHUTDOWN in parallel.
  Note that they don't expect a response unless an error happens and shutdown is aborted.

* Log warning for each replica lagging behind when finishing shutdown.

* CLIENT_PAUSE_WRITE while waiting for replicas.

* Configurable grace period 'shutdown-timeout' in seconds (default 10).

* New flags for the SHUTDOWN command:

    - NOW disables the grace period for lagging replicas.

    - FORCE ignores errors writing the RDB or AOF files which would normally
      prevent a shutdown.

    - ABORT cancels ongoing shutdown. Can't be combined with other flags.

* New field in the output of the INFO command: 'shutdown_in_milliseconds'. The
  value is the remaining maximum time to wait for lagging replicas before
  finishing the shutdown. This field is present in the Server section **only**
  during shutdown.

Not directly related:

* When shutting down, if there is an AOF saving child, it is killed **even** if AOF
  is disabled. This can happen if BGREWRITEAOF is used when AOF is off.

* Client pause now has end time and type (WRITE or ALL) per purpose. The
  different pause purposes are *CLIENT PAUSE command*, *failover* and
  *shutdown*. If clients are unpaused for one purpose, it doesn't affect client
  pause for other purposes. For example, the CLIENT UNPAUSE command doesn't
  affect client pause initiated by the failover or shutdown procedures. A completed
  failover or a failed shutdown doesn't unpause clients paused by the CLIENT
  PAUSE command.

Notes:

* DEBUG RESTART doesn't wait for replicas.

* We already have a warning logged when a replica disconnects. This means that
  if any replica connection is lost during the shutdown, it is either logged as
  disconnected or as lagging at the time of exit.

Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-02 09:50:15 +02:00
yoav-steinberg
1bf6d6f11e
Generate RDB with Functions only via redis-cli --functions-rdb (#9968)
This is needed in order to ease the deployment of functions for ephemeral cases, where user
needs to spin up a server with functions pre-loaded.

#### Details:

* Added `--functions-rdb` option to _redis-cli_.
* Functions only rdb via `REPLCONF rdb-filter-only functions`. This is a placeholder for a space
  separated inclusion filter for the RDB. In the future can be `REPLCONF rdb-filter-only
  "functions db:3 key-patten:user*"` and a complementing `rdb-filter-exclude` `REPLCONF`
  can also be added.
* Handle "slave requirements" specification to RDB saving code so we can use the same RDB
  when different slaves express the same requirements (like functions-only) and not share the
  RDB when their requirements differ. This is currently just a flags `int`, but can be extended to
  a more complex structure with various filter fields.
* make sure to support filters only in diskless replication mode (not to override the persistence file),
  we do that by forcing diskless (even if disabled by config)

other changes:
* some refactoring in rdb.c (extract portion of a big function to a sub-function)
* rdb_key_save_delay used in AOFRW too
* sendChildInfo takes the number of updated keys (incremental, rather than absolute)

Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-02 09:39:01 +02:00
Itamar Haber
aec8c57701
Add missing metadata to the commands SSOT files. (#10016)
Add missing information about commands, mainly from reviewing redis-doc and removing
the metadata from it (https://github.com/redis/redis-doc/pull/1722)

* Reintroduces CLUSTER S****S (supported by Redis) but missing from the JSON / docs (related? #9675).
  Note that without that json file, the command won't work (breaking change)
* Adds the `replicas` argument (exists in Redis) to `CLIENT KILL`.
* Adds `history` entries to several commands based on redis-doc's man pages.
* Adds `since` to applicable command arguments based on `history` (this basically makes
  some of `history` redundant - perhaps at a later stage).
* Uses proper semantic versioning in all version references.
* Also removes `geoencodeCommand` and `geodecodeCommand` header
  declarations per b96af595a5.
2021-12-29 21:57:40 +02:00
Meir Shpilraien (Spielrein)
365cbf46a7
Add FUNCTION DUMP and RESTORE. (#9938)
Follow the conclusions to support Functions in redis cluster (#9899)

Added 2 new FUNCTION sub-commands:
1. `FUNCTION DUMP` - dump a binary payload representation of all the functions.
2. `FUNCTION RESTORE <PAYLOAD> [FLUSH|APPEND|REPLACE]` - give the binary payload extracted
   using `FUNCTION DUMP`, restore all the functions on the given payload. Restore policy can be given to
   control how to handle existing functions (default is APPEND):
   * FLUSH: delete all existing functions.
   * APPEND: appends the restored functions to the existing functions. On collision, abort.
   * REPLACE: appends the restored functions to the existing functions. On collision,
     replace the old function with the new function.

Modify `redis-cli --cluster add-node` to use `FUNCTION DUMP` to get existing functions from
one of the nodes in the cluster, and `FUNCTION RESTORE` to load the same set of functions
to the new node. `redis-cli` will execute this step before sending the `CLUSTER MEET` command
to the new node. If `FUNCTION DUMP` returns an error, assume the current Redis version do not
support functions and skip `FUNCTION RESTORE`. If `FUNCTION RESTORE` fails, abort and do not send
the `CLUSTER MEET` command. If the new node already contains functions (before the `FUNCTION RESTORE`
is sent), abort and do not add the node to the cluster. Test was added to verify
`redis-cli --cluster add-node` works as expected.
2021-12-26 09:03:37 +02:00
guybe7
7ac213079c
Sort out mess around propagation and MULTI/EXEC (#9890)
The mess:
Some parts use alsoPropagate for late propagation, others using an immediate one (propagate()),
causing edge cases, ugly/hacky code, and the tendency for bugs

The basic idea is that all commands are propagated via alsoPropagate (i.e. added to a list) and the
top-most call() is responsible for going over that list and actually propagating them (and wrapping
them in MULTI/EXEC if there's more than one command). This is done in the new function,
propagatePendingCommands.

Callers to propagatePendingCommands:
1. top-most call() (we want all nested call()s to add to the also_propagate array and just the top-most
   one to propagate them) - via `afterCommand`
2. handleClientsBlockedOnKeys: it is out of call() context and it may propagate stuff - via `afterCommand`. 
3. handleClientsBlockedOnKeys edge case: if the looked-up key is already expired, we will propagate the
   expire but will not unblock any client so `afterCommand` isn't called. in that case, we have to propagate
   the deletion explicitly.
4. cron stuff: active-expire and eviction may also propagate stuff
5. modules: the module API allows to propagate stuff from just about anywhere (timers, keyspace notifications,
   threads). I could have tried to catch all the out-of-call-context places but it seemed easier to handle it in one
   place: when we free the context. in the spirit of what was done in call(), only the top-most freeing of a module
   context may cause propagation.
6. modules: when using a thread-safe ctx it's not clear when/if the ctx will be freed. we do know that the module
   must lock the GIL before calling RM_Replicate/RM_Call so we propagate the pending commands when
   releasing the GIL.

A "known limitation", which were actually a bug, was fixed because of this commit (see propagate.tcl):
   When using a mix of RM_Call with `!` and RM_Replicate, the command would propagate out-of-order:
   first all the commands from RM_Call, and then the ones from RM_Replicate

Another thing worth mentioning is that if, in the past, a client would issue a MULTI/EXEC with just one
write command the server would blindly propagate the MULTI/EXEC too, even though it's redundant.
not anymore.

This commit renames propagate() to propagateNow() in order to cause conflicts in pending PRs.
propagatePendingCommands is the only caller of propagateNow, which is now a static, internal helper function.

Optimizations:
1. alsoPropagate will not add stuff to also_propagate if there's no AOF and replicas
2. alsoPropagate reallocs also_propagagte exponentially, to save calls to memmove

Bugfixes:
1. CONFIG SET can create evictions, sending notifications which can cause to dirty++ with modules.
   we need to prevent it from propagating to AOF/replicas
2. We need to set current_client in RM_Call. buggy scenario:
   - CONFIG SET maxmemory, eviction notifications, module hook calls RM_Call
   - assertion in lookupKey crashes, because current_client has CONFIG SET, which isn't CMD_WRITE
3. minor: in eviction, call propagateDeletion after notification, like active-expire and all commands
   (we always send a notification before propagating the command)
2021-12-23 00:03:48 +02:00
Oran Agra
41e6e05dee
Allow most CONFIG SET during loading, block some commands in async-loading (#9878)
## background
Till now CONFIG SET was blocked during loading.
(In the not so distant past, GET was disallowed too)

We recently (not released yet) added an async-loading mode, see #9323,
and during that time it'll serve CONFIG SET and any other command.
And now we realized (#9770) that some configs, and commands are dangerous
during async-loading.

## changes
* Allow most CONFIG SET during loading (both on async-loading and normal loading)
* Allow CONFIG REWRITE and CONFIG RESETSTAT during loading
* Block a few config during loading (`appendonly`, `repl-diskless-load`, and `dir`)
* Block a few commands during loading (list below)

## the blocked commands:
* SAVE - obviously we don't wanna start a foregreound save during loading 8-)
* BGSAVE - we don't mind to schedule one, but we don't wanna fork now
* BGREWRITEAOF - we don't mind to schedule one, but we don't wanna fork now
* MODULE - we obviously don't wanna unload a module during replication / rdb loading
  (MODULE HELP and MODULE LIST are not blocked)
* SYNC / PSYNC - we're in the middle of RDB loading from master, must not allow sync
  requests now.
* REPLICAOF / SLAVEOF - we're in the middle of replicating, maybe it makes sense to let
  the user abort it, but he couldn't do that so far, i don't wanna take any risk of bugs due to odd state.
* CLUSTER - only allow [HELP, SLOTS, NODES, INFO, MYID, LINKS, KEYSLOT, COUNTKEYSINSLOT,
  GETKEYSINSLOT, RESET, REPLICAS, COUNT_FAILURE_REPORTS], for others, preserve the status quo

## other fixes
* processEventsWhileBlocked had an issue when being nested, this could happen with a busy script
  during async loading (new), but also in a busy script during AOF loading (old). this lead to a crash in
  the scenario described in #6988
2021-12-22 14:11:16 +02:00
Meir Shpilraien (Spielrein)
3bcf108416
Change FUNCTION CREATE, DELETE and FLUSH to be WRITE commands instead of MAY_REPLICATE. (#9953)
The issue with MAY_REPLICATE is that all automatic mechanisms to handle
write commands will not work. This require have a special treatment for:
* Not allow those commands to be executed on RO replica.
* Allow those commands to be executed on RO replica from primary connection.
* Allow those commands to be executed on the RO replica from AOF.

By setting those commands as WRITE commands we are getting all those properties from Redis.
Test was added to verify that those properties work as expected.

In addition, rearrange when and where functions are flushed. Before this PR functions were
flushed manually on `rdbLoadRio` and cleaned manually on failure. This contradicts the
assumptions that functions are data and need to be created/deleted alongside with the
data. A side effect of this, for example, `debug reload noflush` did not flush the data but
did flush the functions, `debug loadaof` flush the data but not the functions.
This PR move functions deletion into `emptyDb`. `emptyDb` (renamed to `emptyData`) will
now accept an additional flag, `NOFUNCTIONS` which specifically indicate that we do not
want to flush the functions (on all other cases, functions will be flushed). Used the new flag
on FLUSHALL and FLUSHDB only! Tests were added to `debug reload` and `debug loadaof`
to verify that functions behave the same as the data.

Notice that because now functions will be deleted along side with the data we can not allow
`CLUSTER RESET` to be called from within a function (it will cause the function to be released
while running), this PR adds `NO_SCRIPT` flag to `CLUSTER RESET`  so it will not be possible
to be called from within a function. The other cluster commands are allowed from within a
function (there are use-cases that uses `GETKEYSINSLOT` to iterate over all the keys on a
given slot). Tests was added to verify `CLUSTER RESET` is denied from within a script.

Another small change on this PR is that `RDBFLAGS_ALLOW_DUP` is also applicable on functions.
When loading functions, if this flag is set, we will replace old functions with new ones on collisions.
2021-12-21 16:13:29 +02:00
zhugezy
1b0968df46
Remove EVAL script verbatim replication, propagation, and deterministic execution logic (#9812)
# Background

The main goal of this PR is to remove relevant logics on Lua script verbatim replication,
only keeping effects replication logic, which has been set as default since Redis 5.0.
As a result, Lua in Redis 7.0 would be acting the same as Redis 6.0 with default
configuration from users' point of view.

There are lots of reasons to remove verbatim replication.
Antirez has listed some of the benefits in Issue #5292:

>1. No longer need to explain to users side effects into scripts.
    They can do whatever they want.
>2. No need for a cache about scripts that we sent or not to the slaves.
>3. No need to sort the output of certain commands inside scripts
    (SMEMBERS and others): this both simplifies and gains speed.
>4. No need to store scripts inside the RDB file in order to startup correctly.
>5. No problems about evicting keys during the script execution.

When looking back at Redis 5.0, antirez and core team decided to set the config
`lua-replicate-commands yes` by default instead of removing verbatim replication
directly, in case some bad situations happened. 3 years later now before Redis 7.0,
it's time to remove it formally.

# Changes

- configuration for lua-replicate-commands removed
  - created config file stub for backward compatibility
- Replication script cache removed
  - this is useless under script effects replication
  - relevant statistics also removed
- script persistence in RDB files is also removed
- Propagation of SCRIPT LOAD and SCRIPT FLUSH to replica / AOF removed
- Deterministic execution logic in scripts removed (i.e. don't run write commands
  after random ones, and sorting output of commands with random order)
  - the flags indicating which commands have non-deterministic results are kept as hints to clients.
- `redis.replicate_commands()` & `redis.set_repl()` changed
  - now `redis.replicate_commands()` does nothing and return an 1
  - ...and then `redis.set_repl()` can be issued before `redis.replicate_commands()` now
- Relevant TCL cases adjusted
- DEBUG lua-always-replicate-commands removed

# Other changes
- Fix a recent bug comparing CLIENT_ID_AOF to original_client->flags instead of id. (introduced in #9780)

Co-authored-by: Oran Agra <oran@redislabs.com>
2021-12-21 08:32:42 +02:00
YaacovHazan
ae2f5b7b2e
Protected configs and sensitive commands (#9920)
Block sensitive configs and commands by default.

* `enable-protected-configs` - block modification of configs with the new `PROTECTED_CONFIG` flag.
   Currently we add this flag to `dbfilename`, and `dir` configs,
   all of which are non-mutable configs that can set a file redis will write to.
* `enable-debug-command` - block the `DEBUG` command
* `enable-module-command` - block the `MODULE` command

These have a default value set to `no`, so that these features are not
exposed by default to client connections, and can only be set by modifying the config file.

Users can change each of these to either `yes` (allow all access), or `local` (allow access from
local TCP connections and unix domain connections)

Note that this is a **breaking change** (specifically the part about MODULE command being disabled by default).
I.e. we don't consider DEBUG command being blocked as an issue (people shouldn't have been using it),
and the few configs we protected are unlikely to have been set at runtime anyway.
On the other hand, it's likely to assume some users who use modules, load them from the config file anyway.
Note that's the whole point of this PR, for redis to be more secure by default and reduce the attack surface on
innocent users, so secure defaults will necessarily mean a breaking change.
2021-12-19 10:46:16 +02:00
ny0312
792afb4432
Introduce memory management on cluster link buffers (#9774)
Introduce memory management on cluster link buffers:
 * Introduce a new `cluster-link-sendbuf-limit` config that caps memory usage of cluster bus link send buffers.
 * Introduce a new `CLUSTER LINKS` command that displays current TCP links to/from peers.
 * Introduce a new `mem_cluster_links` field under `INFO` command output, which displays the overall memory usage by all current cluster links.
 * Introduce a new `total_cluster_links_buffer_limit_exceeded` field under `CLUSTER INFO` command output, which displays the accumulated count of cluster links freed due to `cluster-link-sendbuf-limit`.
2021-12-16 21:56:59 -08:00
Meir Shpilraien (Spielrein)
687210f155
Add FUNCTION FLUSH command to flush all functions (#9936)
Added `FUNCTION FLUSH` command. The new sub-command allows delete all the functions.
An optional `[SYNC|ASYNC]` argument can be given to control whether or not to flush the
functions synchronously or asynchronously. if not given the default flush mode is chosen by
`lazyfree-lazy-user-flush` configuration values.

Add the missing `functions.tcl` test to the list of tests that are executed in test_helper.tcl,
and call FUNCTION FLUSH in between servers in external mode
2021-12-16 17:58:25 +02:00
guybe7
867816003e
Auto-generate the command table from JSON files (#9656)
Delete the hardcoded command table and replace it with an auto-generated table, based
on a JSON file that describes the commands (each command must have a JSON file).

These JSON files are the SSOT of everything there is to know about Redis commands,
and it is reflected fully in COMMAND INFO.

These JSON files are used to generate commands.c (using a python script), which is then
committed to the repo and compiled.

The purpose is:
* Clients and proxies will be able to get much more info from redis, instead of relying on hard coded logic.
* drop the dependency between Redis-user and the commands.json in redis-doc.
* delete help.h and have redis-cli learn everything it needs to know just by issuing COMMAND (will be
  done in a separate PR)
* redis.io should stop using commands.json and learn everything from Redis (ultimately one of the release
  artifacts should be a large JSON, containing all the information about all of the commands, which will be
  generated from COMMAND's reply)
* the byproduct of this is:
  * module commands will be able to provide that info and possibly be more of a first-class citizens
  * in theory, one may be able to generate a redis client library for a strictly typed language, by using this info.

### Interface changes

#### COMMAND INFO's reply change (and arg-less COMMAND)

Before this commit the reply at index 7 contained the key-specs list
and reply at index 8 contained the sub-commands list (Both unreleased).
Now, reply at index 7 is a map of:
- summary - short command description
- since - debut version
- group - command group
- complexity - complexity string
- doc-flags - flags used for documentation (e.g. "deprecated")
- deprecated-since - if deprecated, from which version?
- replaced-by - if deprecated, which command replaced it?
- history - a list of (version, what-changed) tuples
- hints - a list of strings, meant to provide hints for clients/proxies. see https://github.com/redis/redis/issues/9876
- arguments - an array of arguments. each element is a map, with the possibility of nesting (sub-arguments)
- key-specs - an array of keys specs (already in unstable, just changed location)
- subcommands - a list of sub-commands (already in unstable, just changed location)
- reply-schema - will be added in the future (see https://github.com/redis/redis/issues/9845)

more details on these can be found in https://github.com/redis/redis-doc/pull/1697

only the first three fields are mandatory 

#### API changes (unreleased API obviously)

now they take RedisModuleCommand opaque pointer instead of looking up the command by name

- RM_CreateSubcommand
- RM_AddCommandKeySpec
- RM_SetCommandKeySpecBeginSearchIndex
- RM_SetCommandKeySpecBeginSearchKeyword
- RM_SetCommandKeySpecFindKeysRange
- RM_SetCommandKeySpecFindKeysKeynum

Currently, we did not add module API to provide additional information about their commands because
we couldn't agree on how the API should look like, see https://github.com/redis/redis/issues/9944.

### Somehow related changes
1. Literals should be in uppercase while placeholder in lowercase. Now all the GEO* command
   will be documented with M|KM|FT|MI and can take both lowercase and uppercase

### Unrelated changes
1. Bugfix: no_madaory_keys was absent in COMMAND's reply
2. expose CMD_MODULE as "module" via COMMAND
3. have a dedicated uint64 for ACL categories (instead of having them in the same uint64 as command flags)

Co-authored-by: Itamar Haber <itamar@garantiadata.com>
2021-12-15 21:23:15 +02:00
yoav-steinberg
1736fa4d22
Don't write oom score adj to proc unless we're managing it. (#9904)
When disabling redis oom-score-adj managment we restore the
base value read before enabling oom-score-adj management.

This fixes an issue introduced in #9748 where updating
`oom-score-adj-values` while `oom-score-adj` was set to `no`
would write the base oom score adj value read on startup to `/proc`.
This is a bug since while `oom-score-adj` is disabled we should
never write to proc and let external processes manage it.

Added appropriate tests.
2021-12-07 16:05:51 +02:00
meir@redislabs.com
cbd463175f Redis Functions - Added redis function unit and Lua engine
Redis function unit is located inside functions.c
and contains Redis Function implementation:
1. FUNCTION commands:
  * FUNCTION CREATE
  * FCALL
  * FCALL_RO
  * FUNCTION DELETE
  * FUNCTION KILL
  * FUNCTION INFO
2. Register engine

In addition, this commit introduce the first engine
that uses the Redis Function capabilities, the
Lua engine.
2021-12-02 19:35:52 +02:00
meir@redislabs.com
f21dc38a6e Redis Functions - Moved invoke Lua code functionality to script_lua.c
The functionality was moved to script_lua.c under
callFunction function. Its purpose is to call the Lua
function already located on the top of the Lua stack.

Used the new function on eval.c to invoke Lua code.
The function will also be used to invoke Lua
code on the Lua engine.
2021-12-01 23:57:06 +02:00
meir@redislabs.com
fc731bc67f Redis Functions - Introduce script unit.
Script unit is a new unit located on script.c.
Its purpose is to provides an API for functions (and eval)
to interact with Redis. Interaction includes mostly
executing commands, but also functionalities like calling
Redis back on long scripts or check if the script was killed.

The interaction is done using a scriptRunCtx object that
need to be created by the user and initialized using scriptPrepareForRun.

Detailed list of functionalities expose by the unit:
1. Calling commands (including all the validation checks such as
   acl, cluster, read only run, ...)
2. Set Resp
3. Set Replication method (AOF/REPLICATION/NONE)
4. Call Redis back to on long running scripts to allow Redis reply
   to clients and perform script kill

The commit introduce the new unit and uses it on eval commands to
interact with Redis.
2021-12-01 23:54:23 +02:00
meir@redislabs.com
e0cd580aef Redis Functions - Move Lua related variable into luaCtx struct
The following variable was renamed:
1. lua_caller 			-> script_caller
2. lua_time_limit 		-> script_time_limit
3. lua_timedout 		-> script_timedout
4. lua_oom 			-> script_oom
5. lua_disable_deny_script 	-> script_disable_deny_script
6. in_eval			-> in_script

The following variables was moved to lctx under eval.c
1.  lua
2.  lua_client
3.  lua_cur_script
4.  lua_scripts
5.  lua_scripts_mem
6.  lua_replicate_commands
7.  lua_write_dirty
8.  lua_random_dirty
9.  lua_multi_emitted
10. lua_repl
11. lua_kill
12. lua_time_start
13. lua_time_snapshot

This commit is in a low risk of introducing any issues and it
is just moving varibales around and not changing any logic.
2021-12-01 23:31:08 +02:00
yoav-steinberg
0e5b813ef9
Multiparam config set (#9748)
We can now do: `config set maxmemory 10m repl-backlog-size 5m`

## Basic algorithm to support "transaction like" config sets:

1. Backup all relevant current values (via get).
2. Run "verify" and "set" on everything, if we fail run "restore".
3. Run "apply" on everything (optional optimization: skip functions already run). If we fail run "restore".
4. Return success.

### restore
1. Run set on everything in backup. If we fail log it and continue (this puts us in an undefined
   state but we decided it's better than the alternative of panicking). This indicates either a bug
   or some unsupported external state.
2. Run apply on everything in backup (optimization: skip functions already run). If we fail log
   it (see comment above).
3. Return error.

## Implementation/design changes:
* Apply function are idempotent (have no effect if they are run more than once for the same config).
* No indication in set functions if we're reading the config or running from the `CONFIG SET` command
   (removed `update` argument).
* Set function should set some config variable and assume an (optional) apply function will use that
   later to apply. If we know this setting can be safely applied immediately and can always be reverted
   and doesn't depend on any other configuration we can apply immediately from within the set function
   (and not store the setting anywhere). This is the case of this `dir` config, for example, which has no
   apply function. No apply function is need also in the case that setting the variable in the `server` struct
   is all that needs to be done to make the configuration take effect. Note that the original concept of `update_fn`,
   which received the old and new values was removed and replaced by the optional apply function.
* Apply functions use settings written to the `server` struct and don't receive any inputs.
* I take care that for the generic (non-special) configs if there's no change I avoid calling the setter (possible
   optimization: avoid calling the apply function as well).
* Passing the same config parameter more than once to `config set` will fail. You can't do `config set my-setting
   value1 my-setting value2`.

Note that getting `save` in the context of the conf file parsing to work here as before was a pain.
The conf file supports an aggregate `save` definition, where each `save` line is added to the server's
save params. This is unlike any other line in the config file where each line overwrites any previous
configuration. Since we now support passing multiple save params in a single line (see top comments
about `save` in https://github.com/redis/redis/pull/9644) we should deprecate the aggregate nature of
this config line and perhaps reduce this ugly code in the future.
2021-12-01 10:15:11 +02:00
丽媛自己动
af072c26bc
modify misleading note in comment (#9865)
now rdbSaveInfo used in both way, so i think we should update previous notes, in case of misleading
2021-11-30 14:10:49 +02:00
Viktor Söderqvist
acf3495eb8
Sort out the mess around writable replicas and lookupKeyRead/Write (#9572)
Writable replicas now no longer use the values of expired keys. Expired keys are
deleted when lookupKeyWrite() is used, even on a writable replica. Previously,
writable replicas could use the value of an expired key in write commands such
as INCR, SUNIONSTORE, etc..

This commit also sorts out the mess around the functions lookupKeyRead() and
lookupKeyWrite() so they now indicate what we intend to do with the key and
are not affected by the command calling them.

Multi-key commands like SUNIONSTORE, ZUNIONSTORE, COPY and SORT with the
store option now use lookupKeyRead() for the keys they're reading from (which will
not allow reading from logically expired keys).

This commit also fixes a bug where PFCOUNT could return a value of an
expired key.

Test modules commands have their readonly and write flags updated to correctly
reflect their lookups for reading or writing. Modules are not required to
correctly reflect this in their command flags, but this change is made for
consistency since the tests serve as usage examples.

Fixes #6842. Fixes #7475.
2021-11-28 11:26:28 +02:00
sundb
4d8700786e
Fix COMMAND GETKEYS on LCS (#9852)
Remove lcsGetKeys to clean up the remaining STRALGO after #9733.
i.e. it still used a getkeys_proc which was still looking for the KEYS or STRINGS arguments
2021-11-28 09:02:38 +02:00
Huang Zhw
596635fa0c
Cleanup: fix typo and remove some obsoleting definitions. (#9851) 2021-11-27 14:47:51 -08:00
sundb
4512905961
Replace ziplist with listpack in quicklist (#9740)
Part three of implementing #8702, following #8887 and #9366 .

## Description of the feature
1. Replace the ziplist container of quicklist with listpack.
2. Convert existing quicklist ziplists on RDB loading time. an O(n) operation.

## Interface changes
1. New `list-max-listpack-size` config is an alias for `list-max-ziplist-size`.
2. Replace `debug ziplist` command with `debug listpack`.

## Internal changes
1. Add `lpMerge` to merge two listpacks . (same as `ziplistMerge`)
2. Add `lpRepr` to print info of listpack which is used in debugCommand and `quicklistRepr`. (same as `ziplistRepr`)
3. Replace `QUICKLIST_NODE_CONTAINER_ZIPLIST` with `QUICKLIST_NODE_CONTAINER_PACKED`(following #9357 ).
    It represent that a quicklistNode is a packed node, as opposed to a plain node.
4. Remove `createZiplistObject` method, which is never used.
5. Calculate listpack entry size using overhead overestimation in `quicklistAllowInsert`.
    We prefer an overestimation, which would at worse lead to a few bytes below the lowest limit of 4k.

## Improvements
1. Calling `lpShrinkToFit` after converting Ziplist to listpack, which was missed at #9366.
2. Optimize `quicklistAppendPlainNode` to avoid memcpy data.

## Bugfix
1. Fix crash in `quicklistRepr` when ziplist is compressed, introduced from #9366.

## Test
1. Add unittest for `lpMerge`.
2. Modify the old quicklist ziplist corrupt dump test.

Co-authored-by: Oran Agra <oran@redislabs.com>
2021-11-24 13:34:13 +02:00
guybe7
b161cff5f9
QUIT is a command, HOST: and POST are not (#9798)
Some people complain that QUIT is missing from help/command table.
Not appearing in COMMAND command, command stats, ACL, etc.
and instead, there's a hack in processCommand with a comment that looks outdated.
Note that it is [documented](https://redis.io/commands/quit)

At the same time, HOST: and POST are there in the command table although these are not real commands.
They would appear in the COMMAND command, and even in commandstats.

Other changes:
1. Initialize the static logged_time static var in securityWarningCommand
2. add `no-auth` flag to RESET so it can always be executed.
2021-11-23 10:38:25 +02:00
guybe7
af7489886d
Obliterate STRALGO! add LCS (which only works on keys) (#9799)
Drop the STRALGO command, now LCS is a command of its own and it only works on keys (not input strings).
The motivation is that STRALGO's syntax was really messed-up...
- assumes all (future) string algorithms will take similar arguments
- mixes command that takes keys and one that doesn't in the same command.
- make it nearly impossible to expose the right key spec in COMMAND INFO (issues cluster clients)
- hard for cluster clients to determine the key names (firstkey, lastkey, etc)
- hard for ACL / flags (is it a read command?)

This is a breaking change.
2021-11-18 10:47:49 +02:00
Binbin
91e77a0cfb
Fixes ZPOPMIN/ZPOPMAX wrong replies when count is 0 with non-zset (#9711)
Moves ZPOP ... 0 fast exit path after type check to reply with
WRONGTYPE. In the past it will return an empty array.

Also now count is not allowed to be negative.

see #9680

before:
```
127.0.0.1:6379> set zset str
OK
127.0.0.1:6379> zpopmin zset 0
(empty array)
127.0.0.1:6379> zpopmin zset -1
(empty array)
```

after:
```
127.0.0.1:6379> set zset str
OK
127.0.0.1:6379> zpopmin zset 0
(error) WRONGTYPE Operation against a key holding the wrong kind of value
127.0.0.1:6379> zpopmin zset -1
(error) ERR value is out of range, must be positive
```
2021-11-18 10:13:16 +02:00
yoav-steinberg
79ac57561f
Refactor config.c for generic setter interface (#9644)
This refactors all `CONFIG SET`s and conf file loading arguments go through
the generic config handling interface.

Refactoring changes:
- All config params go through the `standardConfig` interface (some stuff which
  is only related to the config file and not the `CONFIG` command still has special
  handling for rewrite/config file parsing, `loadmodule`, for example.) .
- Added `MULTI_ARG_CONFIG` flag for configs to signify they receive a variable
  number of arguments instead of a single argument. This is used to break up space
  separated arguments to `CONFIG SET` so the generic setter interface can pass
  multiple arguments to the setter function. When parsing the config file we also break
  up anything after the config name into multiple arguments to the setter function.

Interface changes:
- A side effect of the above interface is that the `bind` argument in the config file can
  be empty (no argument at all) this is treated the same as passing an single empty
  string argument (same as `save` already used to work).
- Support rewrite and setting `watchdog-period` from config file (was only supported
  by the CONFIG command till now).
- Another side effect is that the `save T X` config argument now supports multiple
  Time-Changes pairs in a single line like its `CONFIG SET` counterpart. So in the
  config file you can either do:
  ```
  save 3600 1
  save 600 10
  ```
  or do
  ```
  save 3600 1 600 10
  ```

Co-authored-by: Bjorn Svensson <bjorn.a.svensson@est.tech>
2021-11-07 13:40:08 +02:00
Eduardo Semprebon
91d0c758e5
Replica keep serving data during repl-diskless-load=swapdb for better availability (#9323)
For diskless replication in swapdb mode, considering we already spend replica memory
having a backup of current db to restore in case of failure, we can have the following benefits
by instead swapping database only in case we succeeded in transferring db from master:

- Avoid `LOADING` response during failed and successful synchronization for cases where the
  replica is already up and running with data.
- Faster total time of diskless replication, because now we're moving from Transfer + Flush + Load
  time to Transfer + Load only. Flushing the tempDb is done asynchronously after swapping.
- This could be implemented also for disk replication with similar benefits if consumers are willing
  to spend the extra memory usage.

General notes:
- The concept of `backupDb` becomes `tempDb` for clarity.
- Async loading mode will only kick in if the replica is syncing from a master that has the same
  repl-id the one it had before. i.e. the data it's getting belongs to a different time of the same timeline. 
- New property in INFO: `async_loading` to differentiate from the blocking loading
- Slot to Key mapping is now a field of `redisDb` as it's more natural to access it from both server.db
  and the tempDb that is passed around.
- Because this is affecting replicas only, we assume that if they are not readonly and write commands
  during replication, they are lost after SYNC same way as before, but we're still denying CONFIG SET
  here anyways to avoid complications.

Considerations for review:
- We have many cases where server.loading flag is used and even though I tried my best, there may
  be cases where async_loading should be checked as well and cases where it shouldn't (would require
  very good understanding of whole code)
- Several places that had different behavior depending on the loading flag where actually meant to just
  handle commands coming from the AOF client differently than ones coming from real clients, changed
  to check CLIENT_ID_AOF instead.

**Additional for Release Notes**
- Bugfix - server.dirty was not incremented for any kind of diskless replication, as effect it wouldn't
  contribute on triggering next database SAVE
- New flag for RM_GetContextFlags module API: REDISMODULE_CTX_FLAGS_ASYNC_LOADING
- Deprecated RedisModuleEvent_ReplBackup. Starting from Redis 7.0, we don't fire this event.
  Instead, we have the new RedisModuleEvent_ReplAsyncLoad holding 3 sub-events: STARTED,
  ABORTED and COMPLETED.
- New module flag REDISMODULE_OPTIONS_HANDLE_REPL_ASYNC_LOAD for RedisModule_SetModuleOptions
  to allow modules to declare they support the diskless replication with async loading (when absent, we fall
  back to disk-based loading).

Co-authored-by: Eduardo Semprebon <edus@saxobank.com>
Co-authored-by: Oran Agra <oran@redislabs.com>
2021-11-04 10:46:50 +02:00
guybe7
f11a2d4dd7
Fix COMMAND GETKEYS on EVAL without keys (#9733)
Add new no-mandatory-keys flag to support COMMAND GETKEYS of commands
which have no mandatory keys.

In the past we would have got this error:
```
127.0.0.1:6379> command getkeys eval "return 1" 0
(error) ERR Invalid arguments specified for command
```
2021-11-03 14:38:26 +02:00
perryitay
77d3c6bff3
fix: lookupKey on SETNX and SETXX only once (#9640)
When using SETNX and SETXX we could end up doing key lookup twice.
This presents a small inefficiency price.
Also once we have statistics of write hit and miss they'll be wrong (recording the same key hit twice)
2021-11-03 14:12:33 +02:00
zhaozhao.zz
d08f0552ee
rebuild replication backlog index when master restart (#9720)
After PR #9166 , replication backlog is not a real block of memory, just contains a
reference points to replication buffer's block and the blocks index (to accelerate
search offset when partial sync), so we need update both replication buffer's block's
offset and replication backlog blocks index's offset when master restart from RDB,
since the `server.master_repl_offset` is changed.
The implications of this bug was just a slow search, but not a replication failure.
2021-11-02 10:53:52 +02:00
Wang Yuan
9ec3294b97
Add timestamp annotations in AOF (#9326)
Add timestamp annotation in AOF, one part of #9325.

Enabled with the new `aof-timestamp-enabled` config option.

Timestamp annotation format is "#TS:${timestamp}\r\n"."
TS" is short of timestamp and this method could save extra bytes in AOF.

We can use timestamp annotation for some special functions. 
- know the executing time of commands
- restore data to a specific point-in-time (by using redis-check-rdb to truncate the file)
2021-10-25 13:08:34 +03:00
Oran Agra
085615af97
Improve code doc of allowed_firstargs following #9504 (#9674)
Improve code doc for allowed_firstargs (used to be allowed_commands before #9504.
I don't think the text in the code needs to refer to the history (it's not there just for backwards compatibility).
instead it should just describe what it does.
2021-10-25 13:01:25 +03:00
Wang Yuan
c1718f9d86
Replication backlog and replicas use one global shared replication buffer (#9166)
## Background
For redis master, one replica uses one copy of replication buffer, that is a big waste of memory,
more replicas more waste, and allocate/free memory for every reply list also cost much.
If we set client-output-buffer-limit small and write traffic is heavy, master may disconnect with
replicas and can't finish synchronization with replica. If we set  client-output-buffer-limit big,
master may be OOM when there are many replicas that separately keep much memory.
Because replication buffers of different replica client are the same, one simple idea is that
all replicas only use one replication buffer, that will effectively save memory.

Since replication backlog content is the same as replicas' output buffer, now we
can discard replication backlog memory and use global shared replication buffer
to implement replication backlog mechanism.

## Implementation
I create one global "replication buffer" which contains content of replication stream.
The structure of "replication buffer" is similar to the reply list that exists in every client.
But the node of list is `replBufBlock`, which has `id, repl_offset, refcount` fields.
```c
/* Replication buffer blocks is the list of replBufBlock.
 *
 * +--------------+       +--------------+       +--------------+
 * | refcount = 1 |  ...  | refcount = 0 |  ...  | refcount = 2 |
 * +--------------+       +--------------+       +--------------+
 *      |                                            /       \
 *      |                                           /         \
 *      |                                          /           \
 *  Repl Backlog                               Replia_A      Replia_B
 * 
 * Each replica or replication backlog increments only the refcount of the
 * 'ref_repl_buf_node' which it points to. So when replica walks to the next
 * node, it should first increase the next node's refcount, and when we trim
 * the replication buffer nodes, we remove node always from the head node which
 * refcount is 0. If the refcount of the head node is not 0, we must stop
 * trimming and never iterate the next node. */

/* Similar with 'clientReplyBlock', it is used for shared buffers between
 * all replica clients and replication backlog. */
typedef struct replBufBlock {
    int refcount;           /* Number of replicas or repl backlog using. */
    long long id;           /* The unique incremental number. */
    long long repl_offset;  /* Start replication offset of the block. */
    size_t size, used;
    char buf[];
} replBufBlock;
```
So now when we feed replication stream into replication backlog and all replicas, we only need
to feed stream into replication buffer `feedReplicationBuffer`. In this function, we set some fields of
replication backlog and replicas to references of the global replication buffer blocks. And we also
need to check replicas' output buffer limit to free if exceeding `client-output-buffer-limit`, and trim
replication backlog if exceeding `repl-backlog-size`.

When sending reply to replicas, we also need to iterate replication buffer blocks and send its
content, when totally sending one block for replica, we decrease current node count and
increase the next current node count, and then free the block which reference is 0 from the
head of replication buffer blocks.

Since now we use linked list to manage replication backlog, it may cost much time for iterating
all linked list nodes to find corresponding replication buffer node. So we create a rax tree to
store some nodes  for index, but to avoid rax tree occupying too much memory, i record
one per 64 nodes for index.

Currently, to make partial resynchronization as possible as much, we always let replication
backlog as the last reference of replication buffer blocks, backlog size may exceeds our setting
if slow replicas that reference vast replication buffer blocks, and this method doesn't increase
memory usage since they share replication buffer. To avoid freezing server for freeing unreferenced
replication buffer blocks when we need to trim backlog for exceeding backlog size setting,
we trim backlog incrementally (free 64 blocks per call now), and make it faster in
`beforeSleep` (free 640 blocks).

### Other changes
- `mem_total_replication_buffers`: we add this field in INFO command, it means the total
  memory of replication buffers used.
- `mem_clients_slaves`:  now even replica is slow to replicate, and its output buffer memory
  is not 0, but it still may be 0, since replication backlog and replicas share one global replication
  buffer, only if replication buffer memory is more than the repl backlog setting size, we consider
  the excess as replicas' memory. Otherwise, we think replication buffer memory is the consumption
  of repl backlog.
- Key eviction
  Since all replicas and replication backlog share global replication buffer, we think only the
  part of exceeding backlog size the extra separate consumption of replicas.
  Because we trim backlog incrementally in the background, backlog size may exceeds our
  setting if slow replicas that reference vast replication buffer blocks disconnect.
  To avoid massive eviction loop, we don't count the delayed freed replication backlog into
  used memory even if there are no replicas, i.e. we also regard this memory as replicas's memory.
- `client-output-buffer-limit` check for replica clients
  It doesn't make sense to set the replica clients output buffer limit lower than the repl-backlog-size
  config (partial sync will succeed and then replica will get disconnected). Such a configuration is
  ignored (the size of repl-backlog-size will be used). This doesn't have memory consumption
  implications since the replica client will share the backlog buffers memory.
- Drop replication backlog after loading data if needed
  We always create replication backlog if server is a master, we need it because we put DELs in
  it when loading expired keys in RDB, but if RDB doesn't have replication info or there is no rdb,
  it is not possible to support partial resynchronization, to avoid extra memory of replication backlog,
  we drop it.
- Multi IO threads
 Since all replicas and replication backlog use global replication buffer,  if I/O threads are enabled,
  to guarantee data accessing thread safe, we must let main thread handle sending the output buffer
  to all replicas. But before, other IO threads could handle sending output buffer of all replicas.

## Other optimizations
This solution resolve some other problem:
- When replicas disconnect with master since of out of output buffer limit, releasing the output
  buffer of replicas may freeze server if we set big `client-output-buffer-limit` for replicas, but now,
  it doesn't cause freezing.
- This implementation may mitigate reply list copy cost time(also freezes server) when one replication
  has huge reply buffer and another replica can copy buffer for full synchronization. now, we just copy
  reference info, it is very light.
- If we set replication backlog size big, it also may cost much time to copy replication backlog into
  replica's output buffer. But this commit eliminates this problem.
- Resizing replication backlog size doesn't empty current replication backlog content.
2021-10-25 09:24:31 +03:00
guybe7
43e736f79b
Treat subcommands as commands (#9504)
## Intro

The purpose is to allow having different flags/ACL categories for
subcommands (Example: CONFIG GET is ok-loading but CONFIG SET isn't)

We create a small command table for every command that has subcommands
and each subcommand has its own flags, etc. (same as a "regular" command)

This commit also unites the Redis and the Sentinel command tables

## Affected commands

CONFIG
Used to have "admin ok-loading ok-stale no-script"
Changes:
1. Dropped "ok-loading" in all except GET (this doesn't change behavior since
there were checks in the code doing that)

XINFO
Used to have "read-only random"
Changes:
1. Dropped "random" in all except CONSUMERS

XGROUP
Used to have "write use-memory"
Changes:
1. Dropped "use-memory" in all except CREATE and CREATECONSUMER

COMMAND
No changes.

MEMORY
Used to have "random read-only"
Changes:
1. Dropped "random" in PURGE and USAGE

ACL
Used to have "admin no-script ok-loading ok-stale"
Changes:
1. Dropped "admin" in WHOAMI, GENPASS, and CAT

LATENCY
No changes.

MODULE
No changes.

SLOWLOG
Used to have "admin random ok-loading ok-stale"
Changes:
1. Dropped "random" in RESET

OBJECT
Used to have "read-only random"
Changes:
1. Dropped "random" in ENCODING and REFCOUNT

SCRIPT
Used to have "may-replicate no-script"
Changes:
1. Dropped "may-replicate" in all except FLUSH and LOAD

CLIENT
Used to have "admin no-script random ok-loading ok-stale"
Changes:
1. Dropped "random" in all except INFO and LIST
2. Dropped "admin" in ID, TRACKING, CACHING, GETREDIR, INFO, SETNAME, GETNAME, and REPLY

STRALGO
No changes.

PUBSUB
No changes.

CLUSTER
Changes:
1. Dropped "admin in countkeysinslots, getkeysinslot, info, nodes, keyslot, myid, and slots

SENTINEL
No changes.

(note that DEBUG also fits, but we decided not to convert it since it's for
debugging and anyway undocumented)

## New sub-command
This commit adds another element to the per-command output of COMMAND,
describing the list of subcommands, if any (in the same structure as "regular" commands)
Also, it adds a new subcommand:
```
COMMAND LIST [FILTERBY (MODULE <module-name>|ACLCAT <cat>|PATTERN <pattern>)]
```
which returns a set of all commands (unless filters), but excluding subcommands.

## Module API
A new module API, RM_CreateSubcommand, was added, in order to allow
module writer to define subcommands

## ACL changes:
1. Now, that each subcommand is actually a command, each has its own ACL id.
2. The old mechanism of allowed_subcommands is redundant
(blocking/allowing a subcommand is the same as blocking/allowing a regular command),
but we had to keep it, to support the widespread usage of allowed_subcommands
to block commands with certain args, that aren't subcommands (e.g. "-select +select|0").
3. I have renamed allowed_subcommands to allowed_firstargs to emphasize the difference.
4. Because subcommands are commands in ACL too, you can now use "-" to block subcommands
(e.g. "+client -client|kill"), which wasn't possible in the past.
5. It is also possible to use the allowed_firstargs mechanism with subcommand.
For example: `+config -config|set +config|set|loglevel` will block all CONFIG SET except
for setting the log level.
6. All of the ACL changes above required some amount of refactoring.

## Misc
1. There are two approaches: Either each subcommand has its own function or all
   subcommands use the same function, determining what to do according to argv[0].
   For now, I took the former approaches only with CONFIG and COMMAND,
   while other commands use the latter approach (for smaller blamelog diff).
2. Deleted memoryGetKeys: It is no longer needed because MEMORY USAGE now uses the "range" key spec.
4. Bugfix: GETNAME was missing from CLIENT's help message.
5. Sentinel and Redis now use the same table, with the same function pointer.
   Some commands have a different implementation in Sentinel, so we redirect
   them (these are ROLE, PUBLISH, and INFO).
6. Command stats now show the stats per subcommand (e.g. instead of stats just
   for "config" you will have stats for "config|set", "config|get", etc.)
7. It is now possible to use COMMAND directly on subcommands:
   COMMAND INFO CONFIG|GET (The pipeline syntax was inspired from ACL, and
   can be used in functions lookupCommandBySds and lookupCommandByCString)
8. STRALGO is now a container command (has "help")

## Breaking changes:
1. Command stats now show the stats per subcommand (see (5) above)
2021-10-20 11:52:57 +03:00
Bjorn Svensson
c9fabc2ef0
Move config unixsocketperm to generic configs (#9607)
Since the size of mode_t is platform dependant we handle the
`unixsocketperm` configuration as a generic int type.
mode_t is either an unsigned int or unsigned short (macOS) and
the range-limits allows for a simple cast to a mode_t.
2021-10-18 23:58:52 -07:00
Wen Hui
1c2b5f5318
Make Cluster-bus port configurable with new cluster-port config (#9389)
Make Cluster-bus port configurable with new cluster-port config
2021-10-18 22:28:27 -07:00
Hanna Fadida
61bb044156
Modify mem_usage2 module callback to enable to take sample_size argument (#9612)
This is useful for approximating size computation of complex module types.
Note that the mem_usage2 callback is new and has not been released yet, which is why we can modify it.
2021-10-17 17:31:06 +03:00
Bjorn Svensson
b874c6f1fc
Move config logfile to generic config (#9592)
Move config `logfile` to generic configs
2021-10-07 22:33:08 -07:00
Bjorn Svensson
54d01e363a
Move config cluster-config-file to generic configs (#9597) 2021-10-07 22:32:40 -07:00
Huang Zhw
fd135f3e2d
Make tracking invalidation messages always after command's reply (#9422)
Tracking invalidation messages were sometimes sent in inconsistent order,
before the command's reply rather than after.
In addition to that, they were sometimes embedded inside other commands
responses, like MULTI-EXEC and MGET.
2021-10-07 15:13:42 +03:00
yoav-steinberg
5725088ff2
Avoid argv memcpy when queuing a multi command. (#9602)
When queuing a multi command we duplicated the argv (meaning an alloc
and a memcpy). This isn't needed since we can use the previously allocated
argv and just reset the client objects argv to NULL. This should saves some
memory and is a minor optimization in heavy MULTI/EXEC traffic, especially
if there are lots of arguments.
2021-10-06 11:39:09 +03:00
Oran Agra
c5e6a6204c
Fix ziplist and listpack overflows and truncations (CVE-2021-32627, CVE-2021-32628) (#9589)
- fix possible heap corruption in ziplist and listpack resulting by trying to
  allocate more than the maximum size of 4GB.
- prevent ziplist (hash and zset) from reaching size of above 1GB, will be
  converted to HT encoding, that's not a useful size.
- prevent listpack (stream) from reaching size of above 1GB.
- XADD will start a new listpack if the new record may cause the previous
  listpack to grow over 1GB.
- XADD will respond with an error if a single stream record is over 1GB
- List type (ziplist in quicklist) was truncating strings that were over 4GB,
  now it'll respond with an error.

Co-authored-by: sundb <sundbcn@gmail.com>
2021-10-04 12:11:02 +03:00
Oran Agra
fba15850e5
Prevent unauthenticated client from easily consuming lots of memory (CVE-2021-32675) (#9588)
This change sets a low limit for multibulk and bulk length in the
protocol for unauthenticated connections, so that they can't easily
cause redis to allocate massive amounts of memory by sending just a few
characters on the network.
The new limits are 10 arguments of 16kb each (instead of 1m of 512mb)
2021-10-04 12:10:31 +03:00
yoav-steinberg
93e8534713
Remove argument count limit, dynamically grow argv. (#9528)
Remove hard coded multi-bulk limit (was 1,048,576), new limit is INT_MAX.
When client sends an m-bulk that's higher than 1024, we initially only allocate
the argv array for 1024 arguments, and gradually grow that allocation as arguments
are received.
2021-10-03 09:13:09 +03:00
Binbin
dd3ac97ffe
Cleanup typos, incorrect comments, and fixed small memory leak in redis-cli (#9153)
1. Remove forward declarations from header files to functions that do not exist:
hmsetCommand and rdbSaveTime.
2. Minor phrasing fixes in #9519
3. Add missing sdsfree(title) and fix typo in redis-benchmark.
4. Modify some error comments in some zset commands.
5. Fix copy-paste bug comment in syncWithMaster about `ip-address`.
2021-10-02 22:19:33 -07:00
yoav-steinberg
6600253046
Client eviction ci issues (#9549)
Fixing CI test issues introduced in #8687
- valgrind warnings in readQueryFromClient when client was freed by processInputBuffer
- adding DEBUG pause-cron for tests not to be time dependent.
- skipping a test that depends on socket buffers / events not compatible with TLS
- making sure client got subscribed by not using deferring client
2021-09-26 17:45:02 +03:00
yoav-steinberg
2753429c99
Client eviction (#8687)
### Description
A mechanism for disconnecting clients when the sum of all connected clients is above a
configured limit. This prevents eviction or OOM caused by accumulated used memory
between all clients. It's a complimentary mechanism to the `client-output-buffer-limit`
mechanism which takes into account not only a single client and not only output buffers
but rather all memory used by all clients.

#### Design
The general design is as following:
* We track memory usage of each client, taking into account all memory used by the
  client (query buffer, output buffer, parsed arguments, etc...). This is kept up to date
  after reading from the socket, after processing commands and after writing to the socket.
* Based on the used memory we sort all clients into buckets. Each bucket contains all
  clients using up up to x2 memory of the clients in the bucket below it. For example up
  to 1m clients, up to 2m clients, up to 4m clients, ...
* Before processing a command and before sleep we check if we're over the configured
  limit. If we are we start disconnecting clients from larger buckets downwards until we're
  under the limit.

#### Config
`maxmemory-clients` max memory all clients are allowed to consume, above this threshold
we disconnect clients.
This config can either be set to 0 (meaning no limit), a size in bytes (possibly with MB/GB
suffix), or as a percentage of `maxmemory` by using the `%` suffix (e.g. setting it to `10%`
would mean 10% of `maxmemory`).

#### Important code changes
* During the development I encountered yet more situations where our io-threads access
  global vars. And needed to fix them. I also had to handle keeps the clients sorted into the
  memory buckets (which are global) while their memory usage changes in the io-thread.
  To achieve this I decided to simplify how we check if we're in an io-thread and make it
  much more explicit. I removed the `CLIENT_PENDING_READ` flag used for checking
  if the client is in an io-thread (it wasn't used for anything else) and just used the global
  `io_threads_op` variable the same way to check during writes.
* I optimized the cleanup of the client from the `clients_pending_read` list on client freeing.
  We now store a pointer in the `client` struct to this list so we don't need to search in it
  (`pending_read_list_node`).
* Added `evicted_clients` stat to `INFO` command.
* Added `CLIENT NO-EVICT ON|OFF` sub command to exclude a specific client from the
  client eviction mechanism. Added corrosponding 'e' flag in the client info string.
* Added `multi-mem` field in the client info string to show how much memory is used up
  by buffered multi commands.
* Client `tot-mem` now accounts for buffered multi-commands, pubsub patterns and
  channels (partially), tracking prefixes (partially).
* CLIENT_CLOSE_ASAP flag is now handled in a new `beforeNextClient()` function so
  clients will be disconnected between processing different clients and not only before sleep.
  This new function can be used in the future for work we want to do outside the command
  processing loop but don't want to wait for all clients to be processed before we get to it.
  Specifically I wanted to handle output-buffer-limit related closing before we process client
  eviction in case the two race with each other.
* Added a `DEBUG CLIENT-EVICTION` command to print out info about the client eviction
  buckets.
* Each client now holds a pointer to the client eviction memory usage bucket it belongs to
  and listNode to itself in that bucket for quick removal.
* Global `io_threads_op` variable now can contain a `IO_THREADS_OP_IDLE` value
  indicating no io-threading is currently being executed.
* In order to track memory used by each clients in real-time we can't rely on updating
  these stats in `clientsCron()` alone anymore. So now I call `updateClientMemUsage()`
  (used to be `clientsCronTrackClientsMemUsage()`) after command processing, after
  writing data to pubsub clients, after writing the output buffer and after reading from the
  socket (and maybe other places too). The function is written to be fast.
* Clients are evicted if needed (with appropriate log line) in `beforeSleep()` and before
  processing a command (before performing oom-checks and key-eviction).
* All clients memory usage buckets are grouped as follows:
  * All clients using less than 64k.
  * 64K..128K
  * 128K..256K
  * ...
  * 2G..4G
  * All clients using 4g and up.
* Added client-eviction.tcl with a bunch of tests for the new mechanism.
* Extended maxmemory.tcl to test the interaction between maxmemory and
  maxmemory-clients settings.
* Added an option to flag a numeric configuration variable as a "percent", this means that
  if we encounter a '%' after the number in the config file (or config set command) we
  consider it as valid. Such a number is store internally as a negative value. This way an
  integer value can be interpreted as either a percent (negative) or absolute value (positive).
  This is useful for example if some numeric configuration can optionally be set to a percentage
  of something else.

Co-authored-by: Oran Agra <oran@redislabs.com>
2021-09-23 14:02:16 +03:00
YaacovHazan
a56d4533b7
Adding ACL support for modules (#9309)
This commit introduced a new flag to the RM_Call:
'C' - Check if the command can be executed according to the ACLs associated with it.

Also, three new API's added to check if a command, key, or channel can be executed or accessed
by a user, according to the ACLs associated with it.
- RM_ACLCheckCommandPerm
- RM_ACLCheckKeyPerm
- RM_ACLCheckChannelPerm

The user for these API's is a RedisModuleUser object, that for a Module user returned by the RM_CreateModuleUser API, or for a general ACL user can be retrieved by these two new API's:
- RM_GetCurrentUserName - Retrieve the user name of the client connection behind the current context.
- RM_GetModuleUserFromUserName - Get a RedisModuleUser from a user name

As a result of getting a RedisModuleUser from name, it can now also access the general ACL users (not just ones created by the module).
This mean the already existing API RM_SetModuleUserACL(), can be used to change the ACL rules for such users.
2021-09-23 08:52:56 +03:00
Binbin
14d6abd8e9
Add ZMPOP/BZMPOP commands. (#9484)
This is similar to the recent addition of LMPOP/BLMPOP (#9373), but zset.

Syntax for the new ZMPOP command:
`ZMPOP numkeys [<key> ...] MIN|MAX [COUNT count]`

Syntax for the new BZMPOP command:
`BZMPOP timeout numkeys [<key> ...] MIN|MAX [COUNT count]`

Some background:
- ZPOPMIN/ZPOPMAX take only one key, and can return multiple elements.
- BZPOPMIN/BZPOPMAX take multiple keys, but return only one element from just one key.
- ZMPOP/BZMPOP can take multiple keys, and can return multiple elements from just one key.

Note that ZMPOP/BZMPOP can take multiple keys, it eventually operates on just on key.
And it will propagate as ZPOPMIN or ZPOPMAX with the COUNT option.

As new commands, if we can not pop any elements, the response like:
- ZMPOP: Return a NIL in both RESP2 and RESP3, unlike ZPOPMIN/ZPOPMAX return emptyarray.
- BZMPOP: Return a NIL in both RESP2 and RESP3 when timeout is reached, like BZPOPMIN/BZPOPMAX.

For the normal response is nested arrays in RESP2 and RESP3:
```
ZMPOP/BZMPOP
1) keyname
2) 1) 1) member1
      2) score1
   2) 1) member2
      2) score2

In RESP2:
1) "myzset"
2) 1) 1) "three"
      2) "3"
   2) 1) "two"
      2) "2"

In RESP3:
1) "myzset"
2) 1) 1) "three"
      2) (double) 3
   2) 1) "two"
      2) (double) 2
```
2021-09-23 08:34:40 +03:00
Binbin
f898a9e97d
Adds limit to SINTERCARD/ZINTERCARD. (#9425)
Implements the [LIMIT limit] variant of SINTERCARD/ZINTERCARD.
Now with the LIMIT, we can stop the searching when cardinality
reaching the limit, and return the cardinality ASAP.

Note that in SINTERCARD, the old synatx was: `SINTERCARD key [key ...]`
In order to add a optional parameter, we must break the old synatx.
So the new syntax of SINTERCARD will be consistent with ZINTERCARD.
New syntax: `SINTERCARD numkeys key [key ...] [LIMIT limit]`.

Note that this means that SINTERCARD has a different syntax than
SINTER and SINTERSTORE (taking numkeys argument)

As for ZINTERCARD, we can easily add a optional parameter to it.
New syntax: `ZINTERCARD numkeys key [key ...] [LIMIT limit]`
2021-09-16 14:07:08 +03:00
guybe7
7759ec7c43
Cleanup: propagate and alsoPropagate do not need redisCommand (#9502)
The `cmd` argument was completely unused, and all the code that bothered to pass it was unnecessary.
This is a prepartion for a future commit that treats subcommands as commands
2021-09-15 12:53:42 +03:00
guybe7
03fcc211de
A better approach for COMMAND INFO for movablekeys commands (#8324)
Fix #7297

The problem:

Today, there is no way for a client library or app to know the key name indexes for commands such as
ZUNIONSTORE/EVAL and others with "numkeys", since COMMAND INFO returns no useful info for them.

For cluster-aware redis clients, this requires to 'patch' the client library code specifically for each of these commands or to
resolve each execution of these commands with COMMAND GETKEYS.

The solution:

Introducing key specs other than the legacy "range" (first,last,step)

The 8th element of the command info array, if exists, holds an array of key specs. The array may be empty, which indicates
the command doesn't take any key arguments or may contain one or more key-specs, each one may leads to the discovery
of 0 or more key arguments.

A client library that doesn't support this key-spec feature will keep using the first,last,step and movablekeys flag which will
obviously remain unchanged.

A client that supports this key-specs feature needs only to look at the key-specs array. If it finds an unrecognized spec, it
must resort to using COMMAND GETKEYS if it wishes to get all key name arguments, but if all it needs is one key in order
to know which cluster node to use, then maybe another spec (if the command has several) can supply that, and there's no
need to use GETKEYS.

Each spec is an array of arguments, first one is the spec name, the second is an array of flags, and the third is an array
containing details about the spec (specific meaning for each spec type)
The initial flags we support are "read" and "write" indicating if the keys that this key-spec finds are used for read or for write.
clients should ignore any unfamiliar flags.

In order to easily find the positions of keys in a given array of args we introduce keys specs. There are two logical steps of
key specs:
1. `start_search`: Given an array of args, indicate where we should start searching for keys
2. `find_keys`: Given the output of start_search and an array of args, indicate all possible indices of keys.

### start_search step specs
- `index`: specify an argument index explicitly
  - `index`: 0 based index (1 means the first command argument)
- `keyword`: specify a string to match in `argv`. We should start searching for keys just after the keyword appears.
  - `keyword`: the string to search for
  - `start_search`: an index from which to start the keyword search (can be negative, which means to search from the end)

Examples:
- `SET` has start_search of type `index` with value `1`
- `XREAD` has start_search of type `keyword` with value `[“STREAMS”,1]`
- `MIGRATE` has start_search of type `keyword` with value `[“KEYS”,-2]`

### find_keys step specs
- `range`: specify `[count, step, limit]`.
  - `lastkey`: index of the last key. relative to the index returned from begin_search. -1 indicating till the last argument, -2 one before the last
  - `step`: how many args should we skip after finding a key, in order to find the next one
  - `limit`: if count is -1, we use limit to stop the search by a factor. 0 and 1 mean no limit. 2 means ½ of the remaining args, 3 means ⅓, and so on.
- “keynum”: specify `[keynum_index, first_key_index, step]`.
  - `keynum_index`: is relative to the return of the `start_search` spec.
  - `first_key_index`: is relative to `keynum_index`.
  - `step`: how many args should we skip after finding a key, in order to find the next one

Examples:
- `SET` has `range` of `[0,1,0]`
- `MSET` has `range` of `[-1,2,0]`
- `XREAD` has `range` of `[-1,1,2]`
- `ZUNION` has `start_search` of type `index` with value `1` and `find_keys` of type `keynum` with value `[0,1,1]`
- `AI.DAGRUN` has `start_search` of type `keyword` with value `[“LOAD“,1]` and `find_keys` of type `keynum` with value
  `[0,1,1]` (see https://oss.redislabs.com/redisai/master/commands/#aidagrun)

Note: this solution is not perfect as the module writers can come up with anything, but at least we will be able to find the key
args of the vast majority of commands.
If one of the above specs can’t describe the key positions, the module writer can always fall back to the `getkeys-api` option.

Some keys cannot be found easily (`KEYS` in `MIGRATE`: Imagine the argument for `AUTH` is the string “KEYS” - we will
start searching in the wrong index). 
The guarantee is that the specs may be incomplete (`incomplete` will be specified in the spec to denote that) but we never
report false information (assuming the command syntax is correct).
For `MIGRATE` we start searching from the end - `startfrom=-1` - and if one of the keys is actually called "keys" we will
report only a subset of all keys - hence the `incomplete` flag.
Some `incomplete` specs can be completely empty (i.e. UNKNOWN begin_search) which should tell the client that
COMMAND GETKEYS (or any other way to get the keys) must be used (Example: For `SORT` there is no way to describe
the STORE keyword spec, as the word "store" can appear anywhere in the command).

We will expose these key specs in the `COMMAND` command so that clients can learn, on startup, where the keys are for
all commands instead of holding hardcoded tables or use `COMMAND GETKEYS` in runtime.

Comments:
1. Redis doesn't internally use the new specs, they are only used for COMMAND output.
2. In order to support the current COMMAND INFO format (reply array indices 4, 5, 6) we created a synthetic range, called
   legacy_range, that, if possible, is built according to the new specs.
3. Redis currently uses only getkeys_proc or the legacy_range to get the keys indices (in COMMAND GETKEYS for
   example).

"incomplete" specs:
the command we have issues with are MIGRATE, STRALGO, and SORT
for MIGRATE, because the token KEYS, if exists, must be the last token, we can search in reverse. it one of the keys is
actually the string "keys" will return just a subset of the keys (hence, it's "incomplete")
for SORT and STRALGO we can use this heuristic (the keys can be anywhere in the command) and therefore we added a
key spec that is both "incomplete" and of "unknown type"

if a client encounters an "incomplete" spec it means that it must find a different way (either COMMAND GETKEYS or have
its own parser) to retrieve the keys.
please note that all commands, apart from the three mentioned above, have "complete" key specs
2021-09-15 11:10:29 +03:00
Viktor Söderqvist
ea36d4de17
Modules: Add remaining list API functions (#8439)
List functions operating on elements by index:

* RM_ListGet
* RM_ListSet
* RM_ListInsert
* RM_ListDelete

Iteration is done using a simple for loop over indices.
The index based functions use an internal iterator as an optimization.
This is explained in the docs:

```
 * Many of the list functions access elements by index. Since a list is in
 * essence a doubly-linked list, accessing elements by index is generally an
 * O(N) operation. However, if elements are accessed sequentially or with
 * indices close together, the functions are optimized to seek the index from
 * the previous index, rather than seeking from the ends of the list.
 *
 * This enables iteration to be done efficiently using a simple for loop:
 *
 *     long n = RM_ValueLength(key);
 *     for (long i = 0; i < n; i++) {
 *         RedisModuleString *elem = RedisModule_ListGet(key, i);
 *         // Do stuff...
 *     }
```
2021-09-14 17:48:06 +03:00
zhaozhao.zz
794442b130
PSYNC2: make partial sync possible after master reboot (#8015)
The main idea is how to allow a master to load replication info from RDB file when rebooting, if master can load replication info it means that replicas may have the chance to psync with master, it can save much traffic.

The key point is we need guarantee safety and consistency, so there
are two differences between master and replica:

1. master would load the replication info as secondary ID and
   offset, in case other masters have the same replid.
2. when master loading RDB, it would propagate expired keys as DEL
   command to replication backlog, then replica can receive these
   commands to delete stale keys.
   p.s. the expired keys when RDB loading is useful for users, so
   we show it as `rdb_last_load_keys_expired` and `rdb_last_load_keys_loaded` in info persistence.

Moreover, after load replication info, master should update
`no_replica_time` in case loading RDB cost too long time.
2021-09-13 15:39:11 +08:00
sundb
3ca6972ecd
Replace all usage of ziplist with listpack for t_zset (#9366)
Part two of implementing #8702 (zset), after #8887.

## Description of the feature
Replaced all uses of ziplist with listpack in t_zset, and optimized some of the code to optimize performance.

## Rdb format changes
New `RDB_TYPE_ZSET_LISTPACK` rdb type.

## Rdb loading improvements:
1) Pre-expansion of dict for validation of duplicate data for listpack and ziplist.
2) Simplifying the release of empty key objects when RDB loading.
3) Unify ziplist and listpack data verify methods for zset and hash, and move code to rdb.c.

## Interface changes
1) New `zset-max-listpack-entries` config is an alias for `zset-max-ziplist-entries` (same with `zset-max-listpack-value`).
2) OBJECT ENCODING will return listpack instead of ziplist.

## Listpack improvements:
1) Add `lpDeleteRange` and `lpDeleteRangeWithEntry` functions to delete a range of entries from listpack.
2) Improve the performance of `lpCompare`, converting from string to integer is faster than converting from integer to string.
3) Replace `snprintf` with `ll2string` to improve performance in converting numbers to strings in `lpGet()`.

## Zset improvements:
1) Improve the performance of `zzlFind` method, use `lpFind` instead of `lpCompare` in a loop.
2) Use `lpDeleteRangeWithEntry` instead of `lpDelete` twice to delete a element of zset.

## Tests
1) Add some unittests for `lpDeleteRange` and `lpDeleteRangeWithEntry` function.
2) Add zset RDB loading test.
3) Add benchmark test for `lpCompare` and `ziplsitCompare`.
4) Add empty listpack zset corrupt dump test.
2021-09-09 18:18:53 +03:00
Binbin
c50af0aeba
Add LMPOP/BLMPOP commands. (#9373)
We want to add COUNT option for BLPOP.
But we can't do it without breaking compatibility due to the command arguments syntax.
So this commit introduce two new commands.

Syntax for the new LMPOP command:
`LMPOP numkeys [<key> ...] LEFT|RIGHT [COUNT count]`

Syntax for the new BLMPOP command:
`BLMPOP timeout numkeys [<key> ...] LEFT|RIGHT [COUNT count]`

Some background:
- LPOP takes one key, and can return multiple elements.
- BLPOP takes multiple keys, but returns one element from just one key.
- LMPOP can take multiple keys and return multiple elements from just one key.

Note that LMPOP/BLMPOP  can take multiple keys, it eventually operates on just one key.
And it will propagate as LPOP or RPOP with the COUNT option.

As a new command, it still return NIL if we can't pop any elements.
For the normal response is nested arrays in RESP2 and RESP3, like:
```
LMPOP/BLMPOP 
1) keyname
2) 1) element1
   2) element2
```
I.e. unlike BLPOP that returns a key name and one element so it uses a flat array,
and LPOP that returns multiple elements with no key name, and again uses a flat array,
this one has to return a nested array, and it does for for both RESP2 and RESP3 (like SCAN does)

Some discuss can see: #766 #8824
2021-09-09 12:02:33 +03:00
Huang Zhw
216f168b2b
Add INFO total_active_defrag_time and current_active_defrag_time (#9377)
Add two INFO metrics:
```
total_active_defrag_time:12345
current_active_defrag_time:456
```
`current_active_defrag_time` if greater than 0, means how much time has
passed since active defrag started running. If active defrag stops, this metric is reset to 0.
`total_active_defrag_time` means total time the fragmentation
was over the defrag threshold since the server started.

This is a followup PR for #9031
2021-09-09 11:38:10 +03:00
Viktor Söderqvist
f24c63a292
Slot-to-keys using dict entry metadata (#9356)
* Enhance dict to support arbitrary metadata carried in dictEntry

Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>

* Rewrite slot-to-keys mapping to linked lists using dict entry metadata

This is a memory enhancement for Redis Cluster.

The radix tree slots_to_keys (which duplicates all key names prefixed with their
slot number) is replaced with a linked list for each slot. The dict entries of
the same cluster slot form a linked list and the pointers are stored as metadata
in each dict entry of the main DB dict.

This commit also moves the slot-to-key API from db.c to cluster.c.

Co-authored-by: Jim Brunner <brunnerj@amazon.com>
2021-08-30 23:25:36 -07:00
Wang Yuan
9a0c0617f1
Use sync_file_range to optimize fsync if possible (#9409)
We implement incremental data sync in rio.c by call fsync, on slow disk, that may cost a lot of time,
sync_file_range could provide async fsync, so we could serialize key/value and sync file data at the same time.

> one tip for sync_file_range usage: http://lkml.iu.edu/hypermail/linux/kernel/1005.2/01845.html

Additionally, this change avoids a single large write to be used, which can result in a mass of dirty
pages in the kernel (increasing the risk of someone else's write to block).

On HDD, current solution could reduce approximate half of dumping RDB time,
this PR costs 50s for dump 7.7G rdb but unstable branch costs 93s.
On NVME SSD, this PR can't reduce much time,  this PR costs 40s, unstable branch costs 48s.

Moreover, I find calling data sync every 4MB is better than 32MB.
2021-08-30 10:24:53 +03:00
Meir Shpilraien (Spielrein)
8f8117f78e
Format fixes and naming. SentReplyOnKeyMiss -> addReplyOrErrorObject (#9346)
Following the comments on #8659, this PR fix some formatting
and naming issues.
2021-08-10 10:19:21 +03:00
sundb
02fd76b97c
Replace all usage of ziplist with listpack for t_hash (#8887)
Part one of implementing #8702 (taking hashes first before other types)

## Description of the feature
1. Change ziplist encoded hash objects to listpack encoding.
2. Convert existing ziplists on RDB loading time. an O(n) operation.

## Rdb format changes
1. Add RDB_TYPE_HASH_LISTPACK rdb type.
2. Bump RDB_VERSION to 10

## Interface changes
1. New `hash-max-listpack-entries` config is an alias for `hash-max-ziplist-entries` (same with `hash-max-listpack-value`)
2. OBJECT ENCODING will return `listpack` instead of `ziplist`

## Listpack improvements:
1. Support direct insert, replace integer element (rather than convert back and forth from string)
3. Add more listpack capabilities to match the ziplist ones (like `lpFind`, `lpRandomPairs` and such)
4. Optimize element length fetching, avoid multiple calculations
5. Use inline to avoid function call overhead.

## Tests
1. Add a new test to the RDB load time conversion
2. Adding the listpack unit tests. (based on the one in ziplist.c)
3. Add a few "corrupt payload: fuzzer findings" tests, and slightly modify existing ones.

Co-authored-by: Oran Agra <oran@redislabs.com>
2021-08-10 09:18:49 +03:00
Eduardo Semprebon
d3356bf614
Add SORT_RO command (#9299)
Add a readonly variant of the STORE command, so it can be used on
read-only workloads (replica, ACL, etc)
2021-08-09 09:40:29 +03:00
DarrenJiang13
43eb0ce3bf
[BUGFIX] Add some missed error statistics (#9328)
add error counting for some missed behaviors.
2021-08-06 19:27:24 -07:00
yoav-steinberg
0a9377535b
Ignore resize threshold on idle qbuf resizing (#9322)
Also update qbuf tests to verify both idle and peak based resizing logic.
And delete unused function: getClientsMaxBuffers
2021-08-06 20:50:34 +03:00
Oran Agra
0c90370e6d
Improvements to corrupt payload sanitization (#9321)
Recently we found two issues in the fuzzer tester: #9302 #9285
After fixing them, more problems surfaced and this PR (as well as #9297) aims to fix them.

Here's a list of the fixes
- Prevent an overflow when allocating a dict hashtable
- Prevent OOM when attempting to allocate a huge string
- Prevent a few invalid accesses in listpack
- Improve sanitization of listpack first entry
- Validate integrity of stream consumer groups PEL
- Validate integrity of stream listpack entry IDs
- Validate ziplist tail followed by extra data which start with 0xff

Co-authored-by: sundb <sundbcn@gmail.com>
2021-08-05 22:56:14 +03:00
Madelyn Olson
39a4a44d7d
Add debug config flag to print certain config values on engine crash (#9304)
Add debug config flag to print certain config values on engine crash
2021-08-05 11:59:12 -07:00
menwen
ca559819f7
Add latency monitor sample when key is deleted via lazy expire (#9317)
Fix that there is no sample latency after the key expires via expireIfNeeded().
Some refactoring for shared code.
2021-08-05 11:09:24 +03:00
yoav-steinberg
5e908a290c
dict struct memory optimizations (#9228)
Reduce dict struct memory overhead
on 64bit dict size goes down from jemalloc's 96 byte bin to its 56 byte bin.

summary of changes:
- Remove `privdata` from callbacks and dict creation. (this affects many files, see "Interface change" below).
- Meld `dictht` struct into the `dict` struct to eliminate struct padding. (this affects just dict.c and defrag.c)
- Eliminate the `sizemask` field, can be calculated from size when needed.
- Convert the `size` field into `size_exp` (exponent), utilizes one byte instead of 8.

Interface change: pass dict pointer to dict type call back functions.
This is instead of passing the removed privdata field. In the future if
we'd like to have private data in the callbacks we can extract it from
the dict type. We can extend dictType to include a custom dict struct
allocator and use it to allocate more data at the end of the dict
struct. This data can then be used to store private data later acccessed
by the callbacks.
2021-08-05 08:25:58 +03:00
Wang Yuan
d4bca53cd9
Use madvise(MADV_DONTNEED) to release memory to reduce COW (#8974)
## Backgroud
As we know, after `fork`, one process will copy pages when writing data to these
pages(CoW), and another process still keep old pages, they totally cost more memory.
For redis, we suffered that redis consumed much memory when the fork child is serializing
key/values, even that maybe cause OOM.

But actually we find, in redis fork child process, the child process don't need to keep some
memory and parent process may write or update that, for example, child process will never
access the key-value that is serialized but users may update it in parent process.
So we think it may reduce COW if the child process release memory that it is not needed.

## Implementation
For releasing key value in child process, we may think we call `decrRefCount` to free memory,
but i find the fork child process still use much memory when we don't write any data to redis,
and it costs much more time that slows down bgsave. Maybe because memory allocator doesn't
really release memory to OS, and it may modify some inner data for this free operation, especially
when we free small objects.

Moreover, CoW is based on  pages, so it is a easy way that we only free the memory bulk that is
not less than kernel page size. madvise(MADV_DONTNEED) can quickly release specified region
pages to OS bypassing memory allocator, and allocator still consider that this memory still is used
and don't change its inner data.

There are some buffers we can release in the fork child process:
- **Serialized key-values**
  the fork child process never access serialized key-values, so we try to free them.
  Because we only can release big bulk memory, and it is time consumed to iterate all
  items/members/fields/entries of complex data type. So we decide to iterate them and
  try to release them only when their average size of item/member/field/entry is more
  than page size of OS.
- **Replication backlog**
  Because replication backlog is a cycle buffer, it will be changed quickly if redis has heavy
  write traffic, but in fork child process, we don't need to access that.
- **Client buffers**
  If clients have requests during having the fork child process, clients' buffer also be changed
  frequently. The memory includes client query buffer, output buffer, and client struct used memory.

To get child process peak private dirty memory, we need to count peak memory instead
of last used memory, because the child process may continue to release memory (since
COW used to only grow till now, the last was equivalent to the peak).
Also we're adding a new `current_cow_peak` info variable (to complement the existing
`current_cow_size`)

Co-authored-by: Oran Agra <oran@redislabs.com>
2021-08-04 23:01:46 +03:00
Meir Shpilraien (Spielrein)
2237131e15
Unified Lua and modules reply parsing and added RESP3 support to RM_Call (#9202)
## Current state
1. Lua has its own parser that handles parsing `reds.call` replies and translates them
  to Lua objects that can be used by the user Lua code. The parser partially handles
  resp3 (missing big number, verbatim, attribute, ...)
2. Modules have their own parser that handles parsing `RM_Call` replies and translates
  them to RedisModuleCallReply objects. The parser does not support resp3.

In addition, in the future, we want to add Redis Function (#8693) that will probably
support more languages. At some point maintaining so many parsers will stop
scaling (bug fixes and protocol changes will need to be applied on all of them).
We will probably end up with different parsers that support different parts of the
resp protocol (like we already have today with Lua and modules)

## PR Changes
This PR attempt to unified the reply parsing of Lua and modules (and in the future
Redis Function) by introducing a new parser unit (`resp_parser.c`). The new parser
handles parsing the reply and calls different callbacks to allow the users (another
unit that uses the parser, i.e, Lua, modules, or Redis Function) to analyze the reply.

### Lua API Additions
The code that handles reply parsing on `scripting.c` was removed. Instead, it uses
the resp_parser to parse and create a Lua object out of the reply. As mentioned
above the Lua parser did not handle parsing big numbers, verbatim, and attribute.
The new parser can handle those and so Lua also gets it for free.
Those are translated to Lua objects in the following way:
1. Big Number - Lua table `{'big_number':'<str representation for big number>'}`
2. Verbatim - Lua table `{'verbatim_string':{'format':'<verbatim format>', 'string':'<verbatim string value>'}}`
3. Attribute - currently ignored and not expose to the Lua parser, another issue will be open to decide how to expose it.

Tests were added to check resp3 reply parsing on Lua

### Modules API Additions
The reply parsing code on `module.c` was also removed and the new resp_parser is used instead.
In addition, the RedisModuleCallReply was also extracted to a separate unit located on `call_reply.c`
(in the future, this unit will also be used by Redis Function). A nice side effect of unified parsing is
that modules now also support resp3. Resp3 can be enabled by giving `3` as a parameter to the
fmt argument of `RM_Call`. It is also possible to give `0`, which will indicate an auto mode. i.e, Redis
will automatically chose the reply protocol base on the current client set on the RedisModuleCtx
(this mode will mostly be used when the module want to pass the reply to the client as is).
In addition, the following RedisModuleAPI were added to allow analyzing resp3 replies:

* New RedisModuleCallReply types:
   * `REDISMODULE_REPLY_MAP`
   * `REDISMODULE_REPLY_SET`
   * `REDISMODULE_REPLY_BOOL`
   * `REDISMODULE_REPLY_DOUBLE`
   * `REDISMODULE_REPLY_BIG_NUMBER`
   * `REDISMODULE_REPLY_VERBATIM_STRING`
   * `REDISMODULE_REPLY_ATTRIBUTE`

* New RedisModuleAPI:
   * `RedisModule_CallReplyDouble` - getting double value from resp3 double reply
   * `RedisModule_CallReplyBool` - getting boolean value from resp3 boolean reply
   * `RedisModule_CallReplyBigNumber` - getting big number value from resp3 big number reply
   * `RedisModule_CallReplyVerbatim` - getting format and value from resp3 verbatim reply
   * `RedisModule_CallReplySetElement` - getting element from resp3 set reply
   * `RedisModule_CallReplyMapElement` - getting key and value from resp3 map reply
   * `RedisModule_CallReplyAttribute` - getting a reply attribute
   * `RedisModule_CallReplyAttributeElement` - getting key and value from resp3 attribute reply
   
* New context flags:
   * `REDISMODULE_CTX_FLAGS_RESP3` - indicate that the client is using resp3

Tests were added to check the new RedisModuleAPI

### Modules API Changes
* RM_ReplyWithCallReply might return REDISMODULE_ERR if the given CallReply is in resp3
  but the client expects resp2. This is not a breaking change because in order to get a resp3
  CallReply one needs to specifically specify `3` as a parameter to the fmt argument of
  `RM_Call` (as mentioned above).

Tests were added to check this change

### More small Additions
* Added `debug set-disable-deny-scripts` that allows to turn on and off the commands no-script
flag protection. This is used by the Lua resp3 tests so it will be possible to run `debug protocol`
and check the resp3 parsing code.

Co-authored-by: Oran Agra <oran@redislabs.com>
Co-authored-by: Yossi Gottlieb <yossigo@gmail.com>
2021-08-04 16:28:07 +03:00
Jonah H. Harris
432c92d8df
Add SINTERCARD/ZINTERCARD Commands (#8946)
Add SINTERCARD and ZINTERCARD commands that are similar to
ZINTER and SINTER but only return the cardinality with minimum
processing and memory overheads.

Co-authored-by: Oran Agra <oran@redislabs.com>
2021-08-03 11:45:27 +03:00
Huang Zhw
17511df59b
Add INFO stat total_eviction_exceeded_time and current_eviction_exceeded_time (#9031)
Add two INFO metrics:
```
total_eviction_exceeded_time:69734
current_eviction_exceeded_time:10230
```
`current_eviction_exceeded_time` if greater than 0, means how much time current used memory is greater than `maxmemory`. And we are still over the maxmemory. If used memory is below `maxmemory`, this metric is reset to 0.
`total_eviction_exceeded_time` means total time used memory is greater than `maxmemory` since server startup. 
The units of these two metrics are ms.

Co-authored-by: Oran Agra <oran@redislabs.com>
2021-07-26 10:07:20 +03:00
Huang Zhw
71d452876e
On 32 bit platform, the bit position of GETBIT/SETBIT/BITFIELD/BITCOUNT,BITPOS may overflow (see CVE-2021-32761) (#9191)
GETBIT, SETBIT may access wrong address because of wrap.
BITCOUNT and BITPOS may return wrapped results.
BITFIELD may access the wrong address but also allocate insufficient memory and segfault (see CVE-2021-32761).

This commit uses `uint64_t` or `long long` instead of `size_t`.
related https://github.com/redis/redis/pull/8096

At 32bit platform:
> setbit bit 4294967295 1
(integer) 0
> config set proto-max-bulk-len 536870913
OK
> append bit "\xFF"
(integer) 536870913
> getbit bit 4294967296
(integer) 0

When the bit index is larger than 4294967295, size_t can't hold bit index. In the past,  `proto-max-bulk-len` is limit to 536870912, so there is no problem.

After this commit, bit position is stored in `uint64_t` or `long long`. So when `proto-max-bulk-len > 536870912`, 32bit platforms can still be correct.

For 64bit platform, this problem still exists. The major reason is bit pos 8 times of byte pos. When proto-max-bulk-len is very larger, bit pos may overflow.
But at 64bit platform, we don't have so long string. So this bug may never happen.

Additionally this commit add a test cost `512MB` memory which is tag as `large-memory`. Make freebsd ci and valgrind ci ignore this test.
2021-07-21 16:25:19 +03:00
Oran Agra
6a5bac309e
Test infra, handle RESP3 attributes and big-numbers and bools (#9235)
- promote the code in DEBUG PROTOCOL to addReplyBigNum
- DEBUG PROTOCOL ATTRIB skips the attribute when client is RESP2
- networking.c addReply for push and attributes generate assertion when
  called on a RESP2 client, anything else would produce a broken
  protocol that clients can't handle.
2021-07-14 19:14:31 +03:00
perryitay
ac8b1df885
Fail EXEC command in case a watched key is expired (#9194)
There are two issues fixed in this commit: 
1. we want to fail the EXEC command in case there is a watched key that's logically
   expired but not yet deleted by active expire or lazy expire.
2. we saw that currently cache time is update in every `call()` (including nested calls),
   this time is being also being use for the isKeyExpired comparison, we want to update
   the cache time only in the first call (execCommand)

Co-authored-by: Oran Agra <oran@redislabs.com>
2021-07-11 13:17:23 +03:00
Madelyn Olson
8f59f131e5
Update incrDecrCommand to use addReplyLongLong (#9188)
Update incrDecrCommand to use addReplyLongLong
2021-07-03 10:51:53 -05:00
Yossi Gottlieb
aa139e2f02
Fix CLIENT UNBLOCK crashing modules. (#9167)
Modules that use background threads with thread safe contexts are likely
to use RM_BlockClient() without a timeout function, because they do not
set up a timeout.

Before this commit, `CLIENT UNBLOCK` would result with a crash as the
`NULL` timeout callback is called. Beyond just crashing, this is also
logically wrong as it may throw the module into an unexpected client
state.

This commits makes `CLIENT UNBLOCK` on such clients behave the same as
any other client that is not in a blocked state and therefore cannot be
unblocked.
2021-07-01 17:11:27 +03:00
Yossi Gottlieb
f233c4c59d
Add bind-source-addr configuration argument. (#9142)
In the past, the first bind address that was explicitly specified was
also used to bind outgoing connections. This could result with some
problems. For example: on some systems using `bind 127.0.0.1` would
result with outgoing connections also binding to `127.0.0.1` and failing
to connect to remote addresses.

With the recent change to the way `bind` is handled, this presented
other issues:

* The default first bind address is '*' which is not a valid address.
* We make no distinction between user-supplied config that is identical
to the default, and the default config.

This commit addresses both these issues by introducing an explicit
configuration parameter to control the bind address on outgoing
connections.
2021-06-24 19:48:18 +03:00
Yossi Gottlieb
07b0d144ce
Improve bind and protected-mode config handling. (#9034)
* Specifying an empty `bind ""` configuration prevents Redis from listening on any TCP port. Before this commit, such configuration was not accepted.
* Using `CONFIG GET bind` will always return an explicit configuration value. Before this commit, if a bind address was not specified the returned value was empty (which was an anomaly).

Another behavior change is that modifying the `bind` configuration to a non-default value will NO LONGER DISABLE protected-mode implicitly.
2021-06-22 12:50:17 +03:00
yoav-steinberg
362786c58a
Remove gopher protocol support. (#9057)
Gopher support was added mainly because it was simple (trivial to add).
But apparently even something that was trivial at the time, does cause complications
down the line when adding more features.
We recently ran into a few issues with io-threads conflicting with the gopher support.
We had to either complicate the code further in order to solve them, or drop gopher.
AFAIK it's completely unused, so we wanna chuck it, rather than keep supporting it.
2021-06-16 09:47:25 +03:00
chenyang8094
e0cd3ad0de
Enhance mem_usage/free_effort/unlink/copy callbacks and add GetDbFromIO api. (#8999)
Create new module type enhanced callbacks: mem_usage2, free_effort2, unlink2, copy2.
These will be given a context point from which the module can obtain the key name and database id.
In addition the digest and defrag context can now be used to obtain the key name and database id.
2021-06-16 09:45:49 +03:00
Uri Shachar
c7e502a07b
Cleaning up the cluster interface by moving almost all related declar… (#9080)
* Cleaning up the cluster interface by moving almost all related declarations into cluster.h
(no logic change -- just moving declarations/definitions around)

This initial effort leaves two items out of scope - the configuration parsing into the server
struct and the internals exposed by the clusterNode struct.

* Remove unneeded declarations of dictSds*
Ideally all the dictSds functionality would move from server.c into a dedicated module
so we can avoid the duplication in redis-benchmark/cli

* Move crc16 back into server.h, will be moved out once we create a seperate header file for
hashing functions
2021-06-15 20:35:13 -07:00
sundb
e5d8a5eb85
Fix the wrong reisze of querybuf (#9003)
The initialize memory of `querybuf` is `PROTO_IOBUF_LEN(1024*16) * 2` (due to sdsMakeRoomFor being greedy), under `jemalloc`, the allocated memory will be 40k.
This will most likely result in the `querybuf` being resized when call `clientsCronResizeQueryBuffer` unless the client requests it fast enough.

Note that this bug existed even before #7875, since the condition for resizing includes the sds headers (32k+6).

## Changes
1. Use non-greedy sdsMakeRoomFor when allocating the initial query buffer (of 16k).
1. Also use non-greedy allocation when working with BIG_ARG (we won't use that extra space anyway)
2. in case we did use a greedy allocation, read as much as we can into the buffer we got (including internal frag), to reduce system calls.
3. introduce a dedicated constant for the shrinking (same value as before)
3. Add test for querybuf.
4. improve a maxmemory test by ignoring the effect of replica query buffers (can accumulate many ACKs on slow env)
5. improve a maxmemory by disabling slowlog (it will cause slight memory growth on slow env).
2021-06-15 14:46:19 +03:00
YaacovHazan
1677efb9da
cleanup around loadAppendOnlyFile (#9012)
Today when we load the AOF on startup, the loadAppendOnlyFile checks if
the file is openning for reading.
This check is redundent (dead code) as we open the AOF file for writing at initServer,
and the file will always be existing for the loadAppendOnlyFile.

In this commit:
- remove all the exit(1) from loadAppendOnlyFile, as it is the caller
  responsibility to decide what to do in case of failure.
- move the opening of the AOF file for writing, to be after we loading it.
- avoid return -ERR in DEBUG LOADAOF, when the AOF is existing but empty
2021-06-14 10:38:08 +03:00
ZhaolongLi
63da66bb63
Delete an unnecessary function declaration (#9065) 2021-06-10 16:33:16 -07:00
Binbin
0bfccc55e2
Fixed some typos, add a spell check ci and others minor fix (#8890)
This PR adds a spell checker CI action that will fail future PRs if they introduce typos and spelling mistakes.
This spell checker is based on blacklist of common spelling mistakes, so it will not catch everything,
but at least it is also unlikely to cause false positives.

Besides that, the PR also fixes many spelling mistakes and types, not all are a result of the spell checker we use.

Here's a summary of other changes:
1. Scanned the entire source code and fixes all sorts of typos and spelling mistakes (including missing or extra spaces).
2. Outdated function / variable / argument names in comments
3. Fix outdated keyspace masks error log when we check `config.notify-keyspace-events` in loadServerConfigFromString.
4. Trim the white space at the end of line in `module.c`. Check: https://github.com/redis/redis/pull/7751
5. Some outdated https link URLs.
6. Fix some outdated comment. Such as:
    - In README: about the rdb, we used to said create a `thread`, change to `process`
    - dbRandomKey function coment (about the dictGetRandomKey, change to dictGetFairRandomKey)
    - notifyKeyspaceEvent fucntion comment (add type arg)
    - Some others minor fix in comment (Most of them are incorrectly quoted by variable names)
7. Modified the error log so that users can easily distinguish between TCP and TLS in `changeBindAddr`
2021-06-10 15:39:33 +03:00