Commit Graph

1756 Commits

Author SHA1 Message Date
Harkrishn Patro
45ccae89bb
Add new cluster shards command (#10293)
Implement a new cluster shards command, which provides a flexible and extensible API for topology discovery.

Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
2022-03-15 18:24:40 -07:00
Madelyn Olson
416c9ac2ef
Add module API for redacting command arguments (#10425)
Add module API for redacting client commands
2022-03-15 18:21:13 -07:00
ranshid
1078e30c5f
make sort/ro commands validate external keys access patterns (#10106) (#10340)
Currently the sort and sort_ro can access external keys via `GET` and `BY`
in order to make sure the user cannot violate the authorization ACL
rules, the decision is to reject external keys access patterns unless ACL allows
SORT full access to all keys.
I.e. for backwards compatibility, SORT with GET/BY keeps working, but
if ACL has restrictions to certain keys, these features get permission denied.

### Implemented solution
We have discussed several potential solutions and decided to only allow the GET and BY
arguments when the user has all key permissions with the SORT command. The reasons
being that SORT with GET or BY is problematic anyway, for instance it is not supported in
cluster mode since it doesn't declare keys, and we're not sure the combination of that feature
with ACL key restriction is really required.
**HOWEVER** If in the fullness of time we will identify a real need for fine grain access
support for SORT, we would implement the complete solution which is the alternative
described below.

### Alternative (Completion solution):
Check sort ACL rules after executing it and before committing output (either via store or
to COB). it would require making several changes to the sort command itself. and would
potentially cause performance degradation since we will have to collect all the get keys
instead of just applying them to a temp array and then scan the access keys against the
ACL selectors. This solution can include an optimization to avoid the overheads of collecting
the key names, in case the ACL rules grant SORT full key-access, or if the ACL key pattern
literal matches the one used in GET/BY. It would also mean that authorization would be
O(nlogn) since we will have to complete most of the command execution before we can
perform verification

Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-03-15 17:14:53 +02:00
Binbin
871fa12fec
Sentinel: fix reconnect test timing issue (#10424)
We need to wait for `sentinelTimer` to kick in, and then trigger the reconnect.

As for another change, we should better call `server_set_password` before calling SENTINEL SET auth-pass.

Fixes problem introeuced in #10400
2022-03-14 11:13:14 +02:00
Moti Cohen
a6bf509810
Sentinel: fix no reconnect after auth-pass is changed (#10400)
When updating SENTINEL with master’s new password (command:
`SENTINEL SET mymaster auth-pass some-new-password`), 
sentinel might still keep the old connection and avoid reconnecting 
with the new password. This is because of wrong logic that traces 
the last ping (pong) time to servers. In fact it worked fine until 8631e64 
changed the condition to send ping. To resolve it with minimal risk, 
let’s disconnect master and replicas once changing password/user. 

Based on earlier work of yz1509.
2022-03-13 10:13:47 +02:00
ranshid
11b071a22b
ACL DRYRUN does not validate the verified command args. (#10405)
As a result we segfault when parsing and matching the command keys.
2022-03-10 10:08:41 +02:00
zhugezy
a26cab9dd6
set "disable-thp" config immutable (#10409)
It's confusing for this config to be modifiable since it only takes effect on startup
2022-03-10 09:52:49 +02:00
蔡相跃
24da71e507
Fix typo "the the" (#10399) 2022-03-09 13:55:17 +02:00
guybe7
2a2954086a
XREADGROUP: Unblock client if stream is deleted (#10306)
Deleting a stream while a client is blocked XREADGROUP should unblock the client.

The idea is that if a client is blocked via XREADGROUP is different from
any other blocking type in the sense that it depends on the existence of both
the key and the group. Even if the key is deleted and then revived with XADD
it won't help any clients blocked on XREADGROUP because the group no longer
exist, so they would fail with -NOGROUP anyway.
The conclusion is that it's better to unblock these clients (with error) upon
the deletion of the key, rather than waiting for the first XADD. 

Other changes:
1. Slightly optimize all `serveClientsBlockedOn*` functions by checking `server.blocked_clients_by_type`
2. All `serveClientsBlockedOn*` functions now use a list iterator rather than looking at `listFirst`, relying
  on `unblockClient` to delete the head of the list. Before this commit, only `serveClientsBlockedOnStreams`
  used to work like that.
3. bugfix: CLIENT UNBLOCK ERROR should work even if the command doesn't have a timeout_callback
  (only relevant to module commands)
2022-03-08 17:10:36 +02:00
zhaozhao.zz
728e62523e
script should not allow may-replicate commands when client pause write (#10364)
In some special commands like eval_ro / fcall_ro we allow no-writes commands.
But may-replicate commands are no-writes too, that leads crash when client pause write:
2022-03-08 16:53:11 +02:00
Shaya Potter
23f03e7965
Modules: Add REDISMODULE_EVENT_CONFIG (#10311)
Add a new REDISMODULE_EVENT_CONFIG event type for notifying modules when Redis configuration changes.
2022-03-07 17:37:57 +02:00
Binbin
45d83fb2d4
Fix timing issue in rehash test (#10388)
`Expected '*table size: 4096*' to match '*table size: 8192*'`

This test failed once on daily macOS, the reason is because
the bgsave has not stopped after the kill and `after 200`.
So there is a child process and no rehash triggered.

This commit use `waitForBgsave` to wait for it to finish.
2022-03-07 13:44:07 +02:00
Yossi Gottlieb
6740e1753d
Fix redis-cli test issues on tcl8.5. (#10386)
Apparently using `\x` produces different results between tclsh 8.5 and
8.6, whereas `\u` is more consistent.
2022-03-06 13:02:35 +02:00
Yuta Hongo
e3ef73dc2a
redis-cli: Better --json Unicode support and --quoted-json (#10286)
Normally, `redis-cli` escapes non-printable data received from Redis, using a custom scheme (which is also used to handle quoted input). When using `--json` this is not desired as it is not compatible with RFC 7159, which specifies JSON strings are assumed to be Unicode and how they should be escaped.

This commit changes `--json` to follow RFC 7159, which means that properly encoded Unicode strings in Redis will result with a valid Unicode JSON.

However, this introduces a new problem with `--json` and data that is not valid Unicode (e.g., random binary data, text that follows other encoding, etc.). To address this, we add `--quoted-json` which produces JSON strings that follow the original redis-cli quoting scheme.

For example, a value that consists of only null (0x00) bytes will show up as:
* `"\u0000\u0000\u0000"` when using `--json`
* `"\\x00\\x00\\x00"` when using `--quoted-json`
2022-03-05 21:25:52 +02:00
ranshid
9b15dd288e
Introduce debug command to disable reply buffer resizing (#10360)
In order to resolve some flaky tests which hard rely on examine memory footprint.
we introduce the following fixes:

# Fix in client-eviction test - by @yoav-steinberg 
Sometime the libc allocator can use different size client struct allocations.
this may cause unexpected memory calculations to fail the test.

# Introduce new DEBUG command for disabling reply buffer resizing
In order to eliminate reply buffer resizing during specific tests.
we introduced the ability to disable (and enable) the resizing cron job

Co-authored-by: yoav-steinberg yoav@redislabs.com
2022-03-01 14:40:29 +02:00
Harkrishn Patro
21aabab401
Fix acl dryrun to return the tested common permission error. (#10359) 2022-02-28 20:26:58 -08:00
ranshid
5860fa3d9c
deflake client-eviction test "evict clients only until below limit" (#10354)
After introducing #9822 need to prevent client reply buffer shrink
to maintain correct client memory math.

add needs:debug missing one one test.

Co-authored-by: Oran Agra <oran@redislabs.com>
2022-02-28 11:32:42 +02:00
Meir Shpilraien (Spielrein)
aa856b39f2
Sort out the mess around Lua error messages and error stats (#10329)
This PR fix 2 issues on Lua scripting:
* Server error reply statistics (some errors were counted twice).
* Error code and error strings returning from scripts (error code was missing / misplaced).

## Statistics
a Lua script user is considered part of the user application, a sophisticated transaction,
so we want to count an error even if handled silently by the script, but when it is
propagated outwards from the script we don't wanna count it twice. on the other hand,
if the script decides to throw an error on its own (using `redis.error_reply`), we wanna
count that too.
Besides, we do count the `calls` in command statistics for the commands the script calls,
we we should certainly also count `failed_calls`.
So when a simple `eval "return redis.call('set','x','y')" 0` fails, it should count the failed call
to both SET and EVAL, but the `errorstats` and `total_error_replies` should be counted only once.

The PR changes the error object that is raised on errors. Instead of raising a simple Lua
string, Redis will raise a Lua table in the following format:

```
{
    err='<error message (including error code)>',
    source='<User source file name>',
    line='<line where the error happned>',
    ignore_error_stats_update=true/false,
}
```

The `luaPushError` function was modified to construct the new error table as describe above.
The `luaRaiseError` was renamed to `luaError` and is now simply called `lua_error` to raise
the table on the top of the Lua stack as the error object.
The reason is that since its functionality is changed, in case some Redis branch / fork uses it,
it's better to have a compilation error than a bug.

The `source` and `line` fields are enriched by the error handler (if possible) and the
`ignore_error_stats_update` is optional and if its not present then the default value is `false`.
If `ignore_error_stats_update` is true, the error will not be counted on the error stats.

When parsing Redis call reply, each error is translated to a Lua table on the format describe
above and the `ignore_error_stats_update` field is set to `true` so we will not count errors
twice (we counted this error when we invoke the command).

The changes in this PR might have been considered as a breaking change for users that used
Lua `pcall` function. Before, the error was a string and now its a table. To keep backward
comparability the PR override the `pcall` implementation and extract the error message from
the error table and return it.

Example of the error stats update:

```
127.0.0.1:6379> lpush l 1
(integer) 2
127.0.0.1:6379> eval "return redis.call('get', 'l')" 0
(error) WRONGTYPE Operation against a key holding the wrong kind of value. script: e471b73f1ef44774987ab00bdf51f21fd9f7974a, on @user_script:1.

127.0.0.1:6379> info Errorstats
# Errorstats
errorstat_WRONGTYPE:count=1

127.0.0.1:6379> info commandstats
# Commandstats
cmdstat_eval:calls=1,usec=341,usec_per_call=341.00,rejected_calls=0,failed_calls=1
cmdstat_info:calls=1,usec=35,usec_per_call=35.00,rejected_calls=0,failed_calls=0
cmdstat_lpush:calls=1,usec=14,usec_per_call=14.00,rejected_calls=0,failed_calls=0
cmdstat_get:calls=1,usec=10,usec_per_call=10.00,rejected_calls=0,failed_calls=1
```

## error message
We can now construct the error message (sent as a reply to the user) from the error table,
so this solves issues where the error message was malformed and the error code appeared
in the middle of the error message:

```diff
127.0.0.1:6379> eval "return redis.call('set','x','y')" 0
-(error) ERR Error running script (call to 71e6319f97b0fe8bdfa1c5df3ce4489946dda479): @user_script:1: OOM command not allowed when used memory > 'maxmemory'.
+(error) OOM command not allowed when used memory > 'maxmemory' @user_script:1. Error running script (call to 71e6319f97b0fe8bdfa1c5df3ce4489946dda479)
```

```diff
127.0.0.1:6379> eval "redis.call('get', 'l')" 0
-(error) ERR Error running script (call to f_8a705cfb9fb09515bfe57ca2bd84a5caee2cbbd1): @user_script:1: WRONGTYPE Operation against a key holding the wrong kind of value
+(error) WRONGTYPE Operation against a key holding the wrong kind of value script: 8a705cfb9fb09515bfe57ca2bd84a5caee2cbbd1, on @user_script:1.
```

Notica that `redis.pcall` was not change:
```
127.0.0.1:6379> eval "return redis.pcall('get', 'l')" 0
(error) WRONGTYPE Operation against a key holding the wrong kind of value
```


## other notes
Notice that Some commands (like GEOADD) changes the cmd variable on the client stats so we
can not count on it to update the command stats. In order to be able to update those stats correctly
we needed to promote `realcmd` variable to be located on the client struct.

Tests was added and modified to verify the changes.

Related PR's: #10279, #10218, #10278, #10309

Co-authored-by: Oran Agra <oran@redislabs.com>
2022-02-27 13:40:57 +02:00
Itamar Haber
c81c7f51c3
Add stream consumer group lag tracking and reporting (#9127)
Adds the ability to track the lag of a consumer group (CG), that is, the number
of entries yet-to-be-delivered from the stream.

The proposed constant-time solution is in the spirit of "best-effort."

Partially addresses #8737.

## Description of approach

We add a new "entries_added" property to the stream. This starts at 0 for a new
stream and is incremented by 1 with every `XADD`.  It is essentially an all-time
counter of the entries added to the stream.

Given the stream's length and this counter value, we can trivially find the logical
"entries_added" counter of the first ID if and only if the stream is contiguous.
A fragmented stream contains one or more tombstones generated by `XDEL`s.
The new "xdel_max_id" stream property tracks the latest tombstone.

The CG also tracks its last delivered ID's as an "entries_read" counter and
increments it independently when delivering new messages, unless the this
read counter is invalid (-1 means invalid offset). When the CG's counter is
available, the reported lag is the difference between added and read counters.

Lastly, this also adds a "first_id" field to the stream structure in order to make
looking it up cheaper in most cases.

## Limitations

There are two cases in which the mechanism isn't able to track the lag.
In these cases, `XINFO` replies with `null` in the "lag" field.

The first case is when a CG is created with an arbitrary last delivered ID,
that isn't "0-0", nor the first or the last entries of the stream. In this case,
it is impossible to obtain a valid read counter (short of an O(N) operation).
The second case is when there are one or more tombstones fragmenting
the stream's entries range.

In both cases, given enough time and assuming that the consumers are
active (reading and lacking) and advancing, the CG should be able to
catch up with the tip of the stream and report zero lag.
Once that's achieved, lag tracking would resume as normal (until the
next tombstone is set).

## API changes

* `XGROUP CREATE` added with the optional named argument `[ENTRIESREAD entries-read]`
  for explicitly specifying the new CG's counter.
* `XGROUP SETID` added with an optional positional argument `[ENTRIESREAD entries-read]`
  for specifying the CG's counter.
* `XINFO` reports the maximal tombstone ID, the recorded first entry ID, and total
  number of entries added to the stream.
* `XINFO` reports the current lag and logical read counter of CGs.
* `XSETID` is an internal command that's used in replication/aof. It has been added with
  the optional positional arguments `[ENTRIESADDED entries-added] [MAXDELETEDID max-deleted-entry-id]`
  for propagating the CG's offset and maximal tombstone ID of the stream.

## The generic unsolved problem

The current stream implementation doesn't provide an efficient way to obtain the
approximate/exact size of a range of entries. While it could've been nice to have
that ability (#5813) in general, let alone specifically in the context of CGs, the risk
and complexities involved in such implementation are in all likelihood prohibitive.

## A refactoring note

The `streamGetEdgeID` has been refactored to accommodate both the existing seek
of any entry as well as seeking non-deleted entries (the addition of the `skip_tombstones`
argument). Furthermore, this refactoring also migrated the seek logic to use the
`streamIterator` (rather than `raxIterator`) that was, in turn, extended with the
`skip_tombstones` Boolean struct field to control the emission of these.

Co-authored-by: Guy Benoish <guy.benoish@redislabs.com>
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-02-23 22:34:58 +02:00
Binbin
488aecb3ab
Fix timing issue in EXEC fail on lazy expired WATCHed key test (#10332)
The test will fail on slow machines (valgrind or FreeBsd).
Because in #10256 when WATCH is called on a key that's already
logically expired, we will add an `expired` flag, and we will
skip it in `isWatchedKeyExpired` check.

Apparently we need to increase the expiration time so that
the key can not expire logically then the WATCH is called.
Also added retries to make sure it doesn't fail. I suppose
100ms is enough in valgrind, tested locally, no need to retry.
2022-02-23 08:47:16 +02:00
Viktor Söderqvist
e9ae03787e
Delete key doesn't dirty client who watched stale key (#10256)
When WATCH is called on a key that's already logically expired, avoid discarding the
transaction when the keys is actually deleted.

When WATCH is called, a flag is stored if the key is already expired
at the time of watch. The expired key is not deleted, only checked.

When a key is "touched", if it is deleted and it was already expired
when a client watched it, the client is not marked as dirty.

Co-authored-by: Oran Agra <oran@redislabs.com>
Co-authored-by: zhaozhao.zz <zhaozhao.zz@alibaba-inc.com>
2022-02-22 12:09:46 +02:00
ranshid
47c51d0c78
introduce dynamic client reply buffer size - save memory on idle clients (#9822)
Current implementation simple idle client which serves no traffic still
use ~17Kb of memory. this is mainly due to a fixed size reply buffer
currently set to 16kb.

We have encountered some cases in which the server operates in a low memory environments.
In such cases a user who wishes to create large connection pools to support potential burst period,
will exhaust a large amount of memory  to maintain connected Idle clients.
Some users may choose to "sacrifice" performance in order to save memory.

This commit introduce a dynamic mechanism to shrink and expend the client reply buffer based on
periodic observed peak.
the algorithm works as follows:
1. each time a client reply buffer has been fully written, the last recorded peak is updated: 
new peak = MAX( last peak, current written size)
2. during clients cron we check for each client if the last observed peak was:
     a. matching the current buffer size - in which case we expend (resize) the buffer size by 100%
     b. less than half the buffer size - in which case we shrink the buffer size by 50%
3. In any case we will **not** resize the buffer in case:
    a. the current buffer peak is less then the current buffer usable size and higher than 1/2 the
      current buffer usable size
    b. the value of (current buffer usable size/2) is less than 1Kib
    c. the value of  (current buffer usable size*2) is larger than 16Kib
4. the peak value is reset to the current buffer position once every **5** seconds. we maintain a new
   field in the client structure (buf_peak_last_reset_time) which is used to keep track of how long it
   passed since the last buffer peak reset.

### **Interface changes:**
**CIENT LIST** - now contains 2 new extra fields:
rbs= < the current size in bytes of the client reply buffer >
rbp=< the current value in bytes of the last observed buffer peak position >

**INFO STATS** - now contains 2 new statistics:
reply_buffer_shrinks = < total number of buffer shrinks performed >
reply_buffer_expends = < total number of buffer expends performed >

Co-authored-by: Oran Agra <oran@redislabs.com>
Co-authored-by: Yoav Steinberg <yoav@redislabs.com>
2022-02-22 11:19:38 +02:00
Madelyn Olson
71204f9632
Implemented module getchannels api and renamed channel keyspec (#10299)
This implements the following main pieces of functionality:
* Renames key spec "CHANNEL" to be "NOT_KEY", and update the documentation to
  indicate it's for cluster routing and not for any other key related purpose.
* Add the getchannels-api, so that modules can now define commands that are subject to
  ACL channel permission checks. 
* Add 4 new flags that describe how a module interacts with a command (SUBSCRIBE, PUBLISH,
  UNSUBSCRIBE, and PATTERN). They are all technically composable, however not sure how a
  command could both subscribe and unsubscribe from a command at once, but didn't see
  a reason to add explicit validation there.
* Add two new module apis RM_ChannelAtPosWithFlags and RM_IsChannelsPositionRequest to
  duplicate the functionality provided by the keys position APIs.
* The RM_ACLCheckChannelPermissions (only released in 7.0 RC1) was changed to take flags
  rather than a boolean literal.
* The RM_ACLCheckKeyPermissions (only released in 7.0 RC1) was changed to take flags
  corresponding to keyspecs instead of custom permission flags. These keyspec flags mimic
  the flags for ACLCheckChannelPermissions.
2022-02-22 11:00:03 +02:00
YaacovHazan
65e4bce0e7
fix return value of loadAppendOnlyFiles (#10295)
Make sure the status return from loading multiple AOF files reflects the overall
result, not just the one of the last file.

When one of the AOF files succeeded to load, but the last AOF file
was empty, the loadAppendOnlyFiles will return AOF_EMPTY.
This commit changes this behavior, and return AOF_OK in that case.

This can happen for example, when loading old AOF file, and no more commands processed,
the manifest file will include base AOF file with data, and empty incr AOF file.

Co-authored-by: chenyang8094 <chenyang8094@users.noreply.github.com>
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-02-22 08:59:23 +02:00
Oran Agra
fad0b0d2a6
Fix error stats and failed command stats for blocked clients (#10309)
This is a followup work for #10278, and a discussion about #10279

The changes:
- fix failed_calls in command stats for blocked clients that got error.
  including CLIENT UNBLOCK, and module replying an error from a thread.
- fix latency stats for XREADGROUP that filed with -NOGROUP

Theory behind which errors should be counted:
- error stats represents errors returned to the user, so an error handled by a
  module should not be counted.
- total error counter should be the same.
- command stats represents execution of commands (even with RM_Call, and if
  they fail or get rejected it counts these calls in commandstats, so it should
  also count failed_calls)

Some thoughts about Scripts:
for scripts it could be different since they're part of user code, not the infra (not an extension to redis)
we certainly want commandstats to contain all calls and errors
a simple script is like mult-exec transaction so an error inside it should be counted in error stats
a script that replies with an error to the user (using redis.error_reply) should also be counted in error stats
but then the problem is that a plain `return redis.call("SET")` should not be counted twice (once for the SET
and once for EVAL)
so that's something left to be resolved in #10279
2022-02-21 11:20:41 +02:00
yoav-steinberg
b59bb9b476
Fix script active defrag test (#10318)
This includes two fixes:
* We forgot to count non-key reallocs in defragmentation stats.
* Fix the script defrag tests so to make dict entries less signigicant in fragmentation by making the scripts larger.
This assures active defrage will complete and reach desired results.
Some inherent fragmentation might exists in dict entries which we need to ignore.
This lead to occasional CI failures.
2022-02-21 09:37:25 +02:00
qetu3790
b2d393b990
Fix geo search bounding box check causing missing results (#10018)
Consider the following example:
1. geoadd k1 -0.15307903289794921875 85 n1 0.3515625 85.00019260486917005437 n2.
2. geodist k1 n1 n2 returns  "4891.9380"
3. but GEORADIUSBYMEMBER k1 n1 4891.94 m only returns n1.
n2 is in the  boundingbox but out of search areas.So we let  search areas contain boundingbox to get n2.

Co-authored-by: Binbin <binloveplay1314@qq.com>
2022-02-21 08:06:58 +02:00
chenyang8094
a50aa29bde
Adapt redis-check-aof tool for Multi Part Aof (#10061)
Modifications of this PR:
1. Support the verification of `Multi Part AOF`, while still maintaining support for the
  old-style `AOF/RDB-preamble`. `redis-check-aof` will automatically choose which
  mode to use according to the incoming file format.
   
`Usage: redis-check-aof [--fix|--truncate-to-timestamp $timestamp] <AOF/manifest>`
 
2. Refactor part of the code to make it easier to understand
3. Currently only supports truncate  (`--fix` or `--truncate-to-timestamp`) the last AOF
  file (may be `BASE` or `INCR`)

The reasons for 3 above:
- for `--fix`: Only the last AOF may be truncated, this is guaranteed by redis
- for `--truncate-to-timestamp`:  Normally, we only have `BASE` + `INCR` files
  at most, and `BASE` cannot be truncated(It only contains a timestamp annotation
  at the beginning of the file), so only `INCR` can be truncated. If we have a
  `BASE+INCR1+INCR2` file (meaning we have an interrupted AOFRW), Only `INCR2`
  files can be truncated at this time. If we still insist on truncate `INCR1`, we need to
  manually delete `INCR2` and update the manifest file, then re-run `redis-check-aof`
- If we want to support truncate any file, we need to add very complicated code to support
  the atomic modification of multiple file deletion and update manifest, I think this is unnecessary
2022-02-17 08:13:28 +02:00
YaacovHazan
e6478cfd10
fix "Connect multiple replicas at the same time" test (#10294)
In order to make sure no more commands processed, we wait that
the 'load handlers' will disconncet.

The test by mistake waited on the (last) slave instead of the master.
2022-02-14 08:46:58 +02:00
Oran Agra
b099889a3a
Fix and improve module error reply statistics (#10278)
This PR handles several aspects
1. Calls to RM_ReplyWithError from thread safe contexts don't violate thread safety.
2. Errors returning from RM_Call to the module aren't counted in the statistics (they
  might be handled silently by the module)
3. When a module propagates a reply it got from RM_Call to it's client, then the error
  statistics are counted.

This is done by:
1. When appending an error reply to the output buffer, we avoid updating the global
  error statistics, instead we cache that error in a deferred list in the client struct.
2. When creating a RedisModuleCallReply object, the deferred error list is moved from
  the client into that object.
3. when a module calls RM_ReplyWithCallReply we copy the deferred replies to the dest
  client (if that's a real client, then that's when the error statistics are updated to the server)

Note about RM_ReplyWithCallReply: if the original reply had an array with errors, and the module
replied with just a portion of the original reply, and not the entire reply, the errors are currently not
propagated and the errors stats will not get propagated.

Fix #10180
2022-02-13 18:37:32 +02:00
Binbin
62c8be28ee
Regression test for sync psync crash (#10288)
Added regression tests for #10020 / #10081 / #10243.
The above PRs fixed some crashes due to an asserting,
see function `clientHasPendingReplies` (introduced in #9166).

This commit added some tests to cover the above scenario.
These tests will all fail in #9166, althought fixed not,
there is value in adding these tests to cover and verify
the changes. And it also can cover #8868 (verify the logs).

Other changes: 
1. Reduces the wait time in `waitForBgsave` and `waitForBgrewriteaof`
from 1s to 50ms, which should reduce the time for some tests.
2. Improve the test infra to print context when `assert_match` fails.
3. Improve the test infra to print `$error` when `assert_error` fails.
```
Expected an error matching 'ERR*' but got 'OK' (context: type eval line 4 cmd {assert_error "ERR*" {r set a b}} proc ::test)
```
2022-02-13 09:52:38 +02:00
yoav-steinberg
2eb9b19612
Fix Eval scripts defrag (broken 7.0 in RC1) (#10271)
Remove scripts defragger since it was broken since #10126 (released in 7.0 RC1).
would crash the server if defragger starts in a server that contains eval scripts.

In #10126 the global `lua_script` dict became a dict to a custom `luaScript` struct with an internal `robj`
in it instead of a generic `sds` -> `robj` dict. This means we need custom code to defrag it and since scripts
should never really cause much fragmentation it makes more sense to simply remove the defrag code for scripts.
2022-02-11 21:58:05 +02:00
sundb
5f0119ca91
Fix duplicate module options define (#10284)
The bug is introduced by #9323. (released in 7.0 RC1)
The define of `REDISMODULE_OPTIONS_HANDLE_IO_ERRORS` and `REDISMODULE_OPTION_NO_IMPLICIT_SIGNAL_MODIFIED` have the same value.

This will result in skipping `signalModifiedKey()` after `RM_CloseKey()` if the module has set
`REDISMODULE_OPTIONS_HANDLE_REPL_ASYNC_LOAD` option.
The implication is missing WATCH and client side tracking invalidations.

Other changes:
- add `no-implicit-signal-modified` to the options in INFO modules

Co-authored-by: Oran Agra <oran@redislabs.com>
2022-02-11 20:15:52 +02:00
Harkrishn Patro
a5d17f0b6c
Check target node is a primary during cluster setslot. (#10277) 2022-02-10 23:14:27 -08:00
Wen Hui
64e1e7e207
Add AUTH arity test (#10266)
Add test for AUTH with too many arguments
2022-02-09 22:09:20 +02:00
Binbin
beb94c901e
Fix INFO SENTINEL memory leak (#10268)
* Fix INFO SENTINEL memory leak

Introduced in #6891

* remove the copy-paste sentence
2022-02-09 07:33:24 +02:00
Wen Hui
2e1bc942aa
Make INFO command variadic (#6891)
This is an enhancement for INFO command, previously INFO only support one argument
for different info section , if user want to get more categories information, either perform
INFO all / default or calling INFO for multiple times.

**Description of the feature**

The goal of adding this feature is to let the user retrieve multiple categories via the INFO
command, and still avoid emitting the same section twice.

A use case for this is like Redis Sentinel, which periodically calling INFO command to refresh
info from monitored Master/Slaves, only Server and Replication part categories are used for
parsing information. If the INFO command can return just enough categories that client side
needs, it can save a lot of time for client side parsing it as well as network bandwidth.

**Implementation**
To share code between redis, sentinel, and other users of INFO (DEBUG and modules),
we have a new `genInfoSectionDict` function that returns a dict and some boolean flags
(e.g. `all`) to the caller (built from user input).
Sentinel is later purging unwanted sections from that, and then it is forwarded to the info `genRedisInfoString`.

**Usage Examples**
INFO Server Replication   
INFO CPU Memory
INFO default commandstats

Co-authored-by: Oran Agra <oran@redislabs.com>
2022-02-08 13:14:42 +02:00
yoav-steinberg
b76016a948
Consistent erros returned from EVAL scripts (#10218)
This PR handles inconsistencies in errors returned from lua scripts.
Details of the problem can be found in #10165.

### Changes

- Remove double stack trace. It's enough that a stack trace is automatically added by the engine's error handler
  see d0bc4fff18/src/function_lua.c (L472-L485)
  and d0bc4fff18/src/eval.c (L243-L255)
- Make sure all errors a preceded with an error code. Passing a simple string to `luaPushError()` will prepend it
  with a generic `ERR` error code.
- Make sure lua error table doesn't include a RESP `-` error status. Lua stores redis error's as a lua table with a
  single `err` field and a string. When the string is translated back to RESP we add a `-` to it.
  See d0bc4fff18/src/script_lua.c (L510-L517)
  So there's no need to store it in the lua table.

### Before & After
```diff
--- <unnamed>
+++ <unnamed>
@@ -1,14 +1,14 @@
  1: config set maxmemory 1
  2: +OK
  3: eval "return redis.call('set','x','y')" 0
- 4: -ERR Error running script (call to 71e6319f97b0fe8bdfa1c5df3ce4489946dda479): @user_script:1: @user_script: 1: -OOM command not allowed when used memory > 'maxmemory'.
+ 4: -ERR Error running script (call to 71e6319f97b0fe8bdfa1c5df3ce4489946dda479): @user_script:1: OOM command not allowed when used memory > 'maxmemory'.
  5: eval "return redis.pcall('set','x','y')" 0
- 6: -@user_script: 1: -OOM command not allowed when used memory > 'maxmemory'.
+ 6: -OOM command not allowed when used memory > 'maxmemory'.
  7: eval "return redis.call('select',99)" 0
  8: -ERR Error running script (call to 4ad5abfc50bbccb484223905f9a16f09cd043ba8): @user_script:1: ERR DB index is out of range
  9: eval "return redis.pcall('select',99)" 0
 10: -ERR DB index is out of range
 11: eval_ro "return redis.call('set','x','y')" 0
-12: -ERR Error running script (call to 71e6319f97b0fe8bdfa1c5df3ce4489946dda479): @user_script:1: @user_script: 1: Write commands are not allowed from read-only scripts.
+12: -ERR Error running script (call to 71e6319f97b0fe8bdfa1c5df3ce4489946dda479): @user_script:1: ERR Write commands are not allowed from read-only scripts.
 13: eval_ro "return redis.pcall('set','x','y')" 0
-14: -@user_script: 1: Write commands are not allowed from read-only scripts.
+14: -ERR Write commands are not allowed from read-only scripts.
```
2022-02-08 11:44:40 +02:00
guybe7
3c3e6cc1c7
X[AUTO]CLAIM should skip deleted entries (#10227)
Fix #7021 #8924 #10198

# Intro
Before this commit X[AUTO]CLAIM used to transfer deleted entries from one
PEL to another, but reply with "nil" for every such entry (instead of the entry id).
The idea (for XCLAIM) was that the caller could see this "nil", realize the entry
no longer exists, and XACK it in order to remove it from PEL.
The main problem with that approach is that it assumes there's a correlation
between the index of the "id" arguments and the array indices, which there
isn't (in case some of the input IDs to XCLAIM never existed/read):

```
127.0.0.1:6379> XADD x 1 f1 v1
"1-0"
127.0.0.1:6379> XADD x 2 f1 v1
"2-0"
127.0.0.1:6379> XADD x 3 f1 v1
"3-0"
127.0.0.1:6379> XGROUP CREATE x grp 0
OK
127.0.0.1:6379> XREADGROUP GROUP grp Alice COUNT 2 STREAMS x >
1) 1) "x"
   2) 1) 1) "1-0"
         2) 1) "f1"
            2) "v1"
      2) 1) "2-0"
         2) 1) "f1"
            2) "v1"
127.0.0.1:6379> XDEL x 1 2
(integer) 2
127.0.0.1:6379> XCLAIM x grp Bob 0 0-99 1-0 1-99 2-0
1) (nil)
2) (nil)
```

# Changes
Now,  X[AUTO]CLAIM acts in the following way:
1. If one tries to claim a deleted entry, we delete it from the PEL we found it in
  (and the group PEL too). So de facto, such entry is not claimed, just cleared
  from PEL (since anyway it doesn't exist in the stream)
2. since we never claim deleted entries, X[AUTO]CLAIM will never return "nil"
  instead of an entry.
3. add a new element to XAUTOCLAIM's response (see below)

# Knowing which entries were cleared from the PEL
The caller may want to log any entries that were found in a PEL but deleted from
the stream itself (it would suggest that there might be a bug in the application:
trimming the stream while some entries were still no processed by the consumers)

## XCLAIM
the set {XCLAIM input ids} - {XCLAIM returned ids} contains all the entry ids that were
not claimed which means they were deleted (assuming the input contains only entries
from some PEL). The user doesn't need to XACK them because XCLAIM had already
deleted them from the source PEL.

## XAUTOCLAIM
XAUTOCLAIM has a new element added to its reply: it's an array of all the deleted
stream IDs it stumbled upon.

This is somewhat of a breaking change since X[AUTO]CLAIM used to be able to reply
with "nil" and now it can't... But since it was undocumented (and generally a bad idea
to rely on it, as explained above) the breakage is not that bad.
2022-02-08 10:20:09 +02:00
Oran Agra
66be30f7fc
Handle key-spec flags with modules (#10237)
- add COMMAND GETKEYSANDFLAGS sub-command
- add RM_KeyAtPosWithFlags and GetCommandKeysWithFlags
- RM_KeyAtPos and RM_CreateCommand set flags requiring full access for keys
- RM_CreateCommand set VARIABLE_FLAGS
- expose `variable_flags` flag in COMMAND INFO key-specs
- getKeysFromCommandWithSpecs prefers key-specs over getkeys-api
- add tests for all of these
2022-02-08 10:01:35 +02:00
Binbin
7f4cca11dc
COMMAND DOCS avoid adding summary/since if they don't exist (#10252)
If summary or since is empty, we used to return NULL in
COMMAND DOCS. Currently all redis commands will have these
two fields.

But not for module command, summary and since are optional
for RM_SetCommandInfo. With the change in #10043, if a module
command doesn't have the summary or since, redis-cli will
crash (see #10250).

In this commit, COMMAND DOCS avoid adding summary or since
when they are missing.
2022-02-07 19:57:50 +02:00
yoav-steinberg
9dfeda58ed
acl check api for functions and eval (#10220)
Changes:
1. Adds the `redis.acl_check_cmd()` api to lua scripts. It can be used to check if the
  current user has permissions to execute a given command. The new function receives
  the command to check as an argument exactly like `redis.call()` receives the command
  to execute as an argument.
2. In the PR I unified the code used to convert lua arguments to redis argv arguments from
  both the new `redis.acl_check_cmd()` API and the `redis.[p]call()` API. This cleans up
  potential duplicate code.
3. While doing the refactoring in 2 I noticed there's an optimization to reduce allocation calls
  when parsing lua arguments into an `argv` array in the `redis.[p]call()` implementation.
  These optimizations were introduced years ago in 48c49c4851
  and 4f686555ce. It is unclear why this was added.
  The original commit message claims a 4% performance increase which I couldn't recreate
  and might not be worth it even if it did recreate. This PR removes that optimization.
  Following are details of the benchmark I did that couldn't reveal any performance
  improvements due to this optimization:

```
benchmark 1: src/redis-benchmark -P 500 -n 10000000 eval 'return redis.call("ping")' 0
benchmark 2: src/redis-benchmark -P 500 -r 1000 -n 1000000 eval 'return redis.call("mset","k1__rand_int__","v1__rand_int__","k2__rand_int__","v2__rand_int__","k3__rand_int__","v3__rand_int__","k4__rand_int__","v4__rand_int__")' 0
benchmark 3: src/redis-benchmark -P 500 -r 1000 -n 100000 eval "for i=1,100,1 do redis.call('set','kk'..i,'vv'..__rand_int__) end return redis.call('get','kk5')" 0
benchmark 4: src/redis-benchmark -P 500 -r 1000 -n 1000000 eval 'return redis.call("mset","k1__rand_int__","v1__rand_int__","k2__rand_int__","v2__rand_int__","k3__rand_int__","v3__rand_int__","k4__rand_int__","v4__rand_int__xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx")'
```
I ran the benchmark on this branch with and without commit 68b71680a4d3bb8f0509e06578a9f15d05b92a47
Results in requests per second:
cmd | without optimization | without optimization 2nd run | with original optimization | with original optimization 2nd run
-- | -- | -- | -- | --
1 | 461233.34 | 477395.31 | 471098.16 | 469946.91
2 | 34774.14 | 35469.8 | 35149.38 | 34464.93
3 | 6390.59 | 6281.41 | 6146.28 | 6464.12
4 | 28005.71 |   | 27965.77 |  

As you can see, different use cases showed identical or negligible performance differences.
So finally I decided to chuck the original optimization and simplify the code.
2022-02-07 08:04:01 +02:00
Oran Agra
98b3f52599
add test suite infra to test RESP3 attributes (#10247)
So far we only tested attributes using readraw, not the
resp parser caches them, so that after getting the reply,
you can query them if you want.
2022-02-07 00:10:05 +02:00
Wen Hui
6ebb679f06
Add tests for ACL command error cases (#10183) 2022-02-06 07:58:28 +02:00
Jason Elbaum
5b17909c4f
redis-cli generates command help tables from the results of COMMAND (#10043)
This is a followup to #9656 and implements the following step mentioned in that PR:

* When possible, extract all the help and completion tips from COMMAND DOCS (Redis 7.0 and up)
* If COMMAND DOCS fails, use the static help.h compiled into redis-cli.
* Supplement additional command names from COMMAND (pre-Redis 7.0)

The last step is needed to add module command and other non-standard commands.

This PR does not change the interactive hinting mechanism, which still uses only the param
strings to provide somewhat unreliable and inconsistent command hints (see #8084).
That task is left for a future PR. 

Co-authored-by: Oran Agra <oran@redislabs.com>
2022-02-05 16:54:16 +02:00
Viktor Söderqvist
0a82fe8447
Command info module API (#10108)
Adds RM_SetCommandInfo, allowing modules to provide the following command info:

* summary
* complexity
* since
* history
* hints
* arity
* key specs
* args

This information affects the output of `COMMAND`, `COMMAND INFO` and `COMMAND DOCS`,
Cluster, ACL and is used to filter commands with the wrong number of arguments before
the call reaches the module code.

The recently added API functions for key specs (never released) are removed.

A minimalist example would look like so:
```c
    RedisModuleCommand *mycmd = RedisModule_GetCommand(ctx,"mymodule.mycommand");
    RedisModuleCommandInfo mycmd_info = {
        .version = REDISMODULE_COMMAND_INFO_VERSION,
        .arity = -5,
        .summary = "some description",
    };
    if (RedisModule_SetCommandInfo(mycmd, &mycmd_info) == REDISMODULE_ERR)
        return REDISMODULE_ERR;
````

Notes:
* All the provided information (including strings) is copied, not keeping references to the API input data.
* The version field is actually a static struct that contains the sizes of the the structs used in arrays,
  so we can extend these in the future and old version will still be able to take the part they can support.
2022-02-04 21:09:36 +02:00
Binbin
d7fcb3c5a1
Fix SENTINEL SET config rewrite test (#10232)
Change the sentinel config file to a directory in SENTINEL SET test.
So it will now fail on the `rename` in `rewriteConfigOverwriteFile`.

The test used to set the sentinel config file permissions to `000` to
simulate failure. But it fails on centos7 / freebsd / alpine. (introduced in #10151)

Other changes:
1. More error messages after the config rewrite failure.
2. Modify arg name `force_all` in `rewriteConfig` to `force_write`. (was rename in #9304)
3. Fix a typo in debug quicklist-packed-threshold, then -> than. (#9357)
2022-02-04 11:39:51 +02:00
Binbin
d2fde2f655
Fix cluster tests failing due to subcommand names (#10231)
Introduced in #10128
2022-02-04 11:32:30 +02:00
Wen Hui
65ef543f8c
Sentinel: return an error if configuration save fails (#10151)
When performing `SENTINEL SET`, Sentinel updates the local configuration file. Before this commit, failure to update the file would still result with an `+OK` reply. Now, a `-ERR Failed to save config file` error will be returned.

Co-authored-by: Yossi Gottlieb <yossigo@gmail.com>
2022-02-03 13:20:35 +02:00
Vo Trong Phuc
53c43fcc84
Add check min-slave-* feature when evaluating Lua scripts and Functions (#10160)
Add check enough good slaves for write command when evaluating scripts.
This check is made before the script is executed, if we have function flags, and per redis command if we don't.

Co-authored-by: Phuc. Vo Trong <phucvt@vng.com.vn>
Co-authored-by: Oran Agra <oran@redislabs.com>
Co-authored-by: Meir Shpilraien (Spielrein) <meir@redis.com>
2022-02-03 11:57:51 +02:00