Commit Graph

1372 Commits

Author SHA1 Message Date
Yossi Gottlieb
5e3a15ae1b
Fix failing cluster tests. (#8763)
Disable replica migration to avoid a race condition where the
migrated-from node turns into a replica.

Long term, this test should probably be improved to handle multiple
slots and accept such auto migrations but this is a quick fix to
stabilize the CI without completely dropping this test.
2021-04-13 00:00:57 +03:00
Yang Bodong
4c14e8668c
Fix out of range confusing error messages (XAUTOCLAIM, RPOP count) (#8746)
Fix out of range error messages to be clearer (avoid mentioning 9223372036854775807)
* Fix XAUTOCLAIM COUNT option confusing error msg
* Fix other RPOP and alike error message to mention positive
2021-04-07 10:01:28 +03:00
Bonsai
808f3004f0
Fix: server will crash if rdbload or rdbsave method is not provided in module (#8670)
With this fix, module data type registration will fail if the load or save callbacks are not defined, or the optional aux load and save callbacks are not either both defined or both missing.
2021-04-06 12:09:36 +03:00
Yossi Gottlieb
4724dd439e
Clean up and stabilize cluster migration tests. (#8745)
This is work in progress, focusing on two main areas:
* Avoiding race conditions with cluster configuration propagation.
* Ignoring limitations with redis-cli --cluster fix which makes it hard
  to distinguish real errors (e.g. failure to fix) from expected
  conditions in this test (e.g. nodes not agreeing on configuration).
2021-04-06 11:57:57 +03:00
Huang Zhw
3b74b55084
Fix "default" and overwritten / reset users will not have pubsub channels permissions by default. (#8723)
Background:
Redis 6.2 added ACL control for pubsub channels (#7993), which were supposed
to be permissive by default to retain compatibility with redis 6.0 ACL. 
But due to a bug, only newly created users got this `acl-pubsub-default` applied,
while overwritten (updated) users got reset to `resetchannels` (denied).

Since the "default" user exists before loading the config file,
any ACL change to it, results in an update / overwrite.

So when a "default" user is loaded from config file or include ACL
file with no channels related rules, the user will not have any
permissions to any channels. But other users will have default
permissions to any channels.

When upgraded from 6.0 with config rewrite, this will lead to
"default" user channels permissions lost.
When users are loaded from include file, then call "acl load", users
will also lost channels permissions.

Similarly, the `reset` ACL rule, would have reset the user to be denied
access to any channels, ignoring `acl-pubsub-default` and breaking
compatibility with redis 6.0.

The implication of this fix is that it regains compatibility with redis 6.0,
but breaks compatibility with redis 6.2.0 and 2.0.1. e.g. after the upgrade,
the default user will regain access to pubsub channels.

Other changes:
Additionally this commit rename server.acl_pubusub_default to
server.acl_pubsub_default and fix typo in acl tests.
2021-04-05 23:13:20 +03:00
Sokolov Yura
1cab962098
Add cluster-allow-replica-migration option. (#5285)
Previously (and by default after commit) when master loose its last slot
(due to migration, for example), its replicas will migrate to new last slot
holder.

There are cases where this is not desired:
* Consolidation that results with removed nodes (including the replica, eventually).
* Manually configured cluster topologies, which the admin wishes to preserve.

Needlessly migrating a replica triggers a full synchronization and can have a negative impact, so
we prefer to be able to avoid it where possible.

This commit adds 'cluster-allow-replica-migration' configuration option that is
enabled by default to preserve existed behavior. When disabled, replicas will
not be auto-migrated.

Fixes #4896

Co-authored-by: Oran Agra <oran@redislabs.com>
2021-04-04 09:43:24 +03:00
Valentino Geron
44d8b039e8
Fix XAUTOCLAIM response to return the next available id as the cursor (#8725)
This command used to return the last scanned entry id as the cursor,
instead of the next one to be scanned.
so in the next call, the user could / should have sent `(cursor` and not
just `cursor` if he wanted to avoid scanning the same record twice.

Scanning the record twice would look odd if someone is checking what
exactly was scanned, but it also has a side effect of incrementing the
delivery count twice.
2021-04-01 12:13:55 +03:00
Oran Agra
370ab4c4db
Solve sentinel test issue in TLS due to recent tests change. (#8728)
5629dbe71 added a change that configures the tcp (plaintext) port
alongside the tls port, this causes the INFO command for tcp_port
to return that instead of the tls port when running in tls, and that broke
the sentinel tests that query it.

the fix is to add a method that gets the right port from CONFIG instead
of relying on the tcp_port info field.
2021-04-01 09:44:44 +03:00
guybe7
843f769b96
zsetAdd: Fix wrong reply in case of INCR and GT/LT (#8717)
If GT/LT fails the operation we need to reply with
nill (like failure due to NX).

Other changes:
Add the missing $encoding suffix to many zset tests

Note: there's a behavior change just in case of INCR + GT/LT that fails.
The old code was replying with the wrong (rejected) score, and now it'll reply with nil.

Note that that's anyway a corner case so this "behavior change" shouldn't have too much affect.
Using GT/LT with INCR has a predictable result even before we run the command
(INCR GT will only only / always fail if the increment is negative).
2021-04-01 09:33:53 +03:00
sundb
569a3f4548
Use chi-square for random distributivity verification in test (#8709)
Problem:
Currently, when performing random distribution verification, we determine
the probability of each element occurring in the sum, but the probability is
only an estimate, these tests had rare sporadic failures, and we cannot verify
what the probability of failure will be.

Solution:
Using the chi-square distribution instead of the original random distribution
validation makes the test more reasonable and easier to find problems.
2021-04-01 08:20:15 +03:00
Jérôme Loyet
91f4f41665
Add replica-announced config option (#8653)
The 'sentinel replicas <master>' command will ignore replicas with
`replica-announced` set to no.

The goal of disabling the config setting replica-announced is to allow ghost
replicas. The replica is in the cluster, synchronize with its master, can be
promoted to master and is not exposed to sentinel clients. This way, it is
acting as a live backup or living ghost.

In addition, to prevent the replica to be promoted as master, set
replica-priority to 0.
2021-03-30 23:40:22 +03:00
Yossi Gottlieb
6a052af890
Cluster migration test cleanup. (#8726)
* Dump more output on error (always, cluster tests currently have no
verbose flag).
* Slow down redis-cli check iteration.
2021-03-30 23:33:01 +03:00
Viktor Söderqvist
5629dbe715
Add support for plaintext clients in TLS cluster (#8587)
The cluster bus is established over TLS or non-TLS depending on the configuration tls-cluster. The client ports distributed in the cluster and sent to clients are assumed to be TLS or non-TLS also depending on tls-cluster.

The cluster bus is now extended to also contain the non-TLS port of clients in a TLS cluster, when available. The non-TLS port of a cluster node, when available, is sent to clients connected without TLS in responses to CLUSTER SLOTS, CLUSTER NODES, CLUSTER SLAVES and MOVED and ASK redirects, instead of the TLS port.

The user was able to override the client port by defining cluster-announce-port. Now cluster-announce-tls-port is added, so the user can define an alternative announce port for both TLS and non-TLS clients.

Fixes #8134
2021-03-30 23:11:32 +03:00
JunhuaY
28375ff63e
re-fix config rewrite for empty save directive (#8722)
the bug was also discussed in #8716, and was solved in #8719, but incompletely:
when the server is started, and the save option is default, if you issue the " config set save "" "
to change the save option, and then issue the “config rewrite” command, the " save "" " won't be saved.
2021-03-30 22:49:06 +03:00
Oran Agra
cd81dcf18b
solve race conditions in psync2-pingoff test (#8720)
Another test race condition in the macos tests.
the test was waiting for PINGs to be generated and put on the replication stream,
but waiting for 1 or 2 seconds doesn't really guarantee that.
then the test that expected 6 full syncs, found only 4
2021-03-30 11:41:06 +03:00
Yossi Gottlieb
65311a3360
Fix config rewrite with an empty "save" parameter. (#8719) 2021-03-29 18:53:20 +03:00
Sokolov Yura
315df9ada0
Add cluster slot migration tests (#8649)
Add tests for fixing migrating slot at all stages:

1. when migration is half inited on "migrating" node
2. when migration is half inited on "importing" node
3. migration inited, but not finished
4. migration is half finished on "migrating" node
5. migration is half finished on "importing" node

Also add tests for many simultaneous slot migrations.

Co-authored-by: Yossi Gottlieb <yossigo@gmail.com>
2021-03-29 13:52:02 +03:00
Meir Shpilraien (Spielrein)
036963a7da
Restore old client 'processCommandAndResetClient' to fix false dead client indicator (#8715)
'processCommandAndResetClient' returns 1 if client is dead. It does it
by checking if serve.current_client is NULL. On script timeout, Redis will re-enter
'processCommandAndResetClient' and when finish we will set server.current_client
to NULL. This will cause later to falsely return 1 and think that the client that
sent the timed-out script is dead (Redis to stop reading from the client buffer).
2021-03-29 13:34:16 +03:00
Huang Zhw
e138698e54
make processCommand check publish channel permissions. (#8534)
Add publish channel permissions check in processCommand.

processCommand didn't check publish channel permissions, so we can
queue a publish command in a transaction. But when exec the transaction,
it will fail with -NOPERM.

We also union keys/commands/channels permissions check togegher in
ACLCheckAllPerm. Remove pubsubCheckACLPermissionsOrReply in 
publishCommand/subscribeCommand/psubscribeCommand. Always 
check permissions in processCommand/execCommand/
luaRedisGenericCommand.
2021-03-26 14:10:01 +03:00
Oran Agra
497351ad07
Fix SLOWLOG for blocked commands (#8632)
* SLOWLOG didn't record anything for blocked commands because the client
  was reset and argv was already empty. there was a fix for this issue
  specifically for modules, now it works for all blocked clients.
* The original command argv (before being re-written) was also reset
  before adding the slowlog on behalf of the blocked command.
* Latency monitor is now updated regardless of the slowlog flags of the
  command or its execution (their purpose is to hide sensitive info from
  the slowlog, not hide the fact the latency happened).
* Latency monitor now uses real_cmd rather than c->cmd (which may be
  different if the command got re-written, e.g. GEOADD)

Changes:
* Unify shared code between slowlog insertion in call() and
  updateStatsOnUnblock(), hopefully prevent future bugs from happening
  due to the later being overlooked.
* Reset CLIENT_PREVENT_LOGGING in resetClient rather than after command
  processing.
* Add a test for SLOWLOG and BLPOP

Notes:
- real_cmd == c->lastcmd, except inside MULTI and Lua.
- blocked commands never happen in these cases (MULTI / Lua)
- real_cmd == c->cmd, except for when the command is rewritten (e.g.
  GEOADD)
- blocked commands (currently) are never rewritten
- other than the command's CLIENT_PREVENT_LOGGING, and the
  execution flag CLIENT_PREVENT_LOGGING, other cases that we want to
  avoid slowlog are on AOF loading (specifically CMD_CALL_SLOWLOG will
  be off when executed from execCommand that runs from an AOF)
2021-03-25 10:20:27 +02:00
Qu Chen
7de6451818
Properly initialize variable to make valgrind happy in checkChildrenDone(). Removed usage for the obsolete wait3() and wait4() in favor of waitpid(), and properly check for the exit status code. (#8666) 2021-03-24 08:41:05 -07:00
yoav-steinberg
3060de88ce
Remove cron saving during BGSAVE test. (#8688)
This fixes a race where a bgsave can start during the test after we verified no bgsave is running.
2021-03-24 15:14:47 +02:00
Oran Agra
f6e1a94e03
Corrupt stream key access to uninitialized memory (#8681)
the corrupt-dump-fuzzer test found a case where an access to a corrupt
stream would have caused accessing to uninitialized memory.
now it'll panic instead.

The issue was that there was a stream that says it has more than 0
records, but looking for the max ID came back empty handed.

p.s. when sanitize-dump-payload is used, this corruption is detected,
and the RESTORE command is gracefully rejected.
2021-03-24 11:33:49 +02:00
Yossi Gottlieb
c4ef1efdb7
Add support for reading encrypted keyfiles. (#8644) 2021-03-22 13:27:46 +02:00
Oran Agra
2f717c156a
fix race in diskless load cluster tests (#8674) 2021-03-22 10:51:13 +02:00
Oran Agra
a7c02b19bf
Fix race in replication test (#8679)
Since redis 6.2, redis immediately tries to connect to the master, not
waiting for replication cron.

in the slow freebsd CI, this test failed and master_link_status was
already "up" when INFO was called.
2021-03-22 10:50:39 +02:00
Meir Shpilraien (Spielrein)
9ae4f5c73d
Fix script kill to work also on scripts that use pcall (#8661)
pcall function runs another LUA function in protected mode, this means
that any error will be caught by this function and will not stop the LUA
execution. The script kill mechanism uses error to stop the running script.
Scripts that uses pcall can catch the error raise by the script kill mechanism,
this will cause a script like this to be unkillable:

local f = function()
        while 1 do
                redis.call('ping')
        end
end
while 1 do
        pcall(f)
end

The fix is, when we want to kill the script, we set the hook function to be invoked 
after each line. This will promise that the execution will get another
error before it is able to enter the pcall function again.
2021-03-17 18:52:11 +02:00
Huang Zhw
a19c4058be
When tests exit normally, some processes may still be alive (#8647)
In certain scenario start_server may think it failed to start a redis server
although it started successfully. in these cases, it'll not terminate it, and
it'll remain running when the test is over.

In start_server if config doesn't have bind (the minimal.conf in introspection.tcl),
it will try to bind ipv4 and ipv6. One may success while other fails. It will
output "Could not create server TCP listening socket".
wait_server_started uses this message to check whether instance started
successfully. So it will consider that it failed even though redis started successfully.

Additionally, in some cases it wasn't clear to users why the server exited,
since the warning message printed to the log, could in some cases be harmless,
and in some cases fatal.

This PR adds makes a clear distinction between a warning log message and
a fatal one, and changes the test suite to look for the fatal message.
2021-03-16 17:25:30 +02:00
Madelyn Olson
e1d98bca5a
Redact slowlog entries for config with sensitive data. (#8584)
Redact config set requirepass/masterauth/masteruser from slowlog in addition to showing ACL commands without sensitive values.
2021-03-15 22:00:29 -07:00
guybe7
dba33a943d
Missing EXEC on modules propagation after failed EVAL execution (#8654)
1. moduleReplicateMultiIfNeeded should use server.in_eval like
   moduleHandlePropagationAfterCommandCallback
2. server.in_eval could have been set to 1 and not reset back
   to 0 (a lot of missed early-exits after in_eval is already 1)

Note: The new assertions in processCommand cover (2) and I added
two module tests to cover (1)

Implications:
If an EVAL that failed (and thus left server.in_eval=1) runs before a module
command that replicates, the replication stream will contain MULTI (because
moduleReplicateMultiIfNeeded used to check server.lua_caller which is NULL
at this point) but not EXEC (because server.in_eval==1)
This only affects modules as module.c the only user of server.in_eval.

Affects versions 6.2.0, 6.2.1
2021-03-15 21:19:57 +02:00
KinWaiYuen
5b48d90049
Optimize CLUSTER SLOTS reply by reducing unneeded loops (#8541)
This commit more efficiently computes the cluster bulk slots response 
by looping over the entire slot space once, instead of for each node.
2021-03-11 22:40:35 -08:00
guybe7
a4f03bd7eb
Fix some memory leaks in propagagte.c (#8636)
Introduced by 3d0b427c30
2021-03-11 13:50:13 +02:00
Harkrishn Patro
b70d81f60b
Process hello command even if the default user has no permissions. (#8633)
Co-authored-by: Harkrishn Patro <harkrisp@amazon.com>
2021-03-10 21:19:35 -08:00
guybe7
3d0b427c30
Fix some issues with modules and MULTI/EXEC (#8617)
Bug 1:
When a module ctx is freed moduleHandlePropagationAfterCommandCallback
is called and handles propagation. We want to prevent it from propagating
commands that were not replicated by the same context. Example:
1. module1.foo does: RM_Replicate(cmd1); RM_Call(cmd2); RM_Replicate(cmd3)
2. RM_Replicate(cmd1) propagates MULTI and adds cmd1 to also_propagagte
3. RM_Call(cmd2) create a new ctx, calls call() and destroys the ctx.
4. moduleHandlePropagationAfterCommandCallback is called, calling
   alsoPropagates EXEC (Note: EXEC is still not written to socket),
   setting server.in_trnsaction = 0
5. RM_Replicate(cmd3) is called, propagagting yet another MULTI (now
   we have nested MULTI calls, which is no good) and then cmd3

We must prevent RM_Call(cmd2) from resetting server.in_transaction.
REDISMODULE_CTX_MULTI_EMITTED was revived for that purpose.

Bug 2:
Fix issues with nested RM_Call where some have '!' and some don't.
Example:
1. module1.foo does RM_Call of module2.bar without replication (i.e. no '!')
2. module2.bar internally calls RM_Call of INCR with '!'
3. at the end of module1.foo we call RM_ReplicateVerbatim

We want the replica/AOF to see only module1.foo and not the INCR from module2.bar

Introduced a global replication_allowed flag inside RM_Call to determine
whether we need to replicate or not (even if '!' was specified)

Other changes:
Split beforePropagateMultiOrExec to beforePropagateMulti afterPropagateExec
just for better readability
2021-03-10 18:02:17 +02:00
Yossi Gottlieb
817894c012
Fix test false positive due to a race condition. (#8616) 2021-03-08 21:22:08 +02:00
Yossi Gottlieb
7d81f39222
Fix flaky unit/maxmemory test on MacOS/BSD. (#8619)
It seems like non-Linux sockets may be less greedy, resulting with more
transient client output buffers.

Haven't proven this but empirically when stressing this test on
non-Linux tends to exhibit increased mem_clients_normal values.
2021-03-08 20:53:53 +02:00
Yossi Gottlieb
3c7d6a1853
Improve redis-cli non-binary safe string handling. (#8566)
* The `redis-cli --scan` output should honor output mode (set explicitly or implicitly), and quote key names when not in raw mode.
  * Technically this is a breaking change, but it should be very minor since raw mode is by default on for non-tty output.
  * It should only affect  TTY output (human users) or non-tty output if `--no-raw` is specified.

* Added `--quoted-input` option to treat all arguments as potentially quoted strings.
* Added `--quoted-pattern` option to accept a potentially quoted pattern.

Unquoting is applied to potentially quoted input only if single or double quotes are used. 

Fixes #8561, #8563
2021-03-04 15:03:49 +02:00
Yossi Gottlieb
5d180d2834
Fix potential replication-4 test race condition. (#8583)
Co-authored-by: Oran Agra <oran@redislabs.com>
2021-03-02 18:12:11 +02:00
YaacovHazan
c19530bc71
fix new networking tests to work when the test suite is used in tls mode (#8582)
the tests were unable to connect to the server since the attempted to use normal tcp
2021-03-01 20:53:02 +02:00
Oran Agra
349ef3f6a0
fix stream deep sanitization with deleted records (#8568)
When sanitizing the stream listpack, we need to count the deleted records too.
otherwise the last line that checks the next pointer fails.

Add test to cover that state in the stream tests.
2021-03-01 17:23:29 +02:00
YaacovHazan
a031d268b1
Make port, tls-port and bind configurations modifiable (#8510)
Add ability to modify port, tls-port and bind configurations by CONFIG SET command.

To simplify the code and make it cleaner, a new structure
added, socketFds, which contains the file descriptors array and its counter,
and used for TCP, TLS and Cluster sockets file descriptors.
2021-03-01 16:04:44 +02:00
Bonsai
81a55d026f
fix: call CLIENT INFO from redis module will crash the server (#8560)
Because when the RM_Call is invoked. It will create a faker client.
The point is client connection is NULL, so server will crash in connGetInfo

Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2021-03-01 08:18:14 +02:00
Viktor Söderqvist
6122f1c450
Shared reusable client for RM_Call() (#8516)
A single client pointer is added in the server struct. This is
initialized by the first RM_Call() and reused for every subsequent
RM_Call() except if it's already in use, which means that it's not
used for (recursive) module calls to modules. For these, a new
"fake" client is created each time.

Other changes:
* Avoid allocating a dict iterator in pubsubUnsubscribeAllChannels
  when not needed
2021-02-28 14:11:18 +02:00
Madelyn Olson
c6f0ea2c81
Allow stopped redis processes to be killed in tests (#8552) 2021-02-24 14:26:16 -08:00
sundb
60d5ef4d82
Use addReplyErrorObject with shared.noscripterr (#8544) 2021-02-24 08:45:13 -08:00
guybe7
f745c0181a
Fix race in CONFIG REWRITE sanity (#8536)
server may still be LOADING the RDB when receiving the ping
2021-02-23 20:28:03 +02:00
Yossi Gottlieb
95ea74549c
Fix failed tests on Linux Alpine and add a CI job. (#8532)
* Remove linux/version.h dependency.

This introduces unnecessary dependencies, and generally not a good idea
as the platform we build on may be different than the platform we run
on.

To determine if sync_file_range exists we can simply rely on header file
hints.

* Fix setproctitle() on libmusl.

The previous ifdef checks were a bit too strict for no apparent
reason.

* Fix tests failure on Linux with no backtrace.

* Add alpine daily CI job.
2021-02-23 12:57:45 +02:00
Harkrishn Patro
4739131ca6
Remove acl subcommand validation if fully added command exists. (#8483)
This validation was only done for sub-commands and not for commands.

These would have been valid (not produce any error)
ACL SETUSER bob +@all +client
ACL SETUSER bob +client +client

so no reason for this one to fail:
ACL SETUSER bob +client +client|id

One example why this is needed is that pfdebug wasn't part of the @hyperloglog
group and now it is. so something like:
acl setuser user1 +@hyperloglog +pfdebug|test
would have succeeded in early 6.0.x, and fail in 6.2 RC3

Co-authored-by: Harkrishn Patro <harkrisp@amazon.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Oran Agra <oran@redislabs.com>
2021-02-22 15:22:25 +02:00
Huang Zw
f687ac0c32
Client tracking tracking-redir-broken push len is 2 not 3 (#8456)
When redis responds with tracking-redir-broken push message (RESP3),
it was responding with a broken protocol: an array of 3 elements, but only
pushes 2 elements.

Some bugs in the test make this pass. Read the push reply
will consume an extra reply, because the reply length is 3, but there
are only two elements, so the next reply will be treated as third
element. So the test is corrected too.

Other changes:
* checkPrefixCollisionsOrReply success should return 1 instead of -1,
  this bug didn't have any implications.
* improve client tracking tests to validate more of the response it reads.
2021-02-21 09:34:46 +02:00
Gnanesh
0772098b1b
EXPIRE, EXPIREAT, SETEX, GETEX: Return error when expire time overflows (#8287)
Respond with error if expire time overflows from positive to negative of vice versa.

* `SETEX`, `SET EX`, `GETEX` etc would have already error on negative value,
but now they would also error on overflows (i.e. when the input was positive but
after the manipulation it becomes negative, which would have passed before)
* `EXPIRE` and `EXPIREAT` was ok taking negative values (would implicitly delete
the key), we keep that, but we do error if the user provided a value that changes
sign when manipulated (except the case of changing sign when `basetime` is added)

Signed-off-by: Gnanesh <gnaneshkunal@outlook.com>
Co-authored-by: Oran Agra <oran@redislabs.com>
2021-02-21 09:09:54 +02:00