Commit Graph

1363 Commits

Author SHA1 Message Date
sundb
569a3f4548
Use chi-square for random distributivity verification in test (#8709)
Problem:
Currently, when performing random distribution verification, we determine
the probability of each element occurring in the sum, but the probability is
only an estimate, these tests had rare sporadic failures, and we cannot verify
what the probability of failure will be.

Solution:
Using the chi-square distribution instead of the original random distribution
validation makes the test more reasonable and easier to find problems.
2021-04-01 08:20:15 +03:00
Jérôme Loyet
91f4f41665
Add replica-announced config option (#8653)
The 'sentinel replicas <master>' command will ignore replicas with
`replica-announced` set to no.

The goal of disabling the config setting replica-announced is to allow ghost
replicas. The replica is in the cluster, synchronize with its master, can be
promoted to master and is not exposed to sentinel clients. This way, it is
acting as a live backup or living ghost.

In addition, to prevent the replica to be promoted as master, set
replica-priority to 0.
2021-03-30 23:40:22 +03:00
Yossi Gottlieb
6a052af890
Cluster migration test cleanup. (#8726)
* Dump more output on error (always, cluster tests currently have no
verbose flag).
* Slow down redis-cli check iteration.
2021-03-30 23:33:01 +03:00
Viktor Söderqvist
5629dbe715
Add support for plaintext clients in TLS cluster (#8587)
The cluster bus is established over TLS or non-TLS depending on the configuration tls-cluster. The client ports distributed in the cluster and sent to clients are assumed to be TLS or non-TLS also depending on tls-cluster.

The cluster bus is now extended to also contain the non-TLS port of clients in a TLS cluster, when available. The non-TLS port of a cluster node, when available, is sent to clients connected without TLS in responses to CLUSTER SLOTS, CLUSTER NODES, CLUSTER SLAVES and MOVED and ASK redirects, instead of the TLS port.

The user was able to override the client port by defining cluster-announce-port. Now cluster-announce-tls-port is added, so the user can define an alternative announce port for both TLS and non-TLS clients.

Fixes #8134
2021-03-30 23:11:32 +03:00
JunhuaY
28375ff63e
re-fix config rewrite for empty save directive (#8722)
the bug was also discussed in #8716, and was solved in #8719, but incompletely:
when the server is started, and the save option is default, if you issue the " config set save "" "
to change the save option, and then issue the “config rewrite” command, the " save "" " won't be saved.
2021-03-30 22:49:06 +03:00
Oran Agra
cd81dcf18b
solve race conditions in psync2-pingoff test (#8720)
Another test race condition in the macos tests.
the test was waiting for PINGs to be generated and put on the replication stream,
but waiting for 1 or 2 seconds doesn't really guarantee that.
then the test that expected 6 full syncs, found only 4
2021-03-30 11:41:06 +03:00
Yossi Gottlieb
65311a3360
Fix config rewrite with an empty "save" parameter. (#8719) 2021-03-29 18:53:20 +03:00
Sokolov Yura
315df9ada0
Add cluster slot migration tests (#8649)
Add tests for fixing migrating slot at all stages:

1. when migration is half inited on "migrating" node
2. when migration is half inited on "importing" node
3. migration inited, but not finished
4. migration is half finished on "migrating" node
5. migration is half finished on "importing" node

Also add tests for many simultaneous slot migrations.

Co-authored-by: Yossi Gottlieb <yossigo@gmail.com>
2021-03-29 13:52:02 +03:00
Meir Shpilraien (Spielrein)
036963a7da
Restore old client 'processCommandAndResetClient' to fix false dead client indicator (#8715)
'processCommandAndResetClient' returns 1 if client is dead. It does it
by checking if serve.current_client is NULL. On script timeout, Redis will re-enter
'processCommandAndResetClient' and when finish we will set server.current_client
to NULL. This will cause later to falsely return 1 and think that the client that
sent the timed-out script is dead (Redis to stop reading from the client buffer).
2021-03-29 13:34:16 +03:00
Huang Zhw
e138698e54
make processCommand check publish channel permissions. (#8534)
Add publish channel permissions check in processCommand.

processCommand didn't check publish channel permissions, so we can
queue a publish command in a transaction. But when exec the transaction,
it will fail with -NOPERM.

We also union keys/commands/channels permissions check togegher in
ACLCheckAllPerm. Remove pubsubCheckACLPermissionsOrReply in 
publishCommand/subscribeCommand/psubscribeCommand. Always 
check permissions in processCommand/execCommand/
luaRedisGenericCommand.
2021-03-26 14:10:01 +03:00
Oran Agra
497351ad07
Fix SLOWLOG for blocked commands (#8632)
* SLOWLOG didn't record anything for blocked commands because the client
  was reset and argv was already empty. there was a fix for this issue
  specifically for modules, now it works for all blocked clients.
* The original command argv (before being re-written) was also reset
  before adding the slowlog on behalf of the blocked command.
* Latency monitor is now updated regardless of the slowlog flags of the
  command or its execution (their purpose is to hide sensitive info from
  the slowlog, not hide the fact the latency happened).
* Latency monitor now uses real_cmd rather than c->cmd (which may be
  different if the command got re-written, e.g. GEOADD)

Changes:
* Unify shared code between slowlog insertion in call() and
  updateStatsOnUnblock(), hopefully prevent future bugs from happening
  due to the later being overlooked.
* Reset CLIENT_PREVENT_LOGGING in resetClient rather than after command
  processing.
* Add a test for SLOWLOG and BLPOP

Notes:
- real_cmd == c->lastcmd, except inside MULTI and Lua.
- blocked commands never happen in these cases (MULTI / Lua)
- real_cmd == c->cmd, except for when the command is rewritten (e.g.
  GEOADD)
- blocked commands (currently) are never rewritten
- other than the command's CLIENT_PREVENT_LOGGING, and the
  execution flag CLIENT_PREVENT_LOGGING, other cases that we want to
  avoid slowlog are on AOF loading (specifically CMD_CALL_SLOWLOG will
  be off when executed from execCommand that runs from an AOF)
2021-03-25 10:20:27 +02:00
Qu Chen
7de6451818
Properly initialize variable to make valgrind happy in checkChildrenDone(). Removed usage for the obsolete wait3() and wait4() in favor of waitpid(), and properly check for the exit status code. (#8666) 2021-03-24 08:41:05 -07:00
yoav-steinberg
3060de88ce
Remove cron saving during BGSAVE test. (#8688)
This fixes a race where a bgsave can start during the test after we verified no bgsave is running.
2021-03-24 15:14:47 +02:00
Oran Agra
f6e1a94e03
Corrupt stream key access to uninitialized memory (#8681)
the corrupt-dump-fuzzer test found a case where an access to a corrupt
stream would have caused accessing to uninitialized memory.
now it'll panic instead.

The issue was that there was a stream that says it has more than 0
records, but looking for the max ID came back empty handed.

p.s. when sanitize-dump-payload is used, this corruption is detected,
and the RESTORE command is gracefully rejected.
2021-03-24 11:33:49 +02:00
Yossi Gottlieb
c4ef1efdb7
Add support for reading encrypted keyfiles. (#8644) 2021-03-22 13:27:46 +02:00
Oran Agra
2f717c156a
fix race in diskless load cluster tests (#8674) 2021-03-22 10:51:13 +02:00
Oran Agra
a7c02b19bf
Fix race in replication test (#8679)
Since redis 6.2, redis immediately tries to connect to the master, not
waiting for replication cron.

in the slow freebsd CI, this test failed and master_link_status was
already "up" when INFO was called.
2021-03-22 10:50:39 +02:00
Meir Shpilraien (Spielrein)
9ae4f5c73d
Fix script kill to work also on scripts that use pcall (#8661)
pcall function runs another LUA function in protected mode, this means
that any error will be caught by this function and will not stop the LUA
execution. The script kill mechanism uses error to stop the running script.
Scripts that uses pcall can catch the error raise by the script kill mechanism,
this will cause a script like this to be unkillable:

local f = function()
        while 1 do
                redis.call('ping')
        end
end
while 1 do
        pcall(f)
end

The fix is, when we want to kill the script, we set the hook function to be invoked 
after each line. This will promise that the execution will get another
error before it is able to enter the pcall function again.
2021-03-17 18:52:11 +02:00
Huang Zhw
a19c4058be
When tests exit normally, some processes may still be alive (#8647)
In certain scenario start_server may think it failed to start a redis server
although it started successfully. in these cases, it'll not terminate it, and
it'll remain running when the test is over.

In start_server if config doesn't have bind (the minimal.conf in introspection.tcl),
it will try to bind ipv4 and ipv6. One may success while other fails. It will
output "Could not create server TCP listening socket".
wait_server_started uses this message to check whether instance started
successfully. So it will consider that it failed even though redis started successfully.

Additionally, in some cases it wasn't clear to users why the server exited,
since the warning message printed to the log, could in some cases be harmless,
and in some cases fatal.

This PR adds makes a clear distinction between a warning log message and
a fatal one, and changes the test suite to look for the fatal message.
2021-03-16 17:25:30 +02:00
Madelyn Olson
e1d98bca5a
Redact slowlog entries for config with sensitive data. (#8584)
Redact config set requirepass/masterauth/masteruser from slowlog in addition to showing ACL commands without sensitive values.
2021-03-15 22:00:29 -07:00
guybe7
dba33a943d
Missing EXEC on modules propagation after failed EVAL execution (#8654)
1. moduleReplicateMultiIfNeeded should use server.in_eval like
   moduleHandlePropagationAfterCommandCallback
2. server.in_eval could have been set to 1 and not reset back
   to 0 (a lot of missed early-exits after in_eval is already 1)

Note: The new assertions in processCommand cover (2) and I added
two module tests to cover (1)

Implications:
If an EVAL that failed (and thus left server.in_eval=1) runs before a module
command that replicates, the replication stream will contain MULTI (because
moduleReplicateMultiIfNeeded used to check server.lua_caller which is NULL
at this point) but not EXEC (because server.in_eval==1)
This only affects modules as module.c the only user of server.in_eval.

Affects versions 6.2.0, 6.2.1
2021-03-15 21:19:57 +02:00
KinWaiYuen
5b48d90049
Optimize CLUSTER SLOTS reply by reducing unneeded loops (#8541)
This commit more efficiently computes the cluster bulk slots response 
by looping over the entire slot space once, instead of for each node.
2021-03-11 22:40:35 -08:00
guybe7
a4f03bd7eb
Fix some memory leaks in propagagte.c (#8636)
Introduced by 3d0b427c30
2021-03-11 13:50:13 +02:00
Harkrishn Patro
b70d81f60b
Process hello command even if the default user has no permissions. (#8633)
Co-authored-by: Harkrishn Patro <harkrisp@amazon.com>
2021-03-10 21:19:35 -08:00
guybe7
3d0b427c30
Fix some issues with modules and MULTI/EXEC (#8617)
Bug 1:
When a module ctx is freed moduleHandlePropagationAfterCommandCallback
is called and handles propagation. We want to prevent it from propagating
commands that were not replicated by the same context. Example:
1. module1.foo does: RM_Replicate(cmd1); RM_Call(cmd2); RM_Replicate(cmd3)
2. RM_Replicate(cmd1) propagates MULTI and adds cmd1 to also_propagagte
3. RM_Call(cmd2) create a new ctx, calls call() and destroys the ctx.
4. moduleHandlePropagationAfterCommandCallback is called, calling
   alsoPropagates EXEC (Note: EXEC is still not written to socket),
   setting server.in_trnsaction = 0
5. RM_Replicate(cmd3) is called, propagagting yet another MULTI (now
   we have nested MULTI calls, which is no good) and then cmd3

We must prevent RM_Call(cmd2) from resetting server.in_transaction.
REDISMODULE_CTX_MULTI_EMITTED was revived for that purpose.

Bug 2:
Fix issues with nested RM_Call where some have '!' and some don't.
Example:
1. module1.foo does RM_Call of module2.bar without replication (i.e. no '!')
2. module2.bar internally calls RM_Call of INCR with '!'
3. at the end of module1.foo we call RM_ReplicateVerbatim

We want the replica/AOF to see only module1.foo and not the INCR from module2.bar

Introduced a global replication_allowed flag inside RM_Call to determine
whether we need to replicate or not (even if '!' was specified)

Other changes:
Split beforePropagateMultiOrExec to beforePropagateMulti afterPropagateExec
just for better readability
2021-03-10 18:02:17 +02:00
Yossi Gottlieb
817894c012
Fix test false positive due to a race condition. (#8616) 2021-03-08 21:22:08 +02:00
Yossi Gottlieb
7d81f39222
Fix flaky unit/maxmemory test on MacOS/BSD. (#8619)
It seems like non-Linux sockets may be less greedy, resulting with more
transient client output buffers.

Haven't proven this but empirically when stressing this test on
non-Linux tends to exhibit increased mem_clients_normal values.
2021-03-08 20:53:53 +02:00
Yossi Gottlieb
3c7d6a1853
Improve redis-cli non-binary safe string handling. (#8566)
* The `redis-cli --scan` output should honor output mode (set explicitly or implicitly), and quote key names when not in raw mode.
  * Technically this is a breaking change, but it should be very minor since raw mode is by default on for non-tty output.
  * It should only affect  TTY output (human users) or non-tty output if `--no-raw` is specified.

* Added `--quoted-input` option to treat all arguments as potentially quoted strings.
* Added `--quoted-pattern` option to accept a potentially quoted pattern.

Unquoting is applied to potentially quoted input only if single or double quotes are used. 

Fixes #8561, #8563
2021-03-04 15:03:49 +02:00
Yossi Gottlieb
5d180d2834
Fix potential replication-4 test race condition. (#8583)
Co-authored-by: Oran Agra <oran@redislabs.com>
2021-03-02 18:12:11 +02:00
YaacovHazan
c19530bc71
fix new networking tests to work when the test suite is used in tls mode (#8582)
the tests were unable to connect to the server since the attempted to use normal tcp
2021-03-01 20:53:02 +02:00
Oran Agra
349ef3f6a0
fix stream deep sanitization with deleted records (#8568)
When sanitizing the stream listpack, we need to count the deleted records too.
otherwise the last line that checks the next pointer fails.

Add test to cover that state in the stream tests.
2021-03-01 17:23:29 +02:00
YaacovHazan
a031d268b1
Make port, tls-port and bind configurations modifiable (#8510)
Add ability to modify port, tls-port and bind configurations by CONFIG SET command.

To simplify the code and make it cleaner, a new structure
added, socketFds, which contains the file descriptors array and its counter,
and used for TCP, TLS and Cluster sockets file descriptors.
2021-03-01 16:04:44 +02:00
Bonsai
81a55d026f
fix: call CLIENT INFO from redis module will crash the server (#8560)
Because when the RM_Call is invoked. It will create a faker client.
The point is client connection is NULL, so server will crash in connGetInfo

Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2021-03-01 08:18:14 +02:00
Viktor Söderqvist
6122f1c450
Shared reusable client for RM_Call() (#8516)
A single client pointer is added in the server struct. This is
initialized by the first RM_Call() and reused for every subsequent
RM_Call() except if it's already in use, which means that it's not
used for (recursive) module calls to modules. For these, a new
"fake" client is created each time.

Other changes:
* Avoid allocating a dict iterator in pubsubUnsubscribeAllChannels
  when not needed
2021-02-28 14:11:18 +02:00
Madelyn Olson
c6f0ea2c81
Allow stopped redis processes to be killed in tests (#8552) 2021-02-24 14:26:16 -08:00
sundb
60d5ef4d82
Use addReplyErrorObject with shared.noscripterr (#8544) 2021-02-24 08:45:13 -08:00
guybe7
f745c0181a
Fix race in CONFIG REWRITE sanity (#8536)
server may still be LOADING the RDB when receiving the ping
2021-02-23 20:28:03 +02:00
Yossi Gottlieb
95ea74549c
Fix failed tests on Linux Alpine and add a CI job. (#8532)
* Remove linux/version.h dependency.

This introduces unnecessary dependencies, and generally not a good idea
as the platform we build on may be different than the platform we run
on.

To determine if sync_file_range exists we can simply rely on header file
hints.

* Fix setproctitle() on libmusl.

The previous ifdef checks were a bit too strict for no apparent
reason.

* Fix tests failure on Linux with no backtrace.

* Add alpine daily CI job.
2021-02-23 12:57:45 +02:00
Harkrishn Patro
4739131ca6
Remove acl subcommand validation if fully added command exists. (#8483)
This validation was only done for sub-commands and not for commands.

These would have been valid (not produce any error)
ACL SETUSER bob +@all +client
ACL SETUSER bob +client +client

so no reason for this one to fail:
ACL SETUSER bob +client +client|id

One example why this is needed is that pfdebug wasn't part of the @hyperloglog
group and now it is. so something like:
acl setuser user1 +@hyperloglog +pfdebug|test
would have succeeded in early 6.0.x, and fail in 6.2 RC3

Co-authored-by: Harkrishn Patro <harkrisp@amazon.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Oran Agra <oran@redislabs.com>
2021-02-22 15:22:25 +02:00
Huang Zw
f687ac0c32
Client tracking tracking-redir-broken push len is 2 not 3 (#8456)
When redis responds with tracking-redir-broken push message (RESP3),
it was responding with a broken protocol: an array of 3 elements, but only
pushes 2 elements.

Some bugs in the test make this pass. Read the push reply
will consume an extra reply, because the reply length is 3, but there
are only two elements, so the next reply will be treated as third
element. So the test is corrected too.

Other changes:
* checkPrefixCollisionsOrReply success should return 1 instead of -1,
  this bug didn't have any implications.
* improve client tracking tests to validate more of the response it reads.
2021-02-21 09:34:46 +02:00
Gnanesh
0772098b1b
EXPIRE, EXPIREAT, SETEX, GETEX: Return error when expire time overflows (#8287)
Respond with error if expire time overflows from positive to negative of vice versa.

* `SETEX`, `SET EX`, `GETEX` etc would have already error on negative value,
but now they would also error on overflows (i.e. when the input was positive but
after the manipulation it becomes negative, which would have passed before)
* `EXPIRE` and `EXPIREAT` was ok taking negative values (would implicitly delete
the key), we keep that, but we do error if the user provided a value that changes
sign when manipulated (except the case of changing sign when `basetime` is added)

Signed-off-by: Gnanesh <gnaneshkunal@outlook.com>
Co-authored-by: Oran Agra <oran@redislabs.com>
2021-02-21 09:09:54 +02:00
sundb
46346e9e3a
Fix timing error oom-score-adj test (#8513)
fixes timing issue, fork didn't always get to set the oom score before the test verified it.
2021-02-19 13:01:25 +02:00
uriyage
fd052d2a86
Adds INFO fields to track fork child progress (#8414)
* Adding current_save_keys_total and current_save_keys_processed info fields.
  Present in replication, BGSAVE and AOFRW.
* Changing RM_SendChildCOWInfo() to RM_SendChildHeartbeat(double progress)
* Adding new info field current_fork_perc. Present in Replication, BGSAVE, AOFRW,
  and module forks.
2021-02-16 16:06:51 +02:00
Oran Agra
fb3457d157
minor test suite cleanup, revive old test (#8497)
There are two tests in other.tcl that were dependant of the sha1 package
import which meant that they didn't usually run.
The reason it was like that was that prior to the creation of DEBUG
DIGEST, the test suite used to have an equivalent function, but that's
no longer the case and this dependency isn't needed.

The other change is to revert config changes done by the test before the
test suite continues. can be useful if using `--host` to run multiple
units against the same server
2021-02-15 17:20:03 +02:00
Yossi Gottlieb
141ac8df59
Escape unsafe field name characters in INFO. (#8492)
Fixes #8489
2021-02-15 17:08:53 +02:00
Oran Agra
30775bc3e3
solve race in replication-2 test - again (#8491)
this should make it timing independent and also faster in most cases
2021-02-15 12:50:23 +02:00
Viktor Söderqvist
0bc8c9c8f9
Modules: In RM_HashSet, add COUNT_ALL flag and set errno (#8446)
The added flag affects the return value of RM_HashSet() to include
the number of inserted fields, in addition to updated and deleted
fields.

errno is set on errors, tests are added and documentation updated.
2021-02-15 11:40:05 +02:00
Yossi Gottlieb
8c42d1257f
Fix errors with sentinel leaked fds test. (#8482)
* Don't run test script on non-Linux.
* Verify that reported fds do indeed exist also in parent, to avoid
  false negatives on some systems (namely CentOS).

Co-authored-by: Andy Pan <panjf2000@gmail.com>
2021-02-11 15:25:01 +02:00
filipe oliveira
b5ca1e9e53
Removed time sensitive checks from block on background tests. Fixed uninitialized variable (#8479)
- removes time sensitive checks from block on background tests during leak checks.
- fix uninitialized variable on RedisModuleBlockedClient() when calling
  RM_BlockedClientMeasureTimeEnd() without RM_BlockedClientMeasureTimeStart()
2021-02-10 08:59:07 +02:00
WuYunlong
203f357c32
Cleanup in redis-cli and tests: release memory on exit, change dup test name (#8475)
1. Rename 18-cluster-nodes-slots.tcl to 19-cluster-nodes-slots.tcl.
  it was conflicting with another test prefixed by 18
2. Release memory on exit in redis-cli.c.
3. Fix freeConvertedSds indentation.
2021-02-09 12:36:09 +02:00