Commit Graph

1714 Commits

Author SHA1 Message Date
Oran Agra
3f3f678a47
corrupt-dump-fuzzer test, avoid creating junk keys (#9302)
The execution of the RPOPLPUSH command by the fuzzer created junk keys,
that were later being selected by RANDOMKEY and modified.
This also meant that lists were statistically tested more than other
files.

Fix the fuzzer not to pass junk key names to RPOPLPUSH, and add a check
that detects that new keys are not added by the fuzzer to detect future
similar issues.
2021-08-05 22:57:05 +03:00
Oran Agra
0c90370e6d
Improvements to corrupt payload sanitization (#9321)
Recently we found two issues in the fuzzer tester: #9302 #9285
After fixing them, more problems surfaced and this PR (as well as #9297) aims to fix them.

Here's a list of the fixes
- Prevent an overflow when allocating a dict hashtable
- Prevent OOM when attempting to allocate a huge string
- Prevent a few invalid accesses in listpack
- Improve sanitization of listpack first entry
- Validate integrity of stream consumer groups PEL
- Validate integrity of stream listpack entry IDs
- Validate ziplist tail followed by extra data which start with 0xff

Co-authored-by: sundb <sundbcn@gmail.com>
2021-08-05 22:56:14 +03:00
sundb
8ea777a6a0
Sanitize dump payload: fix empty keys when RDB loading and restore command (#9297)
When we load rdb or restore command, if we encounter a length of 0, it will result in the creation of an empty key.
This could either be a corrupt payload, or a result of a bug (see #8453 )

This PR mainly fixes the following:
1) When restore command will return `Bad data format` error.
2) When loading RDB, we will silently discard the key.

Co-authored-by: Oran Agra <oran@redislabs.com>
2021-08-05 22:42:20 +03:00
Binbin
d0244bfc3d
Make sure execute SLAVEOF command in the right order in psync2 test. (#9316)
The psync2 test has failed several times recently.
In #9159 we only solved half of the problem.
i.e. reordering of the replica that's already connected to
the newly promoted master.

Consider this scenario:
0 slaveof 2
1 slaveof 2
3 slaveof 2
4 slaveof 1
0 slaveof no one, became a new master got a new replid
2 slaveof 0, partial resync and got the new replid
3 reconnect 2, inherit the new replid
3 slaveof 4, use the new replid and got a full resync

And another scenario:
1 slaveof 3
2 slaveof 4
3 slaveof 0
4 slaveof 0
4 slaveof no one, became a new master got a new replid
2 reconnect 4, inherit the new replid
2 slaveof 1, use the new replid and got a full resync

So maybe we should reattach replicas in the right order.
i.e. In the above example, if it would have reattached 1, 3 and 0 to
the new chain formed by 4 before trying to attach 2 to 1, it would succeed.

This commit break the SLAVEOF loop into two loops. (ideas from oran)

First loop that uses random to decide who replicates from who.
Second loop that does the actual SLAVEOF command.
In the second loop, we make sure to execute it in the right order,
and after each SLAVEOF, wait for it to be connected before we proceed.

Co-authored-by: Oran Agra <oran@redislabs.com>
2021-08-05 11:26:09 +03:00
Wen Hui
63e2a6d212
Add sentinel debug option command (#9291)
This makes it possible to tune many parameters that were previously hard coded.
We don't intend these to be user configurable, but only used by tests to accelerate certain conditions which would otherwise take a long time and slow down the test suite.

Co-authored-by: Lucas Guang Yang <l84193800@china.huawei.com>
2021-08-05 11:12:55 +03:00
Viktor Söderqvist
1c59567a7f
redis-cli ASK redirect test: Add retry loop to fix timing issue (#9315) 2021-08-05 08:20:30 +03:00
Wang Yuan
d4bca53cd9
Use madvise(MADV_DONTNEED) to release memory to reduce COW (#8974)
## Backgroud
As we know, after `fork`, one process will copy pages when writing data to these
pages(CoW), and another process still keep old pages, they totally cost more memory.
For redis, we suffered that redis consumed much memory when the fork child is serializing
key/values, even that maybe cause OOM.

But actually we find, in redis fork child process, the child process don't need to keep some
memory and parent process may write or update that, for example, child process will never
access the key-value that is serialized but users may update it in parent process.
So we think it may reduce COW if the child process release memory that it is not needed.

## Implementation
For releasing key value in child process, we may think we call `decrRefCount` to free memory,
but i find the fork child process still use much memory when we don't write any data to redis,
and it costs much more time that slows down bgsave. Maybe because memory allocator doesn't
really release memory to OS, and it may modify some inner data for this free operation, especially
when we free small objects.

Moreover, CoW is based on  pages, so it is a easy way that we only free the memory bulk that is
not less than kernel page size. madvise(MADV_DONTNEED) can quickly release specified region
pages to OS bypassing memory allocator, and allocator still consider that this memory still is used
and don't change its inner data.

There are some buffers we can release in the fork child process:
- **Serialized key-values**
  the fork child process never access serialized key-values, so we try to free them.
  Because we only can release big bulk memory, and it is time consumed to iterate all
  items/members/fields/entries of complex data type. So we decide to iterate them and
  try to release them only when their average size of item/member/field/entry is more
  than page size of OS.
- **Replication backlog**
  Because replication backlog is a cycle buffer, it will be changed quickly if redis has heavy
  write traffic, but in fork child process, we don't need to access that.
- **Client buffers**
  If clients have requests during having the fork child process, clients' buffer also be changed
  frequently. The memory includes client query buffer, output buffer, and client struct used memory.

To get child process peak private dirty memory, we need to count peak memory instead
of last used memory, because the child process may continue to release memory (since
COW used to only grow till now, the last was equivalent to the peak).
Also we're adding a new `current_cow_peak` info variable (to complement the existing
`current_cow_size`)

Co-authored-by: Oran Agra <oran@redislabs.com>
2021-08-04 23:01:46 +03:00
Meir Shpilraien (Spielrein)
56eb7f7de4
Fix tests failure on 32bit build (#9318)
Fix test introduced in #9202 that failed on 32bit CI.
The failure was due to a wrong double comparison.
Change code to stringify the double first and then compare.
2021-08-04 21:33:38 +03:00
Meir Shpilraien (Spielrein)
2237131e15
Unified Lua and modules reply parsing and added RESP3 support to RM_Call (#9202)
## Current state
1. Lua has its own parser that handles parsing `reds.call` replies and translates them
  to Lua objects that can be used by the user Lua code. The parser partially handles
  resp3 (missing big number, verbatim, attribute, ...)
2. Modules have their own parser that handles parsing `RM_Call` replies and translates
  them to RedisModuleCallReply objects. The parser does not support resp3.

In addition, in the future, we want to add Redis Function (#8693) that will probably
support more languages. At some point maintaining so many parsers will stop
scaling (bug fixes and protocol changes will need to be applied on all of them).
We will probably end up with different parsers that support different parts of the
resp protocol (like we already have today with Lua and modules)

## PR Changes
This PR attempt to unified the reply parsing of Lua and modules (and in the future
Redis Function) by introducing a new parser unit (`resp_parser.c`). The new parser
handles parsing the reply and calls different callbacks to allow the users (another
unit that uses the parser, i.e, Lua, modules, or Redis Function) to analyze the reply.

### Lua API Additions
The code that handles reply parsing on `scripting.c` was removed. Instead, it uses
the resp_parser to parse and create a Lua object out of the reply. As mentioned
above the Lua parser did not handle parsing big numbers, verbatim, and attribute.
The new parser can handle those and so Lua also gets it for free.
Those are translated to Lua objects in the following way:
1. Big Number - Lua table `{'big_number':'<str representation for big number>'}`
2. Verbatim - Lua table `{'verbatim_string':{'format':'<verbatim format>', 'string':'<verbatim string value>'}}`
3. Attribute - currently ignored and not expose to the Lua parser, another issue will be open to decide how to expose it.

Tests were added to check resp3 reply parsing on Lua

### Modules API Additions
The reply parsing code on `module.c` was also removed and the new resp_parser is used instead.
In addition, the RedisModuleCallReply was also extracted to a separate unit located on `call_reply.c`
(in the future, this unit will also be used by Redis Function). A nice side effect of unified parsing is
that modules now also support resp3. Resp3 can be enabled by giving `3` as a parameter to the
fmt argument of `RM_Call`. It is also possible to give `0`, which will indicate an auto mode. i.e, Redis
will automatically chose the reply protocol base on the current client set on the RedisModuleCtx
(this mode will mostly be used when the module want to pass the reply to the client as is).
In addition, the following RedisModuleAPI were added to allow analyzing resp3 replies:

* New RedisModuleCallReply types:
   * `REDISMODULE_REPLY_MAP`
   * `REDISMODULE_REPLY_SET`
   * `REDISMODULE_REPLY_BOOL`
   * `REDISMODULE_REPLY_DOUBLE`
   * `REDISMODULE_REPLY_BIG_NUMBER`
   * `REDISMODULE_REPLY_VERBATIM_STRING`
   * `REDISMODULE_REPLY_ATTRIBUTE`

* New RedisModuleAPI:
   * `RedisModule_CallReplyDouble` - getting double value from resp3 double reply
   * `RedisModule_CallReplyBool` - getting boolean value from resp3 boolean reply
   * `RedisModule_CallReplyBigNumber` - getting big number value from resp3 big number reply
   * `RedisModule_CallReplyVerbatim` - getting format and value from resp3 verbatim reply
   * `RedisModule_CallReplySetElement` - getting element from resp3 set reply
   * `RedisModule_CallReplyMapElement` - getting key and value from resp3 map reply
   * `RedisModule_CallReplyAttribute` - getting a reply attribute
   * `RedisModule_CallReplyAttributeElement` - getting key and value from resp3 attribute reply
   
* New context flags:
   * `REDISMODULE_CTX_FLAGS_RESP3` - indicate that the client is using resp3

Tests were added to check the new RedisModuleAPI

### Modules API Changes
* RM_ReplyWithCallReply might return REDISMODULE_ERR if the given CallReply is in resp3
  but the client expects resp2. This is not a breaking change because in order to get a resp3
  CallReply one needs to specifically specify `3` as a parameter to the fmt argument of
  `RM_Call` (as mentioned above).

Tests were added to check this change

### More small Additions
* Added `debug set-disable-deny-scripts` that allows to turn on and off the commands no-script
flag protection. This is used by the Lua resp3 tests so it will be possible to run `debug protocol`
and check the resp3 parsing code.

Co-authored-by: Oran Agra <oran@redislabs.com>
Co-authored-by: Yossi Gottlieb <yossigo@gmail.com>
2021-08-04 16:28:07 +03:00
Oran Agra
52df350fe5
Skip new redis-cli ASK test in TLS mode (#9312) 2021-08-03 13:19:03 -07:00
Jonah H. Harris
432c92d8df
Add SINTERCARD/ZINTERCARD Commands (#8946)
Add SINTERCARD and ZINTERCARD commands that are similar to
ZINTER and SINTER but only return the cardinality with minimum
processing and memory overheads.

Co-authored-by: Oran Agra <oran@redislabs.com>
2021-08-03 11:45:27 +03:00
Ariel Shtul
bdbf5eedae
Module api support for RESP3 (#8521)
Add new Module APS for RESP3 responses:
- RM_ReplyWithMap
- RM_ReplyWithSet
- RM_ReplyWithAttribute
- RM_ReplySetMapLength
- RM_ReplySetSetLength
- RM_ReplySetAttributeLength
- RM_ReplyWithBool

Deprecate REDISMODULE_POSTPONED_ARRAY_LEN in favor of a generic REDISMODULE_POSTPONED_LEN

Improve documentation
Add tests

Co-authored-by: Guy Benoish <guy.benoish@redislabs.com>
Co-authored-by: Oran Agra <oran@redislabs.com>
2021-08-03 11:37:19 +03:00
Yossi Gottlieb
4bd7748362
Tests: fix commandfilter crash on alpine. (#9307)
Loading and unloading the shared object does not initialize global vars
on alpine.
2021-08-02 15:50:45 +03:00
Huang Zhw
cf61ad14cc
When redis-cli received ASK, it didn't handle it (#8930)
When redis-cli received ASK, it used string matching wrong and didn't
handle it. 

When we access a slot which is in migrating state, it maybe
return ASK. After redirect to the new node, we need send ASKING
command before retry the command.  In this PR after redis-cli receives 
ASK, we send a ASKING command before send the origin command 
after reconnecting.

Other changes:
* Make redis-cli -u and -c (unix socket and cluster mode) incompatible 
  with one another.
* When send command fails, we avoid the 2nd reconnect retry and just
  print the error info. Users will decide how to do next. 
  See #9277.
* Add a test faking two redis nodes in TCL to just send ASK and OK in 
  redis protocol to test ASK behavior. 

Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Co-authored-by: Oran Agra <oran@redislabs.com>
2021-08-02 14:59:08 +03:00
Ning Sun
f74af0e61d
Add NX/XX/GT/LT options to EXPIRE command group (#2795)
Add NX, XX, GT, and LT flags to EXPIRE, PEXPIRE, EXPIREAT, PEXAPIREAT.
- NX - only modify the TTL if no TTL is currently set 
- XX - only modify the TTL if there is a TTL currently set 
- GT - only increase the TTL (considering non-volatile keys as infinite expire time)
- LT - only decrease the TTL (considering non-volatile keys as infinite expire time)
return value of the command is 0 when the operation was skipped due to one of these flags.

Signed-off-by: Ning Sun <sunng@protonmail.com>
2021-08-02 08:57:49 +03:00
menwen
82c3158ad5
Fix if consumer is created as a side effect without notify and dirty++ (#9263)
Fixes:
- When a consumer is created as a side effect, redis didn't issue a keyspace notification,
  nor incremented the server.dirty (affects periodic snapshots).
  this was a bug in XREADGROUP, XCLAIM, and XAUTOCLAIM.
- When attempting to delete a non-existent consumer, don't issue a keyspace notification
  and don't increment server.dirty
  this was a bug in XGROUP DELCONSUMER

Other changes:
- Changed streamLookupConsumer() to always only do lookup consumer (never do implicit creation),
  Its last seen time is updated unless the SLC_NO_REFRESH flag is specified.
- Added streamCreateConsumer() to create a new consumer. When the creation is successful,
  it will notify and dirty++ unless the SCC_NO_NOTIFY or SCC_NO_DIRTIFY flags is specified.
- Changed streamDelConsumer() to always only do delete consumer.
- Added keyspace notifications tests about stream events.
2021-08-02 08:31:33 +03:00
Binbin
86555ae0f7
GEO* STORE with empty src key delete the dest key and return 0, not empty array (#9271)
With an empty src key, we need to deal with two situations:
1. non-STORE: We should return emptyarray.
2. STORE: Try to delete the store key and return 0.

This applies to both GEOSEARCHSTORE (new to v6.2), and
also GEORADIUS STORE (which was broken since forever)

This pr try to fix #9261. i.e. both STORE variants would have behaved
like the non-STORE variants when the source key was missing,
returning an empty array and not deleting the destination key,
instead of returning 0, and deleting the destination key.

Also add more tests for some commands.
- GEORADIUS: wrong type src key, non existing src key, empty search,
  store with non existing src key, store with empty search
- GEORADIUSBYMEMBER: wrong type src key, non existing src key,
  non existing member, store with non existing src key
- GEOSEARCH: wrong type src key, non existing src key, empty search,
  frommember with non existing member
- GEOSEARCHSTORE: wrong type key, non existing src key,
  fromlonlat with empty search, frommember with non existing member

Co-authored-by: Oran Agra <oran@redislabs.com>
2021-08-01 19:32:24 +03:00
Yossi Gottlieb
68b8b45cd5
Tests: avoid short reads on redis-cli output. (#9301)
In some cases large replies on slow systems may only be partially read
by the test suite, resulting with parsing errors.

This fix is still timing sensitive but should greatly reduce the chances
of this happening.
2021-08-01 15:07:27 +03:00
Guy Korland
1483f5aa9b
Remove const from CommandFilterArgGet result (#9247)
Co-authored-by: Yossi Gottlieb <yossigo@gmail.com>
2021-08-01 11:29:32 +03:00
Long Dai
89fdcbec8c
tests: fix a typo (#9294)
Signed-off-by: Long Dai <long0dai@foxmail.com>
2021-07-30 10:00:07 +03:00
Yossi Gottlieb
8bf433dc86
Clean unused var compiler warning in module test. (#9289) 2021-07-29 19:45:29 +03:00
Wen Hui
db41536454
Remove duplicate zero-port sentinels (#9240)
The issue is that when a sentinel with the same address and IP is turned on with a different runid, its port is set to 0 but it is still present in the dictionary master->sentinels which contain all the sentinels for a master.

This causes a problem when we do INFO SENTINEL because it takes the size of the dictionary of sentinels. This might also cause a problem for failover if enough sentinels have their port set to 0 since the number of voters in failover is also determined by the size of the dictionary of sentinels.

This commits removes the sentinels with the port set to zero from the dictionary of sentinels.
Fixes #8786
2021-07-29 12:32:28 +03:00
sundb
3db0f1a284
Fix missing check for sanitize_dump in corrupt-dump-fuzzer test (#9285)
this means the assertion that checks that when deep sanitization is enabled,
there are no crashes, was missing.
2021-07-29 11:53:21 +03:00
ZhaolongLi
8d00493485
tests: fix exec fails when grep exists with status other than 0 (#9066)
Co-authored-by: lizhaolong.lzl <lizhaolong.lzl@B-54MPMD6R-0221.local>
2021-07-25 09:58:21 +03:00
Huang Zhw
71d452876e
On 32 bit platform, the bit position of GETBIT/SETBIT/BITFIELD/BITCOUNT,BITPOS may overflow (see CVE-2021-32761) (#9191)
GETBIT, SETBIT may access wrong address because of wrap.
BITCOUNT and BITPOS may return wrapped results.
BITFIELD may access the wrong address but also allocate insufficient memory and segfault (see CVE-2021-32761).

This commit uses `uint64_t` or `long long` instead of `size_t`.
related https://github.com/redis/redis/pull/8096

At 32bit platform:
> setbit bit 4294967295 1
(integer) 0
> config set proto-max-bulk-len 536870913
OK
> append bit "\xFF"
(integer) 536870913
> getbit bit 4294967296
(integer) 0

When the bit index is larger than 4294967295, size_t can't hold bit index. In the past,  `proto-max-bulk-len` is limit to 536870912, so there is no problem.

After this commit, bit position is stored in `uint64_t` or `long long`. So when `proto-max-bulk-len > 536870912`, 32bit platforms can still be correct.

For 64bit platform, this problem still exists. The major reason is bit pos 8 times of byte pos. When proto-max-bulk-len is very larger, bit pos may overflow.
But at 64bit platform, we don't have so long string. So this bug may never happen.

Additionally this commit add a test cost `512MB` memory which is tag as `large-memory`. Make freebsd ci and valgrind ci ignore this test.
2021-07-21 16:25:19 +03:00
Binbin
11dc4e59b3
SMOVE only notify dstset when the addition is successful. (#9244)
in case dest key already contains the member, the dest key isn't modified, so the command shouldn't invalidate watch.
2021-07-17 09:54:06 +03:00
Oran Agra
6a5bac309e
Test infra, handle RESP3 attributes and big-numbers and bools (#9235)
- promote the code in DEBUG PROTOCOL to addReplyBigNum
- DEBUG PROTOCOL ATTRIB skips the attribute when client is RESP2
- networking.c addReply for push and attributes generate assertion when
  called on a RESP2 client, anything else would produce a broken
  protocol that clients can't handle.
2021-07-14 19:14:31 +03:00
perryitay
ac8b1df885
Fail EXEC command in case a watched key is expired (#9194)
There are two issues fixed in this commit: 
1. we want to fail the EXEC command in case there is a watched key that's logically
   expired but not yet deleted by active expire or lazy expire.
2. we saw that currently cache time is update in every `call()` (including nested calls),
   this time is being also being use for the isKeyExpired comparison, we want to update
   the cache time only in the first call (execCommand)

Co-authored-by: Oran Agra <oran@redislabs.com>
2021-07-11 13:17:23 +03:00
Yossi Gottlieb
92e8004705
Pre-test bind-source-addr before running test. (#9214)
This attempts to catch any non-standard configuration where the test may
fail and produce a false positive.
2021-07-11 09:54:07 +03:00
Mikhail Fesenko
1eb4baa5b8
Direct redis-cli repl prints to stderr, because --rdb can print to stdout. fflush stdout after responses (#9136)
1. redis-cli can output --rdb data to stdout
   but redis-cli also write some messages to stdout which will mess up the rdb.

2. Make redis-cli flush stdout when printing a reply
  This was needed in order to fix a hung in redis-cli test that uses
  --replica.
   Note that printf does flush when there's a newline, but fwrite does not.

3. fix the redis-cli --replica test which used to pass previously
   because it didn't really care what it read, and because redis-cli
   used printf to print these other things to stdout.

4. improve redis-cli --replica test to run with both diskless and disk-based.

Co-authored-by: Oran Agra <oran@redislabs.com>
Co-authored-by: Viktor Söderqvist <viktor@zuiderkwast.se>
2021-07-07 08:26:26 +03:00
Binbin
a418a2d3fc
hrandfield and zrandmember with count should return emptyarray when key does not exist. (#9178)
due to a copy-paste bug, it used to reply with null response rather than empty array.
this commit includes new tests that are looking at the RESP response directly in
order to be able to tell the difference between them.

Co-authored-by: Oran Agra <oran@redislabs.com>
2021-07-05 10:41:57 +03:00
Oran Agra
7103367ad4
Tests: add a way to read raw RESP protocol reponses (#9193)
This makes it possible to distinguish between null response and an empty
array (currently the tests infra translates both to an empty string/list)
2021-07-04 19:43:58 +03:00
Oran Agra
a8518cce95
fix valgrind issues with recently added test in modules/blockonbackground (#9192)
fixes test issue introduced in #9167

1. invalid reads due to accessing non-retained string (passed as unblock context).
2. leaking module blocked client context, see #6922 for info.
2021-07-04 14:21:53 +03:00
Yossi Gottlieb
aa139e2f02
Fix CLIENT UNBLOCK crashing modules. (#9167)
Modules that use background threads with thread safe contexts are likely
to use RM_BlockClient() without a timeout function, because they do not
set up a timeout.

Before this commit, `CLIENT UNBLOCK` would result with a crash as the
`NULL` timeout callback is called. Beyond just crashing, this is also
logically wrong as it may throw the module into an unexpected client
state.

This commits makes `CLIENT UNBLOCK` on such clients behave the same as
any other client that is not in a blocked state and therefore cannot be
unblocked.
2021-07-01 17:11:27 +03:00
Binbin
5dddf496ce
Add missing pause tcl test to test_helper.tcl (#9158)
* Add keyname tags to avoid CROSSSLOT errors in external server CI
* Use new wait_for_blocked_clients_count in pause.tcl
2021-06-30 13:32:51 +03:00
Binbin
1d5aa37d68
Fix timing issue in psync2 test. (#9159)
*** [err]: PSYNC2: total sum of full synchronizations is exactly 4 intests/integration/psync2.tcl
Expected 5 == 4 (context: type eval line 8 cmd {assert {$sum == 4}} proc::test)

Sometime the test got an unexpected full sync since a replica switch to master,
before the new master change propagated the new replid to all replicas,
a replica attempted to sync with it using a wrong replid and triggered a full resync.

Consider this scenario:
    1 slaveof 4 full resync
    0 slaveof 4 full resync
    2 slaveof 0 full resync
    3 slaveof 1 full resync

    1 slaveof no one, replid changed
    3 reconnect 1, did a partial resyn and got the new replid

    Before 2 inherits the new replid.
    3 slaveof 2
    3 try to do a partial resyn with 2.
    But their replication ids are inconsistent, so a full resync happens.

:) A special thank you for oran and helping me in this test case.

Co-authored-by: Oran Agra <oran@redislabs.com>
2021-06-30 09:18:10 +03:00
Yossi Gottlieb
5d8ea4b326
Add missing needs:repl tag. (#9169) 2021-06-29 16:48:52 +03:00
Leibale Eidelman
95274f1f8a
fix ZRANGESTORE - should return 0 when src points to an empty key (#9089)
mistakenly it used to return an empty array rather than 0.

Co-authored-by: Oran Agra <oran@redislabs.com>
2021-06-29 16:38:10 +03:00
Binbin
4bc5a8324d
ZRANDMEMBER WITHSCORES with negative COUNT may return bad score (#9162)
Return a bad score when used with negative count (or count of 1), and non-ziplist encoded zset.
Also add test to validate the return value and cover the issue.
2021-06-29 10:14:28 +03:00
Yossi Gottlieb
f233c4c59d
Add bind-source-addr configuration argument. (#9142)
In the past, the first bind address that was explicitly specified was
also used to bind outgoing connections. This could result with some
problems. For example: on some systems using `bind 127.0.0.1` would
result with outgoing connections also binding to `127.0.0.1` and failing
to connect to remote addresses.

With the recent change to the way `bind` is handled, this presented
other issues:

* The default first bind address is '*' which is not a valid address.
* We make no distinction between user-supplied config that is identical
to the default, and the default config.

This commit addresses both these issues by introducing an explicit
configuration parameter to control the bind address on outgoing
connections.
2021-06-24 19:48:18 +03:00
Oran Agra
5ffdbae1f6
Fix failing basics moduleapi test on 32bit CI (#9140) 2021-06-24 12:44:13 +03:00
Oran Agra
ae418eca24
Adjustments to recent RM_StringTruncate fix (#3718) (#9125)
- Introduce a new sdssubstr api as a building block for sdsrange.
  The API of sdsrange is many times hard to work with and also has
  corner case that cause bugs. sdsrange is easy to work with and also
  simplifies the implementation of sdsrange.
- Revert the fix to RM_StringTruncate and just use sdssubstr instead of
  sdsrange.
- Solve valgrind warnings from the new tests introduced by the previous
  PR.
2021-06-22 17:22:40 +03:00
Yossi Gottlieb
a49b766860
Remove leftover after CONFIG SET bind change. (#9129) 2021-06-22 14:03:00 +03:00
Yossi Gottlieb
8284544adb
Fix typo in test. (#9128) 2021-06-22 13:30:20 +03:00
Yossi Gottlieb
07b0d144ce
Improve bind and protected-mode config handling. (#9034)
* Specifying an empty `bind ""` configuration prevents Redis from listening on any TCP port. Before this commit, such configuration was not accepted.
* Using `CONFIG GET bind` will always return an explicit configuration value. Before this commit, if a bind address was not specified the returned value was empty (which was an anomaly).

Another behavior change is that modifying the `bind` configuration to a non-default value will NO LONGER DISABLE protected-mode implicitly.
2021-06-22 12:50:17 +03:00
Evan
1ccf2ca2f4
modules: Add newlen == 0 handling to RM_StringTruncate (#3717) (#3718)
Previously, passing 0 for newlen would not truncate the string at all.
This adds handling of this case, freeing the old string and creating a new empty string.

Other changes:
- Move `src/modules/testmodule.c` to `tests/modules/basics.c`
- Introduce that basic test into the test suite
- Add tests to cover StringTruncate
- Add `test-modules` build target for the main makefile
- Extend `distclean` build target to clean modules too
2021-06-22 12:26:48 +03:00
Oran Agra
d0819d618e
solve test timing issues in replication tests (#9121)
# replication-3.tcl
had a test timeout failure with valgrind on daily CI:
```
*** [err]: SLAVE can reload "lua" AUX RDB fields of duplicated scripts in tests/integration/replication-3.tcl
Replication not started.
```
replication took more than 70 seconds.
https://github.com/redis/redis/runs/2854037905?check_suite_focus=true

on my machine it takes only about 30, but i can see how 50 seconds isn't enough.

# replication.tcl
loading was over too quickly in freebsd daily CI:
```
*** [err]: slave fails full sync and diskless load swapdb recovers it in tests/integration/replication.tcl
Expected '0' to be equal to '1' (context: type eval line 44 cmd {assert_equal [s -1 loading] 1} proc ::start_server)
```

# rdb.tcl
loading was over too quickly.
increase the time loading takes, and decrease the amount of work we try to achieve in that time.
2021-06-22 11:10:11 +03:00
Oran Agra
9b564b525d
Fix race in client side tracking (#9116)
The `Tracking gets notification of expired keys` test in tracking.tcl
used to hung in valgrind CI quite a lot.

It turns out the reason is that with valgrind and a busy machine, the
server cron active expire cycle could easily run in the same event loop
as the command that created `mykey`, so that when they key got expired,
there were two change events to broadcast, one that set the key and one
that expired it, but since we used raxTryInsert, the client that was
associated with the "last" change was the one that created the key, so
the NOLOOP filtered that event.

This commit adds a test that reproduces the problem by using lazy expire
in a multi-exec which makes sure the key expires in the same event loop
as the one that added it.
2021-06-22 07:35:59 +03:00
sundb
eae0983d2d
Fix leak and double free issues in datatype2 module test (#9102)
* Add missing call for RedisModule_DictDel in datatype2 test
* Fix memory leak in datatype2 test
2021-06-17 21:45:21 +03:00
sundb
b586d5b567
Fix querybuf test failure (#9091)
Fix test failure which introduced by #9003.
The following case will occur when querybuf expansion will allocate memory equal to (16*1024)k.
1) make use ```CFLAGS=-DNO_MALLOC_USABLE_SIZE```.
2) ```malloc``` will not allocate more under ```alpine```.
2021-06-16 22:01:37 +03:00
chenyang8094
e0cd3ad0de
Enhance mem_usage/free_effort/unlink/copy callbacks and add GetDbFromIO api. (#8999)
Create new module type enhanced callbacks: mem_usage2, free_effort2, unlink2, copy2.
These will be given a context point from which the module can obtain the key name and database id.
In addition the digest and defrag context can now be used to obtain the key name and database id.
2021-06-16 09:45:49 +03:00
Jason Elbaum
7f342020dc
Change return value type for ZPOPMAX/MIN in RESP3 (#8981)
When using RESP3, ZPOPMAX/ZPOPMIN should return nested arrays for consistency
with other commands (e.g. ZRANGE).

We do that only when COUNT argument is present (similarly to how LPOP behaves).
for reasoning see https://github.com/redis/redis/issues/8824#issuecomment-855427955

This is a breaking change only when RESP3 is used, and COUNT argument is present!
2021-06-16 09:29:57 +03:00
sundb
e5d8a5eb85
Fix the wrong reisze of querybuf (#9003)
The initialize memory of `querybuf` is `PROTO_IOBUF_LEN(1024*16) * 2` (due to sdsMakeRoomFor being greedy), under `jemalloc`, the allocated memory will be 40k.
This will most likely result in the `querybuf` being resized when call `clientsCronResizeQueryBuffer` unless the client requests it fast enough.

Note that this bug existed even before #7875, since the condition for resizing includes the sds headers (32k+6).

## Changes
1. Use non-greedy sdsMakeRoomFor when allocating the initial query buffer (of 16k).
1. Also use non-greedy allocation when working with BIG_ARG (we won't use that extra space anyway)
2. in case we did use a greedy allocation, read as much as we can into the buffer we got (including internal frag), to reduce system calls.
3. introduce a dedicated constant for the shrinking (same value as before)
3. Add test for querybuf.
4. improve a maxmemory test by ignoring the effect of replica query buffers (can accumulate many ACKs on slow env)
5. improve a maxmemory by disabling slowlog (it will cause slight memory growth on slow env).
2021-06-15 14:46:19 +03:00
Binbin
b109977301
Fix XINFO help for unexpected options. (#9075)
Small cleanup and consistency.
2021-06-15 10:01:11 +03:00
Binbin
7900b48bc7
slowlog get command supports passing in -1 to get all logs. (#9018)
This was already the case before this commit, but it wasn't clear / intended in the code, now it does.
2021-06-14 16:46:45 +03:00
YaacovHazan
1677efb9da
cleanup around loadAppendOnlyFile (#9012)
Today when we load the AOF on startup, the loadAppendOnlyFile checks if
the file is openning for reading.
This check is redundent (dead code) as we open the AOF file for writing at initServer,
and the file will always be existing for the loadAppendOnlyFile.

In this commit:
- remove all the exit(1) from loadAppendOnlyFile, as it is the caller
  responsibility to decide what to do in case of failure.
- move the opening of the AOF file for writing, to be after we loading it.
- avoid return -ERR in DEBUG LOADAOF, when the AOF is existing but empty
2021-06-14 10:38:08 +03:00
Binbin
b8a5da80c4
Fix accidental deletion of sinterstore command when we meet wrong type error. (#9032)
SINTERSTORE would have deleted the dest key right away,
even when later on it is bound to fail on an (WRONGTYPE) error.

With this change it first picks up all the input keys, and only later
delete the dest key if one is empty.

Also add more tests for some commands.
Mainly focus on 
- `wrong type error`: 
	expand test case (base on sinter bug) in non-store variant
	add tests for store variant (although it exists in non-store variant, i think it would be better to have same tests)
- the dstkey result when we meet `non-exist key (empty set)` in *store

sdiff:
- improve test case about wrong type error (the one we found in sinter, although it is safe in sdiff)
- add test about using non-exist key (treat it like an empty set)
sdiffstore:
- according to sdiff test case, also add some tests about `wrong type error` and `non-exist key`
- the different is that in sdiffstore, we will consider the `dstkey` result

sunion/sunionstore add more tests (same as above)

sinter/sinterstore also same as above ...
2021-06-13 10:53:46 +03:00
ny0312
fb140a1bff
Fix flaky test case for absolute TTL replication (#9069)
The root cause is that one test (`5 keys in, 5 keys out`) is leaking a volatile key
that can expire while another later test(`All TTL in commands are propagated
as absolute timestamp in replication stream`) is running.
Such leaked expiration injects an unexpected `DEL` command into the
replication command during the later test, causing it to fail.

The fixes are two fold:
1. Plug the leak in the first test.
2. Add FLUSHALL to the later test, to avoid future interference from other tests.
2021-06-13 08:42:20 +03:00
Binbin
0bfccc55e2
Fixed some typos, add a spell check ci and others minor fix (#8890)
This PR adds a spell checker CI action that will fail future PRs if they introduce typos and spelling mistakes.
This spell checker is based on blacklist of common spelling mistakes, so it will not catch everything,
but at least it is also unlikely to cause false positives.

Besides that, the PR also fixes many spelling mistakes and types, not all are a result of the spell checker we use.

Here's a summary of other changes:
1. Scanned the entire source code and fixes all sorts of typos and spelling mistakes (including missing or extra spaces).
2. Outdated function / variable / argument names in comments
3. Fix outdated keyspace masks error log when we check `config.notify-keyspace-events` in loadServerConfigFromString.
4. Trim the white space at the end of line in `module.c`. Check: https://github.com/redis/redis/pull/7751
5. Some outdated https link URLs.
6. Fix some outdated comment. Such as:
    - In README: about the rdb, we used to said create a `thread`, change to `process`
    - dbRandomKey function coment (about the dictGetRandomKey, change to dictGetFairRandomKey)
    - notifyKeyspaceEvent fucntion comment (add type arg)
    - Some others minor fix in comment (Most of them are incorrectly quoted by variable names)
7. Modified the error log so that users can easily distinguish between TCP and TLS in `changeBindAddr`
2021-06-10 15:39:33 +03:00
Yossi Gottlieb
8a86bca5ed
Improve test suite to handle external servers better. (#9033)
This commit revives the improves the ability to run the test suite against
external servers, instead of launching and managing `redis-server` processes as
part of the test fixture.

This capability existed in the past, using the `--host` and `--port` options.
However, it was quite limited and mostly useful when running a specific tests.
Attempting to run larger chunks of the test suite experienced many issues:

* Many tests depend on being able to start and control `redis-server` themselves,
and there's no clear distinction between external server compatible and other
tests.
* Cluster mode is not supported (resulting with `CROSSSLOT` errors).

This PR cleans up many things and makes it possible to run the entire test suite
against an external server. It also provides more fine grained controls to
handle cases where the external server supports a subset of the Redis commands,
limited number of databases, cluster mode, etc.

The tests directory now contains a `README.md` file that describes how this
works.

This commit also includes additional cleanups and fixes:

* Tests can now be tagged.
* Tag-based selection is now unified across `start_server`, `tags` and `test`.
* More information is provided about skipped or ignored tests.
* Repeated patterns in tests have been extracted to common procedures, both at a
  global level and on a per-test file basis.
* Cleaned up some cases where test setup was based on a previous test executing
  (a major anti-pattern that repeats itself in many places).
* Cleaned up some cases where test teardown was not part of a test (in the
  future we should have dedicated teardown code that executes even when tests
  fail).
* Fixed some tests that were flaky running on external servers.
2021-06-09 15:13:24 +03:00
Fabian Eichinger
39b0f0dd73
Add support for combining NX and GET flags on SET command (#8906)
Till now GET and NX were mutually exclusive.
This change make their combination mean a "Get or Set" command.

If the key exists it returns the old value and avoids setting,
and if it does't exist it returns nil and sets it to the new value (possibly with expiry time)
2021-06-07 16:47:58 +03:00
Huang Zhw
eaa7a7bb93
Fix XTRIM or XADD with LIMIT may delete more entries than Count. (#9048)
The decision to stop trimming due to LIMIT in XADD and XTRIM was after the limit was reached.
i.e. the code was deleting **at least** that count of records (from the LIMIT argument's perspective, not the MAXLEN),
instead of **up to** that count of records.
see #9046
2021-06-07 14:43:36 +03:00
Oran Agra
3e39ea0b83
tests: prevent name clash in variables leading to wrong test name (#8995)
running the "geo" unit would have shown that it completed a unit named
"north". this was because the variable `$name` was overwritten.
This commit isn't perfect, but it slightly reduces the chance for
variable name clash.

```
$ ./runtest --single unit/geo
.......
Testing unit/geo
.......
[1/1 done]: north (15 seconds)
```
2021-06-06 17:35:30 +03:00
Oran Agra
b512dfe794
tests: add details when test fails on malformed info (#9042) 2021-06-03 20:34:54 +03:00
pgxiaolianzi
f63bb9583d
Fix typo on buckup to backup (#8919) 2021-06-01 22:54:30 -07:00
Oran Agra
7cb42c9c36 add test for modules load/unload and config rewrite 2021-06-01 13:43:48 +03:00
Oran Agra
ae67539c8b
Improve new time sensitive pexpireat propagation test (#9010)
The test that was merged yesterday fails with valgrind and freebsd CI
that are too slow, and 10 seconds apparently passed between the time the
command was sent to redis and the time it was actually executed.

```
*** [err]: All TTL in commands are propagated as absolute timestamp in replication stream in tests/unit/expire.tcl
Expected 'del a' to match 'set foo1 bar PXAT *' (context: type source line 778 file /home/runner/work/redis/redis/tests/test_helper.tcl cmd {assert_match [lindex $patterns $j] [read_from_replication_stream $s]} proc ::assert_replication_stream level 1)
```
2021-06-01 08:01:10 +03:00
ny0312
53d1acd598
Always replicate time-to-live(TTL) as absolute timestamps in milliseconds (#8474)
Till now, on replica full-sync we used to transfer absolute time for TTL,
however when a command arrived (EXPIRE or EXPIREAT),
we used to propagate it as is to replicas (possibly with relative time),
but always translate it to EXPIREAT (absolute time) to AOF.

This commit changes that and will always use absolute time for propagation.
see discussion in #8433

Furthermore, we Introduce new commands: `EXPIRETIME/PEXPIRETIME`
that allow extracting the absolute TTL time from a key.
2021-05-30 09:20:32 +03:00
YaacovHazan
501d775583
unregister AE_READABLE from the read pipe in backgroundSaveDoneHandlerSocket (#8991)
In diskless replication, we create a read pipe for the RDB, between the child and the parent.
When we close this pipe (fd), the read handler also needs to be removed from the event loop (if it still registered).
Otherwise, next time we will use the same fd, the registration will be fail (panic), because
we will use EPOLL_CTL_MOD (the fd still register in the event loop), on fd that already removed from epoll_ctl
2021-05-26 14:51:53 +03:00
YaacovHazan
32a2584e07
stabilize tests that involved with load handlers (#8967)
When test stop 'load handler' by killing the process that generating the load,
some commands that already in the input buffer, still might be processed by the server.
This may cause some instability in tests, that count on that no more commands
processed after we stop the `load handler'

In this commit, new proc 'wait_load_handlers_disconnected' added, to verify that no more
cammands from any 'load handler' prossesed, by checking that the clients who
genreate the load is disconnceted.

Also, replacing check of dbsize with wait_for_ofs_sync before comparing debug digest, as
it would fail in case the last key the workload wrote was an overridden key (not a new one).

Affected tests
Race fix:
- failover command to specific replica works
- Connect multiple replicas at the same time (issue #141), master diskless=$mdl, replica diskless=$sdl
- AOF rewrite during write load: RDB preamble=$rdbpre

Cleanup and speedup:
- Test replication with blocking lists and sorted sets operations
- Test replication with parallel clients writing in different DBs
- Test replication partial resync: $descr (diskless: $mdl, $sdl, reconnect: $reconnect
2021-05-20 15:29:43 +03:00
Madelyn Olson
a59e75a475
Hide migrate command from slowlog if they include auth (#8859)
Redact commands that include sensitive data from slowlog and monitor
2021-05-19 08:23:54 -07:00
Oran Agra
d67e66de72
Fix race in new lazyfree test (#8965)
I recently saw this failure:
[err]: lazy free a stream with all types of metadata in tests/unit/lazyfree.tcl
Expected '2' to be equal to '1' (context: type eval line 23 cmd {assert_equal [s lazyfreed_objects] 1} proc ::test)

The only explanation for such a thing is that the async flushdb wasn't
done before we did the resetstat
2021-05-19 16:06:43 +03:00
Oran Agra
8458baf6a9
longer timeout in replication test (#8963)
the test normally passes. but we saw one failure in a valgrind run in github actions
2021-05-18 17:13:59 +03:00
Oran Agra
cf41c0b5ff
fix race in config rewrite test (#8960) 2021-05-18 17:10:06 +03:00
Oran Agra
fbc0e2b834
Reset lazyfreed_objects info field with RESETSTAT, test for stream lazyfree (#8934)
And also add tests to cover lazy free of streams with various types of
metadata (see #8932)
2021-05-17 16:54:37 +03:00
Raghav Muddur
31edc22ecc
EVALSHA_RO and EVAL_RO Commands (#8820)
* EVALSHA_RO and EVAL_RO Commands

Added new readonly versions of EVAL
and EVALSHA.
2021-05-12 21:07:34 -07:00
yoav-steinberg
152fce5e2c
Enforce client output buffer soft limit when no traffic. (#8833)
When client breached the output buffer soft limit but then went idle,
we didn't disconnect on soft limit timeout, now we do.
Note this also resolves some sporadic test failures in due to Linux
buffering data which caused tests to fail if during the test we went
back under the soft COB limit.

Co-authored-by: Oran Agra <oran@redislabs.com>
Co-authored-by: sundb <sundbcn@gmail.com>
2021-05-04 13:45:08 +03:00
Huang Zhw
2b22fffc78
Fix potential CONFIG SET bind test failure. (#8875)
Use an invalid IP address to trigger CONFIG SET bind failure, instead of DNS which is not guaranteed to always fail.
2021-04-27 18:02:23 +03:00
Oran Agra
611959eee5
fuzz tester, try to print hung command (#8837) 2021-04-25 13:08:46 +03:00
bugwz
761d7d2771
Print the number of abnormal line in AOF (#8823)
When redis-check-aof finds an error, it prints the line number for faster troubleshooting.
2021-04-20 21:51:24 +03:00
Madelyn Olson
c73b4ddfd9
Fix memory leak when doing lazyfreeing client tracking table (#8822)
Interior rax pointers were not being freed
2021-04-19 22:16:27 -07:00
Hanna Fadida
53a4d6c3b1
Modules: adding a module type for key space notification (#8759)
Adding a new type mask ​for key space notification, REDISMODULE_NOTIFY_MODULE, to enable unique notifications from commands on REDISMODULE_KEYTYPE_MODULE type keys (which is currently unsupported).

Modules can subscribe to a module key keyspace notification by RM_SubscribeToKeyspaceEvents,
and clients by notify-keyspace-events of redis.conf or via the CONFIG SET, with the characters 'd' or 'A' 
(REDISMODULE_NOTIFY_MODULE type mask is part of the '**A**ll' notation for key space notifications).

Refactor: move some pubsub test infra from pubsub.tcl to util.tcl to be re-used by other tests.
2021-04-19 21:33:26 +03:00
guybe7
f40ca9cb58
Modules: Replicate lazy-expire even if replication is not allowed (#8816)
Before this commit using RM_Call without "!" could cause the master
to lazy-expire a key (delete it) but without replicating to replicas.
This could cause the replica's memory usage to gradually grow and
could also cause consistency issues if the master and replica have
a clock diff.
This bug was introduced in #8617

Added a test which demonstrates that scenario.
2021-04-19 17:16:02 +03:00
Harkrishn Patro
7a3d1487e4
ACL channels permission handling for save/load scenario. (#8794)
In the initial release of Redis 6.2 setting a user to only allow pubsub access to
a specific channel, and doing ACL SAVE, resulted in an assertion when
ACL LOAD was used. This was later changed by #8723 (not yet released),
but still not properly resolved (now it errors instead of crash).

The problem is that the server that generates an ACL file, doesn't know what
would be the setting of the acl-pubsub-default config in the server that will load it.
so ACL SAVE needs to always start with resetchannels directive.

This should still be compatible with old acl files (from redis 6.0), and ones from earlier
versions of 6.2 that didn't mess with channels.

Co-authored-by: Harkrishn Patro <harkrisp@amazon.com>
Co-authored-by: Oran Agra <oran@redislabs.com>
2021-04-19 13:27:44 +03:00
sundb
3a955d9ad4
Fix ouput buffer limit test (#8803)
The tail size of c->reply is 16kb, but in the test only publish a
few chars each time, due to a change in #8699, the obuf limit
is now checked a new memory allocation is made, so this test
would have sometimes failed to trigger a soft limit disconnection
in time.

The solution is to write bigger payloads to the output buffer, but
still limit their rate (not more than 100k/s).
2021-04-19 10:08:07 +03:00
Yossi Gottlieb
c0f5c678c2
Revert cluster slot migration tests. (#8806)
Disables #8649 and subsequent attempts to stabilize the test.
2021-04-18 20:51:08 +03:00
Oran Agra
a9897b0084
Fix timing of new replication test (#8807)
In github actions CI with valgrind, i saw that even the fast replica
(one that wasn't paused), didn't get to complete the replication fast
enough, and ended up getting disconnected by timeout.

Additionally, due to a typo in uname, we didn't get to actually run the
CPU efficiency part of the test.
2021-04-18 15:12:34 +03:00
Oran Agra
f4b5a4d869
Improve testsuite print of log file (#8805)
1. the `dump_logs` option would have printed only logs of servers that were
   spawn before the test proc started, and not ones that the test proc
   started inside it.
2. when a server proc catches an exception it should normally forward the
   exception upwards, specifically when it's an assertion that should be
   caught by a test proc above. however, in `durable` mode, we caught all
   exceptions printed them to stdout and let the code continue,
   this was wrong to do for assertions, which should have still been
   propagated to the test function.
3. don't bother to search for crash log to print if we printed the the
   entire log anyway
4. if no crash log was found, no need to print anything (i.e. the fact it
   wasn't found)
5. rename warnings_from_file to crashlog_from_file
2021-04-18 11:55:54 +03:00
guybe7
d63d02601f
Add a timeout mechanism for replicas stuck in fullsync (#8762)
Starting redis 6.0 (part of the TLS feature), diskless master uses pipe from the fork
child so that the parent is the one sending data to the replicas.
This mechanism has an issue in which a hung replica will cause the master to wait
for it to read the data sent to it forever, thus preventing the fork child from terminating
and preventing the creations of any other forks.

This PR adds a timeout mechanism, much like the ACK-based timeout,
we disconnect replicas that aren't reading the RDB file fast enough.
2021-04-15 17:18:51 +03:00
YaacovHazan
645c664fbb
stabilized and improve pendingquerybuf test suit (#8780)
replace the hardcoded after 2000, with waiting for the sync and
wait for condition
2021-04-14 11:49:00 +03:00
Oran Agra
b278e44376
Revert "Fix: server will crash if rdbload or rdbsave method is not provided in module (#8670)" (#8771)
This reverts commit 808f3004f0.
2021-04-13 17:41:46 +03:00
Oran Agra
c07e16fadd
Add more attempts to a timing sensitive test (#8770) 2021-04-13 17:35:10 +03:00
Yossi Gottlieb
5e3a15ae1b
Fix failing cluster tests. (#8763)
Disable replica migration to avoid a race condition where the
migrated-from node turns into a replica.

Long term, this test should probably be improved to handle multiple
slots and accept such auto migrations but this is a quick fix to
stabilize the CI without completely dropping this test.
2021-04-13 00:00:57 +03:00
Yang Bodong
4c14e8668c
Fix out of range confusing error messages (XAUTOCLAIM, RPOP count) (#8746)
Fix out of range error messages to be clearer (avoid mentioning 9223372036854775807)
* Fix XAUTOCLAIM COUNT option confusing error msg
* Fix other RPOP and alike error message to mention positive
2021-04-07 10:01:28 +03:00
Bonsai
808f3004f0
Fix: server will crash if rdbload or rdbsave method is not provided in module (#8670)
With this fix, module data type registration will fail if the load or save callbacks are not defined, or the optional aux load and save callbacks are not either both defined or both missing.
2021-04-06 12:09:36 +03:00
Yossi Gottlieb
4724dd439e
Clean up and stabilize cluster migration tests. (#8745)
This is work in progress, focusing on two main areas:
* Avoiding race conditions with cluster configuration propagation.
* Ignoring limitations with redis-cli --cluster fix which makes it hard
  to distinguish real errors (e.g. failure to fix) from expected
  conditions in this test (e.g. nodes not agreeing on configuration).
2021-04-06 11:57:57 +03:00
Huang Zhw
3b74b55084
Fix "default" and overwritten / reset users will not have pubsub channels permissions by default. (#8723)
Background:
Redis 6.2 added ACL control for pubsub channels (#7993), which were supposed
to be permissive by default to retain compatibility with redis 6.0 ACL. 
But due to a bug, only newly created users got this `acl-pubsub-default` applied,
while overwritten (updated) users got reset to `resetchannels` (denied).

Since the "default" user exists before loading the config file,
any ACL change to it, results in an update / overwrite.

So when a "default" user is loaded from config file or include ACL
file with no channels related rules, the user will not have any
permissions to any channels. But other users will have default
permissions to any channels.

When upgraded from 6.0 with config rewrite, this will lead to
"default" user channels permissions lost.
When users are loaded from include file, then call "acl load", users
will also lost channels permissions.

Similarly, the `reset` ACL rule, would have reset the user to be denied
access to any channels, ignoring `acl-pubsub-default` and breaking
compatibility with redis 6.0.

The implication of this fix is that it regains compatibility with redis 6.0,
but breaks compatibility with redis 6.2.0 and 2.0.1. e.g. after the upgrade,
the default user will regain access to pubsub channels.

Other changes:
Additionally this commit rename server.acl_pubusub_default to
server.acl_pubsub_default and fix typo in acl tests.
2021-04-05 23:13:20 +03:00
Sokolov Yura
1cab962098
Add cluster-allow-replica-migration option. (#5285)
Previously (and by default after commit) when master loose its last slot
(due to migration, for example), its replicas will migrate to new last slot
holder.

There are cases where this is not desired:
* Consolidation that results with removed nodes (including the replica, eventually).
* Manually configured cluster topologies, which the admin wishes to preserve.

Needlessly migrating a replica triggers a full synchronization and can have a negative impact, so
we prefer to be able to avoid it where possible.

This commit adds 'cluster-allow-replica-migration' configuration option that is
enabled by default to preserve existed behavior. When disabled, replicas will
not be auto-migrated.

Fixes #4896

Co-authored-by: Oran Agra <oran@redislabs.com>
2021-04-04 09:43:24 +03:00
Valentino Geron
44d8b039e8
Fix XAUTOCLAIM response to return the next available id as the cursor (#8725)
This command used to return the last scanned entry id as the cursor,
instead of the next one to be scanned.
so in the next call, the user could / should have sent `(cursor` and not
just `cursor` if he wanted to avoid scanning the same record twice.

Scanning the record twice would look odd if someone is checking what
exactly was scanned, but it also has a side effect of incrementing the
delivery count twice.
2021-04-01 12:13:55 +03:00
Oran Agra
370ab4c4db
Solve sentinel test issue in TLS due to recent tests change. (#8728)
5629dbe71 added a change that configures the tcp (plaintext) port
alongside the tls port, this causes the INFO command for tcp_port
to return that instead of the tls port when running in tls, and that broke
the sentinel tests that query it.

the fix is to add a method that gets the right port from CONFIG instead
of relying on the tcp_port info field.
2021-04-01 09:44:44 +03:00
guybe7
843f769b96
zsetAdd: Fix wrong reply in case of INCR and GT/LT (#8717)
If GT/LT fails the operation we need to reply with
nill (like failure due to NX).

Other changes:
Add the missing $encoding suffix to many zset tests

Note: there's a behavior change just in case of INCR + GT/LT that fails.
The old code was replying with the wrong (rejected) score, and now it'll reply with nil.

Note that that's anyway a corner case so this "behavior change" shouldn't have too much affect.
Using GT/LT with INCR has a predictable result even before we run the command
(INCR GT will only only / always fail if the increment is negative).
2021-04-01 09:33:53 +03:00
sundb
569a3f4548
Use chi-square for random distributivity verification in test (#8709)
Problem:
Currently, when performing random distribution verification, we determine
the probability of each element occurring in the sum, but the probability is
only an estimate, these tests had rare sporadic failures, and we cannot verify
what the probability of failure will be.

Solution:
Using the chi-square distribution instead of the original random distribution
validation makes the test more reasonable and easier to find problems.
2021-04-01 08:20:15 +03:00
Jérôme Loyet
91f4f41665
Add replica-announced config option (#8653)
The 'sentinel replicas <master>' command will ignore replicas with
`replica-announced` set to no.

The goal of disabling the config setting replica-announced is to allow ghost
replicas. The replica is in the cluster, synchronize with its master, can be
promoted to master and is not exposed to sentinel clients. This way, it is
acting as a live backup or living ghost.

In addition, to prevent the replica to be promoted as master, set
replica-priority to 0.
2021-03-30 23:40:22 +03:00
Yossi Gottlieb
6a052af890
Cluster migration test cleanup. (#8726)
* Dump more output on error (always, cluster tests currently have no
verbose flag).
* Slow down redis-cli check iteration.
2021-03-30 23:33:01 +03:00
Viktor Söderqvist
5629dbe715
Add support for plaintext clients in TLS cluster (#8587)
The cluster bus is established over TLS or non-TLS depending on the configuration tls-cluster. The client ports distributed in the cluster and sent to clients are assumed to be TLS or non-TLS also depending on tls-cluster.

The cluster bus is now extended to also contain the non-TLS port of clients in a TLS cluster, when available. The non-TLS port of a cluster node, when available, is sent to clients connected without TLS in responses to CLUSTER SLOTS, CLUSTER NODES, CLUSTER SLAVES and MOVED and ASK redirects, instead of the TLS port.

The user was able to override the client port by defining cluster-announce-port. Now cluster-announce-tls-port is added, so the user can define an alternative announce port for both TLS and non-TLS clients.

Fixes #8134
2021-03-30 23:11:32 +03:00
JunhuaY
28375ff63e
re-fix config rewrite for empty save directive (#8722)
the bug was also discussed in #8716, and was solved in #8719, but incompletely:
when the server is started, and the save option is default, if you issue the " config set save "" "
to change the save option, and then issue the “config rewrite” command, the " save "" " won't be saved.
2021-03-30 22:49:06 +03:00
Oran Agra
cd81dcf18b
solve race conditions in psync2-pingoff test (#8720)
Another test race condition in the macos tests.
the test was waiting for PINGs to be generated and put on the replication stream,
but waiting for 1 or 2 seconds doesn't really guarantee that.
then the test that expected 6 full syncs, found only 4
2021-03-30 11:41:06 +03:00
Yossi Gottlieb
65311a3360
Fix config rewrite with an empty "save" parameter. (#8719) 2021-03-29 18:53:20 +03:00
Sokolov Yura
315df9ada0
Add cluster slot migration tests (#8649)
Add tests for fixing migrating slot at all stages:

1. when migration is half inited on "migrating" node
2. when migration is half inited on "importing" node
3. migration inited, but not finished
4. migration is half finished on "migrating" node
5. migration is half finished on "importing" node

Also add tests for many simultaneous slot migrations.

Co-authored-by: Yossi Gottlieb <yossigo@gmail.com>
2021-03-29 13:52:02 +03:00
Meir Shpilraien (Spielrein)
036963a7da
Restore old client 'processCommandAndResetClient' to fix false dead client indicator (#8715)
'processCommandAndResetClient' returns 1 if client is dead. It does it
by checking if serve.current_client is NULL. On script timeout, Redis will re-enter
'processCommandAndResetClient' and when finish we will set server.current_client
to NULL. This will cause later to falsely return 1 and think that the client that
sent the timed-out script is dead (Redis to stop reading from the client buffer).
2021-03-29 13:34:16 +03:00
Huang Zhw
e138698e54
make processCommand check publish channel permissions. (#8534)
Add publish channel permissions check in processCommand.

processCommand didn't check publish channel permissions, so we can
queue a publish command in a transaction. But when exec the transaction,
it will fail with -NOPERM.

We also union keys/commands/channels permissions check togegher in
ACLCheckAllPerm. Remove pubsubCheckACLPermissionsOrReply in 
publishCommand/subscribeCommand/psubscribeCommand. Always 
check permissions in processCommand/execCommand/
luaRedisGenericCommand.
2021-03-26 14:10:01 +03:00
Oran Agra
497351ad07
Fix SLOWLOG for blocked commands (#8632)
* SLOWLOG didn't record anything for blocked commands because the client
  was reset and argv was already empty. there was a fix for this issue
  specifically for modules, now it works for all blocked clients.
* The original command argv (before being re-written) was also reset
  before adding the slowlog on behalf of the blocked command.
* Latency monitor is now updated regardless of the slowlog flags of the
  command or its execution (their purpose is to hide sensitive info from
  the slowlog, not hide the fact the latency happened).
* Latency monitor now uses real_cmd rather than c->cmd (which may be
  different if the command got re-written, e.g. GEOADD)

Changes:
* Unify shared code between slowlog insertion in call() and
  updateStatsOnUnblock(), hopefully prevent future bugs from happening
  due to the later being overlooked.
* Reset CLIENT_PREVENT_LOGGING in resetClient rather than after command
  processing.
* Add a test for SLOWLOG and BLPOP

Notes:
- real_cmd == c->lastcmd, except inside MULTI and Lua.
- blocked commands never happen in these cases (MULTI / Lua)
- real_cmd == c->cmd, except for when the command is rewritten (e.g.
  GEOADD)
- blocked commands (currently) are never rewritten
- other than the command's CLIENT_PREVENT_LOGGING, and the
  execution flag CLIENT_PREVENT_LOGGING, other cases that we want to
  avoid slowlog are on AOF loading (specifically CMD_CALL_SLOWLOG will
  be off when executed from execCommand that runs from an AOF)
2021-03-25 10:20:27 +02:00
Qu Chen
7de6451818
Properly initialize variable to make valgrind happy in checkChildrenDone(). Removed usage for the obsolete wait3() and wait4() in favor of waitpid(), and properly check for the exit status code. (#8666) 2021-03-24 08:41:05 -07:00
yoav-steinberg
3060de88ce
Remove cron saving during BGSAVE test. (#8688)
This fixes a race where a bgsave can start during the test after we verified no bgsave is running.
2021-03-24 15:14:47 +02:00
Oran Agra
f6e1a94e03
Corrupt stream key access to uninitialized memory (#8681)
the corrupt-dump-fuzzer test found a case where an access to a corrupt
stream would have caused accessing to uninitialized memory.
now it'll panic instead.

The issue was that there was a stream that says it has more than 0
records, but looking for the max ID came back empty handed.

p.s. when sanitize-dump-payload is used, this corruption is detected,
and the RESTORE command is gracefully rejected.
2021-03-24 11:33:49 +02:00
Yossi Gottlieb
c4ef1efdb7
Add support for reading encrypted keyfiles. (#8644) 2021-03-22 13:27:46 +02:00
Oran Agra
2f717c156a
fix race in diskless load cluster tests (#8674) 2021-03-22 10:51:13 +02:00
Oran Agra
a7c02b19bf
Fix race in replication test (#8679)
Since redis 6.2, redis immediately tries to connect to the master, not
waiting for replication cron.

in the slow freebsd CI, this test failed and master_link_status was
already "up" when INFO was called.
2021-03-22 10:50:39 +02:00
Meir Shpilraien (Spielrein)
9ae4f5c73d
Fix script kill to work also on scripts that use pcall (#8661)
pcall function runs another LUA function in protected mode, this means
that any error will be caught by this function and will not stop the LUA
execution. The script kill mechanism uses error to stop the running script.
Scripts that uses pcall can catch the error raise by the script kill mechanism,
this will cause a script like this to be unkillable:

local f = function()
        while 1 do
                redis.call('ping')
        end
end
while 1 do
        pcall(f)
end

The fix is, when we want to kill the script, we set the hook function to be invoked 
after each line. This will promise that the execution will get another
error before it is able to enter the pcall function again.
2021-03-17 18:52:11 +02:00
Huang Zhw
a19c4058be
When tests exit normally, some processes may still be alive (#8647)
In certain scenario start_server may think it failed to start a redis server
although it started successfully. in these cases, it'll not terminate it, and
it'll remain running when the test is over.

In start_server if config doesn't have bind (the minimal.conf in introspection.tcl),
it will try to bind ipv4 and ipv6. One may success while other fails. It will
output "Could not create server TCP listening socket".
wait_server_started uses this message to check whether instance started
successfully. So it will consider that it failed even though redis started successfully.

Additionally, in some cases it wasn't clear to users why the server exited,
since the warning message printed to the log, could in some cases be harmless,
and in some cases fatal.

This PR adds makes a clear distinction between a warning log message and
a fatal one, and changes the test suite to look for the fatal message.
2021-03-16 17:25:30 +02:00
Madelyn Olson
e1d98bca5a
Redact slowlog entries for config with sensitive data. (#8584)
Redact config set requirepass/masterauth/masteruser from slowlog in addition to showing ACL commands without sensitive values.
2021-03-15 22:00:29 -07:00
guybe7
dba33a943d
Missing EXEC on modules propagation after failed EVAL execution (#8654)
1. moduleReplicateMultiIfNeeded should use server.in_eval like
   moduleHandlePropagationAfterCommandCallback
2. server.in_eval could have been set to 1 and not reset back
   to 0 (a lot of missed early-exits after in_eval is already 1)

Note: The new assertions in processCommand cover (2) and I added
two module tests to cover (1)

Implications:
If an EVAL that failed (and thus left server.in_eval=1) runs before a module
command that replicates, the replication stream will contain MULTI (because
moduleReplicateMultiIfNeeded used to check server.lua_caller which is NULL
at this point) but not EXEC (because server.in_eval==1)
This only affects modules as module.c the only user of server.in_eval.

Affects versions 6.2.0, 6.2.1
2021-03-15 21:19:57 +02:00
KinWaiYuen
5b48d90049
Optimize CLUSTER SLOTS reply by reducing unneeded loops (#8541)
This commit more efficiently computes the cluster bulk slots response 
by looping over the entire slot space once, instead of for each node.
2021-03-11 22:40:35 -08:00
guybe7
a4f03bd7eb
Fix some memory leaks in propagagte.c (#8636)
Introduced by 3d0b427c30
2021-03-11 13:50:13 +02:00
Harkrishn Patro
b70d81f60b
Process hello command even if the default user has no permissions. (#8633)
Co-authored-by: Harkrishn Patro <harkrisp@amazon.com>
2021-03-10 21:19:35 -08:00
guybe7
3d0b427c30
Fix some issues with modules and MULTI/EXEC (#8617)
Bug 1:
When a module ctx is freed moduleHandlePropagationAfterCommandCallback
is called and handles propagation. We want to prevent it from propagating
commands that were not replicated by the same context. Example:
1. module1.foo does: RM_Replicate(cmd1); RM_Call(cmd2); RM_Replicate(cmd3)
2. RM_Replicate(cmd1) propagates MULTI and adds cmd1 to also_propagagte
3. RM_Call(cmd2) create a new ctx, calls call() and destroys the ctx.
4. moduleHandlePropagationAfterCommandCallback is called, calling
   alsoPropagates EXEC (Note: EXEC is still not written to socket),
   setting server.in_trnsaction = 0
5. RM_Replicate(cmd3) is called, propagagting yet another MULTI (now
   we have nested MULTI calls, which is no good) and then cmd3

We must prevent RM_Call(cmd2) from resetting server.in_transaction.
REDISMODULE_CTX_MULTI_EMITTED was revived for that purpose.

Bug 2:
Fix issues with nested RM_Call where some have '!' and some don't.
Example:
1. module1.foo does RM_Call of module2.bar without replication (i.e. no '!')
2. module2.bar internally calls RM_Call of INCR with '!'
3. at the end of module1.foo we call RM_ReplicateVerbatim

We want the replica/AOF to see only module1.foo and not the INCR from module2.bar

Introduced a global replication_allowed flag inside RM_Call to determine
whether we need to replicate or not (even if '!' was specified)

Other changes:
Split beforePropagateMultiOrExec to beforePropagateMulti afterPropagateExec
just for better readability
2021-03-10 18:02:17 +02:00
Yossi Gottlieb
817894c012
Fix test false positive due to a race condition. (#8616) 2021-03-08 21:22:08 +02:00
Yossi Gottlieb
7d81f39222
Fix flaky unit/maxmemory test on MacOS/BSD. (#8619)
It seems like non-Linux sockets may be less greedy, resulting with more
transient client output buffers.

Haven't proven this but empirically when stressing this test on
non-Linux tends to exhibit increased mem_clients_normal values.
2021-03-08 20:53:53 +02:00
Yossi Gottlieb
3c7d6a1853
Improve redis-cli non-binary safe string handling. (#8566)
* The `redis-cli --scan` output should honor output mode (set explicitly or implicitly), and quote key names when not in raw mode.
  * Technically this is a breaking change, but it should be very minor since raw mode is by default on for non-tty output.
  * It should only affect  TTY output (human users) or non-tty output if `--no-raw` is specified.

* Added `--quoted-input` option to treat all arguments as potentially quoted strings.
* Added `--quoted-pattern` option to accept a potentially quoted pattern.

Unquoting is applied to potentially quoted input only if single or double quotes are used. 

Fixes #8561, #8563
2021-03-04 15:03:49 +02:00
Yossi Gottlieb
5d180d2834
Fix potential replication-4 test race condition. (#8583)
Co-authored-by: Oran Agra <oran@redislabs.com>
2021-03-02 18:12:11 +02:00
YaacovHazan
c19530bc71
fix new networking tests to work when the test suite is used in tls mode (#8582)
the tests were unable to connect to the server since the attempted to use normal tcp
2021-03-01 20:53:02 +02:00
Oran Agra
349ef3f6a0
fix stream deep sanitization with deleted records (#8568)
When sanitizing the stream listpack, we need to count the deleted records too.
otherwise the last line that checks the next pointer fails.

Add test to cover that state in the stream tests.
2021-03-01 17:23:29 +02:00
YaacovHazan
a031d268b1
Make port, tls-port and bind configurations modifiable (#8510)
Add ability to modify port, tls-port and bind configurations by CONFIG SET command.

To simplify the code and make it cleaner, a new structure
added, socketFds, which contains the file descriptors array and its counter,
and used for TCP, TLS and Cluster sockets file descriptors.
2021-03-01 16:04:44 +02:00
Bonsai
81a55d026f
fix: call CLIENT INFO from redis module will crash the server (#8560)
Because when the RM_Call is invoked. It will create a faker client.
The point is client connection is NULL, so server will crash in connGetInfo

Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2021-03-01 08:18:14 +02:00
Viktor Söderqvist
6122f1c450
Shared reusable client for RM_Call() (#8516)
A single client pointer is added in the server struct. This is
initialized by the first RM_Call() and reused for every subsequent
RM_Call() except if it's already in use, which means that it's not
used for (recursive) module calls to modules. For these, a new
"fake" client is created each time.

Other changes:
* Avoid allocating a dict iterator in pubsubUnsubscribeAllChannels
  when not needed
2021-02-28 14:11:18 +02:00
Madelyn Olson
c6f0ea2c81
Allow stopped redis processes to be killed in tests (#8552) 2021-02-24 14:26:16 -08:00
sundb
60d5ef4d82
Use addReplyErrorObject with shared.noscripterr (#8544) 2021-02-24 08:45:13 -08:00
guybe7
f745c0181a
Fix race in CONFIG REWRITE sanity (#8536)
server may still be LOADING the RDB when receiving the ping
2021-02-23 20:28:03 +02:00
Yossi Gottlieb
95ea74549c
Fix failed tests on Linux Alpine and add a CI job. (#8532)
* Remove linux/version.h dependency.

This introduces unnecessary dependencies, and generally not a good idea
as the platform we build on may be different than the platform we run
on.

To determine if sync_file_range exists we can simply rely on header file
hints.

* Fix setproctitle() on libmusl.

The previous ifdef checks were a bit too strict for no apparent
reason.

* Fix tests failure on Linux with no backtrace.

* Add alpine daily CI job.
2021-02-23 12:57:45 +02:00
Harkrishn Patro
4739131ca6
Remove acl subcommand validation if fully added command exists. (#8483)
This validation was only done for sub-commands and not for commands.

These would have been valid (not produce any error)
ACL SETUSER bob +@all +client
ACL SETUSER bob +client +client

so no reason for this one to fail:
ACL SETUSER bob +client +client|id

One example why this is needed is that pfdebug wasn't part of the @hyperloglog
group and now it is. so something like:
acl setuser user1 +@hyperloglog +pfdebug|test
would have succeeded in early 6.0.x, and fail in 6.2 RC3

Co-authored-by: Harkrishn Patro <harkrisp@amazon.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Oran Agra <oran@redislabs.com>
2021-02-22 15:22:25 +02:00
Huang Zw
f687ac0c32
Client tracking tracking-redir-broken push len is 2 not 3 (#8456)
When redis responds with tracking-redir-broken push message (RESP3),
it was responding with a broken protocol: an array of 3 elements, but only
pushes 2 elements.

Some bugs in the test make this pass. Read the push reply
will consume an extra reply, because the reply length is 3, but there
are only two elements, so the next reply will be treated as third
element. So the test is corrected too.

Other changes:
* checkPrefixCollisionsOrReply success should return 1 instead of -1,
  this bug didn't have any implications.
* improve client tracking tests to validate more of the response it reads.
2021-02-21 09:34:46 +02:00
Gnanesh
0772098b1b
EXPIRE, EXPIREAT, SETEX, GETEX: Return error when expire time overflows (#8287)
Respond with error if expire time overflows from positive to negative of vice versa.

* `SETEX`, `SET EX`, `GETEX` etc would have already error on negative value,
but now they would also error on overflows (i.e. when the input was positive but
after the manipulation it becomes negative, which would have passed before)
* `EXPIRE` and `EXPIREAT` was ok taking negative values (would implicitly delete
the key), we keep that, but we do error if the user provided a value that changes
sign when manipulated (except the case of changing sign when `basetime` is added)

Signed-off-by: Gnanesh <gnaneshkunal@outlook.com>
Co-authored-by: Oran Agra <oran@redislabs.com>
2021-02-21 09:09:54 +02:00
sundb
46346e9e3a
Fix timing error oom-score-adj test (#8513)
fixes timing issue, fork didn't always get to set the oom score before the test verified it.
2021-02-19 13:01:25 +02:00
uriyage
fd052d2a86
Adds INFO fields to track fork child progress (#8414)
* Adding current_save_keys_total and current_save_keys_processed info fields.
  Present in replication, BGSAVE and AOFRW.
* Changing RM_SendChildCOWInfo() to RM_SendChildHeartbeat(double progress)
* Adding new info field current_fork_perc. Present in Replication, BGSAVE, AOFRW,
  and module forks.
2021-02-16 16:06:51 +02:00
Oran Agra
fb3457d157
minor test suite cleanup, revive old test (#8497)
There are two tests in other.tcl that were dependant of the sha1 package
import which meant that they didn't usually run.
The reason it was like that was that prior to the creation of DEBUG
DIGEST, the test suite used to have an equivalent function, but that's
no longer the case and this dependency isn't needed.

The other change is to revert config changes done by the test before the
test suite continues. can be useful if using `--host` to run multiple
units against the same server
2021-02-15 17:20:03 +02:00
Yossi Gottlieb
141ac8df59
Escape unsafe field name characters in INFO. (#8492)
Fixes #8489
2021-02-15 17:08:53 +02:00
Oran Agra
30775bc3e3
solve race in replication-2 test - again (#8491)
this should make it timing independent and also faster in most cases
2021-02-15 12:50:23 +02:00
Viktor Söderqvist
0bc8c9c8f9
Modules: In RM_HashSet, add COUNT_ALL flag and set errno (#8446)
The added flag affects the return value of RM_HashSet() to include
the number of inserted fields, in addition to updated and deleted
fields.

errno is set on errors, tests are added and documentation updated.
2021-02-15 11:40:05 +02:00
Yossi Gottlieb
8c42d1257f
Fix errors with sentinel leaked fds test. (#8482)
* Don't run test script on non-Linux.
* Verify that reported fds do indeed exist also in parent, to avoid
  false negatives on some systems (namely CentOS).

Co-authored-by: Andy Pan <panjf2000@gmail.com>
2021-02-11 15:25:01 +02:00
filipe oliveira
b5ca1e9e53
Removed time sensitive checks from block on background tests. Fixed uninitialized variable (#8479)
- removes time sensitive checks from block on background tests during leak checks.
- fix uninitialized variable on RedisModuleBlockedClient() when calling
  RM_BlockedClientMeasureTimeEnd() without RM_BlockedClientMeasureTimeStart()
2021-02-10 08:59:07 +02:00
WuYunlong
203f357c32
Cleanup in redis-cli and tests: release memory on exit, change dup test name (#8475)
1. Rename 18-cluster-nodes-slots.tcl to 19-cluster-nodes-slots.tcl.
  it was conflicting with another test prefixed by 18
2. Release memory on exit in redis-cli.c.
3. Fix freeConvertedSds indentation.
2021-02-09 12:36:09 +02:00
Yossi Gottlieb
dbcc0a85d0
Fix and cleanup Sentinel leaked fds test. (#8469)
* For consistency, use tclsh for the script as well
* Ignore leaked fds that originate from grandparent process, since we
  only care about fds redis-sentinel itself is responsible for
* Check every test iteration to catch problems early
* Some cleanups, e.g. parameterization of file name, etc.
2021-02-08 17:02:46 +02:00
filipe oliveira
b2351ea0dc
[fix] Increasing block on background timeout time to avoid test failure (#8470)
The test failed from time to time on Github actions.
We think it's possible that on the module's blocking timeout
time tracking test, the timeout is happening prior we issue the
RedisModule_BlockedClientMeasureTimeStart(bc) on the
background thread. If that is the case one possible solution
is to increase the timeout.
Increasing to 200ms to 500ms to see if nightly stops failing.
2021-02-08 16:24:00 +02:00
Oran Agra
02ab14cc2e
solve race in replication-2 test (#8461)
use SIGSTOP instead of DEBUG SLEEP, reduces the test
time by some 2 seconds and avoids failures on slow machines
2021-02-07 16:22:30 +02:00
Yossi Gottlieb
5b8350aaaa
Add --dump-logs tests option. (#8459)
Dump the entire server log if a test failed, to easy troubleshooting
with no access to log files.
2021-02-07 12:37:24 +02:00
Viktor Söderqvist
aea6e71ef8
RM_ZsetRem: Delete key if empty (#8453)
Without this fix, RM_ZsetRem can leave empty sorted sets which are
not allowed to exist.

Removing from a sorted set while iterating seems to work (while
inserting causes failed assetions). RM_ZsetRangeEndReached is
modified to return 1 if the key doesn't exist, to terminate
iteration when the last element has been removed.
2021-02-05 19:54:01 +02:00
filipe oliveira
b3bdcd2278
Fix compiler warning on implicit declaration of ‘nanosleep’ . Removed unused variable (#8454) 2021-02-05 19:51:31 +02:00
sundb
18ac41973b
RAND* commands: fix risk of OOM panic in hash and zset, use fair random in hash, and add tests for even distribution to all (#8429)
Changes to HRANDFIELD and ZRANDMEMBER:
* Fix risk of OOM panic when client query a very big negative count (avoid allocating huge temporary buffer).
* Fix uneven random distribution in HRANDFIELD with negative count (wasn't using dictGetFairRandomKey).
* Add tests to check an even random distribution (HRANDFIELD, SRANDMEMBER, ZRANDMEMBER).

Co-authored-by: Oran Agra <oran@redislabs.com>
2021-02-05 15:56:20 +02:00
Yang Bodong
b7b23a0ff5
Fix GEOSEARCH tcl test error (#8451)
Issue with new test due to longitude wraparound.
2021-02-04 19:39:07 +02:00
Yang Bodong
ded1655d49
GEOSEARCH bybox bug fixes and new fuzzy tester (#8445)
Fix errors of GEOSEARCH bybox search due to:
1. projection of the box to a trapezoid (when the meter box is converted to long / lat it's no longer a box).
2. width and height mismatch

Changes:
- New GEOSEARCH point in rectangle algorithm
- Fix GEOSEARCH bybox width and height mismatch bug
- Add GEOSEARCH bybox testing to the existing "GEOADD + GEORANGE randomized test"
- Add new fuzzy test to stress test the bybox corners and edges
- Add some tests for edge cases of the bybox algorithm

Co-authored-by: Oran Agra <oran@redislabs.com>
2021-02-04 18:08:35 +02:00
Yossi Gottlieb
52fb306535
Fix 32-bit test modules build. (#8448) 2021-02-04 11:37:28 +02:00
Yossi Gottlieb
de6f3ad017
Fix FreeBSD tests and CI Daily issues. (#8438)
* Add bash temporarily to allow sentinel fd leaks test to run.
* Use vmactions-freebsd rdist sync to work around bind permission denied
  and slow execution issues.
* Upgrade to tcl8.6 to be aligned with latest Ubuntu envs.
* Concat all command executions to avoid ignoring failures.
* Skip intensive fuzzer on FreeBSD. For some yet unknown reason, generate_fuzzy_traffic_on_key causes TCL to significantly bloat on FreeBSD resulting with out of memory.
2021-02-03 17:35:28 +02:00
Oran Agra
8f27578de2
temporarily disable sentinel test FD leak print (#8425)
These tests are not yet stable. on github actions they show some false leaks.
2021-01-31 12:14:36 +02:00
Oran Agra
5a7eb9c881
Fix test issues from introduction of HRANDFIELD (#8424)
* The corrupt dump fuzzer found a division by zero.
* in some cases the random fields from the HRANDFIELD tests produced
  fields with newlines and other special chars (due to \ char), this caused
  the TCL tests to see a bulk response that has a newline in it and add {}
  around it, later it can think this is a nested list. in fact the `alpha` random
  string generator isn't using spaces and newlines, so it should not use `\`
  either.
2021-01-31 12:13:45 +02:00
Wen Hui
eacccd2acb
fix sentinel tests error (#8422)
This commit fixes sentinel announces hostnames test error in certain linux environment
Before this commit, we only check localhost is resolved into 127.0.0.1, however in ubuntu
or some other linux environments "localhost" will be resolved into ::1 ipv6 address first if
the network stack is capable.
2021-01-30 11:18:58 +02:00
filipe oliveira
f0c5052aa8
Enabled background and reply time tracking on blocked on keys/blocked on background work clients (#7491)
This commit enables tracking time of the background tasks and on replies,
opening the door for properly tracking commands that rely on blocking / background
 work via the slowlog, latency history, and commandstats. 

Some notes:
- The time spent blocked waiting for key changes, or blocked on synchronous
  replication is not accounted for. 

- **This commit does not affect latency tracking of commands that are non-blocking
  or do not have background work.** ( meaning that it all stays the same with exception to
  `BZPOPMIN`,`BZPOPMAX`,`BRPOP`,`BLPOP`, etc... and module's commands that rely
  on background threads ). 

-  Specifically for latency history command we've added a new event class named
  `command-unblocking` that will enable latency monitoring on commands that spawn
  background threads to do the work.

- For blocking commands we're now considering the total time of a command as the
  time spent on call() + the time spent on replying when unblocked.

- For Modules commands that rely on background threads we're now considering the
  total time of a command as the time spent on call (main thread) + the time spent on
  the background thread ( if marked within `RedisModule_MeasureTimeStart()` and
  `RedisModule_MeasureTimeEnd()` ) + the time spent on replying (main thread)

To test for this feature we've added a `unit/moduleapi/blockonbackground` test that relies on
a module that blocks the client and sleeps on the background for a given time. 
- check blocked command that uses RedisModule_MeasureTimeStart() is tracking background time
- check blocked command that uses RedisModule_MeasureTimeStart() is tracking background time even in timeout
- check blocked command with multiple calls RedisModule_MeasureTimeStart()  is tracking the total background time
- check blocked command without calling RedisModule_MeasureTimeStart() is not reporting background time
2021-01-29 15:38:30 +02:00
Yang Bodong
b9a0500f16
Add HRANDFIELD and ZRANDMEMBER. improvements to SRANDMEMBER (#8297)
New commands:
`HRANDFIELD [<count> [WITHVALUES]]`
`ZRANDMEMBER [<count> [WITHSCORES]]`
Algorithms are similar to the one in SRANDMEMBER.

Both return a simple bulk response when no arguments are given, and an array otherwise.
In case values/scores are requested, RESP2 returns a long array, and RESP3 a nested array.
note: in all 3 commands, the only option that also provides random order is the one with negative count.

Changes to SRANDMEMBER
* Optimization when count is 1, we can use the more efficient algorithm of non-unique random
* optimization: work with sds strings rather than robj

Other changes:
* zzlGetScore: when zset needs to convert string to double, we use safer memcpy (in
  case the buffer is too small)
* Solve a "bug" in SRANDMEMBER test: it intended to test a positive count (case 3 or
  case 4) and by accident used a negative count

Co-authored-by: xinluton <xinluton@qq.com>
Co-authored-by: Oran Agra <oran@redislabs.com>
2021-01-29 10:47:28 +02:00
Allen Farris
0d18a1e85f
implement FAILOVER command (#8315)
Implement FAILOVER command, which coordinates failover
between the server and one of its replicas.
2021-01-28 13:18:05 -08:00
Yossi Gottlieb
4bb5ccbefb
Add proc-title-template option. (#8397)
Make it possible to customize the process title, i.e. include custom
strings, immutable configuration like port, tls-port, unix socket name,
etc.
2021-01-28 18:17:39 +02:00
Viktor Söderqvist
4355145a62
Add modules API for streams (#8288)
APIs added for these stream operations: add, delete, iterate and
trim (by ID or maxlength). The functions are prefixed by RM_Stream.

* RM_StreamAdd
* RM_StreamDelete
* RM_StreamIteratorStart
* RM_StreamIteratorStop
* RM_StreamIteratorNextID
* RM_StreamIteratorNextField
* RM_StreamIteratorDelete
* RM_StreamTrimByLength
* RM_StreamTrimByID

The type RedisModuleStreamID is added and functions for converting
from and to RedisModuleString.

* RM_CreateStringFromStreamID
* RM_StringToStreamID

Whenever the stream functions return REDISMODULE_ERR, errno is set to
provide additional error information.

Refactoring: The zset iterator fields in the RedisModuleKey struct
are wrapped in a union, to allow the same space to be used for type-
specific info for streams and allow future use for other key types.
2021-01-28 16:19:43 +02:00
Yossi Gottlieb
bb7cd97439
Add hostname support in Sentinel. (#8282)
This is both a bugfix and an enhancement.

Internally, Sentinel relies entirely on IP addresses to identify
instances. When configured with a new master, it also requires users to
specify and IP and not hostname.

However, replicas may use the replica-announce-ip configuration to
announce a hostname. When that happens, Sentinel fails to match the
announced hostname with the expected IP and considers that a different
instance, triggering reconfiguration, etc.

Another use case is where TLS is used and clients are expected to match
the hostname to connect to with the certificate's SAN attribute. To
properly implement this configuration, it is necessary for Sentinel to
redirect clients to a hostname rather than an IP address.

The new 'resolve-hostnames' configuration parameter determines if
Sentinel is willing to accept hostnames. It is set by default to no,
which maintains backwards compatibility and avoids unexpected DNS
resolution delays on systems with DNS configuration issues.

Internally, Sentinel continues to identify instances by their resolved
IP address and will also report the IP by default. The new
'announce-hostnames' parameter determines if Sentinel should prefer to
announce a hostname, when available, rather than an IP address. This
applies to addresses returned to clients, as well as their
representation in the configuration file, REPLICAOF configuration
commands, etc.

This commit also introduces SENTINEL CONFIG GET and SENTINEL CONFIG SET
which can be used to introspect or configure global Sentinel
configuration that was previously was only possible by directly
accessing the configuration file and possibly restarting the instance.

Co-authored-by: myl1024 <myl92916@qq.com>
Co-authored-by: sundb <sundbcn@gmail.com>
2021-01-28 12:09:11 +02:00
Z. Liu
17b34c7309
Add 'set-proc-title' config so that this mechanism can be disabled (#3623)
if option `set-proc-title' is no, then do nothing for proc title.

The reason has been explained long ago, see following:

We update redis to 2.8.8, then found there are some side effect when
redis always change the process title.

We run several slave instance on one computer, and all these salves
listen on unix socket only, then ps will show:

  1 S redis 18036 1 0 80 0 - 56130 ep_pol 14:02 ? 00:00:31 /usr/sbin/redis-server *:0
  1 S redis 23949 1 0 80 0 - 11074 ep_pol 15:41 ? 00:00:00 /usr/sbin/redis-server *:0

for redis 2.6 the output of ps is like following:

  1 S redis 18036 1 0 80 0 - 56130 ep_pol 14:02 ? 00:00:31 /usr/sbin/redis-server /etc/redis/a.conf
  1 S redis 23949 1 0 80 0 - 11074 ep_pol 15:41 ? 00:00:00 /usr/sbin/redis-server /etc/redis/b.conf

Later is more informational in our case. The situation
is worse when we manage the config and process running
state by salt. Salt check the process by running "ps |
grep SIG" (for Gentoo System) to check the running
state, where SIG is the string to search for when
looking for the service process with ps. Previously, we
define sig as "/usr/sbin/redis-server
/etc/redis/a.conf". Since the ps output is identical for
our case, so we have no way to check the state of
specified redis instance.

So, for our case, we prefer the old behavior, i.e, do
not change the process title for the main redis process.
Or add an option such as "set-proc-title [yes|no]" to
control this behavior.

Co-authored-by: Yossi Gottlieb <yossigo@gmail.com>
Co-authored-by: Oran Agra <oran@redislabs.com>
2021-01-28 11:12:39 +02:00
Raghav Muddur
0367a80819
GETEX, GETDEL and SET PXAT/EXAT (#8327)
This commit introduces two new command and two options for an existing command

GETEX <key> [PERSIST][EX seconds][PX milliseconds] [EXAT seconds-timestamp]
[PXAT milliseconds-timestamp]

The getexCommand() function implements extended options and variants of the GET
command. Unlike GET command this command is not read-only. Only one of the options
can be used at a given time.

1. PERSIST removes any TTL associated with the key.
2. EX Set expiry TTL in seconds.
3. PX Set expiry TTL in milliseconds.
4. EXAT Same like EX instead of specifying the number of seconds representing the
    TTL (time to live), it takes an absolute Unix timestamp
5. PXAT Same like PX instead of specifying the number of milliseconds representing the
    TTL (time to live), it takes an absolute Unix timestamp

Command would return either the bulk string, error or nil.

GETDEL <key>
Would delete the key after getting.

SET key value [NX] [XX] [KEEPTTL] [GET] [EX <seconds>] [PX <milliseconds>]
[EXAT <seconds-timestamp>][PXAT <milliseconds-timestamp>]

Two new options added here are EXAT and PXAT

Key implementation notes
- `SET` with `PX/EX/EXAT/PXAT` is always translated to `PXAT` in `AOF`. When relative time is
  specified (`PX/EX`), replication will always use `PX`.
- `setexCommand` and `psetexCommand` would no longer need translation in `feedAppendOnlyFile`
  as they are modified to invoke `setGenericCommand ` with appropriate flags which will take care of
  correct AOF translation.
- `GETEX` without any optional argument behaves like `GET`.
- `GETEX` command is never propagated, It is either propagated as `PEXPIRE[AT], or PERSIST`.
- `GETDEL` command is propagated as `DEL`
- Combined the validation for `SET` and `GETEX` arguments. 
- Test cases to validate AOF/Replication propagation
2021-01-27 19:47:26 +02:00
Oran Agra
9e56d3969a
Add tests for RESP3 responce of ZINTER and ZRANGE (#8391)
It was confusing as to why these don't return a map type.
the reason is that order matters, so we need to make sure the client
library knows to respect it.
Added comments in the implementation and tests to cover it.
2021-01-26 17:55:32 +02:00
Wen Hui
1aad55b66f
Sentinel: Fix Config Dependency and Rewrite Sequence (#8271)
This commit fixes a well known and an annoying issue in Sentinel mode.

Cause of this issue:
Currently, Redis rewrite process works well in server mode, however in sentinel mode,
the sentinel config has variant semantics for different configurations, in example configuration
https://github.com/redis/redis/blob/unstable/sentinel.conf, we put comments on these.
However the rewrite process only treat the sentinel config as a single option. During rewrite
process, it will mess up with the lines and comments.

Approaches:
In order to solve this issue, we need to differentiate different subconfig options in sentinel separately,
for example, sentinel monitor <master-name> <ip> <redis-port> <quorum>
we can treat it as sentinel monitor option, instead of the sentinel option.

This commit also fixes the dependency issue when putting configurations in sentinel.conf.
For example before this commit,we must put
`sentinel monitor <master-name> <ip> <redis-port> <quorum>` before
`sentinel auth-pass <master-name> <password>` for a single master,
otherwise the server cannot start and will return error. This commit fixes this issue, as long as
the monitoring master was configured, no matter the sequence is, the sentinel can start and run properly.
2021-01-26 09:31:54 +02:00
Oran Agra
437e258384
Fix rare test failures due to repl-ping-replica-period (#8393)
some tests use attach_to_replication_stream to watch what's propagated
to replicas, but in some cases the periodic ping may slip in and fail
the test.
we disable that ping by setting the period to once an hour (tests should
not run for that long).

other change is so that the next time this oom-score-adj test fails,
we'll see the value (assert_equals prints it)
2021-01-25 11:05:25 +02:00
Oran Agra
f225891526
Fix recent test failures (#8386)
1. Valgrind leak in a recent change in a module api test
2. Increase treshold of a RESTORE TTL test
3. Change assertions to use assert_range which prints the values
2021-01-23 21:53:58 +02:00
Viktor Söderqvist
9c1483100a
Test that module can wake up module blocked on non-empty list key (#8382)
BLPOP and other blocking list commands can only block on empty keys
and LPUSH only wakes up clients when the list is created.

Using the module API, it's possible to block on a non-empty key.
Unblocking a client blocked on a non-empty list (or zset) can only
be done using RedisModule_SignalKeyAsReady(). This commit tests it.
2021-01-22 16:19:37 +02:00
Andy Pan
8449a5df87
Sentinel tests, disable FD leak check, and print more details (#8376)
* Print more details about fd leaks
* temporarily prevent the leaks from failing the tests

Co-authored-by: Oran Agra <oran@redislabs.com>
2021-01-22 12:11:58 +02:00
guybe7
5a77d015be
Fix misleading module test (#8366)
the test was misleading because the module would actually woke up on a wrong type and
re-blocked, while the test name suggests the module doesn't not wake up at all on a wrong type..

i changed the name of the test + added verification that indeed the module wakes up and gets
re-blocked after it understand it's the wrong type
2021-01-20 14:03:38 +02:00
Andy Pan
6401920d70
Fix sentinel FD leak test, checking the wrong OS name (#8364) 2021-01-20 10:17:20 +02:00
Andy Pan
1be29606c5
Fix sentinel FD leak test, not printing the list of leaks (#8363) 2021-01-20 09:58:02 +02:00
Andy Pan
fb66e2e249
Use FD_CLOEXEC in Sentinel, so that FDs don't leak to the scripts it runs (#8242)
Sentinel uses execve to run scripts, so it needs to use FD_CLOEXEC
on all file descriptors, so that they're not accessible by the script it runs.

This commit includes a change to the sentinel tests, which verifies no
FDs are left opened when the script is executed.
2021-01-19 22:57:30 +02:00
Oran Agra
a29aec9abb
Add tests to make sure that relative EXPIRE is propagated to replicas (#8357)
This commit adds tests to make sure that relative and absolute expire commands
are propagated as is to replicas and stop any future attempt to change that without
a proper discussion. see #8327 and #5171

Additionally it slightly improve the AOF test that tests the opposite (always
propagating absolute times), by covering more commands, and shaving 2
seconds from the test time.
2021-01-19 18:49:26 +02:00
Viktor Söderqvist
4985c11bd6
Bugfix: Make modules blocked on keys unblock on commands like LPUSH (#8356)
This was a regression from #7625 (only in 6.2 RC2).

This makes it possible again to implement blocking list and zset
commands using the modules API.

This commit also includes a test case for the reverse: A module
unblocks a client blocked on BLPOP by inserting elements using
RedisModule_ListPush(). This already works, but it was untested.
2021-01-19 13:15:33 +02:00
Yossi Gottlieb
522d93607a
Add io-thread daily CI tests. (#8232)
This adds basic coverage to IO threads by running the cluster and few selected Redis test suite tests with the IO threads enabled.

Also provides some necessary additional improvements to the test suite:

* Add --config to sentinel/cluster tests for arbitrary configuration.
* Fix --tags whitelisting which was broken.
* Add a `network` tag to some tests that are more network intensive. This is work in progress and more tests should be properly tagged in the future.
2021-01-17 15:48:48 +02:00
Yang Bodong
294f93af97
Add lazyfree-lazy-user-flush config to control default behavior of FLUSH[ALL|DB], SCRIPT FLUSH (#8258)
* Adds ASYNC and SYNC arguments to SCRIPT FLUSH
* Adds SYNC argument to FLUSHDB and FLUSHALL
* Adds new config to control the default behavior of FLUSHDB, FLUSHALL and SCRIPT FLUASH.

the new behavior is as follows:
* FLUSH[ALL|DB],SCRIPT FLUSH: Determine sync or async according to the
  value of lazyfree-lazy-user-flush.
* FLUSH[ALL|DB],SCRIPT FLUSH ASYNC: Always flushes the database in an async manner.
* FLUSH[ALL|DB],SCRIPT FLUSH SYNC: Always flushes the database in a sync manner.
2021-01-15 15:32:58 +02:00
Wang Yuan
9cb9f98d2f
Optimize performance of clusterGenNodesDescription for large clusters (#8182)
Optimize the performance of clusterGenNodesDescription by only checking slot ownership of each slot once, instead of checking each slot for each node.
2021-01-13 12:36:03 -08:00
Oran Agra
4f8458d8d6
fix race in cluster transactions test (#8312)
we didn't wait for the commands executed on the master to reach the replica.
2021-01-12 10:03:45 +02:00
Madelyn Olson
b24b490393
Fix issues in wait test (#8310)
This fixes three issues:
1.  Using debug SLEEP was impacting the subsequent test, and causing it to pass reliably even though it should have failed. There was exactly 5 seconds of artificial pause (after 1000, wait 3000, wait 1000) between the debug sleep 5 and when we needed to unblock the client in the subsequent test. Now the test properly makes sure the client is unblocked, and the subsequent test is fixed.
2. Minor, the client pause types were using & comparisons instead of ==, since it was previously a flag.
3. Test is faster now that some of the hand wavy time is removed.
2021-01-12 09:46:24 +02:00
Oran Agra
264953871b
Fix cluster diskless load swapdb test (#8308)
The test was trying to wait for the replica to start loading the rdb
from the master before it kills the master, but it was actually waiting
for ROLE to be in "sync" mode, which corresponds to REPL_STATE_TRANSFER
that starts before the actual loading starts.
now instead it waits for the loading flag to be set.

Besides, the test was dependent on the previous configuration of the
servers, relying on the fact the replica is configured to persist
(either RDB of AOF), now it is set explicitly.
2021-01-12 09:41:57 +02:00
Oran Agra
8dd16caec8
Fix last COW INFO report, Skip test on non-linux platforms (#8301)
- the last COW report wasn't always read from the pipe
  (receiveLastChildInfo wasn't used)
- but in fact, there's no reason we won't always try to drain that pipe
  so i'm unifying receiveLastChildInfo with receiveChildInfo
- adjust threshold of the COW test when run in accurate mode
- add some prints in case this test fails again
- fix indentation, page size, and PID! in MacOS proc info

p.s. it seems that pri_pages_dirtied is always 0
2021-01-08 23:35:30 +02:00
Yang Bodong
ea5350c5ec
GEOSEARCH - ANY option, for limited search that returns ASAP (#8259)
Support ANY option to return some results that match the criteria ASAP,
without a complete search and implicit sorting.
2021-01-08 18:29:44 +02:00
guybe7
814aad65f1
XADD and XTRIM, Trim by MINID, and new LIMIT argument (#8169)
This PR adds another trimming strategy to XADD and XTRIM named MINID
(complements the existing MAXLEN).
It also adds a new LIMIT argument that allows incremental trimming by repeated
calls (rather than all at once).

This provides the ability to trim all records older than a certain ID (which makes it
possible for the user to trim by age too).
Example:
XTRIM mystream MINID ~ 1608540753 will trim entries with id < 1608540753,
but might not trim all (because of the ~ modifier)

The purpose is to ease the use of streams. many users use streams as logs and
the common case is wanting a log
of the last X seconds rather than a log that contains maximum X entries (new
MINID vs existing MAXLEN)

The new LIMIT modifier is only supported when the trim strategy uses ~.
i.e. when the user asked for exact trimming, it all happens in one go (no
possibility for incremental trimming).
However, when ~ is provided, we trim full rax nodes, up to the limit number
of records.
The default limit is 100*stream_node_max_entries (used when LIMIT is not
provided).
I.e. this is a behavior change (even if the existing MAXLEN strategy is used).
An explicit limit of 0 means unlimited (but note that it's not the default).

Other changes:

Refactor arg parsing code for XADD and XTRIM to use common code.
2021-01-08 18:13:25 +02:00
Oran Agra
5843a45d01
Skip defrag tests on systems with bigger page sizes (#8294)
The defragger works well on these systems, but the tests and their
thresholds are not adjusted for these big pages, so the defragger isn't
able to get down the fragmentation to the levels the test expects and it
fails on "defrag didn't stop".

Randomly choosing 8k as the threshold for the skipping

Fixes #8265 (which had 65k pages)
2021-01-08 10:03:21 +02:00
Madelyn Olson
999494cef8
Throw error for conflicting bcast tracking prefixes (#8176)
Throw an error if there are conflicting bcast tracking prefixes.
2021-01-08 00:00:35 -08:00
Madelyn Olson
47579bdf5c
Add support for client pause WRITE (#8170)
Implementation of client pause WRITE and client unpause
2021-01-07 23:36:54 -08:00
YaacovHazan
ea930a352c Report child copy-on-write info continuously
Add INFO field, rdb_active_cow_size, to report COW of a live fork child while
it's active.
- once in 1024 keys check the time, and if there's more than one second since
  the last report send a report to the parent via the pipe.
- refactor the child_info_data struct, it's an implementation detail that
  shouldn't be in the server struct, and not used to communicate data between
  caller and callee
- remove the magic value from that struct (not sure what it was good for), and
  instead add handling of short reads.
- add another value to the structure, cow_type, to indicate if the report is
  for the new rdb_active_cow_size field, or it's the last report of a
  successful operation
- add new Module API to report the active COW
- add more asserts variants to test.tcl
2021-01-07 16:14:29 +02:00
Jonah H. Harris
b5029dfdad
Add ZRANGESTORE command, and improve ZSTORE command (#7844)
Add ZRANGESTORE command, and improve ZSTORE command to deprecated Z[REV]RANGE[BYSCORE|BYLEX].

Syntax for the new ZRANGESTORE command:
ZRANGESTORE [BYSCORE | BYLEX] [REV] [LIMIT offset count]

New syntax for ZRANGE:
ZRANGE [BYSCORE | BYLEX] [REV] [WITHSCORES] [LIMIT offset count]

Old syntax for ZRANGE:
ZRANGE [WITHSCORES]

Other ZRANGE commands remain unchanged.

The implementation uses common code for all of these, by utilizing a consumer interface that in one
command response to the client, and in the other command stores a zset key.

Co-authored-by: Oran Agra <oran@redislabs.com>
2021-01-07 10:58:53 +02:00
guybe7
714e103ac3
Add XAUTOCLAIM (#7973)
New command: XAUTOCLAIM <key> <group> <consumer> <min-idle-time> <start> [COUNT <count>] [JUSTID]

The purpose is to claim entries from a stale consumer without the usual
XPENDING+XCLAIM combo which takes two round trips.

The syntax for XAUTOCLAIM is similar to scan: A cursor is returned (streamID)
by each call and should be used as start for the next call. 0-0 means the scan is complete.

This PR extends the deferred reply mechanism for any bulk string (not just counts)

This PR carries some unrelated test code changes:
- Renames the term "client" into "consumer" in the stream-cgroups test
- And also changes DEBUG SLEEP into "after"

Co-authored-by: Oran Agra <oran@redislabs.com>
2021-01-06 10:34:27 +02:00
Oran Agra
2017407b4d
Fix wrong order of key/value in Lua map response (#8266)
When a Lua script returns a map to redis (a feature which was added in
redis 6 together with RESP3), it would have returned the value first and
the key second.

If the client was using RESP2, it was getting them out of order, and if
the client was in RESP3, it was getting a map of value => key.
This was happening regardless of the Lua script using redis.setresp(3)
or not.

This also affects a case where the script was returning a map which it got
from from redis by doing something like: redis.setresp(3); return redis.call()

This fix is a breaking change for redis 6.0 users who happened to rely
on the wrong order (either ones that used redis.setresp(3), or ones that
returned a map explicitly).

This commit also includes other two changes in the tests:
1. The test suite now handles RESP3 maps as dicts rather than nested
   lists
2. Remove some redundant (duplicate) tests from tracking.tcl
2021-01-05 08:29:20 +02:00
Yang Bodong
10f94b0ab1
Swapdb should make transaction fail if there is any client watching keys (#8239)
This PR not only fixes the problem that swapdb does not make the
transaction fail, but also optimizes the FLUSHALL and FLUSHDB command to
set the CLIENT_DIRTY_CAS flag to avoid unnecessary traversal of clients.

FLUSHDB was changed to first iterate on all watched keys, and then on the
clients watching each key.
Instead of iterating though all clients, and for each iterate on watched keys.

Co-authored-by: Oran Agra <oran@redislabs.com>
2021-01-04 14:48:28 +02:00
kukey
33fb617053
GEOADD - add [CH] [NX|XX] options (#8227)
New command flags similar to what SADD already has.

Co-authored-by: huangwei03 <huangwei03@kuaishou.com>
Co-authored-by: Itamar Haber <itamar@redislabs.com>
Co-authored-by: Oran Agra <oran@redislabs.com>
2021-01-03 17:13:37 +02:00
Oran Agra
71fbe6e800
Fix leak in new errorstats commit, and a flaky test (#8278) 2021-01-02 08:37:19 +02:00
filipe oliveira
90b9f08e5d
Add errorstats info section, Add failed_calls and rejected_calls to commandstats (#8217)
This Commit pushes forward the observability on overall error statistics and command statistics within redis-server:

It extends INFO COMMANDSTATS to have
- failed_calls in - so we can keep track of errors that happen from the command itself, broken by command.
- rejected_calls - so we can keep track of errors that were triggered outside the commmand processing per se

Adds a new section to INFO, named ERRORSTATS that enables keeping track of the different errors that
occur within redis ( within processCommand and call ) based on the reply Error Prefix ( The first word
after the "-", up to the first space ).

This commit also fixes RM_ReplyWithError so that it can be correctly identified as an error reply.
2020-12-31 16:53:43 +02:00
Oran Agra
19d4705ffd
Make the protocol-version argument of HELLO optional (#7377) 2020-12-27 16:37:27 +02:00
zhaozhao.zz
299f9ebffa
Tracking: add CLIENT TRACKINGINFO subcommand (#7309)
Add CLIENT TRACKINGINFO subcommand

Co-authored-by: Oran Agra <oran@redislabs.com>
2020-12-27 13:14:39 +02:00
Itamar Haber
f44186e575
Adds count to L/RPOP (#8179)
Adds: `L/RPOP <key> [count]`

Implements no. 2 of the following strategies:

1. Loop on listTypePop - this would result in multiple calls for memory freeing and allocating (see 769167a079)
2. Iterate the range to build the reply, then call quickListDelRange - this requires two iterations and **is the current choice**
3. Refactor quicklist to have a pop variant of quickListDelRange - probably optimal but more complex

Also:
* There's a historical check for NULL after calling listTypePop that was converted to an assert.
* This refactors common logic shared between LRANGE and the new form of LPOP/RPOP into addListRangeReply (adds test for b/w compat)
* Consequently, it may have made sense to have `LRANGE l -1 -2` and `LRANGE l 9 0` be legit and return a reverse reply. Due to historical reasons that would be, however, a breaking change.
* Added minimal comments to existing commands to adhere to the style, make core dev life easier and get commit karma, naturally.
2020-12-25 21:49:24 +02:00
Oran Agra
4617960863
resolve hung test. 2020-12-24 14:33:53 +02:00
xhe
ef14c18c8e fix the test
Signed-off-by: xhe <xw897002528@gmail.com>
2020-12-24 17:31:50 +08:00
xhe
60f13e7a86 try to fix the test
Signed-off-by: xhe <xw897002528@gmail.com>
2020-12-24 16:50:08 +08:00
xhe
7a7c60459e add a test
Signed-off-by: xhe <xw897002528@gmail.com>
2020-12-24 15:26:24 +08:00
Yang Bodong
ee59dc1b5c
Tests: fix the problem that Darwin memory leak detection may fail (#8213)
Apparently the "leaks" took reports a different error string about process
that's not found in each version of MacOS.
This cause the test suite to fail on some OS versions, since some tests terminate
the process before looking for leaks.
Instead of looking at the error string, we now look at the (documented) exit code.
2020-12-23 16:28:17 +02:00
Oran Agra
411c18bbce
Remove read-only flag from non-keyspace cmds, different approach for EXEC to propagate MULTI (#8216)
In the distant history there was only the read flag for commands, and whatever
command that didn't have the read flag was a write one.
Then we added the write flag, but some portions of the code still used !read
Also some commands that don't work on the keyspace at all, still have the read
flag.

Changes in this commit:
1. remove the read-only flag from TIME, ECHO, ROLE and LASTSAVE

2. EXEC command used to decides if it should propagate a MULTI by looking at
   the command flags (!read & !admin).
   When i was about to change it to look at the write flag instead, i realized
   that this would cause it not to propagate a MULTI for PUBLISH, EVAL, and
   SCRIPT, all 3 are not marked as either a read command or a write one (as
   they should), but all 3 are calling forceCommandPropagation.

   So instead of introducing a new flag to denote a command that "writes" but
   not into the keyspace, and still needs propagation, i decided to rely on
   the forceCommandPropagation, and just fix the code to propagate MULTI when
   needed rather than depending on the command flags at all.

   The implication of my change then is that now it won't decide to propagate
   MULTI when it sees one of these: SELECT, PING, INFO, COMMAND, TIME and
   other commands which are neither read nor write.

3. Changing getNodeByQuery and clusterRedirectBlockedClientIfNeeded in
   cluster.c to look at !write rather than read flag.
   This should have no implications, since these code paths are only reachable
   for commands which access keys, and these are always marked as either read
   or write.

This commit improve MULTI propagation tests, for modules and a bunch of
other special cases, all of which used to pass already before that commit.
the only one that test change that uncovered a change of behavior is the
one that DELs a non-existing key, it used to propagate an empty
multi-exec block, and no longer does.
2020-12-22 12:03:49 +02:00
Qu Chen
f48afb4710
Handle binary safe string for REQUIREPASS and MASTERAUTH directives (#8200)
* Handle binary safe string for REQUIREPASS and MASTERAUTH directives.
2020-12-17 09:26:33 -08:00
Itamar Haber
9acd40d97b
GEOSEARCH: change 'FROMLOC' to 'FROMLONLAT' (#8190)
And formats style a tiniee-winiee bit
2020-12-14 17:15:12 +02:00
Oran Agra
cfb449cc80
Sanitize dump payload: excessive free on dup zset fields (#8189) 2020-12-14 17:10:31 +02:00
Oran Agra
7d9b09adaa
Tests: fix new defrag test to be skipped when not supported (#8185)
Additionally the older defrag tests are using an obsolete way to check
if the defragger is suuported (the error no longer contains "DISABLED").
this doesn't usually makes a difference since these tests are completely
skipped if the allocator is not jemalloc, but that would fail if the
allocator is a jemalloc that doesn't support defrag.
2020-12-14 11:13:46 +02:00
Yossi Gottlieb
86e3395c11
Several (mostly Solaris-related) cleanups (#8171)
* Allow runtest-moduleapi use a different 'make', for systems where GNU Make is 'gmake'.
* Fix issue with builds on Solaris re-building everything from scratch due to CFLAGS/LDFLAGS not stored.
* Fix compile failure on Solaris due to atomicvar and a bunch of warnings.
* Fix garbled log timestamps on Solaris.
2020-12-13 17:09:54 +02:00
Oran Agra
ab60dcf564
Add module event for repl-diskless-load swapdb (#8153)
When a replica uses the diskless-load swapdb approach, it backs up the old database,
then attempts to load a new one, and in case of failure, it restores the backup.

this means that modules with global out of keyspace data, must have an option to
subscribe to events and backup/restore/discard their global data too.
2020-12-13 14:36:06 +02:00
Yossi Gottlieb
63c1303cfb
Modules: add defrag API support. (#8149)
Add a new set of defrag functions that take a defrag context and allow
defragmenting memory blocks and RedisModuleStrings.

Modules can register a defrag callback which will be invoked when the
defrag process handles globals.

Modules with custom data types can also register a datatype-specific
defrag callback which is invoked for keys that require defragmentation.
The callback and associated functions support both one-step and
multi-step options, depending on the complexity of the key as exposed by
the free_effort callback.
2020-12-13 09:56:01 +02:00
杨博东
4d06d99bf8
Add GEOSEARCH / GEOSEARCHSTORE commands (#8094)
Add commands to query geospatial data with bounding box.

Two new commands that replace the existing 4 GEORADIUS* commands.

GEOSEARCH key [FROMMEMBER member] [FROMLOC long lat] [BYRADIUS radius
unit] [BYBOX width height unit] [WITHCORD] [WITHDIST] [WITHASH] [COUNT
count] [ASC|DESC]

GEOSEARCHSTORE dest_key src_key [FROMMEMBER member] [FROMLOC long lat]
[BYRADIUS radius unit] [BYBOX width height unit] [WITHCORD] [WITHDIST]
[WITHASH] [COUNT count] [ASC|DESC] [STOREDIST]

- Add two types of CIRCULAR_TYPE and RECTANGLE_TYPE to achieve different searches
- Judge whether the point is within the rectangle, refer to:
geohashGetDistanceIfInRectangle
2020-12-12 02:21:05 +02:00
Yossi Gottlieb
8c291b97b9
TLS: Add different client cert support. (#8076)
This adds a new `tls-client-cert-file` and `tls-client-key-file`
configuration directives which make it possible to use different
certificates for the TLS-server and TLS-client functions of Redis.

This is an optional directive. If it is not specified the `tls-cert-file`
and `tls-key-file` directives are used for TLS client functions as well.

Also, `utils/gen-test-certs.sh` now creates additional server-only and client-only certs and will skip intensive operations if target files already exist.
2020-12-11 18:31:40 +02:00
Yossi Gottlieb
4e064fbab4
Add module data-type support for COPY. (#8112)
This adds a copy callback for module data types, in order to make
modules compatible with the new COPY command.

The callback is optional and COPY will fail for keys with data types
that do not implement it.
2020-12-09 20:22:45 +02:00
Oran Agra
48efc25f74
Handle output buffer limits for Module blocked clients (#8141)
Module blocked clients cache the response in a temporary client,
the reply list in this client would be affected by the recent fix
in #7202, but when the reply is later copied into the real client,
it would have bypassed all the checks for output buffer limit, which
would have resulted in both: responding with a partial response to
the client, and also not disconnecting it at all.
2020-12-08 16:41:20 +02:00
Oran Agra
a102b21d17
Improve stability of new CSC eviction test (#8160)
c4fdf09c0 added a test that now fails with valgrind
it fails for two resons:
1) the test samples the used memory and then limits the maxmemory to
   that value, but it turns out this is not atomic and on slow machines
   the background cron process that clean out old query buffers reduces
   the memory so that the setting doesn't cause eviction.
2) the dbsize was tested late, after reading some invalidation messages
   by that time more and more keys got evicted, partially draining the
   db. this is not the focus of this fix (still a known limitation)
2020-12-08 16:33:09 +02:00
Wang Yuan
1acc315cea
Minor improvements for list-2 test (#8156)
had some unused variables.
now some are used to assert that they match, others were useless.
2020-12-08 16:26:38 +02:00
Yossi Gottlieb
00db1b5579
Fix failing macOS tests due to wc differences. (#8161) 2020-12-08 16:22:16 +02:00
Itamar Haber
37f45d9e56
Adds exclusive range query intervals to XPENDING (#8130) 2020-12-08 11:43:00 +02:00
guybe7
6bb5503524
More efficient self-XCLAIM (#8098)
when the same consumer re-claim an entry that it already has, there's
no need to remove-and-insert if it's the same rax.
we do need to update the idle time though.
this commit only improves efficiency (doesn't change behavior).
2020-12-07 21:31:35 +02:00
Yossi Gottlieb
bccbc5509a
Add CLIENT INFO and CLIENT LIST [id]. (#8113)
* Add CLIENT INFO subcommand.

The output is identical to CLIENT LIST but provides a single line for
the current client only.

* Add CLIENT LIST ID [id...].

Co-authored-by: Itamar Haber <itamar@redislabs.com>
2020-12-07 14:24:05 +02:00
Oran Agra
7ca00d694d Sanitize dump payload: fail RESTORE if memory allocation fails
When RDB input attempts to make a huge memory allocation that fails,
RESTORE should fail gracefully rather than die with panic
2020-12-06 14:54:34 +02:00
Oran Agra
3716950cfc Sanitize dump payload: validate no duplicate records in hash/zset/intset
If RESTORE passes successfully with full sanitization, we can't affort
to crash later on assertion due to duplicate records in a hash when
converting it form ziplist to dict.
This means that when doing full sanitization, we must make sure there
are no duplicate records in any of the collections.
2020-12-06 14:54:34 +02:00
Oran Agra
5b44631397 testsuite: fix fd leak, prevent port clashing when using --baseport
when using --baseport to run two tests suite in parallel (different
folders), we need to also make sure the port used by the testsuite to
communicate with it's workers is unique. otherwise the attept to find a
free port connects to the other test suite and messes it.

maybe one day we need to attempt to bind, instead of connect when tring
to find a free port.
2020-12-06 14:54:34 +02:00
Oran Agra
c31055db61 Sanitize dump payload: fuzz tester and fixes for segfaults and leaks it exposed
The test creates keys with various encodings, DUMP them, corrupt the payload
and RESTORES it.
It utilizes the recently added use-exit-on-panic config to distinguish between
 asserts and segfaults.
If the restore succeeds, it runs random commands on the key to attempt to
trigger a crash.

It runs in two modes, one with deep sanitation enabled and one without.
In the first one we don't expect any assertions or segfaults, in the second one
we expect assertions, but no segfaults.
We also check for leaks and invalid reads using valgrind, and if we find them
we print the commands that lead to that issue.

Changes in the code (other than the test):
- Replace a few NPD (null pointer deference) flows and division by zero with an
  assertion, so that it doesn't fail the test. (since we set the server to use
  `exit` rather than `abort` on assertion).
- Fix quite a lot of flows in rdb.c that could have lead to memory leaks in
  RESTORE command (since it now responds with an error rather than panic)
- Add a DEBUG flag for SET-SKIP-CHECKSUM-VALIDATION so that the test don't need
  to bother with faking a valid checksum
- Remove a pile of code in serverLogObjectDebugInfo which is actually unsafe to
  run in the crash report (see comments in the code)
- fix a missing boundary check in lzf_decompress

test suite infra improvements:
- be able to run valgrind checks before the process terminates
- rotate log files when restarting servers
2020-12-06 14:54:34 +02:00
Oran Agra
01c13bddea Sanitize dump payload: improve tests of ziplist and stream encodings
- improve stream rdb encoding test to include more types of stream metadata
- add test to cover various ziplist encoding entries (although it does
  look like the stress test above it is able to find some too
- add another test for ziplist encoding for hash with full sanitization
- add similar ziplist encoding tests for list
2020-12-06 14:54:34 +02:00
Oran Agra
ca1c182567 Sanitize dump payload: ziplist, listpack, zipmap, intset, stream
When loading an encoded payload we will at least do a shallow validation to
check that the size that's encoded in the payload matches the size of the
allocation.
This let's us later use this encoded size to make sure the various offsets
inside encoded payload don't reach outside the allocation, if they do, we'll
assert/panic, but at least we won't segfault or smear memory.

We can also do 'deep' validation which runs on all the records of the encoded
payload and validates that they don't contain invalid offsets. This lets us
detect corruptions early and reject a RESTORE command rather than accepting
it and asserting (crashing) later when accessing that payload via some command.

configuration:
- adding ACL flag skip-sanitize-payload
- adding config sanitize-dump-payload [yes/no/clients]

For now, we don't have a good way to ensure MIGRATE in cluster resharding isn't
being slowed down by these sanitation, so i'm setting the default value to `no`,
but later on it should be set to `clients` by default.

changes:
- changing rdbReportError not to `exit` in RESTORE command
- adding a new stat to be able to later check if cluster MIGRATE isn't being
  slowed down by sanitation.
2020-12-06 14:54:34 +02:00
Oran Agra
c4fdf09c05
prevent client tracking from causing feedback loop in performEvictions (#8100)
When client tracking is enabled signalModifiedKey can increase memory usage,
this can cause the loop in performEvictions to keep running since it was measuring
the memory usage impact of signalModifiedKey.

The section that measures the memory impact of the eviction should be just on dbDelete,
excluding keyspace notification, client tracking, and propagation to AOF and replicas.

This resolves part of the problem described in #8069
p.s. fix took 1 minute, test took about 3 hours to write.
2020-12-06 14:51:22 +02:00
guybe7
1df5bb5687
Make sure we do not propagate nested MULTI/EXEC (#8097)
One way this was happening is when a module issued an RM_Call which would inject MULTI.
If the module command that does that was itself issued by something else that already did
added MULTI (e.g. another module, or a Lua script), it would have caused nested MULTI.

In fact the MULTI state in the client or the MULTI_EMITTED flag in the context isn't
the right indication that we need to propagate MULTI or not, because on a nested calls
(possibly a module action called by a keyspace event of another module action), these
flags aren't retained / reflected.

instead there's now a global propagate_in_transaction flag for that.

in addition to that, we now have a global in_eval and in_exec flags, to serve the flags
of RM_GetContextFlags, since their dependence on the current client is wrong for the same
reasons mentioned above.
2020-12-06 13:14:18 +02:00
Wang Yuan
75f9dec644
Limit the main db and expires dictionaries to expand (#7954)
As we know, redis may reject user's requests or evict some keys if
used memory is over maxmemory. Dictionaries expanding may make
things worse, some big dictionaries, such as main db and expires dict,
may eat huge memory at once for allocating a new big hash table and be
far more than maxmemory after expanding.
There are related issues: #4213 #4583

More details, when expand dict in redis, we will allocate a new big
ht[1] that generally is double of ht[0], The size of ht[1] will be
very big if ht[0] already is big. For db dict, if we have more than
64 million keys, we need to cost 1GB for ht[1] when dict expands.

If the sum of used memory and new hash table of dict needed exceeds
maxmemory, we shouldn't allow the dict to expand. Because, if we
enable keys eviction, we still couldn't add much more keys after
eviction and rehashing, what's worse, redis will keep less keys when
redis only remains a little memory for storing new hash table instead
of users' data. Moreover users can't write data in redis if disable
keys eviction.

What this commit changed ?

Add a new member function expandAllowed for dict type, it provide a way
for caller to allow expand or not. We expose two parameters for this
function: more memory needed for expanding and dict current load factor,
users can implement a function to make a decision by them.
For main db dict and expires dict type, these dictionaries may be very
big and cost huge memory for expanding, so we implement a judgement
function: we can stop dict to expand provisionally if used memory will
be over maxmemory after dict expands, but to guarantee the performance
of redis, we still allow dict to expand if dict load factor exceeds the
safe load factor.
Add test cases to verify we don't allow main db to expand when left
memory is not enough, so that avoid keys eviction.

Other changes:

For new hash table size when expand. Before this commit, the size is
that double used of dict and later _dictNextPower. Actually we aim to
control a dict load factor between 0.5 and 1.0. Now we replace *2 with
+1, since the first check is that used >= size, the outcome of before
will usually be the same as _dictNextPower(used+1). The only case where
it'll differ is when dict_can_resize is false during fork, so that later
the _dictNextPower(used*2) will cause the dict to jump to *4 (i.e.
_dictNextPower(1025*2) will return 4096).
Fix rehash test cases due to changing algorithm of new hash table size
when expand.
2020-12-06 11:53:04 +02:00
Itamar Haber
441c490024
Adds exclusive ranges to X[REV]RANGE (#8072)
Adds the ability to use exclusive (open) start and end query intervals in XRANGE and XREVRANGE queries.

Fixes #6562
2020-12-03 14:36:48 +02:00
Itamar Haber
7459652e3e
Fix ACL Pub/Sub test timings (#8122) 2020-12-02 17:24:27 +02:00
Wang Yuan
b55a827ea2
Backup keys to slots map and restore when fail to sync if diskless-load type is swapdb in cluster mode (#8108)
When replica diskless-load type is swapdb in cluster mode, we didn't backup
keys to slots map, so we will lose keys to slots map if fail to sync.
Now we backup keys to slots map at first, and restore it properly when fail.

This commit includes a refactory/cleanup of the backups mechanism (moving it to db.c and re-structuring it a bit).

Co-authored-by: Oran Agra <oran@redislabs.com>
2020-12-02 13:56:11 +02:00
luhuachao
7885faf18b
Modify help msg PING_BULK to PING_MBULK in benchmark (#8109)
As described in redis-benchamrk help message 'The test names are the same as the ones produced as output.', In redis-benchmark output, we can only see PING_BULK, but the cmd `redis-benchmark -t ping_bulk` is not supported. We  have to run it with ping_mbulk which is not user friendly.
2020-12-02 13:17:25 +02:00
Madelyn Olson
69b7113bb5
Getset fix (#8118)
* Fixed SET GET executing on wrong type

Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
2020-12-01 11:46:45 -08:00
sundb
3ba2281f96
Improve dbid range check for SELECT, MOVE, COPY (#8085)
SELECT used to read the index into a `long` variable, and then pass it to a function
that takes an `int`, possibly causing an overflow before the range check.

Now all these commands use better and cleaner range check, and that also results in
a slight change of the error response in case of an invalid database index.

SELECT:
in the past it would have returned either `-ERR invalid DB index` (if not a number),
or `-ERR DB index is out of range` (if not between 1..16 or alike).
now it'll return either `-ERR value is out of range` (if not a number), or
`-ERR value is out of range, value must between -2147483648 and 2147483647`
(if not in the range for an int), or `-ERR DB index is out of range`
(if not between 0..16 or alike)


MOVE:
in the past it would only fail with `-ERR index out of range` no matter the reason.
now return the same errors as the new ones for SELECT mentioned above.
(i.e. unlike for SELECT even for a value like 17 we changed the error message)

COPY:
doesn't really matter how it behaved in the past (new command), new behavior is
like the above two.
2020-12-01 21:41:26 +02:00
Itamar Haber
c1b1e8c329
Adds pub/sub channel patterns to ACL (#7993)
Fixes #7923.

This PR appropriates the special `&` symbol (because `@` and `*` are taken),
followed by a literal value or pattern for describing the Pub/Sub patterns that
an ACL user can interact with. It is similar to the existing key patterns
mechanism in function (additive) and implementation (copy-pasta). It also adds
the allchannels and resetchannels ACL keywords, naturally.

The default user is given allchannels permissions, whereas new users get
whatever is defined by the acl-pubsub-default configuration directive. For
backward compatibility in 6.2, the default of this directive is allchannels but
this is likely to be changed to resetchannels in the next major version for
stronger default security settings.

Unless allchannels is set for the user, channel access permissions are checked
as follows :
* Calls to both PUBLISH and SUBSCRIBE will fail unless a pattern matching the
  argumentative channel name(s) exists for the user.
* Calls to PSUBSCRIBE will fail unless the pattern(s) provided as an argument
  literally exist(s) in the user's list.

Such failures are logged to the ACL log.

Runtime changes to channel permissions for a user with existing subscribing
clients cause said clients to disconnect unless the new permissions permit the
connections to continue. Note, however, that PSUBSCRIBErs' patterns are matched
literally, so given the change bar:* -> b*, pattern subscribers to bar:* will be
disconnected.

Notes/questions:
* UNSUBSCRIBE, PUNSUBSCRIBE and PUBSUB remain unprotected due to lack of reasons
  for touching them.
2020-12-01 14:21:39 +02:00
guybe7
ada2ac9ae2
XPENDING with IDLE (#7972)
Used to filter stream pending entries by their idle-time,
useful for XCLAIMing entries that have not been processed
for some time
2020-11-29 12:08:47 +02:00
Itamar Haber
0963edbc65
Adds tests for XADD/XTRIM's MAXLEN arguments (#8083) 2020-11-23 14:37:58 +02:00
guybe7
f8ae991717
EXISTS should not alter LRU, OBJECT should not reveal expired keys on replica (#8016)
The bug was introduced by #5021 which only attempted avoid EXIST on an
already expired key from returning 1 on a replica.

Before that commit, dbExists was used instead of
lookupKeyRead (which had an undesired effect to "touch" the LRU/LFU)

Other than that, this commit fixes OBJECT to also come empty handed on
expired keys in replica.

And DEBUG DIGEST-VALUE to behave like DEBUG OBJECT (get the data from
the key regardless of it's expired state)
2020-11-18 11:16:21 +02:00
Meir Shpilraien (Spielrein)
d87a0d0286
Unified MULTI, LUA, and RM_Call with respect to blocking commands (#8025)
Blocking command should not be used with MULTI, LUA, and RM_Call. This is because,
the caller, who executes the command in this context, expects a reply.

Today, LUA and MULTI have a special (and different) treatment to blocking commands:

LUA   - Most commands are marked with no-script flag which are checked when executing
and command from LUA, commands that are not marked (like XREAD) verify that their
blocking mode is not used inside LUA (by checking the CLIENT_LUA client flag).
MULTI - Command that is going to block, first verify that the client is not inside
multi (by checking the CLIENT_MULTI client flag). If the client is inside multi, they
return a result which is a match to the empty key with no timeout (for example blpop
inside MULTI will act as lpop)
For modules that perform RM_Call with blocking command, the returned results type is
REDISMODULE_REPLY_UNKNOWN and the caller can not really know what happened.

Disadvantages of the current state are:

No unified approach, LUA, MULTI, and RM_Call, each has a different treatment
Module can not safely execute blocking command (and get reply or error).
Though It is true that modules are not like LUA or MULTI and should be smarter not
to execute blocking commands on RM_Call, sometimes you want to execute a command base
on client input (for example if you create a module that provides a new scripting
language like javascript or python).
While modules (on modules command) can check for REDISMODULE_CTX_FLAGS_LUA or
REDISMODULE_CTX_FLAGS_MULTI to know not to block the client, there is no way to
check if the command came from another module using RM_Call. So there is no way
for a module to know not to block another module RM_Call execution.

This commit adds a way to unify the treatment for blocking clients by introducing
a new CLIENT_DENY_BLOCKING client flag. On LUA, MULTI, and RM_Call the new flag
turned on to signify that the client should not be blocked. A blocking command
verifies that the flag is turned off before blocking. If a blocking command sees
that the CLIENT_DENY_BLOCKING flag is on, it's not blocking and return results
which are matches to empty key with no timeout (as MULTI does today).

The new flag is checked on the following commands:

List blocking commands: BLPOP, BRPOP, BRPOPLPUSH, BLMOVE,
Zset blocking commands: BZPOPMIN, BZPOPMAX
Stream blocking commands: XREAD, XREADGROUP
SUBSCRIBE, PSUBSCRIBE, MONITOR
In addition, the new flag is turned on inside the AOF client, we do not want to
block the AOF client to prevent deadlocks and commands ordering issues (and there
is also an existing assert in the code that verifies it).

To keep backward compatibility on LUA, all the no-script flags on existing commands
were kept untouched. In addition, a LUA special treatment on XREAD and XREADGROUP was kept.

To keep backward compatibility on MULTI (which today allows SUBSCRIBE, and PSUBSCRIBE).
We added a special treatment on those commands to allow executing them on MULTI.

The only backward compatibility issue that this PR introduces is that now MONITOR
is not allowed inside MULTI.

Tests were added to verify blocking commands are not blocking the client on LUA, MULTI,
or RM_Call. Tests were added to verify the module can check for CLIENT_DENY_BLOCKING flag.

Co-authored-by: Oran Agra <oran@redislabs.com>
Co-authored-by: Itamar Haber <itamar@redislabs.com>
2020-11-17 18:58:55 +02:00
thomaston
39f716a121
ZREVRANGEBYSCORE Optimization for out of range offset (#5773)
ZREVRANGEBYSCORE key max min [WITHSCORES] [LIMIT offset count]
When the offset is too large, the query is very slow. Especially when the offset is greater than the length of zset it is easy to determine whether the offset is greater than the length of zset at first, and If it exceed the length of zset, then return directly.

Co-authored-by: Oran Agra <oran@redislabs.com>
2020-11-17 18:35:56 +02:00
swamp0407
ea7cf737a1
Add COPY command (#7953)
Syntax:
COPY <key> <new-key> [DB <dest-db>] [REPLACE]

No support for module keys yet.

Co-authored-by: tmgauss
Co-authored-by: Itamar Haber <itamar@redislabs.com>
Co-authored-by: Oran Agra <oran@redislabs.com>
2020-11-17 12:03:05 +02:00
chenyangyang
c1aaad06d8
Modules callbacks for lazy free effort, and unlink (#7912)
Add two optional callbacks to the RedisModuleTypeMethods structure, which is `free_effort`
and `unlink`. the `free_effort` callback indicates the effort required to free a module memory.
Currently, if the effort exceeds LAZYFREE_THRESHOLD, the module memory may be released
asynchronously. the `unlink` callback indicates the key has been removed from the DB by redis, and
may soon be freed by a background thread.

Add `lazyfreed_objects` info field, which represents the number of objects that have been
lazyfreed since redis was started.

Add `RM_GetTypeMethodVersion` API, which return the current redis-server runtime value of
`REDISMODULE_TYPE_METHOD_VERSION`. You can use that when calling `RM_CreateDataType` to know
which fields of RedisModuleTypeMethods are gonna be supported and which will be ignored.
2020-11-16 10:34:04 +02:00
Felipe Machado
d8fd48c436
Add new commands ZDIFF and ZDIFFSTORE (#7961)
- Add ZDIFF and ZDIFFSTORE which work similarly to SDIFF and SDIFFSTORE
- Make sure the new WITHSCORES argument that was added for ZUNION isn't considered valid for ZUNIONSTORE

Co-authored-by: Oran Agra <oran@redislabs.com>
2020-11-15 14:14:25 +02:00
Yossi Gottlieb
48a9d63af3
Add timer module API tests. (#8041) 2020-11-11 22:57:33 +02:00
Madelyn Olson
3feff7d78a
Rewritten commands are logged as their original command (#8006)
* Rewritten commands are logged as their original command

Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
2020-11-10 13:50:03 -08:00
nitaicaro
19c29b6007
Extend client tracking tests (#7998)
Test support for the new map, null and push message types. Map objects are parsed as a list of lists of key value pairs.
for instance: user => john password => 123

will be parsed to the following TCL list:

{{user john} {password 123}}

Also added the following tests:

Redirection still works with RESP3

Able to use a RESP3 client as a redirection client

No duplicate invalidation messages when turning BCAST mode on after normal tracking

Server is able to evacuate enough keys when num of keys surpasses limit by more than defined initial effort

Different clients using different protocols can track the same key

OPTOUT tests

OPTIN tests

Clients can redirect to the same connection

tracking-redir-broken test

HELLO 3 checks

Invalidation messages still work when using RESP3, with and without redirection

Switching to RESP3 doesn't disturb previous tracked keys

Tracking info is correct

Flushall and flushdb produce invalidation messages

These tests achieve 100% line coverage for tracking.c using lcov.
2020-11-09 22:54:47 +02:00
Meir Shpilraien (Spielrein)
97d647a139
Moved RMAPI_FUNC_SUPPORTED location such that it will be visible to modules (#8037)
The RMAPI_FUNC_SUPPORTED was defined in the wrong place on redismodule.h
and was not visible to modules.
2020-11-09 10:46:23 +02:00
sundb
cd1c600548
Typo fix: entires -> entries (#8031) 2020-11-08 08:32:38 +02:00
Yossi Gottlieb
1fd456f91a
Add RESET command. (#7982)
Perform full reset of all client connection states, is if the client was
disconnected and re-connected. This affects:

* MULTI state
* Watched keys
* MONITOR mode
* Pub/Sub subscription
* ACL/Authenticated state
* Client tracking state
* Cluster read-only/asking state
* RESP version (reset to 2)
* Selected database
* CLIENT REPLY state

The response is +RESET to make it easily distinguishable from other
responses.

Co-authored-by: Oran Agra <oran@redislabs.com>
Co-authored-by: Itamar Haber <itamar@redislabs.com>
2020-11-05 10:51:26 +02:00
Yossi Gottlieb
f6546eff45 Tests: fix filename reported in run_solo tests. 2020-11-04 21:43:55 +02:00
Yossi Gottlieb
2faa0f19eb Fix test failure on slower systems.
Not disabling save, slower systems begun background save that did not
complete in time, resulting with SAVE failing with "ERR Background save
already in progress".
2020-11-04 21:43:55 +02:00
filipe oliveira
10b5006934
Enable specifying TLS ciphers(suites) in redis-cli/redis-benchmark (#8005)
Enable specifying the preferred ciphers and/or ciphersuites for redis-cli/redis-benchmark.

Co-authored-by: Yossi Gottlieb <yossigo@gmail.com>
2020-11-04 14:49:15 +02:00
Meir Shpilraien (Spielrein)
762be79f0b
Disable new SIGABRT test on valgrind (#8013)
The crash reports cause false-positive warnings when run with valgrind.
2020-11-04 13:13:55 +02:00
Wang Yuan
89c78a9808
Disable rehash when redis has child process (#8007)
In redisFork(), we don't set child pid, so updateDictResizePolicy()
doesn't take effect, that isn't friendly for copy-on-write.

The bug was introduced this in redis 6.0: 56258c6
2020-11-03 17:16:11 +02:00
Meir Shpilraien (Spielrein)
f210e197f3
Added crash report on SIGABRT (#8004)
The reason that we want to get a full crash report on SIGABRT
is that the jmalloc, when detecting a corruption, calls abort().
This will cause the Redis to exist silently without any report
and without any way to analyze what happened.
2020-11-03 14:59:21 +02:00
Oran Agra
9122379abc
Propagate GETSET and SET-GET as SET (#7957)
- Generates a more backwards compatible command stream
- Slightly more efficient execution in replica/AOF
- Add a test for coverage
2020-11-03 14:56:57 +02:00
Madelyn Olson
411bcf1a41 Further improved ACL algorithm for picking categories 2020-10-28 10:01:20 -07:00
filipe oliveira
39436b2152
TLS Support for redis-benchmark (#7959) 2020-10-28 08:00:54 +02:00
Wang Yuan
dc899c4c88
Fix timing dependence in replication tcl tests (#7969)
Remove 'fork child $pid' log in replication.tcl
2020-10-27 09:36:42 +02:00
Oran Agra
4e2e5be201
Attempt to fix sporadic test failures due to wait_for_log_messages (#7955)
The tests sometimes fail to find a log message.
Recently i added a print that shows the log files that are searched
and it shows that the message was in deed there.
The only reason i can't think of for this seach to fail, is we we
happened to read an incomplete line, which didn't match our pattern and
then on the next iteration we would continue reading from the line after
it.

The fix is to always re-evaluation the previous line.
2020-10-26 11:55:24 +02:00
filipe oliveira
01acfa71ca
redis-benchmark: add tests, --version, a minor bug fixes (#7947)
- add test suite coverage for redis-benchmark
- add --version (similar to what redis-cli has)
- fix bug sending more requests than intended when pipeline > 1.
- when done sending requests, avoid freeing client in the write handler, in theory before
  responses are received (probably dead code since the read handler will call clientDone first)

Co-authored-by: Oran Agra <oran@redislabs.com>
2020-10-26 08:04:59 +02:00
Zach Fewtrell
ebfa769925
fix invalid 'failover' identifier in cluster slave selection test (#7942) 2020-10-25 10:05:38 +02:00
Qu Chen
556acefe75
WATCH no longer ignores keys which have expired for MULTI/EXEC. (#7920)
This wrong behavior was backed by a test, and also documentation, and dates back to 2010.
But it makes no sense to anyone involved so it was decided to change that.

Note that 20eeddf (invalidate watch on expire on access) was released in 6.0 RC2
and 2d1968f released in in 6.0.0 GA (invalidate watch when key is evicted).
both of which do similar changes.
2020-10-22 12:57:45 +03:00
Oran Agra
c96ece9f5e
improve verbose logging on failed test. print log file lines (#7938) 2020-10-22 11:34:54 +03:00
Yossi Gottlieb
843a13e88f
Add a --no-latency tests flag. (#7939)
Useful for running tests on systems which may be way slower than usual.
2020-10-22 11:10:53 +03:00
Yossi Gottlieb
ef92f507dd
Fix tests failure on busybox systems. (#7916) 2020-10-18 14:50:29 +03:00
Wen Hui
f328194d12
support NOMKSTREAM option in xadd command (#7910)
introduces a NOMKSTREAM option for xadd command, this would be useful for some
use cases when we do not want to create new stream by default:

XADD key [MAXLEN [~|=] <count>] [NOMKSTREAM] <ID or *> [field value] [field value]
2020-10-18 10:15:43 +03:00
Yossi Gottlieb
056a43e1a6
Modules: fix RM_GetCommandKeys API. (#7901)
This cleans up and simplifies the API by passing the command name as the
first argument. Previously the command name was specified explicitly,
but was still included in the argv.
2020-10-11 18:10:55 +03:00
Meir Shpilraien (Spielrein)
adc3183cd2
Add Module API for version and compatibility checks (#7865)
* Introduce a new API's: RM_GetContextFlagsAll, and
RM_GetKeyspaceNotificationFlagsAll that will return the
full flags mask of each feature. The module writer can
check base on this value if the Flags he needs are
supported or not.

* For each flag, introduce a new value on redismodule.h,
this value represents the LAST value and should be there
as a reminder to update it when a new value is added,
also it will be used in the code to calculate the full
flags mask (assuming flags are incrementally increasing).
In addition, stated that the module writer should not use
the LAST flag directly and he should use the GetFlagAll API's.

* Introduce a new API: RM_IsSubEventSupported, that returns for a given
event and subevent, whether or not the subevent supported.

* Introduce a new macro RMAPI_FUNC_SUPPORTED(func) that returns whether
or not a function API is supported by comparing it to NULL.

* Introduce a new API: int RM_GetServerVersion();, that will return the
current Redis version in the format 0x00MMmmpp; e.g. 0x00060008;

* Changed unstable version from 999.999.999 to 255.255.255

Co-authored-by: Oran Agra <oran@redislabs.com>
Co-authored-by: Yossi Gottlieb <yossigo@gmail.com>
2020-10-11 17:21:58 +03:00
Yossi Gottlieb
0aec98dce2
Module API: Add RM_GetClientCertificate(). (#7866)
This API function makes it possible to retrieve the X.509 certificate
used by clients to authenticate TLS connections.
2020-10-11 17:11:42 +03:00
Yossi Gottlieb
907da0580b
Modules: Add RM_GetDetachedThreadSafeContext(). (#7886)
The main motivation here is to provide a way for modules to create a
single, global context that can be used for logging.

Currently, it is possible to obtain a thread-safe context that is not
attached to any blocked client by using `RM_GetThreadSafeContext`.
However, the attached context is not linked to the module identity so
log messages produced are not tagged with the module name.

Ideally we'd fix this in `RM_GetThreadSafeContext` itself but as it
doesn't accept the current context as an argument there's no way to do
that in a backwards compatible manner.
2020-10-11 16:11:31 +03:00
Yossi Gottlieb
7d117d7591 Modules: add RM_GetCommandKeys().
This is essentially the same as calling COMMAND GETKEYS but provides a
more efficient interface that can be used in every context (i.e. not a
Redis command).
2020-10-11 16:04:14 +03:00
Felipe Machado
c3f9e01794
Adds new pop-push commands (LMOVE, BLMOVE) (#6929)
Adding [B]LMOVE <src> <dst> RIGHT|LEFT RIGHT|LEFT. deprecating [B]RPOPLPUSH.

Note that when receiving a BRPOPLPUSH we'll still propagate an RPOPLPUSH,
but on BLMOVE RIGHT LEFT we'll propagate an LMOVE

improvement to existing tests
- Replace "after 1000" with "wait_for_condition" when wait for
  clients to block/unblock.
- Add a pre-existing element to target list on basic tests so
  that we can check if the new element was added to the correct
  side of the list.
- check command stats on the replica to make sure the right
  command was replicated

Co-authored-by: Oran Agra <oran@redislabs.com>
2020-10-08 08:33:17 +03:00
Madelyn Olson
2127f7c8eb
Fixed excessive categories being displayed from acls (#7889) 2020-10-07 22:09:09 -07:00
Oran Agra
216c110609
Allow blocked XREAD on a cluster replica (#7881)
I suppose that it was overlooked, since till recently none of the blocked commands were readonly.

other changes:
- add test for the above.
- add better support for additional (and deferring) clients for
  cluster tests
- improve a test which left the client in MULTI state.
2020-10-06 21:43:30 +03:00
Oran Agra
bea40e6a41
memory reporting of clients argv (#7874)
track and report memory used by clients argv.
this is very usaful in case clients started sending a command and didn't
complete it. in which case the first args of the command are already
trimmed from the query buffer.

in an effort to avoid cache misses and overheads while keeping track of
these, i avoid calling sdsZmallocSize and instead use the sdslen /
bulk-len which can at least give some insight into the problem.

This memory is now added to the total clients memory usage, as well as
the client list.
2020-10-05 11:15:36 +03:00
Nykolas Laurentino de Lima
66ee45b65c
Add GET parameter to SET command (#7852)
Add optional GET parameter to SET command in order to set a new value to
a key and retrieve the old key value. With this change we can deprecate
`GETSET` command and use only the SET command with the GET parameter.
2020-10-02 15:07:19 +03:00
Oran Agra
dc803d25a6
Fix crash in script timeout during AOF loading (#7870) 2020-10-01 11:27:45 +03:00
nitaicaro
8fb89a5728
Fixed Tracking test “The other connection is able to get invalidations” (#7871)
PROBLEM:

[$rd1 read] reads invalidation messages one by one, so it's never going to see the second invalidation message produced after INCR b, whether or not it exists. Adding another read will block incase no invalidation message is produced.

FIX:

We switch the order of "INCR a" and "INCR b" - now "INCR b" comes first. We still only read the first invalidation message produces. If an invalidation message is wrongly produces for b - then it will be produced before that of a, since "INCR b" comes before "INCR a".

Co-authored-by: Nitai Caro <caronita@amazon.com>
2020-09-30 19:52:01 +03:00
Oran Agra
8aa083bd28
Fix new obuf-limits tests to work with TLS (#7848)
Also stabilize new shutdown tests on slow machines (valgrind)
2020-09-27 17:13:33 +03:00
Wang Yuan
57709c4bc6
Don't write replies if close the client ASAP (#7202)
Before this commit, we would have continued to add replies to the reply buffer even if client
output buffer limit is reached, so the used memory would keep increasing over the configured limit.
What's more, we shouldn’t write any reply to the client if it is set 'CLIENT_CLOSE_ASAP' flag
because that doesn't conform to its definition and we will close all clients flagged with
'CLIENT_CLOSE_ASAP' in ‘beforeSleep’.

Because of code execution order, before this, we may firstly write to part of the replies to
the socket before disconnecting it, but in fact, we may can’t send the full replies to clients
since OS socket buffer is limited. But this unexpected behavior makes some commands work well,
for instance ACL DELUSER, if the client deletes the current user, we need to send reply to client
and close the connection, but before, we close the client firstly and write the reply to reply
buffer. secondly, we shouldn't do this despite the fact it works well in most cases.

We add a flag 'CLIENT_CLOSE_AFTER_COMMAND' to mark clients, this flag means we will close the
client after executing commands and send all entire replies, so that we can write replies to
reply buffer during executing commands, send replies to clients, and close them later.

We also fix some implicit problems. If client output buffer limit is enforced in 'multi/exec',
all commands will be executed completely in redis and clients will not read any reply instead of
partial replies. Even more, if the client executes 'ACL deluser' the using user in 'multi/exec',
it will not read the replies after 'ACL deluser' just like before executing 'client kill' itself
in 'multi/exec'.

We added some tests for output buffer limit breach during multi-exec and using a pipeline of
many small commands rather than one with big response.

Co-authored-by: Oran Agra <oran@redislabs.com>
2020-09-24 16:01:41 +03:00
valentinogeron
795c454db1
Stream: Inconsistency between master and replica some XREADGROUP case (#7526)
XREADGROUP auto-creates the consumer inside the consumer group the
first time it saw it.
When XREADGROUP is being used with NOACK option, the message will not
be added into the client's PEL and XGROUP SETID would be propagated.
When the replica gets the XGROUP SETID it will only update the last delivered
id of the group, but will not create the consumer.

So, in this commit XGROUP CREATECONSUMER is being added.
Command pattern: XGROUP CREATECONSUMER <key> <group> <consumer>.

When NOACK option is being used, createconsumer command would be
propagated as well.

In case of AOFREWRITE, consumer with an empty PEL would be saved with
XGROUP CREATECONSUMER whereas consumer with pending entries would be
saved with XCLAIM
2020-09-24 12:02:40 +03:00
bodong.ybd
e08bf16637 Add ZINTER/ZUNION command
Syntax: ZINTER/ZUNION numkeys key [key ...] [WEIGHTS weight [weight ...]]
[AGGREGATE SUM|MIN|MAX] [WITHSCORES]

see #7624
2020-09-24 08:59:14 +03:00
alexronke-channeladvisor
66a13ccbdf
Add GT and LT options to ZADD for conditional score updates (#7818)
Co-authored-by: Alex Ronke <w.alex.ronke@gmail.com>
2020-09-23 21:56:16 +03:00
Wang Yuan
1bb5794a1f
Kill disk-based fork child when all replicas drop and 'save' is not enabled (#7819)
When all replicas waiting for a bgsave get disconnected (possibly due to output buffer limit),
It may be good to kill the bgsave child. in diskless replication it already happens, but in
disk-based, the child may still serve some purpose (for persistence).

By killing the child, we prevent it from eating COW memory in vain, and we also allow a new child fork sooner for the next full synchronization or bgsave.
We do that only if rdb persistence wasn't enabled in the configuration.

Btw, now, rdbRemoveTempFile in killRDBChild won't block server, so we can killRDBChild safely.
2020-09-22 09:47:58 +03:00
Wen Hui
dfe9714c86
Add Swapdb Module Event (#7804) 2020-09-20 13:36:20 +03:00
Wang Yuan
b002d2b4f1
Remove tmp rdb file in background thread (#7762)
We're already using bg_unlink in several places to delete the rdb file in the background,
and avoid paying the cost of the deletion from our main thread.
This commit uses bg_unlink to remove the temporary rdb file in the background too.

However, in case we delete that rdb file just before exiting, we don't actually wait for the
background thread or the main thread to delete it, and just let the OS clean up after us.
i.e. we open the file, unlink it and exit with the fd still open.

Furthermore, rdbRemoveTempFile can be called from a thread and was using snprintf which is
not async-signal-safe, we now use ll2string instead.
2020-09-17 18:20:10 +03:00
WuYunlong
98c8ac0d70
Clarify help text of tcl scripts. (#7798)
Before this commit, following command did not show --tls option:
./runtest-cluster --help
./runtest-sentinel --help
2020-09-15 08:27:42 +03:00