Additionally the older defrag tests are using an obsolete way to check
if the defragger is suuported (the error no longer contains "DISABLED").
this doesn't usually makes a difference since these tests are completely
skipped if the allocator is not jemalloc, but that would fail if the
allocator is a jemalloc that doesn't support defrag.
* Allow runtest-moduleapi use a different 'make', for systems where GNU Make is 'gmake'.
* Fix issue with builds on Solaris re-building everything from scratch due to CFLAGS/LDFLAGS not stored.
* Fix compile failure on Solaris due to atomicvar and a bunch of warnings.
* Fix garbled log timestamps on Solaris.
When a replica uses the diskless-load swapdb approach, it backs up the old database,
then attempts to load a new one, and in case of failure, it restores the backup.
this means that modules with global out of keyspace data, must have an option to
subscribe to events and backup/restore/discard their global data too.
Add a new set of defrag functions that take a defrag context and allow
defragmenting memory blocks and RedisModuleStrings.
Modules can register a defrag callback which will be invoked when the
defrag process handles globals.
Modules with custom data types can also register a datatype-specific
defrag callback which is invoked for keys that require defragmentation.
The callback and associated functions support both one-step and
multi-step options, depending on the complexity of the key as exposed by
the free_effort callback.
Add commands to query geospatial data with bounding box.
Two new commands that replace the existing 4 GEORADIUS* commands.
GEOSEARCH key [FROMMEMBER member] [FROMLOC long lat] [BYRADIUS radius
unit] [BYBOX width height unit] [WITHCORD] [WITHDIST] [WITHASH] [COUNT
count] [ASC|DESC]
GEOSEARCHSTORE dest_key src_key [FROMMEMBER member] [FROMLOC long lat]
[BYRADIUS radius unit] [BYBOX width height unit] [WITHCORD] [WITHDIST]
[WITHASH] [COUNT count] [ASC|DESC] [STOREDIST]
- Add two types of CIRCULAR_TYPE and RECTANGLE_TYPE to achieve different searches
- Judge whether the point is within the rectangle, refer to:
geohashGetDistanceIfInRectangle
This adds a new `tls-client-cert-file` and `tls-client-key-file`
configuration directives which make it possible to use different
certificates for the TLS-server and TLS-client functions of Redis.
This is an optional directive. If it is not specified the `tls-cert-file`
and `tls-key-file` directives are used for TLS client functions as well.
Also, `utils/gen-test-certs.sh` now creates additional server-only and client-only certs and will skip intensive operations if target files already exist.
This adds a copy callback for module data types, in order to make
modules compatible with the new COPY command.
The callback is optional and COPY will fail for keys with data types
that do not implement it.
Module blocked clients cache the response in a temporary client,
the reply list in this client would be affected by the recent fix
in #7202, but when the reply is later copied into the real client,
it would have bypassed all the checks for output buffer limit, which
would have resulted in both: responding with a partial response to
the client, and also not disconnecting it at all.
c4fdf09c0 added a test that now fails with valgrind
it fails for two resons:
1) the test samples the used memory and then limits the maxmemory to
that value, but it turns out this is not atomic and on slow machines
the background cron process that clean out old query buffers reduces
the memory so that the setting doesn't cause eviction.
2) the dbsize was tested late, after reading some invalidation messages
by that time more and more keys got evicted, partially draining the
db. this is not the focus of this fix (still a known limitation)
when the same consumer re-claim an entry that it already has, there's
no need to remove-and-insert if it's the same rax.
we do need to update the idle time though.
this commit only improves efficiency (doesn't change behavior).
* Add CLIENT INFO subcommand.
The output is identical to CLIENT LIST but provides a single line for
the current client only.
* Add CLIENT LIST ID [id...].
Co-authored-by: Itamar Haber <itamar@redislabs.com>
If RESTORE passes successfully with full sanitization, we can't affort
to crash later on assertion due to duplicate records in a hash when
converting it form ziplist to dict.
This means that when doing full sanitization, we must make sure there
are no duplicate records in any of the collections.
when using --baseport to run two tests suite in parallel (different
folders), we need to also make sure the port used by the testsuite to
communicate with it's workers is unique. otherwise the attept to find a
free port connects to the other test suite and messes it.
maybe one day we need to attempt to bind, instead of connect when tring
to find a free port.
The test creates keys with various encodings, DUMP them, corrupt the payload
and RESTORES it.
It utilizes the recently added use-exit-on-panic config to distinguish between
asserts and segfaults.
If the restore succeeds, it runs random commands on the key to attempt to
trigger a crash.
It runs in two modes, one with deep sanitation enabled and one without.
In the first one we don't expect any assertions or segfaults, in the second one
we expect assertions, but no segfaults.
We also check for leaks and invalid reads using valgrind, and if we find them
we print the commands that lead to that issue.
Changes in the code (other than the test):
- Replace a few NPD (null pointer deference) flows and division by zero with an
assertion, so that it doesn't fail the test. (since we set the server to use
`exit` rather than `abort` on assertion).
- Fix quite a lot of flows in rdb.c that could have lead to memory leaks in
RESTORE command (since it now responds with an error rather than panic)
- Add a DEBUG flag for SET-SKIP-CHECKSUM-VALIDATION so that the test don't need
to bother with faking a valid checksum
- Remove a pile of code in serverLogObjectDebugInfo which is actually unsafe to
run in the crash report (see comments in the code)
- fix a missing boundary check in lzf_decompress
test suite infra improvements:
- be able to run valgrind checks before the process terminates
- rotate log files when restarting servers
- improve stream rdb encoding test to include more types of stream metadata
- add test to cover various ziplist encoding entries (although it does
look like the stress test above it is able to find some too
- add another test for ziplist encoding for hash with full sanitization
- add similar ziplist encoding tests for list
When loading an encoded payload we will at least do a shallow validation to
check that the size that's encoded in the payload matches the size of the
allocation.
This let's us later use this encoded size to make sure the various offsets
inside encoded payload don't reach outside the allocation, if they do, we'll
assert/panic, but at least we won't segfault or smear memory.
We can also do 'deep' validation which runs on all the records of the encoded
payload and validates that they don't contain invalid offsets. This lets us
detect corruptions early and reject a RESTORE command rather than accepting
it and asserting (crashing) later when accessing that payload via some command.
configuration:
- adding ACL flag skip-sanitize-payload
- adding config sanitize-dump-payload [yes/no/clients]
For now, we don't have a good way to ensure MIGRATE in cluster resharding isn't
being slowed down by these sanitation, so i'm setting the default value to `no`,
but later on it should be set to `clients` by default.
changes:
- changing rdbReportError not to `exit` in RESTORE command
- adding a new stat to be able to later check if cluster MIGRATE isn't being
slowed down by sanitation.
When client tracking is enabled signalModifiedKey can increase memory usage,
this can cause the loop in performEvictions to keep running since it was measuring
the memory usage impact of signalModifiedKey.
The section that measures the memory impact of the eviction should be just on dbDelete,
excluding keyspace notification, client tracking, and propagation to AOF and replicas.
This resolves part of the problem described in #8069
p.s. fix took 1 minute, test took about 3 hours to write.
One way this was happening is when a module issued an RM_Call which would inject MULTI.
If the module command that does that was itself issued by something else that already did
added MULTI (e.g. another module, or a Lua script), it would have caused nested MULTI.
In fact the MULTI state in the client or the MULTI_EMITTED flag in the context isn't
the right indication that we need to propagate MULTI or not, because on a nested calls
(possibly a module action called by a keyspace event of another module action), these
flags aren't retained / reflected.
instead there's now a global propagate_in_transaction flag for that.
in addition to that, we now have a global in_eval and in_exec flags, to serve the flags
of RM_GetContextFlags, since their dependence on the current client is wrong for the same
reasons mentioned above.
As we know, redis may reject user's requests or evict some keys if
used memory is over maxmemory. Dictionaries expanding may make
things worse, some big dictionaries, such as main db and expires dict,
may eat huge memory at once for allocating a new big hash table and be
far more than maxmemory after expanding.
There are related issues: #4213#4583
More details, when expand dict in redis, we will allocate a new big
ht[1] that generally is double of ht[0], The size of ht[1] will be
very big if ht[0] already is big. For db dict, if we have more than
64 million keys, we need to cost 1GB for ht[1] when dict expands.
If the sum of used memory and new hash table of dict needed exceeds
maxmemory, we shouldn't allow the dict to expand. Because, if we
enable keys eviction, we still couldn't add much more keys after
eviction and rehashing, what's worse, redis will keep less keys when
redis only remains a little memory for storing new hash table instead
of users' data. Moreover users can't write data in redis if disable
keys eviction.
What this commit changed ?
Add a new member function expandAllowed for dict type, it provide a way
for caller to allow expand or not. We expose two parameters for this
function: more memory needed for expanding and dict current load factor,
users can implement a function to make a decision by them.
For main db dict and expires dict type, these dictionaries may be very
big and cost huge memory for expanding, so we implement a judgement
function: we can stop dict to expand provisionally if used memory will
be over maxmemory after dict expands, but to guarantee the performance
of redis, we still allow dict to expand if dict load factor exceeds the
safe load factor.
Add test cases to verify we don't allow main db to expand when left
memory is not enough, so that avoid keys eviction.
Other changes:
For new hash table size when expand. Before this commit, the size is
that double used of dict and later _dictNextPower. Actually we aim to
control a dict load factor between 0.5 and 1.0. Now we replace *2 with
+1, since the first check is that used >= size, the outcome of before
will usually be the same as _dictNextPower(used+1). The only case where
it'll differ is when dict_can_resize is false during fork, so that later
the _dictNextPower(used*2) will cause the dict to jump to *4 (i.e.
_dictNextPower(1025*2) will return 4096).
Fix rehash test cases due to changing algorithm of new hash table size
when expand.
When replica diskless-load type is swapdb in cluster mode, we didn't backup
keys to slots map, so we will lose keys to slots map if fail to sync.
Now we backup keys to slots map at first, and restore it properly when fail.
This commit includes a refactory/cleanup of the backups mechanism (moving it to db.c and re-structuring it a bit).
Co-authored-by: Oran Agra <oran@redislabs.com>
As described in redis-benchamrk help message 'The test names are the same as the ones produced as output.', In redis-benchmark output, we can only see PING_BULK, but the cmd `redis-benchmark -t ping_bulk` is not supported. We have to run it with ping_mbulk which is not user friendly.
SELECT used to read the index into a `long` variable, and then pass it to a function
that takes an `int`, possibly causing an overflow before the range check.
Now all these commands use better and cleaner range check, and that also results in
a slight change of the error response in case of an invalid database index.
SELECT:
in the past it would have returned either `-ERR invalid DB index` (if not a number),
or `-ERR DB index is out of range` (if not between 1..16 or alike).
now it'll return either `-ERR value is out of range` (if not a number), or
`-ERR value is out of range, value must between -2147483648 and 2147483647`
(if not in the range for an int), or `-ERR DB index is out of range`
(if not between 0..16 or alike)
MOVE:
in the past it would only fail with `-ERR index out of range` no matter the reason.
now return the same errors as the new ones for SELECT mentioned above.
(i.e. unlike for SELECT even for a value like 17 we changed the error message)
COPY:
doesn't really matter how it behaved in the past (new command), new behavior is
like the above two.
Fixes#7923.
This PR appropriates the special `&` symbol (because `@` and `*` are taken),
followed by a literal value or pattern for describing the Pub/Sub patterns that
an ACL user can interact with. It is similar to the existing key patterns
mechanism in function (additive) and implementation (copy-pasta). It also adds
the allchannels and resetchannels ACL keywords, naturally.
The default user is given allchannels permissions, whereas new users get
whatever is defined by the acl-pubsub-default configuration directive. For
backward compatibility in 6.2, the default of this directive is allchannels but
this is likely to be changed to resetchannels in the next major version for
stronger default security settings.
Unless allchannels is set for the user, channel access permissions are checked
as follows :
* Calls to both PUBLISH and SUBSCRIBE will fail unless a pattern matching the
argumentative channel name(s) exists for the user.
* Calls to PSUBSCRIBE will fail unless the pattern(s) provided as an argument
literally exist(s) in the user's list.
Such failures are logged to the ACL log.
Runtime changes to channel permissions for a user with existing subscribing
clients cause said clients to disconnect unless the new permissions permit the
connections to continue. Note, however, that PSUBSCRIBErs' patterns are matched
literally, so given the change bar:* -> b*, pattern subscribers to bar:* will be
disconnected.
Notes/questions:
* UNSUBSCRIBE, PUNSUBSCRIBE and PUBSUB remain unprotected due to lack of reasons
for touching them.
The bug was introduced by #5021 which only attempted avoid EXIST on an
already expired key from returning 1 on a replica.
Before that commit, dbExists was used instead of
lookupKeyRead (which had an undesired effect to "touch" the LRU/LFU)
Other than that, this commit fixes OBJECT to also come empty handed on
expired keys in replica.
And DEBUG DIGEST-VALUE to behave like DEBUG OBJECT (get the data from
the key regardless of it's expired state)
Blocking command should not be used with MULTI, LUA, and RM_Call. This is because,
the caller, who executes the command in this context, expects a reply.
Today, LUA and MULTI have a special (and different) treatment to blocking commands:
LUA - Most commands are marked with no-script flag which are checked when executing
and command from LUA, commands that are not marked (like XREAD) verify that their
blocking mode is not used inside LUA (by checking the CLIENT_LUA client flag).
MULTI - Command that is going to block, first verify that the client is not inside
multi (by checking the CLIENT_MULTI client flag). If the client is inside multi, they
return a result which is a match to the empty key with no timeout (for example blpop
inside MULTI will act as lpop)
For modules that perform RM_Call with blocking command, the returned results type is
REDISMODULE_REPLY_UNKNOWN and the caller can not really know what happened.
Disadvantages of the current state are:
No unified approach, LUA, MULTI, and RM_Call, each has a different treatment
Module can not safely execute blocking command (and get reply or error).
Though It is true that modules are not like LUA or MULTI and should be smarter not
to execute blocking commands on RM_Call, sometimes you want to execute a command base
on client input (for example if you create a module that provides a new scripting
language like javascript or python).
While modules (on modules command) can check for REDISMODULE_CTX_FLAGS_LUA or
REDISMODULE_CTX_FLAGS_MULTI to know not to block the client, there is no way to
check if the command came from another module using RM_Call. So there is no way
for a module to know not to block another module RM_Call execution.
This commit adds a way to unify the treatment for blocking clients by introducing
a new CLIENT_DENY_BLOCKING client flag. On LUA, MULTI, and RM_Call the new flag
turned on to signify that the client should not be blocked. A blocking command
verifies that the flag is turned off before blocking. If a blocking command sees
that the CLIENT_DENY_BLOCKING flag is on, it's not blocking and return results
which are matches to empty key with no timeout (as MULTI does today).
The new flag is checked on the following commands:
List blocking commands: BLPOP, BRPOP, BRPOPLPUSH, BLMOVE,
Zset blocking commands: BZPOPMIN, BZPOPMAX
Stream blocking commands: XREAD, XREADGROUP
SUBSCRIBE, PSUBSCRIBE, MONITOR
In addition, the new flag is turned on inside the AOF client, we do not want to
block the AOF client to prevent deadlocks and commands ordering issues (and there
is also an existing assert in the code that verifies it).
To keep backward compatibility on LUA, all the no-script flags on existing commands
were kept untouched. In addition, a LUA special treatment on XREAD and XREADGROUP was kept.
To keep backward compatibility on MULTI (which today allows SUBSCRIBE, and PSUBSCRIBE).
We added a special treatment on those commands to allow executing them on MULTI.
The only backward compatibility issue that this PR introduces is that now MONITOR
is not allowed inside MULTI.
Tests were added to verify blocking commands are not blocking the client on LUA, MULTI,
or RM_Call. Tests were added to verify the module can check for CLIENT_DENY_BLOCKING flag.
Co-authored-by: Oran Agra <oran@redislabs.com>
Co-authored-by: Itamar Haber <itamar@redislabs.com>
ZREVRANGEBYSCORE key max min [WITHSCORES] [LIMIT offset count]
When the offset is too large, the query is very slow. Especially when the offset is greater than the length of zset it is easy to determine whether the offset is greater than the length of zset at first, and If it exceed the length of zset, then return directly.
Co-authored-by: Oran Agra <oran@redislabs.com>
Syntax:
COPY <key> <new-key> [DB <dest-db>] [REPLACE]
No support for module keys yet.
Co-authored-by: tmgauss
Co-authored-by: Itamar Haber <itamar@redislabs.com>
Co-authored-by: Oran Agra <oran@redislabs.com>
Add two optional callbacks to the RedisModuleTypeMethods structure, which is `free_effort`
and `unlink`. the `free_effort` callback indicates the effort required to free a module memory.
Currently, if the effort exceeds LAZYFREE_THRESHOLD, the module memory may be released
asynchronously. the `unlink` callback indicates the key has been removed from the DB by redis, and
may soon be freed by a background thread.
Add `lazyfreed_objects` info field, which represents the number of objects that have been
lazyfreed since redis was started.
Add `RM_GetTypeMethodVersion` API, which return the current redis-server runtime value of
`REDISMODULE_TYPE_METHOD_VERSION`. You can use that when calling `RM_CreateDataType` to know
which fields of RedisModuleTypeMethods are gonna be supported and which will be ignored.
- Add ZDIFF and ZDIFFSTORE which work similarly to SDIFF and SDIFFSTORE
- Make sure the new WITHSCORES argument that was added for ZUNION isn't considered valid for ZUNIONSTORE
Co-authored-by: Oran Agra <oran@redislabs.com>
Test support for the new map, null and push message types. Map objects are parsed as a list of lists of key value pairs.
for instance: user => john password => 123
will be parsed to the following TCL list:
{{user john} {password 123}}
Also added the following tests:
Redirection still works with RESP3
Able to use a RESP3 client as a redirection client
No duplicate invalidation messages when turning BCAST mode on after normal tracking
Server is able to evacuate enough keys when num of keys surpasses limit by more than defined initial effort
Different clients using different protocols can track the same key
OPTOUT tests
OPTIN tests
Clients can redirect to the same connection
tracking-redir-broken test
HELLO 3 checks
Invalidation messages still work when using RESP3, with and without redirection
Switching to RESP3 doesn't disturb previous tracked keys
Tracking info is correct
Flushall and flushdb produce invalidation messages
These tests achieve 100% line coverage for tracking.c using lcov.
Perform full reset of all client connection states, is if the client was
disconnected and re-connected. This affects:
* MULTI state
* Watched keys
* MONITOR mode
* Pub/Sub subscription
* ACL/Authenticated state
* Client tracking state
* Cluster read-only/asking state
* RESP version (reset to 2)
* Selected database
* CLIENT REPLY state
The response is +RESET to make it easily distinguishable from other
responses.
Co-authored-by: Oran Agra <oran@redislabs.com>
Co-authored-by: Itamar Haber <itamar@redislabs.com>
Not disabling save, slower systems begun background save that did not
complete in time, resulting with SAVE failing with "ERR Background save
already in progress".
In redisFork(), we don't set child pid, so updateDictResizePolicy()
doesn't take effect, that isn't friendly for copy-on-write.
The bug was introduced this in redis 6.0: 56258c6
The reason that we want to get a full crash report on SIGABRT
is that the jmalloc, when detecting a corruption, calls abort().
This will cause the Redis to exist silently without any report
and without any way to analyze what happened.
The tests sometimes fail to find a log message.
Recently i added a print that shows the log files that are searched
and it shows that the message was in deed there.
The only reason i can't think of for this seach to fail, is we we
happened to read an incomplete line, which didn't match our pattern and
then on the next iteration we would continue reading from the line after
it.
The fix is to always re-evaluation the previous line.
- add test suite coverage for redis-benchmark
- add --version (similar to what redis-cli has)
- fix bug sending more requests than intended when pipeline > 1.
- when done sending requests, avoid freeing client in the write handler, in theory before
responses are received (probably dead code since the read handler will call clientDone first)
Co-authored-by: Oran Agra <oran@redislabs.com>
This wrong behavior was backed by a test, and also documentation, and dates back to 2010.
But it makes no sense to anyone involved so it was decided to change that.
Note that 20eeddf (invalidate watch on expire on access) was released in 6.0 RC2
and 2d1968f released in in 6.0.0 GA (invalidate watch when key is evicted).
both of which do similar changes.
introduces a NOMKSTREAM option for xadd command, this would be useful for some
use cases when we do not want to create new stream by default:
XADD key [MAXLEN [~|=] <count>] [NOMKSTREAM] <ID or *> [field value] [field value]
This cleans up and simplifies the API by passing the command name as the
first argument. Previously the command name was specified explicitly,
but was still included in the argv.
* Introduce a new API's: RM_GetContextFlagsAll, and
RM_GetKeyspaceNotificationFlagsAll that will return the
full flags mask of each feature. The module writer can
check base on this value if the Flags he needs are
supported or not.
* For each flag, introduce a new value on redismodule.h,
this value represents the LAST value and should be there
as a reminder to update it when a new value is added,
also it will be used in the code to calculate the full
flags mask (assuming flags are incrementally increasing).
In addition, stated that the module writer should not use
the LAST flag directly and he should use the GetFlagAll API's.
* Introduce a new API: RM_IsSubEventSupported, that returns for a given
event and subevent, whether or not the subevent supported.
* Introduce a new macro RMAPI_FUNC_SUPPORTED(func) that returns whether
or not a function API is supported by comparing it to NULL.
* Introduce a new API: int RM_GetServerVersion();, that will return the
current Redis version in the format 0x00MMmmpp; e.g. 0x00060008;
* Changed unstable version from 999.999.999 to 255.255.255
Co-authored-by: Oran Agra <oran@redislabs.com>
Co-authored-by: Yossi Gottlieb <yossigo@gmail.com>
The main motivation here is to provide a way for modules to create a
single, global context that can be used for logging.
Currently, it is possible to obtain a thread-safe context that is not
attached to any blocked client by using `RM_GetThreadSafeContext`.
However, the attached context is not linked to the module identity so
log messages produced are not tagged with the module name.
Ideally we'd fix this in `RM_GetThreadSafeContext` itself but as it
doesn't accept the current context as an argument there's no way to do
that in a backwards compatible manner.
This is essentially the same as calling COMMAND GETKEYS but provides a
more efficient interface that can be used in every context (i.e. not a
Redis command).
Adding [B]LMOVE <src> <dst> RIGHT|LEFT RIGHT|LEFT. deprecating [B]RPOPLPUSH.
Note that when receiving a BRPOPLPUSH we'll still propagate an RPOPLPUSH,
but on BLMOVE RIGHT LEFT we'll propagate an LMOVE
improvement to existing tests
- Replace "after 1000" with "wait_for_condition" when wait for
clients to block/unblock.
- Add a pre-existing element to target list on basic tests so
that we can check if the new element was added to the correct
side of the list.
- check command stats on the replica to make sure the right
command was replicated
Co-authored-by: Oran Agra <oran@redislabs.com>
I suppose that it was overlooked, since till recently none of the blocked commands were readonly.
other changes:
- add test for the above.
- add better support for additional (and deferring) clients for
cluster tests
- improve a test which left the client in MULTI state.
track and report memory used by clients argv.
this is very usaful in case clients started sending a command and didn't
complete it. in which case the first args of the command are already
trimmed from the query buffer.
in an effort to avoid cache misses and overheads while keeping track of
these, i avoid calling sdsZmallocSize and instead use the sdslen /
bulk-len which can at least give some insight into the problem.
This memory is now added to the total clients memory usage, as well as
the client list.
Add optional GET parameter to SET command in order to set a new value to
a key and retrieve the old key value. With this change we can deprecate
`GETSET` command and use only the SET command with the GET parameter.
PROBLEM:
[$rd1 read] reads invalidation messages one by one, so it's never going to see the second invalidation message produced after INCR b, whether or not it exists. Adding another read will block incase no invalidation message is produced.
FIX:
We switch the order of "INCR a" and "INCR b" - now "INCR b" comes first. We still only read the first invalidation message produces. If an invalidation message is wrongly produces for b - then it will be produced before that of a, since "INCR b" comes before "INCR a".
Co-authored-by: Nitai Caro <caronita@amazon.com>
Before this commit, we would have continued to add replies to the reply buffer even if client
output buffer limit is reached, so the used memory would keep increasing over the configured limit.
What's more, we shouldn’t write any reply to the client if it is set 'CLIENT_CLOSE_ASAP' flag
because that doesn't conform to its definition and we will close all clients flagged with
'CLIENT_CLOSE_ASAP' in ‘beforeSleep’.
Because of code execution order, before this, we may firstly write to part of the replies to
the socket before disconnecting it, but in fact, we may can’t send the full replies to clients
since OS socket buffer is limited. But this unexpected behavior makes some commands work well,
for instance ACL DELUSER, if the client deletes the current user, we need to send reply to client
and close the connection, but before, we close the client firstly and write the reply to reply
buffer. secondly, we shouldn't do this despite the fact it works well in most cases.
We add a flag 'CLIENT_CLOSE_AFTER_COMMAND' to mark clients, this flag means we will close the
client after executing commands and send all entire replies, so that we can write replies to
reply buffer during executing commands, send replies to clients, and close them later.
We also fix some implicit problems. If client output buffer limit is enforced in 'multi/exec',
all commands will be executed completely in redis and clients will not read any reply instead of
partial replies. Even more, if the client executes 'ACL deluser' the using user in 'multi/exec',
it will not read the replies after 'ACL deluser' just like before executing 'client kill' itself
in 'multi/exec'.
We added some tests for output buffer limit breach during multi-exec and using a pipeline of
many small commands rather than one with big response.
Co-authored-by: Oran Agra <oran@redislabs.com>
XREADGROUP auto-creates the consumer inside the consumer group the
first time it saw it.
When XREADGROUP is being used with NOACK option, the message will not
be added into the client's PEL and XGROUP SETID would be propagated.
When the replica gets the XGROUP SETID it will only update the last delivered
id of the group, but will not create the consumer.
So, in this commit XGROUP CREATECONSUMER is being added.
Command pattern: XGROUP CREATECONSUMER <key> <group> <consumer>.
When NOACK option is being used, createconsumer command would be
propagated as well.
In case of AOFREWRITE, consumer with an empty PEL would be saved with
XGROUP CREATECONSUMER whereas consumer with pending entries would be
saved with XCLAIM
When all replicas waiting for a bgsave get disconnected (possibly due to output buffer limit),
It may be good to kill the bgsave child. in diskless replication it already happens, but in
disk-based, the child may still serve some purpose (for persistence).
By killing the child, we prevent it from eating COW memory in vain, and we also allow a new child fork sooner for the next full synchronization or bgsave.
We do that only if rdb persistence wasn't enabled in the configuration.
Btw, now, rdbRemoveTempFile in killRDBChild won't block server, so we can killRDBChild safely.
We're already using bg_unlink in several places to delete the rdb file in the background,
and avoid paying the cost of the deletion from our main thread.
This commit uses bg_unlink to remove the temporary rdb file in the background too.
However, in case we delete that rdb file just before exiting, we don't actually wait for the
background thread or the main thread to delete it, and just let the OS clean up after us.
i.e. we open the file, unlink it and exit with the fd still open.
Furthermore, rdbRemoveTempFile can be called from a thread and was using snprintf which is
not async-signal-safe, we now use ll2string instead.
This test was nearly always failing on MacOS github actions.
This is because of bugs in the test that caused it to nearly always run
all 3 attempts and just look at the last one as the pass/fail creteria.
i.e. the test was nearly always running all 3 attempts and still sometimes
succeed. this is because the break condition was different than the test
completion condition.
The reason the test succeeded is because the break condition tested the
results of all 3 tests (PSETEX/PEXPIRE/PEXPIREAT), but the success check
at the end was only testing the result of PSETEX.
The reason the PEXPIREAT test nearly always failed is because it was
getting the current time wrong: getting the current second and loosing
the sub-section time, so the only chance for it to succeed is if it run
right when a certain second started.
Because i now get the time from redis, adding another round trip, i
added another 100ms to the PEXPIRE test to make it less fragile, and
also added many more attempts.
Adding many more attempts before failure to account for slow platforms,
github actions and valgrind
This is a catch-all test to confirm that that rewrite produces a valid
output for all parameters and that this process does not introduce
undesired configuration changes.
Save parameters should either be default or whatever specified in the
config file. This fixes an issue introduced in #7092 which causes
configuration file settings to be applied on top of the defaults.
There was a bug. Although cluster replicas would allow read commands,
they would not allow a MULTI-EXEC that's composed solely of read commands.
Adds tests for coverage.
Co-authored-by: Oran Agra <oran@redislabs.com>
Co-authored-by: Eran Liberty <eranl@amazon.com>
if there are nested tests and nested servers, we need to restore the
previous value of cur_test when a test exist.
example:
```
test{test 1} {
start_server {
test{test 1.1 - master only} {
}
start_server {
test{test 1.2 - with replication} {
}
}
}
}
```
when `test 1.1 - master only exists`, we're still inside `test 1`
1) cur_test: when restart_server, "no such variable" error occurs
./runtest --single integration/rdb
test {client freed during loading}
SET ::cur_test
restart_server
kill_server
test "Check for memory leaks (pid $pid)"
SET ::cur_test
UNSET ::cur_test
UNSET ::cur_test // This global variable has been unset.
2) `ps --ppid` not available on macOS platform, can be replaced with
`pgrep -P pid`.
This test was failing from time to time see discussion at the bottom of #7635
This was probably due to timing, the DEBUG SLEEP executed by redis-cli
didn't sleep for enough time.
This commit changes:
1) use SET-ACTIVE-EXPIRE instead of DEBUG SLEEP
2) reduce many `after` sleeps with retry loops to speed up the test.
3) add many comment explaining the different steps of the test and
it's purpose.
4) config appendonly before populating the volatile keys, so that they'll
be part of the AOF command stream rather than the preamble RDB portion.
other complications: recently kill_instance switched from SIGKILL to
SIGTERM, and this would sometimes fail since there was an AOFRW running
in the background. now we wait for it to end before attempting the kill.
There is an inherent race condition in port allocation for spawned
servers. If a server fails to start because a port is taken, a new port
is allocated. This fixes a problem where the logs are not truncated and
as a result a large number of unmonitored servers are started.
2b998de46 added a file for stderr to keep valgrind log but i forgot to
add a similar thing when valgrind isn't being used.
the result is that `glob */err.txt` fails.
Starting redis 6.0 and the changes we made to the diskless master to be
suitable for TLS, I made the master avoid reaping (wait3) the pid of the
child until we know all replicas are done reading their rdb.
I did that in order to avoid a state where the rdb_child_pid is -1 but
we don't yet want to start another fork (still busy serving that data to
replicas).
It turns out that the solution used so far was problematic in case the
fork child was being killed (e.g. by the kernel OOM killer), in that
case there's a chance that we currently disabled the read event on the
rdb pipe, since we're waiting for a replica to become writable again.
and in that scenario the master would have never realized the child
exited, and the replica will remain hung too.
Note that there's no mechanism to detect a hung replica while it's in
rdb transfer state.
The solution here is to add another pipe which is used by the parent to
tell the child it is safe to exit. this mean that when the child exits,
for whatever reason, it is safe to reap it.
Besides that, i'm re-introducing an adjustment to REPLCONF ACK which was
part of #6271 (Accelerate diskless master connections) but was dropped
when that PR was rebased after the TLS fork/pipe changes (5a47794).
Now that RdbPipeCleanup no longer calls checkChildrenDone, and the ACK
has chance to detect that the child exited, it should be the one to call
it so that we don't have to wait for cron (server.hz) to do that.
- redirect valgrind reports to a dedicated file rather than console
- try to avoid killing instances with SIGKILL so that we get the memory
leak report (killing with SIGTERM before resorting to SIGKILL)
- search for valgrind reports when done, print them and fail the tests
- add --dont-clean option to keep the logs on exit
- fix exit error code when crash is found (would have exited with 0)
changes that affect the normal redis test suite:
- refactor check_valgrind_errors into two functions one to search and
one to report
- move the search half into util.tcl to serve the cluster tests too
- ignore "address range perms" valgrind warnings which seem non relevant.
in some cases a command that returns an error possibly due to a timing
issue causes the tcl code to crash and thus prevents the rest of the
tests from running. this adds an option to make the test proceed despite
the crash.
maybe it should be the default mode some day.
- skip full units
- skip a single test (not just a list of tests)
- when skipping tag, skip spinning up servers, not just the tests
- skip tags when running against an external server too
- allow using multiple tags (split them)
During long running scripts or loading RDB/AOF, we may need to do some
defragging. Since processEventsWhileBlocked is called periodically at
unknown intervals, and many cron jobs either depend on run_with_period
(including active defrag), or rely on being called at server.hz rate
(i.e. active defrag knows ho much time to run by looking at server.hz),
the whileBlockedCron may have to run a loop triggering the cron jobs in it
(currently only active defrag) several times.
Other changes:
- Adding a test for defrag during aof loading.
- Changing key-load-delay config to take negative values for fractions
of a microsecond sleep
If the server gets MULTI command followed by only read
commands, and right before it gets the EXEC it reaches OOM,
the client will get OOM response.
So, from now on, it will get OOM response only if there was
at least one command that was tagged with `use-memory` flag
When calling to LPOS command when RANK is higher than matches,
the return value is non valid response. For example:
```
LPUSH l a
:1
LPOS l b RANK 5 COUNT 10
*-4
```
It may break client-side parser.
Now, we count how many replies were replied in the array.
```
LPUSH l a
:1
LPOS l b RANK 5 COUNT 10
*0
```
After fork, the child process(redis-aof-rewrite) will get the fd opened
by the parent process(redis), when redis killed by kill -9, it will not
graceful exit(call prepareForShutdown()), so redis-aof-rewrite thread may still
alive, the fd(lock) will still be held by redis-aof-rewrite thread, and
redis restart will fail to get lock, means fail to start.
This issue was causing failures in the cluster tests in github actions.
Co-authored-by: Oran Agra <oran@redislabs.com>
Add Linux kernel OOM killer control option.
This adds the ability to control the Linux OOM killer oom_score_adj
parameter for all Redis processes, depending on the process role (i.e.
master, replica, background child).
A oom-score-adj global boolean flag control this feature. In addition,
specific values can be configured using oom-score-adj-values if
additional tuning is required.
This is a rebased version of #3078 originally by shaharmor
with the following patches by TysonAndre made after rebasing
to work with the updated C API:
1. Add 2 more unit tests
(wrong argument count error message, integer over 64 bits)
2. Use addReplyArrayLen instead of addReplyMultiBulkLen.
3. Undo changes to src/help.h - for the ZMSCORE PR,
I heard those should instead be automatically
generated from the redis-doc repo if it gets updated
Motivations:
- Example use case: Client code to efficiently check if each element of a set
of 1000 items is a member of a set of 10 million items.
(Similar to reasons for working on #7593)
- HMGET and ZMSCORE already exist. This may lead to developers deciding
to implement functionality that's best suited to a regular set with a
data type of sorted set or hash map instead, for the multi-get support.
Currently, multi commands or lua scripting to call sismember multiple times
would almost definitely be less efficient than a native smismember
for the following reasons:
- Need to fetch the set from the string every time
instead of reusing the C pointer.
- Using pipelining or multi-commands would result in more bytes sent
and received by the client for the repeated SISMEMBER KEY sections.
- Need to specially encode the data and decode it from the client
for lua-based solutions.
- Proposed solutions using Lua or SADD/SDIFF could trigger writes to
memory, which is undesirable on a redis replica server
or when commands get replicated to replicas.
Co-Authored-By: Shahar Mor <shahar@peer5.com>
Co-Authored-By: Tyson Andre <tysonandre775@hotmail.com>
Added RedisModule_HoldString that either returns a
shallow copy of the given String (by increasing
the String ref count) or a new deep copy of String
in case its not possible to get a shallow copy.
Co-authored-by: Itamar Haber <itamar@redislabs.com>
When runtest-cluster, at first, we need to create a cluster use spawn_instance,
a port which is not used is choosen, however sometimes we can't run server on
the port. possibley due to a race with another process taking it first.
such as redis/redis/runs/896537490. It may be due to the machine problem or
In order to reduce the probability of failure when start redis in
runtest-cluster, we attemp to use another port when find server do not start up.
Co-authored-by: Oran Agra <oran@redislabs.com>
Co-authored-by: yanhui13 <yanhui13@meituan.com>
Diskless master has some inherent latencies.
1) fork starts with delay from cron rather than immediately
2) replica is put online only after an ACK. but the ACK
was sent only once a second.
3) but even if it would arrive immediately, it will not
register in case cron didn't yet detect that the fork is done.
Besides that, when a replica disconnects, it doesn't immediately
attempts to re-connect, it waits for replication cron (one per second).
in case it was already online, it may be important to try to re-connect
as soon as possible, so that the backlog at the master doesn't vanish.
In case it disconnected during rdb transfer, one can argue that it's
not very important to re-connect immediately, but this is needed for the
"diskless loading short read" test to be able to run 100 iterations in 5
seconds, rather than 3 (waiting for replication cron re-connection)
changes in this commit:
1) sync command starts a fork immediately if no sync_delay is configured
2) replica sends REPLCONF ACK when done reading the rdb (rather than on 1s cron)
3) when a replica unexpectedly disconnets, it immediately tries to
re-connect rather than waiting 1s
4) when when a child exits, if there is another replica waiting, we spawn a new
one right away, instead of waiting for 1s replicationCron.
5) added a call to connectWithMaster from replicationSetMaster. which is called
from the REPLICAOF command but also in 3 places in cluster.c, in all of
these the connection attempt will now be immediate instead of delayed by 1
second.
side note:
we can add a call to rdbPipeReadHandler in replconfCommand when getting
a REPLCONF ACK from the replica to solve a race where the replica got
the entire rdb and EOF marker before we detected that the pipe was
closed.
in the test i did see this race happens in one about of some 300 runs,
but i concluded that this race is unlikely in real life (where the
replica is on another host and we're more likely to first detect the
pipe was closed.
the test runs 100 iterations in 3 seconds, so in some cases it'll take 4
seconds instead (waiting for another REPLCONF ACK).
Removing unneeded startBgsaveForReplication from updateSlavesWaitingForBgsave
Now that CheckChildrenDone is calling the new replicationStartPendingFork
(extracted from serverCron) there's actually no need to call
startBgsaveForReplication from updateSlavesWaitingForBgsave anymore,
since as soon as updateSlavesWaitingForBgsave returns, CheckChildrenDone is
calling replicationStartPendingFork that handles that anyway.
The code in updateSlavesWaitingForBgsave had a bug in which it ignored
repl-diskless-sync-delay, but removing that code shows that this bug was
hiding another bug, which is that the max_idle should have used >= and
not >, this one second delay has a big impact on my new test.
Syntax: `ZMSCORE KEY MEMBER [MEMBER ...]`
This is an extension of #2359
amended by Tyson Andre to work with the changed unstable API,
add more tests, and consistently return an array.
- It seemed as if it would be more likely to get reviewed
after updating the implementation.
Currently, multi commands or lua scripting to call zscore multiple times
would almost definitely be less efficient than a native ZMSCORE
for the following reasons:
- Need to fetch the set from the string every time instead of reusing the C
pointer.
- Using pipelining or multi-commands would result in more bytes sent by
the client for the repeated `ZMSCORE KEY` sections.
- Need to specially encode the data and decode it from the client
for lua-based solutions.
- The fastest solution I've seen for large sets(thousands or millions)
involves lua and a variadic ZADD, then a ZINTERSECT, then a ZRANGE 0 -1,
then UNLINK of a temporary set (or lua). This is still inefficient.
Co-authored-by: Tyson Andre <tysonandre775@hotmail.com>
- the test now waits for specific set of log messages rather than wait for
timeout looking for just one message.
- we don't wanna sample the current length of the log after an action, due
to a race, we need to start the search from the line number of the last
message we where waiting for.
- when attempting to trigger a full sync, use multi-exec to avoid a race
where the replica manages to re-connect before we completed the set of
actions that should force a full sync.
- fix verify_log_message which was broken and unused
on ci.redis.io the test fails a lot, reporting that bgsave didn't end.
increaseing the timeout we wait for that bgsave to get aborted.
in addition to that, i also verify that it indeed got aborted by
checking that the save counter wasn't reset.
add another test to verify that a successful bgsave indeed resets the
change counter.
in cases where you have
test name {
start_server {
start_server {
assert
}
}
}
the exception will be thrown to the test proc, and the servers are
supposed to be killed on the way out. but it seems there was always a
bug of not cleaning the server stack, and recently (#7404) we started
relying on that stack in order to kill them, so with that bug sometimes
we would have tried to kill the same server twice, and leave one alive.
luckly, in most cases the pattern is:
start_server {
test name {
}
}
This re-implements the redis-cli --pipe test so it no longer depends on a close feature available only in TCL 8.6.
Basically what this test does is run redis-cli --pipe, generates a bunch of commands and pipes them through redis-cli, and inspects the result in both Redis and the redis-cli output.
To do that, we need to close stdin for redis-cli to indicate we're done so it can flush its buffers and exit. TCL has bi-directional channels can only offers a way to "one-way close" a channel with TCL 8.6. To work around that, we now generate the commands into a file and feed that file to redis-cli directly.
As we're writing to an actual file, the number of commands is now reduced.
Before this commit, the output of "./runtest-cluster --help" is incorrect.
After this commit, the format of the following 3 output is consistent:
./runtest --help
./runtest-cluster --help
./runtest-sentinel --help
interestingly the latency monitor test fails because valgrind is slow
enough so that the time inside PEXPIREAT command from the moment of
the first mstime() call to get the basetime until checkAlreadyExpired
calls mstime() again is more than 1ms, and that test was too sensitive.
using this opportunity to speed up the test (unrelated to the failure)
the fix is just the longer time passed to PEXPIRE.
in the majority of the cases (on this rarely used feature) we want to
stop and be able to connect to the shard with redis-cli.
since these are two different processes interracting with the tty we
need to stop both, and we'll have to hit enter twice, but it's not that
bad considering it is rarely used.
* Tests: fix and reintroduce redis-cli tests.
These tests have been broken and disabled for 10 years now!
* TLS: add remaining redis-cli support.
This adds support for the redis-cli --pipe, --rdb and --replica options
previously unsupported in --tls mode.
* Fix writeConn().
Similarly to EXPIREAT with TTL in the past, which implicitly deletes the
key and return success, RESTORE should not store key that are already
expired into the db.
When used together with REPLACE it should emit a DEL to keyspace
notification and replication stream.
On some platforms strtold("+inf") with valgrind returns a non-inf result
[err]: INCRBYFLOAT does not allow NaN or Infinity in tests/unit/type/incr.tcl
Expected 'ERR*would produce*' to equal or match '1189731495357231765085759.....'
tests were sensitive to additional log lines appearing in the log
causing the search to come empty handed.
instead of just looking for the n last log lines, capture the log lines
before performing the action, and then search from that offset.
* tests/valgrind: don't use debug restart
DEBUG REATART causes two issues:
1. it uses execve which replaces the original process and valgrind doesn't
have a chance to check for errors, so leaks go unreported.
2. valgrind report invalid calls to close() which we're unable to resolve.
So now the tests use restart_server mechanism in the tests, that terminates
the old server and starts a new one, new PID, but same stdout, stderr.
since the stderr can contain two or more valgrind report, it is not enough
to just check for the absence of leaks, we also need to check for some known
errors, we do both, and fail if we either find an error, or can't find a
report saying there are no leaks.
other changes:
- when killing a server that was already terminated we check for leaks too.
- adding DEBUG LEAK which was used to test it.
- adding --trace-children to valgrind, although no longer needed.
- since the stdout contains two or more runs, we need slightly different way
of checking if the new process is up (explicitly looking for the new PID)
- move the code that handles --wait-server to happen earlier (before
watching the startup message in the log), and serve the restarted server too.
* squashme - CR fixes