This commit adds two new fields in the INFO output, stats section:
expired_stale_perc:0.34
expired_time_cap_reached_count:58
The first field is an estimate of the number of keys that are yet in
memory but are already logically expired. They reason why those keys are
yet not reclaimed is because the active expire cycle can't spend more
time on the process of reclaiming the keys, and at the same time nobody
is accessing such keys. However as the active expire cycle runs, while
it will eventually have to return to the caller, because of time limit
or because there are less than 25% of keys logically expired in each
given database, it collects the stats in order to populate this INFO
field.
Note that expired_stale_perc is a running average, where the current
sample accounts for 5% and the history for 95%, so you'll see it
changing smoothly over time.
The other field, expired_time_cap_reached_count, counts the number
of times the expire cycle had to stop, even if still it was finding a
sizeable number of keys yet to expire, because of the time limit.
This allows people handling operations to understand if the Redis
server, during mass-expiration events, is able to collect keys fast
enough usually. It is normal for this field to increment during mass
expires, but normally it should very rarely increment. When instead it
constantly increments, it means that the current workloads is using
a very important percentage of CPU time to expire keys.
This feature was created thanks to the hints of Rashmi Ramesh and
Bart Robinson from Twitter. In private email exchanges, they noted how
it was important to improve the observability of this parameter in the
Redis server. Actually in big deployments, the amount of keys that are
yet to expire in each server, even if they are logically expired, may
account for a very big amount of wasted memory.
It is possible to do BGREWRITEAOF even if appendonly=no. This is by design.
stopAppendonly() didn't turn off aof_rewrite_scheduled (it can be turned on
again by BGREWRITEAOF even while appendonly is off anyway).
After configuring `appendonly yes` it will see that the state is AOF_OFF,
there's no RDB fork, so it will do rewriteAppendOnlyFileBackground() which
will fail since the aof_child_pid is set (was scheduled and started by cron).
Solution:
stopAppendonly() will turn off the schedule flag (regardless of who asked for it).
startAppendonly() will terminate any existing fork and start a new one (so it is the most recent).
When feeding the master with a high rate traffic the the slave's feed is much slower.
This causes the replication buffer to grow (indefinitely) which leads to slave disconnection.
The problem is that writeToClient() decides to stop writing after NET_MAX_WRITES_PER_EVENT
writes (In order to be fair to clients).
We should ignore this when the client is a slave.
It's better if clients wait longer, the alternative is that the slave has no chance to stay in
sync in this situation.
See #3462 and related PRs.
We use a simple algorithm to calculate the level of affinity violation,
and then an optimizer that performs random swaps until things improve.
after a slave is promoted (assuming it has no slaves
and it booted over an hour ago), it will lose it's replication
backlog at the next replication cron, rather than waiting for slaves
to connect to it.
so on a simple master/slave faiover, if the new slave doesn't connect
immediately, it may be too later and PSYNC2 will fail.
This fixes a crash with Redis Cluster when OBJECT is mis-used, because
getKeysUsingCommandTable() will call serverPanic() detecting we are
accessing an invalid argument in the case "OBJECT foo" is called.
This bug was introduced when OBJECT HELP was introduced, because the key
argument is set fixed at index 2 in the command table, however now
OBJECT may be called with an insufficient number of arguments to extract
the key.
The "Right Thing" would be to have a specific function to extract keys
from the OBJECT command, however this is kinda of an overkill, so I
preferred to make getKeysUsingCommandTable() more robust and just return
no keys when it's not possible to honor the command table, because new
commands are often added and also there are a number with an HELP
subcommand violating the normal form, and crashing for this trivial
reason or having many command-specific key extraction functions is not
great.
- protocol parsing (processMultibulkBuffer) was limitted to 32big positions in the buffer
readQueryFromClient potential overflow
- rioWriteBulkCount used int, although rioWriteBulkString gave it size_t
- several places in sds.c that used int for string length or index.
- bugfix in RM_SaveAuxField (return was 1 or -1 and not length)
- RM_SaveStringBuffer was limitted to 32bit length
The commit splits the add functions into a set() and add() set of
functions, so that it's possible to set registers in an independent way
just having the index and count.
Related to #3819, otherwise a fix is not possible.
The main change introduced by this commit is pretending that help
arrays are more text than code, thus indenting them at level 0. This
improves readability, and is an old practice when defining arrays of
C strings describing text.
Additionally a few useless return statements are removed, and the HELP
subcommand capitalized when printed to the user.
We have this operation in two places: when caching the master and
when linking a new client after the client creation. By having an API
for this we avoid incurring in errors when modifying one of the two
places forgetting the other. The function is also a good place where to
document why we cache the linked list node.
Related to #4497 and #4210.
The function in its initial form, and after the fixes for the PSYNC2
bugs, required code duplication in multiple spots. This commit modifies
it in order to always compute the script name independently, and to
return the SDS of the SHA of the body: this way it can be used in all
the places, including for SCRIPT LOAD, without duplicating the code to
create the Lua function name. Note that this requires to re-compute the
body SHA1 in the case of EVAL seeing a script for the first time, but
this should not change scripting performance in any way because new
scripts definition is a rare event happening the first time a script is
seen, and the SHA1 computation is anyway not a very slow process against
the typical Redis script and compared to the actua Lua byte compiling of
the body.
Note that the function used to assert() if a duplicated script was
loaded, however actually now two times over three, we want the function
to handle duplicated scripts just fine: this happens in SCRIPT LOAD and
in RDB AUX "lua" loading. Moreover the assert was not defending against
some obvious failure mode, so now the function always tests against
already defined functions at start.
Unfortunately, as outlined by @soloestoy in #4505, "lua" AUX RDB field
loading in case of duplicated script was still broken. This commit fixes
this problem and also a memory leak introduced by the past commit.
Note that now we have a regression test able to duplicate the issue, so
this commit was actually tested against the regression. The original PR
also had a valid fix, but I prefer to hide the details of scripting.c
outside scripting.c, and later "SCRIPT LOAD" should also be able to use
the function luaCreateFunction() instead of redoing the work.
With PSYNC2 to force a full SYNC in tests is hard. With this new DEBUG
subcommand we just need to call it and then CLIENT KILL TYPE master in
the slave.
In the case of slaves loading the RDB from master, or in other similar
cases, the script is already defined, and the function registering the
script should not fail in the assert() call.
It's a bit of black magic without actually tracking it inside rax.c,
however Redis usage of the radix tree for the stream data structure is
quite consistent, so a few magic constants apparently are producing
results that make sense.
Note that streams produced by XADD in previous broken versions having
elements with 4096 bytes or more will be permanently broken and must be
created again from scratch.
Fix#4428Fix#4349
After checking with the community via Twitter (here:
https://twitter.com/antirez/status/915130876861788161) the verdict was to
use ":". However I later realized, after users lamented the fact that
it's hard to copy IDs just with double click, that this was the reason
why I moved to "." in the first instance. Fortunately "-", that was the
other option with most votes, also gets selected with double click on
most terminal applications on Linux and MacOS.
So my reasoning was:
1) We can't retain "." because it's actually confusing to newcomers, it
looks like a floating number, people may be tricked into thinking they
can order IDs numerically as floats.
2) Moving to a double-click-to-select format is much better. People will
work with such IDs for long time when coding / debugging. Why making now
a choice that will impact this for the next years?
The only other viable option was "-", and that's what I did. Thanks.
The core of this change is the implementation of stream trimming, and
the resulting MAXLEN option of XADD as a trivial result of having
trimming functionalities. MAXLEN already works but in order to be more
efficient listpack GC should be implemented, currently marked as a TODO
item inside the comments.
Listpack max size is a tradeoff between space and time. A 2k max entry
puts the memory usage approximately at a similar order of magnitude (5
million entries went from 96 to 120 MB), but the range queries speed
doubled (because there are half entries to scan in the average case).
Lower values could be considered, or maybe this parameter should be
made tunable.
We used to have the master ID stored at the start of the listpack,
however using the key directly makes more sense in order to create a
space efficient representation: anyway the key at the radix tree is very
unlikely to change because of how the stream is implemented. Moreover on
nodes merging, to rewrite the merged listpacks is anyway the most
sensible operation, and we can use the iterator and the append-to-stream
function in order to avoid re-implementing the code needed for merging.
This commit also adds two items at the start of the listpack: the
number of valid items inside the listpack, and the number of items
marked as deleted. This means that there is no need to scan a listpack
in order to understand if it's a good candidate for garbage collection,
if the ration between valid/deleted items triggers the GC.
The approach used is to set a fixed header at the start of every
listpack blob (that contains many entries). The header contains a
"master" ID and fields, that are initially just obtained from the first
entry inserted in the listpack, so that the first enty is always well
compressed. Later every new entry is checked against these fields, and
if it matches, the SAMEFIELD flag is set in the entry so that we know to
just use the master entry flags. The IDs are always delta-encoded
against the first entry. This approach avoids cascading effects in which
entries are encoded depending on the previous entries, in order to avoid
complexity and rewritings of the data when data is removed in the middle
(which is a planned feature).
blockForKeys() was not freeing the allocation holding the ID when the
key was already found busy. Fortunately the unit test checked explicitly
for blocking multiple times for the same key (copying a regression in
the blocking lists tests), so the bug was detected by the Redis test leak
checker.
XADD was suboptimal in the first incarnation of the command, not being
able to accept an ID (very useufl for replication), nor options for
having capped streams.
The keyspace notification for streams was not implemented.
A client may lose a lot of time between invocations of blocking XREAD,
for example because it is processing the messages or for any other
cause. When it returns back, it may provide a low enough message ID that
the server will block to send an unreasonable number of messages in a
single call. For this reason we set a COUNT when the client is blocked
with XREAD calls, even if no COUNT is given. This is arbitrarily set to
1000 because it's enough to avoid slowing down the reception of many
messages, but low enough to avoid to block.
With lists we need to signal only on key creation, but streams can
provide data to clients listening at every new item added.
To make this slightly more efficient we now track different classes of
blocked clients to avoid signaling keys when there is nobody listening.
A typical case is when the stream is used as a time series DB and
accessed only by range with XRANGE.
After a few attempts it looked quite saner to just add the last item ID
at the end of the serialized listpacks, instead of scanning the last
listpack loaded from head to tail just to fetch it. It's a disk space VS
CPU-and-simplicity tradeoff basically.
Related to #4483. As suggested by @soloestoy, we can retrieve the SHA1
from the body. Given that in the new implementation using AUX fields we
ended copying around a lot to create new objects and strings, extremize
such concept and trade CPU for space inside the RDB file.
This is currently needed in order to fix#4483, but this can be
useful in other contexts, so maybe later we may want to remove the
conditionals and always save/load scripts.
Note that we are using the "lua" AUX field here, in order to guarantee
backward compatibility of the RDB file. The unknown AUX fields must be
discarded by past versions of Redis.
Doing the following ended with a broken server.executable:
1. Start Redis with src/redis-server
2. Send CONFIG SET DIR /tmp/
3. Send DEBUG RESTART
At this point we called execve with an argv[0] that is no longer related
to the new path. So after the restart the absolute path of the
executable is recomputed in the wrong way. With this fix we pass the
absolute path already computed as argv[0].
This adds a new `addReplyHelp` helper that's used by commands
when returning a help text. The following commands have been
touched: DEBUG, OBJECT, COMMAND, PUBSUB, SCRIPT and SLOWLOG.
WIP
Fix entry command table entry for OBJECT for HELP option.
After #4472 the command may have just 2 arguments.
Improve OBJECT HELP descriptions.
See #4472.
WIP 2
WIP 3
See #4192, the original PR removed lines of code that are actually
needed, so thanks to @chunqiulfq for reporting the problem, but merging
solution from @jeesyn after checking, together with @artix75, that the
logic covers all the cases.
Firstly, use access time to replace the decreas time of LFU.
For function LFUDecrAndReturn,
it should only try to get decremented counter,
not update LFU fields, we will update it in an explicit way.
And we will times halve the counter according to the times of
elapsed time than server.lfu_decay_time.
Everytime a key is accessed, we should update the LFU
including update access time, and increment the counter after
call function LFUDecrAndReturn.
If a key is overwritten, the LFU should be also updated.
Then we can use `OBJECT freq` command to get a key's frequence,
and LFUDecrAndReturn should be called in `OBJECT freq` command
in case of the key has not been accessed for a long time,
because we update the access time only when the key is read or
overwritten.
getLongLongFromObject calls string2ll which has this line:
/* Return if not all bytes were used. */
so if you pass an sds with 3 characters "1\01" it will fail.
but getLongDoubleFromObject calls strtold, and considers it ok if eptr[0]==`\0`
i.e. if the end of the string found by strtold ends with null terminator
127.0.0.1:6379> set a 1
OK
127.0.0.1:6379> setrange a 2 2
(integer) 3
127.0.0.1:6379> get a
"1\x002"
127.0.0.1:6379> incrbyfloat a 2
"3"
127.0.0.1:6379> get a
"3"
For example:
1. A module command called within a MULTI section.
2. A Lua script with replicate_commands() called within a MULTI section.
3. A module command called from a Lua script in the above context.
Normally in modern Redis you can't create zero-len lists, however it's
possible to load them from old RDB files generated, for instance, using
Redis 2.8 (see issue #4409). The "Right Thing" would be not loading such
lists at all, but this requires to hook in rdb.c random places in a not
great way, for a problem that is at this point, at best, minor.
Here in this commit instead I just fix the fact that zero length lists,
materialized as quicklists with the first node set to NULL, were
iterated in the wrong way while they are saved, leading to a crash.
The other parts of the list implementation are apparently able to deal
with empty lists correctly, even if they are no longer a thing.
Since SDS v2, we no longer have a single header, so the function to
rewrite the SDS in terms of the minimum space required, instead of just
using realloc() and let the underlying allocator decide what to do,
was doing an allocation + copy every time the minimum possible header
needed to represent the string was different than the current one.
This could be often a bit wasteful, because if we go, for instance, from
the 32 bit fields header to the 16 bit fields header, the overhead of
the header is normally very small. With this commit we call realloc
instead, unless the change in header size is very significant in relation
to the string length.
When we free the backlog, we should use a new
replication ID and clear the ID2. Since without
backlog we can not increment master_repl_offset
even do write commands, that may lead to inconsistency
when we try to connect a "slave-before" master
(if this master is our slave before, our replid
equals the master's replid2). As the master have our
history, so we can match the master's replid2 and
second_replid_offset, that make partial sync work,
but the data is inconsistent.
There was not enough sanity checking in the code loading the slots of
Redis Cluster from the nodes.conf file, this resulted into the
attacker's ability to write data at random addresses in the process
memory, by manipulating the index of the array. The bug seems
exploitable using the following techique: the config file may be altered so
that one of the nodes gets, as node ID (which is the first field inside the
structure) some data that is actually executable: then by writing this
address in selected places, this node ID part can be executed after a
jump. So it is mostly just a matter of effort in order to exploit the
bug. In practice however the issue is not very critical because the
bug requires an unprivileged user to be able to modify the Redis cluster
nodes configuration, and at the same time this should result in some
gain. However Redis normally is unprivileged as well. Yet much better to
have this fixed indeed.
Fix#4278.
Certain checks were useless, at the same time certain malformed inputs
were accepted without problems (emtpy strings parsed as zero).
Cases where strtod() returns ERANGE but we still want to parse the input
where ok in getDoubleFromObject() but not in the long variant.
As a side effect of these fixes, this commit fixes#4391.
This commit is a reinforcement of commit c1c99e9.
1. Replication information can be stored when the RDB file is
generated by a mater using server.slaveseldb when server.repl_backlog
is not NULL, or set repl_stream_db be -1. That's safe, because
NULL server.repl_backlog will trigger full synchronization,
then master will send SELECT command to replicaiton stream.
2. Only do rdbSave* when rsiptr is not NULL,
if we do rdbSave* without rdbSaveInfo, slave will miss repl-stream-db.
3. Save the replication informations also in the case of
SAVE command, FLUSHALL command and DEBUG reload.
This commit attempts to fix a number of bugs reported in #4316.
They are related to the way replication info like replication ID,
offsets, and currently selected DB in the master client, are stored
and loaded by Redis. In order to avoid inconsistencies the changes in
this commit try to enforce that:
1. Replication information are only stored when the RDB file is
generated by a slave that has a valid 'master' client, so that we can
always extract the currently selected DB.
2. When replication informations are persisted in the RDB file, all the
info for a successful PSYNC or nothing is persisted.
3. The RDB replication informations are only loaded if the instance is
configured as a slave, otherwise a master can start with IDs that relate
to a different history of the data set, and stil retain such IDs in the
future while receiving unrelated writes.
A slave may be started with an RDB file able to provide enough slave to
perform a successful partial SYNC with its master. However in such a
case, how outlined in issue #4268, the slave backlog will not be
started, since it was only initialized on full syncs attempts. This
creates different problems with successive PSYNC attempts that will
always result in full synchronizations.
Thanks to @fdingiit for discovering the issue.
when SHUTDOWN command is recived it is possible that some of the recent
command were not yet flushed from the AOF buffer, and the server
experiences data loss at shutdown.
Lua scripting does not support calling blocking commands, however all
the native Redis commands are flagged as "s" (no scripting flag), so
this is not possible at all. With modules there is no such mechanism in
order to flag a command as non callable by the Lua scripting engine,
moreover we cannot trust the modules users from complying all the times:
it is likely that modules will be released to have blocking commands
without such commands being flagged correctly, even if we provide a way to
signal this fact.
This commit attempts to address the problem in a short term way, by
detecting that a module is trying to block in the context of the Lua
scripting engine client, and preventing to do this. The module will
actually believe to block as usually, but what happens is that the Lua
script receives an error immediately, and the background call is ignored
by the Redis engine (if not for the cleanup callbacks, once it
unblocks).
Long term, the more likely solution, is to introduce a new call called
RedisModule_GetClientFlags(), so that a command can detect if the caller
is a Lua script, and return an error, or avoid blocking at all.
Being the blocking API experimental right now, more work is needed in
this regard in order to reach a level well blocking module commands and
all the other Redis subsystems interact peacefully.
Now the effect is like the following:
127.0.0.1:6379> eval "redis.call('hello.block',1,5000)" 0
(error) ERR Error running script (call to
f_b5ba35ff97bc1ef23debc4d6e9fd802da187ed53): @user_script:1: ERR
Blocking module command called from Lua script
This commit fixes issue #4127 in the short term.
This function failed when an internal-only flag was set as an only flag
in a node: the string was trimmed expecting a final comma before
exiting the function, causing a crash. See issue #4142.
Moreover generation of flags representation only needed at DEBUG log
level was always performed: a waste of CPU time. This is fixed as well
by this commit.
The function cache was not working at all, and the function returned
wrong values if there where two or more modules exporting native data
types.
See issue #4131 for more details.
In Redis 4.0 replication, with the introduction of PSYNC2, masters and
slaves replicate commands to cascading slaves and to the replication
backlog itself in a different way compared to the past.
Masters actually replicate the effects of client commands.
Slaves just propagate what they receive from masters.
This mechanism can cause problems when the configuration of an instance
is changed from master to slave inside a transaction. For instance
we could send to a master instance the following sequence:
MULTI
SLAVEOF 127.0.0.1 0
EXEC
SLAVEOF NO ONE
Before the fixes in this commit, the MULTI command used to be propagated
into the replication backlog, however after the SLAVEOF command the
instance is a slave, so the EXEC implementation failed to also propagate
the EXEC command. When the slaves of the above instance reconnected,
they were incrementally synchronized just sending a "MULTI". This put
the master client (in the slaves) into MULTI state, breaking the
replication.
Notably even Redis Sentinel uses the above approach in order to guarantee
that configuration changes are always performed together with rewrites
of the configuration and with clients disconnection. Sentiel does:
MULTI
SLAVEOF ...
CONFIG REWRITE
CLIENT KILL TYPE normal
EXEC
So this was a really problematic issue. However even with the fix in
this commit, that will add the final EXEC to the replication stream in
case the instance was switched from master to slave during the
transaction, the result would be to increment the slave replication
offset, so a successive reconnection with the new master, will not
permit a successful partial resynchronization: no way the new master can
provide us with the backlog needed, we incremented our offset to a value
that the new master cannot have.
However the EXEC implementation waits to emit the MULTI, so that if the
commands inside the transaction actually do not need to be replicated,
no commands propagation happens at all. From multi.c:
if (!must_propagate && !(c->cmd->flags & (CMD_READONLY|CMD_ADMIN))) {
execCommandPropagateMulti(c);
must_propagate = 1;
}
The above code is already modified by this commit you are reading.
Now also ADMIN commands do not trigger the emission of MULTI. It is actually
not clear why we do not just check for CMD_WRITE... Probably I wrote it this
way in order to make the code more reliable: better to over-emit MULTI
than not emitting it in time.
So this commit should indeed fix issue #3836 (verified), however it looks
like some reconsideration of this code path is needed in the long term.
BONUS POINT: The reverse bug.
Even in a read only slave "B", in a replication setup like:
A -> B -> C
There are commands without the READONLY nor the ADMIN flag, that are also
not flagged as WRITE commands. An example is just the PING command.
So if we send B the following sequence:
MULTI
PING
SLAVEOF NO ONE
EXEC
The result will be the reverse bug, where only EXEC is emitted, but not the
previous MULTI. However this apparently does not create problems in practice
but it is yet another acknowledge of the fact some work is needed here
in order to make this code path less surprising.
Note that there are many different approaches we could follow. For instance
MULTI/EXEC blocks containing administrative commands may be allowed ONLY
if all the commands are administrative ones, otherwise they could be
denined. When allowed, the commands could simply never be replicated at all.
In general we do not want before/after sleep() callbacks to be called
when we re-enter the event loop, since those calls are only designed in
order to perform operations every main iteration of the event loop, and
re-entering is often just a way to incrementally serve clietns with
error messages or other auxiliary operations. However, if we call the
callbacks, we are then forced to think at before/after sleep callbacks
as re-entrant, which is much harder without any good need.
However here there was also a clear bug: beforeSleep() was actually
never called when re-entering the event loop. But the new afterSleep()
callback was. This is broken and in this instance re-entering
afterSleep() caused a modules GIL dead lock.
Redis clients need to have an instantaneous idea of the amount of memory
they are consuming (if the number is not exact should at least be
proportional to the actual memory usage). We do that adding and
subtracting the SDS length when pushing / popping from the client->reply
list. However it is quite simple to add bugs in such a setup, by not
taking the objects in the list and the count in sync. For such reason,
Redis has an assertion to track counts near 2^64: those are always the
result of the counter wrapping around because we subtract more than we
add. This commit adds the symmetrical assertion: when the list is empty
since we sent everything, the reply_bytes count should be zero. Thanks
to the new assertion it should be simple to also detect the other
problem, where the count slowly increases because of over-counting.
The assertion adds a conditional in the code that sends the buffer to
the socket but should not create any measurable performance slowdown,
listLength() just accesses a structure field, and this code path is
totally dominated by write(2).
Related to #4100.
This commit closes issue #3698, at least for now, since the root cause
was not fixed: the bounding box function, for huge radiuses, does not
return a correct bounding box, there are points still within the radius
that are left outside.
So when using GEORADIUS queries with radiuses in the order of 5000 km or
more, it was possible to see, at the edge of the area, certain points
not correctly reported.
Because the bounding box for now was used just as an optimization, and
such huge radiuses are not common, for now the optimization is just
switched off when the radius is near such magnitude.
Three test cases found by the Continuous Integration test were added, so
that we can easily trigger the bug again, both for regression testing
and in order to properly fix it as some point in the future.
This feature was proposed by @rosmo in PR #2643 and later redesigned
in order to fit better with the other options for non-interactive modes
of redis-cli. The idea is basically to allow to collect latency
information in scripts, cron jobs or whateever, just running for a
limited time and then producing a single output.
Issue #4084 shows how for a design error, GEORADIUS is a write command
because of the STORE option. Because of this it does not work
on readonly slaves, gets redirected to masters in Redis Cluster even
when the connection is in READONLY mode and so forth.
To break backward compatibility at this stage, with Redis 4.0 to be in
advanced RC state, is problematic for the user base. The API can be
fixed into the unstable branch soon if we'll decide to do so in order to
be more consistent, and reease Redis 5.0 with this incompatibility in
the future. This is still unclear.
However, the ability to scale GEO queries in slaves easily is too
important so this commit adds two read-only variants to the GEORADIUS
and GEORADIUSBYMEMBER command: GEORADIUS_RO and GEORADIUSBYMEMBER_RO.
The commands are exactly as the original commands, but they do not
accept the STORE and STOREDIST options.
This is the first step towards getting rid of HMSET which is a command
that does not make much sense once HSET is variadic, and has a saner
return value.
The original RDB serialization format was not parsable without the
module loaded, becuase the structure was managed only by the module
itself. Moreover RDB is a streaming protocol in the sense that it is
both produce di an append-only fashion, and is also sometimes directly
sent to the socket (in the case of diskless replication).
The fact that modules values cannot be parsed without the relevant
module loaded is a problem in many ways: RDB checking tools must have
loaded modules even for doing things not involving the value at all,
like splitting an RDB into N RDBs by key or alike, or just checking the
RDB for sanity.
In theory module values could be just a blob of data with a prefixed
length in order for us to be able to skip it. However prefixing the values
with a length would mean one of the following:
1. To be able to write some data at a previous offset. This breaks
stremaing.
2. To bufferize values before outputting them. This breaks performances.
3. To have some chunked RDB output format. This breaks simplicity.
Moreover, the above solution, still makes module values a totally opaque
matter, with the fowllowing problems:
1. The RDB check tool can just skip the value without being able to at
least check the general structure. For datasets composed mostly of
modules values this means to just check the outer level of the RDB not
actually doing any checko on most of the data itself.
2. It is not possible to do any recovering or processing of data for which a
module no longer exists in the future, or is unknown.
So this commit implements a different solution. The modules RDB
serialization API is composed if well defined calls to store integers,
floats, doubles or strings. After this commit, the parts generated by
the module API have a one-byte prefix for each of the above emitted
parts, and there is a final EOF byte as well. So even if we don't know
exactly how to interpret a module value, we can always parse it at an
high level, check the overall structure, understand the types used to
store the information, and easily skip the whole value.
The change is backward compatible: older RDB files can be still loaded
since the new encoding has a new RDB type: MODULE_2 (of value 7).
The commit also implements the ability to check RDB files for sanity
taking advantage of the new feature.
It looks safer to return C_OK from freeMemoryIfNeeded() when clients are
paused because returning C_ERR may prevent success of writes. It is
possible that there is no difference in practice since clients cannot
execute writes while clients are paused, but it looks more correct this
way, at least conceptually.
Related to PR #4028.
1. brpop last key index, thus checking all keys for slots.
2. Memory leak in clusterRedirectBlockedClientIfNeeded.
3. Remove while loop in clusterRedirectBlockedClientIfNeeded.