When a client blocks for a consumer group, we don't know the actual ID
we want to be served: other clients blocked in the same consumer group
may be served first, so the consumer group latest delivered ID changes.
This was not handled correctly: all the clients in the consumer group
except the first were unblocked without data.
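A minimal sketch of the idea, assuming a heavily simplified group whose entry IDs are plain non-zero integers (the type `cg_sketch` and the function `serve_blocked_client` are hypothetical, not Redis internals): the ">" ID is resolved against the group's last delivered ID at the moment the client is served, not when it blocked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified consumer group: entry IDs are plain
 * non-zero integers and the stream is a sorted array. */
typedef struct {
    uint64_t last_delivered_id;
} cg_sketch;

/* Serve one client blocked on ">" using the group state as it is at
 * serve time, not as it was when the client blocked. Returns the ID
 * delivered, or 0 if another client already consumed everything new;
 * in that case the caller should keep this client blocked instead of
 * unblocking it with an empty reply. */
static uint64_t serve_blocked_client(cg_sketch *g, const uint64_t *ids, size_t n) {
    assert(n == 0 || ids != NULL);
    for (size_t i = 0; i < n; i++) {
        if (ids[i] > g->last_delivered_id) {
            g->last_delivered_id = ids[i];
            return ids[i];
        }
    }
    return 0;
}
```

With one new entry and two blocked clients, the first call returns the entry and the second returns 0, so only the first client is unblocked and the second keeps waiting.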
With such information we will be able to use a private localtime()
implementation in serverLog(), which does not use any locking and is both
thread-safe and fork()-safe.
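A sketch of what such a lock-free conversion can look like, assuming the "information" is a timezone offset and a DST flag sampled once at startup (the function name and its parameters are hypothetical). It only does arithmetic on its arguments, so there is no lock to deadlock on in a forked child.

```c
#include <assert.h>
#include <time.h>

static int is_leap_year(long year) {
    return (year % 4 == 0) && (year % 100 != 0 || year % 400 == 0);
}

/* Hypothetical lock-free localtime(): converts t (seconds since the
 * epoch) using a timezone offset `tz` (seconds west of UTC, like POSIX
 * `timezone`) and a DST flag captured once at startup. It touches no
 * global state, so it is safe to call from threads or after fork(). */
static void nolocks_localtime_sketch(struct tm *tmp, time_t t, long tz, int dst) {
    const time_t secs_min = 60, secs_hour = 3600, secs_day = 3600 * 24;

    t -= tz;            /* shift to local standard time */
    t += 3600 * dst;    /* apply daylight saving time */
    assert(t >= 0);     /* this sketch does not handle pre-1970 dates */

    time_t days = t / secs_day;
    time_t seconds = t % secs_day;
    tmp->tm_isdst = dst;
    tmp->tm_hour = (int)(seconds / secs_hour);
    tmp->tm_min = (int)((seconds % secs_hour) / secs_min);
    tmp->tm_sec = (int)(seconds % secs_min);
    tmp->tm_wday = (int)((days + 4) % 7);  /* 1 Jan 1970 was a Thursday */

    /* Consume whole years. */
    long year = 1970;
    for (;;) {
        time_t days_this_year = 365 + is_leap_year(year);
        if (days_this_year > days) break;
        days -= days_this_year;
        year++;
    }
    tmp->tm_year = (int)(year - 1900);
    tmp->tm_yday = (int)days;

    /* Consume whole months. */
    int mdays[12] = {31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31};
    if (is_leap_year(year)) mdays[1] = 29;
    int mon = 0;
    while (days >= mdays[mon]) {
        days -= mdays[mon];
        mon++;
    }
    tmp->tm_mon = mon;
    tmp->tm_mday = (int)days + 1;
}
```

For example, t = 951868800 with tz = 0 and dst = 0 yields 1 March 2000, a Wednesday.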
PR #5081 fixes an "interesting" bug about Redis Cluster failover, but more
generally about the updating of repl_down_since, which is used to count
the time a slave was left disconnected from its master.
While the provided fix resolves the specific issue, in general
repl_down_since is only valid in states other than CONNECTED, and the
disconnection time is set when the state becomes DISCONNECTED. Moreover,
to go from CONNECTED to any other state, the state machine must always
pass through DISCONNECTED first. So it makes sense to set the field to
zero (since it is meaningless in that context) when the state is set to
CONNECTED.
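A minimal sketch of the rule, assuming a reduced three-state link (the enum and the `set_repl_state` helper are hypothetical, not the real Redis replication states): funnelling every state change through one function keeps repl_down_since consistent.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical, reduced replica link states. */
typedef enum { RS_DISCONNECTED, RS_HANDSHAKE, RS_CONNECTED } repl_state_sketch;

typedef struct {
    repl_state_sketch state;
    time_t repl_down_since; /* meaningful only while state != RS_CONNECTED */
} replica_sketch;

/* Single place where the state changes, so repl_down_since cannot go stale. */
static void set_repl_state(replica_sketch *r, repl_state_sketch next, time_t now) {
    /* Leaving CONNECTED always goes through DISCONNECTED first. */
    assert(r->state != RS_CONNECTED || next == RS_DISCONNECTED || next == RS_CONNECTED);
    if (next == RS_CONNECTED)
        r->repl_down_since = 0;          /* meaningless while connected */
    else if (next == RS_DISCONNECTED && r->state == RS_CONNECTED)
        r->repl_down_since = now;        /* the link was just lost */
    r->state = next;
}
```

A failed handshake that falls back to DISCONNECTED does not reset the timestamp in this sketch, so the disconnection time keeps counting from when the link was actually lost.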
Instead of telling the user to set the renamed command to "" to remove
the renaming, do the obvious thing when a command is renamed to itself.
So if I want to remove the renaming of PING, I just rename it to PING
again.
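A sketch of the "rename to itself removes the renaming" rule over a small fixed-size table. The table layout, `set_rename`, `lookup_rename`, and the case-insensitive comparison are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <strings.h>

#define MAX_RENAMES 16
#define NAME_LEN 32

typedef struct {
    char orig[NAME_LEN];
    char renamed[NAME_LEN];
} rename_entry;

typedef struct {
    rename_entry e[MAX_RENAMES];
    size_t len;
} rename_table;

/* Set orig -> renamed. Renaming a command to itself removes the
 * renaming. Returns 0 on success, -1 if the table is full. */
static int set_rename(rename_table *t, const char *orig, const char *renamed) {
    assert(strlen(orig) < NAME_LEN && strlen(renamed) < NAME_LEN);
    size_t i = 0;
    while (i < t->len && strcasecmp(t->e[i].orig, orig) != 0) i++;
    if (strcasecmp(orig, renamed) == 0) {
        if (i < t->len) t->e[i] = t->e[--t->len]; /* drop the entry */
        return 0;
    }
    if (i == t->len) {
        if (t->len == MAX_RENAMES) return -1;
        t->len++;
    }
    snprintf(t->e[i].orig, NAME_LEN, "%s", orig);
    snprintf(t->e[i].renamed, NAME_LEN, "%s", renamed);
    return 0;
}

/* Name to actually send for `orig`. */
static const char *lookup_rename(const rename_table *t, const char *orig) {
    for (size_t i = 0; i < t->len; i++)
        if (strcasecmp(t->e[i].orig, orig) == 0) return t->e[i].renamed;
    return orig;
}
```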
Unlike the BZPOP variants, these commands take a single key. This fixes
an erroneous CROSSSLOT error when passing a count to a cluster-enabled
server.
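To see how a count can be mistaken for a key, here is a sketch of expanding a first/last/step key spec into argv positions. The specific specs in the usage (1,-1,1 for the buggy case, 1,1,1 for the fix, 1,-2,1 for BZPOPMIN) are assumptions for illustration, and `keys_from_spec` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Expand a first/last/step key spec into argv positions. A negative
 * `last` counts from the end (-1 = last argument). Returns the number
 * of positions written to `out`, at most `cap`. */
static size_t keys_from_spec(int argc, int first, int last, int step,
                             int *out, size_t cap) {
    assert(step > 0);
    if (last < 0) last = argc + last;
    size_t n = 0;
    for (int j = first; j <= last && j < argc && n < cap; j += step)
        out[n++] = j;
    return n;
}
```

For `ZPOPMIN myzset 2` (argc 3), a spec ending at -1 reports positions 1 and 2, so the count "2" is hashed as a second key and cluster mode may see two slots; a spec of 1,1,1 reports only position 1.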
RESTORE now supports:
1. Setting LRU/LFU
2. Absolute-time TTL
Other related changes:
1. RDB loading will not override LRU bits when the RDB file
does not contain the LRU opcode.
2. RDB loading will not set LRU/LFU bits if the server's
maxmemory-policy does not match.
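RESTORE exposes these through its REPLACE, ABSTTL, IDLETIME and FREQ options. The sketch below parses them and derives the expire time; treating IDLETIME and FREQ as mutually exclusive and limiting FREQ to 0..255 are assumptions of this sketch, and all names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <strings.h>

typedef struct {
    int replace, absttl;
    long long idletime; /* seconds, -1 if not given */
    long long freq;     /* 0..255, -1 if not given */
} restore_opts;

static int parse_ll(const char *s, long long *out) {
    char *end;
    long long v = strtoll(s, &end, 10);
    if (end == s || *end != '\0') return -1;
    *out = v;
    return 0;
}

/* Parse the options after RESTORE key ttl payload.
 * Returns 0 on success, -1 on a syntax error. */
static int parse_restore_opts(int argc, const char **argv, restore_opts *o) {
    assert(argc >= 4);
    o->replace = o->absttl = 0;
    o->idletime = o->freq = -1;
    for (int j = 4; j < argc; j++) {
        int more = j + 1 < argc;
        if (!strcasecmp(argv[j], "REPLACE")) {
            o->replace = 1;
        } else if (!strcasecmp(argv[j], "ABSTTL")) {
            o->absttl = 1;
        } else if (!strcasecmp(argv[j], "IDLETIME") && more && o->freq == -1) {
            if (parse_ll(argv[++j], &o->idletime) || o->idletime < 0) return -1;
        } else if (!strcasecmp(argv[j], "FREQ") && more && o->idletime == -1) {
            if (parse_ll(argv[++j], &o->freq) || o->freq < 0 || o->freq > 255) return -1;
        } else {
            return -1;
        }
    }
    return 0;
}

/* Absolute expire in ms, or -1 if ttl is 0 (no expire). */
static long long restore_expire_ms(long long ttl, int absttl, long long now_ms) {
    if (ttl == 0) return -1;
    return absttl ? ttl : now_ms + ttl;
}
```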
This avoids the extra 8 bytes we store before each pointer.
But, more importantly, it makes the valgrind runs more similar
to our normal runs.
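For context, a sketch of the size-prefix technique those 8 bytes come from: when the allocator cannot report a block's size, the wrapper stores it in a header in front of the returned pointer (sizeof(size_t), i.e. 8 bytes on 64-bit systems). With glibc's malloc_usable_size() the header is unnecessary. All names here are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical size-prefixed allocator. Note that the returned pointer is
 * offset by PREFIX_SIZE, which can weaken malloc's alignment guarantee. */
#define PREFIX_SIZE sizeof(size_t)

static size_t used_memory = 0;

static void *prefixed_malloc(size_t size) {
    unsigned char *p = malloc(size + PREFIX_SIZE);
    if (p == NULL) return NULL;
    memcpy(p, &size, PREFIX_SIZE);   /* header holds the requested size */
    used_memory += size;
    return p + PREFIX_SIZE;
}

static size_t prefixed_size(void *ptr) {
    assert(ptr != NULL);
    size_t size;
    memcpy(&size, (unsigned char *)ptr - PREFIX_SIZE, PREFIX_SIZE);
    return size;
}

static void prefixed_free(void *ptr) {
    if (ptr == NULL) return;
    used_memory -= prefixed_size(ptr);
    free((unsigned char *)ptr - PREFIX_SIZE);
}
```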
Note: the change in the malloc_stats struct in server.h is to eliminate a name conflict.
Structs that are not typedefed are resolved from a separate namespace.
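A small illustration of the namespace rule, using hypothetical names so it does not depend on any system header (glibc's <malloc.h> declares a malloc_stats() function, which is why a typedef with that name would collide). A struct tag lives in the tag namespace, so a function with the same spelling is legal; a typedef name would share the ordinary-identifier namespace with the function and fail to compile.

```c
#include <assert.h>
#include <stddef.h>

/* The tag alloc_report lives in the struct/union/enum tag namespace... */
struct alloc_report {
    size_t allocated;
    size_t rss;
};

/* ...so this function with the same spelling does not clash.
 * `typedef struct { ... } alloc_report;` would be a redeclaration error,
 * because typedef names and functions share one namespace. */
static size_t alloc_report(const struct alloc_report *r) {
    assert(r != NULL);
    return r->allocated + r->rss;
}
```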
Due to an incorrect forward declaration, it didn't provide all arguments.
This led to a random value being read from the stack and an incorrect time being returned,
which in this case doesn't matter since no one uses it.
Basically we cannot be sure that, if a key is found expired while writing
the AOF, the main thread will also find it expired. There are possible
race conditions, like the moment at which "now" is sampled, and the fact
that time may jump backward.
Think about the following:
SET a 5
EXPIRE a 1
AOF rewrite starts after about 1 second. The child process finds the key
expired, while in the main thread an INCR command is called against the
key "a" immediately after the fork, and the scheduler was faster to give
execution time to the main thread, so "a" is not yet expired.
The main thread will generate an INCR a command for the AOF log that will
be appended to the rewritten AOF file, but that INCR command will target
a non-existing "a" key, so a new non-volatile key "a" will be created.
Two observations:
A) In theory, by computing "now" before the fork, we should be sure that
if a key is expired at that time, it will still be expired later when the
main thread tries to access it. However this does not take into account
the fact that the computer time may jump backward.
B) Technically we may still make the process safe by using a monotonic
time source.
However there were other similar related bugs, and in general the new
"vision" is that Redis persistence files should represent the memory
state without trying to be too smart: this makes the design more
consistent and bugs arising from complex interactions less likely, and
in the end what needs fixing is the Redis expire process, so that fewer
expired keys linger in RAM.
Thanks to Oran Agra and Guy Benoish for writing me an email outlining
this problem, after they conducted a Redis 5 code review.
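A sketch of the resulting rewrite rule: the child writes every key together with its absolute expire, even if the key already looks expired to it, and leaves expiration to the server. The function name is hypothetical, and inline commands are used for readability where the real AOF uses the RESP multi-bulk format.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical rewrite step for a string key. No "skip if expired" check:
 * the file mirrors memory. expire_ms == -1 means no TTL.
 * Returns the bytes written, or -1 if buf is too small. */
static int rewrite_string_key(char *buf, size_t cap, const char *key,
                              const char *val, long long expire_ms) {
    assert(buf != NULL && cap > 0);
    int n = snprintf(buf, cap, "SET %s %s\r\n", key, val);
    if (n < 0 || (size_t)n >= cap) return -1;
    if (expire_ms != -1) {
        int m = snprintf(buf + n, cap - n, "PEXPIREAT %s %lld\r\n", key, expire_ms);
        if (m < 0 || (size_t)m >= cap - n) return -1;
        n += m;
    }
    return n;
}
```

With this rule, the INCR appended by the main thread in the example above targets a key that still exists in the rewritten file, with its expire intact.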
The old version could not handle the fact that "STREAMS" is a valid key
name for streams. Now we actually parse the command the same way the
command implementation does.
Related to #5028 and #4857.
The loop allocated a buffer for the right number of key positions, then
overflowed it by going past the limit.
Related to #4857 and cause of the memory violation seen in #5028.
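Both fixes concern computing key positions for XREAD/XREADGROUP. A sketch, with a hypothetical function name: options are consumed the way the command parses them, so only the first STREAMS in option position starts the key list, and the array is filled with exactly the number of slots it was allocated for.

```c
#include <assert.h>
#include <stdlib.h>
#include <strings.h>

/* Positions of the stream keys in an XREAD / XREADGROUP argv. Returns a
 * malloc'ed array of exactly *numkeys entries, or NULL on a syntax error.
 * The caller frees the array. */
static int *xread_key_positions(int argc, const char **argv, int *numkeys) {
    assert(numkeys != NULL);
    int streams_pos = -1;
    for (int i = 1; i < argc; i++) {
        const char *arg = argv[i];
        if (!strcasecmp(arg, "BLOCK") || !strcasecmp(arg, "COUNT")) {
            i++;                  /* skip the option value */
        } else if (!strcasecmp(arg, "GROUP")) {
            i += 2;               /* skip group and consumer names */
        } else if (!strcasecmp(arg, "NOACK")) {
            /* flag without a value */
        } else if (!strcasecmp(arg, "STREAMS")) {
            streams_pos = i;      /* keys start right after this */
            break;
        } else {
            break;                /* unknown option: syntax error */
        }
    }
    if (streams_pos == -1) return NULL;
    int rest = argc - streams_pos - 1;   /* keys followed by IDs */
    if (rest == 0 || rest % 2 != 0) return NULL;
    int num = rest / 2;
    int *keys = malloc(sizeof(int) * (size_t)num);
    if (keys == NULL) return NULL;
    /* Fill exactly num slots: the keys are the first half after STREAMS. */
    for (int k = 0; k < num; k++) keys[k] = streams_pos + 1 + k;
    *numkeys = num;
    return keys;
}
```

For `XREAD COUNT 2 STREAMS STREAMS mystream 0 0`, this reports positions 4 and 5, i.e. the keys "STREAMS" and "mystream".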
Now a MAXLEN of 0 really does what it means: it will create a stream
with zero entries. This is useful to make sure that the behavior is
identical to XTRIM, which must be able to reduce the stream to zero
elements when MAXLEN is given.
Also, MAXLEN with a count < 0 now returns an error.
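A sketch of both rules over a toy array-backed stream (the types and helpers are hypothetical): negative counts are rejected at parse time, and exact trimming with maxlen == 0 removes every entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Parse a MAXLEN count; negative values are a syntax error. */
static int parse_maxlen(const char *s, long long *out) {
    char *end;
    long long v = strtoll(s, &end, 10);
    if (end == s || *end != '\0' || v < 0) return -1;
    *out = v;
    return 0;
}

/* Toy stream: entry IDs in insertion order. */
typedef struct {
    unsigned long long ids[16];
    size_t len;
} stream_sketch;

/* Drop the oldest entries until at most maxlen remain. maxlen == 0
 * empties the stream, as XTRIM MAXLEN 0 must. Returns entries removed. */
static size_t trim_to_maxlen(stream_sketch *s, long long maxlen) {
    assert(maxlen >= 0);
    if (s->len <= (size_t)maxlen) return 0;
    size_t removed = s->len - (size_t)maxlen;
    memmove(s->ids, s->ids + removed, (size_t)maxlen * sizeof s->ids[0]);
    s->len = (size_t)maxlen;
    return removed;
}
```

Under this rule, XADD with MAXLEN 0 adds the entry and immediately trims it, leaving an existing but empty stream.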
The ability of "SENTINEL SET" to change the reconfiguration script at
runtime is a problem even within the security model of Redis: any client
inside the network may set any executable to be run once a failover is
triggered.
This option adds protection against this problem: by default the two
SENTINEL SET subcommands modifying script paths are denied. However, the
user can still revert that using the Sentinel configuration file in
order to allow such a feature.
This way we let big endian systems still load old RDB versions.
However, newer versions will be saved and loaded in a way that makes RDB
expires cross-endian again. Thanks to @oranagra for the reporting and
the discussion about this problem, leading to this fix.
Currently it does not look sensible to generate events for stream
consumer group modifications, since they are metadata. However, at least
for key-level events, like the creation or removal of a consumer group,
I added a few events here and there. Later we can evaluate whether it
makes sense to add more. From the point of view of WATCH instead (in
Redis transactions) and signaling the key as modified, it looks like the
transaction should not fail when a stream's consumer groups are modified,
so no calls to signalModifiedKey() are made in consumer group related
functions.
Again thanks to @oranagra. The object idle time sometimes does not fit
into an int: use the native type that the serialization function takes
as argument, which is uint64_t.
A user with many connections (10 thousand) on a single Redis server
reports in issue #4983 that sometimes Redis stalls because at the same
time many clients need to resize their query buffer according to the old
policy.
It looks like this was caused by the fact that we normally allow the
query buffer to grow freely up to PROTO_MBULK_BIG_ARG, but when the
client is idle we are immediately stricter, and a query buffer greater
than 1024 bytes is already enough to trigger the resize. So, for
instance, if most of the clients stop at the same time, this issue
should be easily triggered.
This behavior actually looks odd: there should be a single clear limit
beyond which we say, let's look at this query buffer to check if it's
time to resize it. This commit puts the limit at PROTO_MBULK_BIG_ARG,
and the check is performed if the current size is too big compared to
the peak usage, or if the client is idle.
Then, when the check is performed, wasting just a few kbytes is
considered enough to proceed with the resize. This should fix the issue.
We unblocked the client too early, when the group name object was no
longer valid in client->bpop, so propagating XCLAIM later in
streamPropagateXCLAIM() dereferenced a field already set to NULL.
Now that we have SETID, the internals of consumer groups should be able
to handle the case of the same message delivered multiple times just
as a side effect of calling XREADGROUP. Normally this should never
happen, but if the admin manually runs "XGROUP SETID mykey mygroup 0",
messages will get re-delivered to clients waiting for the ">" special
ID. The consumer group internals were not able to handle the case of a
message re-delivered in these circumstances that was already assigned to
another owner.
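A sketch of handling such a re-delivery with a single group-level pending list where each entry records its owner (a simplification; the types and helpers are hypothetical). If the message is already pending, ownership moves to the new consumer instead of inserting a duplicate. Resetting the delivery counter treats it as a fresh delivery; that choice is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint64_t id;              /* message ID, simplified to an integer */
    int owner;                /* consumer that currently owns it */
    unsigned delivery_count;
} pending_sketch;

typedef struct {
    pending_sketch pel[32];
    size_t len;
} group_pel_sketch;

/* Record that message `id` was delivered to `consumer`. */
static void record_delivery(group_pel_sketch *g, uint64_t id, int consumer) {
    for (size_t i = 0; i < g->len; i++) {
        if (g->pel[i].id == id) {
            /* Already pending under some consumer: move it, don't duplicate. */
            g->pel[i].owner = consumer;
            g->pel[i].delivery_count = 1;
            return;
        }
    }
    assert(g->len < sizeof g->pel / sizeof g->pel[0]);
    g->pel[g->len].id = id;
    g->pel[g->len].owner = consumer;
    g->pel[g->len].delivery_count = 1;
    g->len++;
}

/* Owner of a pending message, or -1 if it is not pending. */
static int owner_of(const group_pel_sketch *g, uint64_t id) {
    for (size_t i = 0; i < g->len; i++)
        if (g->pel[i].id == id) return g->pel[i].owner;
    return -1;
}
```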
Redis will not crash and will not be inconsistent after we call DEBUG LOADAOF.
Before this commit, there were 2 problems:
1. When appendonly is set to no and there is no appendonly file,
redis-server will crash if we call DEBUG LOADAOF.
2. When appendonly is set to no and there is an appendonly file,
redis-server will hold different data after loading the appendonly
file.
The AOF tail of a combined RDB+AOF is based on the premise of applying
the AOF commands to the exact state the server had while the RDB was
persisted. By expiring keys while loading the RDB file, we change that
state, so applying the AOF tail later may produce a different final state.
Test case:
* Time1: SET a 10
* Time2: EXPIREAT a $time5
* Time3: INCR a
* Time4: PERSIST a. Start BGREWRITEAOF with RDB preamble. The value of a is 11 without expire time.
* Time5: Restart redis from the RDB+AOF: consistency violation.
Thanks to @soloestoy for providing the patch.
Thanks to @trevor211 for the original issue report and the initial fix.
Check issue #4950 for more info.
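A sketch of the load-time rule: expired keys may be dropped while loading a plain RDB, but not while loading the RDB preamble of an AOF, because the AOF tail must be applied to the exact persisted state. The `is_master` condition is an assumption reflecting that replicas leave expiration to their master; the function name is hypothetical.

```c
#include <assert.h>

/* Returns 1 if an already-expired key read during loading may be dropped.
 * expire_ms == -1 means the key has no TTL. */
static int skip_expired_on_load(int is_master, int loading_aof_preamble,
                                long long expire_ms, long long now_ms) {
    assert(expire_ms == -1 || expire_ms >= 0);
    if (!is_master || loading_aof_preamble) return 0;
    return expire_ms != -1 && expire_ms < now_ms;
}
```

In the test case above, the key "a" loaded from the preamble is kept, so the PERSIST in the AOF tail applies to it and the final value is 11 without an expire.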