Now that we have SETID, the inetrnals of consumer groups should be able
to handle the case of the same message delivered multiple times just
as a side effect of calling XREADGROUP. Normally this should never
happen but if the admin manually "XGROUP SETID mykey mygroup 0",
messages will get re-delivered to clients waiting for the ">" special
ID. The consumer groups internals were not able to handle the case of a
message re-delivered in this circumstances that was already assigned to
another owner.
and will not be inconsistent after we call debug loadaof.
Before this commit, there were 2 problems:
1, When appendonly is set to no and there is not a appendonly file,
redis-server will crash if we call DEBUG LOADAOF.
2, When appendonly is set to no and there is a appendonly file,
redis-server will hold different data after loading appendonly
file.
The AOF tail of a combined RDB+AOF is based on the premise of applying
the AOF commands to the exact state that there was in the server while
the RDB was persisted. By expiring keys while loading the RDB file, we
change the state, so applying the AOF tail later may change the state.
Test case:
* Time1: SET a 10
* Time2: EXPIREAT a $time5
* Time3: INCR a
* Time4: PERSIT A. Start bgrewiteaof with RDB preamble. The value of a is 11 without expire time.
* Time5: Restart redis from the RDB+AOF: consistency violation.
Thanks to @soloestoy for providing the patch.
Thanks to @trevor211 for the original issue report and the initial fix.
Check issue #4950 for more info.
See issue #2819 for details. The gist is that when we want to send INFO
because we are over the time, we used to send only INFO commands, no
longer sending PING commands. However if a master fails exactly when we
are about to send an INFO command, the PING times will result zero
because the PONG reply was already received, and we'll fail to send more
PINGs, since we try only to send INFO commands: the failure detector
will delay until the connection is closed and re-opened for "long
timeout".
This commit changes the logic so that we can send the three kind of
messages regardless of the fact we sent another one already in the same
code path. It could happen that we go over the message limit for the
link by a few messages, but this is not significant. However now we'll
not introduce delays in sending commands just because there was
something else to send at the same time.
problems fixed:
* failing to read fragmentation information from jemalloc
* overflow in jemalloc fragmentation hint to the defragger
* test suite not triggering eviction after population
Usually blocking operations make a lot of sense with multiple keys so
that we can listen to multiple queues (or whatever the app models) with
a single connection. However in the synchronous case it is more useful
to be able to ask for N elements. This is a change that I also wanted to
perform soon or later in the blocking list variant, but here it is more
natural since there is no reply type difference.
Some times it was not released on error, sometimes it was released two
times because the error path expected the "di" var to be NULL if the
iterator was already released. Thanks to @oranagra for pinging me about
potential problems of this kind inside rdb.c.
- Almost all Cluster Manager related code moved to
the same section.
- Many macroes converted to functions
- Added various comments
- Little code restyling
This is useful in the reply and timeout callback, if the module wants to
do some cleanup of the blocked client handle that may be stored around
in the module-private data structures.
In some modules it may be useful to have an idea about being near to
OOM. Anyway additionally an explicit call to get the fill ratio will be
added in the future.
This way it is possible to use conditional compilation to be compatible
with a larger amount of Redis versions, however note that this breaks
binary compatibiltiy, so the module must be compiled with the
corresponding redismodule.h file depending on the version of Redis
targeted.
Note that this was an experimental API that can only be enabled with
REIDSMODULE_EXPERIMENTAL_API, so it is subject to change until its
promoted to stable API. Sorry for the breakage, it is trivial to
resolve btw. This change will not be back ported to Redis 4.0.
While this feature is not used by Redis, ae.c implements the ability for
a timer to call a finalizer callback when an timer event is deleted.
This feature was bugged since the start, and because it was never used
we never noticed a problem. However Anthony LaTorre was using the same
library in order to implement a different system: he found a bug that he
describes as follows, and which he fixed with the patch in this commit,
sent me by private email:
--- Anthony email ---
've found one bug in the current implementation of the timed events.
It's possible to lose track of a timed event if an event is added in
the finalizerProc of another event.
For example, suppose you start off with three timed events 1, 2, and
3. Then the linked list looks like:
3 -> 2 -> 1
Then, you run processTimeEvents and events 2 and 3 finish, so now the
list looks like:
-1 -> -1 -> 2
Now, on the next iteration of processTimeEvents it starts by deleting
the first event, and suppose this finalizerProc creates a new event,
so that the list looks like this:
4 -> -1 -> 2
On the next iteration of the while loop, when it gets to the second
event, the variable prev is still set to NULL, so that the head of the
event loop after the next event will be set to 2, i.e. after deleting
the next event the event loop will look like:
2
and the event with id 4 will be lost.
I've attached an example program to illustrate the issue. If you run
it you will see that it prints:
```
foo id = 0
spam!
```
But if you uncomment line 29 and run it again it won't print "spam!".
--- End of email ---
Test.c source code is as follows:
#include "ae.h"
#include <stdio.h>
aeEventLoop *el;
int foo(struct aeEventLoop *el, long long id, void *data)
{
printf("foo id = %lld\n", id);
return AE_NOMORE;
}
int spam(struct aeEventLoop *el, long long id, void *data)
{
printf("spam!\n");
return AE_NOMORE;
}
void bar(struct aeEventLoop *el, void *data)
{
aeCreateTimeEvent(el, 0, spam, NULL, NULL);
}
int main(int argc, char **argv)
{
el = aeCreateEventLoop(100);
//aeCreateTimeEvent(el, 0, foo, NULL, NULL);
aeCreateTimeEvent(el, 0, foo, NULL, bar);
aeMain(el);
return 0;
}
Anthony fixed the problem by using a linked list for the list of timers, and
sent me back this patch after he tested the code in production for some time.
The code looks sane to me, so committing it to Redis.
There are too many advantages in doing this, RDB is faster to persist,
more compact, much faster to load back. The main issues here are that
the code is less tested because this was not the old default (so we are
enabling it for the new 5.0 release), and that the AOF is no longer a
trivially parsable format from now on. However the non-preamble mode
will be supported in the future as well, if new data types will be
added.
This should be more than enough, even if in case of partial IDs that are
not found, we send all the IDs to the slave/AOF, but this is definitely
a corner case without bad effects if not some wasted space.
This is a big win for caching use cases, since on reloading Redis will
still have some idea about what is worth to evict and what not.
However this only solves part of the problem because the information is
only partially propagated to slaves (on write operations). Reads will
not affect slaves LFU and LRU counters, so after a failover the eviction
decisions are kinda random until keys start to collect some aging/freq info.
However since new slaves are initially populated via RDB file transfer,
this means that if we spin up a new slave from a master, and perform an
immediate manual failover (for instance in order to upgrade the master),
the slave will have eviction informations to use for some time.
The LFU/LRU info is persisted only if the maxmemory policy is set to one
of the relevant type, even if no actual "maxmemory" memory limit is
set.
XINFO is mainly an observability command that will be used more by
humans than computers, and even when used by computers it will be a very
low traffic command. For this reason the format was changed in order to
have field names. They'll consume some bandwidth and CPU cycles, but in
this context this is much better than having to understand what the
numbers in the output array are.