This is safer, since by default maxmemory should just set a memory limit
without deleting any key, unless the policy is set to something more
relaxed.
There were 2 spare bits inside the Redis object structure that are now
used to enlarge the range of the LRU field by 4x.
At the same time the resolution was improved from 10 seconds to 1
second: this still provides 194 days before the LRU counter overflows
(restarting from zero).
This is not a problem since it only causes lack of eviction precision for
objects not touched for a very long time, and the lack of precision is
only temporary.
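The arithmetic above can be sketched in C as follows. This is only an
illustration: the 24 bit width is inferred from the 194 days figure at
1 second resolution, and the names LRU_BITS, LRU_CLOCK_MAX,
lru_wrap_days() and lru_idle_seconds() are illustrative rather than
quoted from the Redis source.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed for illustration: a 24 bit LRU field at 1 second resolution. */
#define LRU_BITS 24
#define LRU_CLOCK_MAX ((1u << LRU_BITS) - 1) /* Largest counter value. */
#define LRU_CLOCK_RESOLUTION 1               /* Seconds per clock tick. */

/* Number of days before the counter wraps back to zero. */
static double lru_wrap_days(void) {
    return ((double)LRU_CLOCK_MAX + 1) * LRU_CLOCK_RESOLUTION / 86400.0;
}

/* Estimated idle time in seconds. If the clock wrapped once since the
 * object was last touched, the stored value is greater than the current
 * one, and the estimate is only approximate. */
static uint32_t lru_idle_seconds(uint32_t now, uint32_t obj_lru) {
    uint32_t ticks;

    assert(now <= LRU_CLOCK_MAX && obj_lru <= LRU_CLOCK_MAX);
    if (now >= obj_lru)
        ticks = now - obj_lru;
    else
        ticks = (LRU_CLOCK_MAX - obj_lru) + now;
    return ticks * LRU_CLOCK_RESOLUTION;
}
```

An object untouched for more than one full wrap period looks younger
than it is, which is the temporary loss of precision mentioned above.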
This new function is useful to get a number of random entries from a
hash table when we just need to do some sampling without a particularly
good distribution.
It just jumps to a random place in the hash table and returns the first
N items encountered by scanning linearly.
The main use of this function is to speed up Redis internal sampling of
the key space, for example for key eviction or expiry.
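A minimal sketch of the idea, using a toy chained hash table. The
"table" and "entry" types and the table_sample() function are
hypothetical and only stand in for the real dict implementation, which
is not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy chained hash table, only for illustration. */
typedef struct entry {
    const char *key;
    struct entry *next;
} entry;

typedef struct table {
    entry **buckets;
    size_t size; /* Number of buckets. */
    size_t used; /* Number of stored entries. */
} table;

/* Jump to a random bucket and collect the first `count` entries found
 * by scanning linearly, wrapping around at the end of the table.
 * Returns the number of entries stored in `out`. Neighbouring entries
 * are returned together, so the distribution is not a good one. */
static size_t table_sample(const table *t, entry **out, size_t count) {
    size_t stored = 0, visited = 0;

    assert(t != NULL && out != NULL);
    if (t->size == 0 || t->used == 0) return 0;
    if (count > t->used) count = t->used;

    size_t i = (size_t)rand() % t->size;
    while (stored < count && visited < t->size) {
        for (entry *e = t->buckets[i]; e && stored < count; e = e->next)
            out[stored++] = e;
        i = (i + 1) % t->size;
        visited++;
    }
    return stored;
}
```

The cost is a single random number plus a linear scan, instead of one
random lookup per sampled element.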
This is an improvement over the previous eviction algorithm: we now use
an eviction pool that is persistent across evictions of keys, and gets
populated with the best candidates for eviction found so far.
For a given number of samples, it approximates LRU eviction better than
the previous algorithm.
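One possible shape for such a pool is a small array kept sorted by idle
time, sketched below. POOL_SIZE, pool_entry, pool_offer() and
pool_pop_best() are hypothetical names, and the pool size is an
arbitrary value chosen for the example.

```c
#include <assert.h>
#include <string.h>

#define POOL_SIZE 4 /* Arbitrary size, for illustration only. */

typedef struct pool_entry {
    unsigned long idle; /* Estimated idle time of the key. */
    const char *key;    /* NULL marks an empty slot. */
} pool_entry;

/* The pool is sorted by ascending idle time, empty slots on the right.
 * Offer a sampled key: it is inserted only if there is free space or
 * if it is a better candidate than the worst entry in the pool. */
static void pool_offer(pool_entry *pool, const char *key, unsigned long idle) {
    int k = 0;

    assert(key != NULL);
    /* First slot that is empty or holds an idle time >= the candidate. */
    while (k < POOL_SIZE && pool[k].key && pool[k].idle < idle) k++;

    if (pool[POOL_SIZE - 1].key == NULL) {
        /* Free space on the right: shift entries to make room at k. */
        memmove(pool + k + 1, pool + k, sizeof(pool[0]) * (POOL_SIZE - k - 1));
    } else {
        /* Pool is full: a candidate worse than every entry is dropped. */
        if (k == 0) return;
        /* Otherwise discard the entry with the smallest idle time. */
        k--;
        memmove(pool, pool + 1, sizeof(pool[0]) * k);
    }
    pool[k].key = key;
    pool[k].idle = idle;
}

/* Remove and return the best candidate (highest idle time), or NULL. */
static const char *pool_pop_best(pool_entry *pool) {
    for (int k = POOL_SIZE - 1; k >= 0; k--) {
        if (pool[k].key) {
            const char *key = pool[k].key;
            pool[k].key = NULL;
            pool[k].idle = 0;
            return key;
        }
    }
    return NULL;
}
```

Since the pool survives between evictions, good candidates found by
earlier samplings are not lost when a different key is evicted first.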
For testing purposes it is handy to have a very high resolution of the
LRU clock, so that scripts running for just a few seconds are enough to
experiment with how the eviction algorithm works.
This commit allows Redis to use the cached LRU clock, or a value
computed on demand, depending on the resolution. So normally we have the
good performance of a precomputed value, and a clock that wraps after
many days using the normal resolution, but if needed, changing a define
will switch behavior to a high resolution LRU clock.
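The selection between the two sources could look like the sketch below.
All names and values here (LRU_CLOCK_RESOLUTION_MS, SERVER_HZ,
cached_lru_clock and the three functions) are assumptions made for the
example. The current time is passed in as a parameter to keep the
sketch free of system calls.

```c
#include <stdint.h>

#define LRU_CLOCK_MAX ((1u << 24) - 1)
#define LRU_CLOCK_RESOLUTION_MS 1000 /* Lower it, e.g. to 1, for tests. */
#define SERVER_HZ 10 /* Assumed refresh rate of the cached value. */

static uint32_t cached_lru_clock; /* Refreshed by a periodic timer. */

/* Compute the clock from the current time in milliseconds. */
static uint32_t compute_lru_clock(uint64_t now_ms) {
    return (uint32_t)((now_ms / LRU_CLOCK_RESOLUTION_MS) & LRU_CLOCK_MAX);
}

/* Called SERVER_HZ times per second by the periodic timer. */
static void refresh_cached_lru_clock(uint64_t now_ms) {
    cached_lru_clock = compute_lru_clock(now_ms);
}

/* Use the cached value only when it is refreshed at least once per
 * clock tick, otherwise pay the cost of computing it on demand. */
static uint32_t lru_clock(uint64_t now_ms) {
    if (1000 / SERVER_HZ <= LRU_CLOCK_RESOLUTION_MS)
        return cached_lru_clock;
    return compute_lru_clock(now_ms);
}
```

With the values above the cached path is taken. Setting the resolution
below the timer period makes the condition false, so every call computes
a fresh value.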
Test the sentinel.tilt condition at the top and return if it is true.
This makes it possible to remove the check for the tilt condition in the
remaining code paths of the function.
Failure detection in Sentinel is ping-pong based. It used to work by
remembering the last time a valid PONG reply was received, and checking
if the reception time was too old compared to the current time.
PINGs were sent at a fixed interval of 1 second.
This works decently, but does not scale well when we want to set very
small values of "down-after-milliseconds" (this is basically the node
timeout).
This commit reimplements the failure detection, making a number of
changes. Some changes are inspired by the Redis Cluster failure
detection code:
* A new last_ping_time field is added in the representation of instances.
If non zero, we have an active ping that was sent at the specified
time. When a valid reply to ping is received, the field is zeroed
again.
* last_ping_time is not reset when we reconnect the link or send a new
ping, so from our point of view it represents the time we started
waiting for the instance to reply to our pings without receiving a
reply.
* last_ping_time is now used in order to check if the instance is
timed out. This means that we can have a node timeout of 100
milliseconds and yet the system will work well since the new check is
not bound to the period used to send pings.
* Pings are now sent every second, or more often if the value of
down-after-milliseconds is less than one second, with a lower limit on
the ping period corresponding to a 10 HZ ping frequency.
* Link reconnection code was improved. This is used in order to try to
reconnect the link when we are at 50% of the node timeout without a
valid reply received yet. However, the old code triggered unnecessary
reconnections when the node timeout was very small. This should now be
fixed.
The new code passes the tests, but more testing is needed, along with
more unit tests stressing the failure detector, so for now this is
merged only in the unstable branch.
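The rules listed above can be sketched in C as follows. The struct and
function names are hypothetical, and the 100 milliseconds minimum period
is one reading of the 10 HZ limit, not something taken from the actual
Sentinel code.

```c
#include <assert.h>

typedef long long mstime_t;

#define PING_PERIOD_MS 1000    /* Default: one ping per second. */
#define MIN_PING_PERIOD_MS 100 /* Assumed reading of the 10 HZ limit. */

typedef struct instance {
    mstime_t last_ping_time; /* 0 if no ping is pending, otherwise the
                              * time we started waiting for a reply. */
    mstime_t down_after_ms;  /* The node timeout. */
} instance;

/* Ping more often than once per second when the timeout is small. */
static mstime_t ping_period(const instance *ri) {
    assert(ri->down_after_ms > 0);
    mstime_t p = ri->down_after_ms < PING_PERIOD_MS ? ri->down_after_ms
                                                    : PING_PERIOD_MS;
    return p < MIN_PING_PERIOD_MS ? MIN_PING_PERIOD_MS : p;
}

/* Sending a new ping does not reset the time of a pending one. */
static void on_ping_sent(instance *ri, mstime_t now) {
    if (ri->last_ping_time == 0) ri->last_ping_time = now;
}

/* A valid reply zeroes the field again. */
static void on_valid_reply(instance *ri) {
    ri->last_ping_time = 0;
}

/* The timeout check is not bound to the period used to send pings. */
static int is_timed_out(const instance *ri, mstime_t now) {
    return ri->last_ping_time != 0 &&
           now - ri->last_ping_time > ri->down_after_ms;
}

/* Try a reconnection at 50% of the node timeout without a reply. */
static int should_reconnect(const instance *ri, mstime_t now) {
    return ri->last_ping_time != 0 &&
           now - ri->last_ping_time > ri->down_after_ms / 2;
}
```

For example, with a node timeout of 300 milliseconds and a ping sent at
time 1000, the instance is considered timed out at any time after 1300
if no valid reply arrived, regardless of how many pings were sent in
between.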
Sentinel's main safety argument is that there are no two configurations
for the same master with the same version (configuration epoch).
For this to be true Sentinels must be authorized by a majority.
Additionally Sentinels must do two important things:
* Never vote again for the same epoch.
* Never exchange an old vote for a fresh one.
The first prerequisite, in a crash-recovery system model, requires
persisting the master->leader_epoch on durable storage before replying
to messages. This was not the case.
We also make sure to persist the current epoch in order to never reply
to stale vote requests from other Sentinels after a recovery.
The configuration is persisted by making use of fsync(). In the context
of this code this is considered a good enough guarantee that our
durable state is restored after a restart. However this may not always
be the case, depending on the kind of hardware and operating system
used.
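The "persist before replying" rule can be sketched as below, using only
POSIX calls. The vote_state struct, the two function names and the
on-disk format are made up for the example and do not reflect the real
Sentinel configuration file. The real vote handling is also richer than
the single check shown here.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Write `data` to `path` and fsync() it before returning.
 * Returns 0 on success, -1 on error. */
static int persist_state(const char *path, const char *data) {
    assert(path != NULL && data != NULL);
    size_t len = strlen(data);
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);

    if (fd == -1) return -1;
    if (write(fd, data, len) != (ssize_t)len || fsync(fd) == -1) {
        close(fd);
        return -1;
    }
    return close(fd);
}

typedef struct vote_state {
    unsigned long long current_epoch;
    unsigned long long leader_epoch; /* Last epoch we voted in. */
} vote_state;

/* Returns 1 if the vote is granted AND durably stored, 0 otherwise.
 * The in-memory state changes only after the write succeeded, so a
 * crash can never make us vote twice for the same epoch. */
static int vote_for_epoch(vote_state *s, unsigned long long req_epoch,
                          const char *path) {
    char buf[128];

    if (req_epoch <= s->leader_epoch) return 0; /* Already voted. */

    vote_state next = *s;
    next.leader_epoch = req_epoch;
    if (req_epoch > next.current_epoch) next.current_epoch = req_epoch;

    /* Hypothetical file format, for illustration only. */
    snprintf(buf, sizeof(buf), "current-epoch %llu\nleader-epoch %llu\n",
             next.current_epoch, next.leader_epoch);
    if (persist_state(path, buf) == -1) return 0; /* Not durable: no reply. */

    *s = next;
    return 1;
}
```

The caller sends the reply only when vote_for_epoch() returns 1.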
The way HELLO messages are received is now unified.
Sentinels no longer need to be able to chat via some Redis instance in
order to converge to the higher configuration for a master: they are
able to directly exchange configurations.
Note that this commit does not include the (trivial) change needed to
send HELLO messages to Sentinel instances as well, since by mistake I
committed that change in the previous commit, which refactored hello
message processing into a separate function.
Example: if the user tries to configure a cluster with 9 nodes, asking
for 1 slave per master, redis-trib will configure a 4 masters cluster
with 1 slave each as usual, but this time will assign the spare node as
a slave of one of the masters.
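The numbers in the example follow from simple integer arithmetic,
sketched here in C. The layout struct and plan_cluster() are
hypothetical names and are not part of redis-trib.

```c
#include <assert.h>

typedef struct layout {
    int masters;            /* Number of master nodes. */
    int slaves_per_master;  /* Slaves requested for every master. */
    int spare;              /* Leftover nodes, assigned as extra slaves. */
} layout;

/* Split `nodes` instances into masters with `slaves` slaves each. */
static layout plan_cluster(int nodes, int slaves) {
    layout l;

    assert(nodes > 0 && slaves >= 0);
    l.masters = nodes / (slaves + 1);
    l.slaves_per_master = slaves;
    l.spare = nodes - l.masters * (slaves + 1);
    return l;
}
```

plan_cluster(9, 1) gives 4 masters, 1 slave per master and 1 spare node.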
By manually modifying node configurations in random ways, it is possible
to create the following scenario:
A is serving keys for slot 10
B is manually configured to serve keys for slot 10
A receives an update from B (or another node) informing it that slot 10
is now claimed by B with a greater configuration epoch. However A still
has keys from slot 10.
With this commit A will flag the slot as being in error by setting it in
IMPORTING state, so that redis-trib can detect the issue.
The new "error" subcommand of the DEBUG command can reply with a
user-selected error, specified as its sole argument:
DEBUG ERROR "LOADING please wait..."
The error is generated by just prefixing the command argument with a "-"
character, and replacing newlines with spaces (since error replies can't
include newlines).
The goal of the command is to help with client library unit tests by
making it simple to simulate a command call triggering a given error.
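The transformation is small enough to sketch in a few lines of C. The
function name format_error_reply() is hypothetical. Replacing carriage
returns as well as line feeds is an assumption of this sketch, while the
trailing "\r\n" is the usual protocol line terminator.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Build an error reply from a user supplied message: prefix it with
 * '-', turn newline characters into spaces, terminate with CRLF.
 * Returns a heap allocated string the caller must free(), or NULL. */
static char *format_error_reply(const char *msg) {
    assert(msg != NULL);
    size_t len = strlen(msg);
    char *reply = malloc(len + 4); /* '-' + msg + "\r\n" + '\0' */

    if (reply == NULL) return NULL;
    reply[0] = '-';
    for (size_t i = 0; i < len; i++)
        reply[i + 1] = (msg[i] == '\n' || msg[i] == '\r') ? ' ' : msg[i];
    memcpy(reply + len + 1, "\r\n", 3); /* Copies the terminator too. */
    return reply;
}
```

A client library test can then check that such a reply is parsed as an
error with the expected prefix, for example "LOADING".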
getKeysFromCommand() is designed to be called with the command arguments
passing the basic arity checks described in the command table.
DEBUG CMDKEYS must provide the same guarantees for calling
getKeysFromCommand() to be safe.
Examples:
redis 127.0.0.1:6379> debug cmdkeys set foo bar
1) "foo"
redis 127.0.0.1:6379> debug cmdkeys mget a b c
1) "a"
2) "b"
3) "c"
redis 127.0.0.1:6379> debug cmdkeys zunionstore foo 2 a b
1) "a"
2) "b"
3) "foo"
redis 127.0.0.1:6379> debug cmdkeys ping
(empty list or set)
The exception is a "constant" BY pattern, which is used to signal that
no sorting should be performed at all. In this case no lookup is needed,
so it is possible to support this case in Cluster mode.
Previously we used zunionInterGetKeys(), however after this function was
fixed to account for the destination key (not needed when the API was
designed for "diskstore") the two sets of commands can no longer be
served by a single keys-extraction function.
This API originated from the "diskstore" experiment, not from Redis
Cluster itself, so there were legacy/useless things trying to
differentiate between keys that are going to be overwritten and keys
that need to be fetched from disk (preloaded).
All of this is useless with Cluster, so it was removed, resulting in
simpler code.