This means that stopping a slave and restarting it will still make it
able to PSYNC with the master. Moreover the master itself will retain
its ID/offset, in case it gets turned into a slave, or if a slave will
try to PSYNC with it with an exactly updated offset (otherwise there is
no backlog).
This change was possible thanks to PSYNC v2 that makes saving the current
replication state much simpler.
The gist of the changes is that now, partial resynchronizations between
slaves and masters (without the need of a full resync with RDB transfer
and so forth), work in a number of cases when it was impossible
in the past. For instance:
1. When a slave is promoted to mastrer, the slaves of the old master can
partially resynchronize with the new master.
2. Chained slalves (slaves of slaves) can be moved to replicate to other
slaves or the master itsef, without requiring a full resync.
3. The master itself, after being turned into a slave, is able to
partially resynchronize with the new master, when it joins replication
again.
In order to obtain this, the following main changes were operated:
* Slaves also take a replication backlog, not just masters.
* Same stream replication for all the slaves and sub slaves. The
replication stream is identical from the top level master to its slaves
and is also the same from the slaves to their sub-slaves and so forth.
This means that if a slave is later promoted to master, it has the
same replication backlong, and can partially resynchronize with its
slaves (that were previously slaves of the old master).
* A given replication history is no longer identified by the `runid` of
a Redis node. There is instead a `replication ID` which changes every
time the instance has a new history no longer coherent with the past
one. So, for example, slaves publish the same replication history of
their master, however when they are turned into masters, they publish
a new replication ID, but still remember the old ID, so that they are
able to partially resynchronize with slaves of the old master (up to a
given offset).
* The replication protocol was slightly modified so that a new extended
+CONTINUE reply from the master is able to inform the slave of a
replication ID change.
* REPLCONF CAPA is used in order to notify masters that a slave is able
to understand the new +CONTINUE reply.
* The RDB file was extended with an auxiliary field that is able to
select a given DB after loading in the slave, so that the slave can
continue receiving the replication stream from the point it was
disconnected without requiring the master to insert "SELECT" statements.
This is useful in order to guarantee the "same stream" property, because
the slave must be able to accumulate an identical backlog.
* Slave pings to sub-slaves are now sent in a special form, when the
top-level master is disconnected, in order to don't interfer with the
replication stream. We just use out of band "\n" bytes as in other parts
of the Redis protocol.
An old design document is available here:
https://gist.github.com/antirez/ae068f95c0d084891305
However the implementation is not identical to the description because
during the work to implement it, different changes were needed in order
to make things working well.
This new command swaps two Redis databases, so that immediately all the
clients connected to a given DB will see the data of the other DB, and
the other way around. Example:
SWAPDB 0 1
This will swap DB 0 with DB 1. All the clients connected with DB 0 will
immediately see the new data, exactly like all the clients connected
with DB 1 will see the data that was formerly of DB 0.
MOTIVATION AND HISTORY
---
The command was recently demanded by Pedro Melo, but was suggested in
the past multiple times, and always refused by me.
The reason why it was asked: Imagine you have clients operating in DB 0.
At the same time, you create a new version of the dataset in DB 1.
When the new version of the dataset is available, you immediately want
to swap the two views, so that the clients will transparently use the
new version of the data. At the same time you'll likely destroy the
DB 1 dataset (that contains the old data) and start to build a new
version, to repeat the process.
This is an interesting pattern, but the reason why I always opposed to
implement this, was that FLUSHDB was a blocking command in Redis before
Redis 4.0 improvements. Now we have FLUSHDB ASYNC that releases the
old data in O(1) from the point of view of the client, to reclaim memory
incrementally in a different thread.
At this point, the pattern can really be supported without latency
spikes, so I'm providing this implementation for the users to comment.
In case a very compelling argument will be made against this new command
it may be removed.
BEHAVIOR WITH BLOCKING OPERATIONS
---
If a client is blocking for a list in a given DB, after the swap it will
still be blocked in the same DB ID, since this is the most logical thing
to do: if I was blocked for a list push to list "foo", even after the
swap I want still a LPUSH to reach the key "foo" in the same DB in order
to unblock.
However an interesting thing happens when a client is, for instance,
blocked waiting for new elements in list "foo" of DB 0. Then the DB
0 and 1 are swapped with SWAPDB. However the DB 1 happened to have
a list called "foo" containing elements. When this happens, this
implementation can correctly unblock the client.
It is possible that there are subtle corner cases that are not covered
in the implementation, but since the command is self-contained from the
POV of the implementation and the Redis core, it cannot cause anything
bad if not used.
Tests and documentation are yet to be provided.
This code was extracted from @oranagra PR #3223 and modified in order
to provide only certain amounts of information compared to the original
code. It was also moved from DEBUG to the newly introduced MEMORY
command. Thanks to Oran for the implementation and the PR.
It implements detailed memory usage stats that can be useful in both
provisioning and troubleshooting memory usage in Redis.
For most tasks, we need the memory estimation to be O(1) by default.
This commit also implements an initial MEMORY command.
Note that objectComputeSize() takes the number of samples to check as
argument, so MEMORY should be able to get the sample size as option
to make precision VS CPU tradeoff tunable.
Related to: PR #3223.
This is an attempt at mitigating problems due to cross protocol
scripting, an attack targeting services using line oriented protocols
like Redis that can accept HTTP requests as valid protocol, by
discarding the invalid parts and accepting the payloads sent, for
example, via a POST request.
For this to be effective, when we detect POST and Host: and terminate
the connection asynchronously, the networking code was modified in order
to never process further input. It was later verified that in a
pipelined request containing a POST command, the successive commands are
not executed.
This feature is useful, especially in deployments using Sentinel in
order to setup Redis HA, where the slave is executed with NAT or port
forwarding, so that the auto-detected port/ip addresses, as listed in
the "INFO replication" output of the master, or as provided by the
"ROLE" command, don't match the real addresses at which the slave is
reachable for connections.
This patch, written in collaboration with Oran Agra (@oranagra) is a companion
to 780a8b1. Together the two patches should avoid that the AOF and RDB saving
processes can be spawned at the same time. Previously conditions that
could lead to two saving processes at the same time were:
1. When AOF is enabled via CONFIG SET and an RDB saving process is
already active.
2. When the SYNC command decides to start an RDB saving process ASAP in
order to serve a new slave that cannot partially resynchronize (but
only if we have a disk target for replication, for diskless
replication there is not such a problem).
Condition "1" is not very severe but "2" can happen often and is
definitely good at degrading Redis performances in an unexpected way.
The two commits have the effect of always spawning RDB savings for
replication in replicationCron() instead of attempting to start an RDB
save synchronously. Moreover when a BGSAVE or AOF rewrite must be
performed, they are instead just postponed using flags that will try to
perform such operations ASAP.
Finally the BGSAVE command was modified in order to accept a SCHEDULE
option so that if an AOF rewrite is in progress, when this option is
given, the command no longer returns an error, but instead schedules an
RDB rewrite operation for when it will be possible to start it.
The LRU eviction code used to make local choices: for each DB visited it
selected the best key to evict. This was repeated for each DB. However
this means that there could be DBs with very frequently accessed keys
that are targeted by the LRU algorithm while there were other DBs with
many better candidates to expire.
This commit attempts to fix this problem for the LRU policy. However the
TTL policy is still not fixed by this commit. The TTL policy will be
fixed in a successive commit.
This is an initial (partial because of TTL policy) fix for issue #2647.
So far we used an external program (later executed within Redis) and
parser in order to check RDB files for correctness. This forces, at each
RDB format update, to have two copies of the same format implementation
that are hard to keep in sync. Morover the former RDB checker only
checked the very high-level format of the file, without actually trying
to load things in memory. Certain corruptions can only be handled by
really loading key-value pairs.
This first commit attempts to unify the Redis RDB loadig code with the
task of checking the RDB file for correctness. More work is needed but
it looks like a sounding direction so far.
This fixes a bug introduced by d827dbf, and makes the code consistent
with the logic of always allowing, while the cluster is down, commands
that don't target any key.
As a side effect the code is also simpler now.
I've renamed maxmemoryToString to evictPolicyToString since that is
more accurate (and easier to mentally connect with the correct data), as
well as updated the function to user server.maxmemory_policy rather than
server.maxmemory. Now with a default config it is actually returning
the correct policy rather than volatile-lru.
The new bitfield command is an extension to the Redis bit operations,
where not just single bit operations are performed, but the array of
bits composing a string, can be addressed at random, not aligned
offsets, with any width unsigned and signed integers like u8, s5, u10
(up to 64 bit signed integers and 63 bit unsigned integers).
The BITFIELD command supports subcommands that can SET, GET, or INCRBY
those arbitrary bit counters, with multiple overflow semantics.
Trivial and credits:
A similar command was imagined a few times in the past, but for
some reason looked a bit far fetched or not well specified.
Finally the command was proposed again in a clear form by
Yoav Steinberg from Redis Labs, that proposed a set of commands on
arbitrary sized integers stored at bit offsets.
Starting from this proposal I wrote an initial specification of a single
command with sub-commands similar to what Yoav envisioned, using short
names for types definitions, and adding control on the overflow.
This commit is the resulting implementation.
Examples:
BITFIELD mykey OVERFLOW wrap INCRBY i2 10 -1 GET i2 10