Track bandwidth used by clients and replication (but diskless
replication is not tracked since the actual transfer happens in the
child process).
This includes a refactoring that makes tracking new instantaneous
metrics simpler.
RDB EOF detection was relying on the final part of the RDB transfer to
be a magic 40 bytes EOF marker. However as the slave is put online
immediately, and because of sockets timeouts, the replication stream is
actually contiguous with the RDB file.
This means that to detect the EOF correctly we should either:
1) Scan all the stream searching for the mark. Sucks CPU-wise.
2) Start to send the replication stream only after an acknowledge.
3) Implement a proper chunked encoding.
For now solution "2" was picked, so the master does not start to send
ASAP the stream of commands in the case of diskless replication. We wait
for the first REPLCONF ACK command from the slave, that certifies us
that the slave correctly loaded the RDB file and is ready to get more
data.
Both upstart and systemd provide a way for daemons to
be supervised, as well as a mechanism for them to
signal their readyness status.
This patch provides compatibility with this functionality while
not interfering with other methods.
With this, it will be possible to use `expect stop` with upstart
and `Type=notify` with systemd.
A more detailed explanation of the mechanism can be found here:
http://spootnik.org/entries/2014/11/09_pid-tracking-in-modern-init-systems.html
We need to remember what is the saving strategy of the current RDB child
process, since the configuration may be modified at runtime via CONFIG
SET and still we'll need to understand, when the child exists, what to
do and for what goal the process was initiated: to create an RDB file
on disk or to write stuff directly to slave's sockets.
The code tested many times if a client had active Pub/Sub subscriptions
by checking the length of a list and dictionary where the patterns and
channels are stored. This was substituted with a client flag called
REDIS_PUBSUB that is simpler to test for. Moreover in order to manage
this flag some code was refactored.
This commit is believed to have no effects in the behavior of the
server.
We introduce the distinction between slow and fast commands since those
are two different sources of latency. An O(1) or O(log N) command without
side effects (can't trigger deletion of large objects as a side effect of
its execution) if delayed is a symptom of inherent latency of the system.
A non-fast command (commands that may run large O(N) computations) if
delayed may just mean that the user is executing slow operations.
The advices LATENCY should provide in this two different cases are
different, so we log the two classes of commands in a separated way.
COMMANDS returns a nested multibulk reply for each
command in the command table. The reply for each
command contains:
- command name
- arity
- array of command flags
- start key position
- end key position
- key offset step
- optional: if the keys are not deterministic and
Redis uses an internal key evaluation function,
the 6th field appears and is defined as a status
reply of: REQUIRES ARGUMENT PARSING
Cluster clients need to know where the keys are in each
command to implement proper routing to cluster nodes.
Redis commands can have multiple keys, keys at offset steps, or other
issues where you can't always assume the first element after
the command name is the cluster routing key.
Using the information exposed by COMMANDS, client implementations
can have live, accurate key extraction details for all commands.
Also implements COMMANDS INFO [commands...] to return only a
specific set of commands instead of all 160+ commands live in Redis.
This will be used by CLIENT KILL and is also a good way to ensure a
given client is still the same across CLIENT LIST calls.
The output of CLIENT LIST was modified to include the new ID, but this
change is considered to be backward compatible as the API does not imply
you can do positional parsing, since each filed as a different name.
Because of output buffer limits Redis internals had this idea of type of
clients: normal, pubsub, slave. It is possible to set different output
buffer limits for the three kinds of clients.
However all the macros and API were named after output buffer limit
classes, while the idea of a client type is a generic one that can be
reused.
This commit does two things:
1) Rename the API and defines with more general names.
2) Change the class of clients executing the MONITOR command from "slave"
to "normal".
"2" is a good idea because you want to have very special settings for
slaves, that are not a good idea for MONITOR clients that are instead
normal clients even if they are conceptually slave-alike (since it is a
push protocol).
The backward-compatibility breakage resulting from "2" is considered to
be minimal to care, since MONITOR is a debugging command, and because
anyway this change is not going to break the format or the behavior, but
just when a connection is closed on big output buffer issues.
The new ROLE command is designed in order to provide a client with
informations about the replication in a fast and easy to use way
compared to the INFO command where the same information is also
available.
Every log contains, just after the pid, a single character that provides
information about the role of an instance:
S - Slave
M - Master
C - Writing child
X - Sentinel
Behrad Zari discovered [1] and Josiah reported [2]: if you block
and wait for a list to exist, but the list creates from
a non-push command, the blocked client never gets notified.
This commit adds notification of blocked clients into
the DB layer and away from individual commands.
Lists can be created by [LR]PUSH, SORT..STORE, RENAME, MOVE,
and RESTORE. Previously, blocked client notifications were
only triggered by [LR]PUSH. Your client would never get
notified if a list were created by SORT..STORE or RENAME or
a RESTORE, etc.
Blocked client notification now happens in one unified place:
- dbAdd() triggers notification when adding a list to the DB
Two new tests are added that fail prior to this commit.
All test pass.
Fixes#1668
[1]: https://groups.google.com/forum/#!topic/redis-db/k4oWfMkN1NU
[2]: #1668
The previous code handling a lost slot (by another master with an higher
configuration for the slot) was defensive, considering it an error and
putting the cluster in an odd state requiring redis-cli fix.
This was changed, because actually this only happens either in a
legitimate way, with failovers, or when the admin messed with the config
in order to reconfigure the cluster. So the new code instead will try to
make sure that the keys stored match the new slots map, by removing all
the keys in the slots we lost ownership from.
The function that deletes the keys from the lost slots is called only
if the node does not lose all its slots (resulting in a reconfiguration
as a slave of the node that got ownership). This is an optimization
since the replication code will anyway flush all the instance data in
a faster way.
The error when the target key is busy was a generic one, while it makes
sense to be able to distinguish between the target key busy error and
the others easily.
This commit adds peer ID caching in the client structure plus an API
change and the use of sdsMakeRoomFor() in order to improve the
reallocation pattern to generate the CLIENT LIST output.
Both the changes account for a very significant speedup.
When we are blocked and a few events a processed from time to time, it
is smarter to call the event handler a few times in order to handle the
accept, read, write, close cycle of a client in a single pass, otherwise
there is too much latency added for clients to receive a reply while the
server is busy in some way (for example during the DB loading).