COMMANDS returns a nested multibulk reply for each
command in the command table. The reply for each
command contains:
- command name
- arity
- array of command flags
- start key position
- end key position
- key offset step
- optional: if the keys are not deterministic and
Redis uses an internal key evaluation function,
the 6th field appears and is defined as a status
reply of: REQUIRES ARGUMENT PARSING
Cluster clients need to know where the keys are in each
command to implement proper routing to cluster nodes.
Redis commands can have multiple keys, keys at offset steps, or other
issues where you can't always assume the first element after
the command name is the cluster routing key.
Using the information exposed by COMMANDS, client implementations
can have live, accurate key extraction details for all commands.
Also implements COMMANDS INFO [commands...] to return only a
specific set of commands instead of all 160+ commands live in Redis.
This will be used by CLIENT KILL and is also a good way to ensure a
given client is still the same across CLIENT LIST calls.
The output of CLIENT LIST was modified to include the new ID, but this
change is considered to be backward compatible as the API does not imply
you can do positional parsing, since each filed as a different name.
Because of output buffer limits Redis internals had this idea of type of
clients: normal, pubsub, slave. It is possible to set different output
buffer limits for the three kinds of clients.
However all the macros and API were named after output buffer limit
classes, while the idea of a client type is a generic one that can be
reused.
This commit does two things:
1) Rename the API and defines with more general names.
2) Change the class of clients executing the MONITOR command from "slave"
to "normal".
"2" is a good idea because you want to have very special settings for
slaves, that are not a good idea for MONITOR clients that are instead
normal clients even if they are conceptually slave-alike (since it is a
push protocol).
The backward-compatibility breakage resulting from "2" is considered to
be minimal to care, since MONITOR is a debugging command, and because
anyway this change is not going to break the format or the behavior, but
just when a connection is closed on big output buffer issues.
The new ROLE command is designed in order to provide a client with
informations about the replication in a fast and easy to use way
compared to the INFO command where the same information is also
available.
Every log contains, just after the pid, a single character that provides
information about the role of an instance:
S - Slave
M - Master
C - Writing child
X - Sentinel
The error when the target key is busy was a generic one, while it makes
sense to be able to distinguish between the target key busy error and
the others easily.
When the listening sockets readable event is fired, we have the chance
to accept multiple clients instead of accepting a single one. This makes
Redis more responsive when there is a mass-connect event (for example
after the server startup), and in workloads where a connect-disconnect
pattern is used often, so that multiple clients are waiting to be
accepted continuously.
As a side effect, this commit makes the LOADING, BUSY, and similar
errors much faster to deliver to the client, making Redis more
responsive when there is to return errors to inform the clients that the
server is blocked in an not interruptible operation.
adjustOpenFilesLimit() and clusterUpdateSlotsWithConfig() that were
assuming uint64_t is the same as unsigned long long, which is true
probably for all the systems out there that we target, but still GCC
emitted a warning since technically they are two different types.
To test the bitfield array of counters set/get macros from the Redis Tcl
suite is hard, so a specialized command that is able to test the
internals was developed.
Bug found by the continuous integration test running the Redis
with valgrind:
==6245== Invalid read of size 8
==6245== at 0x4C2DEEF: memcpy@GLIBC_2.2.5 (mc_replace_strmem.c:876)
==6245== by 0x41F9E6: freeMemoryIfNeeded (redis.c:3010)
==6245== by 0x41D2CC: processCommand (redis.c:2069)
memmove() size argument was accounting for an extra element, going
outside the bounds of the array.
In this commit:
* Decrement steps are semantically differentiated from the reserved FDs.
Previously both values were 32 but the meaning was different.
* Make it clear that we save setrlimit errno.
* Don't explicitly handle wrapping of 'f', but prevent it from
happening.
* Add comments to make the function flow more readable.
This integrates PR #1630
Also update the original REDIS_EVENTLOOP_FDSET_INCR to
include REDIS_MIN_RESERVED_FDS. REDIS_EVENTLOOP_FDSET_INCR
exists to make sure more than (maxclients+RESERVED) entries
are allocated, but we can only guarantee that if we include
the current value of REDIS_MIN_RESERVED_FDS as a minimum
for the INCR size.
Fun fact: rlim_t is an unsigned long long on all platforms.
Continually subtracting from a rlim_t makes it get smaller
and smaller until it wraps, then you're up to 2^64-1.
This was causing an infinite loop on Redis startup if
your ulimit was extremely (almost comically) low.
The case of (f > oldlimit) would never be met in a case like:
f = 150
while (f > 20) f -= 128
Since f is unsigned, it can't go negative and would
take on values of:
Iteration 1: 150 - 128 => 22
Iteration 2: 22 - 128 => 18446744073709551510
Iterations 3-∞: ...
To catch the wraparound, we use the previous value of f
stored in limit.rlimit_cur. If we subtract from f and
get a larger number than the value it had previously,
we print an error and exit since we don't have enough
file descriptors to help the user at this point.
Thanks to @bs3g for the inspiration to fix this problem.
Patches existed from @bs3g at antirez#1227, but I needed to repair a few other
parts of Redis simultaneously, so I didn't get a chance to use them.
The log messages about open file limits have always
been slightly opaque and confusing. Here's an attempt to
fix their wording, detail, and meaning. Users will have a
better understanding of how to fix very common problems
with these reworded messages.
Also, we handle a new error case when maxclients becomes less
than one, essentially rendering the server unusable. We
now exit on startup instead of leaving the user with a server
unable to handle any connections.
This fixes antirez#356 as well.
32 was the additional number of file descriptors Redis
would reserve when managing a too-low ulimit. The
number 32 was in too many places statically, so now
we use a macro instead that looks more appropriate.
When Redis sets up the server event loop, it uses:
server.maxclients+REDIS_EVENTLOOP_FDSET_INCR
So, when reserving file descriptors, it makes sense to
reserve at least REDIS_EVENTLOOP_FDSET_INCR FDs instead
of only 32. Currently, REDIS_EVENTLOOP_FDSET_INCR is
set to 128 in redis.h.
Also, I replaced the static 128 in the while f < old loop
with REDIS_EVENTLOOP_FDSET_INCR as well, which results
in no change since it was already 128.
Impact: Users now need at least maxclients+128 as
their open file limit instead of maxclients+32 to obtain
actual "maxclients" number of clients. Redis will carve
the extra REDIS_EVENTLOOP_FDSET_INCR file descriptors it
needs out of the "maxclients" range instead of failing
to start (unless the local ulimit -n is too low to accomidate
the request).
Obtaining the RSS (Resident Set Size) info is slow in Linux and OSX.
This slowed down the generation of the INFO 'memory' section.
Since the RSS does not require to be a real-time measurement, we
now sample it with server.hz frequency (10 times per second by default)
and use this value both to show the INFO rss field and to compute the
fragmentation ratio.
Practically this does not make any difference for memory profiling of
Redis but speeds up the INFO call significantly.
Uname was profiled to be a slow syscall. It produces always the same
output in the context of a single execution of Redis, so calling it at
every INFO output generation does not make too much sense.
The uname utsname structure was modified as a static variable. At the
same time a static integer was added to check if we need to call uname
the first time.
This is an improvement over the previous eviction algorithm where we use
an eviction pool that is persistent across evictions of keys, and gets
populated with the best candidates for evictions found so far.
It allows to approximate LRU eviction at a given number of samples
better than the previous algorithm used.
For testing purposes it is handy to have a very high resolution of the
LRU clock, so that it is possible to experiment with scripts running in
just a few seconds how the eviction algorithms works.
This commit allows Redis to use the cached LRU clock, or a value
computed on demand, depending on the resolution. So normally we have the
good performance of a precomputed value, and a clock that wraps in many
days using the normal resolution, but if needed, changing a define will
switch behavior to an high resolution LRU clock.
Previously we used zunionInterGetKeys(), however after this function was
fixed to account for the destination key (not needed when the API was
designed for "diskstore") the two set of commands can no longer be served
by an unique keys-extraction function.
This API originated from the "diskstore" experiment, not for Redis
Cluster itself, so there were legacy/useless things trying to
differentiate between keys that are going to be overwritten and keys
that need to be fetched from disk (preloaded).
All useless with Cluster, so removed with the result of code
simplification.