Maybe there are legitimate use cases for MIGRATE inside Lua scripts, at
least for now. When the command will be executed in an asynchronous
fashion (planned) it is possible we'll no longer be able to permit it
from within Lua scripts.
Thanks to Oran Agra (@oranagra) for reporting. Key extraction would not
work otherwise and it does not make sense to take wrong data in the
command table.
The old version only flushed data to slaves if there were strings
pending in the client->reply list. Now also static buffers are flushed.
Does not help to free memory (which is the only use we have right now in
the fuction), but is more correct conceptually, and may be used in
other contexts.
Arguments arity and arguments type error of redis.call() were not
reported correctly to Lua, so the command acted in this regard like
redis.pcall(), but just for two commands. Redis.call() should always
raise errors instead.
During the refactoring needed for lazy free, specifically the conversion
of t_hash from struct robj to plain SDS strings, HINCRBFLOAT was
accidentally moved away from long doubles to doubles for internal
processing of increments and formatting.
The diminished precision created more obvious artifacts in the way small
numbers are formatted once we convert from decimal number in radix 10 to
double and back to its string in radix 10.
By using more precision, we now have less surprising results at least
with small numbers like "1.23", exactly like in the previous versions of
Redis.
See issue #2846.
Currently this feature is only accessible via DEBUG for testing, since
otherwise depending on the instance configuration a given script works
or is broken, which is against the Redis philosophy.
By calling redis.replicate_commands(), the scripting engine of Redis
switches to commands replication instead of replicating whole scripts.
This is useful when the script execution is costly but only results in a
few writes performed to the dataset.
Morover, in this mode, it is possible to call functions with side
effects freely, since the script execution does not need to be
deterministic: anyway we'll capture the outcome from the point of view
of changes to the dataset.
In this mode math.random() returns different sequences at every call.
If redis.replicate_commnads() is not called before any other write, the
command returns false and sticks to whole scripts replication instead.
Sometimes it can be useful for clients to completely disable replies
from the Redis server. For example when the client sends fire and forget
commands or performs a mass loading of data, or in caching contexts
where new data is streamed constantly. In such contexts to use server
time and bandwidth in order to send back replies to clients, which are
going to be ignored, is a shame.
Multiple mechanisms are possible to implement such a feature. For
example it could be a feature of MULTI/EXEC, or a command prefix
such as "NOREPLY SADD myset foo", or a different mechanism that allows
to switch on/off requests using the CLIENT command.
The MULTI/EXEC approach has the problem that transactions are not
strictly part of the no-reply semantics, and if we want to insert a lot
of data in a bulk way, creating a huge MULTI/EXEC transaction in the
server memory is bad.
The prefix is the best in this specific use case since it does not allow
desynchronizations, and is pretty clear semantically. However Redis
internals and client libraries are not prepared to handle this
currently.
So the implementation uses the CLIENT command, providing a new REPLY
subcommand with three options:
CLIENT REPLY OFF disables the replies, and does not reply itself.
CLIENT REPLY ON re-enables the replies, replying +OK.
CLIENT REPLY SKIP only discards the reply of the next command, and
like OFF does not reply anything itself.
The reason to add the SKIP command is that it allows to have an easy
way to send conceptually "single" commands that don't need a reply
as the sum of two pipelined commands:
CLIENT REPLY SKIP
SET key value
Note that CLIENT REPLY ON replies with +OK so it should be used when
sending multiple commands that don't need a reply. However since it
replies with +OK the client can check that the connection is still
active and all the previous commands were received.
This is currently just into Redis "unstable" so the proposal can be
modified or abandoned based on users inputs.
This new function is able to restart the server "in place". The current
Redis process executes the same executable it was executed with, using
the same arguments and configuration file.
the check for lat/long valid ranges were performed inside the for loop,
two times instead of one, and the first time when the second element of
the array, xy[1], was yet not populated. This resulted into issue #2799.
Close issue #2799.
We have them into zmalloc.c, but this is going to replace that
implementation, so that it's possible to use the same idea everywhere
inside the code base.
After the introduction of the list with clients with pending writes, to
process clients incrementally outside of the event loop we also need to
process the pending writes list.
The code was broken and resulted in redis-cli --pipe to, most of the
times, writing everything received in the standard input to the Redis
connection socket without ever reading back the replies, until all the
content to write was written.
This means that Redis had to accumulate all the output in the output
buffers of the client, consuming a lot of memory.
Fixed thanks to the original report of anomalies in the behavior
provided by Twitter user @fsaintjacques.
Georadius works by computing the center + neighbors squares covering all
the area of the specified position and radius. Then a distance filter is
used to remove elements which are actually outside the range.
When a huge radius is used, like 5000 km or more, adjacent neighbors may
collide and be the same, leading to the reporting of the same element
multiple times. This only happens in the edge case of huge radius but is
not ideal.
A robust but slow solution would involve qsorting the range to remove
all the duplicates. However since the collisions are only in adjacent
boxes, for the way they are ordered in the code, it is much faster to
just check if the current box is the same as the previous one processed.
This commit adds a regression test for the bug.
Fixes#2767.
MOVE was not able to move the TTL: when a key was moved into a different
database number, it became persistent like if PERSIST was used.
In some incredible way (I guess almost nobody uses Redis MOVE) this bug
remained unnoticed inside Redis internals for many years.
Finally Andy Grunwald discovered it and opened an issue.
This commit fixes the bug and adds a regression test.
Close#2765.
As Oran Agra suggested, in startBgsaveForReplication() when the BGSAVE
attempt returns an error, we scan the list of slaves in order to remove
them since there is no way to serve them currently.
However we check for the replication state BGSAVE_START, which was
modified by rdbSaveToSlaveSockets() before forking(). So when fork fails
the state of slaves remain BGSAVE_END and no cleanup is performed.
This commit fixes the problem by making rdbSaveToSlavesSockets() able to
undo the state change on fork failure.
Before this commit, after triggering a BGSAVE it was up to the caller of
startBgsavForReplication() to handle slaves in WAIT_BGSAVE_START in
order to update them accordingly. However when the replication target is
the socket, this is not possible since the process of updating the
slaves and sending the FULLRESYNC reply must be coupled with the process
of starting an RDB save (the reason is, we need to send the FULLSYNC
command and spawn a child that will start to send RDB data to the slaves
ASAP).
This commit moves the responsibility of handling slaves in
WAIT_BGSAVE_START to startBgsavForReplication() so that for both
diskless and disk-based replication we have the same chain of
responsiblity. In order accomodate such change, the syncCommand() also
needs to put the client in the slave list ASAP (just after the initial
checks) and not at the end, so that startBgsavForReplication() can find
the new slave alrady in the list.
Another related change is what happens if the BGSAVE fails because of
fork() or other errors: we now remove the slave from the list of slaves
and send an error, scheduling the slave connection to be terminated.
As a side effect of this change the following errors found by
Oran Agra are fixed (thanks!):
1. rdbSaveToSlavesSockets() on failed fork will get the slaves cleaned
up, otherwise they remain in a wrong state forever since we setup them
for full resync before actually trying to fork.
2. updateSlavesWaitingBgsave() with replication target set as "socket"
was broken since the function changed the slaves state from
WAIT_BGSAVE_START to WAIT_BGSAVE_END via
replicationSetupSlaveForFullResync(), so later rdbSaveToSlavesSockets()
will not find any slave in the right state (WAIT_BGSAVE_START) to feed.
It is simpler if removing the read event handler from the FD is up to
slaveTryPartialResynchronization, after all it is only called in the
context of syncWithMaster.
This commit also makes sure that on error all the event handlers are
removed from the socket before closing it.
Talking with @oranagra we had to reason a little bit to understand if
this function could ever flush the output buffers of the wrong slaves,
having online state but actually not being ready to receive writes
before the first ACK is received from them (this happens with diskless
replication).
Next time we'll just read this comment.
Add the concept of slaves capabilities to Redis, the slave now presents
to the Redis master with a set of capabilities in the form:
REPLCONF capa SOMECAPA capa OTHERCAPA ...
This has the effect of setting slave->slave_capa with the corresponding
SLAVE_CAPA macros that the master can test later to understand if it
the slave will understand certain formats and protocols of the
replication process. This makes it much simpler to introduce new
replication capabilities in the future in a way that don't break old
slaves or masters.
This patch was designed and implemented together with Oran Agra
(@oranagra).
Our function to read a line with a timeout handles newlines as requests
to refresh the timeout, however the code kept subtracting the buffer
size left every time a newline was received, for a bug in the loop
logic. Fixed by this commit.
For PINGs we use the period configured by the user, but for the newlines
of slaves waiting for an RDB to be created (including slaves waiting for
the FULLRESYNC reply) we need to ping with frequency of 1 second, since
the timeout is fixed and needs to be refreshed.
In previous commits we moved the FULLRESYNC to the moment we start the
BGSAVE, so that the offset we provide is the right one. However this
also means that we need to re-emit the SELECT statement every time a new
slave starts to accumulate the changes.
To obtian this effect in a more clean way, the function that sends the
FULLRESYNC reply was overloaded with a more important role of also doing
this and chanigng the slave state. So it was renamed to
replicationSetupSlaveForFullResync() to better reflect what it does now.
This commit attempts to fix a bug involving PSYNC and diskless
replication (currently experimental) found by Yuval Inbar from Redis Labs
and that was later found to have even more far reaching effects (the bug also
exists when diskstore is off).
The gist of the bug is that, a Redis master replies with +FULLRESYNC to
a PSYNC attempt that fails and requires a full resynchronization.
However, the baseline offset sent along with FULLRESYNC was always the
current master replication offset. This is not ok, because there are
many reasosn that may delay the RDB file creation. And... guess what,
the master offset we communicate must be the one of the time the RDB
was created. So for example:
1) When the BGSAVE for replication is delayed since there is one
already but is not good for replication.
2) When the BGSAVE is not needed as we attach one currently ongoing.
3) When because of diskless replication the BGSAVE is delayed.
In all the above cases the PSYNC reply is wrong and the slave may
reconnect later claiming to need a wrong offset: this may cause
data curruption later.
Using chained replication where C is slave of B which is in turn slave of
A, if B reconnects the replication link with A but discovers it is no
longer possible to PSYNC, slaves of B must be disconnected and PSYNC
not allowed, since the new B dataset may be completely different after
the synchronization with the master.
Note that there are varius semantical differences in the way this is
handled now compared to the past. In the past the semantics was:
1. When a slave lost connection with its master, disconnected the chained
slaves ASAP. Which is not needed since after a successful PSYNC with the
master, the slaves can continue and don't need to resync in turn.
2. However after a failed PSYNC the replication backlog was not reset, so a
slave was able to PSYNC successfully even if the instance did a full
sync with its master, containing now an entirely different data set.
Now instead chained slaves are not disconnected when the slave lose the
connection with its master, but only when it is forced to full SYNC with
its master. This means that if the slave having chained slaves does a
successful PSYNC all its slaves can continue without troubles.
See issue #2694 for more details.
When empty strings are created, or when sdsMakeRoomFor() is called, we
are likely into an appending pattern. Use at least type 8 SDS strings
since TYPE 5 does not remember the free allocation size and requires to
call sdsMakeRoomFor() at every new piece appended.
The previos attempt to process each client at least once every ten
seconds was not a good idea, because:
1. Usually because of the past min iterations set to 50, you get much
better processing period most of the times.
2. However when there are many clients and a normal setting for
server.hz, the edge case is triggered, and waiting 10 seconds for a
BLPOP that asked for 1 second is not ok.
3. Moreover, because of the high min-itereations limit of 50, when HZ
was set to an high value, the actual behavior was to process a lot of
clients per second.
Also the function checking for timeouts called gettimeofday() at each
iteration which can be costly.
The new implementation will try to process each client once per second,
gets the current time as argument, and does not attempt to process more
than 5 clients per iteration if not needed.
So now:
1. The CPU usage of an idle Redis process is the same or better.
2. The CPU usage of a busy Redis process is the same or better.
3. However a non trivial amount of work may be performed per iteration
when there are many many clients. In this particular case the user may
want to raise the "HZ" value if needed.
Btw with 4000 clients it was still not possible to noticy any actual
latency created by processing 400 clients per second, since the work
performed for each client is pretty small.
This is an attempt to use the refcount feature of the sds.c fork
provided in the Pull Request #2509. A new type, SDS_TYPE_5 is introduced
having a one byte header with just the string length, without
information about the available additional length at the end of the
string (this means that sdsMakeRoomFor() will be required each time
we want to append something, since the string will always report to have
0 bytes available).
More work needed in order to avoid common SDS functions will pay the
cost of this type. For example both sdscatprintf() and sdscatfmt()
should try to upgrade to SDS_TYPE_8 ASAP when appending chars.
The command reports information about the hash table internal state
representing the specified database ID.
This can be used in order to investigate rehashings, memory usage issues
and for other debugging purposes.
The new return value is the number of keys existing, among the ones
specified in the command line, counting the same key multiple times if
given multiple times (and if it exists).
See PR #2667.
Rationale:
1. The commands look like internals exposed without a real strong use
case.
2. Whatever there is an use case, the client would implement the
commands client side instead of paying RTT just to use a simple to
reimplement library.
3. They add complexity to an otherwise quite straightforward API.
So for now KILLED ;-)
Instead of successive divisions in iteration the new code uses bitwise
magic to interleave / deinterleave two 32bit values into a 64bit one.
All tests still passing and is measurably faster, so worth it.