When aof-load-truncated option was introduced, with a default of "yes",
the past behavior of the server to abort with trunncated AOF changed, so
we need to explicitly configure the tests to abort with truncated AOF
by setting the option to no.
Previously, "MOVE key somestring" would move the key to
DB 0 which is just unexpected and wrong.
String as DB == error.
Test added too.
Modified by @antirez in order to use the getLongLongFromObject() API
instead of strtol().
Fixes#1428
Also adds test for numsub — due to tcl being tcl,
it doesn't capture the "numberness" of the fix,
but now we at least have one test case for numsub.
Closes#1561
We only want to use the last STORE key, but we have to record
we actually found a STORE key so we can increment the final return
key count.
Test added to prevent further regression.
Closes#1883, #1645, #1647
Previously the end was casted to a smaller type
which resulted in a wrong check and failed
with values larger than handled by unsigned.
Closes#1847, #1844
In the test we use WAIT when the master and slave are up, and only later the
partition is created killing the master, so we are sure we don't incur
in failure modes that may lose writes in this test: the goal here is to
make sure that the elected slave was replicating correctly with the
master.
In the initialization test for each instance we used to unregister the
old master and register it again to clear the config.
However there is a race condition doing this: as soon as we unregister
and re-register "mymaster", another Sentinel can update the new
configuration with the old state because of gossip "hello" messages.
So the correct procedure is instead, unregister "mymaster" from all the
sentinel instances, and re-register it everywhere again.
Lua scripts are executed in the context of the currently selected
database (as selected by the caller of the script).
However Lua scripts are also free to use the SELECT command in order to
affect other DBs. When SELECT is called frm Lua, the old behavior, before
this commit, was to automatically set the Lua caller selected DB to the
last DB selected by Lua. See for example the following sequence of
commands:
SELECT 0
SET x 10
EVAL "redis.call('select','1')" 0
SET x 20
Before this commit after the execution of this sequence of commands,
we'll have x=10 in DB 0, and x=20 in DB 1.
Because of the problem above, there was a bug affecting replication of
Lua scripts, because of the actual implementation of replication. It was
possible to fix the implementation of Lua scripts in order to fix the
issue, but looking closely, the bug is the consequence of the behavior
of Lua ability to set the caller's DB.
Under the old semantics, a script selecting a different DB, has no simple
ways to restore the state and select back the previously selected DB.
Moreover the script auhtor must remember that the restore is needed,
otherwise the new commands executed by the caller, will be executed in
the context of a different DB.
So this commit fixes both the replication issue, and this hard-to-use
semantics, by removing the ability of Lua, after the script execution,
to force the caller to switch to the DB selected by the Lua script.
The new behavior of the previous sequence of commadns is to just set
X=20 in DB 0. However Lua scripts are still capable of writing / reading
from different DBs if needed.
WARNING: This is a semantical change that will break programs that are
conceived to select the client selected DB via Lua scripts.
This fixes issue #1811.
The new check-for-number behavior of Lua arguments broke
users who use large strings of just integers.
The Lua number check would convert the string to a number, but
that breaks user data because
Lua numbers have limited precision compared to an arbitrarily
precise number wrapped in a string.
Regression fixed and new test added.
Fixes#1118 again.
FLUSHALL will fail on read-only slaves, but there the command is not
needed in order to reset the instance with CLUSTER RESET so errors can
be ignored.
Previously the PID format was:
[PID] Timestamp
But it recently changed to:
PID:X Timestamp
The tcl testing framework was grabbing the PID from \[\d+\], but
that's not valid anymore.
Now we grab the pid from "PID: <PID>" in the part of Redis startup
output to the right of the ASCII logo.
The bug was triggered by running the test with Valgrind (which is a lot
slower and more sensible to timing issues) after the recent changes
that made Redis more promptly able to reply with the -LOADING error.
Behrad Zari discovered [1] and Josiah reported [2]: if you block
and wait for a list to exist, but the list creates from
a non-push command, the blocked client never gets notified.
This commit adds notification of blocked clients into
the DB layer and away from individual commands.
Lists can be created by [LR]PUSH, SORT..STORE, RENAME, MOVE,
and RESTORE. Previously, blocked client notifications were
only triggered by [LR]PUSH. Your client would never get
notified if a list were created by SORT..STORE or RENAME or
a RESTORE, etc.
Blocked client notification now happens in one unified place:
- dbAdd() triggers notification when adding a list to the DB
Two new tests are added that fail prior to this commit.
All test pass.
Fixes#1668
[1]: https://groups.google.com/forum/#!topic/redis-db/k4oWfMkN1NU
[2]: #1668
Better handling of connection errors in order to update the table and
recovery, populate the startup nodes table after fetching the list of
nodes.
More work to do about it, it is still not as reliable as
redis-rb-cluster implementation which is the minimal reference
implementation for Redis Cluster clients.
SPOP, tested in the new test, is among the commands rewritng the
client->argv argument vector (it gets rewritten as SREM) for command
replication purposes.
Because of recent optimizations to client->argv caching in the context
of the Lua internal Redis client, it is important to test for SPOP to be
callable from Lua without bad effects to the other commands.
Sometimes the process is still there but no longer in a state that can
be checked (after being killed). This used to happen after a call to
SHUTDOWN NOSAVE in the scripting unit, causing a false positive.
This makes tests a bit slower, but it is better to test things at a
decent scale instead of using just a few nodes, and for a few tests we
actually need so many nodes.
The test now runs in a self-contained directory.
The general abstractions to run the tests in an environment where
mutliple instances are executed at the same time was extrapolated into
instances.tcl, that will be reused to test Redis Cluster.
This commit sets the failover timeout to 30 seconds instead of the 180
seconds default, and allows to reconfigure multiple slaves at the same
time.
This makes tests less sensible to timing, with the result that there are
less false positives due to normal behaviors that require time to
succeed or to be retried.
However the long term solution is probably some way in order to detect
when a test failed because of timing issues (for example split brain
during leader election) and retry it.
It was verified in practice that this test is able to stress much more
the implementation by introducing errors that were only trivially to
detect with different offsets but impossible to detect starting always
at zero and counting bits the full length of the string.
An unit can abort in the middle for an error. The next unit should not
assume that the instances are in a clean state, and must restart what
was left killed.
Sentinel tests are designed to be dependent on the previous tests in the
same unit, so usually we can't continue with the next test in the same
unit if a previous test failed.
The test was previously performed by removing the master from the
Sentinel monitored masters. The test with the Sentinels crashed is
more similar to real-world partitions / failures.
The area a number of mandatory tests to craete a stable setup for
testing that is not too sensitive to timing issues. All those tests
moved to includes/init-tests, and marked as (init).
It is now possible to kill and restart sentinel or redis instances for
more real-world testing.
The 01 unit tests the capability of Sentinel to update the configuration
of Sentinels rejoining the cluster, however the test is pretty trivial
and more tests should be added.
Some inline test moved into server_is_up procedure.
Also find_available_port was moved into util since it is going
to be used for the Sentinel test as well.
The Redis test uses a server-clients model in order to parallelize the
execution of different tests. However in recent versions of osx not
setting the channel to a binary encoding caused issues even if AFAIK no
binary data is really sent via this channel. However now the channels
are deliberately set to a binary encoding and this solves the issue.
The exact issue was the test not terminating and giving the impression
of running forever, since test clients or servers were unable to
exchange the messages to continue.
When the test is executed using the root account, setting the permission
to 222 does not work as expected, as root can read files with 222
permission.
Now we skip the test if root is detected.
This fixes issue #1034 and the duplicated #1040 issue.
Thanks to Jan-Erik Rediger (@badboy on Github) for finding a way to reproduce the issue.
When keyspace events are enabled, the overhead is not sever but
noticeable, so this commit introduces the ability to select subclasses
of events in order to avoid to generate events the user is not
interested in.
The events can be selected using redis.conf or CONFIG SET / GET.
UNSUBSCRIBE and PUNSUBSCRIBE commands are designed to mass-unsubscribe
the client respectively all the channels and patters if called without
arguments.
However when these functions are called without arguments, but there are
no channels or patters we are subscribed to, the old behavior was to
don't reply at all.
This behavior is broken, as every command should always reply.
Also it is possible that we are no longer subscribed to a channels but we
are subscribed to patters or the other way around, and the client should
be notified with the correct number of subscriptions.
Also it is not pretty that sometimes we did not receive a reply at all
in a redis-cli session from these commands, blocking redis-cli trying
to read the reply.
This fixes issue #714.
The Redis Slow Log always used to log the slow commands executed inside
a MULTI/EXEC block. However also EXEC was logged at the end, which is
perfectly useless.
Now EXEC is no longer logged and a test was added to test this behavior.
This fixes issue #759.
Due to changes in recent releases of osx leaks utility, the osx leak
detection no longer worked. Now it is fixed in a way that should be
backward compatible.
SDIFF used an algorithm that was O(N) where N is the total number
of elements of all the sets involved in the operation.
The algorithm worked like that:
ALGORITHM 1:
1) For the first set, add all the members to an auxiliary set.
2) For all the other sets, remove all the members of the set from the
auxiliary set.
So it is an O(N) algorithm where N is the total number of elements in
all the sets involved in the diff operation.
Cristobal Viedma suggested to modify the algorithm to the following:
ALGORITHM 2:
1) Iterate all the elements of the first set.
2) For every element, check if the element also exists in all the other
remaining sets.
3) Add the element to the auxiliary set only if it does not exist in any
of the other sets.
The complexity of this algorithm on the worst case is O(N*M) where N is
the size of the first set and M the total number of sets involved in the
operation.
However when there are elements in common, with this algorithm we stop
the computation for a given element as long as we find a duplicated
element into another set.
I (antirez) added an additional step to algorithm 2 to make it faster,
that is to sort the set to subtract from the biggest to the
smallest, so that it is more likely to find a duplicate in a larger sets
that are checked before the smaller ones.
WHAT IS BETTER?
None of course, for instance if the first set is much larger than the
other sets the second algorithm does a lot more work compared to the
first algorithm.
Similarly if the first set is much smaller than the other sets, the
original algorithm will less work.
So this commit makes Redis able to guess the number of operations
required by each algorithm, and select the best at runtime according
to the input received.
However, since the second algorithm has better constant times and can do
less work if there are duplicated elements, an advantage is given to the
second algorithm.
EVALSHA used to crash if the SHA1 was not lowercase (Issue #783).
Fixed using a case insensitive dictionary type for the sha -> script
map used for replication of scripts.
The previous behavior was to return -1 if:
1) Existing key but without an expire set.
2) Non existing key.
Now the second case is handled in a different, and TTL will return -2
if the key does not exist at all.
PTTL follows the same behavior as well.
With COPY now MIGRATE does not remove the key from the source instance.
With REPLACE it uses RESTORE REPLACE on the target host so that even if
the key already eixsts in the target instance it will be overwritten.
The options can be used together.
The REPLACE option deletes an existing key with the same name (if any)
and materializes the new one. The default behavior without RESTORE is to
return an error if a key already exists.
So instead to reply with a generic error like:
-ERR ... wrong kind of value ...
now it replies with:
-WRONGTYPE ... wrong kind of value ...
This makes this particular error easy to check without resorting to
(fragile) pattern matching of the error string (however the error string
used to be consistent already).
Client libraries should return a specific exeption type for this error.
Most of the commit is about fixing unit tests.
When calling SCRIPT KILL currently you can get two errors:
* No script in timeout (busy) state.
* The script already performed a write.
It is useful to be able to distinguish the two errors, but right now both
start with "ERR" prefix, so string matching (that is fragile) must be used.
This commit introduces two different prefixes.
-NOTBUSY and -UNKILLABLE respectively to reply with an error when no
script is busy at the moment, and when the script already executed a
write operation and can not be killed.
When SORT is called with the option BY set to a string constant not
inclduing the wildcard character "*", there is no way to sort the output
so any ordering is valid. This allows the SORT internals to optimize its
work and don't really sort the output at all.
However it was odd that this option was not able to retain the natural
order of a sorted set. This feature was requested by users multiple
times as sometimes to call SORT with GET against sorted sets as a way to
mass-fetch objects can be handy.
This commit introduces two things:
1) The ability of SORT to return sorted sets elements in their natural
ordering when `BY nosort` is specified, accordingly to `DESC / ASC` options.
2) The ability of SORT to optimize this case further if LIMIT is passed
as well, avoiding to really fetch the whole sorted set, but directly
obtaining the specified range.
Because in this case the sorting is always deterministic, no
post-sorting activity is performed when SORT is called from a Lua
script.
This commit fixes issue #98.
Lua arrays can't contain nil elements (see
http://www.lua.org/pil/19.1.html for more information), so Lua scripts
were not able to return a multi-bulk reply containing nil bulk
elements inside.
This commit introduces a special conversion: a table with just
a "nilbulk" field set to a boolean value is converted by Redis as a nil
bulk reply, but at the same time for Lua this type is not a "nil" so can
be used inside Lua arrays.
This type is also assigned to redis.NIL, so the following two forms
are equivalent and will be able to return a nil bulk reply as second
element of a three elements array:
EVAL "return {1,redis.NIL,3}" 0
EVAL "return {1,{nilbulk=true},3}" 0
The result in redis-cli will be:
1) (integer) 1
2) (nil)
3) (integer) 3
Redis provides support for blocking operations such as BLPOP or BRPOP.
This operations are identical to normal LPOP and RPOP operations as long
as there are elements in the target list, but if the list is empty they
block waiting for new data to arrive to the list.
All the clients blocked waiting for th same list are served in a FIFO
way, so the first that blocked is the first to be served when there is
more data pushed by another client into the list.
The previous implementation of blocking operations was conceived to
serve clients in the context of push operations. For for instance:
1) There is a client "A" blocked on list "foo".
2) The client "B" performs `LPUSH foo somevalue`.
3) The client "A" is served in the context of the "B" LPUSH,
synchronously.
Processing things in a synchronous way was useful as if "A" pushes a
value that is served by "B", from the point of view of the database is a
NOP (no operation) thing, that is, nothing is replicated, nothing is
written in the AOF file, and so forth.
However later we implemented two things:
1) Variadic LPUSH that could add multiple values to a list in the
context of a single call.
2) BRPOPLPUSH that was a version of BRPOP that also provided a "PUSH"
side effect when receiving data.
This forced us to make the synchronous implementation more complex. If
client "B" is waiting for data, and "A" pushes three elemnents in a
single call, we needed to propagate an LPUSH with a missing argument
in the AOF and replication link. We also needed to make sure to
replicate the LPUSH side of BRPOPLPUSH, but only if in turn did not
happened to serve another blocking client into another list ;)
This were complex but with a few of mutually recursive functions
everything worked as expected... until one day we introduced scripting
in Redis.
Scripting + synchronous blocking operations = Issue #614.
Basically you can't "rewrite" a script to have just a partial effect on
the replicas and AOF file if the script happened to serve a few blocked
clients.
The solution to all this problems, implemented by this commit, is to
change the way we serve blocked clients. Instead of serving the blocked
clients synchronously, in the context of the command performing the PUSH
operation, it is now an asynchronous and iterative process:
1) If a key that has clients blocked waiting for data is the subject of
a list push operation, We simply mark keys as "ready" and put it into a
queue.
2) Every command pushing stuff on lists, as a variadic LPUSH, a script,
or whatever it is, is replicated verbatim without any rewriting.
3) Every time a Redis command, a MULTI/EXEC block, or a script,
completed its execution, we run the list of keys ready to serve blocked
clients (as more data arrived), and process this list serving the
blocked clients.
4) As a result of "3" maybe more keys are ready again for other clients
(as a result of BRPOPLPUSH we may have push operations), so we iterate
back to step "3" if it's needed.
The new code has a much simpler semantics, and a simpler to understand
implementation, with the disadvantage of not being able to "optmize out"
a PUSH+BPOP as a No OP.
This commit will be tested with care before the final merge, more tests
will be added likely.
Bug #582 was not present in 32 bit builds of Redis as
getObjectFromLong() will return an error for overflow.
This commit makes sure that the test does not fail because of the error
returned when running against 32 bit builds.
remove unsafe and unnecessary cast.
until now, this cast may lead segmentation fault when end > UINT_MAX
setbit foo 0 1
bitcount 0 4294967295
=> ok
bitcount 0 4294967296
=> cause segmentation fault.
Note by @antirez: the commit was modified a bit to also change the
string length type to long, since it's guaranteed to be at max 512 MB in
size, so we can work with the same type across all the code path.
A regression test was also added.
SORT is able to return (faster than when ordering) unordered output if
the "BY" clause is used with a constant value. However we try to play
well with scripting requirements of determinism providing always sorted
outputs when SORT (and other similar commands) are called by Lua
scripts.
However we used the general mechanism in place in scripting in order to
reorder SORT output, that is, if the command has the "S" flag set, the
Lua scripting engine will take an additional step when converting a
multi bulk reply to Lua value, calling a Lua sorting function.
This is suboptimal as we can do it faster inside SORT itself.
This is also broken as issue #545 shows us: basically when SORT is used
with a constant BY, and additionally also GET is used, the Lua scripting
engine was trying to order the output as a flat array, while it was
actually a list of key-value pairs.
What we do know is to recognized if the caller of SORT is the Lua client
(since we can check this using the REDIS_LUA_CLIENT flag). If so, and if
a "don't sort" condition is triggered by the BY option with a constant
string, we force the lexicographical sorting.
This commit fixes this bug and improves the performance, and at the same
time simplifies the implementation. This does not mean I'm smart today,
it means I was stupid when I committed the original implementation ;)
Redis used to crash with a call like the following:
EVAL "redis.call()" 0
Now the explicit check for at least one argument prevents the problem.
This commit fixes issue #655.