Sending a command like:
BLPOP foo foo foo foo 0
Resulted into a crash before this commit since the client ended being
inserted in the waiting list for this key multiple times.
This resulted into the function handleClientsBlockedOnLists() to fail
because we have code like that:
if (de) {
list *clients = dictGetVal(de);
int numclients = listLength(clients);
while(numclients--) {
listNode *clientnode = listFirst(clients);
/* server clients here... */
}
}
The code to serve clients used to remove the served client from the
waiting list, so if a client is blocking multiple times, eventually the
call to listFirst() will return NULL or worse will access random memory
since the list may no longer exist as it is removed by the function
unblockClientWaitingData() if there are no more clients waiting for this
list.
To avoid making the rest of the implementation more complex, this commit
modifies blockForKeys() so that a client will be put just a single time
into the waiting list for a given key.
Since it is Saturday, I hope this fixes issue #801.
SDIFF used an algorithm that was O(N) where N is the total number
of elements of all the sets involved in the operation.
The algorithm worked like that:
ALGORITHM 1:
1) For the first set, add all the members to an auxiliary set.
2) For all the other sets, remove all the members of the set from the
auxiliary set.
So it is an O(N) algorithm where N is the total number of elements in
all the sets involved in the diff operation.
Cristobal Viedma suggested to modify the algorithm to the following:
ALGORITHM 2:
1) Iterate all the elements of the first set.
2) For every element, check if the element also exists in all the other
remaining sets.
3) Add the element to the auxiliary set only if it does not exist in any
of the other sets.
The complexity of this algorithm on the worst case is O(N*M) where N is
the size of the first set and M the total number of sets involved in the
operation.
However when there are elements in common, with this algorithm we stop
the computation for a given element as long as we find a duplicated
element into another set.
I (antirez) added an additional step to algorithm 2 to make it faster,
that is to sort the set to subtract from the biggest to the
smallest, so that it is more likely to find a duplicate in a larger sets
that are checked before the smaller ones.
WHAT IS BETTER?
None of course, for instance if the first set is much larger than the
other sets the second algorithm does a lot more work compared to the
first algorithm.
Similarly if the first set is much smaller than the other sets, the
original algorithm will less work.
So this commit makes Redis able to guess the number of operations
required by each algorithm, and select the best at runtime according
to the input received.
However, since the second algorithm has better constant times and can do
less work if there are duplicated elements, an advantage is given to the
second algorithm.
The idea is to be able to identify a build in a unique way, so for
instance after a bug report we can recognize that the build is the one
of a popular Linux distribution and perform the debugging in the same
environment.
1) We no longer test location by location, otherwise the CPU write cache
completely makes our business useless.
2) We still need a memory test that operates in steps from the first to
the last location in order to never hit the cache, but that is still
able to retain the memory content.
This was tested using a Linux box containing a bad memory module with a
zingle bit error (always zero).
So the final solution does has an error propagation step that is:
1) Invert bits at every location.
2) Swap adiacent locations.
3) Swap adiacent locations again.
4) Invert bits at every location.
5) Swap adiacent locations.
6) Swap adiacent locations again.
Before and after these steps, and after step 4, a CRC64 checksum is computed.
If the three CRC64 checksums don't match, a memory error was detected.
EVALSHA used to crash if the SHA1 was not lowercase (Issue #783).
Fixed using a case insensitive dictionary type for the sha -> script
map used for replication of scripts.
After the transcation starts with a MULIT, the previous behavior was to
return an error on problems such as maxmemory limit reached. But still
to execute the transaction with the subset of queued commands on EXEC.
While it is true that the client was able to check for errors
distinguish QUEUED by an error reply, MULTI/EXEC in most client
implementations uses pipelining for speed, so all the commands and EXEC
are sent without caring about replies.
With this change:
1) EXEC fails if at least one command was not queued because of an
error. The EXECABORT error is used.
2) A generic error is always reported on EXEC.
3) The client DISCARDs the MULTI state after a failed EXEC, otherwise
pipelining multiple transactions would be basically impossible:
After a failed EXEC the next transaction would be simply queued as
the tail of the previous transaction.
We use this new bio.c feature in order to stop our I/O threads if there
is a memory test to do on crash. In this case we don't want anything
else than the main thread to run, otherwise the other threads may mess
with the heap and the memory test will report a false positive.
Finally Redis is able to report the amount of memory used by
copy-on-write while saving an RDB or writing an AOF file in background.
Note that this information is currently only logged (at NOTICE level)
and not shown in INFO because this is less trivial (but surely doable
with some minor form of interprocess communication).
The reason we can't capture this information on the parent before we
call wait3() is that the Linux kernel will release the child memory
ASAP, and only retain the minimal state for the process that is useful
to report the child termination to the parent.
The COW size is obtained by summing all the Private_Dirty fields found
in the "smap" file inside the proc filesystem for the process.
All this is Linux specific and is not available on other systems.
Now that we cache connections, a retry attempt makes sure that the
operation don't fail just because there is an existing connection error
on the socket, like the other end closing the connection.
Unfortunately this condition is not detectable using
getsockopt(SO_ERROR), so the only option left is to retry.
We don't retry on timeouts.
The previous behavior was to return -1 if:
1) Existing key but without an expire set.
2) Non existing key.
Now the second case is handled in a different, and TTL will return -2
if the key does not exist at all.
PTTL follows the same behavior as well.
By caching TCP connections used by MIGRATE to chat with other Redis
instances a 5x performance improvement was measured with
redis-benchmark against small keys.
This can dramatically speedup cluster resharding and other processes
where an high load of MIGRATE commands are used.
With COPY now MIGRATE does not remove the key from the source instance.
With REPLACE it uses RESTORE REPLACE on the target host so that even if
the key already eixsts in the target instance it will be overwritten.
The options can be used together.
The REPLACE option deletes an existing key with the same name (if any)
and materializes the new one. The default behavior without RESTORE is to
return an error if a key already exists.
So instead to reply with a generic error like:
-ERR ... wrong kind of value ...
now it replies with:
-WRONGTYPE ... wrong kind of value ...
This makes this particular error easy to check without resorting to
(fragile) pattern matching of the error string (however the error string
used to be consistent already).
Client libraries should return a specific exeption type for this error.
Most of the commit is about fixing unit tests.
After the wait3() syscall we used to do something like that:
if (pid == server.rdb_child_pid) {
backgroundSaveDoneHandler(exitcode,bysignal);
} else {
....
}
So the AOF rewrite was handled in the else branch without actually
checking if the pid really matches. This commit makes the check explicit
and logs at WARNING level if the pid returned by wait3() does not match
neither the RDB or AOF rewrite child.
Because of the short circuit behavior of && inverting the two sides of
the if expression avoids an hash table lookup if the non-EX variant of
SET is called.
Thanks to Weibin Yao (@yaoweibin on github) for spotting this.
(Commit message from @antirez as it was missign in the original commits,
also the patch was modified a bit to still work with 2.4 dumps and to
avoid if expressions that are always true due to checked types range)
This commit changes redis-check-dump to account for new encodings and
for the new MSTIME expire format. It also refactors the test for valid
type into a function.
The code is still compatible with Redis 2.4 generated dumps.
This fixes issue #709.
In some system, notably osx, the 3.5 GB limit was too far and not able
to prevent a crash for out of memory. The 3 GB limit works better and it
is still a lot of memory within a 4 GB theorical limit so it's not going
to bore anyone :-)
This fixes issue #711
When calling SCRIPT KILL currently you can get two errors:
* No script in timeout (busy) state.
* The script already performed a write.
It is useful to be able to distinguish the two errors, but right now both
start with "ERR" prefix, so string matching (that is fragile) must be used.
This commit introduces two different prefixes.
-NOTBUSY and -UNKILLABLE respectively to reply with an error when no
script is busy at the moment, and when the script already executed a
write operation and can not be killed.
Before of this commit it used to be like this:
MULTI
EXEC
... actual commands of the transaction ...
Because after all that is the natural order of things. Transaction
commands are queued and executed *only after* EXEC is called.
However this makes debugging with MONITOR a mess, so the code was
modified to provide a coherent output.
What happens is that MULTI is rendered in the MONITOR output as far as
possible, instead EXEC is propagated only after the transaction is
executed, or even in the case it fails because of WATCH, so in this case
you'll simply see:
MULTI
EXEC
An empty transaction.
If the server is password protected we need to accept AUTH when there is
a server busy (-BUSY) condition, otherwise it will be impossible to send
SHUTDOWN NOSAVE or SCRIPT KILL.
This fixes issue #708.
The code of current implementation:
if (c->pending == 0) clientDone(c);
In clientDone function, the c's memory has been freed, then the loop will continue: while(c->pending). The memory of c has been freed now, so c->pending is invalid (c is an invalid pointer now), and this will cause memory dump in some platforams(eg: Solaris).
So I think the code should be modified as:
if (c->pending == 0)
{
clientDone(c);
break;
}
and this will not lead to while(c->pending).
The previously used hash function, djbhash, is not secure against
collision attacks even when the seed is randomized as there are simple
ways to find seed-independent collisions.
The new hash function appears to be safe (or much harder to exploit at
least) in this case, and has better distribution.
Better distribution does not always means that's better. For instance in
a fast benchmark with "DEBUG POPULATE 1000000" I obtained the following
results:
1.6 seconds with djbhash
2.0 seconds with murmurhash2
This is due to the fact that djbhash will hash objects that follow the
pattern `prefix:<id>` and where the id is numerically near, to near
buckets. This improves the locality.
However in other access patterns with keys that have no relation
murmurhash2 has some (apparently minimal) speed advantage.
On the other hand a better distribution should significantly
improve the quality of the distribution of elements returned with
dictGetRandomKey() that is used in SPOP, SRANDMEMBER, RANDOMKEY, and
other commands.
Everything considered, and under the suspect that this commit fixes a
security issue in Redis, we are switching to the new hash function.
If some serious speed regression will be found in the future we'll be able
to step back easiliy.
This commit fixes issue #663.