Commit Graph

2771 Commits

Author SHA1 Message Date
Jeremy Zawodny
78b606acc2 comment fix
improve English a bit. :-)
2012-07-28 20:54:26 +02:00
antirez
ce7b838fb9 Sentinel: don't start a failover as leader if there is no good slave. 2012-07-26 12:09:40 +02:00
antirez
baace5fc42 Sentinel: ability to execute notification scripts. 2012-07-25 16:33:37 +02:00
Salvatore Sanfilippo
42c571864e Merge pull request #603 from mrb/fix_sentinel_config_warning
Fix warning in redis.c for sentinel config load
2012-07-25 07:15:53 -07:00
Salvatore Sanfilippo
7c15dd1596 Merge pull request #602 from mrb/sentinel_docs
Some cleanup in sentinel.conf
2012-07-25 07:15:02 -07:00
mrb
f1c8661e74 Fix warning in redis.c for sentinel config load 2012-07-25 09:55:53 -04:00
mrb
23023fc626 Some cleanup in sentinel.conf 2012-07-25 09:52:26 -04:00
antirez
672102c2ce Sentinel: abort failover if no good slave is available.
The previous behavior of the state machine was to wait some time and
retry the slave selection, but this is not robust enough against drastic
changes in the conditions of the monitored instances.

What we do now when the slave selection fails is to abort the failover
and return back monitoring the master. If the ODOWN condition is still
present a new failover will be triggered and so forth.

This commit also refactors the code we use to abort a failover.
2012-07-25 11:32:19 +02:00
antirez
9e5bef38e6 Sentinel: reset pending_commands in a more generic way. 2012-07-24 18:57:26 +02:00
antirez
a23a5b6c7d Prevent a spurious +sdown event on switch.
When we reset the master we should start with clean timestamps for ping
replies otherwise we'll detect a spurious +sdown event, because on
+master-switch event the previous master instance was probably in +sdown
condition. Since we updated the address we should count time from
scratch again.

Also this commit makes sure to explicitly reset the count of pending
commands, now we can do this because of the new way the hiredis link
is closed.
2012-07-24 18:46:04 +02:00
antirez
d918e6f127 Sentinel: debugging message removed. 2012-07-24 18:20:05 +02:00
antirez
75fb6e5b8a Sentinel: changes to connection handling and redirection.
We disconnect the Redis instances hiredis link in a more robust way now.
Also we change the way we perform the redirection for the +switch-master
event, that is not just an instance reset with an address change.

Using the same system we now implement the +redirect-to-master event
that is triggered by an instance that is configured to be master but
found to be a slave at the first INFO reply. In that case we monitor the
master instead, logging the incident as an event.
2012-07-24 18:15:44 +02:00
antirez
2179c26916 Sentinel: check that instance still exists in reply callbacks.
We can't be sure the instance object still exists when the reply
callback is called.
2012-07-24 16:37:57 +02:00
antirez
d876d6feac Sentinel: more robust failover detection as observer.
Sentinel observers detect failover checking if a slave attached to the
monitored master turns into its replication state from slave to master.
However while this change may in theory only happen after a SLAVEOF NO
ONE command, in practie it is very easy to reboot a slave instance with
a wrong configuration that turns it into a master, especially if it was
a past master before a successfull failover.

This commit changes the detection policy so that if an instance goes
from slave to master, but at the same time the runid has changed, we
sense a reboot, and in that case we don't detect a failover at all.

This commit also introduces the "reboot" sentinel event, that is logged
at "warning" level (so this will trigger an admin notification).

The commit also fixes a problem in the disconnect handler that assumed
that the instance object always existed, that is not the case. Now we
no longer assume that redisAsyncFree() will call the disconnection
handler before returning.
2012-07-24 12:42:40 +02:00
antirez
1bbdf1709f Fixed an error in the example sentinel.conf. 2012-07-23 15:08:36 +02:00
antirez
be2be3d903 Typo. 2012-07-23 15:06:55 +02:00
antirez
6b5daa2df2 First implementation of Redis Sentinel.
This commit implements the first, beta quality implementation of Redis
Sentinel, a distributed monitoring system for Redis with notification
and automatic failover capabilities.

More info at http://redis.io/topics/sentinel
2012-07-23 13:14:44 +02:00
antirez
03f412ddef Merge remote-tracking branch 'origin/unstable' into unstable 2012-07-22 17:18:42 +02:00
antirez
5d73073f6e Allow Pub/Sub in contexts where other commands are blocked.
Redis loading data from disk, and a Redis slave disconnected from its
master with serve-stale-data disabled, are two conditions where
commands are normally refused by Redis, returning an error.

However there is no reason to disable Pub/Sub commands as well, given
that this layer does not interact with the dataset. To allow Pub/Sub in
as many contexts as possible is especially interesting now that Redis
Sentinel uses Pub/Sub of a Redis master as a communication channel
between Sentinels.

This commit allows Pub/Sub to be used in the above two contexts where
it was previously denied.
2012-07-22 17:18:16 +02:00
Salvatore Sanfilippo
cddf5946fc Merge pull request #593 from steevel/unstable
Check that we have connection before enabling pipe mode
2012-07-21 11:12:35 -07:00
antirez
b62bdf1c64 Don't assume that "char" is signed.
For the C standard char can be either signed or unsigned, it's up to the
compiler, but Redis assumed that it was signed in a few places.

The practical effect of this patch is that now Redis 2.6 will run
correctly in every system where char is unsigned, notably the RaspBerry
PI and other ARM systems with GCC.

Thanks to Georgi Marinov (@eesn on twitter) that reported the problem
and allowed me to use his RaspBerry via SSH to trace and fix the issue!
2012-07-18 12:04:58 +02:00
Steeve Lennmark
e9828cb6f7 Check that we have connection before enabling pipe mode 2012-07-15 14:35:02 +02:00
Salvatore Sanfilippo
a2db8e4801 Merge pull request #569 from jokea/unstable
mark fd as writable when EPOLLERR or EPOLLHUP is returned by epoll_wait.
2012-07-09 03:14:08 -07:00
jokea
93b0075d33 mark fd as writable when EPOLLERR or EPOLLHUP is returned by epoll_wait. 2012-06-29 12:06:38 +08:00
antirez
36def8fd9a Typo in comment. 2012-06-27 11:26:44 +02:00
antirez
3a32897856 REPLCONF internal command introduced.
The REPLCONF command is an internal command (not designed to be directly
used by normal clients) that allows a slave to set some replication
related state in the master before issuing SYNC to start the
replication.

The initial motivation for this command, and the only reason currently
it is used by the implementation, is to let the slave instance
communicate its listening port to the slave, so that the master can
show all the slaves with their listening ports in the "replication"
section of the INFO output.

This allows clients to auto discover and query all the slaves attached
into a master.

Currently only a single option of the REPLCONF command is supported, and
it is called "listening-port", so the slave now starts the replication
process with something like the following chat:

    REPLCONF listening-prot 6380
    SYNC

Note that this works even if the master is an older version of Redis and
does not understand REPLCONF, because the slave ignores the REPLCONF
error.

In the future REPLCONF can be used for partial replication and other
replication related features where there is the need to exchange
information between master and slave.

NOTE: This commit also fixes a bug: the INFO outout already carried
information about slaves, but the port was broken, and was obtained
with getpeername(2), so it was actually just the ephemeral port used
by the slave to connect to the master as a client.
2012-06-27 09:43:57 +02:00
antirez
5410168c6e Fixed comment typo into time_independent_strcmp(). 2012-06-21 14:25:53 +02:00
antirez
31a1439bfd Fixed a timing attack on AUTH (Issue #560).
The way we compared the authentication password using strcmp() allowed
an attacker to gain information about the password using a well known
class of attacks called "timing attacks".

The bug appears to be practically not exploitable in most modern systems
running Redis since even using multiple bytes of differences in the
input at a time instead of one the difference in running time in in the
order of 10 nanoseconds, making it hard to exploit even on LAN. However
attacks always get better so we are providing a fix ASAP.

The new implementation uses two fixed length buffers and a constant time
comparison function, with the goal of:

1) Completely avoid leaking information about the content of the
password, since the comparison is always performed between 512
characters and without conditionals.
2) Partially avoid leaking information about the length of the
password.

About "2" we still have a stage in the code where the real password and
the user provided password are copied in the static buffers, we also run
two strlen() operations against the two inputs, so the running time
of the comparison is a fixed amount plus a time proportional to
LENGTH(A)+LENGTH(B). This means that the absolute time of the operation
performed is still related to the length of the password in some way,
but there is no way to change the input in order to get a difference in
the execution time in the comparison that is not just proportional to
the string provided by the user (because the password length is fixed).

Thus in practical terms the user should try to discover LENGTH(PASSWORD)
looking at the whole execution time of the AUTH command and trying to
guess a proportionality between the whole execution time and the
password length: this appears to be mostly unfeasible in the real world.

Also protecting from this attack is not very useful in the case of Redis
as a brute force attack is anyway feasible if the password is too short,
while with a long password makes it not an issue that the attacker knows
the length.
2012-06-21 11:50:01 +02:00
antirez
5b63ccce6c Fix c->reply_bytes computation in setDeferredMultiBulkLength()
In order to implement reply buffer limits introduced in 2.6 and useful
to close the connection under user-selected circumastances of big output
buffers (for instance slow consumers in pub/sub, a blocked slave, and so
forth) Redis takes a counter with the amount of used memory in objects
inside the output list stored into c->reply.

The computation was broken in the function setDeferredMultiBulkLength(),
in the case the object was glued with the next one. This caused the
c->reply_bytes field to go out of sync, be subtracted more than needed,
and wrap back near to ULONG_MAX values.

This commit fixes this bug and adds an assertion that is able to trap
this class of problems.

This problem was discovered looking at the INFO output of an unrelated
issue (issue #547).
2012-06-15 10:03:25 +02:00
antirez
ba779119b8 ziplistFind(): don't assume that entries are comparable by encoding.
Because Redis 2.6 introduced new integer encodings it is no longer true
that if two entries have a different encoding they are not equal.

An old ziplist can be loaded from an RDB file generated with Redis 2.4,
in this case for instance a small unsigned integers is encoded with a
16 bit encoding, while in Redis 2.6 a more specific 8 bit encoding
format is used.

Because of this bug hashes ended with duplicated values or fields lookup
failed, causing many bad behaviors.
This in turn caused a crash while converting the ziplist encoded hash into
a real hash table because an assertion was raised on duplicated elements.

This commit fixes issue #547.

Many thanks to Pinterest's Marty Weiner and colleagues for discovering
the problem and helping us in the debugging process.
2012-06-14 16:01:27 +02:00
Salvatore Sanfilippo
96b8ff3760 Merge pull request #552 from tnm/unstable
Standardize punctuation in redis-cli help.
2012-06-13 01:25:52 -07:00
Ted Nyman
d665dd0865 Standardize punctuation in redis-cli help.
Right there is a mix of help entries ending with periods or
without periods. This standardizes the end of command as without
periods, which seems to be the general custom in most unix tools,
at least.
2012-06-12 22:35:00 -07:00
antirez
84d9ef4f31 Added a new hash fuzzy tester.
The new fuzzy tester also removes elements from the hash instead of just
adding random fields. This should increase the probability to find bugs
in the implementations of the hash type internal representations.
2012-06-12 15:21:54 +02:00
antirez
ee789e157c Dump ziplist hex value on failed assertion.
The ziplist -> hashtable conversion code is triggered every time an hash
value must be promoted to a full hash table because the number or size of
elements reached the threshold.

If a problem in the ziplist causes the same field to be present
multiple times, the assertion of successful addition of the element
inside the hash table will fail, crashing server with a failed
assertion, but providing little information about the problem.

This code adds a new logging function to perform the hex dump of binary
data, and makes sure that the ziplist -> hashtable conversion code uses
this new logging facility to dump the content of the ziplist when the
assertion fails.

This change was originally made in order to investigate issue #547.
2012-06-12 00:41:48 +02:00
antirez
c0de45924c New test: hash ziplist -> hashtable encoding conversion.
A new stress test was added to stress test the code converting a ziplist
into an hash table.

In this commit also randomValue helper function was modified to also
return negative values.
2012-06-11 15:19:46 +02:00
antirez
80e808b6d6 EVAL replication test: less false positives.
wait_for_condition is now used instead of the usual "after 1000" (that
is the way to sleep in Tcl). This should avoid to find the replica in
a state where it is loading the RDB in memory, returning -LOADING error.

This test used to fail when running the test over valgrind, due to the
added latencies.
2012-06-02 23:29:57 +02:00
Alex Mitrofanov
51857c7e5c Fixed RESTORE hash failure (Issue #532)
(additional commit notes by antirez@gmail.com):

The rdbIsObjectType() macro was not updated when the new RDB object type
of ziplist encoded hashes was added.

As a result RESTORE, that uses rdbLoadObjectType(), failed when a
ziplist encoded hash was loaded.
This does not affected normal RDB loading because in that case we use
the lower-level function rdbLoadType().

The commit also adds a regression test.
2012-06-02 10:24:27 +02:00
antirez
c7a25200e2 RDB type loading functions clarified in comments.
Improved comments to make clear that rdbLoadType() just loads a
general TYPE in the context of RDB that can be an object type or an
expire type, end-of-file, and so forth.

While rdbLoadObjectType() enforces that the type is a valid Object Type
otherwise it returns -1.
2012-06-02 10:21:57 +02:00
antirez
1419406e8d BITOP bug when called against non existing keys fixed.
In the issue #529 an user reported a bug that can be triggered with the
following code:

flushdb
set a
"\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
bitop or x a b

The bug was introduced with the speed optimization in commit 8bbc076
that specializes every BITOP operation loop up to the minimum length of
the input strings.

However the computation of the minimum length contained an error when a
non existing key was present in the input, after a key that was non zero
length.

This commit fixes the bug and adds a regression test for it.
2012-05-31 21:52:47 +02:00
antirez
bc70b8e5f4 Tests modified to account for INFO fields renaming.
Commit 33e1db36fa modified the name of a
few INFO fields. This commit changes the Redis test to account for this
changes.
2012-05-25 15:20:59 +02:00
antirez
33e1db36fa Four new persistence fields in INFO. A few renamed.
The 'persistence' section of INFO output now contains additional four
fields related to RDB and AOF persistence:

 rdb_last_bgsave_time_sec       Duration of latest BGSAVE in sec.
 rdb_current_bgsave_time_sec    Duration of current BGSAVE in sec.
 aof_last_rewrite_time_sec      Duration of latest AOF rewrite in sec.
 aof_current_rewrite_time_sec   Duration of current AOF rewrite in sec.

The 'current' fields are set to -1 if a BGSAVE / AOF rewrite is not in
progress. The 'last' fileds are set to -1 if no previous BGSAVE / AOF
rewrites were performed.

Additionally a few fields in the persistence section were renamed for
consistency:

 changes_since_last_save -> rdb_changes_since_last_save
 bgsave_in_progress -> rdb_bgsave_in_progress
 last_save_time -> rdb_last_save_time
 last_bgsave_status -> rdb_last_bgsave_status
 bgrewriteaof_in_progress -> aof_rewrite_in_progress
 bgrewriteaof_scheduled -> aof_rewrite_scheduled

After the renaming, fields in the persistence section start with rdb_ or
aof_ prefix depending on the persistence method they describe.
The field 'loading' and related fields are not prefixed because they are
unique for both the persistence methods.
2012-05-25 12:11:30 +02:00
antirez
d866803818 BITOP command 10x speed improvement.
This commit adds a fast-path to the BITOP that can be used for all the
bytes from 0 to the minimal length of the string, and if there are
at max 16 input keys.

Often the intersected bitmaps are roughly the same size, so this
optimization can provide a 10x speed boost to most real world usages
of the command.

Bytes are processed four full words at a time, in loops specialized
for the specific BITOP sub-command, without the need to check for
length issues with the inputs (since we run this algorithm only as far
as there is data from all the keys at the same time).

The remaining part of the string is intersected in the usual way using
the slow but generic algorith.

It is possible to do better than this with inputs that are not roughly
the same size, sorting the input keys by length, by initializing the
result string in a smarter way, and noticing that the final part of the
output string composed of only data from the longest string does not
need any proecessing since AND, OR and XOR against an empty string does
not alter the output (zero in the first case, and the original string in
the other two cases).

More implementations will be implemented later likely, but this should
be enough to release Redis 2.6-RC4 with bitops merged in.

Note: this commit also adds better testing for BITOP NOT command, that
is currently the faster and hard to optimize further since it just
flips the bits of a single input string.
2012-05-24 15:20:20 +02:00
antirez
fa4a5d5922 BITOP: handle integer encoded objects correctly.
A bug in the implementation caused BITOP to crash the server if at least
one one of the source objects was integer encoded.

The new implementation takes an additional array of Redis objects
pointers and calls getDecodedObject() to get a reference to a string
encoded object, and then uses decrRefCount() to release the object.

Tests modified to cover the regression and improve coverage.
2012-05-24 15:20:16 +02:00
antirez
7c34643f15 BITCOUNT performance improved.
At Redis's default optimization level the command is now much faster,
always using a constant-time bit manipualtion technique to count bits
instead of GCC builtin popcount, and unrolling the loop.

The current implementation performance is 1.5GB/s in a MBA 11" (1.8 Ghz
i7) compiled with both GCC and clang.

The algorithm used is described here:

http://graphics.stanford.edu/~seander/bithacks.html
2012-05-24 15:20:11 +02:00
antirez
80f8028e3c bitop.c renamed bitops.c
bitop.c contains the "Bit related string operations" so it seems more
logical to call it bitops instead of bitop.
This also makes it matching the name of the test (unit/bitops.tcl).
2012-05-24 15:20:06 +02:00
antirez
01d3a7e736 Bit operations tests improved.
Fuzzing tests of BITCOUNT / BITOP are iterated multiple times.
The new BITCOUNT fuzzing test uses random strings in a wider interval of
lengths including zero-len strings.
2012-05-24 15:20:02 +02:00
antirez
343d3bd287 popcount() optimization for speed.
We run the array by 32 bit words instead of processing it byte per byte.
If the code is compiled using GCC __builtin_popcount() builtin function
is used instead.
2012-05-24 15:19:58 +02:00
antirez
dbbbe49ef5 BITCOUNT refactoring.
The low level popualtion counting function is now separated from the
BITCOUNT command implementation, so that the low level function can be
further optimized and eventually used in other contexts if needed.
2012-05-24 15:19:55 +02:00
antirez
760e776526 Bit-related string operations moved to bitop.c
All the general string operations are implemented in t_string.c, however
the bit operations, while targeting the string type, are better served
in a specific file where we have the implementations of the following
four commands and helper functions:

    GETBIT
    SETBIT
    BITOP
    BITCOUNT

In the future this file will probably contain more code related to
making the BITOP and BITCOUNT operations faster.
2012-05-24 15:19:51 +02:00
antirez
a3f2b4895b BITOP and BITCOUNT tests.
The Redis implementation is tested against Tcl implementations of the
same operation. Both fuzzing and testing of specific aspects of the
commands behavior are performed.
2012-05-24 15:19:48 +02:00