Commit Graph

5388 Commits

Author SHA1 Message Date
antirez
c3ad70901f Replication: disconnect blocked clients when switching to slave role.
Bug as old as Redis and blocking operations. It's hard to trigger since
only happens on instance role switch, but the results are quite bad
since an inconsistency between master and slave is created.

How to trigger the bug is a good description of the bug itself.

1. Client does "BLPOP mylist 0" in master.
2. Master is turned into slave, that replicates from New-Master.
3. Client does "LPUSH mylist foo" in New-Master.
4. New-Master propagates write to slave.
5. Slave receives the LPUSH, the blocked client get served.

Now Master "mylist" key has "foo", Slave "mylist" key is empty.

Highlights:

* At step "2" above, the client remains attached, basically escaping any
  check performed during command dispatch: read only slave, in that case.
* At step "5" the slave (that was the master), serves the blocked client
  consuming a list element, which is not consumed on the master side.

This scenario is technically likely to happen during failovers, however
since Redis Sentinel already disconnects clients using the CLIENT
command when changing the role of the instance, the bug is avoided in
Sentinel deployments.

Closes #2473.
2015-03-24 16:00:09 +01:00
antirez
9b7f8b1c9b Cluster: redirection refactoring + handling of blocked clients.
There was a bug in Redis Cluster caused by clients blocked in a blocking
list pop operation, for keys no longer handled by the instance, or
in a condition where the cluster became down after the client blocked.

A typical situation is:

1) BLPOP <somekey> 0
2) <somekey> hash slot is resharded to another master.

The client will block forever int this case.

A symmentrical non-cluster-specific bug happens when an instance is
turned from master to slave. In that case it is more serious since this
will desynchronize data between slaves and masters. This other bug was
discovered as a side effect of thinking about the bug explained and
fixed in this commit, but will be fixed in a separated commit.
2015-03-24 11:56:24 +01:00
Salvatore Sanfilippo
eff68bd656 Merge pull request #2470 from superlogical/unstable
create-cluster fix for stop and watch commands
2015-03-24 09:07:22 +01:00
superlogical
761fc16b4a create-cluster fix for stop and watch commands 2015-03-24 09:44:52 +13:00
antirez
9b0bcf25e1 Cluster: unit 10 modified to leave cluster in proper state. 2015-03-22 22:58:53 +01:00
antirez
f300680408 Cluster: CLUSTER FAILOVER TAKEOVER tests. 2015-03-22 22:44:23 +01:00
antirez
631538cfe0 Cluster: more tests for manual failover + FORCE. 2015-03-22 22:44:02 +01:00
antirez
3b4de6aa18 Cluster: new tests1 for manual failover and scripts replication. 2015-03-22 22:25:05 +01:00
antirez
2f4240b9d9 Cluster: fix Lua scripts replication to slave nodes. 2015-03-22 22:24:08 +01:00
antirez
94030fa4d7 Two cluster.c comments improved. 2015-03-21 12:12:23 +01:00
antirez
2950824ab6 Cluster: TAKEOVER option for manual failover. 2015-03-21 11:54:32 +01:00
antirez
d544600aa5 Fix typo in beforeSleep() comment. 2015-03-21 09:19:08 +01:00
antirez
2b278a3394 Net: processUnblockedClients() and clientsArePaused() minor changes.
1. No need to set btype in processUnblockedClients(), since clients
   flagged REDIS_UNBLOCKED should have it already cleared.
2. When putting clients in the unblocked clients list, clientsArePaused()
   should flag them with REDIS_UNBLOCKED. Not strictly needed with the
   current code but is more coherent.
2015-03-21 09:13:29 +01:00
antirez
5fe4a23131 Net: clientsArePaused() should not touch blocked clients.
When the list of unblocked clients were processed, btype was set to
blocking type none, but the client remained flagged with REDIS_BLOCKED.
When timeout is reached (or when the client disconnects), unblocking it
will trigger an assertion.

There is no need to process pending requests from blocked clients, so
now clientsArePaused() just avoid touching blocked clients.

Close #2467.
2015-03-21 09:04:38 +01:00
antirez
a7010ae208 Cluster: non-conditional steps of slave failover refactored into a function. 2015-03-20 17:56:21 +01:00
antirez
230d141420 Cluster: separate unknown master check from the rest.
In no case we should try to attempt to failover if myself->slaveof is
NULL.
2015-03-20 16:56:59 +01:00
antirez
4f2555aa17 Cluster: refactoring around configEpoch handling.
This commit moves the process of generating a new config epoch without
consensus out of the clusterCommand() implementation, in order to make
it reusable for other reasons (current target is to have a CLUSTER
FAILOVER option forcing the failover when no master majority is
reachable).

Moreover the commit moves other functions which are similarly related to
config epochs in a new logical section of the cluster.c file, just for
clarity.
2015-03-20 16:42:52 +01:00
antirez
25c0f5ac63 Cluster: better cluster state transiction handling.
Before we relied on the global cluster state to make sure all the hash
slots are linked to some node, when getNodeByQuery() is called. So
finding the hash slot unbound was checked with an assertion. However
this is fragile. The cluster state is often updated in the
clusterBeforeSleep() function, and not ASAP on state change, so it may
happen to process clients with a cluster state that is 'ok' but yet
certain hash slots set to NULL.

With this commit the condition is also checked in getNodeByQuery() and
reported with a identical error code of -CLUSTERDOWN but slightly
different error message so that we have more debugging clue in the
future.

Root cause of issue #2288.
2015-03-20 09:59:28 +01:00
antirez
2ecb5edf34 Cluster: move clusterBeforeSleep() call before unblocked clients processing.
Related to issue #2288.
2015-03-20 09:47:54 +01:00
antirez
438a1a84e8 Cluster: more robust slave check in CLUSTER REPLICATE.
There are rare conditions where node->slaveof may be NULL even if the
node is a slave. To check by flag is much more robust.
2015-03-18 12:10:14 +01:00
Salvatore Sanfilippo
61fb441c8c Merge pull request #2386 from inkel/sentinel-add-client-command
Support CLIENT commands in Redis Sentinel
2015-03-13 18:23:36 +01:00
antirez
e791e2dda1 Test: fix SPOP replication test count.
If count is 0 SADD is called without element arguments, which is
currently invalid.
2015-03-13 17:30:13 +01:00
antirez
93b1320fac Cluster: fix CLUSTER NODES optimization error in 'j' increment. 2015-03-13 13:16:35 +01:00
antirez
4ed7582c7b Cluster: ignore various node files in create-cluster dir. 2015-03-13 13:16:01 +01:00
antirez
e1b6c9dd18 Cluster: CLUSTER NODES speedup. 2015-03-13 11:26:04 +01:00
antirez
b2e8eca70d Config: improve loglevel message error. 2015-03-12 14:43:07 +01:00
antirez
792c531688 CONFIG GET syslog-facility added.
Was missing for some reason. Trivial to add after config.c refactoring.
2015-03-12 09:59:10 +01:00
antirez
50b41b6ad3 CONFIG SET refactoring: use enums in more places. 2015-03-11 23:21:04 +01:00
antirez
535b295f96 Net: better Unix socket error. Issue #2449. 2015-03-11 17:24:55 +01:00
antirez
4cd4910f26 Merge branch 'unstable' of github.com:/antirez/redis into unstable 2015-03-11 17:05:14 +01:00
antirez
8e219224b9 CONFIG refactoring: configEnum abstraction.
Still many things to convert inside config.c in the next commits.
Some const safety in String objects creation and addReply() family
functions.
2015-03-11 17:00:13 +01:00
antirez
4a2a0d9e9d CONFIG SET: memory and special field macros. 2015-03-11 09:02:04 +01:00
Salvatore Sanfilippo
36c1a7cba3 Merge pull request #2445 from soveran/add-cluster-myid
Add command CLUSTER MYID
2015-03-10 17:46:09 +01:00
Michel Martens
6201eb0c55 Add command CLUSTER MYID 2015-03-10 16:43:19 +00:00
antirez
3da7408359 CONFIG SET: additional 2 numerical fields refactored. 2015-03-10 13:00:36 +01:00
antirez
d68f28a367 CONFIG SET refactoring of bool and value fields.
Not perfect since The Solution IMHO is to have a DSL with a table of
configuration functions with type, limits, and aux functions to handle
the odd ones. However this hacky macro solution is already better and
forces to put limits in the range of numerical fields.

More field types to be refactored in the next commits hopefully.
2015-03-10 12:37:39 +01:00
antirez
a664040eb7 Config: activerehashing option support in CONFIG SET. 2015-03-08 15:33:42 +01:00
antirez
509a6cc1e8 Fix iterator for issue #2438.
Itereator misuse due to analyzeLatencyForEvent() accessing the
dictionary during the iteration, without the iterator being
reclared as safe.
2015-03-04 11:48:19 -08:00
antirez
c77081a45a Migrate: replace conditional with pre-computed value. 2015-02-27 22:33:54 +01:00
antirez
4f56f035a7 String: use new sdigits10() API in stringObjectLen().
Should be much faster, and regardless, the code is more obvious now
compared to generating a string just to get the return value of the
ll2stirng() function.
2015-02-27 16:09:17 +01:00
antirez
0e5e8ca9e6 Utils: Include stdint.h and fix signess in sdigits10(). 2015-02-27 16:03:02 +01:00
antirez
084a59c324 Test: HSTRLEN stress test of corner cases.
Main point here is to correctly report LLONG_MIN length, since to take
the absolute value we need care in sdigits10().
2015-02-27 15:44:44 +01:00
antirez
0ace1e6d04 Hash: HSTRLEN crash fixed when getting len of int-encoded value 2015-02-27 15:37:04 +01:00
antirez
4e54b85a19 Hash: HSTRLEN (was HVSTRLEN) improved.
1. HVSTRLEN -> HSTRLEN. It's unlikely one needs the length of the key,
   not clear how the API would work (by value does not make sense) and
   there will be better names anyway.
2. Default is to return 0 when field is missing.
3. Default is to return 0 when key is missing.
4. The implementation was slower than needed, and produced unnecessary COW.

Related issue #2415.
2015-02-27 15:31:55 +01:00
antirez
8855b8161f Merge branch 'unstable' of github.com:/antirez/redis into unstable 2015-02-27 15:24:25 +01:00
Salvatore Sanfilippo
b49c00a79c Merge pull request #2415 from landmime/unstable
added a new hvstrlen command
2015-02-27 15:24:04 +01:00
antirez
d8f8b0575f Hash: API to get value string len by field name. 2015-02-27 15:22:49 +01:00
antirez
c95507881a Utils: added function to get radix 10 string length of signed integer. 2015-02-27 15:22:10 +01:00
antirez
7e6b4ea67b server.current_client fix and minor refactoring.
Thanks to @codeslinger (Toby DiPasquale) for identifying the issue.

Related to issue #2409.
2015-02-27 14:17:46 +01:00
antirez
832b0c7cce Improvements to PR #2425
1. Remove useless "cs" initialization.
2. Add a "select" var to capture a condition checked multiple times.
3. Avoid duplication of the same if (!copy) conditional.
4. Don't increment dirty if copy is given (no deletion is performed),
   otherwise we propagate MIGRATE when not needed.
2015-02-26 10:27:56 +01:00