antirez
1375b0611b
Cluster: slaves start failover with a small delay.
Redis Cluster can cope with a minority of nodes not being informed in
time about the failure of a master, for whatever reason (netsplit, node
not functioning properly, blocked, ...). However, waiting a few seconds
before starting the failover makes most "normal" failovers simpler, as
the FAIL message will propagate before the slave election happens.
2013-03-15 16:39:49 +01:00
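A minimal sketch of the delayed start follows. It assumes the slave records the time at which it saw its master flagged as FAIL. FAILOVER_START_DELAY and failover_may_start() are hypothetical names, and the 2 second value is illustrative rather than taken from the Redis source.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical delay (seconds) between seeing the master in FAIL state
 * and starting the election; the value is illustrative only. */
#define FAILOVER_START_DELAY 2

/* Returns true once enough time has passed since master_fail_time for
 * the FAIL message to have most likely reached the other nodes. */
static bool failover_may_start(time_t master_fail_time, time_t now) {
    return now - master_fail_time >= FAILOVER_START_DELAY;
}
```

A check like this would be evaluated periodically, for example from the cron-style function that calls clusterHandleSlaveFailover(), rather than blocking.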
antirez
d512a09c20
Cluster: a bit more serious node role change handling.
2013-03-15 16:35:16 +01:00
antirez
004fbef847
Cluster: remove node from master slaves when it turns into a master.
Also, a few nearby comments improved.
2013-03-15 16:16:19 +01:00
antirez
44c92f5aeb
Cluster: slave failover implemented.
2013-03-15 16:11:34 +01:00
antirez
1d8f302e0d
Cluster: election -> promotion in two comments.
2013-03-15 15:44:49 +01:00
antirez
bf82195467
Cluster: added function to broadcast pings.
See the function's top comment for why this is sometimes useful.
2013-03-15 15:43:58 +01:00
antirez
892e98548a
Cluster: don't broadcast messages to HANDSHAKE nodes.
Also don't check for NOADDR: checking that node->link is not NULL is
enough.
2013-03-15 15:36:36 +01:00
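The broadcast filter can be sketched as a per-node predicate. This assumes that our own node is also skipped. The flag values, node_t, link_t and both function names are hypothetical and are not the Redis definitions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag values used only for this sketch. */
#define NODE_MYSELF    1
#define NODE_HANDSHAKE 2

typedef struct { int fd; } link_t;
typedef struct { int flags; link_t *link; } node_t;

/* A node receives broadcasts only if we have a link to it (which also
 * rules out nodes without an address) and it is neither ourselves nor
 * still in the handshake phase. */
static bool is_broadcast_target(const node_t *node) {
    if (node->link == NULL) return false;
    if (node->flags & (NODE_MYSELF | NODE_HANDSHAKE)) return false;
    return true;
}

/* Counts how many nodes a broadcast would reach. */
static int count_broadcast_targets(const node_t *nodes, int n) {
    int count = 0;
    for (int i = 0; i < n; i++)
        if (is_broadcast_target(&nodes[i])) count++;
    return count;
}
```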
antirez
76a3954f4a
Cluster: fix clusterHandleSlaveFailover() conditional: quorum is enough.
2013-03-15 13:20:34 +01:00
antirez
90e99a2082
Cluster: two lame bugs fixed in FAILOVER AUTH messages generation.
2013-03-14 21:27:12 +01:00
antirez
aeacaa57e6
Cluster: code to process messages moved into the right if-else chain.
2013-03-14 21:21:58 +01:00
antirez
35f05c66b6
Cluster: handle FAILOVER_AUTH_ACK messages.
That's trivial, as we just need to increment the count of masters that
replied with an ACK.
2013-03-14 16:43:13 +01:00
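ACK counting combined with a majority check might look roughly like the sketch below. It assumes the quorum is half of the known masters plus one, in line with the "majority of masters" wording used for FAILOVER_AUTH_REQUEST. election_t, on_failover_auth_ack() and has_quorum() are hypothetical names, not the Redis API.

```c
#include <stdbool.h>

/* Hypothetical election state kept by a slave during failover. */
typedef struct {
    int masters;    /* number of masters known in the cluster */
    int auth_acks;  /* FAILOVER_AUTH_ACK replies received so far */
} election_t;

/* Called once per FAILOVER_AUTH_ACK received from a master. */
static void on_failover_auth_ack(election_t *e) {
    e->auth_acks++;
}

/* Quorum (assumed here to be a strict majority of masters) is enough
 * to proceed with the failover. */
static bool has_quorum(const election_t *e) {
    return e->auth_acks >= e->masters / 2 + 1;
}
```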
antirez
c2595500ac
Cluster: request failover authorization, log if we have quorum.
However, the failover is not yet actually performed.
2013-03-14 16:39:02 +01:00
antirez
7fa42b801d
Cluster: clusterSendFailoverAuth() implementation.
2013-03-14 16:31:57 +01:00
antirez
f59ff6fe61
Cluster: clusterSendFailoverAuthIfNeeded() work in progress.
2013-03-13 19:08:03 +01:00
antirez
44f6fdab60
Cluster: handle FAILOVER_AUTH_REQUEST in clusterProcessPacket().
However, control is currently passed to a function that does nothing
at all.
2013-03-13 18:38:08 +01:00
antirez
ece95b2dea
Cluster: sanity check FAILOVER_AUTH_REQUEST messages for proper length.
2013-03-13 17:31:26 +01:00
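A length sanity check of this kind can be sketched as below. It assumes that a FAILOVER_AUTH_REQUEST carries only the fixed header and no message-specific body. msg_header_t, the MSG_* values and packet_length_ok() are hypothetical. Real cluster packets have a much larger header.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical message types; the numeric values are illustrative. */
enum { MSG_PING = 0, MSG_FAILOVER_AUTH_REQUEST = 5 };

/* Hypothetical fixed header; real cluster packets carry much more. */
typedef struct {
    uint32_t totlen;  /* total packet length declared by the sender */
    uint16_t type;    /* one of the MSG_* values above */
} msg_header_t;

/* Rejects packets whose declared length does not match what was read,
 * and auth requests that carry anything beyond the bare header. */
static bool packet_length_ok(const msg_header_t *h, uint32_t received) {
    if (h->totlen != received) return false;
    if (h->type == MSG_FAILOVER_AUTH_REQUEST)
        return received == (uint32_t)sizeof(msg_header_t);
    return true;
}
```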
antirez
66144337bf
Cluster: use 'else if' for mutually exclusive conditionals.
2013-03-13 17:27:06 +01:00
antirez
db7c17e969
Cluster: FAILOVER_AUTH_REQUEST message type introduced.
A slave that is ready to fail over its master sends this message to the
other nodes to obtain authorization from the majority of masters.
2013-03-13 17:21:20 +01:00
antirez
575cbc9990
Cluster: clusterHandleSlaveFailover() stub.
2013-03-13 13:10:49 +01:00
antirez
3d448bda39
Cluster: call clusterHandleSlaveFailover() when our master is down.
2013-03-13 12:44:02 +01:00
antirez
f0b807cd47
Cluster: update cluster state on PFAIL flag set/cleared on nodes.
2013-03-07 15:40:53 +01:00
antirez
299b8f76c2
Cluster: mark cluster state as fail if majority of masters is unreachable.
2013-03-07 15:36:59 +01:00
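The majority rule can be sketched as follows. This is not clusterUpdateState() itself. The sketch assumes that masters flagged PFAIL count as unreachable, since the state is also updated when PFAIL is set or cleared. The flag values, node_t and cluster_state_ok() are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical node flags for this sketch. */
#define NODE_MASTER 1
#define NODE_PFAIL  2
#define NODE_FAIL   4

typedef struct { int flags; } node_t;

/* The cluster is considered OK unless a majority of the known masters
 * is flagged as unreachable (PFAIL or FAIL). */
static bool cluster_state_ok(const node_t *nodes, int n) {
    int masters = 0, unreachable = 0;
    for (int i = 0; i < n; i++) {
        if (!(nodes[i].flags & NODE_MASTER)) continue;
        masters++;
        if (nodes[i].flags & (NODE_PFAIL | NODE_FAIL)) unreachable++;
    }
    return unreachable < masters / 2 + 1;
}
```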
antirez
abf06fd5ff
Cluster: log global cluster state change.
2013-03-07 15:22:32 +01:00
antirez
3dad8196b7
Cluster: clusterUpdateState() function simplified.
Also the NEEDHELP Cluster state was removed as it will no longer be
used by Redis Cluster.
2013-03-06 18:25:40 +01:00
antirez
011fa89ac9
Cluster: sdssplitargs_free() -> sdsfreesplitres().
2013-03-06 12:38:06 +01:00
antirez
1025dd7786
Cluster: connect to our master ASAP after startup if we are a slave node.
2013-03-05 16:12:08 +01:00
antirez
bac57ad14b
Cluster: more robust FAIL flag cleanup.
If we have a master in FAIL state that's reachable again, and apparently
no one is going to serve its slots, clear the FAIL flag and let the
cluster continue with its operations again.
2013-03-05 15:05:32 +01:00
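The cleanup rule, built on the fail_time field, can be sketched as below. FAIL_UNDO_TIME, the reachable field, node_t and maybe_clear_fail() are hypothetical. In particular, the 10 second window is an illustrative assumption and not the threshold Redis uses.

```c
#include <stdbool.h>
#include <time.h>

#define NODE_FAIL 4  /* hypothetical flag value */

/* Hypothetical: how long (seconds) a FAIL state may persist before we
 * stop waiting for someone else to take over the node's slots. */
#define FAIL_UNDO_TIME 10

typedef struct {
    int flags;
    time_t fail_time;  /* when FAIL was set; valid only if FAIL is set */
    int numslots;      /* slots still assigned to this node */
    bool reachable;    /* e.g. it answered our last ping */
} node_t;

/* Clears FAIL for a reachable node that still serves slots after the
 * FAIL state has lasted longer than FAIL_UNDO_TIME. */
static void maybe_clear_fail(node_t *n, time_t now) {
    if ((n->flags & NODE_FAIL) &&
        n->reachable &&
        n->numslots > 0 &&
        now - n->fail_time > FAIL_UNDO_TIME) {
        n->flags &= ~NODE_FAIL;
    }
}
```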
antirez
1a02b7440a
Cluster: new node field fail_time.
This is the unix time at which we set the FAIL flag for the node.
It is only valid if FAIL is set.
The idea is to use it to make the cluster more robust, for instance to
revert a FAIL state that is long-standing while slots are still
assigned to this node, that is, when apparently no one is going to fix
these slots.
2013-03-05 13:15:05 +01:00
antirez
e4b481a5f6
Cluster: A comment updated in clusterCron().
2013-03-05 12:17:30 +01:00
antirez
d728ec6dee
Cluster: send a ping to every node we never contacted in timeout/2 seconds.
Usually we try to send just one ping every second; however, when we
detect that failure detection is going to be unreliable because we
can't ping some node in time, we send an additional ping.
This should only happen with very large clusters or when the node
timeout is set to a very low value.
2013-03-05 12:16:02 +01:00
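The extra-ping condition can be sketched as a predicate over per-node timestamps. The sketch assumes a node counts as "not contacted" when no ping is pending and its last PONG is older than half the node timeout. node_t and needs_extra_ping() are hypothetical names.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical per-node timing fields for this sketch. */
typedef struct {
    time_t ping_sent;      /* 0 when no ping is waiting for a PONG */
    time_t pong_received;  /* time of the last PONG from this node */
} node_t;

/* True if an extra ping is warranted: nothing is in flight and we have
 * not heard from the node for more than half the node timeout. */
static bool needs_extra_ping(const node_t *n, time_t now, time_t node_timeout) {
    return n->ping_sent == 0 && now - n->pong_received > node_timeout / 2;
}
```

Skipping nodes with a ping already in flight keeps the extra pings from piling up while a reply is still pending.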
antirez
e7628be2a7
Cluster: set node->slaveof correctly when a node state is updated.
2013-03-05 11:50:11 +01:00
antirez
d6457577d4
Cluster: don't perform startup slots sanity check for slaves.
If we are a slave node, the DB content will not match our configured
slots, so don't do the check at all.
2013-03-04 19:47:00 +01:00
antirez
d334897e80
Cluster: fix maximum line length when loading config.
There are pathological cases where the line can be even longer: a single
node may contain all the slots in importing/migrating state.
2013-03-04 19:45:36 +01:00
antirez
b8a28bf442
Cluster: actually setup replication in CLUSTER REPLICATE.
2013-03-04 15:27:58 +01:00
antirez
0c01088b51
Cluster: REPLICATE subcommand and stub for clusterSetMaster().
2013-03-04 13:15:09 +01:00
charsyam
bc84c399f8
adding check error code
2013-03-04 11:20:11 +01:00
antirez
caf9b24a7d
Cluster: don't set the slot as unassigned because of PONG info.
As stated in the comment, this is usually due to a resharding in
progress, so the client should still be redirected to the old node,
which will handle the redirection elsewhere.
2013-02-28 15:54:29 +01:00
antirez
0d77440b26
Cluster: better handling of slots changes in PONG packets.
The new code makes sure that the node slots bitmap is always consistent
with the cluster->slots array.
2013-02-28 15:41:54 +01:00
antirez
5f8fd27ace
Cluster: refactoring of clusterNode*Bit to use helper bitmap functions.
2013-02-28 15:23:09 +01:00
antirez
d21d6b666f
Cluster: use node->numslots instead of popcount() where possible.
2013-02-28 15:13:32 +01:00
antirez
4521115b17
Cluster: new field in cluster node structure, "numslots".
Before, a relatively slow popcount() operation was needed every time we
needed the number of slots served by a given cluster node. Now we just
need to check an integer that is kept in sync with the bitmap.
2013-02-28 15:11:05 +01:00
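Keeping the counter in sync can be sketched as follows. The idea is to adjust numslots only when a bit actually changes. The slot count of 16384, node_t and the helper names are assumptions for illustration and are not the Redis definitions.

```c
#include <stdbool.h>
#include <stdint.h>

#define CLUSTER_SLOTS 16384  /* assumed slot count for this sketch */

typedef struct {
    uint8_t slots[CLUSTER_SLOTS / 8];  /* one bit per hash slot */
    int numslots;                      /* number of bits set above */
} node_t;

static bool bitmap_test(const uint8_t *bm, int pos) {
    return (bm[pos / 8] >> (pos & 7)) & 1;
}

/* Sets the bit for slot and returns its previous value; numslots is
 * incremented only if the bit was not already set. */
static bool node_set_slot(node_t *n, int slot) {
    bool old = bitmap_test(n->slots, slot);
    n->slots[slot / 8] |= (uint8_t)(1u << (slot & 7));
    if (!old) n->numslots++;
    return old;
}

/* Clears the bit for slot and returns its previous value; numslots is
 * decremented only if the bit was actually set. */
static bool node_clear_slot(node_t *n, int slot) {
    bool old = bitmap_test(n->slots, slot);
    n->slots[slot / 8] &= (uint8_t)~(1u << (slot & 7));
    if (old) n->numslots--;
    return old;
}
```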
antirez
a2566d6618
Cluster: don't gossip about nodes that are not useful to the cluster.
2013-02-28 15:00:09 +01:00
antirez
d45d184118
Cluster: CLUSTER FORGET implemented.
2013-02-27 17:55:59 +01:00
antirez
d2b8281b3f
Cluster: added a missing return on CLUSTER SETSLOT.
2013-02-27 17:53:48 +01:00
antirez
d20dea3eb7
Cluster: blank node address when flagging it as NOADDR.
2013-02-27 17:09:33 +01:00
antirez
2dcb5ab72b
Cluster: add comments in sub-sections of CLUSTER command.
2013-02-27 16:12:59 +01:00
antirez
f9b5ca29fd
Use GCC printf format attribute for redisLog().
This commit also fixes redisLog() statements producing warnings.
2013-02-27 12:27:15 +01:00
antirez
d0992d6e8b
Cluster: a few random fixes to the new failure detection.
2013-02-26 15:15:44 +01:00
antirez
f288b07563
Cluster: log the event when we clear the FAIL flag.
2013-02-26 15:03:38 +01:00
antirez
97ffcd351b
Cluster: use the failure report API to reimplement failure detection.
The new system detects a failure only when there is quorum from masters.
2013-02-26 14:58:39 +01:00
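Quorum-based detection with failure reports can be sketched as below. Each master that reports the node as failing gets one entry, and stale entries are ignored. The sketch assumes a fixed validity window and a strict-majority threshold. It does not model whether the local node's own view counts as a report. All type and function names are hypothetical.

```c
#include <stdbool.h>
#include <time.h>

#define MAX_REPORTS 16  /* hypothetical capacity */

/* Hypothetical failure report: which master said the node is failing,
 * and when we last received that report. */
typedef struct { int reporter_id; time_t when; } failure_report_t;

typedef struct {
    failure_report_t reports[MAX_REPORTS];
    int count;
} node_reports_t;

/* Adds a report from reporter_id, or refreshes it if already present. */
static void add_failure_report(node_reports_t *r, int reporter_id, time_t now) {
    for (int i = 0; i < r->count; i++) {
        if (r->reports[i].reporter_id == reporter_id) {
            r->reports[i].when = now;  /* refresh an existing report */
            return;
        }
    }
    if (r->count < MAX_REPORTS) {
        r->reports[r->count].reporter_id = reporter_id;
        r->reports[r->count].when = now;
        r->count++;
    }
}

/* Counts reports that are at most max_age seconds old. */
static int valid_reports(const node_reports_t *r, time_t now, time_t max_age) {
    int n = 0;
    for (int i = 0; i < r->count; i++)
        if (now - r->reports[i].when <= max_age) n++;
    return n;
}

/* FAIL is set only with reports from a majority of masters. */
static bool should_mark_fail(const node_reports_t *r, int masters,
                             time_t now, time_t max_age) {
    return valid_reports(r, now, max_age) >= masters / 2 + 1;
}
```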