Cluster: slaves start failover with a small delay.

Redis Cluster can cope with a minority of nodes that are not informed of
a master's failure in time for some reason (netsplit, or a node that is
not functioning properly, blocked, ...). However, waiting a few seconds
before starting the failover makes most "normal" failovers simpler, as
the FAIL message will propagate before the slave election happens.
antirez 2013-03-15 16:39:49 +01:00
parent d512a09c20
commit 1375b0611b
2 changed files with 8 additions and 1 deletions

@@ -1565,10 +1565,16 @@ void clusterCron(void) {
     }
     /* If we are a slave and our master is down, but is serving slots,
-     * call the function that handles the failover. */
+     * call the function that handles the failover.
+     * This function is called with a small delay in order to let the
+     * FAIL message propagate after failure detection. This is not
+     * strictly required but makes 99.99% of failovers mechanically
+     * simpler. */
     if (server.cluster->myself->flags & REDIS_NODE_SLAVE &&
         server.cluster->myself->slaveof &&
         server.cluster->myself->slaveof->flags & REDIS_NODE_FAIL &&
+        (server.unixtime - server.cluster->myself->slaveof->fail_time) >
+         REDIS_CLUSTER_FAILOVER_DELAY &&
         server.cluster->myself->slaveof->numslots != 0)
     {
         clusterHandleSlaveFailover();

@@ -521,6 +521,7 @@ typedef struct redisOpArray {
 #define REDIS_CLUSTER_FAIL 1 /* The cluster can't work */
 #define REDIS_CLUSTER_NAMELEN 40 /* sha1 hex length */
 #define REDIS_CLUSTER_PORT_INCR 10000 /* Cluster port = baseport + PORT_INCR */
+#define REDIS_CLUSTER_FAILOVER_DELAY 5 /* Seconds */
 struct clusterNode;
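
The condition added to clusterCron() can be read in isolation as a small predicate. The following is only a minimal sketch of that check, not the Redis source: the `node` struct, the `NODE_SLAVE` / `NODE_FAIL` flag values, and the `should_start_failover()` helper are hypothetical names introduced for illustration, and the current time is passed in explicitly instead of being read from `server.unixtime`.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical flag values, chosen only for this sketch. */
#define NODE_SLAVE (1 << 0)
#define NODE_FAIL  (1 << 1)

/* Mirrors REDIS_CLUSTER_FAILOVER_DELAY from the diff (seconds). */
#define FAILOVER_DELAY 5

/* Hypothetical, reduced stand-in for the cluster node structure. */
typedef struct node {
    int flags;             /* NODE_SLAVE, NODE_FAIL, ... */
    struct node *slaveof;  /* master of this node, or NULL */
    time_t fail_time;      /* when the FAIL flag was set */
    int numslots;          /* hash slots served by this node */
} node;

/* Returns 1 if `myself` is a slave whose master is flagged FAIL, still
 * owns slots, and has been failing for strictly more than FAILOVER_DELAY
 * seconds; returns 0 otherwise. */
int should_start_failover(const node *myself, time_t now) {
    const node *master = myself->slaveof;

    if (!(myself->flags & NODE_SLAVE)) return 0;
    if (master == NULL) return 0;
    if (!(master->flags & NODE_FAIL)) return 0;
    if (master->numslots == 0) return 0;

    /* Strict '>' as in the diff: exactly FAILOVER_DELAY seconds is
     * still too early. */
    return (now - master->fail_time) > FAILOVER_DELAY;
}
```

With a master flagged FAIL at time 1000 and serving slots, this sketch returns 0 at times 1003 and 1005, and 1 from time 1006 onward, since the comparison is strict.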