Commit Graph

233 Commits

Author SHA1 Message Date
antirez
68cf249f81 Cluster: use server.cluster_node_timeout directly.
We used to copy this value into the server.cluster structure, however this
was not necessary.

The reason why we don't directly use server.cluster->node_timeout is
that things that can be configured via redis.conf need to be directly
available in the server structure as server.cluster is allocated later
only if needed in order to reduce the memory footprint of non-cluster
instances.
2013-04-09 11:24:18 +02:00
antirez
ef4f25ff6e Cluster: configdigest field no longer used. Removed. 2013-04-09 11:07:25 +02:00
antirez
f09b2508f4 Cluster: properly send ping to nodes not pinged foro too much time.
In commit d728ec6 it was introduced the concept of sending a ping to
every node not receiving a ping since node_timeout/2 seconds.
However the code was located in a place that was not executed because of
a previous conditional causing the loop to re-iterate.

This caused false positives in nodes availability detection.

The current code is still not perfect as a node may be detected to be in
PFAIL state even if it does not reply for just node_timeout/2 seconds
that is not correct. There is a plan to improve this code ASAP.
2013-04-08 19:40:20 +02:00
antirez
05fa4f4034 Cluster: node timeout is now configurable. 2013-04-04 12:29:10 +02:00
antirez
00bab23c41 Cluster: turn hardcoded node timeout multiplicators into defines.
Most Redis Cluster time limits are expressed in terms of the configured
node timeout. Turn them into defines.
2013-04-04 12:04:11 +02:00
antirez
c39e34d007 Cluster: when slave changes master, remove it from the old master. 2013-03-25 15:01:25 +01:00
antirez
34c1871e9f Cluster: set node role on successful handshake. 2013-03-25 13:03:01 +01:00
antirez
a8b09faf3d Cluster: comment no longer in sync with code removed. 2013-03-21 10:47:10 +01:00
antirez
8c1bc8e865 Cluster: clear the PROMOTED slave directly into clusterSetMaster().
This way we make sure every time a master is turned into a replica
the flag will be cleared.
2013-03-20 11:51:44 +01:00
antirez
e006407fd0 Cluster: master node must clear its hash slots when turning into a slave.
When a master turns into a slave after a failover event, make sure to
clear the assigned slots before setting up the replication, as a slave
should never claim slots in an explicit way, but just take over the
master slots when replacing its master.
2013-03-20 11:32:35 +01:00
antirez
506f9a42b0 Cluster: new flag PROMOTED introduced.
A slave node set this flag for itself when, after receiving authorization
from the majority of nodes, it turns itself into a master.

At the same time now this flag is tested by nodes receiving a PING
message before reconfiguring after a failover event. This makes the
system more robust: even if currently there is no way to manually turn
a slave into a master it is possible that we'll have such a feature in
the future, or that simply because of misconfiguration a node joins the
cluster as master while others believe it's a slave. This alone is now
no longer enough to trigger reconfiguration as other nodes will check
for the PROMOTED flag.

The PROMOTED flag is cleared every time the node is turned back into a
replica of some other node.
2013-03-20 10:48:42 +01:00
antirez
026b9483db Cluster: add sender flags in cluster bus messages header.
Sender flags were not propagated for the sender, but only for nodes in
the gossip section. This is odd and in the next commits we'll need to
get updated flags for the sender node, so this commit adds a new field
in the cluster messages header.

The message header is the same size as we reused some free space that
was marked as 'unused' because of alignment concerns.
2013-03-20 10:32:00 +01:00
antirez
d15b027d91 Cluster: turn old master into a replica of node that failed over.
So when the failing master node is back in touch with the cluster,
instead of remaining unused it is converted into a replica of the
new master, ready to perform the fail over if the new master node
will fail at some point.

Note that as a side effect clients with stale configuration are now
not an issue as well, as the node converted into a slave will not
accept queries but will redirect clients accordingly.
2013-03-20 00:30:47 +01:00
antirez
4d62623015 Cluster: node replication role change handle improved.
The code handling a master that turns into a slave or the contrary was
improved in order to avoid repeating the same operations. Also
the readability and conceptual simplicity was improved.
2013-03-19 16:01:30 +01:00
antirez
88221f88c0 Cluster: new command CLUSTER FLUSHSLOTS.
It's just a simpler way to CLUSTER DELSLOTS with all the slots as
arguments, in order to obtain a node without assigned slots for
reconfiguration.
2013-03-19 09:58:05 +01:00
antirez
e28e61e839 Cluster: when failing over claim master slots. 2013-03-15 16:53:41 +01:00
antirez
dd091661d4 Cluster: log when a slave asks for failover authorization. 2013-03-15 16:44:08 +01:00
antirez
1375b0611b Cluster: slaves start failover with a small delay.
Redis Cluster can cope with a minority of nodes not informed about the
failure of a master in time for some reason (netsplit or node not
functioning properly, blocked, ...) however to wait a few seconds before
to start the failover will make most "normal" failovers simpler as the
FAIL message will propagate before the slave election happens.
2013-03-15 16:39:49 +01:00
antirez
d512a09c20 Cluster: a bit more serious node role change handling. 2013-03-15 16:35:16 +01:00
antirez
004fbef847 Cluster: remove node from master slaves when it turns into a master.
Also, a few nearby comments improved.
2013-03-15 16:16:19 +01:00
antirez
44c92f5aeb Cluster: slave failover implemented. 2013-03-15 16:11:34 +01:00
antirez
1d8f302e0d Cluster: election -> promotion in two comments. 2013-03-15 15:44:49 +01:00
antirez
bf82195467 Cluster: added function to broadcast pings.
See the function top-comment for info why this is useful sometimes.
2013-03-15 15:43:58 +01:00
antirez
892e98548a Cluster: don't broadcast messages to HANDSHAKE nodes.
Also don't check for NOADDR as we check that node->link is not NULL
that's enough.
2013-03-15 15:36:36 +01:00
antirez
76a3954f4a Cluster: fix clusterHandleSlaveFailover() conditional: quorum is enough. 2013-03-15 13:20:34 +01:00
antirez
90e99a2082 Cluster: two lame bugs fixed in FAILOVER AUTH messages generation. 2013-03-14 21:27:12 +01:00
antirez
aeacaa57e6 Cluster: code to process messages moved in the right if-else chain. 2013-03-14 21:21:58 +01:00
antirez
35f05c66b6 Cluster: handle FAILOVER_AUTH_ACK messages.
That's trivial as we just need to increment the count of masters that
received with an ACK.
2013-03-14 16:43:13 +01:00
antirez
c2595500ac Cluster: request failover authorization, log if we have quorum.
However the failover is yet not really performed.
2013-03-14 16:39:02 +01:00
antirez
7fa42b801d Cluster: clusterSendFailoverAuth() implementation. 2013-03-14 16:31:57 +01:00
antirez
f59ff6fe61 Cluster: clusterSendFailoverAuthIfNeeded() work in progress. 2013-03-13 19:08:03 +01:00
antirez
44f6fdab60 Cluster: handle FAILOVER_AUTH_REQUEST in clusterProcessPacket().
However currently the control is passed to a function doing nothing at
all.
2013-03-13 18:38:08 +01:00
antirez
ece95b2dea Cluster: sanity check FAILOVER_AUTH_REQUEST messages for proper length. 2013-03-13 17:31:26 +01:00
antirez
66144337bf Cluster: use 'else if' for mutually exclusive conditionals. 2013-03-13 17:27:06 +01:00
antirez
db7c17e969 Cluster: FAILOVER_AUTH_REQUEST message type introduced.
This message is sent by a slave that is ready to failover its master to
other nodes to get the authorization from the majority of masters.
2013-03-13 17:21:20 +01:00
antirez
575cbc9990 Cluster: clusterHandleSlaveFailover() stub. 2013-03-13 13:10:49 +01:00
antirez
3d448bda39 Cluster: call clusterHandleSlaveFailover() when our master is down. 2013-03-13 12:44:02 +01:00
antirez
f0b807cd47 Cluster: update cluster state on PFAIL flag set/cleared on nodes. 2013-03-07 15:40:53 +01:00
antirez
299b8f76c2 Cluster: mark cluster state as fail of majority of masters is unreachable. 2013-03-07 15:36:59 +01:00
antirez
abf06fd5ff Cluster: log global cluster state change. 2013-03-07 15:22:32 +01:00
antirez
3dad8196b7 Cluster: clusterUpdateState() function simplified.
Also the NEEDHELP Cluster state was removed as it will no longer be
used by Redis Cluster.
2013-03-06 18:25:40 +01:00
antirez
011fa89ac9 Cluster: sdssplitargs_free() -> sdsfreesplitres(). 2013-03-06 12:38:06 +01:00
antirez
1025dd7786 Cluster: connect to our master ASAP after startup if we are a slave node. 2013-03-05 16:12:08 +01:00
antirez
bac57ad14b Cluster: more robust FAIL flag cleaup.
If we have a master in FAIL state that's reachable again, and apparently
no one is going to serve its slots, clear the FAIL flag and let the
cluster continue with its operations again.
2013-03-05 15:05:32 +01:00
antirez
1a02b7440a Cluster: new node field fail_time.
This is the unix time at which we set the FAIL flag for the node.
It is only valid if FAIL is set.

The idea is to use it in order to make the cluster more robust, for
instance in order to revert a FAIL state if it is long-standing but
still slots are assigned to this node, that is, no one is going to fix
these slots apparently.
2013-03-05 13:15:05 +01:00
antirez
e4b481a5f6 Cluster: A comment updated in clusterCron(). 2013-03-05 12:17:30 +01:00
antirez
d728ec6dee Cluster: send a ping to every node we never contacted in timeout/2 seconds.
Usually we try to send just 1 ping every second, however when we detect
we are going to have unreliable failure detection because we can't ping
some node in time, send an additional ping.

This should only happen with very large clusters or when the the node
timeout is set to a very low value.
2013-03-05 12:16:02 +01:00
antirez
e7628be2a7 Cluster: set node->slaveof correctly when a node state is updated. 2013-03-05 11:50:11 +01:00
antirez
d6457577d4 Cluster: don't perform startup slots sanity check for slaves.
If we are a cluster node the DB content will not match our configured
slots. Don't do the check at all.
2013-03-04 19:47:00 +01:00
antirez
d334897e80 Cluster: fix maximum line length when loading config.
There are pathological cases where the line can be even longer a single
node may contain all the slots in importing/migrating state.
2013-03-04 19:45:36 +01:00
antirez
b8a28bf442 Cluster: actually setup replication in CLUSTER REPLICATE. 2013-03-04 15:27:58 +01:00
antirez
0c01088b51 Cluster: REPLICATE subcommand and stub for clusterSetMaster(). 2013-03-04 13:15:09 +01:00
charsyam
bc84c399f8 adding check error code
adding check error code
2013-03-04 11:20:11 +01:00
antirez
caf9b24a7d Cluster: don't set the slot as unassigned because of PONG info.
As stated in the comment this is usually due to a resharding in progress
so the client should be still redirected to the old node that will
handle the redirection elsewhere.
2013-02-28 15:54:29 +01:00
antirez
0d77440b26 Cluster: better handling of slots changes in PONG packets.
The new code makes sure that the node slots bitmap is always consistent
with the cluster->slots array.
2013-02-28 15:41:54 +01:00
antirez
5f8fd27ace Cluster: refactoring of clusterNode*Bit to use helper bitmap functions. 2013-02-28 15:23:09 +01:00
antirez
d21d6b666f Cluster: use node->numslots instead of popcount() where possible. 2013-02-28 15:13:32 +01:00
antirez
4521115b17 Cluster: new field in cluster node structure, "numslots".
Before a relatively slow popcount() operation was needed every time we
needed to get the number of slots served by a given cluster node.
Now we just need to check an integer that is taken in sync with the
bitmap.
2013-02-28 15:11:05 +01:00
antirez
a2566d6618 Cluster: don't gossip about nodes that are not useful to the cluster. 2013-02-28 15:00:09 +01:00
antirez
d45d184118 Cluster: CLUSTER FORGET implemented. 2013-02-27 17:55:59 +01:00
antirez
d2b8281b3f Cluster: added a missing return on CLUSTER SETSLOT. 2013-02-27 17:53:48 +01:00
antirez
d20dea3eb7 Cluster: blank node address when flagging it as NOADDR. 2013-02-27 17:09:33 +01:00
antirez
2dcb5ab72b Cluster: add comments in sub-sections of CLUSTER command. 2013-02-27 16:12:59 +01:00
antirez
f9b5ca29fd Use GCC printf format attribute for redisLog().
This commit also fixes redisLog() statements producing warnings.
2013-02-27 12:27:15 +01:00
antirez
d0992d6e8b Cluster: a few random fixes to the new failure detection. 2013-02-26 15:15:44 +01:00
antirez
f288b07563 Cluster: log the event when we clear the FAIL flag. 2013-02-26 15:03:38 +01:00
antirez
97ffcd351b Cluster: use the failure report API to reimplement failure detection.
The new system detects a failure only when there is quorum from masters.
2013-02-26 14:58:39 +01:00
antirez
1b1b3f6c06 Cluster: invert two functions declarations in more natural order. 2013-02-26 11:19:48 +01:00
antirez
d5e8b0a47f Cluster: cleanup idle failure reports every time we remove one.
This is not very important as anyway when the function counting the
number of reports is called the cleanup is performed. However with this
change if only part of the nodes that reported the failure will report
the node is back ok, we'll cleanup the older entries ASAP. In complex
split net split scenarios, and when we are dealing with clusters having
nodes in the order of ~ 1000, this can save some CPU.
2013-02-26 11:15:18 +01:00
antirez
9cb578ced0 Cluster: new function clusterNodeDelFailureReport() for failure reports.
This is the missing part of the API that will be used to reimplement
failure detection of Cluster nodes.
2013-02-25 19:13:22 +01:00
antirez
18f537083a Cluster: no limits for the count parameter of CLUSTER GETKEYSINSLOT.
Not sure why I set a limit to 1 million keys, there is no reason for
this artificial limit, and anyway this is s a stupid limit because it is
already high enough to create latency issues. So let's the users shoot
on their feet because maybe they just actually know what they are doing.
2013-02-25 12:41:13 +01:00
antirez
544bbe5387 Cluster: validate slot number in CLUSTER COUNTKEYSINSLOT. 2013-02-25 12:40:32 +01:00
antirez
d4fa40655d Cluster: new sub-command CLUSTER COUNTKEYSINSLOT.
The new sub-command uses the new countKeysInSlot() API and allows a
cluster client to get the number of keys for a given hashslot.
2013-02-25 12:04:31 +01:00
antirez
a517c89321 Cluster: verifyClusterConfigWithData() implemented. 2013-02-25 11:43:49 +01:00
antirez
d2154254be Cluster: fix case for getKeysInSlot() and countKeysInSlot().
Redis functions start in low case. A few functions about cluster were
capitalized the wrong way.
2013-02-25 11:25:40 +01:00
antirez
c2eb4a606f Cluster: use CountKeysInSlot() when we just need the count. 2013-02-25 11:23:04 +01:00
antirez
ad3bca1fdf Cluster: added stub for verifyClusterConfigWithData().
See the top-comment for the function in this commit for details about
what the function is supposed to do.
2013-02-25 11:20:17 +01:00
antirez
825e07f2fd Cluster: if no previous config exists, create the myself node as master. 2013-02-22 19:24:01 +01:00
antirez
f4093753e4 Cluster: add cluster_size field in CLUSTER INFO output. 2013-02-22 19:20:38 +01:00
antirez
d218a4e244 Cluster: new state information, cluster size.
The definition of cluster size is: the number of known nodes in the
cluster that are masters and serving at least an hash slot.
2013-02-22 19:18:30 +01:00
antirez
5c55ed9388 Cluster: remove warning adding clusterNodeSetSlotBit() prototype. 2013-02-22 17:45:49 +01:00
antirez
974929770b Cluster: introduced a failure reports system.
A §Redis Cluster node used to mark a node as failing when itself
detected a failure for that node, and a single acknowledge was received
about the possible failure state.

The new API will be used in order to possible to require that N other
nodes have a PFAIL or FAIL state for a given node for a node to set it
as failing.
2013-02-22 17:43:35 +01:00
antirez
07b6322735 Cluster: more correct update of slots state when PONG is received. 2013-02-21 16:52:06 +01:00
antirez
c6da9d9fac Call clusterUpdateState() after CLUSTER SETSLOT too. 2013-02-21 16:31:22 +01:00
antirez
3a99d1228a Aesthetic change to make a line more 80-cols friendly. 2013-02-21 16:24:48 +01:00
antirez
dc4af60628 Cluster: clusterAddSlot() was not doing what stated in the comment. 2013-02-21 11:51:17 +01:00
antirez
fdb57233e2 Cluster: always use cluster(Add|Del)Slot to modify the cluster slots table. 2013-02-21 11:44:58 +01:00
antirez
786b8d6c87 Use RESTORE-ASKING for MIGRATE when in cluster mode. 2013-02-20 17:36:54 +01:00
antirez
ea7fc82a4a Cluster: new command flag forcing implicit ASKING.
Also using this new flag the RESTORE-ASKING command was implemented that
will be used by MIGRATE.
2013-02-20 17:28:35 +01:00
antirez
9a04e12cc0 Cluster: I/O errors are now logged at DEBUG level. 2013-02-20 13:18:51 +01:00
antirez
02796ba7a7 Cluster: sanity checks on the cluster bus message length. 2013-02-15 16:44:39 +01:00
antirez
6b9c661838 Cluster: make valgrind happy initializing all the bytes of the node IP. 2013-02-15 12:58:35 +01:00
antirez
7371d5e248 Remove wrong decrRefCount() from getNodeByQuery().
This fixes issue #607.
2013-02-15 11:57:53 +01:00
antirez
20f52b5b78 Top comment for getNodeByQuery() improved. 2013-02-15 11:50:54 +01:00
antirez
e0e15bd06d Cluster: with 16384 slots we need bigger buffers. 2013-02-14 15:36:33 +01:00
antirez
1649e509c3 Cluster: the cluster state structure is now heap allocated. 2013-02-14 13:20:56 +01:00
antirez
9dfd11c3da Cluster: Initialize ip and port in createClusterNode(). 2013-02-14 13:01:28 +01:00
antirez
ebd666db47 Cluster: from 4096 to 16384 hash slots. 2013-02-14 12:49:16 +01:00
antirez
b70b459b0e TCP_NODELAY after SYNC: changes to the implementation. 2013-02-05 12:04:30 +01:00
guiquanz
9d09ce3981 Fixed many typos. 2013-01-19 10:59:44 +01:00