redict

mirror of https://codeberg.org/redict/redict.git synced 2025-01-24 00:59:02 -05:00

Author	SHA1	Message	Date
antirez	6dd2dbbd36	Cluster: handshake "already known" error logged to VERBOSE. This is not really an error but something that always happens for example when creating a new cluster, or if the sysadmin rejoins manually a node that is already known. Since useless logs don't help, moved to VERBOSE level.	2014-03-26 16:35:38 +01:00
antirez	3cf6f1f54f	Cluster: clusterHandleConfigEpochCollision() fixed. New config epochs must always be obtained incrementing the currentEpoch, that is itself guaranteed to be >= the max configEpoch currently known to the node.	2014-03-26 12:31:28 +01:00
antirez	80d4c52cdf	Cluster: better logging for clusterUpdateSlotsConfigWith().	2014-03-26 12:09:38 +01:00
antirez	eb746ec408	Cluster: CLUSTER SETSLOT implementation comment updated. Update the comment since the implementation details changed.	2014-03-25 17:50:46 +01:00
antirez	6c527a89a0	Cluster: configEpoch collisions resolution. The slave election in Redis Cluster guarantees that slaves promoted to masters always end with unique config epochs, however failures during manual reshardings, software bugs and operational errors may in theory cause two nodes to have the same configEpoch. This commit introduces a mechanism to eventually always end with different configEpochs if a collision ever happens. As a (wanted) side effect, this also ensures that after a new cluster is created, all nodes will end with a different configEpoch automatically.	2014-03-25 17:19:58 +01:00
antirez	c1041c570f	Cluster: stay within 80 cols.	2014-03-25 16:07:14 +01:00
antirez	82b53c650c	struct dictEntry -> dictEntry.	2014-03-20 16:20:37 +01:00
antirez	e26f4486b0	Cluster: update node configEpoch on UPDATE messages. The UPDATE message contains the configEpoch of the node configuration advertised in the packet. Update it if needed.	2014-03-11 11:53:09 +01:00
antirez	a2ff90919f	Cluster: set slot error if we receive an update for a busy slot. By manually modifying nodes configurations in random ways, it is possible to create the following scenario: A is serving keys for slot 10 B is manually configured to serve keys for slot 10 A receives an update from B (or another node) where it is informed that the slot 10 is now claimed by B with a greater configuration epoch, however A still has keys from slot 10. With this commit A will put the slot in error setting it in IMPORTING state, so that redis-trib can detect the issue.	2014-03-11 11:49:47 +01:00
antirez	1ed0ad77f0	Cluster: clarified a comment in clusterUpdateSlotsConfigWith().	2014-03-11 11:32:40 +01:00
antirez	8287945ff8	Cluster: flush importing/migrating state when master is turned into slave.	2014-03-11 11:22:06 +01:00
antirez	2e8e0ad44e	Cluster: clusterCloseAllSlots() added.	2014-03-11 11:16:18 +01:00
antirez	787b297046	Cluster: getKeysFromCommand() API cleaned up. This API originated from the "diskstore" experiment, not for Redis Cluster itself, so there were legacy/useless things trying to differentiate between keys that are going to be overwritten and keys that need to be fetched from disk (preloaded). All useless with Cluster, so removed with the result of code simplification.	2014-03-10 13:18:41 +01:00
antirez	c1a7d3e61f	Cluster: abort on port too high error. It also fixes multi-line comment style to be consistent with the rest of the code base. Related to #1555.	2014-03-10 10:41:27 +01:00
Salvatore Sanfilippo	442b06db54	Merge pull request #1555 from mattsta/cluster-port-error-out Cluster port error out	2014-03-10 10:37:50 +01:00
antirez	ed8c55237b	Cluster: be explicit about passing NULL as bind addr for connect. The code was already correct but it was using that bindaddr[0] is set to NULL as a side effect of current implementation if no bind address is configured. This is not guarnteed to hold true in the future.	2014-03-10 10:33:53 +01:00
antirez	3e8a92ef8d	Cluster: log error when anetTcpNonBlockBindConnect() fails.	2014-03-10 10:32:28 +01:00
Salvatore Sanfilippo	3b0edb80ec	Merge pull request #1567 from mattsta/fix-cluster-join Bind source address for cluster communication	2014-03-10 10:28:32 +01:00
antirez	0f1f25784f	Cluster: better timeout and retry time for failover. When node-timeout is too small, in the order of a few milliseconds, there is no way the voting process can terminate during that time, so we set a lower limit for the failover timeout of two seconds. The retry time is set to two times the failover timeout time, so it is at least 4 seconds.	2014-03-10 09:57:52 +01:00
antirez	6984692060	Cluster: fix conditional generating TRYAGAIN error.	2014-03-07 16:18:00 +01:00
antirez	36676c2318	Redis Cluster: support for multi-key operations.	2014-03-07 13:19:09 +01:00
Matt Stancliff	385c25f70f	Remove redundant IP length definition REDIS_CLUSTER_IPLEN had the same value as REDIS_IP_STR_LEN. They were both #define'd to the same INET6_ADDRSTRLEN.	2014-03-06 17:55:43 +01:00
Matt Stancliff	d2040ab9b1	Remove some redundant code Function nodeIp2String in cluster.c is exactly anetPeerToString with a pre-extracted fd.	2014-03-06 17:55:39 +01:00
Matt Stancliff	59cf0b1902	Fix return value check for anetTcpAccept anetTcpAccept returns ANET_ERR, not AE_ERR. This isn't a physical error since both ANET_ERR and AE_ERR are -1, but better to be consistent.	2014-03-06 17:55:31 +01:00
Matt Stancliff	e5b1e7be64	Bind source address for cluster communication The first address specified as a bind parameter (server.bindaddr[0]) gets used as the source IP for cluster communication. If no bind address is specified by the user, the behavior is unchanged. This patch allows multiple Redis Cluster instances to communicate when running on the same interface of the same host.	2014-03-04 17:36:45 -05:00
antirez	8dea2029a4	Fix configEpoch assignment when a cluster slot gets "closed". This is still code to rework in order to use agreement to obtain a new configEpoch when a slot is migrated, however this commit handles the special case that happens when the nodes are just started and everybody has a configEpoch of 0. In this special condition to have the maximum configEpoch is not enough as the special epoch 0 is not unique (all the others are). This does not fixes the intrinsic race condition of a failover happening while we are resharding, that will be addressed later.	2014-03-03 11:12:11 +01:00
Matt Stancliff	ce68caea37	Cluster: error out quicker if port is unusable The default cluster control port is 10,000 ports higher than the base Redis port. If Redis is started on a too-high port, Cluster can't start and everything will exit later anyway.	2014-02-19 17:30:07 -05:00
antirez	db6d628c3e	Cluster: clusterDelNode(): remove node from master's slaves.	2014-02-11 10:34:25 +01:00
antirez	5e0e03be41	Cluster: UPDATE messages are the norm and verbose. Logging them at WARNING level was of little utility and of sure disturb.	2014-02-11 10:18:24 +01:00
antirez	4a64286c36	Cluster: configEpoch assignment in SETNODE improved. Avoid to trash a configEpoch for every slot migrated if this node has already the max configEpoch across the cluster. Still work to do in this area but this avoids both ending with a very high configEpoch without any reason and to flood the system with fsyncs.	2014-02-11 10:09:17 +01:00
antirez	72f7abf6a2	Cluster: clusterSetStartupEpoch() made more generally useful. The actual goal of the function was to get the max configEpoch found in the cluster, so make it general by removing the assignment of the max epoch to currentEpoch that is useful only at startup.	2014-02-11 10:00:14 +01:00
antirez	44f7afe28a	Cluster: always increment the configEpoch in SETNODE after import. Removed a stale conditional preventing the configEpoch from incrementing after the import in certain conditions. Since the master got a new slot it should always claim a new configuration.	2014-02-11 09:50:37 +01:00
antirez	a1349728ea	Cluster: on resharding upgrade version of receiving node. The node receiving the hash slot needs to have a version that wins over the other versions in order to force the ownership of the slot. However the current code is far from perfect since a failover can happen during the manual resharding. The fix is a work in progress but the bottom line is that the new version must either be voted as usually, set by redis-trib manually after it makes sure can't be used by other nodes, or reserved configEpochs could be used for manual operations (for example odd versions could be never used by slaves and are always used by CLUSTER SETSLOT NODE).	2014-02-11 00:36:05 +01:00
antirez	6dc26795aa	Cluster: fsync at every SETSLOT command puts too pressure on disks. During slots migration redis-trib can send a number of SETSLOT commands. Fsyncing every time is a bit too much in production as verified empirically. To make sure configs are fsynced on all nodes after a resharding redis-trib may send something like CLUSTER CONFSYNC. In this case fsyncs were not providing too much value since anyway processes can crash in the middle of the resharding of an hash slot, and redis-trib should be able to recover from this condition anyway.	2014-02-10 23:54:08 +01:00
antirez	218358bbbd	Cluster: conditions to clear "migrating" on slot for SETSLOT ... NODE changed. If the slot is manually assigned to another node, clear the migrating status regardless of the fact it was previously assigned to us or not, as long as we no longer have keys for this slot. This avoid a race during slots migration that may leave the slot in migrating status in the source node, since it received an update message from the destination node that is already claiming the slot. This way we are sure that redis-trib at the end of the slot migration is always able to close the slot correctly.	2014-02-10 23:51:47 +01:00
antirez	bf670e0745	Cluster: don't update slave's master if we don't know it. There is no way we can update the slave's node->slaveof pointer if we don't know the master (no node with such an ID in our tables).	2014-02-10 18:33:34 +01:00
antirez	a3755ae9ee	Cluster: ignore slot config changes if we are importing it.	2014-02-10 18:04:43 +01:00
antirez	6fc53e16ad	Cluster: update configEpoch after manually messing with slots.	2014-02-10 18:01:58 +01:00
antirez	1a73c992a3	Cluster: fixed inverted arguments in logging function call.	2014-02-10 17:21:10 +01:00
antirez	32563b4a5f	Cluster: clear the FAIL status for masters without slots. Masters without slots don't participate to the cluster but just do redirections, no need to take them in FAIL state if they are back reachable.	2014-02-10 17:18:27 +01:00
antirez	5b2082ead3	Cluster: replica migration should only work for masters serving slots.	2014-02-10 17:08:37 +01:00
antirez	f885fa8bac	Cluster: clusterReadHandler() fixed to work with new message header.	2014-02-10 16:27:37 +01:00
antirez	7bf7b7350c	Cluster: signature changed to "RCmb" (Redis Cluster message bus). Sounds better after all.	2014-02-10 15:55:21 +01:00
antirez	dced9c0619	Cluster: discard bus messages with version != 0.	2014-02-10 15:54:22 +01:00
antirez	007e1c7cb2	Cluster: added signature + version in bus packets.	2014-02-10 15:53:09 +01:00
antirez	142281dc79	Cluster: keys slot computation now supports hash tags. Currently this is marginally useful, only to make sure two keys are in the same hash slot when the cluster is stable (no rehashing in progress). In the future it is possible that support will be added to run mutli-keys operations with keys in the same hash slot.	2014-02-07 17:39:01 +01:00
antirez	04fe000bf8	Cluster: fixed MF condition in clusterHandleSlaveFailover(). For manual failover we need a manual failover in progress, and that mf_can_start is true (master offset received and matched).	2014-02-05 16:01:56 +01:00
antirez	c6f02fd67a	Cluster: CLUSTER FAILOVER replies with OK and logs the event.	2014-02-05 15:52:38 +01:00
antirez	c72449af30	Cluster: check that a MF is in progress in manualFailoverCheckTimeout(). Otherwise it is always detected as a manual failover timed out.	2014-02-05 15:45:24 +01:00
antirez	b7402bcad5	Cluster: force AUTH ACK on manual failover. When a slave requests masters vote for a manual failover, the REQUEST_AUTH message is flagged in a special way in order to force the masters to give the authorization even if the master is not marked as failing.	2014-02-05 13:10:03 +01:00
antirez	4cf0cd5719	Cluster: manual failover initial implementation.	2014-02-05 13:01:24 +01:00
antirez	a7d30681c9	Cluster: configurable replicas migration barrier. It is possible to configure the min number of additional working slaves a master should be left with, for a slave to migrate to an orphaned master.	2014-01-31 11:26:36 +01:00
antirez	6c9359add1	Cluster: perform orphaned masters check before continue statements. The check was placed in a way that conflicted with the continue statements used by the node hearth beat code later that needs to skip the current node sometimes. Moved at the start of the function so that's always executed.	2014-01-30 18:23:31 +01:00
antirez	c2507b0ff6	Cluster: replica migration implementation. This feature allows slaves to migrate to orphaned masters (masters without working slaves), as long as a set of conditions are met, including the fact that the migrating slave needs to be in a master-slaves ring with at least another slave working.	2014-01-30 18:05:11 +01:00
antirez	5b4020fb42	Cluster: swap two code blocks to have a more obvious flow.	2014-01-30 16:34:23 +01:00
antirez	4beaaff8ea	Cluster: remove not needed return statement breaking failover.	2014-01-29 17:28:46 +01:00
antirez	3582054982	Cluster: broadcast pong to other slaves in the same ring. When we schedule a failover, broadcast a PONG to the slaves. The other slaves that plan to get elected will do the same too, this way it is likely that every slave will have a good picture of its own rank. Note that this is N*N messages where N is the number of slaves for the failing master, however usually even large clusters have many master nodes but a limited number of replicas per node, so this is harmless.	2014-01-29 17:19:55 +01:00
antirez	e2b59621a8	Cluster: log offset when announcing the failover election delay.	2014-01-29 17:16:10 +01:00
antirez	940531e9b7	Cluster: added progressive election delay according to slave rank. Note that when we compute the initial delay, there are probably still more up to date information to receive from slaves with new offsets, so the delay is recomputed when new data is available.	2014-01-29 16:53:45 +01:00
antirez	6f54032080	Cluster: function clusterGetSlaveRank() added. Return the number of slaves for the same master having a better replication offset of the current slave, that is, the slave "rank" used to pick a delay before the request for election.	2014-01-29 16:39:04 +01:00
antirez	40cd38f0c4	Cluster: update node replication offset from bus packets headers.	2014-01-29 16:01:00 +01:00
antirez	9d4ded7ec6	Cluster: refactoring: new macros to check node flags.	2014-01-29 12:17:16 +01:00
antirez	099bd336db	Cluster: use myself instead of server->cluster.myself.	2014-01-29 11:38:14 +01:00
antirez	e36bd8b43e	Cluster: added a global myself pointer in cluster.c. Accessing to the 'myself' node, the node representing the currently running instance, is handy without the need to type server.cluster->myself every time.	2014-01-29 11:22:22 +01:00
antirez	f1e09d8c41	Cluster: clusterBroadcastPong() improved with target selection. Now we can broadcast a pong to all the instances or just the local slaves (that is useful for replication offset propagation).	2014-01-29 11:08:52 +01:00
antirez	befcf6259e	Cluster: broadcast master/slave replication offset in bus header.	2014-01-28 16:51:50 +01:00
antirez	0b1b25c51c	Cluster: introduced repl_offset fields in clusterNode. The two fields are used in order to remember the latest known replication offset and the time we received it from other slave nodes. This will be used by slaves in order to start the election procedure with a delay that is proportional to the rank of the slave among the other slaves for this master, when sorted for replication offset. Usually this allows the slave with the most updated offset to win the election and replace the failing master in the cluster.	2014-01-28 16:28:07 +01:00
antirez	0f9422d575	Cluster: update slaves lists in clusterSetMaster().	2014-01-22 18:46:53 +01:00
antirez	5383ab0bc6	Cluster: CLUSTER SLAVES subcommand added.	2014-01-22 18:38:42 +01:00
antirez	603e480fd5	Cluster: clusterGenNodesDescription() refactored into two functions.	2014-01-22 18:36:12 +01:00
antirez	80e80668f4	Cluster: master nodes wait before rejoining the cluster after reboot. One of the simple heuristics used by Redis Cluster in order to avoid losing data in the typical failure modes created by the asynchronous replication with the slaves (a master is unable, when accepting a write, to immediately tell if it should be really accepted or refused because of a configuration change), is to wait some time before to rejoin the cluster after being partitioned away from the majority of instances. A similar condition happens when a master is restarted. It does not know if it was already failed over, nor if all the clients have already an updated configuration about the cluster map, so it is possible that clients will try to write to stale masters that were restarted. In a similar way this commit changes masters behavior so they wait 2000 milliseconds before accepting writes after a reboot. There is nothing special about 2 seconds if not to be a value supposedly larger a few orders of magnitude compared to the cluster bus communication latencies.	2014-01-20 11:52:52 +01:00
antirez	e6970e204f	Cluster: debug printf statemets removed. These were committed for error after being inserted in order to fix an issue.	2014-01-20 11:19:04 +01:00
antirez	ac3850cabd	Cluster: allow CLUSTER REPLICATE to switch master. The code was doing checks for slaves that should be done only when the instance is currently a master. Switching a slave from a master to another one should just work.	2014-01-17 18:22:35 +01:00
antirez	3d455393a6	Cluster: don't let a node forget its own master. redis-trib should make sure to reconfigure slaves of a node to remove from the cluster to replicate with other nodes before sending CLUSTER FORGET.	2014-01-16 17:49:35 +01:00
antirez	0c373207fa	Cluster: don't forget yourself with CLUSTER FORGET.	2014-01-16 09:46:23 +01:00
antirez	3e948970fe	Cluster: use the node blacklist in CLUSTER FORGET. CLUSTER FORGET is not useful if we can't remove a node from all the nodes of our cluster because of the Gossip protocol that keeps adding a given node to nodes where we already tried to remove it. So now CLUSTER FORGET implements a nodes blacklist that is set and checked by the Gossip section processing function. This way before a node is re-added at least 60 seconds must elapse since the FORGET execution. This means that redis-trib has some time to remove a node from a whole cluster. It is possible that in the future it will be uesful to raise the 60 sec figure to something bigger.	2014-01-15 16:50:45 +01:00
antirez	ccf268fa17	Cluster: fix clusterBlacklistAddNode() by setting right expire time. The hash table value should be set to now + 60 seconds otherwise it expires immediately.	2014-01-15 16:49:31 +01:00
antirez	4e1861155f	Cluster: clusterBlacklistAddNode() key lookup fixed. We can't lookup by node->name that's not an SDS string but a plain C array in the node structure.	2014-01-15 16:45:07 +01:00
antirez	b51be7b34f	Cluster: clusterBlacklistExists() requires blacklist cleanup before lookup.	2014-01-15 16:06:54 +01:00
antirez	a81340abaf	Cluster: set a minimum rejoin delay if node_timeout is too small. The rejoin delay usually is the node timeout. However if the node timeout is too small, we set it to 500 milliseconds, that is a value chosen to be greater than most setups RTT / instances latency figures so that likely communication with other nodes happen before rejoining.	2014-01-15 12:34:33 +01:00
antirez	a687cbc19c	Cluster: periodically call clusterUpdateState() when cluster is down. Usually we update the cluster state (to understand if we should accept queries or reply with an error) only when there is a change in the state of the nodes. However for the "delayed rejoin" feature to work, that is, for a master to wait some time before accepting queries again after it rejoins the majority, we need to periodically update the last time when the node was partitioned away from the majority. With this commit if the cluster is down we update the state ten times per second.	2014-01-15 12:26:12 +01:00
antirez	25ddefdea3	Cluster: range checking in getSlotOrReply() fixed. See issue #1426 on Github.	2014-01-15 11:33:46 +01:00
antirez	fb659cd334	Cluster: ignore empty lines in nodes.conf. Even without the user messing manually with the file, it is still possible to have blank lines (just a single "\n" per line) because of how the nodes.conf update/write process works.	2014-01-15 11:23:41 +01:00
antirez	6c63df3031	Cluster: atomic update of nodes.conf file. The way the file was generated was unsafe and leaded to nodes.conf file corruption (zero length file) on server stop/crash during the creation of the file. The previous file update method was as simple as open with O_TRUNC followed by the write call. While the write call was a single one with the full payload, ensuring no half-written files for POSIX semantics, stopping the server just after the open call resulted into a zero-length file (all the nodes information lost!).	2014-01-15 10:31:20 +01:00
antirez	28273394cb	Cluster: support to read from slave nodes. A client can enter a special cluster read-only mode using the READONLY command: if the client read from a slave instance after this command, for slots that are actually served by the instance's master, the queries will be processed without redirection, allowing clients to read from slaves (but without any kind fo read-after-write guarantee). The READWRITE command can be used in order to exit the readonly state.	2014-01-14 16:33:16 +01:00
antirez	58c8a071a5	Fix RESTORE ttl handling in 32 bit archs. long was used instead of long long in order to handle a 64 bit resolution millisecond timestamp. This fixes issue #1483.	2014-01-09 11:09:23 +01:00
antirez	f510549044	Cluster: clusterProcessPacket() was not 80 cols friendly. The function actually needs to be split into sub-functions at some point in the future.	2013-12-25 17:57:36 +01:00
antirez	66ec1412fe	Redis Cluster: add repl_ping_slave_period to slave data validity time. When the configured node timeout is very small, the data validity time (maximum data age for a slave to try a failover) is too little (ten times the configured node timeout) when the replication link with the master is mostly idle. In this case we'll receive some data from the master only every server.repl_ping_slave_period to refresh the last interaction with the master. This commit adds to the max data validity time the slave ping period to avoid this problem of slaves sensing too old data without a good reason. However this max data validity time is likely a setting that should be configurable by the Redis Cluster user in a way completely independent from the node timeout.	2013-12-22 10:05:16 +01:00
antirez	658aff9d29	Redis Cluster: move node failure reports logging from VERBOSE to NOTICE level.	2013-12-21 00:04:53 +01:00
antirez	5a404c87c1	Redis Cluster: remove no longer relevant comment.	2013-12-20 14:40:11 +01:00
antirez	fda4cba912	Redis Cluster: reconfigure replication when master changes address.	2013-12-20 12:47:22 +01:00
antirez	d7374032c0	Redis Cluster: handshake code refactoring + Gossip IP switch detection. This commit makes it simple to start an handshake with a specific node address, and uses this in order to detect a node IP change and start a new handshake in order to fix the IP if possible.	2013-12-20 12:38:03 +01:00
antirez	a2c938c834	Redis Cluster: delay state change when in the majority again. As specified in the Redis Cluster specification, when a node can reach the majority again after a period in which it was partitioend away with the minorty of masters, wait some time before accepting queries, to provide a reasonable amount of time for other nodes to upgrade its configuration. This lowers the probabilities of both a client and a master with not updated configuration to rejoin the cluster at the same time, with a stale master accepting writes.	2013-12-20 09:56:18 +01:00
antirez	7a666ac419	Cluster: set n->slaves to NULL in clusterNodeResetSlaves(). The value was otherwise undefined, so next time the node was promoted again from slave to master, adding a slave to the list of slaves would likely crash the server or result into undefined behavior.	2013-12-17 14:50:24 +01:00
antirez	fda91dbde3	Cluster: check link is valid before sending UPDATE.	2013-12-17 12:28:37 +01:00
antirez	f57bb36ce7	Cluster: initialize todo_before_sleep flags to 0.	2013-12-17 12:22:02 +01:00
antirez	c70c0c6db7	Cluster: use proper type mstime_t for ping delay var.	2013-12-17 10:27:36 +01:00
antirez	47815d38e0	Fixed clearNodeFailureIfNeeded() time type to mstime_t. This prevented 32bit cluster instances from clearing the FAIL flag when needed.	2013-12-17 09:45:52 +01:00
antirez	e88e6a6334	Cluster: use long long for timestamps in clusterGenNodesDescription(). Ping sent and pong received fields need to be casted to long long to be printed correctly into 32 bit systems.	2013-12-17 09:38:11 +01:00
antirez	11e81a1e9a	Fixed grammar: before H the article is a, not an.	2013-12-05 16:35:32 +01:00
antirez	6fa42b7507	Cluster: nodes re-addition blacklist API.	2013-12-02 11:12:23 +01:00
antirez	8f18345ef0	Cluster: basic data structures for nodes black list.	2013-11-29 17:37:06 +01:00
antirez	3db825fde4	Cluster: some code about clusterHandleSlaveFailover() marginally improved. 80 cols friendly, some minor change to the code to make it simpler.	2013-11-29 16:17:05 +01:00
antirez	a5e7358a12	Cluster: removed not needed newline at end of redisLog() msg.	2013-11-08 17:28:02 +01:00
antirez	28071caf38	Cluster: send a single UPDATE packet for now.	2013-11-08 17:25:49 +01:00
antirez	d289c628b1	Cluster: replace hardcoded 4096 for bus msg len with sizeof().	2013-11-08 17:19:19 +01:00
antirez	94a07d5901	Cluster: slots update refactored + UPDATE msg processing. Now there is a function that handles the update of the local slot configuration every time we have some new info about a node and its set of served slots and configEpoch. Moreoever the UPDATE packets are now processed when received (it was a work in progress in the previous commit).	2013-11-08 17:02:10 +01:00
antirez	dc43f66eac	Cluster: UPDATE msg data structure and sending function.	2013-11-08 16:26:50 +01:00
antirez	6c6572be95	Cluster: refactoring of slots update code and more. The commit also introduces detection of nodes publishing not updated configuration. More work in progress to send an UPDATE packet to inform of the config change.	2013-11-08 10:32:16 +01:00
antirez	1a0cea33a0	Cluster: initialize senderConfigEpoch and senderCurrentEpoch for warnings suppression.	2013-11-05 12:01:07 +01:00
antirez	0c9f60a628	Cluster: there is a lower limit for the handshake timeout.	2013-10-11 10:34:32 +02:00
antirez	1447d28c0f	Cluster: data_age conversion to milliseconds fixed.	2013-10-09 16:36:06 +02:00
antirez	573c2fea91	Cluster: clusterCron() freq is now 10h. Still ping 1 node every sec. After the change in clusterCron() frequency of call, we still want to ping just one random node every second.	2013-10-09 16:29:17 +02:00
antirez	ba42428633	Cluster: time switched from seconds to milliseconds. All the internal state of cluster involving time is now using mstime_t and mstime() in order to use milliseconds resolution. Also the clusterCron() function is called with a 10 hz frequency instead of 1 hz. The cluster node_timeout must be also configured in milliseconds by the user in redis.conf.	2013-10-09 16:19:26 +02:00
antirez	929b6a4480	Cluster: cluster stuff moved from redis.h to cluster.h.	2013-10-09 15:38:05 +02:00
antirez	ae2763f564	Cluster: masters don't vote for a slave with stale config. When a slave requests our vote, the configEpoch he claims for its master and the set of served slots must be greater or equal to the configEpoch of the nodes serving these slots in the current configuraiton of the master granting its vote. In other terms, masters don't vote for slaves having a stale configuration for the slots they want to serve.	2013-10-08 12:45:35 +02:00
antirez	f7d6ad4366	Cluster: fix slave data age computation when master is still connected.	2013-10-07 16:07:13 +02:00
antirez	2c3301b9f5	Cluster: log message improved when FAIL is cleared from a slave node.	2013-10-07 15:44:58 +02:00
antirez	72f38cd70f	Cluster: slave nodes advertise master slots bitmap and configEpoch.	2013-10-07 11:31:12 +02:00
antirez	7afc0dd59a	Cluster: new clusterDoBeforeSleep() API. The new API is able to remember operations to perform before returning to the event loop, such as checking if there is the failover quorum for a slave, save and fsync the configuraiton file, and so forth. Because this operations are performed before returning on the event loop we are sure that messages that are sent in the same event loop run will be delivered after the configuration is already saved, that is a requirement sometimes. For instance we want to publish a new epoch only when it is already stored in nodes.conf in order to avoid returning back in the logical clock when a node is restarted. This new API provides a big performance advantage compared to saving and possibly fsyncing the configuration file multiple times in the same event loop run, especially in the case of big clusters with tens or hundreds of nodes.	2013-10-03 09:58:06 +02:00
antirez	211dcbe339	Cluster: update cluster config when slave changes master.	2013-10-02 12:27:12 +02:00
antirez	6c4d904baf	Cluster: bus messages stats in CLUSTER info.	2013-10-02 10:10:08 +02:00
antirez	abe81781ae	Cluster: FAIL messages from unknown senders are handled better. Previously the event was not logged but instead the node reported an unknown packet type received.	2013-10-02 09:42:45 +02:00
antirez	7970ebd80a	Cluster: senderCurrentEpoch == node currentEpoch was too strict. We can accept a vote as long as its epoch is >= the epoch at which we started the voting process. There is no need for it to be exactly the same.	2013-10-01 17:21:28 +02:00
antirez	f1bfd8233b	Cluster: fix typo in clusterProcessPacket() comment.	2013-10-01 15:40:20 +02:00
antirez	1dedf9aa36	Cluster: time field removed from cluster messages header. The new algorithm does not check replies time as checking for the currentEpoch in the reply ensures that the reply is about the current election process.	2013-09-30 16:19:44 +02:00
antirez	2d0844ee37	Cluster: log message shortened.	2013-09-30 11:51:58 +02:00
antirez	4dc247eb31	Cluster: detect cluster reconfiguration when master slots drop to 0. The old algorithm used a PROMOTED flag and explicitly checks about slave->master convertions. Wit the new cluster meta-data propagation algorithm we just look at the configEpoch to check if we need to reconfigure slots, then: 1) If a node is a master but it reaches zero served slots becuase of reconfiguration. 2) If a node is a slave but the master reaches zero served slots because of a reconfiguration. We switch as a replica of the new slots owner.	2013-09-30 11:45:26 +02:00
antirez	62b1591439	Cluster: re-order failover operations to make it safer. We need to: 1) Increment the configEpoch. 2) Save it to disk and fsync the file. 3) Broadcast the PONG with the new configuration. If other nodes will receive the updated configuration we need to be sure to restart with this new config in the event of a crash.	2013-09-30 10:16:48 +02:00
antirez	b187517719	Cluster: when upading the configEpoch for a node, save config on disk ASAP.	2013-09-30 10:16:25 +02:00
antirez	03ca903983	Cluster: fsync data when saving the cluster config.	2013-09-30 10:13:07 +02:00
antirez	026e63392e	Cluster: update the node configEpoch when newer is detected.	2013-09-27 09:55:41 +02:00
antirez	7c4b8f29e7	Cluster: react faster when a slave wins an election.	2013-09-26 16:54:43 +02:00
antirez	42fa46e49a	Cluster: removed an old source of delay to start the slave failover.	2013-09-26 13:28:19 +02:00
antirez	a445aa30a0	Cluster: master node now uses new protocol to vote.	2013-09-26 13:00:41 +02:00
antirez	fb9b76fe14	Cluster: slave node now uses the new protocol to get elected.	2013-09-26 11:13:17 +02:00
antirez	32b5410af9	Cluster: add currentEpoch to CLUSTER INFO.	2013-09-25 12:38:36 +02:00
antirez	6ec795d2cf	Cluster: update our currentEpoch when a greater one is seen.	2013-09-25 12:36:29 +02:00
antirez	d426ada891	Cluster: broadcast currentEpoch and configEpoch in packets header.	2013-09-25 11:53:35 +02:00
antirez	12483b0061	Cluster: configEpoch added in cluster nodes description.	2013-09-25 11:47:13 +02:00
antirez	3c9bb8751a	Cluster: PFAIL -> FAIL transition allowed for slaves. First change: now there is no need to be a master in order to detect a failure, however the majority of masters signaling PFAIL or FAIL is needed. This change is important because it allows slaves rejoining the cluster after a partition to sense the FAIL condition so that eventually all the nodes agree on failures.	2013-09-20 11:26:44 +02:00
antirez	925ea9f858	Cluster: added time field in cluster bus messages. The time is sent in requests, and copied back in reply packets. This way the receiver can compare the time field in a reply with its local clock and check the age of the request associated with this reply. This is an easy way to discard delayed replies. Note that only a clock is used here, that is the one of the node sending the packet. The receiver only copies the field back into the reply, so no synchronization is needed between clocks of different hosts.	2013-09-20 09:22:21 +02:00
antirez	d0e327413b	Cluster: don't add an handshake node for the same ip:port pair multiple times.	2013-09-04 15:52:16 +02:00
antirez	72587e6cc5	Cluster: free HANDSHAKE nodes after node_timeout. Handshake nodes should turn into normal nodes or be freed in a reasonable amount of time, otherwise they'll keep accumulating if the address they are associated with is not reachable for some reason.	2013-09-04 12:41:21 +02:00
antirez	8eff339ca4	Cluster: CLUSTER SAVECONFIG command added.	2013-09-04 10:33:00 +02:00
antirez	528201ad6c	Cluster: don't save HANDSHAKE nodes in nodes.conf.	2013-09-04 10:25:26 +02:00
antirez	e5d5da6f7c	Cluster: always use safe iteartors to iterate server.cluster->nodes.	2013-09-04 10:07:50 +02:00
antirez	354a5de270	Cluster: clusterReadHandler() reworked to be more correct and simpler to follow.	2013-09-03 11:43:52 +02:00
antirez	1036b4b21b	Cluster: use non-blocking I/O for the cluster bus.	2013-09-03 11:43:52 +02:00
antirez	f6efb6cdec	Cluster: fixed a bug in clusterSendPublish() due to inverted statements. The code used to copy the header after the 'hdr' pointer was already switched to the new buffer. Of course we need to do the reverse.	2013-09-03 11:43:43 +02:00
antirez	303dde3757	Don't update node pong time via gossip. This feature was implemented in the initial days of the Redis Cluster implementaiton but is not a good idea at all. 1) It depends on clocks to be synchronized, that is already very bad. 2) Moreover it adds a bug where the pong time is updated via gossip so no new PING is ever sent by the current node, with the effect of no PONG received, no update of tables, no clearing of PFAIL flag. In general to trust other nodes about the reachability of other nodes is a broken distributed programming model.	2013-08-26 16:16:25 +02:00
antirez	6ae37b0e1d	Cluster: set event handler in cluster bus listening socket. The commit using listenToPort() introduced this bug by no longer creating the event handler to handle incoming messages from the cluster bus.	2013-08-22 14:53:53 +02:00
antirez	81a6a9639a	Use listenToPort() in cluster.c as well.	2013-08-22 14:05:07 +02:00
antirez	042776aff7	Cluster: fix CLUSTER MEET ip address validation. This was broken by the IPv6 support patches.	2013-08-22 11:54:28 +02:00
antirez	9cf30132cc	Cluster: process MEET packets as PING packets. Somewhat a previous commit broken this so CLUSTER MEET was no longer working.	2013-08-22 11:53:28 +02:00
antirez	b804afcf01	Use a safe dict.c iterator in clusterCron().	2013-08-21 15:51:15 +02:00
antirez	6ea8e0949c	sdsrange() does not need to return a value. Actaully the string is modified in-place and a reallocation is never needed, so there is no need to return the new sds string pointer as return value of the function, that is now just "void".	2013-07-24 11:21:39 +02:00
antirez	894eba07c8	Introduction of a new string encoding: EMBSTR Previously two string encodings were used for string objects: 1) REDIS_ENCODING_RAW: a string object with obj->ptr pointing to an sds stirng. 2) REDIS_ENCODING_INT: a string object where the obj->ptr void pointer is casted to a long. This commit introduces a experimental new encoding called REDIS_ENCODING_EMBSTR that implements an object represented by an sds string that is not modifiable but allocated in the same memory chunk as the robj structure itself. The chunk looks like the following: +--------------+-----------+------------+--------+----+ \| robj data... \| robj->ptr \| sds header \| string \| \0 \| +--------------+-----+-----+------------+--------+----+ \| ^ +-----------------------+ The robj->ptr points to the contiguous sds string data, so the object can be manipulated with the same functions used to manipulate plan string objects, however we need just on malloc and one free in order to allocate or release this kind of objects. Moreover it has better cache locality. This new allocation strategy should benefit both the memory usage and the performances. A performance gain between 60 and 70% was observed during micro-benchmarks, however there is more work to do to evaluate the performance impact and the memory usage behavior.	2013-07-22 10:31:38 +02:00
antirez	631d656a94	All IP string repr buffers are now REDIS_IP_STR_LEN bytes.	2013-07-09 11:32:52 +02:00
Geoff Garside	ca78446c55	Mark places that might want changing for IPv6. Any places which I feel might want to be updated to work differently with IPv6 have been marked with a comment starting "IPV6:". Currently the only comments address places where an IP address is combined with a port using the standard : separated form. These may want to be changed when printing IPv6 addresses to wrap the address in [] such as [2001:db8::c0:ffee]:6379 instead of 2001:db8::c0:ffee:6379 as the latter format is a technically valid IPv6 address and it is hard to distinguish the IPv6 address component from the port unless you know the port is supposed to be there.	2013-07-08 15:58:14 +02:00
Geoff Garside	f7d9a92d4e	Mark ip string buffers which could be reduced. In two places buffers have been created with a size of 128 bytes which could be reduced to INET6_ADDRSTRLEN to still hold a full IP address. These places have been marked as they are presently big enough to handle the needs of storing a printable IPv6 address.	2013-07-08 15:57:23 +02:00
Geoff Garside	e6bf4c2676	Update clusterCommand to handle AF_INET6 addresses Changes the sockaddr_in to a sockaddr_storage. Attempts to convert the IP address into an AF_INET or AF_INET6 before returning an "Invalid IP address" error. Handles converting the sockaddr from either AF_INET or AF_INET6 back into a string for storage in the clusterNode ip field.	2013-07-08 15:57:23 +02:00
Geoff Garside	5be83eecac	Update node2IpString to handle AF_INET6 addresses. Change the sockaddr_in to sockaddr_storage which is capable of storing both AF_INET and AF_INET6 sockets. Uses the sockaddr_storage ss_family to correctly return the printable IP address and port. Function makes the assumption that the buffer is of at least REDIS_CLUSTER_IPLEN bytes in size.	2013-07-08 15:57:23 +02:00
Geoff Garside	b39e827d22	Add missing includes for getpeername. getpeername(2) requires <sys/socket.h> which on some systems also requires <sys/types.h>. Include both to avoid compilation warnings.	2013-07-08 15:55:39 +02:00
Geoff Garside	9cfa02fe73	Add macro to define clusterNode.ip buffer size. Add REDIS_CLUSTER_IPLEN macro to define the size of the clusterNode ip character array. Additionally use this macro in inet_ntop(3) calls where the size of the array was being defined manually. The REDIS_CLUSTER_IPLEN is defined as INET_ADDRSTRLEN which defines the correct size of a buffer to store an IPv4 address in. The INET_ADDRSTRLEN macro itself is defined in the <netinet/in.h> header file and should be portable across the majority of systems.	2013-07-08 15:55:39 +02:00
Geoff Garside	6e894f02cf	Fix cluster.c inet_ntop use of sizeof(n->ip). Using sizeof with an array will only return expected results if the array is created in the scope of the function where sizeof is used. This commit changes the inet_ntop calls so that they use the fixed buffer value as defined in redis.h which is 16.	2013-07-08 15:51:37 +02:00
Geoff Garside	693b640510	Use inet_pton(3) in clusterCommand. Replace inet_aton(3) call with the more future proof inet_pton(3) function which is capable of handling additional address families.	2013-07-08 15:51:37 +02:00
Geoff Garside	a6ea707cec	Use inet_ntop(3) in nodeIp2String & clusterCommand Replace inet_ntoa(3) calls with the more future proof inet_ntop(3) function which is capable of handling additional address families.	2013-07-08 15:51:37 +02:00
Geoff Garside	f5494a427e	Update anetTcpAccept & anetPeerToString calls. Add the additional ip buffer length argument to function calls of anetTcpAccept and anetPeerToString in network.c and cluster.c	2013-07-08 15:51:37 +02:00
antirez	98eecb70eb	Binding multiple IPs done properly with multiple sockets.	2013-07-05 11:47:20 +02:00
antirez	2160effc78	Revert "Cluster: use new anet.c listening socket creation API." This reverts commit `016ac38a21`.	2013-07-05 11:08:44 +02:00
antirez	016ac38a21	Cluster: use new anet.c listening socket creation API.	2013-07-04 18:49:49 +02:00
antirez	dfc98dccf4	Cluster: detect nodes address change.	2013-06-12 10:50:07 -07:00
antirez	d427373f01	clusterProcessPacket() comments improved for correctness.	2013-06-11 21:34:34 +02:00
antirez	5c9f6d4f55	Cluster: link reconnection on delayed PONG reply. When the PONG delay is half the cluster node timeout, the link gets disconnected (and later automatically reconnected) in order to ensure that it's not just a dead connection issue. However this operation is only performed if the link is old enough, in order to avoid to disconnect the same link again and again (and among the other problems, never receive the PONG because of that). Note: when the link is reconnected, the 'ping_sent' field is not updated even if a new ping is sent using the new connection, so we can still reliably detect a node ping timeout.	2013-05-03 15:43:03 +02:00
antirez	1315b9f246	Cluster: restore PING sent time on reconnections.	2013-05-03 15:42:59 +02:00
antirez	ae71731019	Cluster: PING/PONG handling redesigned.	2013-05-03 15:42:38 +02:00
antirez	a120560f70	Cluster: process config from PING packets as we do for PONG. Also clusterBroadcastPing() was renamed into clusterBroadcastPong() that's what the function is actually doing.	2013-05-03 15:41:34 +02:00
antirez	8a51c067ad	Cluster: createClusterLink() comment fixed for grammar.	2013-05-03 15:41:29 +02:00
xiaost7	ecdbaf4695	Cluster: fix clusterNode.name print format on debug message. It was %40s instead of %.40s, and since the string is not null terminated it caused random garbage to be displayed, and possibly a crash.	2013-04-19 09:53:43 +02:00
antirez	b84570dece	Cluster: reconfigure additonal slaves on failover.	2013-04-09 12:13:26 +02:00
antirez	68cf249f81	Cluster: use server.cluster_node_timeout directly. We used to copy this value into the server.cluster structure, however this was not necessary. The reason why we don't directly use server.cluster->node_timeout is that things that can be configured via redis.conf need to be directly available in the server structure as server.cluster is allocated later only if needed in order to reduce the memory footprint of non-cluster instances.	2013-04-09 11:24:18 +02:00
antirez	ef4f25ff6e	Cluster: configdigest field no longer used. Removed.	2013-04-09 11:07:25 +02:00
antirez	f09b2508f4	Cluster: properly send ping to nodes not pinged foro too much time. In commit `d728ec6` it was introduced the concept of sending a ping to every node not receiving a ping since node_timeout/2 seconds. However the code was located in a place that was not executed because of a previous conditional causing the loop to re-iterate. This caused false positives in nodes availability detection. The current code is still not perfect as a node may be detected to be in PFAIL state even if it does not reply for just node_timeout/2 seconds that is not correct. There is a plan to improve this code ASAP.	2013-04-08 19:40:20 +02:00
antirez	05fa4f4034	Cluster: node timeout is now configurable.	2013-04-04 12:29:10 +02:00
antirez	00bab23c41	Cluster: turn hardcoded node timeout multiplicators into defines. Most Redis Cluster time limits are expressed in terms of the configured node timeout. Turn them into defines.	2013-04-04 12:04:11 +02:00
antirez	c39e34d007	Cluster: when slave changes master, remove it from the old master.	2013-03-25 15:01:25 +01:00
antirez	34c1871e9f	Cluster: set node role on successful handshake.	2013-03-25 13:03:01 +01:00
antirez	a8b09faf3d	Cluster: comment no longer in sync with code removed.	2013-03-21 10:47:10 +01:00
antirez	8c1bc8e865	Cluster: clear the PROMOTED slave directly into clusterSetMaster(). This way we make sure every time a master is turned into a replica the flag will be cleared.	2013-03-20 11:51:44 +01:00
antirez	e006407fd0	Cluster: master node must clear its hash slots when turning into a slave. When a master turns into a slave after a failover event, make sure to clear the assigned slots before setting up the replication, as a slave should never claim slots in an explicit way, but just take over the master slots when replacing its master.	2013-03-20 11:32:35 +01:00
antirez	506f9a42b0	Cluster: new flag PROMOTED introduced. A slave node set this flag for itself when, after receiving authorization from the majority of nodes, it turns itself into a master. At the same time now this flag is tested by nodes receiving a PING message before reconfiguring after a failover event. This makes the system more robust: even if currently there is no way to manually turn a slave into a master it is possible that we'll have such a feature in the future, or that simply because of misconfiguration a node joins the cluster as master while others believe it's a slave. This alone is now no longer enough to trigger reconfiguration as other nodes will check for the PROMOTED flag. The PROMOTED flag is cleared every time the node is turned back into a replica of some other node.	2013-03-20 10:48:42 +01:00
antirez	026b9483db	Cluster: add sender flags in cluster bus messages header. Sender flags were not propagated for the sender, but only for nodes in the gossip section. This is odd and in the next commits we'll need to get updated flags for the sender node, so this commit adds a new field in the cluster messages header. The message header is the same size as we reused some free space that was marked as 'unused' because of alignment concerns.	2013-03-20 10:32:00 +01:00
antirez	d15b027d91	Cluster: turn old master into a replica of node that failed over. So when the failing master node is back in touch with the cluster, instead of remaining unused it is converted into a replica of the new master, ready to perform the fail over if the new master node will fail at some point. Note that as a side effect clients with stale configuration are now not an issue as well, as the node converted into a slave will not accept queries but will redirect clients accordingly.	2013-03-20 00:30:47 +01:00
antirez	4d62623015	Cluster: node replication role change handle improved. The code handling a master that turns into a slave or the contrary was improved in order to avoid repeating the same operations. Also the readability and conceptual simplicity was improved.	2013-03-19 16:01:30 +01:00
antirez	88221f88c0	Cluster: new command CLUSTER FLUSHSLOTS. It's just a simpler way to CLUSTER DELSLOTS with all the slots as arguments, in order to obtain a node without assigned slots for reconfiguration.	2013-03-19 09:58:05 +01:00
antirez	e28e61e839	Cluster: when failing over claim master slots.	2013-03-15 16:53:41 +01:00
antirez	dd091661d4	Cluster: log when a slave asks for failover authorization.	2013-03-15 16:44:08 +01:00
antirez	1375b0611b	Cluster: slaves start failover with a small delay. Redis Cluster can cope with a minority of nodes not informed about the failure of a master in time for some reason (netsplit or node not functioning properly, blocked, ...) however to wait a few seconds before to start the failover will make most "normal" failovers simpler as the FAIL message will propagate before the slave election happens.	2013-03-15 16:39:49 +01:00
antirez	d512a09c20	Cluster: a bit more serious node role change handling.	2013-03-15 16:35:16 +01:00
antirez	004fbef847	Cluster: remove node from master slaves when it turns into a master. Also, a few nearby comments improved.	2013-03-15 16:16:19 +01:00
antirez	44c92f5aeb	Cluster: slave failover implemented.	2013-03-15 16:11:34 +01:00
antirez	1d8f302e0d	Cluster: election -> promotion in two comments.	2013-03-15 15:44:49 +01:00
antirez	bf82195467	Cluster: added function to broadcast pings. See the function top-comment for info why this is useful sometimes.	2013-03-15 15:43:58 +01:00
antirez	892e98548a	Cluster: don't broadcast messages to HANDSHAKE nodes. Also don't check for NOADDR as we check that node->link is not NULL that's enough.	2013-03-15 15:36:36 +01:00
antirez	76a3954f4a	Cluster: fix clusterHandleSlaveFailover() conditional: quorum is enough.	2013-03-15 13:20:34 +01:00
antirez	90e99a2082	Cluster: two lame bugs fixed in FAILOVER AUTH messages generation.	2013-03-14 21:27:12 +01:00
antirez	aeacaa57e6	Cluster: code to process messages moved in the right if-else chain.	2013-03-14 21:21:58 +01:00
antirez	35f05c66b6	Cluster: handle FAILOVER_AUTH_ACK messages. That's trivial as we just need to increment the count of masters that received with an ACK.	2013-03-14 16:43:13 +01:00
antirez	c2595500ac	Cluster: request failover authorization, log if we have quorum. However the failover is yet not really performed.	2013-03-14 16:39:02 +01:00
antirez	7fa42b801d	Cluster: clusterSendFailoverAuth() implementation.	2013-03-14 16:31:57 +01:00
antirez	f59ff6fe61	Cluster: clusterSendFailoverAuthIfNeeded() work in progress.	2013-03-13 19:08:03 +01:00
antirez	44f6fdab60	Cluster: handle FAILOVER_AUTH_REQUEST in clusterProcessPacket(). However currently the control is passed to a function doing nothing at all.	2013-03-13 18:38:08 +01:00
antirez	ece95b2dea	Cluster: sanity check FAILOVER_AUTH_REQUEST messages for proper length.	2013-03-13 17:31:26 +01:00
antirez	66144337bf	Cluster: use 'else if' for mutually exclusive conditionals.	2013-03-13 17:27:06 +01:00
antirez	db7c17e969	Cluster: FAILOVER_AUTH_REQUEST message type introduced. This message is sent by a slave that is ready to failover its master to other nodes to get the authorization from the majority of masters.	2013-03-13 17:21:20 +01:00
antirez	575cbc9990	Cluster: clusterHandleSlaveFailover() stub.	2013-03-13 13:10:49 +01:00
antirez	3d448bda39	Cluster: call clusterHandleSlaveFailover() when our master is down.	2013-03-13 12:44:02 +01:00
antirez	f0b807cd47	Cluster: update cluster state on PFAIL flag set/cleared on nodes.	2013-03-07 15:40:53 +01:00
antirez	299b8f76c2	Cluster: mark cluster state as fail of majority of masters is unreachable.	2013-03-07 15:36:59 +01:00
antirez	abf06fd5ff	Cluster: log global cluster state change.	2013-03-07 15:22:32 +01:00
antirez	3dad8196b7	Cluster: clusterUpdateState() function simplified. Also the NEEDHELP Cluster state was removed as it will no longer be used by Redis Cluster.	2013-03-06 18:25:40 +01:00
antirez	011fa89ac9	Cluster: sdssplitargs_free() -> sdsfreesplitres().	2013-03-06 12:38:06 +01:00
antirez	1025dd7786	Cluster: connect to our master ASAP after startup if we are a slave node.	2013-03-05 16:12:08 +01:00
antirez	bac57ad14b	Cluster: more robust FAIL flag cleaup. If we have a master in FAIL state that's reachable again, and apparently no one is going to serve its slots, clear the FAIL flag and let the cluster continue with its operations again.	2013-03-05 15:05:32 +01:00
antirez	1a02b7440a	Cluster: new node field fail_time. This is the unix time at which we set the FAIL flag for the node. It is only valid if FAIL is set. The idea is to use it in order to make the cluster more robust, for instance in order to revert a FAIL state if it is long-standing but still slots are assigned to this node, that is, no one is going to fix these slots apparently.	2013-03-05 13:15:05 +01:00
antirez	e4b481a5f6	Cluster: A comment updated in clusterCron().	2013-03-05 12:17:30 +01:00
antirez	d728ec6dee	Cluster: send a ping to every node we never contacted in timeout/2 seconds. Usually we try to send just 1 ping every second, however when we detect we are going to have unreliable failure detection because we can't ping some node in time, send an additional ping. This should only happen with very large clusters or when the the node timeout is set to a very low value.	2013-03-05 12:16:02 +01:00
antirez	e7628be2a7	Cluster: set node->slaveof correctly when a node state is updated.	2013-03-05 11:50:11 +01:00
antirez	d6457577d4	Cluster: don't perform startup slots sanity check for slaves. If we are a cluster node the DB content will not match our configured slots. Don't do the check at all.	2013-03-04 19:47:00 +01:00
antirez	d334897e80	Cluster: fix maximum line length when loading config. There are pathological cases where the line can be even longer a single node may contain all the slots in importing/migrating state.	2013-03-04 19:45:36 +01:00
antirez	b8a28bf442	Cluster: actually setup replication in CLUSTER REPLICATE.	2013-03-04 15:27:58 +01:00
antirez	0c01088b51	Cluster: REPLICATE subcommand and stub for clusterSetMaster().	2013-03-04 13:15:09 +01:00
charsyam	bc84c399f8	adding check error code adding check error code	2013-03-04 11:20:11 +01:00
antirez	caf9b24a7d	Cluster: don't set the slot as unassigned because of PONG info. As stated in the comment this is usually due to a resharding in progress so the client should be still redirected to the old node that will handle the redirection elsewhere.	2013-02-28 15:54:29 +01:00
antirez	0d77440b26	Cluster: better handling of slots changes in PONG packets. The new code makes sure that the node slots bitmap is always consistent with the cluster->slots array.	2013-02-28 15:41:54 +01:00
antirez	5f8fd27ace	Cluster: refactoring of clusterNode*Bit to use helper bitmap functions.	2013-02-28 15:23:09 +01:00
antirez	d21d6b666f	Cluster: use node->numslots instead of popcount() where possible.	2013-02-28 15:13:32 +01:00
antirez	4521115b17	Cluster: new field in cluster node structure, "numslots". Before a relatively slow popcount() operation was needed every time we needed to get the number of slots served by a given cluster node. Now we just need to check an integer that is taken in sync with the bitmap.	2013-02-28 15:11:05 +01:00
antirez	a2566d6618	Cluster: don't gossip about nodes that are not useful to the cluster.	2013-02-28 15:00:09 +01:00
antirez	d45d184118	Cluster: CLUSTER FORGET implemented.	2013-02-27 17:55:59 +01:00
antirez	d2b8281b3f	Cluster: added a missing return on CLUSTER SETSLOT.	2013-02-27 17:53:48 +01:00
antirez	d20dea3eb7	Cluster: blank node address when flagging it as NOADDR.	2013-02-27 17:09:33 +01:00
antirez	2dcb5ab72b	Cluster: add comments in sub-sections of CLUSTER command.	2013-02-27 16:12:59 +01:00
antirez	f9b5ca29fd	Use GCC printf format attribute for redisLog(). This commit also fixes redisLog() statements producing warnings.	2013-02-27 12:27:15 +01:00
antirez	d0992d6e8b	Cluster: a few random fixes to the new failure detection.	2013-02-26 15:15:44 +01:00
antirez	f288b07563	Cluster: log the event when we clear the FAIL flag.	2013-02-26 15:03:38 +01:00
antirez	97ffcd351b	Cluster: use the failure report API to reimplement failure detection. The new system detects a failure only when there is quorum from masters.	2013-02-26 14:58:39 +01:00
antirez	1b1b3f6c06	Cluster: invert two functions declarations in more natural order.	2013-02-26 11:19:48 +01:00
antirez	d5e8b0a47f	Cluster: cleanup idle failure reports every time we remove one. This is not very important as anyway when the function counting the number of reports is called the cleanup is performed. However with this change if only part of the nodes that reported the failure will report the node is back ok, we'll cleanup the older entries ASAP. In complex split net split scenarios, and when we are dealing with clusters having nodes in the order of ~ 1000, this can save some CPU.	2013-02-26 11:15:18 +01:00
antirez	9cb578ced0	Cluster: new function clusterNodeDelFailureReport() for failure reports. This is the missing part of the API that will be used to reimplement failure detection of Cluster nodes.	2013-02-25 19:13:22 +01:00
antirez	18f537083a	Cluster: no limits for the count parameter of CLUSTER GETKEYSINSLOT. Not sure why I set a limit to 1 million keys, there is no reason for this artificial limit, and anyway this is s a stupid limit because it is already high enough to create latency issues. So let's the users shoot on their feet because maybe they just actually know what they are doing.	2013-02-25 12:41:13 +01:00
antirez	544bbe5387	Cluster: validate slot number in CLUSTER COUNTKEYSINSLOT.	2013-02-25 12:40:32 +01:00
antirez	d4fa40655d	Cluster: new sub-command CLUSTER COUNTKEYSINSLOT. The new sub-command uses the new countKeysInSlot() API and allows a cluster client to get the number of keys for a given hashslot.	2013-02-25 12:04:31 +01:00
antirez	a517c89321	Cluster: verifyClusterConfigWithData() implemented.	2013-02-25 11:43:49 +01:00
antirez	d2154254be	Cluster: fix case for getKeysInSlot() and countKeysInSlot(). Redis functions start in low case. A few functions about cluster were capitalized the wrong way.	2013-02-25 11:25:40 +01:00
antirez	c2eb4a606f	Cluster: use CountKeysInSlot() when we just need the count.	2013-02-25 11:23:04 +01:00
antirez	ad3bca1fdf	Cluster: added stub for verifyClusterConfigWithData(). See the top-comment for the function in this commit for details about what the function is supposed to do.	2013-02-25 11:20:17 +01:00
antirez	825e07f2fd	Cluster: if no previous config exists, create the myself node as master.	2013-02-22 19:24:01 +01:00
antirez	f4093753e4	Cluster: add cluster_size field in CLUSTER INFO output.	2013-02-22 19:20:38 +01:00
antirez	d218a4e244	Cluster: new state information, cluster size. The definition of cluster size is: the number of known nodes in the cluster that are masters and serving at least an hash slot.	2013-02-22 19:18:30 +01:00
antirez	5c55ed9388	Cluster: remove warning adding clusterNodeSetSlotBit() prototype.	2013-02-22 17:45:49 +01:00
antirez	974929770b	Cluster: introduced a failure reports system. A §Redis Cluster node used to mark a node as failing when itself detected a failure for that node, and a single acknowledge was received about the possible failure state. The new API will be used in order to possible to require that N other nodes have a PFAIL or FAIL state for a given node for a node to set it as failing.	2013-02-22 17:43:35 +01:00
antirez	07b6322735	Cluster: more correct update of slots state when PONG is received.	2013-02-21 16:52:06 +01:00
antirez	c6da9d9fac	Call clusterUpdateState() after CLUSTER SETSLOT too.	2013-02-21 16:31:22 +01:00
antirez	3a99d1228a	Aesthetic change to make a line more 80-cols friendly.	2013-02-21 16:24:48 +01:00
antirez	dc4af60628	Cluster: clusterAddSlot() was not doing what stated in the comment.	2013-02-21 11:51:17 +01:00
antirez	fdb57233e2	Cluster: always use cluster(Add\|Del)Slot to modify the cluster slots table.	2013-02-21 11:44:58 +01:00
antirez	786b8d6c87	Use RESTORE-ASKING for MIGRATE when in cluster mode.	2013-02-20 17:36:54 +01:00
antirez	ea7fc82a4a	Cluster: new command flag forcing implicit ASKING. Also using this new flag the RESTORE-ASKING command was implemented that will be used by MIGRATE.	2013-02-20 17:28:35 +01:00
antirez	9a04e12cc0	Cluster: I/O errors are now logged at DEBUG level.	2013-02-20 13:18:51 +01:00
antirez	02796ba7a7	Cluster: sanity checks on the cluster bus message length.	2013-02-15 16:44:39 +01:00
antirez	6b9c661838	Cluster: make valgrind happy initializing all the bytes of the node IP.	2013-02-15 12:58:35 +01:00
antirez	7371d5e248	Remove wrong decrRefCount() from getNodeByQuery(). This fixes issue #607.	2013-02-15 11:57:53 +01:00
antirez	20f52b5b78	Top comment for getNodeByQuery() improved.	2013-02-15 11:50:54 +01:00
antirez	e0e15bd06d	Cluster: with 16384 slots we need bigger buffers.	2013-02-14 15:36:33 +01:00
antirez	1649e509c3	Cluster: the cluster state structure is now heap allocated.	2013-02-14 13:20:56 +01:00
antirez	9dfd11c3da	Cluster: Initialize ip and port in createClusterNode().	2013-02-14 13:01:28 +01:00
antirez	ebd666db47	Cluster: from 4096 to 16384 hash slots.	2013-02-14 12:49:16 +01:00
antirez	b70b459b0e	TCP_NODELAY after SYNC: changes to the implementation.	2013-02-05 12:04:30 +01:00
guiquanz	9d09ce3981	Fixed many typos.	2013-01-19 10:59:44 +01:00
antirez	2feef47aa1	MIGRATE: retry one time on I/O error. Now that we cache connections, a retry attempt makes sure that the operation don't fail just because there is an existing connection error on the socket, like the other end closing the connection. Unfortunately this condition is not detectable using getsockopt(SO_ERROR), so the only option left is to retry. We don't retry on timeouts.	2012-11-14 11:30:24 +01:00
antirez	05705bc8bb	MIGRATE: fix default timeout to 1000 milliseconds. When a timeout <= 0 is provided we set a default timeout of 1 second. It was set to 1 millisecond for an error resulting from a recent change.	2012-11-12 18:54:35 +01:00
antirez	149b527a74	MIGRATE timeout should be in milliseconds. While it is documented that the MIGRATE timeout is in milliseconds, it was in seconds instead. This commit fixes the problem.	2012-11-12 14:01:02 +01:00
antirez	e23d281e48	MIGRATE TCP connections caching. By caching TCP connections used by MIGRATE to chat with other Redis instances a 5x performance improvement was measured with redis-benchmark against small keys. This can dramatically speedup cluster resharding and other processes where an high load of MIGRATE commands are used.	2012-11-12 00:47:24 +01:00
antirez	4365e5b2d3	BSD license added to every C source and header file.	2012-11-08 18:31:32 +01:00
antirez	1237d71c4e	COPY and REPLACE options for MIGRATE. With COPY now MIGRATE does not remove the key from the source instance. With REPLACE it uses RESTORE REPLACE on the target host so that even if the key already eixsts in the target instance it will be overwritten. The options can be used together.	2012-11-07 15:32:27 +01:00
antirez	e5b5763f56	REPLACE option for RESTORE. The REPLACE option deletes an existing key with the same name (if any) and materializes the new one. The default behavior without RESTORE is to return an error if a key already exists.	2012-11-07 10:57:23 +01:00
antirez	6fdc635447	Better Out of Memory handling. The previous implementation of zmalloc.c was not able to handle out of memory in an application-specific way. It just logged an error on standard error, and aborted. The result was that in the case of an actual out of memory in Redis where malloc returned NULL (In Linux this actually happens under specific overcommit policy settings and/or with no or little swap configured) the error was not properly logged in the Redis log. This commit fixes this problem, fixing issue #509. Now the out of memory is properly reported in the Redis log and a stack trace is generated. The approach used is to provide a configurable out of memory handler to zmalloc (otherwise the default one logging the event on the standard output is used).	2012-08-24 12:55:37 +02:00
antirez	21661d7acc	Fixed a bug in propagation of PUBLISH via the cluster bus. This bug was spotted by clang on FreeBSD.	2012-04-24 11:28:10 +02:00
antirez	e54fe9a79f	A few compiler warnings suppressed.	2012-04-24 11:11:55 +02:00
antirez	a3fb7fd4f6	Minor MIGRATE implementation simplification about ttl handling.	2012-04-10 16:46:29 +02:00
antirez	46738646d4	dump/restore fixed to use the new crc64 API.	2012-04-09 12:33:57 +02:00
antirez	12e91892a0	Another fix for MIGRATE.	2012-04-03 15:10:42 +02:00
antirez	84e5684bca	Two fixed for MIGRATE: fix TTL propagation and fix transferring of data bigger than 64k.	2012-04-03 12:17:40 +02:00
antirez	31f2ecf436	MIGRATE now let the client distinguish I/O errors and timeouts from other erros.	2012-04-02 16:38:24 +02:00
antirez	f8ea19e539	DUMP/RESTORE now use CRC64 instead of truncated SHA1.	2012-04-02 13:10:39 +02:00
antirez	70d848e1fa	RESTORE ability to set a TTL fixed, bug introduced with millisecond expires.	2012-04-02 11:14:47 +02:00
antirez	a149ce6875	Prettify source code of create/verify DUMP payload.	2012-04-02 10:52:39 +02:00
antirez	bd04465931	DUMP / RESTORE: store RDB version in little endian.	2012-04-02 10:46:24 +02:00

... 4 5 6 7 8 ...

614 Commits