redict

mirror of https://codeberg.org/redict/redict.git synced 2025-01-24 00:59:02 -05:00

Author	SHA1	Message	Date
antirez	58799718be	Cluster: better comment for clusterSendFailoverAuthIfNeeded() epoch test.	2014-06-10 17:20:21 +02:00
antirez	61eb0eae83	Cluster: log granted failover authorizations.	2014-06-10 16:56:08 +02:00
antirez	d5d92deb6c	Cluster: log configEpoch updates to myself.	2014-06-10 16:38:36 +02:00
antirez	8204ab0098	Cluster: log when a master denies a failover auth.	2014-06-10 16:07:26 +02:00
antirez	9b3bc82c1a	Cluster: cluster_my_epoch added to CLUSTER INFO output.	2014-06-10 11:35:40 +02:00
antirez	32d0a79f78	Cluster: check that configEpoch never goes back. Since there are ways to alter the configEpoch outside of the failover procedure (for exampel CLUSTER SET-CONFIG-EPOCH and via the configEpoch collision resolution algorithm), make always sure, before replacing our configEpoch with a new one, that it is greater than the current one.	2014-06-07 14:37:09 +02:00
antirez	a2c2ef7de5	Cluster: SET-CONFIG-EPOCH should update currentEpoch. SET-CONFIG-EPOCH, used by redis-trib at cluster creation time, failed to update the currentEpoch, making it possible after a failover for a server to set its configEpoch to a value smaller than the current one (since configEpochs are obtained using currentEpoch). The bug totally break the Redis Cluster algorithms and protocols allowing for permanent split brain conditions about the slots configuration as shown in issue #1799.	2014-06-07 14:25:47 +02:00
antirez	88c2307535	Cluster: always allow ok -> fail switch in clusterUpdateState(). There is a time defined by REDIS_CLUSTER_WRITABLE_DELAY where fail -> ok switch is not possible after startup as a master for some time, however the contrary (ok -> fail) should always be possible.	2014-05-26 16:24:12 +02:00
antirez	39603a7e31	Cluster: slave validity factor is now user configurable. Check the commit changes in the example redis.conf for more information.	2014-05-22 16:57:54 +02:00
antirez	67133d2f48	Cluster: use clusterSetNodeAsMaster() during slave failover. clusterHandleSlaveFailover() was reimplementing what clusterSetNodeAsMaster() without any good reason.	2014-05-15 17:03:28 +02:00
antirez	8c6e92c3bc	Cluster: clear todo_before_sleep flags when executing actions. Thanks to this change, when there is some code like: clusterDoBeforeSleep(CLUSTER_TODO_UPDATE_STATE\|...); ... and later before returning to the event loop ... clusterUpdateState(); The clusterUpdateState() function will clar the flag and will not be repeated in the clusterBeforeSleep() function. This especially important for config save/fsync flags which are slow to execute and not a good idea to repeat without a good reason. This is implemented for all the CLUSTER_TODO flags.	2014-05-15 16:33:13 +02:00
antirez	7b87cda70e	Fixed typo in CLUSTER RESET implementation.	2014-05-15 12:33:57 +02:00
antirez	796f4ae9f7	CLUSTER RESET implemented. The new command is able to reset a cluster node so that it starts again as a fresh node. By default the command performs a soft reset (the same as calling it as CLUSTER RESET SOFT), and the following steps are performed: 1) All slots are set as unassigned. 2) The list of known nodes is flushed. 3) Node is set as master if it is a slave. When an hard reset is performed with CLUSTER RESET HARD the following additional operations are performed: 4) A new Node ID is created at random. 5) Epochs are set to 0. CLUSTER RESET is useful both when the sysadmin wants to reconfigure a node with a different role (for example turning a slave into a master) and for testing purposes. It also may play a role in automatically provisioned Redis Clusters, since it allows to reset a node back to the initial state in order to be reconfigured.	2014-05-15 11:43:06 +02:00
antirez	8b9d5ecbd1	Remove trailing spaces from cluster.c file.	2014-05-15 10:18:36 +02:00
antirez	60e5d1724c	Cluster: don't accept cluster bus connections during startup.	2014-05-14 12:05:00 +02:00
antirez	6baac558d8	Cluster: better handling of stolen slots. The previous code handling a lost slot (by another master with an higher configuration for the slot) was defensive, considering it an error and putting the cluster in an odd state requiring redis-cli fix. This was changed, because actually this only happens either in a legitimate way, with failovers, or when the admin messed with the config in order to reconfigure the cluster. So the new code instead will try to make sure that the keys stored match the new slots map, by removing all the keys in the slots we lost ownership from. The function that deletes the keys from the lost slots is called only if the node does not lose all its slots (resulting in a reconfiguration as a slave of the node that got ownership). This is an optimization since the replication code will anyway flush all the instance data in a faster way.	2014-05-14 10:46:37 +02:00
antirez	832a298005	Cluster: fixed data_age computation / check integer overflow.	2014-05-12 17:46:15 +02:00
antirez	2692339138	Cluster: forced failover implemented. Using CLUSTER FAILOVER FORCE it is now possible to failover a master in a forced way, which means: 1) No check to understand if the master is up is performed. 2) No data age of the slave is checked. Evan a slave with very old data can manually failover a master in this way. 3) No chat with the master is attempted to reach its replication offset: the master can just be down.	2014-05-12 16:34:20 +02:00
antirez	005f564eb3	Cluster: bypass data_age check for manual failovers. Automatic failovers only happen in Redis Cluster if the slave trying to be elected was disconnected from its master for no more than 10 times the node-timeout value. However there should be no such a check for manual failovers, since these are initiated by the sysadmin that, in theory, knows what she is doing when a slave is selected to be promoted.	2014-05-12 16:12:12 +02:00
antirez	5c78f87666	RESTORE: reply with -BUSYKEY special error code. The error when the target key is busy was a generic one, while it makes sense to be able to distinguish between the target key busy error and the others easily.	2014-05-12 10:01:59 +02:00
antirez	71d0e7e0ea	CLUSTER MEET: better error messages when address is invalid. Fixes issue #1734.	2014-05-09 16:36:59 +02:00
antirez	8a170c817d	Cluster: bulk-accept new nodes connections. The same change was operated for normal client connections. This is important for Cluster as well, since when a node rejoins the cluster, when a partition heals or after a restart, it gets flooded with new connection attempts by all the other nodes trying to form a full mesh again.	2014-05-09 11:52:59 +02:00
antirez	3625b52791	Cluster: clusterAcceptHandler() comments updated to match the code.	2014-05-09 11:44:46 +02:00
antirez	11d9ecb71d	CLUSTER SET-CONFIG-EPOCH implemented. Initially Redis Cluster accepted that after cluster creation all the nodes were at configEpoch 0, evolving from zero as failovers happen. However later the semantic was made more strict in order to make sure a cluster has always all the master nodes with a different configEpoch, which is more robust in some corner case (especially resulting from errors by the system administrator). To assign different configEpochs to different nodes at startup was a task performed naturally by the config conflicts resolution algorithm (see the Cluster specification). However this works well only for small clusters or when there are actually just a few collisions, since it is designed for exceptional cases. When a large cluster is created hundred of nodes can be at epoch 0, so the conflict resolution code is slow to provide an unique config to each node. For this reason this new command was introduced. It can be called only when a node is totally fresh: no other nodes known, and configEpoch set to zero, so it is safe even against misuses. redis-trib will use the new command in order to start the cluster already setting an incremental unique config to every node.	2014-04-29 19:15:16 +02:00
antirez	e3cf812c9e	clusterLoadConfig() REDIS_ERR retval semantics refined. We should return REDIS_ERR to signal we can't read the configuration because there is no config file only after checking errno, othewise we risk to rewrite an existing file that was not accessible for some other reason.	2014-04-24 16:23:03 +02:00
antirez	db06108bc1	Lock nodes.conf to avoid multiple processes using the same file. This was a common source of problems among users. The solution adopted is not bullet-proof as if the user deletes the nodes.conf file manually, and starts a new instance with the same nodes.conf file path, two instances will use the same file. However following this reasoning the user may drop a nuclear bomb into the datacenter as well.	2014-04-24 16:04:10 +02:00
kingsumos	a69178fdd2	fix cluster node description showing wrong slot allocation	2014-04-22 11:44:53 -04:00
antirez	67bb2c46b2	Add casting to match printf format. adjustOpenFilesLimit() and clusterUpdateSlotsWithConfig() that were assuming uint64_t is the same as unsigned long long, which is true probably for all the systems out there that we target, but still GCC emitted a warning since technically they are two different types.	2014-04-07 08:58:06 +02:00
antirez	8f52173b2c	Cluster: last_vote_epoch -> lastVoteEpoch. Use cammel case for epochs that are persisted on disk.	2014-03-27 15:01:24 +01:00
antirez	7fb14b73ba	Cluster: save/restore vars that must persist after recovery. This fixes issue #1479.	2014-03-27 14:56:29 +01:00
antirez	6dd2dbbd36	Cluster: handshake "already known" error logged to VERBOSE. This is not really an error but something that always happens for example when creating a new cluster, or if the sysadmin rejoins manually a node that is already known. Since useless logs don't help, moved to VERBOSE level.	2014-03-26 16:35:38 +01:00
antirez	3cf6f1f54f	Cluster: clusterHandleConfigEpochCollision() fixed. New config epochs must always be obtained incrementing the currentEpoch, that is itself guaranteed to be >= the max configEpoch currently known to the node.	2014-03-26 12:31:28 +01:00
antirez	80d4c52cdf	Cluster: better logging for clusterUpdateSlotsConfigWith().	2014-03-26 12:09:38 +01:00
antirez	eb746ec408	Cluster: CLUSTER SETSLOT implementation comment updated. Update the comment since the implementation details changed.	2014-03-25 17:50:46 +01:00
antirez	6c527a89a0	Cluster: configEpoch collisions resolution. The slave election in Redis Cluster guarantees that slaves promoted to masters always end with unique config epochs, however failures during manual reshardings, software bugs and operational errors may in theory cause two nodes to have the same configEpoch. This commit introduces a mechanism to eventually always end with different configEpochs if a collision ever happens. As a (wanted) side effect, this also ensures that after a new cluster is created, all nodes will end with a different configEpoch automatically.	2014-03-25 17:19:58 +01:00
antirez	c1041c570f	Cluster: stay within 80 cols.	2014-03-25 16:07:14 +01:00
antirez	82b53c650c	struct dictEntry -> dictEntry.	2014-03-20 16:20:37 +01:00
antirez	e26f4486b0	Cluster: update node configEpoch on UPDATE messages. The UPDATE message contains the configEpoch of the node configuration advertised in the packet. Update it if needed.	2014-03-11 11:53:09 +01:00
antirez	a2ff90919f	Cluster: set slot error if we receive an update for a busy slot. By manually modifying nodes configurations in random ways, it is possible to create the following scenario: A is serving keys for slot 10 B is manually configured to serve keys for slot 10 A receives an update from B (or another node) where it is informed that the slot 10 is now claimed by B with a greater configuration epoch, however A still has keys from slot 10. With this commit A will put the slot in error setting it in IMPORTING state, so that redis-trib can detect the issue.	2014-03-11 11:49:47 +01:00
antirez	1ed0ad77f0	Cluster: clarified a comment in clusterUpdateSlotsConfigWith().	2014-03-11 11:32:40 +01:00
antirez	8287945ff8	Cluster: flush importing/migrating state when master is turned into slave.	2014-03-11 11:22:06 +01:00
antirez	2e8e0ad44e	Cluster: clusterCloseAllSlots() added.	2014-03-11 11:16:18 +01:00
antirez	787b297046	Cluster: getKeysFromCommand() API cleaned up. This API originated from the "diskstore" experiment, not for Redis Cluster itself, so there were legacy/useless things trying to differentiate between keys that are going to be overwritten and keys that need to be fetched from disk (preloaded). All useless with Cluster, so removed with the result of code simplification.	2014-03-10 13:18:41 +01:00
antirez	c1a7d3e61f	Cluster: abort on port too high error. It also fixes multi-line comment style to be consistent with the rest of the code base. Related to #1555.	2014-03-10 10:41:27 +01:00
Salvatore Sanfilippo	442b06db54	Merge pull request #1555 from mattsta/cluster-port-error-out Cluster port error out	2014-03-10 10:37:50 +01:00
antirez	ed8c55237b	Cluster: be explicit about passing NULL as bind addr for connect. The code was already correct but it was using that bindaddr[0] is set to NULL as a side effect of current implementation if no bind address is configured. This is not guarnteed to hold true in the future.	2014-03-10 10:33:53 +01:00
antirez	3e8a92ef8d	Cluster: log error when anetTcpNonBlockBindConnect() fails.	2014-03-10 10:32:28 +01:00
Salvatore Sanfilippo	3b0edb80ec	Merge pull request #1567 from mattsta/fix-cluster-join Bind source address for cluster communication	2014-03-10 10:28:32 +01:00
antirez	0f1f25784f	Cluster: better timeout and retry time for failover. When node-timeout is too small, in the order of a few milliseconds, there is no way the voting process can terminate during that time, so we set a lower limit for the failover timeout of two seconds. The retry time is set to two times the failover timeout time, so it is at least 4 seconds.	2014-03-10 09:57:52 +01:00
antirez	6984692060	Cluster: fix conditional generating TRYAGAIN error.	2014-03-07 16:18:00 +01:00
antirez	36676c2318	Redis Cluster: support for multi-key operations.	2014-03-07 13:19:09 +01:00
Matt Stancliff	385c25f70f	Remove redundant IP length definition REDIS_CLUSTER_IPLEN had the same value as REDIS_IP_STR_LEN. They were both #define'd to the same INET6_ADDRSTRLEN.	2014-03-06 17:55:43 +01:00
Matt Stancliff	d2040ab9b1	Remove some redundant code Function nodeIp2String in cluster.c is exactly anetPeerToString with a pre-extracted fd.	2014-03-06 17:55:39 +01:00
Matt Stancliff	59cf0b1902	Fix return value check for anetTcpAccept anetTcpAccept returns ANET_ERR, not AE_ERR. This isn't a physical error since both ANET_ERR and AE_ERR are -1, but better to be consistent.	2014-03-06 17:55:31 +01:00
Matt Stancliff	e5b1e7be64	Bind source address for cluster communication The first address specified as a bind parameter (server.bindaddr[0]) gets used as the source IP for cluster communication. If no bind address is specified by the user, the behavior is unchanged. This patch allows multiple Redis Cluster instances to communicate when running on the same interface of the same host.	2014-03-04 17:36:45 -05:00
antirez	8dea2029a4	Fix configEpoch assignment when a cluster slot gets "closed". This is still code to rework in order to use agreement to obtain a new configEpoch when a slot is migrated, however this commit handles the special case that happens when the nodes are just started and everybody has a configEpoch of 0. In this special condition to have the maximum configEpoch is not enough as the special epoch 0 is not unique (all the others are). This does not fixes the intrinsic race condition of a failover happening while we are resharding, that will be addressed later.	2014-03-03 11:12:11 +01:00
Matt Stancliff	ce68caea37	Cluster: error out quicker if port is unusable The default cluster control port is 10,000 ports higher than the base Redis port. If Redis is started on a too-high port, Cluster can't start and everything will exit later anyway.	2014-02-19 17:30:07 -05:00
antirez	db6d628c3e	Cluster: clusterDelNode(): remove node from master's slaves.	2014-02-11 10:34:25 +01:00
antirez	5e0e03be41	Cluster: UPDATE messages are the norm and verbose. Logging them at WARNING level was of little utility and of sure disturb.	2014-02-11 10:18:24 +01:00
antirez	4a64286c36	Cluster: configEpoch assignment in SETNODE improved. Avoid to trash a configEpoch for every slot migrated if this node has already the max configEpoch across the cluster. Still work to do in this area but this avoids both ending with a very high configEpoch without any reason and to flood the system with fsyncs.	2014-02-11 10:09:17 +01:00
antirez	72f7abf6a2	Cluster: clusterSetStartupEpoch() made more generally useful. The actual goal of the function was to get the max configEpoch found in the cluster, so make it general by removing the assignment of the max epoch to currentEpoch that is useful only at startup.	2014-02-11 10:00:14 +01:00
antirez	44f7afe28a	Cluster: always increment the configEpoch in SETNODE after import. Removed a stale conditional preventing the configEpoch from incrementing after the import in certain conditions. Since the master got a new slot it should always claim a new configuration.	2014-02-11 09:50:37 +01:00
antirez	a1349728ea	Cluster: on resharding upgrade version of receiving node. The node receiving the hash slot needs to have a version that wins over the other versions in order to force the ownership of the slot. However the current code is far from perfect since a failover can happen during the manual resharding. The fix is a work in progress but the bottom line is that the new version must either be voted as usually, set by redis-trib manually after it makes sure can't be used by other nodes, or reserved configEpochs could be used for manual operations (for example odd versions could be never used by slaves and are always used by CLUSTER SETSLOT NODE).	2014-02-11 00:36:05 +01:00
antirez	6dc26795aa	Cluster: fsync at every SETSLOT command puts too pressure on disks. During slots migration redis-trib can send a number of SETSLOT commands. Fsyncing every time is a bit too much in production as verified empirically. To make sure configs are fsynced on all nodes after a resharding redis-trib may send something like CLUSTER CONFSYNC. In this case fsyncs were not providing too much value since anyway processes can crash in the middle of the resharding of an hash slot, and redis-trib should be able to recover from this condition anyway.	2014-02-10 23:54:08 +01:00
antirez	218358bbbd	Cluster: conditions to clear "migrating" on slot for SETSLOT ... NODE changed. If the slot is manually assigned to another node, clear the migrating status regardless of the fact it was previously assigned to us or not, as long as we no longer have keys for this slot. This avoid a race during slots migration that may leave the slot in migrating status in the source node, since it received an update message from the destination node that is already claiming the slot. This way we are sure that redis-trib at the end of the slot migration is always able to close the slot correctly.	2014-02-10 23:51:47 +01:00
antirez	bf670e0745	Cluster: don't update slave's master if we don't know it. There is no way we can update the slave's node->slaveof pointer if we don't know the master (no node with such an ID in our tables).	2014-02-10 18:33:34 +01:00
antirez	a3755ae9ee	Cluster: ignore slot config changes if we are importing it.	2014-02-10 18:04:43 +01:00
antirez	6fc53e16ad	Cluster: update configEpoch after manually messing with slots.	2014-02-10 18:01:58 +01:00
antirez	1a73c992a3	Cluster: fixed inverted arguments in logging function call.	2014-02-10 17:21:10 +01:00
antirez	32563b4a5f	Cluster: clear the FAIL status for masters without slots. Masters without slots don't participate to the cluster but just do redirections, no need to take them in FAIL state if they are back reachable.	2014-02-10 17:18:27 +01:00
antirez	5b2082ead3	Cluster: replica migration should only work for masters serving slots.	2014-02-10 17:08:37 +01:00
antirez	f885fa8bac	Cluster: clusterReadHandler() fixed to work with new message header.	2014-02-10 16:27:37 +01:00
antirez	7bf7b7350c	Cluster: signature changed to "RCmb" (Redis Cluster message bus). Sounds better after all.	2014-02-10 15:55:21 +01:00
antirez	dced9c0619	Cluster: discard bus messages with version != 0.	2014-02-10 15:54:22 +01:00
antirez	007e1c7cb2	Cluster: added signature + version in bus packets.	2014-02-10 15:53:09 +01:00
antirez	142281dc79	Cluster: keys slot computation now supports hash tags. Currently this is marginally useful, only to make sure two keys are in the same hash slot when the cluster is stable (no rehashing in progress). In the future it is possible that support will be added to run mutli-keys operations with keys in the same hash slot.	2014-02-07 17:39:01 +01:00
antirez	04fe000bf8	Cluster: fixed MF condition in clusterHandleSlaveFailover(). For manual failover we need a manual failover in progress, and that mf_can_start is true (master offset received and matched).	2014-02-05 16:01:56 +01:00
antirez	c6f02fd67a	Cluster: CLUSTER FAILOVER replies with OK and logs the event.	2014-02-05 15:52:38 +01:00
antirez	c72449af30	Cluster: check that a MF is in progress in manualFailoverCheckTimeout(). Otherwise it is always detected as a manual failover timed out.	2014-02-05 15:45:24 +01:00
antirez	b7402bcad5	Cluster: force AUTH ACK on manual failover. When a slave requests masters vote for a manual failover, the REQUEST_AUTH message is flagged in a special way in order to force the masters to give the authorization even if the master is not marked as failing.	2014-02-05 13:10:03 +01:00
antirez	4cf0cd5719	Cluster: manual failover initial implementation.	2014-02-05 13:01:24 +01:00
antirez	a7d30681c9	Cluster: configurable replicas migration barrier. It is possible to configure the min number of additional working slaves a master should be left with, for a slave to migrate to an orphaned master.	2014-01-31 11:26:36 +01:00
antirez	6c9359add1	Cluster: perform orphaned masters check before continue statements. The check was placed in a way that conflicted with the continue statements used by the node hearth beat code later that needs to skip the current node sometimes. Moved at the start of the function so that's always executed.	2014-01-30 18:23:31 +01:00
antirez	c2507b0ff6	Cluster: replica migration implementation. This feature allows slaves to migrate to orphaned masters (masters without working slaves), as long as a set of conditions are met, including the fact that the migrating slave needs to be in a master-slaves ring with at least another slave working.	2014-01-30 18:05:11 +01:00
antirez	5b4020fb42	Cluster: swap two code blocks to have a more obvious flow.	2014-01-30 16:34:23 +01:00
antirez	4beaaff8ea	Cluster: remove not needed return statement breaking failover.	2014-01-29 17:28:46 +01:00
antirez	3582054982	Cluster: broadcast pong to other slaves in the same ring. When we schedule a failover, broadcast a PONG to the slaves. The other slaves that plan to get elected will do the same too, this way it is likely that every slave will have a good picture of its own rank. Note that this is N*N messages where N is the number of slaves for the failing master, however usually even large clusters have many master nodes but a limited number of replicas per node, so this is harmless.	2014-01-29 17:19:55 +01:00
antirez	e2b59621a8	Cluster: log offset when announcing the failover election delay.	2014-01-29 17:16:10 +01:00
antirez	940531e9b7	Cluster: added progressive election delay according to slave rank. Note that when we compute the initial delay, there are probably still more up to date information to receive from slaves with new offsets, so the delay is recomputed when new data is available.	2014-01-29 16:53:45 +01:00
antirez	6f54032080	Cluster: function clusterGetSlaveRank() added. Return the number of slaves for the same master having a better replication offset of the current slave, that is, the slave "rank" used to pick a delay before the request for election.	2014-01-29 16:39:04 +01:00
antirez	40cd38f0c4	Cluster: update node replication offset from bus packets headers.	2014-01-29 16:01:00 +01:00
antirez	9d4ded7ec6	Cluster: refactoring: new macros to check node flags.	2014-01-29 12:17:16 +01:00
antirez	099bd336db	Cluster: use myself instead of server->cluster.myself.	2014-01-29 11:38:14 +01:00
antirez	e36bd8b43e	Cluster: added a global myself pointer in cluster.c. Accessing to the 'myself' node, the node representing the currently running instance, is handy without the need to type server.cluster->myself every time.	2014-01-29 11:22:22 +01:00
antirez	f1e09d8c41	Cluster: clusterBroadcastPong() improved with target selection. Now we can broadcast a pong to all the instances or just the local slaves (that is useful for replication offset propagation).	2014-01-29 11:08:52 +01:00
antirez	befcf6259e	Cluster: broadcast master/slave replication offset in bus header.	2014-01-28 16:51:50 +01:00
antirez	0b1b25c51c	Cluster: introduced repl_offset fields in clusterNode. The two fields are used in order to remember the latest known replication offset and the time we received it from other slave nodes. This will be used by slaves in order to start the election procedure with a delay that is proportional to the rank of the slave among the other slaves for this master, when sorted for replication offset. Usually this allows the slave with the most updated offset to win the election and replace the failing master in the cluster.	2014-01-28 16:28:07 +01:00
antirez	0f9422d575	Cluster: update slaves lists in clusterSetMaster().	2014-01-22 18:46:53 +01:00
antirez	5383ab0bc6	Cluster: CLUSTER SLAVES subcommand added.	2014-01-22 18:38:42 +01:00
antirez	603e480fd5	Cluster: clusterGenNodesDescription() refactored into two functions.	2014-01-22 18:36:12 +01:00
antirez	80e80668f4	Cluster: master nodes wait before rejoining the cluster after reboot. One of the simple heuristics used by Redis Cluster in order to avoid losing data in the typical failure modes created by the asynchronous replication with the slaves (a master is unable, when accepting a write, to immediately tell if it should be really accepted or refused because of a configuration change), is to wait some time before to rejoin the cluster after being partitioned away from the majority of instances. A similar condition happens when a master is restarted. It does not know if it was already failed over, nor if all the clients have already an updated configuration about the cluster map, so it is possible that clients will try to write to stale masters that were restarted. In a similar way this commit changes masters behavior so they wait 2000 milliseconds before accepting writes after a reboot. There is nothing special about 2 seconds if not to be a value supposedly larger a few orders of magnitude compared to the cluster bus communication latencies.	2014-01-20 11:52:52 +01:00
antirez	e6970e204f	Cluster: debug printf statemets removed. These were committed for error after being inserted in order to fix an issue.	2014-01-20 11:19:04 +01:00
antirez	ac3850cabd	Cluster: allow CLUSTER REPLICATE to switch master. The code was doing checks for slaves that should be done only when the instance is currently a master. Switching a slave from a master to another one should just work.	2014-01-17 18:22:35 +01:00
antirez	3d455393a6	Cluster: don't let a node forget its own master. redis-trib should make sure to reconfigure slaves of a node to remove from the cluster to replicate with other nodes before sending CLUSTER FORGET.	2014-01-16 17:49:35 +01:00
antirez	0c373207fa	Cluster: don't forget yourself with CLUSTER FORGET.	2014-01-16 09:46:23 +01:00
antirez	3e948970fe	Cluster: use the node blacklist in CLUSTER FORGET. CLUSTER FORGET is not useful if we can't remove a node from all the nodes of our cluster because of the Gossip protocol that keeps adding a given node to nodes where we already tried to remove it. So now CLUSTER FORGET implements a nodes blacklist that is set and checked by the Gossip section processing function. This way before a node is re-added at least 60 seconds must elapse since the FORGET execution. This means that redis-trib has some time to remove a node from a whole cluster. It is possible that in the future it will be uesful to raise the 60 sec figure to something bigger.	2014-01-15 16:50:45 +01:00
antirez	ccf268fa17	Cluster: fix clusterBlacklistAddNode() by setting right expire time. The hash table value should be set to now + 60 seconds otherwise it expires immediately.	2014-01-15 16:49:31 +01:00
antirez	4e1861155f	Cluster: clusterBlacklistAddNode() key lookup fixed. We can't lookup by node->name that's not an SDS string but a plain C array in the node structure.	2014-01-15 16:45:07 +01:00
antirez	b51be7b34f	Cluster: clusterBlacklistExists() requires blacklist cleanup before lookup.	2014-01-15 16:06:54 +01:00
antirez	a81340abaf	Cluster: set a minimum rejoin delay if node_timeout is too small. The rejoin delay usually is the node timeout. However if the node timeout is too small, we set it to 500 milliseconds, that is a value chosen to be greater than most setups RTT / instances latency figures so that likely communication with other nodes happen before rejoining.	2014-01-15 12:34:33 +01:00
antirez	a687cbc19c	Cluster: periodically call clusterUpdateState() when cluster is down. Usually we update the cluster state (to understand if we should accept queries or reply with an error) only when there is a change in the state of the nodes. However for the "delayed rejoin" feature to work, that is, for a master to wait some time before accepting queries again after it rejoins the majority, we need to periodically update the last time when the node was partitioned away from the majority. With this commit if the cluster is down we update the state ten times per second.	2014-01-15 12:26:12 +01:00
antirez	25ddefdea3	Cluster: range checking in getSlotOrReply() fixed. See issue #1426 on Github.	2014-01-15 11:33:46 +01:00
antirez	fb659cd334	Cluster: ignore empty lines in nodes.conf. Even without the user messing manually with the file, it is still possible to have blank lines (just a single "\n" per line) because of how the nodes.conf update/write process works.	2014-01-15 11:23:41 +01:00
antirez	6c63df3031	Cluster: atomic update of nodes.conf file. The way the file was generated was unsafe and leaded to nodes.conf file corruption (zero length file) on server stop/crash during the creation of the file. The previous file update method was as simple as open with O_TRUNC followed by the write call. While the write call was a single one with the full payload, ensuring no half-written files for POSIX semantics, stopping the server just after the open call resulted into a zero-length file (all the nodes information lost!).	2014-01-15 10:31:20 +01:00
antirez	28273394cb	Cluster: support to read from slave nodes. A client can enter a special cluster read-only mode using the READONLY command: if the client read from a slave instance after this command, for slots that are actually served by the instance's master, the queries will be processed without redirection, allowing clients to read from slaves (but without any kind fo read-after-write guarantee). The READWRITE command can be used in order to exit the readonly state.	2014-01-14 16:33:16 +01:00
antirez	58c8a071a5	Fix RESTORE ttl handling in 32 bit archs. long was used instead of long long in order to handle a 64 bit resolution millisecond timestamp. This fixes issue #1483.	2014-01-09 11:09:23 +01:00
antirez	f510549044	Cluster: clusterProcessPacket() was not 80 cols friendly. The function actually needs to be split into sub-functions at some point in the future.	2013-12-25 17:57:36 +01:00
antirez	66ec1412fe	Redis Cluster: add repl_ping_slave_period to slave data validity time. When the configured node timeout is very small, the data validity time (maximum data age for a slave to try a failover) is too little (ten times the configured node timeout) when the replication link with the master is mostly idle. In this case we'll receive some data from the master only every server.repl_ping_slave_period to refresh the last interaction with the master. This commit adds to the max data validity time the slave ping period to avoid this problem of slaves sensing too old data without a good reason. However this max data validity time is likely a setting that should be configurable by the Redis Cluster user in a way completely independent from the node timeout.	2013-12-22 10:05:16 +01:00
antirez	658aff9d29	Redis Cluster: move node failure reports logging from VERBOSE to NOTICE level.	2013-12-21 00:04:53 +01:00
antirez	5a404c87c1	Redis Cluster: remove no longer relevant comment.	2013-12-20 14:40:11 +01:00
antirez	fda4cba912	Redis Cluster: reconfigure replication when master changes address.	2013-12-20 12:47:22 +01:00
antirez	d7374032c0	Redis Cluster: handshake code refactoring + Gossip IP switch detection. This commit makes it simple to start an handshake with a specific node address, and uses this in order to detect a node IP change and start a new handshake in order to fix the IP if possible.	2013-12-20 12:38:03 +01:00
antirez	a2c938c834	Redis Cluster: delay state change when in the majority again. As specified in the Redis Cluster specification, when a node can reach the majority again after a period in which it was partitioend away with the minorty of masters, wait some time before accepting queries, to provide a reasonable amount of time for other nodes to upgrade its configuration. This lowers the probabilities of both a client and a master with not updated configuration to rejoin the cluster at the same time, with a stale master accepting writes.	2013-12-20 09:56:18 +01:00
antirez	7a666ac419	Cluster: set n->slaves to NULL in clusterNodeResetSlaves(). The value was otherwise undefined, so next time the node was promoted again from slave to master, adding a slave to the list of slaves would likely crash the server or result into undefined behavior.	2013-12-17 14:50:24 +01:00
antirez	fda91dbde3	Cluster: check link is valid before sending UPDATE.	2013-12-17 12:28:37 +01:00
antirez	f57bb36ce7	Cluster: initialize todo_before_sleep flags to 0.	2013-12-17 12:22:02 +01:00
antirez	c70c0c6db7	Cluster: use proper type mstime_t for ping delay var.	2013-12-17 10:27:36 +01:00
antirez	47815d38e0	Fixed clearNodeFailureIfNeeded() time type to mstime_t. This prevented 32bit cluster instances from clearing the FAIL flag when needed.	2013-12-17 09:45:52 +01:00
antirez	e88e6a6334	Cluster: use long long for timestamps in clusterGenNodesDescription(). Ping sent and pong received fields need to be casted to long long to be printed correctly into 32 bit systems.	2013-12-17 09:38:11 +01:00
antirez	11e81a1e9a	Fixed grammar: before H the article is a, not an.	2013-12-05 16:35:32 +01:00
antirez	6fa42b7507	Cluster: nodes re-addition blacklist API.	2013-12-02 11:12:23 +01:00
antirez	8f18345ef0	Cluster: basic data structures for nodes black list.	2013-11-29 17:37:06 +01:00
antirez	3db825fde4	Cluster: some code about clusterHandleSlaveFailover() marginally improved. 80 cols friendly, some minor change to the code to make it simpler.	2013-11-29 16:17:05 +01:00
antirez	a5e7358a12	Cluster: removed not needed newline at end of redisLog() msg.	2013-11-08 17:28:02 +01:00
antirez	28071caf38	Cluster: send a single UPDATE packet for now.	2013-11-08 17:25:49 +01:00
antirez	d289c628b1	Cluster: replace hardcoded 4096 for bus msg len with sizeof().	2013-11-08 17:19:19 +01:00
antirez	94a07d5901	Cluster: slots update refactored + UPDATE msg processing. Now there is a function that handles the update of the local slot configuration every time we have some new info about a node and its set of served slots and configEpoch. Moreoever the UPDATE packets are now processed when received (it was a work in progress in the previous commit).	2013-11-08 17:02:10 +01:00
antirez	dc43f66eac	Cluster: UPDATE msg data structure and sending function.	2013-11-08 16:26:50 +01:00
antirez	6c6572be95	Cluster: refactoring of slots update code and more. The commit also introduces detection of nodes publishing not updated configuration. More work in progress to send an UPDATE packet to inform of the config change.	2013-11-08 10:32:16 +01:00
antirez	1a0cea33a0	Cluster: initialize senderConfigEpoch and senderCurrentEpoch for warnings suppression.	2013-11-05 12:01:07 +01:00
antirez	0c9f60a628	Cluster: there is a lower limit for the handshake timeout.	2013-10-11 10:34:32 +02:00
antirez	1447d28c0f	Cluster: data_age conversion to milliseconds fixed.	2013-10-09 16:36:06 +02:00
antirez	573c2fea91	Cluster: clusterCron() freq is now 10h. Still ping 1 node every sec. After the change in clusterCron() frequency of call, we still want to ping just one random node every second.	2013-10-09 16:29:17 +02:00
antirez	ba42428633	Cluster: time switched from seconds to milliseconds. All the internal state of cluster involving time is now using mstime_t and mstime() in order to use milliseconds resolution. Also the clusterCron() function is called with a 10 hz frequency instead of 1 hz. The cluster node_timeout must be also configured in milliseconds by the user in redis.conf.	2013-10-09 16:19:26 +02:00
antirez	929b6a4480	Cluster: cluster stuff moved from redis.h to cluster.h.	2013-10-09 15:38:05 +02:00
antirez	ae2763f564	Cluster: masters don't vote for a slave with stale config. When a slave requests our vote, the configEpoch he claims for its master and the set of served slots must be greater or equal to the configEpoch of the nodes serving these slots in the current configuraiton of the master granting its vote. In other terms, masters don't vote for slaves having a stale configuration for the slots they want to serve.	2013-10-08 12:45:35 +02:00
antirez	f7d6ad4366	Cluster: fix slave data age computation when master is still connected.	2013-10-07 16:07:13 +02:00
antirez	2c3301b9f5	Cluster: log message improved when FAIL is cleared from a slave node.	2013-10-07 15:44:58 +02:00
antirez	72f38cd70f	Cluster: slave nodes advertise master slots bitmap and configEpoch.	2013-10-07 11:31:12 +02:00
antirez	7afc0dd59a	Cluster: new clusterDoBeforeSleep() API. The new API is able to remember operations to perform before returning to the event loop, such as checking if there is the failover quorum for a slave, save and fsync the configuraiton file, and so forth. Because this operations are performed before returning on the event loop we are sure that messages that are sent in the same event loop run will be delivered after the configuration is already saved, that is a requirement sometimes. For instance we want to publish a new epoch only when it is already stored in nodes.conf in order to avoid returning back in the logical clock when a node is restarted. This new API provides a big performance advantage compared to saving and possibly fsyncing the configuration file multiple times in the same event loop run, especially in the case of big clusters with tens or hundreds of nodes.	2013-10-03 09:58:06 +02:00

1 2 3 4 5 ...

494 Commits