The time Sentinel waits, after detecting that a slave is configured to
point to the wrong master, before reconfiguring it, is now the
failover_timeout time, as this makes more sense in order to give the
Sentinel performing the failover enough time to reconfigure the slaves
slowly (if required by the configuration).
Also we now PUBLISH the new configuration more frequently, as this
allows the reappearing master to be switched back to a slave faster.
Also the default failover timeout was changed to 3 minutes, as the
failover is a fairly fast procedure most of the time, unless there is a
very big number of slaves and the user chose to reconfigure them
sequentially (in that case the user should change the failover timeout
accordingly).
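As an illustration only, a hypothetical sentinel.conf excerpt with the
new 3 minute default expressed in milliseconds ("mymaster", the address
and the quorum are placeholders):

    sentinel monitor mymaster 127.0.0.1 6379 2
    # 180000 ms = 3 minutes, the new default failover timeout.
    sentinel failover-timeout mymaster 180000
    # Reconfiguring one slave at a time makes the failover slower, so the
    # timeout may need to be raised accordingly.
    sentinel parallel-syncs mymaster 1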
Once we switched configuration during a failover, we should advertise
the new address.
This was a serious race condition: the Sentinel performing the
failover, for a moment, advertised the old address with the new
configuration epoch. Once transmitted to the other Sentinels, the broken
configuration would remain there forever, until the next failover
(because a greater configuration epoch is required to overwrite an older
one).
Now Sentinels believe the current configuration is always the winner
and should be applied, instead of trying to adapt their view of the
cluster based on what they observe.
So the only way to modify what a Sentinel believes to be the truth is
to win an election and advertise the new configuration via Pub/Sub with
a greater configuration epoch.
Changes to leadership handling.
Now the leader gets selected by every Sentinel, for a specified epoch,
when the SENTINEL is-master-down-by-addr command is sent.
This command now includes the runid and the currentEpoch of the
instance seeking a vote. A Sentinel only votes a single time in a given
epoch.
Still a work in progress, does not even compile at this stage.
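A minimal sketch of the "one vote per epoch" rule described above,
using hypothetical C structures and names rather than the actual
sentinel.c code:

    #include <stdint.h>
    #include <stdio.h>

    struct sentinel_state {
        uint64_t current_epoch;    /* Highest epoch seen so far. */
        uint64_t last_vote_epoch;  /* Epoch in which we last granted a vote. */
        char last_vote_runid[41];  /* Run ID we voted for in that epoch. */
    };

    /* Called when SENTINEL is-master-down-by-addr carries a runid and an
     * epoch: grant the vote only if we have not voted yet in an epoch
     * that is >= the requested one, and return the runid we voted for so
     * the caller can put it in the reply. */
    const char *sentinel_vote(struct sentinel_state *s, uint64_t req_epoch,
                              const char *req_runid) {
        if (req_epoch > s->current_epoch) s->current_epoch = req_epoch;
        if (req_epoch > s->last_vote_epoch) {
            s->last_vote_epoch = req_epoch;
            snprintf(s->last_vote_runid, sizeof(s->last_vote_runid), "%s",
                     req_runid);
        }
        return s->last_vote_runid; /* Either the new vote or the old one. */
    }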
Sentinel state now includes the idea of current epoch and config epoch.
In the Hello message, which is now published both on masters and slaves,
a Sentinel no longer just advertises itself but also broadcasts its
current view of the configuration: the master name / ip / port and its
current epoch.
Sentinels receiving such information switch to the new master if the
configuration epoch received is newer and the ip / port of the master
are indeed different compared to the previous ones.
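A minimal sketch of that epoch comparison, again with illustrative
structure and field names rather than the real sentinel.c code:

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    struct master_view {
        char ip[64];
        int port;
        uint64_t config_epoch;   /* Epoch of the configuration we hold. */
    };

    /* Apply a master address received via a Hello message only when the
     * advertised config epoch is greater than the one we currently hold. */
    void apply_hello(struct master_view *m, const char *ip, int port,
                     uint64_t epoch) {
        if (epoch <= m->config_epoch) return;     /* Stale or same config. */
        m->config_epoch = epoch;                  /* A newer epoch wins. */
        if (strcmp(m->ip, ip) != 0 || m->port != port) {
            printf("+switch-master %s %d -> %s %d\n",
                   m->ip, m->port, ip, port);
            snprintf(m->ip, sizeof(m->ip), "%s", ip);
            m->port = port;
        }
    }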
AUTH and SCRIPT KILL were sent without incrementing the pending commands
counter. Clearly this needs some kind of wrapper that does it for the
caller, in order to be less bug prone.
This change makes Sentinel less fragile about a number of failure modes.
This commit also fixes a different bug as a side effect: the SLAVEOF
command was sent multiple times without incrementing the pending
commands count.
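A sketch of the kind of wrapper suggested above, with hypothetical
names (not the actual sentinel.c API), built on the hiredis async
interface:

    #include <stdarg.h>
    #include <hiredis/async.h>

    struct instance_link {
        redisAsyncContext *cc;   /* Command connection to the instance. */
        int pending_commands;    /* Replies we are still waiting for. */
    };

    /* Every command (AUTH, SCRIPT KILL, SLAVEOF, ...) goes through this
     * single entry point, so callers can no longer forget to increment
     * the pending commands counter. */
    int link_send_command(struct instance_link *link, redisCallbackFn *fn,
                          void *privdata, const char *fmt, ...) {
        va_list ap;
        int retval;

        va_start(ap, fmt);
        retval = redisvAsyncCommand(link->cc, fn, privdata, fmt, ap);
        va_end(ap);
        if (retval == REDIS_OK) link->pending_commands++;
        return retval;
    }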
Actually the string is modified in place and a reallocation is never
needed, so there is no need to return the new sds string pointer as the
return value of the function, which is now just "void".
This has been done by exposing the anetSockName() function in anet.c,
to be used when the Sentinel is publishing its existence to the masters.
This implementation is very unintelligent, as it will likely break if
used with IPv6: the nested colons will break any parsing of the PUBLISH
string by the master.
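For example, the published Hello is a colon-separated payload along the
lines of the following (the exact fields are illustrative, only the
separator matters here):

    PUBLISH __sentinel__:hello "127.0.0.1:26379:<runid>"

With an IPv6 address the extra colons make the split ambiguous:

    PUBLISH __sentinel__:hello "2001:db8::1:26379:<runid>"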
Sentinel was not able to detect slaves when connected to a very recent
version of a Redis master, since a previous non-backward-compatible
change to INFO broke the parsing of the slaves' ip:port INFO output.
This fixes issue #1164.
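Roughly, the old parser expected the comma-separated style below, while
the newer INFO output uses field=value pairs (both lines are
illustrative):

    # Old style slave line the parser expected:
    slave0:10.0.0.2,6380,online
    # Newer field=value style that broke the ip:port extraction:
    slave0:ip=10.0.0.2,port=6380,state=online,...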
Tilt mode was too aggressive (not processing INFO output); this
resulted in a few problems:
1) Redirections were not followed when in tilt mode. This opened a
window to misinform clients about the current master when a Sentinel
was in tilt mode and a failover happened during the time it was not
able to update the state.
2) It was possible for a Sentinel exiting tilt mode to detect a false
failover start, if a slave rebooted with a wrong configuration at
about the same time. This used to happen since in tilt mode we
lose the information that the runid changed (reboot).
Now, instead, the Sentinel in tilt mode will still remove the instance
from the list of slaves if it changes state AND runid at the same
time.
Both are edge conditions but the changes should overall improve the
reliability of Sentinel.
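A minimal sketch of the new tilt-mode rule, with illustrative names
rather than the actual sentinel.c structures:

    #include <string.h>

    struct slave_info {
        char runid[41];
        int reports_master_role;  /* Non-zero if last INFO said role:master. */
    };

    /* While in tilt mode INFO is still processed, but a slave is dropped
     * from the table only if both its advertised role and its runid
     * changed in the same INFO refresh. */
    int should_remove_slave_in_tilt(const struct slave_info *known,
                                    const char *info_runid,
                                    int info_reports_master) {
        int runid_changed = strcmp(known->runid, info_runid) != 0;
        int role_changed  = known->reports_master_role != info_reports_master;
        return runid_changed && role_changed;
    }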
We used to always turn a master into a slave if the DEMOTE flag was
set, as this was a resurrecting master instance.
However the following race condition is possible for a Sentinel that
got partitioned or had internal issues (tilt mode), and was not able to
refresh its state in the meantime:
1) Sentinel X is running, master is instance "A".
2) Sentinel X goes down because of a network partition.
3) "A" fails, Sentinels will promote slave "B" as master.
4) "A" returns available, Sentinels will demote it to a slave.
5) "B" fails, other Sentinels will promote slave "A" as master.
6) At this point Sentinel X comes back.
When "X" comes back he thinks that:
"B" is the master.
"A" is the slave to demote.
We want to avoid that Sentinel "X" will demote "A" into a slave.
We also want that Sentinel "X" will detect that the conditions changed
and will reconfigure itself to monitor the right master.
There are two main ways for the Sentinel to reconfigure itself after
this event:
1) If "B" is reachable and already configured as a slave by other
sentinels, "X" will perform a redirection to "A".
2) If there are not the conditions to demote "A", the fact that "A"
reports to be a master will trigger a failover detection in "X", that
will end into a reconfiguraiton to monitor "A".
However if the Sentinel was not reachable, its state may not be
updated, so in case it tilted, or was partitioned away from the master
instance of the slave to demote, the new implementation waits some time
(enough to guarantee we can detect the new INFO, and the new DOWN
conditions).
If after some time the right conditions to demote the instance are
still not met, the DEMOTE flag is cleared.
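A sketch of that waiting logic, with hypothetical names and an
arbitrary grace period (the real flag value and timing details live in
sentinel.c):

    #include <time.h>

    #define SRI_DEMOTE          (1<<0)  /* Illustrative flag value. */
    #define DEMOTE_GRACE_PERIOD 60      /* Seconds; illustrative value. */

    struct instance {
        int flags;
        time_t demote_flag_set_time;    /* When DEMOTE was turned on. */
    };

    /* Called periodically: if the conditions to demote never materialize
     * within the grace period, clear the flag instead of blindly turning
     * the returning instance into a slave. */
    void maybe_clear_demote(struct instance *ri, int demote_conditions_met) {
        if (!(ri->flags & SRI_DEMOTE)) return;
        if (demote_conditions_met) return;  /* Demotion happens elsewhere. */
        if (time(NULL) - ri->demote_flag_set_time > DEMOTE_GRACE_PERIOD)
            ri->flags &= ~SRI_DEMOTE;
    }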
Sentinel used to redirect to the master if the instance changed runid
or it was the first time we got INFO, and a role change was detected
from master to slave.
While this is a good idea in the case of a slave->master transition,
since otherwise we could detect a failover without good reason just
after the reboot of a slave with a wrong configuration, in the case of
a master->slave transition it is much better to always perform the
redirection, for the following reasons:
1) A Sentinel may go down for some time. When it is back online there
is no other way to understand that there was a failover.
2) Pointing clients to a slave seems to always be the wrong thing to
do.
3) There is no good rationale for handling things differently once an
instance is rebooted (runid change) in that case.