Sentinel needs to avoid split brain conditions due to multiple sentinels
trying to get voted at the exact same time.
So far some desynchronization was provided by fluctuating server.hz,
that is the frequency of the timer function call. However the
desynchronization provided in this way was not enough when using many
Sentinel instances, especially when a large quorum value is used in
order to force a greater degree of agreement (more than N/2+1).
It was verified that this setup was likely to trigger a split brain
condition, forcing the system to try again after a timeout.
Usually the system will succeed after a few retries, but this is not
optimal.
This commit desynchronizes instances in a more effective way to make it
likely that the first attempt will be successful.
Now that we have a runtime configuration system, it is very important to
be able to log how the Sentinel configuration changes over time because
of API calls.
This error was conceived for the older version of Sentinel that worked
via master redirection and that was not able to get configuration
updates from other Sentinels via the Pub/Sub channel of masters or
slaves.
This reply does not make sense today, every Sentinel should reply with
the best information it has currently. The error will make even more
sense in the future since the plan is to allow Sentinels to update the
configuration of other Sentinels via gossip with a direct chat without
the prerequisite that they have at least a monitored instance in common.
If we can't reconfigure a slave in time during failover, go forward anyway,
as the slave will be fixed by Sentinels in the future, once they
detect it is misconfigured.
Otherwise a failover in progress may never terminate if for some reason
the slave is unable to sync with the master while at the same time
it is not disconnected.
Now it logs the file name if it is not accessible. Also there are
different errors for the missing config file case and for the non-writable
file case.
The new command allows changing master-specific configurations
at runtime. All the settable parameters can be retrieved via the
SENTINEL MASTER command, so there is no equivalent "GET" command.
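As an example of the intended usage (master name and value are illustrative,
not taken from the commit), a runtime change looks like:

    SENTINEL SET mymaster down-after-milliseconds 10000

The updated value is then visible in the output of SENTINEL MASTER mymaster.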
The claim about unlinking the instance from the connected hash tables
was the opposite of reality. Also the current behavior is
safer in most cases, so it is better to manually unlink when needed.
The Redis hash table implementation has many non-blocking features like
incremental rehashing, however while deleting a large hash table there
was no way to have a callback called to do some incremental work.
This commit adds this support, as an optional callback argument to
dictEmpty() that is currently called at a fixed interval (once every
65k deletions).
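A minimal sketch of the idea (the entry/table types and the tableEmpty() name
are invented for illustration, this is not the actual dict.c code): while
freeing a big table, a user-supplied callback is invoked once every 65536
deletions so the caller can perform incremental work.

    #include <stdlib.h>

    typedef struct entry { struct entry *next; void *key; } entry;
    typedef struct table { entry **buckets; unsigned long size; } table;

    /* Free every entry; call 'callback' once every 65536 deletions. */
    static void tableEmpty(table *t, void (*callback)(void *), void *privdata) {
        unsigned long i, deleted = 0;

        for (i = 0; i < t->size; i++) {
            entry *e = t->buckets[i];
            while (e) {
                entry *next = e->next;
                free(e);
                e = next;
                if (callback && ((++deleted) & 65535) == 0) callback(privdata);
            }
            t->buckets[i] = NULL;
        }
    }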
The way the role change was recorded was not sane and too
convoluted, causing the role information to not always be updated.
This commit fixes issue #1445.
When there is a master address switch, the reported role must be set to
master so that we have a chance to re-sample the INFO output to check if
the new address is reporting the right role.
Otherwise if the role was wrong, it will be sensed as wrong even after
the address switch, and if it stays wrong for long enough (according to
the role change time) Sentinel will consider the master SDOWN.
This fixes issue #1446, that describes the effects of this bug in
practice.
Sentinels are now desynchronized in a better way by changing the time
handler frequency between 10 and 20 HZ. This way, on average, a
desynchronization of 25 milliseconds is produced, which should be large
enough compared to network latency, avoiding most split-brain conditions
during the vote.
Now that the clocks are desynchronized, larger random delays when
performing operations can be easily achieved in the following way.
Take as an example the function that starts the failover: it is
called with a frequency between 10 and 20 HZ and will start the
failover every time the conditions are met. By simply adding an
additional condition such as rand()%4 == 0, we can amplify the
desynchronization between Sentinel instances easily.
See issue #1419.
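A rough sketch of both mechanisms (helper names and the exact scheduling
code are illustrative, not the actual sentinel.c implementation):

    #include <stdlib.h>

    /* Delay in milliseconds before the next timer call: a frequency picked
     * at random between 10 and 20 HZ, i.e. a period of 50..100 ms. */
    static int sentinelTimerPeriodMs(void) {
        return 1000 / (10 + rand() % 11);
    }

    /* Extra gate used when the failover conditions are met: only one call
     * out of four (on average) actually starts the failover, further
     * spreading out the attempts of different Sentinels. */
    static int sentinelMayStartFailover(int conditions_met) {
        return conditions_met && (rand() % 4) == 0;
    }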
The result of this one-char bug was pretty serious: if the new master
had the same port as the previous master, but just a different IP
address, non-leader Sentinels would not be able to recognize the
configuration change.
This commit fixes issue #1394.
Many thanks to @shanemadden who reported the bug and helped
investigate it.
Some requests are just to know if the master is down, and in this case the
runid in the request is set to "*"; others are actually sent in order to
seek a vote and get elected. In the latter case the runid is set to the
runid of the instance seeking the vote.
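Schematically (placeholders instead of real values, shown only to illustrate
the two forms of the request):

    SENTINEL is-master-down-by-addr <master-ip> <master-port> <current-epoch> *
    SENTINEL is-master-down-by-addr <master-ip> <master-port> <current-epoch> <runid-of-sender>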
Also the Sentinel configuration rewriting was modified in order to
account for a failover in progress, where we need to provide the promoted
slave address as the master address, and the old master address as one of
the slave addresses.
We'll use CONFIG REWRITE (internally) in order to store the new
configuration of a Sentinel after the internal state changes. In order
to do so, we need configuration options (that usually the user will not
touch at all) about the config epoch of the master, the Sentinels and
slaves known for this master, and so forth.
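As an illustration of the kind of directives involved (master name, addresses
and values are made up, and the exact set of options may differ), a rewritten
Sentinel configuration contains per-master lines along these lines:

    sentinel config-epoch mymaster 10
    sentinel known-slave mymaster 192.168.1.5 6380
    sentinel known-sentinel mymaster 192.168.1.6 26379 <runid>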
The time Sentinel waits, after a slave is detected to be configured to
the wrong master, before reconfiguring it, is now the failover_timeout
time, as this makes more sense in order to give the Sentinel performing
the failover enough time to reconfigure the slaves slowly (if required
by the configuration).
Also we now PUBLISH the new configuration more frequently, as this allows
the reappearing master to be switched back to slave faster.
Also the default failover timeout changed to 3 minutes, as the failover is
a fairly fast procedure most of the time, unless there is a very large
number of slaves and the user chose to configure them sequentially (in
that case the user should change the failover timeout accordingly).
Once we switch configuration during a failover, we should advertise
the new address.
This was a serious race condition, as the Sentinel performing the
failover for a moment advertised the old address with the new
configuration epoch: once transmitted to the other Sentinels the broken
configuration would remain there forever, until the next failover
(because a greater configuration epoch is required to overwrite an older
one).
Now Sentinels believe the current configuration is always the winner and
should be applied by Sentinels, instead of trying to adapt our view of
the cluster based on what we observe.
So the only way to modify what a Sentinel believes to be the truth is to
win an election and advertise the new configuration via Pub/Sub with a
greater configuration epoch.
Changes to leadership handling.
Now the leader gets selected by every Sentinel, for a specified epoch,
when the SENTINEL is-master-down-by-addr command is sent.
This command now includes the runid and the currentEpoch of the instance
seeking a vote. The Sentinel only votes a single time in a given
epoch.
Still a work in progress, does not even compile at this stage.
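A minimal sketch of the one-vote-per-epoch rule (struct and function names
are invented for illustration, this is not the actual sentinel.c code):

    #include <string.h>

    struct masterState {
        unsigned long long current_epoch;  /* Highest epoch seen so far. */
        unsigned long long leader_epoch;   /* Epoch of the last vote we cast. */
        char leader_runid[41];             /* Runid voted for in leader_epoch. */
    };

    /* Vote for req_runid if we did not vote yet in req_epoch, otherwise
     * return whoever we already voted for: only one vote per epoch. */
    static const char *voteForLeader(struct masterState *m,
                                     const char *req_runid,
                                     unsigned long long req_epoch) {
        if (req_epoch > m->current_epoch) m->current_epoch = req_epoch;
        if (m->leader_epoch < req_epoch) {
            strncpy(m->leader_runid, req_runid, sizeof(m->leader_runid) - 1);
            m->leader_runid[sizeof(m->leader_runid) - 1] = '\0';
            m->leader_epoch = req_epoch;
        }
        return m->leader_runid;
    }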
Sentinel state now includes the idea of current epoch and config epoch.
In the Hello message, which is now published to both masters and slaves,
a Sentinel no longer just advertises itself but also broadcasts its
current view of the configuration: the master name / ip / port and its
current epoch.
Sentinels receiving such information switch to the new master if the
configuration epoch received is newer and the ip / port of the master
are indeed different compared to the previous ones.
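The switch rule can be sketched as follows (the structures and the
processHello() name are made up for the example, not the actual code):

    #include <string.h>

    struct helloMsg {
        char master_ip[46];
        int master_port;
        unsigned long long config_epoch;
    };

    struct knownMaster {
        char ip[46];
        int port;
        unsigned long long config_epoch;
    };

    /* Returns 1 if we adopted the master address advertised in the Hello. */
    static int processHello(struct knownMaster *m, const struct helloMsg *h) {
        int addr_changed = strcmp(m->ip, h->master_ip) != 0 ||
                           m->port != h->master_port;

        if (h->config_epoch > m->config_epoch && addr_changed) {
            /* A newer configuration epoch wins over our current view. */
            strncpy(m->ip, h->master_ip, sizeof(m->ip) - 1);
            m->ip[sizeof(m->ip) - 1] = '\0';
            m->port = h->master_port;
            m->config_epoch = h->config_epoch;
            return 1;
        }
        return 0;
    }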
AUTH and SCRIPT KILL were sent without incrementing the pending commands
counter. Clearly this needs some kind of wrapper doing it for the caller
in order to be less bug prone.
This change makes Sentinel less fragile about a number of failure modes.
This commit also fixes a different bug as a side effect: the SLAVEOF command
was sent multiple times without incrementing the pending commands count.
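A sketch of such a wrapper (sentinelSendCommand() and the link structure are
illustrative, not the actual Sentinel API; only the hiredis calls are real):

    #include <stdarg.h>
    #include <hiredis/async.h>

    struct instanceLink {
        redisAsyncContext *cc;   /* Command connection to the instance. */
        int pending_commands;    /* Replies we are still waiting for. */
    };

    /* Every command goes through here, so the counter can never be missed. */
    static int sentinelSendCommand(struct instanceLink *link,
                                   redisCallbackFn *callback, void *privdata,
                                   const char *format, ...) {
        va_list ap;
        int retval;

        va_start(ap, format);
        retval = redisvAsyncCommand(link->cc, callback, privdata, format, ap);
        va_end(ap);
        if (retval == REDIS_OK) link->pending_commands++;
        return retval;
    }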
Actually the string is modified in place and a reallocation is never
needed, so there is no need to return the new sds string pointer as the
return value of the function, which now just returns "void".
This has been done by exposing the anetSockName() function in anet.c
to be used when Sentinel is publishing its existence to the masters.
This implementation is very unintelligent as it will likely break if used
with IPv6, since the nested colons will break any parsing of the PUBLISH
string by the master.
Sentinel was not able to detect slaves when connected to a very recent
version of the Redis master, since a previous non-backward compatible
change to INFO broke the parsing of the slaves ip:port INFO output.
This fixes issue #1164.
Tilt mode was too aggressive (not processing INFO output), and this
resulted in a few problems:
1) Redirections were not followed when in tilt mode. This opened a
window to misinform clients about the current master when a Sentinel
was in tilt mode and a failover happened during the time it was not
able to update the state.
2) It was possible for a Sentinel exiting tilt mode to detect a false
failover start, if a slave rebooted with a wrong configuration
at about the same time. This used to happen since in tilt mode we
lose the information that the runid changed (reboot).
Now instead the Sentinel in tilt mode will still remove the instance
from the list of slaves if it changes state AND runid at the same
time.
Both are edge conditions but the changes should overall improve the
reliability of Sentinel.
We used to always turn a master into a slave if the DEMOTE flag was set,
as this was a resurrecting master instance.
However the following race condition is possible for a Sentinel that
got partitioned or had internal issues (tilt mode), and was not able to
refresh its state in the meantime:
1) Sentinel X is running, master is instance "A".
3) "A" fails, sentinels will promote slave "B" as master.
2) Sentinel X goes down because of a network partition.
4) "A" returns available, Sentinels will demote it as a slave.
5) "B" fails, other Sentinels will promote slave "A" as master.
6) At this point Sentinel X comes back.
When "X" comes back he thinks that:
"B" is the master.
"A" is the slave to demote.
We want to avoid that Sentinel "X" will demote "A" into a slave.
We also want that Sentinel "X" will detect that the conditions changed
and will reconfigure itself to monitor the right master.
There are two main ways for the Sentinel to reconfigure itself after
this event:
1) If "B" is reachable and already configured as a slave by other
sentinels, "X" will perform a redirection to "A".
2) If the conditions to demote "A" are not met, the fact that "A"
reports to be a master will trigger a failover detection in "X", which
will end in a reconfiguration to monitor "A".
However if the Sentinel was not reachable, its state may not be updated,
so in case it tilted, or was partitioned from the master instance of the
slave to demote, the new implementation waits some time (enough to
guarantee we can detect the new INFO, and new DOWN conditions).
If after some time there are still not the right conditions to demote
the instance, the DEMOTE flag is cleared.
Sentinel redirected to the master if the instance changed runid or it
was the first time we got INFO, and a role change was detected from
master to slave.
While this is a good idea in the case of slave->master, since otherwise we
could detect a failover without a good reason just after a reboot of a
slave with a wrong configuration, in the case of a master->slave
transition it is much better to always perform the redirection for the
following reasons:
1) A Sentinel may go down for some time. When it is back online there is
no other way to understand there was a failover.
2) Pointing clients to a slave seems to always be the wrong thing to do.
3) There is no good rationale for handling things differently when an
instance is rebooted (runid change) in that case.
If we don't have any clue about a master, because it has never replied to
INFO so far, reply with an -IDONTKNOW error to SENTINEL
get-master-addr-by-name requests.
Before this commit Sentinel used to redirect to the master ip/addr reported
by the current instance claiming to be a slave, only if this was the first
INFO output received and the role was found to be slave.
Now, if we find that the runid is different and the reported role is slave,
we also redirect to the reported master ip/addr.
This unifies the behavior of Sentinel in the case of a reboot (where it
will see the first INFO output with the wrong role and will perform the
redirection), with the behavior of Sentinel in the case of a change in
what it sees in the INFO output of the master.
The slave priority that is now published by Redis in the INFO output is
used by Sentinel in order to select the slave with minimum priority
for promotion, and in order to consider slaves with priority set to 0 as
not able to play the role of master (they will never be promoted by
Sentinel).
The "slave-priority" field is now one of the fields that Sentinel
publishes when describing an instance via SENTINEL commands such as
"SENTINEL slaves mastername".
From the point of view of Redis an instance replying -BUSY is down,
since it is effectively not able to reply to user requests. However
a looping script is a recoverable condition in Redis if the script has not
yet performed any write to the dataset. In that case performing a
failover is not optimal, so Sentinel now tries to restore the normal server
condition by killing the script with a SCRIPT KILL command.
If the script already performed some write before entering an infinite
(or long enough to time out) loop, SCRIPT KILL will not work and the
failover will be triggered anyway.
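The check can be sketched like this (the callback name is made up; only the
hiredis types and calls are real, and error handling is omitted):

    #include <string.h>
    #include <hiredis/async.h>

    /* Reply handler for the periodic PING sent to an instance. */
    static void pingReplyCallback(redisAsyncContext *c, void *reply, void *privdata) {
        redisReply *r = reply;
        (void)privdata;

        if (r == NULL) return;   /* Connection level error. */
        if (r->type == REDIS_REPLY_ERROR && strncmp(r->str, "BUSY", 4) == 0) {
            /* A script is blocking the server: try to kill it. This only
             * helps if the script has not performed writes yet, otherwise
             * SCRIPT KILL fails and the failover proceeds as usual. */
            redisAsyncCommand(c, NULL, NULL, "SCRIPT KILL");
        }
    }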
This command can be used in order to force a Sentinel instance to start
a failover for the specified master, acting as the leader, even if the
master is up.
The commit also adds some minor refactoring and other improvements to
functions already implemented that make them able to work when the
master is not in SDOWN condition. For instance, slave selection
assumed that we ask INFO every second to every slave; this is true
only when the master is in SDOWN condition, so slave selection did not
work when the master was not in SDOWN condition.
This commit adds support to optionally execute a script when one of the
following events happens:
* The failover starts (with a slave already promoted).
* The failover ends.
* The failover is aborted.
The script is called with enough parameters (documented in the example
sentinel.conf file) to provide information about the old and new ip:port
pair of the master, the role of the sentinel (leader or observer) and
the name of the master.
The goal of the script is to inform clients of the configuration change
in a way specific to the environment Sentinel is running in, which can't
be implemented in a general way inside Sentinel itself.
When we are in the wait-start state, if another leader (or any other
external entity) turns a slave into a master, abort the failover, and
detect it as an observer.
Note that the wait-start state is mainly there for this reason, but the
abort was not yet implemented.
This adds a new sentinel event -failover-abort-race.
When we are a Leader Sentinel in wait-start state, starting with this
commit the failover is aborted if the master comes back online.
This improves the way we handle a notable case of net split, that is the
split between Sentinels and Redis servers, which will be a very common
kind of split because Sentinels will often be installed in the client's
network while the servers can be in a different arm of the network.
When Sentinels and Redis servers are isolated the master is in ODOWN
condition since the Sentinels can agree about this state, however the
failover does not start since there are no good slaves to promote (in
this specific case all the slaves are unreachable).
However when the split is resolved, Sentinels may sense the slaves back
a moment before they sense the master is back, so the failover may start
without a good reason (since the master is actually working too).
Now this condition is reversible, so the failover will be aborted
immediately if the master is detected to be working again, that
is, not in SDOWN nor in ODOWN condition.
We no longer use a vanilla fork+execve, but instead keep a queue of script
jobs to execute, with retry on error, timeouts, and so forth.
Currently this is used only for notifications, but soon the ability to
also call client reconfiguration scripts will be added.
The previous behavior of the state machine was to wait some time and
retry the slave selection, but this is not robust enough against drastic
changes in the conditions of the monitored instances.
What we do now when the slave selection fails is to abort the failover
and go back to monitoring the master. If the ODOWN condition is still
present a new failover will be triggered, and so forth.
This commit also refactors the code we use to abort a failover.
When we reset the master we should start with clean timestamps for ping
replies, otherwise we'll detect a spurious +sdown event, because on the
+switch-master event the previous master instance was probably in +sdown
condition. Since we updated the address we should count time from
scratch again.
Also this commit makes sure to explicitly reset the count of pending
commands; we can now do this because of the new way the hiredis link
is closed.
We now disconnect the hiredis link of Redis instances in a more robust way.
Also we change the way we perform the redirection for the +switch-master
event, which is not just an instance reset with an address change.
Using the same system we now implement the +redirect-to-master event,
that is triggered by an instance that is configured to be a master but is
found to be a slave at the first INFO reply. In that case we monitor the
master instead, logging the incident as an event.
Sentinel observers detect a failover by checking whether a slave attached
to the monitored master changes its replication state from slave to master.
However while this change may in theory only happen after a SLAVEOF NO
ONE command, in practice it is very easy to reboot a slave instance with
a wrong configuration that turns it into a master, especially if it was
a past master before a successful failover.
This commit changes the detection policy so that if an instance goes
from slave to master, but at the same time the runid has changed, we
sense a reboot, and in that case we don't detect a failover at all.
This commit also introduces the "reboot" sentinel event, that is logged
at "warning" level (so this will trigger an admin notification).
The commit also fixes a problem in the disconnect handler, which assumed
that the instance object always exists; that is not the case. Now we
no longer assume that redisAsyncFree() will call the disconnection
handler before returning.
This commit provides the first, beta quality implementation of Redis
Sentinel, a distributed monitoring system for Redis with notification
and automatic failover capabilities.
More info at http://redis.io/topics/sentinel