The way the role change was recoded was not sane and too much
convoluted, causing the role information to be not always updated.
This commit fixes issue #1445.
When there is a master address switch, the reported role must be set to
master so that we have a chance to re-sample the INFO output to check if
the new address is reporting the right role.
Otherwise if the role was wrong, it will be sensed as wrong even after
the address switch, and for enough time according to the role change
time, for Sentinel consider the master SDOWN.
This fixes isue #1446, that describes the effects of this bug in
practice.
During the refactoring of blocking operations, commit
82b672f633, a bug was introduced where
a milliseconds time is compared to a seconds time, so all the clients
always appear to timeout if timeout is set to non-zero value.
Thanks to Jonathan Leibiusky for finding the bug and helping verifying
the cause and fix.
Sentinels are now desynchronized in a better way changing the time
handler frequency between 10 and 20 HZ. This way on average a
desynchronization of 25 milliesconds is produced that should be larger
enough compared to network latency, avoiding most split-brain condition
during the vote.
Now that the clocks are desynchronized, to have larger random delays when
performing operations can be easily achieved in the following way.
Take as example the function that starts the failover, that is
called with a frequency between 10 and 20 HZ and will start the
failover every time there are the conditions. By just adding as an
additional condition something like rand()%4 == 0, we can amplify the
desynchronization between Sentinel instances easily.
See issue #1419.
The result of this one-char bug was pretty serious, if the new master
had the same port of the previous master, but just a different IP
address, non-leader Sentinels would not be able to recognize the
configuration change.
This commit fixes issue #1394.
Many thanks to @shanemadden that reported the bug and helped
investigating it.
At the end of the file, CONFIG REWRITE adds a comment line that:
# Generated by CONFIG REWRITE
Followed by the additional config options required. However this was
added again and again at every rewrite in praticular conditions (when a
given set of options change in a given time during the time).
Now if it was alrady encountered, it is not added a second time.
This is especially important for Sentinel that rewrites the config at
every state change.
Some are just to know if the master is down, and in this case the runid
in the request is set to "*", others are actually in order to seek for a
vote and get elected. In the latter case the runid is set to the runid
of the instance seeking for the vote.
Also the sentinel configuration rewriting was modified in order to
account for failover in progress, where we need to provide the promoted
slave address as master address, and the old master address as one of
the slaves address.