Sentinel: check Slave INFO state more often when disconnected.

During the initial handshake with the master a slave will report to have
a very high disconnection time from its master (since technically it was
disconnected since forever, so the current UNIX time in seconds is
reported).

However when the slave is connected again the Sentinel may re-scan the
INFO output again only after 10 seconds, which is a long time. During
this time Sentinels will consider this instance unable to failover, so
a useless delay is introduced.

Actaully this hardly happened in the practice because when a slave's
master is down, the INFO period for slaves changes to 1 second. However
when a manual failover is attempted immediately after adding slaves
(like in the case of the Sentinel unit test), this problem may happen.

This commit changes the INFO period to 1 second even in the case the
slave's master is not down, but the slave reported to be disconnected
from the master (by publishing, last time we checked, a master
disconnection time field in INFO).

This change is required as a result of an unrelated change in the
replication code that adds a small delay in the master-slave first
synchronization.
This commit is contained in:
antirez 2016-07-22 10:51:25 +02:00
parent 0a628e5102
commit 3e9ce38b0a
2 changed files with 10 additions and 3 deletions

View File

@ -2579,9 +2579,15 @@ void sentinelSendPeriodicCommands(sentinelRedisInstance *ri) {
/* If this is a slave of a master in O_DOWN condition we start sending
* it INFO every second, instead of the usual SENTINEL_INFO_PERIOD
* period. In this state we want to closely monitor slaves in case they
* are turned into masters by another Sentinel, or by the sysadmin. */
* are turned into masters by another Sentinel, or by the sysadmin.
*
* Similarly we monitor the INFO output more often if the slave reports
* to be disconnected from the master, so that we can have a fresh
* disconnection time figure. */
if ((ri->flags & SRI_SLAVE) &&
(ri->master->flags & (SRI_O_DOWN|SRI_FAILOVER_IN_PROGRESS))) {
((ri->master->flags & (SRI_O_DOWN|SRI_FAILOVER_IN_PROGRESS)) ||
(ri->master_link_down_time != 0)))
{
info_period = 1000;
} else {
info_period = SENTINEL_INFO_PERIOD;

View File

@ -6,7 +6,8 @@ test "Manual failover works" {
set old_port [RI $master_id tcp_port]
set addr [S 0 SENTINEL GET-MASTER-ADDR-BY-NAME mymaster]
assert {[lindex $addr 1] == $old_port}
S 0 SENTINEL FAILOVER mymaster
catch {S 0 SENTINEL FAILOVER mymaster} reply
assert {$reply eq "OK"}
foreach_sentinel_id id {
wait_for_condition 1000 50 {
[lindex [S $id SENTINEL GET-MASTER-ADDR-BY-NAME mymaster] 1] != $old_port