It's ok to ping as soon as the ping period has elapsed since we received
the last PONG, but it's not good that we ping again if there is a
pending ping... With this change we'll send a new ping if there is one
pending only if two times the ping period elapsed since the ping which
is still pending was sent.
This is useful for debugging and logging activities: given a
sentinelRedisInstance object returns a C string representing the
instance type: master, slave, sentinel.
This new command triggers a config flush to save the in-memory config to
disk. This is useful for cases of a configuration management system or a
package manager wiping out your sentinel config while the process is
still running - and has not yet been restarted. It can also be useful
for scripting a backup and migrate or clone of a running sentinel.
Since with a previous commit Sentinels now persist their unique ID, we
no longer need to detect duplicated Sentinels and re-add them. We remove
and re-add back using different events only in the case of address
switch of the same Sentinel, without generating a new +sentinel event.
Previously Sentinels always changed unique ID across restarts, relying
on the server.runid field. This is not a good idea, and forced Sentinel
to rely on detection of duplicated Sentinels and a potentially dangerous
clean-up and re-add operation of the Sentinel instance that was
rebooted.
Now the ID is generated at the first start and persisted in the
configuration file, so that a given Sentinel will have its unique
ID forever (unless the configuration is manually deleted or there is a
filesystem corruption).
Originally, only the +slave event which occurs when a slave is
reconfigured during sentinelResetMasterAndChangeAddress triggers a flush
of the config to disk. However, newly discovered slaves don't
apparently trigger this flush but do trigger the +slave event issuance.
So if you start up a sentinel, add a master, then add a slave to the
master (as a way to reproduce it) you'll see the +slave event issued,
but the sentinel config won't be updated with the known-slave entry.
This change makes sentinel do the flush of the config if a new slave is
deteted in sentinelRefreshInstanceInfo.
To rewrite the config in the loop that adds slaves back after a master
reset, in order to handle switching to another master, is useless: it
just adds latency since there is an fsync call in the inner loop,
without providing any additional guarantee, but the contrary, since if
after the first loop iteration the server crashes we end with just a
single slave entry losing all the other informations.
It is wiser to rewrite the config at the end when the full new
state is configured.
When trying to debug sentinel connections or max connections errors it
would be very useful to have the ability to see the list of connected
clients to a running sentinel. At the same time it would be very helpful
to be able to name each sentinel connection or kill offending clients.
This commits adds the already defined CLIENT commands back to Redis
Sentinel.
Improvements:
- Return empty string if asking for non-existing section (INFO foo)
- Fix potential memory leak (caused by sdsempty() then returned if >2 args)
- Clean up argument parsing
- Allow "all" as valid section (same as "default" or zero args currently)
- Move strcasecmp to end of evaluation chain in conditionals
Also, since we're C99, I moved some variable declarations to be closer
to where they are actually used (saves us from needing to free an empty info
if detect argument errors up front).
Closes#1915Closes#1966
I guess the initial goal of the initialization was to suppress GCC
warning, but if we have to initialize, we can do it with the base-case
value instead of NULL which is never retained.
Sentinel queries the INFO from every master and from every replica of
every master.
We can cache the INFO results in Sentinel so Sentinel can be a single
place to quickly get all INFO output for an entire Sentinel monitoring
group.
This commit gives us SENTINEL INFO-CACHE in two forms:
- SENTINEL INFO-CACHE — returns all masters and all replicas
- SENTINEL INFO-CACHE master0 master1 ... masterN — vararg specify masters
Results are returned as a multibulk reply with two top-level entries
for each master. The first entry for each master is the name of the master.
The second entry is a nested multibulk reply with the contents of INFO,
first for the master, then an additional entry for each of the
replicas.
- Remove trailing newlines from redis.conf
- Fix comment misspelling
- Clarifies zipEncodeLength usage and a C API mention (#1243, #1242)
- Fix cluster typos (inspired by @papanikge #1507)
- Fix rewite -> rewrite in a few places (inspired by #682)
Closes#1243, #1242, #1507
The code to check the number of voters was never updated to follow the new
Sentinel specification, so the number of voters was computed using only
the set of Sentinels that provided a vote.
This means that there is a changing majority on partitions, even if
usually the issue is not triggered because of the configured quorum
check (what was broken was the other implicit check that requires anyway
half of the known sentinels to agree in order to start a failover).
The original implementation was modified in order to allow to
selectively announce a different IP or port, and to rewrite the two
options in the config file after a rewrite.
Some deployments need traffic sent from a specific address. This
change uses the same policy as Cluster where the first listed bindaddr
becomes the source address for outgoing Sentinel communication.
Fixes#1667
Eventual configuration convergence is guaranteed by our periodic hello
messages to all the instances, however when there are important notices
to share, better make a phone call. With this commit we force an hello
message to other Sentinal and Redis instances within the next 100
milliseconds of a config update, which is practically better than
waiting a few seconds.
Lack of check of the SRI_PROMOTED flag caused Sentienl to act with the
promoted slave turned into a master during failover like if it was a
normal instance.
Normally this problem was not apparent because during real failovers the
old master is down so the bugged code path was not entered, however with
manual failovers via the SENTINEL FAILOVER command, the problem was
easily triggered.
This commit prevents promoted slaves from getting reconfigured, moreover
we now explicitly check that during a failover the slave turning into a
master is the one we selected for promotion and not a different one.
This implements the new Sentinel-Client protocol for the Sentinel part:
now instances are reconfigured using a transaction that ensures that the
config is rewritten in the target instance, and that clients lose the
connection with the instance, in order to be forced to: ask Sentinel,
reconnect to the instance, and verify the instance role with the new
ROLE command.
When a Sentinel performs a failover (successful or not), or when a
Sentinel votes for a different Sentinel trying to start a failover, it
sets a min delay before it will try to get elected for a failover.
While not strictly needed, because if multiple Sentinels will try
to failover the same master at the same time, only one configuration
will eventually win, this serialization is practically very useful.
Normal failovers are cleaner: one Sentinel starts to failover, the
others update their config when the Sentinel performing the failover
is able to get the selected slave to move from the role of slave to the
one of master.
However currently this timeout was implicit, so users could see
Sentinels not reacting, after a failed failover, for some time, without
giving any feedback in the logs to the poor sysadmin waiting for clues.
This commit makes Sentinels more verbose about the delay: when a master
is down and a failover attempt is not performed because the delay has
still not elaped, something like that will be logged:
Next failover delay: I will not start a failover
before Thu May 8 16:48:59 2014