This is a catch-all test to confirm that the rewrite produces a valid
output for all parameters and that this process does not introduce
undesired configuration changes.
The Redis sentinel would crash with a segfault after a few minutes
because it tried to read from a page without read permissions. Check up
front whether the sds is long enough to contain redis:slave or
redis:master before memcmp() as is done everywhere else in
sentinelRefreshInstanceInfo().
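For reference, a minimal sketch of the guard (names follow sentinel.c
and the fields mentioned above; illustrative rather than the verbatim
patch), where 'l' is the INFO line being parsed:

    /* Make sure the sds is long enough before memcmp(), as the other
     * fields parsed in sentinelRefreshInstanceInfo() already do. */
    if (sdslen(l) >= 12 && memcmp(l, "redis:master", 12) == 0) {
        /* ... safe to handle the master case ... */
    } else if (sdslen(l) >= 11 && memcmp(l, "redis:slave", 11) == 0) {
        /* ... safe to handle the replica case ... */
    }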
Bug report and commit message from Theo Buehler. Fix from Nam Nguyen.
Co-authored-by: Nam Nguyen <namn@berkeley.edu>
When a script that always fails is triggered, sentinel.running_scripts
is incremented ten times (once per retry) but decremented only once,
when the retry limit is reached, and it is never reset. Once the
counter reaches SENTINEL_SCRIPT_MAX_RUNNING, Sentinel stops triggering
scripts at all.
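A sketch of the intended accounting in
sentinelCollectTerminatedScripts() (sentinel.c naming, illustrative
rather than the exact patch): the running counter must go down every
time a terminated child is collected, both when the job is rescheduled
for a retry and when it is removed for good.

    sentinel.running_scripts--;
    if ((bysignal || exitcode == 1) &&
        sj->retry_num != SENTINEL_SCRIPT_MAX_RETRY)
    {
        /* Schedule a retry: clear the RUNNING flag so the job can be
         * picked up again later. */
        sj->flags &= ~SENTINEL_SCRIPT_RUNNING;
        sj->pid = 0;
    } else {
        /* Retries exhausted (or clean exit): drop the job. */
        listDelNode(sentinel.scripts_queue, ln);
        sentinelReleaseScriptJob(sj);
    }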
* Introduce a connection abstraction layer for all socket operations and
integrate it across the code base.
* Provide an optional TLS connections implementation based on OpenSSL.
* Pull a newer version of hiredis with TLS support.
* Tests, redis-cli updates for TLS support.
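For reference, a minimal server-side TLS setup might look like the
following (file paths are placeholders):

    tls-port 6379
    port 0
    tls-cert-file /path/to/redis.crt
    tls-key-file /path/to/redis.key
    tls-ca-cert-file /path/to/ca.crt

and redis-cli can connect with --tls --cert --key --cacert pointing to
the same material.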
So far it was not possible to set up Sentinel with authentication
enabled. This commit introduces this feature: every Sentinel will try to
authenticate with other sentinels using the same password it is
configured to accept clients with.
So for instance if a Sentinel has a "requirepass" configuration
statement set to "foo", it will use the "foo" password to authenticate
with every other Sentinel it connects to. So basically adding
"requirepass" to all the Sentinel configurations is enough to make
sure that:
1) Clients will require the password to access the Sentinels instances.
2) Each Sentinel will use the same password to connect and authenticate
with every other Sentinel in the group.
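For example, with the same statement in every sentinel.conf of the
group ("foo" is just a placeholder password):

    requirepass foo

each Sentinel will both require "foo" from its clients and use "foo"
when connecting to the other Sentinels.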
Related to #3279 and #3329.
SENTINEL REPLICAS was added as an alias of SENTINEL SLAVES, and the
configuration rewriting now uses known-replica. All the rest is
basically at the API level of logged events and protocol messages, so
compared to the Redis core itself there is very little to do here in
order to preserve compatibility.
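For example (master name and address are placeholders):

    SENTINEL REPLICAS mymaster

and the configuration rewriting now emits lines like:

    sentinel known-replica mymaster 127.0.0.1 6380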
Instead of telling the user to set the renamed command to "" to remove
the renaming, do the obvious thing when a command is renamed to itself.
So if I want to remove the renaming of PING, I just rename it to PING
again.
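For example, assuming a monitored master called mymaster, this removes
any previous renaming of PING for that master:

    SENTINEL SET mymaster rename-command PING PING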
The ability of "SENTINEL SET" to change the reconfiguration script at
runtime is a problem even in the security model of Redis: any client
inside the network may set any executable to be run once a failover is
triggered.
This option adds protection for this problem: by default the two
SENTINEL SET subcommands modifying scripts paths are denied. However the
user is still able to revert that using the Sentinel configuration file
in order to allow such a feature.
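The corresponding directive and its default ('yes'); setting it to
'no' in the configuration file allows script reconfiguration again:

    # Deny SENTINEL SET from changing notification-script and
    # client-reconfig-script at runtime.
    sentinel deny-scripts-reconfig yes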
See issue #2819 for details. The gist is that when we want to send INFO
because we are over the time, we used to send only INFO commands, no
longer sending PING commands. However if a master fails exactly when we
are about to send an INFO command, the PING time will be zero because
the PONG reply was already received, and we'll fail to send more PINGs,
since we only try to send INFO commands: the failure detector will
delay until the connection is closed and re-opened because of the "long
timeout".
This commit changes the logic so that we can send the three kinds of
messages regardless of the fact that we already sent another one in the same
code path. It could happen that we go over the message limit for the
link by a few messages, but this is not significant. However now we'll
not introduce delays in sending commands just because there was
something else to send at the same time.
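Conceptually the periodic sending now looks like the following
(condition and helper names are placeholders, not the verbatim
sentinel.c code):

    /* Each message type is evaluated on its own, so sending an INFO no
     * longer suppresses a due PING or Hello PUBLISH in the same call. */
    if (info_is_due)  sendInfoCommand(ri);
    if (ping_is_due)  sentinelSendPing(ri);
    if (hello_is_due) sentinelSendHello(ri);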
This change attempts to switch to a hash function which mitigates
the effects of the HashDoS attack (a denial of service attack trying
to force data structures to worst case behavior) while at the same time
providing Redis with a hash function that does not expect the input
data to be word aligned, a condition no longer true now that sds.c
strings have a variable length header.
Note that it is possible that, even using a hash function for which
collisions cannot be generated without knowing the seed, special
implementation details or the exposure of the seed in an indirect way
(for example the ability to add elements to a Set and check the order
in which Redis returns them with SMEMBERS) may make the attacker's
life simpler in the process of trying to guess the correct seed.
However the next step would be to switch to a log(N) data structure
when too many items in a single bucket are detected: this seems like
overkill in the case of Redis.
SPEED REGRESSION TESTS:
In order to verify that switching from MurmurHash to SipHash had
no impact on speed, a set of benchmarks involving fast insertion
of 5 million keys was performed.
The result shows Redis with SipHash in high pipelining conditions
to be about 4% slower compared to using the previous hash function.
However this could partially be related to the fact that the current
implementation does not attempt to hash whole words at a time but
reads single bytes, in order to have an output which is endian-neutral
and at the same time works on systems where unaligned memory accesses
are a problem.
Further x86 specific optimizations should be tested; the function
may easily reach the same level as MurmurHash2 if a few optimizations
are performed.
During the initial handshake with the master a slave will report to have
a very high disconnection time from its master (since technically it was
disconnected since forever, so the current UNIX time in seconds is
reported).
However when the slave is connected again the Sentinel may re-scan the
INFO output again only after 10 seconds, which is a long time. During
this time Sentinels will consider this instance unable to failover, so
a useless delay is introduced.
Actually this hardly ever happened in practice because when a slave's
master is down, the INFO period for slaves changes to 1 second. However
when a manual failover is attempted immediately after adding slaves
(like in the case of the Sentinel unit test), this problem may happen.
This commit changes the INFO period to 1 second even in the case the
slave's master is not down, but the slave reported being disconnected
from the master (by publishing, last time we checked, a master
disconnection time field in INFO).
This change is required as a result of an unrelated change in the
replication code that adds a small delay in the master-slave first
synchronization.
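A minimal sketch of the resulting period selection (sentinel.c naming,
illustrative rather than the exact patch):

    /* Ask replicas for INFO every second not only when their master is
     * down or failing over, but also when the replica itself reported a
     * non-zero master link down time in its last INFO. */
    if ((ri->flags & SRI_SLAVE) &&
        ((ri->master->flags & (SRI_O_DOWN|SRI_FAILOVER_IN_PROGRESS)) ||
         (ri->master_link_down_time != 0)))
    {
        info_period = 1000;
    } else {
        info_period = SENTINEL_INFO_PERIOD;
    }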
This commit fixes the crash reported in issue #3364 and also
properly closes the old links after the Sentinel address for the
other masters gets updated.
The two problems were:
1. The Sentinel that switched address may not monitor all the masters;
   it is possible that there is no match, and the 'match' variable is
NULL. Now we check for no match and 'continue' to the next master.
2. By inspecting the code because of issue "1" I noticed that there was
   a problem in the code that disconnects the link of the Sentinel that
   needs the address update. Basically link->disconnected is non-zero
   even if just *a single link* (cc -- command link or pc -- pubsub
   link) is disconnected, so checking if (link->disconnected) in order
   to close the links risks leaving one link connected.
I was able to manually reproduce the crash at "1" and verify that the
commit resolves the issue.
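A sketch of the two fixes together (sentinel.c naming, illustrative
rather than the verbatim patch), assuming 'ri' is the Sentinel whose
address changed:

    dictIterator *di = dictGetIterator(sentinel.masters);
    dictEntry *de;
    while ((de = dictNext(di)) != NULL) {
        sentinelRedisInstance *master = dictGetVal(de);
        sentinelRedisInstance *match =
            getSentinelRedisInstanceByAddrAndRunID(master->sentinels,
                                                   NULL, 0, ri->runid);
        /* Fix 1: this master may not know the switching Sentinel. */
        if (match == NULL) continue;
        /* Fix 2: close both links explicitly; link->disconnected is
         * non-zero as soon as either one of them is down. */
        if (match->link->cc)
            instanceLinkCloseConnection(match->link, match->link->cc);
        if (match->link->pc)
            instanceLinkCloseConnection(match->link, match->link->pc);
        /* ... update the address of 'match' here ... */
    }
    dictReleaseIterator(di);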
Closes #3364.
The most visible effect of this bug was an inability of Redis to
reconfigure old masters back to slaves after they become reachable
again following a failover. This was due to failing to reset the count
of the pending commands properly, so the master appeared forever down.
The bug was introduced by the new Sentinel connection sharing feature
of Redis 3.2, which is a lot more complex than the 3.0 code, but more
scalable.
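A sketch of the intended behavior (sentinel.c naming, illustrative
rather than the exact patch): when the shared command connection of a
link is torn down, its pending commands counter must be cleared as
well, otherwise the instance keeps looking busy and the old master
appears forever down.

    void instanceLinkCloseConnection(instanceLink *link,
                                     redisAsyncContext *c)
    {
        if (c == NULL) return;
        if (link->cc == c) {
            link->cc = NULL;
            link->pending_commands = 0;  /* previously left stale */
        }
        if (link->pc == c) link->pc = NULL;
        c->data = NULL;
        link->disconnected = 1;
        redisAsyncFree(c);
    }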
Many thanks to people reporting the issue, and especially to
@sskorgal for investigating the issue in depth.
Hopefully closes #3285.