This commit sets the failover timeout to 30 seconds instead of the 180
seconds default, and allows to reconfigure multiple slaves at the same
time.
This makes tests less sensible to timing, with the result that there are
less false positives due to normal behaviors that require time to
succeed or to be retried.
However the long term solution is probably some way in order to detect
when a test failed because of timing issues (for example split brain
during leader election) and retry it.
It was verified in practice that this test is able to stress much more
the implementation by introducing errors that were only trivially to
detect with different offsets but impossible to detect starting always
at zero and counting bits the full length of the string.
An unit can abort in the middle for an error. The next unit should not
assume that the instances are in a clean state, and must restart what
was left killed.
Sentinel tests are designed to be dependent on the previous tests in the
same unit, so usually we can't continue with the next test in the same
unit if a previous test failed.
The test was previously performed by removing the master from the
Sentinel monitored masters. The test with the Sentinels crashed is
more similar to real-world partitions / failures.
The area a number of mandatory tests to craete a stable setup for
testing that is not too sensitive to timing issues. All those tests
moved to includes/init-tests, and marked as (init).
It is now possible to kill and restart sentinel or redis instances for
more real-world testing.
The 01 unit tests the capability of Sentinel to update the configuration
of Sentinels rejoining the cluster, however the test is pretty trivial
and more tests should be added.
Some inline test moved into server_is_up procedure.
Also find_available_port was moved into util since it is going
to be used for the Sentinel test as well.
The Redis test uses a server-clients model in order to parallelize the
execution of different tests. However in recent versions of osx not
setting the channel to a binary encoding caused issues even if AFAIK no
binary data is really sent via this channel. However now the channels
are deliberately set to a binary encoding and this solves the issue.
The exact issue was the test not terminating and giving the impression
of running forever, since test clients or servers were unable to
exchange the messages to continue.