This makes tests a bit slower, but it is better to test things at a
decent scale instead of using just a few nodes, and for a few tests we
actually need so many nodes.
The test now runs in a self-contained directory.
The general abstractions to run the tests in an environment where
mutliple instances are executed at the same time was extrapolated into
instances.tcl, that will be reused to test Redis Cluster.
This commit sets the failover timeout to 30 seconds instead of the 180
seconds default, and allows to reconfigure multiple slaves at the same
time.
This makes tests less sensible to timing, with the result that there are
less false positives due to normal behaviors that require time to
succeed or to be retried.
However the long term solution is probably some way in order to detect
when a test failed because of timing issues (for example split brain
during leader election) and retry it.
It was verified in practice that this test is able to stress much more
the implementation by introducing errors that were only trivially to
detect with different offsets but impossible to detect starting always
at zero and counting bits the full length of the string.
An unit can abort in the middle for an error. The next unit should not
assume that the instances are in a clean state, and must restart what
was left killed.
Sentinel tests are designed to be dependent on the previous tests in the
same unit, so usually we can't continue with the next test in the same
unit if a previous test failed.
The test was previously performed by removing the master from the
Sentinel monitored masters. The test with the Sentinels crashed is
more similar to real-world partitions / failures.