start_server now uses the return value of Tcl exec to get the server pid,
however this introduces timing-dependent errors: a lot of the testing
code base assumes the server is actually up and running when
start_server returns.
So the old code that waits to see the pid in the log file was restored.
Previously the PID format was:
[PID] Timestamp
But it recently changed to:
PID:X Timestamp
The Tcl testing framework was grabbing the PID with the \[\d+\]
pattern, which no longer matches.
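
As an illustration only, the two prefixes described above could be
matched as in the following Python sketch. This is not the Tcl code used
by the test suite: the function name, the patterns, and the sample lines
are assumptions made for the example.

```python
import re

# Hypothetical patterns for the two log line prefixes described above.
OLD_FORMAT = re.compile(r"^\[(\d+)\]")      # "[PID] Timestamp"
NEW_FORMAT = re.compile(r"^(\d+):\S*\s")    # "PID:X Timestamp"


def extract_pid(line):
    """Return the PID found at the start of a log line, or None."""
    for pattern in (NEW_FORMAT, OLD_FORMAT):
        match = pattern.match(line)
        if match:
            return int(match.group(1))
    return None


# Made-up sample lines that only follow the two layouts.
print(extract_pid("[1234] 10 Jan 12:00:00 example"))
print(extract_pid("1234:M 10 Jan 12:00:00 example"))
```

Trying both patterns keeps the sketch usable against logs written in
either format.
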
Now we grab the pid from "PID: <PID>" in the part of Redis startup
output to the right of the ASCII logo.
Better handling of connection errors in order to update the table and
recover. Also populate the startup nodes table after fetching the list
of nodes.
More work is needed: it is still not as reliable as the
redis-rb-cluster implementation, which is the minimal reference
implementation for Redis Cluster clients.
Sometimes the process is still there but no longer in a state that can
be checked (after being killed). This used to happen after a call to
SHUTDOWN NOSAVE in the scripting unit, causing a false positive.
It is now possible to kill and restart Sentinel or Redis instances for
more real-world testing.
The 01 unit tests the capability of Sentinel to update the configuration
of Sentinels rejoining the cluster, however the test is pretty trivial
and more tests should be added.
Some inline tests were moved into the server_is_up procedure.
Also find_available_port was moved into util since it is going
to be used for the Sentinel test as well.
Due to changes in recent releases of the OS X leaks utility, the OS X
leak detection no longer worked. It is now fixed in a way that should be
backward compatible.
A new stress test was added to exercise the code converting a ziplist
into a hash table.
In this commit the randomValue helper function was also modified to
return negative values.
wait_for_condition is now used instead of the usual "after 1000" (that
is the way to sleep in Tcl). This should avoid finding the replica in
a state where it is still loading the RDB in memory and returns a
-LOADING error.
This test used to fail when running the test under valgrind, due to the
added latencies.
Due to a change in the format of the bug report printed on a crash or
failed assertion, the test suite was no longer able to properly log it.
Instead just a protocol error was logged by the Redis Tcl client, which
provided no clue about the actual problem.
This commit resolves the issue by logging everything from the first line
of the log including the string REDIS BUG REPORT, till the end of the
file.
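
A minimal sketch of this extraction, written in Python for illustration
rather than taken from the test suite. The function name and the sample
log content are made up; only the REDIS BUG REPORT marker string comes
from the description above.

```python
def extract_bug_report(log_text, marker="REDIS BUG REPORT"):
    """Return the log from the first line containing marker to the end.

    Returns None when the marker never appears in the log.
    """
    lines = log_text.splitlines()
    for index, line in enumerate(lines):
        if marker in line:
            return "\n".join(lines[index:])
    return None


# Made-up log content, used only to show the slicing behavior.
sample = "line one\nline two\nxxx REDIS BUG REPORT xxx\ndetail a\ndetail b"
print(extract_bug_report(sample))
```
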
Now it uses the new wait_for_condition testing primitive.
The wait_for_condition implementation was also fixed in this commit to
properly escape the expr command and its argument.
A new primitive, wait_for_condition, was introduced in the scripting
engine. It makes waiting for events simpler, so that it is easier to
write tests that are more resistant to timing issues.
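
The idea behind such a primitive can be sketched as a bounded polling
loop. The Python version below is an illustration under assumptions, not
the Tcl implementation: the parameter names (max_tries, delay_ms,
condition, on_timeout) and the return value are hypothetical.

```python
import time


def wait_for_condition(max_tries, delay_ms, condition, on_timeout):
    """Poll condition() up to max_tries times, sleeping between tries.

    Returns True as soon as the condition holds. If it never holds,
    on_timeout() is called and False is returned.
    """
    for _ in range(max_tries):
        if condition():
            return True
        time.sleep(delay_ms / 1000.0)
    on_timeout()
    return False


# Example: a condition that becomes true on the third check.
checks = []


def ready():
    checks.append(1)
    return len(checks) >= 3


print(wait_for_condition(50, 1, ready, lambda: print("timed out")))
```

Compared with a fixed sleep, the loop returns as soon as the condition
holds and still tolerates slow runs, up to max_tries * delay_ms.
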