Until now, diskless replication was diskless only on the master side.
The slave side still stored the received rdb file to disk before loading it back in and parsing it.
This commit adds two modes to load rdb directly from socket:
1) when-empty
2) using "swapdb"
The third mode, a diskless slave that uses flushdb, is risky and currently not included.
other changes:
--------------
Distinguish between aof configuration and aof state, so that we can re-enable aof only when the sync
eventually succeeds (and not when exiting from readSyncBulkPayload after a failed attempt).
Without this, a CONFIG GET or INFO during rdb loading would also have reported the wrong aof setting.
When loading rdb from the network, don't kill the server on short read (that can be a network error)
Fix rdb check when performed on preamble AOF
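The aof configuration/state distinction mentioned above can be illustrated with a minimal sketch. This is not the server code: the class, the field names aof_enabled and aof_state, and the methods are hypothetical, chosen only to show why keeping the two values apart lets CONFIG GET stay truthful while aof is temporarily off during a sync.

```python
from dataclasses import dataclass

AOF_OFF, AOF_ON = "off", "on"

@dataclass
class Server:
    """Hypothetical model: one field for what the user asked for,
    another for what is actually running right now."""
    aof_enabled: bool          # configuration (what CONFIG GET should report)
    aof_state: str = AOF_OFF   # runtime state (may be off during a sync)

    def start_sync(self):
        # AOF is stopped while the replica loads; the configuration is untouched.
        self.aof_state = AOF_OFF

    def finish_sync(self, ok):
        # Re-enable AOF only if the sync succeeded and the user configured it.
        if ok and self.aof_enabled:
            self.aof_state = AOF_ON

    def config_get_appendonly(self):
        # Reports the configuration, not the transient state.
        return "yes" if self.aof_enabled else "no"
```

With this split, a failed sync attempt leaves aof_state off, and a later successful attempt turns it back on, while the reported configuration never changes.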
tests:
run replication tests for diskless slave too
make replication test a bit more aggressive
Add test for diskless load swapdb
* allowing --single to be repeated
* adding --only so that only a specific test inside a unit can be run
* adding --skiptill, useful to resume a test run that crashed, continuing past the problematic unit.
useful together with --clients 1
* adding --skipfile to use a file containing list of tests names to skip
* printing the names of the tests that are skipped by skipfile or denytags
* adding --config to add config file options from command line
* fail the test (exit code) in case of timeout.
* add --wait-server to allow attaching a debugger
* add --dont-clean to keep log files when tests are done
This replaces individual ziplist vs. linkedlist representations
for Redis list operations.
Big thanks for all the reviews and feedback from everybody in
https://github.com/antirez/redis/pull/2143
start_server now uses return value from Tcl exec to get the server pid,
however this introduces errors that depend on timing: a lot of the
testing code base assumed the server to be actually up and running when
start_server returns.
So the old code that waits to see the pid in the log file was restored.
Previously the PID format was:
[PID] Timestamp
But it recently changed to:
PID:X Timestamp
The tcl testing framework was grabbing the PID from \[\d+\], but
that's not valid anymore.
Now we grab the pid from "PID: <PID>" in the part of Redis startup
output to the right of the ASCII logo.
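The three PID layouts described above can be matched with regular expressions along these lines. This is a Python sketch for illustration only (the test framework itself is Tcl); the function names are hypothetical, and the assumption that X in "PID:X" is a single non-space character is mine.

```python
import re

# Old log line layout: "[PID] Timestamp"
OLD_LOG_PID = re.compile(r"^\[(\d+)\]")
# New log line layout: "PID:X Timestamp" (X assumed to be one non-space character)
NEW_LOG_PID = re.compile(r"^(\d+):\S\s")
# Startup banner: "PID: <PID>" printed to the right of the ASCII logo
BANNER_PID = re.compile(r"PID:\s*(\d+)")

def pid_from_log_line(line):
    """Return the PID found at the start of a log line, or None."""
    for pattern in (OLD_LOG_PID, NEW_LOG_PID):
        match = pattern.match(line)
        if match:
            return int(match.group(1))
    return None

def pid_from_startup_output(text):
    """Return the first PID found in a 'PID: <PID>' banner field, or None."""
    match = BANNER_PID.search(text)
    return int(match.group(1)) if match else None
```

Reading the PID from the banner avoids depending on the log line prefix, which is the part that changed.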
Better handling of connection errors, so that the table is updated and
the client can recover. The startup nodes table is now populated after
fetching the list of nodes.
More work is needed here: it is still not as reliable as the
redis-rb-cluster implementation, which is the minimal reference
implementation for Redis Cluster clients.
Sometimes the process is still there but no longer in a state that can
be checked (after being killed). This used to happen after a call to
SHUTDOWN NOSAVE in the scripting unit, causing a false positive.
It is now possible to kill and restart sentinel or redis instances for
more real-world testing.
The 01 unit tests the capability of Sentinel to update the configuration
of Sentinels rejoining the cluster, however the test is pretty trivial
and more tests should be added.
Some inline tests were moved into the server_is_up procedure.
Also find_available_port was moved into util since it is going
to be used for the Sentinel test as well.
Due to changes in recent releases of osx leaks utility, the osx leak
detection no longer worked. Now it is fixed in a way that should be
backward compatible.
A new stress test was added for the code converting a ziplist
into a hash table.
In this commit the randomValue helper function was also modified to
return negative values as well.
wait_for_condition is now used instead of the usual "after 1000" (that
is the way to sleep in Tcl). This should avoid finding the replica in
a state where it is loading the RDB in memory, returning a -LOADING error.
This test used to fail when running the test over valgrind, due to the
added latencies.
Due to a change in the format of the bug report in case of a crash or
failed assertion, the test suite was no longer able to properly log it.
Instead just a protocol error was logged by the Redis TCL client that
provided no clue about the actual problem.
This commit resolves the issue by logging everything from the first line
of the log including the string REDIS BUG REPORT, till the end of the
file.
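The extraction rule described above (everything from the first line containing REDIS BUG REPORT to the end of the file) could look roughly like this. It is a Python sketch of the idea, not the Tcl code in the test suite, and the function name is hypothetical.

```python
def extract_bug_report(log_text, marker="REDIS BUG REPORT"):
    """Return the log from the first line containing `marker` to the end,
    or None if the marker never appears."""
    lines = log_text.splitlines(keepends=True)
    for index, line in enumerate(lines):
        if marker in line:
            # Keep the marker line itself and everything after it.
            return "".join(lines[index:])
    return None
```

Taking everything up to the end of the file means the result does not depend on the internal layout of the report, only on the marker string.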
Now it uses the new wait_for_condition testing primitive.
Also wait_for_condition implementation was fixed in this commit to properly
escape the expr command and its argument.
A new primitive wait_for_condition was introduced in the scripting
engine that makes waiting for events simpler, so that it is simpler to
write tests that are more resistant to timing issues.
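The idea behind wait_for_condition is to poll a condition a bounded number of times instead of sleeping for a fixed interval. The following is a Python sketch of that pattern, not the Tcl primitive itself; the parameter names and defaults are assumptions made for illustration.

```python
import time

def wait_for_condition(condition, max_tries=50, delay_ms=100):
    """Poll `condition` (a zero-argument callable) until it returns a true
    value or `max_tries` attempts are exhausted. Returns True on success,
    False on timeout."""
    for _ in range(max_tries):
        if condition():
            return True
        time.sleep(delay_ms / 1000.0)
    return False
```

A test would then wait for something like "the replica reports it is connected" and fail only if the call returns False, which makes it tolerant of slow environments such as valgrind runs.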
networking related stuff moved into networking.c
moved more code
more work on layout of source code
SDS instantaneous memory saving. By Pieter and Salvatore at VMware ;)
cleanly compiling again after the first split, now splitting it in more C files
moving more things around... work in progress
split replication code
splitting more
Sets split
Hash split
replication split
even more splitting
more splitting
minor change