- the API name was odd, separated to two apis one for LRU and one for LFU
- the LRU idle time was in 1 second resolution, which might be ok for RDB
and RESTORE, but i think modules may need higher resolution
- adding tests for LFU and for handling maxmemory policy mismatch
After the thread in #6537 and thanks to the suggestions received, this
commit updates the original patch in order to:
1. Solve the problem of updating the time in multiple places by updating
it in call().
2. Avoid introducing a new field but use our cached time.
This required some minor refactoring to the function updating the time,
and the introduction of a new cached time in microseconds in order to
use less gettimeofday() calls.
* replication hooks: role change, master link status, replica online/offline
* persistence hooks: saving, loading, loading progress
* misc hooks: cron loop, shutdown, module loaded/unloaded
* change the way hooks test work, and add tests for all of the above
startLoading() now gets flag indicating what is loaded.
stopLoading() now gets an indication of success or failure.
adding startSaving() and stopSaving() with similar args and role.
misc:
- handle SSL_has_pending by iterating though these in beforeSleep, and setting timeout of 0 to aeProcessEvents
- fix issue with epoll signaling EPOLLHUP and EPOLLERR only to the write handlers. (needed to detect the rdb pipe was closed)
- add key-load-delay config for testing
- trim connShutdown which is no longer needed
- rioFdsetWrite -> rioFdWrite - simplified since there's no longer need to write to multiple FDs
- don't detect rdb child exited (don't call wait3) until we detect the pipe is closed
- Cleanup bad optimization from rio.c, add another one
* Introduce a connection abstraction layer for all socket operations and
integrate it across the code base.
* Provide an optional TLS connections implementation based on OpenSSL.
* Pull a newer version of hiredis with TLS support.
* Tests, redis-cli updates for TLS support.
When implementing the code that saves and loads these aux fields we used rdb
format that was added for that in redis 5.0, but then we added the 'when' field
which meant that the old redis-check-rdb won't be able to skip these.
this fix adds an opcode as if that 'when' is part of the module data.
Without such change, the diskless replicas, when loading RDB files from
the socket will not abort when a broken RDB file gets loaded. This is
potentially unsafe, because right now Redis is not able to guarantee
that encoding errors are safe from the POV of memory corruptions (for
instance the LZF library may not be safe against untrusted data?) so
better to abort when the RDB file we are going to load is corrupted.
Instead I/O errors are still returned to the caller without aborting,
so that in case of short read the diskless replica can try again.
now that replica can read rdb directly from the socket, it should avoid exiting
on short read and instead try to re-sync.
this commit tries to have minimal effects on non-diskless rdb reading.
and includes a test that tries to trigger this scenario on various read cases.
* create module API for forking child processes.
* refactor duplicate code around creating and tracking forks by AOF and RDB.
* child processes listen to SIGUSR1 and dies exitFromChild in order to
eliminate a valgrind warning of unhandled signal.
* note that BGSAVE error reply has changed.
valgrind error is:
Process terminating with default action of signal 10 (SIGUSR1)
The implementation of the diskless replication was currently diskless only on the master side.
The slave side was still storing the received rdb file to the disk before loading it back in and parsing it.
This commit adds two modes to load rdb directly from socket:
1) when-empty
2) using "swapdb"
the third mode of using diskless slave by flushdb is risky and currently not included.
other changes:
--------------
distinguish between aof configuration and state so that we can re-enable aof only when sync eventually
succeeds (and not when exiting from readSyncBulkPayload after a failed attempt)
also a CONFIG GET and INFO during rdb loading would have lied
When loading rdb from the network, don't kill the server on short read (that can be a network error)
Fix rdb check when performed on preamble AOF
tests:
run replication tests for diskless slave too
make replication test a bit more aggressive
Add test for diskless load swapdb
Fix#5790 and 5878.
Maybe a better option was to have such fields named with the first
byte '%' as those are info fields for specification, however now to
break it in a backward incompatible way is not an option, so let's use
the fields actively to provide info when sensible, otherwise ignore
when they are not really helpful.
RESTORE now supports:
1. Setting LRU/LFU
2. Absolute-time TTL
Other related changes:
1. RDB loading will not override LRU bits when RDB file
does not contain the LRU opcode.
2. RDB loading will not set LRU/LFU bits if the server's
maxmemory-policy does not match.
This way we let big endian systems to still load old RDB versions.
However newver versions will be saved and loaded in a way that make RDB
expires cross-endian again. Thanks to @oranagra for the reporting and
the discussion about this problem, leading to this fix.
Again thanks to @oranagra. The object idle time does not fit into an int
sometimes: use the native type that the serialization function will get
as argument, which is uint64_t.
The AOF tail of a combined RDB+AOF is based on the premise of applying
the AOF commands to the exact state that there was in the server while
the RDB was persisted. By expiring keys while loading the RDB file, we
change the state, so applying the AOF tail later may change the state.
Test case:
* Time1: SET a 10
* Time2: EXPIREAT a $time5
* Time3: INCR a
* Time4: PERSIT A. Start bgrewiteaof with RDB preamble. The value of a is 11 without expire time.
* Time5: Restart redis from the RDB+AOF: consistency violation.
Thanks to @soloestoy for providing the patch.
Thanks to @trevor211 for the original issue report and the initial fix.
Check issue #4950 for more info.
Some times it was not released on error, sometimes it was released two
times because the error path expected the "di" var to be NULL if the
iterator was already released. Thanks to @oranagra for pinging me about
potential problems of this kind inside rdb.c.