The problem is that WAITAOF could have hang in case commands were
propagated only to replicas.
This can happen if a module uses RM_Call with the REDISMODULE_ARGV_NO_AOF flag.
In that case, master_repl_offset would increase, but there would be nothing to fsync, so
in the absence of other traffic, fsynced_reploff_pending would stay the static, and WAITAOF can hang.
This commit updates fsynced_reploff_pending to the latest offset in flushAppendOnlyFile in case
there's nothing to fsync. i.e. in case it's behind because of the above mentions case it'll be refreshed
and release the WAITAOF.
Other changes:
Fix a race in wait.tcl (client getting blocked vs. the fsync thread)
If we set `fsynced_reploff_pending` in `startAppendOnly`, and the fork doesn't start
immediately (e.g. there's another fork active at the time), any subsequent commands
will increment `server.master_repl_offset`, but will not cause a fsync (given they were
executed before the fork started, they just ended up in the RDB part of it)
Therefore, any WAITAOF will wait on the new master_repl_offset, but it will time out
because no fsync will be executed.
Release notes:
```
WAITAOF could timeout in the absence of write traffic in case a new AOF is created and
an AOFRW can't immediately start.
This can happen by the appendonly config is changed at runtime, but also after FLUSHALL,
and replica full sync.
```
The MacOS CI in github actions often hangs without any logs. GH argues that
it's due to resource utilization, either running out of disk space, memory, or CPU
starvation, and thus the runner is terminated.
This PR contains multiple attempts to resolve this:
1. introducing pause_process instead of SIGSTOP, which waits for the process
to stop before resuming the test, possibly resolving race conditions in some tests,
this was a suspect since there was one test that could result in an infinite loop in that
case, in practice this didn't help, but still a good idea to keep.
2. disable the `save` config in many tests that don't need it, specifically ones that use
heavy writes and could create large files.
3. change the `populate` proc to use short pipeline rather than an infinite one.
4. use `--clients 1` in the macos CI so that we don't risk running multiple resource
demanding tests in parallel.
5. enable `--verbose` to be repeated to elevate verbosity and print more info to stdout
when a test or a server starts.
This PR fix several unrelated bugs that were discovered by the same set of tests
(WAITAOF tests in #11713), could make the `WAITAOF` test hang.
The change in `backgroundRewriteDoneHandler` is about MP-AOF.
That leftover / old code assumes that we started a new AOF file just now
(when we have a new base into which we're gonna incrementally write), but
the fact is that with MP-AOF, the fork done handler doesn't really affect the
incremental file being maintained by the parent process, there's no reason to
re-issue `SELECT`, and no reason to update any of the fsync variables in that flow.
This should have been deleted with MP-AOF (introduced in #9788, 7.0).
The damage is that the update to `aof_fsync_offset` will cause us to miss an fsync
in `flushAppendOnlyFile`, that happens if we stop write commands in `AOF_FSYNC_EVERYSEC`
while an AOFRW is in progress. This caused a new `WAITAOF` test to sometime hang forever.
Also because of MP-AOF, we needed to change `aof_fsync_offset` to `aof_last_incr_fsync_offset`
and match it to `aof_last_incr_size` in `flushAppendOnlyFile`. This is because in the past we compared
`aof_fsync_offset` and `aof_current_size`, but with MP-AOF it could be the total AOF file will be
smaller after AOFRW, and the (already existing) incr file still has data that needs to be fsynced.
The change in `flushAppendOnlyFile`, about the `AOF_FSYNC_ALWAYS`, it is follow #6053
(the details is in #5985), we also check `AOF_FSYNC_ALWAYS` to handle a case where
appendfsync is changed from everysec to always while there is data that's written but not yet fsynced.
There be a situation that satisfies WAIT, and then wrongly unblock
WAITAOF because we mix-use last_offset and last_numreplicas.
We update last_offset and last_numreplicas only when the condition
matches. i.e. output of either replicationCountAOFAcksByOffset or
replicationCountAcksByOffset is right.
In this case, we need to have separate last_ variables for each of
them. Added a last_aof_offset and last_aof_numreplicas for WAITAOF.
WAITAOF was added in #11713. Found while coding #11917.
A Test was added to validate that case.
WAITAOF wad added in #11713, its return is an array.
But forget to handle WAITAOF in last_offset and last_numreplicas,
causing WAITAOF to return a WAIT like reply.
Tests was added to validate that case (both WAIT and WAITAOF).
This PR also refactored processClientsWaitingReplicas a bit for better
maintainability and readability.
Implementing the WAITAOF functionality which would allow the user to
block until a specified number of Redises have fsynced all previous write
commands to the AOF.
Syntax: `WAITAOF <num_local> <num_replicas> <timeout>`
Response: Array containing two elements: num_local, num_replicas
num_local is always either 0 or 1 representing the local AOF on the master.
num_replicas is the number of replicas that acknowledged the a replication
offset of the last write being fsynced to the AOF.
Returns an error when called on replicas, or when called with non-zero
num_local on a master with AOF disabled, in all other cases the response
just contains number of fsync copies.
Main changes:
* Added code to keep track of replication offsets that are confirmed to have
been fsynced to disk.
* Keep advancing master_repl_offset even when replication is disabled (and
there's no replication backlog, only if there's an AOF enabled).
This way we can use this command and it's mechanisms even when replication
is disabled.
* Extend REPLCONF ACK to `REPLCONF ACK <ofs> FACK <ofs>`, the FACK
will be appended only if there's an AOF on the replica, and already ignored on
old masters (thus backwards compatible)
* WAIT now no longer wait for the replication offset after your last command, but
rather the replication offset after your last write (or read command that caused
propagation, e.g. lazy expiry).
Unrelated changes:
* WAIT command respects CLIENT_DENY_BLOCKING (not just CLIENT_MULTI)
Implementation details:
* Add an atomic var named `fsynced_reploff_pending` that's updated
(usually by the bio thread) and later copied to the main `fsynced_reploff`
variable (only if the AOF base file exists).
I.e. during the initial AOF rewrite it will not be used as the fsynced offset
since the AOF base is still missing.
* Replace close+fsync bio job with new BIO_CLOSE_AOF (AOF specific)
job that will also update fsync offset the field.
* Handle all AOF jobs (BIO_CLOSE_AOF, BIO_AOF_FSYNC) in the same bio
worker thread, to impose ordering on their execution. This solves a
race condition where a job could set `fsynced_reploff_pending` to a higher
value than another pending fsync job, resulting in indicating an offset
for which parts of the data have not yet actually been fsynced.
Imposing an ordering on the jobs guarantees that fsync jobs are executed
in increasing order of replication offset.
* Drain bio jobs when switching `appendfsync` to "always"
This should prevent a write race between updates to `fsynced_reploff_pending`
in the main thread (`flushAppendOnlyFile` when set to ALWAYS fsync), and
those done in the bio thread.
* Drain the pending fsync when starting over a new AOF to avoid race conditions
with the previous AOF offsets overriding the new one (e.g. after switching to
replicate from a new master).
* Make sure to update the fsynced offset at the end of the initial AOF rewrite.
a must in case there are no additional writes that trigger a periodic fsync,
specifically for a replica that does a full sync.
Limitations:
It is possible to write a module and a Lua script that propagate to the AOF and doesn't
propagate to the replication stream. see REDISMODULE_ARGV_NO_REPLICAS and luaRedisSetReplCommand.
These features are incompatible with the WAITAOF command, and can result
in two bad cases. The scenario is that the user executes command that only
propagates to AOF, and then immediately
issues a WAITAOF, and there's no further writes on the replication stream after that.
1. if the the last thing that happened on the replication stream is a PING
(which increased the replication offset but won't trigger an fsync on the replica),
then the client would hang forever (will wait for an fack that the replica will never
send sine it doesn't trigger any fsyncs).
2. if the last thing that happened is a write command that got propagated properly,
then WAITAOF will be released immediately, without waiting for an fsync (since
the offset didn't change)
Refactoring:
* Plumbing to allow bio worker to handle multiple job types
This introduces infrastructure necessary to allow BIO workers to
not have a 1-1 mapping of worker to job-type. This allows in the
future to assign multiple job types to a single worker, either as
a performance/resource optimization, or as a way of enforcing
ordering between specific classes of jobs.
Co-authored-by: Oran Agra <oran@redislabs.com>
Refine getTimeoutFromObjectOrReply() out-of-range check.
Timeout is parsed (and verifies out of range) as double and
multiplied by 1000, added mstime() and stored in long-long
which might lead to out-of-range value of long-long.
Co-authored-by: moticless <moticless@github.com>
Co-authored-by: Oran Agra <oran@redislabs.com>
Co-authored-by: Ozan Tezcan <ozantezcan@gmail.com>
This commit revives the improves the ability to run the test suite against
external servers, instead of launching and managing `redis-server` processes as
part of the test fixture.
This capability existed in the past, using the `--host` and `--port` options.
However, it was quite limited and mostly useful when running a specific tests.
Attempting to run larger chunks of the test suite experienced many issues:
* Many tests depend on being able to start and control `redis-server` themselves,
and there's no clear distinction between external server compatible and other
tests.
* Cluster mode is not supported (resulting with `CROSSSLOT` errors).
This PR cleans up many things and makes it possible to run the entire test suite
against an external server. It also provides more fine grained controls to
handle cases where the external server supports a subset of the Redis commands,
limited number of databases, cluster mode, etc.
The tests directory now contains a `README.md` file that describes how this
works.
This commit also includes additional cleanups and fixes:
* Tests can now be tagged.
* Tag-based selection is now unified across `start_server`, `tags` and `test`.
* More information is provided about skipped or ignored tests.
* Repeated patterns in tests have been extracted to common procedures, both at a
global level and on a per-test file basis.
* Cleaned up some cases where test setup was based on a previous test executing
(a major anti-pattern that repeats itself in many places).
* Cleaned up some cases where test teardown was not part of a test (in the
future we should have dedicated teardown code that executes even when tests
fail).
* Fixed some tests that were flaky running on external servers.
This adds basic coverage to IO threads by running the cluster and few selected Redis test suite tests with the IO threads enabled.
Also provides some necessary additional improvements to the test suite:
* Add --config to sentinel/cluster tests for arbitrary configuration.
* Fix --tags whitelisting which was broken.
* Add a `network` tag to some tests that are more network intensive. This is work in progress and more tests should be properly tagged in the future.
This fixes three issues:
1. Using debug SLEEP was impacting the subsequent test, and causing it to pass reliably even though it should have failed. There was exactly 5 seconds of artificial pause (after 1000, wait 3000, wait 1000) between the debug sleep 5 and when we needed to unblock the client in the subsequent test. Now the test properly makes sure the client is unblocked, and the subsequent test is fixed.
2. Minor, the client pause types were using & comparisons instead of ==, since it was previously a flag.
3. Test is faster now that some of the hand wavy time is removed.
* Introduce a connection abstraction layer for all socket operations and
integrate it across the code base.
* Provide an optional TLS connections implementation based on OpenSSL.
* Pull a newer version of hiredis with TLS support.
* Tests, redis-cli updates for TLS support.