redict/tests/integration
Itamar Haber c81c7f51c3
Add stream consumer group lag tracking and reporting (#9127)
Adds the ability to track the lag of a consumer group (CG), that is, the number
of entries yet-to-be-delivered from the stream.

The proposed constant-time solution is in the spirit of "best-effort."

Partially addresses #8737.

## Description of approach

We add a new "entries_added" property to the stream. This starts at 0 for a new
stream and is incremented by 1 with every `XADD`.  It is essentially an all-time
counter of the entries added to the stream.

Given the stream's length and this counter value, we can trivially find the logical
"entries_added" counter of the first ID if and only if the stream is contiguous.
A fragmented stream contains one or more tombstones generated by `XDEL`s.
The new "xdel_max_id" stream property tracks the latest tombstone.

The CG also tracks its last delivered ID's as an "entries_read" counter and
increments it independently when delivering new messages, unless the this
read counter is invalid (-1 means invalid offset). When the CG's counter is
available, the reported lag is the difference between added and read counters.

Lastly, this also adds a "first_id" field to the stream structure in order to make
looking it up cheaper in most cases.

## Limitations

There are two cases in which the mechanism isn't able to track the lag.
In these cases, `XINFO` replies with `null` in the "lag" field.

The first case is when a CG is created with an arbitrary last delivered ID,
that isn't "0-0", nor the first or the last entries of the stream. In this case,
it is impossible to obtain a valid read counter (short of an O(N) operation).
The second case is when there are one or more tombstones fragmenting
the stream's entries range.

In both cases, given enough time and assuming that the consumers are
active (reading and lacking) and advancing, the CG should be able to
catch up with the tip of the stream and report zero lag.
Once that's achieved, lag tracking would resume as normal (until the
next tombstone is set).

## API changes

* `XGROUP CREATE` added with the optional named argument `[ENTRIESREAD entries-read]`
  for explicitly specifying the new CG's counter.
* `XGROUP SETID` added with an optional positional argument `[ENTRIESREAD entries-read]`
  for specifying the CG's counter.
* `XINFO` reports the maximal tombstone ID, the recorded first entry ID, and total
  number of entries added to the stream.
* `XINFO` reports the current lag and logical read counter of CGs.
* `XSETID` is an internal command that's used in replication/aof. It has been added with
  the optional positional arguments `[ENTRIESADDED entries-added] [MAXDELETEDID max-deleted-entry-id]`
  for propagating the CG's offset and maximal tombstone ID of the stream.

## The generic unsolved problem

The current stream implementation doesn't provide an efficient way to obtain the
approximate/exact size of a range of entries. While it could've been nice to have
that ability (#5813) in general, let alone specifically in the context of CGs, the risk
and complexities involved in such implementation are in all likelihood prohibitive.

## A refactoring note

The `streamGetEdgeID` has been refactored to accommodate both the existing seek
of any entry as well as seeking non-deleted entries (the addition of the `skip_tombstones`
argument). Furthermore, this refactoring also migrated the seek logic to use the
`streamIterator` (rather than `raxIterator`) that was, in turn, extended with the
`skip_tombstones` Boolean struct field to control the emission of these.

Co-authored-by: Guy Benoish <guy.benoish@redislabs.com>
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-02-23 22:34:58 +02:00
..
aof-multi-part.tcl fix return value of loadAppendOnlyFiles (#10295) 2022-02-22 08:59:23 +02:00
aof-race.tcl Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788) 2022-01-03 19:14:13 +02:00
aof.tcl Adapt redis-check-aof tool for Multi Part Aof (#10061) 2022-02-17 08:13:28 +02:00
block-repl.tcl Improve test suite to handle external servers better. (#9033) 2021-06-09 15:13:24 +03:00
convert-ziplist-hash-on-load.tcl Replace all usage of ziplist with listpack for t_hash (#8887) 2021-08-10 09:18:49 +03:00
convert-ziplist-zset-on-load.tcl Replace all usage of ziplist with listpack for t_zset (#9366) 2021-09-09 18:18:53 +03:00
convert-zipmap-hash-on-load.tcl Replace all usage of ziplist with listpack for t_hash (#8887) 2021-08-10 09:18:49 +03:00
corrupt-dump-fuzzer.tcl Fix false positive leak reported by GCC ASAN (#9816) 2021-11-21 18:47:10 +02:00
corrupt-dump.tcl Add stream consumer group lag tracking and reporting (#9127) 2022-02-23 22:34:58 +02:00
dismiss-mem.tcl Add external test that runs without debug command (#9964) 2021-12-19 17:41:51 +02:00
failover.tcl Improve test suite to handle external servers better. (#9033) 2021-06-09 15:13:24 +03:00
logging.tcl Fix timing issue in logging.tcl with FreeBSD (#9910) 2021-12-07 12:02:58 +02:00
psync2-master-restart.tcl Wait for replicas when shutting down (#9872) 2022-01-02 09:50:15 +02:00
psync2-pingoff.tcl Fix race condition in psync2-pingoff test (#9712) 2021-11-01 16:07:08 +02:00
psync2-reg.tcl Improve test suite to handle external servers better. (#9033) 2021-06-09 15:13:24 +03:00
psync2.tcl Remove EVAL script verbatim replication, propagation, and deterministic execution logic (#9812) 2021-12-21 08:32:42 +02:00
rdb.tcl Solve race in a BGSAVE test (#10190) 2022-01-26 19:46:02 +02:00
redis-benchmark.tcl Added URI support to redis-benchmark (cli and benchmark share the same uri-parsing methods) (#9314) 2021-09-14 19:45:06 +03:00
redis-cli.tcl Function Flags support (no-writes, no-cluster, allow-state, allow-oom) (#10066) 2022-01-14 14:02:02 +02:00
replication-2.tcl Set repl-diskless-sync to yes by default, add repl-diskless-sync-max-replicas (#10092) 2022-01-17 14:11:11 +02:00
replication-3.tcl Remove EVAL script verbatim replication, propagation, and deterministic execution logic (#9812) 2021-12-21 08:32:42 +02:00
replication-4.tcl Add check min-slave-* feature when evaluating Lua scripts and Functions (#10160) 2022-02-03 11:57:51 +02:00
replication-buffer.tcl Set repl-diskless-sync to yes by default, add repl-diskless-sync-max-replicas (#10092) 2022-01-17 14:11:11 +02:00
replication-psync.tcl Improve test suite to handle external servers better. (#9033) 2021-06-09 15:13:24 +03:00
replication.tcl fix "Connect multiple replicas at the same time" test (#10294) 2022-02-14 08:46:58 +02:00
shutdown.tcl Wait for replicas when shutting down (#9872) 2022-01-02 09:50:15 +02:00