The quicklist takes a cached version of the ziplist representation size
in bytes. The implementation must update this length every time the
underlying ziplist changes. However quicklistReplaceAtIndex() failed to
fix the length.
During LSET calls, the size of the ziplist blob and the cached size
inside the quicklist diverged. Later, when this size is used in an
authoritative way, for example during nodes splitting in order to copy
the nodes, we end with a duplicated node that may contain random
garbage.
This commit should fix issue #3343, however several problems were found
reviewing the quicklist.c code in search of this bug that should be
addressed soon or later.
For example:
1. To take a cached ziplist length is fragile since failing to update it
leads to this kind of issues.
2. The node splitting code needs auditing. For example it works just for
a side effect of ziplistDeleteRange() to be able to cope with a wrong
count of elements to remove. The code inside quicklist.c assumes that
-1 means "delete till the end" while actually it's just a count of how
many elements to delete, and is an unsigned count. So -1 gets converted
into the maximum integer, and just by chance the ziplist code stops
deleting elements after there are no more to delete.
3. Node splitting is extremely inefficient, it copies the node and
removes elements from both nodes even when actually there is to move a
single entry from one node to the other, or when the new resulting node
is empty at all so there is nothing to copy but just to create a new
node.
However at least for Redis 3.2 to introduce fresh code inside
quicklist.c may be even more risky, so instead I'm writing a better
fuzzy tester to stress the internals a bit more in order to anticipate
other possible bugs.
This bug was found using a fuzzy tester written after having some clue
about where the bug could be. The tester eventually created a ~2000
commands sequence able to always crash Redis. I wrote a better version
of the tester that searched for the smallest sequence that could crash
Redis automatically. Later this smaller sequence was minimized by
removing random commands till it still crashed the server. This resulted
into a sequence of 7 commands. With this small sequence it was just a
matter of filling the code with enough printf() to understand enough
state to fix the bug.
Let user set how many nodes to *not* compress.
We can specify a compression "depth" of how many nodes
to leave uncompressed on each end of the quicklist.
Depth 0 = disable compression.
Depth 1 = only leave head/tail uncompressed.
- (read as: "skip 1 node on each end of the list before compressing")
Depth 2 = leave head, head->next, tail->prev, tail uncompressed.
- ("skip 2 nodes on each end of the list before compressing")
Depth 3 = Depth 2 + head->next->next + tail->prev->prev
- ("skip 3 nodes...")
etc.
This also:
- updates RDB storage to use native quicklist compression (if node is
already compressed) instead of uncompressing, generating the RDB string,
then re-compressing the quicklist node.
- internalizes the "fill" parameter for the quicklist so we don't
need to pass it to _every_ function. Now it's just a property of
the list.
- allows a runtime-configurable compression option, so we can
expose a compresion parameter in the configuration file if people
want to trade slight request-per-second performance for up to 90%+
memory savings in some situations.
- updates the quicklist tests to do multiple passes: 200k+ tests now.
Fill factor now has two options:
- negative (1-5) for size-based ziplist filling
- positive for length-based ziplist filling with implicit size cap.
Negative offsets define ziplist size limits of:
-1: 4k
-2: 8k
-3: 16k
-4: 32k
-5: 64k
Positive offsets now automatically limit their max size to 8k. Any
elements larger than 8k will be in individual nodes.
Positive ziplist fill factors will keep adding elements
to a ziplist until one of:
- ziplist has FILL number of elements
- or -
- ziplist grows above our ziplist max size (currently 8k)
When using positive fill factors, if you insert a large
element (over 8k), that element will automatically allocate
an individual quicklist node with one element and no other elements will be
in the same ziplist inside that quicklist node.
When using negative fill factors, elements up to the size
limit can be added to one quicklist node. If an element
is added larger than the max ziplist size, that element
will be allocated an individual ziplist in a new quicklist node.
Tests also updated to start testing at fill factor -5.
This started out as #2158 by sunheehnus, but I kept rewriting it
until I could understand things more easily and get a few more
correctness guarantees out of the readability flow.
The original commit created and returned a new ziplist with the contents of
both input ziplists, but I prefer to grow one of the input ziplists
and destroy the other one.
So, instead of malloc+copy as in #2158, the merge now reallocs one of
the existing ziplists and copies the other ziplist into the new space.
Also added merge test cases to ziplistTest()
This replaces individual ziplist vs. linkedlist representations
for Redis list operations.
Big thanks for all the reviews and feedback from everybody in
https://github.com/antirez/redis/pull/2143