Fixes:
- When a consumer is created as a side effect, redis didn't issue a keyspace notification,
nor incremented the server.dirty (affects periodic snapshots).
this was a bug in XREADGROUP, XCLAIM, and XAUTOCLAIM.
- When attempting to delete a non-existent consumer, don't issue a keyspace notification
and don't increment server.dirty
this was a bug in XGROUP DELCONSUMER
Other changes:
- Changed streamLookupConsumer() to always only do lookup consumer (never do implicit creation),
Its last seen time is updated unless the SLC_NO_REFRESH flag is specified.
- Added streamCreateConsumer() to create a new consumer. When the creation is successful,
it will notify and dirty++ unless the SCC_NO_NOTIFY or SCC_NO_DIRTIFY flags is specified.
- Changed streamDelConsumer() to always only do delete consumer.
- Added keyspace notifications tests about stream events.
In _quicklistInsert when `at_head` / `at_tail` is true, but `prev` / `next` is NULL,
the code was reaching the last if-else block at the bottom of the function,
and would have unnecessarily executed _quicklistSplitNode, instead of just creating a new node.
This was because the penultimate if-else was checking `node->next && full_next`.
but in fact it was unnecessary to check if `node->next` exists, if we're gonna create one anyway,
we only care that it's not full, or doesn't exist, so the condition could have been changed to `!node->next || full_next`.
Instead, this PR makes a small refactory to negate `full_next` to a more meaningful variable
`avail_next` that indicates that the next node is available for pushing additional elements or
not (this would be true only if it exists and it is non-full)
With an empty src key, we need to deal with two situations:
1. non-STORE: We should return emptyarray.
2. STORE: Try to delete the store key and return 0.
This applies to both GEOSEARCHSTORE (new to v6.2), and
also GEORADIUS STORE (which was broken since forever)
This pr try to fix#9261. i.e. both STORE variants would have behaved
like the non-STORE variants when the source key was missing,
returning an empty array and not deleting the destination key,
instead of returning 0, and deleting the destination key.
Also add more tests for some commands.
- GEORADIUS: wrong type src key, non existing src key, empty search,
store with non existing src key, store with empty search
- GEORADIUSBYMEMBER: wrong type src key, non existing src key,
non existing member, store with non existing src key
- GEOSEARCH: wrong type src key, non existing src key, empty search,
frommember with non existing member
- GEOSEARCHSTORE: wrong type key, non existing src key,
fromlonlat with empty search, frommember with non existing member
Co-authored-by: Oran Agra <oran@redislabs.com>
In some cases large replies on slow systems may only be partially read
by the test suite, resulting with parsing errors.
This fix is still timing sensitive but should greatly reduce the chances
of this happening.
The issue is that when a sentinel with the same address and IP is turned on with a different runid, its port is set to 0 but it is still present in the dictionary master->sentinels which contain all the sentinels for a master.
This causes a problem when we do INFO SENTINEL because it takes the size of the dictionary of sentinels. This might also cause a problem for failover if enough sentinels have their port set to 0 since the number of voters in failover is also determined by the size of the dictionary of sentinels.
This commits removes the sentinels with the port set to zero from the dictionary of sentinels.
Fixes#8786
The `lru_clock` and `lru` bits in `robj` save the least significant 24 bits of the unixtime (seconds since 1/1/1970),
and wrap around every 194 days.
The `objectSetLRUOrLFU` function, which is used in RESTORE with IDLETIME argument, and also in replica
or master loading an RDB that contains LRU, and by a module API had a bug that's triggered when that happens.
The scenario was that the idle time that came from the user, let's say RESTORE command is about 1000 seconds
(e.g. in the `RESTORE can set LRU` test we have), and the current `lru_clock` just wrapped around and is less than
1000 (i.e. a period of 1000 seconds once in some 6 months), the expression in that function would produce a negative
value and the code (and comment) specified that the best way to solve that is push the idle time backwards into the
past by 3 months. i.e. an idle time of 3 months instead of 1000 seconds.
instead, the right thing to do is to unwrap it, and put it near LRU_CLOCK_MAX. since now `lru_clock` is smaller than
`obj->lru` it will be unwrapped again by `estimateObjectIdleTime`.
bug was introduced by 052e03495f, but the code before it also seemed wrong.
Add two INFO metrics:
```
total_eviction_exceeded_time:69734
current_eviction_exceeded_time:10230
```
`current_eviction_exceeded_time` if greater than 0, means how much time current used memory is greater than `maxmemory`. And we are still over the maxmemory. If used memory is below `maxmemory`, this metric is reset to 0.
`total_eviction_exceeded_time` means total time used memory is greater than `maxmemory` since server startup.
The units of these two metrics are ms.
Co-authored-by: Oran Agra <oran@redislabs.com>
This fixes an issue with zslGetRank which will happen only if the
skiplist data stracture is added two entries with the same element name,
this can't happen in redis zsets (we use dict), but in theory this is a
bug in the underlaying skiplist code.
Fixes#3081 and #4032
Co-authored-by: minjian.cai <cmjgithub@163.com>
GETBIT, SETBIT may access wrong address because of wrap.
BITCOUNT and BITPOS may return wrapped results.
BITFIELD may access the wrong address but also allocate insufficient memory and segfault (see CVE-2021-32761).
This commit uses `uint64_t` or `long long` instead of `size_t`.
related https://github.com/redis/redis/pull/8096
At 32bit platform:
> setbit bit 4294967295 1
(integer) 0
> config set proto-max-bulk-len 536870913
OK
> append bit "\xFF"
(integer) 536870913
> getbit bit 4294967296
(integer) 0
When the bit index is larger than 4294967295, size_t can't hold bit index. In the past, `proto-max-bulk-len` is limit to 536870912, so there is no problem.
After this commit, bit position is stored in `uint64_t` or `long long`. So when `proto-max-bulk-len > 536870912`, 32bit platforms can still be correct.
For 64bit platform, this problem still exists. The major reason is bit pos 8 times of byte pos. When proto-max-bulk-len is very larger, bit pos may overflow.
But at 64bit platform, we don't have so long string. So this bug may never happen.
Additionally this commit add a test cost `512MB` memory which is tag as `large-memory`. Make freebsd ci and valgrind ci ignore this test.
- SELECT and WAIT don't read or write from the keyspace (unlike DEL, EXISTS, EXPIRE, DBSIZE, KEYS, etc).
they're more similar to AUTH and HELLO (and maybe PING and COMMAND).
they only affect the current connection, not the server state, so they should be `@connection`, not `@keyspace`
- ROLE, like LASTSAVE is `@admin` (and `@dangerous` like INFO)
- ASKING, READONLY, READWRITE are `@connection` too (not `@keyspace`)
- Additionally, i'm now documenting the exact meaning of each ACL category so it's clearer which commands belong where.
Fix module info genModulesInfoStringRenderModulesList lack separator when there's more than one module in the list.
Co-authored-by: Oran Agra <oran@redislabs.com>
- promote the code in DEBUG PROTOCOL to addReplyBigNum
- DEBUG PROTOCOL ATTRIB skips the attribute when client is RESP2
- networking.c addReply for push and attributes generate assertion when
called on a RESP2 client, anything else would produce a broken
protocol that clients can't handle.
There are two issues fixed in this commit:
1. we want to fail the EXEC command in case there is a watched key that's logically
expired but not yet deleted by active expire or lazy expire.
2. we saw that currently cache time is update in every `call()` (including nested calls),
this time is being also being use for the isKeyExpired comparison, we want to update
the cache time only in the first call (execCommand)
Co-authored-by: Oran Agra <oran@redislabs.com>
In aof rewrite, when parent stop sending data to child, if there is
new rewrite data, aofChildWriteDiffData write event will be installed.
Then this event is issued and deletes the file event without do anyting.
This will happen over and over again until aof rewrite finish.
This bug used to waste a few system calls per excessive wake-up
(epoll_ctl and epoll_wait) per cycle, each cycle triggered by receiving
a write command from a client.
The if judgement `nextdiff == -4 && reqlen < 4` in __ziplistInsert.
It's strange, but it's useful. Without it there will be problems during
chain update.
Till now these lines didn't have coverage in the tests, and there was
a question if they are at all needed (#7170)
redis-check-aof/redis-check-rdb.
Related to #9176. Before this commit, redis-server starts as
redis-check-aof/redis-check-rdb if the directory it is started from
contains the string redis-check-aof/redis-check-rdb. We check the
executable name instead of directory.
1. redis-cli can output --rdb data to stdout
but redis-cli also write some messages to stdout which will mess up the rdb.
2. Make redis-cli flush stdout when printing a reply
This was needed in order to fix a hung in redis-cli test that uses
--replica.
Note that printf does flush when there's a newline, but fwrite does not.
3. fix the redis-cli --replica test which used to pass previously
because it didn't really care what it read, and because redis-cli
used printf to print these other things to stdout.
4. improve redis-cli --replica test to run with both diskless and disk-based.
Co-authored-by: Oran Agra <oran@redislabs.com>
Co-authored-by: Viktor Söderqvist <viktor@zuiderkwast.se>
Currently a replica is able to recover from a short read (when diskless loading
is enabled) and avoid crashing/exiting, replying to the master and then the rdb
could be sent again by the master for another load attempt by the replica.
There were a few scenarios that were not behaving similarly, such as when
there is no end-of-file marker, or when module aux data failed to load, which
should be allowed to occur due to a short read.
due to a copy-paste bug, it used to reply with null response rather than empty array.
this commit includes new tests that are looking at the RESP response directly in
order to be able to tell the difference between them.
Co-authored-by: Oran Agra <oran@redislabs.com>
This reduces system calls on linux when a new connection is made / accepted.
Changes:
* Add the SOCK_CLOEXEC option to the accept4() call
This ensure that a fork/exec call does not leak a file descriptor.
* Move anetCloexec and connNonBlock info anetGenericAccept
* Moving connNonBlock from accept handlers to anetGenericAccept
Moving connNonBlock from createClient, is safe because createClient is
used in the following ways:
1. without a connection (fake client)
2. on an accepted connection (see above)
3. creating the master client by using connConnect (see below)
The third case, can either use anetTcpNonBlockConnect, or connTLSConnect
which is by default non-blocking.
Co-authored-by: Rajiv Kurian <geetasen@gmail.com>
Co-authored-by: Oran Agra <oran@redislabs.com>
Co-authored-by: Yoav Steinberg <yoav@redislabs.com>
when tracking the peak, don't reset the peak to 0, reset it to the
maximum of the current used, and the planned to be used by the current
arg.
when shrining, split the two separate conditions.
the idle time shrinking will remove all free space.
but the peak based shrinking will keep room for the current arg.
when we resize due to a peak (rahter than idle time), don't trim all
unused space, let the qbuf keep a size that's sufficient for the
currently process bulklen, and the current peak.
Co-authored-by: sundb <sundbcn@gmail.com>
Co-authored-by: yoav-steinberg <yoav@monfort.co.il>
1. querybuf_peak has not been updated correctly in readQueryFromClient.
2. qbuf shrinking uses sdsalloc instead of sdsAllocSize
see more details in issue #4983