Fix test failure which introduced by #9003.
The following case will occur when querybuf expansion will allocate memory equal to (16*1024)k.
1) make use ```CFLAGS=-DNO_MALLOC_USABLE_SIZE```.
2) ```malloc``` will not allocate more under ```alpine```.
Create new module type enhanced callbacks: mem_usage2, free_effort2, unlink2, copy2.
These will be given a context point from which the module can obtain the key name and database id.
In addition the digest and defrag context can now be used to obtain the key name and database id.
When using RESP3, ZPOPMAX/ZPOPMIN should return nested arrays for consistency
with other commands (e.g. ZRANGE).
We do that only when COUNT argument is present (similarly to how LPOP behaves).
for reasoning see https://github.com/redis/redis/issues/8824#issuecomment-855427955
This is a breaking change only when RESP3 is used, and COUNT argument is present!
The initialize memory of `querybuf` is `PROTO_IOBUF_LEN(1024*16) * 2` (due to sdsMakeRoomFor being greedy), under `jemalloc`, the allocated memory will be 40k.
This will most likely result in the `querybuf` being resized when call `clientsCronResizeQueryBuffer` unless the client requests it fast enough.
Note that this bug existed even before #7875, since the condition for resizing includes the sds headers (32k+6).
## Changes
1. Use non-greedy sdsMakeRoomFor when allocating the initial query buffer (of 16k).
1. Also use non-greedy allocation when working with BIG_ARG (we won't use that extra space anyway)
2. in case we did use a greedy allocation, read as much as we can into the buffer we got (including internal frag), to reduce system calls.
3. introduce a dedicated constant for the shrinking (same value as before)
3. Add test for querybuf.
4. improve a maxmemory test by ignoring the effect of replica query buffers (can accumulate many ACKs on slow env)
5. improve a maxmemory by disabling slowlog (it will cause slight memory growth on slow env).
Today when we load the AOF on startup, the loadAppendOnlyFile checks if
the file is openning for reading.
This check is redundent (dead code) as we open the AOF file for writing at initServer,
and the file will always be existing for the loadAppendOnlyFile.
In this commit:
- remove all the exit(1) from loadAppendOnlyFile, as it is the caller
responsibility to decide what to do in case of failure.
- move the opening of the AOF file for writing, to be after we loading it.
- avoid return -ERR in DEBUG LOADAOF, when the AOF is existing but empty
SINTERSTORE would have deleted the dest key right away,
even when later on it is bound to fail on an (WRONGTYPE) error.
With this change it first picks up all the input keys, and only later
delete the dest key if one is empty.
Also add more tests for some commands.
Mainly focus on
- `wrong type error`:
expand test case (base on sinter bug) in non-store variant
add tests for store variant (although it exists in non-store variant, i think it would be better to have same tests)
- the dstkey result when we meet `non-exist key (empty set)` in *store
sdiff:
- improve test case about wrong type error (the one we found in sinter, although it is safe in sdiff)
- add test about using non-exist key (treat it like an empty set)
sdiffstore:
- according to sdiff test case, also add some tests about `wrong type error` and `non-exist key`
- the different is that in sdiffstore, we will consider the `dstkey` result
sunion/sunionstore add more tests (same as above)
sinter/sinterstore also same as above ...
The root cause is that one test (`5 keys in, 5 keys out`) is leaking a volatile key
that can expire while another later test(`All TTL in commands are propagated
as absolute timestamp in replication stream`) is running.
Such leaked expiration injects an unexpected `DEL` command into the
replication command during the later test, causing it to fail.
The fixes are two fold:
1. Plug the leak in the first test.
2. Add FLUSHALL to the later test, to avoid future interference from other tests.
This PR adds a spell checker CI action that will fail future PRs if they introduce typos and spelling mistakes.
This spell checker is based on blacklist of common spelling mistakes, so it will not catch everything,
but at least it is also unlikely to cause false positives.
Besides that, the PR also fixes many spelling mistakes and types, not all are a result of the spell checker we use.
Here's a summary of other changes:
1. Scanned the entire source code and fixes all sorts of typos and spelling mistakes (including missing or extra spaces).
2. Outdated function / variable / argument names in comments
3. Fix outdated keyspace masks error log when we check `config.notify-keyspace-events` in loadServerConfigFromString.
4. Trim the white space at the end of line in `module.c`. Check: https://github.com/redis/redis/pull/7751
5. Some outdated https link URLs.
6. Fix some outdated comment. Such as:
- In README: about the rdb, we used to said create a `thread`, change to `process`
- dbRandomKey function coment (about the dictGetRandomKey, change to dictGetFairRandomKey)
- notifyKeyspaceEvent fucntion comment (add type arg)
- Some others minor fix in comment (Most of them are incorrectly quoted by variable names)
7. Modified the error log so that users can easily distinguish between TCP and TLS in `changeBindAddr`
This commit revives the improves the ability to run the test suite against
external servers, instead of launching and managing `redis-server` processes as
part of the test fixture.
This capability existed in the past, using the `--host` and `--port` options.
However, it was quite limited and mostly useful when running a specific tests.
Attempting to run larger chunks of the test suite experienced many issues:
* Many tests depend on being able to start and control `redis-server` themselves,
and there's no clear distinction between external server compatible and other
tests.
* Cluster mode is not supported (resulting with `CROSSSLOT` errors).
This PR cleans up many things and makes it possible to run the entire test suite
against an external server. It also provides more fine grained controls to
handle cases where the external server supports a subset of the Redis commands,
limited number of databases, cluster mode, etc.
The tests directory now contains a `README.md` file that describes how this
works.
This commit also includes additional cleanups and fixes:
* Tests can now be tagged.
* Tag-based selection is now unified across `start_server`, `tags` and `test`.
* More information is provided about skipped or ignored tests.
* Repeated patterns in tests have been extracted to common procedures, both at a
global level and on a per-test file basis.
* Cleaned up some cases where test setup was based on a previous test executing
(a major anti-pattern that repeats itself in many places).
* Cleaned up some cases where test teardown was not part of a test (in the
future we should have dedicated teardown code that executes even when tests
fail).
* Fixed some tests that were flaky running on external servers.
Till now GET and NX were mutually exclusive.
This change make their combination mean a "Get or Set" command.
If the key exists it returns the old value and avoids setting,
and if it does't exist it returns nil and sets it to the new value (possibly with expiry time)
The decision to stop trimming due to LIMIT in XADD and XTRIM was after the limit was reached.
i.e. the code was deleting **at least** that count of records (from the LIMIT argument's perspective, not the MAXLEN),
instead of **up to** that count of records.
see #9046
running the "geo" unit would have shown that it completed a unit named
"north". this was because the variable `$name` was overwritten.
This commit isn't perfect, but it slightly reduces the chance for
variable name clash.
```
$ ./runtest --single unit/geo
.......
Testing unit/geo
.......
[1/1 done]: north (15 seconds)
```
The test that was merged yesterday fails with valgrind and freebsd CI
that are too slow, and 10 seconds apparently passed between the time the
command was sent to redis and the time it was actually executed.
```
*** [err]: All TTL in commands are propagated as absolute timestamp in replication stream in tests/unit/expire.tcl
Expected 'del a' to match 'set foo1 bar PXAT *' (context: type source line 778 file /home/runner/work/redis/redis/tests/test_helper.tcl cmd {assert_match [lindex $patterns $j] [read_from_replication_stream $s]} proc ::assert_replication_stream level 1)
```
Till now, on replica full-sync we used to transfer absolute time for TTL,
however when a command arrived (EXPIRE or EXPIREAT),
we used to propagate it as is to replicas (possibly with relative time),
but always translate it to EXPIREAT (absolute time) to AOF.
This commit changes that and will always use absolute time for propagation.
see discussion in #8433
Furthermore, we Introduce new commands: `EXPIRETIME/PEXPIRETIME`
that allow extracting the absolute TTL time from a key.
In diskless replication, we create a read pipe for the RDB, between the child and the parent.
When we close this pipe (fd), the read handler also needs to be removed from the event loop (if it still registered).
Otherwise, next time we will use the same fd, the registration will be fail (panic), because
we will use EPOLL_CTL_MOD (the fd still register in the event loop), on fd that already removed from epoll_ctl
When test stop 'load handler' by killing the process that generating the load,
some commands that already in the input buffer, still might be processed by the server.
This may cause some instability in tests, that count on that no more commands
processed after we stop the `load handler'
In this commit, new proc 'wait_load_handlers_disconnected' added, to verify that no more
cammands from any 'load handler' prossesed, by checking that the clients who
genreate the load is disconnceted.
Also, replacing check of dbsize with wait_for_ofs_sync before comparing debug digest, as
it would fail in case the last key the workload wrote was an overridden key (not a new one).
Affected tests
Race fix:
- failover command to specific replica works
- Connect multiple replicas at the same time (issue #141), master diskless=$mdl, replica diskless=$sdl
- AOF rewrite during write load: RDB preamble=$rdbpre
Cleanup and speedup:
- Test replication with blocking lists and sorted sets operations
- Test replication with parallel clients writing in different DBs
- Test replication partial resync: $descr (diskless: $mdl, $sdl, reconnect: $reconnect
I recently saw this failure:
[err]: lazy free a stream with all types of metadata in tests/unit/lazyfree.tcl
Expected '2' to be equal to '1' (context: type eval line 23 cmd {assert_equal [s lazyfreed_objects] 1} proc ::test)
The only explanation for such a thing is that the async flushdb wasn't
done before we did the resetstat
When client breached the output buffer soft limit but then went idle,
we didn't disconnect on soft limit timeout, now we do.
Note this also resolves some sporadic test failures in due to Linux
buffering data which caused tests to fail if during the test we went
back under the soft COB limit.
Co-authored-by: Oran Agra <oran@redislabs.com>
Co-authored-by: sundb <sundbcn@gmail.com>
Adding a new type mask for key space notification, REDISMODULE_NOTIFY_MODULE, to enable unique notifications from commands on REDISMODULE_KEYTYPE_MODULE type keys (which is currently unsupported).
Modules can subscribe to a module key keyspace notification by RM_SubscribeToKeyspaceEvents,
and clients by notify-keyspace-events of redis.conf or via the CONFIG SET, with the characters 'd' or 'A'
(REDISMODULE_NOTIFY_MODULE type mask is part of the '**A**ll' notation for key space notifications).
Refactor: move some pubsub test infra from pubsub.tcl to util.tcl to be re-used by other tests.
Before this commit using RM_Call without "!" could cause the master
to lazy-expire a key (delete it) but without replicating to replicas.
This could cause the replica's memory usage to gradually grow and
could also cause consistency issues if the master and replica have
a clock diff.
This bug was introduced in #8617
Added a test which demonstrates that scenario.
In the initial release of Redis 6.2 setting a user to only allow pubsub access to
a specific channel, and doing ACL SAVE, resulted in an assertion when
ACL LOAD was used. This was later changed by #8723 (not yet released),
but still not properly resolved (now it errors instead of crash).
The problem is that the server that generates an ACL file, doesn't know what
would be the setting of the acl-pubsub-default config in the server that will load it.
so ACL SAVE needs to always start with resetchannels directive.
This should still be compatible with old acl files (from redis 6.0), and ones from earlier
versions of 6.2 that didn't mess with channels.
Co-authored-by: Harkrishn Patro <harkrisp@amazon.com>
Co-authored-by: Oran Agra <oran@redislabs.com>
The tail size of c->reply is 16kb, but in the test only publish a
few chars each time, due to a change in #8699, the obuf limit
is now checked a new memory allocation is made, so this test
would have sometimes failed to trigger a soft limit disconnection
in time.
The solution is to write bigger payloads to the output buffer, but
still limit their rate (not more than 100k/s).
In github actions CI with valgrind, i saw that even the fast replica
(one that wasn't paused), didn't get to complete the replication fast
enough, and ended up getting disconnected by timeout.
Additionally, due to a typo in uname, we didn't get to actually run the
CPU efficiency part of the test.
1. the `dump_logs` option would have printed only logs of servers that were
spawn before the test proc started, and not ones that the test proc
started inside it.
2. when a server proc catches an exception it should normally forward the
exception upwards, specifically when it's an assertion that should be
caught by a test proc above. however, in `durable` mode, we caught all
exceptions printed them to stdout and let the code continue,
this was wrong to do for assertions, which should have still been
propagated to the test function.
3. don't bother to search for crash log to print if we printed the the
entire log anyway
4. if no crash log was found, no need to print anything (i.e. the fact it
wasn't found)
5. rename warnings_from_file to crashlog_from_file
Starting redis 6.0 (part of the TLS feature), diskless master uses pipe from the fork
child so that the parent is the one sending data to the replicas.
This mechanism has an issue in which a hung replica will cause the master to wait
for it to read the data sent to it forever, thus preventing the fork child from terminating
and preventing the creations of any other forks.
This PR adds a timeout mechanism, much like the ACK-based timeout,
we disconnect replicas that aren't reading the RDB file fast enough.
Disable replica migration to avoid a race condition where the
migrated-from node turns into a replica.
Long term, this test should probably be improved to handle multiple
slots and accept such auto migrations but this is a quick fix to
stabilize the CI without completely dropping this test.
Fix out of range error messages to be clearer (avoid mentioning 9223372036854775807)
* Fix XAUTOCLAIM COUNT option confusing error msg
* Fix other RPOP and alike error message to mention positive
With this fix, module data type registration will fail if the load or save callbacks are not defined, or the optional aux load and save callbacks are not either both defined or both missing.
This is work in progress, focusing on two main areas:
* Avoiding race conditions with cluster configuration propagation.
* Ignoring limitations with redis-cli --cluster fix which makes it hard
to distinguish real errors (e.g. failure to fix) from expected
conditions in this test (e.g. nodes not agreeing on configuration).
Background:
Redis 6.2 added ACL control for pubsub channels (#7993), which were supposed
to be permissive by default to retain compatibility with redis 6.0 ACL.
But due to a bug, only newly created users got this `acl-pubsub-default` applied,
while overwritten (updated) users got reset to `resetchannels` (denied).
Since the "default" user exists before loading the config file,
any ACL change to it, results in an update / overwrite.
So when a "default" user is loaded from config file or include ACL
file with no channels related rules, the user will not have any
permissions to any channels. But other users will have default
permissions to any channels.
When upgraded from 6.0 with config rewrite, this will lead to
"default" user channels permissions lost.
When users are loaded from include file, then call "acl load", users
will also lost channels permissions.
Similarly, the `reset` ACL rule, would have reset the user to be denied
access to any channels, ignoring `acl-pubsub-default` and breaking
compatibility with redis 6.0.
The implication of this fix is that it regains compatibility with redis 6.0,
but breaks compatibility with redis 6.2.0 and 2.0.1. e.g. after the upgrade,
the default user will regain access to pubsub channels.
Other changes:
Additionally this commit rename server.acl_pubusub_default to
server.acl_pubsub_default and fix typo in acl tests.
Previously (and by default after commit) when master loose its last slot
(due to migration, for example), its replicas will migrate to new last slot
holder.
There are cases where this is not desired:
* Consolidation that results with removed nodes (including the replica, eventually).
* Manually configured cluster topologies, which the admin wishes to preserve.
Needlessly migrating a replica triggers a full synchronization and can have a negative impact, so
we prefer to be able to avoid it where possible.
This commit adds 'cluster-allow-replica-migration' configuration option that is
enabled by default to preserve existed behavior. When disabled, replicas will
not be auto-migrated.
Fixes#4896
Co-authored-by: Oran Agra <oran@redislabs.com>
This command used to return the last scanned entry id as the cursor,
instead of the next one to be scanned.
so in the next call, the user could / should have sent `(cursor` and not
just `cursor` if he wanted to avoid scanning the same record twice.
Scanning the record twice would look odd if someone is checking what
exactly was scanned, but it also has a side effect of incrementing the
delivery count twice.