1. Disk error and slave count checks didn't flag the transactions or count them correctly in command stats (regression from #10372, 7.0 RC3)
2. RM_Call will reply the same way Redis does, in case of a non-existing command or an arity error
3. RM_WrongArity will consider the full command name
4. Use lowercase 'u' in "unknown subcommand" (to align with "unknown command")
Follow-up work of #10127
This case is interesting because it originates from cron,
rather than from another command.
The idea came from looking at #9890 and #10573, and I was wondering if RM_Call
would work properly when `server.current_client == NULL`
It was first added in #9890 to solve the problem of
CONFIG SET maxmemory causing eviction inside MULTI/EXEC,
but that problem is already fixed (CONFIG SET doesn't evict
directly, it just schedules a later eviction).
Keeping that condition may hide bugs (i.e. performEvictions
should always expect to have an empty server.also_propagate).
* Fix timing issue in slowlog redact test
This test failed once in my daily CI (test-sanitizer-address (clang))
```
*** [err]: SLOWLOG - Some commands can redact sensitive fields in tests/unit/slowlog.tcl
Expected 'migrate 127.0.0.1 25649 key 9 5000 AUTH2 (redacted) (redacted)' to match '* key 9 5000 AUTH (redacted)' (context: type eval line 12 cmd {assert_match {* key 9 5000 AUTH (redacted)} [lindex [lindex [r slowlog get] 1] 3]} proc ::test)
```
The reason is that with slowlog-log-slower-than set to 10000,
SLOWLOG GET itself has a chance to exceed 10ms and get logged.
Change slowlog-log-slower-than from 10000 to -1, i.e. disable it.
Also handle a similar, potentially problematic test above.
This is actually the same timing issue as #10432,
but here we also avoid repeated calls to `SLOWLOG GET`.
In #7491 (part of Redis 6.2), we started using the monotonic timer instead of mstime to measure
command execution time for stats. Apparently this meant sampling the clock 3 times per command
rather than twice (since we also need the wall-clock time).
In some cases this causes significant overhead.
This PR fixes that by avoiding the use of the monotonic timer, except in the cases where we know it
should be extremely fast.
This PR also adds a new INFO field called `monotonic_clock` that shows which clock redis is using.
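For illustration, here is a minimal standalone sketch (not Redis's actual code) of why the extra samples matter: timing a single command requires one wall-clock sample for the start time plus two monotonic samples for the duration, i.e. three clock reads per command.

```c
/* Minimal sketch: one wall-clock sample for the start time, two monotonic
 * samples for the duration. Not Redis's actual implementation. */
#include <stdint.h>
#include <stdio.h>
#include <time.h>

static uint64_t monotonic_us(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000 + ts.tv_nsec / 1000;
}

static uint64_t wallclock_ms(void) {
    struct timespec ts;
    clock_gettime(CLOCK_REALTIME, &ts);
    return (uint64_t)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

int main(void) {
    uint64_t start_ms = wallclock_ms();   /* one wall-clock sample */
    uint64_t t0 = monotonic_us();         /* two extra samples just for the duration */
    /* ... execute the command ... */
    uint64_t duration_us = monotonic_us() - t0;
    printf("started at %llu ms, took %llu us\n",
           (unsigned long long)start_ms, (unsigned long long)duration_us);
    return 0;
}
```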
Co-authored-by: Oran Agra <oran@redislabs.com>
This PR unifies all the places that test if the current client is the
master client or the AOF client, and uses a single helper method for that
check in all of them.
Other than some refactoring, these are the actual implications:
- Replicas **don't** ignore disk error when processing commands not
coming from their master.
**This is important so that PING can be used for health checks of replicas**
- SETRANGE, APPEND, SETBIT, BITFIELD don't do proto_max_bulk_len check for AOF
- RM_Call in SCRIPT_MODE ignores disk error when coming from master /
AOF
- RM_Call in cluster mode ignores slot check when processing AOF
- Scripts ignore disk error when processing AOF
- Scripts **don't** ignore disk error on a replica, if the command comes
from clients other than the master
- SCRIPT KILL won't kill script coming from AOF
- Scripts **don't** skip OOM check on replica if the command comes from
clients other than the master
Note that Script, AOF, and module clients don't reach processCommand,
which is why some of the changes don't actually have any implications.
Note: the change done to processCommand in 2f4240b9d9 is reverted,
since it should be dead code due to the fact mentioned above.
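As an illustration, a minimal sketch of what such a unifying helper can look like (the helper name is hypothetical; `CLIENT_ID_AOF` and `CLIENT_MASTER` are the existing identifiers in server.h, and this fragment only compiles inside the Redis tree):

```c
/* Hypothetical sketch of a single predicate for "commands from this client
 * must be obeyed", i.e. checks like disk-error, OOM and cluster slot
 * ownership are skipped for it. */
#include "server.h"

int clientMustBeObeyed(client *c) {
    /* Commands replayed from the AOF file or arriving on the replication
     * link from our master are never rejected. */
    return c->id == CLIENT_ID_AOF || (c->flags & CLIENT_MASTER);
}
```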
`hdr_value_at_percentile()` is part of the Hdr_Histogram library
used when generating `latencystats` report.
There's a pending optimization for this function which greatly
affects the performance of `info latencystats`.
https://github.com/HdrHistogram/HdrHistogram_c/pull/107
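For reference, a small standalone example of the hot call (the functions below are the upstream HdrHistogram_c API; the include path may differ depending on how the sources are vendored under _deps_):

```c
#include <stdio.h>
#include "hdr_histogram.h"   /* from _deps/hdr_histogram */

int main(void) {
    struct hdr_histogram *h = NULL;
    /* Track latencies between 1us and 1 hour with 3 significant digits. */
    if (hdr_init(1, 3600LL * 1000 * 1000, 3, &h) != 0) return 1;
    for (long long us = 1; us <= 100000; us++)
        hdr_record_value(h, us);
    /* hdr_value_at_percentile() is the hot call when INFO latencystats
     * reports p50/p99/p999 for every command. */
    printf("p50=%lld p99=%lld p99.9=%lld\n",
           (long long)hdr_value_at_percentile(h, 50.0),
           (long long)hdr_value_at_percentile(h, 99.0),
           (long long)hdr_value_at_percentile(h, 99.9));
    hdr_close(h);
    return 0;
}
```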
This PR:
1. Upgrades the sources in _deps/hdr_histogram_ to the latest Hdr_Histogram
version 0.11.5
2. Applies the referenced optimization.
3. Adds minor documentation about the hdr_histogram dependency which was
missing under _deps/README.md_.
benchmark on my machine:
running: `redis-benchmark -n 100000 info latencystats` on a clean build with no data.
| benchmark | RPS |
| ---- | ---- |
| before upgrade to v0.11.5 | 7,681 |
| before optimization | 12,474 |
| after optimization | 52,606 |
Co-authored-by: filipe oliveira <filipecosta.90@gmail.com>
Add a configuration option to attach an operating system-specific identifier to Redis sockets, supporting advanced network configurations using iptables (Linux) or ipfw (FreeBSD).
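A minimal sketch of the underlying OS primitive the feature relies on (the Redis config name is omitted here; on Linux, setting the mark typically requires CAP_NET_ADMIN):

```c
/* Sketch: tag a socket so that iptables (Linux) or ipfw (FreeBSD) rules
 * can match traffic from it. */
#include <stdio.h>
#include <sys/socket.h>
#include <sys/types.h>

static int set_socket_mark(int fd, unsigned int mark_id) {
#if defined(SO_MARK)                 /* Linux: match with iptables/nftables mark rules */
    return setsockopt(fd, SOL_SOCKET, SO_MARK, &mark_id, sizeof(mark_id));
#elif defined(SO_USER_COOKIE)        /* FreeBSD: match the cookie in ipfw rules */
    return setsockopt(fd, SOL_SOCKET, SO_USER_COOKIE, &mark_id, sizeof(mark_id));
#else
    (void)fd; (void)mark_id;
    return -1;                       /* unsupported platform */
#endif
}

int main(void) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (set_socket_mark(fd, 42) != 0)
        perror("set_socket_mark");
    return 0;
}
```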
Changes:
1. Check the failed-rewrite time threshold only when we actually consider triggering a rewrite,
i.e. this should be the last condition tested, since the test has side effects (increasing the time threshold).
This could have happened in some rare scenarios.
2. No limit in the startup state (e.g. after restarting a redis that previously failed and accumulated many incr files).
3. The "triggered the limit" log is recorded only when the limit status is returned.
4. Remove the failure count from the log (could be misleading in some cases).
Co-authored-by: chenyang8094 <chenyang8094@users.noreply.github.com>
Co-authored-by: Oran Agra <oran@redislabs.com>
RM_Yield was missing a call to protectClient to prevent redis from
processing future commands of the yielding client.
Adding tests that fail without this fix.
This would be complicated to solve since nested calls to RM_Call used to
replace the current_client variable with the module temp client.
It looks like it's no longer necessary to do that, since it was added
back in #9890 to solve two issues, both already gone:
1. call to CONFIG SET maxmemory could trigger a module hook calling
RM_Call. although this specific issue is gone, arguably other hooks
like keyspace notification, can do the same.
2. an assertion in lookupKey that checks the current command of the
current client, introduced in #9572 and removed in #10248
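For context, a minimal module sketch of the kind of long-running command RM_Yield is meant for (illustrative only; the module and command names are made up, but RedisModule_Yield and its flags are the real module API):

```c
/* Illustrative module: a long loop that periodically yields so Redis can
 * serve other events. With the fix, the yielding client itself is
 * protected from processing further commands while this one runs. */
#include "redismodule.h"

int SlowWork_RedisCommand(RedisModuleCtx *ctx, RedisModuleString **argv, int argc) {
    REDISMODULE_NOT_USED(argv);
    REDISMODULE_NOT_USED(argc);
    for (long long i = 0; i < 100000000; i++) {
        if (i % 1000000 == 0) {
            /* Let the event loop run; with REDISMODULE_YIELD_FLAG_CLIENTS
             * other clients' commands are also served, otherwise they get
             * a busy reply with the given message. */
            RedisModule_Yield(ctx, REDISMODULE_YIELD_FLAG_CLIENTS, "slow work in progress");
        }
    }
    return RedisModule_ReplyWithSimpleString(ctx, "OK");
}

int RedisModule_OnLoad(RedisModuleCtx *ctx, RedisModuleString **argv, int argc) {
    REDISMODULE_NOT_USED(argv);
    REDISMODULE_NOT_USED(argc);
    if (RedisModule_Init(ctx, "slowwork", 1, REDISMODULE_APIVER_1) == REDISMODULE_ERR)
        return REDISMODULE_ERR;
    if (RedisModule_CreateCommand(ctx, "slowwork.run", SlowWork_RedisCommand,
                                  "", 0, 0, 0) == REDISMODULE_ERR)
        return REDISMODULE_ERR;
    return REDISMODULE_OK;
}
```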
There is an implicit conversion warning in clang:
```
util.c:574:23: error: implicit conversion from 'long long' to 'double'
changes value from -4611686018427387903 to -4611686018427387904
[-Werror,-Wimplicit-const-int-float-conversion]
if (d < -LLONG_MAX/2 || d > LLONG_MAX/2)
```
introduced in #10486
Co-authored-by: sundb <sundbcn@gmail.com>
When the score doesn't have a fractional part and can be stored as an integer,
we use the integer capabilities of listpack to store it, rather than converting it to a string.
This already existed before this PR (lpInsert does that conversion implicitly).
But to do that, we would have first converted the score from double to string (calling `d2string`),
then passed the string to `lpAppend`, which identified it as an integer and converted it back to an int.
Now, instead of converting it to a string, we store it using `lpAppendInteger`.
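Roughly, the new append path looks like the following sketch (the wrapper function name is made up; `double2ll`, `d2string`, `lpAppend` and `lpAppendInteger` are the helpers referenced above, and the fragment only compiles inside the Redis tree):

```c
/* Simplified sketch: append a zset score to a listpack, storing it as an
 * integer when it has no fractional part. */
#include "server.h"

static unsigned char *zslAppendScoreToListpack(unsigned char *lp, double score) {
    long long lscore;
    if (double2ll(score, &lscore)) {
        /* Integral score: skip the double -> string -> integer round trip. */
        return lpAppendInteger(lp, lscore);
    } else {
        char buf[128]; /* plenty for any double formatted by d2string() */
        int len = d2string(buf, sizeof(buf), score);
        return lpAppend(lp, (unsigned char *)buf, len);
    }
}
```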
Unrelated:
---
* Fix the double2ll range check (the negative and positive ranges, and also the comparison operands,
were slightly off; the range could also be made much larger, see the comment).
* Unify the double to string conversion code in rdb.c with the one in util.c
* Small optimization in lpStringToInt64, don't attempt to convert strings that are obviously too long.
Benchmark:
---
Up to 20% improvement in certain tight loops doing zzlInsert with large integers.
(if the listpack is pre-allocated to avoid realloc, and insertion is sorted from largest to smallest)
Since PUBLISH and SPUBLISH use different dictionaries for channels and clients,
and we already have an API for PUBLISH, it only makes sense to have one for SPUBLISH.
Also add test coverage and unify some test infrastructure.
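A sketch of a module command using the new API, assuming it mirrors RedisModule_PublishMessage and is named RedisModule_PublishMessageShard (check module.c for the exact name and return value):

```c
/* Illustrative module: publish a message to a shard channel and reply
 * with the number of receivers. */
#include "redismodule.h"

int ShardPub_RedisCommand(RedisModuleCtx *ctx, RedisModuleString **argv, int argc) {
    if (argc != 3) return RedisModule_WrongArity(ctx);
    /* argv[1] = shard channel, argv[2] = message */
    int receivers = RedisModule_PublishMessageShard(ctx, argv[1], argv[2]);
    return RedisModule_ReplyWithLongLong(ctx, receivers);
}

int RedisModule_OnLoad(RedisModuleCtx *ctx, RedisModuleString **argv, int argc) {
    REDISMODULE_NOT_USED(argv);
    REDISMODULE_NOT_USED(argc);
    if (RedisModule_Init(ctx, "shardpub", 1, REDISMODULE_APIVER_1) == REDISMODULE_ERR)
        return REDISMODULE_ERR;
    if (RedisModule_CreateCommand(ctx, "shardpub.send", ShardPub_RedisCommand,
                                  "pubsub", 0, 0, 0) == REDISMODULE_ERR)
        return REDISMODULE_ERR;
    return REDISMODULE_OK;
}
```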
The tests verify that loading a binary payload to the Lua interpreter raises an error.
The Lua code modification was done here: fdf9d45509,
which forces the Lua interpreter to always use the text parser.
Since #9166, we call prepareReplicasToWrite several times when propagating
one write command to the replication stream (once per argument, same as we do for
normal clients), which is not necessary. Now we only call it once per command,
at the beginning of feeding the replication stream.
This results in reducing CPU consumption and slightly better performance,
specifically when there are many replicas.
Add APIs to allow modules to compute the memory consumption of opaque objects owned by redis.
Without these, the mem_usage callbacks of module data types are useless in many cases.
Other changes:
Fix streamRadixTreeMemoryUsage to include the size of the rax structure itself
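For illustration, a sketch of the kind of mem_usage callback these APIs help with (RedisModule_MallocSize is shown here; the exact set of newly added helpers for opaque Redis objects may be named differently):

```c
/* Sketch: a mem_usage callback for a module data type that owns a heap
 * buffer. RedisModule_MallocSize() reports the allocator's real size of a
 * pointer allocated with RedisModule_Alloc(); the new helpers extend the
 * same idea to opaque Redis objects (strings, dicts, ...). */
#include "redismodule.h"

typedef struct MyType {
    char *payload;        /* allocated with RedisModule_Alloc() */
    size_t payload_len;
} MyType;

size_t MyTypeMemUsage(const void *value) {
    const MyType *o = value;
    size_t bytes = sizeof(*o);
    if (o->payload) bytes += RedisModule_MallocSize(o->payload);
    return bytes;
}
```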
By the error reply convention, there is supposed to be a space between the error code and the message.
While looking at some Lua stuff I noticed that interpreter errors were not adding the space,
so some clients will treat the whole detailed error message as the error code.
We have tests that hit this condition, but they were just checking that the string "starts" with ERR.
I updated some other tests with similar incorrect string checking. This isn't complete though, as
there are other ways we check for ERR I didn't fix.
Produces some fun output like:
```
# Errorstats
errorstat_ERR:count=1
errorstat_ERRuser_script_1_:count=1
```
This PR fixes the following minor errors before the Redis 7 release:
The ZRANGEBYLEX command is deprecated since 6.2.0 and can be replaced by ZRANGE with the
BYLEX argument, but the documentation incorrectly says "by ZRANGE with the BYSCORE argument".
Fix an incorrect comment in the zpopmaxCommand function.
The comments of the zmpopCommand and bzmpopCommand functions are not consistent with the documented description; fix them.
Co-authored-by: Ubuntu <lucas.guang.yang1@huawei.com>
We can observe that when adding to a stream without an explicit ID there is duplicate work
on sds creation/freeing/sdslen that costs ~11% of the CPU cycles.
This PR avoids it by not freeing the sds after the first reply.
The expected reduction in CPU cycles is around 9-10%.
Additionally, we now pre-allocate the sds to the right size, to avoid realloc;
this brought another ~10% improvement.
Co-authored-by: Oran Agra <oran@redislabs.com>
Add an optional keyspace event when new keys are added to the db.
This is useful for applications where clients need to be aware of the redis keyspace.
Such an application can SCAN once at startup and then listen for "new" events (plus
others associated with DEL, RENAME, etc).
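A sketch of how a module could consume such an event, assuming the flag exposed to modules for the new event class is named REDISMODULE_NOTIFY_NEW (verify the exact name in redismodule.h):

```c
/* Illustrative module: log every newly created key. */
#include "redismodule.h"

int OnNewKey(RedisModuleCtx *ctx, int type, const char *event, RedisModuleString *key) {
    REDISMODULE_NOT_USED(type);
    REDISMODULE_NOT_USED(event);
    RedisModule_Log(ctx, "notice", "new key created: %s",
                    RedisModule_StringPtrLen(key, NULL));
    return REDISMODULE_OK;
}

int RedisModule_OnLoad(RedisModuleCtx *ctx, RedisModuleString **argv, int argc) {
    REDISMODULE_NOT_USED(argv);
    REDISMODULE_NOT_USED(argc);
    if (RedisModule_Init(ctx, "newkeys", 1, REDISMODULE_APIVER_1) == REDISMODULE_ERR)
        return REDISMODULE_ERR;
    /* REDISMODULE_NOTIFY_NEW is assumed to be the module-side flag for the
     * new "key added" event class. */
    return RedisModule_SubscribeToKeyspaceEvents(ctx, REDISMODULE_NOTIFY_NEW, OnNewKey);
}
```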
Apparently, some modules can afford deprecating command arguments
(something that was never done in Redis, AFAIK), so in order to represent
this piece of information, we added the `deprecated_since` field to redisCommandArg
(in symmetry to the already existing `since` field).
This commit adds `const char *deprecated_since` to `RedisModuleCommandArg`,
which is technically a breaking change, but since 7.0 has not been released yet, we decided to let it slide.
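A sketch of how a module might declare a deprecated argument via the command-info API (field names follow redismodule.h; the command and values are illustrative, not taken from any real module):

```c
#include "redismodule.h"

int MyCmd_RedisCommand(RedisModuleCtx *ctx, RedisModuleString **argv, int argc) {
    REDISMODULE_NOT_USED(argv);
    REDISMODULE_NOT_USED(argc);
    return RedisModule_ReplyWithSimpleString(ctx, "OK");
}

int RedisModule_OnLoad(RedisModuleCtx *ctx, RedisModuleString **argv, int argc) {
    REDISMODULE_NOT_USED(argv);
    REDISMODULE_NOT_USED(argc);
    if (RedisModule_Init(ctx, "mymod", 1, REDISMODULE_APIVER_1) == REDISMODULE_ERR)
        return REDISMODULE_ERR;
    if (RedisModule_CreateCommand(ctx, "mymod.cmd", MyCmd_RedisCommand,
                                  "readonly", 0, 0, 0) == REDISMODULE_ERR)
        return REDISMODULE_ERR;

    RedisModuleCommand *cmd = RedisModule_GetCommand(ctx, "mymod.cmd");
    RedisModuleCommandArg args[] = {
        {.name = "legacy-arg",
         .type = REDISMODULE_ARG_TYPE_STRING,
         .since = "1.0.0",
         .deprecated_since = "2.0.0"},   /* the newly added field */
        {0}                              /* terminator */
    };
    RedisModuleCommandInfo info = {
        .version = REDISMODULE_COMMAND_INFO_VERSION,
        .summary = "Example command with a deprecated argument",
        .since = "1.0.0",
        .args = args,
    };
    if (RedisModule_SetCommandInfo(cmd, &info) == REDISMODULE_ERR)
        return REDISMODULE_ERR;
    return REDISMODULE_OK;
}
```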
Allow specifying an ACL log reason, which is shown in the log. Right now it always shows "unknown", which is a little bit cryptic. This is a breaking change, but this API was added as part of 7 so it seems ok to stabilize it still.
* Extending the use of hashTypeGetValue.
Functions hashTypeExists, hashTypeGetValueLength and addHashFieldToReply
have a similar pattern of calling hashTypeGetFromHashTable or
hashTypeGetFromZipList depending on the underlying data structure. What
those functions are doing is exactly what hashTypeGetValue does. They
were changed to use the existing function hashTypeGetValue, making
the code more consistent.
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
Database durability is a big and old topic. In this regard Redis uses AOF to
support it, and the `appendfsync=always` policy is the strictest level, guaranteeing
that all data is both written and synced to disk before replying success to the client.
But some cases have been overlooked and could lead to broken durability.
1. The clearest one is about threaded-io mode:
we should also set the client's write handler with `ae_barrier` in
`handleClientsWithPendingWritesUsingThreads`, or the write handler would be
called after the read handler in the next event loop, which means the result of a write
command could be replied to the client before it is flushed to the AOF.
2. About blocked clients (mostly blocked by modules):
in `beforeSleep()`, `handleClientsBlockedOnKeys()` should be called before
`flushAppendOnlyFile()`, otherwise the unblocked clients could modify data and send a reply
without that data being persisted.
3. When handling `ProcessingEventsWhileBlocked`:
normally this takes place when a lua/function/module call times out and we give users
a chance to kill the slow operation, but we should call `flushAppendOnlyFile()` before
`handleClientsWithPendingWrites()`, otherwise the other clients from the last event loop
could get their acknowledgment before the data is persisted (see the ordering sketch after this list).
For instance:
```
in the same event loop
client A executes set foo bar
client B executes eval "for var=1,10000000,1 do end" 0
```
After the script times out, client A will get `OK` but will lose the data after a restart (if redis is killed
during the timeout) unless we flush the write command to the AOF.
4. A more complex case of `ProcessingEventsWhileBlocked`:
a lua timeout inside a transaction, for example
`MULTI; set foo bar; eval "for var=1,10000000,1 do end" 0; EXEC`. The client will get the SET
command's result before the whole transaction is done, which breaks atomicity too.
Fortunately, it's already fixed by #5428 (although that wasn't its original purpose, just a side
effect : )), but the module timeout case should be fixed too.
Cases 1, 2 and 3 are fixed in this commit; the module issue in case 4 needs a follow-up PR.
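To summarize the ordering fix, here is a heavily simplified sketch of the relevant part of beforeSleep() after this commit (the wrapper function name is made up; the called functions are the real ones, see server.c for the complete flow):

```c
/* Abridged sketch of the corrected ordering; compiles only inside the
 * Redis tree. */
#include "server.h"

void beforeSleepOrderingSketch(void) {
    /* 1. Unblock clients (e.g. blocked on keys by modules) first, so any
     *    writes they perform are included in the AOF flush below. */
    handleClientsBlockedOnKeys();

    /* 2. Flush the AOF buffer. With appendfsync=always this also fsyncs,
     *    so data is durable before any reply leaves the server. */
    flushAppendOnlyFile(0);

    /* 3. Only now write pending replies to clients. In threaded-io mode
     *    the write handler is installed with AE_BARRIER so it cannot run
     *    before the flush in the next event loop iteration. */
    handleClientsWithPendingWritesUsingThreads();
}
```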
Add a field to the COMMAND DOCS response to denote the name of the module
that added that command.
COMMAND LIST can filter by module, but if you get the full command list,
you may still want to know which command belongs to which module.
The alternative would be to do MODULE LIST, and then multiple calls to COMMAND LIST
The `auto-aof-rewrite-percentage` config defines at what growth percentage
an automatic AOF rewrite is triggered.
This normally works OK since the size of the AOF file at the end of a rewrite
is stored in `server.aof_rewrite_base_size`.
However, on startup, redis used to store the entire size of the AOF file into that
variable, resulting in a wrong automatic AOF rewrite trigger (could have been
triggered much later than desired).
This issue would only affect the first AOFRW after startup; after that, future AOFRWs
would be triggered correctly.
This bug existed in all previous versions of Redis.
This PR unifies the meaning of `server.aof_rewrite_base_size`, which only represents
the size of BASE AOF.
Note that after an AOFRW this size includes the size of the incremental file (all the
commands that executed during rewrite), so that auto-aof-rewrite-percentage is the
ratio from the size of the AOF after rewrite.
However, on startup it is complicated to know that size, and we compromised on
taking just the size of the base file. This means that the first rewrite after startup can
happen a little bit too soon.
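For clarity, a small standalone sketch of the trigger condition described above (parameter names mirror the config names; this is an illustration, not the exact serverCron code):

```c
/* Returns 1 if an automatic AOF rewrite should be triggered. */
int shouldTriggerAutoAofRewrite(long long aof_current_size,
                                long long aof_rewrite_base_size,
                                int auto_aof_rewrite_percentage,
                                long long auto_aof_rewrite_min_size) {
    if (!auto_aof_rewrite_percentage) return 0;
    if (aof_current_size < auto_aof_rewrite_min_size) return 0;
    /* Base size is the size of the BASE AOF after the last rewrite (or, on
     * startup, the size of the base file that was loaded). */
    long long base = aof_rewrite_base_size ? aof_rewrite_base_size : 1;
    long long growth = (aof_current_size * 100 / base) - 100;
    return growth >= auto_aof_rewrite_percentage;
}
```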
Co-authored-by: Oran Agra <oran@redislabs.com>
Co-authored-by: yoav-steinberg <yoav@redislabs.com>
The bug was when using REDISMODULE_YIELD_FLAG_CLIENTS:
in that case we would have only set the CLIENTS type flag in
server.busy_module_yield_flags and then cleared that flag when exiting
RM_Yield, so we would never call unblockPostponedClients when the
context is destroyed.
This didn't really have any actual implication, which is why the tests
couldn't (and still can't) catch it: the bug only happens when
using CLIENTS, but in that case we won't have any clients to un-postpone,
i.e. clients get rejected with a BUSY error rather than being
postponed.
Unrelated:
* Adding tests for nested contexts, just in case.
* Avoid nested RM_Yield calls
Fixes to command arguments in the json files
* Fixes BITFIELD's syntax ("sub-commands" can be repeated, and OVERFLOW is only valid for SET and INCR)
* Improves readability of SET (reordered)
* Fixes GEOSEARCH and GEOSEARCH_RO syntaxes (use `oneof` for mutually exclusive group instead of `optional`)
* Fixes MIGRATE syntax (use `oneof` for mutually exclusive group instead of `optional`)
* Fixes MODULE LOADEX syntax (the `CONFIG` token should be repeated too when using multiple configs)
other:
* make generate-command-help.rb accept a path to commands.json, or read it from stdin (e.g. `generate-commands-json.py | generate-command-help.rb -`)
Fixed a bug where using the `hincrbyfloat` or `hincrby` commands to make the field or value exceed
`hash_max_listpack_value` did not change the object encoding of the hash structure.
Add a length check for the field and value: check the length of the value first, and if it does not
exceed `hash_max_listpack_value`, then check the length of the field.
If the field or value is too long it reduces the efficiency of the listpack, and the object encoding
would become hashtable after an AOF restart anyway, so this also keeps the encoding the same before and after an AOF restart.
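An illustrative fragment of the added check (the helper name is made up; hashTypeConvert, the encoding constants and the config field are the existing Redis internals, and this only compiles inside the Redis tree):

```c
/* Sketch: after HINCRBY/HINCRBYFLOAT computes the new value, convert the
 * hash to a hashtable if either the field or the new value no longer fits
 * the listpack limits. */
#include "server.h"

static void hashMaybeConvertAfterIncr(robj *o, sds field, sds newval) {
    if (o->encoding != OBJ_ENCODING_LISTPACK) return;
    /* Check the (usually longer) value first, then the field. */
    if (sdslen(newval) > server.hash_max_listpack_value ||
        sdslen(field) > server.hash_max_listpack_value)
    {
        hashTypeConvert(o, OBJ_ENCODING_HT);
    }
}
```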
Sentinel once in a while experiences a TILT period or a leader election
failure cycle. The problem is that the default timeouts are too big, and once
it happens, it breaks our tests. Suggesting:
- Reducing failover-timeout from 20 to 10sec (actually it is multiplied by 2
and reaches 40sec of timeout)
- Modifying tilt-period from the default of 30sec to 5sec. When a TILT period happens
it might lead to a failover in our tests, and might also cause a failover
cycle failure.
Sentinel tests should `wait_for_condition` up to 50 seconds, where needed,
to be stable in case of a single TILT period or failover failure cycle.
In addition, relax the timing configuration for the "manual failover" Sentinel test
(it was modified several months ago as part of an effort to reduce test runtime).
The command json documents should just include information about the "arguments" and the "outputs".
I removed all of the 'functional wording' so it's clear.
## Move library meta data to be part of the library payload.
Following the discussion on https://github.com/redis/redis/issues/10429 and the intention to add (in the future) library versioning support, we believe that the entire library metadata (like name and engine) should be part of the library payload and not provided by the `FUNCTION LOAD` command. The reasoning behind this is that the programmer who developed the library should be the one who sets those values (name, engine, and in the future also version). **It is not the responsibility of the admin who loads the library into the database.**
The PR moves all the library metadata (engine and function name) to be part of the library payload. The metadata needs to be provided on the first line of the payload using the shebang format (`#!<engine> name=<name>`), example:
```lua
#!lua name=test
redis.register_function('foo', function() return 1 end)
```
The above script will run on the Lua engine and will create a library called `test`.
## API Changes (compared to 7.0 rc2)
* `FUNCTION LOAD` command was changed: it now simply gets the library payload and extracts the engine and name from it. In addition, the command now returns the library name, which can later be used with `FUNCTION DELETE` and `FUNCTION LIST`.
* The description field was completely removed from `FUNCTION LOAD` and `FUNCTION LIST`.
## Breaking Changes (compared to 7.0 rc2)
* Library description was removed (we can re-add it in the future either as part of the shebang line or an additional line).
* Loading an AOF file that was generated by either 7.0 rc1 or 7.0 rc2 will fail because the old command syntax is invalid.
## Notes
* Loading an RDB file that was generated by rc1 / rc2 **is** supported; Redis will automatically add the shebang to the library payloads (we can probably delete that code after 7.0.3 or so, since there's no need to keep supporting upgrades from an RC build).