xtrimCommand call streamParseAddOrTrimArgsOrReply should use xadd==0.
When the syntax is valid, it does not cause any bugs because the params of XADD is superset of XTRIM.
Just XTRIM will not respond with error on invalid syntax. The syntax of XADD will also be accpeted by XTRIM.
Also the bug that currently does not cause memory leaks.
Because op->type = OBJ_ZSET, in zuiClearIterator, we will do nothing.
Just a cleanup that zuiInitIterator and zuiClearIterator should appear in pairs.
In clusterManagerCommandImport strcat was used to concat COPY and
REPLACE, the space maybe not enough.
If we use --cluster-replace but not --cluster-copy, the MIGRATE
command contained COPY instead of REPLACE.
An integer overflow bug in Redis version 6.0 or newer can be exploited using the
STRALGO LCS command to corrupt the heap and potentially result with remote code
execution. This is a result of an incomplete fix by CVE-2021-29477.
Without this fix, FLUSHALL ASYNC would have freed these in a background thread,
even if they didn't contain many elements (unlike how it works with other structures), which could be inefficient.
Make aof rewrite buffer memory size more accurate, before, there may be 20%
deviation with its real memory usage.
The implication are both lower memory usage, and also a more accurate INFO.
Till now, on replica full-sync we used to transfer absolute time for TTL,
however when a command arrived (EXPIRE or EXPIREAT),
we used to propagate it as is to replicas (possibly with relative time),
but always translate it to EXPIREAT (absolute time) to AOF.
This commit changes that and will always use absolute time for propagation.
see discussion in #8433
Furthermore, we Introduce new commands: `EXPIRETIME/PEXPIRETIME`
that allow extracting the absolute TTL time from a key.
In diskless replication, we create a read pipe for the RDB, between the child and the parent.
When we close this pipe (fd), the read handler also needs to be removed from the event loop (if it still registered).
Otherwise, next time we will use the same fd, the registration will be fail (panic), because
we will use EPOLL_CTL_MOD (the fd still register in the event loop), on fd that already removed from epoll_ctl
In theory we should have used zmalloc_usable_size, but it may be slower,
and now after #7875, there's no actual difference.
So this change isn't making a dramatic change.
Co-authored-by: Oran Agra <oran@redislabs.com>
when string2ll was made to replace isStringRepresentableAsLongLong
(which was similar to what rdbTryIntegerEncoding does),
rdbTryIntegerEncoding was probably forgotten.
These parameters of RedisModule_CreateCommand were previously
undocumented but they are needed for ACL to check permission on keys and
by Redis Cluster to figure our how to route the command.
Co-authored-by: Eduardo Felipe Castegnaro <edufelipe@onsign.tv>
Co-authored-by: Oran Agra <oran@redislabs.com>
As far as i can tell it shows up in redis-cli in both HELP, e.g.
`help client list`, and also in the command completion tips, but it is
unclear what it was needed for.
It exists since the very first commit that added this mechanism.
Currently in redis-cli only AUTH and ACL SETUSER bypass history
file. We add CONFIG SET masterauth/masteruser/requirepass,
HELLO with AUTH, MIGRATE with AUTH or AUTH2 to bypass history
file too.
The drawback is HELLO and MIGRATE's code is a mess. Someday if
we change these commands, we have to change here too.
There's an infinite loop when redis-cli fails to connect in cluster mode.
This commit adds a 1 second sleep to prevent flooding the console with errors.
It also adds a specific error print in a few places that could have error without printing anything.
Co-authored-by: Oran Agra <oran@redislabs.com>
Improve performance by avoiding inefficiencies in the parent process during AOFRW.
* AOF: record the latency of aofChildWriteDiffData
* AOF: avoid memmove in aofChildWriteDiffData
There are two bugs in redis-cli hints:
* The hints of commands with subcommands lack first params.
* When search matching command of currently input, we should find the
command with longest matching prefix. If not COMMAND INFO will always
match COMMAND and display no hints.
Check for errors in inet_ntop and snprintf rather than ignore them
and return success (with garbage output).
The check for ip_len == 0 seems like dead code, removed.
When estimating the effort for unlink, we try to compute the effort of
the first group and extrapolate.
If there's a groups rax that's empty, there'a an assertion.
reproduce:
xadd s * a b
xgroup create s bla $
xgroup destroy s bla
unlink s
when SELECT fails, we should reset dbnum to 0, so the prompt will not
display incorrectly.
Additionally when SELECT and HELLO fail, we output message to inform
it.
Add config.input_dbnum which means the dbnum about to select.
And config.dbnum means currently selected dbnum. When users succeed to
select db, config.dbnum and config.input_dbnum will be the same. When
users select db failed, config.input_dbnum will be kept. Next time if users
auth success, config.input_dbnum will be automatically selected.
When reconnect, we should select the origin dbnum.
Co-authored-by: Oran Agra <oran@redislabs.com>
Redis Enterprise supports the CONFIG GET command, but it replies with am
empty array since the save and appendonly configs are not supported.
before this fix redis-benchmark would segfault for trying to access the
error string on an array type reply.
see #8869
When client breached the output buffer soft limit but then went idle,
we didn't disconnect on soft limit timeout, now we do.
Note this also resolves some sporadic test failures in due to Linux
buffering data which caused tests to fail if during the test we went
back under the soft COB limit.
Co-authored-by: Oran Agra <oran@redislabs.com>
Co-authored-by: sundb <sundbcn@gmail.com>
An integer overflow bug in Redis version 6.0 or newer could be exploited using
the STRALGO LCS command to corrupt the heap and potentially result with remote
code execution.
An integer overflow bug in Redis 6.2 could be exploited to corrupt the heap and
potentially result with remote code execution.
The vulnerability involves changing the default set-max-intset-entries
configuration value, creating a large set key that consists of integer values
and using the COPY command to duplicate it.
The integer overflow bug exists in all versions of Redis starting with 2.6,
where it could result with a corrupted RDB or DUMP payload, but not exploited
through COPY (which did not exist before 6.2).
This fix is in dead code.
see redisOutOfMemoryHandler an allocation can't fail.
but maybe someone will copy this code to a different project
some day, better have this fixed
Co-authored-by: Oran Agra <oran@redislabs.com>
When redis-cli was used with both -c (cluster) and -s (unix socket),
it would have kept trying to use that unix socket, even if it got
redirected by the cluster (resulting in an infinite loop).
- Immediately exit on errors that are not related to topology updates.
- Deprecates the `-e` option ( retro compatible ) and warns that we now
exit immediately on errors that are not related to topology updates.
- Fixed wrongfully failing on config fetch error (warning only). This only affects RE.
Bottom line:
- MOVED and ASK errors will not show any warning (unlike the throttled error with `-e` before).
- CLUSTERDOWN still prints an error unconditionally and sleeps for 1 second.
- other errors are fatal.
This solves an issue reported in #8712 in which a replica would bypass
the client write pause check and cause an assertion due to executing a
write command during failover.
The fact is that we don't expect replicas to execute any command other
than maybe REPLCONF and PING, etc. but matching against the ADMIN
command flag is insufficient, so instead i just block keyspace access
for now.
Most of the ae.c backends didn't explicitly handle errors, and instead
ignored all errors and did an implicit retry.
This is desired for EAGAIN and EINTER, but in case of other systematic
errors, we prefer to fail and log the error we got rather than get into a busy loop.
After sorting, each item in picks is sorted according
to its index.
In the original code logic, we traverse from the first
element of ziplist until `zipindex == picks[pickindex].index`.
We may be able to start directly in `picks[0].index`,
this will bring small performance improvement.
Signed-off-by: ZheNing Hu <adlternative@gmail.com>
This change does not fix any bugs.
1. ```moduleUnload``` should return ```C_OK``` or ```C_ERR```, not ```REDISMODULE_ERR``` or ```REDISMODULE_OK```.
2. The ```where``` parameter of ```listTypePush``` and ```listTypePop``` should be ```LIST_HEAD``` or ```LIST_TAIL```
Modules event subscribers may get wrong things in notifyKeyspaceEvent callback,
such as wrong number of keys, or be able to lookup this key.
This commit changes the order to be like the one in evict.c.
Cleanup:
Since we know the key exists (it expires now), db*Delete is sure to return 1,
so there's no need to check it's output (misleading).
This scene is hard to happen. When first attempt some keys expired,
only kv position is updated not ov. Then socket err happens, second
attempt is taken. This time kv items may be mismatching with ov items.
Adding a new type mask for key space notification, REDISMODULE_NOTIFY_MODULE, to enable unique notifications from commands on REDISMODULE_KEYTYPE_MODULE type keys (which is currently unsupported).
Modules can subscribe to a module key keyspace notification by RM_SubscribeToKeyspaceEvents,
and clients by notify-keyspace-events of redis.conf or via the CONFIG SET, with the characters 'd' or 'A'
(REDISMODULE_NOTIFY_MODULE type mask is part of the '**A**ll' notation for key space notifications).
Refactor: move some pubsub test infra from pubsub.tcl to util.tcl to be re-used by other tests.
Before this commit using RM_Call without "!" could cause the master
to lazy-expire a key (delete it) but without replicating to replicas.
This could cause the replica's memory usage to gradually grow and
could also cause consistency issues if the master and replica have
a clock diff.
This bug was introduced in #8617
Added a test which demonstrates that scenario.
In the initial release of Redis 6.2 setting a user to only allow pubsub access to
a specific channel, and doing ACL SAVE, resulted in an assertion when
ACL LOAD was used. This was later changed by #8723 (not yet released),
but still not properly resolved (now it errors instead of crash).
The problem is that the server that generates an ACL file, doesn't know what
would be the setting of the acl-pubsub-default config in the server that will load it.
so ACL SAVE needs to always start with resetchannels directive.
This should still be compatible with old acl files (from redis 6.0), and ones from earlier
versions of 6.2 that didn't mess with channels.
Co-authored-by: Harkrishn Patro <harkrisp@amazon.com>
Co-authored-by: Oran Agra <oran@redislabs.com>
When replica never successfully connect to master, server.repl_down_since
will be initialized to 0, therefore, the info master_link_down_since_seconds
was showing the current unix timestamp, which does not make much sense.
This commit fixes the issue by showing master_link_down_since_seconds to -1.
means the replica never connect to master before.
This commit also resets this variable back to 0 when a replica is turned into
a master, so that it'll behave the same if the master is later turned into a
replica again.
The implication of this change is that if some app is checking if the value is > 60
do something, like conclude the replica is stale, this could case harm (changing
a big positive number with a small one).
Starting redis 6.0 (part of the TLS feature), diskless master uses pipe from the fork
child so that the parent is the one sending data to the replicas.
This mechanism has an issue in which a hung replica will cause the master to wait
for it to read the data sent to it forever, thus preventing the fork child from terminating
and preventing the creations of any other forks.
This PR adds a timeout mechanism, much like the ACK-based timeout,
we disconnect replicas that aren't reading the RDB file fast enough.
* Modules API docs: Link API function names to their definitions
Occurrences of API functions are linked to their definition.
A function index with links to all functions is added on the bottom
of the page.
Comment blocks in module.c starting with a markdown h2 heading are
used as sections. A table of contents is generated from these
headings.
The functions names are changed from h2 to h3, since they are now
rendered as sub-headings within each section.
Existing sections in module.c are used with some minor changes.
Some documentation text is added or sligtly modified.
The markdown renderer will add IDs which may clash with our
generated IDs. By prefixing section IDs with "section-" we make
them different.
Replace double dashes with a unicode long ndash
In a code example, using RedisModule_FreeString instead of
RedisModule_Free makes it behave correctly regardless of whether
automatic memory is used or not.
server.client_pause_end_time is uninitialized, or actually 0, at startup,
which means this method would think the timeout was reached
and go look for paused clients.
This causes no harm since unpauseClients will not find any paused clients.
The code used to decide on the next time to wake on a timer with
microsecond accuracy, but when deciding to go to sleep it used
milliseconds accuracy (with truncation), this means that it would wake
up too early, see that there's no timer to process, and go to sleep
again for 0ms again and again until the right microsecond arrived.
i.e. a timer for 100ms, would sleep for 99ms, but then do a busy loop
through the kernel in the last millisecond, triggering many calls to
beforeSleep.
The fix is to change all the logic in ae.c to work with microseconds,
which is good since most of the ae backends support micro (or even nano)
seconds. however the epoll backend, doesn't support micro, so to avoid
this problem it needs to round upwards, rather than truncate.
Issue created by the monotonic timer PR #7644 (redis 6.2)
Before that, all the timers in ae.c were in milliseconds (using
mstime), so when it requested the backend to sleep till the next timer
event, it would have worked ok.
The bio aof fsync fd may be closed by main thread (AOFRW done handler)
and even possibly reused for another socket, pipe, or file.
This can can an EBADF or EINVAL fsync error, which will lead to -MISCONF errors failing all writes.
We just ignore these errno because aof fsync did not really fail.
We handle errno when fsyncing aof in bio, so we could know the real reason
when users get -MISCONF Errors writing to the AOF file error
Issue created with #8419
Fix out of range error messages to be clearer (avoid mentioning 9223372036854775807)
* Fix XAUTOCLAIM COUNT option confusing error msg
* Fix other RPOP and alike error message to mention positive
With this fix, module data type registration will fail if the load or save callbacks are not defined, or the optional aux load and save callbacks are not either both defined or both missing.
Background:
Redis 6.2 added ACL control for pubsub channels (#7993), which were supposed
to be permissive by default to retain compatibility with redis 6.0 ACL.
But due to a bug, only newly created users got this `acl-pubsub-default` applied,
while overwritten (updated) users got reset to `resetchannels` (denied).
Since the "default" user exists before loading the config file,
any ACL change to it, results in an update / overwrite.
So when a "default" user is loaded from config file or include ACL
file with no channels related rules, the user will not have any
permissions to any channels. But other users will have default
permissions to any channels.
When upgraded from 6.0 with config rewrite, this will lead to
"default" user channels permissions lost.
When users are loaded from include file, then call "acl load", users
will also lost channels permissions.
Similarly, the `reset` ACL rule, would have reset the user to be denied
access to any channels, ignoring `acl-pubsub-default` and breaking
compatibility with redis 6.0.
The implication of this fix is that it regains compatibility with redis 6.0,
but breaks compatibility with redis 6.2.0 and 2.0.1. e.g. after the upgrade,
the default user will regain access to pubsub channels.
Other changes:
Additionally this commit rename server.acl_pubusub_default to
server.acl_pubsub_default and fix typo in acl tests.
Previously (and by default after commit) when master loose its last slot
(due to migration, for example), its replicas will migrate to new last slot
holder.
There are cases where this is not desired:
* Consolidation that results with removed nodes (including the replica, eventually).
* Manually configured cluster topologies, which the admin wishes to preserve.
Needlessly migrating a replica triggers a full synchronization and can have a negative impact, so
we prefer to be able to avoid it where possible.
This commit adds 'cluster-allow-replica-migration' configuration option that is
enabled by default to preserve existed behavior. When disabled, replicas will
not be auto-migrated.
Fixes#4896
Co-authored-by: Oran Agra <oran@redislabs.com>
In `aof.c`, we call fsync when stop aof, and now print a log to let user know that if fail.
In `cluster.c`, we now return error, the calling function already handles these write errors.
In `redis-cli.c`, users hope to save rdb, we now print a message if fsync failed.
In `rio.c`, we now treat fsync errors like we do for write errors.
In `server.c`, we try to fsync aof file when shutdown redis, we only can print one log if fail.
In `bio.c`, if failing to fsync aof file, we will set `aof_bio_fsync_status` to error , and reject writing just like last writing aof error, moreover also set INFO command field `aof_last_write_status` to error.
This command used to return the last scanned entry id as the cursor,
instead of the next one to be scanned.
so in the next call, the user could / should have sent `(cursor` and not
just `cursor` if he wanted to avoid scanning the same record twice.
Scanning the record twice would look odd if someone is checking what
exactly was scanned, but it also has a side effect of incrementing the
delivery count twice.
If GT/LT fails the operation we need to reply with
nill (like failure due to NX).
Other changes:
Add the missing $encoding suffix to many zset tests
Note: there's a behavior change just in case of INCR + GT/LT that fails.
The old code was replying with the wrong (rejected) score, and now it'll reply with nil.
Note that that's anyway a corner case so this "behavior change" shouldn't have too much affect.
Using GT/LT with INCR has a predictable result even before we run the command
(INCR GT will only only / always fail if the increment is negative).
The implications of this change is just that in the past when a config file was missing,
in some cases it was exiting before printing the sever startup prints and sometimes after,
and now it'll always exit before printing them.
There are 2 common range comparators for skiplist: zslValueGteMin and
zslValueLteMax, but they're not being reused in zslDeleteRangeByScore
This is a small change to make code cleaner.
this is a followup PR for #8699
instead of copying the deferred reply data to the previous node only if it has room for the entire thing,
we can now split the new payload, put part of it into the spare space in the prev node,
and the rest may fit into the next node.
The 'sentinel replicas <master>' command will ignore replicas with
`replica-announced` set to no.
The goal of disabling the config setting replica-announced is to allow ghost
replicas. The replica is in the cluster, synchronize with its master, can be
promoted to master and is not exposed to sentinel clients. This way, it is
acting as a live backup or living ghost.
In addition, to prevent the replica to be promoted as master, set
replica-priority to 0.
The cluster bus is established over TLS or non-TLS depending on the configuration tls-cluster. The client ports distributed in the cluster and sent to clients are assumed to be TLS or non-TLS also depending on tls-cluster.
The cluster bus is now extended to also contain the non-TLS port of clients in a TLS cluster, when available. The non-TLS port of a cluster node, when available, is sent to clients connected without TLS in responses to CLUSTER SLOTS, CLUSTER NODES, CLUSTER SLAVES and MOVED and ASK redirects, instead of the TLS port.
The user was able to override the client port by defining cluster-announce-port. Now cluster-announce-tls-port is added, so the user can define an alternative announce port for both TLS and non-TLS clients.
Fixes#8134
the bug was also discussed in #8716, and was solved in #8719, but incompletely:
when the server is started, and the save option is default, if you issue the " config set save "" "
to change the save option, and then issue the “config rewrite” command, the " save "" " won't be saved.
'processCommandAndResetClient' returns 1 if client is dead. It does it
by checking if serve.current_client is NULL. On script timeout, Redis will re-enter
'processCommandAndResetClient' and when finish we will set server.current_client
to NULL. This will cause later to falsely return 1 and think that the client that
sent the timed-out script is dead (Redis to stop reading from the client buffer).
In activeDefragSdsListAndDict when defrag list key sucess, the key
in dict is also replaced. Then the corresponding dictEntry is
defraged. But the dictEntry will be defraged in next step by defrag
the dict values too.
Add publish channel permissions check in processCommand.
processCommand didn't check publish channel permissions, so we can
queue a publish command in a transaction. But when exec the transaction,
it will fail with -NOPERM.
We also union keys/commands/channels permissions check togegher in
ACLCheckAllPerm. Remove pubsubCheckACLPermissionsOrReply in
publishCommand/subscribeCommand/psubscribeCommand. Always
check permissions in processCommand/execCommand/
luaRedisGenericCommand.
* SLOWLOG didn't record anything for blocked commands because the client
was reset and argv was already empty. there was a fix for this issue
specifically for modules, now it works for all blocked clients.
* The original command argv (before being re-written) was also reset
before adding the slowlog on behalf of the blocked command.
* Latency monitor is now updated regardless of the slowlog flags of the
command or its execution (their purpose is to hide sensitive info from
the slowlog, not hide the fact the latency happened).
* Latency monitor now uses real_cmd rather than c->cmd (which may be
different if the command got re-written, e.g. GEOADD)
Changes:
* Unify shared code between slowlog insertion in call() and
updateStatsOnUnblock(), hopefully prevent future bugs from happening
due to the later being overlooked.
* Reset CLIENT_PREVENT_LOGGING in resetClient rather than after command
processing.
* Add a test for SLOWLOG and BLPOP
Notes:
- real_cmd == c->lastcmd, except inside MULTI and Lua.
- blocked commands never happen in these cases (MULTI / Lua)
- real_cmd == c->cmd, except for when the command is rewritten (e.g.
GEOADD)
- blocked commands (currently) are never rewritten
- other than the command's CLIENT_PREVENT_LOGGING, and the
execution flag CLIENT_PREVENT_LOGGING, other cases that we want to
avoid slowlog are on AOF loading (specifically CMD_CALL_SLOWLOG will
be off when executed from execCommand that runs from an AOF)
the corrupt-dump-fuzzer test found a case where an access to a corrupt
stream would have caused accessing to uninitialized memory.
now it'll panic instead.
The issue was that there was a stream that says it has more than 0
records, but looking for the max ID came back empty handed.
p.s. when sanitize-dump-payload is used, this corruption is detected,
and the RESTORE command is gracefully rejected.
We sometimes see the crash report saying we were killed by a random
process even in cases where the crash was spontanius in redis.
for instance, crashes found by the corrupt-dump test.
It looks like this si_pid is sometimes left uninitialized, and a good
way to tell if the crash originated in redis or trigged by outside is to
look at si_code, real signal codes are always > 0, and ones generated by
kill are have si_code of 0 or below.