* Adding current_save_keys_total and current_save_keys_processed info fields.
Present in replication, BGSAVE and AOFRW.
* Changing RM_SendChildCOWInfo() to RM_SendChildHeartbeat(double progress)
* Adding new info field current_fork_perc. Present in Replication, BGSAVE, AOFRW,
and module forks.
The key point is how to recover from last AOF write error, for example:
1. start redis with appendonly yes, and append some write commands
2. short write or something else error happen, `server.aof_last_write_status` changed to `C_ERR`, now redis doesn't accept write commands
3. execute `CONFIG SET appendonly no` to avoid the above problem, now redis can accept write commands again
4. disk error resolved, and execute `CONFIG SET appendonly yes` to reopen AOF, but `server.aof_last_write_status` cannot be changed to `C_OK` (if background aof rewrite run less then 1 second, it will free `server.aof_buf` and then serverCron cannot fix `aof_last_write_status`), then redis cannot accept write commands forever.
This PR use a simple way to fix it:
1. just free `server.aof_buf` when stop appendonly to save memory, if error happens in `flushAppendOnlyFile(1)`, the `server.aof_buf` may contains some data which has not be written to aof, I think we can ignore it because we turn off the appendonly.
2. reset fsync status after stop appendonly and call `flushAppendOnlyFile` only when `aof_state` is ON
3. reset `server.last_write_status` when reopen aof to accept write commands
With AOF policy of fsync "always", redis should respect the contract with the user
that on acknowledged write data is already synced on disk.
Redis was already exiting for AOF write error, but don't care about fsync failure.
So to guarantee data safe, redis should exit for fsync error too (when the AOF fsync
policy is 'always').
This commit introduces two new command and two options for an existing command
GETEX <key> [PERSIST][EX seconds][PX milliseconds] [EXAT seconds-timestamp]
[PXAT milliseconds-timestamp]
The getexCommand() function implements extended options and variants of the GET
command. Unlike GET command this command is not read-only. Only one of the options
can be used at a given time.
1. PERSIST removes any TTL associated with the key.
2. EX Set expiry TTL in seconds.
3. PX Set expiry TTL in milliseconds.
4. EXAT Same like EX instead of specifying the number of seconds representing the
TTL (time to live), it takes an absolute Unix timestamp
5. PXAT Same like PX instead of specifying the number of milliseconds representing the
TTL (time to live), it takes an absolute Unix timestamp
Command would return either the bulk string, error or nil.
GETDEL <key>
Would delete the key after getting.
SET key value [NX] [XX] [KEEPTTL] [GET] [EX <seconds>] [PX <milliseconds>]
[EXAT <seconds-timestamp>][PXAT <milliseconds-timestamp>]
Two new options added here are EXAT and PXAT
Key implementation notes
- `SET` with `PX/EX/EXAT/PXAT` is always translated to `PXAT` in `AOF`. When relative time is
specified (`PX/EX`), replication will always use `PX`.
- `setexCommand` and `psetexCommand` would no longer need translation in `feedAppendOnlyFile`
as they are modified to invoke `setGenericCommand ` with appropriate flags which will take care of
correct AOF translation.
- `GETEX` without any optional argument behaves like `GET`.
- `GETEX` command is never propagated, It is either propagated as `PEXPIRE[AT], or PERSIST`.
- `GETDEL` command is propagated as `DEL`
- Combined the validation for `SET` and `GETEX` arguments.
- Test cases to validate AOF/Replication propagation
Add INFO field, rdb_active_cow_size, to report COW of a live fork child while
it's active.
- once in 1024 keys check the time, and if there's more than one second since
the last report send a report to the parent via the pipe.
- refactor the child_info_data struct, it's an implementation detail that
shouldn't be in the server struct, and not used to communicate data between
caller and callee
- remove the magic value from that struct (not sure what it was good for), and
instead add handling of short reads.
- add another value to the structure, cow_type, to indicate if the report is
for the new rdb_active_cow_size field, or it's the last report of a
successful operation
- add new Module API to report the active COW
- add more asserts variants to test.tcl
This is a refactory commit, isn't suppose to have any actual impact.
it does the following:
- keep just one server struct fork child pid variable instead of 3
- have one server struct variable indicating the purpose of the current fork
child.
- redisFork is now responsible of updating the server struct with the pid,
which means it can be the one that calls updateDictResizePolicy
- move child info pipe handling into redisFork instead of having them
repeated outside
- there are two classes of fork purposes, mutually exclusive group (AOF, RDB,
Module), and one that can create several forks to coexist in parallel (LDB,
but maybe Modules some day too, Module API allows for that).
- minor fix to killRDBChild:
unlike killAppendOnlyChild and TerminateModuleForkChild, the killRDBChild
doesn't clear the pid variable or call wait4, so checkChildrenDone does
the cleanup for it.
This commit removes the explicit calls to rdbRemoveTempFile, closeChildInfoPipe,
updateDictResizePolicy, which didn't do any harm, but where unnecessary.
* Allow runtest-moduleapi use a different 'make', for systems where GNU Make is 'gmake'.
* Fix issue with builds on Solaris re-building everything from scratch due to CFLAGS/LDFLAGS not stored.
* Fix compile failure on Solaris due to atomicvar and a bunch of warnings.
* Fix garbled log timestamps on Solaris.
Blocking command should not be used with MULTI, LUA, and RM_Call. This is because,
the caller, who executes the command in this context, expects a reply.
Today, LUA and MULTI have a special (and different) treatment to blocking commands:
LUA - Most commands are marked with no-script flag which are checked when executing
and command from LUA, commands that are not marked (like XREAD) verify that their
blocking mode is not used inside LUA (by checking the CLIENT_LUA client flag).
MULTI - Command that is going to block, first verify that the client is not inside
multi (by checking the CLIENT_MULTI client flag). If the client is inside multi, they
return a result which is a match to the empty key with no timeout (for example blpop
inside MULTI will act as lpop)
For modules that perform RM_Call with blocking command, the returned results type is
REDISMODULE_REPLY_UNKNOWN and the caller can not really know what happened.
Disadvantages of the current state are:
No unified approach, LUA, MULTI, and RM_Call, each has a different treatment
Module can not safely execute blocking command (and get reply or error).
Though It is true that modules are not like LUA or MULTI and should be smarter not
to execute blocking commands on RM_Call, sometimes you want to execute a command base
on client input (for example if you create a module that provides a new scripting
language like javascript or python).
While modules (on modules command) can check for REDISMODULE_CTX_FLAGS_LUA or
REDISMODULE_CTX_FLAGS_MULTI to know not to block the client, there is no way to
check if the command came from another module using RM_Call. So there is no way
for a module to know not to block another module RM_Call execution.
This commit adds a way to unify the treatment for blocking clients by introducing
a new CLIENT_DENY_BLOCKING client flag. On LUA, MULTI, and RM_Call the new flag
turned on to signify that the client should not be blocked. A blocking command
verifies that the flag is turned off before blocking. If a blocking command sees
that the CLIENT_DENY_BLOCKING flag is on, it's not blocking and return results
which are matches to empty key with no timeout (as MULTI does today).
The new flag is checked on the following commands:
List blocking commands: BLPOP, BRPOP, BRPOPLPUSH, BLMOVE,
Zset blocking commands: BZPOPMIN, BZPOPMAX
Stream blocking commands: XREAD, XREADGROUP
SUBSCRIBE, PSUBSCRIBE, MONITOR
In addition, the new flag is turned on inside the AOF client, we do not want to
block the AOF client to prevent deadlocks and commands ordering issues (and there
is also an existing assert in the code that verifies it).
To keep backward compatibility on LUA, all the no-script flags on existing commands
were kept untouched. In addition, a LUA special treatment on XREAD and XREADGROUP was kept.
To keep backward compatibility on MULTI (which today allows SUBSCRIBE, and PSUBSCRIBE).
We added a special treatment on those commands to allow executing them on MULTI.
The only backward compatibility issue that this PR introduces is that now MONITOR
is not allowed inside MULTI.
Tests were added to verify blocking commands are not blocking the client on LUA, MULTI,
or RM_Call. Tests were added to verify the module can check for CLIENT_DENY_BLOCKING flag.
Co-authored-by: Oran Agra <oran@redislabs.com>
Co-authored-by: Itamar Haber <itamar@redislabs.com>
In redisFork(), we don't set child pid, so updateDictResizePolicy()
doesn't take effect, that isn't friendly for copy-on-write.
The bug was introduced this in redis 6.0: 56258c6
Useful when you want to know through which bind address the client connected to
the server in case of multiple bind addresses.
- Adding `laddr` field to CLIENT list showing the local (bind) address.
- Adding `LADDR` option to CLIENT KILL to kill all the clients connected
to a specific local address.
- Refactoring to share code.
minor fix for a bug which happen on error handling code
and doesn't look like it could have caused any real harm
(fd number wouldn't have been reused yet)
If we fail or stop to rewrite aof, we need to remove temporary aof.
We also remove temporary rdb when replicas abort to receive rdb.
But currently we delete them in main thread, to avoid blocking,
we should use bg_unlink to remove them in a background thread.
Btw, we have already used this way to removed child process temporary rdb.
track and report memory used by clients argv.
this is very usaful in case clients started sending a command and didn't
complete it. in which case the first args of the command are already
trimmed from the query buffer.
in an effort to avoid cache misses and overheads while keeping track of
these, i avoid calling sdsZmallocSize and instead use the sdslen /
bulk-len which can at least give some insight into the problem.
This memory is now added to the total clients memory usage, as well as
the client list.
XREADGROUP auto-creates the consumer inside the consumer group the
first time it saw it.
When XREADGROUP is being used with NOACK option, the message will not
be added into the client's PEL and XGROUP SETID would be propagated.
When the replica gets the XGROUP SETID it will only update the last delivered
id of the group, but will not create the consumer.
So, in this commit XGROUP CREATECONSUMER is being added.
Command pattern: XGROUP CREATECONSUMER <key> <group> <consumer>.
When NOACK option is being used, createconsumer command would be
propagated as well.
In case of AOFREWRITE, consumer with an empty PEL would be saved with
XGROUP CREATECONSUMER whereas consumer with pending entries would be
saved with XCLAIM
This commit adds streamIteratorStop call in rewriteStreamObject function in some of the return statement. Although currently this will not cause memory leak since stream id is only 16 bytes long.
During long running scripts or loading RDB/AOF, we may need to do some
defragging. Since processEventsWhileBlocked is called periodically at
unknown intervals, and many cron jobs either depend on run_with_period
(including active defrag), or rely on being called at server.hz rate
(i.e. active defrag knows ho much time to run by looking at server.hz),
the whileBlockedCron may have to run a loop triggering the cron jobs in it
(currently only active defrag) several times.
Other changes:
- Adding a test for defrag during aof loading.
- Changing key-load-delay config to take negative values for fractions
of a microsecond sleep
During a long AOF or RDB loading, the memory stats were not updated, and
INFO would return stale data, specifically about fragmentation and RSS.
In the past some of these were sampled directly inside the INFO command,
but were moved to cron as an optimization.
This commit introduces a concept of loadingCron which should take
some of the responsibilities of serverCron.
It attempts to limit it's rate to approximately the server Hz, but may
not be very accurate.
In order to avoid too many system call, we use the cached ustime, and
also make sure to update it in both AOF loading and RDB loading inside
processEventsWhileBlocked (it seems AOF loading was missing it).
Since the dynamic allocations in raxIterator are only used for deep walks, memory
leak due to missing call to raxStop can only happen for rax with key names longer
than 32 bytes.
Out of all the missing calls, the only ones that may lead to a leak are the rax
for consumer groups and consumers, and these were only in AOFRW and rdbSave, which
normally only happen in fork or at shutdown.
Currently, there are several types of threads/child processes of a
redis server. Sometimes we need deeply optimise the performance of
redis, so we would like to isolate threads/processes.
There were some discussion about cpu affinity cases in the issue:
https://github.com/antirez/redis/issues/2863
So implement cpu affinity setting by redis.conf in this patch, then
we can config server_cpulist/bio_cpulist/aof_rewrite_cpulist/
bgsave_cpulist by cpu list.
Examples of cpulist in redis.conf:
server_cpulist 0-7:2 means cpu affinity 0,2,4,6
bio_cpulist 1,3 means cpu affinity 1,3
aof_rewrite_cpulist 8-11 means cpu affinity 8,9,10,11
bgsave_cpulist 1,10-11 means cpu affinity 1,10,11
Test on linux/freebsd, both work fine.
Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
Makse sure call() doesn't wrap replicated commands with
a redundant MULTI/EXEC
Other, unrelated changes:
1. Formatting compiler warning in INFO CLIENTS
2. Use CLIENT_ID_AOF instead of UINT64_MAX
the AOF will be loaded successfully, but the stream will be missing,
i.e inconsistencies with the original db.
this was because XADD with id of 0-0 would error.
add a test to reproduce.
althouh in theory, users can do BGREWRITEAOF even if aof is disabled, i
suppose it is more common that the scheduled flag is set by either
startAppendOnly, of a failed initial AOFRW fork (AOF_WAIT_REWRITE)
* replication hooks: role change, master link status, replica online/offline
* persistence hooks: saving, loading, loading progress
* misc hooks: cron loop, shutdown, module loaded/unloaded
* change the way hooks test work, and add tests for all of the above
startLoading() now gets flag indicating what is loaded.
stopLoading() now gets an indication of success or failure.
adding startSaving() and stopSaving() with similar args and role.
misc:
- handle SSL_has_pending by iterating though these in beforeSleep, and setting timeout of 0 to aeProcessEvents
- fix issue with epoll signaling EPOLLHUP and EPOLLERR only to the write handlers. (needed to detect the rdb pipe was closed)
- add key-load-delay config for testing
- trim connShutdown which is no longer needed
- rioFdsetWrite -> rioFdWrite - simplified since there's no longer need to write to multiple FDs
- don't detect rdb child exited (don't call wait3) until we detect the pipe is closed
- Cleanup bad optimization from rio.c, add another one
* Introduce a connection abstraction layer for all socket operations and
integrate it across the code base.
* Provide an optional TLS connections implementation based on OpenSSL.
* Pull a newer version of hiredis with TLS support.
* Tests, redis-cli updates for TLS support.
* create module API for forking child processes.
* refactor duplicate code around creating and tracking forks by AOF and RDB.
* child processes listen to SIGUSR1 and dies exitFromChild in order to
eliminate a valgrind warning of unhandled signal.
* note that BGSAVE error reply has changed.
valgrind error is:
Process terminating with default action of signal 10 (SIGUSR1)