So the error message `ERR only (P)SUBSCRIBE / (P)UNSUBSCRIBE / PING / QUIT allowed in this context` will become
`ERR 'get' command submitted, but only (P)SUBSCRIBE / (P)UNSUBSCRIBE / PING / QUIT allowed in this context`
Since the refactoring of config.c, it was initialized from config_hz in initServer,
but apparently that's too late, since loading the config file creates objects
which call LRU_CLOCK.
This message has been there for ten years, but is hardly useful.
Moreover, it is likely to fill an entire disk if log rotation
is not configured, for no good reason.
Changes in behavior:
- Change server.stream_node_max_entries from int64_t to long long, so that it can be used by the generic infra
- standard error reply instead of "repl-backlog-size must be 1 or greater" and such
- tls-port and a few TLS booleans were readable (config get) even when USE_OPENSSL was off (now they aren't)
- syslog-enabled, syslog-ident, cluster-enabled, appendfilename, and supervised didn't have a get (now they do)
- pidfile was initialized to NULL in InitServerConfig but rewriteConfig used CONFIG_DEFAULT_PID_FILE (so the real default was "", but a config rewrite would cause it to be set); fixed the rewrite.
- TLS config in server.h was uninitialized (if no tls config args were provided)
Adding tests for sanity and coverage
- add capability for each config to have a callback to check if a value is valid and return an error string (see the sketch after this list)
will enable converting many of the remaining custom configs into generic ones (reducing the x4 repetition for set,get,config,rewrite)
- add capability for each config to run some update code after the config is changed (only for CONFIG SET)
will also enable converting many of the remaining custom configs into generic ones
- add capability to move default values from server.h and server.c to config.c
will reduce many excess lines in server.h and server.c (plus, no need to rebuild the entire code base when a default changes 8-))
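A minimal sketch of what such a generic config entry might look like (the struct and callback names are illustrative assumptions, not necessarily the exact config.c layout):

```c
/* Illustrative sketch of a generic config entry; the exact struct and
 * field names in config.c are assumptions. */
typedef struct boolConfigData {
    int *config;          /* pointer to the server.* field */
    int default_value;    /* default lives here instead of server.h */
    /* return 1 if 'val' is acceptable, else 0 and set *err */
    int (*is_valid_fn)(int val, char **err);
    /* run side effects after CONFIG SET changed the value */
    int (*update_fn)(int val, int prev, char **err);
} boolConfigData;

/* e.g. the jemalloc-bg-thread config could hook its runtime effect: */
static int updateJemallocBgThread(int val, int prev, char **err) {
    (void) prev; (void) err;
    set_jemalloc_bg_thread(val);  /* apply the new value immediately */
    return 1;
}
```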
other behavior changes:
- fix bug in bool config get (always returning 'yes')
- fix a bug in modifying jemalloc-bg-thread at runtime (didn't call set_jemalloc_bg_thread, due to bad merge conflict resolution (my fault))
- as a side effect, a failed attempt to enable activedefrag at runtime now responds with -ERR and not with -DISABLED
A random command like SPOP with a count is replicated as
multiple SREM operations, which are stored in the also_propagate
array to be propagated after the call, but this would break
atomicity.
To keep the command's atomicity, wrap the also_propagate
array with MULTI/EXEC.
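A hedged sketch of how that wrapping could look inside call() (propagate() and the also_propagate/redisOp names appear in the real code; the surrounding details are simplified):

```c
/* Sketch: after the command ran, flush the also_propagate array; when
 * it holds more than one op, emit MULTI ... EXEC around it so replicas
 * and the AOF apply the generated SREMs atomically. */
if (server.also_propagate.numops) {
    int wrap = server.also_propagate.numops > 1;
    robj *multi = createStringObject("MULTI", 5);
    robj *exec = createStringObject("EXEC", 4);

    if (wrap)
        propagate(server.multiCommand, c->db->id, &multi, 1,
                  PROPAGATE_AOF|PROPAGATE_REPL);
    for (int j = 0; j < server.also_propagate.numops; j++) {
        redisOp *rop = &server.also_propagate.ops[j];
        propagate(rop->cmd, rop->dbid, rop->argv, rop->argc, rop->target);
    }
    if (wrap)
        propagate(server.execCommand, c->db->id, &exec, 1,
                  PROPAGATE_AOF|PROPAGATE_REPL);

    decrRefCount(multi);
    decrRefCount(exec);
}
```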
Is it sufficient...? -- Yes it is. In standalone mode, we say READY=1
at the point this comment refers to; in replicated mode, however, we delay
sending READY=1 until the replication sync completes.
This adds Makefile/build-system support for USE_SYSTEMD=(yes|no|*). This
variable's value determines whether or not libsystemd will be linked at
build-time.
If USE_SYSTEMD is set to "yes", make will use PKG_CONFIG to check for
libsystemd's presence, and fail the build early if it isn't
installed/detected properly.
If USE_SYSTEMD is set to "no", libsystemd will *not* be linked, even if
support for it is available on the system redis is being built on.
For any other value that USE_SYSTEMD might assume (e.g. "auto"),
PKG_CONFIG will try to determine libsystemd's presence, and set up the
build process to link against it, if it was indicated as being
installed/available.
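On the C side this presumably reduces to a compile-time guard; a sketch, where HAVE_LIBSYSTEMD and serverNotifySystemd are assumed names:

```c
/* Sketch: compile the systemd calls out entirely when the build system
 * did not detect and link libsystemd. */
#ifdef HAVE_LIBSYSTEMD
#include <systemd/sd-daemon.h>
#define serverNotifySystemd(msg) sd_notify(0, (msg))
#else
#define serverNotifySystemd(msg) ((void) 0)
#endif
```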
This approach has a number of repercussions of its own, most importantly
the following: If you build redis on a system that actually has systemd
support, but no libsystemd-dev package(s) installed, you'll end up
*without* support for systemd notification/status reporting in
redis-server. This changes established runtime behaviour.
I'm not sure if the build system and/or the server binary should
indicate this. I'm also wondering if not actually having
systemd-notify-support, but requesting it via the server's config,
should result in a fatal error now.
Instead of replicating a subset of libsystemd's sd_notify(3) internally,
use the dynamic library provided by systemd to communicate with the
service manager.
When systemd supervision was auto-detected or configured, communicate
the actual server status (i.e. "Loading dataset", "Waiting for
master<->replica sync") to systemd, instead of declaring readiness right
after initializing the server process.
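For illustration, the kind of calls this implies (sd_notify(3) is the real libsystemd API; the exact call sites are paraphrased):

```c
#include <systemd/sd-daemon.h>

/* Paraphrased call sites: report real state transitions instead of
 * READY=1 right after initialization. */
static void notifyLoading(void) {
    sd_notify(0, "STATUS=Loading dataset\n");
}

static void notifySyncing(void) {
    sd_notify(0, "STATUS=Waiting for MASTER <-> REPLICA sync\n");
}

static void notifyReady(void) {
    sd_notify(0, "READY=1\n");  /* only once we can serve traffic */
}
```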
Reduce the default minimum effort, so that when fragmentation is just
detected, the impact on latency will be minor.
Reduce the default maximum effort, mainly to prevent a case where sudden
massive deletions trigger an aggressive defrag that causes latency.
When activedefrag is disabled mid-run, reset the 'running' info field and
clear the scan cursor, so that when it is re-enabled, a fresh scan will
start.
Clearing the 'running' variable is important since lowering the defragger
tunables mid-scan won't help: the defragger only considers new thresholds
when a new scan starts, and during a scan it can only become more aggressive
(when more severe fragmentation is detected); it never becomes less
aggressive. So by temporarily disabling activedefrag, one can lower the
tunables.
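A sketch of why lowering tunables mid-scan doesn't help, assuming a computeDefragCycles()-style function (config field names approximate the real tunables):

```c
/* Sketch: map current fragmentation onto a CPU effort between the
 * configured min and max; names approximate the real tunables. */
static void computeDefragCycles(void) {
    float frag_pct = getAllocatorFragmentation();  /* assumed helper */
    if (frag_pct < server.active_defrag_threshold_lower) return;

    /* Linear interpolation between the min and max effort. */
    float range = server.active_defrag_threshold_upper -
                  server.active_defrag_threshold_lower;
    float t = (frag_pct - server.active_defrag_threshold_lower) / range;
    if (t > 1) t = 1;
    int cpu_pct = server.active_defrag_cycle_min +
        (int)(t * (server.active_defrag_cycle_max -
                   server.active_defrag_cycle_min));

    /* During a scan, effort only ever goes up; a lowered tunable is
     * picked up only when a fresh scan starts with running == 0,
     * which is exactly what disabling activedefrag now resets. */
    if (cpu_pct > server.active_defrag_running)
        server.active_defrag_running = cpu_pct;
}
```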
Removing the experimental warning.
One problem with the solution proposed so far in #6537 is that key
lookups outside a command execution via call() still used a cached
time. The cached time needed to be refreshed in multiple places,
especially because of module callbacks from timers, cluster bus, and
thread safe contexts that may use RM_Open().
In order to avoid this problem, this commit introduces the ability to
detect if we are inside call(): this way we can use the reference fixed
time only when we are in the context of a command execution or Lua
script, but for the asynchronous lookups, we can still use mstime() to
get a fresh time reference.
After the thread in #6537 and thanks to the suggestions received, this
commit updates the original patch in order to:
1. Solve the problem of updating the time in multiple places by updating
it in call().
2. Avoid introducing a new field but use our cached time.
This required some minor refactoring of the function updating the time,
and the introduction of a new cached time in microseconds, in order to
make fewer gettimeofday() calls.
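A sketch of the resulting shape (updateCachedTime(), call() and the fixed_time_expire marker follow the description above; details are elided):

```c
/* Sketch: one gettimeofday() per refresh; the ms and s views are
 * derived from the microsecond cache. */
void updateCachedTime(void) {
    server.ustime = ustime();               /* microseconds */
    server.mstime = server.ustime / 1000;   /* milliseconds */
    server.unixtime = server.mstime / 1000; /* seconds */
}

/* call() refreshes the cache once per command, so every key lookup
 * performed while executing the command sees the same fixed time. */
void call(client *c, int flags) {
    updateCachedTime();
    server.fixed_time_expire++;   /* "we are inside call()" marker */
    /* ... execute c->cmd and handle propagation ... */
    server.fixed_time_expire--;
}
```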
Calling lookupKey*() multiple times to search for a key within one
command may return different results.
That's because lookupKey*() calls expireIfNeeded(), which deletes
the key once its expire time is reached. So we can get an robj before
the expire time, but NULL after it.
Worse, this may lead to a Redis crash. For example, with
`RPOPLPUSH foo foo` we first get a list from `foo` and
hold the pointer, but when we look up `foo` again it's expired and
deleted. Now we hold freed memory, and when rpoplpushHandlePush()
executes, Redis crashes.
To fix it, we can refactor the judgment of whether a key is expired,
using the same base time `server.cmd_start_mstime` instead of calling
mstime() every time.
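A sketch of the refactored expire judgment, combining the proposed `server.cmd_start_mstime` snapshot with the inside-call() marker from the previous commit (exact field usage is an assumption):

```c
/* Sketch: decide expiry against a per-command snapshot so that every
 * lookupKey*() call within one command agrees on "now". */
int keyIsExpired(redisDb *db, robj *key) {
    mstime_t when = getExpire(db, key);
    if (when < 0) return 0;          /* no TTL set on this key */
    if (server.loading) return 0;    /* never expire while loading */

    /* Inside command execution use the fixed snapshot; asynchronous
     * lookups (module timers, cluster bus, ...) take a fresh mstime(). */
    mstime_t now = server.fixed_time_expire ? server.cmd_start_mstime
                                            : mstime();
    return now > when;
}
```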
- Add RM_GetServerInfo and friends (usage sketch below)
- Add auto memory for new opaque struct
- Add tests for new APIs
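A hedged usage sketch from a module's point of view (assuming the RedisModule_GetServerInfo / RedisModule_ServerInfoGetFieldC / RedisModule_FreeServerInfo spellings of the new APIs):

```c
/* Sketch: read one field of the INFO "server" section via the new API;
 * with auto memory, 'info' would also be reclaimed at ctx teardown. */
int InfoExample_RedisCommand(RedisModuleCtx *ctx, RedisModuleString **argv,
                             int argc) {
    REDISMODULE_NOT_USED(argv);
    REDISMODULE_NOT_USED(argc);

    RedisModuleServerInfoData *info = RedisModule_GetServerInfo(ctx, "server");
    const char *ver = RedisModule_ServerInfoGetFieldC(info, "redis_version");
    RedisModule_ReplyWithSimpleString(ctx, ver ? ver : "unknown");
    RedisModule_FreeServerInfo(ctx, info);
    return REDISMODULE_OK;
}
```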
other minor fixes:
- add const in various char pointers
- requested_section in modulesCollectInfo was actually not sds but char*
- extract new string2d out of getDoubleFromObject for code reuse
Add module API (see the module-side sketch below) for
* replication hooks: role change, master link status, replica online/offline
* persistence hooks: saving, loading, loading progress
* misc hooks: cron loop, shutdown, module loaded/unloaded
* change the way hooks test work, and add tests for all of the above
startLoading() now gets a flag indicating what is being loaded.
stopLoading() now gets an indication of success or failure.
Adding startSaving() and stopSaving() with similar args and role.
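From the module side, these hooks are presumably consumed via the server-events API; a sketch (the event and callback names are assumptions based on the hook list above):

```c
/* Sketch: a module subscribing to role-change events in OnLoad. */
static void roleChangedCallback(RedisModuleCtx *ctx, RedisModuleEvent e,
                                uint64_t sub, void *data) {
    REDISMODULE_NOT_USED(e);
    REDISMODULE_NOT_USED(data);
    RedisModule_Log(ctx, "notice",
                    "replication role changed (sub-event %llu)",
                    (unsigned long long) sub);
}

int RedisModule_OnLoad(RedisModuleCtx *ctx, RedisModuleString **argv,
                       int argc) {
    REDISMODULE_NOT_USED(argv);
    REDISMODULE_NOT_USED(argc);
    if (RedisModule_Init(ctx, "hooksdemo", 1, REDISMODULE_APIVER_1)
        == REDISMODULE_ERR) return REDISMODULE_ERR;
    RedisModule_SubscribeToServerEvent(ctx,
        RedisModuleEvent_ReplicationRoleChanged, roleChangedCallback);
    return REDISMODULE_OK;
}
```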
As we know, if a module exports module-side data types,
unloading it is not allowed. The same rule applies to
blocked clients in a module: because we use background
threads to implement module blocked clients, it's
not safe to unload a module while background
threads are running. So it's necessary to check whether any
blocked clients are running in a module before unloading it.
Moreover, this lets us ensure that if there are no modules,
there are no module blocked clients, even after a module was unloaded.
So we can call moduleHandleBlockedClients only when
we have modules installed.
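A sketch of the resulting check in moduleUnload (the errno convention and exact field names are assumptions):

```c
/* Sketch: refuse to unload while the module still owns resources a
 * background thread may touch; exported types already worked this way. */
int moduleUnload(sds name) {
    struct RedisModule *module = dictFetchValue(modules, name);

    if (module == NULL) {
        errno = ENOENT;
        return REDISMODULE_ERR;
    }
    if (listLength(module->types)) {    /* exports data types */
        errno = EBUSY;
        return REDISMODULE_ERR;
    }
    if (module->blocked_clients) {      /* background threads running */
        errno = EAGAIN;
        return REDISMODULE_ERR;
    }
    /* ... unregister commands, dlclose(), free the module ... */
    return REDISMODULE_OK;
}
```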
This is what happened:
1. Instance starts, is a slave in the cluster configuration, but
actually server.masterhost is not set, so technically the instance
is acting like a master.
2. loadDataFromDisk() calls replicationCacheMasterUsingMyself() even if
the instance is a master, in the case where it is logically a slave and
cluster mode is enabled. So now we have a cached master even though the
instance is practically configured as a master (from the POV of the
server.masterhost value and so forth).
3. clusterCron() sees that the instance requires to replicate from its
master, because logically it is a slave, so it calls
replicationSetMaster() that will in turn call
replicationCacheMasterUsingMyself(): before this commit, this call would
overwrite the old cached master, creating a memory leak.
misc:
- handle SSL_has_pending by iterating through these connections in beforeSleep, and setting a timeout of 0 to aeProcessEvents
- fix issue with epoll signaling EPOLLHUP and EPOLLERR only to the write handlers. (needed to detect the rdb pipe was closed)
- add key-load-delay config for testing
- trim connShutdown which is no longer needed
- rioFdsetWrite -> rioFdWrite - simplified since there's no longer need to write to multiple FDs
- don't detect rdb child exited (don't call wait3) until we detect the pipe is closed
- Cleanup bad optimization from rio.c, add another one
* Introduce a connection abstraction layer for all socket operations and
integrate it across the code base (sketched after this list).
* Provide an optional TLS connections implementation based on OpenSSL.
* Pull a newer version of hiredis with TLS support.
* Tests, redis-cli updates for TLS support.
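The abstraction is essentially a per-connection vtable; a condensed sketch (fields trimmed for brevity, not the full connection.h interface):

```c
#include <stddef.h>

/* Condensed sketch: callers only ever see connection*, and plain TCP
 * vs TLS differ only in which ConnectionType vtable is installed. */
typedef struct connection connection;
typedef void (*ConnectionCallbackFunc)(connection *conn);

typedef struct ConnectionType {
    int  (*connect)(connection *conn, const char *addr, int port,
                    const char *src_addr, ConnectionCallbackFunc cb);
    int  (*write)(connection *conn, const void *data, size_t len);
    int  (*read)(connection *conn, void *buf, size_t len);
    void (*close)(connection *conn);
    int  (*set_read_handler)(connection *conn, ConnectionCallbackFunc fn);
    int  (*set_write_handler)(connection *conn, ConnectionCallbackFunc fn);
} ConnectionType;

struct connection {
    ConnectionType *type;   /* e.g. CT_Socket or CT_TLS */
    int state;
    int fd;
    void *private_data;     /* e.g. the SSL object for TLS connections */
};

/* Call sites stay transport-agnostic: */
static inline int connWrite(connection *conn, const void *data, size_t len) {
    return conn->type->write(conn, data, len);
}
```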
cluster.c - stack buffer memory alignment
The pointer 'buf' is cast to a more strictly aligned pointer type
evict.c - lazyfree_lazy_eviction, lazyfree_lazy_eviction always called
defrag.c - bug in dead code
server.c - casting was missing parenthesis
rax.c - indentation / newline suggested an 'else if' was intended
It seems that since I added the creation of the jemalloc thread, redis
sometimes fails to start with the following error:
Inconsistency detected by ld.so: dl-tls.c: 493: _dl_allocate_tls_init: Assertion `listp->slotinfo[cnt].gen <= GL(dl_tls_generation)' failed!
This seems to be due to a race bug in ld.so, in which TLS creation on the
thread collides with dlopen.
Move the creation of the BIO and jemalloc threads to after modules are loaded.
Plus a small bugfix when trying to disable the jemalloc thread at runtime.
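A sketch of the resulting startup order (function names follow the description; the surrounding main() is simplified):

```c
/* Sketch: create background threads only after dlopen() of modules,
 * avoiding the ld.so TLS-allocation race described above. */
int main(int argc, char **argv) {
    /* ... argument and config parsing ... */
    initServer();            /* no background threads created here */
    moduleLoadFromQueue();   /* dlopen() all configured modules */
    InitServerLast();        /* now spawn the BIO + jemalloc bg threads */
    aeMain(server.el);
    return 0;
}
```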