This is what happened:
1. Instance starts, is a slave in the cluster configuration, but
actually server.masterhost is not set, so technically the instance
is acting like a master.
2. loadDataFromDisk() calls replicationCacheMasterUsingMyself() even if
the instance is a master, in the case it is logically a slave and the
cluster is enabled. So now we have a cached master even if the instance
is practically configured as a master (from the POV of
server.masterhost value and so forth).
3. clusterCron() sees that the instance requires to replicate from its
master, because logically it is a slave, so it calls
replicationSetMaster() that will in turn call
replicationCacheMasterUsingMyself(): before this commit, this call would
overwrite the old cached master, creating a memory leak.
This adds support for explicit configuration of a CA certs directory (in
addition to the previously supported bundle file). For redis-cli, if no
explicit CA configuration is supplied the system-wide default
configuration will be adopted.
misc:
- handle SSL_has_pending by iterating though these in beforeSleep, and setting timeout of 0 to aeProcessEvents
- fix issue with epoll signaling EPOLLHUP and EPOLLERR only to the write handlers. (needed to detect the rdb pipe was closed)
- add key-load-delay config for testing
- trim connShutdown which is no longer needed
- rioFdsetWrite -> rioFdWrite - simplified since there's no longer need to write to multiple FDs
- don't detect rdb child exited (don't call wait3) until we detect the pipe is closed
- Cleanup bad optimization from rio.c, add another one
* Introduce a connection abstraction layer for all socket operations and
integrate it across the code base.
* Provide an optional TLS connections implementation based on OpenSSL.
* Pull a newer version of hiredis with TLS support.
* Tests, redis-cli updates for TLS support.
cluster.c - stack buffer memory alignment
The pointer 'buf' is cast to a more strictly aligned pointer type
evict.c - lazyfree_lazy_eviction, lazyfree_lazy_eviction always called
defrag.c - bug in dead code
server.c - casting was missing parenthesis
rax.c - indentation / newline suggested an 'else if' was intended
It seeems that since I added the creation of the jemalloc thread redis
sometimes fails to start with the following error:
Inconsistency detected by ld.so: dl-tls.c: 493: _dl_allocate_tls_init: Assertion `listp->slotinfo[cnt].gen <= GL(dl_tls_generation)' failed!
This seems to be due to a race bug in ld.so, in which TLS creation on the
thread, collide with dlopen.
Move the creation of BIO and jemalloc threads to after modules are loaded.
plus small bugfix when trying to disable the jemalloc thread at runtime
since the slowlog and other means that can help you detect the bad script
are only exposed after the script is done. it might be a good idea to at least
print the script name (sha) to the log when it timeouts.
The correct way to access the module about a given IO context is to
deference io->type->module, since io->ctx is only populated if the user
requests an explicit context from an IO object.
We don't want that the API could be used directly in an unsafe way,
without checking if there is an active child. Now the safety checks are
moved directly in the function performing the operations.
In theory currently there is only one active child, but the API may
change or for bugs in the implementation we may have several (it was
like that for years because of a bug). Better to wait for a specific
pid and avoid consuing other pending children information.
We can't expect SIGUSR1 to have any specific value range, so let's
define an exit code that we can handle in a special way.
This also fixes an #include <wait.h> that is not standard.
SipHash expects a 128-bit key, and we were indeed generating 128-bits,
but restricting them to hex characters 0-9a-f, effectively giving us
only 4 bits-per-byte of key material, and 64 bits overall.
Now, we skip the hex conversion and supply 128 bits of unfiltered
random data.
Here we introduce a change in the way we convert values from Lua to
Redis when RESP3 is selected: this is possible without breaking the fact
we can return directly what a command returned, because there is no
Redis command in RESP2 that returns true or false to Lua, so the
conversion in the case of RESP2 is totally arbitrary. When a script is
written selecting RESP3 from Lua, it totally makes sense to change such
behavior and return RESP3 true/false when Lua true/false is returned.
We want all the scripts to run in RESP2 mode by default. It's up to the
caller to switch to V3 using redis.setresp() if it is really needed.
This way most scripts written for past Redis versions will continue to
work with Redis >= 6 even if the client is in RESP3 mode.
Note that this breaks API compatibility with Redis < 6:
CONFIG GET requirepass
Will no longer return a cleartext password as well, but the SHA256 hash
of the password set.
When implementing the code that saves and loads these aux fields we used rdb
format that was added for that in redis 5.0, but then we added the 'when' field
which meant that the old redis-check-rdb won't be able to skip these.
this fix adds an opcode as if that 'when' is part of the module data.
Before this commit we may have not consumer buffers when a read error is
encountered. Such buffers may contain errors that are important clues
for the user: for instance a protocol error in the payload we send in
pipe mode will cause the server to abort the connection. If the user
does not get the protocol error, debugging what is happening can be a
nightmare.
This commit fixes issue #3756.
of aarch64.
The content comes from the definition of the sigcontext and tested on
my aarch64 server.
sigcontext defined at the linux kernel code:
arch/arm64/include/uapi/asm/sigcontext.h
This is extremely useful in order to simulate an high load of requests
about different keys, and force Redis to track a lot of informations
about several clients, to simulate real world workloads.
Now that the call also invalidates client side caching slots, it is
important that after an internal flush operation we both send the
notifications to the clients and, at the same time, are able to reclaim
the memory of the tracking table. This may even fix a few edge cases
related to MULTI/EXEC + WATCH during resync, not sure, but in general
looks more correct.
Otherwise what happens is that the tracking table will never get garbage
collected if there are no longer clients with tracking enabled.
Now the invalidation function immediately checks if there is any table
allocated, otherwise it returns ASAP, so the overhead when the feature
is not used should be near zero.
This is especially needed in diskless loading, were a short read could have
caused redis to exit. now the module can handle the error and return to the
caller gracefully.
this fixes#5326
Without such change, the diskless replicas, when loading RDB files from
the socket will not abort when a broken RDB file gets loaded. This is
potentially unsafe, because right now Redis is not able to guarantee
that encoding errors are safe from the POV of memory corruptions (for
instance the LZF library may not be safe against untrusted data?) so
better to abort when the RDB file we are going to load is corrupted.
Instead I/O errors are still returned to the caller without aborting,
so that in case of short read the diskless replica can try again.
now that replica can read rdb directly from the socket, it should avoid exiting
on short read and instead try to re-sync.
this commit tries to have minimal effects on non-diskless rdb reading.
and includes a test that tries to trigger this scenario on various read cases.
* create module API for forking child processes.
* refactor duplicate code around creating and tracking forks by AOF and RDB.
* child processes listen to SIGUSR1 and dies exitFromChild in order to
eliminate a valgrind warning of unhandled signal.
* note that BGSAVE error reply has changed.
valgrind error is:
Process terminating with default action of signal 10 (SIGUSR1)