* create module API for forking child processes.
* refactor duplicate code around creating and tracking forks by AOF and RDB.
* child processes listen to SIGUSR1 and dies exitFromChild in order to
eliminate a valgrind warning of unhandled signal.
* note that BGSAVE error reply has changed.
valgrind error is:
Process terminating with default action of signal 10 (SIGUSR1)
Add tests to check basic functionality of this optional keyword, and also tested with
a module (redisgraph). Checked quickly with valgrind, no issues.
Copies name the type name canonicalisation code from `typeCommand`, perhaps this would
be better factored out to prevent the two diverging and both needing to be edited to
add new `OBJ_*` types, but this is a little fiddly with C strings.
The [redis-doc](https://github.com/antirez/redis-doc/blob/master/commands.json) repo
will need to be updated with this new arg if accepted.
A quirk to be aware of here is that the GEO commands are backed by zsets not their own
type, so they're not distinguishable from other zsets.
Additionally, for sparse types this has the same behaviour as `MATCH` in that it may
return many empty results before giving something, even for large `COUNT`s.
In fast systems "SLOWLOG RESET" is fast enough to don't be logged even
when the time limit is "1" sometimes. Leading to false positives such
as:
[err]: SLOWLOG - can be disabled in tests/unit/slowlog.tcl
Expected '1' to be equal to '0'
Now clients that are ready to be terminated asynchronously are processed
more often in beforeSleep() instead of being processed in serverCron().
This means that the test will not be able to catch the moment the client
was terminated, also note that the 'omem' figure now changes in big
steps, because of the new client output buffers layout.
So we have to change the test range in order to accomodate for that.
Yet the test is useful enough to be worth taking, even if its precision
is reduced by this commit. Probably if we get more problems, a thing
that makes sense is just to check that the limit is < 200k. That's more
than enough actually.
solving few replication related tests race conditions which fail on slow machines
bugfix in slave buffers test: since the test is executed twice, each time with
a different commands count, the threshold for the delta can't be a constant.
The XCLAIM docs state the XCLAIM increments the delivery counter for
messages. This PR makes the code match the documentation - which seems
like the desired behaviour - whilst still allowing RETRYCOUNT to be
specified manually.
My understanding of the way streamPropagateXCLAIM() works is that this
change will safely propagate to replicas since retry count is pulled
directly from the streamNACK struct.
Fixes#5194
This way the behavior is very similar to the past one.
This is useful in order to remember the user she probably failed to
configure a password correctly.
Few tests had borderline thresholds that were adjusted.
The slave buffers test had two issues, preventing the slave buffer from growing:
1) the slave didn't necessarily go to sleep on time, or woke up too early,
now using SIGSTOP to make sure it goes to sleep exactly when we want.
2) the master disconnected the slave on timeout
This should be able to find new bugs and regressions about the new
sorted set update function when ZADD is used to update an element
already existing.
The test is able to find the bug fixed at 2f282aee immediately.
it looks like on slow machines we're getting:
[err]: slave buffer are counted correctly in tests/unit/maxmemory.tcl
Expected condition '$slave_buf > 2*1024*1024' to be true (16914 > 2*1024*1024)
this is a result of the slave waking up too early and eating the
slave buffer before the traffic and the test ends.
on slower machines, the active defrag test tended to fail.
although the fragmentation ratio was below the treshold, the defragger was
still in the middle of a scan cycle.
this commit changes:
- the defragger uses the current fragmentation state, rather than the cache one
that is updated by server cron every 100ms. this actually fixes a bug of
starting one excess scan cycle
- the test lets the defragger use more CPU cycles, in hope that the defrag
will be faster, but also give it more time before we give up.
A) slave buffers didn't count internal fragmentation and sds unused space,
this caused them to induce eviction although we didn't mean for it.
B) slave buffers were consuming about twice the memory of what they actually needed.
- this was mainly due to sdsMakeRoomFor growing to twice as much as needed each time
but networking.c not storing more than 16k (partially fixed recently in 237a38737).
- besides it wasn't able to store half of the new string into one buffer and the
other half into the next (so the above mentioned fix helped mainly for small items).
- lastly, the sds buffers had up to 30% internal fragmentation that was wasted,
consumed but not used.
C) inefficient performance due to starting from a small string and reallocing many times.
what i changed:
- creating dedicated buffers for reply list, counting their size with zmalloc_size
- when creating a new reply node from, preallocate it to at least 16k.
- when appending a new reply to the buffer, first fill all the unused space of the
previous node before starting a new one.
other changes:
- expose mem_not_counted_for_evict info field for the benefit of the test suite
- add a test to make sure slave buffers are counted correctly and that they don't cause eviction
RESTORE now supports:
1. Setting LRU/LFU
2. Absolute-time TTL
Other related changes:
1. RDB loading will not override LRU bits when RDB file
does not contain the LRU opcode.
2. RDB loading will not set LRU/LFU bits if the server's
maxmemory-policy does not match.
Removing the fix about 50% of the times the test will not be able to
pass cleanly. It's very hard to write a test that will always fail, or
actually, it is possible but then it's likely that it will consistently
pass if we change some random bit, so better to use randomization here.
problems fixed:
* failing to read fragmentation information from jemalloc
* overflow in jemalloc fragmentation hint to the defragger
* test suite not triggering eviction after population
other fixes / improvements:
- LUA script memory isn't taken from zmalloc (taken from libc malloc)
so it can cause high fragmentation ratio to be displayed (which is false)
- there was a problem with "fragmentation" info being calculated from
RSS and used_memory sampled at different times (now sampling them together)
other details:
- adding a few more allocator info fields to INFO and MEMORY commands
- improve defrag test to measure defrag latency of big keys
- increasing the accuracy of the defrag test (by looking at real grag info)
this way we can use an even lower threshold and still avoid false positives
- keep the old (total) "fragmentation" field unchanged, but add new ones for spcific things
- add these the MEMORY DOCTOR command
- deduct LUA memory from the rss in case of non jemalloc allocator (one for which we don't "allocator active/used")
- reduce sampling rate of the rss and allocator info
After checking with the community via Twitter (here:
https://twitter.com/antirez/status/915130876861788161) the verdict was to
use ":". However I later realized, after users lamented the fact that
it's hard to copy IDs just with double click, that this was the reason
why I moved to "." in the first instance. Fortunately "-", that was the
other option with most votes, also gets selected with double click on
most terminal applications on Linux and MacOS.
So my reasoning was:
1) We can't retain "." because it's actually confusing to newcomers, it
looks like a floating number, people may be tricked into thinking they
can order IDs numerically as floats.
2) Moving to a double-click-to-select format is much better. People will
work with such IDs for long time when coding / debugging. Why making now
a choice that will impact this for the next years?
The only other viable option was "-", and that's what I did. Thanks.