This is a rebased version of #3078 originally by shaharmor
with the following patches by TysonAndre made after rebasing
to work with the updated C API:
1. Add 2 more unit tests
(wrong argument count error message, integer over 64 bits)
2. Use addReplyArrayLen instead of addReplyMultiBulkLen.
3. Undo changes to src/help.h - for the ZMSCORE PR,
I heard those should instead be automatically
generated from the redis-doc repo if it gets updated
Motivations:
- Example use case: Client code to efficiently check if each element of a set
of 1000 items is a member of a set of 10 million items.
(Similar to reasons for working on #7593)
- HMGET and ZMSCORE already exist. This may lead to developers deciding
to implement functionality that's best suited to a regular set with a
data type of sorted set or hash map instead, for the multi-get support.
Currently, multi commands or lua scripting to call sismember multiple times
would almost definitely be less efficient than a native smismember
for the following reasons:
- Need to fetch the set from the string every time
instead of reusing the C pointer.
- Using pipelining or multi-commands would result in more bytes sent
and received by the client for the repeated SISMEMBER KEY sections.
- Need to specially encode the data and decode it from the client
for lua-based solutions.
- Proposed solutions using Lua or SADD/SDIFF could trigger writes to
memory, which is undesirable on a redis replica server
or when commands get replicated to replicas.
Co-Authored-By: Shahar Mor <shahar@peer5.com>
Co-Authored-By: Tyson Andre <tysonandre775@hotmail.com>
Added RedisModule_HoldString that either returns a
shallow copy of the given String (by increasing
the String ref count) or a new deep copy of String
in case its not possible to get a shallow copy.
Co-authored-by: Itamar Haber <itamar@redislabs.com>
When runtest-cluster, at first, we need to create a cluster use spawn_instance,
a port which is not used is choosen, however sometimes we can't run server on
the port. possibley due to a race with another process taking it first.
such as redis/redis/runs/896537490. It may be due to the machine problem or
In order to reduce the probability of failure when start redis in
runtest-cluster, we attemp to use another port when find server do not start up.
Co-authored-by: Oran Agra <oran@redislabs.com>
Co-authored-by: yanhui13 <yanhui13@meituan.com>
Diskless master has some inherent latencies.
1) fork starts with delay from cron rather than immediately
2) replica is put online only after an ACK. but the ACK
was sent only once a second.
3) but even if it would arrive immediately, it will not
register in case cron didn't yet detect that the fork is done.
Besides that, when a replica disconnects, it doesn't immediately
attempts to re-connect, it waits for replication cron (one per second).
in case it was already online, it may be important to try to re-connect
as soon as possible, so that the backlog at the master doesn't vanish.
In case it disconnected during rdb transfer, one can argue that it's
not very important to re-connect immediately, but this is needed for the
"diskless loading short read" test to be able to run 100 iterations in 5
seconds, rather than 3 (waiting for replication cron re-connection)
changes in this commit:
1) sync command starts a fork immediately if no sync_delay is configured
2) replica sends REPLCONF ACK when done reading the rdb (rather than on 1s cron)
3) when a replica unexpectedly disconnets, it immediately tries to
re-connect rather than waiting 1s
4) when when a child exits, if there is another replica waiting, we spawn a new
one right away, instead of waiting for 1s replicationCron.
5) added a call to connectWithMaster from replicationSetMaster. which is called
from the REPLICAOF command but also in 3 places in cluster.c, in all of
these the connection attempt will now be immediate instead of delayed by 1
second.
side note:
we can add a call to rdbPipeReadHandler in replconfCommand when getting
a REPLCONF ACK from the replica to solve a race where the replica got
the entire rdb and EOF marker before we detected that the pipe was
closed.
in the test i did see this race happens in one about of some 300 runs,
but i concluded that this race is unlikely in real life (where the
replica is on another host and we're more likely to first detect the
pipe was closed.
the test runs 100 iterations in 3 seconds, so in some cases it'll take 4
seconds instead (waiting for another REPLCONF ACK).
Removing unneeded startBgsaveForReplication from updateSlavesWaitingForBgsave
Now that CheckChildrenDone is calling the new replicationStartPendingFork
(extracted from serverCron) there's actually no need to call
startBgsaveForReplication from updateSlavesWaitingForBgsave anymore,
since as soon as updateSlavesWaitingForBgsave returns, CheckChildrenDone is
calling replicationStartPendingFork that handles that anyway.
The code in updateSlavesWaitingForBgsave had a bug in which it ignored
repl-diskless-sync-delay, but removing that code shows that this bug was
hiding another bug, which is that the max_idle should have used >= and
not >, this one second delay has a big impact on my new test.
Syntax: `ZMSCORE KEY MEMBER [MEMBER ...]`
This is an extension of #2359
amended by Tyson Andre to work with the changed unstable API,
add more tests, and consistently return an array.
- It seemed as if it would be more likely to get reviewed
after updating the implementation.
Currently, multi commands or lua scripting to call zscore multiple times
would almost definitely be less efficient than a native ZMSCORE
for the following reasons:
- Need to fetch the set from the string every time instead of reusing the C
pointer.
- Using pipelining or multi-commands would result in more bytes sent by
the client for the repeated `ZMSCORE KEY` sections.
- Need to specially encode the data and decode it from the client
for lua-based solutions.
- The fastest solution I've seen for large sets(thousands or millions)
involves lua and a variadic ZADD, then a ZINTERSECT, then a ZRANGE 0 -1,
then UNLINK of a temporary set (or lua). This is still inefficient.
Co-authored-by: Tyson Andre <tysonandre775@hotmail.com>
- the test now waits for specific set of log messages rather than wait for
timeout looking for just one message.
- we don't wanna sample the current length of the log after an action, due
to a race, we need to start the search from the line number of the last
message we where waiting for.
- when attempting to trigger a full sync, use multi-exec to avoid a race
where the replica manages to re-connect before we completed the set of
actions that should force a full sync.
- fix verify_log_message which was broken and unused
on ci.redis.io the test fails a lot, reporting that bgsave didn't end.
increaseing the timeout we wait for that bgsave to get aborted.
in addition to that, i also verify that it indeed got aborted by
checking that the save counter wasn't reset.
add another test to verify that a successful bgsave indeed resets the
change counter.
in cases where you have
test name {
start_server {
start_server {
assert
}
}
}
the exception will be thrown to the test proc, and the servers are
supposed to be killed on the way out. but it seems there was always a
bug of not cleaning the server stack, and recently (#7404) we started
relying on that stack in order to kill them, so with that bug sometimes
we would have tried to kill the same server twice, and leave one alive.
luckly, in most cases the pattern is:
start_server {
test name {
}
}
This re-implements the redis-cli --pipe test so it no longer depends on a close feature available only in TCL 8.6.
Basically what this test does is run redis-cli --pipe, generates a bunch of commands and pipes them through redis-cli, and inspects the result in both Redis and the redis-cli output.
To do that, we need to close stdin for redis-cli to indicate we're done so it can flush its buffers and exit. TCL has bi-directional channels can only offers a way to "one-way close" a channel with TCL 8.6. To work around that, we now generate the commands into a file and feed that file to redis-cli directly.
As we're writing to an actual file, the number of commands is now reduced.
Before this commit, the output of "./runtest-cluster --help" is incorrect.
After this commit, the format of the following 3 output is consistent:
./runtest --help
./runtest-cluster --help
./runtest-sentinel --help
interestingly the latency monitor test fails because valgrind is slow
enough so that the time inside PEXPIREAT command from the moment of
the first mstime() call to get the basetime until checkAlreadyExpired
calls mstime() again is more than 1ms, and that test was too sensitive.
using this opportunity to speed up the test (unrelated to the failure)
the fix is just the longer time passed to PEXPIRE.
in the majority of the cases (on this rarely used feature) we want to
stop and be able to connect to the shard with redis-cli.
since these are two different processes interracting with the tty we
need to stop both, and we'll have to hit enter twice, but it's not that
bad considering it is rarely used.
* Tests: fix and reintroduce redis-cli tests.
These tests have been broken and disabled for 10 years now!
* TLS: add remaining redis-cli support.
This adds support for the redis-cli --pipe, --rdb and --replica options
previously unsupported in --tls mode.
* Fix writeConn().
Similarly to EXPIREAT with TTL in the past, which implicitly deletes the
key and return success, RESTORE should not store key that are already
expired into the db.
When used together with REPLACE it should emit a DEL to keyspace
notification and replication stream.
On some platforms strtold("+inf") with valgrind returns a non-inf result
[err]: INCRBYFLOAT does not allow NaN or Infinity in tests/unit/type/incr.tcl
Expected 'ERR*would produce*' to equal or match '1189731495357231765085759.....'
tests were sensitive to additional log lines appearing in the log
causing the search to come empty handed.
instead of just looking for the n last log lines, capture the log lines
before performing the action, and then search from that offset.
* tests/valgrind: don't use debug restart
DEBUG REATART causes two issues:
1. it uses execve which replaces the original process and valgrind doesn't
have a chance to check for errors, so leaks go unreported.
2. valgrind report invalid calls to close() which we're unable to resolve.
So now the tests use restart_server mechanism in the tests, that terminates
the old server and starts a new one, new PID, but same stdout, stderr.
since the stderr can contain two or more valgrind report, it is not enough
to just check for the absence of leaks, we also need to check for some known
errors, we do both, and fail if we either find an error, or can't find a
report saying there are no leaks.
other changes:
- when killing a server that was already terminated we check for leaks too.
- adding DEBUG LEAK which was used to test it.
- adding --trace-children to valgrind, although no longer needed.
- since the stdout contains two or more runs, we need slightly different way
of checking if the new process is up (explicitly looking for the new PID)
- move the code that handles --wait-server to happen earlier (before
watching the startup message in the log), and serve the restarted server too.
* squashme - CR fixes
In order to support the use of multi-exec in pipeline, it is important that
MULTI and EXEC are never rejected and it is easy for the client to know if the
connection is still in multi state.
It was easy to make sure MULTI and DISCARD never fail (done by previous
commits) since these only change the client state and don't do any actual
change in the server, but EXEC is a different story.
Since in the past, it was possible for clients to handle some EXEC errors and
retry the EXEC, we now can't affort to return any error on EXEC other than
EXECABORT, which now carries with it the real reason for the abort too.
Other fixes in this commit:
- Some checks that where performed at the time of queuing need to be re-
validated when EXEC runs, for instance if the transaction contains writes
commands, it needs to be aborted. there was one check that was already done
in execCommand (-READONLY), but other checks where missing: -OOM, -MISCONF,
-NOREPLICAS, -MASTERDOWN
- When a command is rejected by processCommand it was rejected with addReply,
which was not recognized as an error in case the bad command came from the
master. this will enable to count or MONITOR these errors in the future.
- make it easier for tests to create additional (non deferred) clients.
- add tests for the fixes of this commit.
The scan key module API provides the scan callback with the current
field name and value (if it exists). Those arguments are RedisModuleString*
which means it supposes to point to robj which is encoded as a string.
Using createStringObjectFromLongLong function might return robj that
points to an integer and so break a module that tries for example to
use RedisModule_StringPtrLen on the given field/value.
The PR introduces a fix that uses the createObject function and sdsfromlonglong function.
Using those function promise that the field and value pass to the to the
scan callback will be Strings.
The PR also changes the Scan test module to use RedisModule_StringPtrLen
to catch the issue. without this, the issue is hidden because
RedisModule_ReplyWithString knows to handle integer encoding of the
given robj (RedisModuleString).
The PR also introduces a new test to verify the issue is solved.
these tests create several edge cases that are otherwise uncovered (at
least not consistently) by the test suite, so although they're no longer
testing what they were meant to test, it's still a good idea to keep
them in hope that they'll expose some issue in the future.
i.e. don't start the search from scratch hitting the used ones again.
this will also reduce the likelihood of collisions (if there are any
left) by increasing the time until we re-use a port we did use in the
past.
apparently when running tests in parallel (the default of --clients 16),
there's a chance for two tests to use the same port.
specifically, one test might shutdown a master and still have the
replica up, and then another test will re-use the port number of master
for another master, and then that replica will connect to the master of
the other test.
this can cause a master to count too many full syncs and fail a test if
we run the tests with --single integration/psync2 --loop --stop
see Probmem 2 in #7314
There's a rare case which leads to stagnation in the defragger, causing
it to keep scanning the keyspace and do nothing (not moving any
allocation), this happens when all the allocator slabs of a certain bin
have the same % utilization, but the slab from which new allocations are
made have a lower utilization.
this commit fixes it by removing the current slab from the overall
average utilization of the bin, and also eliminate any precision loss in
the utilization calculation and move the decision about the defrag to
reside inside jemalloc.
and also add a test that consistently reproduce this issue.
with the original version of 6.0.0, this test detects an excessive full
sync.
with the fix in 1a7cd2c0e, this test detects memory corruption,
especially when using libc allocator with or without valgrind.
this test which has coverage for varoius flows of diskless master was
failing randomly from time to time.
the failure was:
[err]: diskless all replicas drop during rdb pipe in tests/integration/replication.tcl
log message of '*Diskless rdb transfer, last replica dropped, killing fork child*' not found
what seemed to have happened is that the master didn't detect that all
replicas dropped by the time the replication ended, it thought that one
replica is still connected.
now the test takes a few seconds longer but it seems stable.
This bug was introduced by a recent change in which readQueryFromClient
is using freeClientAsync, and despite the fact that now
freeClientsInAsyncFreeQueue is in beforeSleep, that's not enough since
it's not called during loading in processEventsWhileBlocked.
furthermore, afterSleep was called in that case but beforeSleep wasn't.
This bug also caused slowness sine the level-triggered mode of epoll
kept signaling these connections as readable causing us to keep doing
connRead again and again for ll of these, which keep accumulating.
now both before and after sleep are called, but not all of their actions
are performed during loading, some are only reserved for the main loop.
fixes issue #7215
Seems like on some systems choosing specific TLS v1/v1.1 versions no
longer works as expected. Test is reduced for v1.2 now which is still
good enough to test the mechansim, and matters most anyway.
* fix memlry leaks with diskless replica short read.
* fix a few timing issues with valgrind runs
* fix issue with valgrind and watchdog schedule signal
about the valgrind WD issue:
the stack trace test in logging.tcl, has issues with valgrind:
==28808== Can't extend stack to 0x1ffeffdb38 during signal delivery for thread 1:
==28808== too small or bad protection modes
it seems to be some valgrind bug with SA_ONSTACK.
SA_ONSTACK seems unneeded since WD is not recursive (SA_NODEFER was removed),
also, not sure if it's even valid without a call to sigaltstack()
Currently, there are several types of threads/child processes of a
redis server. Sometimes we need deeply optimise the performance of
redis, so we would like to isolate threads/processes.
There were some discussion about cpu affinity cases in the issue:
https://github.com/antirez/redis/issues/2863
So implement cpu affinity setting by redis.conf in this patch, then
we can config server_cpulist/bio_cpulist/aof_rewrite_cpulist/
bgsave_cpulist by cpu list.
Examples of cpulist in redis.conf:
server_cpulist 0-7:2 means cpu affinity 0,2,4,6
bio_cpulist 1,3 means cpu affinity 1,3
aof_rewrite_cpulist 8-11 means cpu affinity 8,9,10,11
bgsave_cpulist 1,10-11 means cpu affinity 1,10,11
Test on linux/freebsd, both work fine.
Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
Now both master and replicas keep track of the last replication offset
that contains meaningful data (ignoring the tailing pings), and both
trim that tail from the replication backlog, and the offset with which
they try to use for psync.
the implication is that if someone missed some pings, or even have
excessive pings that the promoted replica has, it'll still be able to
psync (avoid full sync).
the downside (which was already committed) is that replicas running old
code may fail to psync, since the promoted replica trims pings form it's
backlog.
This commit adds a test that reproduces several cases of promotions and
demotions with stale and non-stale pings
Background:
The mearningful offset on the master was added recently to solve a problem were
the master is left all alone, injecting PINGs into it's backlog when no one is
listening and then gets demoted and tries to replicate from a replica that didn't
have any of the PINGs (or at least not the last ones).
however, consider this case:
master A has two replicas (B and C) replicating directly from it.
there's no traffic at all, and also no network issues, just many pings in the
tail of the backlog. now B gets promoted, A becomes a replica of B, and C
remains a replica of A. when A gets demoted, it trims the pings from its
backlog, and successfully replicate from B. however, C is still aware of
these PINGs, when it'll disconnect and re-connect to A, it'll ask for something
that's not in the backlog anymore (since A trimmed the tail of it's backlog),
and be forced to do a full sync (something it didn't have to do before the
meaningful offset fix).
Besides that, the psync2 test was always failing randomly here and there, it
turns out the reason were PINGs. Investigating it shows the following scenario:
cycle 1: redis #1 is master, and all the rest are direct replicas of #1
cycle 2: redis #2 is promoted to master, #1 is a replica of #2 and #3 is replica of #1
now we see that when #1 is demoted it prints:
17339:S 21 Apr 2020 11:16:38.523 * Using the meaningful offset 3929963 instead of 3929977 to exclude the final PINGs (14 bytes difference)
17339:S 21 Apr 2020 11:16:39.391 * Trying a partial resynchronization (request e2b3f8817735fdfe5fa4626766daa938b61419e5:3929964).
17339:S 21 Apr 2020 11:16:39.392 * Successful partial resynchronization with master.
and when #3 connects to the demoted #2, #2 says:
17339:S 21 Apr 2020 11:16:40.084 * Partial resynchronization not accepted: Requested offset for secondary ID was 3929978, but I can reply up to 3929964
so the issue here is that the meaningful offset feature saved the day for the
demoted master (since it needs to sync from a replica that didn't get the last
ping), but it didn't help one of the other replicas which did get the last ping.
STRALGO should be a container for mostly read-only string
algorithms in Redis. The algorithms should have two main
characteristics:
1. They should be non trivial to compute, and often not part of
programming language standard libraries.
2. They should be fast enough that it is a good idea to have optimized C
implementations.
Next thing I would love to see? A small strings compression algorithm.
this test is time sensitive and it sometimes fail to pass below the
latency threshold, even on strong machines.
this test was the reson we're running just 2 parallel tests in the
github actions CI, revering this.
There is an inherent race between the deferring client and the
"main" client of the test: While the deferring client issues a blocking
command, we can't know for sure that by the time the "main" client
tries to issue another command (Usually one that unblocks the deferring
client) the deferring client is even blocked...
For lack of a better choice this commit uses TCL's 'after' in order
to give some time for the deferring client to issues its blocking
command before the "main" client does its thing.
This problem probably exists in many other tests but this commit
tries to fix blockonkeys.tcl
By using a "circular BRPOPLPUSH"-like scenario it was
possible the get the same client on db->blocking_keys
twice (See comment in moduleTryServeClientBlockedOnKey)
The fix was actually already implememnted in
moduleTryServeClientBlockedOnKey but it had a bug:
the funxction should return 0 or 1 (not OK or ERR)
Other changes:
1. Added two commands to blockonkeys.c test module (To
reproduce the case described above)
2. Simplify blockonkeys.c in order to make testing easier
3. cast raxSize() to avoid warning with format spec
Makse sure call() doesn't wrap replicated commands with
a redundant MULTI/EXEC
Other, unrelated changes:
1. Formatting compiler warning in INFO CLIENTS
2. Use CLIENT_ID_AOF instead of UINT64_MAX
37a10cef introduced automatic wrapping of MULTI/EXEC for the
alsoPropagate API. However this collides with the built-in mechanism
already present in module.c. To avoid complex changes near Redis 6 GA
this commit introduces the ability to exclude call() MUTLI/EXEC wrapping
for also propagate in order to continue to use the old code paths in
module.c.
First, we must parse the IDs, so that we abort ASAP.
The return value of this command cannot be an error if
the client successfully acknowledged some messages,
so it should be executed in a "all or nothing" fashion.
the AOF will be loaded successfully, but the stream will be missing,
i.e inconsistencies with the original db.
this was because XADD with id of 0-0 would error.
add a test to reproduce.
Redis refusing to run MULTI or EXEC during script timeout may cause partial
transactions to run.
1) if the client sends MULTI+commands+EXEC in pipeline without waiting for
response, but these arrive to the shards partially while there's a busy script,
and partially after it eventually finishes: we'll end up running only part of
the transaction (since multi was ignored, and exec would fail).
2) similar to the above if EXEC arrives during busy script, it'll be ignored and
the client state remains in a transaction.
the 3rd test which i added for a case where MULTI and EXEC are ok, and
only the body arrives during busy script was already handled correctly
since processCommand calls flagTransaction
*** [err]: PSYNC2: total sum of full synchronizations is exactly 4 in tests/integration/psync2.tcl
Expected 5 == 4 (context: type eval line 6 cmd {assert {$sum == 4}} proc ::test)
issue was that sometime the test got an unexpected full sync since it
tried to switch to the replica before it was in sync with it's master.
it seems that running two clients at a time is ok too, resuces action
time from 20 minutes to 10. we'll use this for now, and if one day it
won't be enough we'll have to run just the sensitive tests one by one
separately from the others.
this commit also fixes an issue with the defrag test that appears to be
very rare.
seems that github actions are slow, using just one client to reduce
false positives.
also adding verbose, testing only on latest ubuntu, and building on
older one.
when doing that, i can reduce the test threshold back to something saner
I saw that the new defag test for list was failing in CI recently, so i
reduce it's threshold from 12 to 60.
besides that, i add / improve the latency test for that other two defrag
tests (add a sensitive latency and digest / save checks)
and fix bad usage of debug populate (can't overrides existing keys).
this was the original intention, which creates higher fragmentation.
When active defrag kicks in and finds a big list, it will create a bookmark to
a node so that it is able to resume iteration from that node later.
The quicklist manages that bookmark, and updates it in case that node is deleted.
This will increase memory usage only on lists of over 1000 (see
active-defrag-max-scan-fields) quicklist nodes (1000 ziplists, not 1000 items)
by 16 bytes.
In 32 bit build, this change reduces the maximum effective config of
list-compress-depth and list-max-ziplist-size (from 32767 to 8191)
This bug affected RM_StringToLongDouble and HINCRBYFLOAT.
I added tests for both cases.
Main changes:
1. Fixed string2ld to fail if string contains \0 in the middle
2. Use string2ld in getLongDoubleFromObject - No point of
having duplicated code here
The two changes above broke RM_SaveLongDouble/RM_LoadLongDouble
because the long double string was saved with length+1 (An innocent
mistake, but it's actually a bug - The length passed to
RM_SaveLongDouble should not include the last \0).
If a blocked module client times-out (or disconnects, unblocked
by CLIENT command, etc.) we need to call moduleUnblockClient
in order to free memory allocated by the module sub-system
and blocked-client private data
Other changes:
Made blockedonkeys.tcl tests a bit more aggressive in order
to smoke-out potential memory leaks
This commit solves the following bug:
127.0.0.1:6379> XGROUP CREATE x grp $ MKSTREAM
OK
127.0.0.1:6379> XADD x 666 f v
"666-0"
127.0.0.1:6379> XREADGROUP GROUP grp Alice BLOCK 0 STREAMS x >
1) 1) "x"
2) 1) 1) "666-0"
2) 1) "f"
2) "v"
127.0.0.1:6379> XADD x 667 f v
"667-0"
127.0.0.1:6379> XDEL x 667
(integer) 1
127.0.0.1:6379> XREADGROUP GROUP grp Alice BLOCK 0 STREAMS x >
1) 1) "x"
2) (empty array)
The root cause is that we use s->last_id in streamCompareID
while we should use the last *valid* ID
- make lua-replicate-commands mutable (it never was, but i don't see why)
- make tcp-backlog immutable (fix a recent refactory mistake)
- increase the max limit of a few configs to match what they were before
the recent refactory
This commit solves several edge cases that are related to
exhausting the streamID limits: We should correctly calculate
the succeeding streamID instead of blindly incrementing 'seq'
This affects both XREAD and XADD.
Other (unrelated) changes:
Reply with a better error message when trying to add an entry
to a stream that has exhausted last_id
With the previous API, a NULL return value was ambiguous and could
represent either an old value of NULL or an error condition. The new API
returns a status code and allows the old value to be returned
by-reference.
This commit also includes test coverage based on
tests/modules/datatype.c which did not exist at the time of the original
commit.
it seems that commit b087dd1db6 accidentially changed
gen_write_load to not use deferred client, which causes them to be slower and not
generate high load which they should, making some tests less effecitive
Changes in behavior:
- Change server.stream_node_max_entries from int64_t to long long, so that it can be used by the generic infra
- standard error reply instead of "repl-backlog-size must be 1 or greater" and such
- tls-port and a few TLS booleans were readable (config get) even when USE_OPENSSL was off (now they aren't)
- syslog-enabled, syslog-ident, cluster-enabled, appendfilename, and supervised didn't have a get (now they do)
- pidfile was initialized to NULL in InitServerConfig but had CONFIG_DEFAULT_PID_FILE in rewriteConfig (so the real default was "", but rewrite would cause it to be set), fixed the rewrite.
- TLS config in server.h was uninitialized (if no tls config args were provided)
Adding test for sanity and coverage
there were two lssues, one is taht BGREWRITEAOF failed since the initial one was still in progress
the solution for this one is to enable appendonly from the server startup so there's no initial aofrw.
the other problem was 0 loading progress events, theory is that on some
platforms a sleep of 1 will cause a much greater delay due to the context
switch, but on other platform it doesn't. in theory a sleep of 100 micro
for 1k keys whould take 100ms, and with hz of 500 we should be gettering
50 events (one every 2ms). in practise it doesn't work like that, so trying
to find a sleep that would be long enough but still not cause the test to take
too long.
Calling XADD with 0-0 or 0 would result in creating an
empty key and storing it in the database.
Even worse, because XADD will reply with error the action
will not be replicated, creating a master-replica
inconsistency
- Adding RM_ScanKey
- Adding tests for RM_ScanKey
- Refactoring RM_Scan API
Changes in RM_Scan
- cleanup in docs and coding convention
- Moving out of experimantal Api
- Adding ctx to scan callback
- Dont use cursor of -1 as an indication of done (can be a valid cursor)
- Set errno when returning 0 for various reasons
- Rename Cursor to ScanCursor
- Test filters key that are not strings, and opens a key if NULL
The implementation expose the following new functions:
1. RedisModule_CursorCreate - allow to create a new cursor object for
keys scanning
2. RedisModule_CursorRestart - restart an existing cursor to restart the
scan
3. RedisModule_CursorDestroy - destroy an existing cursor
4. RedisModule_Scan - scan keys
The RedisModule_Scan function gets a cursor object, a callback and void*
(used as user private data).
The callback will be called for each key in the database proving the key
name and the value as RedisModuleKey.
- the API name was odd, separated to two apis one for LRU and one for LFU
- the LRU idle time was in 1 second resolution, which might be ok for RDB
and RESTORE, but i think modules may need higher resolution
- adding tests for LFU and for handling maxmemory policy mismatch
recently added more reads into that function, if a later read fails, i must
either free what's already allocated, or return the pointer so that the free
callback will release it.
Add two new functions that leverage the RedisModuleDataType mechanism
for RDB serialization/deserialization and make it possible to use it
to/from arbitrary strings:
* RM_SaveDataTypeToString()
* RM_LoadDataTypeFromString()
rename RM_ServerInfoGetFieldNumerical RM_ServerInfoGetFieldSigned
move string2ull to util.c
fix leak in RM_GetServerInfo when duplicate info fields exist
looks like each platform implements long double differently (different bit count)
so we can't save them as binary, and we also want to avoid creating a new RDB
format version, so we save these are hex strings using "%La".
This commit includes a change in the arguments of ld2string to support this.
as well as tests for coverage and short reads.
coded by @guybe7
- Add RM_GetServerInfo and friends
- Add auto memory for new opaque struct
- Add tests for new APIs
other minor fixes:
- add const in various char pointers
- requested_section in modulesCollectInfo was actually not sds but char*
- extract new string2d out of getDoubleFromObject for code reuse
Add module API for
* replication hooks: role change, master link status, replica online/offline
* persistence hooks: saving, loading, loading progress
* misc hooks: cron loop, shutdown, module loaded/unloaded
* change the way hooks test work, and add tests for all of the above
startLoading() now gets flag indicating what is loaded.
stopLoading() now gets an indication of success or failure.
adding startSaving() and stopSaving() with similar args and role.
sometimes we have several assertions with the same condition in the same test
at different stages, and when these fail (the ones that print the condition
text) you don't know which one it was. other assertions didn't print the
condition text (variable names), just the expected and unexpected values.
So now, all assertions print context line, and conditin text.
besides, one of the major differences between 'assert' and 'assert_equal',
is that the later is able to print the value that doesn't match the expected.
if there is a rare non-reproducible failure, it is helpful to know what was
the value the test encountered and how far it was from the threshold.
So now, adding assert_lessthan and assert_range that can be used in some places.
were we used just 'assert { a > b }' so far.
Adding a test for coverage for RM_Call in a new "misc" unit
to be used for various short simple tests
also solves compilation warnings in redismodule.h and fork.c
misc:
- handle SSL_has_pending by iterating though these in beforeSleep, and setting timeout of 0 to aeProcessEvents
- fix issue with epoll signaling EPOLLHUP and EPOLLERR only to the write handlers. (needed to detect the rdb pipe was closed)
- add key-load-delay config for testing
- trim connShutdown which is no longer needed
- rioFdsetWrite -> rioFdWrite - simplified since there's no longer need to write to multiple FDs
- don't detect rdb child exited (don't call wait3) until we detect the pipe is closed
- Cleanup bad optimization from rio.c, add another one
* Introduce a connection abstraction layer for all socket operations and
integrate it across the code base.
* Provide an optional TLS connections implementation based on OpenSSL.
* Pull a newer version of hiredis with TLS support.
* Tests, redis-cli updates for TLS support.
now that replica can read rdb directly from the socket, it should avoid exiting
on short read and instead try to re-sync.
this commit tries to have minimal effects on non-diskless rdb reading.
and includes a test that tries to trigger this scenario on various read cases.
* create module API for forking child processes.
* refactor duplicate code around creating and tracking forks by AOF and RDB.
* child processes listen to SIGUSR1 and dies exitFromChild in order to
eliminate a valgrind warning of unhandled signal.
* note that BGSAVE error reply has changed.
valgrind error is:
Process terminating with default action of signal 10 (SIGUSR1)
The implementation of the diskless replication was currently diskless only on the master side.
The slave side was still storing the received rdb file to the disk before loading it back in and parsing it.
This commit adds two modes to load rdb directly from socket:
1) when-empty
2) using "swapdb"
the third mode of using diskless slave by flushdb is risky and currently not included.
other changes:
--------------
distinguish between aof configuration and state so that we can re-enable aof only when sync eventually
succeeds (and not when exiting from readSyncBulkPayload after a failed attempt)
also a CONFIG GET and INFO during rdb loading would have lied
When loading rdb from the network, don't kill the server on short read (that can be a network error)
Fix rdb check when performed on preamble AOF
tests:
run replication tests for diskless slave too
make replication test a bit more aggressive
Add test for diskless load swapdb
Add tests to check basic functionality of this optional keyword, and also tested with
a module (redisgraph). Checked quickly with valgrind, no issues.
Copies name the type name canonicalisation code from `typeCommand`, perhaps this would
be better factored out to prevent the two diverging and both needing to be edited to
add new `OBJ_*` types, but this is a little fiddly with C strings.
The [redis-doc](https://github.com/antirez/redis-doc/blob/master/commands.json) repo
will need to be updated with this new arg if accepted.
A quirk to be aware of here is that the GEO commands are backed by zsets not their own
type, so they're not distinguishable from other zsets.
Additionally, for sparse types this has the same behaviour as `MATCH` in that it may
return many empty results before giving something, even for large `COUNT`s.
In fast systems "SLOWLOG RESET" is fast enough to don't be logged even
when the time limit is "1" sometimes. Leading to false positives such
as:
[err]: SLOWLOG - can be disabled in tests/unit/slowlog.tcl
Expected '1' to be equal to '0'
Now clients that are ready to be terminated asynchronously are processed
more often in beforeSleep() instead of being processed in serverCron().
This means that the test will not be able to catch the moment the client
was terminated, also note that the 'omem' figure now changes in big
steps, because of the new client output buffers layout.
So we have to change the test range in order to accomodate for that.
Yet the test is useful enough to be worth taking, even if its precision
is reduced by this commit. Probably if we get more problems, a thing
that makes sense is just to check that the limit is < 200k. That's more
than enough actually.
solving few replication related tests race conditions which fail on slow machines
bugfix in slave buffers test: since the test is executed twice, each time with
a different commands count, the threshold for the delta can't be a constant.
A filter handle is returned and can be used to unregister a filter. In
the future it can also be used to further configure or manipulate the
filter.
Filters are now automatically unregistered when a module unloads.
The XCLAIM docs state the XCLAIM increments the delivery counter for
messages. This PR makes the code match the documentation - which seems
like the desired behaviour - whilst still allowing RETRYCOUNT to be
specified manually.
My understanding of the way streamPropagateXCLAIM() works is that this
change will safely propagate to replicas since retry count is pulled
directly from the streamNACK struct.
Fixes#5194
This way the behavior is very similar to the past one.
This is useful in order to remember the user she probably failed to
configure a password correctly.
Few tests had borderline thresholds that were adjusted.
The slave buffers test had two issues, preventing the slave buffer from growing:
1) the slave didn't necessarily go to sleep on time, or woke up too early,
now using SIGSTOP to make sure it goes to sleep exactly when we want.
2) the master disconnected the slave on timeout
This should be able to find new bugs and regressions about the new
sorted set update function when ZADD is used to update an element
already existing.
The test is able to find the bug fixed at 2f282aee immediately.
* allowing --single to be repeated
* adding --only so that only a specific test inside a unit can be run
* adding --skiptill useful to resume a test that crashed passed the problematic unit.
useful together with --clients 1
* adding --skipfile to use a file containing list of tests names to skip
* printing the names of the tests that are skiped by skipfile or denytags
* adding --config to add config file options from command line
it looks like on slow machines we're getting:
[err]: slave buffer are counted correctly in tests/unit/maxmemory.tcl
Expected condition '$slave_buf > 2*1024*1024' to be true (16914 > 2*1024*1024)
this is a result of the slave waking up too early and eating the
slave buffer before the traffic and the test ends.
on slower machines, the active defrag test tended to fail.
although the fragmentation ratio was below the treshold, the defragger was
still in the middle of a scan cycle.
this commit changes:
- the defragger uses the current fragmentation state, rather than the cache one
that is updated by server cron every 100ms. this actually fixes a bug of
starting one excess scan cycle
- the test lets the defragger use more CPU cycles, in hope that the defrag
will be faster, but also give it more time before we give up.
A) slave buffers didn't count internal fragmentation and sds unused space,
this caused them to induce eviction although we didn't mean for it.
B) slave buffers were consuming about twice the memory of what they actually needed.
- this was mainly due to sdsMakeRoomFor growing to twice as much as needed each time
but networking.c not storing more than 16k (partially fixed recently in 237a38737).
- besides it wasn't able to store half of the new string into one buffer and the
other half into the next (so the above mentioned fix helped mainly for small items).
- lastly, the sds buffers had up to 30% internal fragmentation that was wasted,
consumed but not used.
C) inefficient performance due to starting from a small string and reallocing many times.
what i changed:
- creating dedicated buffers for reply list, counting their size with zmalloc_size
- when creating a new reply node from, preallocate it to at least 16k.
- when appending a new reply to the buffer, first fill all the unused space of the
previous node before starting a new one.
other changes:
- expose mem_not_counted_for_evict info field for the benefit of the test suite
- add a test to make sure slave buffers are counted correctly and that they don't cause eviction
* fail the test (exit code) in case of timeout.
* add --wait-server to allow attaching a debugger
* add --dont-clean to keep log files when tests are done
RESTORE now supports:
1. Setting LRU/LFU
2. Absolute-time TTL
Other related changes:
1. RDB loading will not override LRU bits when RDB file
does not contain the LRU opcode.
2. RDB loading will not set LRU/LFU bits if the server's
maxmemory-policy does not match.
Removing the fix about 50% of the times the test will not be able to
pass cleanly. It's very hard to write a test that will always fail, or
actually, it is possible but then it's likely that it will consistently
pass if we change some random bit, so better to use randomization here.
problems fixed:
* failing to read fragmentation information from jemalloc
* overflow in jemalloc fragmentation hint to the defragger
* test suite not triggering eviction after population
other fixes / improvements:
- LUA script memory isn't taken from zmalloc (taken from libc malloc)
so it can cause high fragmentation ratio to be displayed (which is false)
- there was a problem with "fragmentation" info being calculated from
RSS and used_memory sampled at different times (now sampling them together)
other details:
- adding a few more allocator info fields to INFO and MEMORY commands
- improve defrag test to measure defrag latency of big keys
- increasing the accuracy of the defrag test (by looking at real grag info)
this way we can use an even lower threshold and still avoid false positives
- keep the old (total) "fragmentation" field unchanged, but add new ones for spcific things
- add these the MEMORY DOCTOR command
- deduct LUA memory from the rss in case of non jemalloc allocator (one for which we don't "allocator active/used")
- reduce sampling rate of the rss and allocator info
After checking with the community via Twitter (here:
https://twitter.com/antirez/status/915130876861788161) the verdict was to
use ":". However I later realized, after users lamented the fact that
it's hard to copy IDs just with double click, that this was the reason
why I moved to "." in the first instance. Fortunately "-", that was the
other option with most votes, also gets selected with double click on
most terminal applications on Linux and MacOS.
So my reasoning was:
1) We can't retain "." because it's actually confusing to newcomers, it
looks like a floating number, people may be tricked into thinking they
can order IDs numerically as floats.
2) Moving to a double-click-to-select format is much better. People will
work with such IDs for long time when coding / debugging. Why making now
a choice that will impact this for the next years?
The only other viable option was "-", and that's what I did. Thanks.
getLongLongFromObject calls string2ll which has this line:
/* Return if not all bytes were used. */
so if you pass an sds with 3 characters "1\01" it will fail.
but getLongDoubleFromObject calls strtold, and considers it ok if eptr[0]==`\0`
i.e. if the end of the string found by strtold ends with null terminator
127.0.0.1:6379> set a 1
OK
127.0.0.1:6379> setrange a 2 2
(integer) 3
127.0.0.1:6379> get a
"1\x002"
127.0.0.1:6379> incrbyfloat a 2
"3"
127.0.0.1:6379> get a
"3"
This commit closes issue #3698, at least for now, since the root cause
was not fixed: the bounding box function, for huge radiuses, does not
return a correct bounding box, there are points still within the radius
that are left outside.
So when using GEORADIUS queries with radiuses in the order of 5000 km or
more, it was possible to see, at the edge of the area, certain points
not correctly reported.
Because the bounding box for now was used just as an optimization, and
such huge radiuses are not common, for now the optimization is just
switched off when the radius is near such magnitude.
Three test cases found by the Continuous Integration test were added, so
that we can easily trigger the bug again, both for regression testing
and in order to properly fix it as some point in the future.
Experimentally verified that it can trigger the issue reverting the fix.
At least on my system... Being the bug time/backlog dependant, it is
very hard to tell if this test will be able to trigger the problem
consistently, however even if it triggers the problem once in a while,
we'll see it in the CI environment at http://ci.redis.io.
Apparently 1.4 is too low compared to what you get in certain setups
(including mine). I raised it to 1.55 that hopefully is still enough to
test that the fragmentation went down from 1.7 but without incurring in
issues, however the test setup may be still fragile so certain times this
may lead to false positives again, it's hard to test for these things
in a determinsitic way.
Related to #3786.
And many other related Github issues... all reporting the same problem.
There was probably just not enough backlog in certain unlucky runs.
I'll ask people that can reporduce if they see now this as fixed as
well.
Testing with Solaris C compiler (SunOS 5.11 11.2 sun4v sparc sun4v)
there were issues compiling due to atomicvar.h and running the
tests also failed because of "tail" usage not conform with Solaris
tail implementation. This commit fixes both the issues.
Slow systems like the original Raspberry PI need more time
than 5 seconds to start the script and detect writes.
After fixing the Raspberry PI can pass the unit without issues.
The test now uses more diverse radius sizes, especially sizes near or
greater the whole earth surface are used, that are known to trigger edge
cases. Moreover the PRNG seeding was probably resulting into the same
sequence tested over and over again, now seeding unsing the current unix
time in milliseconds.
Related to #3631.
This actually includes two changes:
1) No newlines to take the master-slave link up when the upstream master
is down. Doing this is dangerous because the sub-slave often is received
replication protocol for an half-command, so can't receive newlines
without desyncing the replication link, even with the code in order to
cancel out the bytes that PSYNC2 was using. Moreover this is probably
also not needed/sane, because anyway the slave can keep serving
requests, and because if it's configured to don't serve stale data, it's
a good idea, actually, to break the link.
2) When a +CONTINUE with a different ID is received, we now break
connection with the sub-slaves: they need to be notified as well. This
was part of the original specification but for some reason it was not
implemented in the code, and was alter found as a PSYNC2 bug in the
integration testing.
This is the PSYNC2 test that helped find issues in the code, and that
still can show a protocol desync from time to time. Work is in progress
in order to find the issue. For now the test is not enabled in "make
test" and must be run manually.
By grepping the continuous integration errors log a number of GEORADIUS
tests failures were detected.
Fortunately when a GEORADIUS failure happens, the test suite logs enough
information in order to reproduce the problem: the PRNG seed,
coordinates and radius of the query.
By reproducing the issues, three different bugs were discovered and
fixed in this commit. This commit also improves the already good
reporting of the fuzzer and adds the failure vectors as regression
tests.
The issues found:
1. We need larger squares around the poles in order to cover the area
requested by the user. There were already checks in order to use a
smaller step (larger squares) but the limit set (+/- 67 degrees) is not
enough in certain edge cases, so 66 is used now.
2. Even near the equator, when the search area center is very near the
edge of the square, the north, south, west or ovest square may not be
able to fully cover the specified radius. Now a test is performed at the
edge of the initial guessed search area, and larger squares are used in
case the test fails.
3. Because of rounding errors between Redis and Tcl, sometimes the test
signaled false positives. This is now addressed.
Whenever possible the original code was improved a bit in other ways. A
debugging example stanza was added in order to make the next debugging
session simpler when the next bug is found.
During the initial handshake with the master a slave will report to have
a very high disconnection time from its master (since technically it was
disconnected since forever, so the current UNIX time in seconds is
reported).
However when the slave is connected again the Sentinel may re-scan the
INFO output again only after 10 seconds, which is a long time. During
this time Sentinels will consider this instance unable to failover, so
a useless delay is introduced.
Actaully this hardly happened in the practice because when a slave's
master is down, the INFO period for slaves changes to 1 second. However
when a manual failover is attempted immediately after adding slaves
(like in the case of the Sentinel unit test), this problem may happen.
This commit changes the INFO period to 1 second even in the case the
slave's master is not down, but the slave reported to be disconnected
from the master (by publishing, last time we checked, a master
disconnection time field in INFO).
This change is required as a result of an unrelated change in the
replication code that adds a small delay in the master-slave first
synchronization.
The test works but is very slow so far, since it involves resharding
1/5 of all the cluster slots from master 0 to the other 4 masters and
back into the original master.
Now elements added to lists are incremental numbers in order to
understand, when inconsistencies are found, what is the order in which
the elements were added. Also the error now provides both the expected
and found value.
This fix, provided by Paul Kulchenko (@pkulchenko), allows the Lua
scripting engine to evaluate statements with a trailing comment like the
following one:
EVAL "print() --comment" 0
Lua can't parse the above if the string does not end with a newline, so
now a final newline is always added automatically. This does not change
the SHA1 of scripts since the SHA1 is computed on the body we pass to
EVAL, without the other code we add to register the function.
Close#2951.
It's a key invariant that when AOF is enabled, after the cluster
reshards, a crash-recovery event causes all the keys to be still fine
with the expected logical content. Now this is part of unit 04.
The old version was modeled with two failovers, however after the first
it is possible that another slave will migrate to the new master, since
for some time the new master is not backed by any slave. Probably there
should be some pause after a failover, before the migration. Anyway the
test is simpler in this way, and depends less on timing.
64 bit double math is not enough to make the test passing, and rounding
to 1.2999999 instead of 1.23 is not an error in the implementation.
Valgrind and sometimes other archs are not able to work with 80 bit
doubles.
An user raised a question about a given behavior of PFCOUNT. Added a
test to show the behavior (union) is correct when most of the items are
in common.
HINCRBY* tests later used the value "tmp" that was sometimes generated
by the random key generation function. The result was ovewriting what
Tcl expected to be inside Redis with another value, causing the next
HSTRLEN test to fail.
Georadius works by computing the center + neighbors squares covering all
the area of the specified position and radius. Then a distance filter is
used to remove elements which are actually outside the range.
When a huge radius is used, like 5000 km or more, adjacent neighbors may
collide and be the same, leading to the reporting of the same element
multiple times. This only happens in the edge case of huge radius but is
not ideal.
A robust but slow solution would involve qsorting the range to remove
all the duplicates. However since the collisions are only in adjacent
boxes, for the way they are ordered in the code, it is much faster to
just check if the current box is the same as the previous one processed.
This commit adds a regression test for the bug.
Fixes#2767.
MOVE was not able to move the TTL: when a key was moved into a different
database number, it became persistent like if PERSIST was used.
In some incredible way (I guess almost nobody uses Redis MOVE) this bug
remained unnoticed inside Redis internals for many years.
Finally Andy Grunwald discovered it and opened an issue.
This commit fixes the bug and adds a regression test.
Close#2765.
This additional info may provide more clues about the test randomly
failing from time to time. Probably the failure is due to some previous
test that overwrites the logical content in the Tcl variable, but this
will make the problem more obvious.
Rationale:
1. The commands look like internals exposed without a real strong use
case.
2. Whatever there is an use case, the client would implement the
commands client side instead of paying RTT just to use a simple to
reimplement library.
3. They add complexity to an otherwise quite straightforward API.
So for now KILLED ;-)
The GIS standard and all the major DBs implementing GIS related
functions take coordinates as x,y that is longitude,latitude.
It was a bad start for Redis to do things differently, so even if this
means that existing users of the Geo module will be required to change
their code, Redis now conforms to the standard.
Usually Redis is very backward compatible, but this is not an exception
to this rule, since this is the first Geo implementation entering the
official Redis source code. It is not wise to try to be backward
compatible with code forks... :-)
Close#2637.
We set random points in the world, pick a random position, and check if
the returned points by Redis match the ones computed by Tcl by brute
forcing all the points using the distance between two points formula.
This approach is sounding since immediately resulted in finding a bug in
the original implementation.
Current todo:
- replace functions in zset.{c,h} with a new unified Redis
zset access API.
Once we get the zset interface fixed, we can squash
relevant commits in this branch and have one nice commit
to merge into unstable.
This commit adds:
- Geo commands
- Tests; runnable with: ./runtest --single unit/geo
- Geo helpers in deps/geohash-int/
- src/geo.{c,h} and src/geojson.{c,h} implementing geo commands
- Updated build configurations to get everything working
- TEMPORARY: src/zset.{c,h} implementing zset score and zset
range reading without writing to client output buffers.
- Modified linkage of one t_zset.c function for use in zset.c
Conflicts:
src/Makefile
src/redis.c
1. HVSTRLEN -> HSTRLEN. It's unlikely one needs the length of the key,
not clear how the API would work (by value does not make sense) and
there will be better names anyway.
2. Default is to return 0 when field is missing.
3. Default is to return 0 when key is missing.
4. The implementation was slower than needed, and produced unnecessary COW.
Related issue #2415.
This test on Linux was extremely slow, since in Tcl we can't enable
easily tcp-nodelay, so the busy loop used to take *a lot* with bigger
writes. Fixed using pipelining.