Commit Graph

4196 Commits

Author SHA1 Message Date
Mota
81fe7a4733 redis-benchmark: default value size usage update.
default size of SET/GET value in usage should be 3 bytes as in main code.
2017-07-25 23:43:46 +08:00
Salvatore Sanfilippo
6b64cc47a0 Merge pull request #2259 from badboy/fix-2258
Check that the whole first argument is a number
2017-07-24 15:19:53 +02:00
Salvatore Sanfilippo
964224b77f Merge pull request #4124 from lamby/proceding-proceeding-typo
Correct proceding -> proceeding typo.
2017-07-24 15:19:21 +02:00
Salvatore Sanfilippo
ae40e5f362 Merge pull request #4125 from trevor211/fixAutoAofRewirteMinSize
fix rewrite config: auto-aof-rewrite-min-size
2017-07-24 15:18:56 +02:00
Salvatore Sanfilippo
25c231c4c1 Merge pull request #1998 from grobe0ba/unstable
Fix missing '-' in redis-benchmark help output (Issue #1996)
2017-07-24 15:18:08 +02:00
Salvatore Sanfilippo
d9565379da Merge pull request #4128 from leonchen83/unstable
fix mismatch argument and return wrong value of clusterDelNodeSlots
2017-07-24 14:18:28 +02:00
liangsijian
ffbbe5a720 Fix lua ldb command log 2017-07-24 19:24:06 +08:00
antirez
314043552b Modules: don't crash when Lua calls a module blocking command.
Lua scripting does not support calling blocking commands, however all
the native Redis commands are flagged as "s" (no scripting flag), so
this is not possible at all. With modules there is no such mechanism in
order to flag a command as non callable by the Lua scripting engine,
moreover we cannot trust the modules users from complying all the times:
it is likely that modules will be released to have blocking commands
without such commands being flagged correctly, even if we provide a way to
signal this fact.

This commit attempts to address the problem in a short term way, by
detecting that a module is trying to block in the context of the Lua
scripting engine client, and preventing to do this. The module will
actually believe to block as usually, but what happens is that the Lua
script receives an error immediately, and the background call is ignored
by the Redis engine (if not for the cleanup callbacks, once it
unblocks).

Long term, the more likely solution, is to introduce a new call called
RedisModule_GetClientFlags(), so that a command can detect if the caller
is a Lua script, and return an error, or avoid blocking at all.

Being the blocking API experimental right now, more work is needed in
this regard in order to reach a level well blocking module commands and
all the other Redis subsystems interact peacefully.

Now the effect is like the following:

    127.0.0.1:6379> eval "redis.call('hello.block',1,5000)" 0
    (error) ERR Error running script (call to
    f_b5ba35ff97bc1ef23debc4d6e9fd802da187ed53): @user_script:1: ERR
    Blocking module command called from Lua script

This commit fixes issue #4127 in the short term.
2017-07-23 12:55:37 +02:00
antirez
5bfdfbe174 Fix typo in unblockClientFromModule() top comment. 2017-07-23 12:41:26 +02:00
antirez
a3778f3b0f Make representClusterNodeFlags() more robust.
This function failed when an internal-only flag was set as an only flag
in a node: the string was trimmed expecting a final comma before
exiting the function, causing a crash. See issue #4142.
Moreover generation of flags representation only needed at DEBUG log
level was always performed: a waste of CPU time. This is fixed as well
by this commit.
2017-07-20 15:17:35 +02:00
antirez
b1c2e1a19c Fix two bugs in moduleTypeLookupModuleByID().
The function cache was not working at all, and the function returned
wrong values if there where two or more modules exporting native data
types.

See issue #4131 for more details.
2017-07-20 14:59:42 +02:00
Leon Chen
9e7a8c0207 fix return wrong value of clusterDelNodeSlots 2017-07-20 17:24:38 +08:00
Leon Chen
2cdf4cc656 fix mismatch argument 2017-07-18 02:28:24 -05:00
WuYunlong
c32c690de6 fix rewrite config: auto-aof-rewrite-min-size 2017-07-15 10:20:56 +08:00
Chris Lamb
7560d347da Correct proceding -> proceeding typo. 2017-07-14 22:53:14 +01:00
antirez
bd1782fa0a Modules: fix thread safe context DB selection.
Before this fix the DB currenty selected by the client blocked was not
respected and operations were always performed on DB 0.
2017-07-14 13:02:15 +02:00
antirez
8eefc9323d Allow certain modules APIs only defining REDISMODULE_EXPERIMENTAL_API.
Those calls may be subject to changes in the future, so the user should
acknowledge it is using non stable API.
2017-07-14 12:07:52 +02:00
antirez
f03947a676 Modules documentation removed from source.
Moving to redis-doc repository to publish via Redis.io.
2017-07-14 11:33:59 +02:00
antirez
43aaf96163 Markdown generation of Redis Modules API reference improved. 2017-07-14 11:29:31 +02:00
antirez
e74f0aa6d1 Fix replication of SLAVEOF inside transaction.
In Redis 4.0 replication, with the introduction of PSYNC2, masters and
slaves replicate commands to cascading slaves and to the replication
backlog itself in a different way compared to the past.

Masters actually replicate the effects of client commands.
Slaves just propagate what they receive from masters.

This mechanism can cause problems when the configuration of an instance
is changed from master to slave inside a transaction. For instance
we could send to a master instance the following sequence:

    MULTI
    SLAVEOF 127.0.0.1 0
    EXEC
    SLAVEOF NO ONE

Before the fixes in this commit, the MULTI command used to be propagated
into the replication backlog, however after the SLAVEOF command the
instance is a slave, so the EXEC implementation failed to also propagate
the EXEC command. When the slaves of the above instance reconnected,
they were incrementally synchronized just sending a "MULTI". This put
the master client (in the slaves) into MULTI state, breaking the
replication.

Notably even Redis Sentinel uses the above approach in order to guarantee
that configuration changes are always performed together with rewrites
of the configuration and with clients disconnection. Sentiel does:

    MULTI
    SLAVEOF ...
    CONFIG REWRITE
    CLIENT KILL TYPE normal
    EXEC

So this was a really problematic issue. However even with the fix in
this commit, that will add the final EXEC to the replication stream in
case the instance was switched from master to slave during the
transaction, the result would be to increment the slave replication
offset, so a successive reconnection with the new master, will not
permit a successful partial resynchronization: no way the new master can
provide us with the backlog needed, we incremented our offset to a value
that the new master cannot have.

However the EXEC implementation waits to emit the MULTI, so that if the
commands inside the transaction actually do not need to be replicated,
no commands propagation happens at all. From multi.c:

    if (!must_propagate && !(c->cmd->flags & (CMD_READONLY|CMD_ADMIN))) {
	execCommandPropagateMulti(c);
	must_propagate = 1;
    }

The above code is already modified by this commit you are reading.
Now also ADMIN commands do not trigger the emission of MULTI. It is actually
not clear why we do not just check for CMD_WRITE... Probably I wrote it this
way in order to make the code more reliable: better to over-emit MULTI
than not emitting it in time.

So this commit should indeed fix issue #3836 (verified), however it looks
like some reconsideration of this code path is needed in the long term.

BONUS POINT: The reverse bug.

Even in a read only slave "B", in a replication setup like:

	A -> B -> C

There are commands without the READONLY nor the ADMIN flag, that are also
not flagged as WRITE commands. An example is just the PING command.

So if we send B the following sequence:

    MULTI
    PING
    SLAVEOF NO ONE
    EXEC

The result will be the reverse bug, where only EXEC is emitted, but not the
previous MULTI. However this apparently does not create problems in practice
but it is yet another acknowledge of the fact some work is needed here
in order to make this code path less surprising.

Note that there are many different approaches we could follow. For instance
MULTI/EXEC blocks containing administrative commands may be allowed ONLY
if all the commands are administrative ones, otherwise they could be
denined. When allowed, the commands could simply never be replicated at all.
2017-07-12 11:07:28 +02:00
antirez
e1b8b4b6da CLUSTER GETKEYSINSLOT: avoid overallocating.
Close #3911.
2017-07-11 15:49:09 +02:00
antirez
5bd46d33db Fix isHLLObjectOrReply() to handle integer encoded strings.
Close #3766.
2017-07-11 12:44:59 +02:00
antirez
e203a46cf3 Clients blocked in modules: free argv/argc later.
See issue #3844 for more information.
2017-07-11 12:33:01 +02:00
antirez
14c32c3569 Merge branch 'unstable' of github.com:/antirez/redis into unstable 2017-07-11 09:46:58 +02:00
antirez
54e4bbeabd Event loop: call after sleep() only from top level.
In general we do not want before/after sleep() callbacks to be called
when we re-enter the event loop, since those calls are only designed in
order to perform operations every main iteration of the event loop, and
re-entering is often just a way to incrementally serve clietns with
error messages or other auxiliary operations. However, if we call the
callbacks, we are then forced to think at before/after sleep callbacks
as re-entrant, which is much harder without any good need.

However here there was also a clear bug: beforeSleep() was actually
never called when re-entering the event loop. But the new afterSleep()
callback was. This is broken and in this instance re-entering
afterSleep() caused a modules GIL dead lock.
2017-07-11 00:13:52 +02:00
Salvatore Sanfilippo
58104d8327 Merge pull request #4113 from guybe7/module_io_bytes
Modules: Fix io->bytes calculation in RDB save
2017-07-10 19:14:34 +02:00
antirez
11182a1a58 redis-check-aof: tell users there is a --fix option. 2017-07-10 16:41:25 +02:00
Guy Benoish
dfb68cd235 Modules: Fix io->bytes calculation in RDB save 2017-07-10 14:41:57 +03:00
antirez
fc7ecd8d35 AOF check utility: ability to check files with RDB preamble. 2017-07-10 13:38:23 +02:00
Salvatore Sanfilippo
6b0670daad Merge pull request #3853 from itamarhaber/issue-3851
Sets up fake client to select current db in RM_Call()
2017-07-06 15:02:11 +02:00
Salvatore Sanfilippo
38dd30af42 Merge pull request #4105 from spinlock/unstable-networking
Optimize addReplyBulkSds for better performance
2017-07-06 14:31:08 +02:00
Salvatore Sanfilippo
2d5aa00959 Merge pull request #4106 from petersunbag/unstable
minor fix in listJoin().
2017-07-06 14:29:37 +02:00
sunweinan
87f771bff1 minor fix in listJoin(). 2017-07-06 19:47:21 +08:00
antirez
2b36950e9b Free IO context if any in RDB loading code.
Thanks to @oranagra for spotting this bug.
2017-07-06 11:20:49 +02:00
antirez
51ffd062d3 Modules: DEBUG DIGEST interface. 2017-07-06 11:04:46 +02:00
spinlock
10db81af71 update Makefile for test-sds 2017-07-05 14:32:09 +00:00
spinlock
ea31a4eae3 Optimize addReplyBulkSds for better performance 2017-07-05 14:25:05 +00:00
antirez
f9fac7f777 Avoid closing invalid FDs to make Valgrind happier. 2017-07-05 15:40:25 +02:00
antirez
413c2bc180 Modules: no MULTI/EXEC for commands replicated from async contexts.
They are technically like commands executed from external clients one
after the other, and do not constitute a single atomic entity.
2017-07-05 10:10:20 +02:00
Salvatore Sanfilippo
09dd7b5ff0 Merge pull request #4101 from dvirsky/fix_modules_reply_len
Proposed fix to #4100
2017-07-04 12:01:51 +02:00
antirez
eddd8d34c4 Add symmetrical assertion to track c->reply_buffer infinite growth.
Redis clients need to have an instantaneous idea of the amount of memory
they are consuming (if the number is not exact should at least be
proportional to the actual memory usage). We do that adding and
subtracting the SDS length when pushing / popping from the client->reply
list. However it is quite simple to add bugs in such a setup, by not
taking the objects in the list and the count in sync. For such reason,
Redis has an assertion to track counts near 2^64: those are always the
result of the counter wrapping around because we subtract more than we
add. This commit adds the symmetrical assertion: when the list is empty
since we sent everything, the reply_bytes count should be zero. Thanks
to the new assertion it should be simple to also detect the other
problem, where the count slowly increases because of over-counting.
The assertion adds a conditional in the code that sends the buffer to
the socket but should not create any measurable performance slowdown,
listLength() just accesses a structure field, and this code path is
totally dominated by write(2).

Related to #4100.
2017-07-04 11:55:05 +02:00
Dvir Volk
86e564e9ff fixed #4100 2017-07-04 00:02:19 +03:00
antirez
b2cd9fcab6 Fix GEORADIUS edge case with huge radius.
This commit closes issue #3698, at least for now, since the root cause
was not fixed: the bounding box function, for huge radiuses, does not
return a correct bounding box, there are points still within the radius
that are left outside.

So when using GEORADIUS queries with radiuses in the order of 5000 km or
more, it was possible to see, at the edge of the area, certain points
not correctly reported.

Because the bounding box for now was used just as an optimization, and
such huge radiuses are not common, for now the optimization is just
switched off when the radius is near such magnitude.

Three test cases found by the Continuous Integration test were added, so
that we can easily trigger the bug again, both for regression testing
and in order to properly fix it as some point in the future.
2017-07-03 19:38:31 +02:00
antirez
26e638a8e9 redis-cli --latency: ability to run non interactively.
This feature was proposed by @rosmo in PR #2643 and later redesigned
in order to fit better with the other options for non-interactive modes
of redis-cli. The idea is basically to allow to collect latency
information in scripts, cron jobs or whateever, just running for a
limited time and then producing a single output.
2017-06-30 15:41:58 +02:00
antirez
7bad78bd2f Fix abort typo in Lua debugger help screen. 2017-06-30 12:12:00 +02:00
antirez
f8547e53f0 Added GEORADIUS(BYMEMBER)_RO variants for read-only operations.
Issue #4084 shows how for a design error, GEORADIUS is a write command
because of the STORE option. Because of this it does not work
on readonly slaves, gets redirected to masters in Redis Cluster even
when the connection is in READONLY mode and so forth.

To break backward compatibility at this stage, with Redis 4.0 to be in
advanced RC state, is problematic for the user base. The API can be
fixed into the unstable branch soon if we'll decide to do so in order to
be more consistent, and reease Redis 5.0 with this incompatibility in
the future. This is still unclear.

However, the ability to scale GEO queries in slaves easily is too
important so this commit adds two read-only variants to the GEORADIUS
and GEORADIUSBYMEMBER command: GEORADIUS_RO and GEORADIUSBYMEMBER_RO.
The commands are exactly as the original commands, but they do not
accept the STORE and STOREDIST options.
2017-06-30 10:03:37 +02:00
antirez
01a4b9892d HMSET and MSET implementations unified. HSET now variadic.
This is the first step towards getting rid of HMSET which is a command
that does not make much sense once HSET is variadic, and has a saner
return value.
2017-06-29 17:38:46 +02:00
Salvatore Sanfilippo
634c64dd18 Merge pull request #4075 from sgn1/brpop_keys
Fix Issues in blocking commands in cluster mode.
2017-06-27 17:51:19 +02:00
antirez
365dd037dc RDB modules values serialization format version 2.
The original RDB serialization format was not parsable without the
module loaded, becuase the structure was managed only by the module
itself. Moreover RDB is a streaming protocol in the sense that it is
both produce di an append-only fashion, and is also sometimes directly
sent to the socket (in the case of diskless replication).

The fact that modules values cannot be parsed without the relevant
module loaded is a problem in many ways: RDB checking tools must have
loaded modules even for doing things not involving the value at all,
like splitting an RDB into N RDBs by key or alike, or just checking the
RDB for sanity.

In theory module values could be just a blob of data with a prefixed
length in order for us to be able to skip it. However prefixing the values
with a length would mean one of the following:

1. To be able to write some data at a previous offset. This breaks
stremaing.
2. To bufferize values before outputting them. This breaks performances.
3. To have some chunked RDB output format. This breaks simplicity.

Moreover, the above solution, still makes module values a totally opaque
matter, with the fowllowing problems:

1. The RDB check tool can just skip the value without being able to at
least check the general structure. For datasets composed mostly of
modules values this means to just check the outer level of the RDB not
actually doing any checko on most of the data itself.
2. It is not possible to do any recovering or processing of data for which a
module no longer exists in the future, or is unknown.

So this commit implements a different solution. The modules RDB
serialization API is composed if well defined calls to store integers,
floats, doubles or strings. After this commit, the parts generated by
the module API have a one-byte prefix for each of the above emitted
parts, and there is a final EOF byte as well. So even if we don't know
exactly how to interpret a module value, we can always parse it at an
high level, check the overall structure, understand the types used to
store the information, and easily skip the whole value.

The change is backward compatible: older RDB files can be still loaded
since the new encoding has a new RDB type: MODULE_2 (of value 7).
The commit also implements the ability to check RDB files for sanity
taking advantage of the new feature.
2017-06-27 13:19:16 +02:00
antirez
c3998728a2 ARM: Fix stack trace generation on crash. 2017-06-26 10:36:16 +02:00
antirez
c9097393bf Issue #4027: unify comment and modify return value in freeMemoryIfNeeded().
It looks safer to return C_OK from freeMemoryIfNeeded() when clients are
paused because returning C_ERR may prevent success of writes. It is
possible that there is no difference in practice since clients cannot
execute writes while clients are paused, but it looks more correct this
way, at least conceptually.

Related to PR #4028.
2017-06-23 11:42:25 +02:00
Salvatore Sanfilippo
936ade80b2 Merge pull request #4028 from zintrepid/prevent_expirations_while_paused
Prevent expirations and evictions while paused
2017-06-23 11:39:02 +02:00
Suraj Narkhede
f85f36f50d Fix following issues in blocking commands:
1. brpop last key index, thus checking all keys for slots.
2. Memory leak in clusterRedirectBlockedClientIfNeeded.
3. Remove while loop in clusterRedirectBlockedClientIfNeeded.
2017-06-23 00:30:21 -07:00
Suraj Narkhede
d303bca587 Fix brpop command table entry and redirect blocked clients. 2017-06-22 23:52:00 -07:00
antirez
8b768e8ea4 Aesthetic changes to #4068 PR to conform to Redis coding standard.
1. Inline if ... statement if short.
2. No lines over 80 columns.
2017-06-22 11:00:34 +02:00
Salvatore Sanfilippo
6476f1a979 Merge pull request #4068 from FreedomU007/unstable
Fix set with ex/px option when propagated to aof
2017-06-22 10:46:58 +02:00
xuzhou
86e9f48a0c Optimize set command with ex/px when updating aof. 2017-06-22 11:06:40 +08:00
Salvatore Sanfilippo
ef446bf16d Merge pull request #3802 from flowly/bugfix-calc-stat-net-output-bytes
Bugfix calc stat net output bytes
2017-06-20 17:01:16 +02:00
Salvatore Sanfilippo
1d857a99d5 Merge pull request #4056 from season89/unstable
Fixed comments of slowlog duration
2017-06-20 16:55:29 +02:00
Salvatore Sanfilippo
0a03187ac4 Merge pull request #3659 from cbgbt/cli-elapsed
cli: Only print elapsed time on OUTPUT_STANDARD.
2017-06-20 16:53:56 +02:00
antirez
2a84927f35 redis-benchmark: add -t hset target. 2017-06-19 09:41:11 +02:00
xuzhou
530fcf8687 Fix set with ex/px option when propagated to aof 2017-06-16 17:51:38 +08:00
antirez
53cb27b1d7 SLOWLOG: log offending client address and name. 2017-06-15 12:57:54 +02:00
antirez
ab9d398835 Merge branch 'unstable' of github.com:/antirez/redis into unstable 2017-06-14 18:29:53 +02:00
Qu Chen
4740424049 Implement getKeys procedure for georadius and georadiusbymember
commands.
2017-06-14 18:15:48 +02:00
xuchengxuan
3fc4bf07cc Fixed comments of slowlog duration 2017-06-14 16:42:21 +08:00
Salvatore Sanfilippo
d3b32ca48d Merge pull request #4034 from amallia/patch-1
Fixed comment in clusterMsg version field
2017-06-13 06:28:23 -07:00
Salvatore Sanfilippo
33035cad04 Merge pull request #4035 from amallia/patch-2
Removed duplicate 'sys/socket.h'  include
2017-06-13 06:27:31 -07:00
antirez
5877c02c51 Fix PERSIST expired key resuscitation issue #4048. 2017-06-13 10:35:51 +02:00
Antonio Mallia
2d1d57eb47 Removed duplicate 'sys/socket.h' include 2017-06-04 15:26:53 +01:00
Antonio Mallia
591dba8055 Fixed comment in clusterMsg version field 2017-06-04 15:09:05 +01:00
Zachary Marquez
a3e53cf9bc Prevent expirations and evictions while paused
Proposed fix to https://github.com/antirez/redis/issues/4027
2017-06-01 16:28:40 -05:00
antirez
e91b81c612 More informative -MISCONF error message. 2017-05-19 12:03:30 +02:00
antirez
e498d9ee3e Collect fork() timing info only if fork succeeded. 2017-05-19 11:10:36 +02:00
antirez
78211aaaaf redis-cli --bigkeys: show error when TYPE fails.
Close #3993.
2017-05-15 11:22:28 +02:00
antirez
1f598fc2bb Modules TSC: use atomic var for server.unixtime.
This avoids Helgrind complaining, but we are actually not using
atomicGet() to get the unixtime value for now: too many places where it
is used and given tha time_t is word-sized it should be safe in all the
archs we support as it is.

On the other hand, Helgrind, when Redis is compiled with "make helgrind"
in order to force the __sync macros, will detect the write in
updateCachedTime() as a read (because atomic functions are used) and
will not complain about races.

This commit also includes minor refactoring of mutex initializations and
a "helgrind" target in the Makefile.
2017-05-10 10:04:16 +02:00
antirez
de786186a5 atomicvar.h: show used API in INFO. Add macro to force __sync builtin.
The __sync builtin can be correctly detected by Helgrind so to force it
is useful for testing. The API in the INFO output can be useful for
debugging after problems are reported.
2017-05-10 09:33:49 +02:00
antirez
6eb51bf1ec zmalloc.c: remove thread safe mode, it's the default way. 2017-05-09 16:59:51 +02:00
antirez
9390c384b8 Modules TSC: Add mutex for server.lruclock.
Only useful for when no atomic builtins are available.
2017-05-09 16:32:49 +02:00
antirez
ece658713b Modules TSC: Improve inter-thread synchronization.
More work to do with server.unixtime and similar. Need to write Helgrind
suppression file in order to suppress the valse positives.
2017-05-09 11:57:09 +02:00
antirez
2a51bac44e Simplify atomicvar.h usage by having the mutex name implicit. 2017-05-04 17:01:00 +02:00
antirez
52bc74f221 Lazyfree: fix lazyfreeGetPendingObjectsCount() race reading counter. 2017-05-04 10:35:40 +02:00
antirez
7d9326b1f3 Modules TSC: HELLO.KEYS reply format fixed. 2017-05-03 23:43:49 +02:00
antirez
9b01b64430 Modules TSC: put the client in the pending write list. 2017-05-03 14:54:48 +02:00
antirez
e67fb915eb adlist: fix final list count in listJoin(). 2017-05-03 14:54:14 +02:00
antirez
79226cb9fa adlist: fix listJoin() to handle empty lists. 2017-05-03 14:15:25 +02:00
antirez
6798736909 Modules: remove unused var in example module. 2017-05-03 14:10:21 +02:00
antirez
1ed2ff5570 Modules TSC: HELLO.KEYS example draft finished. 2017-05-03 14:08:12 +02:00
antirez
7127f15ebe Module: fix RedisModule_Call() "l" specifier to create a raw string. 2017-05-03 14:07:10 +02:00
antirez
3fcf959e60 Modules TSC: Release the GIL for all the time we are blocked.
Instead of giving the module background operations just a small time to
run in the beforeSleep() function, we can have the lock released for all
the time we are blocked in the multiplexing syscall.
2017-05-03 11:26:21 +02:00
antirez
ba4a5a3255 Modules TSC: Export symbols of the new API. 2017-05-02 15:19:28 +02:00
antirez
275905b328 Modules TSC: Handling of RM_Reply* functions. 2017-05-02 15:05:39 +02:00
antirez
9c500b89fb Modules TSC: Basic TS context creeation and handling. 2017-05-02 12:53:10 +02:00
antirez
59b06b14c9 Modules TSC: GIL and cooperative multi tasking setup. 2017-04-28 18:41:10 +02:00
antirez
469d6e2b37 PSYNC2: fix master cleanup when caching it.
The master client cleanup was incomplete: resetClient() was missing and
the output buffer of the client was not reset, so pending commands
related to the previous connection could be still sent.

The first problem caused the client argument vector to be, at times,
half populated, so that when the correct replication stream arrived the
protcol got mixed to the arugments creating invalid commands that nobody
called.

Thanks to @yangsiran for also investigating this problem, after
already providing important design / implementation hints for the
original PSYNC2 issues (see referenced Github issue).

Note that this commit adds a new function to the list library of Redis
in order to be able to reset a list without destroying it.

Related to issue #3899.
2017-04-27 17:08:37 +02:00
antirez
238cebdd5e Check event loop creation return value. Fix #3951.
Normally we never check for OOM conditions inside Redis since the
allocator will always return a pointer or abort the program on OOM
conditons. However we cannot have control on epool_create(), that may
fail for kernel OOM (according to the manual page) even if all the
parameters are correct, so the function aeCreateEventLoop() may indeed
return NULL and this condition must be checked.
2017-04-21 16:27:38 +02:00
Salvatore Sanfilippo
3773c06d28 Merge pull request #3950 from kensou97/unstable
update block->free after some diff data are written to the child process
2017-04-20 07:55:51 +02:00
antirez
7d9dd80db3 Fix getKeysUsingCommandTable() in cluster mode.
Close #3940.
2017-04-19 16:17:08 +02:00
antirez
189a12afb4 PSYNC2: discard pending transactions from cached master.
During the review of the fix for #3899, @yangsiran identified an
implementation bug: given that the offset is now relative to the applied
part of the replication log, when we cache a master, the successive
PSYNC2 request will be made in order to *include* the transaction that
was not completely processed. This means that we need to discard any
pending transaction from our replication buffer: it will be re-executed.
2017-04-19 14:02:52 +02:00
antirez
22be435efe Fix PSYNC2 incomplete command bug as described in #3899.
This bug was discovered by @kevinmcgehee and constituted a major hidden
bug in the PSYNC2 implementation, caused by the propagation from the
master of incomplete commands to slaves.

The bug had several results:

1. Borrowing from Kevin text in the issue: "Given that slaves blindly
copy over their master's input into their own replication backlog over
successive read syscalls, it's possible that with large commands or
small TCP buffers, partial commands are present in this buffer. If the
master were to fail before successfully propagating the entire command
to a slave, the slaves will never execute the partial command (since the
client is invalidated) but will copy it to replication backlog which may
relay those invalid bytes to its slaves on PSYNC2, corrupting the
backlog and possibly other valid commands that follow the failover.
Simple command boundaries aren't sufficient to capture this, either,
because in the case of a MULTI/EXEC block, if the master successfully
propagates a subset of the commands but not the EXEC, then the
transaction in the backlog becomes corrupt and could corrupt other
slaves that consume this data."

2. As identified by @yangsiran later, there is another effect of the
bug. For the same mechanism of the first problem, a slave having another
slave, could receive a full resynchronization request with an already
half-applied command in the backlog. Once the RDB is ready, it will be
sent to the slave, and the replication will continue sending to the
sub-slave the other half of the command, which is not valid.

The fix, designed by @yangsiran and @antirez, and implemented by
@antirez, uses a secondary buffer in order to feed the sub-masters and
update the replication backlog and offsets, only when a given part of
the query buffer is actually *applied* to the state of the instance,
that is, when the command gets processed and the command is not pending
in the Redis transaction buffer because of CLIENT_MULTI state.

Given that now the backlog and offsets representation are in agreement
with the actual processed commands, both issue 1 and 2 should no longer
be possible.

Thanks to @kevinmcgehee, @yangsiran and @oranagra for their work in
identifying and designing a fix for this problem.
2017-04-19 10:25:45 +02:00