Commit Graph

226 Commits

Author SHA1 Message Date
Salvatore Sanfilippo
6204d8c139
Merge pull request #5367 from nUl1/fullresync-stopbgsave
Prevent RDB autosave from overwriting full resync results
2018-10-31 11:42:04 +01:00
antirez
3d07ed983e Fix typo in replicationCron() comment. 2018-10-05 18:30:45 +02:00
Andrey Bugaevskiy
466c277b4f Move child termination to readSyncBulkPayload 2018-09-27 19:38:58 +03:00
Andrey Bugaevskiy
98a64523c4 Prevent RDB autosave from overwriting full resync results
During the full database resync we may still have unsaved changes
on the receiving side. This causes a race condition between
synced data rename/load and the rename of rdbSave tempfile.
2018-09-19 19:58:39 +03:00
antirez
61b7a176ef Slave removal: replication.c logs fixed. 2018-09-11 15:32:28 +02:00
antirez
ef2c7a5bbb Slave removal: SLAVEOF -> REPLICAOF. SLAVEOF is now an alias. 2018-09-11 15:32:28 +02:00
Oran Agra
d55598988b fix rare replication stream corruption with disk-based replication
The slave sends \n keepalive messages to the master while parsing the rdb,
and later sends REPLCONF ACK once a second. rarely, the master recives both
a linefeed char and a REPLCONF in the same read, \n*3\r\n$8\r\nREPLCONF\r\n...
and it tries to trim two chars (\r\n) from the query buffer,
trimming the '*' from *3\r\n$8\r\nREPLCONF\r\n...

then the master tries to process a command starting with '3' and replies to
the slave a bunch of -ERR and one +OK.
although the slave silently ignores these (prints a log message), this corrupts
the replication offset at the slave since the slave increases the replication
offset, and the master did not.

other than the fix in processInlineBuffer, i did several other improvments
while hunting this very rare bug.

- when redis replies with "unknown command" it includes a portion of the
  arguments, not just the command name. so it would be easier to understand
  what was recived, in my case, on the slave side,  it was -ERR, but
  the "arguments" were the interesting part (containing info on the error).
- about a year ago i added code in addReplyErrorLength to print the error to
  the log in case of a reply to master (since this string isn't actually
  trasmitted to the master), now changed that block to print a similar log
  message to indicate an error being sent from the master to the slave.
  note that the slave is marked as CLIENT_SLAVE only after PSYNC was received,
  so this will not cause any harm for REPLCONF, and will only indicate problems
  that are gonna corrupt the replication stream anyway.
- two places were c->reply was emptied, and i wanted to reset sentlen
  this is a precaution (i did not actually see such a problem), since a
  non-zero sentlen will cause corruption to be transmitted on the socket.
2018-07-17 12:51:49 +03:00
Oran Agra
bf680b6f8c slave buffers were wasteful and incorrectly counted causing eviction
A) slave buffers didn't count internal fragmentation and sds unused space,
   this caused them to induce eviction although we didn't mean for it.

B) slave buffers were consuming about twice the memory of what they actually needed.
- this was mainly due to sdsMakeRoomFor growing to twice as much as needed each time
  but networking.c not storing more than 16k (partially fixed recently in 237a38737).
- besides it wasn't able to store half of the new string into one buffer and the
  other half into the next (so the above mentioned fix helped mainly for small items).
- lastly, the sds buffers had up to 30% internal fragmentation that was wasted,
  consumed but not used.

C) inefficient performance due to starting from a small string and reallocing many times.

what i changed:
- creating dedicated buffers for reply list, counting their size with zmalloc_size
- when creating a new reply node from, preallocate it to at least 16k.
- when appending a new reply to the buffer, first fill all the unused space of the
  previous node before starting a new one.

other changes:
- expose mem_not_counted_for_evict info field for the benefit of the test suite
- add a test to make sure slave buffers are counted correctly and that they don't cause eviction
2018-07-16 16:43:42 +03:00
Jack Drogon
93238575f7 Fix typo 2018-07-03 18:19:46 +02:00
antirez
677d10b2a8 Set repl_down_since to zero on state change.
PR #5081 fixes an "interesting" bug about Redis Cluster failover but in
general about the updating of repl_down_since, that is used in order to
count the time a slave was left disconnected from its master.

While the fix provided resolves the specific issue, in general the
validity of repl_down_since is limited to states that are different
than the state CONNECTED, and the disconnected time is set when the
state is DISCONNECTED. However from CONNECTED to other states, the state
machine must always go to DISCONNECTED first. So it makes sense to set
the field to zero (since it is meaningless in that context) when the
state is set to CONNECTED.
2018-07-03 12:42:14 +02:00
WuYunlong
2e167f7d0e fix server.repl_down_since resetting, so that slaves could failover
automatically as expected.
2018-06-30 09:39:08 +08:00
antirez
27178a3fde Fix type of argslen in sendSynchronousCommand().
Related to #5037.
2018-06-26 14:38:35 +02:00
antirez
1f1e724f47 Remove black space. 2018-06-26 14:37:22 +02:00
Madelyn Olson
45731edc4b Addressed comments 2018-06-26 00:57:35 +00:00
Madelyn Olson
e8d68b6b72 Fixed replication authentication with whitespace in password 2018-06-26 00:48:37 +00:00
shenlongxing
c85ae56edc Fix write() errno error 2018-06-06 13:06:42 +02:00
Salvatore Sanfilippo
4aa2ecd98b
Merge pull request #4269 from jianqingdu/unstable
fix not call va_end() when syncWrite() failed
2018-01-24 10:55:25 +01:00
antirez
b23927b240 Hopefully more clear comment to explain the change in #4607. 2018-01-16 15:52:13 +01:00
Oran Agra
689b64c3ad PSYNC2 fix - promoted slave should hold on to it's backlog
after a slave is promoted (assuming it has no slaves
and it booted over an hour ago), it will lose it's replication
backlog at the next replication cron, rather than waiting for slaves
to connect to it.
so on a simple master/slave faiover, if the new slave doesn't connect
immediately, it may be too later and PSYNC2 will fail.
2018-01-16 10:10:42 +02:00
antirez
62a4b817c6 add linkClient(): adds the client and caches the list node.
We have this operation in two places: when caching the master and
when linking a new client after the client creation. By having an API
for this we avoid incurring in errors when modifying one of the two
places forgetting the other. The function is also a good place where to
document why we cache the linked list node.

Related to #4497 and #4210.
2017-12-05 16:02:03 +01:00
zhaozhao.zz
43be967690 networking: optimize unlinkClient() in freeClient() 2017-11-30 18:11:05 +08:00
antirez
4d063bb6ba PSYNC2: reorganize comments related to recent fixes.
Related to PR #4412 and issue #4407.
2017-11-24 11:08:29 +01:00
zhaozhao.zz
6ddf0ea293 PSYNC2: safe free backlog when reach the time limit
When we free the backlog, we should use a new
replication ID and clear the ID2. Since without
backlog we can not increment master_repl_offset
even do write commands, that may lead to inconsistency
when we try to connect a "slave-before" master
(if this master is our slave before, our replid
equals the master's replid2). As the master have our
history, so we can match the master's replid2 and
second_replid_offset, that make partial sync work,
but the data is inconsistent.
2017-11-01 17:32:27 +08:00
antirez
bb3b5ddd19 PSYNC2: More refinements related to #4316. 2017-09-20 11:28:13 +02:00
zhaozhao.zz
b541ccef25 PSYNC2: make persisiting replication info more solid
This commit is a reinforcement of commit c1c99e9.

1. Replication information can be stored when the RDB file is
generated by a mater using server.slaveseldb when server.repl_backlog
is not NULL, or set repl_stream_db be -1. That's safe, because
NULL server.repl_backlog will trigger full synchronization,
then master will send SELECT command to replicaiton stream.
2. Only do rdbSave* when rsiptr is not NULL,
if we do rdbSave* without rdbSaveInfo, slave will miss repl-stream-db.
3. Save the replication informations also in the case of
SAVE command, FLUSHALL command and DEBUG reload.
2017-09-20 11:18:10 +02:00
antirez
c1c99e9f4e PSYNC2: Fix the way replication info is saved/loaded from RDB.
This commit attempts to fix a number of bugs reported in #4316.
They are related to the way replication info like replication ID,
offsets, and currently selected DB in the master client, are stored
and loaded by Redis. In order to avoid inconsistencies the changes in
this commit try to enforce that:

1. Replication information are only stored when the RDB file is
generated by a slave that has a valid 'master' client, so that we can
always extract the currently selected DB.
2. When replication informations are persisted in the RDB file, all the
info for a successful PSYNC or nothing is persisted.
3. The RDB replication informations are only loaded if the instance is
configured as a slave, otherwise a master can start with IDs that relate
to a different history of the data set, and stil retain such IDs in the
future while receiving unrelated writes.
2017-09-19 23:03:39 +02:00
antirez
b75ae0bbea PSYNC2: Create backlog on slave partial sync as well.
A slave may be started with an RDB file able to provide enough slave to
perform a successful partial SYNC with its master. However in such a
case, how outlined in issue #4268, the slave backlog will not be
started, since it was only initialized on full syncs attempts. This
creates different problems with successive PSYNC attempts that will
always result in full synchronizations.

Thanks to @fdingiit for discovering the issue.
2017-09-19 10:33:14 +02:00
jianqingdu
498f65ffb7 fix not call va_end when syncWrite() failed
fix not call va_end when syncWrite() failed in sendSynchronousCommand()
2017-08-30 21:20:14 -05:00
antirez
469d6e2b37 PSYNC2: fix master cleanup when caching it.
The master client cleanup was incomplete: resetClient() was missing and
the output buffer of the client was not reset, so pending commands
related to the previous connection could be still sent.

The first problem caused the client argument vector to be, at times,
half populated, so that when the correct replication stream arrived the
protcol got mixed to the arugments creating invalid commands that nobody
called.

Thanks to @yangsiran for also investigating this problem, after
already providing important design / implementation hints for the
original PSYNC2 issues (see referenced Github issue).

Note that this commit adds a new function to the list library of Redis
in order to be able to reset a list without destroying it.

Related to issue #3899.
2017-04-27 17:08:37 +02:00
antirez
189a12afb4 PSYNC2: discard pending transactions from cached master.
During the review of the fix for #3899, @yangsiran identified an
implementation bug: given that the offset is now relative to the applied
part of the replication log, when we cache a master, the successive
PSYNC2 request will be made in order to *include* the transaction that
was not completely processed. This means that we need to discard any
pending transaction from our replication buffer: it will be re-executed.
2017-04-19 14:02:52 +02:00
antirez
22be435efe Fix PSYNC2 incomplete command bug as described in #3899.
This bug was discovered by @kevinmcgehee and constituted a major hidden
bug in the PSYNC2 implementation, caused by the propagation from the
master of incomplete commands to slaves.

The bug had several results:

1. Borrowing from Kevin text in the issue: "Given that slaves blindly
copy over their master's input into their own replication backlog over
successive read syscalls, it's possible that with large commands or
small TCP buffers, partial commands are present in this buffer. If the
master were to fail before successfully propagating the entire command
to a slave, the slaves will never execute the partial command (since the
client is invalidated) but will copy it to replication backlog which may
relay those invalid bytes to its slaves on PSYNC2, corrupting the
backlog and possibly other valid commands that follow the failover.
Simple command boundaries aren't sufficient to capture this, either,
because in the case of a MULTI/EXEC block, if the master successfully
propagates a subset of the commands but not the EXEC, then the
transaction in the backlog becomes corrupt and could corrupt other
slaves that consume this data."

2. As identified by @yangsiran later, there is another effect of the
bug. For the same mechanism of the first problem, a slave having another
slave, could receive a full resynchronization request with an already
half-applied command in the backlog. Once the RDB is ready, it will be
sent to the slave, and the replication will continue sending to the
sub-slave the other half of the command, which is not valid.

The fix, designed by @yangsiran and @antirez, and implemented by
@antirez, uses a secondary buffer in order to feed the sub-masters and
update the replication backlog and offsets, only when a given part of
the query buffer is actually *applied* to the state of the instance,
that is, when the command gets processed and the command is not pending
in the Redis transaction buffer because of CLIENT_MULTI state.

Given that now the backlog and offsets representation are in agreement
with the actual processed commands, both issue 1 and 2 should no longer
be possible.

Thanks to @kevinmcgehee, @yangsiran and @oranagra for their work in
identifying and designing a fix for this problem.
2017-04-19 10:25:45 +02:00
antirez
104584b95e Fix typo in feedReplicationBacklog() top comment. 2017-04-12 12:28:05 +02:00
antirez
76d87f47c7 Don't leak file descriptor on syncWithMaster().
Close #3804.
2017-02-20 10:18:41 +01:00
antirez
8e390a62ad Hopefully improve code comments for issue #3616.
This commit also contains other changes in order to conform the code to
the Redis core style, specifically 80 chars max per line, smart
conditionals in the same line:

    if (that) do_this();
2016-12-16 17:48:38 +01:00
Salvatore Sanfilippo
ca4ca5073e Merge pull request #3616 from oranagra/stop_aofrw_before_rdbload
CoW improvement, stop AOFRW before flushing and parsing slave RDB
2016-12-16 17:43:20 +01:00
antirez
434e6b2da3 PSYNC2: Do not accept WAIT in slave instances.
No longer makes sense since writable slaves only do local writes now:
writes are no longer passed to sub-slaves in the stream.
2016-12-02 10:21:20 +01:00
antirez
6eb720ff2d PSYNC2: Minor memory leak reading -NOMASTERLINK master reply fixed. 2016-11-29 10:25:00 +01:00
antirez
eab865a0a1 PSYNC2: stop sending newlines to sub-slaves when master is down.
This actually includes two changes:

1) No newlines to take the master-slave link up when the upstream master
is down. Doing this is dangerous because the sub-slave often is received
replication protocol for an half-command, so can't receive newlines
without desyncing the replication link, even with the code in order to
cancel out the bytes that PSYNC2 was using. Moreover this is probably
also not needed/sane, because anyway the slave can keep serving
requests, and because if it's configured to don't serve stale data, it's
a good idea, actually, to break the link.

2) When a +CONTINUE with a different ID is received, we now break
connection with the sub-slaves: they need to be notified as well. This
was part of the original specification but for some reason it was not
implemented in the code, and was alter found as a PSYNC2 bug in the
integration testing.
2016-11-28 17:54:04 +01:00
antirez
e09e31b12e PSYNC2: on transient error jump to error, not write_error. 2016-11-24 15:48:18 +01:00
antirez
5b7d42fff3 PSYNC2: bugfixing pre release.
1. Master replication offset was cleared after switching configuration
to some other slave, since it was assumed you can't PSYNC after a
switch. Note the case anymore and when we successfully PSYNC we need to
have our offset untouched.

2. Secondary replication ID was not reset to "000..." pattern at
startup.

3. Master in error state replying -LOADING or other transient errors
forced the slave to discard the cached master and full resync. This is
now fixed.

4. Better logging of what's happening on failed PSYNCs.
2016-11-23 17:36:45 +01:00
Salvatore Sanfilippo
5b83fa482c Merge pull request #3612 from deep011/unstable
fix a possible bug for 'replconf getack'
2016-11-18 10:45:09 +01:00
oranagra
e3a61950a2 when a slave loads an RDB, stop an AOFRW fork before flusing db and parsing rdb file, to avoid a CoW disaster. 2016-11-16 21:30:59 +02:00
deep011
13a92a5bb1 fix a possible bug for 'replconf getack' 2016-11-16 11:04:33 +08:00
antirez
28c96d73b2 PSYNC2: Save replication ID/offset on RDB file.
This means that stopping a slave and restarting it will still make it
able to PSYNC with the master. Moreover the master itself will retain
its ID/offset, in case it gets turned into a slave, or if a slave will
try to PSYNC with it with an exactly updated offset (otherwise there is
no backlog).

This change was possible thanks to PSYNC v2 that makes saving the current
replication state much simpler.
2016-11-10 12:35:29 +01:00
antirez
4e5e366ed2 PSYNC2: Wrap debugging code with if(0) 2016-11-09 15:37:15 +01:00
antirez
2669fb8364 PSYNC2: different improvements to Redis replication.
The gist of the changes is that now, partial resynchronizations between
slaves and masters (without the need of a full resync with RDB transfer
and so forth), work in a number of cases when it was impossible
in the past. For instance:

1. When a slave is promoted to mastrer, the slaves of the old master can
partially resynchronize with the new master.

2. Chained slalves (slaves of slaves) can be moved to replicate to other
slaves or the master itsef, without requiring a full resync.

3. The master itself, after being turned into a slave, is able to
partially resynchronize with the new master, when it joins replication
again.

In order to obtain this, the following main changes were operated:

* Slaves also take a replication backlog, not just masters.

* Same stream replication for all the slaves and sub slaves. The
replication stream is identical from the top level master to its slaves
and is also the same from the slaves to their sub-slaves and so forth.
This means that if a slave is later promoted to master, it has the
same replication backlong, and can partially resynchronize with its
slaves (that were previously slaves of the old master).

* A given replication history is no longer identified by the `runid` of
a Redis node. There is instead a `replication ID` which changes every
time the instance has a new history no longer coherent with the past
one. So, for example, slaves publish the same replication history of
their master, however when they are turned into masters, they publish
a new replication ID, but still remember the old ID, so that they are
able to partially resynchronize with slaves of the old master (up to a
given offset).

* The replication protocol was slightly modified so that a new extended
+CONTINUE reply from the master is able to inform the slave of a
replication ID change.

* REPLCONF CAPA is used in order to notify masters that a slave is able
to understand the new +CONTINUE reply.

* The RDB file was extended with an auxiliary field that is able to
select a given DB after loading in the slave, so that the slave can
continue receiving the replication stream from the point it was
disconnected without requiring the master to insert "SELECT" statements.
This is useful in order to guarantee the "same stream" property, because
the slave must be able to accumulate an identical backlog.

* Slave pings to sub-slaves are now sent in a special form, when the
top-level master is disconnected, in order to don't interfer with the
replication stream. We just use out of band "\n" bytes as in other parts
of the Redis protocol.

An old design document is available here:

https://gist.github.com/antirez/ae068f95c0d084891305

However the implementation is not identical to the description because
during the work to implement it, different changes were needed in order
to make things working well.
2016-11-09 15:37:15 +01:00
charsyam
ca6fc4f031 Simple change just using slaves instead of server.slaves 2016-09-24 15:53:57 +09:00
Qu Chen
d982f44372 Fix a bug to delay bgsave while AOF rewrite in progress for replication 2016-08-02 10:44:33 +02:00
antirez
55385f99de Ability of slave to announce arbitrary ip/port to master.
This feature is useful, especially in deployments using Sentinel in
order to setup Redis HA, where the slave is executed with NAT or port
forwarding, so that the auto-detected port/ip addresses, as listed in
the "INFO replication" output of the master, or as provided by the
"ROLE" command, don't match the real addresses at which the slave is
reachable for connections.
2016-07-27 17:32:15 +02:00
antirez
03f5b508e5 Replication: when possible start RDB saving ASAP.
In a previous commit the replication code was changed in order to
centralize the BGSAVE for replication trigger in replicationCron(),
however after further testings, the 1 second delay imposed by this
change is not acceptable.

So now the BGSAVE is only delayed if the AOF rewriting process is
active. However past comments made sure that replicationCron() is always
able to trigger the BGSAVE when needed, making the code generally more
robust.

The new code is more similar to the initial @oranagra patch where the
BGSAVE was delayed only if an AOF rewrite was in progress.

Trivia: delaying the BGSAVE uncovered a minor Sentinel issue that is now
fixed.
2016-07-22 17:03:18 +02:00