
285 Commits

Author SHA1 Message Date
oranagra
e3a61950a2 when a slave loads an RDB, stop an AOFRW fork before flushing the DB and parsing the RDB file, to avoid a CoW disaster. 2016-11-16 21:30:59 +02:00
deep011
13a92a5bb1 fix a possible bug for 'replconf getack' 2016-11-16 11:04:33 +08:00
antirez
28c96d73b2 PSYNC2: Save replication ID/offset on RDB file.
This means that stopping a slave and restarting it will still make it
able to PSYNC with the master. Moreover the master itself will retain
its ID/offset, in case it gets turned into a slave, or if a slave will
try to PSYNC with it with an exactly updated offset (otherwise there is
no backlog).

This change was possible thanks to PSYNC v2 that makes saving the current
replication state much simpler.
2016-11-10 12:35:29 +01:00
antirez
4e5e366ed2 PSYNC2: Wrap debugging code with if(0) 2016-11-09 15:37:15 +01:00
antirez
2669fb8364 PSYNC2: different improvements to Redis replication.
The gist of the changes is that now, partial resynchronizations between
slaves and masters (without the need of a full resync with RDB transfer
and so forth), work in a number of cases where it was impossible
in the past. For instance:

1. When a slave is promoted to master, the slaves of the old master can
partially resynchronize with the new master.

2. Chained slaves (slaves of slaves) can be moved to replicate to other
slaves or the master itself, without requiring a full resync.

3. The master itself, after being turned into a slave, is able to
partially resynchronize with the new master, when it joins replication
again.

In order to obtain this, the following main changes were made:

* Slaves also take a replication backlog, not just masters.

* Same stream replication for all the slaves and sub slaves. The
replication stream is identical from the top level master to its slaves
and is also the same from the slaves to their sub-slaves and so forth.
This means that if a slave is later promoted to master, it has the
same replication backlog, and can partially resynchronize with its
slaves (that were previously slaves of the old master).

* A given replication history is no longer identified by the `runid` of
a Redis node. There is instead a `replication ID` which changes every
time the instance has a new history no longer coherent with the past
one. So, for example, slaves publish the same replication history of
their master, however when they are turned into masters, they publish
a new replication ID, but still remember the old ID, so that they are
able to partially resynchronize with slaves of the old master (up to a
given offset).

* The replication protocol was slightly modified so that a new extended
+CONTINUE reply from the master is able to inform the slave of a
replication ID change.

* REPLCONF CAPA is used in order to notify masters that a slave is able
to understand the new +CONTINUE reply.

* The RDB file was extended with an auxiliary field that is able to
select a given DB after loading in the slave, so that the slave can
continue receiving the replication stream from the point it was
disconnected without requiring the master to insert "SELECT" statements.
This is useful in order to guarantee the "same stream" property, because
the slave must be able to accumulate an identical backlog.

* Slave pings to sub-slaves are now sent in a special form when the
top-level master is disconnected, so as not to interfere with the
replication stream. We just use out-of-band "\n" bytes as in other parts
of the Redis protocol.

An old design document is available here:

https://gist.github.com/antirez/ae068f95c0d084891305

However the implementation is not identical to the description because
during the work to implement it, different changes were needed in order
to make things work well.
2016-11-09 15:37:15 +01:00
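The PSYNC acceptance rule described above (the requested replication ID must be the current one, or the previous one up to the offset where the histories split, and the requested offset must still be in the backlog) can be sketched in C. This is a simplified model under stated assumptions, not the Redis implementation: the repl_state struct, its fields and can_partial_resync() are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified view of a node's replication state. */
typedef struct {
    char replid[41];              /* current replication ID */
    char replid2[41];             /* previous ID, kept after a promotion */
    int64_t second_replid_offset; /* last offset still valid for replid2 */
    int64_t backlog_off;          /* offset of the first byte in the backlog */
    int64_t backlog_histlen;      /* bytes of history held in the backlog */
} repl_state;

/* Return true if "PSYNC <replid> <offset>" could be served with a partial
 * resync instead of a full RDB transfer. */
bool can_partial_resync(const repl_state *s, const char *replid, int64_t offset) {
    assert(s != NULL && replid != NULL);
    bool same_history = strcmp(replid, s->replid) == 0;
    bool old_history = strcmp(replid, s->replid2) == 0 &&
                       offset <= s->second_replid_offset;
    if (!same_history && !old_history) return false;
    /* The requested offset must fall inside the backlog window. */
    if (offset < s->backlog_off ||
        offset > s->backlog_off + s->backlog_histlen) return false;
    return true;
}
```

Keeping the old ID together with a cut-off offset is what allows the slaves of a former master to continue with a promoted slave, but only up to the point where the two histories diverged.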
charsyam
ca6fc4f031 Simple change just using slaves instead of server.slaves 2016-09-24 15:53:57 +09:00
Qu Chen
d982f44372 Fix a bug to delay bgsave while AOF rewrite in progress for replication 2016-08-02 10:44:33 +02:00
antirez
55385f99de Ability of slave to announce arbitrary ip/port to master.
This feature is useful, especially in deployments using Sentinel in
order to set up Redis HA, where the slave runs behind NAT or port
forwarding, so that the auto-detected port/ip addresses, as listed in
the "INFO replication" output of the master, or as provided by the
"ROLE" command, don't match the real addresses at which the slave is
reachable for connections.
2016-07-27 17:32:15 +02:00
antirez
03f5b508e5 Replication: when possible start RDB saving ASAP.
In a previous commit the replication code was changed in order to
centralize the BGSAVE for replication trigger in replicationCron(),
however after further testing, the 1 second delay imposed by this
change is not acceptable.

So now the BGSAVE is only delayed if the AOF rewriting process is
active. However, past commits made sure that replicationCron() is always
able to trigger the BGSAVE when needed, making the code generally more
robust.

The new code is more similar to the initial @oranagra patch where the
BGSAVE was delayed only if an AOF rewrite was in progress.

Trivia: delaying the BGSAVE uncovered a minor Sentinel issue that is now
fixed.
2016-07-22 17:03:18 +02:00
antirez
780a8b1d76 Replication: start BGSAVE for replication always in replicationCron().
This makes the replication code conceptually simpler by removing the
synchronous BGSAVE trigger in syncCommand(). This also means that
socket and disk BGSAVE targets are handled by the same code.
2016-07-21 12:10:56 +02:00
antirez
acc2336fd1 Centralize slave replication handshake aborting.
Now we have a single function to call in any state of the slave
handshake, instead of using different functions for different states
which is error prone. This change was performed in the context of issue #2479 but
does not fix it, since it should be functionally identical to the old code.
Just an attempt to make replication.c simpler to follow.
2015-12-03 10:38:56 +01:00
antirez
ed6228851c PR 2813 fix ported to unstable. 2015-10-15 10:20:09 +02:00
antirez
252cfa0a39 Lazyfree: cond vars to enable/disable it based on DEL context. 2015-10-02 15:27:57 +02:00
antirez
c69c6c80fb Lazyfree: ability to free whole DBs in background. 2015-10-01 13:02:26 +02:00
antirez
1e7153831d Refactoring: unlinkClient() added to lower freeClient() complexity. 2015-09-30 17:10:03 +02:00
antirez
fdb3be939e Refactoring: new function to test if client has pending output. 2015-09-30 16:41:48 +02:00
antirez
1c7d87df0c Avoid installing the client write handler when possible. 2015-09-30 16:29:41 +02:00
antirez
d036abe27d Log client details on SLAVEOF command having an effect. 2015-08-21 15:29:07 +02:00
antirez
f18e5b634d startBgsaveForReplication(): handle waiting slaves state change.
Before this commit, after triggering a BGSAVE it was up to the caller of
startBgsaveForReplication() to handle slaves in WAIT_BGSAVE_START in
order to update them accordingly. However when the replication target is
the socket, this is not possible since the process of updating the
slaves and sending the FULLRESYNC reply must be coupled with the process
of starting an RDB save (the reason is, we need to send the FULLRESYNC
reply and spawn a child that will start to send RDB data to the slaves
ASAP).

This commit moves the responsibility of handling slaves in
WAIT_BGSAVE_START to startBgsaveForReplication() so that for both
diskless and disk-based replication we have the same chain of
responsibility. In order to accommodate such change, syncCommand() also
needs to put the client in the slave list ASAP (just after the initial
checks) and not at the end, so that startBgsaveForReplication() can find
the new slave already in the list.

Another related change is what happens if the BGSAVE fails because of
fork() or other errors: we now remove the slave from the list of slaves
and send an error, scheduling the slave connection to be terminated.

As a side effect of this change the following errors found by
Oran Agra are fixed (thanks!):

1. rdbSaveToSlavesSockets() on failed fork will get the slaves cleaned
up, otherwise they remain in a wrong state forever since we set them up
for full resync before actually trying to fork.

2. updateSlavesWaitingBgsave() with replication target set as "socket"
was broken since the function changed the slaves state from
WAIT_BGSAVE_START to WAIT_BGSAVE_END via
replicationSetupSlaveForFullResync(), so later rdbSaveToSlavesSockets()
will not find any slave in the right state (WAIT_BGSAVE_START) to feed.
2015-08-20 17:39:48 +02:00
antirez
bea1259190 slaveTryPartialResynchronization and syncWithMaster: better synergy.
It is simpler if removing the read event handler from the FD is up to
slaveTryPartialResynchronization, after all it is only called in the
context of syncWithMaster.

This commit also makes sure that on error all the event handlers are
removed from the socket before closing it.
2015-08-07 12:04:37 +02:00
antirez
88c716a0f5 syncWithMaster(): non blocking state machine. 2015-08-06 18:12:20 +02:00
antirez
ce5761e061 startBgsaveForReplication(): log what you really do. 2015-08-06 09:49:38 +02:00
antirez
3e6d4d599a Replication: add REPLCONF CAPA EOF support.
Add the concept of slave capabilities to Redis: the slave now presents itself
to the Redis master with a set of capabilities in the form:

    REPLCONF capa SOMECAPA capa OTHERCAPA ...

This has the effect of setting slave->slave_capa with the corresponding
SLAVE_CAPA macros that the master can test later to understand if
the slave will understand certain formats and protocols of the
replication process. This makes it much simpler to introduce new
replication capabilities in the future in a way that doesn't break old
slaves or masters.

This patch was designed and implemented together with Oran Agra
(@oranagra).
2015-08-06 09:23:23 +02:00
antirez
55ba772703 Fix replication slave pings period.
For PINGs we use the period configured by the user, but for the newlines
of slaves waiting for an RDB to be created (including slaves waiting for
the FULLRESYNC reply) we need to ping once per second, since
the timeout is fixed and needs to be refreshed.
2015-08-05 16:49:16 +02:00
antirez
15de6b108b Make sure we re-emit SELECT after each new slave full sync setup.
In previous commits we moved the FULLRESYNC to the moment we start the
BGSAVE, so that the offset we provide is the right one. However this
also means that we need to re-emit the SELECT statement every time a new
slave starts to accumulate the changes.

To obtain this effect in a cleaner way, the function that sends the
FULLRESYNC reply was given the more important role of also doing
this and changing the slave state. So it was renamed to
replicationSetupSlaveForFullResync() to better reflect what it does now.
2015-08-05 13:34:46 +02:00
antirez
a5a06a8ecd Don't send SELECT to slaves in WAIT_BGSAVE_START state. 2015-08-05 11:23:22 +02:00
antirez
62b5c60ead syncCommand() comments improved. 2015-08-05 08:41:57 +02:00
antirez
292fec058a PSYNC initial offset fix.
This commit attempts to fix a bug involving PSYNC and diskless
replication (currently experimental) found by Yuval Inbar from Redis Labs
and that was later found to have even more far-reaching effects (the bug also
exists when diskless replication is off).

The gist of the bug is that a Redis master replies with +FULLRESYNC to
a PSYNC attempt that fails and requires a full resynchronization.
However, the baseline offset sent along with FULLRESYNC was always the
current master replication offset. This is not ok, because there are
many reasons that may delay the RDB file creation. And... guess what,
the master offset we communicate must be the one at the time the RDB
was created. So for example:

1) When the BGSAVE for replication is delayed since there is one
   already but is not good for replication.
2) When the BGSAVE is not needed as we attach to one currently ongoing.
3) When because of diskless replication the BGSAVE is delayed.

In all the above cases the PSYNC reply is wrong and the slave may
reconnect later claiming to need a wrong offset: this may cause
data corruption later.
2015-08-04 17:06:10 +02:00
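One way to picture the fix: the offset announced with +FULLRESYNC is recorded when the BGSAVE actually starts, and a slave that attaches to a BGSAVE already in progress reuses the offset of a slave served by that same BGSAVE. The types and functions below are hypothetical, and a real master must also copy the output buffer accumulated for that slave, which this sketch omits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { WAIT_BGSAVE_START, WAIT_BGSAVE_END, ONLINE } slave_state;

typedef struct {
    slave_state state;
    int64_t initial_offset; /* offset announced with +FULLRESYNC */
} slave;

/* Called when the BGSAVE for replication really starts: every waiting slave
 * gets the offset of this moment, not the one from when it sent SYNC/PSYNC. */
void on_bgsave_start(slave *slaves, size_t n, int64_t master_repl_offset) {
    assert(slaves != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (slaves[i].state != WAIT_BGSAVE_START) continue;
        slaves[i].state = WAIT_BGSAVE_END;
        slaves[i].initial_offset = master_repl_offset;
    }
}

/* A slave attaching to an ongoing BGSAVE inherits the offset of a slave
 * already served by it. Returns 0 on success, -1 if there is none. */
int attach_to_ongoing_bgsave(slave *newslave, const slave *slaves, size_t n) {
    assert(newslave != NULL);
    for (size_t i = 0; i < n; i++) {
        if (slaves[i].state == WAIT_BGSAVE_END) {
            newslave->state = WAIT_BGSAVE_END;
            newslave->initial_offset = slaves[i].initial_offset;
            return 0;
        }
    }
    return -1;
}
```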
antirez
c1e94b6b9c Force slaves to resync after unsuccessful PSYNC.
Using chained replication where C is slave of B which is in turn slave of
A, if B reconnects the replication link with A but discovers it is no
longer possible to PSYNC, slaves of B must be disconnected and PSYNC
not allowed, since the new B dataset may be completely different after
the synchronization with the master.

Note that there are various semantic differences in the way this is
handled now compared to the past. In the past the semantics was:

1. When a slave lost connection with its master, it disconnected the chained
slaves ASAP, which is not needed since after a successful PSYNC with the
master, the slaves can continue and don't need to resync in turn.

2. However after a failed PSYNC the replication backlog was not reset, so a
slave was able to PSYNC successfully even if the instance did a full
sync with its master, containing now an entirely different data set.

Now instead chained slaves are not disconnected when the slave loses the
connection with its master, but only when it is forced to full SYNC with
its master. This means that if the slave having chained slaves does a
successful PSYNC all its slaves can continue without troubles.

See issue #2694 for more details.
2015-07-28 16:35:02 +02:00
antirez
278ea9d16b replicationHandleMasterDisconnection() belongs to replication.c. 2015-07-28 14:36:50 +02:00
antirez
32f80e2f1b RDMF: More consistent define names. 2015-07-27 14:37:58 +02:00
antirez
40eb548a80 RDMF: REDIS_OK REDIS_ERR -> C_OK C_ERR. 2015-07-26 23:17:55 +02:00
antirez
2d9e3eb107 RDMF: redisAssert -> serverAssert. 2015-07-26 15:29:53 +02:00
antirez
14ff572482 RDMF: OBJ_ macros for object related stuff. 2015-07-26 15:28:00 +02:00
antirez
554bd0e7bd RDMF: use client instead of redisClient, like Disque. 2015-07-26 15:20:52 +02:00
antirez
424fe9afd9 RDMF: redisLog -> serverLog. 2015-07-26 15:17:43 +02:00
antirez
cef054e868 RDMF (Redis/Disque merge friendlyness) refactoring WIP 1. 2015-07-26 15:17:18 +02:00
antirez
8366907bed Use best effort address binding to connect to the master
We usually want to reach the master using the address of the interface
Redis is bound to (via the "bind" config option). That's useful since
the master will get (and publish) the slave address by getting the peer
name of the incoming socket connection from the slave.

However, when this is not possible, for example because the slave is
bound to the loopback interface but replicates from a master accessed via
an external interface, we want to still connect with the master even
from a different interface: in this case it is not really important that
the master will provide any other address, while it is vital to be able
to replicate correctly.

Related to issues #2609 and #2612.
2015-06-11 14:34:38 +02:00
antirez
6c60526db9 Net: improve prepareClientToWrite() error handling and comments.
When we fail to set up the write handler it does not make sense to keep
the client around since it is missing writes: whether it is a client or a slave,
the connection should be terminated ASAP.

Moreover, what the function does exactly with its return value, and in
which case the write handler is installed on the socket, was not clear,
so the function's comments are improved to make the goals of the function
more obvious.

Also related to #2485.
2015-04-01 10:07:45 +02:00
Oran Agra
159875b5a3 fixes to diskless replication.
The master was closing the connection if the RDB transfer took a long time,
and also sent PINGs to the slave before it got the initial ACK, in which case the slave wouldn't be able to find the EOF marker.
2015-03-31 23:42:08 +03:00
antirez
c3ad70901f Replication: disconnect blocked clients when switching to slave role.
Bug as old as Redis and blocking operations. It's hard to trigger since
only happens on instance role switch, but the results are quite bad
since an inconsistency between master and slave is created.

How to trigger the bug is a good description of the bug itself.

1. Client does "BLPOP mylist 0" in master.
2. Master is turned into slave, that replicates from New-Master.
3. Client does "LPUSH mylist foo" in New-Master.
4. New-Master propagates write to slave.
5. Slave receives the LPUSH, the blocked client gets served.

Now New-Master "mylist" key has "foo", Slave "mylist" key is empty.

Highlights:

* At step "2" above, the client remains attached, escaping any check
  performed during command dispatch (in this case, the read-only slave check).
* At step "5" the slave (that was the master), serves the blocked client
  consuming a list element, which is not consumed on the master side.

This scenario is technically likely to happen during failovers, however
since Redis Sentinel already disconnects clients using the CLIENT
command when changing the role of the instance, the bug is avoided in
Sentinel deployments.

Closes #2473.
2015-03-24 16:00:09 +01:00
antirez
c5dd686ecb Replication: put server.master client creation into a separate function. 2015-02-04 11:26:20 +01:00
antirez
ce269ad3c5 AnetFormatIP(): renamed, commented, now sticks to IP:port format.
A few code style changes + consistent format: not nice for humans but
better for parsers.
2014-12-11 18:20:30 +01:00
Matt Stancliff
491881e13b Cleanup all IP formatting code
Instead of manually checking for strchr(n,':') everywhere,
we can use our new centralized IP formatting functions.
2014-12-11 10:12:18 -05:00
antirez
1b732c09d0 Network bandwidth tracking + refactoring.
Track bandwidth used by clients and replication (but diskless
replication is not tracked since the actual transfer happens in the
child process).

This includes a refactoring that makes tracking new instantaneous
metrics simpler.
2014-12-03 12:16:25 +01:00
antirez
bb7fea0d5c Diskless SYNC: fix RDB EOF detection.
RDB EOF detection was relying on the final part of the RDB transfer to
be a magic 40-byte EOF marker. However, as the slave is put online
immediately, and because of socket timeouts, the replication stream is
actually contiguous with the RDB file.

This means that to detect the EOF correctly we should either:

1) Scan all the stream searching for the mark. Sucks CPU-wise.
2) Start to send the replication stream only after an acknowledgment.
3) Implement a proper chunked encoding.

For now solution "2" was picked, so the master does not start to send
ASAP the stream of commands in the case of diskless replication. We wait
for the first REPLCONF ACK command from the slave, that certifies us
that the slave correctly loaded the RDB file and is ready to get more
data.
2014-11-11 17:12:12 +01:00
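Since the master only starts streaming commands after the first REPLCONF ACK, the 40-byte mark is the last thing the slave reads during the transfer, so the slave only has to check the tail of what it has received so far. A sketch of such a tail check, assuming the data arrives in chunks of arbitrary size; eof_detector and eof_feed() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define EOF_MARK_LEN 40

typedef struct {
    char last[EOF_MARK_LEN]; /* the last EOF_MARK_LEN bytes received */
    size_t seen;             /* total bytes received so far */
} eof_detector;

/* Feed one received chunk; returns true if the stream now ends with mark. */
bool eof_feed(eof_detector *d, const char *mark, const char *buf, size_t len) {
    assert(d != NULL && mark != NULL && buf != NULL);
    if (len >= EOF_MARK_LEN) {
        /* The chunk alone covers the whole window. */
        memcpy(d->last, buf + len - EOF_MARK_LEN, EOF_MARK_LEN);
    } else {
        /* Shift older bytes left and append the new ones at the end. */
        memmove(d->last, d->last + len, EOF_MARK_LEN - len);
        memcpy(d->last + EOF_MARK_LEN - len, buf, len);
    }
    d->seen += len;
    return d->seen >= EOF_MARK_LEN && memcmp(d->last, mark, EOF_MARK_LEN) == 0;
}
```

The cost per chunk is bounded by the mark length, which avoids option "1" above (scanning the whole stream for the mark).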
antirez
f5c6ebbfe3 Disconnect timedout slave: regression introduced with diskless repl. 2014-11-11 15:10:58 +01:00
Matt Stancliff
0014966c1e Networking: add more outbound IP binding fixes
Same as the original bind fixes (we just missed these the
first time around).

This helps Redis not automatically send
connections from the first IP on an interface if we are bound
to a specific IP address (e.g. with multiple IP aliases on one
interface, you want to send from _your_ IP, not from the first IP
on the interface).
2014-10-29 15:09:09 -04:00
antirez
9ec22d9223 Diskless replication: missing listRewind() added.
This caused BGSAVE to be triggered a second time without any need when
we switch from socket to disk target via the command

    CONFIG SET repl-diskless-sync no

and there is already a slave waiting for the BGSAVE to start.
Also comments clarified about what is happening.
2014-10-29 12:48:22 +01:00
antirez
4b8f4b90b9 Log slave ip:port in more log messages. 2014-10-27 12:30:07 +01:00
antirez
8a416ca46e Added a function to get slave name for logs. 2014-10-27 11:58:20 +01:00
antirez
a27befc495 Diskless replication: log BGSAVE delay only when it is non-zero. 2014-10-27 10:48:39 +01:00
antirez
707352439c Diskless sync delay is now configurable. 2014-10-27 10:36:30 +01:00
antirez
c4dbc7cdec Remove duplicated log message about starting BGSAVE. 2014-10-24 10:38:42 +02:00
antirez
456003af25 Diskless replication: less debugging printfs around. 2014-10-17 17:11:48 +02:00
antirez
525c488f63 rio fdset target: handle short writes.
Even though the socket is set in blocking mode, we can still get short writes
when writing to it.
2014-10-17 16:45:53 +02:00
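A common way to handle short writes on a blocking descriptor is to loop until every byte has been written. A minimal sketch for a single file descriptor (the rio fdset target writes to several); write_fully() is a hypothetical helper, and retrying on EINTR is an extra assumption not stated in the commit.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Write exactly len bytes to fd, retrying after short writes.
 * Returns 0 on success, -1 on error (errno is left as set by write). */
int write_fully(int fd, const char *buf, size_t len) {
    assert(buf != NULL || len == 0);
    size_t done = 0;
    while (done < len) {
        ssize_t n = write(fd, buf + done, len - done);
        if (n == -1) {
            if (errno == EINTR) continue; /* interrupted by a signal: retry */
            return -1;
        }
        done += (size_t)n; /* short write: continue with the remainder */
    }
    return 0;
}
```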
antirez
4b16263bd9 Diskless replication: don't send "\n" pings to slaves.
This is useful for normal replication in order to refresh the slave
when we are persisting on disk, but for diskless replication the
child is already receiving data while in WAIT_BGSAVE_END state.
2014-10-17 10:23:44 +02:00
antirez
25a3d9965e Diskless replication: remove 40 bytes EOF mark from end of RDB file. 2014-10-17 10:23:11 +02:00
antirez
0c5a06f6bb Diskless replication: swap inverted branches to compute read len. 2014-10-17 10:22:29 +02:00
antirez
80f7f63b64 Diskless replication: don't enter the read-payload branch forever. 2014-10-17 10:21:18 +02:00
antirez
5ee2ccf48e Diskless replication: EOF:<mark> streaming support slave side. 2014-10-16 17:09:35 +02:00
antirez
43ae606430 Diskless replication: redis.conf and CONFIG SET/GET support. 2014-10-16 10:22:02 +02:00
antirez
42951ab301 Diskless replication: trigger a BGSAVE after a config change.
If we turn from diskless to disk-based replication via CONFIG SET, we
need a way to start a BGSAVE if there are slaves already waiting for a
BGSAVE to start. Normally with disk-based replication we do it as soon
as the previous child exits, but when there is a configuration change
via CONFIG SET, we may have slaves in WAIT_BGSAVE_START state without
an RDB background process currently active.
2014-10-16 10:15:18 +02:00
antirez
5f8360eb21 Diskless replication flag renamed repl_diskless -> repl_diskless_sync. 2014-10-16 10:00:50 +02:00
antirez
e9e007555e Diskless replication: trigger diskless RDB transfer if needed. 2014-10-16 09:03:52 +02:00
antirez
3730d118a3 Diskless replication: handle putting the slave online. 2014-10-15 15:31:19 +02:00
antirez
75f0cd6520 Diskless replication: RDB -> slaves transfer draft implementation. 2014-10-14 10:11:29 +02:00
antirez
16546f5aca Add some comments in syncCommand() to clarify RDB target. 2014-10-10 16:25:58 +02:00
Aaron Rutkovsky
3a82b8ac64 Fix typos
Closes #1513
2014-09-29 06:49:07 -04:00
Jan-Erik Rediger
9f98b29cef Fix typo: ad -> and
Closes #1537
2014-09-29 06:49:06 -04:00
antirez
95b1979c32 No more trailing spaces in Redis source code. 2014-06-26 18:48:40 +02:00
antirez
7970d53997 ROLE command: array len fixed for slave output. 2014-06-21 11:17:18 +02:00
antirez
6a13193d8f ROLE output improved for slaves.
Info about the replication state with the master added.
2014-06-07 17:38:20 +02:00
antirez
d34c2fa3bb ROLE command added.
The new ROLE command is designed in order to provide a client with
information about the replication in a fast and easy to use way
compared to the INFO command where the same information is also
available.
2014-06-07 17:27:49 +02:00
antirez
0bcc7cb4bf CLIENT LIST speedup via peerid caching + smart allocation.
This commit adds peer ID caching in the client structure plus an API
change and the use of sdsMakeRoomFor() in order to improve the
reallocation pattern to generate the CLIENT LIST output.

Both the changes account for a very significant speedup.
2014-04-28 17:36:57 +02:00
antirez
970de3e9c0 Check for EAGAIN in sendBulkToSlave().
Sometimes an OS X master with a Linux server over a slow link caused
a strange error where OS X called the writable function for
the socket but apparently there was no room in the socket
buffer to accept the write: the write(2) call returned an EAGAIN error,
which was not checked, so we always considered write(2) == 0 a connection
reset, which was unfortunate since the bulk transfer has to start again.

Also more errors are logged with the WARNING level in the same code path
now.
2014-02-05 16:38:10 +01:00
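The fix amounts to distinguishing "the socket buffer is full, try again when writable" from a real error. A sketch of that classification; the enum and classify_write() are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>

typedef enum { WRITE_PROGRESS, WRITE_RETRY_LATER, WRITE_FATAL } write_outcome;

/* Classify the result of a non-blocking write(2) on the slave socket.
 * EAGAIN/EWOULDBLOCK means "no room right now", not a broken link. */
write_outcome classify_write(ssize_t nwritten, int err) {
    assert(nwritten >= -1); /* write(2) returns -1 or a byte count */
    if (nwritten > 0) return WRITE_PROGRESS;
    if (nwritten == -1 && (err == EAGAIN || err == EWOULDBLOCK))
        return WRITE_RETRY_LATER;
    /* Any other error, or a zero-byte write, is treated as fatal here. */
    return WRITE_FATAL;
}
```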
antirez
6f54032080 Cluster: function clusterGetSlaveRank() added.
Return the number of slaves for the same master having a better
replication offset than the current slave, that is, the slave "rank" used
to pick a delay before the request for election.
2014-01-29 16:39:04 +01:00
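A sketch of the rank computation as described, with the offsets of the slaves of one master held in a plain array; slave_rank() is a hypothetical name and the real function may apply additional filters.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of other slaves of the same master with a strictly greater
 * replication offset than slave `self`. Rank 0 is the most up to date,
 * so it would wait the least before requesting an election. */
int slave_rank(const int64_t *offsets, size_t n, size_t self) {
    assert(offsets != NULL && self < n);
    int rank = 0;
    for (size_t j = 0; j < n; j++) {
        if (j != self && offsets[j] > offsets[self]) rank++;
    }
    return rank;
}
```

Ties produce equal ranks, so two equally updated slaves would pick the same base delay.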
antirez
abd6308d27 Set server.repl_down_since to 0 when changing master.
When an instance is potentially set to replicate with another master, it
is conceptually disconnected forever, since we have no old copy of the
dataset for this master in memory.
2014-01-17 18:20:31 +01:00
antirez
90a81b4ebb Don't send REPLCONF ACK to old masters.
Masters not understanding REPLCONF ACK will reply with errors to our
requests causing a number of possible issues.

This commit detects a global replication offset set to -1 at the end of
the replication, and marks the client representing the master with the
REDIS_PRE_PSYNC flag.

Note that this flag was called REDIS_PRE_PSYNC_SLAVE but now it is just
REDIS_PRE_PSYNC as it is used for both slaves and masters starting with
this commit.

This commit fixes issue #1488.
2014-01-08 14:28:16 +01:00
antirez
3f92e05637 Clarify a comment in slaveTryPartialResynchronization(). 2014-01-08 14:28:13 +01:00
antirez
94e8c9e77e Make new masters inherit replication offsets.
Currently replication offsets could be used in a limited way in order
to understand, out of a set of slaves, which one has the most
updated data. For example this comparison is possible if N slaves
were all replicating from the same master.

However the replication offset was not transferred from master to slaves
(that are later promoted as masters) in any way, so for instance if
there were three instances A, B, C, with A master and B and C
replicating from A, the following could happen:

C disconnects from A.
B is turned into master.
A is switched to master of B.
B receives some write.

In this context there was no way to compare the offset of A and C,
because B would use its own local master replication offset as
replication offset to initialize the replication with A.

With this commit what happens is that when B is turned into master it
inherits the replication offset from A, making A and C comparable.
In the above case assuming no inconsistencies are created during the
disconnection and failover process, A will show to have a replication
offset greater than C.

Note that this does not mean offsets are always comparable to understand
which instance in a set is the most up to date, since in more complex examples the
replica with the higher replication offset could be partitioned away
when picking the instance to elect as new master. However this in
general improves the ability of a system to try to pick a good replica
to promote to master.
2013-12-22 11:43:25 +01:00
antirez
11120689c4 Slaves heartbeats during sync improved.
The previous fix for false positive timeout detected by master was not
complete. There is another blocking stage while loading data for the
first synchronization with the master, that is, flushing away the
current data from the DB memory.

This commit uses the newly introduced dict.c callback in order to make
some incremental work (to send "\n" heartbeats to the master) while
flushing the old data from memory.

It is hard to write a regression test for this issue unfortunately. More
support for debugging in the Redis core would be needed in terms of
functionalities to simulate a slow DB loading / deletion.
2013-12-10 18:47:31 +01:00
antirez
2eb781b35b dict.c: added optional callback to dictEmpty().
Redis hash table implementation has many non-blocking features like
incremental rehashing, however while deleting a large hash table there
was no way to have a callback called to do some incremental work.

This commit adds this support, as an optional callback argument to
dictEmpty() that is currently called at a fixed interval (one time every
65k deletions).
2013-12-10 18:46:24 +01:00
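A sketch of the callback pattern on a hypothetical table of heap-allocated buckets. The 65536 interval mirrors the figure above; table_empty() and count_calls() are illustrative names, not the dict.c API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef void (*empty_callback)(void *privdata);

/* Free every bucket, invoking cb once every 65536 iterations (whenever the
 * low 16 bits of the index are zero) so the caller can do incremental work,
 * such as sending a "\n" heartbeat, during a long deletion. */
void table_empty(void **buckets, size_t size, empty_callback cb, void *privdata) {
    assert(buckets != NULL || size == 0);
    for (size_t i = 0; i < size; i++) {
        if (cb && (i & 65535) == 0) cb(privdata);
        free(buckets[i]); /* free(NULL) is a no-op */
        buckets[i] = NULL;
    }
}

/* Example callback: count how many times it was invoked. */
void count_calls(void *privdata) {
    (*(int *)privdata)++;
}
```

Emptying 200000 buckets with count_calls invokes the callback four times (at indexes 0, 65536, 131072 and 196608).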
antirez
2c4ab8a534 Log empty DB + Loading data into two separated messages. 2013-12-10 18:43:25 +01:00
antirez
11e81a1e9a Fixed grammar: before H the article is a, not an. 2013-12-05 16:35:32 +01:00
antirez
c5618e7fdd WAIT command: synchronous replication for Redis. 2013-12-04 16:20:03 +01:00
antirez
b2f834390c Log to what master a slave is going to connect to. 2013-11-11 09:25:36 +01:00
antirez
1461422ce6 Replication: install the write handler when reusing a cached master.
Sometimes when we resurrect a cached master after a successful partial
resynchronization attempt, there is pending data in the output buffers
of the client structure representing the master (likely REPLCONF ACK
commands).

If we don't reinstall the write handler, it will never be installed
again by addReply*() family functions as they'll assume that if there is
already data pending, the write handler is already installed.

This bug caused some slaves after a successful partial sync to never
send REPLCONF ACK, and to be continuously detected as timing out by the
master, with a disconnection / reconnection loop.
2013-10-04 16:12:25 +02:00
antirez
37e06bd952 PSYNC: safer handling of PSYNC requests.
There was a bug that overestimated the amount of backlog available,
however this could only happen when a slave was asking for an offset
that was in the "future" compared to the master replication backlog.

Now this case is handled well and logged as an incident in the master
log file.
2013-10-04 12:25:09 +02:00
antirez
707ff0f714 Make clear that runids are not cluster node IDs. 2013-09-30 11:48:09 +02:00
Maxim Zakharov
70e82e5c79 A mistype fixed 2013-09-03 15:15:48 +02:00
antirez
c06de115af replicationFeedSlaves() func name typo: feedReplicationBacklogWithObject -> feedReplicationBacklog. 2013-08-12 12:50:45 +02:00
antirez
dcc48a8143 replicationFeedSlave() reworked for correctness and speed.
The previous code using a static buffer as an optimization was lame:

1) Premature optimization, actually it was *slower* than naive code
   because it resulted in the creation / destruction of the object
   encapsulating the output buffer.
2) The code was very hard to test, since specific tests were needed
   for command lines exceeding the size of the static buffer.
3) As a result of "2" the code was bugged as the current tests were not
   able to stress specific corner cases.

It was replaced with easy to understand code that is safer and faster.
2013-08-12 12:50:29 +02:00
antirez
aa05128f51 Fix a PSYNC bug caused by a variable name typo. 2013-08-12 11:51:35 +02:00
antirez
89ffba9133 Replication: better way to send a preamble before RDB payload.
During the replication full resynchronization process, the RDB file is
transferred from the master to the slave. However there is a short
preamble to send, that is currently just the bulk payload length of the
file in the usual Redis form $..length..<CR><LF>.

This preamble used to be sent with a direct write call, assuming that
there was always room in the socket output buffer to hold the few bytes
needed, however this does not scale in case we'll need to send more
stuff, and is not very robust code in general.

This commit introduces a more general mechanism to send a preamble up to
2GB in size (the max length of an sds string) in a non blocking way.
2013-08-12 10:29:14 +02:00
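The preamble is just the RDB length in bulk form. A sketch of building it into a caller-provided buffer; format_rdb_preamble() is hypothetical, and the non-blocking send described above is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Format the bulk length preamble "$<len>\r\n" that precedes the RDB
 * payload. Returns the number of bytes written, or -1 if buf is too small. */
int format_rdb_preamble(char *buf, size_t size, long long rdb_len) {
    assert(buf != NULL && rdb_len >= 0);
    int n = snprintf(buf, size, "$%lld\r\n", rdb_len);
    if (n < 0 || (size_t)n >= size) return -1; /* error or truncated */
    return n;
}
```

For a 1234-byte RDB file this produces the 7 bytes "$1234\r\n".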
antirez
c151eb6d92 Fix replicationFeedSlaves() off-by-one bug.
This fixes issue #1221.
2013-07-28 12:49:34 +02:00
antirez
a31693417d Fix replicationFeedSlaves() to use sdsEncodedObject() macro. 2013-07-22 10:36:27 +02:00
Ted Nyman
f39a0bdb77 Make sure the log standardizes on 'timeout' 2013-07-12 14:06:27 -07:00
antirez
d1cbad6d14 Use getClientPeerId() for MONITOR implementation. 2013-07-09 16:21:21 +02:00
antirez
90038906f4 Fix old anetPeerToString() API call in replication.c 2013-07-08 16:11:52 +02:00
Geoff Garside
ee5a6df101 Update calls to anetPeerToString to include ip_len. 2013-07-08 15:57:22 +02:00
antirez
8ca265cdb7 Don't disconnect pre PSYNC replication clients for timeout.
Clients using SYNC to replicate are older implementations, such as
redis-cli --slave, and are not designed to acknowledge the master with
REPLCONF ACK commands, so we don't have any feedback and should not
disconnect them on timeout.
2013-06-26 10:11:20 +02:00
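The timeout rule described in this commit and in "Close connection with timed-out slaves" can be sketched as follows. `struct replica_conn`, its fields and `replica_timed_out()` are hypothetical names; `repl_timeout_sec` stands for the repl-timeout setting.

```c
#include <stdbool.h>

/* Hypothetical per-replica bookkeeping for this sketch. */
struct replica_conn {
    bool uses_psync;         /* false for old SYNC-only clients (e.g. redis-cli --slave) */
    long long last_ack_sec;  /* time of the last REPLCONF ACK, in seconds */
};

/* Returns true if the master should drop this replica for timeout. */
static bool replica_timed_out(const struct replica_conn *r,
                              long long now_sec, long long repl_timeout_sec) {
    /* SYNC-only clients never send REPLCONF ACK, so there is no
     * feedback to judge them by: never time them out. */
    if (!r->uses_psync) return false;
    return now_sec - r->last_ack_sec > repl_timeout_sec;
}
```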
antirez
f0bf5fd8c7 Use the RSC to replicate EVALSHA unmodified.
This commit uses the Replication Script Cache in order to avoid
translating EVALSHA into EVAL whenever possible for both the AOF and
slaves.
2013-06-24 18:57:31 +02:00
antirez
94ec7db470 Replication of scripts as EVALSHA: sha1 caching implemented.
This code is only responsible for keeping an LRU-evicted, fixed-length
cache of SHA1 digests that we are sure all the slaves received.

This commit only provides the implementation: the Redis core does not
yet use it to actually send EVALSHA to slaves when possible.
2013-06-24 10:26:04 +02:00
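A toy version of such a cache, assuming a hypothetical `struct rsc` with a capacity of 4 and a linear scan (a real implementation would more likely pair a hash table with a list). The first time a script is propagated it goes out as EVAL and its SHA1 is recorded; later invocations can go out as EVALSHA while the SHA1 is still cached. Presumably the cache must also be emptied when a new replica attaches, since that replica never saw the earlier EVALs; that detail is an assumption, not stated in the commit.

```c
#include <string.h>

#define RSC_MAX 4  /* tiny capacity, for illustration only */

/* Hypothetical Replication Script Cache: SHA1s that every replica is
 * known to have received, each with a last-use tick for LRU eviction. */
struct rsc {
    char sha[RSC_MAX][41];        /* 40 hex chars + NUL */
    unsigned long used[RSC_MAX];  /* last-use tick */
    int count;
    unsigned long clock;
};

static int rsc_find(const struct rsc *c, const char *sha) {
    for (int i = 0; i < c->count; i++)
        if (strcmp(c->sha[i], sha) == 0) return i;
    return -1;
}

/* True if all replicas already have this script; refreshes its LRU tick. */
static int rsc_exists(struct rsc *c, const char *sha) {
    int i = rsc_find(c, sha);
    if (i < 0) return 0;
    c->used[i] = ++c->clock;
    return 1;
}

/* Record a propagated SHA1, evicting the least recently used entry when full. */
static void rsc_add(struct rsc *c, const char *sha) {
    int slot = rsc_find(c, sha);
    if (slot < 0) {
        if (c->count < RSC_MAX) {
            slot = c->count++;
        } else {
            slot = 0;
            for (int i = 1; i < RSC_MAX; i++)
                if (c->used[i] < c->used[slot]) slot = i;
        }
        strncpy(c->sha[slot], sha, 40);
        c->sha[slot][40] = '\0';
    }
    c->used[slot] = ++c->clock;
}

/* Choose how to propagate a script invocation to replicas. */
static const char *propagate_as(struct rsc *c, const char *sha) {
    if (rsc_exists(c, sha)) return "EVALSHA";
    rsc_add(c, sha);  /* the EVAL sent now teaches replicas this script */
    return "EVAL";
}
```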
antirez
1a54d5963e Refresh good slaves count when setting slave state as online. 2013-05-30 12:13:25 +02:00
antirez
ed599d3aca min-slaves-to-write: don't accept writes with less than N replicas.
This feature allows the user to specify the minimum number of
connected replicas having a lag less than or equal to the specified
number of seconds for writes to be accepted.
2013-05-30 11:30:04 +02:00
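The acceptance rule can be sketched as a count of "good" replicas followed by a threshold check. The struct and function names are hypothetical, and treating `min_replicas == 0` as "feature disabled" is an assumption of this sketch.

```c
/* Hypothetical view of a replica for this sketch. */
struct replica_lag {
    int online;         /* finished the initial synchronization */
    long long lag_sec;  /* seconds since its last ACK */
};

/* Count online replicas whose lag is <= max_lag_sec. */
static int good_replicas(const struct replica_lag *r, int n, long long max_lag_sec) {
    int good = 0;
    for (int i = 0; i < n; i++)
        if (r[i].online && r[i].lag_sec <= max_lag_sec) good++;
    return good;
}

/* Writes are accepted unless the feature is enabled (min_replicas > 0)
 * and fewer than min_replicas replicas are "good". */
static int writes_accepted(int good, int min_replicas) {
    return min_replicas == 0 || good >= min_replicas;
}
```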
antirez
3c82c85fcf Close connection with timed-out slaves.
Now masters, using the time at which the last REPLCONF ACK was received,
are able to explicitly disconnect slaves that are no longer responding.

Previously the only sign was a very long output buffer, which was
highly suboptimal.
2013-05-27 11:42:42 +02:00
antirez
e06a560466 Send ACK to master once every second.
ACKs can also be used as a basis for synchronous replication. However, in
that case they will be explicitly requested by the master when the client
sends a request that needs to be replicated synchronously.
2013-05-27 11:42:38 +02:00
antirez
efd87031d0 Don't ACK the master after every command.
Sending an ACK is now moved into the replicationSendAck() function.
2013-05-27 11:42:35 +02:00
antirez
dd0adbb777 Make sure that REPLCONF ACK really has no return value. 2013-05-27 11:42:30 +02:00
antirez
6b4635f4f5 REPLCONF ACK command.
This special command is used by the slave to inform the master of the
amount of the replication stream it has consumed so far.

It does not return anything, so that the master does not need to spend
additional bandwidth on a reply.

The master can do a number of things knowing the amount of stream
processed, such as computing the "lag" in bytes of the slave, verifying
whether a given command was already processed by the slave, and so forth.
2013-05-27 11:42:17 +02:00
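A sketch of both sides, assuming the replica reports its processed offset as the single argument of `REPLCONF ACK` in the multibulk protocol; `format_replconf_ack()` and `replica_lag_bytes()` are hypothetical names.

```c
#include <stdio.h>
#include <string.h>

/* Replica side: format "REPLCONF ACK <offset>" as a multibulk command.
 * Returns the number of bytes written, or -1 if buf is too small. */
static int format_replconf_ack(char *buf, size_t cap, long long offset) {
    char num[32];
    snprintf(num, sizeof(num), "%lld", offset);
    int n = snprintf(buf, cap, "*3\r\n$8\r\nREPLCONF\r\n$3\r\nACK\r\n$%zu\r\n%s\r\n",
                     strlen(num), num);
    if (n < 0 || (size_t)n >= cap) return -1;
    return n;
}

/* Master side: bytes of the stream the replica has not yet processed. */
static long long replica_lag_bytes(long long master_offset, long long acked_offset) {
    return master_offset - acked_offset;
}
```

Since the master never replies, the replica can send this on a timer (once per second, per the commits above) without waiting for anything.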
antirez
b7d085fc0d Cluster: SLAVEOF command not allowed in cluster mode. 2013-03-05 12:39:41 +01:00
antirez
3be893123f Make sure replicationSetMaster() works when ip argument is not an sds. 2013-03-04 15:39:55 +01:00
antirez
7bead003e2 SLAVEOF command refactored into a proper API.
We now have replicationSetMaster() and replicationUnsetMaster() that can
be called in other contexts (for instance Redis Cluster).
2013-03-04 13:22:21 +01:00
antirez
f9b5ca29fd Use GCC printf format attribute for redisLog().
This commit also fixes redisLog() statements producing warnings.
2013-02-27 12:27:15 +01:00
antirez
072c91fe13 PSYNC: another change to unexpected reply from PSYNC. 2013-02-13 18:43:40 +01:00
antirez
0e1be5347b PSYNC: More robust handling of unexpected reply to PSYNC. 2013-02-13 18:33:33 +01:00
antirez
3419c8ce70 Replication: more strict error checking for master PING reply. 2013-02-12 16:53:27 +01:00
antirez
24f258360b Replication: added new stats counting full and partial resynchronizations. 2013-02-12 15:33:54 +01:00
antirez
3af478e9ef PSYNC: debugging printf() calls are now logs at DEBUG level. 2013-02-12 12:52:22 +01:00
antirez
89b48f0825 Remove harmless warning in slaveTryPartialResynchronization(). 2013-02-12 12:52:21 +01:00
antirez
0ed6daa48b PSYNC: don't use the client buffer to send +CONTINUE and +FULLRESYNC.
When we are preparing a handshake with the slave we can't touch the
connection buffer, as it will be used to accumulate differences between
the sent RDB file and what arrives next from clients.

So, in short, we can't use the addReply() family of functions.

Instead we just use write(2), because we know that the socket buffer is
empty: a prerequisite for SYNC to work is that the static buffer
and the output list are empty, and in general it is not expected that a
client SYNCs after doing some heavy I/O with the master.

However, a short write is explicitly handled to avoid
fragility (we simply close the connection and the slave will retry).
2013-02-12 12:52:21 +01:00
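The write-or-close policy can be sketched with a hypothetical `send_handshake_reply()` helper: one write(2) call, and any error or short write is reported so that the caller closes the connection and lets the slave retry.

```c
#include <string.h>
#include <unistd.h>

/* Send a short handshake reply (e.g. "+CONTINUE\r\n") straight to the
 * socket, bypassing the client output buffer. Returns 0 on success, or -1
 * on error or short write, in which case the caller should close the
 * connection. No partial-write bookkeeping is attempted. */
static int send_handshake_reply(int fd, const char *reply) {
    size_t len = strlen(reply);
    ssize_t n = write(fd, reply, len);
    return (n >= 0 && (size_t)n == len) ? 0 : -1;
}
```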
antirez
d2a0348a49 SYNC not allowed with pending data on the static output buffer. 2013-02-12 12:52:21 +01:00
antirez
da315d3325 Log the unexpected string received in place of the SYNC payload length. 2013-02-12 12:52:21 +01:00
antirez
41d64a7516 After SLAVEOF <newslave> don't allow chained slaves to PSYNC. 2013-02-12 12:52:21 +01:00
antirez
078882025e PSYNC: work in progress, preview #2, rebased to unstable. 2013-02-12 12:52:21 +01:00
antirez
e34a35a511 Use the new unified protocol to send SELECT to slaves.
SELECT was still transmitted to slaves using the inline protocol, which
is designed mostly for humans to type into telnet sessions, and is
notably not understood by redis-cli --slave.

Now the new protocol is used instead.
2013-02-12 12:50:28 +01:00
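For comparison, the inline form is just `SELECT 0\r\n`, while the unified (multibulk) form prefixes the argument count and every argument length. A minimal encoder sketch; `resp_encode()` is a hypothetical name and it assumes arguments are NUL-terminated strings.

```c
#include <stdio.h>
#include <string.h>

/* Encode argv as a multibulk command into buf: "*<argc>\r\n" followed by
 * "$<len>\r\n<arg>\r\n" per argument. Returns bytes written, or -1 if buf
 * is too small. Arguments must be NUL-terminated (no binary data). */
static int resp_encode(char *buf, size_t cap, int argc, const char **argv) {
    int n = snprintf(buf, cap, "*%d\r\n", argc);
    if (n < 0 || (size_t)n >= cap) return -1;
    size_t used = (size_t)n;
    for (int i = 0; i < argc; i++) {
        n = snprintf(buf + used, cap - used, "$%zu\r\n%s\r\n", strlen(argv[i]), argv[i]);
        if (n < 0 || (size_t)n >= cap - used) return -1;
        used += (size_t)n;
    }
    return (int)used;
}
```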
antirez
4b83ad4e1f Use replicationFeedSlaves() to send PING to slaves.
A Redis master sends PING commands to slaves from time to time: doing
this ensures that even in the absence of writes, the master->slave channel
remains active and the slave can sense the master's presence, instead of
closing the connection for timeout.

This commit changes the way PINGs are sent to slaves in order to use the
standard interface used to replicate all the other commands, that is,
the function replicationFeedSlaves().

With this change the stream of commands sent to every slave is exactly
the same regardless of their exact state (Transferring RDB for first
synchronization or slave already online). With the previous
implementation the PING was only sent to online slaves, with the result
that the output stream from master to slaves was not identical for all
the slaves: this is a problem if we want to implement partial resyncs in
the future using a global replication stream offset.

TL;DR: this commit should not change the behaviour in practical terms,
but is just something in preparation for partial resynchronization
support.
2013-02-12 12:50:28 +01:00
antirez
7465ac7ab1 Emit SELECT to slaves in a centralized way.
Before this commit every Redis slave had its own selected database ID
state. This was not actually useful as the emitted stream of commands
is identical for all the slaves.

Now the currently selected database is a global state that is set to
-1 when a new slave is attached, in order to force the SELECT command to
be re-emitted for all the slaves.

This change is useful in order to implement replication partial
resynchronization in the future, as it makes sure that the stream of
commands received by slaves, including SELECT commands, are exactly the
same for every slave connected, at any time.

In this way we could have a global offset that can identify a specific
piece of the master -> slaves stream of commands.
2013-02-12 12:50:28 +01:00
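The centralized scheme can be sketched with a single selected-DB variable shared by the whole stream. The names are hypothetical, and commands are written as plain text lines for readability rather than in the protocol actually sent to replicas.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical global replication state: the DB id last selected in the
 * shared master -> replicas stream (-1 means "unknown, force a SELECT"). */
struct repl_stream {
    int seldb;
    char buf[256];  /* the single stream every replica receives */
    size_t len;
};

/* Append text to the stream; silently drops data if full (sketch only). */
static void stream_append(struct repl_stream *s, const char *data) {
    size_t n = strlen(data);
    if (s->len + n < sizeof(s->buf)) {
        memcpy(s->buf + s->len, data, n + 1);
        s->len += n;
    }
}

/* A new replica attached: forget the selected DB so the next command is
 * preceded by a SELECT, which every replica then receives identically. */
static void on_replica_attached(struct repl_stream *s) { s->seldb = -1; }

/* Feed one command executed against database `dictid` to all replicas. */
static void feed_replicas(struct repl_stream *s, int dictid, const char *cmd) {
    if (s->seldb != dictid) {
        char sel[32];
        snprintf(sel, sizeof(sel), "SELECT %d\n", dictid);
        stream_append(s, sel);
        s->seldb = dictid;
    }
    stream_append(s, cmd);
}
```

Because there is one stream and one `seldb`, any byte offset into `buf` denotes the same position for every replica, which is what a global replication offset needs.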
antirez
a6c2f9012f Make all WATCHers dirty when the slave reloads the DB. 2013-02-08 10:26:19 +01:00
antirez
b70b459b0e TCP_NODELAY after SYNC: changes to the implementation. 2013-02-05 12:04:30 +01:00
charsyam
c85647f354 Turn off TCP_NODELAY on the slave socket after SYNC.
Further details from @antirez:

It was reported by @StopForumSpam on Twitter that the Redis replication
link was strangely using multiple TCP packets for multiple commands.
This wastes a lot of bandwidth and is due to the TCP_NODELAY option we
enable on the socket after accepting a new connection.

However the master -> slave channel is a one-way channel since Redis
replication is asynchronous, so there is no point in trying to reduce
the latency, we should aim to reduce the bandwidth. For this reason this
commit introduces the ability to turn off TCP_NODELAY on the slave socket
after a successful SYNC, so that Nagle's algorithm can coalesce small
writes into fewer packets.

This feature is off by default because the delay can be up to 40
milliseconds with normally configured Linux kernels.
2013-02-05 12:04:25 +01:00
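A sketch of the socket tweak using the standard `setsockopt()` call; `tune_replica_socket()` and its `disable_nodelay` flag are hypothetical stand-ins for the configuration option.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* After a successful SYNC, optionally clear TCP_NODELAY on the replica
 * socket. With TCP_NODELAY off, Nagle's algorithm may delay small writes
 * (tens of milliseconds) to pack several commands into one packet. */
static int tune_replica_socket(int fd, int disable_nodelay) {
    int nodelay = disable_nodelay ? 0 : 1;
    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &nodelay, sizeof(nodelay));
}
```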
guiquanz
9d09ce3981 Fixed many typos. 2013-01-19 10:59:44 +01:00
antirez
ef99e146a8 Undo slave-master handshake when SLAVEOF sets a new slave.
Issue #828 shows how Redis was not correctly undoing a non-blocking
connection attempt with the previous master when the master was set to a
new address using the SLAVEOF command.

This was also a result of a lack of refactoring, so now there is a
function to cancel the non-blocking handshake with the master.
The new function is now used when SLAVEOF NO ONE is called or when
SLAVEOF is used to set the master to a different address.
2013-01-15 13:33:24 +01:00
antirez
d7740fc8f3 Better error reporting when fd event creation fails. 2013-01-03 14:29:34 +01:00
antirez
f1481d4a03 serverCron() frequency is now a runtime parameter (was REDIS_HZ).
REDIS_HZ is the frequency at which our serverCron() function is called.
Calling this function more frequently results in less latency when the
server is trying to handle very expensive background operations like
mass expires of a lot of keys at the same time.

Redis 2.4 used to have an HZ of 10. This was good enough with almost
every setup, but the incremental key expiration algorithm was working a
bit better under *extreme* pressure when HZ was set to 100 for Redis
2.6.

However, for most users a latency spike of 30 milliseconds when millions
of keys are expiring at the same time is acceptable. On the other hand, a
default HZ of 100 in Redis 2.6 was causing idle instances to use more
CPU time than Redis 2.4. The CPU usage was on the order of 0.3%
for an idle instance; small, but still a shame, as the server consumes
more energy, if not other important resources.

This commit introduces HZ as a runtime parameter, that can be queried by
INFO or CONFIG GET, and can be modified with CONFIG SET. At the same
time the default frequency is set back to 10.

In this way we default to a sane value of 10, but allow users to
easily switch to values up to 500 for near real-time applications if
needed and if they are willing to pay this small CPU usage penalty.
2012-12-14 17:10:40 +01:00
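Once HZ is a runtime value, any periodic task inside serverCron() has to derive its schedule from it. A sketch under stated assumptions: the helper names, the lower bound of 1, and the milliseconds-to-ticks conversion are illustrative, not taken from the commit; only the upper value of 500 comes from the text.

```c
/* Hypothetical bounds: 500 comes from the commit text, 1 is assumed. */
#define HZ_MIN 1
#define HZ_MAX 500

static int clamp_hz(int hz) {
    if (hz < HZ_MIN) return HZ_MIN;
    if (hz > HZ_MAX) return HZ_MAX;
    return hz;
}

/* A task meant to run every period_ms runs once every N cron ticks, where
 * one tick lasts 1000/hz ms. Periods shorter than a tick run every tick. */
static long long ticks_between_runs(int period_ms, int hz) {
    int tick_ms = 1000 / clamp_hz(hz);
    if (period_ms <= tick_ms) return 1;
    return period_ms / tick_ms;
}
```

Clamping inside `ticks_between_runs()` also guards against a zero tick length, which would otherwise cause a division by zero for HZ values above 1000.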
antirez
4365e5b2d3 BSD license added to every C source and header file. 2012-11-08 18:31:32 +01:00
antirez
2ea41242f6 Unix socket clients properly displayed in MONITOR and CLIENT LIST.
This also fixes issue #745.
2012-11-01 22:10:45 +01:00
antirez
f0b9f80345 "Timeout receiving bulk data" error message modified.
The new message now contains a hint about modifying the repl-timeout
configuration directive if the problem persists.

This should normally not be needed, because while the master generates
the RDB file it makes sure to send newlines to the replication channel
to prevent timeouts. However there are times when masters running on
very slow systems can completely stop for seconds during the RDB saving
process. In such a case enlarging the timeout value can fix the problem.

See issue #695 for an example of this problem in an EC2 deployment.
2012-10-04 11:52:16 +02:00
antirez
d310fbedab Fix compilation on FreeBSD. Thanks to @koobs on twitter. 2012-09-17 12:46:06 +02:00
Salvatore Sanfilippo
24bc807b5c Merge pull request #576 from saj/fix-slave-ping-period
Bug fix: slaves being pinged every second
2012-09-05 06:59:37 -07:00
antirez
bb66fc3120 Send an async PING before starting replication with master.
During the first synchronization step of the replication process, a Redis
slave connects to the master in a non-blocking way. However, once the
connection is established, the replication process continues by sending the
REPLCONF command, and the AUTH command if needed. Those commands are
sent in a partially blocking way (blocking with a timeout on the order of
seconds).

Because it is common for a blocked master to accept connections even if
it is actually not able to reply to the slave requests, it was easy for
a slave to block if the master had serious issues, but was still able to
accept connections in the listening socket.

For this reason we now send an asynchronous PING request just after the
non-blocking connection is successfully established, and wait for the
reply before continuing with the replication process. It is very
unlikely that a master replying to PING can't reply to the other
commands.

This solution was proposed by Didier Spezia (Thanks!) so that we don't
need to turn the whole replication process into a non-blocking affair,
while still keeping the probability of a blocked slave minimal, even in
the event of a failing master.

Also we now use getsockopt(SO_ERROR) in order to check errors ASAP
in the event handler, instead of waiting for actual I/O to return an
error.

This commit fixes issue #632.
2012-09-02 12:24:38 +02:00
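The `getsockopt(SO_ERROR)` check can be sketched as a small helper called from the writable-event handler of a non-blocking connect(); `connect_result()` is a hypothetical name. If it returns 0 the slave would go on to send its asynchronous PING.

```c
#include <errno.h>
#include <sys/socket.h>

/* Called when a non-blocking connect() reports the socket as writable:
 * ask the kernel whether the connection actually succeeded. Returns 0 on
 * success, otherwise the pending socket error (e.g. ECONNREFUSED), or the
 * errno of getsockopt() itself if that call fails. */
static int connect_result(int fd) {
    int err = 0;
    socklen_t len = sizeof(err);
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) == -1)
        return errno;
    return err;
}
```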
antirez
784b93087c Incrementally flush RDB on disk while loading it from a master.
This fixes issue #539.

Basically, if there is enough free memory, the OS may buffer the RDB file
that the slave receives from the master and writes to disk. The operating
system may then flush the whole file to disk at once when Redis closes it,
causing the close system call to block for a long time.

This patch is a modified version of one provided by yoav-steinberg of
@garantiadata (the original version was posted in the issue #539
comments), and tries to flush the OS buffers incrementally (every 8 MB
of loaded data).
2012-08-28 12:47:33 +02:00
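A sketch of the incremental approach, assuming a hypothetical `struct rdb_writer`. It uses POSIX `fdatasync()` every 8 MB as a simple, blocking stand-in; the commit message does not say which system call the patch uses, and a range-limited or asynchronous flush would block less.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

#define RDB_FLUSH_EVERY (8LL * 1024 * 1024)  /* 8 MB, as in the commit message */

/* Hypothetical writer state for the RDB file received from the master. */
struct rdb_writer {
    int fd;
    long long written;     /* bytes written so far */
    long long last_flush;  /* value of `written` at the last flush */
    int flushes;           /* flushes performed (for illustration) */
};

static int rdb_writer_open(struct rdb_writer *w, const char *path) {
    w->fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    w->written = w->last_flush = 0;
    w->flushes = 0;
    return w->fd == -1 ? -1 : 0;
}

/* Append a chunk; every RDB_FLUSH_EVERY bytes push dirty pages to disk so
 * the final close() does not have to flush a huge backlog at once. */
static int rdb_write_chunk(struct rdb_writer *w, const void *buf, size_t len) {
    const char *p = buf;
    size_t off = 0;
    while (off < len) {
        ssize_t n = write(w->fd, p + off, len - off);
        if (n == -1) {
            if (errno == EINTR) continue;
            return -1;
        }
        off += (size_t)n;
    }
    w->written += (long long)len;
    if (w->written - w->last_flush >= RDB_FLUSH_EVERY) {
        if (fdatasync(w->fd) == -1) return -1;
        w->last_flush = w->written;
        w->flushes++;
    }
    return 0;
}
```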
Saj Goonatilleke
9edfe63553 Bug fix: slaves being pinged every second
REDIS_REPL_PING_SLAVE_PERIOD controls how often the master should
transmit a heartbeat (PING) to its slaves.  This period, which defaults
to 10, is measured in seconds.

Redis 2.4 masters used to ping their slaves every ten seconds, just like
it says on the tin.

The Redis 2.6 masters I have been experimenting with, on the other hand,
ping their slaves *every second*.  (master_last_io_seconds_ago never
approaches 10.)  I think the ping period was inadvertently slashed to
one-tenth of its nominal value around the time REDIS_HZ was introduced.
This commit reintroduces correct ping schedule behaviour.
2012-07-05 14:29:27 +10:00
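The unit mix-up described above can be illustrated with a hypothetical helper that converts the ping period from seconds into cron ticks before testing a loop counter (`cronloops` is an assumed name). Using `period_sec` directly as a tick count would ping every 10 ticks, which at HZ 10 is once per second, matching the observed behaviour.

```c
/* With serverCron() running `hz` times per second, a period expressed in
 * seconds must be converted to cron ticks before it is compared with the
 * loop counter. */
static int should_ping_replicas(long long cronloops, int hz, int period_sec) {
    long long ticks = (long long)period_sec * hz;  /* cron ticks per ping */
    if (ticks <= 0) return 0;
    return cronloops % ticks == 0;
}
```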
antirez
36def8fd9a Typo in comment. 2012-06-27 11:26:44 +02:00
antirez
3a32897856 REPLCONF internal command introduced.
The REPLCONF command is an internal command (not designed to be directly
used by normal clients) that allows a slave to set some replication
related state in the master before issuing SYNC to start the
replication.

The initial motivation for this command, and the only reason it is
currently used by the implementation, is to let the slave instance
communicate its listening port to the master, so that the master can
show all the slaves with their listening ports in the "replication"
section of the INFO output.

This allows clients to auto-discover and query all the slaves attached
to a master.

Currently only a single option of the REPLCONF command is supported, and
it is called "listening-port", so the slave now starts the replication
process with something like the following chat:

    REPLCONF listening-port 6380
    SYNC

Note that this works even if the master is an older version of Redis and
does not understand REPLCONF, because the slave ignores the REPLCONF
error.

In the future REPLCONF can be used for partial replication and other
replication related features where there is the need to exchange
information between master and slave.

NOTE: This commit also fixes a bug: the INFO output already carried
information about slaves, but the port was broken, and was obtained
with getpeername(2), so it was actually just the ephemeral port used
by the slave to connect to the master as a client.
2012-06-27 09:43:57 +02:00
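On the master side, the option/value pairs could be handled as in this sketch. `struct replica_info` and `replconf_apply()` are hypothetical names; only the `listening-port` option comes from the commit, and rejecting unknown options is an assumption.

```c
#include <stdlib.h>
#include <string.h>
#include <strings.h>

/* Hypothetical per-replica info kept by the master. */
struct replica_info {
    int listening_port;  /* 0 until the replica announces it */
};

/* Apply "REPLCONF <option> <value> [<option> <value> ...]", where argv[0]
 * is "REPLCONF". Returns 0 on success, -1 on a syntax error or unknown
 * option. */
static int replconf_apply(struct replica_info *ri, int argc, const char **argv) {
    if ((argc - 1) % 2 != 0) return -1;  /* options must come in pairs */
    for (int i = 1; i < argc; i += 2) {
        if (strcasecmp(argv[i], "listening-port") == 0) {
            char *end;
            long port = strtol(argv[i + 1], &end, 10);
            if (argv[i + 1][0] == '\0' || *end != '\0' || port < 0 || port > 65535)
                return -1;
            ri->listening_port = (int)port;
        } else {
            return -1;
        }
    }
    return 0;
}
```

The stored port, rather than the ephemeral port returned by getpeername(2), is what the master would then report for the slave in INFO.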
antirez
ef37997608 Dead code removed from replication.c.
The user @jokea noticed that the following line of code in
replication.c made little sense:

    addReplySds(slave,sdsempty());

Investigating a bit I found that this was introduced by commit 6208b3a7
three years ago in the early stages of Redis. The code apparently is not
useful at all, so I'm removing it.

This change will not be backported to 2.4, so that in the rare case
it introduces a bug, we'll have a chance to detect it in the
development branch. However, following the code path it seems like the
code is not useful at all, so the risk is truly small.
2012-05-24 11:35:21 +02:00
antirez
299290d3a4 Remove useless trailing space in SYNC command sent to master. 2012-05-02 21:47:53 +02:00
David Tran
31788f50b7 Spelling: s/synchrnonization/synchronization 2012-04-25 12:21:56 -07:00
antirez
9157549fad syncio.c calls in replication.c fixed for the new millisecond timeout API. 2012-03-31 11:23:30 +02:00