46 Commits

Author SHA1 Message Date
antirez
d310fbedab Fix compilation on FreeBSD. Thanks to @koobs on twitter. 2012-09-17 12:46:06 +02:00
Salvatore Sanfilippo
24bc807b5c Merge pull request #576 from saj/fix-slave-ping-period
Bug fix: slaves being pinged every second
2012-09-05 06:59:37 -07:00
antirez
bb66fc3120 Send an async PING before starting replication with master.
During the first synchronization step of the replication process, a Redis
slave connects with the master in a non blocking way. However once the
connection is established the replication continues sending the REPLCONF
command, and sometimes the AUTH command if needed. Those commands are
sent in a partially blocking way (blocking with a timeout in the order of
seconds).

Because it is common for a blocked master to accept connections even if
it is actually not able to reply to the slave requests, it was easy for
a slave to block if the master had serious issues, but was still able to
accept connections in the listening socket.

For this reason we now send an asynchronous PING request just after the
non blocking connection ended in a successful way, and wait for the
reply before continuing with the replication process. It is very
unlikely that a master replying to PING can't reply to the other
commands.

This solution was proposed by Didier Spezia (Thanks!) so that we don't
need to turn all the replication process into a non blocking affair, but
still the probability of a slave blocked is minimal even in the event of
a failing master.

Also we now use getsockopt(SO_ERROR) in order to check errors ASAP
in the event handler, instead of waiting for actual I/O to return an
error.

This commit fixes issue #632.
2012-09-02 12:24:38 +02:00
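
The SO_ERROR check mentioned in the commit above can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the function name `pending_socket_error` is hypothetical, and only the standard `getsockopt(2)` call is assumed.

```c
#include <errno.h>
#include <sys/socket.h>

/* Hypothetical helper: ask the kernel whether a socket has a pending
 * error (for example a failed non blocking connect) without attempting
 * any read or write on it.
 *
 * Returns 0 when no error is pending, otherwise an errno value. */
int pending_socket_error(int fd) {
    int sockerr = 0;
    socklen_t errlen = sizeof(sockerr);

    /* If getsockopt() itself fails, report its errno instead. */
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &sockerr, &errlen) == -1)
        sockerr = errno;
    return sockerr;
}
```

An event handler could call such a helper first and abort the synchronization attempt when the result is non-zero, rather than waiting for a later read or write to fail.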
antirez
784b93087c Incrementally flush RDB on disk while loading it from a master.
This fixes issue #539.

Basically if there is enough free memory the OS may buffer the RDB file
that the slave transfers on disk from the master. The file may
actually be flushed to disk at once by the operating system when it gets
closed by Redis, causing the close system call to block for a long time.

This patch is a modified version of one provided by yoav-steinberg of
@garantiadata (the original version was posted in the issue #539
comments), and tries to flush the OS buffers incrementally (every 8 MB
of loaded data).
2012-08-28 12:47:33 +02:00
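
The idea of flushing incrementally instead of all at once on close can be sketched as below. This is an illustration under stated assumptions, not the patch itself: the commit message does not say which system call is used, so plain `fsync(2)` stands in here, and `incr_writer`, `incr_write` and `demo_incremental_write` are hypothetical names. The threshold is a field so that the behaviour can be exercised with small files; `FLUSH_EVERY_BYTES` reflects the 8 MB figure from the commit message.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* for mkstemp() and fsync() declarations */
#endif
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

#define FLUSH_EVERY_BYTES (8 * 1024 * 1024)  /* 8 MB, as in the commit message */

/* Hypothetical writer state. */
typedef struct {
    int fd;              /* destination file descriptor */
    size_t flush_every;  /* flush threshold in bytes */
    size_t unsynced;     /* bytes written since the last flush */
    int sync_calls;      /* number of flushes performed so far */
} incr_writer;

/* Write len bytes and flush whenever at least flush_every bytes have
 * accumulated, so that no single flush (or the final close) has to push
 * the whole file to disk. Returns 0 on success, -1 on error. */
int incr_write(incr_writer *w, const void *buf, size_t len) {
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(w->fd, p, len);
        if (n == -1) {
            if (errno == EINTR) continue;  /* interrupted: retry */
            return -1;
        }
        p += n;
        len -= (size_t)n;
        w->unsynced += (size_t)n;
        if (w->unsynced >= w->flush_every) {
            if (fsync(w->fd) == -1) return -1;  /* assumption: fsync as the flush call */
            w->unsynced = 0;
            w->sync_calls++;
        }
    }
    return 0;
}

/* Demo: write `total` bytes (rounded up to whole chunks) of zeroes to an
 * unlinked temporary file. Returns the number of flushes, or -1 on error. */
int demo_incremental_write(size_t total, size_t chunk, size_t flush_every) {
    char path[] = "/tmp/incr-flush-XXXXXX";
    if (chunk == 0) return -1;
    int fd = mkstemp(path);
    if (fd == -1) return -1;
    unlink(path);  /* the file disappears once fd is closed */

    incr_writer w = { fd, flush_every, 0, 0 };
    char *buf = calloc(1, chunk);
    int result = -1;
    if (buf != NULL) {
        result = 0;
        for (size_t done = 0; done < total; done += chunk) {
            if (incr_write(&w, buf, chunk) == -1) { result = -1; break; }
        }
        if (result == 0) result = w.sync_calls;
        free(buf);
    }
    close(fd);
    return result;
}
```

With the real threshold, a caller would initialize the writer with `FLUSH_EVERY_BYTES`. The trade-off is a few extra flush calls during the transfer in exchange for a short, predictable close.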
Saj Goonatilleke
9edfe63553 Bug fix: slaves being pinged every second
REDIS_REPL_PING_SLAVE_PERIOD controls how often the master should
transmit a heartbeat (PING) to its slaves.  This period, which defaults
to 10, is measured in seconds.

Redis 2.4 masters used to ping their slaves every ten seconds, just like
it says on the tin.

The Redis 2.6 masters I have been experimenting with, on the other hand,
ping their slaves *every second*.  (master_last_io_seconds_ago never
approaches 10.)  I think the ping period was inadvertently slashed to
one-tenth of its nominal value around the time REDIS_HZ was introduced.
This commit reintroduces correct ping schedule behaviour.
2012-07-05 14:29:27 +10:00
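
One way a heartbeat period can shrink to one-tenth of its nominal value is to compare a period expressed in seconds against a counter that advances several times per second. The sketch below illustrates that arithmetic only. It is not the code touched by the commit: `CRON_HZ`, `PING_SLAVE_PERIOD` and the function names are hypothetical, and the tick rate of 10 per second is an assumption chosen to match the "one-tenth" figure in the message.

```c
#include <stdbool.h>

#define CRON_HZ 10            /* assumed: periodic job runs 10 times per second */
#define PING_SLAVE_PERIOD 10  /* heartbeat period in seconds (default from the message) */

/* Buggy variant: treats the tick counter as if it counted seconds, so a
 * ping fires every PING_SLAVE_PERIOD ticks, which is once per second. */
bool should_ping_buggy(long long tick) {
    return tick % PING_SLAVE_PERIOD == 0;
}

/* Fixed variant: converts the period from seconds to ticks first. */
bool should_ping_fixed(long long tick) {
    return tick % (PING_SLAVE_PERIOD * CRON_HZ) == 0;
}

/* Count how many pings a predicate fires over a number of seconds. */
int count_pings(bool (*should_ping)(long long), int seconds) {
    int pings = 0;
    for (long long tick = 1; tick <= (long long)seconds * CRON_HZ; tick++)
        if (should_ping(tick)) pings++;
    return pings;
}
```

Over one simulated minute, the buggy predicate fires 60 times and the fixed one 6 times, which corresponds to one ping per second versus one every ten seconds.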
antirez
36def8fd9a Typo in comment. 2012-06-27 11:26:44 +02:00
antirez
3a32897856 REPLCONF internal command introduced.
The REPLCONF command is an internal command (not designed to be directly
used by normal clients) that allows a slave to set some replication
related state in the master before issuing SYNC to start the
replication.

The initial motivation for this command, and the only reason currently
it is used by the implementation, is to let the slave instance
communicate its listening port to the master, so that the master can
show all the slaves with their listening ports in the "replication"
section of the INFO output.

This allows clients to auto discover and query all the slaves attached
to a master.

Currently only a single option of the REPLCONF command is supported, and
it is called "listening-port", so the slave now starts the replication
process with something like the following chat:

    REPLCONF listening-port 6380
    SYNC

Note that this works even if the master is an older version of Redis and
does not understand REPLCONF, because the slave ignores the REPLCONF
error.

In the future REPLCONF can be used for partial replication and other
replication related features where there is the need to exchange
information between master and slave.

NOTE: This commit also fixes a bug: the INFO output already carried
information about slaves, but the port was broken, and was obtained
with getpeername(2), so it was actually just the ephemeral port used
by the slave to connect to the master as a client.
2012-06-27 09:43:57 +02:00
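
As a rough illustration of the first step of the chat shown in the commit above, the following sketch formats the REPLCONF line a slave might send before SYNC. The helper name `format_replconf` is hypothetical, and the use of a single text line terminated by CRLF is an assumption. The commit message only says the exchange looks "something like" the example.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: build the line announcing the slave's listening
 * port. Returns the number of bytes written (excluding the terminating
 * NUL), or -1 if the buffer is too small. */
int format_replconf(char *buf, size_t buflen, int listening_port) {
    int n = snprintf(buf, buflen, "REPLCONF listening-port %d\r\n",
                     listening_port);
    if (n < 0 || (size_t)n >= buflen) return -1;  /* error or truncation */
    return n;
}
```

After sending this line, the slave would read the reply and continue with SYNC even if the reply is an error, since an older master does not understand REPLCONF.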
antirez
ef37997608 Dead code removed from replication.c.
The user @jokea noticed that the following line of code in
replication.c made little sense:

    addReplySds(slave,sdsempty());

Investigating a bit I found that this was introduced by commit 6208b3a7
three years ago in the early stages of Redis. The code apparently is not
useful at all, so I'm removing it.

This change will not be backported into 2.4 so that in the rare case
this should introduce a bug, we'll have a chance to detect it in the
development branch. However following the code path it seems like the
code is not useful at all, so the risk is truly small.
2012-05-24 11:35:21 +02:00
antirez
299290d3a4 Remove useless trailing space in SYNC command sent to master. 2012-05-02 21:47:53 +02:00
David Tran
31788f50b7 Spelling: s/synchrnonization/synchronization 2012-04-25 12:21:56 -07:00
antirez
9157549fad syncio.c calls in replication.c fixed for the new millisecond timeout API. 2012-03-31 11:23:30 +02:00
antirez
c2672a06cd Purely aesthetic code change. 2012-03-30 10:39:34 +02:00
Joseph Jang
f892797e1b Fixed a memory leak with replication
occurs when two or more dbs are replicated and at least one of them is >db10
2012-03-30 10:34:29 +02:00
antirez
179e54d2a9 Fix for slaves chains. Force resync of slaves (simply disconnecting them) when SLAVEOF turns a master into a slave. 2012-03-29 09:24:02 +02:00
Premysl Hruby
d194905449 use server.unixtime instead of time(NULL) where possible (cluster.c not checked though) 2012-03-27 17:39:58 +02:00
antirez
e31b615e62 Better MONITOR output, now includes client ip:port or the lua string if the command was executed by the scripting engine. 2012-03-07 12:12:15 +01:00
antirez
a950a84303 Ping the slave using the standard protocol instead of the inline one. 2012-02-29 16:33:54 +01:00
antirez
ebdfad69dc Don't change the replication state if SLAVEOF is called with arguments specifying the same master we are already connected to. This fixes issue #290. 2012-01-16 11:29:47 +01:00
antirez
1824e3a3a3 Fixed replication when multiple slaves are attaching at the same time. The output buffer was not copied correctly between slaves. This fixes issue #141. 2011-12-30 19:40:43 +01:00
antirez
1844f9900f server.replstate -> server.repl_state 2011-12-21 12:23:18 +01:00
antirez
f48cd4b90c some RDB server struct fields renamed. 2011-12-21 12:22:13 +01:00
antirez
e394114d95 AOF refactoring, now with three states: ON, OFF, WAIT_REWRITE. 2011-12-21 10:31:34 +01:00
antirez
e7a2e7c1f7 AOF fixes in the context of replication (when AOF is used by slave) and CONFIG SET appendonly yes/no. 2011-12-15 16:07:49 +01:00
antirez
27acd7aa89 Replication bug fixed: now non blocking connect is also forced to follow the configured replication timeout. 2011-11-30 15:35:16 +01:00
antirez
8996bf7720 7c6da73 2011-10-31 11:13:28 +01:00
antirez
76e772f39a Return from syncWithMaster() ASAP if the event fired but the instance is no longer a slave. This should fix Issue #145. 2011-10-18 11:15:11 +02:00
antirez
45029d37cb Two fixes for replication: Slave performs the AOF rewrite at the right point. Non blocking connect also uses the readable handler, as with old Linux kernels like 2.6.18 the writable event is not fired on connection refused (kernel bug). 2011-06-09 15:39:12 +02:00
Pieter Noordhuis
632e4c09ac Make replication faster (biggest gain for small number of slaves) 2011-05-30 12:45:07 +02:00
Pieter Noordhuis
890a2ed989 Configurable synchronous I/O timeout 2011-05-22 12:58:18 +02:00
Pieter Noordhuis
b075621fb7 Minor changes in non-blocking repl. connect 2011-05-22 12:51:09 +02:00
Pieter Noordhuis
a330913999 Non-blocking connect with master 2011-05-19 18:54:57 +02:00
antirez
f96a9f82d8 suppress a Linux warning, for 2.2 sake 2011-02-21 17:51:52 +01:00
antirez
89a1433e69 Fixed issue #435 and at the same time introduced an explicit ping in the master-slave channel that will detect a blocked master or a TCP link that is broken even if apparently connected. 2011-01-20 13:18:23 +01:00
Pieter Noordhuis
2b2eca1f56 Zero-pad timestamps in MONITOR output
Original report and fix:
http://code.google.com/p/redis/issues/detail?id=404
2010-12-14 17:39:34 +01:00
antirez
9fd01051bf Fix for bug 374, thanks to Jeremy Zawodny for reporting and tracing why it was crashing. 2010-11-12 20:02:20 +01:00
antirez
f6433915fe more replication info in logs 2010-11-04 18:14:20 +01:00
antirez
26b3366993 non blocking slave replication is now more non blocking than the first implementation... 2010-11-04 18:09:35 +01:00
antirez
62ec599c36 typos and minor stuff fixed in the new non blocking replication code 2010-11-04 17:35:03 +01:00
antirez
f4aa600b99 first attempt to non blocking implementation of slave replication and SYNC bulk data download. Never compiled so far... 2010-11-04 17:29:53 +01:00
antirez
19e61097c5 synchronous I/O networking functions originally used just for replication refactored in a file as generally useful, they are used in the cluster branch for MIGRATE. 2010-10-24 16:22:52 +02:00
Pieter Noordhuis
3ab203762f Use specialized function to add status and error replies 2010-09-02 23:33:06 +02:00
antirez
09252fc4f3 Fixed another instance of the Issue 173 2010-08-27 12:46:10 +02:00
antirez
b91d605a35 slave now detects lost connection during SYNC, fixing Issue 173 2010-08-24 16:25:00 +02:00
antirez
778b2210a9 slave with attached slaves now closes the connection to all the slaves when the connection to the master is lost. Now a slave without a connected link to the master will refuse SYNC from other slaves. Enhanced the replication error reporting. All this will fix Issue 156 2010-08-24 16:04:13 +02:00
antirez
d3b958c3fc Fixed MONITOR output for consistency: now integer encoded values are also formatted like this: "3932" 2010-07-01 20:22:46 +02:00
antirez
e2641e09cc redis.c split into many different C files.
networking related stuff moved into networking.c

moved more code

more work on layout of source code

SDS instantaneous memory saving. By Pieter and Salvatore at VMware ;)

cleanly compiling again after the first split, now splitting it in more C files

moving more things around... work in progress

split replication code

splitting more

Sets split

Hash split

replication split

even more splitting

more splitting

minor change
2010-07-01 14:38:51 +02:00