To perform a socket write() for each RDB rio API write call was
extremely unefficient, so now rio has minimal buffering capabilities.
Writes are accumulated into a buffer and only when a given limit is
reacehd are actually wrote to the N slaves FDs.
Trivia: rio lacked support for buffering since our targets were:
1) Memory buffers.
2) C standard I/O.
Both were buffered already.
This is useful for normal replication in order to refresh the slave
when we are persisting on disk, but for diskless replication the
child is already receiving data while in WAIT_BGSAVE_END state.
If we turn from diskless to disk-based replication via CONFIG SET, we
need a way to start a BGSAVE if there are slaves alerady waiting for a
BGSAVE to start. Normally with disk-based replication we do it as soon
as the previous child exits, but when there is a configuration change
via CONFIG SET, we may have slaves in WAIT_BGSAVE_START state without
an RDB background process currently active.
Fdset target is used when we want to write an RDB file directly to
slave's sockets. In this setup as long as there is a single slave that
is still receiving our payload, we want to continue sennding instead of
aborting. However rio calls should abort of no FD is ok.
Also we want the errors reported so that we can signal the parent who is
ok and who is broken, so there is a new set integers with the state of
each fd. Zero is ok, non-zero is the errno of the failure, if avaialble,
or a generic EIO.
A few people have written custom C commands because bit
manipulation isn't exposed through Lua. Let's give
them Mike Pall's bitop.
This adds bitop 1.0.2 (2012-05-08) from http://bitop.luajit.org/
bitop is imported as "bit" into the global namespace.
New Lua commands: bit.tobit, bit.tohex, bit.bnot, bit.band, bit.bor, bit.bxor,
bit.lshift, bit.rshift, bit.arshift, bit.rol, bit.ror, bit.bswap
Verification of working (the asserts would abort on error, so (nil) is correct):
127.0.0.1:6379> eval "assert(bit.tobit(1) == 1); assert(bit.band(1) == 1); assert(bit.bxor(1,2) == 3); assert(bit.bor(1,2,4,8,16,32,64,128) == 255)" 0
(nil)
127.0.0.1:6379> eval 'assert(0x7fffffff == 2147483647, "broken hex literals"); assert(0xffffffff == -1 or 0xffffffff == 2^32-1, "broken hex literals"); assert(tostring(-1) == "-1", "broken tostring()"); assert(tostring(0xffffffff) == "-1" or tostring(0xffffffff) == "4294967295", "broken tostring()")' 0
(nil)
Tests also integrated into the scripting tests and can be run with:
./runtest --single unit/scripting
Tests are excerpted from `bittest.lua` included in the bitop distribution.
With the exception of nodes sending MEET packets: we have to trust them
since they can send us MEET packets only when the cluster is initially
created or because sysadmin manual action.
In the cluster evaluation function we are supposed to set the cluster
state as "fail" if we are among a minority, however the code was not
detecting to be into a minority partition if exactly half the masters
were reachable, which is a minority.
We need to remember what is the saving strategy of the current RDB child
process, since the configuration may be modified at runtime via CONFIG
SET and still we'll need to understand, when the child exists, what to
do and for what goal the process was initiated: to create an RDB file
on disk or to write stuff directly to slave's sockets.
However we don't try to do this if the integer is already inside a range
representable with a shared integer.
The performance gain appears to be around ~15% in micro benchmarks,
however in the long run this also helps to improve locality, so should
have more, hard to measure, benefits.
Some language in the comment was difficult
to understand, so this commit: clarifies wording, removes
unnecessary words, and relocates some dependent clauses
closer to what they actually describe.
I also tried to break up longer chains of thought
(if X, then Y, and Q, and also F, so obviously M)
into more manageable chunks for ease of understanding.