Commit Graph

324 Commits

Author SHA1 Message Date
antirez
f7d4c3acdf Streams: trap more errors in stream loading + RDB check type name. 2018-03-15 12:54:10 +01:00
antirez
13ff7bc3ef CG: fix RDB saving when there are no consumer groups. 2018-03-15 12:54:10 +01:00
antirez
9f60a6bcee CG: RDB loading, fix inverted conditional. 2018-03-15 12:54:10 +01:00
antirez
f4e1a4de25 CG: RDB loading first implementation. 2018-03-15 12:54:10 +01:00
antirez
db7a5f23b4 CG: RDB saving part 2, consumers. 2018-03-15 12:54:10 +01:00
antirez
8fb6048ed0 CG: RDB saving part 1, metadata and PEL. 2018-03-15 12:54:10 +01:00
charsyam
76386c48b8 refactoring-make-condition-clear-for-rdb 2018-02-27 21:55:20 +09:00
Salvatore Sanfilippo
d8830200b4
Merge pull request #3828 from oranagra/sdsnewlen_pr
add SDS_NOINIT option to sdsnewlen to avoid unnecessary memsets.
2018-02-27 04:04:32 -08:00
Oran Agra
60a4f12f8b fix processing of large bulks (above 2GB)
- protocol parsing (processMultibulkBuffer) was limited to 32 bit positions
  in the buffer; readQueryFromClient had a potential overflow
- rioWriteBulkCount used int, although rioWriteBulkString gave it size_t
- several places in sds.c used int for string length or index.
- bugfix in RM_SaveAuxField (return was 1 or -1 and not the length)
- RM_SaveStringBuffer was limited to a 32 bit length
2017-12-29 12:24:19 +02:00
antirez
60d26acfc8 Refactoring: improve luaCreateFunction() API.
The function in its initial form, and after the fixes for the PSYNC2
bugs, required code duplication in multiple spots. This commit modifies
it in order to always compute the script name independently, and to
return the SDS of the SHA of the body: this way it can be used in all
the places, including for SCRIPT LOAD, without duplicating the code to
create the Lua function name. Note that this requires re-computing the
body SHA1 in the case of EVAL seeing a script for the first time, but
this should not change scripting performance in any way, because defining
new scripts is a rare event happening only the first time a script is
seen, and the SHA1 computation is in any case not a slow process for
the typical Redis script, especially compared to the actual Lua byte
compiling of the body.

Note that the function used to assert() if a duplicated script was
loaded; however, in two out of three use cases we now want the function
to handle duplicated scripts just fine: this happens in SCRIPT LOAD and
in RDB AUX "lua" loading. Moreover the assert was not defending against
some obvious failure mode, so now the function always checks at the start
whether the function is already defined.
2017-12-04 11:25:20 +01:00
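A minimal sketch of the naming convention this refactoring centralizes, assuming a SHA1 helper with the signature below (the digest routine itself is not shown): the compiled Lua function is named after the hex SHA1 of the script body, prefixed with "f_", so any caller can derive the name from the body alone.

#include <stdio.h>

/* Assumed helper: writes the 40-char hex SHA1 of body into digest (plus NUL). */
extern void sha1_hex(char digest[41], const char *body, size_t len);

/* Derive the Lua function name from the script body: "f_" + 40 hex chars. */
void lua_function_name(char out[43], const char *body, size_t bodylen) {
    char digest[41];
    sha1_hex(digest, body, bodylen);
    snprintf(out, 43, "f_%s", digest);
}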
antirez
65a9740fa8 Fix loading of RDB files lua AUX fields when the script is defined.
In the case of slaves loading the RDB from master, or in other similar
cases, the script is already defined, and the function registering the
script should not fail in the assert() call.
2017-12-01 16:01:10 +01:00
antirez
f24d3a7de0 Streams: delta encode IDs based on key. Add count + deleted fields.
We used to have the master ID stored at the start of the listpack,
however using the key directly makes more sense in order to create a
space efficient representation: the key in the radix tree is very
unlikely to change anyway because of how the stream is implemented. Moreover
when merging nodes, rewriting the merged listpacks is the most sensible
operation anyway, and we can use the iterator and the append-to-stream
function in order to avoid re-implementing the code needed for merging.

This commit also adds two items at the start of the listpack: the
number of valid items inside the listpack, and the number of items
marked as deleted. This means that there is no need to scan a listpack
in order to understand whether it's a good candidate for garbage
collection, should the ratio between valid and deleted items trigger the GC.
2017-12-01 10:24:24 +01:00
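A rough sketch of the bookkeeping described above, with illustrative names rather than the actual listpack API: two counters at the head of each listpack let the stream decide whether a node is a garbage-collection candidate without scanning its entries.

#include <stdint.h>

/* Illustrative header for one listpack node in the stream's radix tree. */
struct stream_lp_header_sketch {
    uint64_t valid_count;    /* number of live entries stored in this listpack */
    uint64_t deleted_count;  /* entries flagged as deleted, not yet reclaimed */
    /* entry IDs that follow are delta-encoded against the radix tree key
     * (the "master ID"), which this commit stops duplicating in the listpack */
};

/* GC heuristic sketch: compact once deleted entries dominate the node. */
static int is_gc_candidate(const struct stream_lp_header_sketch *h) {
    return h->deleted_count > 0 && h->deleted_count >= h->valid_count;
}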
antirez
98d184db12 Streams: Save stream->length in RDB. 2017-12-01 10:24:24 +01:00
antirez
edd70c1993 Streams: RDB loading. RDB saving modified.
After a few attempts it looked much saner to just add the last item ID
at the end of the serialized listpacks, instead of scanning the last
listpack loaded from head to tail just to fetch it. It's basically a
disk-space vs. CPU-and-simplicity tradeoff.
2017-12-01 10:24:24 +01:00
antirez
485014cc74 Streams: RDB saving. 2017-12-01 10:24:24 +01:00
antirez
452ad2e928 PSYNC2: just store script bodies into RDB.
Related to #4483. As suggested by @soloestoy, we can retrieve the SHA1
from the body. Given that in the new implementation using AUX fields we
ended up copying data around a lot to create new objects and strings, we
take this concept to the extreme and trade CPU for space inside the RDB file.
2017-11-30 18:38:26 +01:00
antirez
f11a7585a8 PSYNC2: Save Lua scripts state into RDB file.
This is currently needed in order to fix #4483, but this can be
useful in other contexts, so maybe later we may want to remove the
conditionals and always save/load scripts.

Note that we are using the "lua" AUX field here, in order to guarantee
backward compatibility of the RDB file. The unknown AUX fields must be
discarded by past versions of Redis.
2017-11-30 18:37:52 +01:00
antirez
4d063bb6ba PSYNC2: reorganize comments related to recent fixes.
Related to PR #4412 and issue #4407.
2017-11-24 11:08:29 +01:00
Salvatore Sanfilippo
9d86ae4597
Merge pull request #4412 from soloestoy/bugfix-psync2
PSYNC2: safe free backlog when reach the time limit and others
2017-11-24 10:56:18 +01:00
zhaozhao.zz
ea2e51c630 PSYNC2: persist cached_master's dbid inside the RDB 2017-11-22 12:11:26 +08:00
zhaozhao.zz
93037f7642 PSYNC2: make repl_stream_db never be -1
This means that after this change all the replication
info in the RDB is valid, and it can distinguish us from
the older version.
2017-11-22 12:05:34 +08:00
antirez
a1944c3e4d Fix saving of zero-length lists.
Normally in modern Redis you can't create zero-length lists, however it's
possible to load them from old RDB files generated, for instance, using
Redis 2.8 (see issue #4409). The "Right Thing" would be not loading such
lists at all, but this would require hooking into random places in rdb.c
in a not-so-great way, for a problem that is, at this point, minor at best.

Here in this commit instead I just fix the fact that zero-length lists,
materialized as quicklists with the first node set to NULL, were
iterated in the wrong way while being saved, leading to a crash.

The other parts of the list implementation are apparently able to deal
with empty lists correctly, even if they are no longer a thing.
2017-11-06 12:37:03 +01:00
zhaozhao.zz
b8579c225c PSYNC2: clarify the scenario when repl_stream_db can be -1 2017-11-02 10:45:33 +08:00
zhaozhao.zz
885c4f856e PSYNC2 & RDB: fix the missing rdbSaveInfo for BGSAVE 2017-11-01 17:52:43 +08:00
antirez
bb3b5ddd19 PSYNC2: More refinements related to #4316. 2017-09-20 11:28:13 +02:00
zhaozhao.zz
b541ccef25 PSYNC2: make persisting replication info more solid
This commit is a reinforcement of commit c1c99e9.

1. Replication information can be stored when the RDB file is
generated by a master using server.slaveseldb when server.repl_backlog
is not NULL, or by setting repl_stream_db to -1. That's safe, because
a NULL server.repl_backlog will trigger a full synchronization,
and then the master will send a SELECT command to the replication stream.
2. Only do rdbSave* when rsiptr is not NULL;
if we do rdbSave* without rdbSaveInfo, the slave will miss repl-stream-db.
3. Save the replication information also in the case of the
SAVE command, the FLUSHALL command and DEBUG reload.
2017-09-20 11:18:10 +02:00
antirez
c1c99e9f4e PSYNC2: Fix the way replication info is saved/loaded from RDB.
This commit attempts to fix a number of bugs reported in #4316.
They are related to the way replication info like replication ID,
offsets, and currently selected DB in the master client, are stored
and loaded by Redis. In order to avoid inconsistencies the changes in
this commit try to enforce that:

1. Replication information is only stored when the RDB file is
generated by a slave that has a valid 'master' client, so that we can
always extract the currently selected DB.
2. When replication information is persisted in the RDB file, either all
the info needed for a successful PSYNC is persisted, or nothing is.
3. The RDB replication information is only loaded if the instance is
configured as a slave, otherwise a master could start with IDs that relate
to a different history of the data set, and still retain such IDs in the
future while receiving unrelated writes.
2017-09-19 23:03:39 +02:00
antirez
fc7ecd8d35 AOF check utility: ability to check files with RDB preamble. 2017-07-10 13:38:23 +02:00
antirez
2b36950e9b Free IO context if any in RDB loading code.
Thanks to @oranagra for spotting this bug.
2017-07-06 11:20:49 +02:00
antirez
365dd037dc RDB modules values serialization format version 2.
The original RDB serialization format was not parsable without the
module loaded, because the structure was managed only by the module
itself. Moreover RDB is a streaming protocol in the sense that it is
both produced in an append-only fashion, and is also sometimes directly
sent to the socket (in the case of diskless replication).

The fact that modules values cannot be parsed without the relevant
module loaded is a problem in many ways: RDB checking tools must have
loaded modules even for doing things not involving the value at all,
like splitting an RDB into N RDBs by key or alike, or just checking the
RDB for sanity.

In theory module values could be just a blob of data with a prefixed
length in order for us to be able to skip it. However prefixing the values
with a length would mean one of the following:

1. To be able to write some data at a previous offset. This breaks
streaming.
2. To buffer values before outputting them. This hurts performance.
3. To have some chunked RDB output format. This breaks simplicity.

Moreover, the above solution still makes module values a totally opaque
matter, with the following problems:

1. The RDB check tool can just skip the value without being able to at
least check the general structure. For datasets composed mostly of
module values this means just checking the outer level of the RDB without
actually doing any check on most of the data itself.
2. It is not possible to do any recovering or processing of data for which a
module no longer exists in the future, or is unknown.

So this commit implements a different solution. The modules RDB
serialization API is composed of well defined calls to store integers,
floats, doubles or strings. After this commit, the parts generated by
the module API have a one-byte prefix for each of the above emitted
parts, and there is a final EOF byte as well. So even if we don't know
exactly how to interpret a module value, we can always parse it at a
high level, check the overall structure, understand the types used to
store the information, and easily skip the whole value.

The change is backward compatible: older RDB files can be still loaded
since the new encoding has a new RDB type: MODULE_2 (of value 7).
The commit also implements the ability to check RDB files for sanity
taking advantage of the new feature.
2017-06-27 13:19:16 +02:00
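A hedged sketch of what the one-byte prefixes make possible for a checker that does not have the module loaded; the opcode values and the length/skip helpers are illustrative stand-ins, not the real rdb.c constants or functions.

#include <stdint.h>
#include <stdio.h>

enum {                       /* illustrative part opcodes, not the real values */
    MOD2_EOF    = 0,         /* end of the module value */
    MOD2_SINT   = 1,
    MOD2_UINT   = 2,
    MOD2_FLOAT  = 3,
    MOD2_DOUBLE = 4,
    MOD2_STRING = 5,
};

/* Skip a MODULE_2 value at a high level: read each part's opcode, skip its
 * payload, stop at EOF. read_len() and skip_bytes() stand in for the real
 * length decoding and stream advancing done by the RDB check tool. */
int skip_module2_value(FILE *rdb,
                       int (*read_len)(FILE *, uint64_t *),
                       int (*skip_bytes)(FILE *, uint64_t)) {
    for (;;) {
        int op = fgetc(rdb);
        if (op == EOF) return -1;                 /* truncated value */
        uint64_t len;
        switch (op) {
        case MOD2_EOF:    return 0;               /* clean end of value */
        case MOD2_SINT:
        case MOD2_UINT:   if (read_len(rdb, &len)) return -1; break;
        case MOD2_FLOAT:  if (skip_bytes(rdb, 4)) return -1; break;
        case MOD2_DOUBLE: if (skip_bytes(rdb, 8)) return -1; break;
        case MOD2_STRING: if (read_len(rdb, &len) || skip_bytes(rdb, len))
                              return -1;
                          break;
        default:          return -1;              /* unknown part type */
        }
    }
}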
antirez
e498d9ee3e Collect fork() timing info only if fork succeeded. 2017-05-19 11:10:36 +02:00
antirez
c33493277a Clarify why we save ziplist elements in reverse order.
Also get rid of variables that are now kinda redundant, since the
dictionary iterator was removed.

This is related to PR #3949.
2017-04-18 11:01:47 +02:00
spinlock
23ec36909e rdb: saving skiplist in reversed order to accelerate the deserialisation process 2017-04-17 13:22:34 +08:00
oranagra
f86df924b0 add SDS_NOINIT option to sdsnewlen to avoid unnecessary memsets.
this commit also contains a small bugfix in rdbLoadLzfStringObject,
a bug that currently has no implications.
2017-02-23 03:04:08 -08:00
antirez
04542cff92 Replication: fix the infamous key leakage of writable slaves + EXPIRE.
BACKGROUND AND USE CASE

Redis slaves are normally read only, however they support a "writable"
mode which is very handy when scaling reads on slaves that actually
need write operations in order to access data. For instance imagine
having slaves replicating certain Sets keys from the master. When
accessing the data on the slave, we want to perform intersections between
such Sets values. However we don't want to compute the intersection each
time: caching the intersection for some time is often a good idea.

To do so, it is possible to setup a slave as a writable slave, and
perform the intersection on the slave side, perhaps setting a TTL on the
resulting key so that it will expire after some time.

THE BUG

Problem: in order to have consistent replication, expiring keys in
Redis replication is up to the master, which synthesizes DEL operations
to send in the replication stream. However slaves logically expire keys
by hiding them from read attempts of clients, so that if the master did
not promptly send a DEL, clients still see logically expired keys
as non existing.

Because slaves don't actively expire keys by actually evicting them but
just mask them from the POV of read operations, if a key is created in a
writable slave, and an expire is set, the key will be leaked forever:

1. No DEL will be received from the master, which does not know about
such a key at all.

2. No eviction will be performed by the slave, since eviction is disabled
on slaves (it's up to the master), otherwise consistency of the data
would be lost.

THE FIX

In order to fix the problem, the slave should be able to tag keys that
were created in the slave side and have an expire set in some way.

My solution involved using a unique additional dictionary created by
the writable slave only if needed. The dictionary is obviously keyed by
the key name that we need to track: all the keys that are set with an
expire directly by a client writing to the slave are tracked.

The value in the dictionary is a bitmap of all the DBs where such a key
name needs to be tracked, so that we can use a single dictionary to track
keys in all the DBs used by the slave (actually this limits the solution
to the first 64 DBs, but the default with Redis is to use 16 DBs).

This solution allows us to pay a small complexity and CPU penalty,
which is actually zero when the feature is not used. The slave-side
eviction is encapsulated in code which is not coupled with the rest of
the Redis core, if not for the hook to track the keys.

TODO

I'm doing the first smoke tests to see if the feature works as expected:
so far so good. Unit tests should be added before merging into the
4.0 branch.
2016-12-13 10:59:54 +01:00
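A minimal sketch of the tracking structure described above, with illustrative names: one entry per key name, whose value is a bitmap of the DB indexes where the key received a slave-local expire (hence the 64 DB limit mentioned in the commit).

#include <stdint.h>
#include <string.h>

#define TRACKED_DBS 64   /* one bit per DB index fits in a uint64_t */

struct slave_expire_sketch {
    char key[128];       /* key name; fixed size only for this sketch */
    uint64_t db_bitmap;  /* bit i set => key has a slave-side expire in DB i */
};

static void track_slave_expire(struct slave_expire_sketch *e,
                               const char *keyname, int dbid) {
    if (dbid < 0 || dbid >= TRACKED_DBS) return;  /* outside the tracked range */
    strncpy(e->key, keyname, sizeof(e->key) - 1);
    e->key[sizeof(e->key) - 1] = '\0';
    e->db_bitmap |= (1ULL << dbid);
}

In the actual code the entries live in a dictionary that the writable slave creates lazily, as the commit describes, so the cost is zero when the feature is unused.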
Chris Lamb
6eb0c52d4c src/rdb.c: Correct "whenver" -> "whenever" typo. 2016-12-01 13:16:30 +01:00
antirez
28c96d73b2 PSYNC2: Save replication ID/offset on RDB file.
This means that stopping a slave and restarting it will still make it
able to PSYNC with the master. Moreover the master itself will retain
its ID/offset, in case it gets turned into a slave, or if a slave will
try to PSYNC with it with an exactly updated offset (otherwise there is
no backlog).

This change was possible thanks to PSYNC v2 that makes saving the current
replication state much simpler.
2016-11-10 12:35:29 +01:00
antirez
2669fb8364 PSYNC2: different improvements to Redis replication.
The gist of the changes is that now, partial resynchronizations between
slaves and masters (without the need of a full resync with RDB transfer
and so forth), work in a number of cases when it was impossible
in the past. For instance:

1. When a slave is promoted to master, the slaves of the old master can
partially resynchronize with the new master.

2. Chained slaves (slaves of slaves) can be moved to replicate to other
slaves or the master itself, without requiring a full resync.

3. The master itself, after being turned into a slave, is able to
partially resynchronize with the new master, when it joins replication
again.

In order to obtain this, the following main changes were operated:

* Slaves also take a replication backlog, not just masters.

* Same stream replication for all the slaves and sub slaves. The
replication stream is identical from the top level master to its slaves
and is also the same from the slaves to their sub-slaves and so forth.
This means that if a slave is later promoted to master, it has the
same replication backlog, and can partially resynchronize with its
slaves (that were previously slaves of the old master).

* A given replication history is no longer identified by the `runid` of
a Redis node. There is instead a `replication ID` which changes every
time the instance has a new history no longer coherent with the past
one. So, for example, slaves publish the same replication history of
their master, however when they are turned into masters, they publish
a new replication ID, but still remember the old ID, so that they are
able to partially resynchronize with slaves of the old master (up to a
given offset).

* The replication protocol was slightly modified so that a new extended
+CONTINUE reply from the master is able to inform the slave of a
replication ID change.

* REPLCONF CAPA is used in order to notify masters that a slave is able
to understand the new +CONTINUE reply.

* The RDB file was extended with an auxiliary field that is able to
select a given DB after loading in the slave, so that the slave can
continue receiving the replication stream from the point it was
disconnected without requiring the master to insert "SELECT" statements.
This is useful in order to guarantee the "same stream" property, because
the slave must be able to accumulate an identical backlog.

* Slave pings to sub-slaves are now sent in a special form, when the
top-level master is disconnected, in order not to interfere with the
replication stream. We just use out of band "\n" bytes as in other parts
of the Redis protocol.

An old design document is available here:

https://gist.github.com/antirez/ae068f95c0d084891305

However the implementation is not identical to the description because,
during the work to implement it, different changes were needed in order
to make things work well.
2016-11-09 15:37:15 +01:00
antirez
152c1b6802 Module: Ability to get context from IO context.
It was noted by @dvirsky that it is not possible to use string functions
when writing the AOF file. This sometimes is critical since the command
rewriting may need to be built in the context of the AOF callback, and
without access to the context, and the limited types that the AOF
production functions will accept, this can be an issue.

Moreover there are other needs that we can't anticipate regarding the
ability to use Redis Modules APIs using the context in order to build
representations to emit AOF / RDB.

Because of this a new API was added that allows the user to get a
temporary context from the IO context. The context, if obtained, is
automatically released when the RDB / AOF callback returns.

Calling the function to get the context multiple times always returns
the same one, since it is invalid to have more than a single context.
2016-10-06 17:09:26 +02:00
antirez
3dc84c5300 Modules: API to save/load single precision floating point numbers.
When double precision is not needed, taking 2x the space in the
serialization is not good.
2016-10-03 00:08:35 +02:00
antirez
e565632e59 Child -> Parent pipe for COW info transferring. 2016-09-19 13:45:20 +02:00
antirez
945a2f948e zmalloc: zmalloc_get_smap_bytes_by_field() modified to work for any PID.
The goal is to get copy-on-write amount of the child from the parent.
2016-09-19 10:28:42 +02:00
antirez
3793afa0ba Merge branch 'aofrdb' into unstable 2016-09-09 15:03:21 +02:00
antirez
57a0db9495 Fix rdb.c var types when calling rdbLoadLen().
Technically as soon as Redis 64 bit gets proper support for loading
collections and/or DBs with more than 2^32 elements, the 32 bit version
should be modified in order to check if what we read from rdbLoadLen()
overflows. This would only apply to huge RDB files created with a 64 bit
instance and later loaded into a 32 bit instance.
2016-09-01 11:08:44 +02:00
antirez
f1c32f0dcb RDB AOF preamble: WIP 3 (RDB loading refactoring). 2016-08-11 15:27:29 +02:00
antirez
feda52381d RDB AOF preamble: WIP 2. 2016-08-09 16:41:40 +02:00
antirez
4426cb11e2 RDB AOF preamble: WIP 1. 2016-08-09 11:07:32 +02:00
antirez
0a628e5102 Avoid simultaneous RDB and AOF child process.
This patch, written in collaboration with Oran Agra (@oranagra) is a companion
to 780a8b1. Together the two patches should prevent the AOF and RDB saving
processes from being spawned at the same time. Previously conditions that
could lead to two saving processes at the same time were:

1. When AOF is enabled via CONFIG SET and an RDB saving process is
   already active.

2. When the SYNC command decides to start an RDB saving process ASAP in
   order to serve a new slave that cannot partially resynchronize (but
   only if we have a disk target for replication, for diskless
   replication there is no such problem).

Condition "1" is not very severe but "2" can happen often and is
definitely good at degrading Redis performance in an unexpected way.

The two commits have the effect of always spawning RDB saves for
replication in replicationCron() instead of attempting to start an RDB
save synchronously. Moreover when a BGSAVE or AOF rewrite must be
performed, they are instead just postponed using flags that will try to
perform such operations ASAP.

Finally the BGSAVE command was modified in order to accept a SCHEDULE
option so that if an AOF rewrite is in progress, when this option is
given, the command no longer returns an error, but instead schedules an
RDB rewrite operation for when it will be possible to start it.
2016-07-21 18:35:01 +02:00
antirez
7e220a964a In Redis RDB check: more details in error reporting. 2016-07-01 15:26:55 +02:00
antirez
e697153d18 In Redis RDB check: log decompression errors. 2016-07-01 11:59:25 +02:00
antirez
e9f31ba9c2 In Redis RDB check: better error reporting. 2016-07-01 09:36:52 +02:00
Pierre Chapuis
188d90fc87 fix some compiler warnings 2016-06-05 16:48:45 +02:00
antirez
8ec28002be Modules: support for modules native data types. 2016-06-03 18:14:04 +02:00
antirez
27e5f385c1 RDB v8: fix rdbLoadLen() return value. 2016-06-01 20:18:28 +02:00
antirez
e6554bed92 RDB v8: new ZSET storage format with binary doubles. 2016-06-01 12:12:26 +02:00
antirez
4aae4f7d35 RDB v8: ability to save uint64_t lengths. 2016-06-01 11:35:47 +02:00
Oran Agra
5e3880a492 various cleanups and minor fixes 2016-04-25 16:49:57 +03:00
Oran Agra
7ba90225a0 fix small issues in redis 3.2 2016-04-25 14:19:28 +03:00
antirez
b0ec22f948 Include full paths on RDB/AOF files errors.
Close #3086.
2016-02-15 16:15:01 +01:00
antirez
97ba4e3886 Lazyfree: Hash converted to use plain SDS WIP 5. 2015-10-01 13:02:25 +02:00
antirez
a7c5be18a8 Lazyfree: Sorted sets converted to plain SDS. (several commits squashed) 2015-10-01 13:02:24 +02:00
antirez
86d48efbfd Lazyfree: Convert Sets to use plain SDS (several commits squashed). 2015-10-01 13:02:24 +02:00
antirez
8e55537459 Undo slaves state change on failed rdbSaveToSlavesSockets().
As Oran Agra suggested, in startBgsaveForReplication() when the BGSAVE
attempt returns an error, we scan the list of slaves in order to remove
them since there is no way to serve them currently.

However we check for the replication state BGSAVE_START, which was
modified by rdbSaveToSlavesSockets() before forking. So when fork fails
the state of slaves remains BGSAVE_END and no cleanup is performed.

This commit fixes the problem by making rdbSaveToSlavesSockets() able to
undo the state change on fork failure.
2015-09-07 16:09:23 +02:00
antirez
54ece2c583 Remove slave state change handled by replicationSetupSlaveForFullResync(). 2015-08-05 13:58:56 +02:00
antirez
15de6b108b Make sure we re-emit SELECT after each new slave full sync setup.
In previous commits we moved the FULLRESYNC to the moment we start the
BGSAVE, so that the offset we provide is the right one. However this
also means that we need to re-emit the SELECT statement every time a new
slave starts to accumulate the changes.

To obtain this effect in a cleaner way, the function that sends the
FULLRESYNC reply was overloaded with a more important role of also doing
this and changing the slave state. So it was renamed to
replicationSetupSlaveForFullResync() to better reflect what it does now.
2015-08-05 13:34:46 +02:00
antirez
292fec058a PSYNC initial offset fix.
This commit attempts to fix a bug involving PSYNC and diskless
replication (currently experimental) found by Yuval Inbar from Redis Labs,
and that was later found to have even more far-reaching effects (the bug
also exists when diskstore is off).

The gist of the bug is that, a Redis master replies with +FULLRESYNC to
a PSYNC attempt that fails and requires a full resynchronization.
However, the baseline offset sent along with FULLRESYNC was always the
current master replication offset. This is not ok, because there are
many reasons that may delay the RDB file creation. And... guess what,
the master offset we communicate must be the one at the time the RDB
was created. So for example:

1) When the BGSAVE for replication is delayed since there is already one
   in progress, but it is not good for replication.
2) When a new BGSAVE is not needed because we attach to one currently ongoing.
3) When, because of diskless replication, the BGSAVE is delayed.

In all the above cases the PSYNC reply is wrong and the slave may
reconnect later claiming to need a wrong offset: this may cause
data corruption later.
2015-08-04 17:06:10 +02:00
antirez
32f80e2f1b RDMF: More consistent define names. 2015-07-27 14:37:58 +02:00
antirez
40eb548a80 RDMF: REDIS_OK REDIS_ERR -> C_OK C_ERR. 2015-07-26 23:17:55 +02:00
antirez
2d9e3eb107 RDMF: redisAssert -> serverAssert. 2015-07-26 15:29:53 +02:00
antirez
14ff572482 RDMF: OBJ_ macros for object related stuff. 2015-07-26 15:28:00 +02:00
antirez
554bd0e7bd RDMF: use client instead of redisClient, like Disque. 2015-07-26 15:20:52 +02:00
antirez
424fe9afd9 RDMF: redisLog -> serverLog. 2015-07-26 15:17:43 +02:00
antirez
cef054e868 RDMF (Redis/Disque merge friendlyness) refactoring WIP 1. 2015-07-26 15:17:18 +02:00
Yongyue Sun
427794d845 bugfix: errno might change before logging
Signed-off-by: Yongyue Sun <abioy.sun@gmail.com>
2015-07-17 10:47:32 +02:00
Salvatore Sanfilippo
d83c810265 Merge pull request #2301 from mattsta/fix/lengths
Improve type correctness
2015-02-24 17:22:53 +01:00
antirez
fad758b322 Check RDB automatically in a few more cases. 2015-02-03 10:33:05 +01:00
Matt Stancliff
d8c7db1bdb Improve RDB error-on-load handling
Previously if we loaded a corrupt RDB, Redis printed an error report
with a big "REPORT ON GITHUB" message at the bottom.  But, we know
RDB load failures are corrupt data, not corrupt code.

Now when RDB failure is detected (duplicate keys or unknown data
types in the file), we run check-rdb against the RDB then exit.  The
automatic check-rdb hopefully gives the user instant feedback
about what is wrong instead of providing a mysterious stack
trace.
2015-01-28 11:19:00 -05:00
antirez
92cfab44b2 Fix gcc warning for lack of casting to char pointer. 2015-01-21 14:51:42 +01:00
Matt Stancliff
f704360462 Improve RDB type correctness
It's possible large objects could be larger than 'int', so let's
upgrade all size counters to ssize_t.

This also fixes rdbSaveObject serialized bytes calculation.
Since entire serializations of data structures can be large,
we don't want to limit their calculated size to a 32 bit signed max.

This commit increases object size calculation and
cascades the change back up to serializedlength printing.

Before:
127.0.0.1:6379> debug object hihihi
... encoding:quicklist serializedlength:-2147483559 ...

After:
127.0.0.1:6379> debug object hihihi
... encoding:quicklist serializedlength:2147483737 ...
2015-01-19 14:10:12 -05:00
Matt Stancliff
eb7d67a3ab Remove RDB AUX memory leaks 2015-01-09 15:19:18 -05:00
antirez
a7722dc31b Typo fixed: fiels -> fields in rdbSaveInfoAuxFields().
Thx to @badboy.
2015-01-08 12:06:22 +01:00
antirez
4c0e8923a6 A few more AUX info fields added to RDB. 2015-01-08 09:52:59 +01:00
antirez
206cd219b6 RDB AUX fields support.
This commit introduces a new RDB data type called 'aux'. It is used in
order to insert inside an RDB file key-value pairs that may serve
different needs, without breaking backward compatibility when new
information is embedded inside an RDB file. The contract between Redis
versions is to ignore unknown aux fields when encountered.

Aux fields can be used in order to:

1. Augment the RDB file with info like version of Redis that created the
RDB file, creation time, used memory while the RDB was created, and so
forth.
2. Add state about Redis inside the RDB file that we need to reload
later: replication offset, previous master run ID, in order to improve
failovers safety and allow partial resynchronization after a slave
restart.
3. Anything that we may want to add to RDB files without breaking the
ability of past versions of Redis to load the file.
2015-01-08 09:52:55 +01:00
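A minimal sketch of what one aux pair might look like on disk, under the assumption that it is an opcode byte followed by two string payloads (field name and value); the opcode value and the plain fwrite() stand in for the real rio string writers, which length-prefix their payloads.

#include <stdio.h>
#include <string.h>

#define OPCODE_AUX 250  /* assumed value of the aux opcode for this sketch */

static int write_raw_string(FILE *f, const char *s) {
    size_t len = strlen(s);             /* the real code length-encodes this */
    return fwrite(s, 1, len, f) == len ? 0 : -1;
}

/* Emit one aux key-value pair; loaders that don't know the field ignore it. */
int save_aux_field_sketch(FILE *f, const char *field, const char *value) {
    if (fputc(OPCODE_AUX, f) == EOF) return -1;
    if (write_raw_string(f, field)) return -1;
    return write_raw_string(f, value);
}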
antirez
1a30e7ded1 rdbLoad() refactoring to make it simpler to follow. 2015-01-08 09:52:51 +01:00
antirez
e8614a1a77 New RDB v7 opcode: RESIZEDB.
The new opcode is a hint about the size of the dataset (keys and number
of expires) we are going to load for a given Redis database inside the
RDB file. Since hash tables are resized accordingly ASAP, useless
rehashing is avoided, speeding up load times significantly, in the order
of ~ 20% or more for larger data sets.

Related issue: #1719
2015-01-08 09:52:47 +01:00
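A sketch of the loading-side effect, with table_reserve() standing in for the dictExpand() calls the loader can now make: the opcode carries the expected number of keys and expires, so both hash tables are grown once, up front, instead of rehashing repeatedly while keys stream in.

#include <stdint.h>

/* Stand-in for dictExpand(dict, size): pre-size a hash table. */
static int table_reserve(void *table, uint64_t size) {
    (void)table; (void)size;
    return 0;
}

struct db_sketch { void *keys; void *expires; };

/* Apply the RESIZEDB hint read from the RDB before loading the DB's keys. */
int apply_resizedb_hint(struct db_sketch *db,
                        uint64_t db_size, uint64_t expires_size) {
    if (table_reserve(db->keys, db_size)) return -1;
    return table_reserve(db->expires, expires_size);
}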
antirez
f699b5e801 Use RDB_LOAD_PLAIN to load quicklists and encoded types.
Before we needed to create a string object with an embedded SDS, and
basically duplicate the SDS part into a plain zmalloc() allocation.
2015-01-08 09:52:40 +01:00
antirez
68bc02c36c RDB refactored to load plain strings from RDB. 2015-01-08 09:52:36 +01:00
Matt Stancliff
02bb515a09 Config: Add quicklist, remove old list options
This removes:
  - list-max-ziplist-entries
  - list-max-ziplist-value

This adds:
  - list-max-ziplist-size
  - list-compress-depth

Also updates config file with new sections and updates
tests to use quicklist settings instead of old list settings.
2015-01-02 11:16:10 -05:00
Matt Stancliff
abdd1414a8 Allow compression of interior quicklist nodes
Let user set how many nodes to *not* compress.

We can specify a compression "depth" of how many nodes
to leave uncompressed on each end of the quicklist.

Depth 0 = disable compression.
Depth 1 = only leave head/tail uncompressed.
  - (read as: "skip 1 node on each end of the list before compressing")
Depth 2 = leave head, head->next, tail->prev, tail uncompressed.
  - ("skip 2 nodes on each end of the list before compressing")
Depth 3 = Depth 2 + head->next->next + tail->prev->prev
  - ("skip 3 nodes...")
etc.

This also:
  - updates RDB storage to use native quicklist compression (if node is
    already compressed) instead of uncompressing, generating the RDB string,
    then re-compressing the quicklist node.
  - internalizes the "fill" parameter for the quicklist so we don't
    need to pass it to _every_ function.  Now it's just a property of
    the list.
  - allows a runtime-configurable compression option, so we can
    expose a compression parameter in the configuration file if people
    want to trade slight request-per-second performance for up to 90%+
    memory savings in some situations.
  - updates the quicklist tests to do multiple passes: 200k+ tests now.
2015-01-02 11:16:09 -05:00
Matt Stancliff
101b3a6e42 Convert quicklist RDB to store ziplist nodes
Turns out it's a huge improvement during save/reload/migrate/restore
because, with compression enabled, we're compressing 4k or 8k
chunks of data consisting of multiple elements in one ziplist
instead of compressing series of smaller individual elements.
2015-01-02 11:16:09 -05:00
Matt Stancliff
127c15e2b2 Convert RDB ziplist loading to sdsnative()
This saves us an unnecessary zmalloc, memcpy, and two frees.
2015-01-02 11:16:09 -05:00
Matt Stancliff
5e362b84ab Add quicklist implementation
This replaces individual ziplist vs. linkedlist representations
for Redis list operations.

Big thanks for all the reviews and feedback from everybody in
https://github.com/antirez/redis/pull/2143
2015-01-02 11:16:08 -05:00
Matt Stancliff
d956d809ac Fix three simple clang analyzer warnings 2014-12-23 09:31:04 -05:00
antirez
840435ad0b INFO loading stats: three fixes.
1. Server unixtime may remain not updated while loading the AOF, so the
ETA is not updated correctly.

2. The number of processed bytes was not initialized.

3. Possible division by zero condition (likely cause of issue #1932).
2014-12-23 14:54:34 +01:00
Alon Diamant
14b04c062e Fixed memory leaks in rdbSaveToSlavesSockets() 2014-12-21 16:13:45 +02:00
antirez
775cc30a98 Use new slave name function for diskless repl reporting. 2014-10-27 12:23:03 +01:00
antirez
ebb3bd53c2 Diskless replication: child -> parent communication improved.
Child now reports full info to the parent including IDs of slaves in
failure state and exit code.
2014-10-23 23:10:33 +02:00
antirez
d4f6a1711d Diskless replication: set / reset socket send timeout.
We need to avoid that a child -> slaves transfer can continue forever.
We use the same timeout used as global replication timeout, which is
documented to also affect I/O operations during bulk transfers.
2014-10-22 15:53:45 +02:00
antirez
525c488f63 rio fdset target: handle short writes.
Even though the socket is set in blocking mode, we can still get short
writes when writing to a socket.
2014-10-17 16:45:53 +02:00
antirez
10aafdad56 Diskless replication: rio fdset target now supports buffering.
Performing a socket write() for each RDB rio API write call was
extremely inefficient, so now rio has minimal buffering capabilities.
Writes are accumulated into a buffer and only when a given limit is
reached are they actually written to the N slave FDs.

Trivia: rio lacked support for buffering since our targets were:

1) Memory buffers.
2) C standard I/O.

Both were buffered already.
2014-10-17 11:36:12 +02:00
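A rough sketch of the buffering plus the short-write handling from the commit just above, with an illustrative structure and buffer size: rio calls append to a memory buffer, and only a full buffer is pushed out to every slave socket, retrying partial writes.

#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define FLUSH_THRESHOLD (32 * 1024)   /* illustrative buffer limit */

struct fdset_sketch {
    int fds[16];                      /* slave sockets receiving the RDB */
    int numfds;
    char buf[FLUSH_THRESHOLD];
    size_t buflen;
};

static int fdset_flush(struct fdset_sketch *s) {
    for (int i = 0; i < s->numfds; i++) {
        size_t off = 0;
        while (off < s->buflen) {     /* blocking sockets can still short-write */
            ssize_t n = write(s->fds[i], s->buf + off, s->buflen - off);
            if (n <= 0) return -1;
            off += (size_t)n;
        }
    }
    s->buflen = 0;
    return 0;
}

static int fdset_write(struct fdset_sketch *s, const void *p, size_t len) {
    const char *src = p;
    while (len) {
        if (s->buflen == sizeof(s->buf) && fdset_flush(s)) return -1;
        size_t chunk = sizeof(s->buf) - s->buflen;
        if (chunk > len) chunk = len;
        memcpy(s->buf + s->buflen, src, chunk);
        s->buflen += chunk;
        src += chunk;
        len -= chunk;
    }
    return 0;
}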
antirez
b1337b15b6 Diskless replication: Various fixes to backgroundSaveDoneHandlerSocket() 2014-10-17 10:43:56 +02:00
antirez
7a1e0d9898 Diskless replication: read report from child. 2014-10-15 11:36:03 +02:00
antirez
fbe7545545 Diskless replication: child writes report to parent. 2014-10-15 09:46:49 +02:00
antirez
1cd0d26c63 Diskless replication: parent-child pipe and a few TODOs. 2014-10-14 15:29:07 +02:00
antirez
75f0cd6520 Diskless replication: RDB -> slaves transfer draft implementation. 2014-10-14 10:11:29 +02:00
antirez
2df8341c75 Define different types of RDB children.
We need to remember what is the saving strategy of the current RDB child
process, since the configuration may be modified at runtime via CONFIG
SET and we'll still need to understand, when the child exits, what to
do and for what goal the process was initiated: to create an RDB file
on disk or to write stuff directly to the slaves' sockets.
2014-10-08 09:09:01 +02:00
antirez
8beb98574a RDB file creation refactored to support a non-disk target. 2014-10-07 12:56:23 +02:00
zionwu
a2583466e4 Fix incorrect comments
error != success; and 0 != number of bytes written

Closes #1806
2014-09-29 06:49:06 -04:00
yoav
0a98b21f65 Add error check for writing RDB checksum
Closes #857
2014-08-18 11:09:06 +02:00
Matt Stancliff
8db020e2a1 Fix assert technical correctness
dictAdd returns DICT_OK, not REDIS_OK. They both
have the same underlying values, so it works even though
the code is technically wrong.

Fixes #1512
2014-08-08 10:03:22 +02:00
antirez
7bb25f8a46 Force quit when receiving a second SIGINT.
Also quit ASAP when we are still loading a DB, since care is not needed
in this special condition, especially for a SIGINT.
2014-08-07 16:39:02 +02:00
Yossi Gottlieb
a75a574141 Fail SYNC if background save child aborted due to a signal. 2014-07-28 14:43:30 +03:00
antirez
c7822bf382 RDB: load string objects directly as EMBSTR objects when possible. 2014-07-16 11:36:22 +02:00
antirez
7fb90a670e LATENCY DOCTOR first implementation complete. 2014-07-08 17:05:56 +02:00
antirez
de88bc63d5 Latency monitor: more hooks around the code. 2014-07-01 17:19:08 +02:00
antirez
95b1979c32 No more trailing spaces in Redis source code. 2014-06-26 18:48:40 +02:00
Akos Vandra
b252fab06c Fixed possible buffer overflow bug if RDB file is corrupted.
(Note: commit message modified by @antirez for clarity).
2014-05-12 11:48:14 +02:00
Akos Vandra
433e835d3e fixed possible buffer overflow error 2014-05-12 11:19:07 +02:00
antirez
e29d330724 Process events with processEventsWhileBlocked() when blocked.
When we are blocked and a few events are processed from time to time, it
is smarter to call the event handler a few times in order to handle the
accept, read, write, close cycle of a client in a single pass, otherwise
there is too much latency added for clients to receive a reply while the
server is busy in some way (for example during the DB loading).
2014-04-24 21:44:32 +02:00
Matt Stancliff
b47b343fab Fix data loss when saving AOF/RDB with no free space
Previously, the (!fp) check would only catch lack of free space
under OS X.  Linux waits to discover it can't write until
it actually writes contents to disk.

(fwrite() returns success even if the underlying file
has no free space to write into.  All the errors
only show up at flush/sync/close time.)

Fixes antirez/redis#1604
2014-03-24 13:54:14 -04:00
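A small sketch of the checks this implies when writing a dump, using only standard calls: on a full filesystem fwrite() can report success because the bytes only reach stdio's buffer, so the flush and sync steps must be checked before the temporary file is renamed into place.

#include <stdio.h>
#include <unistd.h>

int write_and_sync(FILE *fp, const void *buf, size_t len) {
    if (fwrite(buf, 1, len, fp) != len) return -1; /* may still "pass" on ENOSPC */
    if (fflush(fp) == EOF) return -1;              /* push stdio buffer to kernel */
    if (fsync(fileno(fp)) == -1) return -1;        /* push kernel buffers to disk */
    return 0;                                      /* fclose() must be checked too */
}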
antirez
51bd9da1fd Update cached time in rdbLoad() callback.
server.unixtime and server.mstime are cached less precise timestamps
that we use every time we don't need an accurate time representation and
a syscall would be too slow for the number of calls we require.

Such an example is the initialization and update process of the last
interaction time with the client, that is used for timeouts.

However rdbLoad() can take some time to load the DB, but at the same
time it did not update the time during DB loading. This resulted in the
bug described in issue #1535, where in the replication process the slave
loads the DB, creates the redisClient representation of its master, but
the timestamp is so old that the master, under certain conditions, is
sensed as already "timed out".

Thanks to @yoav-steinberg and Redis Labs Inc for the bug report and
analysis.
2014-02-13 15:13:26 +01:00
antirez
11120689c4 Slaves heartbeats during sync improved.
The previous fix for false positive timeout detected by master was not
complete. There is another blocking stage while loading data for the
first synchronization with the master, that is, flushing away the
current data from the DB memory.

This commit uses the newly introduced dict.c callback in order to perform
some incremental work (sending "\n" heartbeats to the master) while
flushing the old data from memory.

It is hard to write a regression test for this issue unfortunately. More
support for debugging in the Redis core would be needed in terms of
functionalities to simulate a slow DB loading / deletion.
2013-12-10 18:47:31 +01:00
antirez
7c531eb5ad Don't send more than 1 newline/sec while loading RDB. 2013-12-10 18:43:19 +01:00
antirez
27db38d069 Slaves heartbeat while loading RDB files.
Starting with Redis 2.8 masters are able to detect timed out slaves,
while before 2.8 only slaves were able to detect a timed out master.

Now that timeout detection is bi-directional the following problem
happens as described "in the field" by issue #1449:

1) Master and slave setup with big dataset.
2) Slave performs the first synchronization, or a full sync
   after a failed partial resync.
3) Master sends the RDB payload to the slave.
4) Slave loads this payload.
5) Master detects the slave as timed out since it does not receive back
   the REPLCONF ACK acknowledgements.

Here the problem is that the master has no way to know how long the
slave will take to load the RDB file in memory. The obvious solution is
to use a greater replication timeout setting, but this is a shame since
for 0.1% of the operation time we would be forced to use a timeout that
is not suited for the other 99.9% of the operation time.

This commit tries to fix this problem with a solution that is a bit of
a hack, but that modifies little of the replication internals, in order
to be back-ported to 2.8 safely.

During RDB loading, we send the master newlines to avoid being
sensed as timed out. This is the same thing the master already does
while saving the RDB file, to still signal its presence to the slave.

The single newline is used because:

1) It can't desync the protocol, as it is only transmitted all or
nothing.
2) It can be safely sent, just with write(2), while we don't have a client
structure for the master or in similar situations.
2013-12-09 20:26:00 +01:00
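A minimal sketch of the heartbeat, combined with the one-per-second rate limit from the commit just above it (names are illustrative): while loading the transferred RDB, the slave writes a bare newline to the master socket, which proves liveness without desyncing the protocol.

#include <time.h>
#include <unistd.h>

/* Called from the RDB loading progress hook on the slave. */
void loading_heartbeat(int master_fd) {
    static time_t last_newline = 0;
    time_t now = time(NULL);
    if (now - last_newline >= 1) {        /* no more than one newline per second */
        last_newline = now;
        (void)write(master_fd, "\n", 1);  /* best effort; errors handled elsewhere */
    }
}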
antirez
11e81a1e9a Fixed grammar: before H the article is a, not an. 2013-12-05 16:35:32 +01:00
antirez
d3588dc194 Fix broken rdbWriteRaw() return value check in rdb.c.
Thanks to @PhoneLi for reporting.
2013-11-07 23:53:18 +01:00
antirez
b34126e378 Update server.lastbgsave_status when fork() fails. 2013-08-27 10:16:29 +02:00
antirez
7e9929e12e Use printf %zu specifier to print private_dirty. 2013-08-20 12:04:57 +02:00
antirez
894eba07c8 Introduction of a new string encoding: EMBSTR
Previously two string encodings were used for string objects:

1) REDIS_ENCODING_RAW: a string object with obj->ptr pointing to an sds
string.

2) REDIS_ENCODING_INT: a string object where the obj->ptr void pointer
is cast to a long.

This commit introduces an experimental new encoding called
REDIS_ENCODING_EMBSTR that implements an object represented by an sds
string that is not modifiable, but allocated in the same memory chunk as
the robj structure itself.

The chunk looks like the following:

+--------------+-----------+------------+--------+----+
| robj data... | robj->ptr | sds header | string | \0 |
+--------------+-----+-----+------------+--------+----+
                     |                       ^
                     +-----------------------+

The robj->ptr points to the contiguous sds string data, so the object
can be manipulated with the same functions used to manipulate plain
string objects, however we need just one malloc and one free in order to
allocate or release this kind of object. Moreover it has better cache
locality.

This new allocation strategy should benefit both memory usage and
performance. A performance gain between 60 and 70% was observed
during micro-benchmarks, however there is more work to do to evaluate
the performance impact and the memory usage behavior.
2013-07-22 10:31:38 +02:00
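A hedged sketch of the single-allocation layout shown in the diagram above; struct names and fields are simplified stand-ins for the real robj and sds headers, but the point is the same: one malloc covers the object header, the sds header and the string bytes, and one free releases them all.

#include <stdlib.h>
#include <string.h>

struct sds_hdr_sketch { int len; int free; char buf[]; };
struct robj_sketch    { unsigned type; unsigned encoding; void *ptr; };

struct robj_sketch *create_embstr_sketch(const char *s, size_t len) {
    struct robj_sketch *o =
        malloc(sizeof(*o) + sizeof(struct sds_hdr_sketch) + len + 1);
    if (o == NULL) return NULL;
    struct sds_hdr_sketch *sh = (struct sds_hdr_sketch *)(o + 1);
    sh->len = (int)len;
    sh->free = 0;
    memcpy(sh->buf, s, len);
    sh->buf[len] = '\0';
    o->ptr = sh->buf;   /* robj->ptr points inside the same chunk */
    return o;           /* free(o) releases header and string together */
}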
yoav
63d15dfc87 Chunked loading of RDB to prevent redis from stalling reading very large keys. 2013-07-16 15:41:24 +02:00
antirez
98eecb70eb Binding multiple IPs done properly with multiple sockets. 2013-07-05 11:47:20 +02:00
antirez
b237de33d1 Throttle BGSAVE attempt on saving error.
When a BGSAVE fails, Redis used to flood itself trying to BGSAVE at
every next cron call, that is either 10 or 100 times per second
depending on configuration and server version.

This commit does not allow a new automatic BGSAVE attempt to be
performed before a few seconds delay (currently 5).

This avoids both the auto-flood problem and filling the disk with
logs at a serious rate.

The five seconds limit, considering a log entry of 200 bytes, will use
less than 4 MB of disk space per day, which is reasonable; the sysadmin
should notice before catastrophic events happen, especially since by
default Redis will stop serving write queries after the first failed BGSAVE.

This fixes issue #849
2013-04-02 14:05:50 +02:00
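A tiny sketch of the throttle, with illustrative field names: after a failed BGSAVE, the cron refuses to retry until the delay from the commit (five seconds) has elapsed.

#include <time.h>

#define BGSAVE_RETRY_DELAY 5   /* seconds, as described above */

struct save_state_sketch { time_t last_try; int last_status_ok; };

/* Called by the cron before starting an automatic BGSAVE. */
int bgsave_retry_allowed(const struct save_state_sketch *st, time_t now) {
    if (st->last_status_ok) return 1;
    return (now - st->last_try) >= BGSAVE_RETRY_DELAY;
}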
antirez
79a6844e44 rdbLoad(): rework code to save vertical space. 2013-03-12 19:46:33 +01:00
Damian Janowski
4178a80282 Abort when opening the RDB file results in an error other than ENOENT.
This fixes cases where the RDB file does exist but can't be accessed for
any reason. For instance, when the Redis process doesn't have enough
permissions on the file.
2013-03-12 14:37:50 -03:00
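A small sketch of the startup check, using only standard calls: a missing RDB file is fine (fresh instance, nothing to load), but any other open error, such as EACCES from insufficient permissions, aborts instead of silently starting with an empty dataset.

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

FILE *open_rdb_or_die(const char *path) {
    FILE *fp = fopen(path, "r");
    if (fp == NULL && errno != ENOENT) {
        fprintf(stderr, "Error opening RDB file %s for reading: %s\n",
                path, strerror(errno));
        exit(1);
    }
    return fp;   /* NULL only when the file simply does not exist */
}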
antirez
e4682c82e3 Remove too aggressive/spamming log in rdb.c. 2013-02-27 12:29:20 +01:00
antirez
6356cf6808 Set process name in ps output to make operations safer.
This commit allows Redis to set a process name that includes the binding
address and the port number in order to make operations simpler.

Redis children processes doing AOF rewrites or RDB saving change the
name into redis-aof-rewrite and redis-rdb-bgsave respectively.

This in general makes it harder to kill the wrong process because of an
error and makes it simpler to identify saving children.

This feature was suggested by Arnaud GRANAL in the Redis Google Group;
Arnaud also pointed me to the setproctitle.c implementation included in
this commit.

This feature should work on Linux, OSX, and all three major
BSD systems.
2013-02-26 11:52:12 +01:00
antirez
79a0ef62db Whitelist SIGUSR1 to avoid auto-triggering errors.
This commit fixes issue #875 that was caused by the following events:

1) There is an active child doing BGSAVE.
2) flushall is called (or any other condition that makes Redis kill
the saving child process).
3) An error is sensed by Redis as the child exited with an error (killed
by a signal), so Redis stops accepting write commands until a BGSAVE
happens to be executed with success.

Whitelisting SIGUSR1 and making sure Redis always uses this signal in
order to kill its own children fixes the issue.
2013-01-19 13:30:38 +01:00
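A minimal sketch of the whitelist check, with illustrative names: when the saving child is reaped, termination by SIGUSR1 (the signal Redis now always uses to kill its own children) is not treated as a save error, so a deliberate kill no longer blocks writes.

#include <signal.h>

/* exitcode and bysignal as extracted from the child's wait status by the parent. */
int bgsave_child_failed(int exitcode, int bysignal) {
    if (bysignal == SIGUSR1) return 0;   /* killed on purpose by the parent */
    return bysignal != 0 || exitcode != 0;
}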
guiquanz
9d09ce3981 Fixed many typos. 2013-01-19 10:59:44 +01:00
antirez
49b6452351 Children creating AOF or RDB files now report memory used by COW.
Finally Redis is able to report the amount of memory used by
copy-on-write while saving an RDB or writing an AOF file in background.

Note that this information is currently only logged (at NOTICE level)
and not shown in INFO because this is less trivial (but surely doable
with some minor form of interprocess communication).

The reason we can't capture this information on the parent before we
call wait3() is that the Linux kernel will release the child memory
ASAP, and only retain the minimal state for the process that is useful
to report the child termination to the parent.

The COW size is obtained by summing all the Private_Dirty fields found
in the "smaps" file inside the proc filesystem for the process.

All this is Linux specific and is not available on other systems.
2012-11-19 12:02:08 +01:00
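A sketch of the measurement using standard I/O only: sum every Private_Dirty field of the child's /proc/<pid>/smaps; the pages the child dirtied after fork() are the copy-on-write cost of the background save (Linux only, as the commit notes).

#include <stdio.h>

size_t cow_bytes_sketch(long child_pid) {
    char path[64], line[256];
    unsigned long total_kb = 0, kb;

    snprintf(path, sizeof(path), "/proc/%ld/smaps", child_pid);
    FILE *fp = fopen(path, "r");
    if (fp == NULL) return 0;
    while (fgets(line, sizeof(line), fp)) {
        if (sscanf(line, "Private_Dirty: %lu kB", &kb) == 1) total_kb += kb;
    }
    fclose(fp);
    return (size_t)total_kb * 1024;   /* bytes of copy-on-write memory */
}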
antirez
4365e5b2d3 BSD license added to every C source and header file. 2012-11-08 18:31:32 +01:00
antirez
68fc64afd4 Update memory peak stats while loading RDB / AOF. 2012-10-24 12:21:41 +02:00
antirez
c7a25200e2 RDB type loading functions clarified in comments.
Improved comments to make it clear that rdbLoadType() just loads a
general TYPE in the context of RDB that can be an object type or an
expire type, end-of-file, and so forth.

While rdbLoadObjectType() enforces that the type is a valid object type,
otherwise it returns -1.
2012-06-02 10:21:57 +02:00
antirez
33e1db36fa Four new persistence fields in INFO. A few renamed.
The 'persistence' section of INFO output now contains four additional
fields related to RDB and AOF persistence:

 rdb_last_bgsave_time_sec       Duration of latest BGSAVE in sec.
 rdb_current_bgsave_time_sec    Duration of current BGSAVE in sec.
 aof_last_rewrite_time_sec      Duration of latest AOF rewrite in sec.
 aof_current_rewrite_time_sec   Duration of current AOF rewrite in sec.

The 'current' fields are set to -1 if a BGSAVE / AOF rewrite is not in
progress. The 'last' fields are set to -1 if no previous BGSAVE / AOF
rewrites were performed.

Additionally a few fields in the persistence section were renamed for
consistency:

 changes_since_last_save -> rdb_changes_since_last_save
 bgsave_in_progress -> rdb_bgsave_in_progress
 last_save_time -> rdb_last_save_time
 last_bgsave_status -> rdb_last_bgsave_status
 bgrewriteaof_in_progress -> aof_rewrite_in_progress
 bgrewriteaof_scheduled -> aof_rewrite_scheduled

After the renaming, fields in the persistence section start with rdb_ or
aof_ prefix depending on the persistence method they describe.
The field 'loading' and related fields are not prefixed because they
apply to both persistence methods.
2012-05-25 12:11:30 +02:00
antirez
053d56a1fa rdbLoad() should check REDIS_RDB_VERSION instead of hardcoded number. 2012-04-24 12:53:30 +02:00
Salvatore Sanfilippo
7d3ee4172f Merge pull request #440 from ErikDubbelboer/spelling
Fixed some spelling errors in comments
2012-04-21 03:31:06 -07:00
antirez
84bcd3aa24 It is now possible to enable/disable RDB checksum computation from redis.conf or via CONFIG SET/GET. CONFIG SET support was also added for rdbcompression. 2012-04-10 15:47:10 +02:00
antirez
82e32055d8 RDB files now embed a crc64 checksum. Version of RDB bumped to 5. 2012-04-09 22:40:41 +02:00
antirez
8491f1d9fd Fixed compilation of new rio.c changes (typos and so forth.) 2012-04-09 12:36:44 +02:00
antirez
736b7c3f04 Add checksum computation to rio.c 2012-04-09 12:33:09 +02:00
Erik Dubbelboer
e1d9857b12 Update src/rdb.c 2012-04-07 15:48:30 +03:00