Commit Graph

118 Commits

Author SHA1 Message Date
oranagra
7aa9e6d2ae active memory defragmentation 2016-12-30 03:37:52 +02:00
antirez
06bfeb482d Only show Redis logo if logging to stdout / TTY.
You can still force the logo in the normal logs.
For motivations, check issue #3112. For me the reason is that the
logo is nice to have in interactive sessions, but inside the logs it
mostly loses its usefulness, except for the ability of users to
recognize restarts easily: for this reason the new startup sequence
shows a one-liner ASCII "wave" so that there is still a bit of a
visual clue.

Startup logging was modified in order to log events in more obvious
ways, and to log more events. Also certain important information is
now easier to parse/grep since it is printed in field=value style.

The option --always-show-logo in redis.conf was added, defaulting to no.
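
The corresponding redis.conf directive, with its default (as described
above), looks like this:

    always-show-logo no
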
2016-12-19 16:41:47 +01:00
antirez
87538cb7fe Switch PFCOUNT to LogLog-Beta algorithm.
The new algorithm provides the same speed with a smaller error for
cardinalities in the range 0-100k. Before switching, the behavior of
the new and old algorithms was studied in detail in the context of
issue #3677. You can find a few graphs and motivations there.
2016-12-16 11:07:30 +01:00
Harish Murthy
c55e3fbae5 LogLog-Beta Algorithm support within HLL
Config option to use LogLog-Beta Algorithm for Cardinality
2016-12-16 11:07:30 +01:00
antirez
ac61f90625 DEBUG: new "ziplist" subcommand added. Dumps a ziplist on stdout.
The commit improves ziplistRepr() and adds a new debugging subcommand so
that we can trigger the dump directly from the Redis API.
This command capability was used while investigating issue #3684.
2016-12-16 09:02:50 +01:00
antirez
b6f871cf42 Writable slaves expires: fix leak in key tracking.
We need to use a dictionary type that frees the key, since we copy the
keys into the dictionary we use to track expires created on the slave
side.
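
A minimal sketch of such a dictionary type, assuming Redis' dict.h and
sds.h and reusing the existing sds hash/compare helpers; the initializer
below is illustrative, not the exact type used by the fix:

    /* Key destructor that frees the sds copy made when the key was
     * inserted, so removing an entry does not leak the copied key. */
    static void trackedKeyDestructor(void *privdata, void *key) {
        (void)privdata;   /* not used */
        sdsfree(key);     /* free the private sds copy */
    }

    dictType trackedKeysDictType = {
        .hashFunction  = dictSdsHash,          /* existing sds hashing helper */
        .keyDup        = NULL,                 /* caller duplicates keys explicitly */
        .valDup        = NULL,
        .keyCompare    = dictSdsKeyCompare,    /* existing sds comparison helper */
        .keyDestructor = trackedKeyDestructor, /* frees the copied key */
        .valDestructor = NULL                  /* values are plain integer bitmaps */
    };
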
2016-12-13 16:27:13 +01:00
antirez
d1adc85aa6 INFO: show num of slave-expires keys tracked. 2016-12-13 16:02:29 +01:00
antirez
04542cff92 Replication: fix the infamous key leakage of writable slaves + EXPIRE.
BACKGROUND AND USE CASE

Redis slaves are normally read only, however they support a "writable"
mode which is very handy when scaling reads on slaves that actually
need write operations in order to access data. For instance imagine
having slaves replicating certain Sets keys from the master. When
accessing the data on the slave, we want to perform intersections between
such Sets values. However we don't want to intersect each time: caching
the intersection for some time is often a good idea.

To do so, it is possible to setup a slave as a writable slave, and
perform the intersection on the slave side, perhaps setting a TTL on the
resulting key so that it will expire after some time.

THE BUG

Problem: in order to have consistent replication, expiring keys in
Redis replication is up to the master, which synthesizes DEL operations
to send in the replication stream. However slaves logically expire keys
by hiding them from read attempts by clients, so that if the master did
not promptly send a DEL, clients still see logically expired keys
as non existing.

Because slaves don't actively expire keys by actually evicting them but
just mask them from the POV of read operations, if a key is created in a
writable slave, and an expire is set, the key will be leaked forever:

1. No DEL will be received from the master, which does not know about
such a key at all.

2. No eviction will be performed by the slave, since eviction is
disabled on slaves: it is up to the master, otherwise consistency of
data is lost.

THE FIX

In order to fix the problem, the slave should be able to tag, in some
way, keys that were created on the slave side and have an expire set.

My solution involved using an additional dictionary, created by
the writable slave only if needed. The dictionary is keyed by
the key name that we need to track: all the keys that are set with an
expire directly by a client writing to the slave are tracked.

The value in the dictionary is a bitmap of all the DBs where such a key
name needs to be tracked, so that we can use a single dictionary to track
keys in all the DBs used by the slave (this limits the solution
to the first 64 DBs, but the default with Redis is to use 16 DBs).

This solution has a small complexity and CPU cost, which is actually
zero when the feature is not used. The slave-side
eviction is encapsulated in code which is not coupled with the rest of
the Redis core, except for the hook used to track the keys.
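
A minimal sketch of the tracking idea described above, assuming Redis'
dict.h and sds.h; the function name and the exact dict calls are
illustrative, not necessarily the real code:

    #include <stdint.h>

    /* Remember that 'keyname' received an expire on the slave side in DB
     * 'dbid'. The dictionary value is a 64 bit bitmap: bit N set means
     * "this key name must be tracked in DB N", which limits the feature
     * to the first 64 DBs. */
    void rememberSlaveKeyWithExpire(dict *tracked, sds keyname, int dbid) {
        if (dbid >= 64) return;                 /* bitmap covers DBs 0..63 only */
        dictEntry *de = dictFind(tracked, keyname);
        if (de == NULL) {
            /* First time we see this key name: store a private copy of
             * the key with an empty bitmap. */
            dictAdd(tracked, sdsdup(keyname), NULL);
            de = dictFind(tracked, keyname);
        }
        uint64_t bitmap = dictGetUnsignedIntegerVal(de);
        bitmap |= (uint64_t)1 << dbid;          /* mark DB 'dbid' for this key */
        dictSetUnsignedIntegerVal(de, bitmap);
    }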

TODO

I'm doing the first smoke tests to see if the feature works as expected:
so far so good. Unit tests should be added before merging into the
4.0 branch.
2016-12-13 10:59:54 +01:00
antirez
71e8d15e49 Modules: change type registration API to use a struct of methods. 2016-11-30 11:14:01 +01:00
antirez
28c96d73b2 PSYNC2: Save replication ID/offset on RDB file.
This means that stopping a slave and restarting it will still make it
able to PSYNC with the master. Moreover the master itself will retain
its ID/offset, in case it gets turned into a slave, or in case a slave
tries to PSYNC with it with an exactly updated offset (otherwise there
is no backlog).

This change was possible thanks to PSYNC v2 that makes saving the current
replication state much simpler.
2016-11-10 12:35:29 +01:00
antirez
2669fb8364 PSYNC2: different improvements to Redis replication.
The gist of the changes is that now, partial resynchronizations between
slaves and masters (without the need of a full resync with RDB transfer
and so forth) work in a number of cases where it was impossible
in the past. For instance:

1. When a slave is promoted to master, the slaves of the old master can
partially resynchronize with the new master.

2. Chained slaves (slaves of slaves) can be moved to replicate to other
slaves or the master itself, without requiring a full resync.

3. The master itself, after being turned into a slave, is able to
partially resynchronize with the new master, when it joins replication
again.

In order to obtain this, the following main changes were made:

* Slaves also take a replication backlog, not just masters.

* Same stream replication for all the slaves and sub-slaves. The
replication stream is identical from the top level master to its slaves
and is also the same from the slaves to their sub-slaves and so forth.
This means that if a slave is later promoted to master, it has the
same replication backlog, and can partially resynchronize with its
slaves (that were previously slaves of the old master).

* A given replication history is no longer identified by the `runid` of
a Redis node. There is instead a `replication ID` which changes every
time the instance has a new history no longer coherent with the past
one. So, for example, slaves publish the same replication history as
their master, however when they are turned into masters, they publish
a new replication ID, but still remember the old ID, so that they are
able to partially resynchronize with slaves of the old master (up to a
given offset).

* The replication protocol was slightly modified so that a new extended
+CONTINUE reply from the master is able to inform the slave of a
replication ID change (see the sketch after this list).

* REPLCONF CAPA is used in order to notify masters that a slave is able
to understand the new +CONTINUE reply.

* The RDB file was extended with an auxiliary field that is able to
select a given DB after loading in the slave, so that the slave can
continue receiving the replication stream from the point it was
disconnected without requiring the master to insert "SELECT" statements.
This is useful in order to guarantee the "same stream" property, because
the slave must be able to accumulate an identical backlog.

* Slave pings to sub-slaves are now sent in a special form, when the
top-level master is disconnected, in order not to interfere with the
replication stream. We just use out of band "\n" bytes as in other parts
of the Redis protocol.
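
Roughly, the exchange described above looks like this on the wire (IDs
and offsets are made up, and the byte stream is simplified):

    Slave:  REPLCONF capa eof capa psync2
    Slave:  PSYNC 5e8a...d31 31415
    Master: +CONTINUE 9f2b...c07      (extended reply: replication ID changed)

    (when a partial resync is not possible, the master replies
     +FULLRESYNC <replid> <offset> and starts a full RDB transfer instead)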

An old design document is available here:

https://gist.github.com/antirez/ae068f95c0d084891305

However the implementation is not identical to the description because
during the work to implement it, different changes were needed in order
to make things work well.
2016-11-09 15:37:15 +01:00
antirez
c7a4e694ad SWAPDB command.
This new command swaps two Redis databases, so that immediately all the
clients connected to a given DB will see the data of the other DB, and
the other way around. Example:

    SWAPDB 0 1

This will swap DB 0 with DB 1. All the clients connected with DB 0 will
immediately see the new data, exactly like all the clients connected
with DB 1 will see the data that was formerly of DB 0.

MOTIVATION AND HISTORY
---

The command was recently requested by Pedro Melo, but was suggested in
the past multiple times, and always refused by me.

The reason why it was asked: Imagine you have clients operating in DB 0.
At the same time, you create a new version of the dataset in DB 1.
When the new version of the dataset is available, you immediately want
to swap the two views, so that the clients will transparently use the
new version of the data. At the same time you'll likely destroy the
DB 1 dataset (that contains the old data) and start to build a new
version, to repeat the process.

This is an interesting pattern, but the reason why I always opposed
implementing it was that FLUSHDB was a blocking command in Redis before
the Redis 4.0 improvements. Now we have FLUSHDB ASYNC that releases the
old data in O(1) from the point of view of the client, and reclaims
memory incrementally in a different thread.

At this point, the pattern can really be supported without latency
spikes, so I'm providing this implementation for the users to comment on.
In case a very compelling argument is made against this new command,
it may be removed.
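
For example, the pattern described above could look like this from a
single connection (illustrative session):

    SELECT 1             (build the new version of the dataset here)
    ... populate DB 1 ...
    SWAPDB 0 1           (clients using DB 0 immediately see the new data)
    FLUSHDB ASYNC        (this connection is still on DB 1, which now
                          holds the old dataset, so reclaim it incrementally)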

BEHAVIOR WITH BLOCKING OPERATIONS
---

If a client is blocked on a list in a given DB, after the swap it will
still be blocked on the same DB ID, since this is the most logical thing
to do: if I was blocked waiting for a push to list "foo", even after the
swap I still want an LPUSH reaching the key "foo" in the same DB in
order to unblock.

However an interesting thing happens when a client is, for instance,
blocked waiting for new elements in list "foo" of DB 0, and then DB
0 and DB 1 are swapped with SWAPDB, while DB 1 happened to have
a list called "foo" containing elements. When this happens, this
implementation correctly unblocks the client.

It is possible that there are subtle corner cases that are not covered
in the implementation, but since the command is self-contained from the
POV of the implementation and the Redis core, it cannot cause anything
bad if not used.

Tests and documentation are yet to be provided.
2016-10-14 15:28:04 +02:00
antirez
8fadfe52a2 Module: API to block clients with threading support.
Just a draft to align the main ideas, never executed code. Compiles.
2016-10-07 11:55:35 +02:00
antirez
799208de85 Fix name of misspelled function. 2016-10-06 17:10:47 +02:00
antirez
152c1b6802 Module: Ability to get context from IO context.
It was noted by @dvirsky that it is not possible to use string functions
when writing the AOF file. This is sometimes critical since the command
rewriting may need to be built in the context of the AOF callback, and
without access to the context, given the limited types that the AOF
production functions accept, this can be an issue.

Moreover there are other needs that we can't anticipate regarding the
ability to use Redis Modules APIs using the context in order to build
representations to emit AOF / RDB.

Because of this a new API was added that allows the user to get a
temporary context from the IO context. The context, if obtained, is
automatically released when the RDB / AOF callback returns.

Calling the function to get the context multiple times always returns
the same one, since it is invalid to have more than a single context.
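
A minimal sketch of how a module's AOF rewrite callback could use the
new call; MYTYPE.SET and the value layout are made-up module specifics,
not part of the API:

    #include "redismodule.h"

    /* AOF rewrite callback of a hypothetical module data type. */
    void MyTypeAofRewrite(RedisModuleIO *aof, RedisModuleString *key, void *value) {
        long long counter = *(long long *)value;   /* made-up value layout */
        /* The context obtained here is released automatically when the
         * callback returns. */
        RedisModuleCtx *ctx = RedisModule_GetContextFromIO(aof);
        RedisModuleString *val = RedisModule_CreateStringFromLongLong(ctx, counter);
        RedisModule_EmitAOF(aof, "MYTYPE.SET", "ss", key, val);
    }
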
2016-10-06 17:09:26 +02:00
antirez
e565632e59 Child -> Parent pipe for COW info transferring. 2016-09-19 13:45:20 +02:00
antirez
44e714a59c MEMORY DOCTOR initial implementation. 2016-09-16 16:36:53 +02:00
antirez
d9325ac6c8 Provide percentage of memory peak used info. 2016-09-16 10:43:19 +02:00
oranagra
309c2bcd1b add zmalloc used mem to DEBUG SDSLEN 2016-09-16 10:29:27 +02:00
antirez
e9629e148b MEMORY command: HELP + dataset percentage (like in INFO). 2016-09-15 17:33:16 +02:00
antirez
bf2624ea99 C struct memoh renamed redisMemOverhead. API prototypes added. 2016-09-15 09:44:07 +02:00
antirez
8c84c962cf MEMORY OVERHEAD implemented (using Oran Agra initial implementation).
This code was extracted from @oranagra PR #3223 and modified in order
to provide only certain amounts of information compared to the original
code. It was also moved from DEBUG to the newly introduced MEMORY
command. Thanks to Oran for the implementation and the PR.

It implements detailed memory usage stats that can be useful in both
provisioning and troubleshooting memory usage in Redis.
2016-09-13 17:39:25 +02:00
antirez
89dec6921d objectComputeSize(): estimate collections sampling N elements.
For most tasks, we need the memory estimation to be O(1) by default.
This commit also implements an initial MEMORY command.
Note that objectComputeSize() takes the number of samples to check as
argument, so MEMORY should be able to get the sample size as option
to make precision VS CPU tradeoff tunable.
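
For illustration, this is how the tunable eventually surfaced in the
MEMORY USAGE subcommand (key and values are arbitrary):

    MEMORY USAGE mykey SAMPLES 100    (estimate from 100 sampled elements)
    MEMORY USAGE mykey SAMPLES 0      (sample everything: maximum precision)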

Related to: PR #3223.
2016-09-13 10:28:23 +02:00
antirez
feda52381d RDB AOF preamble: WIP 2. 2016-08-09 16:41:40 +02:00
antirez
4426cb11e2 RDB AOF preamble: WIP 1. 2016-08-09 11:07:32 +02:00
antirez
a81a92ca2c Security: Cross Protocol Scripting protection.
This is an attempt at mitigating problems due to cross protocol
scripting, an attack targeting services using line oriented protocols,
like Redis, that can accept HTTP requests as a valid protocol by
discarding the invalid parts and accepting the payloads sent, for
example, via a POST request.

For this to be effective, when we detect POST or Host: and terminate
the connection asynchronously, the networking code was modified in order
to never process further input. It was later verified that in a
pipelined request containing a POST command, the successive commands are
not executed.
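
A simplified sketch of the detection idea (function name and structure
are illustrative, not the exact Redis code):

    #include <strings.h>

    /* Return 1 if the first token of an inline "command" actually looks
     * like the start of an HTTP request; in that case the caller should
     * log a security warning and close the connection, processing no
     * further input. */
    static int looksLikeCrossProtocolScripting(const char *firstarg) {
        return strcasecmp(firstarg, "POST") == 0 ||
               strcasecmp(firstarg, "Host:") == 0;
    }
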
2016-08-03 11:12:32 +02:00
antirez
55385f99de Ability of slave to announce arbitrary ip/port to master.
This feature is useful, especially in deployments using Sentinel in
order to setup Redis HA, where the slave is executed behind NAT or port
forwarding, so that the auto-detected port/ip addresses, as listed in
the "INFO replication" output of the master, or as provided by the
"ROLE" command, don't match the real addresses at which the slave is
reachable for connections.
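
The redis.conf directives introduced for this feature (addresses are
examples):

    slave-announce-ip 10.0.0.12
    slave-announce-port 6379
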
2016-07-27 17:32:15 +02:00
antirez
0a628e5102 Avoid simultaneous RDB and AOF child process.
This patch, written in collaboration with Oran Agra (@oranagra), is a
companion to 780a8b1. Together the two patches should prevent the AOF
and RDB saving processes from being spawned at the same time. Previously,
conditions that could lead to two saving processes at the same time were:

1. When AOF is enabled via CONFIG SET and an RDB saving process is
   already active.

2. When the SYNC command decides to start an RDB saving process ASAP in
   order to serve a new slave that cannot partially resynchronize (but
   only if we have a disk target for replication; for diskless
   replication there is no such problem).

Condition "1" is not very severe but "2" can happen often and is
definitely good at degrading Redis performances in an unexpected way.

The two commits have the effect of always spawning RDB saves for
replication in replicationCron() instead of attempting to start an RDB
save synchronously. Moreover when a BGSAVE or AOF rewrite must be
performed, they are instead just postponed using flags that will try to
perform such operations ASAP.

Finally the BGSAVE command was modified in order to accept a SCHEDULE
option so that if an AOF rewrite is in progress, when this option is
given, the command no longer returns an error, but instead schedules an
RDB rewrite operation for when it will be possible to start it.
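
For example, while an AOF rewrite is in progress (illustrative session):

    127.0.0.1:6379> BGSAVE SCHEDULE
    Background saving scheduled
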
2016-07-21 18:35:01 +02:00
antirez
2d5eb1f1a0 Volatile-ttl eviction policy implemented in terms of the pool.
The precision of the eviction improved considerably. Also this allows us to have
a single code path for most eviction types.
2016-07-20 19:54:12 +02:00
antirez
6854c7b9ee LFU: make counter log factor and decay time configurable. 2016-07-20 15:00:35 +02:00
antirez
5d07984c5d LFU: Redis object level implementation.
Implementation of LFU maxmemory policy for anything related to Redis
objects. Still no actual eviction implemented.
2016-07-15 12:12:58 +02:00
antirez
e423f76e75 LRU: Make cross-database choices for eviction.
The LRU eviction code used to make local choices: for each DB visited it
selected the best key to evict. This was repeated for each DB. However
this means that there could be DBs with very frequently accessed keys
being targeted by the LRU algorithm while other DBs had many better
candidates to expire.

This commit attempts to fix this problem for the LRU policy. However the
TTL policy is still not fixed by this commit. The TTL policy will be
fixed in a successive commit.

This is an initial (partial because of TTL policy) fix for issue #2647.
2016-07-13 13:12:30 +02:00
antirez
965905c9f2 Move the struct evictionPoolEntry into the only file using it.
Local scope is always better when possible.
2016-07-12 12:22:38 +02:00
antirez
d8e92a8207 Move prototype of evictionPoolAlloc() in server.h. 2016-07-12 12:22:35 +02:00
antirez
b46239e58b Expire and LRU related code moved into different files. 2016-07-06 15:24:06 +02:00
antirez
b99ad1bd80 Make tcp-keepalive default to 300 in internal conf.
We already changed the default in the redis.conf template, but I forgot
to change the internal config as well.
2016-07-04 12:08:42 +02:00
antirez
e9f31ba9c2 In Redis RDB check: better error reporting. 2016-07-01 09:36:52 +02:00
Salvatore Sanfilippo
28ea585fce Merge pull request #3336 from yossigo/create_string_from_string
Add RedisModule_CreateStringFromString().
2016-06-23 16:16:28 +02:00
Yossi Gottlieb
61172ed01e Add RedisModule_CreateStringFromString(). 2016-06-22 21:02:40 +03:00
Yossi Gottlieb
8f3a4df775 Use const in Redis Module API where possible. 2016-06-20 23:08:06 +03:00
antirez
41d804d9dc TTL and TYPE LRU access fixed. TOUCH implemented. 2016-06-14 15:33:59 +02:00
antirez
a4bce77e92 Don't assume no padding or specific ordering in moduleLoadQueueEntry structure.
We need to be free to shuffle fields or add more fields in a structure
without breaking code.

Related to issue #3293.
2016-06-13 09:51:06 +02:00
antirez
1ad5c22763 Minor changes to unify the C style of PR #3293 with the Redis code base. 2016-06-13 09:39:44 +02:00
Yossi Gottlieb
cc58f11ccc Use RedisModuleString for OnLoad argv. 2016-06-05 13:18:24 +03:00
Yossi Gottlieb
2bd13cf0eb Allow passing arguments to modules on load. 2016-06-05 11:37:24 +03:00
antirez
8ec28002be Modules: support for modules native data types. 2016-06-03 18:14:04 +02:00
antirez
4aae4f7d35 RDB v8: ability to save uint64_t lengths. 2016-06-01 11:35:47 +02:00
Salvatore Sanfilippo
bafed3ddd6 Merge pull request #3222 from oranagra/more_minir_fixes
minor fixes - mainly signalModifiedKey, and GEORADIUS
2016-05-18 07:50:53 -07:00
antirez
ffd1600ccf Clarify that the LOG_STR_SIZE includes null term. 2016-05-18 15:23:35 +02:00
antirez
227d68094b Modules: command <-> core interface modified to get flags & keys. 2016-05-10 06:40:09 +02:00