Commit Graph

250 Commits

Author SHA1 Message Date
antirez
2edcafb35d addReplySubSyntaxError() renamed to addReplySubcommandSyntaxError(). 2018-07-02 18:49:34 +02:00
Salvatore Sanfilippo
bc6a004588
Merge pull request #4998 from itamarhaber/module_command_help
Module command help
2018-07-02 18:46:56 +02:00
chendianqiang
7de1ada070 limit the size of pending-querybuf in masterclient 2018-07-01 14:43:53 +08:00
antirez
fb39bfd7af Take clients in a ID -> Client handle dictionary. 2018-06-27 14:08:42 +02:00
michael-grunder
db6b99f90c Update ZPOPMIN/ZPOPMAX command declaration
Unlike the BZPOP variants, these functions take a single key.  This fixes
an erroneous CROSSSLOT error when passing a count to a cluster enabled
server.
2018-06-21 12:30:42 -07:00
Salvatore Sanfilippo
e670ccffea
Merge pull request #4857 from youjiali1995/fix-command-getkeys
Fix core dump when using some commands with wrong arguments.
2018-06-18 13:54:38 +02:00
Salvatore Sanfilippo
94658303e9
Merge pull request #4758 from soloestoy/rdb-save-incremental-fsync
Rdb save incremental fsync
2018-06-16 10:59:37 +02:00
antirez
cec404f099 Use a less aggressive query buffer resize policy.
A user with many connections (10 thousand) on a single Redis server
reports in issue #4983 that sometimes Redis is idle becuase at the same
time many clients need to resize their query buffer according to the old
policy.

It looks like this was created by the fact that we allow the query
buffer to grow without problems to a size up to PROTO_MBULK_BIG_ARG
normally, but when the client is idle we immediately are more strict,
and a query buffer greater than 1024 bytes is already enough to trigger
the resize. So for instance if most of the clients stop at the same time
this issue should be easily triggered.

This behavior actually looks odd, and there should be only a clear limit
after we say, let's look at this query buffer to check if it's time to
resize it. This commit puts the limit at PROTO_MBULK_BIG_ARG, and the
check is performed both if compared to the peak usage the current usage
is too big, or if the client is idle.

Then when the check is performed, to waste just a few kbytes is
considered enough to proceed with the resize. This should fix the issue.
2018-06-11 17:12:28 +02:00
Itamar Haber
fefde6e3e4 Capitalizes subcommands & orders lexicographically 2018-06-09 21:03:52 +03:00
Itamar Haber
c199280edb Globally applies addReplySubSyntaxError 2018-06-07 18:39:36 +03:00
antirez
19a438e2c0 Streams: use non static macro node limits.
Also add the concept of size/items limit, instead of just having as
limit the number of bytes.
2018-06-07 14:24:49 +02:00
赵磊
10dedc2586 Fix core dump when using 'command getkeys' with wrong arguments. 2018-06-04 15:14:50 +08:00
Oran Agra
ad133e1023 Active defrag fixes for 32bit builds
problems fixed:
* failing to read fragmentation information from jemalloc
* overflow in jemalloc fragmentation hint to the defragger
* test suite not triggering eviction after population
2018-05-17 09:52:00 +03:00
antirez
6efb6c1e06 ZPOP: renaming to have explicit MIN/MAX score idea.
This commit also adds a top comment about a subtle behavior of mixing
blocking operations of different types in the same key.
2018-05-11 17:31:53 +02:00
Itamar Haber
49890c8ee9 Adds memory information about the script's cache to INFO
Implementation notes: as INFO is "already broken", I didn't want to break it further. Instead of computing the server.lua_script dict size on every call, I'm keeping a running sum of the body's length and dict overheads.

This implementation is naive as it **does not** take into consideration dict rehashing, but that inaccuracy pays off in speed ;)

Demo time:

```bash
$ redis-cli info memory | grep "script"
used_memory_scripts:96
used_memory_scripts_human:96B
number_of_cached_scripts:0
$ redis-cli eval "" 0 ; redis-cli info memory | grep "script"
(nil)
used_memory_scripts:120
used_memory_scripts_human:120B
number_of_cached_scripts:1
$ redis-cli script flush ; redis-cli info memory | grep "script"
OK
used_memory_scripts:96
used_memory_scripts_human:96B
number_of_cached_scripts:0
$ redis-cli eval "return('Hello, Script Cache :)')" 0 ; redis-cli info memory | grep "script"
"Hello, Script Cache :)"
used_memory_scripts:152
used_memory_scripts_human:152B
number_of_cached_scripts:1
$ redis-cli eval "return redis.sha1hex(\"return('Hello, Script Cache :)')\")" 0 ; redis-cli info memory | grep "script"
"1be72729d43da5114929c1260a749073732dc822"
used_memory_scripts:232
used_memory_scripts_human:232B
number_of_cached_scripts:2
✔ 19:03:54 redis [lua_scripts-in-info-memory L ✚…⚑] $ redis-cli evalsha 1be72729d43da5114929c1260a749073732dc822 0
"Hello, Script Cache :)"
```
2018-04-30 19:33:01 +03:00
Itamar Haber
438125b47c Implements [B]Z[REV]POP and the respective unit tests
An implementation of the
[Ze POP Redis Module](https://github.com/itamarhaber/zpop) as core
Redis commands.

Fixes #1861.
2018-04-30 02:10:42 +03:00
antirez
e6b0e8d9ec Streams: XTRIM command added. 2018-04-19 16:25:29 +02:00
antirez
aba76320d5 Streams: XDEL command. 2018-04-18 13:12:09 +02:00
charpty
0fd2b25c8d Remove unnecessary return statements
Signed-off-by: charpty <charpty@gmail.com>
2018-04-06 18:46:24 +08:00
Salvatore Sanfilippo
da621783f0
Merge pull request #4691 from oranagra/active_defrag_v2
Active defrag v2
2018-03-22 09:16:32 +01:00
antirez
5577130451 CG: Make XINFO Great Again (and more Redis-ish).
With XINFO out of the blue I invented a new syntax for commands never
used in Redis in the past... Let's fix it and make it

        Great Again!!11one (TM)
2018-03-20 11:52:42 +01:00
antirez
0b58ad301e CG: Replication WIP 1: XREADGROUP and XCLAIM propagated as XCLAIM. 2018-03-19 18:02:19 +01:00
zhaozhao.zz
54cae05ea7 rdb: incremental fsync when redis saves rdb 2018-03-16 00:44:50 +08:00
antirez
0cf6b1e3ae CG: XINFO CONSUMERS implemented. 2018-03-15 12:54:10 +01:00
antirez
b26f03bd69 CG: XCLAIM now updates the idle time of the message. 2018-03-15 12:54:10 +01:00
antirez
1bc31666da CG: XPENDING without start/stop variant implemented. 2018-03-15 12:54:10 +01:00
antirez
388c69fe4e CG: XACK implementation. 2018-03-15 12:54:10 +01:00
antirez
bd1c11dc35 CG: add XREADGROUP in the command table. 2018-03-15 12:54:10 +01:00
antirez
58f0c000a5 CG: data structures design + XGROUP CREATE implementation. 2018-03-15 12:54:10 +01:00
antirez
432bf4770e Cluster: ability to prevent slaves from failing over their masters.
This commit, in some parts derived from PR #3041 which is no longer
possible to merge (because the user deleted the original branch),
implements the ability of slaves to have a special configuration
preventing that they try to start a failover when the master is failing.

There are multiple reasons for wanting this, and the feautre was
requested in issue #3021 time ago.

The differences between this patch and the original PR are the
following:

1. The flag is saved/loaded on the nodes configuration.
2. The 'myself' node is now flag-aware, the flag is updated as needed
   when the configuration is changed via CONFIG SET.
3. The flag name uses NOFAILOVER instead of NO_FAILOVER to be consistent
   with existing NOADDR.
4. The redis.conf documentation was rewritten.

Thanks to @deep011 for the original patch.
2018-03-14 14:01:38 +01:00
Oran Agra
806736cdf9 Adding real allocator fragmentation to INFO and MEMORY command + active defrag test
other fixes / improvements:
- LUA script memory isn't taken from zmalloc (taken from libc malloc)
  so it can cause high fragmentation ratio to be displayed (which is false)
- there was a problem with "fragmentation" info being calculated from
  RSS and used_memory sampled at different times (now sampling them together)

other details:
- adding a few more allocator info fields to INFO and MEMORY commands
- improve defrag test to measure defrag latency of big keys
- increasing the accuracy of the defrag test (by looking at real grag info)
  this way we can use an even lower threshold and still avoid false positives
- keep the old (total) "fragmentation" field unchanged, but add new ones for spcific things
- add these the MEMORY DOCTOR command
- deduct LUA memory from the rss in case of non jemalloc allocator (one for which we don't "allocator active/used")
- reduce sampling rate of the rss and allocator info
2018-03-12 15:08:52 +02:00
Oran Agra
be1b4aa9aa active defrag v2
- big keys are not defragged in one go from within the dict scan
  instead they are scanned in parts after the main dict hash bucket is done.
- add latency monitor sample for defrag
- change default active-defrag-cycle-min to induce lower latency
- make active defrag start a new scan right away if needed, so it's easier
  (for the test suite) to detect when it's done
- make active defrag quick the current cycle after each db / big key
- defrag  some non key long term global allocations
- some refactoring for smaller functions and more reusable code
- during dict rehashing, one scan iteration of the dict, can end up scanning
  one bucket in the smaller dict and many many buckets in the larger dict.
  so waiting for 16 scan iterations before checking the time, may be much too long.
2018-03-12 15:07:43 +02:00
pan.liangp
f4eb64cd35 move get clients max buffer calculate into info clients command 2018-03-02 17:16:00 +08:00
antirez
ffde73c57d Track number of logically expired keys still in memory.
This commit adds two new fields in the INFO output, stats section:

expired_stale_perc:0.34
expired_time_cap_reached_count:58

The first field is an estimate of the number of keys that are yet in
memory but are already logically expired. They reason why those keys are
yet not reclaimed is because the active expire cycle can't spend more
time on the process of reclaiming the keys, and at the same time nobody
is accessing such keys. However as the active expire cycle runs, while
it will eventually have to return to the caller, because of time limit
or because there are less than 25% of keys logically expired in each
given database, it collects the stats in order to populate this INFO
field.

Note that expired_stale_perc is a running average, where the current
sample accounts for 5% and the history for 95%, so you'll see it
changing smoothly over time.

The other field, expired_time_cap_reached_count, counts the number
of times the expire cycle had to stop, even if still it was finding a
sizeable number of keys yet to expire, because of the time limit.
This allows people handling operations to understand if the Redis
server, during mass-expiration events, is able to collect keys fast
enough usually. It is normal for this field to increment during mass
expires, but normally it should very rarely increment. When instead it
constantly increments, it means that the current workloads is using
a very important percentage of CPU time to expire keys.

This feature was created thanks to the hints of Rashmi Ramesh and
Bart Robinson from Twitter. In private email exchanges, they noted how
it was important to improve the observability of this parameter in the
Redis server. Actually in big deployments, the amount of keys that are
yet to expire in each server, even if they are logically expired, may
account for a very big amount of wasted memory.
2018-02-19 11:12:49 +01:00
antirez
e1e0bbe04d Remove useless comment from serverCron().
The behavior is well specified by the code itself.
2018-01-17 11:23:41 +01:00
Salvatore Sanfilippo
a18e4c964e
Merge pull request #4546 from hqin6/unstable
fixbug for #4545 dead loop aof rewrite
2018-01-17 11:21:55 +01:00
heqin
3d3faa0a19 fixbug for #4545 dead loop aof rewrite 2018-01-17 18:08:30 +08:00
antirez
8075572207 New config options about protocol prefixed with "proto".
Related to #4568.
2018-01-11 11:27:41 +01:00
Salvatore Sanfilippo
c2fa4b8a60
Merge pull request #4568 from oranagra/restore_size_limit
fix RESTORE command size limits
2018-01-11 11:20:02 +01:00
Salvatore Sanfilippo
e954742fbd
Merge pull request #3356 from yusaku/fix-module-firstkey
Fix the firstkey, lastkey, and keystep of moduleCommand
2018-01-09 18:58:31 +01:00
Oran Agra
b509a14c3e Add config options for max-bulk-len and max-querybuf-len mainly to support RESTORE of large keys 2017-12-29 12:43:48 +02:00
heqin
365e6c45ea fixbug for #4545 dead loop aof rewrite 2017-12-18 17:59:03 +08:00
antirez
522760fac7 Change indentation and other minor details of PR #4489.
The main change introduced by this commit is pretending that help
arrays are more text than code, thus indenting them at level 0. This
improves readability, and is an old practice when defining arrays of
C strings describing text.

Additionally a few useless return statements are removed, and the HELP
subcommand capitalized when printed to the user.
2017-12-06 12:05:14 +01:00
Itamar Haber
8b51121998 Merge remote-tracking branch 'upstream/unstable' into help_subcommands 2017-12-05 18:14:59 +02:00
Salvatore Sanfilippo
c56fbb246c
Merge pull request #4508 from trevor211/fixNotes
fix some notes
2017-12-05 15:47:05 +01:00
Salvatore Sanfilippo
6830fa25e3
Merge pull request #4488 from itamarhaber/debug_arity
Standardizes arity handling of DEBUG
2017-12-05 15:29:42 +01:00
WuYunlong
3f232ebfb8 fix some notes 2017-12-05 14:41:16 +08:00
antirez
9bb18e5438 Streams: XRANGE REV option -> XREVRANGE command. 2017-12-01 10:24:25 +01:00
antirez
19b06935d5 Streams: fix XADD API and keyspace notifications.
XADD was suboptimal in the first incarnation of the command, not being
able to accept an ID (very useufl for replication), nor options for
having capped streams.

The keyspace notification for streams was not implemented.
2017-12-01 10:24:24 +01:00
antirez
6468cb2e82 Streams: fix XREAD ready-key signaling.
With lists we need to signal only on key creation, but streams can
provide data to clients listening at every new item added.
To make this slightly more efficient we now track different classes of
blocked clients to avoid signaling keys when there is nobody listening.
A typical case is when the stream is used as a time series DB and
accessed only by range with XRANGE.
2017-12-01 10:24:24 +01:00
antirez
110041825c Streams: XREAD get-keys method. 2017-12-01 10:24:24 +01:00
antirez
4a377cecd8 Streams: initial work to use blocking lists logic for streams XREAD. 2017-12-01 10:24:24 +01:00
antirez
ec9bbe96bf Streams: XLEN command. 2017-12-01 10:24:24 +01:00
antirez
79866a6361 Streams: 12 commits squashed into the initial Streams implementation. 2017-12-01 10:24:24 +01:00
antirez
3b9be93fda Prevent corruption of server.executable after DEBUG RESTART.
Doing the following ended with a broken server.executable:

1. Start Redis with src/redis-server
2. Send CONFIG SET DIR /tmp/
3. Send DEBUG RESTART

At this point we called execve with an argv[0] that is no longer related
to the new path. So after the restart the absolute path of the
executable is recomputed in the wrong way. With this fix we pass the
absolute path already computed as argv[0].
2017-11-30 18:30:06 +01:00
antirez
d8f8701032 Be more verbose when DEBUG RESTART fails. 2017-11-30 18:08:21 +01:00
Itamar Haber
59d52f7fab Standardizes the 'help' subcommand
This adds a new `addReplyHelp` helper that's used by commands
when returning a help text. The following commands have been
touched: DEBUG, OBJECT, COMMAND, PUBSUB, SCRIPT and SLOWLOG.

WIP

Fix entry command table entry for OBJECT for HELP option.

After #4472 the command may have just 2 arguments.

Improve OBJECT HELP descriptions.

See #4472.

WIP 2

WIP 3
2017-11-28 21:15:45 +02:00
Itamar Haber
8c7f90e91e Standardizes arity handling of DEBUG 2017-11-28 18:18:45 +02:00
antirez
b412c544fd Fix entry command table entry for OBJECT for HELP option.
After #4472 the command may have just 2 arguments.
2017-11-27 13:16:07 +01:00
Salvatore Sanfilippo
9d86ae4597
Merge pull request #4412 from soloestoy/bugfix-psync2
PSYNC2: safe free backlog when reach the time limit and others
2017-11-24 10:56:18 +01:00
antirez
de914ede93 Modules: fix for scripting replication of modules commands.
See issue #4466 / #4467.
2017-11-23 15:14:17 +01:00
zhaozhao.zz
57bd8feb8d rehash: handle one db until finished 2017-11-21 09:49:42 +01:00
zhaozhao.zz
b8579c225c PSYNC2: clarify the scenario when repl_stream_db can be -1 2017-11-02 10:45:33 +08:00
antirez
c1c99e9f4e PSYNC2: Fix the way replication info is saved/loaded from RDB.
This commit attempts to fix a number of bugs reported in #4316.
They are related to the way replication info like replication ID,
offsets, and currently selected DB in the master client, are stored
and loaded by Redis. In order to avoid inconsistencies the changes in
this commit try to enforce that:

1. Replication information are only stored when the RDB file is
generated by a slave that has a valid 'master' client, so that we can
always extract the currently selected DB.
2. When replication informations are persisted in the RDB file, all the
info for a successful PSYNC or nothing is persisted.
3. The RDB replication informations are only loaded if the instance is
configured as a slave, otherwise a master can start with IDs that relate
to a different history of the data set, and stil retain such IDs in the
future while receiving unrelated writes.
2017-09-19 23:03:39 +02:00
Oran Agra
b122cadc66 Flush append only buffers before existing.
when SHUTDOWN command is recived it is possible that some of the recent
command were not yet flushed from the AOF buffer, and the server
experiences data loss at shutdown.
2017-09-17 07:22:16 +03:00
Salvatore Sanfilippo
34a79c353f Merge pull request #3935 from itamarhaber/module-cmdstats
Changes command stats iteration to being dict-based
2017-08-02 12:51:26 +02:00
antirez
fc7ecd8d35 AOF check utility: ability to check files with RDB preamble. 2017-07-10 13:38:23 +02:00
antirez
f9fac7f777 Avoid closing invalid FDs to make Valgrind happier. 2017-07-05 15:40:25 +02:00
antirez
f8547e53f0 Added GEORADIUS(BYMEMBER)_RO variants for read-only operations.
Issue #4084 shows how for a design error, GEORADIUS is a write command
because of the STORE option. Because of this it does not work
on readonly slaves, gets redirected to masters in Redis Cluster even
when the connection is in READONLY mode and so forth.

To break backward compatibility at this stage, with Redis 4.0 to be in
advanced RC state, is problematic for the user base. The API can be
fixed into the unstable branch soon if we'll decide to do so in order to
be more consistent, and reease Redis 5.0 with this incompatibility in
the future. This is still unclear.

However, the ability to scale GEO queries in slaves easily is too
important so this commit adds two read-only variants to the GEORADIUS
and GEORADIUSBYMEMBER command: GEORADIUS_RO and GEORADIUSBYMEMBER_RO.
The commands are exactly as the original commands, but they do not
accept the STORE and STOREDIST options.
2017-06-30 10:03:37 +02:00
antirez
01a4b9892d HMSET and MSET implementations unified. HSET now variadic.
This is the first step towards getting rid of HMSET which is a command
that does not make much sense once HSET is variadic, and has a saner
return value.
2017-06-29 17:38:46 +02:00
Suraj Narkhede
d303bca587 Fix brpop command table entry and redirect blocked clients. 2017-06-22 23:52:00 -07:00
xuzhou
530fcf8687 Fix set with ex/px option when propagated to aof 2017-06-16 17:51:38 +08:00
antirez
53cb27b1d7 SLOWLOG: log offending client address and name. 2017-06-15 12:57:54 +02:00
Qu Chen
4740424049 Implement getKeys procedure for georadius and georadiusbymember
commands.
2017-06-14 18:15:48 +02:00
antirez
e91b81c612 More informative -MISCONF error message. 2017-05-19 12:03:30 +02:00
antirez
1f598fc2bb Modules TSC: use atomic var for server.unixtime.
This avoids Helgrind complaining, but we are actually not using
atomicGet() to get the unixtime value for now: too many places where it
is used and given tha time_t is word-sized it should be safe in all the
archs we support as it is.

On the other hand, Helgrind, when Redis is compiled with "make helgrind"
in order to force the __sync macros, will detect the write in
updateCachedTime() as a read (because atomic functions are used) and
will not complain about races.

This commit also includes minor refactoring of mutex initializations and
a "helgrind" target in the Makefile.
2017-05-10 10:04:16 +02:00
antirez
de786186a5 atomicvar.h: show used API in INFO. Add macro to force __sync builtin.
The __sync builtin can be correctly detected by Helgrind so to force it
is useful for testing. The API in the INFO output can be useful for
debugging after problems are reported.
2017-05-10 09:33:49 +02:00
antirez
6eb51bf1ec zmalloc.c: remove thread safe mode, it's the default way. 2017-05-09 16:59:51 +02:00
antirez
9390c384b8 Modules TSC: Add mutex for server.lruclock.
Only useful for when no atomic builtins are available.
2017-05-09 16:32:49 +02:00
antirez
ece658713b Modules TSC: Improve inter-thread synchronization.
More work to do with server.unixtime and similar. Need to write Helgrind
suppression file in order to suppress the valse positives.
2017-05-09 11:57:09 +02:00
antirez
3fcf959e60 Modules TSC: Release the GIL for all the time we are blocked.
Instead of giving the module background operations just a small time to
run in the beforeSleep() function, we can have the lock released for all
the time we are blocked in the multiplexing syscall.
2017-05-03 11:26:21 +02:00
antirez
59b06b14c9 Modules TSC: GIL and cooperative multi tasking setup. 2017-04-28 18:41:10 +02:00
antirez
238cebdd5e Check event loop creation return value. Fix #3951.
Normally we never check for OOM conditions inside Redis since the
allocator will always return a pointer or abort the program on OOM
conditons. However we cannot have control on epool_create(), that may
fail for kernel OOM (according to the manual page) even if all the
parameters are correct, so the function aeCreateEventLoop() may indeed
return NULL and this condition must be checked.
2017-04-21 16:27:38 +02:00
Itamar Haber
b8286d1fc9 Changes command stats iteration to being dict-based
With the addition of modules, looping over the redisCommandTable
misses any added commands. By moving to dictionary iteration this
is resolved.
2017-04-13 17:03:46 +03:00
antirez
4a850be4dc Set lua-time-limit default value at safe place.
Otherwise, as it was, it will overwrite whatever the user set.

Close #3703.
2017-04-11 16:56:00 +02:00
antirez
ffefc9f92d Fix modules blocking commands awake delay.
If a thread unblocks a client blocked in a module command, by using the
RedisMdoule_UnblockClient() API, the event loop may not be awaken until
the next timeout of the multiplexing API or the next unrelated I/O
operation on other clients. We actually want the client to be served
ASAP, so a mechanism is needed in order for the unblocking API to inform
Redis that there is a client to serve ASAP.

This commit fixes the issue using the old trick of the pipe: when a
client needs to be unblocked, a byte is written in a pipe. When we run
the list of clients blocked in modules, we consume all the bytes
written in the pipe. Writes and reads are performed inside the context
of the mutex, so no race is possible in which we consume the bytes that
are actually related to an awake request for a client that should still
be put into the list of clients to unblock.

It was verified that after the fix the server handles the blocked
clients with the expected short delay.

Thanks to @dvirsky for understanding there was such a problem and
reporting it.
2017-04-10 09:33:21 +02:00
antirez
adeed29a99 Use SipHash hash function to mitigate HashDos attempts.
This change attempts to switch to an hash function which mitigates
the effects of the HashDoS attack (denial of service attack trying
to force data structures to worst case behavior) while at the same time
providing Redis with an hash function that does not expect the input
data to be word aligned, a condition no longer true now that sds.c
strings have a varialbe length header.

Note that it is possible sometimes that even using an hash function
for which collisions cannot be generated without knowing the seed,
special implementation details or the exposure of the seed in an
indirect way (for example the ability to add elements to a Set and
check the return in which Redis returns them with SMEMBERS) may
make the attacker's life simpler in the process of trying to guess
the correct seed, however the next step would be to switch to a
log(N) data structure when too many items in a single bucket are
detected: this seems like an overkill in the case of Redis.

SPEED REGRESION TESTS:

In order to verify that switching from MurmurHash to SipHash had
no impact on speed, a set of benchmarks involving fast insertion
of 5 million of keys were performed.

The result shows Redis with SipHash in high pipelining conditions
to be about 4% slower compared to using the previous hash function.
However this could partially be related to the fact that the current
implementation does not attempt to hash whole words at a time but
reads single bytes, in order to have an output which is endian-netural
and at the same time working on systems where unaligned memory accesses
are a problem.

Further X86 specific optimizations should be tested, the function
may easily get at the same level of MurMurHash2 if a few optimizations
are performed.
2017-02-20 17:29:17 +01:00
oranagra
7aa9e6d2ae active memory defragmentation 2016-12-30 03:37:52 +02:00
antirez
074383f850 Remove first version of ASCII wave, later discarded. 2016-12-19 16:45:18 +01:00
antirez
06bfeb482d Only show Redis logo if logging to stdout / TTY.
You can still force the logo in the normal logs.
For motivations, check issue #3112. For me the reason is that actually
the logo is nice to have in interactive sessions, but inside the logs
kinda loses its usefulness, but for the ability of users to recognize
restarts easily: for this reason the new startup sequence shows a one
liner ASCII "wave" so that there is still a bit of visual clue.

Startup logging was modified in order to log events in more obvious
ways, and to log more events. Also certain important informations are
now more easy to parse/grep since they are printed in field=value style.

The option --always-show-logo in redis.conf was added, defaulting to no.
2016-12-19 16:41:47 +01:00
antirez
90a6f7fc98 adjustOpenFilesLimit() comment made hopefully more clear. 2016-12-19 08:53:29 +01:00
Salvatore Sanfilippo
2988889db1 Merge pull request #3603 from oranagra/adjustOpenFilesLimit_overflow
fix unsigned int overflow in adjustOpenFilesLimit
2016-12-19 08:48:44 +01:00
antirez
87538cb7fe Switch PFCOUNT to LogLog-Beta algorithm.
The new algorithm provides the same speed with a smaller error for
cardinalities in the range 0-100k. Before switching, the new and old
algorithm behavior was studied in details in the context of
issue #3677. You can find a few graphs and motivations there.
2016-12-16 11:07:30 +01:00
Harish Murthy
c55e3fbae5 LogLog-Beta Algorithm support within HLL
Config option to use LogLog-Beta Algorithm for Cardinality
2016-12-16 11:07:30 +01:00
antirez
d1adc85aa6 INFO: show num of slave-expires keys tracked. 2016-12-13 16:02:29 +01:00
antirez
04542cff92 Replication: fix the infamous key leakage of writable slaves + EXPIRE.
BACKGROUND AND USE CASEj

Redis slaves are normally write only, however the supprot a "writable"
mode which is very handy when scaling reads on slaves, that actually
need write operations in order to access data. For instance imagine
having slaves replicating certain Sets keys from the master. When
accessing the data on the slave, we want to peform intersections between
such Sets values. However we don't want to intersect each time: to cache
the intersection for some time often is a good idea.

To do so, it is possible to setup a slave as a writable slave, and
perform the intersection on the slave side, perhaps setting a TTL on the
resulting key so that it will expire after some time.

THE BUG

Problem: in order to have a consistent replication, expiring of keys in
Redis replication is up to the master, that synthesize DEL operations to
send in the replication stream. However slaves logically expire keys
by hiding them from read attempts from clients so that if the master did
not promptly sent a DEL, the client still see logically expired keys
as non existing.

Because slaves don't actively expire keys by actually evicting them but
just masking from the POV of read operations, if a key is created in a
writable slave, and an expire is set, the key will be leaked forever:

1. No DEL will be received from the master, which does not know about
such a key at all.

2. No eviction will be performed by the slave, since it needs to disable
eviction because it's up to masters, otherwise consistency of data is
lost.

THE FIX

In order to fix the problem, the slave should be able to tag keys that
were created in the slave side and have an expire set in some way.

My solution involved using an unique additional dictionary created by
the writable slave only if needed. The dictionary is obviously keyed by
the key name that we need to track: all the keys that are set with an
expire directly by a client writing to the slave are tracked.

The value in the dictionary is a bitmap of all the DBs where such a key
name need to be tracked, so that we can use a single dictionary to track
keys in all the DBs used by the slave (actually this limits the solution
to the first 64 DBs, but the default with Redis is to use 16 DBs).

This solution allows to pay both a small complexity and CPU penalty,
which is zero when the feature is not used, actually. The slave-side
eviction is encapsulated in code which is not coupled with the rest of
the Redis core, if not for the hook to track the keys.

TODO

I'm doing the first smoke tests to see if the feature works as expected:
so far so good. Unit tests should be added before merging into the
4.0 branch.
2016-12-13 10:59:54 +01:00
antirez
5b7d42fff3 PSYNC2: bugfixing pre release.
1. Master replication offset was cleared after switching configuration
to some other slave, since it was assumed you can't PSYNC after a
switch. Note the case anymore and when we successfully PSYNC we need to
have our offset untouched.

2. Secondary replication ID was not reset to "000..." pattern at
startup.

3. Master in error state replying -LOADING or other transient errors
forced the slave to discard the cached master and full resync. This is
now fixed.

4. Better logging of what's happening on failed PSYNCs.
2016-11-23 17:36:45 +01:00
oranagra
a1a07225b3 fix unsigned int overflow in adjustOpenFilesLimit 2016-11-10 16:59:52 +02:00
antirez
28c96d73b2 PSYNC2: Save replication ID/offset on RDB file.
This means that stopping a slave and restarting it will still make it
able to PSYNC with the master. Moreover the master itself will retain
its ID/offset, in case it gets turned into a slave, or if a slave will
try to PSYNC with it with an exactly updated offset (otherwise there is
no backlog).

This change was possible thanks to PSYNC v2 that makes saving the current
replication state much simpler.
2016-11-10 12:35:29 +01:00
antirez
2669fb8364 PSYNC2: different improvements to Redis replication.
The gist of the changes is that now, partial resynchronizations between
slaves and masters (without the need of a full resync with RDB transfer
and so forth), work in a number of cases when it was impossible
in the past. For instance:

1. When a slave is promoted to mastrer, the slaves of the old master can
partially resynchronize with the new master.

2. Chained slalves (slaves of slaves) can be moved to replicate to other
slaves or the master itsef, without requiring a full resync.

3. The master itself, after being turned into a slave, is able to
partially resynchronize with the new master, when it joins replication
again.

In order to obtain this, the following main changes were operated:

* Slaves also take a replication backlog, not just masters.

* Same stream replication for all the slaves and sub slaves. The
replication stream is identical from the top level master to its slaves
and is also the same from the slaves to their sub-slaves and so forth.
This means that if a slave is later promoted to master, it has the
same replication backlong, and can partially resynchronize with its
slaves (that were previously slaves of the old master).

* A given replication history is no longer identified by the `runid` of
a Redis node. There is instead a `replication ID` which changes every
time the instance has a new history no longer coherent with the past
one. So, for example, slaves publish the same replication history of
their master, however when they are turned into masters, they publish
a new replication ID, but still remember the old ID, so that they are
able to partially resynchronize with slaves of the old master (up to a
given offset).

* The replication protocol was slightly modified so that a new extended
+CONTINUE reply from the master is able to inform the slave of a
replication ID change.

* REPLCONF CAPA is used in order to notify masters that a slave is able
to understand the new +CONTINUE reply.

* The RDB file was extended with an auxiliary field that is able to
select a given DB after loading in the slave, so that the slave can
continue receiving the replication stream from the point it was
disconnected without requiring the master to insert "SELECT" statements.
This is useful in order to guarantee the "same stream" property, because
the slave must be able to accumulate an identical backlog.

* Slave pings to sub-slaves are now sent in a special form, when the
top-level master is disconnected, in order to don't interfer with the
replication stream. We just use out of band "\n" bytes as in other parts
of the Redis protocol.

An old design document is available here:

https://gist.github.com/antirez/ae068f95c0d084891305

However the implementation is not identical to the description because
during the work to implement it, different changes were needed in order
to make things working well.
2016-11-09 15:37:15 +01:00