Commit Graph

2922 Commits

Author SHA1 Message Date
antirez
53ae687d59 Latency monitor: specialize delayed aof writes events. 2014-07-02 16:45:45 +02:00
antirez
a53c734094 LATENCY GRAPH: filling under the curve is more readable. 2014-07-02 16:37:53 +02:00
antirez
aa16f87b87 LATENCY GRAPH implemented. 2014-07-02 16:31:22 +02:00
antirez
6f20482a86 latencyTimeSeries structure max field type fixed. 2014-07-02 16:14:28 +02:00
antirez
9d4d810861 Free labels in freeSparklineSequence(). 2014-07-02 12:49:14 +02:00
antirez
1766d91697 LATENCY LATEST: add the max field. 2014-07-02 12:40:38 +02:00
antirez
e173f7a0e3 Latency monitor threshold value is now configurable.
This commit adds both support for redis.conf and CONFIG SET/GET.
2014-07-02 12:28:17 +02:00
antirez
cc4df5a6b8 ASCII sparklines generation API. 2014-07-02 10:13:53 +02:00
antirez
ed4980243a License added to latency.h. 2014-07-02 10:06:58 +02:00
antirez
b809676a9e Latency monitor turned off by default.
It is not a good idea to bloat the code with gettimeofday() calls if the
instance is working well, and turning monitoring on at runtime is a
joke.
2014-07-01 17:23:59 +02:00
antirez
de88bc63d5 Latency monitor: more hooks around the code. 2014-07-01 17:19:08 +02:00
antirez
f35abe2ff5 Latency monitor: don't add new samples in the same second.
Instead we update the old sample with the new latency if it is greater.
2014-07-01 17:12:09 +02:00
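
The same-second rule described in f35abe2ff5 can be sketched roughly as follows. This is only an illustration in Python, not the Redis C code: the `LatencySeries` class and its `(second, latency_ms)` tuple layout are hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class LatencySeries:
    """Hypothetical per-event series of (unix_second, latency_ms) samples."""
    samples: list = field(default_factory=list)

    def add(self, now_sec: int, latency_ms: int) -> None:
        # If the newest sample belongs to the same second, do not append:
        # keep the old sample and raise its latency only if the new one is greater.
        if self.samples and self.samples[-1][0] == now_sec:
            if latency_ms > self.samples[-1][1]:
                self.samples[-1] = (now_sec, latency_ms)
            return
        # Otherwise this is the first sample for this second.
        self.samples.append((now_sec, latency_ms))
```

With this rule a burst of slow events inside one second leaves a single sample holding the worst latency seen in that second.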
antirez
83beaa886c LATENCY LATEST implemented. 2014-07-01 16:17:33 +02:00
antirez
753b707d2a Latency monitor: command duration is in useconds. Convert. 2014-07-01 16:09:02 +02:00
antirez
551bee86b4 LATENCY SAMPLES implemented. 2014-07-01 16:07:13 +02:00
antirez
8612e6de88 Latency monitor: collect slow commands.
We introduce the distinction between slow and fast commands since those
are two different sources of latency. An O(1) or O(log N) command without
side effects (can't trigger deletion of large objects as a side effect of
its execution) if delayed is a symptom of inherent latency of the system.

A non-fast command (commands that may run large O(N) computations) if
delayed may just mean that the user is executing slow operations.

The advice LATENCY should provide in these two cases is different, so
we log the two classes of commands separately.
2014-07-01 11:47:08 +02:00
antirez
d7a07a2012 Latency monitor: basic samples collection. 2014-07-01 11:30:15 +02:00
antirez
0afb7a48c0 Fix Solaris compilation due to ctime_r() call.
Introduced in Redis 2.8.10 because of a change in Sentinel.
This closes issue #1837.
2014-06-30 16:29:12 +02:00
antirez
683f41adf2 DEBUG CMDKEYS moved to COMMAND GETKEYS. 2014-06-27 12:22:15 +02:00
antirez
885b6fc577 COMMAND COUNT subcommand added. 2014-06-27 12:11:15 +02:00
antirez
a92ae77740 COMMAND: fix argument parsing.
This fixes detection of wrong subcommand (that resulted in the default
all-commands output instead) and allows COMMAND INFO to be called
without arguments (resulting into an empty array) which is useful in
programmatically generated calls like the following (in Ruby):

    redis.commands("command","info",*mycommands)

Note: mycommands may be empty.
2014-06-27 12:05:54 +02:00
antirez
7fd0149d34 COMMANDS command renamed COMMAND. 2014-06-27 12:01:29 +02:00
antirez
9bf6921f3d COMMANDS command: remove static + aesthetic changes.
Static was removed since non-static functions are needed in order to get
symbols in stack traces. Minor changes were made to the source code to
make it more similar to the existing Redis code base.
2014-06-27 11:59:48 +02:00
Matt Stancliff
183458f76a Cluster: Add COMMANDS command
COMMANDS returns a nested multibulk reply for each
command in the command table.  The reply for each
command contains:
  - command name
  - arity
  - array of command flags
  - start key position
  - end key position
  - key offset step
  - optional: if the keys are not deterministic and
    Redis uses an internal key evaluation function,
    the 6th field appears and is defined as a status
    reply of: REQUIRES ARGUMENT PARSING

Cluster clients need to know where the keys are in each
command to implement proper routing to cluster nodes.

Redis commands can have multiple keys, keys at offset steps, or other
issues where you can't always assume the first element after
the command name is the cluster routing key.

Using the information exposed by COMMANDS, client implementations
can have live, accurate key extraction details for all commands.

Also implements COMMANDS INFO [commands...] to return only a
specific set of commands instead of all 160+ commands live in Redis.
2014-06-27 11:54:26 +02:00
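
A client could use the start key position, end key position and key offset step from 183458f76a roughly as sketched below. This is a hedged Python illustration, not client library code: `extract_keys` is a hypothetical helper, and it assumes that position 0 is the command name and that a negative end position counts from the end of the argument list (neither detail is spelled out in the commit message).

```python
def extract_keys(argv, first, last, step):
    """Return the arguments of argv that are keys.

    argv  -- full command as a list, argv[0] being the command name
    first -- start key position (0 is assumed to mean "no keys")
    last  -- end key position; negative values are assumed to be
             relative to the end of argv (-1 is the last argument)
    step  -- key offset step between consecutive keys
    """
    if first == 0 or step == 0:
        return []
    if last < 0:
        last = len(argv) + last
    return [argv[i] for i in range(first, last + 1, step)]
```

For example, a command whose keys alternate with values would use a step of 2, so only every other argument is treated as a routing key.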
antirez
95b1979c32 No more trailing spaces in Redis source code. 2014-06-26 18:48:40 +02:00
antirez
97f1fc65cf CLIENT KILL: don't kill the master as a normal client.
Technically the problem is due to the client type API that does not
return a special value for the master, however fixing it locally in the
CLIENT KILL command is better currently because otherwise we would
introduce a new output buffer limit class as a side effect.
2014-06-26 18:43:09 +02:00
Matt Stancliff
a3e7a665ad Allow __powerpc__ to define HAVE_ATOMIC too
From mailing list post https://groups.google.com/forum/#!topic/redis-db/D3k7KmJmYgM

In the file “config.h”, the definition HAVE_ATOMIC is used to indicate
if an architecture on which redis is implemented supports atomic
synchronization primitives.  Powerpc  supports atomic synchronization
primitives, however, it is not listed as one of the architectures
supported in config.h. This patch  adds the __powerpc__ to the list of
architectures supporting these primitives. The improvement of redis
due to the atomic synchronization on powerpc is significant,
around 30% to 40%, over the default implementation using pthreads.

This proposal adds __powerpc__ to the list of architectures designated
to support atomic builtins.
2014-06-26 08:55:47 -04:00
Matt Stancliff
a953c88381 Allow atomic memory count update with C11 builtins
From mailing list post https://groups.google.com/forum/#!topic/redis-db/QLjiQe4D7LA

In zmalloc.c the following primitives are currently used
to synchronize access to single global variable:
__sync_add_and_fetch
__sync_sub_and_fetch

In some architectures such as powerpc these primitives are overhead
intensive. More efficient C11 __atomic builtins are available with
newer GCC versions, see
http://gcc.gnu.org/onlinedocs/gcc-4.8.2/gcc/_005f_005fatomic-Builtins.html#_005f_005fatomic-Builtins

By substituting the following  __atomic… builtins:
__atomic_add_fetch
__atomic_sub_fetch

the performance improvement on certain architectures such as powerpc can be significant,
around 10% to 15%, over the implementation using __sync builtins while there is only slight uptick on
Intel architectures because it was already enforcing Intel Strongly ordered memory semantics.

The selection of __atomic built-ins can be predicated on the definition of ATOMIC_RELAXED,
which is available in GCC 4.8.2 and later versions.
2014-06-26 08:52:53 -04:00
Matt Stancliff
f9bca13a1a Use predefined macro for used_memory() update 2014-06-26 08:51:13 -04:00
antirez
9be3ee8283 Make unstable branch version unique and distinguishable. 2014-06-25 15:30:34 +02:00
antirez
75c57d53ea CLUSTER SLOTS: don't output failing slaves.
While we have to output failing masters in order to provide an accurate
map (that may be the one of a Redis Cluster in down state because not
all slots are served by a working master), to provide slaves in FAIL
state is not a good idea since those are not necessarily needed, and the
client will likely incur a latency penalty trying to connect to a
slave which is down.

Note that this means that CLUSTER SLOTS does not provide a *complete*
map of slaves, however this would not be of any help since slaves may be
added later, and a client that needs to scale reads and has to
stay updated with the list of slaves needs to refresh the map
from time to time anyway.
2014-06-25 15:19:35 +02:00
antirez
a6fe4ca321 CLUSTER SLOTS: main loop should skip only slaves and zero slot masters. 2014-06-25 15:08:33 +02:00
Matt Stancliff
e14829de30 Cluster: Add CLUSTER SLOTS command
CLUSTER SLOTS returns a Redis-formatted mapping from
slot ranges to IP/Port pairs serving that slot range.

The outer return elements group return values by slot ranges.

The first two entries in each result are the min and max slots for the range.

The third entry in each result is guaranteed to be either
an IP/Port of the master for that slot range - OR - null
if that slot range, for some reason, has no master

The 4th and higher entries in each result are replica instances
for the slot range.

Output comparison:
127.0.0.1:7001> cluster nodes
f853501ec8ae1618df0e0f0e86fd7abcfca36207 127.0.0.1:7001 myself,master - 0 0 2 connected 4096-8191
5a2caa782042187277647661ffc5da739b3e0805 127.0.0.1:7005 slave f853501ec8ae1618df0e0f0e86fd7abcfca36207 0 1402622415859 6 connected
6c70b49813e2ffc9dd4b8ec1e108276566fcf59f 127.0.0.1:7007 slave 26f4729ca0a5a992822667fc16b5220b13368f32 0 1402622415357 8 connected
2bd5a0e3bb7afb2b56a2120d3fef2f2e4333de1d 127.0.0.1:7006 slave 32adf4b8474fdc938189dba00dc8ed60ce635b0f 0 1402622419373 7 connected
5a9450e8279df36ff8e6bb1c139ce4d5268d1390 127.0.0.1:7000 master - 0 1402622418872 1 connected 0-4095
32adf4b8474fdc938189dba00dc8ed60ce635b0f 127.0.0.1:7002 master - 0 1402622419874 3 connected 8192-12287
5db7d05c245267afdfe48c83e7de899348d2bdb6 127.0.0.1:7004 slave 5a9450e8279df36ff8e6bb1c139ce4d5268d1390 0 1402622417867 5 connected
26f4729ca0a5a992822667fc16b5220b13368f32 127.0.0.1:7003 master - 0 1402622420877 4 connected 12288-16383

127.0.0.1:7001> cluster slots
1) 1) (integer) 0
   2) (integer) 4095
   3) 1) "127.0.0.1"
      2) (integer) 7000
   4) 1) "127.0.0.1"
      2) (integer) 7004
2) 1) (integer) 12288
   2) (integer) 16383
   3) 1) "127.0.0.1"
      2) (integer) 7003
   4) 1) "127.0.0.1"
      2) (integer) 7007
3) 1) (integer) 4096
   2) (integer) 8191
   3) 1) "127.0.0.1"
      2) (integer) 7001
   4) 1) "127.0.0.1"
      2) (integer) 7005
4) 1) (integer) 8192
   2) (integer) 12287
   3) 1) "127.0.0.1"
      2) (integer) 7002
   4) 1) "127.0.0.1"
      2) (integer) 7006
2014-06-25 15:03:41 +02:00
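
The CLUSTER SLOTS reply layout described in e14829de30 could be turned into a slot lookup table along these lines. This is a sketch only: it assumes the client library already decoded the reply into nested Python lists, and `build_slot_map` and `master_for_slot` are hypothetical helper names.

```python
def build_slot_map(reply):
    """Convert a decoded CLUSTER SLOTS reply into sorted range tuples.

    Each reply entry is [min_slot, max_slot, master_or_None, replica...],
    where master and replicas are [ip, port] pairs.
    """
    ranges = []
    for entry in reply:
        start, end, master, *replicas = entry
        ranges.append((
            start,
            end,
            tuple(master) if master is not None else None,  # range may have no master
            [tuple(r) for r in replicas],
        ))
    return sorted(ranges, key=lambda r: r[0])


def master_for_slot(slot_map, slot):
    """Return the (ip, port) of the master serving slot, or None."""
    for start, end, master, _replicas in slot_map:
        if start <= slot <= end:
            return master
    return None
```

Sorting by the first slot is done here only for readability, since the reply itself is not ordered by slot range, as the sample output above shows.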
antirez
f29b12d0bf Cluster: myself->ip autodiscovery.
Instead of having a hardcoded IP address in the node configuration, we
autodiscover it via MEET messages for automatic update when the node is
restarted with a different IP address.

This mechanism was discussed in the context of PR #1782.
2014-06-25 11:28:57 +02:00
antirez
46319094db Old form of CLIENT KILL should still allow suicide. 2014-06-24 12:49:28 +02:00
antirez
e3bae84606 Sentinel implementation of ROLE. 2014-06-23 12:07:41 +02:00
antirez
be8f4d49d4 Silence different signs comparison warning in sds.c. 2014-06-23 11:50:24 +02:00
Matt Stancliff
5cd83ef539 Sentinel: bind source address
Some deployments need traffic sent from a specific address.  This
change uses the same policy as Cluster where the first listed bindaddr
becomes the source address for outgoing Sentinel communication.

Fixes #1667
2014-06-23 11:44:35 +02:00
Matt Stancliff
d830dcb12d Add REDIS_BIND_ADDR access macro
We need to access (bindaddr[0] || NULL) in a few places, so centralize
access with a nice macro.
2014-06-23 11:44:34 +02:00
Matt Stancliff
ef897a41e8 Cancel SHUTDOWN if initial AOF is being written
Fixes #1826 (and many other reports of the same problem)
2014-06-23 11:44:34 +02:00
antirez
fb2f637c4a Allow to call ROLE in LOADING state. 2014-06-21 11:39:43 +02:00
antirez
7970d53997 ROLE command: array len fixed for slave output. 2014-06-21 11:17:18 +02:00
antirez
22d17bc14f Cluster: clear NOADDR flag when updating node address. 2014-06-20 09:32:47 +02:00
antirez
41f12ac988 Sentinel: send hello messages ASAP after config change.
Eventual configuration convergence is guaranteed by our periodic hello
messages to all the instances, however when there are important notices
to share, better make a phone call. With this commit we force a hello
message to other Sentinel and Redis instances within the next 100
milliseconds of a config update, which is practically better than
waiting a few seconds.
2014-06-19 15:17:06 +02:00
antirez
94bc467328 Sentinel: handle SRI_PROMOTED flag correctly.
Lack of check of the SRI_PROMOTED flag caused Sentinel to act with the
promoted slave turned into a master during failover as if it were a
normal instance.

Normally this problem was not apparent because during real failovers the
old master is down so the bugged code path was not entered, however with
manual failovers via the SENTINEL FAILOVER command, the problem was
easily triggered.

This commit prevents promoted slaves from getting reconfigured, moreover
we now explicitly check that during a failover the slave turning into a
master is the one we selected for promotion and not a different one.
2014-06-19 10:28:27 +02:00
Alex Suraci
9f8dcfe69a add missing signal.h include 2014-06-17 21:59:12 -07:00
Matt Stancliff
20c2a38ad0 Add SIGINT handler to cli --intrinsic-latency
If we run a long latency session and want to Ctrl-C out of it,
it's nice to still get the summary output at the end.
2014-06-17 10:12:57 -04:00
antirez
2c17591224 Sentinel: send SLAVEOF with MULTI, CLIENT KILL, CONFIG REWRITE.
This implements the new Sentinel-Client protocol for the Sentinel part:
now instances are reconfigured using a transaction that ensures that the
config is rewritten in the target instance, and that clients lose the
connection with the instance, in order to be forced to: ask Sentinel,
reconnect to the instance, and verify the instance role with the new
ROLE command.
2014-06-17 11:03:21 +02:00
antirez
bb2011d992 CLIENT KILL API modified.
Added a new SKIPME option that is true by default, that prevents the
client sending the command to be killed, unless SKIPME NO is sent.
2014-06-16 14:50:15 +02:00
antirez
e06b3819ea CLIENT KILL: fix closing link of the current client. 2014-06-16 14:28:23 +02:00
antirez
e7affd266c New features for CLIENT KILL. 2014-06-16 14:24:28 +02:00
antirez
f26f79ea37 Assign an unique non-repeating ID to each new client.
This will be used by CLIENT KILL and is also a good way to ensure a
given client is still the same across CLIENT LIST calls.

The output of CLIENT LIST was modified to include the new ID, but this
change is considered to be backward compatible as the API does not imply
you can do positional parsing, since each field has a different name.
2014-06-16 14:22:55 +02:00
antirez
56d26c2380 Client types generalized.
Because of output buffer limits Redis internals had this idea of type of
clients: normal, pubsub, slave. It is possible to set different output
buffer limits for the three kinds of clients.

However all the macros and API were named after output buffer limit
classes, while the idea of a client type is a generic one that can be
reused.

This commit does two things:

1) Rename the API and defines with more general names.
2) Change the class of clients executing the MONITOR command from "slave"
   to "normal".

"2" is a good idea because you want to have very special settings for
slaves, that are not a good idea for MONITOR clients that are instead
normal clients even if they are conceptually slave-alike (since it is a
push protocol).

The backward-compatibility breakage resulting from "2" is considered to
be too minimal to matter, since MONITOR is a debugging command, and because
anyway this change is not going to break the format or the behavior, but
just when a connection is closed on big output buffer issues.
2014-06-16 10:43:05 +02:00
antirez
96e0fe6232 Fix semantics of Lua calls to SELECT.
Lua scripts are executed in the context of the currently selected
database (as selected by the caller of the script).

However Lua scripts are also free to use the SELECT command in order to
affect other DBs. When SELECT is called from Lua, the old behavior, before
this commit, was to automatically set the Lua caller selected DB to the
last DB selected by Lua. See for example the following sequence of
commands:

    SELECT 0
    SET x 10
    EVAL "redis.call('select','1')" 0
    SET x 20

Before this commit after the execution of this sequence of commands,
we'll have x=10 in DB 0, and x=20 in DB 1.

Because of the problem above, there was a bug affecting replication of
Lua scripts, because of the actual implementation of replication. It was
possible to fix the implementation of Lua scripts in order to fix the
issue, but looking closely, the bug is the consequence of the behavior
of Lua's ability to set the caller's DB.

Under the old semantics, a script selecting a different DB has no simple
way to restore the state and select back the previously selected DB.
Moreover the script author must remember that the restore is needed,
otherwise the new commands executed by the caller, will be executed in
the context of a different DB.

So this commit fixes both the replication issue, and this hard-to-use
semantics, by removing the ability of Lua, after the script execution,
to force the caller to switch to the DB selected by the Lua script.

The new behavior of the previous sequence of commands is to just set
x=20 in DB 0. However Lua scripts are still capable of writing / reading
from different DBs if needed.

WARNING: This is a semantic change that will break programs that are
conceived to select the client selected DB via Lua scripts.

This fixes issue #1811.
2014-06-12 16:05:52 +02:00
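
The difference between the old and new SELECT semantics in 96e0fe6232 can be modeled with a toy simulation of the four-command sequence from the commit message. This is not Redis code: `run_sequence` is a hypothetical function, and the two databases are plain Python dicts.

```python
def run_sequence(restore_caller_db: bool):
    """Replay SELECT 0 / SET x 10 / EVAL select 1 / SET x 20 on two toy DBs.

    restore_caller_db=False models the old behavior (the script's SELECT
    leaks into the caller); True models the behavior after the fix.
    """
    dbs = [{}, {}]
    selected = 0                 # SELECT 0
    dbs[selected]["x"] = 10      # SET x 10

    caller_db = selected         # DB selected by the caller before the script runs
    selected = 1                 # EVAL "redis.call('select','1')" 0
    if restore_caller_db:
        selected = caller_db     # new semantics: caller's DB is untouched

    dbs[selected]["x"] = 20      # SET x 20
    return dbs
```

Under the old semantics the last SET lands in DB 1, while with the fix it overwrites x in DB 0.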
antirez
73fefd0bc0 Scripting: Fix for a #1118 regression simplified.
It is more straightforward to just test for a numerical type avoiding
Lua's automatic conversion. The code is technically more correct now,
however Lua should automatically convert to number only if the original
type is a string that "looks like a number", and not from other types,
so practically speaking the fix is identical AFAIK.
2014-06-11 10:10:58 +02:00
Matt Stancliff
76efe1225f Scripting: Fix regression from #1118
The new check-for-number behavior of Lua arguments broke
users who use large strings of just integers.

The Lua number check would convert the string to a number, but
that breaks user data because
Lua numbers have limited precision compared to an arbitrarily
precise number wrapped in a string.

Regression fixed and new test added.

Fixes #1118 again.
2014-06-10 14:26:13 -04:00
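
The precision problem behind 76efe1225f can be reproduced outside Lua with any IEEE 754 double. The sketch below uses Python floats as a stand-in for Lua numbers; `roundtrip_through_double` is a hypothetical helper, not part of Redis.

```python
def roundtrip_through_double(digits: str) -> str:
    """Convert a string of digits to a double and back to a decimal string.

    Mimics what happens when a numeric-looking string argument is
    converted to a floating point number before being used.
    """
    return str(int(float(digits)))
```

Small integers survive the round trip, but a digit string above 2^53 (for example 9007199254740993) may come back as a different number, which is why such arguments have to stay strings.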
antirez
8ef79e72ac Cluster: fix an error message when logging failover auth denied. 2014-06-10 17:39:42 +02:00
antirez
58799718be Cluster: better comment for clusterSendFailoverAuthIfNeeded() epoch test. 2014-06-10 17:20:21 +02:00
antirez
61eb0eae83 Cluster: log granted failover authorizations. 2014-06-10 16:56:08 +02:00
antirez
d5d92deb6c Cluster: log configEpoch updates to myself. 2014-06-10 16:38:36 +02:00
antirez
8204ab0098 Cluster: log when a master denies a failover auth. 2014-06-10 16:07:26 +02:00
antirez
9b3bc82c1a Cluster: cluster_my_epoch added to CLUSTER INFO output. 2014-06-10 11:35:40 +02:00
Salvatore Sanfilippo
08c7363647 Merge pull request #1743 from mattsta/cygwin-compile-fix
Cygwin compile fix
2014-06-09 11:42:14 +02:00
Salvatore Sanfilippo
c7f93143f6 Merge pull request #1669 from mattsta/blpop-internally-added-keys
Fix blocking operations from missing new lists
2014-06-09 11:37:28 +02:00
antirez
6a13193d8f ROLE output improved for slaves.
Info about the replication state with the master added.
2014-06-07 17:38:20 +02:00
antirez
d34c2fa3bb ROLE command added.
The new ROLE command is designed in order to provide a client with
information about the replication in a fast and easy to use way
compared to the INFO command where the same information is also
available.
2014-06-07 17:27:49 +02:00
antirez
32d0a79f78 Cluster: check that configEpoch never goes back.
Since there are ways to alter the configEpoch outside of the failover
procedure (for example CLUSTER SET-CONFIG-EPOCH and via the configEpoch
collision resolution algorithm), always make sure, before replacing our
configEpoch with a new one, that it is greater than the current one.
2014-06-07 14:37:09 +02:00
antirez
a2c2ef7de5 Cluster: SET-CONFIG-EPOCH should update currentEpoch.
SET-CONFIG-EPOCH, used by redis-trib at cluster creation time, failed to
update the currentEpoch, making it possible after a failover for a
server to set its configEpoch to a value smaller than the current one
(since configEpochs are obtained using currentEpoch).

The bug totally breaks the Redis Cluster algorithms and protocols
allowing for permanent split brain conditions about the slots
configuration as shown in issue #1799.
2014-06-07 14:25:47 +02:00
Salvatore Sanfilippo
a2403227c7 Merge pull request #1772 from andygrunwald/typo-avarege-average
Fixed typo in word avarege in result message of --intrinsic-latency analyzer
2014-06-06 11:19:21 +02:00
Salvatore Sanfilippo
113be48221 Merge pull request #1780 from badboy/patch-8
Small typo fixed
2014-06-06 10:45:00 +02:00
Salvatore Sanfilippo
1e221d101c Merge pull request #1788 from zionwu/unstable
fix issue 1787
2014-06-06 10:33:11 +02:00
antirez
14fb0ac649 Don't process min-slaves-to-write for slaves.
Replication is totally broken when a slave has this option, since it
stops accepting updates from masters.

This fixes issue #1434.
2014-06-05 10:48:05 +02:00
antirez
3758f27bc1 Fixed dbuf variable scope in luaRedisGenericCommand().
I'm not sure if while the visibility is the inner block, the fact we
point to 'dbuf' is a problem or not, probably the stack var is
guaranteed to live until the function returns. However obvious code is
better anyway.
2014-06-04 18:57:12 +02:00
antirez
072982d83c Scripting: better Lua number -> string conversion in luaRedisGenericCommand().
The lua_to*string() family of functions use a non optimal format
specifier when converting integers to strings. This has both the problem
of the number being converted in exponential notation, which we don't
use as a Redis return value when floating point numbers are involved,
and, moreover, there is a loss of precision since the default format
specifier is not able to represent numbers that must be represented
exactly in the IEEE 754 number mantissa.

The new code handles it as a special case using a saner conversion.

This fixes issue #1118.
2014-06-04 18:33:24 +02:00
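
The effect of the format specifier mentioned in 072982d83c can be illustrated with printf-style formatting. The commit message does not name the specifiers, so both are assumptions here: "%.14g" stands in for the non optimal default and "%.17g" for a conversion that keeps every digit of a double. The function names are hypothetical.

```python
def format_default(n: float) -> str:
    """Format with 14 significant digits (assumed default-like specifier)."""
    return "%.14g" % n


def format_exact(n: float) -> str:
    """Format with 17 significant digits, enough to round-trip any double."""
    return "%.17g" % n
```

For a value such as 9007199254740991.0, the 14-digit form falls back to exponential notation and drops the low digits, while the 17-digit form prints the integer in full.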
zionwu
dc8584696a fix issue 1787 2014-06-01 02:23:24 +08:00
antirez
8a588ac14d More trailing spaces in sentinel.c removed. 2014-05-28 15:46:05 +02:00
Jan-Erik Rediger
b187c591e3 Small typo fixed 2014-05-28 09:46:01 +02:00
Matt Stancliff
7a0c5fdf12 Disable recursive watchdog signal handler
If we are in the signal handler, we don't want to handle
the signal again.  In extreme cases, this can cause a stack overflow
and segfault Redis.

Fixes #1771
2014-05-26 17:53:33 +02:00
antirez
88c2307535 Cluster: always allow ok -> fail switch in clusterUpdateState().
There is a time defined by REDIS_CLUSTER_WRITABLE_DELAY where fail -> ok
switch is not possible after startup as a master for some time, however
the contrary (ok -> fail) should always be possible.
2014-05-26 16:24:12 +02:00
Andy Grunwald
94e3bb568a Fixed typo in word avarege in result message of --intrinsic-latency analyzer 2014-05-22 20:01:12 +02:00
antirez
b239a32aae redisLogFromHandler() format changed to match new logs format. 2014-05-22 19:24:35 +02:00
antirez
d98fa718e0 Tag every log line with role.
Every log contains, just after the pid, a single character that provides
information about the role of an instance:

S - Slave
M - Master
C - Writing child
X - Sentinel
2014-05-22 18:48:37 +02:00
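
Choosing the role character from d98fa718e0 could look like the sketch below. The four letters come from the commit message; the `role_char` function, its boolean parameters and the order in which the checks are made are assumptions for illustration.

```python
def role_char(sentinel_mode: bool, is_child: bool, is_slave: bool) -> str:
    """Return the single character logged just after the pid."""
    if sentinel_mode:
        return "X"  # Sentinel
    if is_child:
        return "C"  # Writing child
    return "S" if is_slave else "M"  # Slave or Master
```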
antirez
39603a7e31 Cluster: slave validity factor is now user configurable.
Check the commit changes in the example redis.conf for more information.
2014-05-22 16:57:54 +02:00
antirez
762b1ae2be Fix an error in redis-trib where we always talk with same node.
While iterating the list of nodes we want to set the slot as stable in
the current node, not always in the first node of the list.
2014-05-21 18:17:02 +02:00
antirez
c68c78719f redis-trib fix improved: move keys from N nodes to owner. 2014-05-21 16:40:46 +02:00
Matt Stancliff
33f943b4cd Fix blocking operations from missing new lists
Behrad Zari discovered [1] and Josiah reported [2]: if you block
and wait for a list to exist, but the list is created by
a non-push command, the blocked client never gets notified.

This commit adds notification of blocked clients into
the DB layer and away from individual commands.

Lists can be created by [LR]PUSH, SORT..STORE, RENAME, MOVE,
and RESTORE.  Previously, blocked client notifications were
only triggered by [LR]PUSH.  Your client would never get
notified if a list were created by SORT..STORE or RENAME or
a RESTORE, etc.

Blocked client notification now happens in one unified place:
  - dbAdd() triggers notification when adding a list to the DB

Two new tests are added that fail prior to this commit.

All tests pass.

Fixes #1668

[1]: https://groups.google.com/forum/#!topic/redis-db/k4oWfMkN1NU
[2]: #1668
2014-05-21 09:52:52 -04:00
antirez
56161ca0a4 redis-trib fix: use MIGRATE REPLACE when fixing slots.
This fixes issue #1765.
2014-05-21 12:15:06 +02:00
antirez
ce2b2f22d9 Merge branch 'unstable' of github.com:/antirez/redis into unstable 2014-05-20 16:15:13 +02:00
Salvatore Sanfilippo
ce7c47265b Merge pull request #1764 from michael-grunder/lua_cache_segfault
Fix LUA_OBJCACHE segfault.
2014-05-20 16:14:34 +02:00
antirez
4ddc77041f Remove trailing spaces from scripting.c 2014-05-20 16:11:22 +02:00
antirez
01e3f9ba1d Remove trailing spaces from sentinel.c. 2014-05-20 14:22:42 +02:00
michael-grunder
ea0e2524aa Fix LUA_OBJCACHE segfault.
When scanning the argument list inside of a redis.call() invocation
for pre-cached values, there was no check being done that the
argument we were on was in fact within the bounds of the cache size.

So if a redis.call() command was ever executed with more than 32
arguments (current cache size #define setting) redis-server could
segfault.
2014-05-19 13:18:13 -07:00
Mike Trinkala
ba52cd06c8 Correct the HyperLogLog stale cache flag to prevent unnecessary computations.
Set the MSB as documented.
2014-05-18 07:26:26 -07:00
antirez
67133d2f48 Cluster: use clusterSetNodeAsMaster() during slave failover.
clusterHandleSlaveFailover() was reimplementing what
clusterSetNodeAsMaster() does without any good reason.
2014-05-15 17:03:28 +02:00
antirez
8c6e92c3bc Cluster: clear todo_before_sleep flags when executing actions.
Thanks to this change, when there is some code like:

    clusterDoBeforeSleep(CLUSTER_TODO_UPDATE_STATE|...);
    ... and later before returning to the event loop ...
    clusterUpdateState();

The clusterUpdateState() function will clear the flag and will not be
repeated in the clusterBeforeSleep() function. This is especially important
for config save/fsync flags which are slow to execute and not a good
idea to repeat without a good reason.

This is implemented for all the CLUSTER_TODO flags.
2014-05-15 16:33:13 +02:00
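
The flag-clearing idea from 8c6e92c3bc can be sketched with a small bitmask model. This is an illustration in Python rather than the cluster.c code: the `Node` class, its method names and the two flag constants are hypothetical stand-ins for the CLUSTER_TODO flags.

```python
TODO_UPDATE_STATE = 1 << 0  # hypothetical stand-in for a CLUSTER_TODO flag
TODO_SAVE_CONFIG = 1 << 1   # hypothetical stand-in for a CLUSTER_TODO flag


class Node:
    def __init__(self):
        self.todo = 0    # pending before-sleep actions, as a bitmask
        self.calls = []  # record of executed actions, for inspection

    def do_before_sleep(self, flags: int) -> None:
        """Schedule actions to run before returning to the event loop."""
        self.todo |= flags

    def update_state(self) -> None:
        self.todo &= ~TODO_UPDATE_STATE  # executing the action clears its flag
        self.calls.append("update_state")

    def save_config(self) -> None:
        self.todo &= ~TODO_SAVE_CONFIG   # executing the action clears its flag
        self.calls.append("save_config")

    def before_sleep(self) -> None:
        """Run only the actions whose flags are still set."""
        if self.todo & TODO_UPDATE_STATE:
            self.update_state()
        if self.todo & TODO_SAVE_CONFIG:
            self.save_config()
        self.todo = 0
```

If update_state() is called explicitly after being scheduled, before_sleep() finds the flag already cleared and skips it, so the action runs once instead of twice.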
antirez
7b87cda70e Fixed typo in CLUSTER RESET implementation. 2014-05-15 12:33:57 +02:00
antirez
796f4ae9f7 CLUSTER RESET implemented.
The new command is able to reset a cluster node so that it starts again
as a fresh node. By default the command performs a soft reset (the same
as calling it as CLUSTER RESET SOFT), and the following steps are
performed:

1) All slots are set as unassigned.
2) The list of known nodes is flushed.
3) Node is set as master if it is a slave.

When a hard reset is performed with CLUSTER RESET HARD the following
additional operations are performed:

4) A new Node ID is created at random.
5) Epochs are set to 0.

CLUSTER RESET is useful both when the sysadmin wants to reconfigure a
node with a different role (for example turning a slave into a master)
and for testing purposes.

It also may play a role in automatically provisioned Redis Clusters,
since it allows to reset a node back to the initial state in order to be
reconfigured.
2014-05-15 11:43:06 +02:00
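
The five steps listed in 796f4ae9f7 can be summarized in a short sketch. It models a node as a plain Python dict, which is an assumption made for illustration; the `cluster_reset` function and the dict keys are hypothetical, and the 40 hex character ID length is taken from the node IDs shown in the CLUSTER NODES output earlier in this log.

```python
import secrets


def cluster_reset(node: dict, hard: bool = False) -> dict:
    """Apply a soft reset (steps 1-3) or a hard reset (steps 1-5) to node."""
    node["slots"] = set()            # 1) all slots are set as unassigned
    node["known_nodes"] = {}         # 2) the list of known nodes is flushed
    if node["role"] == "slave":      # 3) a slave is turned into a master
        node["role"] = "master"
        node["master_id"] = None
    if hard:
        node["id"] = secrets.token_hex(20)  # 4) new random node ID (40 hex chars)
        node["current_epoch"] = 0           # 5) epochs are set to 0
        node["config_epoch"] = 0
    return node
```

A soft reset keeps the node ID and epochs, so only the hard variant makes the node look completely new to the rest of the cluster.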
antirez
8b9d5ecbd1 Remove trailing spaces from cluster.c file. 2014-05-15 10:18:36 +02:00
antirez
60e5d1724c Cluster: don't accept cluster bus connections during startup. 2014-05-14 12:05:00 +02:00
antirez
6baac558d8 Cluster: better handling of stolen slots.
The previous code handling a lost slot (by another master with a higher
configuration for the slot) was defensive, considering it an error and
putting the cluster in an odd state requiring redis-cli fix.

This was changed, because actually this only happens either in a
legitimate way, with failovers, or when the admin messed with the config
in order to reconfigure the cluster. So the new code instead will try to
make sure that the keys stored match the new slots map, by removing all
the keys in the slots we lost ownership from.

The function that deletes the keys from the lost slots is called only
if the node does not lose all its slots (resulting in a reconfiguration
as a slave of the node that got ownership). This is an optimization
since the replication code will anyway flush all the instance data in
a faster way.
2014-05-14 10:46:37 +02:00
antirez
832a298005 Cluster: fixed data_age computation / check integer overflow. 2014-05-12 17:46:15 +02:00
Matt Stancliff
7c4decb101 Fix lack of strtold under Cygwin
Renaming strtold to strtod then casting
the result is the standard way of dealing with
no strtold in Cygwin.
2014-05-12 11:11:09 -04:00
Matt Stancliff
3e0e51dd9f Fix lack of SA_ONSTACK under Cygwin
Fixes #232
2014-05-12 11:10:24 -04:00
antirez
2692339138 Cluster: forced failover implemented.
Using CLUSTER FAILOVER FORCE it is now possible to failover a master in
a forced way, which means:

1) No check to understand if the master is up is performed.
2) No data age of the slave is checked. Even a slave with very old data
   can manually failover a master in this way.
3) No chat with the master is attempted to reach its replication offset:
   the master can just be down.
2014-05-12 16:34:20 +02:00
antirez
005f564eb3 Cluster: bypass data_age check for manual failovers.
Automatic failovers only happen in Redis Cluster if the slave trying to
be elected was disconnected from its master for no more than 10 times
the node-timeout value. However there should be no such check for
manual failovers, since these are initiated by the sysadmin that, in
theory, knows what she is doing when a slave is selected to be promoted.
2014-05-12 16:12:12 +02:00
Akos Vandra
b252fab06c Fixed possible buffer overflow bug if RDB file is corrupted.
(Note: commit message modified by @antirez for clarity).
2014-05-12 11:48:14 +02:00
Akos Vandra
433e835d3e fixed possible buffer overflow error 2014-05-12 11:19:07 +02:00
antirez
658ad301cc redis-trib create: use CLUSTER SET-CONFIG-EPOCH before joining the cluster.
This way there is no need for the conflict resolution algo to be used in
order to start with a cluster where each node has a different
configEpoch.
2014-05-12 11:06:37 +02:00
antirez
715a6d3a78 redis-trib import: trap MIGRATE errors. 2014-05-12 10:36:33 +02:00
antirez
939c586ef7 redis-trib.rb: MIGRATE hardcoded timeout set to 15 sec.
Will be configurable / adaptive at some point but let's start with a
saner value compared to 1 sec which is not a good idea for big data
structures stored into a single key.
2014-05-12 10:22:24 +02:00
antirez
5c78f87666 RESTORE: reply with -BUSYKEY special error code.
The error when the target key is busy was a generic one, while it makes
sense to be able to distinguish between the target key busy error and
the others easily.
2014-05-12 10:01:59 +02:00
antirez
2a48bd4a37 Cluster: initial ability to import data from standalone instance. 2014-05-10 17:59:31 +02:00
antirez
71d0e7e0ea CLUSTER MEET: better error messages when address is invalid.
Fixes issue #1734.
2014-05-09 16:36:59 +02:00
antirez
74435aba47 redis-trib: allow support for mandatory options. 2014-05-09 16:11:11 +02:00
antirez
72ff03346f DEBUG POPULATE: call dictExpand() to avoid useless rehashing. 2014-05-09 15:02:29 +02:00
antirez
8a170c817d Cluster: bulk-accept new nodes connections.
The same change was operated for normal client connections. This is
important for Cluster as well, since when a node rejoins the cluster,
when a partition heals or after a restart, it gets flooded with new
connection attempts by all the other nodes trying to form a full
mesh again.
2014-05-09 11:52:59 +02:00
antirez
3625b52791 Cluster: clusterAcceptHandler() comments updated to match the code. 2014-05-09 11:44:46 +02:00
antirez
2102778606 Sentinel: log when a failover will be attempted again.
When a Sentinel performs a failover (successful or not), or when a
Sentinel votes for a different Sentinel trying to start a failover, it
sets a min delay before it will try to get elected for a failover.

While not strictly needed, because if multiple Sentinels try
to fail over the same master at the same time, only one configuration
will eventually win, this serialization is practically very useful.
Normal failovers are cleaner: one Sentinel starts to failover, the
others update their config when the Sentinel performing the failover
is able to get the selected slave to move from the role of slave to the
one of master.

However currently this timeout was implicit, so users could see
Sentinels not reacting, after a failed failover, for some time, without
giving any feedback in the logs to the poor sysadmin waiting for clues.

This commit makes Sentinels more verbose about the delay: when a master
is down and a failover attempt is not performed because the delay has
still not elapsed, something like this will be logged:

    Next failover delay: I will not start a failover
    before Thu May  8 16:48:59 2014
2014-05-08 16:38:53 +02:00
antirez
931beae9b0 Sentinel: generate +config-update-from event when a new config is received.
This event makes clear, before the switch-master event is generated,
that a Sentinel received a configuration update from another Sentinel.
2014-05-08 15:59:34 +02:00
antirez
0b0f872f3f REDIS_ENCODING_EMBSTR_SIZE_LIMIT set to 39.
The new value is the limit for the robj + SDS header + string +
null-term to stay inside the 64 bytes Jemalloc arena in 64 bits
systems.
2014-05-07 17:05:09 +02:00
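The arithmetic behind the limit of 39 can be checked with a short sketch. The component sizes below (a 16-byte robj and an 8-byte SDS header) are assumptions about 64-bit builds of that period and are not stated in the commit message; only the 64-byte total and the resulting limit come from it.

```python
# Assumed sizes, not taken from the commit message itself.
ROBJ_SIZE = 16        # assumed size of the robj header on 64-bit systems
SDS_HEADER_SIZE = 8   # assumed size of the SDS header (two 32-bit fields)
NULL_TERM = 1         # trailing '\0' byte
ALLOC_CLASS = 64      # the 64-byte allocation size named in the commit


def embstr_limit(alloc_class=ALLOC_CLASS):
    """Largest string length that still fits in one allocation."""
    return alloc_class - ROBJ_SIZE - SDS_HEADER_SIZE - NULL_TERM


print(embstr_limit())  # prints 39
```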
antirez
4f686555ce Scripting: objects caching for Lua c->argv creation.
Reusing small objects when possible is a major speedup under certain
conditions, since it is able to avoid the malloc/free pattern that
otherwise is performed for every argument in the client command vector.
2014-05-07 16:12:32 +02:00
antirez
1e4ba6e7e6 Scripting: Use faster API for Lua client c->argv creation.
Replace the three calls to Lua API lua_tostring, lua_strlen,
and lua_isstring, with a single call to lua_tolstring.

~ 5% consistent speed gain measured.
2014-05-07 16:12:32 +02:00
antirez
76fda9f8e1 Scripting: don't call lua_gc() after Lua script run.
Calling lua_gc() after every script execution is too expensive, and
apparently does not make the execution smoother: the same peak latency
was measured before and after the commit.

This change accounts for scripts execution speedup in the order of 10%.
2014-05-07 16:12:32 +02:00
antirez
48c49c4851 Scripting: cache argv in luaRedisGenericCommand().
~ 4% consistently measured speed improvement.
2014-05-07 16:12:32 +02:00
antirez
3318b74705 Fixed missing c->bufpos reset in luaRedisGenericCommand().
Bug introduced when adding a fast path to avoid copying the reply buffer
for small replies that fit into the client static buffer.
2014-05-07 16:12:32 +02:00
antirez
c49955fd77 Scripting: replace tolower() with faster code in evalGenericCommand().
The function showed up consuming a non trivial amount of time in the
profiler output. After this change benchmarking gives a 6% speed
improvement that can be consistently measured.
2014-05-07 16:12:32 +02:00
antirez
0ef4f44c5a Scripting: luaRedisGenericCommand() fast path for buffer-only replies.
When the reply is only contained in the client static output buffer, use
a fast path avoiding the dynamic allocation of an SDS string to
concatenate the client reply objects.
2014-05-07 16:12:32 +02:00
antirez
8226be61ec Define HAVE_ATOMIC for clang. 2014-05-07 16:12:32 +02:00
antirez
40abeb1f40 Scripting: simpler reply buffer creation in luaRedisGenericCommand().
It is faster to just create the string with a single sdsnewlen() call.
If c->bufpos is zero, the call will simply be like sdsempty().
2014-05-07 16:12:32 +02:00
antirez
11d9ecb71d CLUSTER SET-CONFIG-EPOCH implemented.
Initially Redis Cluster accepted that after cluster creation all the
nodes were at configEpoch 0, evolving from zero as failovers happen.

However later the semantic was made more strict in order to make sure a
cluster has always all the master nodes with a different configEpoch,
which is more robust in some corner cases (especially resulting from
errors by the system administrator).

To assign different configEpochs to different nodes at startup was a
task performed naturally by the config conflicts resolution algorithm
(see the Cluster specification). However this works well only for small
clusters or when there are actually just a few collisions, since it is
designed for exceptional cases.

When a large cluster is created hundreds of nodes can be at epoch 0, so
the conflict resolution code is slow to provide a unique config to each
node. For this reason this new command was introduced. It can be called
only when a node is totally fresh: no other nodes known, and configEpoch
set to zero, so it is safe even against misuses.

redis-trib will use the new command in order to start the cluster
already setting an incremental unique config to every node.
2014-04-29 19:15:16 +02:00
antirez
0bcc7cb4bf CLIENT LIST speedup via peerid caching + smart allocation.
This commit adds peer ID caching in the client structure plus an API
change and the use of sdsMakeRoomFor() in order to improve the
reallocation pattern to generate the CLIENT LIST output.

Both the changes account for a very significant speedup.
2014-04-28 17:36:57 +02:00
antirez
f9a4a80f49 Use sdscatfmt() in getClientInfoString() to make it faster. 2014-04-28 16:55:43 +02:00
antirez
2d76736a2e Added new sdscatfmt() %u and %U format specifiers.
This commit also fixes a bug in the implementation of sdscatfmt()
resulting from stale references to the SDS string header after
sdsMakeRoomFor() calls.
2014-04-28 16:38:17 +02:00
antirez
53575c4708 sdscatfmt() added to SDS library.
sdscatprintf() relies on printf() family libc functions and is sometimes
too slow in critical code paths. sdscatfmt() is an alternative which is:

1) Far less capable.
2) Format specifier incompatible.
3) Faster.

It is suitable to be used in those speed critical code paths such as
CLIENT LIST output generation.
2014-04-28 16:23:17 +02:00
antirez
e29d330724 Process events with processEventsWhileBlocked() when blocked.
When we are blocked and a few events are processed from time to time, it
is smarter to call the event handler a few times in order to handle the
accept, read, write, close cycle of a client in a single pass, otherwise
there is too much latency added for clients to receive a reply while the
server is busy in some way (for example during the DB loading).
2014-04-24 21:44:32 +02:00
antirez
3a3458ee7b Accept multiple clients per iteration.
When the listening sockets readable event is fired, we have the chance
to accept multiple clients instead of accepting a single one. This makes
Redis more responsive when there is a mass-connect event (for example
after the server startup), and in workloads where a connect-disconnect
pattern is used often, so that multiple clients are waiting to be
accepted continuously.

As a side effect, this commit makes the LOADING, BUSY, and similar
errors much faster to deliver to the client, making Redis more
responsive when it has to return errors to inform the clients that the
server is blocked in a non-interruptible operation.
2014-04-24 21:44:32 +02:00
antirez
cac4bae11a AE_ERR -> ANET_ERR in acceptUnixHandler().
No actual changes since the value is the same.
2014-04-24 21:43:22 +02:00
antirez
7d9b45b4a1 While ANET_ERR is -1, check syscall retval for -1 itself. 2014-04-24 17:03:07 +02:00
antirez
e3cf812c9e clusterLoadConfig() REDIS_ERR retval semantics refined.
We should return REDIS_ERR to signal we can't read the configuration
because there is no config file only after checking errno, otherwise
we risk to rewrite an existing file that was not accessible for some
other reason.
2014-04-24 16:23:03 +02:00
antirez
db06108bc1 Lock nodes.conf to avoid multiple processes using the same file.
This was a common source of problems among users.
The solution adopted is not bullet-proof as if the user deletes the
nodes.conf file manually, and starts a new instance with the same
nodes.conf file path, two instances will use the same file. However
following this reasoning the user may drop a nuclear bomb into the
datacenter as well.
2014-04-24 16:04:10 +02:00
Salvatore Sanfilippo
32c917964e Merge pull request #1677 from mattsta/expire-before-delete
Check key expiration before deleting
2014-04-23 16:13:49 +02:00
Glauber Costa
7dd4432798 fix null pointer access with no file pointer
I happen to be working on a system that lacks urandom. While the code does try
to handle this case and artificially create some bytes if the file pointer is
empty, it does try to close it unconditionally, leading to a segfault.
2014-04-23 12:07:25 +02:00
Salvatore Sanfilippo
e0918a332d Merge pull request #1701 from kingsumos/node_description
fix cluster node description showing wrong slot allocation
2014-04-23 11:37:47 +02:00
antirez
cb4e2ee9e7 Missing return REDIS_ERR added to processMultibulkBuffer().
When we set a protocol error we should return with REDIS_ERR to let the
caller know it should stop processing the client.

Bug found during a code audit related to issue #1699.
2014-04-23 10:19:43 +02:00
kingsumos
a69178fdd2 fix cluster node description showing wrong slot allocation 2014-04-22 11:44:53 -04:00
antirez
20c040d364 redis-cli help.h updated. 2014-04-22 16:14:38 +02:00
antirez
ab3afe2f4d ZREMRANGEBYLEX memory leak removed calling zslFreeLexRange(). 2014-04-18 13:01:04 +02:00
antirez
5eb7ac0c92 Speedup hllRawSum() processing 8 bytes per iteration.
The internal HLL raw encoding used by PFCOUNT when merging multiple keys
is aligned to 8 bits (1 byte per register) so we can exploit this to
improve performance by processing multiple bytes per iteration.

In benchmarks the new code was several times faster with HLLs with many
registers set to zero, while no slowdown was observed with populated
HLLs.
2014-04-17 18:05:27 +02:00
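A rough sketch of the idea, assuming the raw encoding is a plain byte string with one register per byte as described above. The function name and the lookup table are illustrative and do not reproduce the C implementation: an all-zero group of 8 registers is detected with one comparison, and only mixed groups are examined register by register.

```python
# Illustrative table of 2^-r for every possible register value.
PE = [1.0 / (1 << r) for r in range(64)]
ZERO8 = bytes(8)


def raw_sum(registers):
    """Return (sum of 2^-reg over all registers, number of zero registers)."""
    e = 0.0
    ez = 0
    for off in range(0, len(registers), 8):
        chunk = registers[off:off + 8]
        if chunk == ZERO8:
            # Fast path: eight zero registers handled in one step.
            ez += 8
            continue
        for r in chunk:
            if r == 0:
                ez += 1
            else:
                e += PE[r]
    # Every zero register contributes 2^-0 == 1, so add them at the end.
    return e + ez, ez
```

In Python the comparison against `ZERO8` stands in for the single 64-bit word test that a C implementation can use.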
antirez
192a213274 Speedup SUM(2^-reg[m]) in HyperLogLog computation.
When the register is set to zero, we need to add 2^-0 to E, which is 1,
but it is faster to just add 'ez' at the end, which is the number of
registers set to zero, a value we need to compute anyway.
2014-04-17 17:53:20 +02:00
antirez
0feb2aabca PFCOUNT support for multi-key union. 2014-04-17 17:32:59 +02:00
antirez
fcd2155b6f HyperLogLog low level merge extracted from PFMERGE. 2014-04-17 17:08:43 +02:00
antirez
78954ca3a2 ZREMRANGEBYLEX implemented. 2014-04-17 14:49:25 +02:00
antirez
8827dc4eec Always pass sorted set range objects by reference. 2014-04-17 14:30:12 +02:00
antirez
95098b7230 ZREMRANGE* commands refactored into a single generic function. 2014-04-17 14:19:14 +02:00
antirez
bcab07f7fc Pass by pointer and release of lex ranges.
Given that the code was written with a 2 years pause... something
strange happened in the middle. So there was no function to free a
lex range min/max objects, and in some places the range was passed by
value.
2014-04-16 23:55:58 +02:00
antirez
8b5e0b213e ZLEXCOUNT implemented.
Like ZCOUNT for lexicographical ranges.
2014-04-16 12:17:00 +02:00
antirez
8e8f8189eb HyperLogLog invalid representation error code set to INVALIDOBJ. 2014-04-16 09:10:30 +02:00
antirez
0bbdaca6a0 PFDEBUG TODENSE added.
Converts HyperLogLogs from sparse to dense. Used for testing.
2014-04-16 09:05:42 +02:00
antirez
402110f9fd User-defined switch point between sparse-dense HLL encodings. 2014-04-15 17:46:51 +02:00
antirez
d541f65d66 PFSELFTEST improved with sparse encoding checks. 2014-04-15 10:10:38 +02:00
antirez
dde8dff73f PFDEBUG ENCODING added. 2014-04-14 19:35:00 +02:00
antirez
54f0156e8c Set HLL_SPARSE_MAX to 3000.
After running a few benchmarks, 3000 looks like a reasonable value to
keep HLLs with a few thousand elements small while the CPU cost is
still not huge.

This covers all the cases where the dense representation would use N
orders of magnitude more space, like in the case of many HLLs with
cardinality of a few tens or hundreds.

It is not impossible that in the future this gets user configurable,
however it is easy to pick an unreasonable value just looking at savings
in the space dimension without checking what happens in the time
dimension.
2014-04-14 16:15:55 +02:00
antirez
848d0461f9 Error message for invalid HLL objects unified. 2014-04-14 16:11:54 +02:00
antirez
81ceef7d22 PFMERGE fixed to work with sparse encoding. 2014-04-14 16:09:32 +02:00
antirez
9df77fc0c4 Mark PFDEBUG as write command in the commands table.
It is safer since it is able to have side effects.
2014-04-14 15:57:50 +02:00
antirez
3bc35f9ce9 Correctly replicate PFDEBUG GETREG.
Even if it is a debugging command, make sure that when it forces a
change in encoding, the command is propagated.
2014-04-14 15:57:19 +02:00
antirez
ba0afb4566 Added assertion in hllSparseAdd() when promotion to dense occurs.
If we converted to dense, a register must be updated in the dense
representation.
2014-04-14 15:55:21 +02:00
antirez
e9cd51c7eb hllSparseAdd(): speed optimization.
Mostly by reordering opcodes check conditional by frequency of opcodes
in larger sparse-encoded HLLs.
2014-04-14 15:42:05 +02:00
antirez
681bf7468b Detect corrupted sparse HLLs in hllSparseSum(). 2014-04-14 15:20:26 +02:00
antirez
db40da0a47 hllSparseAdd(): faster code removing conditional.
Bottleneck found profiling. Big run time improvement found when testing
after the change.
2014-04-14 12:58:46 +02:00
antirez
4e0a99ba51 Comment typo in hllSparseAdd(). first -> fits. 2014-04-14 12:12:53 +02:00
antirez
5532b5308a Merge adjacent VAL opcodes in hllSparseAdd().
As more values are added splitting ZERO or XZERO opcodes, try to merge
adjacent VAL opcodes if they have the same value.
2014-04-14 12:11:39 +02:00
antirez
837ca39081 More robust HLL_SPARSE macros protecting 'p' with parens.
Now the macros will work with arguments such as "ptr+1".
2014-04-14 11:49:53 +02:00
antirez
142d133c8a hllSparseAdd() opcode seek stop condition fixed. 2014-04-14 11:04:11 +02:00
antirez
1ee18db922 Fixed error message generation in PFDEBUG GETREG.
Bulk length for registers was emitted too early, so if there was a bug
the reply looked like a long array with just one element, blocking the
client as a result.
2014-04-14 10:25:19 +02:00
antirez
82c31f750d Fixed memmove() count in hllSparseAdd(). 2014-04-14 09:40:07 +02:00
antirez
3b20003503 hllSparseAdd(): more correct dense conversion conditional.
We want to promote if the total string size exceeds the resulting size
after the upgrade.
2014-04-14 09:36:32 +02:00
antirez
b7571b7453 hllSparseToDense(): sanity check added.
The function checks if all the HLL_REGISTERS were processed during the
conversion from sparse to dense encoding, returning REDIS_OK or
REDIS_ERR to signal a corruption problem.

A bug in PFDEBUG GETREG was fixed: when the object is converted to the
dense representation we need to reassign the new pointer to the header
structure pointer.
2014-04-14 09:27:01 +02:00
antirez
f9dc3cb04d PFDEBUG DECODE added.
Provides a human readable description of the opcodes composing a
run-length encoded HLL (sparse encoding).
The command is only useful for debugging / development tasks.
2014-04-14 09:00:53 +02:00
antirez
261da523e8 PFDEBUG added, PFGETREG removed.
PFDEBUG will be the interface to do debugging tasks with a key
containing an HLL object.
2014-04-13 23:01:21 +02:00
antirez
e8e717e145 hllSparseToDense API changed to take ref to object.
The new API takes directly the object doing everything needed to
turn it into a dense representation, including setting the new
representation as object->ptr.
2014-04-13 22:59:27 +02:00
antirez
2067644a8c hllSparseAdd() sanity check for span != 0 added. 2014-04-13 10:19:12 +02:00
antirez
80140fa006 Fix hllSparseAdd() new sequence replacement when next is NULL.
sdsIncrLen() must be called anyway even if we are replacing the last
opcode of the sparse representation.
2014-04-12 23:55:44 +02:00
antirez
3c3c16561a Fix seqlen computation in hllSparseAdd(). 2014-04-12 23:52:36 +02:00
antirez
a9e057e095 Abstract hllSparseAdd() / hllDenseAdd() via hllAdd(). 2014-04-12 23:42:56 +02:00
antirez
0b7d08efb9 hllSparseSum(): multiply 1 * runlen for zero entries. 2014-04-12 16:47:50 +02:00
antirez
d9314079ca Macro HLL_SPARSE_XZERO_LEN fixed. 2014-04-12 16:46:08 +02:00
antirez
f5c03044a6 Fix HLL sparse object creation #2.
Two vars initialized to wrong values in createHLLObject().
2014-04-12 16:37:50 +02:00
antirez
b5659cb0a6 Increment pointer while iterating sparse HLL object. 2014-04-12 11:02:14 +02:00
antirez
1ccb661569 Fix HLL sparse object creation.
The function didn't consider the fact that each XZERO opcode is
two bytes.
2014-04-12 10:59:12 +02:00
antirez
a79386b1af Create HyperLogLog objects with sparse encoding. 2014-04-12 10:56:18 +02:00
antirez
1fc04a6221 HyperLogLog sparse to dense conversion function. 2014-04-12 10:55:42 +02:00
antirez
c756936b1d HyperLogLog sparse representation initial implementation.
Code never tested, but the basic layout is shaped in this commit.
Also missing:

1) Sparse -> Dense conversion function.
2) New HLL object creation using the sparse representation.
3) Implementation of PFMERGE for the sparse representation.
2014-04-11 17:34:32 +02:00
antirez
8ea5b46d30 hllCount() refactored to support multiple representations. 2014-04-11 10:25:07 +02:00
antirez
1efc1e052d hllAdd() refactored into two functions.
Also dense representation access macro renamed accordingly.
2014-04-11 09:47:52 +02:00
antirez
d55474e558 HyperLogLog refactoring to support different encodings.
Metadata are now placed at the start of the representation as a header.
There is a proper structure to access the representation.
Still work to do in order to truly abstract the implementation from the
representation, commands still work assuming dense representation.
2014-04-11 09:26:45 +02:00
Matt Stancliff
83d2830372 Check key expiration before deleting
Deleting an expired key should return 0, not success.

Fixes #1648
2014-04-10 17:08:02 -04:00
antirez
9c037ba85f HyperLogLog sparse representation slightly modified.
After running a few simulations with different alternative encodings,
it was found that the VAL opcode performs better using 5 bits for the
value and 2 bits for the run length, at least for cardinalities in the
range of interest.
2014-04-10 16:36:31 +02:00
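A minimal sketch of a VAL opcode with 5 bits for the value and 2 bits for the run length. The exact bit layout used here (top bit set as the opcode flag, value and run length each stored minus one) is an assumption made for illustration; the commit message only fixes the two field widths.

```python
def encode_val(value, runlen):
    """Pack a register value (1..32) and run length (1..4) into one byte."""
    if not 1 <= value <= 32:
        raise ValueError("value must fit in 5 bits after subtracting 1")
    if not 1 <= runlen <= 4:
        raise ValueError("run length must fit in 2 bits after subtracting 1")
    # Assumed layout: 1vvvvvll
    return 0x80 | ((value - 1) << 2) | (runlen - 1)


def decode_val(opcode):
    """Inverse of encode_val(): return (value, runlen)."""
    return ((opcode >> 2) & 0x1F) + 1, (opcode & 0x03) + 1
```

With this layout a run longer than 4 registers, or a value above 32, cannot be expressed in a single opcode.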
antirez
da2fbcf93d HyperLogLog sparse representation description and macros. 2014-04-09 18:56:00 +02:00
antirez
67bb2c46b2 Add casting to match printf format.
adjustOpenFilesLimit() and clusterUpdateSlotsWithConfig() were
assuming uint64_t is the same as unsigned long long, which is true
probably for all the systems out there that we target, but still GCC
emitted a warning since technically they are two different types.
2014-04-07 08:58:06 +02:00
antirez
3a6a1e42f1 ZRANGEBYLEX and ZREVRANGEBYLEX implementation. 2014-04-05 11:41:43 +02:00
antirez
d5be696db8 PFCOUNT: always unshare/decode the object.
This will be a no-op most of the time since the object will already be
unshared / decoded, however it is more technically correct to start this
way since the object may be decoded even in the read-only code path.
2014-04-04 17:25:55 +02:00
antirez
1c12bcbcfb tryObjectEncoding() refactoring.
We also avoid to re-create an object that is already in EMBSTR encoding.
2014-04-04 17:25:35 +02:00
antirez
433ce7f85c Changed HyperLogLog hash seed to a non-zero value.
Using a seed of zero has the side effect of having the empty string
hashing to what is a very special case in the context of HyperLogLog: a
very long run of zeroes.

This did not influence the correctness of the result with 16k registers
because of the harmonic mean, but still it is inconvenient that such an
obvious value maps to such a special hash.

The seed 0xadc83b19 is used instead, which is the first 32 bits of the
SHA1 of an empty line (a single newline character).

Reference: issue #1657.
2014-04-04 09:36:32 +02:00
antirez
d2ca4bb62d Return "WRONGTYPE" error on PF* type mismatch. 2014-04-03 22:10:20 +02:00
antirez
349c978189 Fix PFADD infinite loop.
We need to guarantee that the last bit is 1, otherwise an element may
hash to just zeroes with probability 1/(2^64) and trigger an infinite
loop.

See issue #1657.
2014-04-03 19:31:26 +02:00
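The fix can be sketched as follows, assuming P=14 (16384 registers, as used elsewhere in this log) and a 64-bit hash. The function name is hypothetical. Forcing the top bit to 1 acts as a sentinel, so the scan for the first set bit always terminates even when every remaining bit of the hash is zero.

```python
P = 14               # assumed number of index bits
M = 1 << P           # number of registers
TOP_BIT = 1 << 63    # sentinel bit forced to 1


def index_and_count(hash64):
    """Return (register index, length of the zero run including the final 1)."""
    index = hash64 & (M - 1)
    h = hash64 | TOP_BIT      # without this, an all-zero hash never ends the loop
    bit = M                   # first bit after the index bits
    count = 1
    while (h & bit) == 0:
        count += 1
        bit <<= 1
    return index, count


print(index_and_count(0))  # prints (0, 50)
```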
antirez
ce637b2fef Remove HyperLogLog type checking duplicated code. 2014-04-03 13:20:34 +02:00
antirez
aaaed66c56 PFGETREG added for testing purposes.
The new command allows getting a dump of the registers stored
in a HyperLogLog data structure for testing / debugging purposes.
2014-04-03 10:45:30 +02:00
antirez
9682295f68 PFCOUNT: unshare the object when cached cardinality is modified. 2014-04-03 10:37:32 +02:00
antirez
be9860d0e9 PFSELFTEST improved to test the approximation error. 2014-04-03 10:18:31 +02:00
antirez
096b5e921e HyperLogLog: added magic / version.
This will allow future changes like compressed representations.
Currently the magic is not checked for performance reasons but this may
change in the future, for example if we add new types encoded in strings
that may have the same size of HyperLogLogs.
2014-04-02 09:58:47 +02:00
Raymond Myers
bf066c875f Fixed pfadd/pfcount commands emitting hll* events instead of pf* events 2014-04-01 14:59:13 -07:00
Raymond Myers
f0868e080d Change HLL* to PF* in error messages 2014-04-01 14:54:31 -07:00
antirez
4ab162a559 Include redis.h before other stuff in hyperloglog.c.
Otherwise fmacros.h is included later and this may break compilation on
different systems.
2014-04-01 15:52:15 +02:00
antirez
5afcca34ce HyperLogLog API prefix modified from "P" to "PF".
Using both the initials of Philippe Flajolet instead of just "P".
2014-03-31 22:48:01 +02:00
antirez
ba4e20835a Makefile.dep updated with hyperloglog.o deps. 2014-03-31 19:51:34 +02:00
antirez
e887c62e45 HyperLogLog: make API use the P prefix in honor of Philippe Flajolet. 2014-03-31 19:29:40 +02:00
antirez
f1b7608128 HLLMERGE fixed by adding a... missing loop! 2014-03-31 16:03:05 +02:00
antirez
ec1ee66256 HyperLogLog apply bias correction using a polynomial.
Better results can be achieved by compensating for the bias of the raw
approximation just after 2.5m (when LINEARCOUNTING is no longer used) by
using a polynomial that approximates the bias at a given cardinality.

The curve used was found using this web page:

    http://www.xuru.org/rt/PR.asp

That performs polynomial regression given a set of values.
2014-03-31 15:41:38 +02:00
antirez
f2277475b2 HLLMERGE implemented.
Merge N HLL data structures by selecting the max value for every
M[i] register among the set of HLLs.
2014-03-31 14:39:44 +02:00
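The merge rule described above can be sketched in a few lines, assuming each HLL is given as a plain list of register values of equal length. This ignores the encoding and the cached cardinality that the real data structure carries.

```python
def merge_registers(hlls):
    """Merge N register arrays by taking the per-register maximum."""
    if not hlls:
        raise ValueError("at least one HLL is required")
    if len({len(regs) for regs in hlls}) != 1:
        raise ValueError("all HLLs must have the same number of registers")
    return [max(column) for column in zip(*hlls)]
```

Because max is commutative, associative and idempotent, merging in any order, or merging the same HLL twice, gives the same registers.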
antirez
4ab45183fc HLLCOUNT is technically a write command
When we update the cached value, we need to propagate the command and
signal the key as modified for WATCH.
2014-03-31 12:29:24 +02:00
antirez
8aeb0c196a HLLADD: propagate write when only variable name is given.
The following form is given:

    HLLADD myhll

No element is provided in the above case so if 'myhll' var does not
exist the result is to just create an empty HLL structure, and no update
will be performed on the registers.

In this case, the DB should still be set dirty and the command
propagated.
2014-03-31 12:21:08 +02:00
antirez
60e60f4ee0 HyperLogLog: use LINEARCOUNTING up to 3m.
The HyperLogLog original paper suggests using LINEARCOUNTING for
cardinalities < 2.5m, however for P=14 the median / max error
curves show that a value of '3' is the best pick for m = 16384.
2014-03-31 10:09:55 +02:00
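A sketch of the switch between the two estimators, assuming the standard LINEARCOUNTING formula m * ln(m / ez), where ez is the number of registers still at zero. The function names and the way the raw estimate is passed in are illustrative only.

```python
import math

M = 16384          # number of registers for P=14
THRESHOLD = 3.0    # multiple of m below which LINEARCOUNTING is used


def linear_counting(m, ez):
    """LINEARCOUNTING estimate from the number of zero registers."""
    return m * math.log(m / ez)


def estimate(raw_estimate, ez, m=M, threshold=THRESHOLD):
    """Pick LINEARCOUNTING for small cardinalities, the raw estimate otherwise."""
    if raw_estimate < m * threshold and ez != 0:
        return linear_counting(m, ez)
    return raw_estimate
```

The `ez != 0` guard matters because the formula is undefined once no register is left at zero.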
antirez
307a189900 HyperLogLog approximated cardinality caching.
The more we add elements to a HyperLogLog counter, the smaller is
the probability that we actually update some register.

From this observation it is easy to see how it is possible to use
caching of a previously computed cardinality and reuse it to serve
HLLCOUNT queries as long as no register was updated in the data
structure.

This commit does exactly this by using just 8 additional bytes in the
data structure to store the cached cardinality as a 64 bit unsigned
integer. When the most significant bit of the 64 bit integer is set,
it means that the value computed is no longer usable since at least a
single register was modified and we need to recompute it at the next
call of HLLCOUNT.

The value is always stored in little endian format regardless of the
actual CPU endianness.
2014-03-30 19:26:16 +02:00
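The caching scheme can be sketched with an 8-byte little endian field whose most significant bit marks the cached value as stale. The helper names are hypothetical, and the sketch does not show where the field sits inside the data structure.

```python
import struct

STALE_BIT = 1 << 63   # most significant bit of the 64 bit field


def store_cardinality(card):
    """Encode a freshly computed cardinality as 8 little endian bytes."""
    return struct.pack("<Q", card)


def invalidate(field):
    """Mark the cached value as unusable after a register changed."""
    (value,) = struct.unpack("<Q", field)
    return struct.pack("<Q", value | STALE_BIT)


def load_cardinality(field):
    """Return the cached cardinality, or None if it must be recomputed."""
    (value,) = struct.unpack("<Q", field)
    return None if value & STALE_BIT else value
```

Using an explicit `<` in the format string keeps the byte order fixed regardless of the CPU the code runs on.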
antirez
543ede03f2 String value unsharing refactored into proper function.
All the Redis functions that need to modify the string value of a key in
a destructive way (APPEND, SETBIT, SETRANGE, ...) need to make the
object unshared (if refcount > 1) and encoded in raw format (if encoding
is not already REDIS_ENCODING_RAW).

This was cut & pasted many times in multiple places of the code. This
commit puts the small logic needed into a function called
dbUnshareStringValue().
2014-03-30 18:32:17 +02:00
antirez
aaf6db459b Use endian neutral hash function for HyperLogLog.
We need to be sure that you can save a dataset in a Redis instance,
reload it in a different architecture, and continue to count in the same
HyperLogLog structure.

So 32 and 64 bit, little or big endian, must all guarantee to output the
same hash for the same element.
2014-03-30 00:55:49 +01:00
antirez
4628ac0065 HyperLogLog internal representation modified.
The new representation is more obvious, starting from the LSB of the
first byte and using bits going to MSB, and passing to next byte as
needed.

There was also a subtle error: first two bits were unused, everything
was carried over on the right of two bits, even if it worked because of
the code requirement of always having a byte more at the end.

During the rewrite the code was made safer trying to avoid undefined
behavior due to shifting a uint8_t for more than 8 bits.
2014-03-29 16:04:27 +01:00
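A sketch of the layout described above, assuming 6-bit registers (the register width is not stated in this commit message). Register 0 occupies the low bits of the first byte, and a register that does not fit continues in the low bits of the next byte. The helpers read two bytes at a time, which is why the buffer needs one spare byte at the end.

```python
BITS = 6                      # assumed register width
REG_MAX = (1 << BITS) - 1


def new_registers(count):
    """Allocate a zeroed buffer with one extra trailing byte."""
    return bytearray((count * BITS + 7) // 8 + 1)


def get_register(buf, i):
    byte, shift = divmod(i * BITS, 8)
    word = buf[byte] | (buf[byte + 1] << 8)   # two adjacent bytes, LSB first
    return (word >> shift) & REG_MAX


def set_register(buf, i, value):
    byte, shift = divmod(i * BITS, 8)
    word = buf[byte] | (buf[byte + 1] << 8)
    word &= ~(REG_MAX << shift)               # clear the old 6 bits
    word |= (value & REG_MAX) << shift        # write the new value
    buf[byte] = word & 0xFF
    buf[byte + 1] = (word >> 8) & 0xFF
```

Python integers are unbounded, so the shift hazards mentioned in the commit do not arise here; in C the intermediate values would need a type wider than 8 bits.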
antirez
5317a582cf Remove a few useless operations from hllCount() fast path. 2014-03-29 12:17:56 +01:00
antirez
3ed947fb30 HLLCOUNT 3x faster taking fast path for default params. 2014-03-29 12:12:44 +01:00
antirez
28dce36f76 Use processor base types in HLL_(GET|SET)_REGISTER.
This speeds up the macros by a noticeable factor.
2014-03-29 08:37:01 +01:00
antirez
ac8fbe8829 HyperLogLog: use precomputed table for 2^(-M[i]). 2014-03-28 22:49:24 +01:00
antirez
f90a4af3d7 HyperLogLog algorithm fixed in two ways.
There was an error in the computation of 2^register, and the sequence of
zeroes computed after the hashing did not include the "1".
2014-03-28 18:24:05 +01:00
antirez
ded86076b3 HLLCOUNT implemented. 2014-03-28 17:37:18 +01:00
antirez
156929ee97 HLLADD implemented. 2014-03-28 16:24:35 +01:00
antirez
5660ff1cc1 hllAdd() low level HyperLogLog "add" implemented. 2014-03-28 14:42:30 +01:00
antirez
e3234116ad HyperLogLog: redefine constants using "P". 2014-03-28 14:09:28 +01:00
antirez
e73839e7d5 HLL_SET_REGISTER fixed.
There was an error in the first version of the macro.
Now the HLLSELFTEST test reports success.
2014-03-28 13:56:07 +01:00
antirez
f22397dd7f Use REDIS_HLL_REGISTER_MAX when possible. 2014-03-28 12:16:39 +01:00
antirez
1c88c5941b HLL_(SET|GET)_REGISTER types fixed. 2014-03-28 12:15:46 +01:00
antirez
552eb5407a HLLSELFTEST command implemented.
To test the bitfield array of counters set/get macros from the Redis Tcl
suite is hard, so a specialized command that is able to test the
internals was developed.
2014-03-28 12:11:55 +01:00
antirez
0609380603 HyperLogLog: initial sketch of registers access. 2014-03-28 11:18:48 +01:00
antirez
8f52173b2c Cluster: last_vote_epoch -> lastVoteEpoch.
Use camel case for epochs that are persisted on disk.
2014-03-27 15:01:24 +01:00
antirez
7fb14b73ba Cluster: save/restore vars that must persist after recovery.
This fixes issue #1479.
2014-03-27 14:56:29 +01:00
antirez
6dd2dbbd36 Cluster: handshake "already known" error logged to VERBOSE.
This is not really an error but something that always happens for
example when creating a new cluster, or if the sysadmin rejoins manually
a node that is already known.

Since useless logs don't help, moved to VERBOSE level.
2014-03-26 16:35:38 +01:00
antirez
3cf6f1f54f Cluster: clusterHandleConfigEpochCollision() fixed.
New config epochs must always be obtained incrementing the currentEpoch,
that is itself guaranteed to be >= the max configEpoch currently known
to the node.
2014-03-26 12:31:28 +01:00
antirez
80d4c52cdf Cluster: better logging for clusterUpdateSlotsConfigWith(). 2014-03-26 12:09:38 +01:00
antirez
eb746ec408 Cluster: CLUSTER SETSLOT implementation comment updated.
Update the comment since the implementation details changed.
2014-03-25 17:50:46 +01:00
antirez
0064b1a583 Cluster: redis-trib cluster allocation more even across nodes.
redis-trib used to allocate slots not considering fractions of nodes
when computing the slots_per_node amount. So the fractional part was
carried over till the end of the allocation, where the last node
received a few more slots than any other (or a lot more if the cluster
was composed of many nodes).

The computation was changed to allocate slots more evenly when they are
not exactly divisible by the number of masters we have.
2014-03-25 17:44:39 +01:00
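One way to spread the remainder evenly is to advance a fractional cursor and round it at each node boundary. The sketch below illustrates that idea and is not the redis-trib code (which is written in Ruby); the function name is hypothetical and the total of 16384 slots is the usual Redis Cluster slot count.

```python
def allocate_slots(total_slots, masters):
    """Return a list of (first, last) inclusive slot ranges, one per master."""
    per_node = total_slots / masters   # fractional slots per node
    ranges = []
    cursor = 0.0
    first = 0
    for i in range(masters):
        cursor += per_node
        if i == masters - 1:
            last = total_slots - 1     # the last node takes whatever is left
        else:
            last = round(cursor) - 1
        ranges.append((first, last))
        first = last + 1
    return ranges


sizes = [last - first + 1 for first, last in allocate_slots(16384, 7)]
print(sizes)  # prints [2341, 2340, 2341, 2340, 2341, 2340, 2341]
```

With plain integer division the same 7 masters would get 2340 slots each, and the last one 2344.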
antirez
6c527a89a0 Cluster: configEpoch collisions resolution.
The slave election in Redis Cluster guarantees that slaves promoted to
masters always end with unique config epochs, however failures during
manual reshardings, software bugs and operational errors may in theory
cause two nodes to have the same configEpoch.

This commit introduces a mechanism to eventually always end with different
configEpochs if a collision ever happens.

As a (wanted) side effect, this also ensures that after a new cluster
is created, all nodes will end with a different configEpoch automatically.
2014-03-25 17:19:58 +01:00
antirez
c1041c570f Cluster: stay within 80 cols. 2014-03-25 16:07:14 +01:00
antirez
6540e9eeaa Fix off by one bug in freeMemoryIfNeeded() eviction pool.
Bug found by the continuous integration test running Redis
under valgrind:

==6245== Invalid read of size 8
==6245==    at 0x4C2DEEF: memcpy@GLIBC_2.2.5 (mc_replace_strmem.c:876)
==6245==    by 0x41F9E6: freeMemoryIfNeeded (redis.c:3010)
==6245==    by 0x41D2CC: processCommand (redis.c:2069)

memmove() size argument was accounting for an extra element, going
outside the bounds of the array.
2014-03-25 10:32:15 +01:00
antirez
6e33c908dd adjustOpenFilesLimit() refactoring.
In this commit:
* Decrement steps are semantically differentiated from the reserved FDs.
  Previously both values were 32 but the meaning was different.
* Make it clear that we save setrlimit errno.
* Don't explicitly handle wrapping of 'f', but prevent it from
  happening.
* Add comments to make the function flow more readable.

This integrates PR #1630
2014-03-25 09:05:28 +01:00
Salvatore Sanfilippo
72c5ebcba4 Merge pull request #1630 from mattsta/fix-infinite-loop-ulimit
Fix infinite loop ulimit
2014-03-25 08:42:39 +01:00
antirez
35667d75c3 Fixed undefined variable value with certain code paths.
In sentinelFlushConfig() fd could be undefined when the following if
statement was true:

        if (rewrite_status == -1) goto werr;

This could cause random file descriptors to get closed.
2014-03-24 21:07:44 +01:00
Matt Stancliff
78782ed59f Use LRU_CLOCK() instead of function getLRUClock()
lookupKey() uses LRU_CLOCK(), so it seems object
creation should use LRU_CLOCK() too.
2014-03-24 14:39:26 -04:00
Matt Stancliff
4290455145 Sentinel: Notify user when config can't be saved 2014-03-24 13:54:14 -04:00
Matt Stancliff
b47b343fab Fix data loss when save AOF/RDB with no free space
Previously, the (!fp) check would only catch lack of free space
under OS X.  Linux does not discover it can't write until
it actually writes contents to disk.

(fwrite() returns success even if the underlying file
has no free space to write into.  All the errors
only show up at flush/sync/close time.)

Fixes antirez/redis#1604
2014-03-24 13:54:14 -04:00
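The checks described in b47b343fab can be sketched like this. It is a minimal example, assuming a hypothetical `save_buffer()` helper rather than the actual RDB/AOF code: because buffered writes may report success on a full disk, the result of every stage (write, flush, sync, close) is checked, and the partial file is removed on failure.

```c
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: write 'len' bytes to 'path'. Returns 0 only if
 * every stage succeeded, -1 otherwise (removing the partial file). */
int save_buffer(const char *path, const void *data, size_t len) {
    FILE *fp = fopen(path, "w");
    if (!fp) return -1;

    if (len > 0 && fwrite(data, len, 1, fp) != 1) goto werr;
    /* Errors such as ENOSPC may only surface from here on. */
    if (fflush(fp) == EOF) goto werr;
    if (fsync(fileno(fp)) == -1) goto werr;
    if (fclose(fp) == EOF) {
        unlink(path);   /* fp is no longer usable after fclose() */
        return -1;
    }
    return 0;

werr:
    fclose(fp);
    unlink(path);
    return -1;
}
```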
Salvatore Sanfilippo
906c4d77c0 Merge pull request #1617 from mattsta/remove-unused-warning
Cluster: remove variable causing warning
2014-03-24 18:33:22 +01:00
Salvatore Sanfilippo
8e6625e6ae Merge pull request #1629 from mattsta/fix-trib-master-assignment
Cluster: Restore proper trib master iteration
2014-03-24 18:31:55 +01:00
Salvatore Sanfilippo
a006fcb8a7 Merge pull request #1628 from mattsta/fix-trib-create
Cluster: Fix trib create when masters==replicas
2014-03-24 18:26:17 +01:00
Matt Stancliff
386a46946b Fix potentially incorrect errno usage
errno may be reset by the previous call to redisLog, so capture
the original value for proper error reporting.
2014-03-24 13:21:15 -04:00
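The errno issue addressed in 386a46946b can be sketched as below. `log_line()` is a hypothetical stand-in for redisLog() that deliberately clobbers errno to make the effect visible; `report_failure()` is likewise an illustrative name. The idea is simply to copy errno into a local before calling anything else.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical logger. Like any function doing I/O it may modify errno;
 * here it resets it on purpose to simulate that. */
static void log_line(const char *msg) {
    fprintf(stderr, "%s\n", msg);
    errno = 0;
}

/* Format "<what>: <error text>" into buf and return the original errno. */
int report_failure(const char *what, char *buf, size_t buflen) {
    int saved = errno;   /* capture before any other call */

    log_line(what);
    /* strerror(errno) here would describe the wrong error. */
    snprintf(buf, buflen, "%s: %s", what, strerror(saved));
    return saved;
}
```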
Matt Stancliff
3b54ee6ea4 Add REDIS_MIN_RESERVED_FDS define for open fds
Also update the original REDIS_EVENTLOOP_FDSET_INCR to
include REDIS_MIN_RESERVED_FDS. REDIS_EVENTLOOP_FDSET_INCR
exists to make sure more than (maxclients+RESERVED) entries
are allocated, but we can only guarantee that if we include
the current value of REDIS_MIN_RESERVED_FDS as a minimum
for the INCR size.
2014-03-24 13:15:35 -04:00
Salvatore Sanfilippo
896e15f3e3 Merge pull request #1627 from badboy/lru-fix
Fixed a few typos.
2014-03-24 18:13:39 +01:00
Matt Stancliff
e942f3ce0f Cluster: Restore proper trib master iteration
This got removed in 2e5c394 during a new feature addition.

The prior commit had "break if masters.length == masters_count"
but we are guaranteed to already have that condition met since
otherwise we wouldn't have gotten this far.

Without this break statement, it's possible some masters may
be forgotten and have zero replicas while other masters have
more than their requested number of replicas.

Thanks to carlos for pointing out this regression at:
https://groups.google.com/forum/#!topic/redis-db/_WVVqDw5B7c
2014-03-24 10:17:44 -04:00
Matt Stancliff
df4bdbf688 Cluster: Fix trib create when masters==replicas
This bug was introduced in 2e5c394f during a refactor.

It took me a while to understand what was going on with
the code, so I've refactored it further by:
  - Replacing boolean values with meaningful symbols
  - Replacing 'i' with a meaningful variable name
  - Adding the proper abort check
  - Factoring out now duplicated conditionals
  - Adding optional verbose logging (we're inside *four*
    different looping constructs, so it takes a while to
    figure out where all the moving parts are)
  - Updating comment for the section

This fixes a problem when the number of master instances
equaled the number of replica instances.  Before, when
there were equal numbers of both, nodes_count would go to
zero, but the while loop would spin in i < @replicas because
i would never be updated (because the nodes_list of each ip
was length == 0, which triggered an endless loop of
next -> i = 0 -> 0 < 1? -> true -> next -> i = 0 ...)

Thanks to carlo who found this problem at:
https://groups.google.com/forum/#!topic/redis-db/_WVVqDw5B7c
2014-03-24 10:17:38 -04:00
Matt Stancliff
90b844212d Fix infinite loop on startup if ulimit too low
Fun fact: rlim_t is an unsigned long long on all platforms.

Continually subtracting from a rlim_t makes it get smaller
and smaller until it wraps, then you're up to 2^64-1.

This was causing an infinite loop on Redis startup if
your ulimit was extremely (almost comically) low.

The case of (f > oldlimit) would never be met in a case like:

    f = 150
    while (f > 20) f -= 128

Since f is unsigned, it can't go negative and would
take on values of:

    Iteration 1: 150 - 128 => 22
    Iteration 2:  22 - 128 => 18446744073709551510
    Iterations 3-∞: ...

To catch the wraparound, we use the previous value of f
stored in limit.rlim_cur.  If we subtract from f and
get a larger number than the value it had previously,
we print an error and exit since we don't have enough
file descriptors to help the user at this point.

Thanks to @bs3g for the inspiration to fix this problem.
Patches existed from @bs3g at antirez#1227, but I needed to repair a few other
parts of Redis simultaneously, so I didn't get a chance to use them.
2014-03-24 10:17:33 -04:00
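The wraparound in 90b844212d and the "compare with the previous value" check can be reproduced with fixed-width unsigned integers. This is a sketch under assumptions: `step_down()` and `wrapped_subtract()` are hypothetical helpers, and `uint64_t` stands in for rlim_t.

```c
#include <stdint.h>

/* Plain unsigned subtraction: wraps modulo 2^64 instead of going negative. */
uint64_t wrapped_subtract(uint64_t a, uint64_t b) {
    return a - b;
}

/* Lower 'f' in steps of 'step' until it is no longer above 'floor'.
 * Returns 0 and stores the result in *out, or -1 if a step wrapped. */
int step_down(uint64_t f, uint64_t floor, uint64_t step, uint64_t *out) {
    while (f > floor) {
        uint64_t prev = f;
        f -= step;
        /* After a wrap the "smaller" value is larger than the old one. */
        if (f > prev) return -1;
    }
    *out = f;
    return 0;
}
```

With the values from the commit message, `step_down(150, 20, 128, &out)` reaches 22 on the first iteration and reports the wrap on the second instead of looping.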
Matt Stancliff
4a25983f8f Improve error handling around setting ulimits
The log messages about open file limits have always
been slightly opaque and confusing.  Here's an attempt to
fix their wording, detail, and meaning.  Users will have a
better understanding of how to fix very common problems
with these reworded messages.

Also, we handle a new error case when maxclients becomes less
than one, essentially rendering the server unusable.  We
now exit on startup instead of leaving the user with a server
unable to handle any connections.

This fixes antirez#356 as well.
2014-03-24 10:17:33 -04:00
Matt Stancliff
491532a713 Replace magic 32 with REDIS_EVENTLOOP_FDSET_INCR
32 was the additional number of file descriptors Redis
would reserve when managing a too-low ulimit.  The
number 32 was in too many places statically, so now
we use a macro instead that looks more appropriate.

When Redis sets up the server event loop, it uses:
    server.maxclients+REDIS_EVENTLOOP_FDSET_INCR

So, when reserving file descriptors, it makes sense to
reserve at least REDIS_EVENTLOOP_FDSET_INCR FDs instead
of only 32.  Currently, REDIS_EVENTLOOP_FDSET_INCR is
set to 128 in redis.h.

Also, I replaced the static 128 in the while f < old loop
with REDIS_EVENTLOOP_FDSET_INCR as well, which results
in no change since it was already 128.

Impact: Users now need at least maxclients+128 as
their open file limit instead of maxclients+32 to obtain
actual "maxclients" number of clients.  Redis will carve
the extra REDIS_EVENTLOOP_FDSET_INCR file descriptors it
needs out of the "maxclients" range instead of failing
to start (unless the local ulimit -n is too low to accommodate
the request).
2014-03-24 10:17:33 -04:00
Matt Stancliff
c138631cd1 Fix maxclients error handling
Everywhere in the Redis code base, maxclients is treated
as an int with (int)maxclients or `maxclients = atoi(source)`,
so let's make maxclients an int.

This fixes a bug where someone could specify a negative maxclients
on startup and it would work (as well as set maxclients very high)
because:

    unsigned int maxclients;
    char *update = "-300";
    maxclients = atoi(update);
    if (maxclients < 1) goto fail;

But, (maxclients < 1) can only catch the case when maxclients
is exactly 0.  maxclients happily sets itself to -300, which isn't
-300, but rather 4294966996, which isn't < 1, so... everything
"worked."

maxclients config parsing checks for the case of < 1, but maxclients
CONFIG SET parsing was checking for case of < 0 (allowing
maxclients to be set to 0).  CONFIG SET parsing is now updated to
match config parsing of < 1.

It's tempting to add a MINIMUM_CLIENTS define, but... I didn't.

These changes were inspired by antirez#356, but this doesn't
fix that issue.
2014-03-24 10:17:33 -04:00
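The signedness problem described in c138631cd1 can be checked in isolation. The two functions below are hypothetical and only mirror the snippet in the commit message; the numeric value assumes a 32-bit unsigned int.

```c
#include <stdlib.h>

/* Buggy variant: the negative result of atoi() is converted to a huge
 * unsigned value, so the "< 1" test only rejects exactly 0. */
int accepts_unsigned(const char *value) {
    unsigned int maxclients = (unsigned int)atoi(value);
    if (maxclients < 1) return 0;
    return 1;
}

/* Fixed variant: with a signed int, negative values fail the check. */
int accepts_signed(const char *value) {
    int maxclients = atoi(value);
    if (maxclients < 1) return 0;
    return 1;
}
```

`accepts_unsigned("-300")` returns 1 because -300 converts to 4294966996, while `accepts_signed("-300")` returns 0.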
antirez
93253c2762 Sample and cache RSS in serverCron().
Obtaining the RSS (Resident Set Size) info is slow in Linux and OSX.
This slowed down the generation of the INFO 'memory' section.

Since the RSS does not need to be a real-time measurement, we
now sample it with server.hz frequency (10 times per second by default)
and use this value both to show the INFO rss field and to compute the
fragmentation ratio.

Practically this does not make any difference for memory profiling of
Redis but speeds up the INFO call significantly.
2014-03-24 12:00:20 +01:00
antirez
30639c8ca9 sdscatvprintf(): Try to use a static buffer.
For small content the function now tries to use a static buffer to avoid
a malloc/free cycle that is too costly when the function is used in the
context of performance critical code path such as INFO output generation.

This change was verified to have positive effects in the execution speed
of the INFO command.
2014-03-24 10:20:33 +01:00
antirez
d3efe04c47 Cache uname() output across INFO calls.
Uname was profiled to be a slow syscall. It always produces the same
output in the context of a single execution of Redis, so calling it at
every INFO output generation does not make much sense.

The uname utsname structure was changed to a static variable. At the
same time a static integer was added to check if we need to call uname
the first time.
2014-03-24 10:00:08 +01:00
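The caching scheme described in d3efe04c47 (a static utsname plus a static flag) can be sketched as follows. `cached_uname()` and the `uname_calls` counter are hypothetical; the counter exists only so the example can show that the syscall runs once.

```c
#include <sys/utsname.h>

static int uname_calls = 0;   /* instrumentation for this example only */

/* Return the uname() result, calling the syscall only the first time. */
const struct utsname *cached_uname(void) {
    static struct utsname name;
    static int call_uname = 1;

    if (call_uname) {
        /* The output does not change during one process lifetime. */
        uname(&name);
        uname_calls++;
        call_uname = 0;
    }
    return &name;
}
```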
antirez
a9caca0424 sdscatvprintf(): guess buflen using format length.
sdscatvprintf() uses a loop where it tries to output the formatted
string in a buffer of the initial length, if there was not enough room,
a buffer of doubled size is tried and so forth.

The initial guess for the buffer length was very poor, a hardcoded
"16". This caused the printf to be processed multiple times without a
good reason. Given that printf functions are already not fast, the
overhead was significant.

The new heuristic is to use a buffer 4 times the length of the format
buffer, and 32 as minimal size. This appears to be a good balance for
typical uses of the function inside the Redis code base.

This change made the INFO command 3 times faster.
2014-03-24 09:44:11 +01:00
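The buffer-size heuristic from a9caca0424 (start at 4 times the format length, minimum 32, double on failure) can be sketched with standard C only. This is not the sds.c implementation: `format_alloc()` and `format_alloc_v()` are hypothetical names, and truncation is detected through the vsnprintf() return value, which may differ from how sds.c detects it.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Format into a freshly allocated string. The caller frees the result.
 * Returns NULL on allocation or formatting failure. */
char *format_alloc_v(const char *fmt, va_list ap) {
    size_t buflen = strlen(fmt) * 4;   /* initial guess from the format */
    if (buflen < 32) buflen = 32;      /* minimal size */

    for (;;) {
        char *buf = malloc(buflen);
        if (buf == NULL) return NULL;

        va_list cpy;
        va_copy(cpy, ap);              /* ap may be consumed only once */
        int n = vsnprintf(buf, buflen, fmt, cpy);
        va_end(cpy);

        if (n < 0) { free(buf); return NULL; }
        if ((size_t)n < buflen) return buf;   /* it fit, terminator included */

        free(buf);
        buflen *= 2;                   /* not enough room: retry doubled */
    }
}

char *format_alloc(const char *fmt, ...) {
    va_list ap;
    va_start(ap, fmt);
    char *s = format_alloc_v(fmt, ap);
    va_end(ap);
    return s;
}
```

A better first guess matters because every retry runs the whole printf formatting again.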
antirez
4d2e8fa189 Use getLRUClock() instead of server.lruclock to create objects.
Thanks to Matt Stancliff for noticing this error. It was in the original
code but somehow I managed to remove the change from the commit...
2014-03-21 09:08:20 +01:00
antirez
5fa3248bad The default maxmemory policy is now noeviction.
This is safer: by default maxmemory should just set a memory limit
without any key being deleted, unless the policy is set to something
more relaxed.
2014-03-21 08:03:34 +01:00
Jan-Erik Rediger
4fdd7a0546 Fixed a few typos. 2014-03-20 23:16:38 +01:00
antirez
a98369929e Use 24 bits for the lru object field and improve resolution.
There were 2 spare bits inside the Redis object structure that are now
used in order to enlarge 4x the range of the LRU field.

At the same time the resolution was improved from 10 to 1 second: this
still provides 194 days before the LRU counter overflows (restarting from
zero).

This is not a problem since it only causes lack of eviction precision for
objects not touched for a very long time, and the lack of precision is
only temporary.
2014-03-20 17:56:27 +01:00
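The 194 days figure in a98369929e follows from 2^24 seconds divided by 86400 seconds per day. The sketch below works this out; the helper names are hypothetical, and the 22-bit width of the old field is inferred from the "2 spare bits" remark rather than stated in the commit. It also shows one way to compute elapsed ticks across a single wrap.

```c
#include <stdint.h>

#define LRU_BITS 24
#define LRU_CLOCK_MAX ((UINT32_C(1) << LRU_BITS) - 1)

/* Whole days before a counter of 'bits' bits, ticking once every
 * 'resolution_sec' seconds, restarts from zero. */
unsigned lru_wrap_days(unsigned bits, unsigned resolution_sec) {
    uint64_t ticks = UINT64_C(1) << bits;
    return (unsigned)(ticks * resolution_sec / 86400);
}

/* Ticks elapsed from 'then' to 'now', correct across one wrap because
 * the subtraction is reduced modulo 2^LRU_BITS. */
uint32_t lru_elapsed(uint32_t now, uint32_t then) {
    return (now - then) & LRU_CLOCK_MAX;
}
```

`lru_wrap_days(24, 1)` gives 194, and `lru_wrap_days(22, 10)` gives 485 for the assumed previous layout.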
antirez
f4da796c53 Default LRU samples is now 5. 2014-03-20 17:05:42 +01:00
antirez
c641b670c3 Use new dictGetRandomKeys() API to get samples for eviction.
The eviction quality degrades a bit in my tests, but since the API is
faster, it allows raising the number of samples, and overall is a win.
2014-03-20 16:52:12 +01:00
antirez
82b53c650c struct dictEntry -> dictEntry. 2014-03-20 16:20:37 +01:00
antirez
5317f5e99a Added dictGetRandomKeys() to dict.c: mass get random entries.
This new function is useful to get a number of random entries from a
hash table when we just need to do some sampling without particularly
good distribution.

It just jumps at a random place of the hash table and returns the first
N items encountered by scanning linearly.

The main usefulness of this function is to speed up Redis internal
sampling of the key space, for example for key eviction or expiry.
2014-03-20 15:50:46 +01:00
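The "jump to a random place, then scan linearly" idea from 5317f5e99a can be sketched on a toy table. This is not dict.c: the table here is a flat array of slots that are either NULL or point to a key, chaining and rehashing are ignored, and `sample_linear()` / `sample_random()` are hypothetical names.

```c
#include <stddef.h>
#include <stdlib.h>

/* Collect up to 'count' keys by scanning slots linearly from 'start',
 * wrapping around at most once. Returns the number of keys stored. */
size_t sample_linear(const char **slots, size_t nslots,
                     const char **out, size_t count, size_t start) {
    size_t found = 0;

    if (nslots == 0) return 0;
    for (size_t step = 0; step < nslots && found < count; step++) {
        const char *key = slots[(start + step) % nslots];
        if (key != NULL) out[found++] = key;   /* skip empty slots */
    }
    return found;
}

/* Same, starting from a random slot. The keys returned are neighbours,
 * so the distribution is poor but a single pass is cheap. */
size_t sample_random(const char **slots, size_t nslots,
                     const char **out, size_t count) {
    size_t start = nslots ? (size_t)rand() % nslots : 0;
    return sample_linear(slots, nslots, out, count, start);
}
```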
antirez
22c9cfaf57 LRU eviction pool implementation.
This is an improvement over the previous eviction algorithm where we use
an eviction pool that is persistent across evictions of keys, and gets
populated with the best candidates for evictions found so far.

It allows approximating LRU eviction at a given number of samples
better than the previous algorithm did.
2014-03-20 11:57:29 +01:00
antirez
6d5790d682 Fix OBJECT IDLETIME return value converting to seconds.
estimateObjectIdleTime() returns a value in milliseconds now, so we need
to scale the output of OBJECT IDLETIME to seconds.
2014-03-20 11:55:18 +01:00
antirez
ad6b0f70b2 Obtain LRU clock in a resolution dependent way.
For testing purposes it is handy to have a very high resolution of the
LRU clock, so that scripts running for just a few seconds can be used
to experiment with how the eviction algorithm works.

This commit allows Redis to use the cached LRU clock, or a value
computed on demand, depending on the resolution. So normally we have the
good performance of a precomputed value, and a clock that wraps in many
days using the normal resolution, but if needed, changing a define will
switch behavior to a high-resolution LRU clock.
2014-03-20 11:47:12 +01:00
antirez
1faf82663f Specify lruclock in redisServer structure via REDIS_LRU_BITS.
The padding field was totally useless: removed.
2014-03-20 11:37:27 +01:00
antirez
d77e231682 Specify LRU resolution in milliseconds. 2014-03-20 11:33:25 +01:00
antirez
fe30847016 Set LRU parameters via REDIS_LRU_BITS define. 2014-03-20 11:22:47 +01:00
antirez
e150ec7d0c Unify stats reset for CONFIG RESETSTAT / initServer().
Now CONFIG RESETSTAT makes sure to reset all the fields, and in the
future it will be simpler to avoid missing new fields.
2014-03-19 12:55:49 +01:00
Matt Stancliff
67ed5f00aa Cluster: remove variable causing warning
GCC-4.9 warned about this, but clang didn't.

This commit fixes warning:
sentinel.c: In function 'sentinelReceiveHelloMessages':
sentinel.c:2156:43: warning: variable 'master' set but not used [-Wunused-but-set-variable]
     sentinelRedisInstance *ri = c->data, *master;
2014-03-18 15:35:09 -04:00
antirez
b9e90a70fa Sentinel: sentinelRefreshInstanceInfo() minor refactoring.
Test sentinel.tilt condition on top and return if it is true.
This allows removing the check for the tilt condition in the remaining
code paths of the function.
2014-03-18 15:35:47 +01:00
antirez
218cc5fc39 Sentinel: propagate down-after-ms changes to slaves and sentinels. 2014-03-18 14:37:44 +01:00
antirez
bb6d850160 Sentinel: down-after-milliseconds is not master-specific.
addReplySentinelRedisInstance() modified so that this field is displayed
for all kinds of instances: Sentinels, Masters, Slaves.
2014-03-18 11:21:17 +01:00
antirez
ae0b7680b3 Sentinel failure detection implementation improved.
Failure detection in Sentinel is ping-pong based. It used to work by
remembering the last time a valid PONG reply was received, and checking
if the reception time was too old compared to the current time.

PINGs were sent at a fixed interval of 1 second.

This works in a decent way, but does not scale well when we want to set
very small values of "down-after-milliseconds" (this is the node
timeout basically).

This commit reimplements the failure detection making a number of
changes. Some changes are inspired by Redis Cluster failure detection
code:

* A new last_ping_time field is added in representation of instances.
  If non zero, we have an active ping that was sent at the specified
  time. When a valid reply to ping is received, the field is zeroed
  again.
* last_ping_time is not reset when we reconnect the link or send a new
  ping, so from our point of view it represents the time we started
  waiting for the instance to reply to our pings without receiving a
  reply.
* last_ping_time is now used in order to check if the instance is
  timed out. This means that we can have a node timeout of 100
  milliseconds and yet the system will work well since the new check is
  not bound to the period used to send pings.
* Pings are now sent every second, or more often if the value of
  down-after-milliseconds is less than one second, with a lower limit of
  10 HZ ping frequency.
* Link reconnection code was improved. This is used in order to try to
  reconnect the link when we are at 50% of the node timeout without a
  valid reply received yet. However the old code triggered unnecessary
  reconnections when the node timeout was very small. Now that should be
  ok.

The new code passes the tests but more testing is needed and more unit
tests stressing the failure detector, so currently this is merged only
in the unstable branch.
2014-03-17 18:33:45 +01:00
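The last_ping_time bookkeeping described in ae0b7680b3 can be sketched as a few small functions. This is an illustration, not sentinel.c: the `instance` type and the function names are hypothetical, times are plain millisecond counters, and the 10 HZ bound and the reconnection logic are not modeled.

```c
#include <stdint.h>

/* Hypothetical, reduced view of a monitored instance. */
typedef struct {
    int64_t last_ping_time;  /* 0: no ping pending. Otherwise the time the
                                oldest unanswered ping was sent (ms). */
    int64_t down_after_ms;   /* the down-after-milliseconds setting */
} instance;

/* Sending a new ping does not reset an already pending one. */
void on_ping_sent(instance *i, int64_t now_ms) {
    if (i->last_ping_time == 0) i->last_ping_time = now_ms;
}

/* A valid reply zeroes the field again. */
void on_valid_reply(instance *i) {
    i->last_ping_time = 0;
}

/* The timeout check depends only on how long we have been waiting,
 * not on the period used to send pings. */
int is_down(const instance *i, int64_t now_ms) {
    return i->last_ping_time != 0 &&
           now_ms - i->last_ping_time > i->down_after_ms;
}

/* One second, or less when down-after-milliseconds is smaller. */
int64_t ping_period_ms(const instance *i) {
    return i->down_after_ms < 1000 ? i->down_after_ms : 1000;
}
```

Because the pending time is kept across repeated pings, a 100 millisecond timeout is detected about 100 milliseconds after the first unanswered ping.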
antirez
3a2ff55617 Sentinel: use CLIENT SETNAME when connecting to Redis.
This makes debugging / monitoring of Sentinels simpler since you can
identify sentinels in CLIENT LIST output of Redis instances.
2014-03-15 14:59:23 +01:00
Matt Stancliff
584052ee6b Fix segfault from accessing array out of bounds
argc == 2; argv[2] == crash
2014-03-14 17:38:05 -04:00
antirez
ed813863f0 Sentinel: be safe under crash-recovery assumptions.
Sentinel's main safety argument is that there are no two configurations
for the same master with the same version (configuration epoch).

For this to be true Sentinels must be authorized by a majority.
Additionally Sentinels must do two important things:

* Never vote again for the same epoch.
* Never exchange an old vote for a fresh one.

The first prerequisite, in a crash-recovery system model, requires
persisting the master->leader_epoch on durable storage before replying to
messages. This was not the case.

We also make sure to persist the current epoch in order to never reply
to stale vote requests from other Sentinels, after a recovery.

The configuration is persisted by making use of fsync(). In the context
of this code this is considered a good enough guarantee that after a
restart our durable state is restored; however this may not always be
the case depending on the kind of hardware and operating system used.
2014-03-14 14:58:44 +01:00
antirez
365094028b Sentinel: fake PUBLISH command to receive HELLO messages.
Now the way HELLO messages are received is unified.
Sentinels no longer need to be able to chat via some Redis instance in
order to converge to the higher configuration for a master: they are
able to directly exchange configurations.

Note that this commit does not include the (trivial) change needed to
send HELLO messages to Sentinel instances as well, since by mistake I
committed the change in the previous commit that refactored hello
messages processing into a separate function.
2014-03-14 11:07:42 +01:00
antirez
9dfe426fc8 Sentinel: HELLO processing refactored into sentinelProcessHelloMessage(). 2014-03-14 11:07:42 +01:00
antirez
133fccb03f Cluster: flag the transaction as dirty for the new redirections. 2014-03-13 15:11:53 +01:00
antirez
429aff4ef4 Linenoise updated, multiline mode enabled in redis-cli. 2014-03-13 15:11:08 +01:00