Commit Graph

516 Commits

Author SHA1 Message Date
Matt Stancliff
d830dcb12d Add REDIS_BIND_ADDR access macro
We need to access (bindaddr[0] || NULL) in a few places, so centralize
access with a nice macro.
2014-06-23 11:44:34 +02:00
antirez
e7affd266c New features for CLIENT KILL. 2014-06-16 14:24:28 +02:00
antirez
f26f79ea37 Assign an unique non-repeating ID to each new client.
This will be used by CLIENT KILL and is also a good way to ensure a
given client is still the same across CLIENT LIST calls.

The output of CLIENT LIST was modified to include the new ID, but this
change is considered to be backward compatible as the API does not imply
you can do positional parsing, since each filed as a different name.
2014-06-16 14:22:55 +02:00
antirez
56d26c2380 Client types generalized.
Because of output buffer limits Redis internals had this idea of type of
clients: normal, pubsub, slave. It is possible to set different output
buffer limits for the three kinds of clients.

However all the macros and API were named after output buffer limit
classes, while the idea of a client type is a generic one that can be
reused.

This commit does two things:

1) Rename the API and defines with more general names.
2) Change the class of clients executing the MONITOR command from "slave"
   to "normal".

"2" is a good idea because you want to have very special settings for
slaves, that are not a good idea for MONITOR clients that are instead
normal clients even if they are conceptually slave-alike (since it is a
push protocol).

The backward-compatibility breakage resulting from "2" is considered to
be minimal to care, since MONITOR is a debugging command, and because
anyway this change is not going to break the format or the behavior, but
just when a connection is closed on big output buffer issues.
2014-06-16 10:43:05 +02:00
Salvatore Sanfilippo
c7f93143f6 Merge pull request #1669 from mattsta/blpop-internally-added-keys
Fix blocking operations from missing new lists
2014-06-09 11:37:28 +02:00
antirez
d34c2fa3bb ROLE command added.
The new ROLE command is designed in order to provide a client with
informations about the replication in a fast and easy to use way
compared to the INFO command where the same information is also
available.
2014-06-07 17:27:49 +02:00
antirez
d98fa718e0 Tag every log line with role.
Every log contains, just after the pid, a single character that provides
information about the role of an instance:

S - Slave
M - Master
C - Writing child
X - Sentinel
2014-05-22 18:48:37 +02:00
antirez
39603a7e31 Cluster: slave validity factor is now user configurable.
Check the commit changes in the example redis.conf for more information.
2014-05-22 16:57:54 +02:00
Matt Stancliff
33f943b4cd Fix blocking operations from missing new lists
Behrad Zari discovered [1] and Josiah reported [2]: if you block
and wait for a list to exist, but the list creates from
a non-push command, the blocked client never gets notified.

This commit adds notification of blocked clients into
the DB layer and away from individual commands.

Lists can be created by [LR]PUSH, SORT..STORE, RENAME, MOVE,
and RESTORE.  Previously, blocked client notifications were
only triggered by [LR]PUSH.  Your client would never get
notified if a list were created by SORT..STORE or RENAME or
a RESTORE, etc.

Blocked client notification now happens in one unified place:
  - dbAdd() triggers notification when adding a list to the DB

Two new tests are added that fail prior to this commit.

All test pass.

Fixes #1668

[1]: https://groups.google.com/forum/#!topic/redis-db/k4oWfMkN1NU
[2]: #1668
2014-05-21 09:52:52 -04:00
antirez
6baac558d8 Cluster: better handling of stolen slots.
The previous code handling a lost slot (by another master with an higher
configuration for the slot) was defensive, considering it an error and
putting the cluster in an odd state requiring redis-cli fix.

This was changed, because actually this only happens either in a
legitimate way, with failovers, or when the admin messed with the config
in order to reconfigure the cluster. So the new code instead will try to
make sure that the keys stored match the new slots map, by removing all
the keys in the slots we lost ownership from.

The function that deletes the keys from the lost slots is called only
if the node does not lose all its slots (resulting in a reconfiguration
as a slave of the node that got ownership). This is an optimization
since the replication code will anyway flush all the instance data in
a faster way.
2014-05-14 10:46:37 +02:00
antirez
5c78f87666 RESTORE: reply with -BUSYKEY special error code.
The error when the target key is busy was a generic one, while it makes
sense to be able to distinguish between the target key busy error and
the others easily.
2014-05-12 10:01:59 +02:00
antirez
0bcc7cb4bf CLIENT LIST speedup via peerid caching + smart allocation.
This commit adds peer ID caching in the client structure plus an API
change and the use of sdsMakeRoomFor() in order to improve the
reallocation pattern to generate the CLIENT LIST output.

Both the changes account for a very significant speedup.
2014-04-28 17:36:57 +02:00
antirez
e29d330724 Process events with processEventsWhileBlocked() when blocked.
When we are blocked and a few events a processed from time to time, it
is smarter to call the event handler a few times in order to handle the
accept, read, write, close cycle of a client in a single pass, otherwise
there is too much latency added for clients to receive a reply while the
server is busy in some way (for example during the DB loading).
2014-04-24 21:44:32 +02:00
antirez
78954ca3a2 ZREMRANGEBYLEX implemented. 2014-04-17 14:49:25 +02:00
antirez
8827dc4eec Always pass sorted set range objects by reference. 2014-04-17 14:30:12 +02:00
antirez
8b5e0b213e ZLEXCOUNT implemented.
Like ZCOUNT for lexicographical ranges.
2014-04-16 12:17:00 +02:00
antirez
402110f9fd User-defined switch point between sparse-dense HLL encodings. 2014-04-15 17:46:51 +02:00
antirez
261da523e8 PFDEBUG added, PFGETREG removed.
PFDEBUG will be the interface to do debugging tasks with a key
containing an HLL object.
2014-04-13 23:01:21 +02:00
antirez
3a6a1e42f1 ZRANGEBYLEX and ZREVRANGEBYLEX implementation. 2014-04-05 11:41:43 +02:00
antirez
aaaed66c56 PFGETREG added for testing purposes.
The new command allows to get a dump of the registers stored
into an HyperLogLog data structure for testing / debugging purposes.
2014-04-03 10:45:30 +02:00
antirez
5afcca34ce HyperLogLog API prefix modified from "P" to "PF".
Using both the initials of Philippe Flajolet instead of just "P".
2014-03-31 22:48:01 +02:00
antirez
e887c62e45 HyperLogLog: make API use the P prefix in honor of Philippe Flajolet. 2014-03-31 19:29:40 +02:00
antirez
f2277475b2 HLLMERGE implemented.
Merge N HLL data structures by selecting the max value for every
M[i] register among the set of HLLs.
2014-03-31 14:39:44 +02:00
antirez
543ede03f2 String value unsharing refactored into proper function.
All the Redis functions that need to modify the string value of a key in
a destructive way (APPEND, SETBIT, SETRANGE, ...) require to make the
object unshared (if refcount > 1) and encoded in raw format (if encoding
is not already REDIS_ENCODING_RAW).

This was cut & pasted many times in multiple places of the code. This
commit puts the small logic needed into a function called
dbUnshareStringValue().
2014-03-30 18:32:17 +02:00
antirez
ded86076b3 HLLCOUNT implemented. 2014-03-28 17:37:18 +01:00
antirez
156929ee97 HLLADD implemented. 2014-03-28 16:24:35 +01:00
antirez
552eb5407a HLLSELFTEST command implemented.
To test the bitfield array of counters set/get macros from the Redis Tcl
suite is hard, so a specialized command that is able to test the
internals was developed.
2014-03-28 12:11:55 +01:00
Matt Stancliff
3b54ee6ea4 Add REDIS_MIN_RESERVED_FDS define for open fds
Also update the original REDIS_EVENTLOOP_FDSET_INCR to
include REDIS_MIN_RESERVED_FDS. REDIS_EVENTLOOP_FDSET_INCR
exists to make sure more than (maxclients+RESERVED) entries
are allocated, but we can only guarantee that if we include
the current value of REDIS_MIN_RESERVED_FDS as a minimum
for the INCR size.
2014-03-24 13:15:35 -04:00
Matt Stancliff
c138631cd1 Fix maxclients error handling
Everywhere in the Redis code base, maxclients is treated
as an int with (int)maxclients or `maxclients = atoi(source)`,
so let's make maxclients an int.

This fixes a bug where someone could specify a negative maxclients
on startup and it would work (as well as set maxclients very high)
because:

    unsigned int maxclients;
    char *update = "-300";
    maxclients = atoi(update);
    if (maxclients < 1) goto fail;

But, (maxclients < 1) can only catch the case when maxclients
is exactly 0.  maxclients happily sets itself to -300, which isn't
-300, but rather 4294966996, which isn't < 1, so... everything
"worked."

maxclients config parsing checks for the case of < 1, but maxclients
CONFIG SET parsing was checking for case of < 0 (allowing
maxclients to be set to 0).  CONFIG SET parsing is now updated to
match config parsing of < 1.

It's tempting to add a MINIMUM_CLIENTS define, but... I didn't.

These changes were inspired by antirez#356, but this doesn't
fix that issue.
2014-03-24 10:17:33 -04:00
antirez
93253c2762 Sample and cache RSS in serverCron().
Obtaining the RSS (Resident Set Size) info is slow in Linux and OSX.
This slowed down the generation of the INFO 'memory' section.

Since the RSS does not require to be a real-time measurement, we
now sample it with server.hz frequency (10 times per second by default)
and use this value both to show the INFO rss field and to compute the
fragmentation ratio.

Practically this does not make any difference for memory profiling of
Redis but speeds up the INFO call significantly.
2014-03-24 12:00:20 +01:00
antirez
5fa3248bad The default maxmemory policy is now noeviction.
This is safer as by default maxmemory should just set a memory limit
without any key to be deleted, unless the policy is set to something
more relaxed.
2014-03-21 08:03:34 +01:00
antirez
a98369929e Use 24 bits for the lru object field and improve resolution.
There were 2 spare bits inside the Redis object structure that are now
used in order to enlarge 4x the range of the LRU field.

At the same time the resolution was improved from 10 to 1 second: this
still provides 194 days before the LRU counter overflows (restarting from
zero).

This is not a problem since it only causes lack of eviction precision for
objects not touched for a very long time, and the lack of precision is
only temporary.
2014-03-20 17:56:27 +01:00
antirez
f4da796c53 Default LRU samples is now 5. 2014-03-20 17:05:42 +01:00
antirez
22c9cfaf57 LRU eviction pool implementation.
This is an improvement over the previous eviction algorithm where we use
an eviction pool that is persistent across evictions of keys, and gets
populated with the best candidates for evictions found so far.

It allows to approximate LRU eviction at a given number of samples
better than the previous algorithm used.
2014-03-20 11:57:29 +01:00
antirez
ad6b0f70b2 Obtain LRU clock in a resolution dependent way.
For testing purposes it is handy to have a very high resolution of the
LRU clock, so that it is possible to experiment with scripts running in
just a few seconds how the eviction algorithms works.

This commit allows Redis to use the cached LRU clock, or a value
computed on demand, depending on the resolution. So normally we have the
good performance of a precomputed value, and a clock that wraps in many
days using the normal resolution, but if needed, changing a define will
switch behavior to an high resolution LRU clock.
2014-03-20 11:47:12 +01:00
antirez
1faf82663f Specify lruclock in redisServer structure via REDIS_LRU_BITS.
The padding field was totally useless: removed.
2014-03-20 11:37:27 +01:00
antirez
d77e231682 Specify LRU resolution in milliseconds. 2014-03-20 11:33:25 +01:00
antirez
fe30847016 Set LRU parameters via REDIS_LRU_BITS define. 2014-03-20 11:22:47 +01:00
antirez
e150ec7d0c Unify stats reset for CONFIG RESETSTAT / initServer().
Now CONFIG RESETSTAT makes sure to reset all the fields, and in the
future it will be simpler to avoid missing new fields.
2014-03-19 12:55:49 +01:00
antirez
04cf02e8dc Cluster: SORT get keys helper implemented. 2014-03-10 16:26:08 +01:00
antirez
c0e818ab08 Cluster: evalGetKey() added for EVAL/EVALSHA.
Previously we used zunionInterGetKeys(), however after this function was
fixed to account for the destination key (not needed when the API was
designed for "diskstore") the two set of commands can no longer be served
by an unique keys-extraction function.
2014-03-10 15:26:13 +01:00
antirez
787b297046 Cluster: getKeysFromCommand() API cleaned up.
This API originated from the "diskstore" experiment, not for Redis
Cluster itself, so there were legacy/useless things trying to
differentiate between keys that are going to be overwritten and keys
that need to be fetched from disk (preloaded).

All useless with Cluster, so removed with the result of code
simplification.
2014-03-10 13:18:41 +01:00
zhanghailei
138695d990 refer to updateLRUClock's comment REDIS_LRU_CLOCK_MAX is 22 bits,but #define REDIS_LRU_CLOCK_MAX ((1<<21)-1) only 21 bits 2014-03-04 12:20:31 +08:00
antirez
38c620b3b5 Initial implementation of BITPOS.
It appears to work but more stress testing, and both unit tests and
fuzzy testing, is needed in order to ensure the implementation is sane.
2014-02-27 12:44:27 +01:00
antirez
51bd9da1fd Update cached time in rdbLoad() callback.
server.unixtime and server.mstime are cached less precise timestamps
that we use every time we don't need an accurate time representation and
a syscall would be too slow for the number of calls we require.

Such an example is the initialization and update process of the last
interaction time with the client, that is used for timeouts.

However rdbLoad() can take some time to load the DB, but at the same
time it did not updated the time during DB loading. This resulted in the
bug described in issue #1535, where in the replication process the slave
loads the DB, creates the redisClient representation of its master, but
the timestamp is so old that the master, under certain conditions, is
sensed as already "timed out".

Thanks to @yoav-steinberg and Redis Labs Inc for the bug report and
analysis.
2014-02-13 15:13:26 +01:00
antirez
fe8352540f AOF: don't abort on write errors unless fsync is 'always'.
A system similar to the RDB write error handling is used, in which when
we can't write to the AOF file, writes are no longer accepted until we
are able to write again.

For fsync == always we still abort on errors since there is currently no
easy way to avoid replying with success to the user otherwise, and this
would violate the contract with the user of only acknowledging data
already secured on disk.
2014-02-12 16:11:36 +01:00
antirez
4919a13f50 CLIENT PAUSE and related API implemented.
The API is one of the bulding blocks of CLUSTER FAILOVER command that
executes a manual failover in Redis Cluster. However exposed as a
command that the user can call directly, it makes much simpler to
upgrade a standalone Redis instance using a slave in a safer way.

The commands works like that:

    CLIENT PAUSE <milliesconds>

All the clients that are not slaves and not in MONITOR state are paused
for the specified number of milliesconds. This means that slaves are
normally served in the meantime.

At the end of the specified amount of time all the clients are unblocked
and will continue operations normally. This command has no effects on
the population of the slow log, since clients are not blocked in the
middle of operations but only when there is to process new data.

Note that while the clients are unblocked, still new commands are
accepted and queued in the client buffer, so clients will likely not
block while writing to the server while the pause is active.
2014-02-04 16:16:09 +01:00
antirez
89884e8f6e Scripting: use mstime() and mstime_t for lua_time_start.
server.lua_time_start is expressed in milliseconds. Use mstime_t instead
of long long, and populate it with mstime() instead of ustime()/1000.

Functionally identical but more natural.
2014-02-03 15:45:40 +01:00
antirez
7be946fde2 Option "backlog" renamed "tcp-backlog".
This is especially important since we already have a concept of backlog
(the replication backlog).
2014-01-31 14:56:10 +01:00
Nenad Merdanovic
d76aa96d1a Add support for listen(2) backlog definition
In high RPS environments, the default listen backlog is not sufficient, so
giving users the power to configure it is the right approach, especially
since it requires only minor modifications to the code.
2014-01-31 14:52:10 +01:00
antirez
a7d30681c9 Cluster: configurable replicas migration barrier.
It is possible to configure the min number of additional working slaves
a master should be left with, for a slave to migrate to an orphaned
master.
2014-01-31 11:26:36 +01:00
antirez
6f54032080 Cluster: function clusterGetSlaveRank() added.
Return the number of slaves for the same master having a better
replication offset of the current slave, that is, the slave "rank" used
to pick a delay before the request for election.
2014-01-29 16:39:04 +01:00
antirez
28273394cb Cluster: support to read from slave nodes.
A client can enter a special cluster read-only mode using the READONLY
command: if the client read from a slave instance after this command,
for slots that are actually served by the instance's master, the queries
will be processed without redirection, allowing clients to read from
slaves (but without any kind fo read-after-write guarantee).

The READWRITE command can be used in order to exit the readonly state.
2014-01-14 16:33:16 +01:00
antirez
5189485625 Set REDIS_AOF_REWRITE_MIN_SIZE to 64mb.
64mb is the default value in redis.conf. For some reason instead the
hard-coded default was 1mb that is too small.
2014-01-14 11:27:28 +01:00
antirez
90a81b4ebb Don't send REPLCONF ACK to old masters.
Masters not understanding REPLCONF ACK will reply with errors to our
requests causing a number of possible issues.

This commit detects a global replication offest set to -1 at the end of
the replication, and marks the client representing the master with the
REDIS_PRE_PSYNC flag.

Note that this flag was called REDIS_PRE_PSYNC_SLAVE but now it is just
REDIS_PRE_PSYNC as it is used for both slaves and masters starting with
this commit.

This commit fixes issue #1488.
2014-01-08 14:28:16 +01:00
Yubao Liu
7da423f79f CONFIG REWRITE: don't throw some options on config rewrite
Those options will be thrown without this patch:
  include, rename-command, min-slaves-to-write, min-slaves-max-lag,
appendfilename.
2013-12-19 15:56:48 +01:00
Yossi Gottlieb
88a5cede88 Fix wrong repldboff type which causes dropped replication in rare cases. 2013-12-11 11:38:02 +01:00
antirez
11120689c4 Slaves heartbeats during sync improved.
The previous fix for false positive timeout detected by master was not
complete. There is another blocking stage while loading data for the
first synchronization with the master, that is, flushing away the
current data from the DB memory.

This commit uses the newly introduced dict.c callback in order to make
some incremental work (to send "\n" heartbeats to the master) while
flushing the old data from memory.

It is hard to write a regression test for this issue unfortunately. More
support for debugging in the Redis core would be needed in terms of
functionalities to simulate a slow DB loading / deletion.
2013-12-10 18:47:31 +01:00
antirez
2eb781b35b dict.c: added optional callback to dictEmpty().
Redis hash table implementation has many non-blocking features like
incremental rehashing, however while deleting a large hash table there
was no way to have a callback called to do some incremental work.

This commit adds this support, as an optiona callback argument to
dictEmpty() that is currently called at a fixed interval (one time every
65k deletions).
2013-12-10 18:46:24 +01:00
antirez
c5618e7fdd WAIT command: synchronous replication for Redis. 2013-12-04 16:20:03 +01:00
antirez
82b672f633 BLPOP blocking code refactored to be generic & reusable. 2013-12-03 17:43:53 +01:00
antirez
2e027c48e5 Removed old comments and dead code from freeClient(). 2013-12-03 13:54:06 +01:00
antirez
8f18345ef0 Cluster: basic data structures for nodes black list. 2013-11-29 17:37:06 +01:00
antirez
297de1ab26 Sentinel: test for writable config file.
This commit introduces a funciton called when Sentinel is ready for
normal operations to avoid putting Sentinel specific stuff in redis.c.
2013-11-21 12:28:15 +01:00
antirez
e257ab2bfe Sentinel: sentinelFlushConfig() to CONFIG REWRITE + fsync. 2013-11-19 10:13:04 +01:00
antirez
5998769c28 Sentinel: CONFIG REWRITE support for Sentinel config. 2013-11-19 09:48:12 +01:00
antirez
ebcb6251e6 SCAN code refactored to parse cursor first.
The previous implementation of SCAN parsed the cursor in the generic
function implementing SCAN, SSCAN, HSCAN and ZSCAN.

The actual higher-level command implementation only checked for empty
keys and return ASAP in that case. The result was that inverting the
arguments of, for instance, SSCAN for example and write:

    SSCAN 0 key

Instead of

    SSCAN key 0

Resulted into no error, since 0 is a non-existing key name very likely.
Just the iterator returned no elements at all.

In order to fix this issue the code was refactored to extract the
function to parse the cursor and return the error. Every higher level
command implementation now parses the cursor and later checks if the key
exist or not.
2013-11-05 15:47:50 +01:00
antirez
2c643ffa8d ZSCAN implemented. 2013-10-28 11:36:42 +01:00
antirez
e50090aa06 HSCAN implemented. 2013-10-28 11:35:26 +01:00
antirez
4a1f1cc0d7 SSCAN implemented. 2013-10-28 11:17:32 +01:00
Pieter Noordhuis
7f490b197f Add SCAN command 2013-10-25 10:49:48 +02:00
antirez
ba42428633 Cluster: time switched from seconds to milliseconds.
All the internal state of cluster involving time is now using mstime_t
and mstime() in order to use milliseconds resolution.

Also the clusterCron() function is called with a 10 hz frequency instead
of 1 hz.

The cluster node_timeout must be also configured in milliseconds by the
user in redis.conf.
2013-10-09 16:19:26 +02:00
antirez
929b6a4480 Cluster: cluster stuff moved from redis.h to cluster.h. 2013-10-09 15:38:05 +02:00
antirez
7afc0dd59a Cluster: new clusterDoBeforeSleep() API.
The new API is able to remember operations to perform before returning
to the event loop, such as checking if there is the failover quorum for
a slave, save and fsync the configuraiton file, and so forth.

Because this operations are performed before returning on the event
loop we are sure that messages that are sent in the same event loop run
will be delivered *after* the configuration is already saved, that is a
requirement sometimes. For instance we want to publish a new epoch only
when it is already stored in nodes.conf in order to avoid returning back
in the logical clock when a node is restarted.

This new API provides a big performance advantage compared to saving and
possibly fsyncing the configuration file multiple times in the same
event loop run, especially in the case of big clusters with tens or
hundreds of nodes.
2013-10-03 09:58:06 +02:00
antirez
6c4d904baf Cluster: bus messages stats in CLUSTER info. 2013-10-02 10:10:08 +02:00
antirez
1dedf9aa36 Cluster: time field removed from cluster messages header.
The new algorithm does not check replies time as checking for the
currentEpoch in the reply ensures that the reply is about the current
election process.
2013-09-30 16:19:44 +02:00
antirez
7c4b8f29e7 Cluster: react faster when a slave wins an election. 2013-09-26 16:54:43 +02:00
antirez
a445aa30a0 Cluster: master node now uses new protocol to vote. 2013-09-26 13:00:41 +02:00
antirez
fb9b76fe14 Cluster: slave node now uses the new protocol to get elected. 2013-09-26 11:13:17 +02:00
antirez
12483b0061 Cluster: configEpoch added in cluster nodes description. 2013-09-25 11:47:13 +02:00
antirez
925ea9f858 Cluster: added time field in cluster bus messages.
The time is sent in requests, and copied back in reply packets.
This way the receiver can compare the time field in a reply with its
local clock and check the age of the request associated with this reply.

This is an easy way to discard delayed replies. Note that only a clock
is used here, that is the one of the node sending the packet. The
receiver only copies the field back into the reply, so no
synchronization is needed between clocks of different hosts.
2013-09-20 09:22:21 +02:00
antirez
72587e6cc5 Cluster: free HANDSHAKE nodes after node_timeout.
Handshake nodes should turn into normal nodes or be freed in a
reasonable amount of time, otherwise they'll keep accumulating if the
address they are associated with is not reachable for some reason.
2013-09-04 12:41:21 +02:00
antirez
81a6a9639a Use listenToPort() in cluster.c as well. 2013-08-22 14:05:07 +02:00
antirez
89ffba9133 Replication: better way to send a preamble before RDB payload.
During the replication full resynchronization process, the RDB file is
transfered from the master to the slave. However there is a short
preamble to send, that is currently just the bulk payload length of the
file in the usual Redis form $..length..<CR><LF>.

This preamble used to be sent with a direct write call, assuming that
there was alway room in the socket output buffer to hold the few bytes
needed, however this does not scale in case we'll need to send more
stuff, and is not very robust code in general.

This commit introduces a more general mechanism to send a preamble up to
2GB in size (the max length of an sds string) in a non blocking way.
2013-08-12 10:29:14 +02:00
antirez
112fa47978 Add per-db average TTL information in INFO output.
Example:

db0:keys=221913,expires=221913,avg_ttl=655

The algorithm uses a running average with only two samples (current and
previous). Keys found to be expired are considered at TTL zero even if
the actual TTL can be negative.

The TTL is reported in milliseconds.
2013-08-06 15:00:43 +02:00
antirez
6500fabfb8 Some activeExpireCycle() refactoring. 2013-08-06 12:55:49 +02:00
antirez
b09ea1bd90 Draft #1 of a new expired keys collection algorithm.
The main idea here is that when we are no longer to expire keys at the
rate the are created, we can't block more in the normal expire cycle as
this would result in too big latency spikes.

For this reason the commit introduces a "fast" expire cycle that does
not run for more than 1 millisecond but is called in the beforeSleep()
hook of the event loop, so much more often, and with a frequency bound
to the frequency of executed commnads.

The fast expire cycle is only called when the standard expiration
algorithm runs out of time, that is, consumed more than
REDIS_EXPIRELOOKUPS_TIME_PERC of CPU in a given cycle without being able
to take the number of already expired keys that are yet not collected
to a number smaller than 25% of the number of keys.

You can test this commit with different loads, but a simple way is to
use the following:

Extreme load with pipelining:

redis-benchmark -r 100000000 -n 100000000  \
        -P 32 set ele:rand:000000000000 foo ex 2

Remove the -P32 in order to avoid the pipelining for a more real-world
load.

In another terminal tab you can monitor the Redis behavior with:

redis-cli -i 0.1 -r -1 info keyspace

and

redis-cli --latency-history

Note: this commit will make Redis printing a lot of debug messages, it
is not a good idea to use it in production.
2013-08-05 12:05:22 +02:00
antirez
894eba07c8 Introduction of a new string encoding: EMBSTR
Previously two string encodings were used for string objects:

1) REDIS_ENCODING_RAW: a string object with obj->ptr pointing to an sds
stirng.

2) REDIS_ENCODING_INT: a string object where the obj->ptr void pointer
is casted to a long.

This commit introduces a experimental new encoding called
REDIS_ENCODING_EMBSTR that implements an object represented by an sds
string that is not modifiable but allocated in the same memory chunk as
the robj structure itself.

The chunk looks like the following:

+--------------+-----------+------------+--------+----+
| robj data... | robj->ptr | sds header | string | \0 |
+--------------+-----+-----+------------+--------+----+
                     |                       ^
                     +-----------------------+

The robj->ptr points to the contiguous sds string data, so the object
can be manipulated with the same functions used to manipulate plan
string objects, however we need just on malloc and one free in order to
allocate or release this kind of objects. Moreover it has better cache
locality.

This new allocation strategy should benefit both the memory usage and
the performances. A performance gain between 60 and 70% was observed
during micro-benchmarks, however there is more work to do to evaluate
the performance impact and the memory usage behavior.
2013-07-22 10:31:38 +02:00
yoav
63d15dfc87 Chunked loading of RDB to prevent redis from stalling reading very large keys. 2013-07-16 15:41:24 +02:00
antirez
cf1579a798 SORT ALPHA: use collation instead of binary comparison.
Note that we only do it when STORE is not used, otherwise we want an
absolutely locale independent and binary safe sorting in order to ensure
AOF / replication consistency.

This is probably an unexpected behavior violating the least surprise
rule, but there is currently no other simple / good alternative.
2013-07-12 12:02:36 +02:00
antirez
81e55ec0f3 Fixed compareStringObject() and introduced collateStringObject().
compareStringObject was not always giving the same result when comparing
two exact strings, but encoded as integers or as sds strings, since it
switched to strcmp() when at least one of the strings were not sds
encoded.

For instance the two strings "123" and "123\x00456", where the first
string was integer encoded, would result into the old implementation of
compareStringObject() to return 0 as if the strings were equal, while
instead the second string is "greater" than the first in a binary
comparison.

The same compasion, but with "123" encoded as sds string, would instead
return a value < 0, as it is correct. It is not impossible that the
above caused some obscure bug, since the comparison was not always
deterministic, and compareStringObject() is used in the implementation
of skiplists, hash tables, and so forth.

At the same time, collateStringObject() was introduced by this commit, so
that can be used by SORT command to return sorted strings usign
collation instead of binary comparison. See next commit.
2013-07-12 11:56:52 +02:00
antirez
d0001fe810 getClientPeerId() refactored into two functions. 2013-07-09 15:46:34 +02:00
antirez
e4c019e7a8 getClientPeerId() now reports errors.
We now also use it in CLIENT KILL implementation.
2013-07-09 15:28:30 +02:00
antirez
5cdc5da990 getClientPeerID introduced.
The function returns an unique identifier for the client, as ip:port for
IPv4 and IPv6 clients, or as path:0 for Unix socket clients.

See the top comment in the function for more info.
2013-07-09 12:49:20 +02:00
Geoff Garside
6181455ac6 Update REDIS_CLUSTER_IPLEN to INET6_ADDRSTRLEN.
Change REDIS_CLUSTER_IPLEN to INET6_ADDRSTRLEN so that the clusterNode
ip character buffer is big enough to hold an IPv6 address.
2013-07-08 15:57:23 +02:00
Geoff Garside
9cfa02fe73 Add macro to define clusterNode.ip buffer size.
Add REDIS_CLUSTER_IPLEN macro to define the size of the clusterNode ip
character array. Additionally use this macro in inet_ntop(3) calls where
the size of the array was being defined manually.

The REDIS_CLUSTER_IPLEN is defined as INET_ADDRSTRLEN which defines the
correct size of a buffer to store an IPv4 address in. The
INET_ADDRSTRLEN macro itself is defined in the <netinet/in.h> header
file and should be portable across the majority of systems.
2013-07-08 15:55:39 +02:00
antirez
98eecb70eb Binding multiple IPs done properly with multiple sockets. 2013-07-05 11:47:20 +02:00
antirez
90b0d66cce Ability to bind multiple addresses. 2013-07-04 18:50:15 +02:00
antirez
de9a221749 CONFIG SET maxclients. 2013-06-28 17:08:03 +02:00
antirez
13585dd677 function renamed: popcount_binary -> redisPopcount. 2013-06-26 15:19:06 +02:00