Commit Graph

625 Commits

Author SHA1 Message Date
antirez
d34c2fa3bb ROLE command added.
The new ROLE command is designed in order to provide a client with
informations about the replication in a fast and easy to use way
compared to the INFO command where the same information is also
available.
2014-06-07 17:27:49 +02:00
antirez
14fb0ac649 Don't process min-slaves-to-write for slaves.
Replication is totally broken when a slave has this option, since it
stops accepting updates from masters.

This fixes issue #1434.
2014-06-05 10:48:05 +02:00
antirez
b239a32aae redisLogFromHandler() format changed to match new logs format. 2014-05-22 19:24:35 +02:00
antirez
d98fa718e0 Tag every log line with role.
Every log contains, just after the pid, a single character that provides
information about the role of an instance:

S - Slave
M - Master
C - Writing child
X - Sentinel
2014-05-22 18:48:37 +02:00
antirez
39603a7e31 Cluster: slave validity factor is now user configurable.
Check the commit changes in the example redis.conf for more information.
2014-05-22 16:57:54 +02:00
antirez
5c78f87666 RESTORE: reply with -BUSYKEY special error code.
The error when the target key is busy was a generic one, while it makes
sense to be able to distinguish between the target key busy error and
the others easily.
2014-05-12 10:01:59 +02:00
antirez
3a3458ee7b Accept multiple clients per iteration.
When the listening sockets readable event is fired, we have the chance
to accept multiple clients instead of accepting a single one. This makes
Redis more responsive when there is a mass-connect event (for example
after the server startup), and in workloads where a connect-disconnect
pattern is used often, so that multiple clients are waiting to be
accepted continuously.

As a side effect, this commit makes the LOADING, BUSY, and similar
errors much faster to deliver to the client, making Redis more
responsive when there is to return errors to inform the clients that the
server is blocked in an not interruptible operation.
2014-04-24 21:44:32 +02:00
antirez
0feb2aabca PFCOUNT support for multi-key union. 2014-04-17 17:32:59 +02:00
antirez
78954ca3a2 ZREMRANGEBYLEX implemented. 2014-04-17 14:49:25 +02:00
antirez
8b5e0b213e ZLEXCOUNT implemented.
Like ZCOUNT for lexicographical ranges.
2014-04-16 12:17:00 +02:00
antirez
402110f9fd User-defined switch point between sparse-dense HLL encodings. 2014-04-15 17:46:51 +02:00
antirez
9df77fc0c4 Mark PFDEBUG as write command in the commands table.
It is safer since it is able to have side effects.
2014-04-14 15:57:50 +02:00
antirez
261da523e8 PFDEBUG added, PFGETREG removed.
PFDEBUG will be the interface to do debugging tasks with a key
containing an HLL object.
2014-04-13 23:01:21 +02:00
antirez
67bb2c46b2 Add casting to match printf format.
adjustOpenFilesLimit() and clusterUpdateSlotsWithConfig() that were
assuming uint64_t is the same as unsigned long long, which is true
probably for all the systems out there that we target, but still GCC
emitted a warning since technically they are two different types.
2014-04-07 08:58:06 +02:00
antirez
3a6a1e42f1 ZRANGEBYLEX and ZREVRANGEBYLEX implementation. 2014-04-05 11:41:43 +02:00
antirez
aaaed66c56 PFGETREG added for testing purposes.
The new command allows to get a dump of the registers stored
into an HyperLogLog data structure for testing / debugging purposes.
2014-04-03 10:45:30 +02:00
antirez
5afcca34ce HyperLogLog API prefix modified from "P" to "PF".
Using both the initials of Philippe Flajolet instead of just "P".
2014-03-31 22:48:01 +02:00
antirez
e887c62e45 HyperLogLog: make API use the P prefix in honor of Philippe Flajolet. 2014-03-31 19:29:40 +02:00
antirez
f2277475b2 HLLMERGE implemented.
Merge N HLL data structures by selecting the max value for every
M[i] register among the set of HLLs.
2014-03-31 14:39:44 +02:00
antirez
ded86076b3 HLLCOUNT implemented. 2014-03-28 17:37:18 +01:00
antirez
156929ee97 HLLADD implemented. 2014-03-28 16:24:35 +01:00
antirez
552eb5407a HLLSELFTEST command implemented.
To test the bitfield array of counters set/get macros from the Redis Tcl
suite is hard, so a specialized command that is able to test the
internals was developed.
2014-03-28 12:11:55 +01:00
antirez
6540e9eeaa Fix off by one bug in freeMemoryIfNeeded() eviction pool.
Bug found by the continuous integration test running the Redis
with valgrind:

==6245== Invalid read of size 8
==6245==    at 0x4C2DEEF: memcpy@GLIBC_2.2.5 (mc_replace_strmem.c:876)
==6245==    by 0x41F9E6: freeMemoryIfNeeded (redis.c:3010)
==6245==    by 0x41D2CC: processCommand (redis.c:2069)

memmove() size argument was accounting for an extra element, going
outside the bounds of the array.
2014-03-25 10:32:15 +01:00
antirez
6e33c908dd adjustOpenFilesLimit() refactoring.
In this commit:
* Decrement steps are semantically differentiated from the reserved FDs.
  Previously both values were 32 but the meaning was different.
* Make it clear that we save setrlimit errno.
* Don't explicitly handle wrapping of 'f', but prevent it from
  happening.
* Add comments to make the function flow more readable.

This integrates PR #1630
2014-03-25 09:05:28 +01:00
Salvatore Sanfilippo
72c5ebcba4 Merge pull request #1630 from mattsta/fix-infinite-loop-ulimit
Fix infinite loop ulimit
2014-03-25 08:42:39 +01:00
Matt Stancliff
386a46946b Fix potentially incorrect errno usage
errno may be reset by the previous call to redisLog, so capture
the original value for proper error reporting.
2014-03-24 13:21:15 -04:00
Matt Stancliff
3b54ee6ea4 Add REDIS_MIN_RESERVED_FDS define for open fds
Also update the original REDIS_EVENTLOOP_FDSET_INCR to
include REDIS_MIN_RESERVED_FDS. REDIS_EVENTLOOP_FDSET_INCR
exists to make sure more than (maxclients+RESERVED) entries
are allocated, but we can only guarantee that if we include
the current value of REDIS_MIN_RESERVED_FDS as a minimum
for the INCR size.
2014-03-24 13:15:35 -04:00
Salvatore Sanfilippo
896e15f3e3 Merge pull request #1627 from badboy/lru-fix
Fixed a few typos.
2014-03-24 18:13:39 +01:00
Matt Stancliff
90b844212d Fix infinite loop on startup if ulimit too low
Fun fact: rlim_t is an unsigned long long on all platforms.

Continually subtracting from a rlim_t makes it get smaller
and smaller until it wraps, then you're up to 2^64-1.

This was causing an infinite loop on Redis startup if
your ulimit was extremely (almost comically) low.

The case of (f > oldlimit) would never be met in a case like:

    f = 150
    while (f > 20) f -= 128

Since f is unsigned, it can't go negative and would
take on values of:

    Iteration 1: 150 - 128 => 22
    Iteration 2:  22 - 128 => 18446744073709551510
    Iterations 3-∞: ...

To catch the wraparound, we use the previous value of f
stored in limit.rlimit_cur.  If we subtract from f and
get a larger number than the value it had previously,
we print an error and exit since we don't have enough
file descriptors to help the user at this point.

Thanks to @bs3g for the inspiration to fix this problem.
Patches existed from @bs3g at antirez#1227, but I needed to repair a few other
parts of Redis simultaneously, so I didn't get a chance to use them.
2014-03-24 10:17:33 -04:00
Matt Stancliff
4a25983f8f Improve error handling around setting ulimits
The log messages about open file limits have always
been slightly opaque and confusing.  Here's an attempt to
fix their wording, detail, and meaning.  Users will have a
better understanding of how to fix very common problems
with these reworded messages.

Also, we handle a new error case when maxclients becomes less
than one, essentially rendering the server unusable.  We
now exit on startup instead of leaving the user with a server
unable to handle any connections.

This fixes antirez#356 as well.
2014-03-24 10:17:33 -04:00
Matt Stancliff
491532a713 Replace magic 32 with REDIS_EVENTLOOP_FDSET_INCR
32 was the additional number of file descriptors Redis
would reserve when managing a too-low ulimit.  The
number 32 was in too many places statically, so now
we use a macro instead that looks more appropriate.

When Redis sets up the server event loop, it uses:
    server.maxclients+REDIS_EVENTLOOP_FDSET_INCR

So, when reserving file descriptors, it makes sense to
reserve at least REDIS_EVENTLOOP_FDSET_INCR FDs instead
of only 32.  Currently, REDIS_EVENTLOOP_FDSET_INCR is
set to 128 in redis.h.

Also, I replaced the static 128 in the while f < old loop
with REDIS_EVENTLOOP_FDSET_INCR as well, which results
in no change since it was already 128.

Impact: Users now need at least maxclients+128 as
their open file limit instead of maxclients+32 to obtain
actual "maxclients" number of clients.  Redis will carve
the extra REDIS_EVENTLOOP_FDSET_INCR file descriptors it
needs out of the "maxclients" range instead of failing
to start (unless the local ulimit -n is too low to accomidate
the request).
2014-03-24 10:17:33 -04:00
antirez
93253c2762 Sample and cache RSS in serverCron().
Obtaining the RSS (Resident Set Size) info is slow in Linux and OSX.
This slowed down the generation of the INFO 'memory' section.

Since the RSS does not require to be a real-time measurement, we
now sample it with server.hz frequency (10 times per second by default)
and use this value both to show the INFO rss field and to compute the
fragmentation ratio.

Practically this does not make any difference for memory profiling of
Redis but speeds up the INFO call significantly.
2014-03-24 12:00:20 +01:00
antirez
d3efe04c47 Cache uname() output across INFO calls.
Uname was profiled to be a slow syscall. It produces always the same
output in the context of a single execution of Redis, so calling it at
every INFO output generation does not make too much sense.

The uname utsname structure was modified as a static variable. At the
same time a static integer was added to check if we need to call uname
the first time.
2014-03-24 10:00:08 +01:00
Jan-Erik Rediger
4fdd7a0546 Fixed a few typos. 2014-03-20 23:16:38 +01:00
antirez
c641b670c3 Use new dictGetRandomKeys() API to get samples for eviction.
The eviction quality degradates a bit in my tests, but since the API is
faster, it allows to raise the number of samples, and overall is a win.
2014-03-20 16:52:12 +01:00
antirez
82b53c650c struct dictEntry -> dictEntry. 2014-03-20 16:20:37 +01:00
antirez
22c9cfaf57 LRU eviction pool implementation.
This is an improvement over the previous eviction algorithm where we use
an eviction pool that is persistent across evictions of keys, and gets
populated with the best candidates for evictions found so far.

It allows to approximate LRU eviction at a given number of samples
better than the previous algorithm used.
2014-03-20 11:57:29 +01:00
antirez
ad6b0f70b2 Obtain LRU clock in a resolution dependent way.
For testing purposes it is handy to have a very high resolution of the
LRU clock, so that it is possible to experiment with scripts running in
just a few seconds how the eviction algorithms works.

This commit allows Redis to use the cached LRU clock, or a value
computed on demand, depending on the resolution. So normally we have the
good performance of a precomputed value, and a clock that wraps in many
days using the normal resolution, but if needed, changing a define will
switch behavior to an high resolution LRU clock.
2014-03-20 11:47:12 +01:00
antirez
d77e231682 Specify LRU resolution in milliseconds. 2014-03-20 11:33:25 +01:00
antirez
e150ec7d0c Unify stats reset for CONFIG RESETSTAT / initServer().
Now CONFIG RESETSTAT makes sure to reset all the fields, and in the
future it will be simpler to avoid missing new fields.
2014-03-19 12:55:49 +01:00
antirez
133fccb03f Cluster: flag the transaction as dirty for the new redirections. 2014-03-13 15:11:53 +01:00
antirez
04cf02e8dc Cluster: SORT get keys helper implemented. 2014-03-10 16:26:08 +01:00
antirez
c0e818ab08 Cluster: evalGetKey() added for EVAL/EVALSHA.
Previously we used zunionInterGetKeys(), however after this function was
fixed to account for the destination key (not needed when the API was
designed for "diskstore") the two set of commands can no longer be served
by an unique keys-extraction function.
2014-03-10 15:26:13 +01:00
antirez
787b297046 Cluster: getKeysFromCommand() API cleaned up.
This API originated from the "diskstore" experiment, not for Redis
Cluster itself, so there were legacy/useless things trying to
differentiate between keys that are going to be overwritten and keys
that need to be fetched from disk (preloaded).

All useless with Cluster, so removed with the result of code
simplification.
2014-03-10 13:18:41 +01:00
Salvatore Sanfilippo
442b06db54 Merge pull request #1555 from mattsta/cluster-port-error-out
Cluster port error out
2014-03-10 10:37:50 +01:00
antirez
36676c2318 Redis Cluster: support for multi-key operations. 2014-03-07 13:19:09 +01:00
antirez
c5edd91716 Cluster: invalidate current transaction on redirections. 2014-03-03 17:11:51 +01:00
antirez
12a88d575d Document why we update peak memory in INFO. 2014-03-03 11:19:54 +01:00
Matt Stancliff
f1c9a203b2 Force INFO used_memory_peak to match peak memory
used_memory_peak only updates in serverCron every server.hz,
but Redis can use more memory and a user can request memory
INFO before used_memory_peak gets updated in the next
cron run.

This patch updates used_memory_peak to the current
memory usage if the current memory usage is higher
than the recorded used_memory_peak value.

(And it only calls zmalloc_used_memory() once instead of
twice as it was doing before.)
2014-02-28 17:47:41 -05:00
antirez
38c620b3b5 Initial implementation of BITPOS.
It appears to work but more stress testing, and both unit tests and
fuzzy testing, is needed in order to ensure the implementation is sane.
2014-02-27 12:44:27 +01:00
Matt Stancliff
2c273e3591 Add cluster or sentinel to proc title
If you launch redis with `redis-server --sentinel` then
in a ps, your output only says "redis-server IP:Port" — this
patch changes the proc title to include [sentinel] or
[cluster] depending on the current server mode:
e.g.  "redis-server IP:Port [sentinel]"
      "redis-server IP:Port [cluster]"
2014-02-20 23:58:54 -05:00
Matt Stancliff
b20ae393f1 Fix "can't bind to address" error reporting.
Report the actual port used for the listening attempt instead of
server.port.

Originally, Redis would just listen on server.port.
But, with clustering, Redis uses a Cluster Port too,
so we can't say server.port is always where we are listening.

If you tried to launch Redis with a too-high port number (any
port where Port+10000 > 65535), Redis would refuse to start, but
only print an error saying it can't connect to the Redis port.

This patch fixes much confusions.
2014-02-19 17:26:33 -05:00
antirez
ede33fb912 Get absoulte config file path before processig 'dir'.
The code tried to obtain the configuration file absolute path after
processing the configuration file. However if config file was a relative
path and a "dir" statement was processed reading the config, the absolute
path obtained was wrong.

With this fix the absolute path is obtained before processing the
configuration while the server is still in the original directory where
it was executed.
2014-02-17 16:44:53 +01:00
antirez
51bd9da1fd Update cached time in rdbLoad() callback.
server.unixtime and server.mstime are cached less precise timestamps
that we use every time we don't need an accurate time representation and
a syscall would be too slow for the number of calls we require.

Such an example is the initialization and update process of the last
interaction time with the client, that is used for timeouts.

However rdbLoad() can take some time to load the DB, but at the same
time it did not updated the time during DB loading. This resulted in the
bug described in issue #1535, where in the replication process the slave
loads the DB, creates the redisClient representation of its master, but
the timestamp is so old that the master, under certain conditions, is
sensed as already "timed out".

Thanks to @yoav-steinberg and Redis Labs Inc for the bug report and
analysis.
2014-02-13 15:13:26 +01:00
antirez
fc08c8599f AOF write error: retry with a frequency of 1 hz. 2014-02-12 16:27:59 +01:00
antirez
fe8352540f AOF: don't abort on write errors unless fsync is 'always'.
A system similar to the RDB write error handling is used, in which when
we can't write to the AOF file, writes are no longer accepted until we
are able to write again.

For fsync == always we still abort on errors since there is currently no
easy way to avoid replying with success to the user otherwise, and this
would violate the contract with the user of only acknowledging data
already secured on disk.
2014-02-12 16:11:36 +01:00
antirez
6df4ffe639 Don't count time to feed MONITORs in SLOWLOG. 2014-02-07 18:29:20 +01:00
antirez
2d6eb68993 Sentinel: allow SHUTDOWN command in Sentinel mode. 2014-02-07 11:22:24 +01:00
antirez
4919a13f50 CLIENT PAUSE and related API implemented.
The API is one of the bulding blocks of CLUSTER FAILOVER command that
executes a manual failover in Redis Cluster. However exposed as a
command that the user can call directly, it makes much simpler to
upgrade a standalone Redis instance using a slave in a safer way.

The commands works like that:

    CLIENT PAUSE <milliesconds>

All the clients that are not slaves and not in MONITOR state are paused
for the specified number of milliesconds. This means that slaves are
normally served in the meantime.

At the end of the specified amount of time all the clients are unblocked
and will continue operations normally. This command has no effects on
the population of the slow log, since clients are not blocked in the
middle of operations but only when there is to process new data.

Note that while the clients are unblocked, still new commands are
accepted and queued in the client buffer, so clients will likely not
block while writing to the server while the pause is active.
2014-02-04 16:16:09 +01:00
antirez
b770079f2c Allow CONFIG and SHUTDOWN while in stale-slave state. 2014-02-03 15:51:03 +01:00
antirez
7be946fde2 Option "backlog" renamed "tcp-backlog".
This is especially important since we already have a concept of backlog
(the replication backlog).
2014-01-31 14:56:10 +01:00
Nenad Merdanovic
d76aa96d1a Add support for listen(2) backlog definition
In high RPS environments, the default listen backlog is not sufficient, so
giving users the power to configure it is the right approach, especially
since it requires only minor modifications to the code.
2014-01-31 14:52:10 +01:00
antirez
a7d30681c9 Cluster: configurable replicas migration barrier.
It is possible to configure the min number of additional working slaves
a master should be left with, for a slave to migrate to an orphaned
master.
2014-01-31 11:26:36 +01:00
antirez
72f1715e45 Fixed inverted if condition in MISCONF error code path. 2014-01-28 10:11:12 +01:00
antirez
28273394cb Cluster: support to read from slave nodes.
A client can enter a special cluster read-only mode using the READONLY
command: if the client read from a slave instance after this command,
for slots that are actually served by the instance's master, the queries
will be processed without redirection, allowing clients to read from
slaves (but without any kind fo read-after-write guarantee).

The READWRITE command can be used in order to exit the readonly state.
2014-01-14 16:33:16 +01:00
antirez
7e9433cee1 Configuring port to 0 disables IP socket as specified.
This was no longer the case with 2.8 becuase of a bug introduced with
the IPv6 support. Now it is fixed.

This fixes issue #1287 and #1477.
2013-12-23 11:31:35 +01:00
Yubao Liu
7da423f79f CONFIG REWRITE: don't throw some options on config rewrite
Those options will be thrown without this patch:
  include, rename-command, min-slaves-to-write, min-slaves-max-lag,
appendfilename.
2013-12-19 15:56:48 +01:00
antirez
a5ec247f13 Replication: publish the slave_repl_offset when disconnected from master.
When a slave was disconnected from its master the replication offset was
reported as -1. Now it is reported as the replication offset of the
previous master, so that failover can be performed using this value in
order to try to select a slave with more processed data from a set of
slaves of the old master.
2013-12-11 15:23:15 +01:00
antirez
11e81a1e9a Fixed grammar: before H the article is a, not an. 2013-12-05 16:35:32 +01:00
antirez
58713c6b13 Fix clients timeout handling.
During the refactoring of blocking operations, commit
82b672f633, a bug was introduced where
a milliseconds time is compared to a seconds time, so all the clients
always appear to timeout if timeout is set to non-zero value.

Thanks to Jonathan Leibiusky for finding the bug and helping verifying
the cause and fix.
2013-12-05 14:55:07 +01:00
antirez
c5618e7fdd WAIT command: synchronous replication for Redis. 2013-12-04 16:20:03 +01:00
antirez
82b672f633 BLPOP blocking code refactored to be generic & reusable. 2013-12-03 17:43:53 +01:00
antirez
8f18345ef0 Cluster: basic data structures for nodes black list. 2013-11-29 17:37:06 +01:00
antirez
55f90b11c9 Stop writes on MISCONF only if instance is a master.
From the point of view of the slave not accepting writes from the master
can only create a bigger consistency issue.
2013-11-28 16:29:26 +01:00
antirez
60817bb262 Reply to PING with error when there is a MISCONF state. 2013-11-28 16:17:10 +01:00
antirez
297de1ab26 Sentinel: test for writable config file.
This commit introduces a funciton called when Sentinel is ready for
normal operations to avoid putting Sentinel specific stuff in redis.c.
2013-11-21 12:28:15 +01:00
antirez
37a51a2568 Sentinel: distinguish between is-master-down-by-addr requests.
Some are just to know if the master is down, and in this case the runid
in the request is set to "*", others are actually in order to seek for a
vote and get elected. In the latter case the runid is set to the runid
of the instance seeking for the vote.
2013-11-19 16:50:04 +01:00
antirez
2c643ffa8d ZSCAN implemented. 2013-10-28 11:36:42 +01:00
antirez
e50090aa06 HSCAN implemented. 2013-10-28 11:35:26 +01:00
antirez
4a1f1cc0d7 SSCAN implemented. 2013-10-28 11:17:32 +01:00
antirez
cd8cb49dc4 SCAN is a random command and does not require output sorting.
Sorting the output helps when we want to turn a non-deterministic into a
deterministic command, in that case this is not possible.
2013-10-28 11:13:43 +01:00
Pieter Noordhuis
7a6cfb18f3 SCAN requires at least 1 argument 2013-10-25 10:49:56 +02:00
Pieter Noordhuis
7f490b197f Add SCAN command 2013-10-25 10:49:48 +02:00
antirez
ba42428633 Cluster: time switched from seconds to milliseconds.
All the internal state of cluster involving time is now using mstime_t
and mstime() in order to use milliseconds resolution.

Also the clusterCron() function is called with a 10 hz frequency instead
of 1 hz.

The cluster node_timeout must be also configured in milliseconds by the
user in redis.conf.
2013-10-09 16:19:26 +02:00
antirez
929b6a4480 Cluster: cluster stuff moved from redis.h to cluster.h. 2013-10-09 15:38:05 +02:00
antirez
7c4b8f29e7 Cluster: react faster when a slave wins an election. 2013-09-26 16:54:43 +02:00
antirez
7bec743e66 Allow AUTH / PING when disconnected from slave and serve-stale-data is no. 2013-09-17 09:46:06 +02:00
antirez
003cc8a4f5 Only run the fast active expire cycle if master & enabled. 2013-08-27 09:31:55 +02:00
antirez
4f310e05c0 Opening TCP listening ports refactored into a function. 2013-08-22 14:01:16 +02:00
antirez
0f0cc88589 Print error message when can't bind * on any address. 2013-08-22 13:02:59 +02:00
antirez
35a977c499 Fix for issue #1214 simplified. 2013-08-21 11:36:09 +02:00
Salvatore Sanfilippo
038e356dbc Merge pull request #1214 from kaoshijuan/unstable
fixed initServer fail problem
2013-08-21 02:18:41 -07:00
antirez
112fa47978 Add per-db average TTL information in INFO output.
Example:

db0:keys=221913,expires=221913,avg_ttl=655

The algorithm uses a running average with only two samples (current and
previous). Keys found to be expired are considered at TTL zero even if
the actual TTL can be negative.

The TTL is reported in milliseconds.
2013-08-06 15:00:43 +02:00
antirez
4befe73b60 activeExpireCycle(): fix about fast cycle early start.
We don't want to repeat a fast cycle too soon, the previous code was
broken, we need to wait two times the period *since* the start of the
previous cycle in order to avoid there is an even space between cycles:

.-> start                   .-> second start
|                           |
+-------------+-------------+--------------+
| first cycle |    pause    | second cycle |
+-------------+-------------+--------------+

The second and first start must be PERIOD*2 useconds apart hence the *2
in the new code.
2013-08-06 12:59:04 +02:00
antirez
6500fabfb8 Some activeExpireCycle() refactoring. 2013-08-06 12:55:49 +02:00
antirez
d398f38879 Remove dead code and fix comments for new expire code. 2013-08-06 12:36:13 +02:00
antirez
66a26471dc Darft #2 for key collection algo: more improvements.
This commit makes the fast collection cycle time configurable, at
the same time it does not allow to run a new fast collection cycle
for the same amount of time as the max duration of the fast
collection cycle.
2013-08-05 16:14:28 +02:00
antirez
b09ea1bd90 Draft #1 of a new expired keys collection algorithm.
The main idea here is that when we are no longer to expire keys at the
rate the are created, we can't block more in the normal expire cycle as
this would result in too big latency spikes.

For this reason the commit introduces a "fast" expire cycle that does
not run for more than 1 millisecond but is called in the beforeSleep()
hook of the event loop, so much more often, and with a frequency bound
to the frequency of executed commnads.

The fast expire cycle is only called when the standard expiration
algorithm runs out of time, that is, consumed more than
REDIS_EXPIRELOOKUPS_TIME_PERC of CPU in a given cycle without being able
to take the number of already expired keys that are yet not collected
to a number smaller than 25% of the number of keys.

You can test this commit with different loads, but a simple way is to
use the following:

Extreme load with pipelining:

redis-benchmark -r 100000000 -n 100000000  \
        -P 32 set ele:rand:000000000000 foo ex 2

Remove the -P32 in order to avoid the pipelining for a more real-world
load.

In another terminal tab you can monitor the Redis behavior with:

redis-cli -i 0.1 -r -1 info keyspace

and

redis-cli --latency-history

Note: this commit will make Redis printing a lot of debug messages, it
is not a good idea to use it in production.
2013-08-05 12:05:22 +02:00
Allan
a0e986d7f2 fixed initServer fail while having no IPv6 nor IPv4 2013-07-25 15:36:00 +08:00
Allan
cba7a4e69a fixed initServer failed if no IPV4 or no IPV6 2013-07-25 15:28:33 +08:00
Allan
1e7cff23b3 fixed bug issue of #1213 2013-07-24 21:34:55 +08:00
antirez
894eba07c8 Introduction of a new string encoding: EMBSTR
Previously two string encodings were used for string objects:

1) REDIS_ENCODING_RAW: a string object with obj->ptr pointing to an sds
stirng.

2) REDIS_ENCODING_INT: a string object where the obj->ptr void pointer
is casted to a long.

This commit introduces a experimental new encoding called
REDIS_ENCODING_EMBSTR that implements an object represented by an sds
string that is not modifiable but allocated in the same memory chunk as
the robj structure itself.

The chunk looks like the following:

+--------------+-----------+------------+--------+----+
| robj data... | robj->ptr | sds header | string | \0 |
+--------------+-----+-----+------------+--------+----+
                     |                       ^
                     +-----------------------+

The robj->ptr points to the contiguous sds string data, so the object
can be manipulated with the same functions used to manipulate plan
string objects, however we need just on malloc and one free in order to
allocate or release this kind of objects. Moreover it has better cache
locality.

This new allocation strategy should benefit both the memory usage and
the performances. A performance gain between 60 and 70% was observed
during micro-benchmarks, however there is more work to do to evaluate
the performance impact and the memory usage behavior.
2013-07-22 10:31:38 +02:00
yoav
63d15dfc87 Chunked loading of RDB to prevent redis from stalling reading very large keys. 2013-07-16 15:41:24 +02:00
antirez
123b221dc9 Use the environment locale for strcoll() collation. 2013-07-12 13:38:43 +02:00
antirez
631d656a94 All IP string repr buffers are now REDIS_IP_STR_LEN bytes. 2013-07-09 11:32:52 +02:00
antirez
f19e267e9a IPv6: bind IPv4 and IPv6 interfaces by default. 2013-07-09 10:47:17 +02:00
antirez
90038906f4 Fix old anetPeerToString() API call in replication.c 2013-07-08 16:11:52 +02:00
Geoff Garside
a68e3d4c6a Cleanup main() and BACKTRACE mistaken pulled while rebasing. 2013-07-08 16:07:26 +02:00
Geoff Garside
1ca4008d14 Fix calls to anetPeerToString() missing buffer size. 2013-07-08 16:07:26 +02:00
Geoff Garside
ee5a6df101 Update calls to anetPeerToString to include ip_len. 2013-07-08 15:57:22 +02:00
antirez
98eecb70eb Binding multiple IPs done properly with multiple sockets. 2013-07-05 11:47:20 +02:00
antirez
90b0d66cce Ability to bind multiple addresses. 2013-07-04 18:50:15 +02:00
antirez
0781ad6899 getAbsolutePath() moved into utils.c 2013-07-02 11:56:52 +02:00
antirez
de9a221749 CONFIG SET maxclients. 2013-06-28 17:08:03 +02:00
antirez
3130670b97 Allow SHUTDOWN in loading state. 2013-06-27 12:18:29 +02:00
Salvatore Sanfilippo
bae60ede1d Merge pull request #1111 from yamt/netbsd3
netbsd support
2013-06-26 06:17:02 -07:00
antirez
82ea1c6f5d Move Replication Script Cache initialization in safer place.
It should be called just one time at startup and not every time the Lua
scripting engine is re-initialized, otherwise memory is leaked.
2013-06-24 19:27:49 +02:00
antirez
f0bf5fd8c7 Use the RSC to replicate EVALSHA unmodified.
This commit uses the Replication Script Cache in order to avoid
translating EVALSHA into EVAL whenever possible for both the AOF and
slaves.
2013-06-24 18:57:31 +02:00
antirez
94ec7db470 Replication of scripts as EVALSHA: sha1 caching implemented.
This code is only responsible to take an LRU-evicted fixed length cache
of SHA1 that we are sure all the slaves received.

In this commit only the implementation is provided, but the Redis core
does not use it to actually send EVALSHA to slaves when possible.
2013-06-24 10:26:04 +02:00
antirez
515a26bbc1 New API to force propagation.
The old REDIS_CMD_FORCE_REPLICATION flag was removed from the
implementation of Redis, now there is a new API to force specific
executions of a command to be propagated to AOF / Replication link:

    void forceCommandPropagation(int flags);

The new API is also compatible with Lua scripting, so a script that will
execute commands that are forced to be propagated, will also be
propagated itself accordingly even if no change to data is operated.

As a side effect, this new design fixes the issue with scripts not able
to propagate PUBLISH to slaves (issue #873).
2013-06-21 12:07:53 +02:00
antirez
455563faec PUBSUB command implemented.
Currently it implements three subcommands:

PUBSUB CHANNELS [<pattern>]    List channels with non-zero subscribers.
PUBSUB NUMSUB [channel_1 ...]  List number of subscribers for channels.
PUBSUB NUMPAT                  Return number of subscribed patterns.
2013-06-20 15:32:00 +02:00
antirez
88441bf18f New INFO field "min_slaves_good_slaves".
When min-slaves-to-write feature is active, this field reports the
number of slaves considered good (online state, lag within the specified
range).
2013-05-30 12:18:31 +02:00
antirez
2ec7875cbf min-replicas-to-write: only deny write commands.
I guess I needed another coffee...
2013-05-30 11:30:09 +02:00
antirez
ed599d3aca min-slaves-to-write: don't accept writes with less than N replicas.
This feature allows the user to specify the minimum number of
connected replicas having a lag less or equal than the specified
amount of seconds for writes to be accepted.
2013-05-30 11:30:04 +02:00
antirez
888400ebd5 repl_offset field in INFO replication is now just offset. 2013-05-29 19:56:33 +02:00
antirez
37c29e037b Slaves list in INFO output: lag added, format changed.
There is a new 'lag' information in the list of slaves, in the
"replication" section of the INFO output.

Also the format was changed in a backward incompatible way in order to
make it more easy to parse if new fields are added in the future, as the
new format is comma separated but has named fields (no longer positional
fields).
2013-05-29 19:54:44 +02:00
antirez
091ed386f7 Accept REPLCONF in any state. 2013-05-28 15:26:20 +02:00
antirez
efd87031d0 Don't ACK the master after every command.
Sending an ACK is now moved into the replicationSendAck() function.
2013-05-27 11:42:35 +02:00
antirez
0292c5f7ae Replication: send REPLCONF ACK to master. 2013-05-27 11:42:25 +02:00
YAMAMOTO Takashi
9fcead7a59 don't assume time_t == long
time_t is always 64bit on recent versions of NetBSD.
2013-05-17 17:22:39 +09:00
antirez
310dbba01c Added a define for most configuration defaults.
Also the logfile option was modified to always have an explicit value
and to log to stdout when an empty string is used as log file.

Previously there was special handling of the string "stdout" that set
the logfile to NULL, this always required some special handling.
2013-05-15 10:12:29 +02:00
antirez
c184f36d21 CONFIG REWRITE: support for client-output-buffer-limit. 2013-05-13 18:34:18 +02:00
antirez
7e049fafd3 CONFIG REWRITE: Initial support code and design. 2013-05-13 11:11:12 +02:00
antirez
5947f170f9 Obtain absoute path of configuration file, expose it in INFO. 2013-05-09 16:57:59 +02:00
antirez
d264122f6a Config option to turn AOF rewrite incremental fsync on/off. 2013-04-24 10:57:07 +02:00
antirez
9d823fc222 More explicit panic message on out of memory. 2013-04-19 15:11:34 +02:00
antirez
05fa4f4034 Cluster: node timeout is now configurable. 2013-04-04 12:29:10 +02:00
antirez
b237de33d1 Throttle BGSAVE attempt on saving error.
When a BGSAVE fails, Redis used to flood itself trying to BGSAVE at
every next cron call, that is either 10 or 100 times per second
depending on configuration and server version.

This commit does not allow a new automatic BGSAVE attempt to be
performed before a few seconds delay (currently 5).

This avoids both the auto-flood problem and filling the disk with
logs at a serious rate.

The five seconds limit, considering a log entry of 200 bytes, will use
less than 4 MB of disk space per day that is reasonable, the sysadmin
should notice before of catastrofic events especially since by default
Redis will stop serving write queries after the first failed BGSAVE.

This fixes issue #849
2013-04-02 14:05:50 +02:00
antirez
30d5d416e6 Extended SET command implemented (issue #931). 2013-03-28 15:40:19 +01:00
antirez
32a83c8206 DEBUG set-active-expire added.
We need the ability to disable the activeExpireCycle() (active
expired key collection) call for testing purposes.
2013-03-27 17:55:02 +01:00
antirez
df69155e8a Allow SELECT while loading the DB.
Fixes issue #1024.
2013-03-26 13:51:17 +01:00
antirez
8bb5eb7357 Flag PUBLISH as read-only in the command table. 2013-03-26 11:09:22 +01:00
antirez
1902a9c532 Replication: master_link_down_since_seconds initial value should be huge.
server.repl_down_since used to be initialized to the current time at
startup. This is wrong since the replication never started. Clients
testing this filed to check if data is uptodate should never believe
data is recent if we never ever connected to our master.
2013-03-13 12:54:48 +01:00
Salvatore Sanfilippo
9925c7c670 Merge pull request #1001 from djanowski/fatal-errors-rdb-load
Abort when opening the RDB file results in an error other than ENOENT.
2013-03-12 11:40:36 -07:00
Damian Janowski
4178a80282 Abort when opening the RDB file results in an error other than ENOENT.
This fixes cases where the RDB file does exist but can't be accessed for
any reason. For instance, when the Redis process doesn't have enough
permissions on the file.
2013-03-12 14:37:50 -03:00
antirez
215bfaea16 Set default for stop_writes_on_bgsave_err in initServerConfig().
It was placed for error in initServer() that's called after the
configuation is already loaded, causing issue #1000.
2013-03-12 18:34:08 +01:00
antirez
2d851333a6 activeExpireCycle() smarter with many DBs and under expire pressure.
activeExpireCycle() tries to test just a few DBs per iteration so that
it scales if there are many configured DBs in the Redis instance.
However this commit makes it a bit smarter when one a few of those DBs
are under expiration pressure and there are many many keys to expire.

What we do is to remember if in the last iteration had to return because
we ran out of time. In that case the next iteration we'll test all the
configured DBs so that we are sure we'll test again the DB under
pressure.

Before of this commit after some mass-expire in a given DB the function
tested just a few of the next DBs, possibly empty, a few per iteration,
so it took a long time for the function to reach again the DB under
pressure. This resulted in a lot of memory being used by already expired
keys and never accessed by clients.
2013-03-11 11:10:33 +01:00
antirez
08b107e405 In databasesCron() never test more DBs than we have. 2013-03-11 10:51:03 +01:00
antirez
4b1ccdfd49 Make comment name match var name in activeExpireCycle(). 2013-03-11 10:42:14 +01:00
antirez
1f7d2c1e27 Optimize inner loop of activeExpireCycle() for no-expires case. 2013-03-09 11:48:54 +01:00