Commit Graph

518 Commits

Author SHA1 Message Date
antirez
2d6eb68993 Sentinel: allow SHUTDOWN command in Sentinel mode. 2014-02-07 11:22:24 +01:00
antirez
4919a13f50 CLIENT PAUSE and related API implemented.
The API is one of the bulding blocks of CLUSTER FAILOVER command that
executes a manual failover in Redis Cluster. However exposed as a
command that the user can call directly, it makes much simpler to
upgrade a standalone Redis instance using a slave in a safer way.

The commands works like that:

    CLIENT PAUSE <milliesconds>

All the clients that are not slaves and not in MONITOR state are paused
for the specified number of milliesconds. This means that slaves are
normally served in the meantime.

At the end of the specified amount of time all the clients are unblocked
and will continue operations normally. This command has no effects on
the population of the slow log, since clients are not blocked in the
middle of operations but only when there is to process new data.

Note that while the clients are unblocked, still new commands are
accepted and queued in the client buffer, so clients will likely not
block while writing to the server while the pause is active.
2014-02-04 16:16:09 +01:00
antirez
b770079f2c Allow CONFIG and SHUTDOWN while in stale-slave state. 2014-02-03 15:51:03 +01:00
antirez
7be946fde2 Option "backlog" renamed "tcp-backlog".
This is especially important since we already have a concept of backlog
(the replication backlog).
2014-01-31 14:56:10 +01:00
Nenad Merdanovic
d76aa96d1a Add support for listen(2) backlog definition
In high RPS environments, the default listen backlog is not sufficient, so
giving users the power to configure it is the right approach, especially
since it requires only minor modifications to the code.
2014-01-31 14:52:10 +01:00
antirez
a7d30681c9 Cluster: configurable replicas migration barrier.
It is possible to configure the min number of additional working slaves
a master should be left with, for a slave to migrate to an orphaned
master.
2014-01-31 11:26:36 +01:00
antirez
72f1715e45 Fixed inverted if condition in MISCONF error code path. 2014-01-28 10:11:12 +01:00
antirez
28273394cb Cluster: support to read from slave nodes.
A client can enter a special cluster read-only mode using the READONLY
command: if the client read from a slave instance after this command,
for slots that are actually served by the instance's master, the queries
will be processed without redirection, allowing clients to read from
slaves (but without any kind fo read-after-write guarantee).

The READWRITE command can be used in order to exit the readonly state.
2014-01-14 16:33:16 +01:00
antirez
7e9433cee1 Configuring port to 0 disables IP socket as specified.
This was no longer the case with 2.8 becuase of a bug introduced with
the IPv6 support. Now it is fixed.

This fixes issue #1287 and #1477.
2013-12-23 11:31:35 +01:00
Yubao Liu
7da423f79f CONFIG REWRITE: don't throw some options on config rewrite
Those options will be thrown without this patch:
  include, rename-command, min-slaves-to-write, min-slaves-max-lag,
appendfilename.
2013-12-19 15:56:48 +01:00
antirez
a5ec247f13 Replication: publish the slave_repl_offset when disconnected from master.
When a slave was disconnected from its master the replication offset was
reported as -1. Now it is reported as the replication offset of the
previous master, so that failover can be performed using this value in
order to try to select a slave with more processed data from a set of
slaves of the old master.
2013-12-11 15:23:15 +01:00
antirez
11e81a1e9a Fixed grammar: before H the article is a, not an. 2013-12-05 16:35:32 +01:00
antirez
58713c6b13 Fix clients timeout handling.
During the refactoring of blocking operations, commit
82b672f633, a bug was introduced where
a milliseconds time is compared to a seconds time, so all the clients
always appear to timeout if timeout is set to non-zero value.

Thanks to Jonathan Leibiusky for finding the bug and helping verifying
the cause and fix.
2013-12-05 14:55:07 +01:00
antirez
c5618e7fdd WAIT command: synchronous replication for Redis. 2013-12-04 16:20:03 +01:00
antirez
82b672f633 BLPOP blocking code refactored to be generic & reusable. 2013-12-03 17:43:53 +01:00
antirez
8f18345ef0 Cluster: basic data structures for nodes black list. 2013-11-29 17:37:06 +01:00
antirez
55f90b11c9 Stop writes on MISCONF only if instance is a master.
From the point of view of the slave not accepting writes from the master
can only create a bigger consistency issue.
2013-11-28 16:29:26 +01:00
antirez
60817bb262 Reply to PING with error when there is a MISCONF state. 2013-11-28 16:17:10 +01:00
antirez
297de1ab26 Sentinel: test for writable config file.
This commit introduces a funciton called when Sentinel is ready for
normal operations to avoid putting Sentinel specific stuff in redis.c.
2013-11-21 12:28:15 +01:00
antirez
37a51a2568 Sentinel: distinguish between is-master-down-by-addr requests.
Some are just to know if the master is down, and in this case the runid
in the request is set to "*", others are actually in order to seek for a
vote and get elected. In the latter case the runid is set to the runid
of the instance seeking for the vote.
2013-11-19 16:50:04 +01:00
antirez
2c643ffa8d ZSCAN implemented. 2013-10-28 11:36:42 +01:00
antirez
e50090aa06 HSCAN implemented. 2013-10-28 11:35:26 +01:00
antirez
4a1f1cc0d7 SSCAN implemented. 2013-10-28 11:17:32 +01:00
antirez
cd8cb49dc4 SCAN is a random command and does not require output sorting.
Sorting the output helps when we want to turn a non-deterministic into a
deterministic command, in that case this is not possible.
2013-10-28 11:13:43 +01:00
Pieter Noordhuis
7a6cfb18f3 SCAN requires at least 1 argument 2013-10-25 10:49:56 +02:00
Pieter Noordhuis
7f490b197f Add SCAN command 2013-10-25 10:49:48 +02:00
antirez
ba42428633 Cluster: time switched from seconds to milliseconds.
All the internal state of cluster involving time is now using mstime_t
and mstime() in order to use milliseconds resolution.

Also the clusterCron() function is called with a 10 hz frequency instead
of 1 hz.

The cluster node_timeout must be also configured in milliseconds by the
user in redis.conf.
2013-10-09 16:19:26 +02:00
antirez
929b6a4480 Cluster: cluster stuff moved from redis.h to cluster.h. 2013-10-09 15:38:05 +02:00
antirez
7c4b8f29e7 Cluster: react faster when a slave wins an election. 2013-09-26 16:54:43 +02:00
antirez
7bec743e66 Allow AUTH / PING when disconnected from slave and serve-stale-data is no. 2013-09-17 09:46:06 +02:00
antirez
003cc8a4f5 Only run the fast active expire cycle if master & enabled. 2013-08-27 09:31:55 +02:00
antirez
4f310e05c0 Opening TCP listening ports refactored into a function. 2013-08-22 14:01:16 +02:00
antirez
0f0cc88589 Print error message when can't bind * on any address. 2013-08-22 13:02:59 +02:00
antirez
35a977c499 Fix for issue #1214 simplified. 2013-08-21 11:36:09 +02:00
Salvatore Sanfilippo
038e356dbc Merge pull request #1214 from kaoshijuan/unstable
fixed initServer fail problem
2013-08-21 02:18:41 -07:00
antirez
112fa47978 Add per-db average TTL information in INFO output.
Example:

db0:keys=221913,expires=221913,avg_ttl=655

The algorithm uses a running average with only two samples (current and
previous). Keys found to be expired are considered at TTL zero even if
the actual TTL can be negative.

The TTL is reported in milliseconds.
2013-08-06 15:00:43 +02:00
antirez
4befe73b60 activeExpireCycle(): fix about fast cycle early start.
We don't want to repeat a fast cycle too soon, the previous code was
broken, we need to wait two times the period *since* the start of the
previous cycle in order to avoid there is an even space between cycles:

.-> start                   .-> second start
|                           |
+-------------+-------------+--------------+
| first cycle |    pause    | second cycle |
+-------------+-------------+--------------+

The second and first start must be PERIOD*2 useconds apart hence the *2
in the new code.
2013-08-06 12:59:04 +02:00
antirez
6500fabfb8 Some activeExpireCycle() refactoring. 2013-08-06 12:55:49 +02:00
antirez
d398f38879 Remove dead code and fix comments for new expire code. 2013-08-06 12:36:13 +02:00
antirez
66a26471dc Darft #2 for key collection algo: more improvements.
This commit makes the fast collection cycle time configurable, at
the same time it does not allow to run a new fast collection cycle
for the same amount of time as the max duration of the fast
collection cycle.
2013-08-05 16:14:28 +02:00
antirez
b09ea1bd90 Draft #1 of a new expired keys collection algorithm.
The main idea here is that when we are no longer to expire keys at the
rate the are created, we can't block more in the normal expire cycle as
this would result in too big latency spikes.

For this reason the commit introduces a "fast" expire cycle that does
not run for more than 1 millisecond but is called in the beforeSleep()
hook of the event loop, so much more often, and with a frequency bound
to the frequency of executed commnads.

The fast expire cycle is only called when the standard expiration
algorithm runs out of time, that is, consumed more than
REDIS_EXPIRELOOKUPS_TIME_PERC of CPU in a given cycle without being able
to take the number of already expired keys that are yet not collected
to a number smaller than 25% of the number of keys.

You can test this commit with different loads, but a simple way is to
use the following:

Extreme load with pipelining:

redis-benchmark -r 100000000 -n 100000000  \
        -P 32 set ele:rand:000000000000 foo ex 2

Remove the -P32 in order to avoid the pipelining for a more real-world
load.

In another terminal tab you can monitor the Redis behavior with:

redis-cli -i 0.1 -r -1 info keyspace

and

redis-cli --latency-history

Note: this commit will make Redis printing a lot of debug messages, it
is not a good idea to use it in production.
2013-08-05 12:05:22 +02:00
Allan
a0e986d7f2 fixed initServer fail while having no IPv6 nor IPv4 2013-07-25 15:36:00 +08:00
Allan
cba7a4e69a fixed initServer failed if no IPV4 or no IPV6 2013-07-25 15:28:33 +08:00
Allan
1e7cff23b3 fixed bug issue of #1213 2013-07-24 21:34:55 +08:00
antirez
894eba07c8 Introduction of a new string encoding: EMBSTR
Previously two string encodings were used for string objects:

1) REDIS_ENCODING_RAW: a string object with obj->ptr pointing to an sds
stirng.

2) REDIS_ENCODING_INT: a string object where the obj->ptr void pointer
is casted to a long.

This commit introduces a experimental new encoding called
REDIS_ENCODING_EMBSTR that implements an object represented by an sds
string that is not modifiable but allocated in the same memory chunk as
the robj structure itself.

The chunk looks like the following:

+--------------+-----------+------------+--------+----+
| robj data... | robj->ptr | sds header | string | \0 |
+--------------+-----+-----+------------+--------+----+
                     |                       ^
                     +-----------------------+

The robj->ptr points to the contiguous sds string data, so the object
can be manipulated with the same functions used to manipulate plan
string objects, however we need just on malloc and one free in order to
allocate or release this kind of objects. Moreover it has better cache
locality.

This new allocation strategy should benefit both the memory usage and
the performances. A performance gain between 60 and 70% was observed
during micro-benchmarks, however there is more work to do to evaluate
the performance impact and the memory usage behavior.
2013-07-22 10:31:38 +02:00
yoav
63d15dfc87 Chunked loading of RDB to prevent redis from stalling reading very large keys. 2013-07-16 15:41:24 +02:00
antirez
123b221dc9 Use the environment locale for strcoll() collation. 2013-07-12 13:38:43 +02:00
antirez
631d656a94 All IP string repr buffers are now REDIS_IP_STR_LEN bytes. 2013-07-09 11:32:52 +02:00
antirez
f19e267e9a IPv6: bind IPv4 and IPv6 interfaces by default. 2013-07-09 10:47:17 +02:00
antirez
90038906f4 Fix old anetPeerToString() API call in replication.c 2013-07-08 16:11:52 +02:00
Geoff Garside
a68e3d4c6a Cleanup main() and BACKTRACE mistaken pulled while rebasing. 2013-07-08 16:07:26 +02:00
Geoff Garside
1ca4008d14 Fix calls to anetPeerToString() missing buffer size. 2013-07-08 16:07:26 +02:00
Geoff Garside
ee5a6df101 Update calls to anetPeerToString to include ip_len. 2013-07-08 15:57:22 +02:00
antirez
98eecb70eb Binding multiple IPs done properly with multiple sockets. 2013-07-05 11:47:20 +02:00
antirez
90b0d66cce Ability to bind multiple addresses. 2013-07-04 18:50:15 +02:00
antirez
0781ad6899 getAbsolutePath() moved into utils.c 2013-07-02 11:56:52 +02:00
antirez
de9a221749 CONFIG SET maxclients. 2013-06-28 17:08:03 +02:00
antirez
3130670b97 Allow SHUTDOWN in loading state. 2013-06-27 12:18:29 +02:00
Salvatore Sanfilippo
bae60ede1d Merge pull request #1111 from yamt/netbsd3
netbsd support
2013-06-26 06:17:02 -07:00
antirez
82ea1c6f5d Move Replication Script Cache initialization in safer place.
It should be called just one time at startup and not every time the Lua
scripting engine is re-initialized, otherwise memory is leaked.
2013-06-24 19:27:49 +02:00
antirez
f0bf5fd8c7 Use the RSC to replicate EVALSHA unmodified.
This commit uses the Replication Script Cache in order to avoid
translating EVALSHA into EVAL whenever possible for both the AOF and
slaves.
2013-06-24 18:57:31 +02:00
antirez
94ec7db470 Replication of scripts as EVALSHA: sha1 caching implemented.
This code is only responsible to take an LRU-evicted fixed length cache
of SHA1 that we are sure all the slaves received.

In this commit only the implementation is provided, but the Redis core
does not use it to actually send EVALSHA to slaves when possible.
2013-06-24 10:26:04 +02:00
antirez
515a26bbc1 New API to force propagation.
The old REDIS_CMD_FORCE_REPLICATION flag was removed from the
implementation of Redis, now there is a new API to force specific
executions of a command to be propagated to AOF / Replication link:

    void forceCommandPropagation(int flags);

The new API is also compatible with Lua scripting, so a script that will
execute commands that are forced to be propagated, will also be
propagated itself accordingly even if no change to data is operated.

As a side effect, this new design fixes the issue with scripts not able
to propagate PUBLISH to slaves (issue #873).
2013-06-21 12:07:53 +02:00
antirez
455563faec PUBSUB command implemented.
Currently it implements three subcommands:

PUBSUB CHANNELS [<pattern>]    List channels with non-zero subscribers.
PUBSUB NUMSUB [channel_1 ...]  List number of subscribers for channels.
PUBSUB NUMPAT                  Return number of subscribed patterns.
2013-06-20 15:32:00 +02:00
antirez
88441bf18f New INFO field "min_slaves_good_slaves".
When min-slaves-to-write feature is active, this field reports the
number of slaves considered good (online state, lag within the specified
range).
2013-05-30 12:18:31 +02:00
antirez
2ec7875cbf min-replicas-to-write: only deny write commands.
I guess I needed another coffee...
2013-05-30 11:30:09 +02:00
antirez
ed599d3aca min-slaves-to-write: don't accept writes with less than N replicas.
This feature allows the user to specify the minimum number of
connected replicas having a lag less or equal than the specified
amount of seconds for writes to be accepted.
2013-05-30 11:30:04 +02:00
antirez
888400ebd5 repl_offset field in INFO replication is now just offset. 2013-05-29 19:56:33 +02:00
antirez
37c29e037b Slaves list in INFO output: lag added, format changed.
There is a new 'lag' information in the list of slaves, in the
"replication" section of the INFO output.

Also the format was changed in a backward incompatible way in order to
make it more easy to parse if new fields are added in the future, as the
new format is comma separated but has named fields (no longer positional
fields).
2013-05-29 19:54:44 +02:00
antirez
091ed386f7 Accept REPLCONF in any state. 2013-05-28 15:26:20 +02:00
antirez
efd87031d0 Don't ACK the master after every command.
Sending an ACK is now moved into the replicationSendAck() function.
2013-05-27 11:42:35 +02:00
antirez
0292c5f7ae Replication: send REPLCONF ACK to master. 2013-05-27 11:42:25 +02:00
YAMAMOTO Takashi
9fcead7a59 don't assume time_t == long
time_t is always 64bit on recent versions of NetBSD.
2013-05-17 17:22:39 +09:00
antirez
310dbba01c Added a define for most configuration defaults.
Also the logfile option was modified to always have an explicit value
and to log to stdout when an empty string is used as log file.

Previously there was special handling of the string "stdout" that set
the logfile to NULL, this always required some special handling.
2013-05-15 10:12:29 +02:00
antirez
c184f36d21 CONFIG REWRITE: support for client-output-buffer-limit. 2013-05-13 18:34:18 +02:00
antirez
7e049fafd3 CONFIG REWRITE: Initial support code and design. 2013-05-13 11:11:12 +02:00
antirez
5947f170f9 Obtain absoute path of configuration file, expose it in INFO. 2013-05-09 16:57:59 +02:00
antirez
d264122f6a Config option to turn AOF rewrite incremental fsync on/off. 2013-04-24 10:57:07 +02:00
antirez
9d823fc222 More explicit panic message on out of memory. 2013-04-19 15:11:34 +02:00
antirez
05fa4f4034 Cluster: node timeout is now configurable. 2013-04-04 12:29:10 +02:00
antirez
b237de33d1 Throttle BGSAVE attempt on saving error.
When a BGSAVE fails, Redis used to flood itself trying to BGSAVE at
every next cron call, that is either 10 or 100 times per second
depending on configuration and server version.

This commit does not allow a new automatic BGSAVE attempt to be
performed before a few seconds delay (currently 5).

This avoids both the auto-flood problem and filling the disk with
logs at a serious rate.

The five seconds limit, considering a log entry of 200 bytes, will use
less than 4 MB of disk space per day that is reasonable, the sysadmin
should notice before of catastrofic events especially since by default
Redis will stop serving write queries after the first failed BGSAVE.

This fixes issue #849
2013-04-02 14:05:50 +02:00
antirez
30d5d416e6 Extended SET command implemented (issue #931). 2013-03-28 15:40:19 +01:00
antirez
32a83c8206 DEBUG set-active-expire added.
We need the ability to disable the activeExpireCycle() (active
expired key collection) call for testing purposes.
2013-03-27 17:55:02 +01:00
antirez
df69155e8a Allow SELECT while loading the DB.
Fixes issue #1024.
2013-03-26 13:51:17 +01:00
antirez
8bb5eb7357 Flag PUBLISH as read-only in the command table. 2013-03-26 11:09:22 +01:00
antirez
1902a9c532 Replication: master_link_down_since_seconds initial value should be huge.
server.repl_down_since used to be initialized to the current time at
startup. This is wrong since the replication never started. Clients
testing this filed to check if data is uptodate should never believe
data is recent if we never ever connected to our master.
2013-03-13 12:54:48 +01:00
Salvatore Sanfilippo
9925c7c670 Merge pull request #1001 from djanowski/fatal-errors-rdb-load
Abort when opening the RDB file results in an error other than ENOENT.
2013-03-12 11:40:36 -07:00
Damian Janowski
4178a80282 Abort when opening the RDB file results in an error other than ENOENT.
This fixes cases where the RDB file does exist but can't be accessed for
any reason. For instance, when the Redis process doesn't have enough
permissions on the file.
2013-03-12 14:37:50 -03:00
antirez
215bfaea16 Set default for stop_writes_on_bgsave_err in initServerConfig().
It was placed for error in initServer() that's called after the
configuation is already loaded, causing issue #1000.
2013-03-12 18:34:08 +01:00
antirez
2d851333a6 activeExpireCycle() smarter with many DBs and under expire pressure.
activeExpireCycle() tries to test just a few DBs per iteration so that
it scales if there are many configured DBs in the Redis instance.
However this commit makes it a bit smarter when one a few of those DBs
are under expiration pressure and there are many many keys to expire.

What we do is to remember if in the last iteration had to return because
we ran out of time. In that case the next iteration we'll test all the
configured DBs so that we are sure we'll test again the DB under
pressure.

Before of this commit after some mass-expire in a given DB the function
tested just a few of the next DBs, possibly empty, a few per iteration,
so it took a long time for the function to reach again the DB under
pressure. This resulted in a lot of memory being used by already expired
keys and never accessed by clients.
2013-03-11 11:10:33 +01:00
antirez
08b107e405 In databasesCron() never test more DBs than we have. 2013-03-11 10:51:03 +01:00
antirez
4b1ccdfd49 Make comment name match var name in activeExpireCycle(). 2013-03-11 10:42:14 +01:00
antirez
1f7d2c1e27 Optimize inner loop of activeExpireCycle() for no-expires case. 2013-03-09 11:48:54 +01:00
antirez
5f5aa487f9 REDIS_DBCRON_DBS_PER_SEC -> REDIS_DBCRON_DBS_PER_CALL 2013-03-09 11:44:20 +01:00
antirez
db29d71a30 activeExpireCycle(): process only a small number of DBs per iteration.
This small number of DBs is set to 16 so actually in the default
configuraiton Redis should behave exactly like in the past.
However the difference is that when the user configures a very large
number of DBs we don't do an O(N) operation, consuming a non trivial
amount of CPU per serverCron() iteration.
2013-03-08 17:48:58 +01:00
antirez
40a2da159c Use unsigned integers for DB ids, for defined wrap-to-zero. 2013-03-08 17:41:20 +01:00
antirez
7ac3b3a486 Only resize/rehash a few databases per cron iteration.
This is the first step to lower the CPU usage when many databases are
configured. The other is to also process a limited number of DBs per
call in the active expire cycle.
2013-03-08 14:01:12 +01:00
antirez
dfd732dff3 Actually call databasesCron() inside serverCron(). 2013-03-08 13:59:50 +01:00
antirez
cd9dcd1835 Move Redis databases background processing to databasesCron(). 2013-03-08 12:34:05 +01:00
antirez
7b190a08cf API to lookup commands with their original name.
A new server.orig_commands table was added to the server structure, this
contains a copy of the commant table unaffected by rename-command
statements in redis.conf.

A new API lookupCommandOrOriginal() was added that checks both tables,
new first, old later, so that rewriteClientCommandVector() and friends
can lookup commands with their new or original name in order to fix the
client->cmd pointer when the argument vector is renamed.

This fixes the segfault of issue #986, but does not fix a wider range of
problems resulting from renaming commands that actually operate on data
and are registered into the AOF file or propagated to slaves... That is
command renaming should be handled with care.
2013-03-06 16:28:26 +01:00