redict

mirror of https://codeberg.org/redict/redict.git synced 2025-01-24 00:59:02 -05:00

Author	SHA1	Message	Date
Salvatore Sanfilippo	c9f0456d81	Merge pull request #3673 from badboy/reset-ttl-on-migrating Reset the ttl for additional keys	2016-12-14 12:41:00 +01:00
antirez	04542cff92	Replication: fix the infamous key leakage of writable slaves + EXPIRE. BACKGROUND AND USE CASE Redis slaves are normally read only, however they support a "writable" mode which is very handy when scaling reads on slaves, that actually need write operations in order to access data. For instance imagine having slaves replicating certain Sets keys from the master. When accessing the data on the slave, we want to perform intersections between such Sets values. However we don't want to intersect each time: to cache the intersection for some time often is a good idea. To do so, it is possible to setup a slave as a writable slave, and perform the intersection on the slave side, perhaps setting a TTL on the resulting key so that it will expire after some time. THE BUG Problem: in order to have a consistent replication, expiring of keys in Redis replication is up to the master, that synthesizes DEL operations to send in the replication stream. However slaves logically expire keys by hiding them from read attempts from clients so that if the master did not promptly send a DEL, the client still sees logically expired keys as non existing. Because slaves don't actively expire keys by actually evicting them but just masking from the POV of read operations, if a key is created in a writable slave, and an expire is set, the key will be leaked forever: 1. No DEL will be received from the master, which does not know about such a key at all. 2. No eviction will be performed by the slave, since it needs to disable eviction because it's up to masters, otherwise consistency of data is lost. THE FIX In order to fix the problem, the slave should be able to tag keys that were created in the slave side and have an expire set in some way. My solution involved using a unique additional dictionary created by the writable slave only if needed. The dictionary is obviously keyed by the key name that we need to track: all the keys that are set with an expire directly by a client writing to the slave are tracked. The value in the dictionary is a bitmap of all the DBs where such a key name needs to be tracked, so that we can use a single dictionary to track keys in all the DBs used by the slave (actually this limits the solution to the first 64 DBs, but the default with Redis is to use 16 DBs). This solution allows to pay both a small complexity and CPU penalty, which is zero when the feature is not used, actually. The slave-side eviction is encapsulated in code which is not coupled with the rest of the Redis core, if not for the hook to track the keys. TODO I'm doing the first smoke tests to see if the feature works as expected: so far so good. Unit tests should be added before merging into the 4.0 branch.	2016-12-13 10:59:54 +01:00
Jan-Erik Rediger	2a32f0371e	Reset the ttl for additional keys Before, if a previous key had a TTL set but the current one didn't, the TTL was reused and thus resulted in wrong expirations set. This behaviour was experienced, when `MigrateDefaultPipeline` in redis-trib was set to >1 Fixes #3655	2016-12-08 14:27:21 +01:00
andyli	8abf9729f0	Modify MIN->MAX	2016-11-29 16:34:41 +08:00
antirez	cfdb3a2214	Cluster: handle zero bytes at the end of nodes.conf.	2016-11-16 14:13:18 +01:00
antirez	a3f893b800	RESTORE: accept RDB dumps with older versions. Reference issue #3218. Checking the code I can't find a reason why the original RESTORE code was so opinionated about restoring only the current version. The code in `rdb.c` appears to be capable as always to restore data from older versions of Redis, and the only place where the current version is needed in order to correctly restore data is while loading the opcodes, not the values themselves as happens in the case of RESTORE. For the above reasons, this commit enables RESTORE to accept older versions of values payloads.	2016-06-16 15:53:57 +02:00
antirez	971e3c51b6	Cluster: make getNodeByQuery() responsible of -CLUSTERDOWN errors. This fixes a bug introduced by `d827dbf`, and makes the code consistent with the logic of always allowing, while the cluster is down, commands that don't target any key. As a side effect the code is also simpler now.	2016-05-05 11:33:43 +02:00
Salvatore Sanfilippo	330715afd8	Merge pull request #3039 from itamarhaber/patch-3 Fixes a typo in the comments	2016-05-05 10:15:17 +02:00
antirez	4fdde78c72	New masters with slots are now targets of migration if others are. This fixes issue #3043. Before this fix, after a complete resharding of a master slots to other nodes, the master remains empty and the slaves migrate away to other masters with non-zero nodes. However the old master now empty, is no longer considered a target for migration, because the system has no way to tell it had slaves in the past. This fix leaves the algorithm used in the past untouched, but adds a new rule. When a new or old master which is empty and without slaves is assigned its first slot, if other masters in the cluster have slaves, it is automatically considered to be a target for replicas migration.	2016-05-02 18:37:30 +02:00
antirez	b841f3ad1a	Cluster: store busport with different separator in CLUSTER NODES. We need to be able to correctly parse the node address in the case of IPv6 addresses.	2016-02-02 08:20:04 +01:00
antirez	92b9de2417	Cluster announce: WIP, allow building again.	2016-02-01 18:16:25 +01:00
antirez	e27b9b1cec	Merge branch 'cluster-docker' into unstable	2016-02-01 18:01:22 +01:00
antirez	c285862621	Cluster: include node IDs in SLOTS output. CLUSTER SLOTS now includes IDs in the nodes description associated with a given slot range. Certain client libraries implementations need a way to reference a node in an unique way, so they were relying on CLUSTER NODES, that is not a stable API and may change frequently depending on Redis Cluster future requirements.	2016-01-29 12:00:40 +01:00
antirez	d0a8512eda	Cluster announce-ip/port WIP.	2016-01-29 09:06:37 +01:00
antirez	4abf486ca3	Cluster announce port: set port/bport for myself at startup.	2016-01-29 09:06:37 +01:00
antirez	1c038379f7	Cluster: persist bus port in nodes.conf.	2016-01-29 09:06:37 +01:00
antirez	dc98907e50	Cluster announce ip: take myself->ip always in sync.	2016-01-29 09:06:37 +01:00
antirez	11436b1449	Cluster announce ip / port initial implementation.	2016-01-29 09:06:37 +01:00
Itamar Haber	9e46bf22ed	Fixes a typo	2016-01-28 21:47:18 +02:00
antirez	5bbb09ed2c	Cluster: check packets length before accessing far fields.	2016-01-27 16:35:21 +01:00
antirez	fe44a7cb60	Cluster: mismatch sender ID log put back at DEBUG level.	2016-01-26 14:21:18 +01:00
antirez	d6c5922f75	Cluster: fix missing ntohs() call to access gossip section port.	2016-01-26 14:18:13 +01:00
antirez	592419b4ca	Better address update strategy when processing gossip sections. The change covers the case where: 1. There is a node we can't reach (in fail or pfail state). 2. We see a different address for this node, in the gossip section sent to us by a node that, instead, is able to talk with the node we cannot talk to. In this case it's a good bet to switch to the address reported by this node, since there was an address switch and it is able to talk with the node and we are not. However previously this was done in a dangerous way, by initiating a handshake. The handshake, using the MEET packet, forces the receiver to join our cluster, and this is not a good idea. If the node in question really just switched address, but is the same node, it already knows about us, so we just need to perform an address update and a reconnection. So with this commit instead we just update the address of the node, release the node link if any, and attempt to reconnect in the next clusterCron() cycle. The commit also improves debugging messages printed by Cluster during address or ID switches.	2016-01-26 12:32:53 +01:00
antirez	83b862a30e	Minor MIGRATE refactoring. Centralize cleanup of newargv in a single place. Add more comments to help a bit following a complex function. Related to issue #3016.	2016-01-19 09:53:04 +01:00
antirez	f5a1e608cc	More variadic MIGRATE fixes. Another leak was fixed in the case of syntax error by restructuring the allocation strategy for the two dynamic vectors. We also make sure to always close the cached socket on I/O errors so that all the I/O errors are handled the same, even if we had a previously queued error of a different kind from the destination server. Thanks to Kevin McGehee. Related to issue #3016.	2016-01-19 09:28:43 +01:00
antirez	00d3a40f82	Various fixes to MIGRATE with multiple keys. In issue #3016 Kevin McGehee identified multiple very serious issues in the new implementation of MIGRATE. This commit attempts to restructure the code in order to avoid mistakes; an analysis of the new implementation is in progress in order to check for possible edge cases.	2016-01-18 16:49:21 +01:00
antirez	fc3ca8ff87	Cluster: fix setting nodes slaveof pointer to NULL on node release. With this commit we preserve the list of nodes that have .slaveof set to the node, even when the node is turned into a slave, and make sure to fix the .slaveof pointers to NULL when a node is freed from memory, regardless of the fact it's a slave or a master. Basically we try to remember the logical master in the current configuration even if the logical master advertised it as a slave already. However we still remember the associations, so that when a node is freed we can fix them. This should fix issue #3002.	2016-01-14 17:34:49 +01:00
antirez	02c40c9dc2	CLUSTER BUMPEPOCH initial implementation fixed.	2016-01-11 15:39:11 +01:00
antirez	b58796f520	Cluster: CLUSTER BUMPEPOCH introduced to help redis-trib fix. Sometimes during "fixes" we have to setup a new configuration and assign slots to nodes. With BUMPEPOCH we can make sure the new configuration of the node will win if there are conflicting configurations (for example another node is also claiming the same slot because the cluster is totally messed up).	2016-01-11 15:01:14 +01:00
antirez	524be1e465	Cluster: don't allow CLUSTER SETSLOT with slaves.	2016-01-11 15:00:45 +01:00
antirez	e15e518a67	Allow MIGRATE to always be called on local keys for open slots. Extend the MIGRATE extra freedom to be able to be called in the context of the local slot, anytime there is a slot open in one or the other direction (importing or migrating). This is useful for redis-trib to fix the cluster when it is in an odd state. This fix allows "redis-trib fix" to do its work in certain cases where previously an error was reported.	2016-01-08 15:04:16 +01:00
antirez	36704d653b	Fix typos & grammar in clusterBumpConfigEpochWithoutConsensus() comment.	2016-01-08 12:07:54 +01:00
antirez	00d637f2cc	Cluster: don't send -ASK to MIGRATE. For non existing keys, we don't want to send -ASK redirections to MIGRATE, since when moving slots from the migrating node to the importing node, we want just to ignore keys that are no longer there. They may be expired or deleted between the GETKEYSINSLOT call and the MIGRATE call. Otherwise this causes an error during migrations with redis-trib (or equivalent cluster management tools).	2016-01-06 12:14:49 +01:00
antirez	b9aeb98156	Suppress harmless warnings.	2015-12-16 12:36:32 +01:00
antirez	ac0a731057	MIGRATE: Fix new argument rewriting refcount handling.	2015-12-11 14:26:41 +01:00
antirez	d85fc1e9cf	MIGRATE: fix replies processing and argument rewriting. We need to process replies after errors in order to delete keys successfully transferred. Also argument rewriting was fixed since it was broken in several ways. Now a fresh argument vector is created and set if we are acknowledged of at least one key.	2015-12-11 14:04:47 +01:00
antirez	9ebf7a6776	Pipelined multiple keys MIGRATE.	2015-12-11 13:38:26 +01:00
antirez	adc2fe6993	Cluster: replica migration with delay. We wait a fixed amount of time (5 seconds currently) much greater than the usual Cluster node to node communication latency, before migrating. This way when a failover occurs, before detecting the new master as a target for migration, we give the time to its natural slaves (the slaves of the failed over master) to announce they switched to the new master, preventing an useless migration operation.	2015-12-11 09:19:06 +01:00
antirez	4159055f83	Remove debugging message left there for error.	2015-12-10 08:56:33 +01:00
antirez	e0f22df995	Fix replicas migration by adding a new flag. Some time ago I broke replicas migration (reported in #2924). The idea was to prevent masters without replicas from getting replicas because of replica migration, I remember it to create issues with tests, but there is no clue in the commit message about why it was so undesirable. However my patch as a side effect totally ruined the concept of replicas migration since we want it to work also for instances that, technically, never had slaves in the past: promoted slaves. So now instead the ability to be targeted by replicas migration, is a new flag "migrate-to". It only applies to masters, and is set in the following two cases: 1. When a master gets a slave, it is set. 2. When a slave turns into a master because of fail over, it is set. This way replicas migration targets are only masters that used to have slaves, and slaves of masters (that used to have slaves... obviously) and are promoted. The new flag is only internal, and is never exposed in the output nor persisted in the nodes configuration, since all the information to handle it is implicit in the cluster configuration we already have.	2015-12-09 23:03:18 +01:00
antirez	a0d41e51c2	Redis Cluster: hint about validity factor when slave can't failover.	2015-11-27 08:59:17 +01:00
antirez	c69c6c80fb	Lazyfree: ability to free whole DBs in background.	2015-10-01 13:02:26 +02:00
antirez	a7c5be18a8	Lazyfree: Sorted sets converted to plain SDS. (several commits squashed)	2015-10-01 13:02:24 +02:00
antirez	02b1d5213d	RDMF: use representClusterNodeFlags() generic name.	2015-07-27 15:08:58 +02:00
antirez	3325a9b11f	RDMF: more names updated.	2015-07-27 15:03:10 +02:00
antirez	32f80e2f1b	RDMF: More consistent define names.	2015-07-27 14:37:58 +02:00
antirez	40eb548a80	RDMF: REDIS_OK REDIS_ERR -> C_OK C_ERR.	2015-07-26 23:17:55 +02:00
antirez	2d9e3eb107	RDMF: redisAssert -> serverAssert.	2015-07-26 15:29:53 +02:00
antirez	14ff572482	RDMF: OBJ_ macros for object related stuff.	2015-07-26 15:28:00 +02:00
antirez	554bd0e7bd	RDMF: use client instead of redisClient, like Disque.	2015-07-26 15:20:52 +02:00
antirez	424fe9afd9	RDMF: redisLog -> serverLog.	2015-07-26 15:17:43 +02:00
antirez	cef054e868	RDMF (Redis/Disque merge friendliness) refactoring WIP 1.	2015-07-26 15:17:18 +02:00
Jan-Erik Rediger	d28c51d166	Do not attempt to lock on Solaris	2015-06-24 14:57:15 +02:00
antirez	a401a84eb2	Don't try to bind the source address for MIGRATE Related to issues #2609 and #2612.	2015-06-11 14:34:38 +02:00
antirez	9b7f8b1c9b	Cluster: redirection refactoring + handling of blocked clients. There was a bug in Redis Cluster caused by clients blocked in a blocking list pop operation, for keys no longer handled by the instance, or in a condition where the cluster became down after the client blocked. A typical situation is: 1) BLPOP <somekey> 0 2) <somekey> hash slot is resharded to another master. The client will block forever in this case. A symmetrical non-cluster-specific bug happens when an instance is turned from master to slave. In that case it is more serious since this will desynchronize data between slaves and masters. This other bug was discovered as a side effect of thinking about the bug explained and fixed in this commit, but will be fixed in a separate commit.	2015-03-24 11:56:24 +01:00
antirez	94030fa4d7	Two cluster.c comments improved.	2015-03-21 12:12:23 +01:00
antirez	2950824ab6	Cluster: TAKEOVER option for manual failover.	2015-03-21 11:54:32 +01:00
antirez	a7010ae208	Cluster: non-conditional steps of slave failover refactored into a function.	2015-03-20 17:56:21 +01:00
antirez	230d141420	Cluster: separate unknown master check from the rest. In no case we should try to attempt to failover if myself->slaveof is NULL.	2015-03-20 16:56:59 +01:00
antirez	4f2555aa17	Cluster: refactoring around configEpoch handling. This commit moves the process of generating a new config epoch without consensus out of the clusterCommand() implementation, in order to make it reusable for other reasons (current target is to have a CLUSTER FAILOVER option forcing the failover when no master majority is reachable). Moreover the commit moves other functions which are similarly related to config epochs in a new logical section of the cluster.c file, just for clarity.	2015-03-20 16:42:52 +01:00
antirez	25c0f5ac63	Cluster: better cluster state transition handling. Before we relied on the global cluster state to make sure all the hash slots are linked to some node, when getNodeByQuery() is called. So finding the hash slot unbound was checked with an assertion. However this is fragile. The cluster state is often updated in the clusterBeforeSleep() function, and not ASAP on state change, so it may happen to process clients with a cluster state that is 'ok' but yet certain hash slots set to NULL. With this commit the condition is also checked in getNodeByQuery() and reported with an identical error code of -CLUSTERDOWN but slightly different error message so that we have more debugging clue in the future. Root cause of issue #2288.	2015-03-20 09:59:28 +01:00
antirez	438a1a84e8	Cluster: more robust slave check in CLUSTER REPLICATE. There are rare conditions where node->slaveof may be NULL even if the node is a slave. To check by flag is much more robust.	2015-03-18 12:10:14 +01:00
antirez	93b1320fac	Cluster: fix CLUSTER NODES optimization error in 'j' increment.	2015-03-13 13:16:35 +01:00
antirez	e1b6c9dd18	Cluster: CLUSTER NODES speedup.	2015-03-13 11:26:04 +01:00
Michel Martens	6201eb0c55	Add command CLUSTER MYID	2015-03-10 16:43:19 +00:00
antirez	c77081a45a	Migrate: replace conditional with pre-computed value.	2015-02-27 22:33:54 +01:00
antirez	832b0c7cce	Improvements to PR #2425 1. Remove useless "cs" initialization. 2. Add a "select" var to capture a condition checked multiple times. 3. Avoid duplication of the same if (!copy) conditional. 4. Don't increment dirty if copy is given (no deletion is performed), otherwise we propagate MIGRATE when not needed.	2015-02-26 10:27:56 +01:00
Tommy Wang	7fda935ad3	Add last_dbid to migrateCachedSocket to avoid redundant SELECT Avoid redundant SELECT calls when continuously migrating keys to the same dbid within a target Redis instance.	2015-02-26 10:18:43 +01:00
Salvatore Sanfilippo	d83c810265	Merge pull request #2301 from mattsta/fix/lengths Improve type correctness	2015-02-24 17:22:53 +01:00
antirez	233729fe7f	Cluster: some bias towards FAIL/PFAIL nodes in gossip sections. This improves PFAIL -> FAIL switch. Too late at this point in the RC releases to add proper PFAIL/FAIL separate dictionary to do this in a less randomized way. Tested in practice with experiments that this helps. PFAIL -> FAIL average with 20 nodes and node-timeout set to 5 seconds takes 2.5 seconds without this commit, 1 second with this commit.	2015-01-30 11:55:36 +01:00
antirez	69b4f00d28	More correct wanted / maxiterations values in clusterSendPing().	2015-01-30 11:23:27 +01:00
antirez	e5a22064cc	Cluster: magical 10% of nodes explained in comments.	2015-01-29 15:43:35 +01:00
antirez	1efacfe53d	CLUSTER count-failure-reports command added.	2015-01-29 15:02:10 +01:00
antirez	3fd43062c8	Cluster: use a number of gossip sections proportional to cluster size. Otherwise it is impossible to receive the majority of failure reports in the node_timeout*2 window in larger clusters. Still with a 200 nodes cluster, 20 gossip sections are a very reasonable amount of bytes to send. A side effect of this change is also faster cluster nodes joins for large clusters, because the cluster layout takes less time to propagate.	2015-01-29 14:20:59 +01:00
antirez	9802ec3c83	Cluster: initialized not used fields in gossip section. Otherwise we risk sending not initialized data to other nodes, that may contain anything. This was actually not possible only because the initialization of the buffer where the cluster packets header is created was larger than the 3 gossip sections we use, so the memory was already all filled with zeroes by the memset().	2015-01-24 07:52:24 +01:00
Matt Stancliff	051a43e03a	Fix cluster migrate memory leak Fixes valgrind error: 48 bytes in 1 blocks are definitely lost in loss record 196 of 373 at 0x4910D3: je_malloc (jemalloc.c:944) by 0x42807D: zmalloc (zmalloc.c:125) by 0x41FA0D: dictGetIterator (dict.c:543) by 0x41FA48: dictGetSafeIterator (dict.c:555) by 0x459B73: clusterHandleSlaveMigration (cluster.c:2776) by 0x45BF27: clusterCron (cluster.c:3123) by 0x423344: serverCron (redis.c:1239) by 0x41D6CD: aeProcessEvents (ae.c:311) by 0x41D8EA: aeMain (ae.c:455) by 0x41A84B: main (redis.c:3832)	2015-01-21 18:47:16 +01:00
Matt Stancliff	29049507ec	Fix potential invalid read past end of array If array has N elements, we can't read +1 if we are already at N. Also, we need to move elements by their storage size in the array, not just by individual bytes.	2015-01-21 18:01:03 +01:00
Matt Stancliff	30152554ea	Fix cluster reset memory leak [maybe] Fixes valgrind errors: 32 bytes in 4 blocks are definitely lost in loss record 107 of 228 at 0x80EA447: je_malloc (jemalloc.c:944) by 0x806E59C: zrealloc (zmalloc.c:125) by 0x80A9AFC: clusterSetMaster (cluster.c:801) by 0x80AEDC9: clusterCommand (cluster.c:3994) by 0x80682A5: call (redis.c:2049) by 0x8068A20: processCommand (redis.c:2309) by 0x8076497: processInputBuffer (networking.c:1143) by 0x8073BAF: readQueryFromClient (networking.c:1208) by 0x8060E98: aeProcessEvents (ae.c:412) by 0x806123B: aeMain (ae.c:455) by 0x806C3DB: main (redis.c:3832) 64 bytes in 8 blocks are definitely lost in loss record 143 of 228 at 0x80EA447: je_malloc (jemalloc.c:944) by 0x806E59C: zrealloc (zmalloc.c:125) by 0x80AAB40: clusterProcessPacket (cluster.c:801) by 0x80A847F: clusterReadHandler (cluster.c:1975) by 0x30000FF: ??? 80 bytes in 10 blocks are definitely lost in loss record 148 of 228 at 0x80EA447: je_malloc (jemalloc.c:944) by 0x806E59C: zrealloc (zmalloc.c:125) by 0x80AAB40: clusterProcessPacket (cluster.c:801) by 0x80A847F: clusterReadHandler (cluster.c:1975) by 0x2FFFFFF: ???	2015-01-21 17:51:57 +01:00
Matt Stancliff	72b8574cca	Fix sending uninitialized bytes Fixes valgrind error: Syscall param write(buf) points to uninitialised byte(s) at 0x514C35D: ??? (syscall-template.S:81) by 0x456B81: clusterWriteHandler (cluster.c:1907) by 0x41D596: aeProcessEvents (ae.c:416) by 0x41D8EA: aeMain (ae.c:455) by 0x41A84B: main (redis.c:3832) Address 0x5f268e2 is 2,274 bytes inside a block of size 8,192 alloc'd at 0x4932D1: je_realloc (jemalloc.c:1297) by 0x428185: zrealloc (zmalloc.c:162) by 0x4269E0: sdsMakeRoomFor.part.0 (sds.c:142) by 0x426CD7: sdscatlen (sds.c:251) by 0x4579E7: clusterSendMessage (cluster.c:1995) by 0x45805A: clusterSendPing (cluster.c:2140) by 0x45BB03: clusterCron (cluster.c:2944) by 0x423344: serverCron (redis.c:1239) by 0x41D6CD: aeProcessEvents (ae.c:311) by 0x41D8EA: aeMain (ae.c:455) by 0x41A84B: main (redis.c:3832) Uninitialised value was created by a stack allocation at 0x457810: nodeUpdateAddressIfNeeded (cluster.c:1236)	2015-01-21 17:50:17 +01:00
antirez	2601e3e461	Cluster: node deletion cleanup / centralization.	2015-01-21 16:03:43 +01:00
antirez	59ad6ac5fe	Cluster: set the slaves->slaveof field to NULL when master is freed. Related to issue #2289.	2015-01-21 15:55:53 +01:00
Matt Stancliff	53c082ec39	Improve networking type correctness read() and write() return ssize_t (signed long), not int. For other offsets, we can use the unsigned size_t type instead of a signed offset (since our replication offsets and buffer positions are never negative).	2015-01-19 14:10:12 -05:00
antirez	cf76af6b9f	Cluster: fetch my IP even if msg is not MEET for the first time. In order to avoid that misconfigured cluster nodes at some time may force an IP update on other nodes, it is required that nodes update their own address only on MEET messages. However it does not make sense to do this the first time a node is contacted and yet does not have an IP, we just risk that myself->ip remains not assigned if there are messages lost or cluster creation procedures that don't make sure everybody is targeted by at least one incoming MEET message. Also fix the logging of the IP switch avoiding the :-1 tail.	2015-01-13 10:50:34 +01:00
antirez	5b0f4a83ac	Cluster: clusterMsgDataGossip structure, explicit padding + minor stuff. Also explicitly set version to 0, add a protocol version define, improve comments in the gossip structure. Note that the structure layout is the same after the change, we are just making the padding explicit with an additional not used 16 bits field. So this commit is still able to talk with the previous versions of cluster nodes.	2015-01-13 10:40:09 +01:00
antirez	237ab727b9	Suppress valgrind error about write sending uninitialized data. Valgrind checks that the buffers we transfer via syscalls are all composed of bytes actually initialized. This is useful, it makes us able to avoid leaking information in non initialized parts of messages transferred to other hosts. This commit fixes one of such issues.	2015-01-13 09:31:37 +01:00
antirez	6274a6789d	Cluster: initialize mf_end. Can't be initialized by resetManualFailover() since it's actual state the function uses, so we need to initialize it at startup time. Not really a bug in practical terms, but showed up into valgrind and is not technically correct anyway.	2015-01-12 15:55:00 +01:00
Matt Stancliff	ad41a7c404	Add addReplyBulkSds() function Refactor a common pattern into one function so we don't end up with copy/paste programming.	2014-12-23 09:31:02 -05:00
Matt Stancliff	a772747ffc	Cluster: Notify user on accept error If we woke up to accept a connection, but we can't accept it, inform the user of the error going on with their networking. (The previous message was the same for success or error!)	2014-12-17 10:49:32 -05:00
antirez	1aef29e079	Fix comment in clusterHandleSlaveFailover().	2014-12-16 15:03:12 +01:00
antirez	90c7d8cfa1	Make sure buffer is enough in clusterSendPing().	2014-12-15 10:18:22 +01:00
antirez	ce269ad3c5	AnetFormatIP(): renamed, commented, now sticks to IP:port format. A few code style changes + consistent format: not nice for humans but better for parsers.	2014-12-11 18:20:30 +01:00
Matt Stancliff	491881e13b	Cleanup all IP formatting code Instead of manually checking for strchr(n,':') everywhere, we can use our new centralized IP formatting functions.	2014-12-11 10:12:18 -05:00
antirez	06e76bc3e2	Better read-only behavior for expired keys in slaves. Slaves key expire is orchestrated by the master. Sometimes the master will send the synthesized DEL to expire keys on the slave with a non trivial delay (when the key is not accessed, only the incremental expiry algorithm will expire it in background). During that time, a key is logically expired, but slaves still return the key if you GET (or whatever) it. This is a bad behavior. However we can't simply trust the slave view of the key, since we need the master to be able to send write commands to update the slave data set, and DELs should only happen when the key is expired in the master in order to ensure consistency. However 99.99% of the issues with this behavior is when a client which is not a master sends a read only command. In this case we are safe and can consider the key as non existing. This commit does a few changes in order to make this sane: 1. lookupKeyRead() is modified in order to return NULL if the above conditions are met. 2. Calls to lookupKeyRead() in commands actually writing to the data set are replaced with calls to lookupKeyWrite(). There are redundant checks, so for example, if in "2" something was overlooked, we should be still safe, since anyway, when the master writes the behavior is to not care about what expireIfneeded() returns. This commit is related to #1768, #1770, #2131.	2014-12-10 16:10:21 +01:00
antirez	669aa2a210	Cluster PUBLISH message: fix totlen count. bulk_data field size was not removed from the count. It is not possible to declare it simply as 'char bulk_data[]' since the structure is nested into another structure.	2014-11-28 10:21:47 +01:00
Salvatore Sanfilippo	5a526c22cc	Merge pull request #2096 from mattsta/cluster-ipv6 Enable Cluster IPv6 Support	2014-10-31 10:38:22 +01:00
Matt Stancliff	0014966c1e	Networking: add more outbound IP binding fixes Same as the original bind fixes (we just missed these the first time around). This helps Redis not automatically send connections from the first IP on an interface if we are bound to a specific IP address (e.g. with multiple IP aliases on one interface, you want to send from _your_ IP, not from the first IP on the interface).	2014-10-29 15:09:09 -04:00
Matt Stancliff	daca1edb6e	Parse cluster state file in IPv6 compatible way We need to pick the port based on the _last_ colon, not the first one.	2014-10-29 15:08:35 -04:00
antirez	5f6950caa8	Cluster: process gossip section only for known nodes. With the exception of nodes sending MEET packets: we have to trust them since they can send us MEET packets only when the cluster is initially created or because sysadmin manual action.	2014-10-08 16:58:12 +02:00
antirez	36e34a656a	Cluster: fix logic to detect we are among a minority. In the cluster evaluation function we are supposed to set the cluster state as "fail" if we are among a minority, however the code was not detecting to be into a minority partition if exactly half the masters were reachable, which is a minority.	2014-10-08 16:27:07 +02:00
antirez	edb3987a06	Cluster: more chatty slaves when failover is stalled.	2014-10-07 09:51:55 +02:00
Matt Stancliff	12d0195b30	Clean up text throughout project - Remove trailing newlines from redis.conf - Fix comment misspelling - Clarifies zipEncodeLength usage and a C API mention (#1243, #1242) - Fix cluster typos (inspired by @papanikge #1507) - Fix rewite -> rewrite in a few places (inspired by #682) Closes #1243, #1242, #1507	2014-09-29 06:49:07 -04:00
antirez	2374496799	Cluster: claim ping_sent time even if we can't connect. This fixes a potential bug that was never observed in practice since what happens is that the asynchronous connect returns ok (to fail later, calling the handler) every time, so a ping is queued, and sent_ping happens to always be populated. However technically connect(2) with a non blocking socket may return an error synchronously, so before this fix the code was not correct.	2014-09-17 16:39:41 +02:00
antirez	c89afc8e5d	Cluster: new option to work with partial slots coverage.	2014-09-17 11:10:09 +02:00
Matt Stancliff	60c448b584	Cluster: Fix segfault if cluster config corrupt This commit adds a size check after initial config line parsing to make sure we have at least 8 arguments per line. Also, instead of asserting for cluster->myself, we just test and error out normally (since the error does a hard exit anyway). Closes #1597	2014-08-25 10:11:38 +02:00
Matt Stancliff	879e18b7ec	Fix memory leak in cluster config parsing The continue stop us from triggering the free after the long line for loop, so add it earlier.	2014-08-18 11:27:19 +02:00
Matt Stancliff	6a7a32a806	Clarify existing slot wording on cluster start	2014-08-18 10:58:00 +02:00
antirez	edca2b14d2	Remove warnings and improve integer sign correctness.	2014-08-13 11:44:38 +02:00
antirez	ded57795ff	representRedisNodeFlags() moved into right code section. The function was also modified in order to be more standalone and produce an output without trailing spaces, making the reuse simpler. The global variable was renamed in camel case as most other Redis globals, except the main ones we refer too many times, like 'server'.	2014-08-08 15:53:42 +02:00
charsyam	de5465baf7	Refactor cluster flag printing. Less copy/paste code duplication. Closes #952	2014-08-08 15:39:44 +02:00
SungBin_Hong	dec58464d8	Free memory in clusterLoadConfig error handler. Closes #1327	2014-08-08 14:40:32 +02:00
antirez	0d9bcb1c12	Cluster: don't migrate to a master that never had slaves. Replica migration algorithm modified so that slaves never try to migrate to masters that were never configured to have slaves in the past. We want the algorithm to take care of masters that remained without working slaves, but that used to have slaves according to the cluster configuration.	2014-07-25 11:02:09 +02:00
antirez	89af463124	CLUSTER RESET: Flush dataset if node is a slave. For non-empty masters, CLUSTER RESET is denied, and the user is required to first clear the node explicitly with FLUSHALL. However CLUSTER RESET, when executed on slaves, doesn't have this restriction since the data is just a replica of the master, and with read-only slaves it is also not possible to remove the data set. However the node was turned from slave to master after a reset, without touching the old slave data. This is not appropriate 99.99% of the time and forces full resets to follow this path to work with both slave and master nodes: FLUSHALL; CLUSTER RESET HARD; FLUSHALL, since we need the first FLUSHALL for masters and the second for slaves. This commit changes the behavior so that CLUSTER RESET removes the data set of a slave node during a reset, at the moment it gets turned into a master, so the new pattern is simply: FLUSHALL (which may fail for slaves); CLUSTER RESET.	2014-07-22 15:29:57 +02:00
antirez	95b1979c32	No more trailing spaces in Redis source code.	2014-06-26 18:48:40 +02:00
antirez	75c57d53ea	CLUSTER SLOTS: don't output failing slaves. While we have to output failing masters in order to provide an accurate map (that may be the one of a Redis Cluster in down state because not all slots are served by a working master), providing slaves in FAIL state is not a good idea since those are not necessarily needed, and the client will likely incur a latency penalty trying to connect to a slave which is down. Note that this means that CLUSTER SLOTS does not provide a complete map of slaves, however this would not be of any help since slaves may be added later, and a client that needs to scale reads and must stay updated with the list of slaves needs to refresh the map from time to time anyway.	2014-06-25 15:19:35 +02:00
antirez	a6fe4ca321	CLUSTER SLOTS: main loop should skip only slaves and zero slot masters.	2014-06-25 15:08:33 +02:00
Matt Stancliff	e14829de30	Cluster: Add CLUSTER SLOTS command. CLUSTER SLOTS returns a Redis-formatted mapping from slot ranges to IP/Port pairs serving that slot range. The outer return elements group return values by slot ranges. The first two entries in each result are the min and max slots for the range. The third entry in each result is guaranteed to be either an IP/Port of the master for that slot range - OR - null if that slot range, for some reason, has no master. The 4th and higher entries in each result are replica instances for the slot range. Output comparison: 127.0.0.1:7001> cluster nodes f853501ec8ae1618df0e0f0e86fd7abcfca36207 127.0.0.1:7001 myself,master - 0 0 2 connected 4096-8191 5a2caa782042187277647661ffc5da739b3e0805 127.0.0.1:7005 slave f853501ec8ae1618df0e0f0e86fd7abcfca36207 0 1402622415859 6 connected 6c70b49813e2ffc9dd4b8ec1e108276566fcf59f 127.0.0.1:7007 slave 26f4729ca0a5a992822667fc16b5220b13368f32 0 1402622415357 8 connected 2bd5a0e3bb7afb2b56a2120d3fef2f2e4333de1d 127.0.0.1:7006 slave 32adf4b8474fdc938189dba00dc8ed60ce635b0f 0 1402622419373 7 connected 5a9450e8279df36ff8e6bb1c139ce4d5268d1390 127.0.0.1:7000 master - 0 1402622418872 1 connected 0-4095 32adf4b8474fdc938189dba00dc8ed60ce635b0f 127.0.0.1:7002 master - 0 1402622419874 3 connected 8192-12287 5db7d05c245267afdfe48c83e7de899348d2bdb6 127.0.0.1:7004 slave 5a9450e8279df36ff8e6bb1c139ce4d5268d1390 0 1402622417867 5 connected 26f4729ca0a5a992822667fc16b5220b13368f32 127.0.0.1:7003 master - 0 1402622420877 4 connected 12288-16383 127.0.0.1:7001> cluster slots 1) 1) (integer) 0 2) (integer) 4095 3) 1) "127.0.0.1" 2) (integer) 7000 4) 1) "127.0.0.1" 2) (integer) 7004 2) 1) (integer) 12288 2) (integer) 16383 3) 1) "127.0.0.1" 2) (integer) 7003 4) 1) "127.0.0.1" 2) (integer) 7007 3) 1) (integer) 4096 2) (integer) 8191 3) 1) "127.0.0.1" 2) (integer) 7001 4) 1) "127.0.0.1" 2) (integer) 7005 4) 1) (integer) 8192 2) (integer) 12287 3) 1) "127.0.0.1" 2) (integer) 7002 4) 1) "127.0.0.1" 2) (integer) 7006	2014-06-25 15:03:41 +02:00
antirez	f29b12d0bf	Cluster: myself->ip autodiscovery. Instead of having an hardcoded IP address in the node configuration, we autodiscover it via MEET messages for automatic update when the node is restarted with a different IP address. This mechanism was discussed in the context of PR #1782.	2014-06-25 11:28:57 +02:00
Matt Stancliff	d830dcb12d	Add REDIS_BIND_ADDR access macro. We need to access (bindaddr[0] || NULL) in a few places, so centralize access with a nice macro.	2014-06-23 11:44:34 +02:00
antirez	22d17bc14f	Cluster: clear NOADDR flag when updating node address.	2014-06-20 09:32:47 +02:00
antirez	8ef79e72ac	Cluster: fix an error message when logging failover auth denied.	2014-06-10 17:39:42 +02:00
antirez	58799718be	Cluster: better comment for clusterSendFailoverAuthIfNeeded() epoch test.	2014-06-10 17:20:21 +02:00
antirez	61eb0eae83	Cluster: log granted failover authorizations.	2014-06-10 16:56:08 +02:00
antirez	d5d92deb6c	Cluster: log configEpoch updates to myself.	2014-06-10 16:38:36 +02:00
antirez	8204ab0098	Cluster: log when a master denies a failover auth.	2014-06-10 16:07:26 +02:00
antirez	9b3bc82c1a	Cluster: cluster_my_epoch added to CLUSTER INFO output.	2014-06-10 11:35:40 +02:00
antirez	32d0a79f78	Cluster: check that configEpoch never goes back. Since there are ways to alter the configEpoch outside of the failover procedure (for example CLUSTER SET-CONFIG-EPOCH and via the configEpoch collision resolution algorithm), always make sure, before replacing our configEpoch with a new one, that it is greater than the current one.	2014-06-07 14:37:09 +02:00
antirez	a2c2ef7de5	Cluster: SET-CONFIG-EPOCH should update currentEpoch. SET-CONFIG-EPOCH, used by redis-trib at cluster creation time, failed to update the currentEpoch, making it possible after a failover for a server to set its configEpoch to a value smaller than the current one (since configEpochs are obtained using currentEpoch). The bug totally breaks the Redis Cluster algorithms and protocols, allowing for permanent split brain conditions about the slots configuration as shown in issue #1799.	2014-06-07 14:25:47 +02:00
antirez	88c2307535	Cluster: always allow ok -> fail switch in clusterUpdateState(). There is a time defined by REDIS_CLUSTER_WRITABLE_DELAY where fail -> ok switch is not possible after startup as a master for some time, however the contrary (ok -> fail) should always be possible.	2014-05-26 16:24:12 +02:00
antirez	39603a7e31	Cluster: slave validity factor is now user configurable. Check the commit changes in the example redis.conf for more information.	2014-05-22 16:57:54 +02:00
antirez	67133d2f48	Cluster: use clusterSetNodeAsMaster() during slave failover. clusterHandleSlaveFailover() was reimplementing what clusterSetNodeAsMaster() does without any good reason.	2014-05-15 17:03:28 +02:00
antirez	8c6e92c3bc	Cluster: clear todo_before_sleep flags when executing actions. Thanks to this change, when there is some code like: clusterDoBeforeSleep(CLUSTER_TODO_UPDATE_STATE|...); ... and later before returning to the event loop ... clusterUpdateState(); The clusterUpdateState() function will clear the flag and will not be repeated in the clusterBeforeSleep() function. This is especially important for config save/fsync flags which are slow to execute and not a good idea to repeat without a good reason. This is implemented for all the CLUSTER_TODO flags.	2014-05-15 16:33:13 +02:00
antirez	7b87cda70e	Fixed typo in CLUSTER RESET implementation.	2014-05-15 12:33:57 +02:00
antirez	796f4ae9f7	CLUSTER RESET implemented. The new command is able to reset a cluster node so that it starts again as a fresh node. By default the command performs a soft reset (the same as calling it as CLUSTER RESET SOFT), and the following steps are performed: 1) All slots are set as unassigned. 2) The list of known nodes is flushed. 3) Node is set as master if it is a slave. When a hard reset is performed with CLUSTER RESET HARD the following additional operations are performed: 4) A new Node ID is created at random. 5) Epochs are set to 0. CLUSTER RESET is useful both when the sysadmin wants to reconfigure a node with a different role (for example turning a slave into a master) and for testing purposes. It also may play a role in automatically provisioned Redis Clusters, since it allows resetting a node back to the initial state in order to be reconfigured.	2014-05-15 11:43:06 +02:00
antirez	8b9d5ecbd1	Remove trailing spaces from cluster.c file.	2014-05-15 10:18:36 +02:00
antirez	60e5d1724c	Cluster: don't accept cluster bus connections during startup.	2014-05-14 12:05:00 +02:00
antirez	6baac558d8	Cluster: better handling of stolen slots. The previous code handling a lost slot (by another master with a higher configuration for the slot) was defensive, considering it an error and putting the cluster in an odd state requiring redis-cli fix. This was changed, because actually this only happens either in a legitimate way, with failovers, or when the admin messed with the config in order to reconfigure the cluster. So the new code instead will try to make sure that the keys stored match the new slots map, by removing all the keys in the slots we lost ownership of. The function that deletes the keys from the lost slots is called only if the node does not lose all its slots (resulting in a reconfiguration as a slave of the node that got ownership). This is an optimization since the replication code will anyway flush all the instance data in a faster way.	2014-05-14 10:46:37 +02:00
antirez	832a298005	Cluster: fixed data_age computation / check integer overflow.	2014-05-12 17:46:15 +02:00
antirez	2692339138	Cluster: forced failover implemented. Using CLUSTER FAILOVER FORCE it is now possible to failover a master in a forced way, which means: 1) No check to understand if the master is up is performed. 2) No data age of the slave is checked. Even a slave with very old data can manually failover a master in this way. 3) No chat with the master is attempted to reach its replication offset: the master can just be down.	2014-05-12 16:34:20 +02:00
antirez	005f564eb3	Cluster: bypass data_age check for manual failovers. Automatic failovers only happen in Redis Cluster if the slave trying to be elected was disconnected from its master for no more than 10 times the node-timeout value. However there should be no such check for manual failovers, since these are initiated by the sysadmin that, in theory, knows what she is doing when a slave is selected to be promoted.	2014-05-12 16:12:12 +02:00
antirez	5c78f87666	RESTORE: reply with -BUSYKEY special error code. The error when the target key is busy was a generic one, while it makes sense to be able to distinguish between the target key busy error and the others easily.	2014-05-12 10:01:59 +02:00
antirez	71d0e7e0ea	CLUSTER MEET: better error messages when address is invalid. Fixes issue #1734.	2014-05-09 16:36:59 +02:00
antirez	8a170c817d	Cluster: bulk-accept new nodes connections. The same change was operated for normal client connections. This is important for Cluster as well, since when a node rejoins the cluster, when a partition heals or after a restart, it gets flooded with new connection attempts by all the other nodes trying to form a full mesh again.	2014-05-09 11:52:59 +02:00
antirez	3625b52791	Cluster: clusterAcceptHandler() comments updated to match the code.	2014-05-09 11:44:46 +02:00
antirez	11d9ecb71d	CLUSTER SET-CONFIG-EPOCH implemented. Initially Redis Cluster accepted that after cluster creation all the nodes were at configEpoch 0, evolving from zero as failovers happen. However later the semantics were made stricter in order to make sure a cluster always has all the master nodes with a different configEpoch, which is more robust in some corner cases (especially resulting from errors by the system administrator). To assign different configEpochs to different nodes at startup was a task performed naturally by the config conflicts resolution algorithm (see the Cluster specification). However this works well only for small clusters or when there are actually just a few collisions, since it is designed for exceptional cases. When a large cluster is created hundreds of nodes can be at epoch 0, so the conflict resolution code is slow to provide a unique config to each node. For this reason this new command was introduced. It can be called only when a node is totally fresh: no other nodes known, and configEpoch set to zero, so it is safe even against misuses. redis-trib will use the new command in order to start the cluster already setting an incremental unique config to every node.	2014-04-29 19:15:16 +02:00
antirez	e3cf812c9e	clusterLoadConfig() REDIS_ERR retval semantics refined. We should return REDIS_ERR to signal we can't read the configuration because there is no config file only after checking errno, otherwise we risk rewriting an existing file that was not accessible for some other reason.	2014-04-24 16:23:03 +02:00
antirez	db06108bc1	Lock nodes.conf to avoid multiple processes using the same file. This was a common source of problems among users. The solution adopted is not bullet-proof as if the user deletes the nodes.conf file manually, and starts a new instance with the same nodes.conf file path, two instances will use the same file. However following this reasoning the user may drop a nuclear bomb into the datacenter as well.	2014-04-24 16:04:10 +02:00
kingsumos	a69178fdd2	fix cluster node description showing wrong slot allocation	2014-04-22 11:44:53 -04:00
antirez	67bb2c46b2	Add casting to match printf format. adjustOpenFilesLimit() and clusterUpdateSlotsWithConfig() were assuming uint64_t is the same as unsigned long long, which is probably true for all the systems out there that we target, but still GCC emitted a warning since technically they are two different types.	2014-04-07 08:58:06 +02:00
antirez	8f52173b2c	Cluster: last_vote_epoch -> lastVoteEpoch. Use camel case for epochs that are persisted on disk.	2014-03-27 15:01:24 +01:00
antirez	7fb14b73ba	Cluster: save/restore vars that must persist after recovery. This fixes issue #1479.	2014-03-27 14:56:29 +01:00
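
Note on commit e14829de30 (CLUSTER SLOTS): the reply layout that commit message describes can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the names `slots_to_map` and `owner_of` and the dictionary layout are hypothetical, and the reply is modelled as nested Python lists rather than read from a real server or client library.

```python
# Hypothetical helpers: not part of Redis or of any client library.

def slots_to_map(reply):
    """Turn a CLUSTER SLOTS-style nested reply into a list of range records.

    Each entry is assumed to have the shape described in the commit message:
      [min_slot, max_slot, master_or_None, replica1, replica2, ...]
    where master and replicas are [ip, port] pairs.
    """
    ranges = []
    for entry in reply:
        lo, hi, master, *replicas = entry
        ranges.append({
            "slots": (lo, hi),
            "master": tuple(master) if master is not None else None,
            "replicas": [tuple(r) for r in replicas],
        })
    # Sort by starting slot: the sample output in the commit is not ordered.
    ranges.sort(key=lambda r: r["slots"][0])
    return ranges


def owner_of(ranges, slot):
    """Return the master (ip, port) serving `slot`, or None if no range covers it."""
    for r in ranges:
        lo, hi = r["slots"]
        if lo <= slot <= hi:
            return r["master"]
    return None


# Reply modelled on the sample output quoted in the commit message.
sample = [
    [0, 4095, ["127.0.0.1", 7000], ["127.0.0.1", 7004]],
    [12288, 16383, ["127.0.0.1", 7003], ["127.0.0.1", 7007]],
    [4096, 8191, ["127.0.0.1", 7001], ["127.0.0.1", 7005]],
    [8192, 12287, ["127.0.0.1", 7002], ["127.0.0.1", 7006]],
]
ranges = slots_to_map(sample)
```

A client that wants to scale reads would look up the `replicas` list of the matching range instead of `master`. Per commit 75c57d53ea, that list may omit slaves in FAIL state, so it should be refreshed from time to time.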

614 Commits