redict

mirror of https://codeberg.org/redict/redict.git synced 2025-01-24 09:08:26 -05:00

Author	SHA1	Message	Date
antirez	36e34a656a	Cluster: fix logic to detect we are among a minority. In the cluster evaluation function we are supposed to set the cluster state as "fail" if we are among a minority, however the code was not detecting to be into a minority partition if exactly half the masters were reachable, which is a minority.	2014-10-08 16:27:07 +02:00
antirez	edb3987a06	Cluster: more chatty slaves when failover is stalled.	2014-10-07 09:51:55 +02:00
Matt Stancliff	12d0195b30	Clean up text throughout project - Remove trailing newlines from redis.conf - Fix comment misspelling - Clarifies zipEncodeLength usage and a C API mention (#1243, #1242) - Fix cluster typos (inspired by @papanikge #1507) - Fix rewite -> rewrite in a few places (inspired by #682) Closes #1243, #1242, #1507	2014-09-29 06:49:07 -04:00
antirez	2374496799	Cluster: claim ping_sent time even if we can't connect. This fixes a potential bug that was never observed in practice since what happens is that the asynchronous connect returns ok (to fail later, calling the handler) every time, so a ping is queued, and sent_ping happens to always be populated. Howver technically connect(2) with a non blocking socket may return an error synchronously, so before this fix the code was not correct.	2014-09-17 16:39:41 +02:00
antirez	c89afc8e5d	Cluster: new option to work with partial slots coverage.	2014-09-17 11:10:09 +02:00
Matt Stancliff	60c448b584	Cluster: Fix segfault if cluster config corrupt This commit adds a size check after initial config line parsing to make sure we have at least 8 arguments per line. Also, instead of asserting for cluster->myself, we just test and error out normally (since the error does a hard exit anyway). Closes #1597	2014-08-25 10:11:38 +02:00
Matt Stancliff	879e18b7ec	Fix memory leak in cluster config parsing The continue stop us from triggering the free after the long line for loop, so add it earlier.	2014-08-18 11:27:19 +02:00
Matt Stancliff	6a7a32a806	Clarify existing slot wording on cluster start	2014-08-18 10:58:00 +02:00
antirez	edca2b14d2	Remove warnings and improve integer sign correctness.	2014-08-13 11:44:38 +02:00
antirez	ded57795ff	representRedisNodeFlags() moved into right code section. The funciton was also modified in order to be more standalone and produce an output without trailing spaces, making the reuse simpler. The global variable was renamed in cammel case as most other Redis globals, except the main ones we refer too many times, like 'server'.	2014-08-08 15:53:42 +02:00
charsyam	de5465baf7	Refactor cluster flag printing Less copy/paste code duplication. Closes #952	2014-08-08 15:39:44 +02:00
SungBin_Hong	dec58464d8	Free memory in clusterLoadConfig error handler Closes #1327	2014-08-08 14:40:32 +02:00
antirez	0d9bcb1c12	Cluster: don't migrate to a master that never had slaves. Replica migration algorithm modified so that slaves never try to migrate to masters that were never configured to have slaves in the past. We want the algorithm to take care of masters that remained without working slaves, but that used to have slaves according to the cluster configuration.	2014-07-25 11:02:09 +02:00
antirez	89af463124	CLUSTER RESET: Flush dataset if node is a slave. For non-empty masters, CLUSTER RESET is denied, and the user requires to start to reset a node by explicitly clearing it with FLUSHALL. However CLUSTER RESET when executed with slaves don't have this restrictions since data is just a replica of the master, and with read-only slaves it is also not possible to remove the data set. However the node was turned from slave to master after a reset, without touching the old slave data. This is 99.99% of times not appropriate and forces full resets to follow this path to work with both slave and master nodes: FLUSHALL CLUSTER RESET HARD FLUSHALL Since we need the first flushall for masters, and the second for slaves. This commit changes the behavior so that CLUSTER RESET removes the data set of a slave node during a reset, in the moment it gets turned into a master, so the new pattern is simply: FLUSHALL (that may fail for slaves) CLUSTER RESET	2014-07-22 15:29:57 +02:00
antirez	95b1979c32	No more trailing spaces in Redis source code.	2014-06-26 18:48:40 +02:00
antirez	75c57d53ea	CLUSTER SLOTS: don't output failing slaves. While we have to output failing masters in order to provide an accurate map (that may be the one of a Redis Cluster in down state because not all slots are served by a working master), to provide slaves in FAIL state is not a good idea since those are not necesarely needed, and the client will likely incur into a latency penalty trying to connect with a slave which is down. Note that this means that CLUSTER SLOTS does not provide a complete map of slaves, however this would not be of any help since slaves may be added later, and a client that needs to scale reads and requires to stay updated with the list of slaves, need to do a refresh of the map from time to time, anyway.	2014-06-25 15:19:35 +02:00
antirez	a6fe4ca321	CLUSTER SLOTS: main loop should skip only slaves and zero slot masters.	2014-06-25 15:08:33 +02:00
Matt Stancliff	e14829de30	Cluster: Add CLUSTER SLOTS command CLUSTER SLOTS returns a Redis-formatted mapping from slot ranges to IP/Port pairs serving that slot range. The outer return elements group return values by slot ranges. The first two entires in each result are the min and max slots for the range. The third entry in each result is guaranteed to be either an IP/Port of the master for that slot range - OR - null if that slot range, for some reason, has no master The 4th and higher entries in each result are replica instances for the slot range. Output comparison: 127.0.0.1:7001> cluster nodes f853501ec8ae1618df0e0f0e86fd7abcfca36207 127.0.0.1:7001 myself,master - 0 0 2 connected 4096-8191 5a2caa782042187277647661ffc5da739b3e0805 127.0.0.1:7005 slave f853501ec8ae1618df0e0f0e86fd7abcfca36207 0 1402622415859 6 connected 6c70b49813e2ffc9dd4b8ec1e108276566fcf59f 127.0.0.1:7007 slave 26f4729ca0a5a992822667fc16b5220b13368f32 0 1402622415357 8 connected 2bd5a0e3bb7afb2b56a2120d3fef2f2e4333de1d 127.0.0.1:7006 slave 32adf4b8474fdc938189dba00dc8ed60ce635b0f 0 1402622419373 7 connected 5a9450e8279df36ff8e6bb1c139ce4d5268d1390 127.0.0.1:7000 master - 0 1402622418872 1 connected 0-4095 32adf4b8474fdc938189dba00dc8ed60ce635b0f 127.0.0.1:7002 master - 0 1402622419874 3 connected 8192-12287 5db7d05c245267afdfe48c83e7de899348d2bdb6 127.0.0.1:7004 slave 5a9450e8279df36ff8e6bb1c139ce4d5268d1390 0 1402622417867 5 connected 26f4729ca0a5a992822667fc16b5220b13368f32 127.0.0.1:7003 master - 0 1402622420877 4 connected 12288-16383 127.0.0.1:7001> cluster slots 1) 1) (integer) 0 2) (integer) 4095 3) 1) "127.0.0.1" 2) (integer) 7000 4) 1) "127.0.0.1" 2) (integer) 7004 2) 1) (integer) 12288 2) (integer) 16383 3) 1) "127.0.0.1" 2) (integer) 7003 4) 1) "127.0.0.1" 2) (integer) 7007 3) 1) (integer) 4096 2) (integer) 8191 3) 1) "127.0.0.1" 2) (integer) 7001 4) 1) "127.0.0.1" 2) (integer) 7005 4) 1) (integer) 8192 2) (integer) 12287 3) 1) "127.0.0.1" 2) (integer) 7002 4) 1) "127.0.0.1" 2) (integer) 7006	2014-06-25 15:03:41 +02:00
antirez	f29b12d0bf	Cluster: myself->ip autodiscovery. Instead of having an hardcoded IP address in the node configuration, we autodiscover it via MEET messages for automatic update when the node is restarted with a different IP address. This mechanism was discussed in the context of PR #1782.	2014-06-25 11:28:57 +02:00
Matt Stancliff	d830dcb12d	Add REDIS_BIND_ADDR access macro We need to access (bindaddr[0] \|\| NULL) in a few places, so centralize access with a nice macro.	2014-06-23 11:44:34 +02:00
antirez	22d17bc14f	Cluster: clear NOADDR flag when updating node address.	2014-06-20 09:32:47 +02:00
antirez	8ef79e72ac	Cluster: fix an error message when logging failover auth denied.	2014-06-10 17:39:42 +02:00
antirez	58799718be	Cluster: better comment for clusterSendFailoverAuthIfNeeded() epoch test.	2014-06-10 17:20:21 +02:00
antirez	61eb0eae83	Cluster: log granted failover authorizations.	2014-06-10 16:56:08 +02:00
antirez	d5d92deb6c	Cluster: log configEpoch updates to myself.	2014-06-10 16:38:36 +02:00
antirez	8204ab0098	Cluster: log when a master denies a failover auth.	2014-06-10 16:07:26 +02:00
antirez	9b3bc82c1a	Cluster: cluster_my_epoch added to CLUSTER INFO output.	2014-06-10 11:35:40 +02:00
antirez	32d0a79f78	Cluster: check that configEpoch never goes back. Since there are ways to alter the configEpoch outside of the failover procedure (for exampel CLUSTER SET-CONFIG-EPOCH and via the configEpoch collision resolution algorithm), make always sure, before replacing our configEpoch with a new one, that it is greater than the current one.	2014-06-07 14:37:09 +02:00
antirez	a2c2ef7de5	Cluster: SET-CONFIG-EPOCH should update currentEpoch. SET-CONFIG-EPOCH, used by redis-trib at cluster creation time, failed to update the currentEpoch, making it possible after a failover for a server to set its configEpoch to a value smaller than the current one (since configEpochs are obtained using currentEpoch). The bug totally break the Redis Cluster algorithms and protocols allowing for permanent split brain conditions about the slots configuration as shown in issue #1799.	2014-06-07 14:25:47 +02:00
antirez	88c2307535	Cluster: always allow ok -> fail switch in clusterUpdateState(). There is a time defined by REDIS_CLUSTER_WRITABLE_DELAY where fail -> ok switch is not possible after startup as a master for some time, however the contrary (ok -> fail) should always be possible.	2014-05-26 16:24:12 +02:00
antirez	39603a7e31	Cluster: slave validity factor is now user configurable. Check the commit changes in the example redis.conf for more information.	2014-05-22 16:57:54 +02:00
antirez	67133d2f48	Cluster: use clusterSetNodeAsMaster() during slave failover. clusterHandleSlaveFailover() was reimplementing what clusterSetNodeAsMaster() without any good reason.	2014-05-15 17:03:28 +02:00
antirez	8c6e92c3bc	Cluster: clear todo_before_sleep flags when executing actions. Thanks to this change, when there is some code like: clusterDoBeforeSleep(CLUSTER_TODO_UPDATE_STATE\|...); ... and later before returning to the event loop ... clusterUpdateState(); The clusterUpdateState() function will clar the flag and will not be repeated in the clusterBeforeSleep() function. This especially important for config save/fsync flags which are slow to execute and not a good idea to repeat without a good reason. This is implemented for all the CLUSTER_TODO flags.	2014-05-15 16:33:13 +02:00
antirez	7b87cda70e	Fixed typo in CLUSTER RESET implementation.	2014-05-15 12:33:57 +02:00
antirez	796f4ae9f7	CLUSTER RESET implemented. The new command is able to reset a cluster node so that it starts again as a fresh node. By default the command performs a soft reset (the same as calling it as CLUSTER RESET SOFT), and the following steps are performed: 1) All slots are set as unassigned. 2) The list of known nodes is flushed. 3) Node is set as master if it is a slave. When an hard reset is performed with CLUSTER RESET HARD the following additional operations are performed: 4) A new Node ID is created at random. 5) Epochs are set to 0. CLUSTER RESET is useful both when the sysadmin wants to reconfigure a node with a different role (for example turning a slave into a master) and for testing purposes. It also may play a role in automatically provisioned Redis Clusters, since it allows to reset a node back to the initial state in order to be reconfigured.	2014-05-15 11:43:06 +02:00
antirez	8b9d5ecbd1	Remove trailing spaces from cluster.c file.	2014-05-15 10:18:36 +02:00
antirez	60e5d1724c	Cluster: don't accept cluster bus connections during startup.	2014-05-14 12:05:00 +02:00
antirez	6baac558d8	Cluster: better handling of stolen slots. The previous code handling a lost slot (by another master with an higher configuration for the slot) was defensive, considering it an error and putting the cluster in an odd state requiring redis-cli fix. This was changed, because actually this only happens either in a legitimate way, with failovers, or when the admin messed with the config in order to reconfigure the cluster. So the new code instead will try to make sure that the keys stored match the new slots map, by removing all the keys in the slots we lost ownership from. The function that deletes the keys from the lost slots is called only if the node does not lose all its slots (resulting in a reconfiguration as a slave of the node that got ownership). This is an optimization since the replication code will anyway flush all the instance data in a faster way.	2014-05-14 10:46:37 +02:00
antirez	832a298005	Cluster: fixed data_age computation / check integer overflow.	2014-05-12 17:46:15 +02:00
antirez	2692339138	Cluster: forced failover implemented. Using CLUSTER FAILOVER FORCE it is now possible to failover a master in a forced way, which means: 1) No check to understand if the master is up is performed. 2) No data age of the slave is checked. Evan a slave with very old data can manually failover a master in this way. 3) No chat with the master is attempted to reach its replication offset: the master can just be down.	2014-05-12 16:34:20 +02:00
antirez	005f564eb3	Cluster: bypass data_age check for manual failovers. Automatic failovers only happen in Redis Cluster if the slave trying to be elected was disconnected from its master for no more than 10 times the node-timeout value. However there should be no such a check for manual failovers, since these are initiated by the sysadmin that, in theory, knows what she is doing when a slave is selected to be promoted.	2014-05-12 16:12:12 +02:00
antirez	5c78f87666	RESTORE: reply with -BUSYKEY special error code. The error when the target key is busy was a generic one, while it makes sense to be able to distinguish between the target key busy error and the others easily.	2014-05-12 10:01:59 +02:00
antirez	71d0e7e0ea	CLUSTER MEET: better error messages when address is invalid. Fixes issue #1734.	2014-05-09 16:36:59 +02:00
antirez	8a170c817d	Cluster: bulk-accept new nodes connections. The same change was operated for normal client connections. This is important for Cluster as well, since when a node rejoins the cluster, when a partition heals or after a restart, it gets flooded with new connection attempts by all the other nodes trying to form a full mesh again.	2014-05-09 11:52:59 +02:00
antirez	3625b52791	Cluster: clusterAcceptHandler() comments updated to match the code.	2014-05-09 11:44:46 +02:00
antirez	11d9ecb71d	CLUSTER SET-CONFIG-EPOCH implemented. Initially Redis Cluster accepted that after cluster creation all the nodes were at configEpoch 0, evolving from zero as failovers happen. However later the semantic was made more strict in order to make sure a cluster has always all the master nodes with a different configEpoch, which is more robust in some corner case (especially resulting from errors by the system administrator). To assign different configEpochs to different nodes at startup was a task performed naturally by the config conflicts resolution algorithm (see the Cluster specification). However this works well only for small clusters or when there are actually just a few collisions, since it is designed for exceptional cases. When a large cluster is created hundred of nodes can be at epoch 0, so the conflict resolution code is slow to provide an unique config to each node. For this reason this new command was introduced. It can be called only when a node is totally fresh: no other nodes known, and configEpoch set to zero, so it is safe even against misuses. redis-trib will use the new command in order to start the cluster already setting an incremental unique config to every node.	2014-04-29 19:15:16 +02:00
antirez	e3cf812c9e	clusterLoadConfig() REDIS_ERR retval semantics refined. We should return REDIS_ERR to signal we can't read the configuration because there is no config file only after checking errno, othewise we risk to rewrite an existing file that was not accessible for some other reason.	2014-04-24 16:23:03 +02:00
antirez	db06108bc1	Lock nodes.conf to avoid multiple processes using the same file. This was a common source of problems among users. The solution adopted is not bullet-proof as if the user deletes the nodes.conf file manually, and starts a new instance with the same nodes.conf file path, two instances will use the same file. However following this reasoning the user may drop a nuclear bomb into the datacenter as well.	2014-04-24 16:04:10 +02:00
kingsumos	a69178fdd2	fix cluster node description showing wrong slot allocation	2014-04-22 11:44:53 -04:00
antirez	67bb2c46b2	Add casting to match printf format. adjustOpenFilesLimit() and clusterUpdateSlotsWithConfig() that were assuming uint64_t is the same as unsigned long long, which is true probably for all the systems out there that we target, but still GCC emitted a warning since technically they are two different types.	2014-04-07 08:58:06 +02:00

1 2 3 4 5 ...

516 Commits