redict

mirror of https://codeberg.org/redict/redict.git synced 2025-01-24 00:59:02 -05:00

Author	SHA1	Message	Date
Rogerio Goncalves	ef29748d0d	Check args before run ckquorum. Fix issue #2635	2015-07-24 14:08:50 +02:00
antirez	821a986643	Sentinel: fix bug in config rewriting during failover We have a check to rewrite the config properly when a failover is in progress, in order to add the current (already failed over) master as slave, and don't include in the slave list the promoted slave itself. However there was an issue, the variable with the right address was computed but never used when the code was modified, and no tests are available for this feature for two reasons: 1. The Sentinel unit test currently does not test Sentinel ability to persist its state at all. 2. It is a very hard to trigger state since it lasts for little time in the context of the testing framework. However this feature should be covered in the test in some way. The bug was found by @badboy using the clang static analyzer. Effects of the bug on safety of Sentinel === This bug results in severe issues in the following case: 1. A Sentinel is elected leader. 2. During the failover, it persists a wrong config with a known-slave entry listing the master address. 3. The Sentinel crashes and restarts, reading invalid configuration from disk. 4. It sees that the slave now does not obey the logical configuration (should replicate from the current master), so it sends a SLAVEOF command to the master (since the slave master is the same) creating a replication loop (attempt to replicate from itself) which Redis is currently unable to detect. 5. This means that the master is no longer available because of the bug. However the lack of availability should be only transient (at least in my tests, but other states could be possible where the problem is not recovered automatically) because: 6. Sentinels treat masters reporting to be slaves as failing. 7. A new failover is triggered, and a slave is promoted to master. Bug lifetime === The bug is there forever. Commit `16237d78` actually tried to fix the bug but in the wrong way (the computed variable was never used! My fault). 
So this bug has been there basically since the start of Sentinel. Since the bug is hard to trigger, I remember few reports matching this condition, but I remember at least a few. Also, in automated tests where instances were stopped and restarted multiple times automatically, I remember hitting this issue; however, I was not able to reproduce it, nor to determine with the information I had at the time what was causing it.	2015-06-12 18:36:17 +02:00
Salvatore Sanfilippo	4082c38a60	Merge pull request #2571 from therealbill/sentinel-flushconfig-command adding a sentinel command: "flushconfig" per RCP4	2015-05-25 12:06:25 +02:00
antirez	20700fe566	Sentinel: clarify effect of resetting failover_start_time.	2015-05-25 10:32:28 +02:00
antirez	5080f2d699	Sentinel: help subcommand in simulate-failure command	2015-05-25 10:24:27 +02:00
antirez	fb3af75f74	Sentinel: initial failure simulator implemented This commit adds the SENTINEL simulate-failure, that sets specific hooks inside the state machine that will crash Sentinel, for testing purposes.	2015-05-22 11:49:11 +02:00
antirez	c54de703f2	Sentinel: fix sentinelTryConnectionSharing() by checking for no match Trivial omission of the obvious no-match case.	2015-05-20 09:59:55 +02:00
antirez	abc65e8987	Sentinel: SENTINEL CKQUORUM command A way for monitoring systems to check that Sentinel is technically able to reach the quorum and failover, using the currently visible Sentinels.	2015-05-18 12:57:47 +02:00
antirez	b43431ac25	Sentinel: port address update code to shared links logic	2015-05-15 09:47:05 +02:00
antirez	4dee18cb66	Sentinel: config-rewrite unique ID just one time	2015-05-14 17:45:09 +02:00
antirez	f9e942d4ae	Sentinel: remove debugging message from releaseInstanceLink()	2015-05-14 14:12:45 +02:00
antirez	b44c37482c	Sentinel: fix access to NULL link->cc in releaseInstanceLink()	2015-05-14 14:08:23 +02:00
antirez	87b6013adb	Sentinel: remove SHARED! debugging printf	2015-05-14 13:40:23 +02:00
antirez	5a0516b5b9	Sentinel: rewrite callback chain removing instances with shared links Otherwise pending commands callbacks will fire with a reference that no longer exists.	2015-05-14 13:39:26 +02:00
antirez	05dbc82005	Sentinel: debugging code removed from sentinelSendPing()	2015-05-14 10:52:32 +02:00
antirez	58d2bb951a	Sentinel: use active/last time for ping logic The PING trigger was improved again by using two fields instead of a single one to remember when the last ping was sent: 1. The "active" ping is the time at which we sent the last ping that still received no reply. However we continue to ping non replying instances even if they have an old active ping: the link may be disconnected and reconnected in the meantime so the older pings may get lost even if it's a TCP socket. 2. The "last" ping is the time at which we really sent the last ping on the wire, and this is used in order to throttle the amount of pings we send during failures (when no pong is received). All in all the failure detector effectiveness should be identical but we avoid flooding instances with pings during failures or when they are slow.	2015-05-14 09:56:23 +02:00
antirez	3ab49895b4	Sentinel: limit reconnection frequency to the ping period	2015-05-13 14:23:57 +02:00
antirez	0eb0b55ff0	Sentinel: PING trigger improved It's ok to ping as soon as the ping period has elapsed since we received the last PONG, but it's not good that we ping again if there is a pending ping... With this change we'll send a new ping if there is one pending only if two times the ping period elapsed since the ping which is still pending was sent.	2015-05-12 17:03:53 +02:00
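The trigger rule described in the commit above can be sketched as follows. This is a minimal illustration, not the code from sentinel.c: the function name, the millisecond arguments, and the convention that `pending_ping_ms == 0` means "no ping awaiting a reply" are all assumptions made for the example.

```python
def should_send_ping(now_ms, last_pong_ms, pending_ping_ms, ping_period_ms):
    """Decide whether a new PING should be sent (hypothetical helper).

    pending_ping_ms is 0 when no ping is awaiting a reply (assumed convention).
    """
    # Never ping before a full ping period has elapsed since the last PONG.
    if now_ms - last_pong_ms < ping_period_ms:
        return False
    # No ping is pending: it is fine to ping right away.
    if pending_ping_ms == 0:
        return True
    # A ping is still pending: ping again only after two ping periods
    # have elapsed since that pending ping was sent.
    return now_ms - pending_ping_ms >= 2 * ping_period_ms
```

With a 1000 ms ping period, a pending ping sent at t=1000 blocks new pings until t=3000, while an instance with no pending ping is pinged as soon as one period has passed since its last PONG.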
antirez	9d5e2ed392	Sentinel: same-Sentinel link sharing across masters	2015-05-12 17:03:00 +02:00
antirez	e0a5246f06	Sentinel: add sentinelGetInstanceTypeString() function This is useful for debugging and logging activities: given a sentinelRedisInstance object returns a C string representing the instance type: master, slave, sentinel.	2015-05-12 12:12:25 +02:00
antirez	d6e1347869	Sentinel: add link refcount to instance description	2015-05-11 23:49:19 +02:00
therealbill	4e8ccbe7ea	adding a sentinel command: "flushconfig" This new command triggers a config flush to save the in-memory config to disk. This is useful for cases of a configuration management system or a package manager wiping out your sentinel config while the process is still running - and has not yet been restarted. It can also be useful for scripting a backup and migrate or clone of a running sentinel.	2015-05-11 14:08:57 -05:00
antirez	1029276c0d	Sentinel: connection sharing WIP #1	2015-05-11 13:15:26 +02:00
antirez	611283f743	Sentinel: suppress warnings for not used args.	2015-05-08 17:17:59 +02:00
antirez	3eca0752a6	Sentinel: generate +sentinel again, removed in prev commit.	2015-05-08 17:16:48 +02:00
antirez	b91434cab1	Sentinel: Use privdata instead of c->data in sentinelReceiveHelloMessages() This way we may later share the hiredis link "c" among the same Sentinel instance referenced multiple times for multiple masters.	2015-05-08 17:16:39 +02:00
antirez	b849886a0d	Sentinel: clarify arguments of SENTINEL IS-MASTER-DOWN-BY-ADDR	2015-05-08 17:16:00 +02:00
antirez	a0cd75cd1b	Sentinel: don't detect duplicated Sentinels, just address switch Since with a previous commit Sentinels now persist their unique ID, we no longer need to detect duplicated Sentinels and re-add them. We remove and re-add back using different events only in the case of address switch of the same Sentinel, without generating a new +sentinel event.	2015-05-07 10:07:47 +02:00
antirez	794fc4c9a8	Sentinel: persist its unique ID across restarts. Previously Sentinels always changed unique ID across restarts, relying on the server.runid field. This is not a good idea, and forced Sentinel to rely on detection of duplicated Sentinels and a potentially dangerous clean-up and re-add operation of the Sentinel instance that was rebooted. Now the ID is generated at the first start and persisted in the configuration file, so that a given Sentinel will have its unique ID forever (unless the configuration is manually deleted or there is a filesystem corruption).	2015-05-06 16:19:14 +02:00
therealbill	cc799d253f	Making sentinel flush config on +slave Originally, only the +slave event which occurs when a slave is reconfigured during sentinelResetMasterAndChangeAddress triggers a flush of the config to disk. However, newly discovered slaves don't apparently trigger this flush but do trigger the +slave event issuance. So if you start up a sentinel, add a master, then add a slave to the master (as a way to reproduce it) you'll see the +slave event issued, but the sentinel config won't be updated with the known-slave entry. This change makes sentinel do the flush of the config if a new slave is detected in sentinelRefreshInstanceInfo.	2015-05-04 12:54:13 +02:00
antirez	99c93f34a7	Sentinel: remove useless sentinelFlushConfig() call To rewrite the config in the loop that adds slaves back after a master reset, in order to handle switching to another master, is useless: it just adds latency since there is an fsync call in the inner loop, without providing any additional guarantee, but the contrary, since if after the first loop iteration the server crashes we end with just a single slave entry, losing all the other information. It is wiser to rewrite the config at the end when the full new state is configured.	2015-05-04 12:50:44 +02:00
clark.kang	eff212ea95	fix sentinel memory leak	2015-04-29 00:05:26 +09:00
Salvatore Sanfilippo	61fb441c8c	Merge pull request #2386 from inkel/sentinel-add-client-command Support CLIENT commands in Redis Sentinel	2015-03-13 18:23:36 +01:00
Salvatore Sanfilippo	e00cb78f67	Merge pull request #2054 from mattsta/fix-set-sentinel-quorum Sentinel: Add initial quorum bounds check	2015-02-25 10:09:40 +01:00
Salvatore Sanfilippo	46bd13b806	Merge pull request #1966 from mattsta/fix-sentinel-info Sentinel: Improve INFO command behavior	2015-02-24 17:20:09 +01:00
Leandro López (inkel)	d5e01519e5	Support CLIENT commands in Redis Sentinel When trying to debug sentinel connections or max connections errors it would be very useful to have the ability to see the list of connected clients to a running sentinel. At the same time it would be very helpful to be able to name each sentinel connection or kill offending clients. This commit adds the already defined CLIENT commands back to Redis Sentinel.	2015-02-02 18:16:18 -03:00
Matt Stancliff	d956d809ac	Fix three simple clang analyzer warnings	2014-12-23 09:31:04 -05:00
Matt Stancliff	ad41a7c404	Add addReplyBulkSds() function Refactor a common pattern into one function so we don't end up with copy/paste programming.	2014-12-23 09:31:02 -05:00
Matt Stancliff	32bba43ac7	Add 'age' value to SENTINEL INFO-CACHE	2014-12-22 21:17:04 -05:00
antirez	bbf0736c4e	sdsformatip() removed. Specialized single-use function. Not the best match for sds.c btw. Also genClientPeerId() is no longer static: we need symbols.	2014-12-11 18:29:04 +01:00
antirez	ce269ad3c5	AnetFormatIP(): renamed, commented, now sticks to IP:port format. A few code style changes + consistent format: not nice for humans but better for parsers.	2014-12-11 18:20:30 +01:00
Matt Stancliff	391fc9b633	Sentinel: Improve INFO command behavior Improvements: - Return empty string if asking for non-existing section (INFO foo) - Fix potential memory leak (caused by sdsempty() then returned if >2 args) - Clean up argument parsing - Allow "all" as valid section (same as "default" or zero args currently) - Move strcasecmp to end of evaluation chain in conditionals Also, since we're C99, I moved some variable declarations to be closer to where they are actually used (saves us from needing to free an empty info if we detect argument errors up front). Closes #1915 Closes #1966	2014-12-11 10:49:16 -05:00
Matt Stancliff	491881e13b	Cleanup all IP formatting code Instead of manually checking for strchr(n,':') everywhere, we can use our new centralized IP formatting functions.	2014-12-11 10:12:18 -05:00
antirez	d8158771b5	Sentinel: INFO-CACHE comments reworked a bit. Changed in order to make them more review friendly, based on the experience of reviewing the code myself.	2014-12-10 11:15:13 +01:00
antirez	c83a917286	Sentinel: INFO-CACHE GCC minor code cleanup. I guess the initial goal of the initialization was to suppress a GCC warning, but if we have to initialize, we can do it with the base-case value instead of NULL which is never retained.	2014-12-10 11:12:26 +01:00
antirez	0422321617	Sentinel: removed useless flag var from INFO-CACHE.	2014-12-10 11:05:37 +01:00
antirez	7576a27d58	Sentinel: INFO-CACHE reply format command shortened.	2014-12-10 11:04:24 +01:00
Matt Stancliff	f8c73e38b5	Add SENTINEL INFO-CACHE [masters...] Sentinel queries the INFO from every master and from every replica of every master. We can cache the INFO results in Sentinel so Sentinel can be a single place to quickly get all INFO output for an entire Sentinel monitoring group. This commit gives us SENTINEL INFO-CACHE in two forms: - SENTINEL INFO-CACHE — returns all masters and all replicas - SENTINEL INFO-CACHE master0 master1 ... masterN — vararg specify masters Results are returned as a multibulk reply with two top-level entries for each master. The first entry for each master is the name of the master. The second entry is a nested multibulk reply with the contents of INFO, first for the master, then an additional entry for each of the replicas.	2014-11-20 16:56:30 -05:00
Matt Stancliff	6739ef4447	Sentinel: Add initial quorum bounds check Fixes #2054	2014-11-20 16:30:17 -05:00
Matt Stancliff	12d0195b30	Clean up text throughout project - Remove trailing newlines from redis.conf - Fix comment misspelling - Clarifies zipEncodeLength usage and a C API mention (#1243, #1242) - Fix cluster typos (inspired by @papanikge #1507) - Fix rewite -> rewrite in a few places (inspired by #682) Closes #1243, #1242, #1507	2014-09-29 06:49:07 -04:00
antirez	f5efa9bbad	Sentinel sentinelGetLeader() top comment improved.	2014-09-11 19:27:45 +02:00
antirez	f4be6f16f2	Sentinel: fix computation of total number of votes. The code to check the number of voters was never updated to follow the new Sentinel specification, so the number of voters was computed using only the set of Sentinels that provided a vote. This means that there is a changing majority on partitions, even if usually the issue is not triggered because of the configured quorum check (what was broken was the other implicit check that requires anyway half of the known sentinels to agree in order to start a failover).	2014-09-11 18:53:31 +02:00
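The vote-counting fix in the commit above can be illustrated with a small sketch. It is not the sentinelGetLeader() implementation: the function name, the dictionary of votes, and the exact majority formula (`known_sentinels // 2 + 1`) are assumptions chosen to show why the denominator must be the full set of known Sentinels rather than only those that voted.

```python
from collections import Counter

def elect_leader(votes, known_sentinels, quorum):
    """Return the elected leader id, or None if nobody wins (hypothetical helper).

    votes: dict mapping voter id -> id of the Sentinel it voted for
           (only Sentinels that actually voted appear here).
    known_sentinels: total number of Sentinels known for this master,
           including the local one, whether or not they voted.
    """
    if not votes:
        return None
    winner, max_votes = Counter(votes.values()).most_common(1)[0]
    # The majority is computed over ALL known Sentinels. Using len(votes)
    # instead would let the required majority shrink during a partition.
    majority = known_sentinels // 2 + 1
    if max_votes < majority or max_votes < quorum:
        return None
    return winner
```

For example, with five known Sentinels and a quorum of 2, two votes for the same candidate satisfy the quorum but not the majority of three, so no leader is elected.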
antirez	0a6cbabb26	Sentinel: don't set announce-ip if is empty.	2014-09-04 11:45:58 +02:00
antirez	cd576a1aab	Sentinel: announce ip/port changes + rewrite. The original implementation was modified in order to allow to selectively announce a different IP or port, and to rewrite the two options in the config file after a rewrite.	2014-09-04 11:23:31 +02:00
Dara Kong	3d939266be	sentinel: Decouple bind address from address sent to other sentinels There are instances such as EC2 where the bind address is private (behind a NAT) and cannot be accessible from WAN. https://groups.google.com/d/msg/redis-db/PVVvjO4nMd0/P3oWC036v3cJ	2014-09-04 10:54:21 +02:00
Matt Stancliff	67e414c7b8	Sentinel: Abort Hello quicker if not connected We can save a little work by aborting when we enter the function if we're disconnected.	2014-09-01 16:34:06 +02:00
Matt Stancliff	7e63dd23f3	Rename two 'buf' vars to 'ip' for better clarity Clearly ip[32] is wrong, but it's less clear that buf[32] was wrong without further reading.	2014-08-25 10:16:20 +02:00
Eiichi Sato	c38884ceac	Sentinel: fix bufsize to support IPv6 address Closes #1914	2014-08-25 10:15:43 +02:00
antirez	edca2b14d2	Remove warnings and improve integer sign correctness.	2014-08-13 11:44:38 +02:00
antirez	e3bae84606	Sentinel implementation of ROLE.	2014-06-23 12:07:41 +02:00
Matt Stancliff	5cd83ef539	Sentinel: bind source address Some deployments need traffic sent from a specific address. This change uses the same policy as Cluster where the first listed bindaddr becomes the source address for outgoing Sentinel communication. Fixes #1667	2014-06-23 11:44:35 +02:00
antirez	41f12ac988	Sentinel: send hello messages ASAP after config change. Eventual configuration convergence is guaranteed by our periodic hello messages to all the instances, however when there are important notices to share, better make a phone call. With this commit we force a hello message to other Sentinel and Redis instances within the next 100 milliseconds of a config update, which is practically better than waiting a few seconds.	2014-06-19 15:17:06 +02:00
antirez	94bc467328	Sentinel: handle SRI_PROMOTED flag correctly. Lack of check of the SRI_PROMOTED flag caused Sentinel to act with the promoted slave turned into a master during failover as if it were a normal instance. Normally this problem was not apparent because during real failovers the old master is down so the bugged code path was not entered, however with manual failovers via the SENTINEL FAILOVER command, the problem was easily triggered. This commit prevents promoted slaves from getting reconfigured, moreover we now explicitly check that during a failover the slave turning into a master is the one we selected for promotion and not a different one.	2014-06-19 10:28:27 +02:00
antirez	2c17591224	Sentinel: send SLAVEOF with MULTI, CLIENT KILL, CONFIG REWRITE. This implements the new Sentinel-Client protocol for the Sentinel part: now instances are reconfigured using a transaction that ensures that the config is rewritten in the target instance, and that clients lose the connection with the instance, in order to be forced to: ask Sentinel, reconnect to the instance, and verify the instance role with the new ROLE command.	2014-06-17 11:03:21 +02:00
antirez	8a588ac14d	More trailing spaces in sentinel.c removed.	2014-05-28 15:46:05 +02:00
antirez	01e3f9ba1d	Remove trailing spaces from sentinel.c.	2014-05-20 14:22:42 +02:00
antirez	2102778606	Sentinel: log when a failover will be attempted again. When a Sentinel performs a failover (successful or not), or when a Sentinel votes for a different Sentinel trying to start a failover, it sets a min delay before it will try to get elected for a failover. While not strictly needed, because if multiple Sentinels will try to failover the same master at the same time, only one configuration will eventually win, this serialization is practically very useful. Normal failovers are cleaner: one Sentinel starts to failover, the others update their config when the Sentinel performing the failover is able to get the selected slave to move from the role of slave to the one of master. However currently this timeout was implicit, so users could see Sentinels not reacting, after a failed failover, for some time, without giving any feedback in the logs to the poor sysadmin waiting for clues. This commit makes Sentinels more verbose about the delay: when a master is down and a failover attempt is not performed because the delay has still not elapsed, something like this will be logged: Next failover delay: I will not start a failover before Thu May 8 16:48:59 2014	2014-05-08 16:38:53 +02:00
antirez	931beae9b0	Sentinel: generate +config-update-from event when a new config is received. This event makes clear, before the switch-master event is generated, that a Sentinel received a configuration update from another Sentinel.	2014-05-08 15:59:34 +02:00
antirez	35667d75c3	Fixed undefined variable value with certain code paths. In sentinelFlushConfig() fd could be undefined when the following if statement was true: if (rewrite_status == -1) goto werr; This could cause random file descriptors to get closed.	2014-03-24 21:07:44 +01:00
Matt Stancliff	4290455145	Sentinel: Notify user when config can't be saved	2014-03-24 13:54:14 -04:00
Salvatore Sanfilippo	906c4d77c0	Merge pull request #1617 from mattsta/remove-unused-warning Cluster: remove variable causing warning	2014-03-24 18:33:22 +01:00
Matt Stancliff	67ed5f00aa	Cluster: remove variable causing warning GCC-4.9 warned about this, but clang didn't. This commit fixes warning: sentinel.c: In function 'sentinelReceiveHelloMessages': sentinel.c:2156:43: warning: variable 'master' set but not used [-Wunused-but-set-variable] sentinelRedisInstance ri = c->data, master;	2014-03-18 15:35:09 -04:00
antirez	b9e90a70fa	Sentinel: sentinelRefreshInstanceInfo() minor refactoring. Test sentinel.tilt condition on top and return if it is true. This allows to remove the check for the tilt condition in the remaining code paths of the function.	2014-03-18 15:35:47 +01:00
antirez	218cc5fc39	Sentinel: propagate down-after-ms changes to slaves and sentinels.	2014-03-18 14:37:44 +01:00
antirez	bb6d850160	Sentinel: down-after-milliseconds is not master-specific. addReplySentinelRedisInstance() modified so that this field is displayed for all the kind of instances: Sentinels, Masters, Slaves.	2014-03-18 11:21:17 +01:00
antirez	ae0b7680b3	Sentinel failure detection implementation improved. Failure detection in Sentinel is ping-pong based. It used to work by remembering the last time a valid PONG reply was received, and checking if the reception time was too old compared to the current time. PINGs were sent at a fixed interval of 1 second. This works in a decent way, but does not scale well when we want to set very small values of "down-after-milliseconds" (this is the node timeout basically). This commit reimplements the failure detection making a number of changes. Some changes are inspired by Redis Cluster failure detection code: * A new last_ping_time field is added in representation of instances. If non zero, we have an active ping that was sent at the specified time. When a valid reply to ping is received, the field is zeroed again. * last_ping_time is not reset when we reconnect the link or send a new ping, so from our point of view it represents the time we started waiting for the instance to reply to our pings without receiving a reply. * last_ping_time is now used in order to check if the instance is timed out. This means that we can have a node timeout of 100 milliseconds and yet the system will work well since the new check is not bound to the period used to send pings. * Pings are now sent every second, or more often if the value of down-after-milliseconds is less than one second. With a lower limit of 10 HZ ping frequency. * Link reconnection code was improved. This is used in order to try to reconnect the link when we are at 50% of the node timeout without a valid reply received yet. However the old code triggered unnecessary reconnections when the node timeout was very small. Now that should be ok. The new code passes the tests but more testing is needed and more unit tests stressing the failure detector, so currently this is merged only in the unstable branch.	2014-03-17 18:33:45 +01:00
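The last_ping_time bookkeeping described in the commit above can be sketched as a tiny state holder. This is only an illustration of the three rules in the message (set on the first unanswered ping, not reset by later pings, zeroed on a valid reply); the class and method names are hypothetical and the real logic lives in C inside sentinel.c.

```python
class PingState:
    """Tracks the oldest unanswered ping for one instance (hypothetical class)."""

    def __init__(self):
        # 0 means "no active ping": every ping sent so far has been answered.
        self.last_ping_time = 0

    def on_ping_sent(self, now_ms):
        # Only the first unanswered ping sets the field. Later pings (or link
        # reconnections) do not reset it, so it marks when we started waiting.
        if self.last_ping_time == 0:
            self.last_ping_time = now_ms

    def on_valid_reply(self):
        # A valid reply clears the active ping.
        self.last_ping_time = 0

    def is_timed_out(self, now_ms, down_after_ms):
        # The timeout check depends only on how long we have been waiting,
        # not on the period at which pings are sent.
        return (self.last_ping_time != 0
                and now_ms - self.last_ping_time > down_after_ms)
```

Because the check is decoupled from the ping period, a node timeout as small as 100 milliseconds still behaves sensibly in this model.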
antirez	3a2ff55617	Sentinel: use CLIENT SETNAME when connecting to Redis. This makes debugging / monitoring of Sentinels simpler since you can identify sentinels in CLIENT LIST output of Redis instances.	2014-03-15 14:59:23 +01:00
Matt Stancliff	584052ee6b	Fix segfault from accessing array out of bounds argc == 2; argv[2] == crash	2014-03-14 17:38:05 -04:00
antirez	ed813863f0	Sentinel: be safe under crash-recovery assumptions. Sentinel's main safety argument is that there are no two configurations for the same master with the same version (configuration epoch). For this to be true Sentinels require to be authorized by a majority. Additionally Sentinels require to do two important things: * Never vote again for the same epoch. * Never exchange an old vote for a fresh one. The first prerequisite, in a crash-recovery system model, requires to persist the master->leader_epoch on durable storage before to reply to messages. This was not the case. We also make sure to persist the current epoch in order to never reply to stale votes requests from other Sentinels, after a recovery. The configuration is persisted by making use of fsync(), this is considered in the context of this code a good enough guarantee that after a restart our durable state is restored, however this may not always be the case depending on the kind of hardware and operating system used.	2014-03-14 14:58:44 +01:00
antirez	365094028b	Sentinel: fake PUBLISH command to receive HELLO messages. Now the way HELLO messages are received is unified. Now it is no longer needed for Sentinels to converge to the higher configuration for a master to be able to chat via some Redis instance, they are able to directly exchange configurations. Note that this commit does not include the (trivial) change needed to send HELLO messages to Sentinel instances as well, since by mistake I committed the change in the previous commit that refactored hello messages processing into a separate function.	2014-03-14 11:07:42 +01:00
antirez	9dfe426fc8	Sentinel: HELLO processing refactored into sentinelProcessHelloMessage().	2014-03-14 11:07:42 +01:00
Jan-Erik Rediger	5f5118bdad	Small typo fixed	2014-03-05 00:41:02 +01:00
antirez	47750998a6	Sentinel: more aggressive failover start desynchronization. Sentinel needs to avoid split brain conditions due to multiple sentinels trying to get voted at the exact same time. So far some desynchronization was provided by fluctuating server.hz, that is the frequency of the timer function call. However the desynchronization provided in this way was not enough when using many Sentinel instances, especially when a large quorum value is used in order to force a greater degree of agreement (more than N/2+1). It was verified that it was likely to trigger a split brain condition, forcing the system to try again after a timeout. Usually the system will succeed after a few retries, but this is not optimal. This commit desynchronizes instances in a more effective way to make it likely that the first attempt will be successful.	2014-03-04 17:09:36 +01:00
antirez	b15411df98	Sentinel: log quorum with +monitor event.	2014-02-24 17:10:20 +01:00
antirez	6b373edb77	Sentinel: generate +monitor events at startup.	2014-02-24 16:33:55 +01:00
antirez	3b7a757468	Sentinel: log +monitor and +set events. Now that we have a runtime configuration system, it is very important to be able to log how the Sentinel configuration changes over time because of API calls.	2014-02-24 16:33:43 +01:00
antirez	25cebf7285	Sentinel: added missing exit(1) after checking for config file.	2014-02-24 16:22:52 +01:00
antirez	b1c1386374	Sentinel: IDONTKNOW error removed. This error was conceived for the older version of Sentinel that worked via master redirection and that was not able to get configuration updates from other Sentinels via the Pub/Sub channel of masters or slaves. This reply does not make sense today, every Sentinel should reply with the best information it has currently. The error will make even less sense in the future since the plan is to allow Sentinels to update the configuration of other Sentinels via gossip with a direct chat without the prerequisite that they have at least a monitored instance in common.	2014-02-22 17:34:46 +01:00
antirez	7d7b3810e7	Sentinel: report instances role switch events. This is useful mostly for debugging of issues.	2014-02-20 12:13:52 +01:00
antirez	7cec9e48ce	Sentinel: SENTINEL_SLAVE_RECONF_RETRY_PERIOD -> RECONF_TIMEOUT Rename define to match the new meaning.	2014-02-18 10:27:38 +01:00
antirez	18b8bad53c	Sentinel: fix slave promotion timeout. If we can't reconfigure a slave in time during failover, go forward as anyway the slave will be fixed by Sentinels in the future, once they detect it is misconfigured. Otherwise a failover in progress may never terminate if for some reason the slave is unable to sync with the master while at the same time it is not disconnected.	2014-02-18 08:50:57 +01:00
antirez	e1b77b61f3	Sentinel: better specify startup errors due to config file. Now it logs the file name if it is not accessible. Also there is a different error for the missing config file case, and for the non writable file case.	2014-02-17 16:44:49 +01:00
antirez	2d6eb68993	Sentinel: allow SHUTDOWN command in Sentinel mode.	2014-02-07 11:22:24 +01:00
antirez	3ff1bb4b2e	Sentinel: check arity for SENTINEL MASTER command. This fixes issue #1530.	2014-01-31 10:13:38 +01:00
antirez	d5763dceaf	SENTINEL SET master quorum implemented.	2014-01-14 09:23:26 +01:00
antirez	fe86f890b0	SENTINEL SET: error on bad option name + flush config on error.	2014-01-13 11:55:57 +01:00
antirez	f822516e43	SENTINEL SET implemented. The new command allows to change master-specific configurations at runtime. All the settable parameters can be retrieved via the SENTINEL MASTER command, so there is no equivalent "GET" command.	2014-01-13 11:53:29 +01:00
antirez	3cdcaff069	Sentinel: fix wrong arity error message.	2014-01-13 11:05:13 +01:00
antirez	964f6b17e9	Sentinel: SENTINEL REMOVE command added. The command totally removes a monitored master.	2014-01-10 15:39:36 +01:00
antirez	cf2835519e	Sentinel: releaseSentinelRedisInstance() top comment fixed. The claim about unlinking the instance from the connected hash tables was the opposite of the reality. Also the current actual behavior is safer in most cases, so it is better to manually unlink when needed.	2014-01-10 15:33:42 +01:00
antirez	9d0f46c6f5	Sentinel: flush config on disk when new master is added.	2014-01-10 15:22:06 +01:00
antirez	39f9f449b0	Sentinel: SENTINEL MONITOR command implemented. It allows to add new masters to monitor at runtime.	2014-01-10 15:18:24 +01:00
antirez	c42e4bd0b6	Sentinel: added SENTINEL MASTER <name> command. With SENTINEL MASTERS it was already possible to list all the configured masters, but not a specific one.	2014-01-10 14:41:52 +01:00
antirez	2bb9cd464e	Add all the configurable fields to addReplySentinelRedisInstance(). Note: the auth password with the master is voluntarily not exposed.	2014-01-10 14:31:41 +01:00
antirez	5a7d04ee7b	Trim comment to 80 cols in SentinelCommand().	2014-01-10 14:13:04 +01:00
antirez	5320148883	Sentinel: dead code removed.	2013-12-13 11:01:13 +01:00
antirez	2eb781b35b	dict.c: added optional callback to dictEmpty(). Redis hash table implementation has many non-blocking features like incremental rehashing, however while deleting a large hash table there was no way to have a callback called to do some incremental work. This commit adds this support, as an optional callback argument to dictEmpty() that is currently called at a fixed interval (one time every 65k deletions).	2013-12-10 18:46:24 +01:00
antirez	c590549e40	Sentinel: fix reported role info sampling. The way the role change was recorded was not sane and too convoluted, causing the role information to be not always updated. This commit fixes issue #1445.	2013-12-06 12:46:56 +01:00
antirez	2b414a4b5f	Sentinel: fix reported role fields when master is reset. When there is a master address switch, the reported role must be set to master so that we have a chance to re-sample the INFO output to check if the new address is reporting the right role. Otherwise if the role was wrong, it will be sensed as wrong even after the address switch, and for enough time according to the role change time, for Sentinel to consider the master SDOWN. This fixes issue #1446, that describes the effects of this bug in practice.	2013-12-06 11:37:46 +01:00
antirez	11e81a1e9a	Fixed grammar: before H the article is a, not an.	2013-12-05 16:35:32 +01:00
antirez	f80cf7363a	Sentinel: don't write HZ when flushing config. See issue #1419.	2013-12-02 15:56:10 +01:00
antirez	dffebbc904	Sentinel: better time desynchronization. Sentinels are now desynchronized in a better way changing the time handler frequency between 10 and 20 HZ. This way on average a desynchronization of 25 milliseconds is produced that should be large enough compared to network latency, avoiding most split-brain conditions during the vote. Now that the clocks are desynchronized, larger random delays when performing operations can be easily achieved in the following way. Take as an example the function that starts the failover, which is called with a frequency between 10 and 20 HZ and will start the failover every time the conditions are met. By just adding as an additional condition something like rand()%4 == 0, we can amplify the desynchronization between Sentinel instances easily. See issue #1419.	2013-12-02 12:29:42 +01:00
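The two desynchronization ideas in the commit above can be sketched as follows. This is a rough model only: the helper names are hypothetical, the integer range 10..19 is an assumption standing in for "between 10 and 20 HZ", and `rng.randrange(4) == 0` mirrors the `rand()%4 == 0` condition quoted in the message.

```python
import random

def pick_timer_hz(rng):
    """Pick a timer frequency in the 10..19 HZ range (assumed integer range)."""
    return 10 + rng.randrange(10)

def should_start_failover(conditions_met, rng):
    """Start the failover only on roughly one call in four when conditions hold.

    Since the function is called 10 to 20 times per second, the extra random
    gate spreads the start time of different Sentinels further apart.
    """
    return conditions_met and rng.randrange(4) == 0
```

Passing an explicit `random.Random` instance keeps the sketch deterministic under test; a real process would seed each Sentinel differently so that their timers drift apart.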
antirez	0addf8aff1	Sentinel: log vote received from other Sentinels.	2013-11-28 15:23:46 +01:00
huangz1990	86a540a66e	fix a bug in sentinel.c about pub/sub link	2013-11-26 19:55:51 +08:00
antirez	6f4fd55762	Sentinel: fixes inverted strcmp() test preventing config updates. The result of this one-char bug was pretty serious: if the new master had the same port as the previous master, but just a different IP address, non-leader Sentinels would not be able to recognize the configuration change. This commit fixes issue #1394. Many thanks to @shanemadden who reported the bug and helped investigate it.	2013-11-25 10:59:53 +01:00
antirez	8d547ebd56	Sentinel: fix type specifier for Hello msg generation. This fixes issue #1395.	2013-11-25 10:24:34 +01:00
antirez	cc6053681f	Sentinel: different comments updated to new implementation.	2013-11-21 16:22:59 +01:00
antirez	685e79998c	Sentinel: cleanup around SENTINEL_INFO_VALIDITY_TIME.	2013-11-21 16:05:41 +01:00
antirez	489d889726	Sentinel: removed mem leak and useless code.	2013-11-21 15:43:55 +01:00
antirez	f55ad3038f	Sentinel: manual failover works again.	2013-11-21 12:39:47 +01:00
antirez	297de1ab26	Sentinel: test for writable config file. This commit introduces a function called when Sentinel is ready for normal operations to avoid putting Sentinel specific stuff in redis.c.	2013-11-21 12:28:15 +01:00
antirez	d920177f8d	Sentinel: check for disconnected links in sentinelSendHello(). Does not fix any bug as the test is performed by the caller, but better to have the check.	2013-11-21 11:35:50 +01:00
antirez	8810167d13	Sentinel: Hello message sending code refactored.	2013-11-21 11:31:06 +01:00
antirez	0101c2bcfe	Sentinel: select slave with best (greater) replication offset.	2013-11-20 16:05:36 +01:00
antirez	a6ebd910d8	Sentinel: take the replication offset in slaves state.	2013-11-20 15:53:21 +01:00
antirez	37a51a2568	Sentinel: distinguish between is-master-down-by-addr requests. Some are just to know if the master is down, and in this case the runid in the request is set to "*", others are actually in order to seek for a vote and get elected. In the latter case the runid is set to the runid of the instance seeking for the vote.	2013-11-19 16:50:04 +01:00
antirez	b22d1beea0	Sentinel: various fixes to leader election implementation.	2013-11-19 16:20:42 +01:00
antirez	1f9728cb20	Sentinel: failover script execution fixed.	2013-11-19 12:34:46 +01:00
antirez	90635488ce	Sentinel: no longer used defines removed.	2013-11-19 11:24:36 +01:00
antirez	0a35f65301	Sentinel: when writing config on disk, remember sentinels runid.	2013-11-19 11:11:43 +01:00
antirez	5450833d02	Sentinel: arity of known-sentinel/slave is 4 not 3.	2013-11-19 11:03:47 +01:00
antirez	b8a94463b7	Sentinel: rewriteConfigSentinelOption() sub-iterators var typo fixed.	2013-11-19 10:59:50 +01:00
antirez	16237d78c8	Sentinel: call sentinelFlushConfig() to persist state when needed. Also the sentinel configuration rewriting was modified in order to account for failover in progress, where we need to provide the promoted slave address as master address, and the old master address as one of the slaves address.	2013-11-19 10:55:43 +01:00
antirez	e257ab2bfe	Sentinel: sentinelFlushConfig() to CONFIG REWRITE + fsync.	2013-11-19 10:13:04 +01:00
antirez	5998769c28	Sentinel: CONFIG REWRITE support for Sentinel config.	2013-11-19 09:48:12 +01:00
antirez	47df12d5d9	Sentinel: can-failover option removed, many comments fixed.	2013-11-19 09:28:47 +01:00
antirez	232cdb95ab	Sentinel: added config options useful to take state on config rewrite. We'll use CONFIG REWRITE (internally) in order to store the new configuration of a Sentinel after the internal state changes. In order to do so, we need configuration options (that usually the user will not touch at all) about config epoch of the master, Sentinels and Slaves known for this master, and so forth.	2013-11-18 16:03:03 +01:00
antirez	3a374b0511	Sentinel: failover abort function simplified.	2013-11-18 11:43:35 +01:00
antirez	e0750acf11	Sentinel: slaves reconfig delay modified. The time Sentinel waits since the slave is detected to be configured to the wrong master, before reconfiguring it, is now the failover_timeout time as this makes more sense in order to give the Sentinel performing the failover enough time to reconfigure the slaves slowly (if required by the configuration). Also we now PUBLISH more frequently the new configuration as this allows switching the reappearing master back to slave faster.	2013-11-18 11:37:24 +01:00
antirez	83316f515c	Sentinel: failover restart time is now multiple of failover timeout. Also default failover timeout changed to 3 minutes as the failover is a fairly fast procedure most of the time, unless there is a very large number of slaves and the user chose to configure them sequentially (in that case the user should change the failover timeout accordingly).	2013-11-18 11:30:08 +01:00
antirez	3a56013acb	Sentinel: state machine and timeouts simplified.	2013-11-18 11:12:58 +01:00
antirez	4be53b1c5d	Sentinel: election timeout define.	2013-11-18 10:08:06 +01:00
antirez	69d826a354	Sentinel: fix address of master in Hello messages. Once we switched configuration during a failover, we should advertise the new address. This was a serious race condition as the Sentinel performing the failover for a moment advertised the old address with the new configuration epoch: once transmitted to the other Sentinels the broken configuration would remain there forever, until the next failover (because a greater configuration epoch is required to overwrite an older one).	2013-11-14 10:25:55 +01:00
antirez	e4c65e72c6	Sentinel: master address selection in get-master-address refactored.	2013-11-14 10:23:54 +01:00
antirez	c0d7229364	Sentinel: fix conditional to only affect slaves with wrong master.	2013-11-14 10:23:05 +01:00
antirez	dfbd9c5aeb	Sentinel: simplify and refactor slave reconfig code.	2013-11-14 00:36:43 +01:00
antirez	64ad6648a8	Sentinel: reconfigure slaves to right master.	2013-11-14 00:29:38 +01:00
antirez	3e27d678da	Sentinel: remember last time slave changed master.	2013-11-14 00:20:15 +01:00
antirez	8297745fa6	Sentinel: redirect-to-master is not ok with new algorithm. Now Sentinel believes the current configuration is always the winner and should be applied by Sentinels instead of trying to adapt our view of the cluster based on what we observe. So the only way to modify what a Sentinel believes to be the truth is to win an election and advertise the new configuration via Pub / Sub with a greater configuration epoch.	2013-11-13 17:03:48 +01:00
antirez	76a88f56e5	Sentinel: safer slave reconfig, master reported role should match.	2013-11-13 17:02:09 +01:00

318 Commits