3974 Commits

Author SHA1 Message Date
antirez
a2c76ffb1c redis-cli: also remove useless uint8_t. 2014-02-25 13:47:37 +01:00
antirez
ba993cc685 redis-cli: don't use uint64_t where actually not needed.
The computation is just something to keep the CPU busy, no need to use a
specific type. Since stdint.h was not included this prevented
compilation on certain systems.
2014-02-25 13:44:31 +01:00
antirez
5580350a7b redis-cli: check argument existence for --pattern. 2014-02-25 12:38:29 +01:00
antirez
c1d67ea9b4 redis-cli: --intrinsic-latency run mode added. 2014-02-25 12:37:52 +01:00
antirez
dcac007b81 redis-cli: added comments to split program in parts. 2014-02-25 12:24:45 +01:00
antirez
386467acfb Sentinel test: restart instances left killed by previous unit.
A unit can abort in the middle because of an error. The next unit should not
assume that the instances are in a clean state, and must restart what
was left killed.
2014-02-25 08:48:46 +01:00
antirez
a9360c62e8 Sentinel test: jump to next unit on test failure.
Sentinel tests are designed to be dependent on the previous tests in the
same unit, so usually we can't continue with the next test in the same
unit if a previous test failed.
2014-02-25 08:33:41 +01:00
antirez
044b627549 Sentinel test: test majority crashing Sentinels.
The test was previously performed by removing the master from the
Sentinel monitored masters. The test with the Sentinels crashed is
more similar to real-world partitions / failures.
2014-02-25 08:29:12 +01:00
antirez
630fb3539f Sentinel test: restart_instance should refresh pid attrib.
Also kill_instance was modified to warn when a test tries to kill the
same instance multiple times by mistake.
2014-02-25 08:23:48 +01:00
antirez
d3a3ef0bc1 Sentinel test: more stuff moved from 00-base to init.
There are a number of mandatory tests to create a stable setup for
testing that is not too sensitive to timing issues. All those tests
were moved to includes/init-tests, and marked as (init).
2014-02-24 17:21:50 +01:00
antirez
b15411df98 Sentinel: log quorum with +monitor event. 2014-02-24 17:10:20 +01:00
antirez
29f4df8018 Sentinel test: removed useless code to set SDOWN timeout.
The new common initialization code used to start a new unit already sets
the timeout to 2000 milliseconds.
2014-02-24 16:57:52 +01:00
antirez
6b373edb77 Sentinel: generate +monitor events at startup. 2014-02-24 16:33:55 +01:00
antirez
3b7a757468 Sentinel: log +monitor and +set events.
Now that we have a runtime configuration system, it is very important to
be able to log how the Sentinel configuration changes over time because
of API calls.
2014-02-24 16:33:43 +01:00
antirez
25cebf7285 Sentinel: added missing exit(1) after checking for config file. 2014-02-24 16:22:52 +01:00
antirez
540536c055 Sentinel test: tmp dir and gitignore added. 2014-02-24 11:51:31 +01:00
antirez
09dec3613e Sentinel test: minor fixes to --pause-on-error. 2014-02-23 18:02:52 +01:00
antirez
afd3db17a0 Sentinel test: --pause-on-error option added.
Pause the test with running instances available for state inspection on
error.
2014-02-23 17:57:56 +01:00
antirez
a929867cca Sentinel test: added empty units to fill later. 2014-02-23 17:50:59 +01:00
Salvatore Sanfilippo
e163332858 Merge pull request #1545 from mattsta/fix-redis-cli-sync
Deny SYNC and PSYNC in redis-cli
2014-02-23 17:47:28 +01:00
antirez
b1c1386374 Sentinel: IDONTKNOW error removed.
This error was conceived for the older version of Sentinel that worked
via master redirection and that was not able to get configuration
updates from other Sentinels via the Pub/Sub channel of masters or
slaves.

This reply does not make sense today: every Sentinel should reply with
the best information it has currently. The error will make even less
sense in the future since the plan is to allow Sentinels to update the
configuration of other Sentinels via gossip with a direct chat without
the prerequisite that they have at least a monitored instance in common.
2014-02-22 17:34:46 +01:00
antirez
8c254415f7 Sentinel test: framework improved and conf-update unit added.
It is now possible to kill and restart sentinel or redis instances for
more real-world testing.

The 01 unit tests the capability of Sentinel to update the configuration
of Sentinels rejoining the cluster, however the test is pretty trivial
and more tests should be added.
2014-02-22 17:27:49 +01:00
Salvatore Sanfilippo
1d7d1d9b1f Merge pull request #1559 from mattsta/more-detailed-process-title
Add cluster or sentinel to proc title
2014-02-21 09:32:13 +01:00
Matt Stancliff
2c273e3591 Add cluster or sentinel to proc title
If you launch redis with `redis-server --sentinel` then
in a ps, your output only says "redis-server IP:Port" — this
patch changes the proc title to include [sentinel] or
[cluster] depending on the current server mode:
e.g.  "redis-server IP:Port [sentinel]"
      "redis-server IP:Port [cluster]"
2014-02-20 23:58:54 -05:00
antirez
d7da507683 Sentinel test: move init tests as includes.
Most units will start with these two basic tests to create an
environment where the real tests are run.
2014-02-20 16:58:23 +01:00
antirez
5765444454 Sentinel test: ability to run just a subset of test files. 2014-02-20 16:28:41 +01:00
antirez
7d7b3810e7 Sentinel: report instances role switch events.
This is useful mostly for debugging of issues.
2014-02-20 12:13:52 +01:00
antirez
e087d8a20d Sentinel test: some reliability fixes to 00-base tests. 2014-02-19 10:26:23 +01:00
antirez
a88a057a1f Sentinel test: check that role matches at end of 00-base. 2014-02-19 10:08:49 +01:00
antirez
2a08c7e5ac Sentinel test: ODOWN and agreement. 2014-02-19 09:44:38 +01:00
antirez
136537dcb0 Sentinel test: check reconfig of slaves and old master. 2014-02-18 17:03:56 +01:00
antirez
8e553ec67c Sentinel test: basic failover tested. Framework improvements. 2014-02-18 16:31:52 +01:00
antirez
c7b7439528 Sentinel test: basic tests for MONITOR and auto-discovery. 2014-02-18 11:53:54 +01:00
antirez
c4fbc1d336 Sentinel test: info fields, master-slave setup, fixes. 2014-02-18 11:38:49 +01:00
antirez
19b863c7fa Prefix test file names with numbers to force exec order. 2014-02-18 11:07:42 +01:00
antirez
141bac4c79 Sentinel test: provide basic commands to access instances. 2014-02-18 11:04:55 +01:00
antirez
7cec9e48ce Sentinel: SENTINEL_SLAVE_RECONF_RETRY_PERIOD -> RECONF_TIMEOUT
Rename define to match the new meaning.
2014-02-18 10:27:38 +01:00
antirez
18b8bad53c Sentinel: fix slave promotion timeout.
If we can't reconfigure a slave in time during failover, go forward
anyway: the slave will be fixed by Sentinels in the future, once they
detect it is misconfigured.

Otherwise a failover in progress may never terminate if for some reason
the slave is unable to sync with the master while at the same time
it is not disconnected.
2014-02-18 08:50:57 +01:00
antirez
af788b5852 Sentinel: initial testing framework.
Nothing tested at all so far... Just the infrastructure spawning N
Sentinels and N Redis instances that the test will use again and again.
2014-02-17 17:38:04 +01:00
antirez
34c404e069 Test: colorstr moved to util.tcl. 2014-02-17 17:36:50 +01:00
antirez
a1dca2efab Test: code to test server availability refactored.
Some inline test moved into server_is_up procedure.
Also find_available_port was moved into util since it is going
to be used for the Sentinel test as well.
2014-02-17 16:44:57 +01:00
antirez
ede33fb912 Get absolute config file path before processing 'dir'.
The code tried to obtain the configuration file absolute path after
processing the configuration file. However if config file was a relative
path and a "dir" statement was processed reading the config, the absolute
path obtained was wrong.

With this fix the absolute path is obtained before processing the
configuration while the server is still in the original directory where
it was executed.
2014-02-17 16:44:53 +01:00
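The ordering problem fixed by ede33fb912 can be reproduced outside Redis. The following is a minimal Python sketch, not the server's C code: `resolve_config`, `demo`, and the directory names `start` and `data` are hypothetical, and `os.chdir` stands in for the effect of a "dir" statement.

```python
import os
import tempfile


def resolve_config(config_rel, data_dir, resolve_first):
    """Return the absolute config path, computed before or after the chdir."""
    if resolve_first:
        # Fixed order: resolve while still in the directory we were started in.
        path = os.path.abspath(config_rel)
        os.chdir(data_dir)
    else:
        # Old order: the 'dir' statement has already changed the working
        # directory, so the relative path resolves against the wrong place.
        os.chdir(data_dir)
        path = os.path.abspath(config_rel)
    return path


def demo():
    """Run both orderings in a throwaway directory tree."""
    original_cwd = os.getcwd()
    results = {}
    try:
        with tempfile.TemporaryDirectory() as root:
            root = os.path.realpath(root)
            start_dir = os.path.join(root, "start")  # where the server is launched
            data_dir = os.path.join(root, "data")    # target of the 'dir' statement
            os.mkdir(start_dir)
            os.mkdir(data_dir)
            results["expected"] = os.path.join(start_dir, "redis.conf")
            for label, resolve_first in (("old", False), ("fixed", True)):
                os.chdir(start_dir)
                results[label] = resolve_config("redis.conf", data_dir, resolve_first)
            # Leave the temporary tree before it is deleted.
            os.chdir(original_cwd)
    finally:
        os.chdir(original_cwd)
    return results
```

In this model the "old" ordering yields a path under `data`, while the "fixed" ordering yields the path under `start`, which is where the file actually lives.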
antirez
e1b77b61f3 Sentinel: better specify startup errors due to config file.
Now it logs the file name if it is not accessible. Also there is a
different error for the missing config file case, and for the non
writable file case.
2014-02-17 16:44:49 +01:00
antirez
51bd9da1fd Update cached time in rdbLoad() callback.
server.unixtime and server.mstime are cached less precise timestamps
that we use every time we don't need an accurate time representation and
a syscall would be too slow for the number of calls we require.

Such an example is the initialization and update process of the last
interaction time with the client, that is used for timeouts.

However rdbLoad() can take some time to load the DB, and it did not
update the cached time during DB loading. This resulted in the
bug described in issue #1535, where in the replication process the slave
loads the DB, creates the redisClient representation of its master, but
the timestamp is so old that the master, under certain conditions, is
sensed as already "timed out".

Thanks to @yoav-steinberg and Redis Labs Inc for the bug report and
analysis.
2014-02-13 15:13:26 +01:00
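The stale-timestamp effect described in 51bd9da1fd can be modeled with a fake clock. This is an illustrative Python sketch under stated assumptions, not the Redis implementation: `FakeClock`, `Server`, `master_timed_out_after_load`, and the 120 second load and 60 second timeout values are all hypothetical.

```python
class FakeClock:
    """Deterministic stand-in for the system clock."""

    def __init__(self):
        self.now = 0

    def advance(self, seconds):
        self.now += seconds


class Server:
    def __init__(self, clock):
        self.clock = clock
        self.unixtime = clock.now  # cached, cheap-to-read timestamp

    def update_cached_time(self):
        self.unixtime = self.clock.now

    def load_db(self, chunks, seconds_per_chunk, refresh_during_load):
        for _ in range(chunks):
            self.clock.advance(seconds_per_chunk)  # loading takes real time
            if refresh_during_load:
                # Models the loading callback refreshing the cached time.
                self.update_cached_time()

    def create_master_client(self):
        # The last interaction time is initialized from the cached time.
        return {"lastinteraction": self.unixtime}


def master_timed_out_after_load(refresh_during_load, load_seconds=120, timeout=60):
    clock = FakeClock()
    server = Server(clock)
    server.load_db(load_seconds // 10, 10, refresh_during_load)
    master = server.create_master_client()
    server.update_cached_time()  # next periodic tick refreshes the cache
    return server.unixtime - master["lastinteraction"] > timeout
```

With `refresh_during_load=False` the freshly created master client already looks older than the timeout. With `refresh_during_load=True` it does not.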
antirez
7e8abcf693 Log when CONFIG REWRITE goes bad. 2014-02-13 14:32:44 +01:00
antirez
f2bdf601be Test: regression for issue #1549.
It was verified that, after reverting the commit that fixes the bug, the
test no longer passes.
2014-02-13 12:26:38 +01:00
antirez
21e6b0fbe9 Fix script cache bug in the scripting engine.
This commit fixes a serious Lua scripting replication issue, described
by GitHub issue #1549. The root cause of the problem is that scripts
were put inside the script cache, assuming that slaves and AOF already
contained them, even if the scripts sometimes produced no changes in the
data set, and were not actually propagated to AOF/slaves.

Example:

    eval "if tonumber(KEYS[1]) > 0 then redis.call('incr', 'x') end" 1 0

Then:

    evalsha <sha1 step 1 script> 1 0

At this step sha1 of the script is added to the replication script cache
(the script is marked as known to the slaves) and the EVALSHA command is
transformed to EVAL. However it is not dirty (there are no changes to the db),
so it is not propagated to the slaves. Then the script is called again:

    evalsha <sha1 step 1 script> 1 1

At this step master checks that the script already exists in the
replication script cache and doesn't transform it to EVAL command. It is
dirty and propagated to the slaves, but they fail to evaluate the script
as they don't have it in the script cache.

The fix is trivial and just uses the new API to force the propagation of
the executed command regardless of the dirty state of the data set.

Thank you to @minus-infinity on GitHub for finding the issue,
understanding the root cause, and fixing it.
2014-02-13 12:10:43 +01:00
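The three-step sequence from 21e6b0fbe9 can be replayed against a toy model of the master and one slave. This is a hedged Python sketch, not Redis's scripting engine: `Master`, `Replica`, `replay`, and the `force_propagation` flag are hypothetical names, and the `dirty` argument is passed in by hand instead of being derived from running the Lua script.

```python
import hashlib

SCRIPT = "if tonumber(KEYS[1]) > 0 then redis.call('incr', 'x') end"


def sha1_hex(body):
    return hashlib.sha1(body.encode()).hexdigest()


class Replica:
    def __init__(self):
        self.scripts = set()  # sha1 of every script body received via EVAL
        self.errors = []

    def apply(self, command, arg):
        if command == "EVAL":
            self.scripts.add(sha1_hex(arg))
        elif arg not in self.scripts:
            self.errors.append("NOSCRIPT " + arg)


class Master:
    def __init__(self, replica, force_propagation):
        self.replica = replica
        self.force_propagation = force_propagation
        self.scripts = {}               # sha1 -> script body
        self.repl_script_cache = set()  # scripts assumed known to the slave

    def eval(self, body, dirty):
        sha = sha1_hex(body)
        self.scripts[sha] = body
        if dirty:
            self.replica.apply("EVAL", body)
        return sha

    def evalsha(self, sha, dirty):
        if sha not in self.repl_script_cache:
            # First EVALSHA: mark as known and rewrite to EVAL.
            self.repl_script_cache.add(sha)
            command, arg = "EVAL", self.scripts[sha]
            # The fix: propagate even when the data set did not change.
            propagate = dirty or self.force_propagation
        else:
            command, arg = "EVALSHA", sha
            propagate = dirty
        if propagate:
            self.replica.apply(command, arg)


def replay(force_propagation):
    replica = Replica()
    master = Master(replica, force_propagation)
    sha = master.eval(SCRIPT, dirty=False)  # eval ... 1 0
    master.evalsha(sha, dirty=False)        # evalsha ... 1 0
    master.evalsha(sha, dirty=True)         # evalsha ... 1 1
    return replica.errors
```

In this model `replay(False)` leaves the slave with one missing-script error, while `replay(True)` leaves it with none, because the rewritten EVAL reaches the slave before the later EVALSHA.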
antirez
fc08c8599f AOF write error: retry with a frequency of 1 Hz. 2014-02-12 16:27:59 +01:00
antirez
fe8352540f AOF: don't abort on write errors unless fsync is 'always'.
A system similar to the RDB write error handling is used, in which when
we can't write to the AOF file, writes are no longer accepted until we
are able to write again.

For fsync == always we still abort on errors since there is currently no
easy way to avoid replying with success to the user otherwise, and this
would violate the contract with the user of only acknowledging data
already secured on disk.
2014-02-12 16:11:36 +01:00
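The policy described in fe8352540f can be summarized as a small state machine. The following Python sketch is only a model of that description: `AofWriter`, `AofAbort`, `flush`, and `accepts_writes` are hypothetical names, and raising an exception stands in for the server aborting.

```python
class AofAbort(Exception):
    """Stands in for the server exiting on an unrecoverable AOF error."""


class AofWriter:
    def __init__(self, fsync_policy):
        self.fsync_policy = fsync_policy  # "always", "everysec" or "no"
        self.last_write_ok = True

    def flush(self, write_succeeded):
        """Record the outcome of one attempt to write the AOF buffer."""
        if write_succeeded:
            # A later successful write clears the error state.
            self.last_write_ok = True
            return
        if self.fsync_policy == "always":
            # Success cannot be acknowledged for data that is not on disk.
            raise AofAbort("AOF write failed with fsync 'always'")
        # Other policies: keep running, but stop accepting writes.
        self.last_write_ok = False

    def accepts_writes(self):
        return self.last_write_ok
```

A failed `flush` under "everysec" makes `accepts_writes()` return False until a later flush succeeds. Under "always" the same failure raises `AofAbort`.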
antirez
db6d628c3e Cluster: clusterDelNode(): remove node from master's slaves. 2014-02-11 10:34:25 +01:00