Commit Graph

71 Commits

Author SHA1 Message Date
Yossi Gottlieb
5b8350aaaa
Add --dump-logs tests option. (#8459)
Dump the entire server log if a test failed, to easy troubleshooting
with no access to log files.
2021-02-07 12:37:24 +02:00
Yossi Gottlieb
522d93607a
Add io-thread daily CI tests. (#8232)
This adds basic coverage to IO threads by running the cluster and few selected Redis test suite tests with the IO threads enabled.

Also provides some necessary additional improvements to the test suite:

* Add --config to sentinel/cluster tests for arbitrary configuration.
* Fix --tags whitelisting which was broken.
* Add a `network` tag to some tests that are more network intensive. This is work in progress and more tests should be properly tagged in the future.
2021-01-17 15:48:48 +02:00
Yang Bodong
ee59dc1b5c
Tests: fix the problem that Darwin memory leak detection may fail (#8213)
Apparently the "leaks" took reports a different error string about process
that's not found in each version of MacOS.
This cause the test suite to fail on some OS versions, since some tests terminate
the process before looking for leaks.
Instead of looking at the error string, we now look at the (documented) exit code.
2020-12-23 16:28:17 +02:00
Yossi Gottlieb
8c291b97b9
TLS: Add different client cert support. (#8076)
This adds a new `tls-client-cert-file` and `tls-client-key-file`
configuration directives which make it possible to use different
certificates for the TLS-server and TLS-client functions of Redis.

This is an optional directive. If it is not specified the `tls-cert-file`
and `tls-key-file` directives are used for TLS client functions as well.

Also, `utils/gen-test-certs.sh` now creates additional server-only and client-only certs and will skip intensive operations if target files already exist.
2020-12-11 18:31:40 +02:00
Oran Agra
c31055db61 Sanitize dump payload: fuzz tester and fixes for segfaults and leaks it exposed
The test creates keys with various encodings, DUMP them, corrupt the payload
and RESTORES it.
It utilizes the recently added use-exit-on-panic config to distinguish between
 asserts and segfaults.
If the restore succeeds, it runs random commands on the key to attempt to
trigger a crash.

It runs in two modes, one with deep sanitation enabled and one without.
In the first one we don't expect any assertions or segfaults, in the second one
we expect assertions, but no segfaults.
We also check for leaks and invalid reads using valgrind, and if we find them
we print the commands that lead to that issue.

Changes in the code (other than the test):
- Replace a few NPD (null pointer deference) flows and division by zero with an
  assertion, so that it doesn't fail the test. (since we set the server to use
  `exit` rather than `abort` on assertion).
- Fix quite a lot of flows in rdb.c that could have lead to memory leaks in
  RESTORE command (since it now responds with an error rather than panic)
- Add a DEBUG flag for SET-SKIP-CHECKSUM-VALIDATION so that the test don't need
  to bother with faking a valid checksum
- Remove a pile of code in serverLogObjectDebugInfo which is actually unsafe to
  run in the crash report (see comments in the code)
- fix a missing boundary check in lzf_decompress

test suite infra improvements:
- be able to run valgrind checks before the process terminates
- rotate log files when restarting servers
2020-12-06 14:54:34 +02:00
Yossi Gottlieb
ef92f507dd
Fix tests failure on busybox systems. (#7916) 2020-10-18 14:50:29 +03:00
Yossi Gottlieb
2df4cb93ac
Tests: fix unmonitored servers. (#7756)
There is an inherent race condition in port allocation for spawned
servers. If a server fails to start because a port is taken, a new port
is allocated. This fixes a problem where the logs are not truncated and
as a result a large number of unmonitored servers are started.
2020-09-07 17:30:36 +03:00
Oran Agra
2b998de460
Improve valgrind support for cluster tests (#7725)
- redirect valgrind reports to a dedicated file rather than console
- try to avoid killing instances with SIGKILL so that we get the memory
  leak report (killing with SIGTERM before resorting to SIGKILL)
- search for valgrind reports when done, print them and fail the tests
- add --dont-clean option to keep the logs on exit
- fix exit error code when crash is found (would have exited with 0)

changes that affect the normal redis test suite:
- refactor check_valgrind_errors into two functions one to search and
  one to report
- move the search half into util.tcl to serve the cluster tests too
- ignore "address range perms" valgrind warnings which seem non relevant.
2020-09-06 11:11:49 +03:00
Oran Agra
fe5da2e60d test infra - add durable mode to work around test suite crashing
in some cases a command that returns an error possibly due to a timing
issue causes the tcl code to crash and thus prevents the rest of the
tests from running. this adds an option to make the test proceed despite
the crash.
maybe it should be the default mode some day.
2020-09-06 09:59:19 +03:00
Oran Agra
b65e5aca86 test infra - flushall between tests in external mode 2020-09-06 09:59:19 +03:00
Oran Agra
677d14c213 test infra - improve test skipping ability
- skip full units
- skip a single test (not just a list of tests)
- when skipping tag, skip spinning up servers, not just the tests
- skip tags when running against an external server too
- allow using multiple tags (split them)
2020-09-06 09:59:19 +03:00
Oran Agra
e3e69c25fd test infra - reduce disk space usage
this is important when running a test with --loop
2020-09-06 09:59:19 +03:00
Oran Agra
9d527d076b test infra - write test name to logfile 2020-09-06 09:59:19 +03:00
Oran Agra
36b9494385
testsuite may leave servers alive on error (#7549)
in cases where you have
test name {
  start_server {
    start_server {
      assert
    }
  }
}

the exception will be thrown to the test proc, and the servers are
supposed to be killed on the way out. but it seems there was always a
bug of not cleaning the server stack, and recently (#7404) we started
relying on that stack in order to kill them, so with that bug sometimes
we would have tried to kill the same server twice, and leave one alive.

luckly, in most cases the pattern is:
start_server {
  test name {
  }
}
2020-07-21 16:56:19 +03:00
Oran Agra
69ade87325
tests/valgrind: don't use debug restart (#7404)
* tests/valgrind: don't use debug restart

DEBUG REATART causes two issues:
1. it uses execve which replaces the original process and valgrind doesn't
   have a chance to check for errors, so leaks go unreported.
2. valgrind report invalid calls to close() which we're unable to resolve.

So now the tests use restart_server mechanism in the tests, that terminates
the old server and starts a new one, new PID, but same stdout, stderr.

since the stderr can contain two or more valgrind report, it is not enough
to just check for the absence of leaks, we also need to check for some known
errors, we do both, and fail if we either find an error, or can't find a
report saying there are no leaks.

other changes:
- when killing a server that was already terminated we check for leaks too.
- adding DEBUG LEAK which was used to test it.
- adding --trace-children to valgrind, although no longer needed.
- since the stdout contains two or more runs, we need slightly different way
  of checking if the new process is up (explicitly looking for the new PID)
- move the code that handles --wait-server to happen earlier (before
  watching the startup message in the log), and serve the restarted server too.

* squashme - CR fixes
2020-07-10 08:26:52 +03:00
Oran Agra
e258a1c087 tests: each test client work on a distinct port range
apparently when running tests in parallel (the default of --clients 16),
there's a chance for two tests to use the same port.
specifically, one test might shutdown a master and still have the
replica up, and then another test will re-use the port number of master
for another master, and then that replica will connect to the master of
the other test.

this can cause a master to count too many full syncs and fail a test if
we run the tests with --single integration/psync2 --loop --stop

see Probmem 2 in #7314
2020-05-26 11:17:08 +03:00
Oran Agra
cf3789f045 diffrent fix for runtest --host --port 2020-04-06 09:41:14 +03:00
bodong.ybd
336458d4b5 Fix bug of tcl test using external server 2020-03-11 21:01:27 +08:00
antirez
73305861f5 Test engine: experimental change to avoid busy port problems. 2020-02-24 10:46:23 +01:00
antirez
e78c4e813c Test engine: detect timeout when checking for Redis startup. 2020-02-21 18:55:56 +01:00
antirez
c6954de3ea Test engine: better tracking of what workers are doing. 2020-02-21 17:08:45 +01:00
Yossi Gottlieb
b087dd1db6 TLS: Connections refactoring and TLS support.
* Introduce a connection abstraction layer for all socket operations and
integrate it across the code base.
* Provide an optional TLS connections implementation based on OpenSSL.
* Pull a newer version of hiredis with TLS support.
* Tests, redis-cli updates for TLS support.
2019-10-07 21:06:13 +03:00
maya-rv
227965221a
Fix typo 2018-09-04 13:32:02 +03:00
Oran Agra
751eea24c4 test suite infra improvements and fix
* fail the test (exit code) in case of timeout.
* add --wait-server to allow attaching a debugger
* add --dont-clean to keep log files when tests are done
2018-06-26 20:23:55 +03:00
antirez
8444b46d20 Fix test "server is up" detection after logging changes. 2016-12-19 16:49:58 +01:00
antirez
36be34bb87 Test: support for stack logging for OSX malloc/leaks. 2015-10-01 13:02:25 +02:00
antirez
386804246f Test: be more patient waiting for servers to exit.
This should likely fix a false positive when running with the --valgrind
option.
2015-03-31 23:43:38 +02:00
Matt Stancliff
491d57abaa Add --track-origins=yes to valgrind 2015-01-21 15:48:19 +01:00
antirez
fe0d371995 Test: wait for actual startup in start_server.
start_server now uses return value from Tcl exec to get the server pid,
however this introduces errors that depend from timing: a lot of the
testing code base assumed the server to be actually up and running when
server_start returns.

So the old code that waits to see the pid in the log file was restored.
2014-11-28 11:49:26 +01:00
antirez
bd3a51615c Test: try to cleanup still running Redis instances on exit.
It's hard to run the Redis test continuously if it leaks processes on
exceptions / errors.
2014-11-28 11:38:17 +01:00
Matt Stancliff
1cedebb799 Remove trailing spaces from tests 2014-09-29 06:49:08 -04:00
Matt Stancliff
6c16ecaaaa Fix test framework to detect proper server PID
Previously the PID format was:
[PID] Timestamp

But it recently changed to:
PID:X Timestamp

The tcl testing framework was grabbing the PID from \[\d+\], but
that's not valid anymore.

Now we grab the pid from "PID: <PID>" in the part of Redis startup
output to the right of the ASCII logo.
2014-05-23 13:54:29 -04:00
antirez
088b9eadc4 Test: handle new osx 'leaks' error.
Sometimes the process is still there but no longer in a state that can
be checked (after being killed). This used to happen after a call to
SHUTDOWN NOSAVE in the scripting unit, causing a false positive.
2014-05-07 16:12:32 +02:00
antirez
9e0b9f12b2 Test: do not complain when "leaks" can't run because process died. 2014-03-25 09:33:37 +01:00
antirez
a1dca2efab Test: code to test server availability refactored.
Some inline test moved into server_is_up procedure.
Also find_available_port was moved into util since it is going
to be used for the Sentinel test as well.
2014-02-17 16:44:57 +01:00
antirez
dc24a6b132 Return a specific NOAUTH error if authentication is required. 2013-02-12 16:25:41 +01:00
antirez
a18ca73681 Test: fixed osx "leaks" support in test.
Due to changes in recent releases of osx leaks utility, the osx leak
detection no longer worked. Now it is fixed in a way that should be
backward compatible.
2012-12-03 12:06:38 +01:00
YAMAMOTO Takashi
164d57c60d fix a typo in a comment 2012-10-24 17:47:56 +09:00
Michael Schlenker
875944a23f Replace unnecessary calls to echo and cat
Tcl's exec can send data to stdout itself, no need to call cat/echo for
that usually.
2012-04-17 22:20:54 +02:00
Premysl Hruby
9184f8fd00 in kill_server send the signal once, then wait for up to 5sec before sending lethal SIGKILL 2012-04-03 14:20:52 +02:00
antirez
0fefb5bbeb Redis test: regexp to check if valgrind reported errors modified. Now we no longer look at the total count because this includes "possibly lost" bytes that are not interesting for Redis (tons of false positives because of how sds.c works). 2012-03-28 10:55:17 +02:00
antirez
74f10793c8 When running the test in valgrind mode, pass the right flags to show memory leaks stack traces but only including the "definitely lost" items. 2012-03-24 12:07:14 +01:00
antirez
b1d08d4540 Redis test: wait more time for the server to start if it is running using valgrind. 2011-12-07 17:51:21 +01:00
antirez
846bcd9abe Redis test: handle inability to start the server in a better way. 2011-12-07 11:47:38 +01:00
antirez
24bfb570ee Redis test ports selection made more robust. This prevents the test from hanging if an already bound port is selected but the TCP server listening to it does not cause a protocol error with a Redis client PING. Also base port moved away from the range near to the Redis Cluster gossip ports. 2011-10-04 10:05:21 +02:00
antirez
4c378d7f6c new test engine valgrind support 2011-07-11 13:41:06 +02:00
antirez
5ab1461f98 The test now gives some more time for Redis to start before of exiting with an error, since starting with valgrind can take a significant amount of time. 2011-07-09 19:23:46 +02:00
antirez
cabe03eb75 more valgrind friendly test 2011-07-06 15:22:00 +02:00
antirez
72dff2c084 test fixed after ascii art banner modified the output of a running server 2011-04-15 16:35:54 +02:00
antirez
5e1d2d30f7 initial fix of the test suite to run both in in-memory and diskstore mode 2011-01-09 16:49:52 +01:00