Commit Graph

26 Commits

Author SHA1 Message Date
Yossi Gottlieb
8a86bca5ed
Improve test suite to handle external servers better. (#9033)
This commit revives the improves the ability to run the test suite against
external servers, instead of launching and managing `redis-server` processes as
part of the test fixture.

This capability existed in the past, using the `--host` and `--port` options.
However, it was quite limited and mostly useful when running a specific tests.
Attempting to run larger chunks of the test suite experienced many issues:

* Many tests depend on being able to start and control `redis-server` themselves,
and there's no clear distinction between external server compatible and other
tests.
* Cluster mode is not supported (resulting with `CROSSSLOT` errors).

This PR cleans up many things and makes it possible to run the entire test suite
against an external server. It also provides more fine grained controls to
handle cases where the external server supports a subset of the Redis commands,
limited number of databases, cluster mode, etc.

The tests directory now contains a `README.md` file that describes how this
works.

This commit also includes additional cleanups and fixes:

* Tests can now be tagged.
* Tag-based selection is now unified across `start_server`, `tags` and `test`.
* More information is provided about skipped or ignored tests.
* Repeated patterns in tests have been extracted to common procedures, both at a
  global level and on a per-test file basis.
* Cleaned up some cases where test setup was based on a previous test executing
  (a major anti-pattern that repeats itself in many places).
* Cleaned up some cases where test teardown was not part of a test (in the
  future we should have dedicated teardown code that executes even when tests
  fail).
* Fixed some tests that were flaky running on external servers.
2021-06-09 15:13:24 +03:00
Oran Agra
5843a45d01
Skip defrag tests on systems with bigger page sizes (#8294)
The defragger works well on these systems, but the tests and their
thresholds are not adjusted for these big pages, so the defragger isn't
able to get down the fragmentation to the levels the test expects and it
fails on "defrag didn't stop".

Randomly choosing 8k as the threshold for the skipping

Fixes #8265 (which had 65k pages)
2021-01-08 10:03:21 +02:00
Oran Agra
7d9b09adaa
Tests: fix new defrag test to be skipped when not supported (#8185)
Additionally the older defrag tests are using an obsolete way to check
if the defragger is suuported (the error no longer contains "DISABLED").
this doesn't usually makes a difference since these tests are completely
skipped if the allocator is not jemalloc, but that would fail if the
allocator is a jemalloc that doesn't support defrag.
2020-12-14 11:13:46 +02:00
Yossi Gottlieb
2faa0f19eb Fix test failure on slower systems.
Not disabling save, slower systems begun background save that did not
complete in time, resulting with SAVE failing with "ERR Background save
already in progress".
2020-11-04 21:43:55 +02:00
Yossi Gottlieb
843a13e88f
Add a --no-latency tests flag. (#7939)
Useful for running tests on systems which may be way slower than usual.
2020-10-22 11:10:53 +03:00
Oran Agra
9ef8d2f671
Run active defrag while blocked / loading (#7726)
During long running scripts or loading RDB/AOF, we may need to do some
defragging. Since processEventsWhileBlocked is called periodically at
unknown intervals, and many cron jobs either depend on run_with_period
(including active defrag), or rely on being called at server.hz rate
(i.e. active defrag knows ho much time to run by looking at server.hz),
the whileBlockedCron may have to run a loop triggering the cron jobs in it
(currently only active defrag) several times.

Other changes:
- Adding a test for defrag during aof loading.
- Changing key-load-delay config to take negative values for fractions
  of a microsecond sleep
2020-09-03 08:47:29 +03:00
Oran Agra
88d71f4793 fix a rare active defrag edge case bug leading to stagnation
There's a rare case which leads to stagnation in the defragger, causing
it to keep scanning the keyspace and do nothing (not moving any
allocation), this happens when all the allocator slabs of a certain bin
have the same % utilization, but the slab from which new allocations are
made have a lower utilization.

this commit fixes it by removing the current slab from the overall
average utilization of the bin, and also eliminate any precision loss in
the utilization calculation and move the decision about the defrag to
reside inside jemalloc.

and also add a test that consistently reproduce this issue.
2020-05-20 16:04:42 +03:00
Oran Agra
b9fa42a197 testsuite run the defrag latency test solo
this test is time sensitive and it sometimes fail to pass below the
latency threshold, even on strong machines.

this test was the reson we're running just 2 parallel tests in the
github actions CI, revering this.
2020-04-16 18:09:22 +03:00
Oran Agra
2f1a1c3835 fix github actions failing latency test for active defrag - part 2
it seems that running two clients at a time is ok too, resuces action
time from 20 minutes to 10. we'll use this for now, and if one day it
won't be enough we'll have to run just the sensitive tests one by one
separately from the others.

this commit also fixes an issue with the defrag test that appears to be
very rare.
2020-02-27 08:34:53 +02:00
Oran Agra
537893420b fix github actions failing latency test for active defrag
seems that github actions are slow, using just one client to reduce
false positives.

also adding verbose, testing only on latest ubuntu, and building on
older one.

when doing that, i can reduce the test threshold back to something saner
2020-02-25 17:53:23 +02:00
Oran Agra
62adabd0e0 Fix latency sensitivity of new defrag test
I saw that the new defag test for list was failing in CI recently, so i
reduce it's threshold from 12 to 60.

besides that, i add / improve the latency test for that other two defrag
tests (add a sensitive latency and digest / save checks)

and fix bad usage of debug populate (can't overrides existing keys).
this was the original intention, which creates higher fragmentation.
2020-02-23 13:05:52 +02:00
Oran Agra
485425cec7 Defrag big lists in portions to avoid latency and freeze
When active defrag kicks in and finds a big list, it will create a bookmark to
a node so that it is able to resume iteration from that node later.

The quicklist manages that bookmark, and updates it in case that node is deleted.

This will increase memory usage only on lists of over 1000 (see
active-defrag-max-scan-fields) quicklist nodes (1000 ziplists, not 1000 items)
by 16 bytes.

In 32 bit build, this change reduces the maximum effective config of
list-compress-depth and list-max-ziplist-size (from 32767 to 8191)
2020-02-18 17:22:32 +02:00
Oran Agra
d0850369c4 fix small test suite race conditions 2018-11-12 10:26:10 +02:00
Oran Agra
c8452ab005 Fix unstable tests on slow machines.
Few tests had borderline thresholds that were adjusted.

The slave buffers test had two issues, preventing the slave buffer from growing:
1) the slave didn't necessarily go to sleep on time, or woke up too early,
   now using SIGSTOP to make sure it goes to sleep exactly when we want.
2) the master disconnected the slave on timeout
2018-08-21 11:46:07 +03:00
Oran Agra
f89c93c8ad make active defrag test more stable
on slower machines, the active defrag test tended to fail.
although the fragmentation ratio was below the treshold, the defragger was
still in the middle of a scan cycle.

this commit changes:
- the defragger uses the current fragmentation state, rather than the cache one
  that is updated by server cron every 100ms. this actually fixes a bug of
  starting one excess scan cycle
- the test lets the defragger use more CPU cycles, in hope that the defrag
  will be faster, but also give it more time before we give up.
2018-07-18 10:16:33 +03:00
Oran Agra
de495ee7ab minor fix in creating a stream NACK for rdb and defrag tests 2018-06-27 15:34:17 +03:00
Oran Agra
5616d4c603 add active defrag support for streams 2018-06-27 15:00:41 +03:00
antirez
98d5d3f118 Make active defragmentation tests optional.
They failed when active defrag could not be activated because the
Jemalloc version does not include the additional APIs.
2018-05-24 18:04:21 +02:00
Oran Agra
ad133e1023 Active defrag fixes for 32bit builds
problems fixed:
* failing to read fragmentation information from jemalloc
* overflow in jemalloc fragmentation hint to the defragger
* test suite not triggering eviction after population
2018-05-17 09:52:00 +03:00
Oran Agra
806736cdf9 Adding real allocator fragmentation to INFO and MEMORY command + active defrag test
other fixes / improvements:
- LUA script memory isn't taken from zmalloc (taken from libc malloc)
  so it can cause high fragmentation ratio to be displayed (which is false)
- there was a problem with "fragmentation" info being calculated from
  RSS and used_memory sampled at different times (now sampling them together)

other details:
- adding a few more allocator info fields to INFO and MEMORY commands
- improve defrag test to measure defrag latency of big keys
- increasing the accuracy of the defrag test (by looking at real grag info)
  this way we can use an even lower threshold and still avoid false positives
- keep the old (total) "fragmentation" field unchanged, but add new ones for spcific things
- add these the MEMORY DOCTOR command
- deduct LUA memory from the rss in case of non jemalloc allocator (one for which we don't "allocator active/used")
- reduce sampling rate of the rss and allocator info
2018-03-12 15:08:52 +02:00
antirez
c861e1e1ee Defrag: test currently disabled, too many false positives.
Related to #3786.
2017-04-22 15:59:57 +02:00
antirez
a17390853d Defrag: fix test false positive.
Apparently 1.4 is too low compared to what you get in certain setups
(including mine). I raised it to 1.55 that hopefully is still enough to
test that the fragmentation went down from 1.7 but without incurring in
issues, however the test setup may be still fragile so certain times this
may lead to false positives again, it's hard to test for these things
in a determinsitic way.

Related to #3786.
2017-04-22 13:21:41 +02:00
oranagra
0fb5c4ebd8 add test for active defrag 2017-04-22 13:17:09 +02:00
antirez
5e3dcc522b Faster memory efficiency test.
This test on Linux was extremely slow, since in Tcl we can't enable
easily tcp-nodelay, so the busy loop used to take *a lot* with bigger
writes. Fixed using pipelining.
2015-02-10 14:47:45 +01:00
antirez
fcebd9b0f9 Fix false positive in memory efficiency test.
Fixes issue #1298.
2013-11-25 10:21:46 +01:00
antirez
f79b1cb49e Test: added a memory efficiency test. 2013-08-29 16:23:57 +02:00