redict

mirror of https://codeberg.org/redict/redict.git synced 2025-01-23 00:28:26 -05:00

Author	SHA1	Message	Date
yoav-steinberg	2eb9b19612	Fix Eval scripts defrag (broken in 7.0 RC1) (#10271) Remove the scripts defragger, since it has been broken since #10126 (released in 7.0 RC1). It would crash the server if the defragger started in a server that contains eval scripts. In #10126 the global `lua_script` dict became a dict to a custom `luaScript` struct with an internal `robj` in it instead of a generic `sds` -> `robj` dict. This means we need custom code to defrag it, and since scripts should never really cause much fragmentation it makes more sense to simply remove the defrag code for scripts.	2022-02-11 21:58:05 +02:00
Oran Agra	6add1b7217	Add external test that runs without debug command (#9964) - add needs:debug flag for some tests - disable "save" in external tests (speedup?) - use debug_digest proc instead of debug command directly so it can be skipped - use OBJECT ENCODING instead of DEBUG OBJECT to get encoding - add a proc for OBJECT REFCOUNT so it can be skipped - move a bunch of tests in latency_monitor tests to happen later so that latency monitor has some values in it - add missing close_replication_stream calls - make sure to close the temp client if DEBUG LOG fails	2021-12-19 17:41:51 +02:00
Oran Agra	d4e7ffb38c	Improve active defrag in jemalloc 5.2 (#9778) Background: Following the upgrade to jemalloc 5.2, there was a test that used to be flaky and started failing consistently (on 32bit), so we disabled it (see #9645). This is a test that i introduced in #7289 when i attempted to solve a rare stagnation problem, and it later turned out i failed to solve it, and what's more i added a test that caused it to be not so rare, and as i mentioned, now in jemalloc 5.2 it became consistent on 32bit. Stagnation can happen when all the slabs of the bin are equally utilized, so the decision to move an allocation from a relatively empty slab to a relatively full one will never happen, and in that test all the slabs are at 50% utilization, so the defragger could just keep scanning the keyspace and not move anything. What this PR changes: * First, finally in jemalloc 5.2 we have the count of non-full slabs, so when we compare the utilization of the current slab, we can compare it to the average utilization of the non-full slabs in our bin, instead of the total average of our bin. this takes the full slabs out of the game, since they're not candidates for migration (neither source nor target). * Secondly, we add some 12.5% (100/8) to the decision to defrag an allocation, this is the part that aims to avoid stagnation, and it's especially important since the above mentioned change can get us closer to stagnation. * Thirdly, since jemalloc 5.2 adds sharded bins, we take into account all shards (something that's missing from the original PR that merged it), this isn't expected to make any difference since anyway there should be just one shard. How this was benchmarked: what i did was run the memefficiency test unit with `--verbose` and compare the defragger hits and misses the tests reported. 
At first, when i took into consideration only the non-full slabs, it got a lot worse (i got into stagnation, or just got a lot of misses and a lot of hits), but when i added the extra margin i got back to results that were slightly better than the ones of the jemalloc 5.1 branch. i.e. full defragmentation was achieved with fewer hits (relocations), and fewer misses (keyspace scans).	2021-11-21 13:35:39 +02:00
menwen	d5ca72e38b	fix defrag test looking at the wrong latency metric (#9723) the latency event was renamed in #7726, and the outcome was that the test was ineffective (unable to measure the max latency, always seeing 0)	2021-11-02 15:52:56 +02:00
yoav-steinberg	81095b1bd9	Skip Active-defrag edge case test until we fix it. (#9645) Test started failing consistently in 32bit builds after upgrading to jemalloc 5.2.1 (#9623).	2021-10-18 13:28:52 +03:00
Oran Agra	1e7ad894d2	Tune timeout of active defrag test (#9426) Failed on Raspberry Pi 3b where that single test took about 170 seconds	2021-08-30 12:39:09 +03:00
Binbin	0bfccc55e2	Fix some typos, add a spell check CI and other minor fixes (#8890) This PR adds a spell checker CI action that will fail future PRs if they introduce typos and spelling mistakes. This spell checker is based on a blacklist of common spelling mistakes, so it will not catch everything, but at least it is also unlikely to cause false positives. Besides that, the PR also fixes many spelling mistakes and typos, not all of which are a result of the spell checker we use. Here's a summary of other changes: 1. Scanned the entire source code and fixed all sorts of typos and spelling mistakes (including missing or extra spaces). 2. Outdated function / variable / argument names in comments 3. Fix outdated keyspace masks error log when we check `config.notify-keyspace-events` in loadServerConfigFromString. 4. Trim the white space at the end of line in `module.c`. Check: https://github.com/redis/redis/pull/7751 5. Some outdated https link URLs. 6. Fix some outdated comments. Such as: - In README: about the rdb, we used to say create a `thread`, change to `process` - dbRandomKey function comment (about the dictGetRandomKey, change to dictGetFairRandomKey) - notifyKeyspaceEvent function comment (add type arg) - Some other minor fixes in comments (most of them are incorrectly quoted variable names) 7. Modified the error log so that users can easily distinguish between TCP and TLS in `changeBindAddr`	2021-06-10 15:39:33 +03:00
Yossi Gottlieb	8a86bca5ed	Improve test suite to handle external servers better. (#9033) This commit revives and improves the ability to run the test suite against external servers, instead of launching and managing `redis-server` processes as part of the test fixture. This capability existed in the past, using the `--host` and `--port` options. However, it was quite limited and mostly useful when running specific tests. Attempting to run larger chunks of the test suite ran into many issues: * Many tests depend on being able to start and control `redis-server` themselves, and there's no clear distinction between external server compatible and other tests. * Cluster mode is not supported (resulting in `CROSSSLOT` errors). This PR cleans up many things and makes it possible to run the entire test suite against an external server. It also provides more fine grained controls to handle cases where the external server supports a subset of the Redis commands, limited number of databases, cluster mode, etc. The tests directory now contains a `README.md` file that describes how this works. This commit also includes additional cleanups and fixes: * Tests can now be tagged. * Tag-based selection is now unified across `start_server`, `tags` and `test`. * More information is provided about skipped or ignored tests. * Repeated patterns in tests have been extracted to common procedures, both at a global level and on a per-test file basis. * Cleaned up some cases where test setup was based on a previous test executing (a major anti-pattern that repeats itself in many places). * Cleaned up some cases where test teardown was not part of a test (in the future we should have dedicated teardown code that executes even when tests fail). * Fixed some tests that were flaky running on external servers.	2021-06-09 15:13:24 +03:00
Oran Agra	5843a45d01	Skip defrag tests on systems with bigger page sizes (#8294) The defragger works well on these systems, but the tests and their thresholds are not adjusted for these big pages, so the defragger isn't able to get the fragmentation down to the levels the test expects and it fails on "defrag didn't stop". Randomly choosing 8k as the threshold for the skipping. Fixes #8265 (which had 65k pages)	2021-01-08 10:03:21 +02:00
Oran Agra	7d9b09adaa	Tests: fix new defrag test to be skipped when not supported (#8185) Additionally the older defrag tests are using an obsolete way to check if the defragger is supported (the error no longer contains "DISABLED"). this doesn't usually make a difference since these tests are completely skipped if the allocator is not jemalloc, but that would fail if the allocator is a jemalloc that doesn't support defrag.	2020-12-14 11:13:46 +02:00
Yossi Gottlieb	2faa0f19eb	Fix test failure on slower systems. With save not disabled, slower systems began a background save that did not complete in time, resulting in SAVE failing with "ERR Background save already in progress".	2020-11-04 21:43:55 +02:00
Yossi Gottlieb	843a13e88f	Add a --no-latency tests flag. (#7939) Useful for running tests on systems which may be way slower than usual.	2020-10-22 11:10:53 +03:00
Oran Agra	9ef8d2f671	Run active defrag while blocked / loading (#7726) During long running scripts or loading RDB/AOF, we may need to do some defragging. Since processEventsWhileBlocked is called periodically at unknown intervals, and many cron jobs either depend on run_with_period (including active defrag), or rely on being called at server.hz rate (i.e. active defrag knows how much time to run by looking at server.hz), the whileBlockedCron may have to run a loop triggering the cron jobs in it (currently only active defrag) several times. Other changes: - Adding a test for defrag during aof loading. - Changing key-load-delay config to take negative values for fractions of a microsecond sleep	2020-09-03 08:47:29 +03:00
Oran Agra	88d71f4793	fix a rare active defrag edge case bug leading to stagnation There's a rare case which leads to stagnation in the defragger, causing it to keep scanning the keyspace and do nothing (not moving any allocation), this happens when all the allocator slabs of a certain bin have the same % utilization, but the slab from which new allocations are made has a lower utilization. this commit fixes it by removing the current slab from the overall average utilization of the bin, and also eliminates any precision loss in the utilization calculation and moves the decision about the defrag to reside inside jemalloc. it also adds a test that consistently reproduces this issue.	2020-05-20 16:04:42 +03:00
Oran Agra	b9fa42a197	testsuite run the defrag latency test solo this test is time sensitive and it sometimes fails to pass below the latency threshold, even on strong machines. this test was the reason we're running just 2 parallel tests in the github actions CI, reverting this.	2020-04-16 18:09:22 +03:00
Oran Agra	2f1a1c3835	fix github actions failing latency test for active defrag - part 2 it seems that running two clients at a time is ok too, reduces action time from 20 minutes to 10. we'll use this for now, and if one day it won't be enough we'll have to run just the sensitive tests one by one separately from the others. this commit also fixes an issue with the defrag test that appears to be very rare.	2020-02-27 08:34:53 +02:00
Oran Agra	537893420b	fix github actions failing latency test for active defrag seems that github actions are slow, using just one client to reduce false positives. also adding verbose, testing only on latest ubuntu, and building on older one. when doing that, i can reduce the test threshold back to something saner	2020-02-25 17:53:23 +02:00
Oran Agra	62adabd0e0	Fix latency sensitivity of new defrag test I saw that the new defrag test for list was failing in CI recently, so i changed its threshold from 12 to 60. besides that, i add / improve the latency test for the other two defrag tests (add a sensitive latency and digest / save checks) and fix bad usage of debug populate (can't override existing keys). this was the original intention, which creates higher fragmentation.	2020-02-23 13:05:52 +02:00
Oran Agra	485425cec7	Defrag big lists in portions to avoid latency and freeze When active defrag kicks in and finds a big list, it will create a bookmark to a node so that it is able to resume iteration from that node later. The quicklist manages that bookmark, and updates it in case that node is deleted. This will increase memory usage only on lists of over 1000 (see active-defrag-max-scan-fields) quicklist nodes (1000 ziplists, not 1000 items) by 16 bytes. In 32 bit build, this change reduces the maximum effective config of list-compress-depth and list-max-ziplist-size (from 32767 to 8191)	2020-02-18 17:22:32 +02:00
Oran Agra	d0850369c4	fix small test suite race conditions	2018-11-12 10:26:10 +02:00
Oran Agra	c8452ab005	Fix unstable tests on slow machines. Few tests had borderline thresholds that were adjusted. The slave buffers test had two issues, preventing the slave buffer from growing: 1) the slave didn't necessarily go to sleep on time, or woke up too early, now using SIGSTOP to make sure it goes to sleep exactly when we want. 2) the master disconnected the slave on timeout	2018-08-21 11:46:07 +03:00
Oran Agra	f89c93c8ad	make active defrag test more stable on slower machines, the active defrag test tended to fail. although the fragmentation ratio was below the threshold, the defragger was still in the middle of a scan cycle. this commit changes: - the defragger uses the current fragmentation state, rather than the cached one that is updated by server cron every 100ms. this actually fixes a bug of starting one excess scan cycle - the test lets the defragger use more CPU cycles, in hope that the defrag will be faster, but also gives it more time before we give up.	2018-07-18 10:16:33 +03:00
Oran Agra	de495ee7ab	minor fix in creating a stream NACK for rdb and defrag tests	2018-06-27 15:34:17 +03:00
Oran Agra	5616d4c603	add active defrag support for streams	2018-06-27 15:00:41 +03:00
antirez	98d5d3f118	Make active defragmentation tests optional. They failed when active defrag could not be activated because the Jemalloc version does not include the additional APIs.	2018-05-24 18:04:21 +02:00
Oran Agra	ad133e1023	Active defrag fixes for 32bit builds problems fixed: * failing to read fragmentation information from jemalloc * overflow in jemalloc fragmentation hint to the defragger * test suite not triggering eviction after population	2018-05-17 09:52:00 +03:00
Oran Agra	806736cdf9	Adding real allocator fragmentation to INFO and MEMORY command + active defrag test other fixes / improvements: - LUA script memory isn't taken from zmalloc (taken from libc malloc) so it can cause high fragmentation ratio to be displayed (which is false) - there was a problem with "fragmentation" info being calculated from RSS and used_memory sampled at different times (now sampling them together) other details: - adding a few more allocator info fields to INFO and MEMORY commands - improve defrag test to measure defrag latency of big keys - increasing the accuracy of the defrag test (by looking at real frag info) this way we can use an even lower threshold and still avoid false positives - keep the old (total) "fragmentation" field unchanged, but add new ones for specific things - add these to the MEMORY DOCTOR command - deduct LUA memory from the rss in case of non jemalloc allocator (one for which we don't have "allocator active/used") - reduce sampling rate of the rss and allocator info	2018-03-12 15:08:52 +02:00
antirez	c861e1e1ee	Defrag: test currently disabled, too many false positives. Related to #3786.	2017-04-22 15:59:57 +02:00
antirez	a17390853d	Defrag: fix test false positive. Apparently 1.4 is too low compared to what you get in certain setups (including mine). I raised it to 1.55 that hopefully is still enough to test that the fragmentation went down from 1.7 but without incurring in issues, however the test setup may be still fragile so certain times this may lead to false positives again, it's hard to test for these things in a deterministic way. Related to #3786.	2017-04-22 13:21:41 +02:00
oranagra	0fb5c4ebd8	add test for active defrag	2017-04-22 13:17:09 +02:00
antirez	5e3dcc522b	Faster memory efficiency test. This test on Linux was extremely slow, since in Tcl we can't enable easily tcp-nodelay, so the busy loop used to take a lot with bigger writes. Fixed using pipelining.	2015-02-10 14:47:45 +01:00
antirez	fcebd9b0f9	Fix false positive in memory efficiency test. Fixes issue #1298.	2013-11-25 10:21:46 +01:00
antirez	f79b1cb49e	Test: added a memory efficiency test.	2013-08-29 16:23:57 +02:00

33 Commits
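
The defrag decision described in commit d4e7ffb38c (compare a slab's utilization to the average of the non-full slabs in its bin, plus a margin of one eighth) can be sketched roughly as follows. This is an illustrative sketch only, not the code from the commit: the `bin_stats` struct, its field names, and the `should_defrag` function are hypothetical, and the real logic lives inside the patched jemalloc.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-bin statistics; names are illustrative, not jemalloc's. */
struct bin_stats {
    size_t regs_per_slab;  /* number of regions one slab can hold */
    size_t nonfull_slabs;  /* count of slabs that still have free regions */
    size_t nonfull_used;   /* used regions summed over the non-full slabs */
};

/* Decide whether an allocation living in a slab with `free_in_slab` free
 * regions is worth relocating. Full slabs are never a source, and an
 * empty set of non-full slabs leaves nothing to compare against. */
bool should_defrag(const struct bin_stats *bin, size_t free_in_slab) {
    if (free_in_slab == 0 || bin->nonfull_slabs == 0) {
        return false;
    }
    size_t used_in_slab = bin->regs_per_slab - free_in_slab;
    /* Cross-multiply instead of dividing, to avoid precision loss:
     * used_in_slab <= average_used * (1 + 1/8). The extra eighth (12.5%)
     * lets slabs at exactly the average still qualify, which is what
     * breaks the tie when every slab is equally utilized. */
    return used_in_slab * bin->nonfull_slabs
           <= bin->nonfull_used + bin->nonfull_used / 8;
}
```

With four non-full slabs that are all 50% utilized (the stagnation scenario from the commit message), a plain "below average" comparison would never select anything, whereas this sketch returns true for each of them because of the added margin.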