Instead of giving module background operations only a small window to
run inside the beforeSleep() function, we can keep the lock released for
the whole time we are blocked in the multiplexing syscall.
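A sketch of the idea, assuming the Redis modules GIL helpers
(moduleCount(), moduleReleaseGIL(), moduleAcquireGIL()) and the
aeApiPoll() call where the event loop blocks; whether this lives
directly around the poll or in the beforeSleep()/afterSleep() hooks is a
detail to verify against the sources:

    /* Module threads may run for the whole blocking wait... */
    if (moduleCount()) moduleReleaseGIL();
    numevents = aeApiPoll(eventLoop, tvp);
    /* ...and we serialize again before processing the fired events. */
    if (moduleCount()) moduleAcquireGIL();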
Experimentally verified that the test can trigger the issue once the
fix is reverted. At least on my system... Since the bug is time/backlog
dependent, it is very hard to tell whether this test will trigger the
problem consistently; however, even if it only triggers the problem once
in a while, we'll see it in the CI environment at http://ci.redis.io.
The master client cleanup was incomplete: resetClient() was missing and
the output buffer of the client was not reset, so pending commands
related to the previous connection could still be sent.
The first problem caused the client argument vector to be, at times,
half populated, so that when the correct replication stream arrived the
protocol got mixed with the stale arguments, creating invalid commands
that nobody ever called.
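A minimal sketch of the missing cleanup, assuming the usual client
fields and helpers (querybuf, reply, bufpos, resetClient()); listEmpty()
is the new list-reset helper mentioned below:

    /* Make the cached master ready to accept a new stream. */
    sdsclear(c->querybuf);  /* Unconsumed input from the old connection. */
    resetClient(c);         /* Forget the half-populated argument vector. */
    listEmpty(c->reply);    /* Pending replies queued for the old link... */
    c->bufpos = 0;          /* ...and the static output buffer. */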
Thanks to @yangsiran for also investigating this problem, after
already providing important design / implementation hints for the
original PSYNC2 issues (see referenced Github issue).
Note that this commit adds a new function to the list library of Redis
in order to be able to reset a list without destroying it.
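The new helper is presumably along these lines (a sketch modeled on how
listRelease() walks the list; it relies on the adlist.h types and the
zmalloc allocator, so verify against the actual sources):

    /* Remove all the elements from the list without destroying
     * the list itself, so that it can be reused. */
    void listEmpty(list *list)
    {
        unsigned long len = list->len;
        listNode *current = list->head, *next;

        while (len--) {
            next = current->next;
            if (list->free) list->free(current->value);
            zfree(current);
            current = next;
        }
        list->head = list->tail = NULL;
        list->len = 0;
    }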
Related to issue #3899.
Apparently 1.4 is too low compared to what you get in certain setups
(including mine). I raised it to 1.55, which is hopefully still low
enough to verify that the fragmentation went down from 1.7 without
running into issues. However, the test setup may still be fragile, so at
times this may lead to false positives again; it's hard to test for
these things in a deterministic way.
Related to #3786.
Normally we never check for OOM conditions inside Redis, since the
allocator will always return a pointer or abort the program on OOM
conditions. However we have no control over epoll_create(), which may
fail because of kernel OOM (according to the manual page) even if all
the parameters are correct, so the function aeCreateEventLoop() may
indeed return NULL and this condition must be checked.
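The check, sketched at the initServer() call site (the log text is
illustrative; CONFIG_FDSET_INCR is the extra file descriptor headroom
Redis reserves):

    server.el = aeCreateEventLoop(server.maxclients+CONFIG_FDSET_INCR);
    if (server.el == NULL) {
        /* epoll_create() failed, likely because of kernel OOM:
         * we cannot continue without an event loop. */
        serverLog(LL_WARNING,
            "Failed creating the event loop. Error message: '%s'",
            strerror(errno));
        exit(1);
    }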
During the review of the fix for #3899, @yangsiran identified an
implementation bug: given that the offset is now relative to the applied
part of the replication log, when we cache a master, the successive
PSYNC2 request will be made in order to *include* the transaction that
was not completely processed. This means that we need to discard any
pending transaction from our replication buffer: it will be re-executed.
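In code, the discard is presumably a one-liner where the master gets
cached (a sketch; discardTransaction() is the existing MULTI cleanup
helper):

    /* Drop any half-received transaction: the next PSYNC2 offset will
     * ask for the whole transaction again, so it will be re-executed. */
    if (c->flags & CLIENT_MULTI) discardTransaction(c);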
This bug was discovered by @kevinmcgehee and constituted a major hidden
bug in the PSYNC2 implementation, caused by the propagation from the
master of incomplete commands to slaves.
The bug had several results:
1. Borrowing from Kevin's text in the issue: "Given that slaves blindly
copy over their master's input into their own replication backlog over
successive read syscalls, it's possible that with large commands or
small TCP buffers, partial commands are present in this buffer. If the
master were to fail before successfully propagating the entire command
to a slave, the slaves will never execute the partial command (since the
client is invalidated) but will copy it to replication backlog which may
relay those invalid bytes to its slaves on PSYNC2, corrupting the
backlog and possibly other valid commands that follow the failover.
Simple command boundaries aren't sufficient to capture this, either,
because in the case of a MULTI/EXEC block, if the master successfully
propagates a subset of the commands but not the EXEC, then the
transaction in the backlog becomes corrupt and could corrupt other
slaves that consume this data."
2. As identified later by @yangsiran, the bug has another effect. Via
the same mechanism as the first problem, a slave that has a sub-slave
could receive a full resynchronization request while an already
half-applied command sits in its backlog. Once the RDB is ready, it is
sent to the sub-slave, and replication then continues by sending the
sub-slave the other half of the command, which is not valid.
The fix, designed by @yangsiran and @antirez, and implemented by
@antirez, uses a secondary buffer in order to feed the sub-slaves and
update the replication backlog and offsets only when a given part of
the query buffer is actually *applied* to the state of the instance,
that is, when the command gets processed and is not pending in the
Redis transaction buffer because of the CLIENT_MULTI state.
Given that the backlog and the offsets now agree with the actually
processed commands, both issues 1 and 2 should no longer be possible.
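A sketch of the mechanism, using the names introduced by the patch
(pending_querybuf, replicationFeedSlavesFromMasterStream(); treat the
exact signatures as assumptions). The raw master stream accumulates in
the secondary buffer, and only the prefix that processInputBuffer()
actually applied, measured by how much the replication offset advanced,
is propagated:

    if (c->flags & CLIENT_MASTER) {
        long long prev_offset = c->reploff;
        processInputBuffer(c);  /* May stop mid-command or inside MULTI. */
        long long applied = c->reploff - prev_offset;
        if (applied) {
            /* Feed the backlog and sub-slaves only the applied prefix. */
            replicationFeedSlavesFromMasterStream(server.slaves,
                c->pending_querybuf, applied);
            /* Keep the not-yet-applied tail for the next call. */
            sdsrange(c->pending_querybuf, applied, -1);
        }
    } else {
        processInputBuffer(c);
    }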
Thanks to @kevinmcgehee, @yangsiran and @oranagra for their work in
identifying and designing a fix for this problem.
However we allow for 500 milliseconds of tolerance, in order to
avoid frequently discarding semantically valid info (the node is up)
because of the natural desync of a few milliseconds among servers,
even when NTP is used.
Note that we should still ping the node from time to time regardless,
and discover if it's actually down from our point of view, since no
update is accepted while we have an active ping outstanding to the node.
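The check is presumably of this shape (a sketch, with pongtime being
the gossip-reported pong time already converted to milliseconds and
node being our view of the reported node):

    /* Accept the reported pong time only if it is fresher than our view
     * and not in our future by more than the 500 ms tolerance. */
    if (pongtime <= (mstime()+500) && pongtime > node->pong_received)
        node->pong_received = pongtime;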
Related to #3929.
And many other related Github issues... all reporting the same problem.
There was probably just not enough backlog in certain unlucky runs.
I'll ask the people who can reproduce it whether they now see this as
fixed as well.
Relying on the fact that nodes in PFAIL state will be shared around by
randomly adding them to the gossip section is a weak assumption,
especially after the changes that reduced the number of ping/pong
packets sent. We want to always include gossip entries for all the
nodes that are in PFAIL state, so that the PFAIL -> FAIL state
promotion can happen much faster and more reliably.
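A sketch of the change in clusterSendPing() (clusterSetGossipEntry()
and the pfail_wanted counter follow the patch's naming; treat the
details as assumptions): after the random entries, an entry is forced
for every node currently flagged PFAIL:

    di = dictGetSafeIterator(server.cluster->nodes);
    while ((de = dictNext(di)) != NULL && pfail_wanted > 0) {
        clusterNode *node = dictGetVal(de);
        /* Skip nodes that cannot carry useful failure information. */
        if (node->flags & (CLUSTER_NODE_HANDSHAKE|CLUSTER_NODE_NOADDR))
            continue;
        if (!(node->flags & CLUSTER_NODE_PFAIL)) continue;
        clusterSetGossipEntry(hdr, gossipcount, node);
        gossipcount++;
        pfail_wanted--;
    }
    dictReleaseIterator(di);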
Related to #3929.
The gossip section times are 32 bit, so they cannot store the
millisecond time, just its approximation in seconds, which is good
enough for our uses. However, when comparing the gossip section times
of other nodes with our node's view, we need to convert them back to
milliseconds.
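Both directions, sketched (the field names of the cluster gossip
structure here are assumptions):

    /* Sending side: store the millisecond times as 32-bit seconds. */
    gossip->ping_sent = htonl(node->ping_sent/1000);
    gossip->pong_received = htonl(node->pong_received/1000);

    /* Receiving side: expand back to milliseconds before comparing
     * with our mstime()-based view. */
    mstime_t pongtime = ntohl(g->pong_received);
    pongtime *= 1000;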
Related to #3929. Without this change the patch that reduces the
traffic of the bus messages does not work.
Clusters of bigger sizes tend to have a lot of traffic on the cluster
bus just for failure detection: a node will try to get a ping reply
from another node no later than half the node timeout after the last
pong was received, in order to avoid false positives.
However this means that if we have N nodes and the node timeout is set
to, for instance, M seconds, each node has to ping about N other nodes
every M/2 seconds, that is, 2*N/M pings per second. Each ping receives
a pong, so a single node accounts for about 4*N/M packets per second,
and with N nodes all doing the same, the cluster as a whole exchanges
in the order of 4*N*N/M packets per second: the traffic grows with the
square of the cluster size.
In a 100 nodes cluster with a timeout of 60 seconds, this translates
to roughly 4*100*100/60, about 667 packets per second, summing all the
packets exchanged by all the nodes, and the quadratic term makes it
explode for larger clusters or smaller timeouts.
This is, as you can guess, a lot... So this patch changes the
implementation in a very simple way in order to trust the reports of
other nodes: if a node A reports a node B as alive at least up to
a given time, we update our view accordingly.
The problem with this approach is that it could result in a subset of
nodes being able to reach a given node X, preventing the others from
detecting that X is actually not reachable from the majority of nodes.
So the above algorithm is refined by trusting other nodes only if we
do not currently have a ping pending for node X, and if there are no
failure reports for that node.
Since each node pings 10 other nodes every second anyway (one node
every 100 milliseconds), even when trusting the reports of other nodes
we will eventually detect whether a given node is down from our POV.
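Putting the refinement together, the update rule is presumably of this
shape (a sketch; clusterNodeFailureReportsCount() is the existing
failure-report counter, the other names as in the sketches above):

    /* Trust a third-party liveness report only when it cannot mask a
     * problem we are actively probing or that others already reported. */
    if (node->ping_sent == 0 &&
        clusterNodeFailureReportsCount(node) == 0 &&
        pongtime > node->pong_received)
    {
        node->pong_received = pongtime;
    }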
Now, to understand the number of packets that the cluster would
exchange for failure detection with the patch, we can start by
considering the random PINGs that the cluster sends anyway as a
baseline:
Each node sends 10 pings per second, so if no additional packets were
sent at all, the total traffic, counting the PONG replies as well,
would be:
Total messages per second = N*10*2
For N=100 that is 2000 messages per second, independent of the node
timeout.
However, trusting the gossip sections of other nodes does not always
prevent pinging nodes under the "half timeout reached" rule. The math
involved in computing the actual rate as N and M change is quite
complex, and also depends on another parameter: the number of entries
in the gossip section of the PING and PONG packets. However it is
possible to compare what happens in clusters of different sizes
experimentally. After applying this patch, a very significant reduction
in the number of packets exchanged is trivial to observe, with no
apparent impact on failure detection performance.
Actual numbers with different cluster sizes should be published in the
Redis Cluster documentation in the future.
Related to #3929.