42 Commits

Author SHA1 Message Date
antirez
5317f5e99a Added dictGetRandomKeys() to dict.c: mass get random entries.
This new function is useful to get a number of random entries from an
hash table when we just need to do some sampling without particularly
good distribution.

It just jumps at a random place of the hash table and returns the first
N items encountered by scanning linearly.

The main usefulness of this function is to speedup Redis internal
sampling of the key space, for example for key eviction or expiry.
2014-03-20 15:50:46 +01:00
zhanghailei
c0f8665414 FIXED a typo more thank should be more than 2014-03-04 11:21:34 +08:00
zhanghailei
4b9ac6edd0 According to context,the size should be 16 rather than 64 2014-03-04 11:21:34 +08:00
antirez
2eb781b35b dict.c: added optional callback to dictEmpty().
Redis hash table implementation has many non-blocking features like
incremental rehashing, however while deleting a large hash table there
was no way to have a callback called to do some incremental work.

This commit adds this support, as an optiona callback argument to
dictEmpty() that is currently called at a fixed interval (one time every
65k deletions).
2013-12-10 18:46:24 +01:00
antirez
11e81a1e9a Fixed grammar: before H the article is a, not an. 2013-12-05 16:35:32 +01:00
antirez
20fb91fd31 removed not used vars in dictScan(). 2013-11-05 11:56:11 +01:00
antirez
dfeaa84d46 dictScan(): empty hash table requires special handling. 2013-10-28 11:17:18 +01:00
antirez
7bd45659b9 Fixed typos in dictScan() comment. 2013-10-25 17:05:55 +02:00
antirez
34c207227c dictScan() algorithm documented. 2013-10-25 17:01:30 +02:00
Pieter Noordhuis
b63fbea5e4 Fix error in scan algorithm
The irrelevant bits shouldn't be masked to 1. This can result in slots being
skipped when the hash table is resized between calls to the iterator.
2013-10-25 10:50:03 +02:00
Pieter Noordhuis
7f490b197f Add SCAN command 2013-10-25 10:49:48 +02:00
antirez
1c08302edc dictFingerprint(): cast pointers to integer of same size. 2013-08-20 11:49:55 +02:00
antirez
00ddc3500c Revert "Fixed type in dict.c comment: 265 -> 256."
This reverts commit 6253180abd9fd11a385c644fe1dee932ef83d86f.
2013-08-19 17:25:48 +02:00
antirez
6253180abd Fixed type in dict.c comment: 265 -> 256. 2013-08-19 15:10:37 +02:00
antirez
1c75408457 assert.h replaced with redisassert.h when appropriate.
Also a warning was suppressed by including unistd.h in redisassert.h
(needed for _exit()).
2013-08-19 15:01:21 +02:00
antirez
905d4822da dictFingerprint() fingerprinting made more robust.
The previous hashing used the trivial algorithm of xoring the integers
together. This is not optimal as it is very likely that different
hash table setups will hash the same, for instance an hash table at the
start of the rehashing process, and at the end, will have the same
fingerprint.

Now we hash N integers in a smarter way, by summing every integer to the
previous hash, and taking the integer hashing again (see the code for
further details). This way it is a lot less likely that we get a
collision. Moreover this way of hashing explicitly protects from the
same set of integers in a different order to hash to the same number.

This commit is related to issue #1240.
2013-08-19 15:01:12 +02:00
antirez
48cde3fe47 dict.c iterator API misuse protection.
dict.c allows the user to create unsafe iterators, that are iterators
that will not touch the dictionary data structure in any way, preventing
copy on write, but at the same time are limited in their usage.

The limitation is that when itearting with an unsafe iterator, no call
to other dictionary functions must be done inside the iteration loop,
otherwise the dictionary may be incrementally rehashed resulting into
missing elements in the set of the elements returned by the iterator.

However after introducing this kind of iterators a number of bugs were
found due to misuses of the API, and we are still finding
bugs about this issue. The bugs are not trivial to track because the
effect is just missing elements during the iteartion.

This commit introduces auto-detection of the API misuse. The idea is
that an unsafe iterator has a contract: from initialization to the
release of the iterator the dictionary should not change.

So we take a fingerprint of the dictionary state, xoring a few important
dict properties when the unsafe iteartor is initialized. We later check
when the iterator is released if the fingerprint is still the same. If it
is not, we found a misuse of the iterator, as not allowed API calls
changed the internal state of the dictionary.

This code was checked against a real bug, issue #1240.

This is what Redis prints (aborting) when a misuse is detected:

Assertion failed: (iter->fingerprint == dictFingerprint(iter->d)),
function dictReleaseIterator, file dict.c, line 587.
2013-08-19 15:00:57 +02:00
guiquanz
9d09ce3981 Fixed many typos. 2013-01-19 10:59:44 +01:00
charsyam
dee0b939fc Remove unnecessary condition in _dictExpandIfNeeded (dict.c) 2012-11-28 11:44:39 +01:00
antirez
4365e5b2d3 BSD license added to every C source and header file. 2012-11-08 18:31:32 +01:00
antirez
da920e75d4 Hash function switched to murmurhash2.
The previously used hash function, djbhash, is not secure against
collision attacks even when the seed is randomized as there are simple
ways to find seed-independent collisions.

The new hash function appears to be safe (or much harder to exploit at
least) in this case, and has better distribution.

Better distribution does not always means that's better. For instance in
a fast benchmark with "DEBUG POPULATE 1000000" I obtained the following
results:

    1.6 seconds with djbhash
    2.0 seconds with murmurhash2

This is due to the fact that djbhash will hash objects that follow the
pattern `prefix:<id>` and where the id is numerically near, to near
buckets. This improves the locality.

However in other access patterns with keys that have no relation
murmurhash2 has some (apparently minimal) speed advantage.

On the other hand a better distribution should significantly
improve the quality of the distribution of elements returned with
dictGetRandomKey() that is used in SPOP, SRANDMEMBER, RANDOMKEY, and
other commands.

Everything considered, and under the suspect that this commit fixes a
security issue in Redis, we are switching to the new hash function.
If some serious speed regression will be found in the future we'll be able
to step back easiliy.

This commit fixes issue #663.
2012-10-05 11:20:13 +02:00
antirez
eb6e7eb94d Even inside #if 0 comments are comments. 2012-04-21 21:49:21 +02:00
Salvatore Sanfilippo
7d3ee4172f Merge pull request #440 from ErikDubbelboer/spelling
Fixed some spelling errors in comments
2012-04-21 03:31:06 -07:00
antirez
1e35ae7486 Currenly not used code in dict.c commented out. 2012-04-18 23:56:07 +02:00
Erik Dubbelboer
8d16e7a3c6 Update src/dict.c 2012-04-07 15:45:53 +03:00
Erik Dubbelboer
65fd32ab0a Fixed some spelling errors in the comments 2012-04-07 14:40:29 +02:00
huangz1990
9448ddb0c6 fix typo 2012-03-15 14:27:14 +08:00
antirez
b362c111da fixed typo in hahs function seed default value. It is no longer used but fixed to retain the old constant as default anyway. 2012-01-22 01:40:23 +01:00
antirez
a48c8d873b Fix for hash table collision attack. We simply randomize hash table initialization value at startup time. 2012-01-21 23:30:13 +01:00
antirez
aa9a61ccd7 dict.c: added macros in dict.h to set signed and unsigned 64 bit values directly inside the hash entry without using additional memory. 2011-11-08 19:41:29 +01:00
antirez
c0ba9ebe13 dict.c API names modified to be more coincise and consistent. 2011-11-08 17:07:55 +01:00
antirez
71a50956b1 dict.c: added two lower level methods for directly manipulating hash entries. This is useful in order to set 64 bit integers as values directly inside the hash entry (in order to save memory), without casting, and even in 32 bit builds. 2011-11-08 16:57:20 +01:00
antirez
6a7841eb09 added an union in the dict.h structure to store 64 bit integers directly into hash table entries. 2011-11-02 15:28:45 +01:00
antirez
4b53e7365c Introduced a safe iterator interface that can be used to iterate while accessing the dictionary at the same time. Now the default interface is consireded unsafe and should be used only with dictNext() 2011-05-10 10:15:50 +02:00
antirez
05600eb8a7 fixed two diskstore issues, a quasi-deadlock creating problems with I/O speed and a race condition among threads 2011-02-11 11:16:15 +01:00
antirez
1b1f47c915 command lookup process turned into a much more flexible and probably faster hash table 2010-11-03 11:23:59 +01:00
antirez
3856f14759 This should fix Issue 332: when there is a background process saving we still allow the hash tables to grow, but only when a critical treshold is reached. Formerly we prevented the resize at all triggering pathological O(N) behavior. Also there is a fix for the statistics in INFO about the number of keys expired 2010-09-15 14:09:41 +02:00
antirez
e0be2289e9 hash table example commented out in dict.c 2010-07-27 10:00:38 +02:00
Benjamin Kramer
399f2f401c Add zcalloc and use it where appropriate
calloc is more effecient than malloc+memset when the system uses mmap to
allocate memory. mmap always returns zeroed memory so the memset can be
avoided.  The threshold to use mmap is 16k in osx libc and 128k in bsd
libc and glibc. The kernel can lazily allocate the pages, this reduces
memory usage when we have a page table or hash table that is mostly
empty.

This change is most visible when you start a new redis instance with vm
enabled.  You'll see no increased memory usage no matter how big your
page table is.
2010-07-25 00:11:20 +02:00
Benjamin Kramer
d9dd352b36 Remove _dictAlloc and friends
zmalloc calls abort() so _dictPanic will never be called.
2010-07-24 23:10:42 +02:00
Benjamin Kramer
b1e0bd4b9b Reduce code duplication 2010-07-24 22:37:01 +02:00
antirez
e2641e09cc redis.c split into many different C files.
networking related stuff moved into networking.c

moved more code

more work on layout of source code

SDS instantaneuos memory saving. By Pieter and Salvatore at VMware ;)

cleanly compiling again after the first split, now splitting it in more C files

moving more things around... work in progress

split replication code

splitting more

Sets split

Hash split

replication split

even more splitting

more splitting

minor change
2010-07-01 14:38:51 +02:00