This change attempts to switch to an hash function which mitigates
the effects of the HashDoS attack (denial of service attack trying
to force data structures to worst case behavior) while at the same time
providing Redis with an hash function that does not expect the input
data to be word aligned, a condition no longer true now that sds.c
strings have a varialbe length header.
Note that it is possible sometimes that even using an hash function
for which collisions cannot be generated without knowing the seed,
special implementation details or the exposure of the seed in an
indirect way (for example the ability to add elements to a Set and
check the return in which Redis returns them with SMEMBERS) may
make the attacker's life simpler in the process of trying to guess
the correct seed, however the next step would be to switch to a
log(N) data structure when too many items in a single bucket are
detected: this seems like an overkill in the case of Redis.
SPEED REGRESION TESTS:
In order to verify that switching from MurmurHash to SipHash had
no impact on speed, a set of benchmarks involving fast insertion
of 5 million of keys were performed.
The result shows Redis with SipHash in high pipelining conditions
to be about 4% slower compared to using the previous hash function.
However this could partially be related to the fact that the current
implementation does not attempt to hash whole words at a time but
reads single bytes, in order to have an output which is endian-netural
and at the same time working on systems where unaligned memory accesses
are a problem.
Further X86 specific optimizations should be tested, the function
may easily get at the same level of MurMurHash2 if a few optimizations
are performed.
Current todo:
- replace functions in zset.{c,h} with a new unified Redis
zset access API.
Once we get the zset interface fixed, we can squash
relevant commits in this branch and have one nice commit
to merge into unstable.
This commit adds:
- Geo commands
- Tests; runnable with: ./runtest --single unit/geo
- Geo helpers in deps/geohash-int/
- src/geo.{c,h} and src/geojson.{c,h} implementing geo commands
- Updated build configurations to get everything working
- TEMPORARY: src/zset.{c,h} implementing zset score and zset
range reading without writing to client output buffers.
- Modified linkage of one t_zset.c function for use in zset.c
Conflicts:
src/Makefile
src/redis.c
Normally ZADD only returns the number of elements added to a sorted
set, using the RETCH option it returns the sum of elements added or
for which the score was updated.
The user @kjmph provided excellent ideas to improve speed of ZUNIONSTORE
(in certain cases by many order of magnitude), together with an
implementation of the ideas.
While the ideas were sounding, the implementation could be improved both
in terms of speed and clearness, so that's my attempt at reimplementing
the speedup proposed, trying to improve by directly using just a
dictionary with an embedded score inside, and reusing the single-pass
aggregate + order-later approach.
Note that you can't apply this commit without applying the previous
commit in this branch that adds a double in the dictEntry value union.
Issue #1786.
Given that the code was written with a 2 years pause... something
strange happened in the middle. So there was no function to free a
lex range min/max objects, and in some places the range was passed by
value.