Previously the string was created empty then re-sized
to fit the offset, but sds resize causes the sds to
over-allocate by at least 1 MB (which is a lot when
you are operating at bit-level access).
This also improves the speed of initial sets by 2% to 6%
based on quick testing.
Patch logic provided by @oranagra
Fixes#1918
Issue: #2157
As the SET command is parsed, it remembers which options are already set
and if a duplicate option is found, raises an error because it is
essentially an invalid syntax.
It still allows mutually exclusive options like EX and PX because taking
an option over another (precedence) is not essentially a syntactic
error.
However we don't try to do this if the integer is already inside a range
representable with a shared integer.
The performance gain appears to be around ~15% in micro benchmarks,
however in the long run this also helps to improve locality, so should
have more, hard to measure, benefits.
Previously, GETRANGE of a key containing nothing ("")
would allocate a large (size_t)-1 return value causing
crashes on 32bit builds when it tried to allocate the
4 GB return string.
32 bit builds don't have a big enough long to capture
the same range as a 64 bit build. If we use "long long"
we get proper size limits everywhere.
Also updates size of unsigned comparison to fit new size of `end`.
Fixes#1981
Previously the end was casted to a smaller type
which resulted in a wrong check and failed
with values larger than handled by unsigned.
Closes#1847, #1844
All the Redis functions that need to modify the string value of a key in
a destructive way (APPEND, SETBIT, SETRANGE, ...) require to make the
object unshared (if refcount > 1) and encoded in raw format (if encoding
is not already REDIS_ENCODING_RAW).
This was cut & pasted many times in multiple places of the code. This
commit puts the small logic needed into a function called
dbUnshareStringValue().
Previously two string encodings were used for string objects:
1) REDIS_ENCODING_RAW: a string object with obj->ptr pointing to an sds
stirng.
2) REDIS_ENCODING_INT: a string object where the obj->ptr void pointer
is casted to a long.
This commit introduces a experimental new encoding called
REDIS_ENCODING_EMBSTR that implements an object represented by an sds
string that is not modifiable but allocated in the same memory chunk as
the robj structure itself.
The chunk looks like the following:
+--------------+-----------+------------+--------+----+
| robj data... | robj->ptr | sds header | string | \0 |
+--------------+-----+-----+------------+--------+----+
| ^
+-----------------------+
The robj->ptr points to the contiguous sds string data, so the object
can be manipulated with the same functions used to manipulate plan
string objects, however we need just on malloc and one free in order to
allocate or release this kind of objects. Moreover it has better cache
locality.
This new allocation strategy should benefit both the memory usage and
the performances. A performance gain between 60 and 70% was observed
during micro-benchmarks, however there is more work to do to evaluate
the performance impact and the memory usage behavior.
When keyspace events are enabled, the overhead is not sever but
noticeable, so this commit introduces the ability to select subclasses
of events in order to avoid to generate events the user is not
interested in.
The events can be selected using redis.conf or CONFIG SET / GET.
Because of the short circuit behavior of && inverting the two sides of
the if expression avoids an hash table lookup if the non-EX variant of
SET is called.
Thanks to Weibin Yao (@yaoweibin on github) for spotting this.
All the general string operations are implemented in t_string.c, however
the bit operations, while targeting the string type, are better served
in a specific file where we have the implementations of the following
four commands and helper functions:
GETBIT
SETBIT
BITOP
BITCOUNT
In the future this file will probably contain more code related to
making the BITOP and BITCOUNT operations faster.
The motivation for this new commands is to be search in the usage of
Redis for real time statistics. See the article "Fast real time metrics
using Redis".
http://blog.getspool.com/2011/11/29/fast-easy-realtime-metrics-using-redis-bitmaps/
In general Redis strings when used as bitmaps using the SETBIT/GETBIT
command provide a very space-efficient and fast way to store statistics.
For instance in a web application with users, every user can be
associated with a key that shows every day in which the user visited the
web service. This information can be really valuable to extract user
behaviour information.
With Redis bitmaps doing this is very simple just saying that a given
day is 0 (the data the service was put online) and all the next days are
1, 2, 3, and so forth. So with SETBIT it is possible to set the bit
corresponding to the current day every time the user visits the site.
It is possible to take the count of the bit sets on the run, this is
extremely easy using a Lua script. However a fast bit count native
operation can be useful, especially if it can operate on ranges, or when
the string is small like in the case of days (even if you consider many
years it is still extremely little data).
For this reason BITOP was introduced. The command counts the number of
bits set to 1 in a string, with optional range:
BITCOUNT key [start end]
The start/end parameters are similar to GETRANGE. If omitted the whole
string is tested.
Population counting is more useful when bit-level operations like AND,
OR and XOR are avaialble. For instance I can test multiple users to see
the number of days three users visited the site at the same time. To do
this we can take the AND of all the bitmaps, and then count the set bits.
For this reason the BITOP command was introduced:
BITOP [AND|OR|XOR|NOT] dest_key src_key1 src_key2 src_key3 ... src_keyN
In the special case of NOT (that inverts the bits) only one source key
can be passed.
The judicious use of BITCOUNT and BITOP combined can lead to interesting
use cases with very space efficient representation of data.
The implementation provided is still not tested and optimized for speed,
next commits will introduce unit tests. Later the implementation will be
profiled to see if it is possible to gain an important amount of speed
without making the code much more complex.
Move logic concerned with setting a bit in an sds to the SETBIT command
instead of keeping it in sds.c. The function to grow an sds can and will
be reused for a command to set a range within a string value.