639 Commits

Author SHA1 Message Date
xhe
2e8f8c9b0c HELLO without protover
Signed-off-by: xhe <xw897002528@gmail.com>
2020-12-24 13:35:41 +08:00
Madelyn Olson
efaf09ee4b
Flow through the error handling path for most errors (#8226)
Properly throw errors for invalid replication stream and support https://github.com/redis/redis/pull/8217
2020-12-23 19:06:25 -08:00
Greg Femec
266949c7fc
Fix random element selection for large hash tables. (#8133)
When a database on a 64 bit build grows past 2^31 keys, the underlying hash table expands to 2^32 buckets. After this point, the algorithms for selecting random elements only return elements from half of the available buckets because they use random() which has a range of 0 to 2^31 - 1. This causes problems for eviction policies which use dictGetSomeKeys or dictGetRandomKey. Over time they cause the hash table to become unbalanced because, while new keys are spread out evenly across all buckets, evictions come from only half of the available buckets. Eventually this half of the table starts to run out of keys and it takes longer and longer to find candidates for eviction. This continues until no more evictions can happen.

This solution addresses this by using a 64 bit PRNG instead of libc random().

Co-authored-by: Greg Femec <gfemec@google.com>
2020-12-23 15:52:07 +02:00
Oran Agra
2426aaa099
fix valgrind warning created by recent pidfile fix (#8235)
This isn't a leak, just an warning due to unreachable
allocation on the fork child.
Problem created by 92a483b
2020-12-23 12:55:05 +02:00
Meir Shpilraien (Spielrein)
92a483bca2
Fix issue where fork process deletes the parent pidfile (#8231)
Turns out that when the fork child crashes, the crash log was deleting
the pidfile from the disk (although the parent is still running.

Now we set the pidfile of the fork process to NULL so the fork process
will never deletes it.
2020-12-22 15:17:39 +02:00
Oran Agra
411c18bbce
Remove read-only flag from non-keyspace cmds, different approach for EXEC to propagate MULTI (#8216)
In the distant history there was only the read flag for commands, and whatever
command that didn't have the read flag was a write one.
Then we added the write flag, but some portions of the code still used !read
Also some commands that don't work on the keyspace at all, still have the read
flag.

Changes in this commit:
1. remove the read-only flag from TIME, ECHO, ROLE and LASTSAVE

2. EXEC command used to decides if it should propagate a MULTI by looking at
   the command flags (!read & !admin).
   When i was about to change it to look at the write flag instead, i realized
   that this would cause it not to propagate a MULTI for PUBLISH, EVAL, and
   SCRIPT, all 3 are not marked as either a read command or a write one (as
   they should), but all 3 are calling forceCommandPropagation.

   So instead of introducing a new flag to denote a command that "writes" but
   not into the keyspace, and still needs propagation, i decided to rely on
   the forceCommandPropagation, and just fix the code to propagate MULTI when
   needed rather than depending on the command flags at all.

   The implication of my change then is that now it won't decide to propagate
   MULTI when it sees one of these: SELECT, PING, INFO, COMMAND, TIME and
   other commands which are neither read nor write.

3. Changing getNodeByQuery and clusterRedirectBlockedClientIfNeeded in
   cluster.c to look at !write rather than read flag.
   This should have no implications, since these code paths are only reachable
   for commands which access keys, and these are always marked as either read
   or write.

This commit improve MULTI propagation tests, for modules and a bunch of
other special cases, all of which used to pass already before that commit.
the only one that test change that uncovered a change of behavior is the
one that DELs a non-existing key, it used to propagate an empty
multi-exec block, and no longer does.
2020-12-22 12:03:49 +02:00
valentinogeron
4c13945c37
Fix PFDEBUG commands flag (#8222)
- Mark it as a @hyperloglog command (ACL)
- Should not be allowed in OOM
- Add firstkey, lastkey, step
- Add comment that explains the 'write' flag
2020-12-21 15:40:20 +02:00
sundb
51eb0da824
Fix command reset's arity (#8212) 2020-12-18 14:55:39 +02:00
filipe oliveira
19d46f8f2f
Expose Redis main thread cpu time, and scrape system time via INFO command. (#8132)
Exposes the main thread CPU info via info modules ( linux specific only )
(used_cpu_sys_main_thread and used_cpu_user_main_thread). This is important for:

- distinguish between main thread and io-threads cpu time total cpu time consumed ( check
  what is the first bottleneck on the used config )
- distinguish between main thread and modules threads total cpu time consumed

Apart from it, this commit also exposes the server_time_usec within the Server section so that we can
properly differentiate consecutive collection and calculate for example the CPU% and or / cpu time vs
wall time, etc...
2020-12-13 19:14:46 +02:00
Wang Yuan
e3ff414513
Add total_forks to INFO STATS (#8155) 2020-12-13 10:01:18 +02:00
杨博东
4d06d99bf8
Add GEOSEARCH / GEOSEARCHSTORE commands (#8094)
Add commands to query geospatial data with bounding box.

Two new commands that replace the existing 4 GEORADIUS* commands.

GEOSEARCH key [FROMMEMBER member] [FROMLOC long lat] [BYRADIUS radius
unit] [BYBOX width height unit] [WITHCORD] [WITHDIST] [WITHASH] [COUNT
count] [ASC|DESC]

GEOSEARCHSTORE dest_key src_key [FROMMEMBER member] [FROMLOC long lat]
[BYRADIUS radius unit] [BYBOX width height unit] [WITHCORD] [WITHDIST]
[WITHASH] [COUNT count] [ASC|DESC] [STOREDIST]

- Add two types of CIRCULAR_TYPE and RECTANGLE_TYPE to achieve different searches
- Judge whether the point is within the rectangle, refer to:
geohashGetDistanceIfInRectangle
2020-12-12 02:21:05 +02:00
Oran Agra
c31055db61 Sanitize dump payload: fuzz tester and fixes for segfaults and leaks it exposed
The test creates keys with various encodings, DUMP them, corrupt the payload
and RESTORES it.
It utilizes the recently added use-exit-on-panic config to distinguish between
 asserts and segfaults.
If the restore succeeds, it runs random commands on the key to attempt to
trigger a crash.

It runs in two modes, one with deep sanitation enabled and one without.
In the first one we don't expect any assertions or segfaults, in the second one
we expect assertions, but no segfaults.
We also check for leaks and invalid reads using valgrind, and if we find them
we print the commands that lead to that issue.

Changes in the code (other than the test):
- Replace a few NPD (null pointer deference) flows and division by zero with an
  assertion, so that it doesn't fail the test. (since we set the server to use
  `exit` rather than `abort` on assertion).
- Fix quite a lot of flows in rdb.c that could have lead to memory leaks in
  RESTORE command (since it now responds with an error rather than panic)
- Add a DEBUG flag for SET-SKIP-CHECKSUM-VALIDATION so that the test don't need
  to bother with faking a valid checksum
- Remove a pile of code in serverLogObjectDebugInfo which is actually unsafe to
  run in the crash report (see comments in the code)
- fix a missing boundary check in lzf_decompress

test suite infra improvements:
- be able to run valgrind checks before the process terminates
- rotate log files when restarting servers
2020-12-06 14:54:34 +02:00
Oran Agra
ca1c182567 Sanitize dump payload: ziplist, listpack, zipmap, intset, stream
When loading an encoded payload we will at least do a shallow validation to
check that the size that's encoded in the payload matches the size of the
allocation.
This let's us later use this encoded size to make sure the various offsets
inside encoded payload don't reach outside the allocation, if they do, we'll
assert/panic, but at least we won't segfault or smear memory.

We can also do 'deep' validation which runs on all the records of the encoded
payload and validates that they don't contain invalid offsets. This lets us
detect corruptions early and reject a RESTORE command rather than accepting
it and asserting (crashing) later when accessing that payload via some command.

configuration:
- adding ACL flag skip-sanitize-payload
- adding config sanitize-dump-payload [yes/no/clients]

For now, we don't have a good way to ensure MIGRATE in cluster resharding isn't
being slowed down by these sanitation, so i'm setting the default value to `no`,
but later on it should be set to `clients` by default.

changes:
- changing rdbReportError not to `exit` in RESTORE command
- adding a new stat to be able to later check if cluster MIGRATE isn't being
  slowed down by sanitation.
2020-12-06 14:54:34 +02:00
guybe7
1df5bb5687
Make sure we do not propagate nested MULTI/EXEC (#8097)
One way this was happening is when a module issued an RM_Call which would inject MULTI.
If the module command that does that was itself issued by something else that already did
added MULTI (e.g. another module, or a Lua script), it would have caused nested MULTI.

In fact the MULTI state in the client or the MULTI_EMITTED flag in the context isn't
the right indication that we need to propagate MULTI or not, because on a nested calls
(possibly a module action called by a keyspace event of another module action), these
flags aren't retained / reflected.

instead there's now a global propagate_in_transaction flag for that.

in addition to that, we now have a global in_eval and in_exec flags, to serve the flags
of RM_GetContextFlags, since their dependence on the current client is wrong for the same
reasons mentioned above.
2020-12-06 13:14:18 +02:00
Wang Yuan
75f9dec644
Limit the main db and expires dictionaries to expand (#7954)
As we know, redis may reject user's requests or evict some keys if
used memory is over maxmemory. Dictionaries expanding may make
things worse, some big dictionaries, such as main db and expires dict,
may eat huge memory at once for allocating a new big hash table and be
far more than maxmemory after expanding.
There are related issues: #4213 #4583

More details, when expand dict in redis, we will allocate a new big
ht[1] that generally is double of ht[0], The size of ht[1] will be
very big if ht[0] already is big. For db dict, if we have more than
64 million keys, we need to cost 1GB for ht[1] when dict expands.

If the sum of used memory and new hash table of dict needed exceeds
maxmemory, we shouldn't allow the dict to expand. Because, if we
enable keys eviction, we still couldn't add much more keys after
eviction and rehashing, what's worse, redis will keep less keys when
redis only remains a little memory for storing new hash table instead
of users' data. Moreover users can't write data in redis if disable
keys eviction.

What this commit changed ?

Add a new member function expandAllowed for dict type, it provide a way
for caller to allow expand or not. We expose two parameters for this
function: more memory needed for expanding and dict current load factor,
users can implement a function to make a decision by them.
For main db dict and expires dict type, these dictionaries may be very
big and cost huge memory for expanding, so we implement a judgement
function: we can stop dict to expand provisionally if used memory will
be over maxmemory after dict expands, but to guarantee the performance
of redis, we still allow dict to expand if dict load factor exceeds the
safe load factor.
Add test cases to verify we don't allow main db to expand when left
memory is not enough, so that avoid keys eviction.

Other changes:

For new hash table size when expand. Before this commit, the size is
that double used of dict and later _dictNextPower. Actually we aim to
control a dict load factor between 0.5 and 1.0. Now we replace *2 with
+1, since the first check is that used >= size, the outcome of before
will usually be the same as _dictNextPower(used+1). The only case where
it'll differ is when dict_can_resize is false during fork, so that later
the _dictNextPower(used*2) will cause the dict to jump to *4 (i.e.
_dictNextPower(1025*2) will return 4096).
Fix rehash test cases due to changing algorithm of new hash table size
when expand.
2020-12-06 11:53:04 +02:00
Itamar Haber
c1b1e8c329
Adds pub/sub channel patterns to ACL (#7993)
Fixes #7923.

This PR appropriates the special `&` symbol (because `@` and `*` are taken),
followed by a literal value or pattern for describing the Pub/Sub patterns that
an ACL user can interact with. It is similar to the existing key patterns
mechanism in function (additive) and implementation (copy-pasta). It also adds
the allchannels and resetchannels ACL keywords, naturally.

The default user is given allchannels permissions, whereas new users get
whatever is defined by the acl-pubsub-default configuration directive. For
backward compatibility in 6.2, the default of this directive is allchannels but
this is likely to be changed to resetchannels in the next major version for
stronger default security settings.

Unless allchannels is set for the user, channel access permissions are checked
as follows :
* Calls to both PUBLISH and SUBSCRIBE will fail unless a pattern matching the
  argumentative channel name(s) exists for the user.
* Calls to PSUBSCRIBE will fail unless the pattern(s) provided as an argument
  literally exist(s) in the user's list.

Such failures are logged to the ACL log.

Runtime changes to channel permissions for a user with existing subscribing
clients cause said clients to disconnect unless the new permissions permit the
connections to continue. Note, however, that PSUBSCRIBErs' patterns are matched
literally, so given the change bar:* -> b*, pattern subscribers to bar:* will be
disconnected.

Notes/questions:
* UNSUBSCRIBE, PUNSUBSCRIBE and PUBSUB remain unprotected due to lack of reasons
  for touching them.
2020-12-01 14:21:39 +02:00
Oran Agra
bbc2c44541
INFO client_recent_max_input_buffer includes argv array (#8065)
this metric already includes the argv bytes, like what clientsCronTrackClientsMemUsage does, but it's missing the array itself.

p.s. For the purpose of tracking expensive clients we don't need to include the size of the client struct and the static reply buffer in it.
2020-11-25 23:39:01 +02:00
Oran Agra
e6fa47380a
Fix bug with module GIL being released prematurely (#8061)
This is hopefully usually harmles.
The server.ready_keys will usually be empty so the code after releasing
the GIL will soon be done.
The only case where it'll actually process things is when a module
releases a client (or module) blocked on a key, by triggering this NOT
from within a command (e.g. a timer event).

This bug was introduced in redis 6.0.9, see #7903
2020-11-22 14:00:51 +02:00
Oran Agra
61954951ed
Fix oom-score-adj-values range, abs options, and bug when used in config file (#8046)
Fix: When oom-score-adj-values is provided in the config file after
oom-score-adj yes, it'll take an immediate action, before
readOOMScoreAdj was acquired, resulting in an error (out of range score
due to uninitialized value. delay the reaction the real call is made by
main().

Since the values are clamped to -1000..1000, and they're
applied as an offset from the value at startup (which may be -1000), we
need to allow the offsets to reach to +2000 so that a value of +1000 is
achievable in case the value at startup was -1000.

Adding an option for absolute values rather than relative ones.
2020-11-22 13:57:56 +02:00
Meir Shpilraien (Spielrein)
d87a0d0286
Unified MULTI, LUA, and RM_Call with respect to blocking commands (#8025)
Blocking command should not be used with MULTI, LUA, and RM_Call. This is because,
the caller, who executes the command in this context, expects a reply.

Today, LUA and MULTI have a special (and different) treatment to blocking commands:

LUA   - Most commands are marked with no-script flag which are checked when executing
and command from LUA, commands that are not marked (like XREAD) verify that their
blocking mode is not used inside LUA (by checking the CLIENT_LUA client flag).
MULTI - Command that is going to block, first verify that the client is not inside
multi (by checking the CLIENT_MULTI client flag). If the client is inside multi, they
return a result which is a match to the empty key with no timeout (for example blpop
inside MULTI will act as lpop)
For modules that perform RM_Call with blocking command, the returned results type is
REDISMODULE_REPLY_UNKNOWN and the caller can not really know what happened.

Disadvantages of the current state are:

No unified approach, LUA, MULTI, and RM_Call, each has a different treatment
Module can not safely execute blocking command (and get reply or error).
Though It is true that modules are not like LUA or MULTI and should be smarter not
to execute blocking commands on RM_Call, sometimes you want to execute a command base
on client input (for example if you create a module that provides a new scripting
language like javascript or python).
While modules (on modules command) can check for REDISMODULE_CTX_FLAGS_LUA or
REDISMODULE_CTX_FLAGS_MULTI to know not to block the client, there is no way to
check if the command came from another module using RM_Call. So there is no way
for a module to know not to block another module RM_Call execution.

This commit adds a way to unify the treatment for blocking clients by introducing
a new CLIENT_DENY_BLOCKING client flag. On LUA, MULTI, and RM_Call the new flag
turned on to signify that the client should not be blocked. A blocking command
verifies that the flag is turned off before blocking. If a blocking command sees
that the CLIENT_DENY_BLOCKING flag is on, it's not blocking and return results
which are matches to empty key with no timeout (as MULTI does today).

The new flag is checked on the following commands:

List blocking commands: BLPOP, BRPOP, BRPOPLPUSH, BLMOVE,
Zset blocking commands: BZPOPMIN, BZPOPMAX
Stream blocking commands: XREAD, XREADGROUP
SUBSCRIBE, PSUBSCRIBE, MONITOR
In addition, the new flag is turned on inside the AOF client, we do not want to
block the AOF client to prevent deadlocks and commands ordering issues (and there
is also an existing assert in the code that verifies it).

To keep backward compatibility on LUA, all the no-script flags on existing commands
were kept untouched. In addition, a LUA special treatment on XREAD and XREADGROUP was kept.

To keep backward compatibility on MULTI (which today allows SUBSCRIBE, and PSUBSCRIBE).
We added a special treatment on those commands to allow executing them on MULTI.

The only backward compatibility issue that this PR introduces is that now MONITOR
is not allowed inside MULTI.

Tests were added to verify blocking commands are not blocking the client on LUA, MULTI,
or RM_Call. Tests were added to verify the module can check for CLIENT_DENY_BLOCKING flag.

Co-authored-by: Oran Agra <oran@redislabs.com>
Co-authored-by: Itamar Haber <itamar@redislabs.com>
2020-11-17 18:58:55 +02:00
Yossi Gottlieb
d638b05834
Improve and clean up supervised process support. (#8036)
* Configuration file default should now be "auto".
* Expose "process_supervised" as an info field.
* Log messages improvements (clarify required systemd config, report
  auto-detected supervision mode, etc.)
* Set server.supervised properly, so it can take precedence of
  "daemonize" configuration.
* Produce clear warning if systemd is detected/requested but executable
  is compiled without support for it, instead of silently ignoring.
* Handle systemd notification error on startup, and turn off supervised
  mode if it failed.
2020-11-17 12:52:49 +02:00
swamp0407
ea7cf737a1
Add COPY command (#7953)
Syntax:
COPY <key> <new-key> [DB <dest-db>] [REPLACE]

No support for module keys yet.

Co-authored-by: tmgauss
Co-authored-by: Itamar Haber <itamar@redislabs.com>
Co-authored-by: Oran Agra <oran@redislabs.com>
2020-11-17 12:03:05 +02:00
chenyangyang
c1aaad06d8
Modules callbacks for lazy free effort, and unlink (#7912)
Add two optional callbacks to the RedisModuleTypeMethods structure, which is `free_effort`
and `unlink`. the `free_effort` callback indicates the effort required to free a module memory.
Currently, if the effort exceeds LAZYFREE_THRESHOLD, the module memory may be released
asynchronously. the `unlink` callback indicates the key has been removed from the DB by redis, and
may soon be freed by a background thread.

Add `lazyfreed_objects` info field, which represents the number of objects that have been
lazyfreed since redis was started.

Add `RM_GetTypeMethodVersion` API, which return the current redis-server runtime value of
`REDISMODULE_TYPE_METHOD_VERSION`. You can use that when calling `RM_CreateDataType` to know
which fields of RedisModuleTypeMethods are gonna be supported and which will be ignored.
2020-11-16 10:34:04 +02:00
Felipe Machado
d8fd48c436
Add new commands ZDIFF and ZDIFFSTORE (#7961)
- Add ZDIFF and ZDIFFSTORE which work similarly to SDIFF and SDIFFSTORE
- Make sure the new WITHSCORES argument that was added for ZUNION isn't considered valid for ZUNIONSTORE

Co-authored-by: Oran Agra <oran@redislabs.com>
2020-11-15 14:14:25 +02:00
Madelyn Olson
3feff7d78a
Rewritten commands are logged as their original command (#8006)
* Rewritten commands are logged as their original command

Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
2020-11-10 13:50:03 -08:00
Oran Agra
7ace7231c6
Better INFO fields to track diskless and disk-based replication progress (#7981)
Expose new `loading_rdb_used_mem` showing the used memory of the server
that saved the RDB file we're currently using.
This is useful in diskless replication when the total size of the rdb is
unkown, and can be used as a rought estimation of progres.

Use that new field to calculate the "user friendly"
`loading_loaded_perc` and `loading_eta_seconds`.

Expose `master_sync_total_bytes` and `master_sync_total_bytes` to complement
on the existing `master_sync_total_bytes` (which cannot be used on its own
to calculate progress).

Add "user friendly" field for `master_sync_perc`
2020-11-05 11:46:16 +02:00
Yossi Gottlieb
1fd456f91a
Add RESET command. (#7982)
Perform full reset of all client connection states, is if the client was
disconnected and re-connected. This affects:

* MULTI state
* Watched keys
* MONITOR mode
* Pub/Sub subscription
* ACL/Authenticated state
* Client tracking state
* Cluster read-only/asking state
* RESP version (reset to 2)
* Selected database
* CLIENT REPLY state

The response is +RESET to make it easily distinguishable from other
responses.

Co-authored-by: Oran Agra <oran@redislabs.com>
Co-authored-by: Itamar Haber <itamar@redislabs.com>
2020-11-05 10:51:26 +02:00
Oran Agra
a698a6391a
Add maxclients and cluster_connections to INFO CLIENTS (#7979)
Few config settings are also reflected by the INFO command.
these are mainly ones that are important for either an instant view of
the server status (to compare a metric to it's limit config),
Important configurations that are necessary in the crash log (which
currently doesn't print the config),
And things that are important for monitoring solutions (such as
Prometheus), which rely on INFO to collect their data.

Add cluster_connections to INFO CLUSTER:
This makes it possible to be combined together with connected_clients
and connected_slaves and be matched against maxclients
2020-11-04 09:53:43 +02:00
Wang Yuan
89c78a9808
Disable rehash when redis has child process (#8007)
In redisFork(), we don't set child pid, so updateDictResizePolicy()
doesn't take effect, that isn't friendly for copy-on-write.

The bug was introduced this in redis 6.0: 56258c6
2020-11-03 17:16:11 +02:00
Meir Shpilraien (Spielrein)
f210e197f3
Added crash report on SIGABRT (#8004)
The reason that we want to get a full crash report on SIGABRT
is that the jmalloc, when detecting a corruption, calls abort().
This will cause the Redis to exist silently without any report
and without any way to analyze what happened.
2020-11-03 14:59:21 +02:00
Oran Agra
441bfa2dfb
Optionally (default) fail to start if requested bind address is not available (#7936)
Background:
#3467 (redis 4.0.0), started ignoring ENOPROTOOPT, but did that only for
the default bind (in case bind config wasn't explicitly set).
#5598 (redis 5.0.3), added that for bind addresses explicitly set
(following bug reports in Debian for redis 4.0.9 and 5.0.1), it
also ignored a bunch of other errors like EPROTONOSUPPORT which was
requested in #3894, and also added EADDRNOTAVAIL (wasn't clear why).

This (ignoring EADDRNOTAVAIL) makes redis start successfully, even if a
certain network interface isn't up yet , in which case we rather redis
fail and will be re-tried when the NIC is up, see #7933.

However, it turns out that when IPv6 is disabled (supported but unused),
the error we're getting is EADDRNOTAVAIL. and in many systems the
default config file tries to bind to localhost for both v4 and v6 and
would like to silently ignore the error on v6 if disabled.
This means that we sometimes want to ignore EADDRNOTAVAIL and other times
we wanna fail.

So this commit changes these main things:
1. Ignore all the errors we ignore for both explicitly requested bind
   address and a default implicit one.
2. Add a '-' prefix to allow EADDRNOTAVAIL be ignored (by default that's
   different than the previous behavior).
3. Restructure that function in a more readable and maintainable way see
   below.
4. Make the default behavior of listening to all achievable by setting
  a bind config directive to * (previously only possible by omitting
  it)
5. document everything.

The old structure of this function was that even if there are no bind
addresses requested, the loop that runs though the bind addresses runs
at least once anyway!
In that one iteration of the loop it binds to both v4 and v6 addresses,
handles errors for each of them separately, and then eventually at the
if-else chain, handles the error of the last bind attempt again!
This was very hard to read and very error prone to maintain, instead now
when the bind info is missing we create one with two entries, and run
the simple loop twice.
2020-10-28 21:09:15 +02:00
WuYunlong
66037309c6 Fix waste of CPU time about server log in serverCron.
When all the work is just adding logs, we could pull
the condition out so as to use less CPU time when
loglevel is bigger than LL_VERBOSE.
2020-10-27 11:15:14 -07:00
zhenwei pi
a9c0602149
Disable THP if enabled (#7381)
In case redis starts and find that THP is enabled ("always"), instead
of printing a log message, which might go unnoticed, redis will try to
disable it (just for the redis process).

Note: it looks like on self-bulit kernels THP is likely be set to "always" by default.

Some discuss about THP side effect on Linux:
according to http://www.antirez.com/news/84, we can see that
redis latency spikes are caused by linux kernel THP feature.
I have tested on E3-2650 v3, and found that 2M huge page costs
about 0.25ms to fix COW page fault.

Add a new config 'disable-thp', the recommended setting is 'yes',
(default) the redis tries to disable THP by prctl syscall. But
users who really want THP can set it to "no"

Thanks to Oran & Yossi for suggestions.

Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
2020-10-27 15:04:18 +02:00
Wen Hui
0047702aab
Support ACL for Sentinel Mode (#7888)
This commit implements ACL for Sentinel mode, main work of this PR includes:

- Update Sentinel command table in order to better support ACLs.
- Fix couple of things which currently blocks the support for ACL on sentinel mode.
- Provide "sentinel sentinel-user" and "sentinel sentinel-pass " configuration in order to let sentinel authenticate with a specific user in other sentinels.
- requirepass is kept just for compatibility with old config files

Co-authored-by: Oran Agra <oran@redislabs.com>
2020-10-19 07:33:55 +03:00
Oran Agra
457b7073b5
INFO report peak memory before eviction (#7894)
In some cases one command added a very big bulk of memory, and this
would be "resolved" by the eviction before the next command.

Seeing an unexplained mass eviction we would wish to
know the highest momentary usage too.

Tracking it in call() and beforeSleep() adds some hooks in AOF and RDB
loading.

The fix in clientsCronTrackExpansiveClients is related to #7874
2020-10-18 16:56:43 +03:00
guybe7
addf47dcac
Minor improvements to module blocked on keys (#7903)
- Clarify some documentation comments
- Make sure blocked-on-keys client privdata is accessible
  from withing the timeout callback
- Handle blocked clients in beforeSleep - In case a key
  becomes "ready" outside of processCommand

See #7879 #7880
2020-10-12 17:13:38 +03:00
Yossi Gottlieb
9b7f8ba84b Introduce getKeysResult for getKeysFromCommand.
Avoid using a static buffer for short key index responses, and make it
caller's responsibility to stack-allocate a result type. Responses that
don't fit are still allocated on the heap.
2020-10-11 16:04:14 +03:00
Uri Shachar
dab5ec9b8d
Support getting configuration from both stdin and file at the same time (#7893)
This allows supplying secret configuration (for example - masterauth) via a secure channel
instead of having it in a plaintext file / command line param, while still allowing for most
of the configuration to reside there.

Also, remove 'special' case handling for --check-rdb which hasn't been relevant
since 4.0.0.
2020-10-11 13:43:23 +03:00
Felipe Machado
c3f9e01794
Adds new pop-push commands (LMOVE, BLMOVE) (#6929)
Adding [B]LMOVE <src> <dst> RIGHT|LEFT RIGHT|LEFT. deprecating [B]RPOPLPUSH.

Note that when receiving a BRPOPLPUSH we'll still propagate an RPOPLPUSH,
but on BLMOVE RIGHT LEFT we'll propagate an LMOVE

improvement to existing tests
- Replace "after 1000" with "wait_for_condition" when wait for
  clients to block/unblock.
- Add a pre-existing element to target list on basic tests so
  that we can check if the new element was added to the correct
  side of the list.
- check command stats on the replica to make sure the right
  command was replicated

Co-authored-by: Oran Agra <oran@redislabs.com>
2020-10-08 08:33:17 +03:00
Oran Agra
bea40e6a41
memory reporting of clients argv (#7874)
track and report memory used by clients argv.
this is very usaful in case clients started sending a command and didn't
complete it. in which case the first args of the command are already
trimmed from the query buffer.

in an effort to avoid cache misses and overheads while keeping track of
these, i avoid calling sdsZmallocSize and instead use the sdslen /
bulk-len which can at least give some insight into the problem.

This memory is now added to the total clients memory usage, as well as
the client list.
2020-10-05 11:15:36 +03:00
Oran Agra
eb6241a3dd
Include internal sds fragmentation in MEMORY reporting (#7864)
The MEMORY command is used for debugging memory usage, so it should include internal
fragmentation, same as used_memory
2020-10-01 11:30:22 +03:00
WuYunlong
c2e5546071
Normalize sds test mechanism together with some compile warnings. (#7854) 2020-09-28 11:27:26 +03:00
Wang Yuan
57709c4bc6
Don't write replies if close the client ASAP (#7202)
Before this commit, we would have continued to add replies to the reply buffer even if client
output buffer limit is reached, so the used memory would keep increasing over the configured limit.
What's more, we shouldn’t write any reply to the client if it is set 'CLIENT_CLOSE_ASAP' flag
because that doesn't conform to its definition and we will close all clients flagged with
'CLIENT_CLOSE_ASAP' in ‘beforeSleep’.

Because of code execution order, before this, we may firstly write to part of the replies to
the socket before disconnecting it, but in fact, we may can’t send the full replies to clients
since OS socket buffer is limited. But this unexpected behavior makes some commands work well,
for instance ACL DELUSER, if the client deletes the current user, we need to send reply to client
and close the connection, but before, we close the client firstly and write the reply to reply
buffer. secondly, we shouldn't do this despite the fact it works well in most cases.

We add a flag 'CLIENT_CLOSE_AFTER_COMMAND' to mark clients, this flag means we will close the
client after executing commands and send all entire replies, so that we can write replies to
reply buffer during executing commands, send replies to clients, and close them later.

We also fix some implicit problems. If client output buffer limit is enforced in 'multi/exec',
all commands will be executed completely in redis and clients will not read any reply instead of
partial replies. Even more, if the client executes 'ACL deluser' the using user in 'multi/exec',
it will not read the replies after 'ACL deluser' just like before executing 'client kill' itself
in 'multi/exec'.

We added some tests for output buffer limit breach during multi-exec and using a pipeline of
many small commands rather than one with big response.

Co-authored-by: Oran Agra <oran@redislabs.com>
2020-09-24 16:01:41 +03:00
bodong.ybd
e08bf16637 Add ZINTER/ZUNION command
Syntax: ZINTER/ZUNION numkeys key [key ...] [WEIGHTS weight [weight ...]]
[AGGREGATE SUM|MIN|MAX] [WITHSCORES]

see #7624
2020-09-24 08:59:14 +03:00
WuYunlong
647cac5bb4 Make main thread killable so that it can be canceled at any time.
Refine comment of makeThreadKillable().

This commit can be backported to 5.0, only if we also backport 8b70cb0.

Co-authored-by: Oran Agra <oran@redislabs.com>
2020-09-21 12:10:19 +03:00
Oran Agra
2458e54814
RM_GetContextFlags provides indication that we're in a fork child (#7783) 2020-09-20 13:43:28 +03:00
Wang Yuan
b002d2b4f1
Remove tmp rdb file in background thread (#7762)
We're already using bg_unlink in several places to delete the rdb file in the background,
and avoid paying the cost of the deletion from our main thread.
This commit uses bg_unlink to remove the temporary rdb file in the background too.

However, in case we delete that rdb file just before exiting, we don't actually wait for the
background thread or the main thread to delete it, and just let the OS clean up after us.
i.e. we open the file, unlink it and exit with the fd still open.

Furthermore, rdbRemoveTempFile can be called from a thread and was using snprintf which is
not async-signal-safe, we now use ll2string instead.
2020-09-17 18:20:10 +03:00
Wang Yuan
445a4b669a
Implement redisAtomic to replace _Atomic C11 builtin (#7707)
Redis 6.0 introduces I/O threads, it is so cool and efficient, we use C11
_Atomic to establish inter-thread synchronization without mutex. But the
compiler that must supports C11 _Atomic can compile redis code, that brings a
lot of inconvenience since some common platforms can't support by default such
as CentOS7, so we want to implement redis atomic type to make it more portable.

We have implemented our atomic variable for redis that only has 'relaxed'
operations in src/atomicvar.h, so we implement some operations with
'sequentially-consistent', just like the default behavior of C11 _Atomic that
can establish inter-thread synchronization. And we replace all uses of C11
_Atomic with redis atomic variable.

Our implementation of redis atomic variable uses C11 _Atomic, __atomic or
__sync macros if available, it supports most common platforms, and we will
detect automatically which feature we use. In Makefile we use a dummy file to
detect if the compiler supports C11 _Atomic. Now for gcc, we can compile redis
code theoretically if your gcc version is not less than 4.1.2(starts to support
__sync_xxx operations). Otherwise, we remove use mutex fallback to implement
redis atomic variable for performance and test. You will get compiling errors
if your compiler doesn't support all features of above.

For cover redis atomic variable tests, we add other CI jobs that build redis on
CentOS6 and CentOS7 and workflow daily jobs that run the tests on them.
For them, we just install gcc by default in order to cover different compiler
versions, gcc is 4.4.7 by default installation on CentOS6 and 4.8.5 on CentOS7.

We restore the feature that we can test redis with Helgrind to find data race
errors. But you need install Valgrind in the default path configuration firstly
before running your tests, since we use macros in helgrind.h to tell Helgrind
inter-thread happens-before relationship explicitly for avoiding false positives.
Please open an issue on github if you find data race errors relate to this commit.

Unrelated:
- Fix redefinition of typedef 'RedisModuleUserChangedFunc'
  For some old version compilers, they will report errors or warnings, if we
  re-define function type.
2020-09-17 16:01:45 +03:00
WuYunlong
8b70cb0ef8 bio: fix doFastMemoryTest.
If one thread got SIGSEGV, function sigsegvHandler() would be triggered,
it would call bioKillThreads(). But call pthread_cancel() to cancel itself
would make it block. Also note that if SIGSEGV is caught by bio thread, it
should kill the main thread in order to give a positive report.
2020-09-16 14:15:02 +03:00
Jim Brunner
810e28a397
Incremental eviction processing (#7653)
Rather than blindly evicting until maxmemory limit is achieved, this
update adds a time limit to eviction.  While over the maxmemory limit,
eviction will process before each command AND as a timeProc when no
commands are running.

This will reduce the latency impact on many cases, especially pathological
cases like massive used memory increase during dict rehashing.

There is a risk that some other edge cases (like massive pipelined use
of MGET) could cause Redis memory usage to keep growing despite the
eviction attempts, so a new maxmemory-eviction-tenacity config is
introduced to let users mitigate that.
2020-09-16 09:16:01 +03:00