11527 Commits

Author SHA1 Message Date
chenyang8094
f28e2ce7a4
improve logging around AOF file creation and loading (#10763)
Instead of printing a log when a folder or a manifest is missing (level reduced), we print:

- total time it took to load all the aof files
- when creating a new base or incr file
- starting to write to an existing incr file on startup
2022-05-26 17:23:05 +03:00
Meir Shpilraien (Spielrein)
ad25716a75
Added the function name/script sha to the script timeout log message. (#10780)
Added the function name/script sha to the script timeout log message.
This info existed in the log in redis 6.2 and was removed in the function refactoring
since it was initially complicated, but it was later made simple.
2022-05-26 16:11:24 +03:00
Valentino Geron
9eb97b5d94
Fix regex support in --only, --skipfile and --skiptest (#10741)
The regex support was added in:
 * https://github.com/redis/redis/pull/9352
 * https://github.com/redis/redis/pull/9555
 * https://github.com/redis/redis/pull/10212

These commits break backward compatibility with older versions.

This fix keeps the test suite infra compatible with old versions by
default. However, if you want regex, the string must start with `/`
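
The matching rule can be sketched as follows. This is a Python illustration rather than the test suite's own Tcl code; `pattern_matches` is a hypothetical name, and comparing non-regex patterns by plain equality is an assumption.

```python
import re


def pattern_matches(pattern: str, name: str) -> bool:
    """Return True if a test or file name is selected by the pattern."""
    if pattern.startswith("/"):
        # A leading "/" opts in to regex matching; the rest is the expression.
        return re.search(pattern[1:], name) is not None
    # Default, backward-compatible behaviour: plain string comparison (assumed).
    return pattern == name
```

With this rule, `PI.G` no longer selects a test named `PING`, while `/PI.G` does.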
2022-05-25 18:25:38 +03:00
Binbin
450c88f368
Fix BZMPOP gets unblocked by non-key args and returns them (#10764)
This bug was introduced in #9484 (7.0.0).
It caused BZMPOP to also block on non-key arguments.

For example, `bzmpop 0 1 myzset min count 10` will additionally
block on these arguments as if they were keys (all except the first and the last argument) and can return their values:
- 0: timeout value
- 1: numkeys value
- min: min/max token
- count: count token
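
For reference, the intended key range follows from the command syntax `BZMPOP timeout numkeys key [key ...] <MIN | MAX> [COUNT count]`. A minimal Python sketch (the helper name `bzmpop_keys` is hypothetical and this is not Redis' actual key-spec code):

```python
def bzmpop_keys(argv):
    """Return only the key arguments of a BZMPOP command line."""
    # argv[0] is the command name, argv[1] the timeout, argv[2] numkeys.
    numkeys = int(argv[2])
    # The keys are the numkeys arguments that follow numkeys itself.
    return argv[3:3 + numkeys]
```

For the example above, only `myzset` should be treated as a key.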
2022-05-23 14:15:54 +03:00
yoav-steinberg
843a4cdc07
Add warning for suspected slow system clocksource setting (#10636)
This PR does 2 main things:
1) Add warning for suspected slow system clocksource setting. This is Linux specific.
2) Add a `--check-system` argument to redis which runs all system checks and prints a report.

## System checks
Add a command line option `--check-system` which runs all known system checks and provides
a report to stdout of which system checks have failed, with details on how to reconfigure the
system for optimized redis performance.
The `--check-system` mode exits with an appropriate error code after running all the checks.

## Slow clocksource details
We check the system's clocksource performance by running `clock_gettime()` in a loop and then
checking how much time was spent in a system call (via `getrusage()`). If we spend more than
10% of the time in the kernel then we print a warning. I verified that using the slow clock sources:
`acpi_pm` (~90% in the kernel on my laptop) and `xen` (~30% in the kernel on an ec2 `m4.large`)
we get this warning.

The check runs 5 system ticks so we can detect time spent in kernel at 20% jumps (0%,20%,40%...).
Anything more accurate will require the test to run longer. Typically 5 ticks are 50ms. This means
running the test on startup will delay startup by 50ms. To avoid this we make sure the test is only
executed in the `--check-system` mode.

For a quick startup check, we specifically warn if we see the system is using the `xen` clocksource,
which we know has bad performance and isn't recommended (at least on ec2). In such a case the
user should manually run redis with `--check-system` to force the slower clocksource test described
above.
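
The measurement idea can be sketched in Python (Unix only). This is an illustration under assumptions, not the code in _syscheck.c_: the function names are hypothetical, and interpreter overhead inflates user time, so the fraction will be lower than what a tight C loop reports. Note the resolution arithmetic from above: with 5 ticks, each tick accounts for 1/5 = 20% of the measured time.

```python
import resource
import time


def kernel_time_fraction(duration: float = 0.05) -> float:
    """Spin on clock_gettime() for `duration` seconds and return the share
    of consumed CPU time that was charged to the kernel (0.0 to 1.0)."""
    before = resource.getrusage(resource.RUSAGE_SELF)
    deadline = time.clock_gettime(time.CLOCK_MONOTONIC) + duration
    while time.clock_gettime(time.CLOCK_MONOTONIC) < deadline:
        pass  # busy loop: all we do is read the clock
    after = resource.getrusage(resource.RUSAGE_SELF)
    system = after.ru_stime - before.ru_stime
    user = after.ru_utime - before.ru_utime
    total = system + user
    if total <= 0:
        return 0.0  # run was too short for the accounting to register
    return system / total


def clocksource_looks_slow(fraction: float, threshold: float = 0.10) -> bool:
    """Apply the 10% threshold described in the text."""
    return fraction > threshold
```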

## Other changes in the PR

* All the system checks are now implemented as functions in _syscheck.c_.
  They are implemented using a standard interface (see details in _syscheck.c_).
  To do this I moved the checking functions `linuxOvercommitMemoryValue()`,
  `THPIsEnabled()`, `linuxMadvFreeForkBugCheck()` out of _server.c_ and _latency.c_
  and into the new _syscheck.c_. When moving these functions I made sure they don't
  depend on other functionality provided in _server.c_ and made them use a standard
  "check functions" interface. Specifically:
  * I removed all logging out of `linuxMadvFreeForkBugCheck()`. In case there's some
    unexpected error during the check, it aborts as before, but without any logging.
    It returns 0, meaning the check did not complete.
  * All these functions now return 1 on success, -1 on failure, 0 in case the check itself
    cannot be completed.
  * The `linuxMadvFreeForkBugCheck()` function now internally calls `exit()` and not
    `exitFromChild()` because the latter is only available in _server.c_ and I wanted to
    remove that dependency. This isn't an issue because we don't need to worry about the
    child process created by the test doing anything related to the rdb/aof files which
    is why `exitFromChild()` was created.

* This also fixes parsing of other /proc/\<pid\>/stat fields to correctly handle spaces
  in the process name and be more robust in general. Note that before this fix the rss
  info in `INFO memory` was corrupt in case of spaces in the process name. To
  recreate just rename `redis-server` to `redis server`, start it, and run `INFO memory`.
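
A space-safe way to split a `/proc/<pid>/stat` line is to treat everything between the first `(` and the last `)` as the process name. The sketch below is a Python illustration with hypothetical function names, not the actual parsing code; per proc(5), rss is field 24 (1-based).

```python
def split_proc_stat(line: str) -> list:
    """Split a /proc/<pid>/stat line into fields, keeping the process
    name (field 2) intact even if it contains spaces or parentheses."""
    lpar = line.index("(")
    rpar = line.rindex(")")  # the last ")" closes the process name
    head = [line[:lpar].strip(), line[lpar + 1:rpar]]
    return head + line[rpar + 1:].split()


def rss_pages(line: str) -> int:
    """Return the rss field (field 24, so index 23) as a page count."""
    return int(split_proc_stat(line)[23])
```

A naive `line.split()` shifts every field after the name by one when the name is `redis server`, which is how a wrong rss value ends up being read.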
2022-05-22 17:10:31 +03:00
Oran Agra
b0e18f804d
Scripts that declare the no-writes flag are implicitly allow-oom too. (#10699)
Scripts that have the `no-writes` flag, cannot execute write commands,
and since all `deny-oom` commands are write commands, we now act
as if the `allow-oom` flag is implicitly set for scripts that set the `no-writes` flag.
This also implicitly means that the EVAL*_RO and FCALL_RO commands can
never fail with an OOM error.

Note about a bug that's no longer relevant:
There was an issue with EVAL*_RO using shebang not being blocked correctly
in OOM state:
When an EVAL script declares a shebang, it was by default not allowed to run in
OOM state.
But this depends on a flag that is updated before the command is executed, which
was not updated in case of the `_RO` variants.
The result is that if the previous cached state was outdated (either true or false),
the script would either unjustly fail with OOM, or be unjustly allowed to run despite
the OOM state.
It doesn't affect scripts without a shebang since these depend on the actual
commands they run, and since these are only read commands, they don't care
for that cached oom state flag.
It did affect scripts with a shebang and no allow-oom flag, but after the change in
this PR, scripts that are run with eval_ro implicitly have that flag, so again
the cached state doesn't matter.

p.s. this isn't a breaking change since all it does is allow scripts to run when they
should / could rather than blocking them.
2022-05-22 16:02:59 +03:00
yoav-steinberg
cb6933e346
Updated HDR histogram from upstream after they merged our fix in #10606. (#10755)
The code is based on upstream https://github.com/HdrHistogram/HdrHistogram_c
master branch latest commit (e4448cf6d1cd08fff519812d3b1e58bd5a94ac42).
The reason to pull this in now is that their final version of our optimization is even faster.
See: https://github.com/HdrHistogram/HdrHistogram_c/pull/107.
2022-05-22 13:44:29 +03:00
Binbin
18cb4a7d93
Remove ziplist dead code in object.c (#10751)
Remove some dead code in object.c, ziplist is no longer used in 7.0

Some background:
zipmap - hash: replaced by ziplist in #285
ziplist - hash: replaced by listpack in #8887
ziplist - zset: replaced by listpack in #9366
ziplist - list: replaced by quicklist (listpack) in #2143 / #9740

Moved the location of ziplist.h in the server.c
2022-05-22 12:27:54 +03:00
Yuuoniy
4a7a4e42db
Fix memory leak in streamGetEdgeID (#10753)
si is initialized by streamIteratorStart(), we should call
streamIteratorStop() on it when done.

regression introduced in #9127 (redis 7.0)
2022-05-22 12:15:26 +03:00
Ofir Luzon
00a9d6b314
Add SIGINT handler to redis-cli --bigkeys, --memkeys, --hotkeys, --scan (#10736)
Finish the current loop and display the scanned keys summary on SIGINT (Ctrl-C) signal.
It will also prepend the current scanned percentage to the first line of the scanned keys summary.

In this commit I've renamed and relocated `intrinsicLatencyModeStop` function as I'm using the exact same logic.
2022-05-22 10:55:26 +03:00
Binbin
60250f50c2
Fix typos in module comment / documentation (#10740)
minor cleanup in redismodule.h and module.c
2022-05-18 08:29:39 +03:00
Ozan Tezcan
f62d52a5ad
Add const qualifier to config name parameter in RM_RegisterBoolConfig() (#10733)
was present in the C file and missing in the header.
2022-05-16 14:53:38 +03:00
Qu Chen
837c063baf
Replica fail and retry the PSYNC if the master is unresponsive (#10726)
We observed from our replication testing that when the master becomes unresponsive,
or the replication connection is broken during PSYNC so the replica doesn't get a
response from the master, it was not able to recognize that condition as a failure
and instead moved into the full-sync code path. This fix makes the replica fail and
retry the PSYNC with the master in such scenarios.
2022-05-16 12:08:19 +03:00
Qu Chen
a7d6ca9770
Make the check for if script is running or not consistent (#10725)
Sometimes the code uses `scriptIsRunning()` and other times it uses `server.in_script`.
We should use the `scriptIsRunning()` method consistently throughout the code base.
Removed server.in_script since it's no longer used / needed.
2022-05-15 14:07:45 +03:00
Tian
4b262182d9
Remove a redundant free in freeClient (#10721)
2022-05-14 19:38:26 -07:00
Stephen Sullivan
acd517c883
re-add SENTINEL SLAVES command, missing in redis 7.0 (#10723)
The alias was mistakenly forgotten when the subcommands were introduced as JSON files.
2022-05-13 19:47:58 +03:00
Wen Hui
135998ed8d
Update comments on command args, and a misleading error reply (#10645)
Updated the comments for:
info command
lmpopCommand and blmpopCommand
sinterGenericCommand 

Fix the missing "key" words in the srandmemberCommand function
For LPOS command, when rank is 0, prompt user that rank could be
positive number or negative number, and add a test for it
2022-05-13 17:55:49 +03:00
Binbin
586a16ad79
Fix race in module fork kill test (#10717)
The purpose of the test is to kill the child while it is running.
From the last two lines we can see the child exits before being killed.
```
- Module fork started pid: 56998
* <fork> fork child started
- Killing running module fork child: 56998
* <fork> fork child exiting
signal-handler (1652267501) Received SIGUSR1 in child, exiting now.
```

In this commit, we pass an argument to `fork.create` indicating how
long it should sleep. For the fork kill test, we use a longer time to
avoid the child exiting before being killed.

Other changes:
use wait_for_condition instead of hardcoded `after 250`.
Unify the test for failing fork with the one for killing it (save time)
2022-05-12 20:10:38 +03:00
yoav-steinberg
b16d1c2713
Fix possible regression around TLS config changes. Add VOLATILE_CONFIG flag for volatile configurations. (#10713)
This fixes a possible regression in Redis 7.0.0, in which doing CONFIG SET
on a TLS config would not reload the configuration in case the new config is
the same file as before.

A volatile configuration is a configuration value which is a reference to the
configuration data and not the configuration data itself. In such a case Redis
doesn't know if the config data changed under the hood and can't assume a
change happens only when the config value changes. Therefore it needs to
be applied even when setting a config value to the same value as it was before.
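
The decision described above can be sketched as follows. This is a simplified Python model, not Redis' config code: `config_set` and its parameters are hypothetical, and skipping the apply step for unchanged non-volatile values is taken from the description above.

```python
def config_set(current: dict, name: str, value: str, apply, volatile=frozenset()) -> bool:
    """Set `name` to `value`; return True if the apply callback ran."""
    unchanged = current.get(name) == value
    current[name] = value
    if unchanged and name not in volatile:
        return False      # same value and not volatile: nothing to re-apply
    apply(name, value)    # volatile configs are re-applied even when unchanged
    return True
```

Marking `tls-cert-file` as volatile means that setting it to the same path still reloads the file, which may have changed on disk.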
2022-05-12 17:06:26 +03:00
Yossi Gottlieb
b414605285
Update security page with supported versions. (#10712)
2022-05-11 16:18:02 +03:00
Yossi Gottlieb
8bdd2d5ddd
Fix Makefile.dep generation with ICC. (#10708)
Before this commit, all source files including those that are not going
to be compiled were used. Some of these files are platform specific and
won't even pre-process on another platform. With GCC/Clang, that's not
an issue and they'll simply ignore them, but ICC aborts in this case.

This commit only attempts to generate Makefile.dep from the actual set
of C source files that will be compiled.
2022-05-11 12:06:33 +03:00
Binbin
bfbb15f75d
redis-server command line arguments support take one bulk string with spaces for MULTI_ARG configs parsing. And allow options value to use the -- prefix (#10660)
## Take one bulk string with spaces for MULTI_ARG configs parsing
Currently redis-server looks for arguments that start with `--`,
and anything in between them is considered arguments for the config.
like: `src/redis-server --shutdown-on-sigint nosave force now --port 6380`

MULTI_ARG configs behave differently for CONFIG command, vs the command
line argument for redis-server.
i.e. CONFIG command takes one bulk string with spaces in it, while the
command line takes an argv array with multiple values.

In this PR, in config.c, if `argc > 1` we can take them as is,
and if the config is a `MULTI_ARG` and `argc == 1`, we will split it by spaces.

So both of these will be the same:
```
redis-server --shutdown-on-sigint nosave force now --shutdown-on-sigterm nosave force
redis-server --shutdown-on-sigint nosave "force now" --shutdown-on-sigterm nosave force
redis-server --shutdown-on-sigint nosave "force now" --shutdown-on-sigterm "nosave force"
```
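
The rule stated above (take multiple arguments as is, split a single argument by spaces) can be sketched in Python. `multi_arg_values` is a hypothetical helper, not the code in config.c:

```python
def multi_arg_values(args):
    """Normalize the values of a MULTI_ARG config.

    A single argument (as the CONFIG command passes it) is split by
    spaces; several arguments (as an argv array) are taken as is.
    """
    if len(args) == 1:
        return args[0].split()
    return list(args)
```

Both calling conventions then yield the same list, for example `["nosave", "force", "now"]`.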

## Allow options value to use the `--` prefix
Currently it decides to switch to the next config, as soon as it sees `--`, 
even if there was not a single value provided yet to the last config,
this makes it impossible to define a config value that has `--` prefix in it.

For instance, if we want to set the logfile to `--my--log--file`,
like `redis-server --logfile --my--log--file --loglevel verbose`,
current code will handle that incorrectly.

In this PR, now we allow a config value that has `--` prefix in it.
**But note that** something like `redis-server --some-config --config-value1 --config-value2 --loglevel debug`
would not work, because if you want to pass a value to a config starting with `--`, it can only be a single value.
like: `redis-server --some-config "--config-value1 --config-value2" --loglevel debug`

An example (using `--` prefix config value):
```
redis-server --logfile --my--log--file --loglevel verbose
redis-cli config get logfile loglevel
1) "loglevel"
2) "verbose"
3) "logfile"
4) "--my--log--file"
```
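
The grouping behaviour described in this section can be modelled roughly as below. This is a Python sketch under assumptions (the function name is hypothetical and the real parser in redis-server has more special cases): the first value after an option name is always taken as a value, and only later `--` tokens start a new option.

```python
def group_options(argv):
    """Group redis-server style arguments into (name, values) pairs."""
    options = []
    i = 0
    while i < len(argv):
        if not argv[i].startswith("--"):
            raise ValueError(f"expected an option name, got {argv[i]!r}")
        name = argv[i][2:]
        i += 1
        values = []
        # The first value belongs to the option even if it starts with "--".
        if i < len(argv):
            values.append(argv[i])
            i += 1
        # Later values stop at the next "--" token, which starts a new option.
        while i < len(argv) and not argv[i].startswith("--"):
            values.append(argv[i])
            i += 1
        options.append((name, values))
    return options
```

Under this model `--save --loglevel verbose` becomes a single `save` option with the values `--loglevel` and `verbose`, which is consistent with the breaking change noted below.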

### Potentially breaking change
`redis-server --save --loglevel verbose` used to work the same as `redis-server --save "" --loglevel verbose`
now, it'll error!
2022-05-11 11:33:35 +03:00
Binbin
783b210db4
FLUSHDB and FLUSHALL add call forceCommandPropagation / FLUSHALL reset dirty counter to 0 if we enable save (#10691)
## FLUSHALL
We used to restore the dirty counter after `rdbSave` zeroed it if we enable save.
Otherwise FLUSHALL will not be replicated nor put into the AOF.

And then we do increment it again below.
Without that extra dirty++, when db was already empty, FLUSHALL
will not be replicated nor put into the AOF.

We now gonna replace all that dirty counter magic with a call
to forceCommandPropagation (REPL and AOF), instead of all the
messing around with the dirty counter.
Added tests to cover three part (dirty counter, REPL, AOF).

One benefit other than cleaner code is that the `rdb_changes_since_last_save` is correct in this case.

## FLUSHDB
FLUSHDB was not replicated nor put into the AOF when db was already empty.
Unlike DEL on a non-existing key, FLUSHDB always does something, and that's to call the module hook. 
So basically FLUSHDB is never a NOP, and thus it should always be propagated.
Not doing that, could mean that if a module does something in that hook, and wants to
avoid issues of that hook being missing on the replica if the db is empty, it'll need to do complicated things.

So now FLUSHDB add call forceCommandPropagation, we will always propagate FLUSHDB.
Always propagating FLUSHDB seems like a safe approach that shouldn't have any drawbacks (other than looking odd)

This was mentioned in #8972

## Test section:
We actually found it while solving a race condition in the BGSAVE test (other.tcl).
It was found in extra_ci Daily Arm64 (test-libc-malloc).
```
[exception]: Executing test client: ERR Background save already in progress.
ERR Background save already in progress
```

It look like `r flushdb` trigger (schedule) a bgsave right after `waitForBgsave r` and before `r save`.
Changing flushdb to flushall, FLUSHALL will do a foreground save and then set the dirty counter to 0.
2022-05-11 11:21:16 +03:00
guybe7
815a6f846a
Dedicated member to hold RedisModuleCommand (#10681)
Fix #10552

We no longer piggyback getkeys_proc to hold the RedisModuleCommand struct, when it exists

Others:
Use `doesCommandHaveKeys` in `RM_GetCommandKeysWithFlags` and `getKeysSubcommandImpl`.
It causes a very minor behavioral change in commands that don't have actual keys, but have a spec
with `CMD_KEY_NOT_KEY`.
For example, before this change `COMMAND GETKEYS SPUBLISH` would return
`Invalid arguments specified for command` but now it returns `The command has no key arguments`
2022-05-10 14:56:12 +03:00
Mariya Markova
c2d8d4e648
Replace float zero comparison to FP_ZERO comparison (#10675)
I suggest using "[fpclassify](https://en.cppreference.com/w/cpp/numeric/math/fpclassify)" for comparing floats
with zero, because with some performance-oriented compiler optimizations the expression "value == 0" can be
evaluated as true for a value that is very close to zero.

Note: this code was introduced by 9d520a7f to accept zset scores that get ERANGE in conversion
due to precision loss near 0.
But with Intel compilers, ICC and ICX, where optimizations for the 0 check are more aggressive, "==0" is
true for the mentioned functions, although it should not be. The behavior is seen starting from O2.
This leads to a failure in the ZSCAN test in scan.tcl
2022-05-10 14:55:09 +03:00
Binbin
2a1ea8c7d8
CLUSTER SHARDS should return slots as integers, not strings (#10683)
It used to return slots as strings, like:
```
redis> cluster shards
1) 1) "slots"
   2) 1) "10923"
      2) "16383"
```

The CLUSTER SHARDS docs and the top comment of #10293 say that it returns integers.
Note that other commands, like CLUSTER SLOTS, return slots as integers.
Use addReplyLongLong instead of addReplyBulkLongLong, so that it now returns slots as integers:
```
redis> cluster shards
1) 1) "slots"
   2) 1) (integer) 10923
      2) (integer) 16383
```
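
On the wire, the difference is between a RESP bulk string and a RESP integer. A small Python sketch of the two encodings (the helper names are hypothetical; the formats follow the RESP protocol):

```python
def resp_bulk_string(n: int) -> bytes:
    """Encode a number as a RESP bulk string: $<length>\\r\\n<data>\\r\\n."""
    text = str(n)
    return f"${len(text)}\r\n{text}\r\n".encode()


def resp_integer(n: int) -> bytes:
    """Encode a number as a RESP integer: :<number>\\r\\n."""
    return f":{n}\r\n".encode()
```

Clients that parse replies by type will therefore see a different type for the slot values after this change.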

This is a small breaking change, introduced in 7.0.0 (7.0 RC3, #10293)

Fixes #10680
2022-05-10 14:22:01 +03:00
Meir Shpilraien (Spielrein)
442e73ea09
Fix #10705, avoid relinking the same library twice. (#10706)
Set `old_li` to NULL to avoid linking it again on error.
Before the fix, loading an already existing library would cause the existing library to be added again. This causes no harm other than wrong statistics. The statistics that are affected by the issue are:
* `libraries_count` and `functions_count` returned by `function stats` command
* `used_memory_functions` returned on `info memory` command
* `functions.caches` returned on `memory stats` command
2022-05-10 11:47:45 +03:00
Ozan Tezcan
a3df2777e8
Fix cursor type in RedisModuleScanCursor (#10698)
Changed the cursor's type from `int` to `unsigned long`,
which allows handling a database or key with more than 2^31 elements.
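
The limit can be seen by storing 2^31 in a 32-bit signed integer. The Python sketch below uses `ctypes`, which performs no overflow checking; modelling `unsigned long` as 64 bits is an assumption that holds on LP64 platforms.

```python
import ctypes


def as_int32(value: int) -> int:
    """Value as it would be stored in a 32-bit signed int (wraps around)."""
    return ctypes.c_int32(value).value


def as_uint64(value: int) -> int:
    """Value as stored in a 64-bit unsigned integer."""
    return ctypes.c_uint64(value).value
```

A cursor of 2^31 wraps to a negative number in the first case and is preserved in the second.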
2022-05-09 18:45:51 +03:00
Lu JJ
6b44e4ea92
fix some typos in "t_zset.c" (#10670)
fix some typo in "t_zset.c".
1. `zzlisinlexrange` the function name mentioned in the comment is misspelled.
2. fix typo in function name `zarndmemberReplyWithListpack` -> `zrandmemberReplyWithListpack`
2022-05-09 15:04:39 +03:00
Oran Agra
2bcd890d8a
Fix --save command line regression in redis 7.0.0 (#10690)
An unintentional change in #9644 (since RC1) meant that an empty `--save ""` config
from the command line wouldn't clear any setting from the config file.

Added tests to cover that, and improved test infra to take additional
command line args for redis-server
2022-05-09 13:37:49 +03:00
Oran Agra
eb915a82a5
Bug fixes for enum configs with overlapping bit flags (module API) (#10661)
If we want to support bits that can be overlapping, we need to make sure
that:
1. we don't use the same bit for two return values.
2. values should be sorted so that preferred ones (matching more
   bits) come first.
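
Both rules can be illustrated with a small decoder that turns a bitmask back into names. This is a Python sketch with a made-up enum; the names `flags_to_names`, `all`, `read` and `write` are hypothetical and not part of the module API.

```python
def flags_to_names(value: int, enum) -> list:
    """Decode a bitmask into enum names, preferring entries that match
    more bits and never reporting the same bit twice."""
    names = []
    remaining = value
    # Rule 2: entries covering more bits are tried first.
    ordered = sorted(enum, key=lambda e: bin(e[1]).count("1"), reverse=True)
    for name, bits in ordered:
        if bits and (remaining & bits) == bits:
            names.append(name)
            remaining &= ~bits  # Rule 1: consumed bits are not reused.
    return names


# Hypothetical enum where "all" overlaps with "read" and "write".
EXAMPLE_ENUM = [("read", 0b01), ("write", 0b10), ("all", 0b11)]
```

With this ordering, the value `0b11` decodes to `all` rather than to `read write`.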
2022-05-09 13:36:53 +03:00
Ozan Tezcan
8fc959216c
Fix RM_Scan() documentation (#10693)
Fixed RM_Scan() usage example: `RedisModuleCursor` -> `RedisModuleScanCursor`
2022-05-09 12:38:45 +03:00
David CARLIER
bdcd4b3df8
zmalloc_get_rss implementation for haiku. (#10687)
also fixing already defined constants build warning while at it.

Co-authored-by: Oran Agra <oran@redislabs.com>
2022-05-08 15:12:17 +03:00
Shaya Potter
4e761eb7e2
update redismodule notify defines to be in sync with server (#10688)
this seems to have been an oversight. syncing the flags so that NOTIFY_NEW is available to modules.
missing in #10512
2022-05-08 15:05:24 +03:00
dependabot[bot]
ff3a3577f2
Bump github/codeql-action from 1 to 2 (#10635)
* Bump github/codeql-action from 1 to 2

Bumps [github/codeql-action](https://github.com/github/codeql-action) from 1 to 2.
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](https://github.com/github/codeql-action/compare/v1...v2)

---
updated-dependencies:
- dependency-name: github/codeql-action
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>

* Avoid CodeQL on push error.

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-05-04 11:40:08 +03:00
Viktor Söderqvist
ced710fc83
Module API doc script: Mark unreleased API functions (#10674)
* Module API doc script: Mark unreleased API functions

* fix broken quotes in generate-module-api-doc.rb

Co-authored-by: Oran Agra <oran@redislabs.com>
2022-05-03 10:34:18 +03:00
Meir Shpilraien (Spielrein)
f44c343292
Expose Lua error in case of string error. (#10677)
In general, our error handler makes sure the error
object is always a table. In some rare cases (such
as OOM error), the error handler will not be called
and the error object will be a string. The PR exposes
the error even if it's a string and not a table.

Currently there is no way to test it but if it'll ever happen,
it is better to propagate this string upwards than just
generate a generic error without any specific info.
2022-05-03 10:24:05 +03:00
Lu JJ
87131a5fa6
fast path when SDIFF command has the same key as the first key (#10663)
When user uses the same input key for SDIFF as the first one, the result must be empty, so we don't need to process the elements to test.

This method is like the one done in zset's `zsetChooseDiffAlgorithm`
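
A minimal Python model of the fast path, using a plain dict of sets in place of the keyspace (the function and its arguments are hypothetical, not the code in t_set.c):

```python
def sdiff(db: dict, keys: list) -> set:
    """Return the members of the first set that are in none of the others."""
    first = keys[0]
    # Fast path: subtracting a set from itself always leaves nothing.
    if first in keys[1:]:
        return set()
    result = set(db.get(first, ()))
    for key in keys[1:]:
        result -= db.get(key, set())
    return result
```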

Co-authored-by: Oran Agra <oran@redislabs.com>
2022-05-02 16:18:11 +03:00
wutao_water
9c39256a28
cleanup: use ZIPLIST_ENTRY_END macro instead of 1 (#3672)
Update the macro ZIPLIST_ENTRY_END; I think the right definition is `((zl)+intrev32ifbe(ZIPLIST_BYTES(zl))-ZIPLIST_END_SIZE)`
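
To see why `ZIPLIST_END_SIZE` is the right quantity to subtract, recall the ziplist layout: a 4-byte total size, a 4-byte tail offset, a 2-byte length, the entries, and a single `0xFF` end byte. The Python sketch below builds an empty ziplist with `struct` and computes the same offset; the function names are hypothetical.

```python
import struct

ZIPLIST_END_SIZE = 1  # the single 0xFF end marker


def new_empty_ziplist() -> bytes:
    """Header (zlbytes=11, zltail=10, zllen=0, little endian) plus end byte."""
    return struct.pack("<IIH", 11, 10, 0) + b"\xff"


def entry_end_offset(zl: bytes) -> int:
    """Offset of the end marker: zlbytes - ZIPLIST_END_SIZE."""
    (zlbytes,) = struct.unpack_from("<I", zl, 0)
    return zlbytes - ZIPLIST_END_SIZE
```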
2022-05-02 12:46:48 +03:00
David CARLIER
ef68deb3c2
support tcp-keepalive config interval on macOs (#10667)
Till now, on MacOS we only used to enable SO_KEEPALIVE,
but we didn't set the interval which is configurable via the `tcp-keepalive` config.
This adds support for that on MacOS, to match what we already do on Linux.
2022-05-02 09:37:14 +03:00
Lu JJ
1666ffbe1f
fix typo in 'setTypeRandomElement' (#10662)
`the redis object pointer was populated.` -> `the sds pointer was populated.`
We don't populate the redis object pointer in this function.
2022-05-01 12:09:44 +03:00
Lu JJ
02080f2686
add comment to 'container' in 'quicklist.h' (#10656)
add a comment to `container` in `quicklist.h`.
Because `PLAIN` and `PACKED` are not as easy to understand as `NONE`
and `LISTPACK` and we don't have a detailed comment on it.

Co-authored-by: Oran Agra <oran@redislabs.com>
2022-04-28 08:36:40 +03:00
Itamar Haber
6c65345edb
Injects Hugo FrontMatter to module-api.md (#10658)
2022-04-28 08:16:20 +03:00
Wen Hui
f36eac9f68
Update the comments of commands introduced or updated in redis 7.0 (#10659)
2022-04-28 08:13:04 +03:00
Oran Agra
89772ed827
Merge pull request #10651 from oranagra/meir_lua_readonly_tables
# Lua readonly tables
The PR adds support for readonly tables on Lua to prevent security vulnerabilities:
* (CVE-2022-24736) An attacker attempting to load a specially crafted Lua script
  can cause NULL pointer dereference which will result with a crash of the
  redis-server process. This issue affects all versions of Redis.
* (CVE-2022-24735) By exploiting weaknesses in the Lua script execution
  environment, an attacker with access to Redis can inject Lua code that will
  execute with the (potentially higher) privileges of another Redis user.

The PR is split into 4 commits.

### Change Lua to support readonly tables

This PR modifies the Lua interpreter code to support a new flag on tables. The new flag indicates that the table is readonly, and any attempt to write to such a table will result in an error. The new feature can be turned on and off using the new `lua_enablereadonlytable` Lua API. The new API can be used **only** from C code. The changes to support this feature were taken from https://luau-lang.org/

### Change eval script to set user code on Lua registry

Today, Redis wraps the user's Lua code with a Lua function. For example, assuming the user code is:

```
return redis.call('ping')
```

The actual code that would have been sent to the Lua interpreter was:

```
function f_b3a02c833904802db9c34a3cf1292eee3246df3c() return redis.call('ping') end
```

The wrapped code would have been saved in the global dictionary under the name `f_<script sha>` (in our example `f_b3a02c833904802db9c34a3cf1292eee3246df3c`). This approach allows one user to easily override the implementation of another user's code, for example:

```
f_b3a02c833904802db9c34a3cf1292eee3246df3c = function() return 'hacked' end
```

Running the above code will cause `evalsha b3a02c833904802db9c34a3cf1292eee3246df3c 0` to return `hacked` although it should have returned `pong`. Another disadvantage is that Redis basically runs code during the loading (compiling) phase without being aware of it. A user can do code injection like this:

```
return 1 end <run code on compiling phase> function() return 1
```

The wrapped code will look like this, and the entire `<run code on compiling phase>` block will run outside of the eval or evalsha context:

```
function f_<sha>() return 1 end <run code on compiling phase> function() return 1 end
```

The commit puts the user code in a special Lua table called the registry. This table is not accessible to the user, so it cannot be manipulated by them. There is also no longer a need to wrap the user code, so there is no risk of code injection that would cause code to run in the wrong context.

### Use `lua_enablereadonlytable` to protect global tables on eval and function

The commit uses the new `lua_enablereadonlytable` Lua API to protect the global tables of both eval scripts and functions. For eval scripts, the implementation is easy: we simply call `lua_enablereadonlytable` on the global table to turn it into a readonly table.

For functions it's more complicated: we want to be able to switch globals between load run and function run. To achieve this, we create a new empty table that acts as the globals table for functions, and we control the actual globals using metatable manipulations. Notice that even if the user gets a pointer to the original tables, all the tables are set to be readonly (using the `lua_enablereadonlytable` Lua API), so they cannot be changed. The following better explains the solution:

```
Global table {} <- global table metatable {.__index = __real_globals__}
```

The `__real_globals__` table depends on the run context (function load or function call).

Why is this solution needed, and why is it not enough to simply switch globals? When we run in the context of function load and create our functions, our functions get the current globals that were set when they were created. Replacing the globals after the creation will not affect them. This is why this trick is mandatory.

### Protect the rest of the global API and add an allowed list to the provided API

The allowed list is implemented by setting a metatable on the global table before initialising any library. The metatable sets the `__newindex` field to a function that checks the allowed list before adding the field to the table. Fields that are not on the
allowed list are simply ignored.

After the initialisation phase is done, we protect the global table and each table that might be reachable from the global table. For each table we also protect the table's metatable if it exists.
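
The two phases can be illustrated with a Python analogy. This is not Lua and not the Redis implementation: `GuardedGlobals` is a hypothetical class whose `__setitem__` plays the role of the `__newindex` check, and `protect()` stands in for marking the table readonly.

```python
class GuardedGlobals(dict):
    """Analogy: a globals table with an allowed list and a readonly switch."""

    def __init__(self, allowed):
        super().__init__()
        self._allowed = frozenset(allowed)
        self._readonly = False

    def __setitem__(self, key, value):
        if self._readonly:
            raise TypeError("attempt to modify a readonly table")
        if key not in self._allowed:
            return  # fields outside the allowed list are silently ignored
        super().__setitem__(key, value)

    def protect(self):
        """Called once initialisation is done; later writes raise."""
        self._readonly = True
```

Only item assignment is guarded in this sketch; other `dict` mutators such as `update` would need the same treatment in a complete version.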

### Performance

Performance tests were done on a private computer, and their only purpose is to show that this fix does not cause any performance regression.

case 1: `return redis.call('ping')`
case 2: `for i=1,10000000 do redis.call('ping') end`

|                             | Unstable eval | Unstable function | lua_readonly_tables eval | lua_readonly_tables function |
|-----------------------------|---------------|-------------------|--------------------------|------------------------------|
| case1 ops/sec               | 235904.70     | 236406.62         | 232180.16                | 230574.14                    |
| case1 avg latency ms        | 0.175         | 0.164             | 0.178                    | 0.149                        |
| case2 total time in seconds | 3.373         | 3.444             | 3.268                    | 3.278                        |

### Breaking changes

* `print` function was removed from Lua because it can potentially cause the Redis processes to get stuck (if no one reads from stdout). Users should use redis.log. An alternative is to override the `print` implementation and print the message to the log file.

All the work by @MeirShpilraien, I'm just publishing it.
2022-04-27 12:48:51 +03:00
chenyang8094
e24d46004b
Delete renamed new incr when write manifest failed (#10649)
Followup fix for #10616
2022-04-27 08:07:52 +03:00
meir
efa162bcd7
Protect any table which is reachable from globals and added globals white list.
The white list is implemented by setting a metatable on the global table before initializing
any library. The metatable sets the `__newindex` field to a function that checks
the white list before adding the field to the table. Fields that are not on the
white list are simply ignored.

After the initialization phase is done, we protect the global table and each table
that might be reachable from the global table. For each table we also protect
the table's metatable if it exists.
2022-04-27 00:37:40 +03:00
meir
3731580b6b
Protect globals of both eval scripts and functions.
Use the new `lua_enablereadonlytable` Lua API to protect the global tables of
both eval scripts and functions. For eval scripts, the implementation is easy:
we simply call `lua_enablereadonlytable` on the global table to turn it into
a readonly table.

For functions it's more complicated: we want to be able to switch globals between
load run and function run. To achieve this, we create a new empty table that
acts as the globals table for functions, and we control the actual globals using metatable
manipulation. Notice that even if the user gets a pointer to the original tables, all
the tables are set to be readonly (using the `lua_enablereadonlytable` Lua API), so they
cannot be changed. The following illustration better explains the solution:

```
Global table {} <- global table metatable {.__index = __real_globals__}
```

The `__real_globals__` table is set depending on the run context (function load or function call).

Why is this solution needed, and why is it not enough to simply switch globals?
When we run in the context of function load and create our functions, our functions get
the current globals that were set when they were created. Replacing the globals after
the creation will not affect them. This is why this trick is mandatory.
2022-04-27 00:37:40 +03:00
meir
992f9e23c7
Move user eval function to be located on Lua registry.
Today, Redis wraps the user's Lua code with a Lua function.
For example, assuming the user code is:

```
return redis.call('ping')
```

The actual code that would have been sent to the Lua interpreter was:

```
function f_b3a02c833904802db9c34a3cf1292eee3246df3c() return redis.call('ping') end
```

The wrapped code would have been saved in the global dictionary under the
following name: `f_<script sha>` (in our example `f_b3a02c833904802db9c34a3cf1292eee3246df3c`).

This approach allows one user to easily override the implementation of another user's code, for example:

```
f_b3a02c833904802db9c34a3cf1292eee3246df3c = function() return 'hacked' end
```

Running the above code will cause `evalsha b3a02c833904802db9c34a3cf1292eee3246df3c 0` to return
`hacked` although it should have returned `pong`.

Another disadvantage is that Redis basically runs code during the loading (compiling) phase without being
aware of it. A user can do code injection like this:

```
return 1 end <run code on compiling phase> function() return 1
```

The wrapped code will look like this, and the entire `<run code on compiling phase>` block will run outside
of the eval or evalsha context:

```
function f_<sha>() return 1 end <run code on compiling phase> function() return 1 end
```
2022-04-27 00:20:54 +03:00
meir
8b33d813a3
Added support for Lua readonly tables.
The new feature can be turned on and off using the new `lua_enablereadonlytable` Lua API.
2022-04-27 00:20:54 +03:00