redict/tests/modules
debing.sun d0640029dc
Fix race condition issues between the main thread and module threads (#12817)
Fix #12785 and other race condition issues.
See the following isolated comments.

The following report was obtained using SANITIZER thread.
```sh
make SANITIZER=thread
./runtest-moduleapi --config io-threads 4 --config io-threads-do-reads yes --accurate
```

1. Fixed thread-safe issue in RM_UnblockClient()
Related discussion:
https://github.com/redis/redis/pull/12817#issuecomment-1831181220
* When blocking a client in a module using `RM_BlockClientOnKeys()` or
`RM_BlockClientOnKeysWithFlags()`
with a timeout_callback, calling RM_UnblockClient() in module threads
can lead to race conditions
     in `updateStatsOnUnblock()`.

     - Introduced: 
        Version: 6.2
        PR: #7491

     - Touch:
`server.stat_numcommands`, `cmd->latency_histogram`, `server.slowlog`,
and `server.latency_events`
     
     - Harm Level: High
Potentially corrupts the memory data of `cmd->latency_histogram`,
`server.slowlog`, and `server.latency_events`

     - Solution:
Differentiate whether the call to moduleBlockedClientTimedOut() comes
from the module or the main thread.
Since we can't know if RM_UnblockClient() comes from module threads, we
always assume it does and
let `updateStatsOnUnblock()` asynchronously update the unblock status.
     
* When error reply is called in timeout_callback(), ctx is not
thread-safe, eventually lead to race conditions in `afterErrorReply`.

     - Introduced: 
        Version: 6.2
        PR: #8217

     - Touch
       `server.stat_total_error_replies`, `server.errors`, 

     - Harm Level: High
       Potentially corrupts the memory data of `server.errors`
   
      - Solution: 
Make the ctx in `timeout_callback()` with `REDISMODULE_CTX_THREAD_SAFE`,
and asynchronously reply errors to the client.

2. Made RM_Reply*() family API thread-safe
Related discussion:
https://github.com/redis/redis/pull/12817#discussion_r1408707239
Call chain: `RM_Reply*()` -> `_addReplyToBufferOrList()` -> touch
server.current_client

    - Introduced: 
       Version: 7.2.0
       PR: #12326

   - Harm Level: None
Since the module fake client won't have the `CLIENT_PUSHING` flag, even
if we touch server.current_client,
     we can still exit after `c->flags & CLIENT_PUSHING`.

   - Solution
      Checking `c->flags & CLIENT_PUSHING` earlier.

3. Made freeClient() thread-safe
    Fix #12785

    - Introduced: 
       Version: 4.0
Commit:
3fcf959e60

    - Harm Level: Moderate
       * Trigger assertion
It happens when the module thread calls freeClient while the io-thread
is in progress,
which just triggers an assertion, and doesn't make any race condiaions.

* Touch `server.current_client`, `server.stat_clients_type_memory`, and
`clientMemUsageBucket->clients`.
It happens between the main thread and the module threads, may cause
data corruption.
1. Error reset `server.current_client` to NULL, but theoretically this
won't happen,
because the module has already reset `server.current_client` to old
value before entering freeClient.
2. corrupts `clientMemUsageBucket->clients` in
updateClientMemUsageAndBucket().
3. Causes server.stat_clients_type_memory memory statistics to be
inaccurate.
    
    - Solution:
* No longer counts memory usage on fake clients, to avoid updating
`server.stat_clients_type_memory` in freeClient.
* No longer resetting `server.current_client` in unlinkClient, because
the fake client won't be evicted or disconnected in the mid of the
process.
* Judgment assertion `io_threads_op == IO_THREADS_OP_IDLE` only if c is
not a fake client.

4. Fixed free client args without GIL
Related discussion:
https://github.com/redis/redis/pull/12817#discussion_r1408706695
When freeing retained strings in the module thread (refcount decr), or
using them in some way (refcount incr), we should do so while holding
the GIL,
otherwise, they might be simultaneously freed while the main thread is
processing the unblock client state.

    - Introduced: 
       Version: 6.2.0
       PR: #8141

   - Harm Level: Low
     Trigger assertion or double free or memory leak. 

   - Solution:
Documenting that module API users need to ensure any access to these
retained strings is done with the GIL locked

5. Fix adding fake client to server.clients_pending_write
    It will incorrectly log the memory usage for the fake client.
Related discussion:
https://github.com/redis/redis/pull/12817#issuecomment-1851899163

    - Introduced: 
       Version: 4.0
Commit:
9b01b64430

    - Harm Level: None
      Only result in NOP

    - Solution:
       * Don't add fake client into server.clients_pending_write
* Add c->conn assertion for updateClientMemUsageAndBucket() and
updateClientMemoryUsage() to avoid same
         issue in the future.
So now it will be the responsibility of the caller of both of them to
avoid passing in fake client.

6. Fix calling RM_BlockedClientMeasureTimeStart() and
RM_BlockedClientMeasureTimeEnd() without GIL
    - Introduced: 
       Version: 6.2
       PR: #7491

   - Harm Level: Low
Causes inaccuracies in command latency histogram and slow logs, but does
not corrupt memory.

   - Solution:
Module API users, if know that non-thread-safe APIs will be used in
multi-threading, need to take responsibility for protecting them with
their own locks instead of the GIL, as using the GIL is too expensive.

### Other issue
1. RM_Yield is not thread-safe, fixed via #12905.

### Summarize
1. Fix thread-safe issues for `RM_UnblockClient()`, `freeClient()` and
`RM_Yield`, potentially preventing memory corruption, data disorder, or
assertion.
2. Updated docs and module test to clarify module API users'
responsibility for locking non-thread-safe APIs in multi-threading, such
as RM_BlockedClientMeasureTimeStart/End(), RM_FreeString(),
RM_RetainString(), and RM_HoldString().

### About backpot to 7.2
1. The implement of (1) is not too satisfying, would like to get more
eyes.
2. (2), (3) can be safely for backport
3. (4), (6) just modifying the module tests and updating the
documentation, no need for a backpot.
4. (5) is harmless, no need for a backpot.

---------

Co-authored-by: Oran Agra <oran@redislabs.com>
2024-01-19 15:12:49 +02:00
..
aclcheck.c Allows modules to declare new ACL categories. (#12486) 2023-08-30 13:01:24 -07:00
auth.c Fix usleep compilation warning in auth.c (#11925) 2023-03-16 11:24:52 +02:00
basics.c Skip test for sdsRemoveFreeSpace when mem_allocator is not jemalloc (#11878) 2023-03-07 09:06:58 +02:00
blockedclient.c Fix race condition issues between the main thread and module threads (#12817) 2024-01-19 15:12:49 +02:00
blockonbackground.c Fix race condition issues between the main thread and module threads (#12817) 2024-01-19 15:12:49 +02:00
blockonkeys.c Modules: Unblock from within a timer coverage (#12337) 2023-06-22 23:15:16 +03:00
cmdintrospection.c Overhauls command summaries and man pages. (#11942) 2023-03-29 20:48:59 +03:00
commandfilter.c Un-register notification and server event when RedisModule_OnLoad fails (#12809) 2023-11-27 17:26:33 +02:00
crash.c Handle recursive serverAsserts and provide more information for recursive segfaults (#12857) 2024-01-02 18:20:22 -08:00
datatype2.c Use const char pointer in redismodule.h as far as possible (#10064) 2022-01-18 15:55:20 +02:00
datatype.c Module commands to have ACL categories. (#11708) 2023-03-21 10:07:11 -07:00
defragtest.c Build TLS as a loadable module 2022-08-23 12:37:56 +03:00
eventloop.c delete obsolete REDISMODULE_EXPERIMENTAL_API define in module demos (#10527) 2022-04-05 08:21:41 +03:00
fork.c Fix race in module fork kill test (#10717) 2022-05-12 20:10:38 +03:00
getchannels.c Implemented module getchannels api and renamed channel keyspec (#10299) 2022-02-22 11:00:03 +02:00
getkeys.c Handle key-spec flags with modules (#10237) 2022-02-08 10:01:35 +02:00
hash.c Sort out the mess around writable replicas and lookupKeyRead/Write (#9572) 2021-11-28 11:26:28 +02:00
hooks.c Un-register notification and server event when RedisModule_OnLoad fails (#12809) 2023-11-27 17:26:33 +02:00
infotest.c Update supported version list. (#12488) 2023-08-16 08:36:40 +03:00
keyspace_events.c Un-register notification and server event when RedisModule_OnLoad fails (#12809) 2023-11-27 17:26:33 +02:00
keyspecs.c RM_CreateCommand should not set CMD_KEY_VARIABLE_FLAGS automatically (#11320) 2022-09-28 14:15:07 +03:00
list.c Fix crash due to to reuse iterator entry after list deletion in module (#11383) 2022-10-22 20:36:50 +03:00
Makefile Handle recursive serverAsserts and provide more information for recursive segfaults (#12857) 2024-01-02 18:20:22 -08:00
mallocsize.c Add RM_MallocSizeString, RM_MallocSizeDict (#10542) 2022-04-17 08:31:57 +03:00
misc.c Match REDISMODULE_OPEN_KEY_* flags to LOOKUP_* flags (#11772) 2023-02-09 14:59:05 +02:00
moduleauthtwo.c Custom authentication for Modules (#11659) 2023-03-15 15:18:42 -07:00
moduleconfigs.c Module commands to have ACL categories. (#11708) 2023-03-21 10:07:11 -07:00
moduleconfigstwo.c Module Configurations (#10285) 2022-03-30 15:47:06 +03:00
postnotifications.c Fix delKeysInSlot server events are not executed inside an execution unit (#12745) 2023-12-11 20:15:19 +02:00
propagate.c sub-command support for ACL CAT and COMMAND LIST. redisCommand always stores fullname (#10127) 2022-01-23 10:05:06 +02:00
publish.c Fix broken protocol when PUBLISH emits local push inside MULTI (#12326) 2023-06-20 20:41:41 +03:00
rdbloadsave.c Add RM_RdbLoad and RM_RdbSave module API functions (#11852) 2023-04-09 12:07:32 +03:00
reply.c Add RM_ReplyWithErrorFormat that can support format (#11923) 2023-04-12 10:11:29 +03:00
scan.c Replace deprecated REDISMODULE_POSTPONED_ARRAY_LEN in module tests and examples (#9677) 2021-10-25 12:00:43 +03:00
stream.c Sort out the mess around writable replicas and lookupKeyRead/Write (#9572) 2021-11-28 11:26:28 +02:00
subcommands.c Block some specific characters in module command names (#11434) 2022-11-03 13:19:49 +02:00
test_lazyfree.c Sort out the mess around writable replicas and lookupKeyRead/Write (#9572) 2021-11-28 11:26:28 +02:00
testrdb.c Avoid saving module aux on RDB if no aux data was saved by the module. (#11374) 2022-10-18 19:45:46 +03:00
timer.c Modules: Mark all APIs non-experimental (#9983) 2021-12-30 12:17:22 +02:00
usercall.c Fix race condition issues between the main thread and module threads (#12817) 2024-01-19 15:12:49 +02:00
zset.c Delete empty key if fails after moduleCreateEmptyKey() in module (#12129) 2023-05-07 10:13:19 +03:00