Fix CLUSTER SHARDS crash in 7.0/7.2 mixed clusters where shard ids are not in sync (#12832)

Crash reported in #12695. While a cluster is being upgraded from 7.0 to
7.2, the 7.0 nodes do not gossip shard ids, yet the 7.2 nodes rely on
shard ids to build the server.cluster->shards dict.
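
To see why shard ids cannot be trusted during the upgrade, here is a toy model. It is not Redis code: `toy_node`, `record_node`, `make_local_shard_id` and `same_shard` are hypothetical names. We assume that when a peer advertises no shard id, as a 7.0 node does not, the local 7.2 node falls back to an id it generates itself; the deterministic generator below only stands in for a random one so the example is reproducible.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumption: a shard id is 40 hex characters, like a node name. */
#define SHARD_ID_LEN 40

typedef struct {
    char name[64];
    char shard_id[SHARD_ID_LEN + 1];
} toy_node;

/* Hypothetical stand-in for a randomly generated shard id. It is
 * deterministic per seed only so that the example is reproducible. */
static void make_local_shard_id(char *out, unsigned seed) {
    static const char hex[] = "0123456789abcdef";
    for (unsigned i = 0; i < SHARD_ID_LEN; i++)
        out[i] = hex[(seed + i * 7u) % 16u];
    out[SHARD_ID_LEN] = '\0';
}

/* Record a node as seen by the local (7.2) node. 'advertised_id' is NULL
 * when the node advertises no shard id, as a 7.0 node does not; the
 * local node then has to invent one. */
static void record_node(toy_node *n, const char *name,
                        const char *advertised_id, unsigned seed) {
    snprintf(n->name, sizeof(n->name), "%s", name);
    if (advertised_id != NULL)
        snprintf(n->shard_id, sizeof(n->shard_id), "%s", advertised_id);
    else
        make_local_shard_id(n->shard_id, seed);
}

/* A dict keyed by shard id groups two nodes only if their ids match. */
static int same_shard(const toy_node *a, const toy_node *b) {
    return strcmp(a->shard_id, b->shard_id) == 0;
}
```

Under this assumption nothing ties the id invented for a 7.0 master to its 7.2 replica's own id, so grouping nodes by shard id does not reliably group a primary with its replicas until every node gossips shard ids.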

This breaks in some topologies, for example a 7.0 master with a 7.2
replica. From the 7.2 replica's point of view, the cluster->shards dict
does not contain its master node, and calling CLUSTER SHARDS on the 7.2
replica may crash.
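
The missing-entry condition can be modelled with a toy shards table (hypothetical types and functions, not the Redis data structures). A primary that is known to the replica but was never filed in the table is invisible to anything that walks the table, so the replica's shard appears to have no primary. This sketch only reproduces that condition; it does not reproduce the actual CLUSTER SHARDS code path that crashed in #12695.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_ENTRIES 16

typedef struct toy_node {
    const char *name;
    const char *shard_id;
    struct toy_node *master; /* NULL for a primary */
} toy_node;

/* Toy stand-in for server.cluster->shards: (shard id, node) pairs. */
typedef struct {
    struct { const char *shard_id; toy_node *node; } e[MAX_ENTRIES];
    int n;
} toy_shards;

static void shards_add(toy_shards *s, toy_node *node) {
    assert(s->n < MAX_ENTRIES);
    s->e[s->n].shard_id = node->shard_id;
    s->e[s->n].node = node;
    s->n++;
}

/* 1 if 'node' is filed anywhere in the table. */
static int shards_contains(const toy_shards *s, const toy_node *node) {
    for (int i = 0; i < s->n; i++)
        if (s->e[i].node == node)
            return 1;
    return 0;
}

/* Primary of shard 'shard_id' as seen through the table, or NULL. */
static toy_node *shard_primary(const toy_shards *s, const char *shard_id) {
    for (int i = 0; i < s->n; i++)
        if (strcmp(s->e[i].shard_id, shard_id) == 0 && s->e[i].node->master == NULL)
            return s->e[i].node;
    return NULL;
}
```

Code that assumes every shard has a visible primary and dereferences the result would fault in this state. The commit does not add such a NULL check; it removes the condition by keeping the table complete.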

The fix makes the assumption underlying updateShardId actually hold: the
shards dict must always be in sync with each node's shard_id. The fix was
suggested by PingXie; see #12695 for more details.
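
The invariant can be restated as a check: every known node is filed in the shards table exactly once, under a key equal to its own shard_id. The sketch below uses hypothetical names (`toy_shards`, `set_shard_id`, `shards_consistent`); `set_shard_id` mirrors the remove, update, re-add pattern such a function needs and is not Redis's updateShardId.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define ID_LEN 40
#define MAX_ENTRIES 16

typedef struct {
    const char *name;
    char shard_id[ID_LEN + 1];
} toy_node;

/* Toy shards table: each entry remembers the key it was filed under. */
typedef struct {
    struct { char key[ID_LEN + 1]; toy_node *node; } e[MAX_ENTRIES];
    int n;
} toy_shards;

static void shards_add(toy_shards *s, toy_node *node) {
    assert(s->n < MAX_ENTRIES);
    snprintf(s->e[s->n].key, sizeof(s->e[s->n].key), "%s", node->shard_id);
    s->e[s->n].node = node;
    s->n++;
}

static void shards_remove(toy_shards *s, const toy_node *node) {
    for (int i = 0; i < s->n; i++) {
        if (s->e[i].node == node) {
            s->e[i] = s->e[s->n - 1]; /* swap-remove; order is irrelevant */
            s->n--;
            return;
        }
    }
}

/* Change a node's shard id without letting the table drift:
 * remove under the old key, update the field, re-add under the new key. */
static void set_shard_id(toy_shards *s, toy_node *node, const char *new_id) {
    shards_remove(s, node);
    snprintf(node->shard_id, sizeof(node->shard_id), "%s", new_id);
    shards_add(s, node);
}

/* 1 if each of the 'count' nodes is filed exactly once under its own
 * shard id and the table holds nothing else. */
static int shards_consistent(const toy_shards *s, toy_node *const *nodes, int count) {
    if (s->n != count)
        return 0;
    for (int i = 0; i < count; i++) {
        int hits = 0;
        for (int j = 0; j < s->n; j++)
            if (s->e[j].node == nodes[i] &&
                strcmp(s->e[j].key, nodes[i]->shard_id) == 0)
                hits++;
        if (hits != 1)
            return 0;
    }
    return 1;
}
```

Writing `node->shard_id` directly (or adding a node record without filing it) is exactly what makes `shards_consistent` fail; routing every change through one remove/update/re-add helper is what keeps it true.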
Binbin 2024-01-08 12:54:41 +08:00 committed by GitHub
parent ca1f67af80
commit 5b0c6a8255

@@ -1609,6 +1609,7 @@ void clusterRenameNode(clusterNode *node, char *newname) {
     serverAssert(retval == DICT_OK);
     memcpy(node->name, newname, CLUSTER_NAMELEN);
     clusterAddNode(node);
+    clusterAddNodeToShard(node->shard_id, node);
 }
 void clusterAddNodeToShard(const char *shard_id, clusterNode *node) {
@@ -2156,6 +2157,7 @@ void clusterProcessGossipSection(clusterMsg *hdr, clusterLink *link) {
                 node->tls_port = msg_tls_port;
                 node->cport = ntohs(g->cport);
                 clusterAddNode(node);
+                clusterAddNodeToShard(node->shard_id, node);
             }
         }
@@ -2957,6 +2959,10 @@ int clusterProcessPacket(clusterLink *link) {
                     clusterNodeAddSlave(master,sender);
                     sender->slaveof = master;
+                    /* Update the shard_id when a replica is connected to its
+                     * primary in the very first time. */
+                    updateShardId(sender, master->shard_id);
                     /* Update config. */
                     clusterDoBeforeSleep(CLUSTER_TODO_SAVE_CONFIG);
                 }
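
Read together, the hunks enforce the invariant wherever a node's shard membership is established: a node created from gossip and a renamed node are filed in the shards dict right after clusterAddNode, and a replica attaching to its primary for the first time adopts the primary's shard id through updateShardId. The toy version below uses hypothetical names (`add_node`, `attach_replica`, `shard_size`) and leaves out everything else the real code paths do, such as saving the config.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define ID_LEN 40
#define MAX_ENTRIES 16

typedef struct toy_node {
    const char *name;
    char shard_id[ID_LEN + 1];
    struct toy_node *master; /* NULL for a primary */
} toy_node;

/* Toy shards table: (key it was filed under, node) pairs. */
typedef struct {
    struct { char key[ID_LEN + 1]; toy_node *node; } e[MAX_ENTRIES];
    int n;
} toy_shards;

static void shards_add(toy_shards *s, toy_node *node) {
    assert(s->n < MAX_ENTRIES);
    snprintf(s->e[s->n].key, sizeof(s->e[s->n].key), "%s", node->shard_id);
    s->e[s->n].node = node;
    s->n++;
}

static void shards_remove(toy_shards *s, const toy_node *node) {
    for (int i = 0; i < s->n; i++) {
        if (s->e[i].node == node) {
            s->e[i] = s->e[s->n - 1]; /* swap-remove; order is irrelevant */
            s->n--;
            return;
        }
    }
}

/* Number of nodes filed under 'key'. */
static int shard_size(const toy_shards *s, const char *key) {
    int count = 0;
    for (int i = 0; i < s->n; i++)
        if (strcmp(s->e[i].key, key) == 0)
            count++;
    return count;
}

/* Primary filed under 'key', or NULL if none is visible. */
static toy_node *shard_primary(const toy_shards *s, const char *key) {
    for (int i = 0; i < s->n; i++)
        if (strcmp(s->e[i].key, key) == 0 && s->e[i].node->master == NULL)
            return s->e[i].node;
    return NULL;
}

/* Gossip and rename paths after the fix: creating a node record always
 * files it in the shards table too (clusterAddNode followed by
 * clusterAddNodeToShard in the diff). */
static void add_node(toy_shards *s, toy_node *node, const char *name,
                     const char *shard_id, toy_node *master) {
    node->name = name;
    snprintf(node->shard_id, sizeof(node->shard_id), "%s", shard_id);
    node->master = master;
    shards_add(s, node);
}

/* Replica-attach path: link to the primary, then adopt its shard id with
 * remove/update/re-add so the table never drifts (roughly the role
 * updateShardId plays in the diff). */
static void attach_replica(toy_shards *s, toy_node *replica, toy_node *master) {
    replica->master = master;
    if (strcmp(replica->shard_id, master->shard_id) != 0) {
        shards_remove(s, replica);
        snprintf(replica->shard_id, sizeof(replica->shard_id), "%s", master->shard_id);
        shards_add(s, replica);
    }
}
```

After `attach_replica`, the primary and its replica share one shard key and the primary is visible through the table, which is the state a CLUSTER SHARDS-style walk needs.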