Convert background cluster networking calls to async - Gossip, Replication #914

msft-paddy14 · 2025-01-14T07:41:43Z

This PR converts several cluster networking calls to async to improve the performance and responsiveness of the server. These calls are cross-node communication in a mesh-like metadata exchange, so if one node in the mesh is unresponsive and the calls are synchronous, each pending call blocks a .NET threadpool thread and can cause thread starvation. .NET will eventually create more threads if no maximum limit is set, but that adds memory overhead and a performance hit. It is therefore safer to convert these calls to async.
The async state machine should add no significant overhead on non-critical paths like these, while the gains in the scenarios described above are substantial.
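The starvation argument can be illustrated with a small sketch. It is written in Python, not C#: `concurrent.futures.ThreadPoolExecutor` with two workers stands in for a saturated .NET thread pool, and `asyncio` stands in for C# `async`/`await`. The peer names, delays, pool size, and helper functions are hypothetical and do not come from Garnet. Two calls to unresponsive peers occupy both workers, so in the synchronous version a call to a healthy peer has to wait behind them. In the async version the pending calls hold no thread, and the healthy call finishes almost immediately.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for a gossip round-trip to a peer node.
HEALTHY_DELAY = 0.01      # seconds for a responsive peer
UNRESPONSIVE_DELAY = 0.5  # seconds for an unresponsive peer


def gossip_sync(peer: str) -> str:
    # Blocking call: the worker thread is held for the whole wait.
    time.sleep(UNRESPONSIVE_DELAY if peer.startswith("dead") else HEALTHY_DELAY)
    return peer


async def gossip_async(peer: str) -> str:
    # Non-blocking call: while waiting, the coroutine holds no thread.
    await asyncio.sleep(UNRESPONSIVE_DELAY if peer.startswith("dead") else HEALTHY_DELAY)
    return peer


def healthy_latency_sync(dead_peers, healthy, workers=2):
    """Time until the healthy peer's call completes on a small shared pool."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for p in dead_peers:
            pool.submit(gossip_sync, p)  # these occupy every worker
        fut = pool.submit(gossip_sync, healthy)  # queued behind them
        fut.result()
        elapsed = time.perf_counter() - start
    return elapsed


async def healthy_latency_async(dead_peers, healthy):
    """Same workload, but the calls to unresponsive peers only suspend."""
    start = time.perf_counter()
    pending = [asyncio.create_task(gossip_async(p)) for p in dead_peers]
    await gossip_async(healthy)
    elapsed = time.perf_counter() - start
    for t in pending:  # stop waiting on the unresponsive peers
        t.cancel()
    await asyncio.gather(*pending, return_exceptions=True)
    return elapsed


if __name__ == "__main__":
    dead_peers = ["dead-1", "dead-2"]
    sync_t = healthy_latency_sync(dead_peers, "node-3")
    async_t = asyncio.run(healthy_latency_async(dead_peers, "node-3"))
    print(f"sync: {sync_t:.2f}s  async: {async_t:.2f}s")
```

The synchronous run should report at least the unresponsive delay (about 0.5 s). The async run should report roughly the healthy delay. Exact numbers vary by machine.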

Additional minor change to remove the Meet lock:
The lock is not really protecting anything in this context. A caller that finds it held waits, still holding its thread, until the lock is available, and then proceeds with the MEET anyway rather than returning. So it doesn't seem like we need the lock. In some cases during startup, if there are multiple MEETs in flight (and one or more nodes are unresponsive), releasing the lock can cause CAS retries, so it is better to remove it since it isn't needed.
Also note that MEET is currently blocking and consumes a threadpool thread.

[Image: CPU stacks indicating `WriteUnlock` taking time]

vazois · 2025-01-14T17:21:12Z

So, the meet lock is there to throttle bursts of MEET requests. The same connection is used for gossip, and if gossip gets delayed, a timeout will occur that breaks and re-initializes the connection, unless gossip-delay is increased. I would make the WriteLock async if that is an issue.
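The alternative suggested here, making the lock async, can be sketched in the same hedged way. Python's `asyncio.Lock` stands in for an async-capable .NET primitive such as `SemaphoreSlim.WaitAsync`. The `MeetThrottle` class, the addresses, and the simulated round-trip are hypothetical. A burst of MEETs is still serialized, but callers waiting for the lock are suspended coroutines instead of blocked threadpool threads.

```python
import asyncio


class MeetThrottle:
    """Hypothetical helper: serializes MEET handshakes behind an async lock.

    Callers waiting for the lock are suspended, not blocking a thread.
    """

    def __init__(self, round_trip: float = 0.01):
        self._lock = asyncio.Lock()
        self._round_trip = round_trip  # simulated MEET round-trip, in seconds
        self._active = 0               # MEETs currently holding the lock
        self.max_concurrent = 0        # highest value _active ever reached

    async def meet(self, address: str) -> str:
        async with self._lock:
            self._active += 1
            self.max_concurrent = max(self.max_concurrent, self._active)
            try:
                # Stand-in for sending MEET and awaiting the reply.
                await asyncio.sleep(self._round_trip)
                return address
            finally:
                self._active -= 1


async def burst_of_meets(n: int):
    """Fire n MEETs at once and let the lock throttle them."""
    throttle = MeetThrottle()
    addresses = [f"10.0.0.{i}:7000" for i in range(n)]
    results = await asyncio.gather(*(throttle.meet(a) for a in addresses))
    return throttle, results


if __name__ == "__main__":
    throttle, results = asyncio.run(burst_of_meets(5))
    print(results, "max concurrent:", throttle.max_concurrent)
```

A lock like this still throttles MEET bursts, so gossip sharing the connection is not flooded. It only changes how waiters wait.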

vazois · 2025-01-15T16:42:33Z

Should be fine to go ahead and remove that lock

vazois

Looks good!

libs/cluster/Server/Gossip.cs

libs/cluster/Server/GarnetServerNode.cs

msft-paddy14 added 2 commits January 14, 2025 13:07

Remove meet lock

f5ea81b

Fix newline at end of GarnetServerNode.cs file

fb5996b

vazois self-requested a review January 14, 2025 17:23

msft-paddy14 and others added 3 commits January 15, 2025 07:56

fix formatting

156319d

Make all background connection handling async

2cebec7

formatting fix

364eac5

msft-paddy14 changed the title ~~Remove meet lock in TryMeet~~ Convert background cluster networking calls to async - Gossip, Replication Jan 15, 2025

vazois approved these changes Jan 15, 2025

Merge branch 'main' into users/padgupta/remove_cluster_meet_lock

e659503

badrishc reviewed Jan 15, 2025

libs/cluster/Server/Gossip.cs (outdated, resolved)

badrishc reviewed Jan 15, 2025

libs/cluster/Server/GarnetServerNode.cs (outdated, resolved)

msft-paddy14 added 2 commits January 16, 2025 10:47

address comments for async/await syntax

e56af59

Merge branch 'users/padgupta/remove_cluster_meet_lock' of https://git…
…hub.com/microsoft/garnet into users/padgupta/remove_cluster_meet_lock

28e4084

badrishc approved these changes Jan 16, 2025

Merge branch 'main' into users/padgupta/remove_cluster_meet_lock

6cd9f44

TalZaccai merged commit 18fce91 into main Jan 16, 2025
18 checks passed

TalZaccai deleted the users/padgupta/remove_cluster_meet_lock branch January 16, 2025 19:45
