Fix data race on `sub.seq` during replay -> live-tail cutover #43
Prevent a data race that leads to out-of-order events being delivered during cutover from replay to live-tailing, partially resolving #42.
`emitToSubscriber` reads and writes `sub.seq`, but does not itself lock the subscriber's `sub.lk`. It can't, because its main caller, the server `Emit` function, takes that lock across its call to `emitToSubscriber`.

The other caller is the replay goroutine, which runs concurrently with `Emit` and did not take the lock. That left `sub.seq` exposed to a race whenever both were running at the same time (for ~0.5s during cutover).

This change makes the replay goroutine take the lock before calling `emitToSubscriber`, protecting `sub.seq` from this race.

I've verified locally that jetstream still works and cuts over, and I created a pair of repro scripts that (try to) trigger the data race: https://gist.github.com/uniphil/346acb62088022729394d2324bf2ad8a
With the changes from this PR, the local repro goes from almost always triggering the race to never triggering it in any test so far; see the sample output in the linked gist. I'll verify in prod again if this gets merged and deployed 🙂
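To make the locking pattern concrete, here is a minimal, self-contained Go sketch. It is not jetstream's code: the `subscriber` type, the `emit`, `replay`, and `runCutover` helpers, and the assumption that `emitToSubscriber` drops any event at or below `sub.seq` before advancing it are all hypothetical stand-ins. It illustrates why a check-then-update of `sub.seq` only keeps delivery in order when every caller, including the replay goroutine, holds `sub.lk`.

```go
package main

import (
	"fmt"
	"sync"
)

// subscriber is a hypothetical stand-in for jetstream's subscriber type.
type subscriber struct {
	lk        sync.Mutex
	seq       int64   // newest event cursor delivered so far; guarded by lk
	delivered []int64 // events in delivery order; guarded by lk
}

// emitToSubscriber (hypothetical shape) delivers evt only if it is newer than
// sub.seq, then advances sub.seq. It does NOT lock: callers must hold sub.lk so
// the check and the update happen atomically.
func emitToSubscriber(sub *subscriber, evt int64) {
	if evt <= sub.seq {
		return // something newer was already sent; drop to keep order
	}
	sub.delivered = append(sub.delivered, evt)
	sub.seq = evt
}

// emit models the live path, which holds sub.lk across emitToSubscriber.
func emit(sub *subscriber, evt int64) {
	sub.lk.Lock()
	defer sub.lk.Unlock()
	emitToSubscriber(sub, evt)
}

// replay models the replay goroutine with the fix applied: it takes sub.lk
// around each emitToSubscriber call instead of calling it unlocked.
func replay(sub *subscriber, events []int64) {
	for _, evt := range events {
		sub.lk.Lock()
		emitToSubscriber(sub, evt)
		sub.lk.Unlock()
	}
}

// runCutover replays historical events 1..n while live events n/2+1..n+m
// arrive concurrently, imitating the overlap window during cutover.
func runCutover(n, m int) []int64 {
	sub := &subscriber{}
	var history, live []int64
	for i := 1; i <= n; i++ {
		history = append(history, int64(i))
	}
	for i := n/2 + 1; i <= n+m; i++ {
		live = append(live, int64(i))
	}

	var wg sync.WaitGroup
	wg.Add(2)
	go func() {
		defer wg.Done()
		replay(sub, history)
	}()
	go func() {
		defer wg.Done()
		for _, evt := range live {
			emit(sub, evt)
		}
	}()
	wg.Wait()

	sub.lk.Lock()
	defer sub.lk.Unlock()
	return sub.delivered
}

func main() {
	got := runCutover(1000, 1000)
	inOrder := true
	for i := 1; i < len(got); i++ {
		if got[i] <= got[i-1] {
			inOrder = false
		}
	}
	fmt.Println("strictly increasing:", inOrder, "last:", got[len(got)-1])
}
```

How many replayed events get dropped varies from run to run, but the delivered sequence stays strictly increasing because each check-then-update of `sub.seq` runs under `sub.lk`. Running the sketch with `go run -race` should report no race; removing the `Lock`/`Unlock` pair in `replay` brings back the unsynchronized access, which the race detector is likely to flag.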