Prevent `Subscriber.seq` data race and slow replay emits from filling subscriber outbox #45
#43 without (hopefully) its potentially-livetail-blocking side-effect.
During a small window around cutover from replay to live-tail, both goroutines concurrently push messages to the subscriber's outbox and read-modify-write the subscriber's `seq` as part of that. This data race can be detected with Go's race detector, so jetstream currently has some undefined behaviour.

This change swaps the racing `int64` for its atomic counterpart. It's probably disappointing: it doesn't fix the logical race (we still read-modify-write concurrently), but it does address the undefined behaviour in the binary. Plus, atomics let us write with a `Swap`, so we can actually detect when the race happens and at least know about it.

As a result, the race detector no longer complains. Out-of-order events are still reproducible for me locally with the repro from #43, and this is made slightly easier to test by a new `--per-ip-limit=0` flag for local dev, so that multiple websockets from localhost are not combined into a single rate limiter. (By default, the current per-IP limiting behaviour is retained.)

Finally, in an attempt to reduce the problems reported in #27 and #31, the subscriber outbox is monitored when emitting during replay, and replay slows down if the outbox reaches 1/3 of its capacity. Since replay uses blocking sends, we don't really need to use the full outbox capacity, and this gives the client a better chance of surviving cutover, since it will hopefully make it unlikely that their outbox is near-full at that time. (More detail in the last commit.)