Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix deadlocks by making CurrentThreadExecutor reentrant and reusing it #493

Closed
wants to merge 3 commits into from

Conversation

andersk
Copy link
Contributor

@andersk andersk commented Feb 5, 2025

When there are multiple CurrentThreadExecutors on the same stack, it seems there’s no good way to decide in advance which one new jobs should be routed to. If we send a job to an executor too low in the stack, something in a higher executor’s queue might deadlock waiting for it to finish (#348?). If we send a job to an executor too high in the stack, that job might deadlock waiting for something in a lower executor’s queue to finish (#492), or the executor might have already terminated and been marked broken (#491).

So instead, make a single CurrentThreadExecutor for the loop, and reuse it on the same stack as many times as necessary, thereby allowing any pending job to run no matter where in the stack we currently are.

It is still possible for a job higher in the stack to deadlock waiting for a job that’s currently executing lower in the stack, but in that case, there’s truly no solution as far as I can see (the user needs to migrate more of their sync code to async, I guess).

@carltongibson
Copy link
Member

carltongibson commented Feb 5, 2025

Hi @andersk — thanks for this. The other maybe related issue I had on my radar was #475.

It's been on my list to go back to pre-3.8.0 and refollow the logic behind each change, but haven't got to any of that 🤹. If we can pin down what's actually happening here, and resolve the issues it would be amazing 🤩

#367 is on the list too. (That was the first one I think.)

@andersk
Copy link
Contributor Author

andersk commented Feb 5, 2025

There’s no question those are related, in that #475 was supposedly fixed by #478 which caused #491 (as verified by running the test case in #491 before and after #478), and #367 caused #492 (as verified similarly).

@andersk
Copy link
Contributor Author

andersk commented Feb 7, 2025

Hmm, load testing this has revealed that it’s prone to putting lots of jobs on the stack at once before they have a chance to complete, leading to RecursionError.

Maybe what we want to do instead is chain each CurrentThreadExecutor to the previous one down in the stack, and when it terminates, have it redirect all its remaining and future jobs there, so that jobs can migrate down the stack but not up? I’ll play with this some more.

@andersk
Copy link
Contributor Author

andersk commented Feb 13, 2025

I implemented that strategy: closing in favor of

@andersk andersk closed this Feb 13, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants