-
Notifications
You must be signed in to change notification settings - Fork 29
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Execute user's callback after setting request status #32
Execute user's callback after setting request status #32
Conversation
When a request completes immediately, the callback is executed before the request method returns, and thus it isn't possible to get the status of the object for which we don't have a reference yet. For now we just hack around those cases (which happen infrequently), but in the future we should find a better way to test this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks.
To ensure that a user-defined request callback has access to the status, an argument containing the status is now added to the callback function prototype. This allows the callback to know the request status before the `ucxx::Request`s is set, and thus preventing the application from believing the request has completed while the callback did not run yet. To make this change more evident and organized, new types `RequestCallbackUserFunction` and `RequestCallbackUserData` are added.
@wence- I pushed the changes we discussed previously in #26 (comment) via caa129f. It seems that the issue of the request result being set before the callback is executed is now resolved and it seems more stable. However, I still do see segfaults if I run tests in a loop, I'm not sure if they are related to changes in this PR or if they are just showing up now because other issues were resolved. Also the issue I was attempting to workaround with #37 is still present, and is quite annoying since it seems to be only happening frequently only in the |
Can you provide a reproducer? Just |
FWIW, the issues I saw in #26 are not reproducible with this new change. |
Yes, exactly that. |
I guess my system is slightly different (no nvlink, but not sure if that is relevant for these tests). Using
|
I can't remember exactly what tests were segfaulting but I can't reproduce that now, furthermore if I skip the same tests that we're already skipping in CI they all complete. On a DGX-1 running the following in a loop seems to always complete successfully:
I believe those issues may all be related to #24 and we should tackle them in the near future but are probably not related to this PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor docstring updates for the new callback
interface, otherwise LGTM!
Co-authored-by: Lawrence Mitchell <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestions are now in, thanks @wence- !
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Peter!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving ops-codeowner
file changes
Thanks @wence- and @ajschmidt8 for reviews! |
/merge |
Once the user-defined callback is executed, it's important that the status for the request is already set, this may be used by the callback itself, as for example in the
ucxx::RequestTagMulti
where the request status must be known to ensure it did result in an error.