I've been trying to trace NCCL kernels (ReduceScatter in this example).
I'm seeing many invocations of a CALL instruction, and the generated trace appears to be incomplete.
Oddly, this only happens for vector sizes above 128 kB. I can trace ReduceScatter kernels for vector sizes smaller than 128 kB with no issues and correlate them fairly well against hardware.
Any advice on interpreting this, or on possible solutions, would help.