I posted a question on the NVIDIA developer forums about the ordering of block execution when there are more blocks than can be resident on a device at a given point in time. NVIDIA's dev answered that: 1) we cannot assume anything about the order of block execution, and 2) once a block becomes resident, it does not retire until all threads in the block have run to completion (https://devtalk.nvidia.com/default/topic/1044740/performance-cost-of-too-many-blocks-/?offset=8#5301239).
However, manyblock mode uses hundreds of thousands of blocks, and those blocks have to execute in order of sample and layer (the 1st layer of the 1st sample must complete before the 2nd layer of the 1st sample can proceed, and the 1st sample must complete before the 2nd sample can proceed). I have read every line of nv_wavenet_persistent.cuh, and it seems like one of the two things the dev said has to be wrong: either you can specify the order of blocks, or a block can be taken out of execution even if it has not run to completion (you use an infinite while-loop to make a block wait for the previous layer, and block-wise synchronization to make sure the previous sample has been produced; maybe one of these causes a block to retire early?). Or is it the "barrier.sync" PTX code that ensures the blocks execute correctly?
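To be concrete, the pattern I am describing looks roughly like this (my own simplified sketch with made-up names such as `layerDone` and `layerKernel`, not the actual code from nv_wavenet_persistent.cuh):

```cuda
// Sketch of the spin-wait dependency I mean: a block for layer L busy-waits
// until the block for layer L-1 has published the current sample.
__global__ void layerKernel(volatile int* layerDone, int layer, int sample) {
    if (layer > 0) {
        // Infinite while-loop: the block keeps its SM and does not proceed
        // until the previous layer's block has signaled this sample.
        while (layerDone[layer - 1] <= sample) { /* spin */ }
    }

    // ... compute this layer's activations for this sample ...

    __syncthreads();
    if (threadIdx.x == 0) {
        __threadfence();               // make results visible to other blocks
        layerDone[layer] = sample + 1; // signal the block for the next layer
    }
}
```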
Thanks
isaacleeai changed the title from "how are you ensuring that all the blocks are executed in order?" to "[manyblock mode] how are you ensuring that all the blocks are executed in order?" on Dec 4, 2018
The code is relying on thread block launch ordering which, as was described in that thread, is a gray area that we shouldn't be depending on, so it's a bug in the code :)
A correct implementation would use atomics to determine the ordered block indices.
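Something along these lines (a minimal sketch of the idea, not code from this repo; `nextBlock` and `orderedKernel` are made-up names): each block claims the next logical index from an atomic counter, so work is handed out strictly in order regardless of how the hardware happens to schedule the blocks.

```cuda
// Global ticket counter; reset to 0 before each launch.
__device__ unsigned int nextBlock = 0;

__global__ void orderedKernel(const float* work, int numTasks) {
    __shared__ unsigned int myIndex;
    if (threadIdx.x == 0) {
        // Atomically claim the next logical block index instead of
        // trusting blockIdx.x to reflect any particular ordering.
        myIndex = atomicAdd(&nextBlock, 1u);
    }
    __syncthreads();
    if (myIndex >= (unsigned int)numTasks) return;

    // ... process work[myIndex]; indices are issued in strict order ...
}
```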
I wanted to make sure there wasn't any code I had overlooked, as I am trying to build a parallel version of WaveNet by studying your code :)
I was thinking cooperative groups would do the trick (by only allocating as many blocks as can be resident on the device at a given point in time and having each block go through multiple iterations, one per sample). What do you mean by using atomics? Could you please refer me to some literature regarding "atomics" (since you aren't talking about atomic arithmetic operations and mutexes, or are you?)?
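For reference, what I had in mind with cooperative groups is roughly the following (just an illustrative sketch; `persistentWavenet` is a made-up name, and it assumes the kernel is launched with cudaLaunchCooperativeKernel so that grid-wide synchronization is allowed):

```cuda
#include <cooperative_groups.h>
namespace cg = cooperative_groups;

// Persistent-block sketch: launch only as many blocks as fit on the device,
// then loop over samples inside the kernel and synchronize the whole grid
// between samples so sample s is finished before sample s+1 starts.
__global__ void persistentWavenet(float* samples, int numSamples) {
    cg::grid_group grid = cg::this_grid();
    for (int s = 0; s < numSamples; ++s) {
        // ... each resident block computes its share of layers for sample s ...
        grid.sync();   // all blocks finish sample s before any block moves on
    }
}
```

The grid-wide sync only works when the launched block count can be co-resident on the device, which is why the kernel has to loop over samples internally instead of launching one block per (sample, layer) pair.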