[manyblock mode] how are you ensuring that all the blocks are executed in order? #80

isaacleeai · 2018-12-04T00:37:17Z

I posted a question on the NVIDIA developer forum about the order in which blocks execute when a kernel launches more blocks than can be resident on the device at once. An NVIDIA dev answered that 1) we cannot assume anything about the order in which blocks execute, and 2) once a block becomes resident, it does not retire until all threads in the block have run to completion (https://devtalk.nvidia.com/default/topic/1044740/performance-cost-of-too-many-blocks-/?offset=8#5301239).

However, manyblock mode launches hundreds of thousands of blocks, and the blocks have to execute in sample and layer order (the 1st layer of the 1st sample must complete before the 2nd layer of the 1st sample can proceed, and the 1st sample must complete before the 2nd sample can proceed). I have read every line of nv_wavenet_persistent.cuh, and it seems that one of the two things the dev said has to be wrong: either you can specify the order of blocks, or a block can be taken out of execution before it has run to completion (you use an infinite while-loop to make a block wait for the previous layer, and block-wise synchronization to make sure that the previous sample has been generated. Maybe one of these causes a block to retire early?). Or is it the "barrier.sync" PTX instruction that ensures the blocks execute correctly?

Thanks

BrianPharris · 2018-12-04T21:56:57Z

The code relies on thread block launch ordering which, as was described in that thread, is a gray area that we shouldn't be depending on, so it's a bug in the code :)

A correct implementation will use atomics to determine the ordered block indices.
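
For readers following along, here is a minimal sketch of one way to read "atomics to determine the ordered block indices". It is not taken from nv-wavenet; `g_next_ticket`, `ordered_blocks`, `done`, `out`, and `run_ordered_demo` are hypothetical names. The assumption is that each block draws a ticket with `atomicAdd` when it actually starts running and uses that ticket, rather than `blockIdx.x`, as its logical index. A block holding ticket k then only waits on ticket k-1, which was drawn by a block that is already resident, so the spin-wait does not depend on launch order.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical global ticket counter (not part of nv-wavenet).
__device__ unsigned int g_next_ticket = 0;

// Each block takes a ticket when it starts and treats it as its logical index.
// Toy dependency chain: out[k] = out[k-1] + k, so block k needs block k-1's result.
__global__ void ordered_blocks(volatile int* done, volatile unsigned long long* out) {
    __shared__ unsigned int ticket;
    if (threadIdx.x == 0) {
        ticket = atomicAdd(&g_next_ticket, 1u);  // order of arrival, not blockIdx.x
    }
    __syncthreads();
    const unsigned int logical = ticket;

    if (threadIdx.x == 0) {
        unsigned long long prev = 0;
        if (logical > 0) {
            // The predecessor already holds a ticket, so it is resident and
            // will eventually set its flag.
            while (done[logical - 1] == 0) { }
            __threadfence();
            prev = out[logical - 1];
        }
        out[logical] = prev + logical;
        __threadfence();     // make the result visible before raising the flag
        done[logical] = 1;
    }
}

// Host helper: launches num_blocks blocks and checks out[k] == k * (k + 1) / 2.
bool run_ordered_demo(int num_blocks) {
    unsigned int zero = 0;
    cudaMemcpyToSymbol(g_next_ticket, &zero, sizeof(zero));  // reset between launches

    int* d_done = nullptr;
    unsigned long long* d_out = nullptr;
    cudaMalloc(&d_done, num_blocks * sizeof(int));
    cudaMalloc(&d_out, num_blocks * sizeof(unsigned long long));
    cudaMemset(d_done, 0, num_blocks * sizeof(int));

    ordered_blocks<<<num_blocks, 32>>>(d_done, d_out);
    bool ok = (cudaDeviceSynchronize() == cudaSuccess);

    std::vector<unsigned long long> h_out(num_blocks);
    cudaMemcpy(h_out.data(), d_out, num_blocks * sizeof(unsigned long long),
               cudaMemcpyDeviceToHost);
    for (int k = 0; ok && k < num_blocks; ++k) {
        ok = (h_out[k] == (unsigned long long)k * (k + 1) / 2);
    }
    cudaFree(d_done);
    cudaFree(d_out);
    return ok;
}
```

Calling `run_ordered_demo` with a block count well above what the device can hold at once (for example 200000) exercises the case discussed in this issue. In a real kernel the ticket would presumably be mapped to a (sample, layer) pair instead of a simple running sum.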

isaacleeai · 2018-12-05T00:10:58Z

Oh okay, thanks.

I wanted to make sure there wasn't any code I overlooked, as I am trying to build a parallel version of wavenet from studying your code :)

I was thinking cooperative groups would do the trick (by launching only as many blocks as can be resident on the device at once and having the blocks go through multiple iterations for each sample). What do you mean by using atomics? Could you please point me to some literature on "atomics" (you aren't talking about atomic arithmetic operations and mutexes, or are you?)?
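
To make the cooperative groups idea concrete, here is a rough sketch under my own assumptions, not nv-wavenet code. `layered_steps` and `run_cooperative_demo` are hypothetical names, and the "layer" is a toy update in which every element reads a neighbour that another block may have written. The grid is sized from the occupancy API so that all blocks are resident at once, and `grid.sync()` separates consecutive steps. Depending on the toolkit version, this may need to be compiled with `-rdc=true`.

```cuda
#include <vector>
#include <cuda_runtime.h>
#include <cooperative_groups.h>

namespace cg = cooperative_groups;

// Toy "layers": at each step next[i] = cur[(i + 1) % n] + 1, so a step may read
// values written by other blocks in the previous step.
__global__ void layered_steps(int* a, int* b, int n, int steps) {
    cg::grid_group grid = cg::this_grid();
    int* cur = a;
    int* next = b;
    for (int t = 0; t < steps; ++t) {
        for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
             i += gridDim.x * blockDim.x) {
            next[i] = cur[(i + 1) % n] + 1;
        }
        grid.sync();  // every block finishes step t before any block starts t + 1
        int* tmp = cur; cur = next; next = tmp;
    }
}

// Host helper: returns true if every element equals `steps` after the run.
bool run_cooperative_demo(int n, int steps) {
    const int threads = 256;
    int dev = 0, supported = 0, sms = 0, per_sm = 0;
    cudaGetDevice(&dev);
    cudaDeviceGetAttribute(&supported, cudaDevAttrCooperativeLaunch, dev);
    if (!supported) return false;
    cudaDeviceGetAttribute(&sms, cudaDevAttrMultiProcessorCount, dev);
    cudaOccupancyMaxActiveBlocksPerMultiprocessor(&per_sm, layered_steps, threads, 0);
    if (per_sm < 1) return false;

    int *d_a = nullptr, *d_b = nullptr;
    cudaMalloc(&d_a, n * sizeof(int));
    cudaMalloc(&d_b, n * sizeof(int));
    cudaMemset(d_a, 0, n * sizeof(int));
    cudaMemset(d_b, 0, n * sizeof(int));

    // Launch only as many blocks as can be co-resident on the device.
    dim3 grid(sms * per_sm), block(threads);
    void* args[] = { &d_a, &d_b, &n, &steps };
    bool ok = (cudaLaunchCooperativeKernel((const void*)layered_steps, grid, block,
                                           args, 0, 0) == cudaSuccess);
    ok = (cudaDeviceSynchronize() == cudaSuccess) && ok;

    // The buffers swap every step, so the result is in a for even steps, b for odd.
    std::vector<int> h(n);
    cudaMemcpy(h.data(), (steps % 2 == 0) ? d_a : d_b, n * sizeof(int),
               cudaMemcpyDeviceToHost);
    for (int i = 0; ok && i < n; ++i) ok = (h[i] == steps);
    cudaFree(d_a);
    cudaFree(d_b);
    return ok;
}
```

The trade-off compared with launching hundreds of thousands of blocks is that each resident block loops over its share of the work itself, and every `grid.sync()` is a device-wide barrier.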

isaacleeai changed the title ~~how are you ensuring that all the blocks are executed in order?~~ [manyblock mode] how are you ensuring that all the blocks are executed in order? Dec 4, 2018
