Optimizing GPU acceleration of binary collision algorithms. #4577
Thanks for this PR! However, I got the following error:
Note that this error only appears when using
Co-authored-by: Remi Lehe
Source/Particles/Collision/BinaryCollision/NuclearFusion/NuclearFusionFunc.H
Source/Particles/Collision/BinaryCollision/Coulomb/ElasticCollisionPerez.H
@RemiLehe I ran the CPU performance tests we discussed (using cases 1-3 of the Turner benchmarks). The short and good news is that the results matched expectations and CPU performance was NOT worse with these changes. So, as discussed, this PR can be merged 🎉 Here are the inclusive timings (note that
@roelof-groenewald Thanks a lot for doing these tests! This is great news!
[Edit by @RemiLehe]: The aim of this PR is to speed up binary collisions on GPU by exposing more parallelism. Instead of looping with one GPU thread per cell, we loop with one GPU thread per independent pair, i.e. pairs that do not touch the same macroparticles, so that there is no race condition. Within each cell, the number of independent pairs is the smaller of the two species' macroparticle counts.
[Updated]: This PR includes optimized GPU implementations of both binary collision algorithms (Coulomb and nuclear fusion).
Average Performance Improvement: 4x
Results for a specific collision-heavy use case from Dave (`inputs_1d_H1_lassen2.txt`):
Built with `cmake -DWarpX_DIMS=1 -DWarpX_COMPUTE=CUDA` on Perlmutter.
New kernel:
WarpX dev branch:
Error Rate
Slightly higher for a few Azure tests, which causes those CI pipelines to fail. All good otherwise.
TODO (@RemiLehe):
Add an `#ifdef` so that the new code is only used in GPU builds