
Use RK4 to integrate the B-field in time in the hybrid-PIC algorithm #4461

Merged (19 commits) into ECP-WarpX:development on Jan 22, 2024

Conversation

@roelof-groenewald (Member) commented Dec 1, 2023

This PR improves the performance of the hybrid-PIC algorithm by using a Runge-Kutta (RK4) scheme to advance the B-field rather than simple forward differencing. The higher-order scheme allows fewer substeps to be used in the field advance, which improves the overall algorithm performance.

This should be merged after #4405 as these changes were branched off that PR.

Todo:

  • Complete RK4 transition
  • Rerun benchmarks to confirm physics accuracy
  • Update CI tests
  • Change default to 10 sub-steps
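
As a rough illustration of the RK4 substepping idea described above (not the WarpX implementation, which operates on AMReX MultiFabs), a minimal sketch in plain C++ might look like the following; `Field` and the right-hand-side functor `f` (evaluating dB/dt from the Ohm's-law E-field) are hypothetical stand-ins:

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical scalar-array stand-in for a field on the grid.
using Field = std::vector<double>;

// Return x + a*y (element-wise); helper for the RK4 stages.
static Field axpy (const Field& x, const Field& y, double a)
{
    Field r(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) { r[i] = x[i] + a*y[i]; }
    return r;
}

// Advance B over one PIC time step dt using n_sub classical RK4 substeps of
// dB/dt = f(B). With a 4th-order scheme, n_sub can be much smaller than with
// forward differencing while remaining stable.
void rk4_substep_advance (Field& B, double dt, int n_sub,
                          const std::function<Field(const Field&)>& f)
{
    const double h = dt / n_sub;  // substep size
    for (int n = 0; n < n_sub; ++n) {
        const Field k1 = f(B);
        const Field k2 = f(axpy(B, k1, 0.5*h));
        const Field k3 = f(axpy(B, k2, 0.5*h));
        const Field k4 = f(axpy(B, k3, h));
        // B <- B + h/6 * (k1 + 2*k2 + 2*k3 + k4)
        for (std::size_t i = 0; i < B.size(); ++i) {
            B[i] += h/6.0 * (k1[i] + 2.0*k2[i] + 2.0*k3[i] + k4[i]);
        }
    }
}
```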

@roelof-groenewald added the labels Performance optimization and component: fluid-ohm (Related to the Ohm's law solver (with fluid electrons)) on Dec 1, 2023
@aveksler1 (Contributor) commented:

Performance Improvement
The performance improvement from switching from the first-order substep procedure for updating the B-field (left) to the 4th-order RK substep scheme (right) is shown below: ~1.6x faster!
[Figure] Parallel scaling of the WarpX hybrid-PIC algorithm on Perlmutter GPU nodes with a fixed grid size of 148x148x592, while the particle count increases proportionally to the compute resources, i.e., a mixed strong and weak scaling test.

Ensuring correct physics
We reran the EM modes and Landau damping tests in 1D, 2D, 3D, and RZ as physics checks to make sure the transition to RK4 didn't break anything. The number of substeps was fixed at 20 (a 10- to 50-fold reduction compared to the number of substeps previously required for these tests to remain numerically stable).

Parallel EM Modes
1d: [Image: spectrum_par_1d_20_substeps_1e-07_eta]

3d: [Image: spectrum_par_3d_20_substeps_1e-07_eta]

Perpendicular EM Modes

2d: [Image: spectrum_perp_2d_100_substeps_1e-05_eta]

Normal Modes of cylindrical plasma (RZ):
[Image: normal_modes_disp]

Landau Damping
1d: [Image: ion_Landau_damping_1d_T_ratio_0.3333333333333333]

2d: [Image: ion_Landau_damping_2d_T_ratio_0.3333333333333333]

@roelof-groenewald changed the title from "[WIP] Use RK4 to integrate the B-field in time in the hybrid-PIC algorithm" to "Use RK4 to integrate the B-field in time in the hybrid-PIC algorithm" on Dec 6, 2023
@clarkse (Member) left a comment:

I think this looks good and I am excited to use it as well since the previous second order integration scheme needed substantial subcycling to remain stable.

@ax3l requested review from dpgrote and ax3l on December 6, 2023
Comment on lines +573 to +579
B_old[ii] = MultiFab(
Bfield[lev][ii]->boxArray(), Bfield[lev][ii]->DistributionMap(), 1,
Bfield[lev][ii]->nGrowVect()
);
MultiFab::Copy(B_old[ii], *Bfield[lev][ii], 0, 0, 1, ng);

K[ii] = MultiFab(
@ax3l (Member):

If I am not mistaken, you can reduce the memory footprint of the currently allocated MultiFabs if you destruct the previously stored one before constructing the next one... cc @WeiqunZhang

Member:

K[ii] is empty before the assignment. In some other situations, yes.

@roelof-groenewald (Member, Author):

@ax3l Just to make sure I understand: you mean that if K[ii] already held a MultiFab, it would be better to destruct it before overwriting it with a new MultiFab?
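
(For reference, a minimal sketch of the pattern being discussed, with hypothetical names; as noted above, the PR itself does not need this because K[ii] is empty before the assignment.)

```cpp
#include <AMReX_MultiFab.H>

// Hypothetical helper: release the previously held allocation before
// constructing its replacement, so the old and new data never coexist and
// the peak memory footprint stays lower.
void reset_multifab (amrex::MultiFab& mf,
                     const amrex::BoxArray& ba,
                     const amrex::DistributionMapping& dm,
                     int ncomp, const amrex::IntVect& ngrow)
{
    mf.clear();                                  // free the old data first
    mf = amrex::MultiFab(ba, dm, ncomp, ngrow);  // then allocate the new one
}
```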

amrex::Vector<std::unique_ptr<amrex::MultiFab>> const& rhofield,
amrex::Vector<std::array< std::unique_ptr<amrex::MultiFab>, 3>> const& edge_lengths,
amrex::Real dt, DtType dt_type,
amrex::IntVect ng, std::optional<bool> nodal_sync);
@RemiLehe (Member) commented Jan 22, 2024:

Could you add docstrings for these functions (esp. for the parameter dt_type, which may be the least intuitive)?
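
(As an illustration only, a doxygen-style docstring along these lines could cover the parameters; the wording here is hypothetical and would need to match the actual behavior.)

```cpp
/** \brief Advance the B-field by one interval of the field sub-stepping loop
 *         (hypothetical wording, for illustration only).
 *
 * \param[in] rhofield      charge density, per refinement level
 * \param[in] edge_lengths  embedded-boundary edge lengths, per level
 * \param[in] dt            time interval over which to advance the field
 * \param[in] dt_type       which portion of the overall PIC step this advance
 *                          corresponds to (e.g. a full step versus a half step)
 * \param[in] ng            number of guard cells included in the update
 * \param[in] nodal_sync    whether to synchronize nodal data across boxes
 */
```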

Comment on lines +650 to +666
// Subtract B_old from the Bfield for each direction, to get
// B = dt * K2 + 0.5 * dt * K3.
MultiFab::Subtract(*Bfield[lev][ii], B_old[ii], 0, 0, 1, ng);

// Add dt * K2 + 0.5 * dt * K3 to index 0 of K (= 0.5 * dt * K0).
MultiFab::Add(K[ii], *Bfield[lev][ii], 0, 0, 1, ng);

// Add 2 * 0.5 * dt * K1 to index 0 of K.
MultiFab::LinComb(
K[ii], 1.0, K[ii], 0, 2.0, K[ii], 1, 0, 1, ng
);

// Overwrite the Bfield with the Runge-Kutta sum:
// B_new = B_old + 1/3 * dt * (0.5 * K0 + K1 + K2 + 0.5 * K3).
MultiFab::LinComb(
*Bfield[lev][ii], 1.0, B_old[ii], 0, 1.0/3.0, K[ii], 0, 0, 1, ng
);
Member:

Would it be worth writing all these operations in a single ParallelFor kernel, both for readability and to avoid the kernel-launch overhead of multiple kernels?

@roelof-groenewald (Member, Author):

Thanks for the suggestion. I also considered that as a good follow-up performance improvement but haven't gotten around to it yet: with the new scheme the field solver takes only a small fraction of the runtime in the simulations we commonly run, so it is not high on the priority list. Of course it would still make the code more performant. My current plan is to save it for a future team member, since it would be a good task for learning the hybrid-PIC algorithm as well as how to work with WarpX/AMReX.
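
(For reference, a rough sketch of what the fused version suggested above might look like, reusing the variables from the excerpt; this is illustrative only and not part of the PR.)

```cpp
// Fuse the final RK4 combination into one ParallelFor per box instead of
// several MultiFab-level operations. Component 0 of K holds 0.5*dt*K0 and
// component 1 holds 0.5*dt*K1, matching the comments in the excerpt above.
for (amrex::MFIter mfi(*Bfield[lev][ii], amrex::TilingIfNotGPU()); mfi.isValid(); ++mfi)
{
    const amrex::Box bx = mfi.growntilebox(ng);
    auto const& B    = Bfield[lev][ii]->array(mfi);
    auto const& Bold = B_old[ii].const_array(mfi);
    auto const& Karr = K[ii].const_array(mfi);
    amrex::ParallelFor(bx, [=] AMREX_GPU_DEVICE (int i, int j, int k)
    {
        // On entry B still holds B_old + dt*K2 + 0.5*dt*K3.
        const amrex::Real sum = (B(i,j,k) - Bold(i,j,k))  // dt*K2 + 0.5*dt*K3
                              + Karr(i,j,k,0)             // 0.5*dt*K0
                              + 2.0*Karr(i,j,k,1);        // dt*K1
        // B_new = B_old + dt/3 * (0.5*K0 + K1 + K2 + 0.5*K3)
        B(i,j,k) = Bold(i,j,k) + sum/3.0;
    });
}
```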

@RemiLehe (Member) left a comment:

Looks good to me! Thanks for this PR!
I added a few comments that could be addressed in follow-up PRs.

@RemiLehe merged commit d248490 into ECP-WarpX:development on Jan 22, 2024
@roelof-groenewald deleted the ohms_law_runge_kutta branch on January 22, 2024