
Use RK4 to integrate the B-field in time in the hybrid-PIC algorithm #4461

Merged (19 commits) into ECP-WarpX:development on Jan 22, 2024

Conversation

@roelof-groenewald (Member) commented Dec 1, 2023

This PR improves the performance of the hybrid-PIC algorithm by using a Runge-Kutta (RK4) scheme to advance the B-field rather than simple forward differencing. The higher-order scheme allows fewer substeps to be used in the field advance, which improves the overall algorithm performance.

This should be merged after #4405 as these changes were branched off that PR.

Todo:

  • Complete RK4 transition
  • Rerun benchmarks to confirm physics accuracy
  • Update CI tests
  • Change default to 10 sub-steps
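
As a rough illustration of the RK4 substepping idea described above (not the WarpX implementation, which operates on AMReX MultiFabs), a minimal sketch in plain C++ might look like the following; `Field` and the right-hand-side functor `f` (evaluating dB/dt from the Ohm's-law E-field) are hypothetical stand-ins:

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical scalar-array stand-in for a field on the grid.
using Field = std::vector<double>;

// Return x + a*y (element-wise); helper for the RK4 stages.
static Field axpy (const Field& x, const Field& y, double a)
{
    Field r(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) { r[i] = x[i] + a*y[i]; }
    return r;
}

// Advance B over one PIC time step dt using n_sub classical RK4 substeps of
// dB/dt = f(B). With a 4th-order scheme, n_sub can be much smaller than with
// forward differencing while remaining stable.
void rk4_substep_advance (Field& B, double dt, int n_sub,
                          const std::function<Field(const Field&)>& f)
{
    const double h = dt / n_sub;  // substep size
    for (int n = 0; n < n_sub; ++n) {
        const Field k1 = f(B);
        const Field k2 = f(axpy(B, k1, 0.5*h));
        const Field k3 = f(axpy(B, k2, 0.5*h));
        const Field k4 = f(axpy(B, k3, h));
        // B <- B + h/6 * (k1 + 2*k2 + 2*k3 + k4)
        for (std::size_t i = 0; i < B.size(); ++i) {
            B[i] += h/6.0 * (k1[i] + 2.0*k2[i] + 2.0*k3[i] + k4[i]);
        }
    }
}
```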

@roelof-groenewald added the labels Performance optimization and component: fluid-ohm (Related to the Ohm's law solver (with fluid electrons)) on Dec 1, 2023
@aveksler1 (Contributor) commented:

Performance Improvement
The performance improvement from switching from the first-order substep procedure for updating the B-field (left) to the 4th-order RK substep scheme (right) is shown below: ~1.6x faster!
[Figure] Parallel scaling of the WarpX hybrid-PIC algorithm on Perlmutter GPU nodes with a fixed grid size of 148x148x592, while the particle count increases proportionally to the compute resources, i.e., a mixed strong and weak scaling test.

Ensuring correct physics
We reran the EM modes and Landau damping tests in 1D, 2D, 3D, and RZ as physics checks to make sure the transition to RK4 didn't break anything. The number of substeps was fixed at 20 (a 10- to 50-fold reduction compared to the number of substeps previously required for these tests to remain numerically stable).

Parallel EM Modes
1d: [Image: spectrum_par_1d_20_substeps_1e-07_eta]

3d: [Image: spectrum_par_3d_20_substeps_1e-07_eta]

Perpendicular EM Modes

2d: [Image: spectrum_perp_2d_100_substeps_1e-05_eta]

Normal Modes of cylindrical plasma (RZ):
[Image: normal_modes_disp]

Landau Damping
1d: [Image: ion_Landau_damping_1d_T_ratio_0.3333333333333333]

2d: [Image: ion_Landau_damping_2d_T_ratio_0.3333333333333333]

@roelof-groenewald changed the title from "[WIP] Use RK4 to integrate the B-field in time in the hybrid-PIC algorithm" to "Use RK4 to integrate the B-field in time in the hybrid-PIC algorithm" on Dec 6, 2023
@clarkse (Member) left a comment:

I think this looks good and I am excited to use it as well since the previous second order integration scheme needed substantial subcycling to remain stable.

@ax3l requested review from dpgrote and ax3l on December 6, 2023
Comment on lines +573 to +579
B_old[ii] = MultiFab(
Bfield[lev][ii]->boxArray(), Bfield[lev][ii]->DistributionMap(), 1,
Bfield[lev][ii]->nGrowVect()
);
MultiFab::Copy(B_old[ii], *Bfield[lev][ii], 0, 0, 1, ng);

K[ii] = MultiFab(
@ax3l (Member):

If I am not mistaken, you can reduce the memory footprint of the currently allocated MultiFabs if you destruct the previously stored one before constructing the next one... cc @WeiqunZhang

Member:

K[ii] is empty before the assignment. In some other situations, yes.

@roelof-groenewald (Member, Author):

@ax3l Just to make sure I understand: you mean that if K[ii] already held a MultiFab, it would be better to destruct it before overwriting it with a new MultiFab?
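
(For reference, a minimal sketch of the pattern being discussed, with hypothetical names; as noted above, the PR itself does not need this because K[ii] is empty before the assignment.)

```cpp
#include <AMReX_MultiFab.H>

// Hypothetical helper: release the previously held allocation before
// constructing its replacement, so the old and new data never coexist and
// the peak memory footprint stays lower.
void reset_multifab (amrex::MultiFab& mf,
                     const amrex::BoxArray& ba,
                     const amrex::DistributionMapping& dm,
                     int ncomp, const amrex::IntVect& ngrow)
{
    mf.clear();                                  // free the old data first
    mf = amrex::MultiFab(ba, dm, ncomp, ngrow);  // then allocate the new one
}
```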

amrex::Vector<std::unique_ptr<amrex::MultiFab>> const& rhofield,
amrex::Vector<std::array< std::unique_ptr<amrex::MultiFab>, 3>> const& edge_lengths,
amrex::Real dt, DtType dt_type,
amrex::IntVect ng, std::optional<bool> nodal_sync);
@RemiLehe (Member) commented Jan 22, 2024:

Could you add docstrings for these functions (esp. for the parameter dt_type, which may be the least intuitive)?
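
(As an illustration only, a doxygen-style docstring along these lines could cover the parameters; the wording here is hypothetical and would need to match the actual behavior.)

```cpp
/** \brief Advance the B-field by one interval of the field sub-stepping loop
 *         (hypothetical wording, for illustration only).
 *
 * \param[in] rhofield      charge density, per refinement level
 * \param[in] edge_lengths  embedded-boundary edge lengths, per level
 * \param[in] dt            time interval over which to advance the field
 * \param[in] dt_type       which portion of the overall PIC step this advance
 *                          corresponds to (e.g. a full step versus a half step)
 * \param[in] ng            number of guard cells included in the update
 * \param[in] nodal_sync    whether to synchronize nodal data across boxes
 */
```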

Comment on lines +650 to +666
// Subtract B_old from the Bfield for each direction, to get
// B = dt * K2 + 0.5 * dt * K3.
MultiFab::Subtract(*Bfield[lev][ii], B_old[ii], 0, 0, 1, ng);

// Add dt * K2 + 0.5 * dt * K3 to index 0 of K (= 0.5 * dt * K0).
MultiFab::Add(K[ii], *Bfield[lev][ii], 0, 0, 1, ng);

// Add 2 * 0.5 * dt * K1 to index 0 of K.
MultiFab::LinComb(
K[ii], 1.0, K[ii], 0, 2.0, K[ii], 1, 0, 1, ng
);

// Overwrite the Bfield with the Runge-Kutta sum:
// B_new = B_old + 1/3 * dt * (0.5 * K0 + K1 + K2 + 0.5 * K3).
MultiFab::LinComb(
*Bfield[lev][ii], 1.0, B_old[ii], 0, 1.0/3.0, K[ii], 0, 0, 1, ng
);
Member:

Would it be worth writing all these operations in a single ParallelFor kernel, both for readability and to avoid the kernel-launch overhead of multiple kernels?

@roelof-groenewald (Member, Author):

Thanks for the suggestion. I also considered that as a good follow-up performance improvement but haven't gotten around to it yet: with the new scheme the field solver takes only a small fraction of the runtime in the simulations we commonly run, so it is not high on the priority list. Of course it would still make the code more performant. My current plan is to save it for a future team member, since it would be a good task for learning the hybrid-PIC algorithm as well as how to work with WarpX/AMReX.
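
(For reference, a rough sketch of what the fused version suggested above might look like, reusing the variables from the excerpt; this is illustrative only and not part of the PR.)

```cpp
// Fuse the final RK4 combination into one ParallelFor per box instead of
// several MultiFab-level operations. Component 0 of K holds 0.5*dt*K0 and
// component 1 holds 0.5*dt*K1, matching the comments in the excerpt above.
for (amrex::MFIter mfi(*Bfield[lev][ii], amrex::TilingIfNotGPU()); mfi.isValid(); ++mfi)
{
    const amrex::Box bx = mfi.growntilebox(ng);
    auto const& B    = Bfield[lev][ii]->array(mfi);
    auto const& Bold = B_old[ii].const_array(mfi);
    auto const& Karr = K[ii].const_array(mfi);
    amrex::ParallelFor(bx, [=] AMREX_GPU_DEVICE (int i, int j, int k)
    {
        // On entry B still holds B_old + dt*K2 + 0.5*dt*K3.
        const amrex::Real sum = (B(i,j,k) - Bold(i,j,k))  // dt*K2 + 0.5*dt*K3
                              + Karr(i,j,k,0)             // 0.5*dt*K0
                              + 2.0*Karr(i,j,k,1);        // dt*K1
        // B_new = B_old + dt/3 * (0.5*K0 + K1 + K2 + 0.5*K3)
        B(i,j,k) = Bold(i,j,k) + sum/3.0;
    });
}
```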

@RemiLehe (Member) left a comment:

Looks good to me! Thanks for this PR!
I added a few comments that could be addressed in follow-up PRs.

@RemiLehe merged commit d248490 into ECP-WarpX:development on Jan 22, 2024
@roelof-groenewald deleted the ohms_law_runge_kutta branch on January 22, 2024