[CK_TILE] Improve RMS/Layer Normalization 2 Pass Pipeline Performance #1861

Open: ruanjm wants to merge 6 commits into develop from amd/dev/jruan/norm_perf

@ruanjm ruanjm commented Feb 5, 2025

Improve RMS Normalization Performance
This PR mainly focuses on the tile shape 64 x 65536.

Note

The current settings may not be optimal for other tensor shapes, especially when m is not 64.

  1. Set RepeatN to 1 for the vectorized-8 cases.
  2. Fine-tuned the thread grid size for the various vector sizes in the 64 x * cases. Note that the performance tests are based on the aiter test https://github.com/ROCm/aiter/blob/main/op_tests/test_rmsnorm2d.py. The smoke tests in CK (https://github.com/ROCm/composable_kernel/tree/develop/example/ck_tile/10_rmsnorm2d/script) are used only for reference.
  3. In the fused-add-store cases, use the residual buffer in the 2nd pass to save memory transactions and calculations.
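
To illustrate the idea behind item 3, here is a minimal host-side sketch in Python, not the CK_TILE kernel itself. It assumes that "fused-add-store" means the sum `x + residual` is written to an output residual buffer during the first pass, and that the second pass can read that buffer back instead of reloading both inputs and repeating the add. The function name `rmsnorm_fused_add_two_pass` and its parameters are hypothetical and chosen only for this example.

```python
import math


def rmsnorm_fused_add_two_pass(x, residual, gamma, eps=1e-6):
    """Reference for a single row (hypothetical helper, not a CK API).

    Computes y = (x + residual) / rms(x + residual) * gamma and also
    returns the stored sum, mirroring a fused add + store.
    """
    n = len(x)
    residual_out = [0.0] * n

    # Pass 1: add once, store the sum, and accumulate the sum of squares.
    sq_sum = 0.0
    for i in range(n):
        s = x[i] + residual[i]
        residual_out[i] = s
        sq_sum += s * s

    inv_rms = 1.0 / math.sqrt(sq_sum / n + eps)

    # Pass 2: read the stored sum (one load per element) instead of
    # loading x and residual again (two loads) and re-adding them.
    y = [residual_out[i] * inv_rms * gamma[i] for i in range(n)]
    return y, residual_out
```

Under these assumptions, the second pass issues one load per element rather than two and skips one add per element, which is where the saved memory transactions and calculations would come from.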

@ruanjm ruanjm force-pushed the amd/dev/jruan/norm_perf branch from 886b80d to 346e56b Compare February 6, 2025 07:03
@ruanjm ruanjm changed the title [CK_TILE] Improve RMS Normalization 2 Pass Pipeline Performance [CK_TILE] Improve RMS/Layer Normalization 2 Pass Pipeline Performance Feb 6, 2025
@samjwu samjwu requested a review from a team as a code owner February 7, 2025 22:12

@spolifroni-amd spolifroni-amd left a comment

Nothing to review for docs.

@ruanjm ruanjm force-pushed the amd/dev/jruan/norm_perf branch from 346e56b to 28f93c9 Compare February 12, 2025 03:33