[Feature]: enable multi-blocks execution for moe align kernel #107

Open
yiakwy-xpu-ml-framework-team opened this issue Feb 7, 2025 · 0 comments

yiakwy-xpu-ml-framework-team commented Feb 7, 2025

Suggestion Description

The current implementation in https://github.com/ROCm/aiter/blob/main/csrc/kernels/moe_align_block_size_kernels.cu does not support multi-block execution.

This has been enabled in the latest sglang, which launches the token sort kernel across multiple blocks. We could support the same approach here:

    // Alignment kernel: launched as a single block of 1024 threads.
    align_kernel<<<1, 1024, 0, stream>>>(...);

    // Sort kernel: one thread per element of topk_ids (rounded up),
    // with the grid size capped at 65535 blocks.
    const int block_threads = 256;
    const int num_blocks = (topk_ids.numel() + block_threads - 1) / block_threads;
    const int max_blocks = 65535;
    const int actual_blocks = std::min(num_blocks, max_blocks);

    auto sort_kernel = moe_token_sort_kernel<scalar_t>;
    sort_kernel<<<actual_blocks, block_threads, 0, stream>>>(...);

The snippet above is from PR sgl-project/sglang#3347.
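
To make the launch arithmetic concrete, here is a minimal host-only C++ sketch of the grid-size computation from the snippet, plus a CPU emulation of a grid-stride loop. The names `LaunchConfig`, `sort_launch_config`, and `emulate_grid_stride` are hypothetical and do not exist in aiter or sglang. The sketch assumes the sort kernel iterates with a grid-stride loop, since otherwise elements beyond `max_blocks * block_threads` would not be processed once the cap applies. It does not reproduce the actual kernel body.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper type (not part of aiter or sglang).
struct LaunchConfig {
    int blocks;
    int threads;
};

// Mirrors the arithmetic in the snippet: ceil-divide the element count by the
// block size, then cap the grid at max_blocks.
inline LaunchConfig sort_launch_config(int64_t numel,
                                       int block_threads = 256,
                                       int max_blocks = 65535) {
    assert(block_threads > 0 && max_blocks > 0);
    // Ceil-divide so that a partially filled final block is still launched.
    const int64_t num_blocks = (numel + block_threads - 1) / block_threads;
    const int blocks =
        static_cast<int>(std::min<int64_t>(num_blocks, max_blocks));
    return {blocks, block_threads};
}

// CPU emulation of a grid-stride loop (an assumption about how the kernel
// iterates): each simulated thread starts at its global id and advances by the
// total number of launched threads. Returns how often each element is visited.
inline std::vector<int> emulate_grid_stride(int64_t numel, LaunchConfig cfg) {
    std::vector<int> visits(static_cast<size_t>(numel), 0);
    const int64_t stride = static_cast<int64_t>(cfg.blocks) * cfg.threads;
    for (int64_t tid = 0; tid < stride; ++tid) {
        for (int64_t i = tid; i < numel; i += stride) {
            ++visits[static_cast<size_t>(i)];
        }
    }
    return visits;
}
```

For example, `sort_launch_config(1000)` yields 4 blocks of 256 threads, and with an artificially small cap such as `sort_launch_config(1000, 256, 2)` the emulated loop still visits every element exactly once. An empty input gives zero blocks, which a real launch would need to guard against.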

Operating System

No response

GPU

No response

ROCm Component

No response
