The current implementation in https://github.com/ROCm/aiter/blob/main/csrc/kernels/moe_align_block_size_kernels.cu does not support multi-block execution.
This has been enabled in the latest sglang. We could support it here:
    align_kernel<<<1, 1024, 0, stream>>>(...);

    const int block_threads = 256;
    const int num_blocks = (topk_ids.numel() + block_threads - 1) / block_threads;
    const int max_blocks = 65535;
    const int actual_blocks = std::min(num_blocks, max_blocks);

    auto sort_kernel = moe_token_sort_kernel<scalar_t>;
    sort_kernel<<<actual_blocks, block_threads, 0, stream>>>(...);
See PR sgl-project/sglang#3347 for the sglang change.
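To make the launch arithmetic concrete, here is a minimal host-side C++ sketch of the block-count computation from the snippet, plus a CPU emulation of how a capped grid could still cover every element. The helper names `capped_num_blocks` and `emulate_grid_stride` are hypothetical and appear in neither aiter nor sglang. The grid-stride loop is an assumption about how the sort kernel handles inputs larger than `max_blocks * block_threads`; the PR's actual kernel may index differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: number of blocks for a 1-D launch, i.e.
// ceil(numel / block_threads), capped at max_blocks as in the issue's snippet.
inline int capped_num_blocks(int64_t numel, int block_threads, int max_blocks = 65535) {
    assert(block_threads > 0 && max_blocks > 0);
    const int64_t num_blocks = (numel + block_threads - 1) / block_threads;
    return static_cast<int>(std::min<int64_t>(num_blocks, max_blocks));
}

// CPU emulation of a grid-stride loop (assumed kernel indexing scheme):
// each (block, thread) pair starts at its global index and advances by the
// total number of launched threads. Returns the visit count per element.
inline std::vector<int> emulate_grid_stride(int64_t numel, int num_blocks, int block_threads) {
    std::vector<int> visits(static_cast<size_t>(numel), 0);
    const int64_t stride = static_cast<int64_t>(num_blocks) * block_threads;
    for (int b = 0; b < num_blocks; ++b) {
        for (int t = 0; t < block_threads; ++t) {
            for (int64_t i = static_cast<int64_t>(b) * block_threads + t; i < numel; i += stride) {
                ++visits[static_cast<size_t>(i)];
            }
        }
    }
    return visits;
}
```

With this indexing, every element is visited exactly once even when the cap reduces the grid below `ceil(numel / block_threads)`. Note that `capped_num_blocks` returns 0 for an empty input, and a launch with a zero-sized grid is rejected as an invalid configuration, so a caller would likely need to skip the launch in that case.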
Operating System
No response
GPU
No response
ROCm Component
No response