How to test Non-disaggregated performance #89

Closed
ToBeResumed opened this issue Jan 22, 2025 · 2 comments

Comments

@ToBeResumed

Among the comparison groups listed in the vllm_benchmark results, how is the Non-disaggregated mode configured, and which step in the instructions does it correspond to?

@ShangmingCai
Collaborator

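@ToBeResumed For the Non-disaggregated experiment, you can simply launch an instance directly with vLLM and enable chunked prefill by passing the corresponding flag, for example: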

CUDA_VISIBLE_DEVICES=0 python3 \
    -m vllm.entrypoints.openai.api_server \
    --model $model \
    --port 8100 \
    --max-model-len 10000 \
    --enable-chunked-prefill \
    --gpu-memory-utilization 0.8
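
Once the instance above is up, a quick smoke test can confirm it is serving before the benchmark is run against it. The following is only a sketch and not part of the original answer: it assumes the server is reachable on localhost at port 8100 (the `--port` value above) and that `$model` holds the same name passed to `--model`. It uses vLLM's OpenAI-compatible `/v1/completions` endpoint; `build_payload` and `send_request` are hypothetical helper names, and the prompt and `max_tokens` values are placeholders.

```shell
#!/usr/bin/env bash
# Sketch: smoke-test a non-disaggregated vLLM instance.
# Assumption: the server listens on localhost:8100, as in the launch command.

# Build the JSON body for the /v1/completions endpoint.
# Note: no JSON escaping is done, so the prompt must not contain
# double quotes or backslashes.
build_payload() {
    local model="$1" prompt="$2" max_tokens="${3:-32}"
    printf '{"model": "%s", "prompt": "%s", "max_tokens": %d}' \
        "$model" "$prompt" "$max_tokens"
}

# Send one request, discard the response body, and print the total
# wall-clock time of the request in seconds.
send_request() {
    local port="${1:-8100}" payload="$2"
    curl -s -o /dev/null -w '%{time_total}\n' \
        "http://localhost:${port}/v1/completions" \
        -H 'Content-Type: application/json' \
        -d "$payload"
}
```

With the server running, `send_request 8100 "$(build_payload "$model" "Hello" 32)"` prints the latency of a single request. This is only a sanity check; the actual comparison numbers should still come from the benchmark procedure the project describes.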

@ToBeResumed
Author

ToBeResumed commented Jan 23, 2025

Thanks for the answer, and happy New Year!
