Regarding the comparison groups shown in the vllm_benchmark result: how is the Non-disaggregated mode set up, and which step in the instructions does it correspond to?
@ToBeResumed For the non-disaggregated experiment, you can simply launch a single instance directly through vLLM and enable chunked prefill via the command-line flag, for example:
CUDA_VISIBLE_DEVICES=0 python3 \
  -m vllm.entrypoints.openai.api_server \
  --model $model \
  --port 8100 \
  --max-model-len 10000 \
  --enable-chunked-prefill \
  --gpu-memory-utilization 0.8
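As a minimal sanity check, once the server above is running you can send a completion request to its OpenAI-compatible endpoint on port 8100; the prompt and max_tokens values below are arbitrary placeholders, and $model must match the --model passed at startup:

curl http://localhost:8100/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "'"$model"'", "prompt": "Hello, my name is", "max_tokens": 64}'

The benchmark client used for the other comparison groups can then be pointed at this same port to produce the non-disaggregated numbers.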
Thanks for the explanation, and happy New Year!