Issues: triton-inference-server/server
Expected model dimensions when expected shape is not suitable to batch
#7981 opened Jan 31, 2025 by codeofdutyAI
PyTorch backend: Model is run in no_grad mode even with INFERENCE_MODE=false
#7974 opened Jan 28, 2025 by hakanardo
vLLM backend Hugging Face feature branch model loading
enhancement (New feature or request)
#7963 opened Jan 23, 2025 by knitzschke
Unexpected throughput results: increasing instance group count vs. deploying the count distributed on the same card using shared computing windows
performance (A possible performance tune-up)
#7956 opened Jan 21, 2025 by ariel291888
How to start/expose the metrics endpoint of the Triton Server via openai_frontend/main.py arguments
#7954 opened Jan 21, 2025 by shuknk8s
Segmentation fault error when crafting a pb_utils.Tensor object in a Triton BLS model
bug (Something isn't working)
#7953 opened Jan 18, 2025 by carldomond7
Failed to launch triton-server: "error: creating server: Internal - failed to load all models"
module: backends (Issues related to the backends)
#7950 opened Jan 17, 2025 by pzydzh
Triton crashes with SIGSEGV
crash (Related to server crashes, segfaults, etc.)
#7938 opened Jan 15, 2025 by ctxqlxs
[Question] Are the libnvinfer_builder_resources necessary in the Triton image?
question (Further information is requested)
#7932 opened Jan 14, 2025 by MatthieuToulemont
Server build with python BE failing due to missing Boost lib
#7925 opened Jan 9, 2025 by buddhapuneeth
OpenAI-Compatible Frontend should support world_size larger than 1
enhancement (New feature or request)
#7914 opened Jan 3, 2025 by cocodee
vllm_backend: What is the right way to use downloaded model + model.json together?
question (Further information is requested)
#7912 opened Jan 2, 2025 by kyoungrok0517
Python backend with multiple instances causes unexpected and non-deterministic results
bug (Something isn't working)
#7907 opened Dec 25, 2024 by NadavShmayo
MIG deployment of Triton causes "CacheManager Init Failed. Error: -17"
bug (Something isn't working)
#7906 opened Dec 25, 2024 by LSC527
Shared memory I/O bottleneck?
performance (A possible performance tune-up)
#7905 opened Dec 24, 2024 by wensimin
Support for guided decoding for vllm backend
enhancement (New feature or request)
#7897 opened Dec 20, 2024 by Inkorak
How does Triton Inference Server always compare the current frame's inference result with the previous one?
question (Further information is requested)
#7893 opened Dec 19, 2024 by Komoro2023
async execute is not run concurrently
bug (Something isn't working)
#7888 opened Dec 17, 2024 by ShuaiShao93
Error when using ONNX with TensorRT (ORT-TRT) Optimization on Multi-GPU
bug (Something isn't working)
#7885 opened Dec 16, 2024 by efajardo-nv
Manual warmup per model instance / specify warmup config dynamically using the C API
#7884 opened Dec 16, 2024 by asaff1