Issues: triton-inference-server/server
Expected model dimensions when expected shape is not suitable to batch
#7981 opened Jan 31, 2025 by codeofdutyAI
PyTorch backend: Model is run in no_grad mode even with INFERENCE_MODE=false
#7974 opened Jan 28, 2025 by hakanardo
vLLM backend Hugging Face feature branch model loading
enhancement (New feature or request)
#7963 opened Jan 23, 2025 by knitzschke
Unexpected throughput results: increasing instance group count vs. deploying the count distributed on the same card using shared computing windows
performance (A possible performance tune-up)
#7956 opened Jan 21, 2025 by ariel291888
How to start/expose the metrics endpoint of the Triton Server via openai_frontend/main.py arguments
#7954 opened Jan 21, 2025 by shuknk8s
Segmentation fault error when crafting a pb_utils.Tensor object in a Triton BLS model
bug (Something isn't working)
#7953 opened Jan 18, 2025 by carldomond7
Failed to launch triton-server: "error: creating server: Internal - failed to load all models"
module: backends (Issues related to the backends)
#7950 opened Jan 17, 2025 by pzydzh
Triton crashes with SIGSEGV
crash (Related to server crashes, segfaults, etc.)
#7938 opened Jan 15, 2025 by ctxqlxs
[Question] Are the libnvinfer_builder_resources necessary in the Triton image?
question (Further information is requested)
#7932 opened Jan 14, 2025 by MatthieuToulemont
Server build with python BE failing due to missing Boost lib
#7925 opened Jan 9, 2025 by buddhapuneeth
OpenAI-Compatible Frontend should support world_size larger than 1
enhancement (New feature or request)
#7914 opened Jan 3, 2025 by cocodee
vllm_backend: What is the right way to use downloaded model + model.json together?
question (Further information is requested)
#7912 opened Jan 2, 2025 by kyoungrok0517
Python backend with multiple instances causes unexpected and non-deterministic results
bug (Something isn't working)
#7907 opened Dec 25, 2024 by NadavShmayo
MIG deployment of Triton causes "CacheManager Init Failed. Error: -17"
bug (Something isn't working)
#7906 opened Dec 25, 2024 by LSC527
Shared memory I/O bottleneck?
performance (A possible performance tune-up)
#7905 opened Dec 24, 2024 by wensimin
Support for guided decoding for vllm backend
enhancement (New feature or request)
#7897 opened Dec 20, 2024 by Inkorak
How does Triton Inference Server always compare the current frame's inference result with the previous one?
question (Further information is requested)
#7893 opened Dec 19, 2024 by Komoro2023
async execute is not run concurrently
bug (Something isn't working)
#7888 opened Dec 17, 2024 by ShuaiShao93
Error when using ONNX with TensorRT (ORT-TRT) Optimization on Multi-GPU
bug (Something isn't working)
#7885 opened Dec 16, 2024 by efajardo-nv
Manual warmup per model instance / specify warmup config dynamically using the C API
#7884 opened Dec 16, 2024 by asaff1