Limit build to certain quants only #1041

vladfaust · 2024-12-06T08:25:34Z

I deploy llama.cpp on a CUDA server where the exact quant is known in advance (say, Q4_K_M with flash attention enabled). It would be neat to be able to skip building the optimized kernels that this setup never uses, e.g. mmq-instance-q3_k.cu, since they take significant time to build.
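To make the request concrete, here is a minimal sketch of the selection logic I have in mind, written in Python for illustration only. `select_sources` and the file list in the usage note are hypothetical and not part of llama.cpp or its build system; the only file name taken from the actual tree is mmq-instance-q3_k.cu, and the sketch assumes the other per-quant instance files follow the same `mmq-instance-<quant>.cu` pattern.

```python
import re

# Hypothetical helper, NOT part of llama.cpp: it only illustrates the idea of
# dropping per-quant kernel instance files from a list of sources.
# Assumption: instance files are named "mmq-instance-<quant>.cu".
MMQ_INSTANCE = re.compile(r"^mmq-instance-(?P<quant>[a-z0-9_]+)\.cu$")


def select_sources(filenames, wanted_quants):
    """Keep every file except mmq-instance-*.cu files for unwanted quant types."""
    wanted = {q.lower() for q in wanted_quants}
    kept = []
    for name in filenames:
        match = MMQ_INSTANCE.match(name)
        # Files that are not per-quant instances are always kept.
        if match is None or match.group("quant") in wanted:
            kept.append(name)
    return kept
```

For example, calling `select_sources(["mmq.cu", "mmq-instance-q3_k.cu", "mmq-instance-q4_k.cu"], {"q4_k"})` on that made-up list would drop only the q3_k instance. A model file can mix several tensor types, so the wanted set would have to cover every type the model actually contains, not just the one in its name.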

Replies: 0 comments