Misc. bug: commit 5f0db95 breaks model loading on some AMD gpus #11405
cc @IMbackK
The RX 5700 XT, and RDNA1 in general, is not supported by ROCm, so VMM is probably just broken there. But to cover our bases, could you specify the versions of rocm/rocr you are using? Also, are you running amdgpu-dkms with kfd, or amdgpu from the mainline Linux kernel? I would recommend trying the second reproducer here ROCm/ROCR-Runtime#285 (the one that's supposed to work) and filing another issue against rocr. Anyhow, we will have to disable VMM on RDNA1 in the meantime.
I'm currently building rocm 6.1.2. rocm, as you said, isn't officially supported on RDNA1.
OK, could you try rocm 6.2.4 as shipped in the Arch repos? Also, not a fix, but could you try with https://github.com/ROCm/ROCK-Kernel-Driver? This is what AMD themselves use, and rocr on this kernel uses an entirely different kernel interface to allocate VRAM. We need to know exactly which configurations to disable VMM on. I personally only tested MI100 and RX 6800 XT on rocm 6.3 with current git rocr, with both the kfd and the regular drm kernel paths.
I'm experiencing the same on my Framework 16, both on the discrete (RX 7700S, gfx1102) and integrated (Radeon 780M, gfx1103) GPUs.
Using kernel Disabling VMM with
Hmm, OK, that's annoying. I guess I'll disable it for anything except gfx9 and gfx103x for now.
Could you guys try the reproducer I linked above and file an issue against rocr?
I might be stupid, but this means I have to build the kernel in the repo, right?
I can also test on a gfx1030 machine (RX 6700 iirc)
I've just tested on arch's package and the issue is the same. Here's the output of the reproducer that you've posted in the rocr issue:
These are different things: the reproducer is useful to file a bug against rocr, while using the kfd kernel may solve the issue (but it works fine on the mainline kernel here).
Please take this result and the reproducer and file another issue against rocr with as much info as possible.
Is it worth reporting even if the card isn't supported? |
...Yeah, it failed too.
With the same HIP from the official repos.
I don't understand how this works then... Sorry, I haven't touched the kernel side of Linux much at all.
Hmm, that's even more strange now, since it works fine on gfx1030 here.
Disabling VMM on the gfx1030 machine works too |
OK, so I downgraded everything to the regular Arch Linux packages, including the Arch Linux 6.12.10.arch1-1 kernel, and it still works fine:
So, lacking a reasonable explanation for why this works here but not in your cases, I'm just going to disable it by default for now.
Please do; especially with the reports from the other machines here, that should be fine.
If it helps, here's the backtrace from the gfx1102:
And from the gfx1030:
I'm currently installing Ubuntu on another partition to install the official 6.3.1 packages and create the report from there. Is it necessary to use the kernel driver for the issue to be useful? |
No, but mention that you are using the mainline kernel.
Could you guys try with "iommu=pt" on the kernel cmdline?
And have the IOMMU enabled in the BIOS.
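For anyone trying this, a sketch of how the parameter could be added on a GRUB-based distro (bootloader and file paths vary; `iommu=pt` switches the IOMMU into passthrough mode):

```shell
# Sketch: add iommu=pt to the kernel command line on a GRUB system.
# Edit /etc/default/grub and append the parameter, e.g.:
#   GRUB_CMDLINE_LINUX_DEFAULT="quiet iommu=pt"
# Then regenerate the config and reboot:
sudo grub-mkconfig -o /boot/grub/grub.cfg
# After rebooting, confirm the parameter is active:
grep -o 'iommu=pt' /proc/cmdline
```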
No BIOS setting for it, but with the kernel setting it still fails on the Framework.
I get some more info about the error on Ubuntu.
Does your dmesg contain perf/amd_iommu: Detected AMD IOMMU #0 (2 banks, 4 counters/bank) ... ?
yup
The gfx1030 machine is the same (same dmesg, same crash)
anything else for me to try? |
No, I'm out of ideas, so I'm just going to disable it for now. I am on a Zen2 Epyc; maybe for some reason it doesn't work on consumer platforms.
@MangoTCF Please add your me too with details on your configuration to the issue created by @daniandtheweb |
This issue is also related to this commit: #11421 However, in my case I'm able to load the model, but the generation doesn't work properly at all.
This one? |
@MangoTCF He means this: ROCm/ROCR-Runtime#287 |
Oops. Did that now
Name and Version
version: 4549 (466ea66)
built with cc (GCC) 14.2.1 20240910 for x86_64-pc-linux-gnu
Operating systems
Linux
Which llama.cpp modules do you know to be affected?
llama-server, llama-bench
Command line
./llama-bench -m ~/Applications/chat/gguf/llama-2-7b.Q4_0.gguf -ngl 100
Problem description & steps to reproduce
Commit 5f0db95, specifically VMM support, seems to break model loading on Radeon RX 5700 XT.
No model I try loads properly, no matter how small it is.
Disabling VMM at build time with GGML_CUDA_NO_VMM=ON solves the issue.
First Bad Commit
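For reference, a sketch of the workaround build (assuming a current CMake-based ROCm build of llama.cpp; the exact backend flag name may differ by version — only GGML_CUDA_NO_VMM comes from this report):

```shell
# Sketch: configure llama.cpp with the ROCm backend and VMM disabled.
cmake -B build -DGGML_HIP=ON -DGGML_CUDA_NO_VMM=ON
cmake --build build --config Release -j
```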
5f0db95
Relevant log output