Conflicting dwarf version errors #13

Open
smillerc opened this issue Dec 20, 2024 · 4 comments

Comments

@smillerc

When I try to run some of the examples in the documentation (https://juliagpu.github.io/AcceleratedKernels.jl/stable/api/foreachindex/), I get dwarf version errors. For instance, when I run:

using CUDA
import AcceleratedKernels as AK
const x = CuArray(reshape(1:3000, 3, 1000))
const y = similar(x)
AK.foraxes(x, 2) do i
    for j in axes(x, 1)
        @inbounds y[j, i] = 2 * x[j, i] + 1
    end
end

I get this (and a bunch more LLVM errors)

warning: linking module flags 'Dwarf Version': IDs have conflicting values ('i32 4' from globals with 'i32 2' from start)
ERROR: InvalidIRError: compiling MethodInstance for AcceleratedKernels.gpu__forindices_global!(::KernelAbstractions.CompilerMetadata{…}, ::var"#1#2", ::Base.OneTo{…}) resulted in invalid LLVM IR

and more like

Reason: unsupported dynamic function invocation (call to convert)
...
Reason: unsupported call to an unknown function (call to julia.get_pgcstack)
...
Reason: unsupported call to a lazy-initialized function (call to jl_gc_run_pending_finalizers)
...
Reason: unsupported call to an external C function (call to jl_gc_have_pending_finalizers)
...

Here's my system info

julia> versioninfo()
Julia Version 1.11.2
Commit 5e9a32e7af2 (2024-12-01 20:02 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 32 × Intel(R) Xeon(R) Silver 4216 CPU @ 2.10GHz
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, cascadelake)
Threads: 32 default, 0 interactive, 16 GC (on 32 virtual cores)

julia> CUDA.versioninfo()
CUDA runtime 12.6, artifact installation
CUDA driver 12.6
NVIDIA driver 550.120.0

CUDA libraries: 
- CUBLAS: 12.6.4
- CURAND: 10.3.7
- CUFFT: 11.3.0
- CUSOLVER: 11.7.1
- CUSPARSE: 12.5.4
- CUPTI: 2024.3.2 (API 24.0.0)
- NVML: 12.0.0+550.120

Julia packages: 
- CUDA: 5.5.2
- CUDA_Driver_jll: 0.10.4+0
- CUDA_Runtime_jll: 0.15.5+0

Toolchain:
- Julia: 1.11.2
- LLVM: 16.0.6

2 devices:
  0: NVIDIA RTX A4000 (sm_86, 15.525 GiB / 15.992 GiB available)
  1: NVIDIA RTX A4000 (sm_86, 13.911 GiB / 15.992 GiB available)
@anicusan
Member

I retried the code and indeed got an error, though seemingly a different one from yours. However, when the code is put inside a function, it seems to work:

using CUDA
import AcceleratedKernels as AK

function addfunc()
    x = CuArray(reshape(1:3000, 3, 1000))
    y = similar(x)
    AK.foraxes(x, 2) do i
        for j in axes(x, 1)
            @inbounds y[j, i] = 2 * x[j, i] + 1
        end
    end
    y
end

addfunc()

Would that work on your end?

If not, it may be a CUDA configuration error - would you be able to run some higher-level, pure-CUDA.jl code, like:

using CUDA
x = CuArray(1:1000)
y = CuArray(1:1000)
z = x + y

@smillerc
Author

smillerc commented Jan 6, 2025

@anicusan When I place it in a function like you have, it works, and I know my CUDA config is correct, since I use it elsewhere in other settings.

In practice, all my code will be in functions anyway. I'm curious as to why it throws all these errors when not in a function. I've been toying with the idea of using AK, since it looks like a nice, concise way to avoid kernel boilerplate code.

@anicusan
Member

I was finally able to look closer into this. What happens is that the lambda (inside the do block) somehow captures global variables/functions implicitly defined in the Julia session - or perhaps lambdas in global scope behave differently? See e.g. the jl_alloc_genericmemory and julia.get_gc_frame_slot unknown-function errors. I am not sure if this is a recent change in how Julia lambdas capture variables; unfortunately, this is not something that can be changed within AcceleratedKernels.

This only seems to be a problem for do blocks in global scope that capture external variables, as often done for foreachindex and foraxes - it does not happen for a reduce without external capture:

const x = MtlArray(reshape(1:3000, 3, 1000))
AK.reduce(x, init=0) do a, b
    a + b    # self-contained, does not reference outside variables
end

In terms of future stability, lambda capture inside functions is too pervasive in both base Julia and JuliaGPU code for it not to work, so I expect it to be fine when written as such.
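
For example, a minimal sketch of the original snippet "written as such" - the arrays and the do block live inside a function so the closure captures locals rather than globals (hypothetical helper name, assuming the CUDA.jl backend as in the original report):

using CUDA
import AcceleratedKernels as AK

# Keep the data and the do block inside a function: the closure then captures
# the local variables x and y instead of globals from the Julia session.
function double_plus_one!(y, x)
    AK.foraxes(x, 2) do i
        for j in axes(x, 1)
            @inbounds y[j, i] = 2 * x[j, i] + 1
        end
    end
    y
end

x = CuArray(reshape(1:3000, 3, 1000))
y = similar(x)
double_plus_one!(y, x)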

I will update the docs and README to avoid global variables, even when used with const, then I will close this issue - if there is anything else, please feel free to continue the discussion / re-open :)

@smillerc
Author

Ok, thanks for looking into this!
