Conflicting dwarf version errors #13

Open
smillerc opened this issue Dec 20, 2024 · 4 comments

Comments

@smillerc

When I try to run some of the examples in the documentation (https://juliagpu.github.io/AcceleratedKernels.jl/stable/api/foreachindex/), I get dwarf version errors. For instance, when I run:

using CUDA
import AcceleratedKernels as AK
const x = CuArray(reshape(1:3000, 3, 1000))
const y = similar(x)
AK.foraxes(x, 2) do i
    for j in axes(x, 1)
        @inbounds y[j, i] = 2 * x[j, i] + 1
    end
end

I get this (and a bunch more LLVM errors)

warning: linking module flags 'Dwarf Version': IDs have conflicting values ('i32 4' from globals with 'i32 2' from start)
ERROR: InvalidIRError: compiling MethodInstance for AcceleratedKernels.gpu__forindices_global!(::KernelAbstractions.CompilerMetadata{…}, ::var"#1#2", ::Base.OneTo{…}) resulted in invalid LLVM IR

and more like

Reason: unsupported dynamic function invocation (call to convert)
...
Reason: unsupported call to an unknown function (call to julia.get_pgcstack)
...
Reason: unsupported call to a lazy-initialized function (call to jl_gc_run_pending_finalizers)
...
Reason: unsupported call to an external C function (call to jl_gc_have_pending_finalizers)
...

Here's my system info

julia> versioninfo()
Julia Version 1.11.2
Commit 5e9a32e7af2 (2024-12-01 20:02 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 32 × Intel(R) Xeon(R) Silver 4216 CPU @ 2.10GHz
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, cascadelake)
Threads: 32 default, 0 interactive, 16 GC (on 32 virtual cores)

julia> CUDA.versioninfo()
CUDA runtime 12.6, artifact installation
CUDA driver 12.6
NVIDIA driver 550.120.0

CUDA libraries: 
- CUBLAS: 12.6.4
- CURAND: 10.3.7
- CUFFT: 11.3.0
- CUSOLVER: 11.7.1
- CUSPARSE: 12.5.4
- CUPTI: 2024.3.2 (API 24.0.0)
- NVML: 12.0.0+550.120

Julia packages: 
- CUDA: 5.5.2
- CUDA_Driver_jll: 0.10.4+0
- CUDA_Runtime_jll: 0.15.5+0

Toolchain:
- Julia: 1.11.2
- LLVM: 16.0.6

2 devices:
  0: NVIDIA RTX A4000 (sm_86, 15.525 GiB / 15.992 GiB available)
  1: NVIDIA RTX A4000 (sm_86, 13.911 GiB / 15.992 GiB available)
@anicusan
Member

I retried the code and indeed got an error, though seemingly a different one from yours. However, when the code is put inside a function, it seems to work:

using CUDA
import AcceleratedKernels as AK

function addfunc()
    x = CuArray(reshape(1:3000, 3, 1000))
    y = similar(x)
    AK.foraxes(x, 2) do i
        for j in axes(x, 1)
            @inbounds y[j, i] = 2 * x[j, i] + 1
        end
    end
    y
end

addfunc()

Would that work on your end?

If not, it may be a CUDA configuration error - would you be able to run some higher-level, pure-CUDA.jl code, like:

using CUDA
x = CuArray(1:1000)
y = CuArray(1:1000)
z = x + y

@smillerc
Author

smillerc commented Jan 6, 2025

@anicusan When I place it in a function like you have, it works, and I know my CUDA config is correct, since I use it elsewhere in other settings.

In practice, all my code will be in functions anyway. I'm curious as to why it throws all these errors when not in a function. I've been toying with the idea of using AK, since it looks like a nice, concise way to avoid kernel boilerplate code.

@anicusan
Member

I was finally able to look closer into this. What happens is that the lambda (inside the do block) somehow captures global variables/functions implicitly defined in the Julia session - or perhaps lambdas in global scope behave differently? See e.g. the jl_alloc_genericmemory and julia.get_gc_frame_slot unknown-function errors. I am not sure if this is a recent change in how Julia lambdas capture variables; unfortunately, this is not something that can be changed within AcceleratedKernels.

This only seems to be a problem for do blocks in global scope that capture external variables, as often done for foreachindex and foraxes - it does not happen for a reduce without external capture:

const x = MtlArray(reshape(1:3000, 3, 1000))
AK.reduce(x, init=0) do a, b
    a + b    # self-contained, does not reference outside variables
end

In terms of future stability, lambda capture inside functions is too pervasive in both base Julia and JuliaGPU code for it not to work, so I expect it to be fine when written as such.
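
For example, a minimal sketch of the original snippet "written as such" - the arrays and the do block live inside a function so the closure captures locals rather than globals (hypothetical helper name, assuming the CUDA.jl backend as in the original report):

using CUDA
import AcceleratedKernels as AK

# Keep the data and the do block inside a function: the closure then captures
# the local variables x and y instead of globals from the Julia session.
function double_plus_one!(y, x)
    AK.foraxes(x, 2) do i
        for j in axes(x, 1)
            @inbounds y[j, i] = 2 * x[j, i] + 1
        end
    end
    y
end

x = CuArray(reshape(1:3000, 3, 1000))
y = similar(x)
double_plus_one!(y, x)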

I will update the docs and README to avoid global variables, even when used with const, then I will close this issue - if there is anything else, please feel free to continue the discussion / re-open :)

@smillerc
Author

Ok, thanks for looking into this!
