Use CUDA 11 wheels to avoid statically linking CUDA components #137
bdice changed the title from "Use CUDA 11 wheels" to "Use CUDA 11 wheels to avoid statically linking CUDA components" on Jan 22, 2025
This was referenced Jan 22, 2025
Here's a minimal, reproducible example for the cuVS failures.

    docker run \
        --rm \
        --gpus all \
        -v $(pwd):/opt/work \
        -w /opt/work \
        -it rapidsai/citestwheel:cuda11.8.0-rockylinux8-py3.12 \
        bash

    # pinning to the latest libraft / pylibraft with the problematic linking
    pip install \
        'cuvs-cu11[test]==25.2.*,>=0.0.0a0' \
        'libraft-cu11==25.2.0a41' \
        'pylibraft-cu11==25.2.0a41'

    cd ./python/cuvs/cuvs
    pytest 'test/test_ivf_pq.py::test_ivf_pq_search_params'
    # cuvs.common.exceptions.CuvsException: cuBLAS error

Using the packages built from rapidsai/raft#2548, I saw the tests pass 🎉

    LIBRAFT_WHEELHOUSE=$(RAPIDS_PY_WHEEL_NAME="libraft_cu11" rapids-get-pr-wheel-artifact raft 2548 cpp)
    PYLIBRAFT_WHEELHOUSE=$(RAPIDS_PY_WHEEL_NAME="pylibraft_cu11" rapids-get-pr-wheel-artifact raft 2548 python)
    RAFT_DASK_WHEELHOUSE=$(RAPIDS_PY_WHEEL_NAME="raft_dask_cu11" rapids-get-pr-wheel-artifact raft 2548 python)

    pip install \
        'cuvs-cu11[test]==25.2.*,>=0.0.0a0' \
        "$(echo ${LIBRAFT_WHEELHOUSE}/*.whl)" \
        "$(echo ${PYLIBRAFT_WHEELHOUSE}/*.whl)" \
        "$(echo ${RAFT_DASK_WHEELHOUSE}/*.whl)"

    cd ./python/cuvs/cuvs
    pytest 'test/test_ivf_pq.py::test_ivf_pq_search_params'
    # === 4 passed in 1.45s ===

So I think rapidsai/raft#2548 will fix this (at least for cuVS).
rapids-bot bot pushed a commit to rapidsai/raft that referenced this issue on Jan 22, 2025

Contributes to rapidsai/build-planning#137

Follow-up to #2531.

See the linked issue for many more details, but in short... using a dynamically-loaded libraft which has statically-linked cuBLAS causes issues for other libraries.

There are now aarch64 CUDA 11 wheels for cuBLAS and other CUDA libraries, so it's possible to have RAFT wheels dynamically link against them. This PR does that.

## Notes for Reviewers

This has other side benefits in addition to fixing runtime issues... it also simplifies the wheel-building scripts and CMake, and makes CUDA 11 wheels noticeably smaller 😊

Authors:
- James Lamb (https://github.com/jameslamb)

Approvers:
- Bradley Dice (https://github.com/bdice)

URL: #2548
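As a rough way to check which kind of linkage a given build ended up with, one can look for `libcublas` among the shared object's `NEEDED` entries in `readelf -d` output. The sketch below is only an illustration: the helper name `links_cublas_dynamically` is hypothetical, and the sample line is hand-written in `readelf -d` format rather than taken from a real `libraft.so`. A missing `NEEDED` entry is a hint, not proof, that cuBLAS was linked statically.

```shell
#!/bin/sh
# Hypothetical helper: reads `readelf -d` output on stdin and succeeds
# if a NEEDED entry for libcublas.so.* is present.
links_cublas_dynamically() {
    grep -Eq '\(NEEDED\).*\[libcublas\.so'
}

# Hand-written sample line in `readelf -d` format (illustrative only).
sample=' 0x0000000000000001 (NEEDED)             Shared library: [libcublas.so.11]'

if printf '%s\n' "$sample" | links_cublas_dynamically; then
    echo "libcublas is a dynamic dependency"
else
    echo "no dynamic libcublas dependency found"
fi
```

Against a real file, the same helper could be fed with `readelf -d /path/to/libraft.so | links_cublas_dynamically`, where the path is a placeholder for wherever the wheel was unpacked.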
rapids-bot bot pushed a commit to rapidsai/cugraph that referenced this issue on Jan 22, 2025

Contributes to rapidsai/build-planning#137

Follow-up to #4804

Wheel builds here currently list out some shared libraries to exclude in `auditwheel repair`, which they pick up transitively via linking to `libraft`.

https://github.com/rapidsai/cugraph/blob/a9c923bb3f4a6a6f5a9d46337adc65d969717567/ci/build_wheel.sh#L42-L49

The version components of those library names can change when those libraries have ABI breakages, for example across CUDA major version boundaries.

This proposes replacing specific versions with wildcards, to exclude *all* versions of those libraries.

## Notes for Reviewers

This is especially relevant given this: rapidsai/raft#2548

For example, the latest `nvidia-cublas-cu11` has `libcublas.so.11` while `nvidia-cublas-cu12` has `libcublas.so.12`.

Authors:
- James Lamb (https://github.com/jameslamb)

Approvers:
- Bradley Dice (https://github.com/bdice)

URL: #4877
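To illustrate why a wildcard is more robust than a pinned version, here is a small sketch using shell `case` globbing. It is not how `auditwheel` itself matches names: the helper `matches_any_pattern` and the pattern list are hypothetical, and only the `libcublas.so.11` / `libcublas.so.12` names come from the commit message above.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if library name $1 matches any of the
# glob patterns passed as the remaining arguments.
matches_any_pattern() {
    name=$1
    shift
    for pattern in "$@"; do
        # $pattern is left unquoted on purpose so that it is treated as a glob.
        case "$name" in
            $pattern) return 0 ;;
        esac
    done
    return 1
}

# A pinned name only covers one CUDA major version; a wildcard covers both.
for candidate in libcublas.so.11 libcublas.so.12; do
    if matches_any_pattern "$candidate" 'libcublas.so.11'; then
        echo "pinned:   $candidate excluded"
    else
        echo "pinned:   $candidate NOT excluded"
    fi
    if matches_any_pattern "$candidate" 'libcublas.so.*'; then
        echo "wildcard: $candidate excluded"
    else
        echo "wildcard: $candidate NOT excluded"
    fi
done
```

Assuming an `auditwheel` version whose `--exclude` option accepts glob patterns, the equivalent on the command line would be along the lines of `auditwheel repair --exclude 'libcublas.so.*' ...`, with the pattern quoted so the shell does not expand it.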
rapids-bot bot pushed a commit to rapidsai/cuvs that referenced this issue on Jan 22, 2025

Due to some failures coming from libraft C++ wheels, CUDA 11 wheel CI will not pass. This PR temporarily disables CUDA 11 wheel tests until those issues can be resolved. See rapidsai/build-planning#137.

Authors:
- Bradley Dice (https://github.com/bdice)

Approvers:
- James Lamb (https://github.com/jameslamb)

URL: #599
rapids-bot bot pushed a commit to rapidsai/cugraph that referenced this issue on Jan 22, 2025

Due to some failures coming from libraft C++ wheels, CUDA 11 wheel CI will not pass. This PR temporarily disables CUDA 11 wheel tests until those issues can be resolved. See rapidsai/build-planning#137.

Authors:
- Bradley Dice (https://github.com/bdice)

Approvers:
- James Lamb (https://github.com/jameslamb)

URL: #4876
This issue proposes to use CUDA 11 wheels as dependencies for RAPIDS wheels. This is an extension of #35. Originally, that issue's scope was reduced to focus on only using CUDA wheels for CUDA 12 packages, because at that time CUDA 11 ARM wheels (specifically ARM!) did not exist for all of the math libraries that RAPIDS depends on. That was rectified as of about August 2024, but we had already done the migration for just CUDA 12. We did not attempt to go back and add support for CUDA 11.
As a part of the work for #33, we came across a pitfall that we previously recognized, but forgot about: cuBLAS only works properly across DSOs if it is using shared linkage. (An upstream nvbug is linked in that comment.)
We are observing this issue in CI for cuVS and cuML, with errors like those below.
cuVS CI failures
This points directly to `pylibraft`, which is built using `libraft` C++ wheels.
https://github.com/rapidsai/cuvs/actions/runs/12836113145/job/35800080495#step:9:666
cuML CI failures
This points directly to `pylibraft`, which is built using `libraft` C++ wheels.
https://github.com/rapidsai/cuml/actions/runs/12883195240/job/35916934900#step:9:4966
cuGraph CI failures
https://github.com/rapidsai/cugraph/actions/runs/12882240011/job/35914143656#step:9:16398
Currently, our proposed solution is to add support for CUDA wheels to our CUDA 11 builds, which should mitigate the problem and unify our code paths between CUDA 11 and CUDA 12. This should be the lowest-effort path that allows us to continue forward with dynamic linking between RAPIDS C++ wheels (#33).
In the immediate term, we will disable CUDA 11 wheels CI for cuVS, cuML, and cuGraph so they are not blocked.