Support dynamic linking between RAPIDS wheels #33

Open · 8 of 11 tasks

vyasr opened this issue Apr 1, 2024 · 30 comments

@vyasr
Contributor

vyasr commented Apr 1, 2024

Currently RAPIDS wheels adhere strictly to the manylinux policy. While the glibc/kernel ABI restrictions are not particularly onerous, the requirement that binary wheels be essentially self-contained and only depend on a small set of external shared libraries is problematic. To adhere to this restriction, RAPIDS wheels statically link (or in rare cases, bundle) all of their external library dependencies, leading to severe binary bloat. The biggest problem with this behavior is that the current sizes prohibit us from publishing our wheels on PyPI. Beyond that come the usual infrastructural problems: longer CI times due to extra compilation, larger binaries making wheel download and installation slower, etc. The focus of this issue is to define a better solution to this problem than static linking, one that still adheres to the manylinux spec in spirit while reducing binary sizes. This issue will not address the usage of the CUDA math libraries' dynamic library wheels; that will be discussed separately.

Proposed Solution

RAPIDS should start publishing its C++ libraries as standalone wheels that can be pip installed independently of the Python(/Cython) wheels. These wheels should:

  • Be py3 wheels (independent of Python version, except in rare cases like ucxx where we actually use the Python C API in the C++ library) that are built once per arch/CUDA major version
  • Continue to statically link to the CUDA runtime and math libraries
  • Contain a complete C++ dev library including CMake files, headers, and transitive dependencies. IOW these wheels should be suitable for use both during compilation and at runtime.
  • Leverage scikit-build-core's entry point support to automate exposing their CMake to other packages building against them (a consumer-side sketch follows this list).
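
To make the entry-point idea concrete, here is a hedged, consumer-side sketch of how a build tool could collect CMake directories advertised by installed wheels. The `cmake.prefix` entry-point group name and the helper function are illustrative assumptions, not a confirmed scikit-build-core contract:

```python
# Hypothetical consumer-side helper: collect CMake package directories that
# installed wheels advertise via entry points, so they can be appended to
# CMAKE_PREFIX_PATH before invoking CMake.
# NOTE: the "cmake.prefix" group name is an assumption for illustration.
import importlib.metadata
import importlib.util
import os
from pathlib import Path


def collect_cmake_prefix_paths(group: str = "cmake.prefix") -> list[Path]:
    paths: list[Path] = []
    # entry_points(group=...) requires Python 3.10+
    for ep in importlib.metadata.entry_points(group=group):
        # Each entry point is assumed to name an installed module whose
        # directory contains the wheel's CMake config files.
        spec = importlib.util.find_spec(ep.value)
        if spec is not None and spec.submodule_search_locations:
            paths.extend(Path(p) for p in spec.submodule_search_locations)
    return paths


if __name__ == "__main__":
    print(os.pathsep.join(str(p) for p in collect_cmake_prefix_paths()))
```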

A key question to address is how to encode binary dependencies between wheels. One option is for each wheel to embed RPATHs pointing to the expected relative path to library dependencies in other wheels. This could be accomplished with some CMake to extract library locations from targets and then construct relative paths during the build based on the assumption that the packages are installed into a standard site-packages layout. However, since this approach is fragile and has generally been frowned upon by the Python community in the past, I suggest that we instead exploit dynamic loading to load the library on import of a package. This choice would make packages sensitive to import order (C++ wheels would need to be imported before any other extension module that links to them) but I think that's a reasonable price to pay since it only matters when depending on a C++ wheel. This solution also lets us handle the logic in Python, making it far easier to configure and control. Moreover, it will make the solution fairly composable when an extension module depends on a C++ wheel that depends on yet another C++ wheel.
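
For illustration, a minimal sketch (assuming a `libraft`-style wheel bundling `lib64/libraft.so`; the file layout and names are illustrative) of what that load-on-import hook could look like in the wheel's `load.py`:

```python
# Hypothetical load.py shipped inside a C++ library wheel.
# Loading with RTLD_GLOBAL makes the library's symbols visible to extension
# modules imported later, so no RPATH into another wheel is required.
import ctypes
from pathlib import Path

_lib = None


def load_library() -> ctypes.CDLL:
    """Load the bundled shared library exactly once and keep it referenced."""
    global _lib
    if _lib is None:
        lib_path = Path(__file__).parent / "lib64" / "libraft.so"
        _lib = ctypes.CDLL(str(lib_path), mode=ctypes.RTLD_GLOBAL)
    return _lib
```

The package's `__init__.py` can then call `load_library()` so that simply importing the C++ wheel makes its symbols visible to extension modules imported afterwards.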

Once these wheels exist, we should rewrite the existing Python packages to require the corresponding C++ wheels. The current approach of "find C++ if exists, build otherwise" can be scrapped in favor of always requiring that the C++ CMake package be found. Consumers will have the choice of installing the C++ library (e.g. from conda), building it from source, or installing the C++ wheel. The C++ wheel will become a hard dependency in pyproject.toml, so it will automatically be installed when building. In conda environments the pyproject dependencies are ignored, so the new wheels will not be installed, and similarly in devcontainer builds where requirements are generated dynamically from dependencies.yaml. Ultimately a pylibraft->libraft dependency will behave nearly identically to a raft-dask->pylibraft dependency from the perspective of dependency management.

Notes

  • Since the Python wheels will be dynamically linking to the C++ libraries, these wheels should be a lot closer to what we need in devcontainer/DLFW/PB2/etc builds. As a result we may be able to actually start using them there.

Implementation notes

24.06 release

24.08 release

24.10 release

24.12 release

25.02 release

@msarahan

msarahan commented Apr 1, 2024

Contain a complete C++ dev library including CMake files, headers, and transitive dependencies. IOW these wheels should be suitable for use both during compilation and at runtime.

How much space does this cost? I understand the simplicity benefits of doing it this way, but if our mission is to save space, why are we making this compromise?

@vyasr
Contributor Author

vyasr commented Apr 1, 2024

I'd guess it'll be on the order of 25MB. IMO, if after the other changes we're still that close to the 1GB limit on any package, then I don't think removing these files would be a real solution, since all it would take is adding one new arch etc. to the compilation for us to be over the limit again.

What alternative would you suggest? That we build against RAPIDS dependencies installed in some other way and then specify a runtime dependency that contains only the libraries and nothing else?

Also I'd add that I would expect the bulk of those 25 MB to come from bundling CCCL, which we could fix by creating a wheel for rapids-core-dependencies as well.

@bdice
Contributor

bdice commented Apr 1, 2024

Another significant benefit of this approach would be that the marginal cost of building for more Python versions (e.g. 3.12) would be much smaller. The most significant build cost would be paid exactly once for the C++ wheel (rather than for each Python minor version) and then we could build for many Python minor versions at a significantly reduced resource cost.

@vyasr
Contributor Author

vyasr commented Apr 1, 2024

Yes, that's definitely something else I considered. I was originally thinking of exposing the C++ library from the Python wheel directly, but one of the (multiple) reasons that tipped me towards a separate wheel was making the Python wheels relatively cheap to build.

@msarahan

msarahan commented Apr 1, 2024

25 MB probably isn't enough to warrant extra complexity given the hundreds of MB we already are dealing with. We should definitely measure this stuff, though.

We can have both build-requires and requires dependencies specified: the former with the dev stuff, the latter without. It's not as nice as Conda's run_exports, but it's the same idea. Doable, with room for improvement in tooling.

@jakirkham
Member

If we were able to have a thin Cython layer around each dependency that we used exclusively, we could use that in other packages and have the benefits of reduced library duplication/static linking

@vyasr
Contributor Author

vyasr commented Apr 1, 2024

We can have both build-requires and requires dependencies specified: the former with the dev stuff, the latter without. It's not as nice as Conda's run_exports, but it's the same idea. Doable, with room for improvement in tooling.

Yes, I definitely think that's worth doing. I considered that but didn't want to include that as part of this proposal because that's a change that we should try to make concurrently with conda packaging (we don't split this in conda either). There's a writeup about this somewhere, I'll find it and share it.

@vyasr
Contributor Author

vyasr commented Apr 1, 2024

If we were able to have a thin Cython layer around each dependency that we used exclusively, we could use that in other packages and have the benefits of reduced library duplication/static linking

I'm not sure I follow what you mean. How is this different from what's being proposed here, aside from adding a Cython wrapper? What would that Cython wrapper do?

@vyasr
Contributor Author

vyasr commented Apr 2, 2024

On the subject of measurements, here's what I currently see locally:

[root@dt08 cpp_wheels]# ls -lh wheelhouse/
total 1.6G
-rw-r--r-- 1 root root 821M Apr  1 20:42 libcugraph-24.6.0-cp311-cp311-manylinux_2_17_x86_64.whl
-rw-r--r-- 1 root root 788M Apr  1 03:59 libraft-24.6.0-cp311-cp311-manylinux_2_17_x86_64.whl
-rw-r--r-- 1 root root 3.7M Apr  1 03:14 librmm-24.6.0-cp311-cp311-manylinux_2_17_x86_64.whl
-rw-r--r-- 1 root root 1.5M Apr  1 20:49 pylibcugraph-24.6.0-cp311-cp311-manylinux_2_17_x86_64.whl
-rw-r--r-- 1 root root 3.9M Apr  1 04:01 pylibraft-24.6.0-cp311-cp311-manylinux_2_17_x86_64.whl
-rw-r--r-- 1 root root 1.7M Apr  1 03:15 rmm-24.6.0-cp311-cp311-manylinux_2_17_x86_64.whl

We're under 1 GB with this! For context, the pylibcugraph and cugraph wheels I see from recent PRs total 1.47 GB. One major missing piece here is NCCL, which I expect will add ~100MB back to the size.

If I open up the wheels and look at their contents:

[root@dt08 wheelhouse]# du -sh unpacked_libcugraph/libcugraph/*
512     unpacked_libcugraph/libcugraph/VERSION
9.5K    unpacked_libcugraph/libcugraph/__init__.py
9.5K    unpacked_libcugraph/libcugraph/_version.py
19M     unpacked_libcugraph/libcugraph/include
1.1G    unpacked_libcugraph/libcugraph/lib64
9.5K    unpacked_libcugraph/libcugraph/load.py
[root@dt08 wheelhouse]# du -sh unpacked_libraft/libraft/*
512     unpacked_libraft/libraft/VERSION
9.5K    unpacked_libraft/libraft/__init__.py
9.5K    unpacked_libraft/libraft/_version.py
37M     unpacked_libraft/libraft/include
1.1G    unpacked_libraft/libraft/lib64
9.5K    unpacked_libraft/libraft/load.py
1.5K    unpacked_libraft/libraft/test
[root@dt08 wheelhouse]# du -sh unpacked_libcugraph/libcugraph/lib64/*
67K     unpacked_libcugraph/libcugraph/lib64/cmake
1.1G    unpacked_libcugraph/libcugraph/lib64/libcugraph.so
4.7M    unpacked_libcugraph/libcugraph/lib64/libcugraph_c.so
192K    unpacked_libcugraph/libcugraph/lib64/rapids
[root@dt08 wheelhouse]# du -sh unpacked_libraft/libraft/lib64/*
238K    unpacked_libraft/libraft/lib64/cmake
1.1G    unpacked_libraft/libraft/lib64/libraft.so
192K    unpacked_libraft/libraft/lib64/rapids

This definitely suggests that, as I expected, we wouldn't benefit much from trying to optimize the include directory, at least not unless we dramatically reduce library sizes somehow.

@jameslamb
Member

Nice! I want to say this somewhere, here seems as good a place as any... since 1GB is a special value (a PyPI limit), I think as part of this work we should be enforcing that limit on wheels in CI across all the repos.

That could be done with that pydistcheck thing I made or with a shell script using du or similar. But either way, I think it'd be useful to catch "hey this artifact is gonna be too big" in CI instead of during publishing to PyPI.
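
For illustration, a hedged sketch of such a check as a small Python script (the `wheelhouse/` directory name and the exact threshold are assumptions; pydistcheck would be the more featureful option):

```python
# Minimal sketch of a CI guard that fails the job if any built wheel exceeds
# (roughly) the PyPI per-file size limit.
import sys
from pathlib import Path

LIMIT_BYTES = 1024**3  # ~1 GB threshold, approximating the PyPI limit


def check_wheels(wheel_dir: str = "wheelhouse") -> int:
    too_big = []
    for whl in sorted(Path(wheel_dir).glob("*.whl")):
        size = whl.stat().st_size
        if size > LIMIT_BYTES:
            too_big.append((whl.name, size))
            print(f"{whl.name}: {size / 1024**2:.0f} MiB exceeds the limit")
    return 1 if too_big else 0


if __name__ == "__main__":
    sys.exit(check_wheels(*sys.argv[1:]))
```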

@jameslamb
Member

Adding a link to this highly-relevant conversation happening on the Python discourse over the last 2 weeks.

https://discuss.python.org/t/enforcing-consistent-metadata-for-packages/50008/28

Some quotes that really stood out to me

Multiple times when we’ve discussed size limit requests and questions like “why are packages using CUDA so large?”, the suggestion given to package authors to reduce binary size consumption is to split out the non-Python parts into a separate wheel, and depend on that in the main package (which uses the CPython C API and has to be built 5 times if one supports 5 Python 3.x versions)

and

Similarly, for native dependencies, NumPy and SciPy both vendor the libopenblas shared library (see pypackaging-native’s page on this for more details ). It takes up about 67% of the numpy wheel sizes, and ~40% of scipy wheel sizes. With four minor Python versions supported, that’s 8x the same thing being vendored. We’d actually really like to unvendor that, and already have a separate wheel: scipy-openblas32 · PyPI . However, depending on it is forbidden without marking everything as dynamic, which isn’t great. So we’ve done all the hard work, dealing with packaging, symbol mangling and supporting functionality to safely load a shared library from another wheel. But the blocker is that we cannot express the dependency (important, we don’t want to ship an sdist for scipy-openblas32, it’s really only about unvendoring a binary).

This conversation is closely related to PEP 725 - "Specifying external dependencies in pyproject.toml" (link)

@vyasr
Contributor Author

vyasr commented Apr 17, 2024

Thanks James! We should probably chime in there at some point, but perhaps once we're a bit further along with our implementation.

@vyasr
Contributor Author

vyasr commented Apr 20, 2024

One thing that we should keep in mind while implementing this feature is that it may cause problems for our usage of sccache in CI. After this change, C++ library dependencies will now be found in other wheels instead of being downloaded via CPM. While CPM's downloads will always go to the same path, wheels will instead be downloaded into a different ephemeral virtual environment during builds every time. If sccache sees the different path as a different dependency (i.e. if the path change results in a cache miss) then we will end up recompiling artifacts far more frequently than we should. I'm not sure if this is the case, so it's something we'll have to experiment with if nobody else knows for sure either (@trxcllnt, @ajschmidt8, or @robertmaynard might know this already). If it is an issue, there are two ways out of this:

  1. The sure path: turn off build isolation and install dependencies manually into the root environment or into a manually created venv in a specified directory. Either option will produce consistent paths.
  2. The easier path, if it's viable: sccache may allow configuration of the key to force it to use hashes of files (included headers and linked libraries) exclusively instead of paths, in which case we could just do that and not worry about the ephemeral path changes. @robertmaynard mentioned that this might exist.

@trxcllnt

trxcllnt commented Apr 22, 2024

sccache doesn't allow overriding the computed hash (aside from respecting an additional envvar to hash with everything else), so you'll have to do --no-build-isolation and install dependencies into a consistent location. This is why the devcontainers and DLFW builds do this.

@robertmaynard

I was thinking about pre-processor mode (https://github.com/mozilla/sccache/blob/main/docs/Local.md#preprocessor-cache-mode), but that only allows you to ignore the working directory in the hash, not other directories.

Plus it doesn't work with non-local cache backends...

@vyasr
Contributor Author

vyasr commented Apr 23, 2024

OK yeah so be it, I figured no build isolation was where we'd end up but wanted to check. It would have been nice if sccache had added some feature that made this possible!

@jakirkham
Member

cc @raydouglass (for awareness)

rapids-bot bot pushed a commit to rapidsai/cudf that referenced this issue Sep 10, 2024
Follow-up to #15483.
Contributes to rapidsai/build-planning#33.

Adds a build-time dependency on `libkvikio` wheels for `libcudf` wheels (per #15483 (comment)).

With this change, CPM is no longer used to download and install the kvikio headers.

Before:

```text
  -- Found cuFile: /usr/local/cuda/lib64/libcufile.so
  -- CPM: Adding package [email protected] (branch-24.10)
```

([recent build link from branch-24.10](https://github.com/rapidsai/cudf/actions/runs/10780576194/job/29896649202#step:9:7673))

After:

```text
  -- KvikIO: Found cuFile Batch API: TRUE
  -- KvikIO: Found cuFile Stream API: TRUE
  -- CPM: Using local package [email protected]
```

([build link from this PR](https://github.com/rapidsai/cudf/actions/runs/10780504202/job/29896555443?pr=16778#step:9:7754))

## Notes for Reviewers

### This removes kvikio headers/CMake files from libcudf wheels

Cuts around 0.8 MB (23 files) out of `libcudf` wheels.

As of this PR, these would no longer be vendored in `libcudf` wheels:

```text
    0  09-08-2024 06:17   libcudf/include/kvikio/
    0  09-08-2024 06:17   libcudf/include/kvikio/shim/
 6356  09-08-2024 06:17   libcudf/include/kvikio/batch.hpp
 3812  09-08-2024 06:17   libcudf/include/kvikio/buffer.hpp
10499  09-08-2024 06:17   libcudf/include/kvikio/utils.hpp
 1399  09-08-2024 06:17   libcudf/include/kvikio/cufile_config.hpp
33385  09-08-2024 06:17   libcudf/include/kvikio/file_handle.hpp
 7299  09-08-2024 06:17   libcudf/include/kvikio/driver.hpp
 9678  09-08-2024 06:17   libcudf/include/kvikio/defaults.hpp
 5352  09-08-2024 06:17   libcudf/include/kvikio/stream.hpp
 6002  09-08-2024 06:17   libcudf/include/kvikio/error.hpp
 4501  09-08-2024 06:17   libcudf/include/kvikio/bounce_buffer.hpp
 3197  09-08-2024 06:17   libcudf/include/kvikio/parallel_operation.hpp
 9864  09-08-2024 06:17   libcudf/include/kvikio/posix_io.hpp
  717  09-08-2024 06:17   libcudf/include/kvikio/version_config.hpp
 4529  09-08-2024 06:17   libcudf/include/kvikio/shim/cuda.hpp
 3331  09-08-2024 06:17   libcudf/include/kvikio/shim/utils.hpp
 4055  09-08-2024 06:17   libcudf/include/kvikio/shim/cufile_h_wrapper.hpp
 2242  09-08-2024 06:17   libcudf/include/kvikio/shim/cuda_h_wrapper.hpp
 7510  09-08-2024 06:17   libcudf/include/kvikio/shim/cufile.hpp
    0  09-08-2024 06:17   libcudf/lib64/cmake/kvikio/
 5031  09-08-2024 06:17   libcudf/lib64/cmake/kvikio/kvikio-targets.cmake
 3681  09-08-2024 06:17   libcudf/lib64/cmake/kvikio/kvikio-config-version.cmake
 6915  09-08-2024 06:17   libcudf/lib64/cmake/kvikio/kvikio-config.cmake
 1529  09-08-2024 06:17   libcudf/lib64/cmake/kvikio/kvikio-dependencies.cmake
 3851  09-08-2024 06:17   libcudf/lib64/cmake/kvikio/FindcuFile.cmake
```

This is safe because kvikio is a PRIVATE dependency of `libcudf`.

https://github.com/rapidsai/cudf/blob/150f1b10ed9c702d5283216b746df685e1708716/cpp/CMakeLists.txt#L796-L802



Authors:
  - James Lamb (https://github.com/jameslamb)
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - Bradley Dice (https://github.com/bdice)

URL: #16778
@vyasr mentioned this issue Oct 9, 2024
@jameslamb
Member

I think as part of this work we should be enforcing that limit on wheels in CI across all the repos.

I split this proposal out into its own issue: #110

@jameslamb
Member

I've updated the task list here, but I need some help understanding the sequencing.

#33 (comment) said that symbol visibility issues in RAFT need to be resolved, tracked in rapidsai/raft#1722.

A bunch of PRs have gone in contributing to rapidsai/raft#1722, but that issue is still open... I'm not sure what's left for it.

That comment also said that creating a libraft wheel needed to wait until "the cuvs-raft split". @mmccarty said that was tracked in rapidsai/cuvs#113, which is now closed.

So am I right that these things need to be done in the following order?

  1. whatever symbol-visibility stuff remains for [FEA] RAFT should ensure all its symbols are hidden from shared object libraries raft#1722
  2. add a libraft wheel
  3. add libcuml, libcugraph, and libwholegraph wheels (any order)

cc @vyasr @robertmaynard

@vyasr
Contributor Author

vyasr commented Oct 22, 2024

We need to touch base with @cjnolet to get an update on what the current plan is for raft. There are a few questions that we need answers to, mostly around what the cuvs-raft relationship is going to wind up being and whether raft will still become header-only as was originally planned. In the scramble around cuvs there were some instances where the ideas were reconsidered and I don't know what the current plan is and what the timeline is. I'd like to minimize duplicate work around this as much as possible since some cases will have more pitfalls than others and it would be wasteful to go down a rabbit hole that we expect to vanish eventually anyway.

@robertmaynard

A bunch of PRs have gone in contributing to rapidsai/raft#1722, but that issue is still open... I'm not sure what's left for it.

Jake's original proposal also included annotating every host template function in RAFT (e.g. ~90% of RAFT host code) with `__attribute__((visibility("hidden")))`. That is a massive change and most likely breaks the ability to pass RAFT types across DSO boundaries.
Given the constraints that RAFT has (cross-DSO support), I think we could close the issue now.

So am I right that these things need to be done in the following order?

  1. whatever symbol-visibility stuff remains for [FEA] RAFT should ensure all its symbols are hidden from shared object libraries raft#1722
  2. add a libraft wheel
  3. add libcuml, libcugraph, and libwholegraph wheels (any order)

We can skip steps 1 and 2 and go straight to step 3. I expect the libraft wheel will have minimal value going forward (as measured by library size), and it is not needed for correctness when building libcuml or libcugraph.

rapids-bot bot pushed a commit to rapidsai/kvikio that referenced this issue Nov 8, 2024
Related to rapidsai/build-planning#33 and rapidsai/build-planning#74

The last use of CMake function `install_aliased_imported_targets()` here was removed in #478. This proposes removing the file holding its definition.

Authors:
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - Kyle Edwards (https://github.com/KyleFromNVIDIA)

URL: #545
rapids-bot bot pushed a commit to rapidsai/cudf that referenced this issue Nov 8, 2024
Related to rapidsai/build-planning#33 and rapidsai/build-planning#74

The last use of CMake function `install_aliased_imported_targets()` here was removed in #16946. This proposes removing the file holding its definition.

Authors:
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - Kyle Edwards (https://github.com/KyleFromNVIDIA)

URL: #17276
rapids-bot bot pushed a commit to rapidsai/ucxx that referenced this issue Dec 19, 2024
Follow-up to #260.

Contributes to rapidsai/build-planning#33

Limits `libucxx` wheel-building to just running once per combination of `(CUDA version, CPU architecture)`... cutting out 8 unnecessary CI jobs per commit.

## Notes for Reviewers

### Why is this safe to do?

Unlike wheels that have Cython code, `libucxx` wheels don't depend on the Python minor version

https://github.com/rapidsai/ucxx/blob/ec860d901f944625e506d85adc0e08021fa4ffd4/python/libucxx/pyproject.toml#L48

e.g., they have tags like

```text
libucxx_cu12-0.42.0a18-py3-none-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl
```

Similar filters are being used for most C++ wheel builds across RAPIDS, e.g. https://github.com/rapidsai/cudf/blob/a95fbc88f94df24c3418766fbbea5b6633ff2328/.github/workflows/pr.yaml#L222-L230

Authors:
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - Peter Andreas Entschev (https://github.com/pentschev)
  - Mike Sarahan (https://github.com/msarahan)

URL: #344
@jameslamb
Member

Putting this here in a central place to link to from multiple PRs... end state I'm trying to get to with these PRs:

The lib{project} wheels are used at build time to provide lib{project}.so libraries and the headers to link against them.

```mermaid
---
title: Build dependencies
---
flowchart LR
    A[libraft] --> B[pylibraft]
    A --> C[raft-dask]
    B --> C

    D[libcugraph] --> E[pylibcugraph]
    D --> F[cugraph]
    E --> F
    A --> D
    A --> E
    A --> F
    B --> E
    B --> F

    G[libcuml] --> H[cuml]
    G --> H
    A --> G
    A --> H
    B --> H
```

The lib{project} wheels are used at run time to dynamically load those libraries via dlopen(), so downstream projects don't have to rely on RPATHs.

(the same approach we've been using for RAPIDS C++ wheels, e.g. how libcuspatial loads libcudf.so and then libcuspatial.so (code link))

```mermaid
---
title: Runtime dependencies
---
flowchart LR
    A[libraft] --> B[pylibraft]
    A --> C[raft-dask]
    B --> C

    D[libcugraph] --> E[pylibcugraph]
    D --> F[cugraph]
    E --> F
    A --> D
    C --> F
    B --> E
    B --> F

    G[libcuml] --> H[cuml]
    G --> H
    A --> G
    B --> H
    C --> H
```

Whenever we get there, libwholegraph and libcuvs could be set up in similar ways.
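
On the consuming side, a hedged sketch of that import-order/runtime-loading pattern (package and module names here are illustrative, not the exact RAPIDS code):

```python
# Hypothetical __init__.py for a Python wheel (e.g. a pylibraft-style package)
# whose compiled extensions dynamically link against a C++ wheel.
import libraft  # the C++ wheel's Python shim (illustrative name)

# Load libraft.so with RTLD_GLOBAL *before* importing any extension module
# that links against it, so its symbols can be resolved without an RPATH.
libraft.load_library()

# Now the compiled Cython extensions can be imported safely.
from ._extension import *  # noqa: E402,F401,F403  (illustrative module name)
```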

raydouglass pushed a commit to rapidsai/raft that referenced this issue Jan 16, 2025
Replaces #2306, contributes to
rapidsai/build-planning#33.

Proposes packaging `libraft` as a wheel, which is then re-used by:

* `pylibraft-cu{11,12}` and `raft-cu{11,12}` (this PR)
* `libcugraph-cu{11,12}`, `pylibcugraph-cu{11,12}`, and
`cugraph-cu{11,12}` in rapidsai/cugraph#4804
* `libcuml-cu{11,12}` and `cuml-cu{11,12}` in
rapidsai/cuml#6199

As part of this, also proposes:

* introducing a new CMake option, `RAFT_COMPILE_DYNAMIC_ONLY`, to allow
building/installing only the dynamic shared library (i.e. skipping the
static library)
* enforcing `rapids-cmake`'s preferred CMake style
(#2531 (comment))
* making wheel-building CI jobs always depend on other wheel-building CI
jobs, not tests or `*-publish` (to reduce end-to-end CI time)

## Notes for Reviewers

### Benefits of these changes

* smaller wheels (see "Size Changes" below)
* faster compile times (no more re-compiling RAFT in cuGraph and cuML
CI)
* other benefits mentioned in
rapidsai/build-planning#33

### Wheel contents

`libraft`:

* `libraft.so` (shared library)
* RAFT headers
* vendored dependencies (`fmt`, CCCL, `cuco`, `cute`, `cutlass`)

`pylibraft`:

* `pylibraft` Python / Cython code and compiled Cython extensions

`raft-dask`:

* `raft-dask` Python / Cython code and compiled Cython extension

### Dependency Flows

In short.... `libraft` contains a `libraft.so` dynamic library and the
headers to link against it.

* Anything that needs to link against RAFT at build time pulls in
`libraft` wheels as a build dependency.
* Anything that needs RAFT's symbols at runtime pulls it in as a runtime
dependency, and calls `libraft.load_library()`.

For more details and some flowcharts, see
rapidsai/build-planning#33 (comment)

### Size changes (CUDA 12, Python 3.12, x86_64)

| wheel | num files (before) | num files (these PRs) | size (before) | size (these PRs) |
|:---------------:|------------------:|-----------------:|--------------:|-------------:|
| `libraft` | --- | 3169 | --- | 19M |
| `pylibraft` | 64 | 63 | 11M | 1M |
| `raft-dask` | 29 | 28 | 188M | 188M |
| `libcugraph` | --- | 1762 | --- | 903M |
| `pylibcugraph` | 190 | 187 | 901M | 2M |
| `cugraph` | 315 | 313 | 899M | 3.0M |
| `libcuml` | --- | 1766 | --- | 289M |
| `cuml` | 442 | --- | 517M | --- |
| **TOTAL** | **1,040** | **7,268** | **2,516M** | **1,405M** |

*NOTES: size = compressed, "before" = 2025-01-13 nightlies*

<details><summary>how I calculated those (click me)</summary>

* `cugraph`: nightly commit =
rapidsai/cugraph@8507cbf,
PR = rapidsai/cugraph#4804
* `cuml`: nightly commit =
rapidsai/cuml@7c715c4,
PR = rapidsai/cuml#6199
* `raft`: nightly commit =
1b62c41,
PR = this PR

```shell
docker run \
    --rm \
    --network host \
    --env RAPIDS_NIGHTLY_DATE=2025-01-13 \
    --env CUGRAPH_NIGHTLY_SHA=8507cbf63db2f349136b266d3e6e787b189f45a0 \
    --env CUGRAPH_PR="pull-request/4804" \
    --env CUGRAPH_PR_SHA="2ef32eaa006a84c0bd16220bb8e8af34198fbee8" \
    --env CUML_NIGHTLY_SHA=7c715c494dff71274d0fdec774bdee12a7e78827 \
    --env CUML_PR="pull-request/6199" \
    --env CUML_PR_SHA="2ef32eaa006a84c0bd16220bb8e8af34198fbee8" \
    --env RAFT_NIGHTLY_SHA=1b62c4117a35b11ce3c830daae248e32ebf75e3f \
    --env RAFT_PR="pull-request/2531" \
    --env RAFT_PR_SHA="0d6597b08919f2aae8ac268f1a68d6a8fe5beb4e" \
    --env RAPIDS_PY_CUDA_SUFFIX=cu12 \
    --env WHEEL_DIR_BEFORE=/tmp/wheels-before \
    --env WHEEL_DIR_AFTER=/tmp/wheels-after \
    -it rapidsai/ci-wheel:cuda12.5.1-rockylinux8-py3.12 \
    bash

# --- nightly wheels --- #
mkdir -p ./wheels-before

export RAPIDS_BUILD_TYPE=branch
export RAPIDS_REF_NAME="branch-25.02"

# pylibraft
RAPIDS_PY_WHEEL_NAME="pylibraft_${RAPIDS_PY_CUDA_SUFFIX}" \
RAPIDS_REPOSITORY=rapidsai/raft \
RAPIDS_SHA=${RAFT_NIGHTLY_SHA} \
    rapids-download-wheels-from-s3 python ./wheels-before

# raft-dask
RAPIDS_PY_WHEEL_NAME="raft_dask_${RAPIDS_PY_CUDA_SUFFIX}" \
RAPIDS_REPOSITORY=rapidsai/raft \
RAPIDS_SHA=${RAFT_NIGHTLY_SHA} \
    rapids-download-wheels-from-s3 python ./wheels-before

# cugraph
RAPIDS_PY_WHEEL_NAME="cugraph_${RAPIDS_PY_CUDA_SUFFIX}" \
RAPIDS_REPOSITORY=rapidsai/cugraph \
RAPIDS_SHA=${CUGRAPH_NIGHTLY_SHA} \
    rapids-download-wheels-from-s3 python ./wheels-before

# pylibcugraph
RAPIDS_PY_WHEEL_NAME="pylibcugraph_${RAPIDS_PY_CUDA_SUFFIX}" \
RAPIDS_REPOSITORY=rapidsai/cugraph \
RAPIDS_SHA=${CUGRAPH_NIGHTLY_SHA} \
    rapids-download-wheels-from-s3 python ./wheels-before

# cuml
RAPIDS_PY_WHEEL_NAME="cuml_${RAPIDS_PY_CUDA_SUFFIX}" \
RAPIDS_REPOSITORY=rapidsai/cuml \
RAPIDS_SHA=${CUML_NIGHTLY_SHA} \
    rapids-download-wheels-from-s3 python ./wheels-before

# --- wheels from CI --- #
mkdir -p ./wheels-after

export RAPIDS_BUILD_TYPE="pull-request"

# libraft
RAPIDS_PY_WHEEL_NAME="libraft_${RAPIDS_PY_CUDA_SUFFIX}" \
RAPIDS_REPOSITORY=rapidsai/raft \
RAPIDS_REF_NAME="${RAFT_PR}" \
RAPIDS_SHA="${RAFT_PR_SHA}" \
    rapids-download-wheels-from-s3 cpp ./wheels-after

# pylibraft
RAPIDS_PY_WHEEL_NAME="pylibraft_${RAPIDS_PY_CUDA_SUFFIX}" \
RAPIDS_REPOSITORY=rapidsai/raft \
RAPIDS_REF_NAME="${RAFT_PR}" \
RAPIDS_SHA="${RAFT_PR_SHA}" \
    rapids-download-wheels-from-s3 python ./wheels-after

# raft-dask
RAPIDS_PY_WHEEL_NAME="raft_dask_${RAPIDS_PY_CUDA_SUFFIX}" \
RAPIDS_REPOSITORY=rapidsai/raft \
RAPIDS_REF_NAME="${RAFT_PR}" \
RAPIDS_SHA="${RAFT_PR_SHA}" \
    rapids-download-wheels-from-s3 python ./wheels-after

# libcugraph
RAPIDS_PY_WHEEL_NAME="libcugraph_${RAPIDS_PY_CUDA_SUFFIX}" \
RAPIDS_REPOSITORY=rapidsai/cugraph \
RAPIDS_REF_NAME="${CUGRAPH_PR}" \
RAPIDS_SHA="${CUGRAPH_PR_SHA}" \
    rapids-download-wheels-from-s3 cpp ./wheels-after

# pylibcugraph
RAPIDS_PY_WHEEL_NAME="pylibcugraph_${RAPIDS_PY_CUDA_SUFFIX}" \
RAPIDS_REPOSITORY=rapidsai/cugraph \
RAPIDS_REF_NAME="${CUGRAPH_PR}" \
RAPIDS_SHA="${CUGRAPH_PR_SHA}" \
    rapids-download-wheels-from-s3 python ./wheels-after

# cugraph
RAPIDS_PY_WHEEL_NAME="cugraph_${RAPIDS_PY_CUDA_SUFFIX}" \
RAPIDS_REPOSITORY=rapidsai/cugraph \
RAPIDS_REF_NAME="${CUGRAPH_PR}" \
RAPIDS_SHA="${CUGRAPH_PR_SHA}" \
    rapids-download-wheels-from-s3 python ./wheels-after

# libcuml
RAPIDS_PY_WHEEL_NAME="libcuml_${RAPIDS_PY_CUDA_SUFFIX}" \
RAPIDS_REPOSITORY=rapidsai/cuml \
RAPIDS_REF_NAME="${CUML_PR}" \
RAPIDS_SHA="${CUML_PR_SHA}" \
    rapids-download-wheels-from-s3 cpp ./wheels-after

# cuml
RAPIDS_PY_WHEEL_NAME="cuml_${RAPIDS_PY_CUDA_SUFFIX}" \
RAPIDS_REPOSITORY=rapidsai/cuml \
RAPIDS_REF_NAME="${CUML_PR}" \
RAPIDS_SHA="${CUML_PR_SHA}" \
    rapids-download-wheels-from-s3 python ./wheels-after

pip install pydistcheck
pydistcheck \
    --inspect \
    --select 'distro-too-large-compressed' \
    ./wheels-before/*.whl \
| grep -E '^checking|files: | compressed' \
> ./before.txt

# get more exact sizes
du -sh ./wheels-before/*

pydistcheck \
    --inspect \
    --select 'distro-too-large-compressed' \
    ./wheels-after/*.whl \
| grep -E '^checking|files: | compressed' \
> ./after.txt

# get more exact sizes
du -sh ./wheels-after/*
```

</details>

### How I tested this

These other PRs:

* rapidsai/devcontainers#435
* rapidsai/cugraph-gnn#110
* rapidsai/cuml#6199
* rapidsai/cugraph#4804
bdice added a commit to rapidsai/devcontainers that referenced this issue Jan 18, 2025
Contributes to rapidsai/build-planning#33

Adjusts `rapids-build-utils` manifest for release 25.02 to account for
the introduction of new `libcugraph` wheels
(rapidsai/cugraph#4804).

## Notes for Reviewers

This shouldn't be merged while it's still pointing at my forks. Plan:

1. admin-merge rapidsai/cugraph#4804 once
everything except devcontainers CI there is passing
2. point this PR at upstream `rapidsai/cugraph`
3. observe CI passing and merge this normally (or admin-merge to save
time)

---------

Co-authored-by: Bradley Dice <[email protected]>
Co-authored-by: Paul Taylor <[email protected]>
rapids-bot bot pushed a commit to rapidsai/cugraph that referenced this issue Jan 18, 2025
Replaces #4340, contributes to rapidsai/build-planning#33.

Proposes packaging `libcugraph` as a wheel, which is then re-used by `cugraph-cu{11,12}` and `pylibcugraph-cu{11,12}` wheels.

## Notes for Reviewers

### Benefits of these changes

* smaller wheels (see "Size Changes" below)
  - *no more `pylibcugraph` and `cugraph` both holding copies of libcugraph.so*
* faster compile times
  - *no more re-compiling RAFT, thanks to rapidsai/raft#2531*
  - *no more recompiling libcugraph.so in both `pylibcugraph` and `cugraph` wheel builds*
* other benefits mentioned in rapidsai/build-planning#33

### Wheel contents

`libcugraph`:

* `libcugraph.so` (shared library)
* cuGraph headers
* vendored dependencies (`fmt`, `spdlog`, CCCL, `cuco`)

`pylibcugraph`:

* `pylibcugraph` Python / Cython code and compiled Cython extensions

`cugraph`:

* `cugraph` Python / Cython code and compiled Cython extension

### Dependency Flows

In short.... `libcugraph` contains the `libcugraph.so` and `libcugraph_c.so` dynamic libraries and the headers to link against them.

* Anything that needs to link against cuGraph at build time pulls in `libcugraph` wheels as a build dependency.
* Anything that needs cuGraph's symbols at runtime pulls it in as a runtime dependency, and calls `libcugraph.load_library()`.

For more details and some flowcharts, see rapidsai/build-planning#33 (comment)

### Size changes (CUDA 12, Python 3.12, x86_64)

| wheel                | num files (before) | num files (this PR) | size (before)  | size (this PR) |
|:---------------:|------------------:|-----------------:|--------------:|-------------:|
| `libcugraph`     |   ---                       |  1762                     | ---                   | 903M       |
| `pylibcugraph` |  190                      |   187                       | 901M              | 2M                 |
| `cugraph`         |  315                      |   313                      | 899M              | 3M                 |
|**TOTAL**          |   **505**         |   **2,262**               | **1,800M**                 | **908M**    |

*NOTES: size = compressed, "before" = 2025-01-13 nightlies*

*This is a cuGraph-specific slice of the table from rapidsai/raft#2531. See that PR for details.*

### How I tested this

These other PRs:

* rapidsai/devcontainers#435
* rapidsai/cugraph-gnn#110

Authors:
  - James Lamb (https://github.com/jameslamb)
  - Ralph Liu (https://github.com/nv-rliu)
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - Brad Rees (https://github.com/BradReesWork)
  - Bradley Dice (https://github.com/bdice)

URL: #4804