WIP: [DO NOT MERGE] introduce libcuml wheels #6199

Draft: wants to merge 60 commits into branch-25.02


@jameslamb (Member) commented Dec 30, 2024

Replaces #6006, contributes to rapidsai/build-planning#33.

Proposes packaging libcuml as a wheel, which is then re-used by cuml-cu{11,12} wheels.

Notes for Reviewers

If you see this note, that means this is not ready for review.

Benefits of these changes

Wheel contents

libcuml:

  • libcuml++.so (shared library) and its headers
  • libcumlprims_mg.so (shared library) and its headers
  • other vendored dependencies (CCCL, fmt)

cuml:

  • cuml Python / Cython code and compiled Cython extensions
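Because the point of the split is that the shared libraries ship only in libcuml, one quick sanity check is to confirm that no shared library appears in both wheels. A minimal sketch, assuming bash and python3 are available; duplicate_shared_libs is a hypothetical helper, not part of the RAPIDS CI scripts. It relies only on the fact that wheels are zip archives.

```shell
# Hypothetical helper: print basenames of shared libraries (*.so, *.so.N)
# that appear in BOTH wheels. Wheels are zip archives, so Python's zipfile
# module can read them without extracting anything.
duplicate_shared_libs() {
  python3 - "$1" "$2" <<'EOF'
import os
import re
import sys
import zipfile

def shared_libs(path):
    # Match "libfoo.so" and versioned names such as "libfoo.so.25".
    pattern = re.compile(r"\.so(\.\d+)*$")
    with zipfile.ZipFile(path) as whl:
        return {os.path.basename(n) for n in whl.namelist() if pattern.search(n)}

for name in sorted(shared_libs(sys.argv[1]) & shared_libs(sys.argv[2])):
    print(name)
EOF
}
```

Empty output means nothing is vendored twice; a name like libcuml++.so in the output would mean the cuml wheel is still carrying its own copy.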

Dependency Flows

In short, libcuml contains the libcuml++.so and libcumlprims_mg.so dynamic libraries and the headers needed to link against them.

  • Anything that needs to link against cuML at build time pulls in libcuml wheels as a build dependency.
  • Anything that needs cuML's symbols at runtime pulls it in as a runtime dependency, and calls libcuml.load_library().
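The runtime half of this flow exists because libcuml++.so lives inside the libcuml package under site-packages, which is not on the dynamic loader's default search path. A minimal sketch for seeing which shared-library dependencies of a compiled file the loader can't resolve on its own, assuming a glibc-based Linux system with ldd; missing_shared_deps is a hypothetical helper. ldd only consults the normal search (RPATH/RUNPATH, LD_LIBRARY_PATH, system directories), so it shows what would be missing if nothing had preloaded the library.

```shell
# Hypothetical helper: list the DT_NEEDED libraries of an ELF file that the
# dynamic loader cannot resolve through its normal search path
# (ldd marks these "=> not found").
missing_shared_deps() {
  ldd "$1" 2>/dev/null | awk '/=> not found/ { print $1 }'
}
```

Pointed at one of the cuml Cython extensions, we'd expect libcuml++.so to be listed unless the extension carries an RPATH to it; that expectation is an assumption, not something verified on this PR.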

For more details and some flowcharts, see rapidsai/build-planning#33 (comment)

Size changes (CUDA 12, Python 3.12, x86_64)

| wheel | num files (before) | num files (this PR) | size (before) | size (this PR) |
|:---|---:|---:|---:|---:|
| libcuml | --- | --- | --- | --- |
| cuml | 442 | --- | 517M | --- |
| TOTAL | 442 | --- | 517M | --- |

NOTES: size = compressed, "before" = 2025-01-13 nightlies

how I calculated those (click me)
  • nightly commit = 7c715c4
  • PR = this PR
docker run \
    --rm \
    --network host \
    --env RAPIDS_NIGHTLY_DATE=2025-01-13 \
    --env CUML_NIGHTLY_SHA=7c715c494dff71274d0fdec774bdee12a7e78827 \
    --env CUML_PR="pull-request/6199" \
    --env CUML_PR_SHA="7c715c494dff71274d0fdec774bdee12a7e78827" \
    --env RAPIDS_PY_CUDA_SUFFIX=cu12 \
    --env WHEEL_DIR_BEFORE=/tmp/wheels-before \
    --env WHEEL_DIR_AFTER=/tmp/wheels-after \
    -it rapidsai/ci-wheel:cuda12.5.1-rockylinux8-py3.12 \
    bash

# run everything below from the shell inside the container started above

# --- nightly wheels --- #
mkdir -p ./wheels-before

export RAPIDS_BUILD_TYPE=branch
export RAPIDS_REF_NAME="branch-25.02"

# cuml
RAPIDS_PY_WHEEL_NAME="cuml_${RAPIDS_PY_CUDA_SUFFIX}" \
RAPIDS_REPOSITORY=rapidsai/cuml \
RAPIDS_SHA=${CUML_NIGHTLY_SHA} \
    rapids-download-wheels-from-s3 python ./wheels-before

# --- wheels from CI --- #
mkdir -p ./wheels-after

export RAPIDS_BUILD_TYPE="pull-request"

# libcuml
RAPIDS_PY_WHEEL_NAME="libcuml_${RAPIDS_PY_CUDA_SUFFIX}" \
RAPIDS_REPOSITORY=rapidsai/cuml \
RAPIDS_REF_NAME="${CUML_PR}" \
RAPIDS_SHA="${CUML_PR_SHA}" \
    rapids-download-wheels-from-s3 cpp ./wheels-after

# cuml
RAPIDS_PY_WHEEL_NAME="cuml_${RAPIDS_PY_CUDA_SUFFIX}" \
RAPIDS_REPOSITORY=rapidsai/cuml \
RAPIDS_REF_NAME="${CUML_PR}" \
RAPIDS_SHA="${CUML_PR_SHA}" \
    rapids-download-wheels-from-s3 python ./wheels-after

pip install pydistcheck
pydistcheck \
    --inspect \
    --select 'distro-too-large-compressed' \
    ./wheels-before/*.whl \
| grep -E '^checking|files: | compressed' \
> ./before.txt

# get more exact sizes
du -sh ./wheels-before/*

pydistcheck \
    --inspect \
    --select 'distro-too-large-compressed' \
    ./wheels-after/*.whl \
| grep -E '^checking|files: | compressed' \
> ./after.txt

# get more exact sizes
du -sh ./wheels-after/*
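Each download above passes the same four environment variables as one-off prefixes to rapids-download-wheels-from-s3. If that repetition gets unwieldy, it could be folded into a helper. This is a hedged sketch: download_wheel is a hypothetical name, and it assumes the downloader reads exactly the variables used above (RAPIDS_PY_WHEEL_NAME, RAPIDS_REPOSITORY, RAPIDS_REF_NAME, RAPIDS_SHA) plus an exported RAPIDS_BUILD_TYPE, takes cpp or python as its first argument, and takes the output directory as its second.

```shell
# Hypothetical helper wrapping the per-wheel download pattern used above.
# Arguments: package name (without the CUDA suffix), GitHub repo, ref name,
# commit SHA, wheel kind ("cpp" or "python"), and destination directory.
# Expects RAPIDS_PY_CUDA_SUFFIX to be set and RAPIDS_BUILD_TYPE exported.
download_wheel() {
  local name="$1" repo="$2" ref="$3" sha="$4" kind="$5" dest="$6"
  mkdir -p "${dest}"
  # One-off assignments: these variables are placed only in the
  # environment of this single downloader invocation.
  RAPIDS_PY_WHEEL_NAME="${name}_${RAPIDS_PY_CUDA_SUFFIX}" \
  RAPIDS_REPOSITORY="${repo}" \
  RAPIDS_REF_NAME="${ref}" \
  RAPIDS_SHA="${sha}" \
    rapids-download-wheels-from-s3 "${kind}" "${dest}"
}
```

For example, the libcuml step would become download_wheel libcuml rapidsai/cuml "${CUML_PR}" "${CUML_PR_SHA}" cpp ./wheels-after, and the nightly steps would pass "${RAPIDS_REF_NAME}" as the ref.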

How I tested this

These other PRs:

  • (TODO: add devcontainers PR)

@jameslamb added the labels 2 - In Progress (Currently a work in progress) and 5 - DO NOT MERGE (Hold off on merging; see PR for details) Dec 30, 2024

@github-actions (bot) removed the CUDA/C++ label Dec 31, 2024
@github-actions (bot) added the conda and CUDA/C++ labels Jan 2, 2025
raydouglass pushed a commit to rapidsai/raft that referenced this pull request Jan 16, 2025
Replaces #2306, contributes to
rapidsai/build-planning#33.

Proposes packaging `libraft` as a wheel, which is then re-used by:

* `pylibraft-cu{11,12}` and `raft-cu{11,12}` (this PR)
* `libcugraph-cu{11,12}`, `pylibcugraph-cu{11,12}`, and
`cugraph-cu{11,12}` in rapidsai/cugraph#4804
* `libcuml-cu{11,12}` and `cuml-cu{11,12}` in
rapidsai/cuml#6199

As part of this, also proposes:

* introducing a new CMake option, `RAFT_COMPILE_DYNAMIC_ONLY`, to allow
building/installing only the dynamic shared library (i.e. skipping the
static library)
* enforcing `rapids-cmake`'s preferred CMake style
(#2531 (comment))
* making wheel-building CI jobs always depend on other wheel-building CI
jobs, not tests or `*-publish` (to reduce end-to-end CI time)

## Notes for Reviewers

### Benefits of these changes

* smaller wheels (see "Size Changes" below)
* faster compile times (no more re-compiling RAFT in cuGraph and cuML
CI)
* other benefits mentioned in
rapidsai/build-planning#33

### Wheel contents

`libraft`:

* `libraft.so` (shared library)
* RAFT headers
* vendored dependencies (`fmt`, CCCL, `cuco`, `cute`, `cutlass`)

`pylibraft`:

* `pylibraft` Python / Cython code and compiled Cython extensions

`raft-dask`:

* `raft-dask` Python / Cython code and compiled Cython extension

### Dependency Flows

In short, `libraft` contains a `libraft.so` dynamic library and the
headers to link against it.

* Anything that needs to link against RAFT at build time pulls in
`libraft` wheels as a build dependency.
* Anything that needs RAFT's symbols at runtime pulls it in as a runtime
dependency, and calls `libraft.load_library()`.

For more details and some flowcharts, see
rapidsai/build-planning#33 (comment)

### Size changes (CUDA 12, Python 3.12, x86_64)

| wheel | num files (before) | num files (these PRs) | size (before) | size (these PRs) |
|:---------------:|------------------:|-----------------:|--------------:|-------------:|
| `libraft` | --- | 3169 | --- | 19M |
| `pylibraft` | 64 | 63 | 11M | 1M |
| `raft-dask` | 29 | 28 | 188M | 188M |
| `libcugraph` | --- | 1762 | --- | 903M |
| `pylibcugraph` | 190 | 187 | 901M | 2M |
| `cugraph` | 315 | 313 | 899M | 3.0M |
| `libcuml` | --- | 1766 | --- | 289M |
| `cuml` | 442 | --- | 517M | --- |
|**TOTAL** | **1,040** | **7,268** | **2,516M** | **1,405M** |

*NOTES: size = compressed, "before" = 2025-01-13 nightlies*
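The TOTAL size columns in the table above are plain sums of the per-wheel rows, with `---` cells contributing nothing. A minimal awk sketch for recomputing such a column, assuming one size per line in the table's `NNNM` notation; `sum_sizes_mb` is a hypothetical helper.

```shell
# Hypothetical helper: sum a column of sizes written like "19M" or "3.0M",
# skipping "---" placeholders and blank lines, and print the total in the
# same "M" units.
sum_sizes_mb() {
  awk '
    NF == 0 || $1 == "---" { next }
    {
      size = $1
      sub(/M$/, "", size)  # drop the unit suffix
      total += size        # awk converts "3.0" to the number 3 here
    }
    END { printf "%gM\n", total }
  '
}
```

Feeding it the `size (before)` column gives `2516M` and the `size (these PRs)` column gives `1405M`, matching the TOTAL row.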

<details><summary>how I calculated those (click me)</summary>

* `cugraph`: nightly commit =
rapidsai/cugraph@8507cbf,
PR = rapidsai/cugraph#4804
* `cuml`: nightly commit =
rapidsai/cuml@7c715c4,
PR = rapidsai/cuml#6199
* `raft`: nightly commit =
1b62c41,
PR = this PR

```shell
docker run \
    --rm \
    --network host \
    --env RAPIDS_NIGHTLY_DATE=2025-01-13 \
    --env CUGRAPH_NIGHTLY_SHA=8507cbf63db2f349136b266d3e6e787b189f45a0 \
    --env CUGRAPH_PR="pull-request/4804" \
    --env CUGRAPH_PR_SHA="2ef32eaa006a84c0bd16220bb8e8af34198fbee8" \
    --env CUML_NIGHTLY_SHA=7c715c494dff71274d0fdec774bdee12a7e78827 \
    --env CUML_PR="pull-request/6199" \
    --env CUML_PR_SHA="2ef32eaa006a84c0bd16220bb8e8af34198fbee8" \
    --env RAFT_NIGHTLY_SHA=1b62c4117a35b11ce3c830daae248e32ebf75e3f \
    --env RAFT_PR="pull-request/2531" \
    --env RAFT_PR_SHA="0d6597b08919f2aae8ac268f1a68d6a8fe5beb4e" \
    --env RAPIDS_PY_CUDA_SUFFIX=cu12 \
    --env WHEEL_DIR_BEFORE=/tmp/wheels-before \
    --env WHEEL_DIR_AFTER=/tmp/wheels-after \
    -it rapidsai/ci-wheel:cuda12.5.1-rockylinux8-py3.12 \
    bash

# run everything below from the shell inside the container started above

# --- nightly wheels --- #
mkdir -p ./wheels-before

export RAPIDS_BUILD_TYPE=branch
export RAPIDS_REF_NAME="branch-25.02"

# pylibraft
RAPIDS_PY_WHEEL_NAME="pylibraft_${RAPIDS_PY_CUDA_SUFFIX}" \
RAPIDS_REPOSITORY=rapidsai/raft \
RAPIDS_SHA=${RAFT_NIGHTLY_SHA} \
    rapids-download-wheels-from-s3 python ./wheels-before

# raft-dask
RAPIDS_PY_WHEEL_NAME="raft_dask_${RAPIDS_PY_CUDA_SUFFIX}" \
RAPIDS_REPOSITORY=rapidsai/raft \
RAPIDS_SHA=${RAFT_NIGHTLY_SHA} \
    rapids-download-wheels-from-s3 python ./wheels-before

# cugraph
RAPIDS_PY_WHEEL_NAME="cugraph_${RAPIDS_PY_CUDA_SUFFIX}" \
RAPIDS_REPOSITORY=rapidsai/cugraph \
RAPIDS_SHA=${CUGRAPH_NIGHTLY_SHA} \
    rapids-download-wheels-from-s3 python ./wheels-before

# pylibcugraph
RAPIDS_PY_WHEEL_NAME="pylibcugraph_${RAPIDS_PY_CUDA_SUFFIX}" \
RAPIDS_REPOSITORY=rapidsai/cugraph \
RAPIDS_SHA=${CUGRAPH_NIGHTLY_SHA} \
    rapids-download-wheels-from-s3 python ./wheels-before

# cuml
RAPIDS_PY_WHEEL_NAME="cuml_${RAPIDS_PY_CUDA_SUFFIX}" \
RAPIDS_REPOSITORY=rapidsai/cuml \
RAPIDS_SHA=${CUML_NIGHTLY_SHA} \
    rapids-download-wheels-from-s3 python ./wheels-before

# --- wheels from CI --- #
mkdir -p ./wheels-after

export RAPIDS_BUILD_TYPE="pull-request"

# libraft
RAPIDS_PY_WHEEL_NAME="libraft_${RAPIDS_PY_CUDA_SUFFIX}" \
RAPIDS_REPOSITORY=rapidsai/raft \
RAPIDS_REF_NAME="${RAFT_PR}" \
RAPIDS_SHA="${RAFT_PR_SHA}" \
    rapids-download-wheels-from-s3 cpp ./wheels-after

# pylibraft
RAPIDS_PY_WHEEL_NAME="pylibraft_${RAPIDS_PY_CUDA_SUFFIX}" \
RAPIDS_REPOSITORY=rapidsai/raft \
RAPIDS_REF_NAME="${RAFT_PR}" \
RAPIDS_SHA="${RAFT_PR_SHA}" \
    rapids-download-wheels-from-s3 python ./wheels-after

# raft-dask
RAPIDS_PY_WHEEL_NAME="raft_dask_${RAPIDS_PY_CUDA_SUFFIX}" \
RAPIDS_REPOSITORY=rapidsai/raft \
RAPIDS_REF_NAME="${RAFT_PR}" \
RAPIDS_SHA="${RAFT_PR_SHA}" \
    rapids-download-wheels-from-s3 python ./wheels-after

# libcugraph
RAPIDS_PY_WHEEL_NAME="libcugraph_${RAPIDS_PY_CUDA_SUFFIX}" \
RAPIDS_REPOSITORY=rapidsai/cugraph \
RAPIDS_REF_NAME="${CUGRAPH_PR}" \
RAPIDS_SHA="${CUGRAPH_PR_SHA}" \
    rapids-download-wheels-from-s3 cpp ./wheels-after

# pylibcugraph
RAPIDS_PY_WHEEL_NAME="pylibcugraph_${RAPIDS_PY_CUDA_SUFFIX}" \
RAPIDS_REPOSITORY=rapidsai/cugraph \
RAPIDS_REF_NAME="${CUGRAPH_PR}" \
RAPIDS_SHA="${CUGRAPH_PR_SHA}" \
    rapids-download-wheels-from-s3 python ./wheels-after

# cugraph
RAPIDS_PY_WHEEL_NAME="cugraph_${RAPIDS_PY_CUDA_SUFFIX}" \
RAPIDS_REPOSITORY=rapidsai/cugraph \
RAPIDS_REF_NAME="${CUGRAPH_PR}" \
RAPIDS_SHA="${CUGRAPH_PR_SHA}" \
    rapids-download-wheels-from-s3 python ./wheels-after

# libcuml
RAPIDS_PY_WHEEL_NAME="libcuml_${RAPIDS_PY_CUDA_SUFFIX}" \
RAPIDS_REPOSITORY=rapidsai/cuml \
RAPIDS_REF_NAME="${CUML_PR}" \
RAPIDS_SHA="${CUML_PR_SHA}" \
    rapids-download-wheels-from-s3 cpp ./wheels-after

# cuml
RAPIDS_PY_WHEEL_NAME="cuml_${RAPIDS_PY_CUDA_SUFFIX}" \
RAPIDS_REPOSITORY=rapidsai/cuml \
RAPIDS_REF_NAME="${CUML_PR}" \
RAPIDS_SHA="${CUML_PR_SHA}" \
    rapids-download-wheels-from-s3 python ./wheels-after

pip install pydistcheck
pydistcheck \
    --inspect \
    --select 'distro-too-large-compressed' \
    ./wheels-before/*.whl \
| grep -E '^checking|files: | compressed' \
> ./before.txt

# get more exact sizes
du -sh ./wheels-before/*

pydistcheck \
    --inspect \
    --select 'distro-too-large-compressed' \
    ./wheels-after/*.whl \
| grep -E '^checking|files: | compressed' \
> ./after.txt

# get more exact sizes
du -sh ./wheels-after/*
```

</details>

### How I tested this

These other PRs:

* rapidsai/devcontainers#435
* rapidsai/cugraph-gnn#110
* rapidsai/cuml#6199
* rapidsai/cugraph#4804

@rapidsai deleted a comment from bdice Jan 17, 2025

# libcuml (C++) and cuml (Cython).
set(CUML_USE_CUVS_STATIC OFF)
set(CUML_EXCLUDE_CUVS_FROM_ALL ON)
include(${CUML_CPP_SRC}/cmake/thirdparty/get_cuvs.cmake)
Member Author

Some of the Cython code uses cuVS directly:

cdef extern from "cuvs/cluster/kmeans.hpp" namespace \

so cuVS is needed in the build environment for both libcuml and cuml wheels. That means we end up compiling libcuvs.so in every libcuml build AND every cuml build.

Even with decent cache hit rates, on this PR I've seen that result in it taking on the order of 2.5 hours end-to-end for all the build-libcuml and build-cuml jobs to complete :/

Maybe we need to stop here and try to add a libcuvs wheel?
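To gauge how much of the Python package actually needs cuVS at build time (for example, when weighing whether a libcuvs wheel is worth adding), one could list the Cython sources that declare externs from cuVS headers directly. A minimal sketch, assuming the sources live under python/cuml and use the cdef extern from "cuvs/..." form shown above; cython_files_using_cuvs is a hypothetical helper.

```shell
# Hypothetical helper: list Cython sources (.pyx/.pxd) under a directory
# that declare externs from cuVS headers, i.e. the files that require cuVS
# to be present when the cuml wheel is built.
cython_files_using_cuvs() {
  find "$1" -type f \( -name '*.pyx' -o -name '*.pxd' \) \
    -exec grep -lF 'cdef extern from "cuvs/' {} + | sort
}
```

Usage: cython_files_using_cuvs python/cuml.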

)
fi
elif [[ "${package_dir}" == "python/cuml" ]]; then
# TODO(jameslamb): why are the CUDA 11 wheels so big???
Member Author

I haven't found the root cause yet, but the CUDA 11 cuml wheels being produced on this branch are a lot bigger than I'd expect.

I suspect it's related to static linking against the CUDA math wheels, but I'm surprised that the difference could be so large for these Cython extensions.

For context, the Cython extension sizes don't really seem to vary much by CUDA version on latest branch-25.02:

It's just libcuml++.so driving the big difference in total size on branch-25.02.

CUDA 11.8.0, arm64, Python 3.11

file size
  * compressed size: 1.1G
  * uncompressed size: 4.0G
  * compression space saving: 72.4%
contents
  * directories: 74
  * files: 441 (85 compiled)
...
largest files
  * (86.8M) cuml/experimental/fil/fil.cpython-311-aarch64-linux-gnu.so
  * (86.7M) cuml/fil/fil.cpython-311-aarch64-linux-gnu.so
  * (85.3M) cuml/ensemble/randomforest_shared.cpython-311-aarch64-linux-gnu.so
  * (85.2M) cuml/explainer/tree_shap.cpython-311-aarch64-linux-gnu.so
  * (85.2M) cuml/explainer/kernel_shap.cpython-311-aarch64-linux-gnu.so

(build link)

CUDA 12.5.1, arm64, Python 3.11

file size
  * compressed size: 8.9M
  * uncompressed size: 30.1M
  * compression space saving: 70.4%
contents
  * directories: 74
  * files: 441 (85 compiled)
...
largest files
  * (2.6M) cuml/experimental/fil/fil.cpython-311-aarch64-linux-gnu.so
  * (2.4M) cuml/fil/fil.cpython-311-aarch64-linux-gnu.so
  * (1.0M) cuml/cluster/hdbscan/hdbscan.cpython-311-aarch64-linux-gnu.so
  * (0.9M) cuml/svm/linear.cpython-311-aarch64-linux-gnu.so
  * (0.9M) cuml/manifold/umap.cpython-311-aarch64-linux-gnu.so

(build link)
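Comparing per-file sizes across the CUDA 11 and CUDA 12 wheels is what surfaces the ~85M Cython extensions above. A rough equivalent of the "largest files" listing, assuming python3 is available; this is not pydistcheck itself, and largest_wheel_members is a hypothetical helper. It reports uncompressed sizes in MiB, so its numbers may differ slightly from pydistcheck's.

```shell
# Hypothetical helper: print the N largest members of a wheel by
# uncompressed size (default N=5), similar in spirit to the
# "largest files" section of the pydistcheck output above.
largest_wheel_members() {
  python3 - "$1" "${2:-5}" <<'EOF'
import sys
import zipfile

path, n = sys.argv[1], int(sys.argv[2])
with zipfile.ZipFile(path) as whl:
    infos = sorted(whl.infolist(), key=lambda i: i.file_size, reverse=True)
for info in infos[:n]:
    # Uncompressed size in MiB with one decimal place.
    print(f"({info.file_size / 2**20:.1f}M) {info.filename}")
EOF
}
```

Running it on a CUDA 11 and a CUDA 12 cuml wheel side by side shows which extensions account for the difference.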

# --- RAFT---#
# find RAFT before cuVS, to avoid
# cuVS CMake defining conflicting versions of targets like 'nvidia::cutlass::cutlass'
include(${CUML_CPP_SRC}/cmake/thirdparty/get_raft.cmake)
Member Author

Without first finding RAFT here, builds fail like this:

  -- Found Thrust: /pyenv/versions/3.12.7/lib/python3.12/site-packages/libraft/lib64/rapids/cmake/thrust/thrust-config.cmake (found suitable exact version "2.7.0.0")
  -- Found CCCL: /pyenv/versions/3.12.7/lib/python3.12/site-packages/libraft/lib64/rapids/cmake/cccl/cccl-config.cmake (found version "2.7.0.0")
  -- Found nvtx3: /pyenv/versions/3.12.7/lib/python3.12/site-packages/librmm/lib64/cmake/nvtx3/nvtx3-config.cmake (found version "3.1.0")
  -- Found rmm: /pyenv/versions/3.12.7/lib/python3.12/site-packages/librmm/lib64/cmake/rmm/rmm-config.cmake (found version "25.02.0")
  CMake Error at /pyenv/versions/3.12.7/lib/python3.12/site-packages/libraft/lib64/cmake/NvidiaCutlass/NvidiaCutlassTargets.cmake:42 (message):
    Some (but not all) targets in this export set were already defined.

    Targets Defined: nvidia::cutlass::cutlass

    Targets not yet defined: nvidia::cutlass::tools::util

  Call Stack (most recent call first):
    /pyenv/versions/3.12.7/lib/python3.12/site-packages/libraft/lib64/cmake/NvidiaCutlass/NvidiaCutlassConfig.cmake:9 (include)
    /pyenv/versions/3.12.7/lib/python3.12/site-packages/cmake/data/share/cmake-3.31/Modules/CMakeFindDependencyMacro.cmake:76 (find_package)
    /pyenv/versions/3.12.7/lib/python3.12/site-packages/libraft/lib64/cmake/raft/raft-dependencies.cmake:43 (find_dependency)
    /pyenv/versions/3.12.7/lib/python3.12/site-packages/libraft/lib64/cmake/raft/raft-config.cmake:83 (include)
    /pyenv/versions/3.12.7/lib/python3.12/site-packages/libcuml/lib64/cmake/cuml/cuml-dependencies.cmake:40 (find_package)
    /pyenv/versions/3.12.7/lib/python3.12/site-packages/libcuml/lib64/cmake/cuml/cuml-config.cmake:72 (include)
    CMakeLists.txt:150 (find_package)


  -- Configuring incomplete, errors occurred!

  *** CMake configuration failed
  error: subprocess-exited-with-error

I suspect that's some interaction between RAFT's exports (which do include cutlass) and cuVS's (code link), but I haven't figured it out yet.
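One way to narrow down where the conflicting `nvidia::cutlass::cutlass` definition comes from is to list every NvidiaCutlass package config under the directories CMake might be searching (site-packages, the build tree). CMake's config mode looks for `<PackageName>Config.cmake` or `<lowercase-name>-config.cmake`; the sketch below searches for both spellings. `find_cmake_package_configs` is a hypothetical helper, and finding more than one copy is only a hint, not proof, of the cause.

```shell
# Hypothetical helper: list every CMake package config file for a package
# name under a directory tree. CMake's config mode accepts both
# "<Name>Config.cmake" and "<lowercase-name>-config.cmake".
find_cmake_package_configs() {
  local root="$1" pkg="$2" lower
  lower=$(printf '%s' "$pkg" | tr '[:upper:]' '[:lower:]')
  find "$root" -type f \
    \( -name "${pkg}Config.cmake" -o -name "${lower}-config.cmake" \) | sort
}
```

Usage: find_cmake_package_configs "$(python3 -c 'import sysconfig; print(sysconfig.get_paths()["purelib"])')" NvidiaCutlass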

@jameslamb (Member Author)

/ok to test

Labels: 2 - In Progress, 5 - DO NOT MERGE, ci, CMake, conda, CUDA/C++, Cython / Python