-
Notifications
You must be signed in to change notification settings - Fork 546
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] Make dask
an optional dependency
#6240
base: branch-25.02
Are you sure you want to change the base?
Conversation
This moves all dask dependencies of `cuml` to be optional. `pip` users can install them via a `dask` extra (e.g. `pip install cuml[dask]`), while conda users can install them with a conda metapackage (e.g. `pip install cuml-dask`). Dask dependency versions are pinned accordingly as needed.
We now only install dask dependencies for the dask test runs.
- Removes dask imports from everywhere except the `cuml.dask` namespace - Cleans up the imports in the `cuml.dask` namespace. In particular, non of the existing warning code was actually functional. - Adds a nice `ImportError` for users trying to import `cuml.dask` without the required dependencies.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's still some failing tests in cuml/tests/
that rely on dask being present:
test_base
/test_pickle
: these create shared tests for all estimators, but obviously the dask tests can't run when dask isn't installed. Easiest fix would be to skip those automatically if dask isn't installed, and add these files to the dask test runs in CI so they still run somewhere.test_benchmark
: some benchmarks rely on dask. Could skip these if dask isn't installed?test_serialize.py:test_naive_bayes_cuda
: this is a dask specific test, but afaict we're just testing that dask's serializers can serialize a cuml class? This test should probably be moved into thetests/dask
folder.
- cuda-cudart-dev | ||
{% endif %} | ||
- cuda-python | ||
outputs: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't verified this recipe works at all, just based it on other recipes I've done in the past that have used multiple outputs for metapackages. Hoping CI/review will catch any issues here 🤞 .
" conda install cuml-dask # either conda install\n" | ||
' pip install -U "cuml[dask]" # or pip install\n' | ||
) | ||
raise ImportError(msg) from exc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we raise a nicer ImportError
when a user tries to import something from cuml.dask
when all dependencies aren't installed.
else: | ||
warnings.warn( | ||
"Dask not found. All Dask-based multi-GPU operation is disabled." | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This previous warnings code would never run. If a dep wasn't present, the error would occur far before here. Better to just use normal imports.
This PR makes
dask
/distributed
/dask_cudf
/raft_dask
/dask_cuda
optional dependencies.By default, an install of
cuml
will not include these dependencies. Users will not be able to use the functionality available incuml.dask
, but everything else will work as is.To install
cuml
withdask
support, users can pip or conda install the extra deps asFixes #5934.