Releases · rapidsai/cuml
v21.08.02
v21.08.01
v21.08.00
🚨 Breaking Changes
- Remove deprecated target_weights in UMAP (#4081) @lowener
- Upgrade Treelite to 2.0.0 (#4072) @hcho3
- RF/DT cleanup (#4005) @venkywonka
- RF: memset and batch size optimization for computing splits (#4001) @venkywonka
- Remove old RF backend (#3868) @RAMitchell
- Enable warp-per-tree inference in FIL for regression and binary classification (#3760) @levsnv
🐛 Bug Fixes
- Disabling umap reproducibility tests for cuda 11.4 (#4128) @cjnolet
- Fix for crash in RF when `max_leaves` parameter is specified (#4126) @vinaydes
- Running umap mnmg test twice (#4112) @cjnolet
- Minimal fix for `SparseRandomProjection` (#4100) @viclafargue
- Creating copy of `components` in PCA transform and inverse transform (#4099) @divyegala
- Fix SVM model parameter handling in case n_support=0 (#4097) @tfeher
- Fix set_params for linear models (#4096) @lowener
- Fix train test split pytest comparison (#4062) @dantegd
- Fix fit_transform on KMeans (#4055) @lowener
- Fixing -1 key access in 1nn reduce op in HDBSCAN (#4052) @divyegala
- Disable installing gbench to avoid container permission issues (#4049) @dantegd
- Fix double fit crash in preprocessing models (#4040) @viclafargue
- Always add `faiss` library alias if it's missing (#4028) @trxcllnt
- Fixing intermittent HDBSCAN pytest failure in CI (#4025) @divyegala
- HDBSCAN bug on A100 (#4024) @divyegala
- Add treelite include paths to treelite targets (#4023) @trxcllnt
- Add Treelite_BINARY_DIR include to `cuml++` build interface include paths (#4018) @trxcllnt
- Small ARIMA-related bug fixes in Hessenberg reduction and make_arima (#4017) @Nyrio
- Update setup.py (#4015) @ajschmidt8
- Update `treelite` version in `get_treelite.cmake` (#4014) @ajschmidt8
- Fix build with latest RAFT branch-21.08 (#4012) @trxcllnt
- Skipping hdbscan pytests when gpu is a100 (#4007) @cjnolet
- Using 64-bit array lengths to increase scale of pca & tsvd (#3983) @cjnolet
- Fix MNMG test in Dask RF (#3964) @hcho3
- Use nested include in destination of install headers to avoid docker permission issues (#3962) @dantegd
- Fix automerge #3939 (#3952) @dantegd
- Update UCX-Py version to 0.21 (#3950) @pentschev
- Fix kernel and line info in cmake (#3941) @dantegd
- Fix multi-GPU PCA compute failing after transform and add error handling when n_components is not passed (#3912) @akaanirban
- Tolerate QN linesearch failures when they are harmless (#3791) @achirkin
📖 Documentation
- Improve docstrings for silhouette score metrics. (#4026) @bdice
- Update CHANGELOG.md link (#3956) @Salonijain27
- Update documentation build examples to be generator agnostic (#3909) @robertmaynard
- Improve FIL code readability and documentation (#3056) @levsnv
🚀 New Features
- Add Multinomial and Bernoulli Naive Bayes variants (#4053) @lowener
- Add weighted K-Means sampling for SHAP (#4051) @Nanthini10
- Use chebyshev, canberra, hellinger and minkowski distance metrics (#3990) @mdoijade
- Implement vector leaf prediction for fil. (#3917) @RAMitchell
- Change TargetEncoder's smooth argument from ratio to count (#3876) @daxiongshu
- Enable warp-per-tree inference in FIL for regression and binary classification (#3760) @levsnv
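The Naive Bayes additions in #4053 follow the scikit-learn estimator interface. A minimal usage sketch, assuming the variants are exposed as `cuml.naive_bayes.MultinomialNB` and `cuml.naive_bayes.BernoulliNB`:

```python
# Sketch only: assumes MultinomialNB and BernoulliNB are importable from
# cuml.naive_bayes (per #4053) and accept CuPy arrays like other cuML estimators.
import cupy as cp
from cuml.naive_bayes import MultinomialNB, BernoulliNB

X = cp.random.randint(0, 5, size=(1000, 20)).astype(cp.float32)  # count features
y = cp.random.randint(0, 2, size=1000).astype(cp.int32)

mnb = MultinomialNB().fit(X, y)
bnb = BernoulliNB().fit(X, y)   # treats features as binary indicators
print(mnb.predict(X[:5]), bnb.predict(X[:5]))
```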
🛠️ Improvements
- Remove clang/clang-tools from conda recipe (#4109) @dantegd
- Pin dask version (#4108) @galipremsagar
- ANN warnings/tests updates (#4101) @viclafargue
- Removing local memory operations from computeSplitKernel and other optimizations (#4083) @vinaydes
- Fix libfaiss dependency to not expressly depend on conda-forge (#4082) @Ethyling
- Remove deprecated target_weights in UMAP (#4081) @lowener
- Upgrade Treelite to 2.0.0 (#4072) @hcho3
- Optimize dtype conversion for FIL (#4070) @dantegd
- Adding quick notes to HDBSCAN public API docs as to why discrepancies may occur between cpu and gpu impls. (#4061) @cjnolet
- Update `conda` environment name for CI (#4039) @ajschmidt8
- Rewrite random forest gtests (#4038) @RAMitchell
- Updating Clang Version to 11.0.0 (#4029) @codereport
- Raise ARIMA parameter limits from 4 to 8 (#4022) @Nyrio
- Testing extract clusters in HDBSCAN (#4009) @divyegala
- ARIMA - Kalman loop rewrite: single megakernel instead of host loop (#4006) @Nyrio
- RF/DT cleanup (#4005) @venkywonka
- Exposing condensed hierarchy through cython for easier unit-level testing (#4004) @cjnolet
- Use the 21.08 branch of rapids-cmake as rmm requires it (#4002) @robertmaynard
- RF: memset and batch size optimization for computing splits (#4001) @venkywonka
- Reducing cluster size to number of selected clusters. Returning stability scores (#3987) @cjnolet
- HDBSCAN: Lazy-loading (and caching) condensed & single-linkage tree objects (#3986) @cjnolet
- Fix `21.08` forward-merge conflicts (#3982) @ajschmidt8
- Update Dask/Distributed version (#3978) @pentschev
- Use clang-tools on x86 only (#3969) @jakirkham
- Promote `trustworthiness_score` to public header, add missing includes, update dependencies (#3968) @trxcllnt
- Moving FAISS ANN wrapper to raft (#3963) @cjnolet
- Add MG weighted k-means (#3959) @lowener
- Remove unused code in UMAP. (#3931) @trivialfis
- Fix automerge #3900 and correct package versions in meta packages (#3918) @dantegd
- Adaptive stress tests when GPU memory capacity is insufficient (#3916) @lowener
- Fix merge conflicts (#3892) @ajschmidt8
- Remove old RF backend (#3868) @RAMitchell
- Refactor to extract random forest objectives (#3854) @RAMitchell
v21.06.02
v21.06.01
v21.06.00
🚨 Breaking Changes
- Remove Base.enable_rmm_pool method as it is no longer needed (#3875) @teju85
- RF: Make experimental-backend default for regression tasks and deprecate old-backend. (#3872) @venkywonka
- Deterministic UMAP with floating point rounding. (#3848) @trivialfis
- Fix RF regression performance (#3845) @RAMitchell
- Add feature to print forest shape in FIL upon importing (#3763) @levsnv
- Remove 'seed' and 'output_type' deprecated features (#3739) @lowener
🐛 Bug Fixes
- Disable UMAP deterministic test on CTK11.2 (#3942) @trivialfis
- Revert #3869 (#3933) @hcho3
- RF: fix the bug in `pdf_to_cdf` device function that causes hang when `n_bins > TPB && n_bins % TPB != 0` (#3921) @venkywonka
- Fix number of permutations in pytest and getting handle for cuml models (#3920) @dantegd
- Fix typo in umap `target_weight` parameter (#3914) @lowener
- Correct compilation of cuml C library (#3908) @robertmaynard
- Correct install path for include folder to avoid double nesting (#3901) @dantegd
- Add type check for y in train_test_split (#3886) @Nanthini10
- Fix for MNMG test_rf_classification_dask_fil_predict_proba (#3831) @lowener
- Fix MNMG test test_rf_regression_dask_fil (#3830) @hcho3
- AgglomerativeClustering support single cluster and ignore only zero distances from self-loops (#3824) @cjnolet
📖 Documentation
- Small doc fixes for 21.06 release (#3936) @dantegd
- Document ability to export cuML RF to predict on other machines (#3890) @hcho3
🚀 New Features
- Deterministic UMAP with floating point rounding. (#3848) @trivialfis
- HDBSCAN (#3821) @cjnolet
- Add feature to print forest shape in FIL upon importing (#3763) @levsnv
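HDBSCAN (#3821) is the headline addition here. A minimal sketch, assuming the estimator is exposed as `cuml.cluster.HDBSCAN` and follows the `hdbscan` package's conventions:

```python
# Sketch only: assumes cuml.cluster.HDBSCAN mirrors the hdbscan package's API
# (min_cluster_size parameter, labels_ attribute with -1 for noise).
import cupy as cp
from cuml.cluster import HDBSCAN

X = cp.random.random((2000, 16)).astype(cp.float32)
clusterer = HDBSCAN(min_cluster_size=25).fit(X)
labels = clusterer.labels_          # -1 marks noise points
print("clusters found:", int(labels.max()) + 1)
```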
🛠️ Improvements
- Pin dask to 2021.5.1 for 21.06 release (#3937) @dantegd
- Upgrade xgboost to 1.4.2 (#3925) @dantegd
- Use UCX-Py 0.20 (#3911) @jakirkham
- Upgrade NCCL to 2.9.9 (#3902) @dantegd
- Update conda developer environments (#3898) @viclafargue
- ARIMA: pre-allocation of temporary memory to reduce latencies (#3895) @Nyrio
- Condense TSNE parameters into a struct (#3884) @lowener
- Update `CHANGELOG.md` links for calver (#3883) @ajschmidt8
- Make sure `__init__` is called in graph callback (#3881) @trivialfis
- Update docs build script (#3877) @ajschmidt8
- Remove Base.enable_rmm_pool method as it is no longer needed (#3875) @teju85
- RF: Make experimental-backend default for regression tasks and deprecate old-backend. (#3872) @venkywonka
- Enable probability output from RF binary classifier (alternative implementation) (#3869) @hcho3
- CI test speed improvement (#3851) @lowener
- Fix RF regression performance (#3845) @RAMitchell
- Update to CMake 3.20 features, `rapids-cmake` and `CPM` (#3844) @dantegd
- Support sparse input features in QN solvers and Logistic Regression (#3827) @achirkin
- Trustworthiness score improvements (#3826) @viclafargue
- Performance optimization of RF split kernels by removing empty cycles (#3818) @vinaydes
- Correct deprecate positional args decorator for CalVer (#3784) @lowener
- ColumnTransformer & FunctionTransformer (#3745) @viclafargue
- Remove 'seed' and 'output_type' deprecated features (#3739) @lowener
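Among the solver changes, #3827 adds sparse input support to the QN solvers. A minimal sketch, assuming `LogisticRegression.fit` accepts a CuPy CSR matrix directly:

```python
# Sketch only: assumes the QN-based LogisticRegression accepts CSR input
# after #3827; dense prediction input is used to keep the example simple.
import cupy as cp
import cupyx.scipy.sparse as cusparse
from cuml.linear_model import LogisticRegression

X = cusparse.random(10_000, 500, density=0.01, format="csr", dtype=cp.float32)
y = cp.random.randint(0, 2, size=10_000).astype(cp.float32)

clf = LogisticRegression().fit(X, y)
print(clf.predict(cp.random.random((5, 500)).astype(cp.float32)))
```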
v0.19.0
🚨 Breaking Changes
- Use the new RF backend by default for classification (#3686) @hcho3
- Deprecating quantile-per-tree and removing three previously deprecated Random Forest parameters (#3667) @vinaydes
- Update predict() / predict_proba() of RF to match sklearn (#3609) @hcho3
- Upgrade FAISS to 1.7.x (#3509) @viclafargue
- cuML's estimator Base class for preprocessing models (#3270) @viclafargue
🐛 Bug Fixes
- Fix brute force KNN distance metric issue (#3755) @viclafargue
- Fix min_max_axis (#3735) @viclafargue
- Fix NaN errors observed with ARIMA in CUDA 11.2 builds (#3730) @Nyrio
- Fix random state generator (#3716) @viclafargue
- Fixes the out of memory access issue for computeSplit kernels (#3715) @vinaydes
- Fixing umap gtest failure under cuda 11.2. (#3696) @cjnolet
- Fix irreproducibility issue in RF classification (#3693) @vinaydes
- BUG fix BatchedLevelAlgo DtClsTest & DtRegTest failing tests (#3690) @venkywonka
- Restore the functionality of RF score() (#3685) @hcho3
- Use main build.sh to build docs in docs CI (#3681) @dantegd
- Revert "Update conda recipes pinning of repo dependencies" (#3680) @raydouglass
- Skip tests that fail on CUDA 11.2 (#3679) @dantegd
- Dask KNN Cl&Re 1D labels (#3668) @viclafargue
- Update conda recipes pinning of repo dependencies (#3666) @mike-wendt
- OOB access in GLM SoftMax (#3642) @divyegala
- SilhouetteScore C++ tests seed (#3640) @divyegala
- SimpleImputer fix (#3624) @viclafargue
- Silhouette Score `make_monotonic` for non-monotonic label set (#3619) @divyegala
- Fixing support for empty rows in sparse Jaccard / Cosine (#3612) @cjnolet
- Fix train_test_split with stratify option (#3611) @Nanthini10
- Update predict() / predict_proba() of RF to match sklearn (#3609) @hcho3
- Change dask and distributed branch to main (#3593) @dantegd
- Fixes memory allocation for experimental backend and improves quantile computations (#3586) @vinaydes
- Add ucx-proc package back that got lost during an auto merge conflict (#3550) @dantegd
- Fix failing Hellinger gtest (#3549) @cjnolet
- Directly invoke make for non-CMake docs target (#3534) @wphicks
- Fix Codecov.io Coverage Upload for Branch Builds (#3524) @mdemoret-nv
- Ensure global_output_type is thread-safe (#3497) @wphicks
- List as input for SimpleImputer (#3489) @viclafargue
📖 Documentation
- Add sparse docstring comments (#3712) @JohnZed
- FIL and Dask demo (#3698) @miroenev
- Deprecating quantile-per-tree and removing three previously deprecated Random Forest parameters (#3667) @vinaydes
- Fixing Indentation for Docstring Generators (#3650) @mdemoret-nv
- Update doc to indicate ExtraTree support (#3635) @hcho3
- Update doc, now that FIL supports multi-class classification (#3634) @hcho3
- Document model_type='xgboost_json' in FIL (#3633) @hcho3
- Including log loss metric to the documentation website (#3617) @lowener
- Update the build doc regarding the use of GCC 7.5 (#3605) @hcho3
- Update One-Hot Encoder doc (#3600) @lowener
- Fix documentation of KMeans (#3595) @lowener
🚀 New Features
- Reduce the size of the cuml libraries (#3702) @robertmaynard
- Use ninja as default CMake generator (#3664) @wphicks
- Single-Linkage Hierarchical Clustering Python Wrapper (#3631) @cjnolet
- Support for precomputed distance matrix in DBSCAN (#3585) @Nyrio
- Adding haversine to brute force knn (#3579) @cjnolet
- Support for sample_weight parameter in LogisticRegression (#3572) @viclafargue
- Provide "--ccache" flag for build.sh (#3566) @wphicks
- Eliminate unnecessary includes discovered by cppclean (#3564) @wphicks
- Single-linkage Hierarchical Clustering C++ (#3545) @cjnolet
- Expose sparse distances via semiring to Python API (#3516) @lowener
- Use cmake --build in build.sh to facilitate switching build tools (#3487) @wphicks
- Add cython hinge_loss (#3409) @Nanthini10
- Adding CodeCov Info for Dask Tests (#3338) @mdemoret-nv
- Add predict_proba() to XGBoost-style models in FIL C++ (#2894) @levsnv
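For the precomputed-distance DBSCAN support (#3585), a minimal sketch, assuming the option is exposed as `metric='precomputed'` in the style of scikit-learn:

```python
# Sketch only: assumes DBSCAN(metric="precomputed") takes a square pairwise
# distance matrix (per #3585), computed here with cuml.metrics.pairwise_distances.
import cupy as cp
from cuml.cluster import DBSCAN
from cuml.metrics import pairwise_distances

X = cp.random.random((500, 8)).astype(cp.float32)
D = pairwise_distances(X, metric="euclidean")            # shape (500, 500)
labels = DBSCAN(eps=0.3, min_samples=10, metric="precomputed").fit_predict(D)
```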
🛠️ Improvements
- Updating docs, readme, and umap param tests for 0.19 (#3731) @cjnolet
- Locking RAFT hash for 0.19 (#3721) @cjnolet
- Upgrade to Treelite 1.1.0 (#3708) @hcho3
- Update to XGBoost 1.4.0rc1 (#3699) @hcho3
- Use the new RF backend by default for classification (#3686) @hcho3
- Update LogisticRegression documentation (#3677) @viclafargue
- Preprocessing out of experimental (#3676) @viclafargue
- ENH Decision Tree new backend `computeSplit*Kernel` histogram calculation optimization (#3674) @venkywonka
- Remove `check_cupy8` (#3669) @viclafargue
- Use custom conda build directory for ccache integration (#3658) @dillon-cullinan
- Disable three flaky tests (#3657) @hcho3
- CUDA 11.2 developer environment (#3648) @dantegd
- Store data frequencies in tree nodes of RF (#3647) @hcho3
- Row major Gram matrices (#3639) @tfeher
- Converting all Estimator Constructors to Keyword Arguments (#3636) @mdemoret-nv
- Adding make_pipeline + test score with pipeline (#3632) @viclafargue
- ENH Decision Tree new backend `computeSplitClassificationKernel` histogram calculation and occupancy optimization (#3616) @venkywonka
- Revert "ENH Fix stale GHA and prevent duplicates" (#3614) @mike-wendt
- ENH Fix stale GHA and prevent duplicates (#3613) @mike-wendt
- KNN from RAFT (#3603) @viclafargue
- Update Changelog Link (#3601) @ajschmidt8
- Move SHAP explainers out of experimental (#3596) @dantegd
- Fixing compatibility issue with CUDA array interface (#3594) @lowener
- Remove cutlass usage in row major input for euclidean exp/unexp, cosine and L1 distance matrix (#3589) @mdoijade
- Test FIL probabilities with absolute error thresholds in python (#3582) @levsnv
- Removing sparse prims and fused l2 nn prim from cuml (#3578) @cjnolet
- Prepare Changelog for Automation (#3570) @ajschmidt8
- Print debug message if SVM convergence is poor (#3562) @tfeher
- Fix merge conflicts in 3552 (#3557) @ajschmidt8
- Additional distance metrics for ANN (#3533) @viclafargue
- Improve warning message when QN solver reaches max_iter (#3515) @tfeher
- Fix merge conflicts in 3502 (#3513) @ajschmidt8
- Upgrade FAISS to 1.7.x (#3509) @viclafargue
- ENH Pass ccache variables to conda recipe & use Ninja in CI (#3508) @Ethyling
- Fix forward-merger conflicts in #3502 (#3506) @dantegd
- Sklearn meta-estimators into namespace (#3493) @viclafargue
- Add flexibility to copyright checker (#3466) @lowener
- Update sparse KNN to use rmm device buffer (#3460) @lowener
- Fix forward-merger conflicts in #3444 (#3455) @ajschmidt8
- Replace ML::MetricType with raft::distance::DistanceType (#3389) @lowener
- RF param initialization cython and C++ layer cleanup (#3358) @venkywonka
- MNMG RF broadcast feature (#3349) @viclafargue
- cuML's estimator Base class for preprocessing models (#3270) @viclafargue
- Make `_get_tags` a class/static method (#3257) @dantegd
- NVTX Markers for RF and RF-backend (#3014) @venkywonka
v0.18.0
Breaking Changes 🚨
- cuml.experimental SHAP improvements (#3433) @dantegd
- Enable feature sampling for the experimental backend of Random Forest (#3364) @vinaydes
- re-enable cuML's copyright checker script (#3363) @teju85
- Batched Silhouette Score (#3362) @divyegala
- Update failing MNMG tests (#3348) @viclafargue
- Rename print_summary() of Dask RF to get_summary_text(); it now returns string to the client (#3341) @hcho3
- Rename dump_as_json() -> get_json(); expose it from Dask RF (#3340) @hcho3
- MNMG KNN consolidation (#3307) @viclafargue
- Return confusion matrix as int unless float weights are used (#3275) @lowener
- Approximate Nearest Neighbors (#2780) @viclafargue
Bug Fixes 🐛
- HOTFIX Add ucx-proc package back that got lost during an auto merge conflict (#3551) @dantegd
- Non project-flash CI ml test 18.04 issue debugging and bugfixing (#3495) @dantegd
- Temporarily xfail KBinsDiscretizer uniform tests (#3494) @wphicks
- Fix illegal memory accesses when NITEMS > 1, and nrows % NITEMS != 0. (#3480) @canonizer
- Update call to dask client persist (#3474) @dantegd
- Adding warning for IVFPQ (#3472) @viclafargue
- Fix failing sparse NN test in CI by allowing small number of index discrepancies (#3454) @cjnolet
- Exempting thirdparty code from copyright checks (#3453) @lowener
- Relaxing Batched SilhouetteScore Test Constraint (#3452) @divyegala
- Mark kbinsdiscretizer quantile tests as xfail (#3450) @wphicks
- Fixing documentation on SimpleImputer (#3447) @lowener
- Skipping IVFPQ (#3429) @viclafargue
- Adding tol to dask test_kmeans (#3426) @lowener
- Fix memory bug for SVM with large n_rows (#3420) @tfeher
- Allow linear regression with CUDA >= 11.0 (#3417) @wphicks
- Fix vectorizer tests by restoring sort behavior in groupby (#3416) @JohnZed
- Ensure make_classification respects output type (#3415) @wphicks
- Clean Up `#include` Dependencies (#3402) @mdemoret-nv
- Fix Nearest Neighbor Stress Test (#3401) @lowener
- Fix array_equal in tests (#3400) @viclafargue
- Improving Copyright Check When Not Running in CI (#3398) @mdemoret-nv
- Also xfail zlib errors when downloading newsgroups data (#3393) @JohnZed
- Fix for ANN memory release bug (#3391) @viclafargue
- XFail Holt Winters test where statsmodels has known issues with gcc 9.3.0 (#3385) @JohnZed
- FIX Update cupy to >= 7.8 and remove unused build.sh script (#3378) @dantegd
- re-enable cuML's copyright checker script (#3363) @teju85
- Update failing MNMG tests (#3348) @viclafargue
- Rename print_summary() of Dask RF to get_summary_text(); it now returns string to the client (#3341) @hcho3
- Fixing `make_blobs` to Respect the Global Output Type (#3339) @mdemoret-nv
- Fix permutation explainer (#3332) @RAMitchell
- k-means bug fix in debug build (#3321) @akkamesh
- Fix for default arguments of PCA (#3320) @lowener
- Provide workaround for cupy.percentile bug (#3315) @wphicks
- Fix SVR unit test parameter (#3294) @tfeher
- Add xfail on fetching 20newsgroup dataset (test_naive_bayes) (#3291) @lowener
- Remove unused keyword in PorterStemmer code (#3289) @wphicks
- Remove static specifier in DecisionTree unit test for C++14 compliance (#3281) @wphicks
- Correct pure virtual declaration in manifold_inputs_t (#3279) @wphicks
Documentation 📖
- Correct import path in docs for experimental preprocessing features (#3488) @wphicks
- Minor doc updates for 0.18 (#3475) @JohnZed
- Improve Python Docs with Default Role (#3445) @mdemoret-nv
- Fixing Python Documentation Errors and Warnings (#3428) @mdemoret-nv
- Remove outdated references to changelog in CONTRIBUTING.md (#3328) @wphicks
- Adding highlighting to bibtex in readme (#3296) @cjnolet
New Features 🚀
- Improve runtime performance of RF to Treelite conversion (#3410) @wphicks
- Parallelize Treelite to FIL conversion over trees (#3396) @wphicks
- Parallelize RF to Treelite conversion over trees (#3395) @wphicks
- Allow saving Dask RandomForest models immediately after training (fixes #3331) (#3388) @jameslamb
- genetic programming initial structures (#3387) @teju85
- MNMG DBSCAN (#3382) @Nyrio
- FIL to use L1 cache when input columns don't fit into shared memory (#3370) @levsnv
- Enable feature sampling for the experimental backend of Random Forest (#3364) @vinaydes
- Batched Silhouette Score (#3362) @divyegala
- Rename dump_as_json() -> get_json(); expose it from Dask RF (#3340) @hcho3
- Exposing model_selection in a similar way to scikit-learn (#3329) @ptartan21
- Promote IncrementalPCA from experimental in 0.18 release (#3327) @lowener
- Create labeler.yml (#3324) @jolorunyomi
- Add slow high-precision mode to KNN (#3304) @wphicks
- Sparse TSNE (#3293) @divyegala
- Sparse Generalized SPMV (semiring) Primitive (#3146) @cjnolet
- Multiclass meta estimator wrappers and multiclass SVC (#3092) @tfeher
- Approximate Nearest Neighbors (#2780) @viclafargue
- Add KNN parameter to t-SNE (#2592) @aleksficek
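IncrementalPCA's promotion out of experimental (#3327) means it can now be used like the other decomposition estimators. A minimal sketch, assuming it is importable from `cuml.decomposition` with a scikit-learn-style `partial_fit`:

```python
# Sketch only: assumes cuml.decomposition.IncrementalPCA exposes
# partial_fit/transform like scikit-learn's IncrementalPCA (per #3327).
import cupy as cp
from cuml.decomposition import IncrementalPCA

ipca = IncrementalPCA(n_components=8, batch_size=512)
for _ in range(4):                              # feed the data in batches
    batch = cp.random.random((512, 64)).astype(cp.float32)
    ipca.partial_fit(batch)

reduced = ipca.transform(cp.random.random((100, 64)).astype(cp.float32))
print(reduced.shape)                            # (100, 8)
```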
Improvements 🛠️
- Update stale GHA with exemptions & new labels (#3507) @mike-wendt
- Add GHA to mark issues/prs as stale/rotten (#3500) @Ethyling
- Fix naive bayes inputs (#3448) @cjnolet
- Prepare Changelog for Automation (#3442) @ajschmidt8
- cuml.experimental SHAP improvements (#3433) @dantegd
- Speed up knn tests (#3411) @JohnZed
- Replacing sklearn functions with cuml in RF MNMG notebook (#3408) @lowener
- Auto-label PRs based on their content (#3407) @jolorunyomi
- Use stable 1.0.0 version of Treelite (#3394) @hcho3
- API update to match RAFT PR #120 (#3386) @drobison00
- Update linear models to use RMM memory allocation (#3365) @lowener
- Updating dense pairwise distance enum names (#3352) @cjnolet
- Upgrade Treelite module (#3316) @hcho3
- Removed FIL node types with `_t` suffix (#3314) @canonizer
- MNMG KNN consolidation (#3307) @viclafargue
- Updating PyTests to Stay Below 4 Gb Limit (#3306) @mdemoret-nv
- Refactoring: move internal FIL interface to a separate file (#3292) @canonizer
- Return confusion matrix as int unless float weights are used (#3275) @lowener
- 018 add unfitted error pca & tests on IPCA (#3272) @lowener
- Linear models predict function consolidation (#3256) @dantegd
- Preparing sparse primitives for movement to RAFT (#3157) @cjnolet