Release history
Release 0.4.1
Changes
A new parameter verbose has been added to the TableReport to toggle on or off the printing of progress information when a report is being generated. #1182 by Priscilla Baah.
A parameter verbose has been added to patch_display() to toggle on or off the printing of progress information when a table report is being generated. #1188 by Priscilla Baah.
tabular_learner() accepts the alias "regression" for the option "regressor" and "classification" for "classifier". #1180 by Mojdeh Rastgoo.
Bug fixes
Generating a TableReport could have an effect on the matplotlib configuration, which could cause plots not to display inline in Jupyter notebooks anymore. This has been fixed in skrub in #1172 by Jérôme Dockès, and the matplotlib issue can be tracked here.
The labels on bar plots in the TableReport for columns of object dtypes that have a repr spanning multiple lines could be unreadable. This has been fixed in #1196 by Jérôme Dockès.
Improve the performance of deduplicate() by removing some unnecessary computations. #1193 by Jérôme Dockès.
Maintenance
Make skrub compatible with scikit-learn 1.6. #1169 by Guillaume Lemaitre.
Release 0.4.0
Minor changes
#1157 by Jérôme Dockès.
The TableReport could raise an exception when one of the columns contained datetimes with time zones and missing values; this has been fixed in
For tree-based models, tabular_learner() now adds handle_unknown='use_encoded_value' to the OrdinalEncoder, to avoid
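The tabular_learner() entry refers to scikit-learn's OrdinalEncoder option handle_unknown='use_encoded_value'. As a minimal standalone sketch (plain scikit-learn, not skrub's tabular_learner; the toy data and the choice unknown_value=-1 are illustrative assumptions), this is how the option changes the handling of a category that was not seen during fit:

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder

# Training data only contains three colours; "purple" appears only at test time.
X_train = np.array([["red"], ["green"], ["blue"]], dtype=object)
X_test = np.array([["green"], ["purple"]], dtype=object)

# Default behaviour (handle_unknown="error"): an unseen category raises a ValueError.
strict = OrdinalEncoder().fit(X_train)
try:
    strict.transform(X_test)
except ValueError as err:
    print("default encoder:", err)

# With handle_unknown="use_encoded_value", unseen categories receive `unknown_value`.
# Categories are sorted during fit, so blue -> 0, green -> 1, red -> 2.
tolerant = OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)
tolerant.fit(X_train)
encoded = tolerant.transform(X_test)
print(encoded)  # green is encoded as 1.0, purple as -1.0
```

The sentinel value must not collide with a real category code, which is why a negative number is used here.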
Polars dataframes are now supported across all skrub estimators.
TableReport generates an interactive report for a dataframe. This
The InterpolationJoiner now supports polars dataframes. #1016 by Théo Jolivet.
Joiner and fuzzy_join() used to raise an error when columns with the same name appeared in the main and auxiliary table (after adding the
The Joiner has been adapted to support polars dataframes. #945 by Théo Jolivet.
The TableVectorizer now consistently applies the same transformation
GapEncoder and MinHashEncoder used to modify their input in-place, replacing missing values with a string. They no longer do so. Their
TargetEncoder has been removed in favor of sklearn.preprocessing.TargetEncoder, available since scikit-learn 1.3.
DatetimeEncoder doesn’t remove constant features anymore. It also supports an ‘errors’ argument to raise or coerce errors during
By default, TableVectorizer never outputs a sparse matrix. This can be changed by increasing the sparse_threshold parameter. #646 by Leo Grinsztajn
TableVectorizer doesn’t fail anymore if an inferred type doesn’t work during transform. The new entries not matching the type are replaced by missing values. #666 by Leo Grinsztajn
fuzzy_join() and FeatureAugmenter can now join on numerical columns based on the Euclidean distance. #530 by Jovan Stojanovic
Improvement of date column detection and date format inference in TableVectorizer. The format inference now tries to find a format which works for all non-missing values of the column, and only
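The fuzzy_join() entry says numerical columns can be joined based on the Euclidean distance. A minimal pure-Python sketch of that idea, assuming each main-table row is matched to the single closest auxiliary row. The helper nearest_numeric_join and its output column names are hypothetical; skrub's actual fuzzy_join() has its own API, and its details (for instance how columns are scaled) may differ:

```python
import math


def nearest_numeric_join(main_rows, aux_rows, on):
    """Hypothetical helper: match each main row to the closest aux row (Euclidean distance).

    Rows are dicts; `on` lists the numeric column names used for matching.
    """
    joined = []
    for row in main_rows:
        point = [row[col] for col in on]
        # Pick the auxiliary row whose coordinates are closest to `point`.
        best = min(aux_rows, key=lambda aux: math.dist(point, [aux[col] for col in on]))
        distance = math.dist(point, [best[col] for col in on])
        # Prefix auxiliary columns to avoid clashing with main-table column names.
        merged = dict(row)
        merged.update({f"aux_{key}": value for key, value in best.items()})
        merged["match_distance"] = distance
        joined.append(merged)
    return joined


main = [{"lat": 48.85, "lon": 2.35}, {"lat": 51.50, "lon": -0.12}]
aux = [
    {"lat": 51.51, "lon": -0.13, "city": "London"},
    {"lat": 48.86, "lon": 2.34, "city": "Paris"},
]
result = nearest_numeric_join(main, aux, on=["lat", "lon"])
print([r["aux_city"] for r in result])  # ['Paris', 'London']
```

Keeping the match distance in the output makes it possible to discard matches that are too far away, a common follow-up step for approximate joins.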
SuperVectorizer is renamed to TableVectorizer; a warning is raised when using the old name. #484 by Jovan Stojanovic
Add example Wikipedia embeddings to enrich the data. #487 by Jovan Stojanovic
datasets.fetching: contains a new function get_ken_embeddings() that can be used to download Wikipedia
The MinHashEncoder now considers None and empty strings as missing values, rather than raising an error. #378 by Gael Varoquaux
New encoder: DatetimeEncoder can transform a datetime column into several numerical columns (year, month, day, hour, minute, second, …). It is now the default transformer used
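The DatetimeEncoder entry describes splitting one datetime column into several numerical columns (year, month, day, hour, minute, second). A minimal standard-library sketch of that decomposition, assuming missing values are kept as None. The helper datetime_to_numeric_columns is hypothetical and does not reproduce DatetimeEncoder's actual options or output column names:

```python
from datetime import datetime

# Components listed in the changelog entry; the real encoder may add or drop some.
PARTS = ("year", "month", "day", "hour", "minute", "second")


def datetime_to_numeric_columns(values):
    """Hypothetical helper: turn a list of datetimes into one numeric list per component."""
    columns = {part: [] for part in PARTS}
    for value in values:
        for part in PARTS:
            # Missing datetimes stay missing in every derived column.
            columns[part].append(None if value is None else getattr(value, part))
    return columns


stamps = [datetime(2023, 5, 17, 14, 30, 5), None, datetime(2024, 1, 2, 8, 0, 0)]
features = datetime_to_numeric_columns(stamps)
print(features["year"])  # [2023, None, 2024]
print(features["hour"])  # [14, None, 8]
```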
Fixed a bug in the TableVectorizer causing a FutureWarning when using the get_feature_names_out() method. #262 by Lilian Boulard
Improvements to the TableVectorizer
Fixed a bug that resulted in the GapEncoder ignoring the analyzer argument. #242 by Jovan Stojanovic
GapEncoder’s get_feature_names_out now accepts all iterators, not just lists. #255 by Lilian Boulard
Fixed DeprecationWarning raised by the usage of distutils.version.LooseVersion. #261 by Lilian Boulard
Remove trailing imports in the MinHashEncoder.
Fix typos and update links for the website.
Also see pre-release 0.2.0a1 below for additional changes.
Removed hard-coded CSV file dirty_cat/data/FiveThirtyEight_Midwest_Survey.csv.
Improvements to the TableVectorizer
pip install git+https://github.com/dirty-cat/dirty_cat.git
Bump minimum dependencies:
Fix get_feature_names for scikit-learn > 0.21. #216 by Alexis Cvetkov
RuntimeWarnings due to overflow in GapEncoder. #161 by Alexis Cvetkov
GapEncoder: Added online Gamma-Poisson factorization through the GapEncoder class. This method discovers latent categories formed
Multiprocessing exception in notebook. #154 by Lilian Boulard