🗺️ Public Roadmap

Detection of problematic data slices
Basic explanation of found issues via feature importances
Limited embedding computation for images, audio, text
Extended embedding support, e.g., more embedding models and support for precomputed embeddings
Speed up embedding computation using datasets library
Improved issue detection algorithm that avoids duplicate detections of similar problems and keeps outliers from influencing the segment detection
Support application on datasets without labels (outlier-based)
Adaptive drop reference for datasets that contain a wide variety of data
Large data support for detection and reporting, e.g., 500k audio samples with transcriptions
Alternative interfaces to min_drop and min_support. Maybe n_slices with sorting by a criterion?
Support application without model (by training simple baseline model)
Improve normalization for mixed-type runs, e.g., embedding + one categorical or numeric variable
Walkthroughs for unstructured, structured and mixed data. Also, an in-depth tutorial explaining all the parameters.
Soft dependencies for embedding computation and AutoML, as the torch and xgboost dependencies are large
Per use case helpers such as find_issues_object_detection, find_issues_ts_forecasting, ...
Allow for model comparisons via intersection, difference, ...
Allow application of sliceguard on time series
Add Sliceguard deep-dive notebook to show more advanced usage
Build sphinx docs
Stronger automated testing
Make the outlier detection algorithm more robust, probably through better parameter choices
Interpretable features for images, audio, text. E.g., dark image, quiet audio, long audio, contains common word x, ...
Generation of a summary report doing predefined checks
"Supervised" clustering that incorporates classes, probabilities, metrics, not only features
Data connectors for faster application on common data formats
Support embedding generation for remote resources, e.g. audio/images hosted on webservers
Improved explanations for found issues, e.g., via SHAP
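
The interface question raised in the list (thresholds via min_drop and min_support versus a fixed n_slices sorted by a criterion) can be illustrated with a small, self-contained sketch. This is not sliceguard's API: the `Slice` class and both helper functions are hypothetical, and the sketch assumes that min_drop means a minimum metric drop relative to the overall metric and min_support a minimum number of samples per slice.

```python
from dataclasses import dataclass


@dataclass
class Slice:
    """Hypothetical container for a detected data slice (not a sliceguard type)."""
    name: str
    support: int  # assumed meaning: number of samples in the slice
    drop: float   # assumed meaning: metric drop relative to the overall metric


def filter_by_thresholds(slices, min_drop, min_support):
    """Threshold-style interface: keep every slice that passes both limits."""
    return [s for s in slices if s.drop >= min_drop and s.support >= min_support]


def top_n_slices(slices, n_slices, criterion="drop"):
    """Ranking-style interface: sort by a criterion and keep the first n_slices."""
    ranked = sorted(slices, key=lambda s: getattr(s, criterion), reverse=True)
    return ranked[:n_slices]


# Made-up candidate slices, used only to show how the two interfaces differ.
candidates = [
    Slice("dark images", support=120, drop=0.25),
    Slice("quiet audio", support=15, drop=0.40),
    Slice("long audio", support=300, drop=0.05),
]

thresholded = filter_by_thresholds(candidates, min_drop=0.1, min_support=50)
ranked = top_n_slices(candidates, n_slices=2, criterion="drop")

print([s.name for s in thresholded])
print([s.name for s in ranked])
```

With thresholds the number of returned slices depends on the data, while the ranking variant always returns at most n_slices and may include slices with very small support unless the criterion accounts for it.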
