ML.NET 3.0.0

@michaelgsharp released this 22 Nov 19:45
· 164 commits to main since this release
d96d7b7

New Features

  • Add the ability to use Object Detection using TorchSharp (#6605) - We have added a new deep learning model backed by TorchSharp that lets you fine-tune your own Object Detection model!
  • Add SamplingKeyColumnName to AutoMLExperiment API (#6649) - You can now set the SamplingKeyColumnName when you are using AutoML. Thanks @torronen!
  • Add Object Detection to AutoML Sweeper (#6633) - Added Object Detection to the AutoML Sweeper so now they can be used together.
  • Add String Vector support to DataFrame (#6628) - Adds support for String Vectors in DataFrame. This also allows for better IDataView <-> DataFrame conversions.
  • Add AutoZero tuner to BinaryClassification (#6615) - Can now use AutoZero tuner in AutoML Binary Classification experiments.
  • Added in fairness assessment and mitigation (#6539) - Adds support for a fairness assessment and mitigation tool.
  • Added in Support for some Intel OneDal Algorithms (#6521) - You can now use Intel's OneDal for some algorithms. This gives you access to some accelerated versions of these algorithms. The models are fully interoperable between ML.NET's normal models and these, so you can train with OneDal and then still run on machines where OneDal is not supported. Thanks @rgesteve!
  • Add in ability to have pre-defined weights for ngrams (#6458) - If you already know the weights of your NGrams, you can now provide them directly.
  • Add SentenceSimilarity sweepable estimator in AutoML (#6445) - Can now use SentenceSimilarity with the sweepable estimator.
  • Add VBufferDataFrameColumn to DataFrame (#6409) - DataFrame now supports the VBuffer type from ML.NET, so the IDataView <-> DataFrame conversion can work with those types.
  • Added ADO.NET importing/exporting functionality to DataFrame (#5975) - Can now use ADO.NET import/export with DataFrames. Thanks @andrei-faber!
  • Added native binaries for Windows Arm64 (#6813) - This allows certain native transforms that were previously disabled to run on Windows Arm.
  • Switches some computational code to use the new Tensor Primitives package (#6875)
  • Add QA sweepable estimator in AutoML (#6781)
  • Add NameEntityRecognition and Q&A deep learning tasks. (#6760)
  • Adds the ability to load a pre-trained LightGBM file and import it into ML.NET. (#6569)
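
To illustrate what a sampling key (as in the SamplingKeyColumnName item above) is for: rows that share the same key value are kept in the same subset when data is split, so related rows do not leak between train and test. The following is a minimal Python sketch of that idea only, not ML.NET's implementation. The hashing scheme, the function names `assign_split` and `split_rows`, and the sample rows are all assumptions made for illustration.

```python
import hashlib


def assign_split(sampling_key: str, test_fraction: float = 0.2) -> str:
    """Deterministically map a sampling key to 'train' or 'test' (hypothetical helper)."""
    digest = hashlib.sha256(sampling_key.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer in [0, 2**64) and scale it to [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "test" if bucket < test_fraction else "train"


def split_rows(rows, key_column, test_fraction=0.2):
    """Split rows so that all rows sharing a key value land in the same subset."""
    train, test = [], []
    for row in rows:
        side = assign_split(str(row[key_column]), test_fraction)
        (test if side == "test" else train).append(row)
    return train, test


# Illustrative data: several rows per user; "user" plays the role of the sampling key.
rows = [
    {"user": "a", "label": 1},
    {"user": "a", "label": 0},
    {"user": "b", "label": 1},
    {"user": "c", "label": 0},
    {"user": "c", "label": 1},
]
train, test = split_rows(rows, "user")
```

Because the assignment depends only on the key value, every row for a given user ends up on the same side of the split, whatever the row order.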

Enhancements

  • Expose ExperimentSettings.MaxModel as public (#6663) - Exposes ExperimentSettings.MaxModel as public so now you can set the number of Max Models you want for an AutoML experiment.
  • Update to latest version of TorchSharp (#6636) - Updated to the latest version of TorchSharp and fixed any breaking changes so we can take advantage of their new features and bug fixes.
  • Update to latest version of Onnx Runtime (#6624) - Updated to the latest version of Onnx Runtime and fixed any breaking changes so we can take advantage of their new features and bug fixes.
  • Update ML.NET to compile with .NET8 (#6641) - Removed some deprecated code that now throws errors on .NET8, and made other minor fixes to allow working/building with .NET8.
  • Added more logging to Object Detection (#6646) - Added more logging while Object Detection is training so even if epochs take a long time you can be sure things are still moving.
  • Update timeout error message in AutoMLExperiment (#6613) - Updated the error message so it is clearer what happened.
  • Add batchsize and arch to imageClassification SweepableTrainer (#6597) - Added batchsize and arch to the ImageClassification SweepableTrainer so they can now be included in the sweep.
  • Update max_model when trial fails (#6596)
  • Add default search space for standard trainers (#6576) - Added a default search space for all standard trainers so users have reasonable default values.
  • Adding more metrics to BinaryClassification Experiment (#6571)
  • Add checkAlive in NasBertTrainer (#6546) - Now we check between batches if cancellation was requested and stop processing if so.
  • OneDAL - Fallback to default implementation (#6538) - If you specify that you want to use OneDal but something prevents it from being used (for example, the binaries can't be found), it now falls back automatically to the normal implementation instead of crashing.
  • Add addKeyValueAnnotationsAsText flag in AutoML (#6535)
  • Add continuous resource monitoring to AutoML.IMonitor (#6520) - Thanks @andrasfuchs!
  • Update WebClient to HttpClient implementations (#6476) - Update the usage of WebClient to HttpClient since WebClient is now deprecated. Thanks @rgesteve!
  • Set AutoML trial to unsuccess if trial loss is nan/inf (#6430) - A trial is now marked as unsuccessful if its loss is an invalid number.
  • Add diskConvert option in fast tree search space (#6316)
  • Avoid Boxing/Unboxing on accessing elements of VBufferDataFrameColumn (#6867) and (#6865) - Thanks @asmirnov82!
  • Update LightGBM to version 3.X.X from 2.X.X (#6880)
  • Implement vectorized binary arithmetic operations for DataFrames (#6854) - Thanks @asmirnov82!
  • Upgrade .NET Interactive (#6857) - Thanks @colombod!
  • Improve performance of column cloning inside DataFrame arithmetics (#6814) - Thanks @asmirnov82!
  • Add performance benchmarks for dataframe arithmetic operations (#6827) - Thanks @asmirnov82!
  • Simplify tt files for PrimitiveDataFrameColumnAritmetics (#6830) - Thanks @asmirnov82!
  • Improve performance of DataFrame binary comparison operations (#6869) - Thanks @asmirnov82!
  • Allow a CultureInfo to be used for parsing CSV values into DataFrame (#6782) - Thanks @asmirnov82!
  • File-scoped namespaces in files under Prediction (Microsoft.ML.Core) (#6792) - Thanks @Lehonti!
  • File-scoped namespaces in files under ComponentModel (Microsoft.ML.Core) (#6788) - Thanks @Lehonti!
  • File-scoped namespaces in files under Data (Microsoft.ML.Core) (#6789) - Thanks @Lehonti!
  • File-scoped namespaces in files under EntryPoints (Microsoft.ML.Core) (#6790) - Thanks @Lehonti!
  • File-scoped namespaces in files under Environment (Microsoft.ML.Core) (#6791) - Thanks @Lehonti!
  • Add TargetType to Type_convert (#6785)
  • Modernized some argument checks that still used string literals for parameter names (#6766) - Thanks @Lehonti!
  • Improve DataFrame Arithmetics implementation (#6763) - Thanks @asmirnov82!
  • Fixed mac build and minor TorchSharp changes (#6776)
  • Clean up meaningless DataFrame code (#6761) - Thanks @asmirnov82!
  • Provide ability to filter dataframe column by null via ElementWise Methods (#6723)
  • Add missing implementation for datetime relevant arrow type into dataframe (#6675) - Thanks @asmirnov82!
  • Fix DataFrame to allow storing columns with size more than 2 Gb (#6710) - Thanks @asmirnov82!
  • Remove redundant column names collection from DataFrameColumnCollection (#6701) - Thanks @asmirnov82!
  • Clean dataframe math (#6709) - Thanks @asmirnov82!
  • Continue training on OOM error && add subsampling support for trainValidationDatasetManager (#6714)
  • Add epsilon to eci inverse probability (#6668)
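
As an illustration of the nan/inf item above (#6430), the sketch below shows why a trial with a non-finite loss is best marked unsuccessful before results are compared: comparisons involving NaN are unreliable, so such trials should be excluded when picking a best run. This is a Python sketch of the idea under stated assumptions, not AutoML's actual code. `TrialResult`, `evaluate_trial`, `best_trial`, and the loss values are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TrialResult:
    trial_id: int
    loss: float
    succeeded: bool


def evaluate_trial(trial_id: int, loss: float) -> TrialResult:
    """Mark the trial unsuccessful when the loss is NaN or +/-infinity."""
    return TrialResult(trial_id, loss, math.isfinite(loss))


def best_trial(results: List[TrialResult]) -> Optional[TrialResult]:
    """Pick the lowest-loss trial among successful ones, or None if there are none."""
    successful = [r for r in results if r.succeeded]
    return min(successful, key=lambda r: r.loss) if successful else None


# Illustrative losses: two valid values, one NaN, one infinity.
losses = [0.42, float("nan"), float("inf"), 0.31]
results = [evaluate_trial(i, loss) for i, loss in enumerate(losses)]
best = best_trial(results)
```

Filtering on `succeeded` first keeps the NaN and infinite losses out of the `min` comparison entirely.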

Bug Fixes

  • Fix DataFrame ToString (#6673) - Use correct alignment for columns to produce readable output when columns have longer names. Thanks @asmirnov82!
  • Fix DataFrame null math (#6661) - Fixes max in DataFrame columns when there are null values to match what Pandas does.
  • Clean up PrimitiveColumnContainer (#6656) - Cleaned up the code in PrimitiveColumnContainer so it's more correct and easier to use.
  • Fix Apply in PrimitiveColumnContainer (#6642) - Fixes the Apply method so it no longer changes the source column. Thanks @janholo!
  • Fix datetime null error (#6627) - Fixes loading a null datetime from a database so it now returns correctly instead of throwing an error.
  • Fix AggregateTrainingStopManager is trying to cancel disposed tokens (#6612) - Will no longer try to cancel already disposed tokens.
  • Fix ToString bug for sweepable pipeline (#6610)
  • Change Test to Validate in Dataset manager (#6599)
  • Fixed System.OperationCanceledException when calling experimentResult.BestRun.Estimator.Fit (#6572)
  • Fixed cancellation bug in SweepablePipelineRunner && Fixed object null exception in AutoML v1.0 regression API (#6560)
  • Fixed OneDal dispatching issues (#6547) - OneDal now dispatches correctly.
  • Fixed Multi-threaded access issue (#6537) - Fixed a multi-threaded access issue for variable length string arrays in ONNX models.
  • Fixed AutoML experiments in non declarative style not working (#6447)
  • Fix DataFrame Saving csv with VBufferDataFrameColumn (#6860) - Thanks @asmirnov82!
  • Fixes incorrect behavior of DataFrame with VBufferColumn (#6851) - Thanks @asmirnov82!
  • Increase performance of DataFrame arithmetic operations by enhancing calculations on nullable values (#6846) - Thanks @asmirnov82!
  • DataFrame incorrectly sets column value for index higher than Buffer.MaxCapacity (#6849) - Thanks @asmirnov82!
  • PrimitiveDataFrameColumn.Clone method crashes when is used with IEnumerable mapIndices argument (#6822) - Thanks @asmirnov82!
  • Addresses #6533, OneDal Index was outside the bounds of the array. (#6838) - Thanks @rgesteve!
  • Fix wrong type conversion on PrimitiveDataFrameColumn (#6834) - Thanks @novelhawk!
  • Append dataframe rows based on column names (#6808) - Thanks @asmirnov82!
  • Fix inconsistent null handling in DataFrame Arithmetics (#6770) - Thanks @asmirnov82!
  • Fix DataFrame.LoadCsv can not load CSV with duplicate column names (#6772) - Thanks @asmirnov82!
  • Fix issue with addIndexColumn in DataFrame.LoadCsv (#6769) - Thanks @asmirnov82!
  • Fix text classification InvocationException during cross-validation, issue #2718 from model builder (#6768)
  • Fix incorrect DataFrame min max computation with NULL (#6734) - Thanks @asmirnov82!
  • Fix ML.Fairlearn using ToList on Row Collection with Count more than Max.Int (#6678) - Thanks @asmirnov82!
  • Fix the behavior of column SetName method (#6676) - Thanks @asmirnov82!
  • Fix dataframe arithmetics for columns having several value buffers (column size is more than 2 Gb) (#6724) - Thanks @asmirnov82!
  • AutoML.Net avoid empty dataset in trainValidationDatasetManager (#6756)
  • Fix DataFrame bounds checking on indexing elements (#6681) - Thanks @asmirnov82!
  • Reset DataFrame.RowCount to zero, when DataFrame is empty (#6698) - Thanks @asmirnov82!
  • Stop shuffle rows in ITrainValidationDatasetManager (#6742)
  • SMAC - ignore fail trial during initialize (#6738)
  • Fix non-thread-safe use of Random in tokenizers (#6695)
  • Fixing license (#6689) and (#6690)
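
The DataFrame null math fix above (#6661) is described as matching what Pandas does. In pandas, `Series.max()` skips missing values by default and yields a missing value when there is nothing left to compare. A minimal Python sketch of that rule follows. It represents nulls as `None`, and the helper name `max_skipping_nulls` is hypothetical; it is not the DataFrame implementation.

```python
from typing import Iterable, Optional


def max_skipping_nulls(values: Iterable[Optional[float]]) -> Optional[float]:
    """Return the largest non-null value, or None when every value is null."""
    present = [v for v in values if v is not None]
    return max(present) if present else None


# Nulls are ignored rather than poisoning the result.
column = [1.0, None, 3.5, None, 2.0]
largest = max_skipping_nulls(column)

# A column with no non-null values has no maximum.
empty_result = max_skipping_nulls([None, None])
```

The same skip-nulls rule would apply to a minimum, which the related fix (#6734) in the list above also covers.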

Build / Test updates

  • Remove MSIL Check for TorchSharp (#6658) - Removes the MSIL check for TorchSharp while we figure out how we want to correctly handle this.
  • Change code coverage build pool (#6647) - Changed codecoverage build pool so the builds are faster and more stable.
  • Update AutoMLExperimentTests.cs to fix timeout error (#6638)
  • update interactive kernel version (#6836)
  • Update dependencies (#6837)
  • Update dependencies from dotnet/arcade (#6566 & #6518 & #6451 & #6439)
  • Mac python fix (#6549)
  • Moving onedal nuget download from onedal to native where it's needed for building (#6527)
  • New os image for official builds (#6467)
  • Removed deprecated yosemite brew (#6805)
  • Run tests that require more than 2 Gb of memory only on 64-bit env (#6758) - Thanks @asmirnov82!
  • Reduce coupling of Data.Analysis.Tests project (#6759) - Thanks @asmirnov82!
  • Update build templates to handle feature branches (#6744)
  • Fix OSX official build. (#6739)
  • Helix Fixes (#6721)
  • Update dependencies from dotnet/arcade (#6691)
  • Disable flaky test (#6685)
  • License expression (#6674)
  • Use -Svc pool providers in release/ branches for billing purposes (#6434)

Documentation Updates

  • Add doc for CreateSweepableEstimator, Parameter and SearchSpace (#6611)
  • Add AutoMLExperiment example doc (#6594)
  • Fix minor doc typos (#6557)
  • Fix minor roadmap nits (#6480)
  • 2023 roadmap outline (#6444)
  • Fixed typo for calibrators (#6438) - Thanks @KKghub!
  • Fix docs for DataViewRowCursor (#6855) - Thanks @Akash190104!

Breaking changes

  • None