Version 0.10.0

Version 0.9.0

Bug Fixes

Documentation Updates

Build Changes

We’re now using Numba for performance-sensitive parts of Dask-ML. Dask-ML is now a pure-Python project, so we can provide universal wheels.

Version 0.8.0

Enhancements

  • Automatically replace default scikit-learn scorers with dask-aware versions in Incremental (GH#200)
  • Added the dask_ml.metrics.log_loss() loss function and neg_log_loss scorer (GH#318)
  • Fixed handling of array-like fit-parameters to GridSearchCV and BaseSearchCV (GH#320)
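
As a point of reference, dask_ml.metrics.log_loss() mirrors sklearn.metrics.log_loss; here is a minimal sketch using the scikit-learn version as a stand-in (the dask-aware scorer computes the same quantity on NumPy inputs):

```python
from sklearn.metrics import log_loss

# binary labels and predicted probabilities of the positive class
y_true = [0, 1, 1, 0]
y_prob = [0.1, 0.8, 0.7, 0.2]

loss = log_loss(y_true, y_prob)

# the neg_log_loss scorer simply negates the loss, so "higher is better"
neg_log_loss = -loss
```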

Bug Fixes

  • Fixed dtype in LabelEncoder.fit_transform() to be integer, rather than the dtype of the classes for dask arrays (GH#311)
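
For orientation, this matches the NumPy-backed behavior of scikit-learn's LabelEncoder (sketched below); the dask-array path now returns the same integer dtype instead of the dtype of the classes:

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
codes = le.fit_transform(np.array(["b", "a", "b", "c"]))
# codes is integer-typed, not the string dtype of le.classes_
```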

Version 0.7.0


API Breaking Changes

  • Removed the basis_inds_ attribute from dask_ml.cluster.SpectralClustering as it's no longer used (GH#152)

  • Change to clone the underlying estimator before training (GH#258). This induces a few changes

    1. The underlying estimator no longer gives access to learned attributes like coef_. We recommend using Incremental.coef_.
    2. State no longer leaks between successive fit calls. Note that Incremental.partial_fit() is still available if you want state, like learned attributes or random seeds, to be re-used. This is useful if you’re making multiple passes over the training data.
  • Changed get_params and set_params for dask_ml.wrappers.Incremental to no longer magically get / set parameters for the underlying estimator (GH#258). To specify parameters for the underlying estimator, use the double-underscore prefix convention established by scikit-learn:

    inc.set_params(estimator__alpha=10)
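
The double-underscore convention is the same one scikit-learn uses for any nested estimator. A minimal sketch with a Pipeline (Incremental is omitted here, but it routes parameters the same way under the estimator__ prefix):

```python
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import Pipeline

pipe = Pipeline([("sgd", SGDClassifier(alpha=0.0001))])

# "<name>__<parameter>" routes the value to the nested estimator
pipe.set_params(sgd__alpha=10)
```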


Dask-SearchCV is now being developed in the dask/dask-ml repository. Users who previously installed dask-searchcv should now just install dask-ml.

Bug Fixes

  • Fixed random seed generation on 32-bit platforms (GH#230)

Version 0.6.0

API Breaking Changes


Version 0.5.0

API Breaking Changes

  • The n_samples_seen_ attribute on dask_ml.preprocessing.StandardScaler is now consistently numpy.nan (GH#157).
  • Changed the algorithm for dask_ml.datasets.make_blobs(), dask_ml.datasets.make_regression() and dask_ml.datasets.make_classification() to reduce the single-machine peak memory usage (GH#67)

Bug Fixes

  • dask_ml.preprocessing.StandardScaler now works on DataFrame inputs (GH#157).

Version 0.4.1

This release added several new estimators.


Added dask_ml.preprocessing.RobustScaler

Scale features using statistics that are robust to outliers. This mirrors sklearn.preprocessing.RobustScaler (GH#62).
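
A small sketch of the behavior it mirrors, shown with sklearn.preprocessing.RobustScaler directly: centering on the median and scaling by the IQR keeps a single extreme outlier from distorting the other rows.

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

X = np.array([[1.0], [2.0], [3.0], [1000.0]])  # one extreme outlier
scaled = RobustScaler().fit_transform(X)
# the median of the column maps to 0 regardless of the outlier
```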

Added dask_ml.preprocessing.OrdinalEncoder

Encodes categorical features as ordinal, in one ordered feature (GH#119).
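
The equivalent scikit-learn behavior, sketched with sklearn.preprocessing.OrdinalEncoder (the explicit category order below is an assumption for illustration):

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder

X = np.array([["low"], ["high"], ["medium"]])
enc = OrdinalEncoder(categories=[["low", "medium", "high"]])
codes = enc.fit_transform(X)  # one ordered, integer-coded feature
```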

Added dask_ml.wrappers.ParallelPostFit

A meta-estimator for fitting with any scikit-learn estimator, but post-processing (predict, transform, etc.) in parallel on dask arrays. See Parallel Meta-estimators for more (GH#132).

Version 0.4.0

API Changes

  • Changed the arguments of the dask-glm based estimators in dask_glm.linear_model to match scikit-learn’s API (GH#94).

    • To specify lambduh, use C = 1.0 / lambduh (the default of 1.0 is unchanged)
    • The rho, over_relax, abstol and reltol arguments have been removed. Provide them in solver_kwargs instead.

    This affects the LinearRegression, LogisticRegression and PoissonRegression estimators.


  • Accept dask.dataframe for dask-glm based estimators (GH#84).
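
The lambduh-to-C conversion above is just a reciprocal; a tiny sketch (variable names are illustrative):

```python
# old dask-glm style regularization strength
lambduh = 0.5

# equivalent scikit-learn style parameter for the same penalty
C = 1.0 / lambduh
```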

Version 0.3.2

Enhancements

  • Added dask_ml.decomposition.TruncatedSVD() and dask_ml.decomposition.PCA() (GH#78)
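
For orientation, these mirror their scikit-learn counterparts; a minimal sketch with sklearn.decomposition.PCA:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
X = rng.normal(size=(50, 5))

# project onto the top 2 principal components
X2 = PCA(n_components=2).fit_transform(X)
```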

Version 0.3.0

Enhancements

  • Added KMeans.predict() (GH#83)
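
A sketch of what predict() provides, shown with scikit-learn's KMeans (whose API dask-ml's KMeans follows): after fitting, points are assigned to the learned clusters.

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[0.0, 0.0], [0.1, 0.0], [10.0, 10.0], [10.1, 10.0]])
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# assign (possibly new) points to the learned clusters
labels = km.predict(X)
```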

API Changes

  • Changed the fitted attributes on MinMaxScaler and StandardScaler to be concrete NumPy or pandas objects, rather than persisted dask objects (GH#75).
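
For comparison, scikit-learn's scalers already expose concrete NumPy arrays as fitted attributes; the change above makes dask-ml match this behavior (sketch):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler().fit(np.array([[1.0], [2.0], [3.0]]))

# fitted attributes are plain NumPy arrays, not lazy/persisted objects
mean = scaler.mean_
```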