Version 0.6.0

API Breaking Changes


Version 0.5.0

API Breaking Changes

  • The n_samples_seen_ attribute on dask_ml.preprocessing.StandardScaler is now consistently numpy.nan (GH#157).
  • Changed the algorithm for dask_ml.datasets.make_blobs(), dask_ml.datasets.make_regression() and dask_ml.datasets.make_classification() to reduce single-machine peak memory usage (GH#67).

Bug Fixes

  • dask_ml.preprocessing.StandardScaler now works on DataFrame inputs (GH#157).

Version 0.4.1

This release added several new estimators.


Scales features using statistics that are robust to outliers. This mirrors sklearn.preprocessing.RobustScaler (GH#62).
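As a rough illustration of what "robust" means here: scikit-learn's RobustScaler, with its default settings, subtracts each feature's median and divides by its interquartile range (25th to 75th percentile), so a single extreme value barely moves either statistic. The NumPy sketch below reproduces that default computation on an in-memory array. It is not dask-ml's implementation, and the function name robust_scale is hypothetical.

```python
import numpy as np


def robust_scale(X):
    """Center each column on its median and scale it by its interquartile range.

    Hypothetical helper mirroring RobustScaler's defaults
    (with_centering=True, with_scaling=True, quantile_range=(25.0, 75.0)).
    """
    X = np.asarray(X, dtype=float)
    median = np.median(X, axis=0)
    q25, q75 = np.percentile(X, [25.0, 75.0], axis=0)
    iqr = q75 - q25
    # Constant columns have a zero IQR; leave them unscaled instead of dividing by zero.
    iqr[iqr == 0.0] = 1.0
    return (X - median) / iqr


# One feature with a single extreme outlier.
X = np.array([[1.0], [2.0], [3.0], [4.0], [1000.0]])
scaled = robust_scale(X).ravel()  # -> [-1.0, -0.5, 0.0, 0.5, 498.5]
```

The median (3.0) and IQR (2.0) come from the four ordinary values, so they keep a sensible scale; mean and standard deviation would both be dominated by the outlier.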

Encodes categorical features as ordinal integers, producing one ordered integer feature per categorical column (GH#119).
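Ordinal encoding replaces each categorical value with the integer position of its category, so a column stays one column (unlike one-hot encoding). The pure-Python sketch below shows only that mapping; ordinal_encode is a hypothetical helper, and using the sorted set of observed values when no category order is given is an assumption made for the sketch.

```python
def ordinal_encode(values, categories=None):
    """Map each categorical value to the integer position of its category.

    Hypothetical helper for illustration. Without an explicit category order,
    the sorted set of observed values is used (an assumption).
    """
    if categories is None:
        categories = sorted(set(values))
    index = {category: code for code, category in enumerate(categories)}
    # One integer column replaces the original categorical column.
    return [index[value] for value in values], list(categories)


codes, cats = ordinal_encode(["low", "high", "medium", "low"],
                             categories=["low", "medium", "high"])
# codes -> [0, 2, 1, 0]
```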

A meta-estimator that fits with any scikit-learn estimator but runs post-fit operations (predict, transform, etc.) in parallel on dask arrays. See Parallel Meta-estimators for more (GH#132).
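The meta-estimator itself wraps a scikit-learn estimator and relies on dask; the pure-Python sketch below only illustrates the pattern it describes: fit once on data that fits in memory, then run predict independently on each block of a larger dataset. ThreadPoolExecutor stands in for dask's scheduler, and ThresholdModel and parallel_predict are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


class ThresholdModel:
    """Hypothetical stand-in for a fitted estimator: predicts 1 above the training mean."""

    def fit(self, X):
        self.threshold_ = sum(X) / len(X)
        return self

    def predict(self, X):
        return [int(x > self.threshold_) for x in X]


def parallel_predict(model, blocks, max_workers=4):
    """Apply an already-fitted model's predict to each block concurrently."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the input blocks.
        results = pool.map(model.predict, blocks)
        return [y for block in results for y in block]


model = ThresholdModel().fit([1.0, 2.0, 3.0, 4.0])  # small training set, fitted once
blocks = [[0.5, 3.0], [2.6, 10.0], [1.0]]            # larger data split into blocks
predictions = parallel_predict(model, blocks)         # -> [0, 1, 1, 1, 0]
```

Because predicting on one block never depends on another, the blocks can be processed in any order or on any worker; only the fitted state (here threshold_) is shared.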

Version 0.4.0

API Changes

  • Changed the arguments of the dask-glm based estimators in dask_ml.linear_model to match scikit-learn’s API (GH#94).

    • To specify lamduh, use C = 1.0 / lamduh instead (the default of 1.0 is unchanged).
    • The rho, over_relax, abstol and reltol arguments have been removed. Provide them in solver_kwargs instead.

    This affects the LinearRegression, LogisticRegression and PoissonRegression estimators.


  • Accept dask DataFrames as input to the dask-glm based estimators (GH#84).
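Upgrading code across the 0.4.0 argument change amounts to a small keyword translation: the old regularization argument (spelled lamduh in dask-glm) becomes C = 1.0 / lamduh, and rho, over_relax, abstol and reltol move into solver_kwargs. A minimal sketch, assuming the old keyword arguments are collected in a plain dict; migrate_glm_kwargs is a hypothetical helper, not part of dask-ml.

```python
# ADMM tuning arguments that must now be passed through solver_kwargs.
SOLVER_ONLY_ARGS = ("rho", "over_relax", "abstol", "reltol")


def migrate_glm_kwargs(old_kwargs):
    """Translate pre-0.4.0 keyword arguments to the scikit-learn style ones.

    Hypothetical helper: lamduh becomes C = 1.0 / lamduh, and the
    solver-specific arguments are moved into solver_kwargs.
    """
    new_kwargs = dict(old_kwargs)
    solver_kwargs = dict(new_kwargs.pop("solver_kwargs", {}))
    if "lamduh" in new_kwargs:
        new_kwargs["C"] = 1.0 / new_kwargs.pop("lamduh")
    for name in SOLVER_ONLY_ARGS:
        if name in new_kwargs:
            solver_kwargs[name] = new_kwargs.pop(name)
    if solver_kwargs:
        new_kwargs["solver_kwargs"] = solver_kwargs
    return new_kwargs


new_kwargs = migrate_glm_kwargs({"lamduh": 0.5, "rho": 2, "max_iter": 50})
# -> {'max_iter': 50, 'C': 2.0, 'solver_kwargs': {'rho': 2}}
```

Arguments the helper does not recognize (max_iter above) pass through unchanged.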

Version 0.3.2


  • Added dask_ml.decomposition.TruncatedSVD() and dask_ml.decomposition.PCA() (GH#78).

Version 0.3.0


  • Added KMeans.predict() (GH#83)
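KMeans.predict() labels each new sample with the index of its closest cluster center. The NumPy sketch below shows that assignment rule for in-memory data; it is not dask-ml's implementation. predict_clusters is a hypothetical name, and the centers array plays the role of a fitted model's cluster_centers_ attribute (the name scikit-learn's KMeans uses).

```python
import numpy as np


def predict_clusters(X, cluster_centers):
    """Return, for each row of X, the index of the nearest center (squared Euclidean distance)."""
    X = np.asarray(X, dtype=float)
    centers = np.asarray(cluster_centers, dtype=float)
    # Broadcast to (n_samples, n_clusters, n_features), then sum over features.
    sq_dist = ((X[:, np.newaxis, :] - centers[np.newaxis, :, :]) ** 2).sum(axis=2)
    return sq_dist.argmin(axis=1)


centers = np.array([[0.0, 0.0], [10.0, 10.0]])
labels = predict_clusters([[1.0, 2.0], [9.0, 8.0], [4.0, 4.0]], centers)  # -> [0, 1, 0]
```

Each sample's label depends only on that sample and the fixed centers, which is why prediction can be applied to chunks of data independently.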

API Changes

  • Changed the fitted attributes on MinMaxScaler and StandardScaler to be concrete NumPy or pandas objects, rather than persisted dask objects (GH#75).