Generalized Linear Models

LinearRegression([penalty, dual, tol, C, …]) Estimator for linear regression.
LogisticRegression([penalty, dual, tol, C, …]) Estimator for logistic regression.
PoissonRegression([penalty, dual, tol, C, …]) Estimator for Poisson regression.

Generalized linear models are a broad class of commonly used models. These implementations scale well to large datasets, whether on a single machine or on a distributed cluster, and they can be powered by a variety of optimization algorithms and regularizers.

These estimators follow the scikit-learn estimator API, so they can be dropped into existing routines like grid search and pipelines. They are implemented externally with new, scalable algorithms, and so can consume distributed Dask arrays and dataframes rather than just single-machine NumPy and pandas data structures.

Example

In [1]: from dask_ml.linear_model import LogisticRegression

In [2]: from dask_ml.datasets import make_classification

In [3]: X, y = make_classification(chunks=50)

In [4]: lr = LogisticRegression()

In [5]: lr.fit(X, y)
Out[5]: 
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1.0, max_iter=100, multiclass='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='admm',
          solver_kwargs=None, tol=0.0001, verbose=0, warm_start=False)

Algorithms

admm(X, y[, regularizer, lamduh, rho, …]) Alternating Direction Method of Multipliers
gradient_descent(X, y[, max_iter, tol, family]) Michael Grant’s implementation of Gradient Descent.
lbfgs(X, y[, regularizer, lamduh, max_iter, …]) L-BFGS solver using scipy.optimize implementation
newton(X, y[, max_iter, tol, family]) Newton's method for logistic regression.
proximal_grad(X, y[, regularizer, lamduh, …]) Proximal gradient descent.

Regularizers

ElasticNet([weight]) Elastic net regularization.
L1 L1 regularization.
L2 L2 regularization.
Regularizer Abstract base class for regularization object.