dask_ml.xgboost.XGBClassifier

class dask_ml.xgboost.XGBClassifier(max_depth=3, learning_rate=0.1, n_estimators=100, silent=True, objective='binary:logistic', booster='gbtree', n_jobs=1, nthread=None, gamma=0, min_child_weight=1, max_delta_step=0, subsample=1, colsample_bytree=1, colsample_bylevel=1, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, base_score=0.5, random_state=0, seed=None, missing=None, **kwargs)
Attributes:
feature_importances_

Methods

apply(X[, ntree_limit]) Return the predicted leaf index from every tree for each sample.
evals_result() Return the evaluation results.
fit(X[, y]) Fit a gradient boosting classifier.
get_booster() Get the underlying xgboost Booster of this model.
get_params([deep]) Get parameters.
get_xgb_params() Get xgboost type parameters.
predict(X) Predict with data.
predict_proba(data[, ntree_limit]) Predict the probability of each data example being of a given class.
score(X, y[, sample_weight]) Returns the mean accuracy on the given test data and labels.
set_params(**params) Set the parameters of this estimator.
__init__(max_depth=3, learning_rate=0.1, n_estimators=100, silent=True, objective='binary:logistic', booster='gbtree', n_jobs=1, nthread=None, gamma=0, min_child_weight=1, max_delta_step=0, subsample=1, colsample_bytree=1, colsample_bylevel=1, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, base_score=0.5, random_state=0, seed=None, missing=None, **kwargs)

Initialize self. See help(type(self)) for accurate signature.
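
A minimal construction sketch; the hyperparameter values are illustrative, and a dask.distributed Client is assumed since dask_ml.xgboost trains through the distributed scheduler:

>>> from dask.distributed import Client
>>> from dask_ml.xgboost import XGBClassifier
>>> client = Client()  # local cluster; a distributed scheduler is assumed
>>> clf = XGBClassifier(max_depth=4, learning_rate=0.05, n_estimators=200)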

apply(X, ntree_limit=0)

Return the predicted leaf index from every tree for each sample.

Parameters:
X : array_like, shape=[n_samples, n_features]

Input features matrix.

ntree_limit : int

Limit number of trees in the prediction; defaults to 0 (use all trees).

Returns:
X_leaves : array_like, shape=[n_samples, n_trees]

For each datapoint x in X and for each tree, return the index of the leaf x ends up in. Leaves are numbered within [0; 2**(self.max_depth+1)), possibly with gaps in the numbering.
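
A short sketch of reading leaf indices, assuming clf is the fitted estimator from the examples on this page and X is an in-memory feature matrix:

>>> leaves = clf.apply(X)   # shape (n_samples, n_trees)
>>> leaves[:, 0]            # leaf index of each sample in the first tree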

evals_result()

Return the evaluation results.

If eval_set is passed to the fit function, you can call evals_result() to get evaluation results for all of the passed eval_sets. When eval_metric is also passed to the fit function, the evals_result will contain the eval_metrics passed to the fit function as well.

Returns:
evals_result : dictionary

feature_importances_

Returns:
feature_importances_ : array of shape = [n_features]
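
A sketch of ranking features by importance, assuming clf has already been fit:

>>> import numpy as np
>>> importances = clf.feature_importances_   # shape (n_features,)
>>> np.argsort(importances)[::-1][:5]        # indices of the five most important features
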
fit(X, y=None)

Fit a gradient boosting classifier.

Parameters:
X : array-like [n_samples, n_features]

Feature Matrix. May be a dask.array or dask.dataframe

y : array-like

Labels

Returns:
self : XGBClassifier

Notes

This differs from the XGBoost version in three ways:

  1. The sample_weight, eval_set, eval_metric, early_stopping_rounds and verbose fit kwargs are not supported.
  2. The labels are not automatically label-encoded.
  3. The classes_ and n_classes_ attributes are not learned.
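
A minimal sketch of fitting on dask collections, reusing the Client and estimator from the construction example above; the shapes and chunk sizes are illustrative:

>>> import dask.array as da
>>> X = da.random.random((10000, 20), chunks=(1000, 20))
>>> y = (da.random.random(10000, chunks=1000) > 0.5).astype(int)
>>> clf.fit(X, y)   # sample_weight, eval_set, etc. are not supported here
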
get_booster()

Get the underlying xgboost Booster of this model.

This raises an exception if fit has not been called.

Returns:
booster : the xgboost Booster instance underlying this model
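
A sketch of persisting the trained model through the underlying Booster, assuming clf has been fit; the file name is illustrative:

>>> booster = clf.get_booster()
>>> booster.save_model('xgb.model')   # xgboost's native Booster API
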
get_params(deep=False)

Get parameters.

get_xgb_params()

Get xgboost type parameters.
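
A short sketch contrasting the two getters (clf as above):

>>> clf.get_params()       # scikit-learn style parameter dict
>>> clf.get_xgb_params()   # parameters in the form consumed by the xgboost trainer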

predict(X)

Predict with data. NOTE: This function is not thread safe.

For each booster object, predict can only be called from one thread. If you want to run prediction using multiple threads, call xgb.copy() to make copies of the model object and then call predict.

Parameters:
data : DMatrix

The dmatrix storing the input.

output_margin : bool

Whether to output the raw untransformed margin value.

ntree_limit : int

Limit number of trees in the prediction; defaults to 0 (use all trees).

Returns:
prediction : numpy array
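
A sketch of prediction on dask input; although the inherited docstring above mentions a numpy array, with dask collections the result is typically lazy, so the compute() call below is an assumption about that behavior:

>>> pred = clf.predict(X)   # X as in the fit example
>>> pred.compute()          # materialize the (assumed) lazy result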

predict_proba(data, ntree_limit=0)

Predict the probability of each data example being of a given class. NOTE: This function is not thread safe.

For each booster object, predict can only be called from one thread. If you want to run prediction using multiple threads, call xgb.copy() to make copies of the model object and then call predict.

Parameters:
data : DMatrix

The dmatrix storing the input.

ntree_limit : int

Limit number of trees in the prediction; defaults to 0 (use all trees).

Returns:
prediction : numpy array

A numpy array with the probability of each data example being of a given class.
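
A sketch mirroring predict, assuming clf and X from the fit example; the exact output shape can differ between binary and multiclass objectives:

>>> proba = clf.predict_proba(X)
>>> proba.compute()   # per-class probabilities, assumed lazy for dask input
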
score(X, y, sample_weight=None)

Returns the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy, which is a harsh metric since it requires that each label set be correctly predicted for every sample.

Parameters:
X : array-like, shape = (n_samples, n_features)

Test samples.

y : array-like, shape = (n_samples) or (n_samples, n_outputs)

True labels for X.

sample_weight : array-like, shape = [n_samples], optional

Sample weights.

Returns:
score : float

Mean accuracy of self.predict(X) wrt. y.
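
A sketch of scoring on held-out data; X_test and y_test are hypothetical held-out collections shaped like the training data:

>>> clf.score(X_test, y_test)   # mean accuracy of clf.predict(X_test) against y_test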

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns:
self
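
A sketch of updating parameters in place; the pipeline step name 'xgb' in the commented line is a hypothetical example of the nested form:

>>> clf.set_params(max_depth=6, learning_rate=0.01)
>>> # in a pipeline: pipe.set_params(xgb__n_estimators=500)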