Class RandomForestClassifier (2.25.0)

RandomForestClassifier(
    n_estimators: int = 100,
    *,
    tree_method: typing.Literal["auto", "exact", "approx", "hist"] = "auto",
    min_tree_child_weight: int = 1,
    colsample_bytree: float = 1.0,
    colsample_bylevel: float = 1.0,
    colsample_bynode: float = 0.8,
    gamma: float = 0.0,
    max_depth: int = 15,
    subsample: float = 0.8,
    reg_alpha: float = 0.0,
    reg_lambda: float = 1.0,
    tol: float = 0.01,
    enable_global_explain: bool = False,
    xgboost_version: typing.Literal["0.9", "1.1"] = "0.9"
)

A random forest classifier.

A random forest is a meta estimator that fits a number of decision tree classifiers on various sub-samples of the dataset and uses averaging to improve the predictive accuracy and control over-fitting.
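The averaging idea can be illustrated in plain Python. The sketch below assumes each tree outputs a probability for every class. It averages those vectors across trees and picks the class with the highest mean. It is a conceptual sketch only: `forest_predict_proba` and `forest_predict` are hypothetical helpers, not part of bigframes. BigQuery ML's XGBoost-based forest may combine raw tree scores differently before it produces probabilities.

```python
from typing import List, Sequence


def forest_predict_proba(tree_probas: Sequence[Sequence[float]]) -> List[float]:
    """Average per-tree class-probability vectors into one forest estimate.

    Hypothetical helper for illustration; not a bigframes API.
    """
    if not tree_probas:
        raise ValueError("the forest needs at least one tree")
    n_classes = len(tree_probas[0])
    totals = [0.0] * n_classes
    for probs in tree_probas:
        if len(probs) != n_classes:
            raise ValueError("every tree must score the same set of classes")
        for k, p in enumerate(probs):
            totals[k] += p
    # Averaging over many trees tends to reduce variance compared with one tree.
    return [t / len(tree_probas) for t in totals]


def forest_predict(tree_probas: Sequence[Sequence[float]], classes: Sequence[str]) -> str:
    """Return the class whose averaged probability is highest."""
    avg = forest_predict_proba(tree_probas)
    best = max(range(len(avg)), key=avg.__getitem__)
    return classes[best]


# Three trees disagree on one sample; the forest follows the average.
trees = [[0.9, 0.1], [0.4, 0.6], [0.7, 0.3]]
print(forest_predict(trees, ["cat", "dog"]))  # cat
```

The middle tree alone would predict "dog", but the averaged probabilities (about 0.67 versus 0.33) favor "cat".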

Parameters
Name	Description
`n_estimators`	`Optional[int]` Number of parallel trees constructed during each iteration. Defaults to 100. Minimum value is 2.
`tree_method`	`Optional[str]` Specifies the tree method to use. Defaults to "auto". If this parameter is left at the default, XGBoost chooses the most conservative option available. Possible values: "auto", "exact", "approx", "hist".
`min_tree_child_weight`	`Optional[int]` Minimum sum of instance weight (hessian) needed in a child. Defaults to 1.
`colsample_bytree`	`Optional[float]` Subsample ratio of columns when constructing each tree. Defaults to 1.0. The value should be between 0 and 1.
`colsample_bylevel`	`Optional[float]` Subsample ratio of columns for each level. Defaults to 1.0. The value should be between 0 and 1.
`colsample_bynode`	`Optional[float]` Subsample ratio of columns for each split. Defaults to 0.8. The value should be between 0 and 1.
`gamma`	`Optional[float]` (min_split_loss) Minimum loss reduction required to make a further partition on a leaf node of the tree. Defaults to 0.0.
`max_depth`	`Optional[int]` Maximum tree depth for base learners. Defaults to 15. The value should be greater than 0.
`subsample`	`Optional[float]` Subsample ratio of the training instances. Defaults to 0.8. The value should be greater than 0 and less than 1.
`reg_alpha`	`Optional[float]` L1 regularization term on weights (XGBoost's alpha). Defaults to 0.0.
`reg_lambda`	`Optional[float]` L2 regularization term on weights (XGBoost's lambda). Defaults to 1.0.
`tol`	`Optional[float]` Minimum relative loss improvement necessary to continue training. Defaults to 0.01.
`enable_global_explain`	`Optional[bool]` Whether to compute global explanations with Explainable AI to evaluate the global importance of each feature to the model. Defaults to False.
`xgboost_version`	`Optional[str]` Specifies the XGBoost version for model training. Defaults to "0.9". Possible values: "0.9", "1.1".
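The three `colsample_*` ratios compound. XGBoost's parameter documentation describes them as cumulative: with 64 features and all three set to 0.5, 8 features remain candidates at each split. The sketch below computes that count, assuming BigQuery ML passes these ratios to XGBoost unchanged. The per-stage rounding is an assumption, and `features_per_split` is a hypothetical helper, not part of bigframes.

```python
def features_per_split(
    n_features: int,
    colsample_bytree: float = 1.0,
    colsample_bylevel: float = 1.0,
    colsample_bynode: float = 0.8,
) -> int:
    """Approximate how many features remain candidates at each split.

    Hypothetical helper; the per-stage rounding is an assumption, not
    XGBoost's exact arithmetic.
    """
    if n_features < 1:
        raise ValueError("n_features must be at least 1")
    ratios = (colsample_bytree, colsample_bylevel, colsample_bynode)
    for ratio in ratios:
        if not 0.0 < ratio <= 1.0:
            raise ValueError("each colsample ratio must be in (0, 1]")
    remaining = n_features
    # Each stage samples from what the previous stage kept: tree, then level, then node.
    for ratio in ratios:
        remaining = max(1, round(remaining * ratio))
    return remaining


print(features_per_split(64, 0.5, 0.5, 0.5))  # 8
print(features_per_split(10))  # 8, using the defaults 1.0, 1.0, 0.8
```

With the constructor defaults, only `colsample_bynode` removes features, so each split sees roughly 80% of the columns.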

Methods

repr

__repr__()

Return a string of the estimator's constructor call showing all non-default parameter values.
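A minimal sketch of this convention, assuming a scikit-learn-style estimator that stores each constructor argument as an attribute of the same name. `ToyEstimator` is hypothetical; bigframes' own `__repr__` may be implemented differently.

```python
import inspect


class ToyEstimator:
    """Hypothetical estimator used only to illustrate the repr convention."""

    def __init__(
        self, n_estimators: int = 100, *, max_depth: int = 15, subsample: float = 0.8
    ):
        # Store every constructor argument under the same attribute name.
        self.n_estimators = n_estimators
        self.max_depth = max_depth
        self.subsample = subsample

    def __repr__(self) -> str:
        params = inspect.signature(type(self).__init__).parameters
        shown = []
        for name, param in params.items():
            if name == "self":
                continue
            value = getattr(self, name)
            # Only arguments that differ from their declared defaults are shown.
            if value != param.default:
                shown.append(f"{name}={value!r}")
        return f"{type(self).__name__}({', '.join(shown)})"


print(repr(ToyEstimator()))  # ToyEstimator()
print(repr(ToyEstimator(50, subsample=0.5)))  # ToyEstimator(n_estimators=50, subsample=0.5)
```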

fit

fit(
    X: typing.Union[
        bigframes.dataframe.DataFrame,
        bigframes.series.Series,
        pandas.core.frame.DataFrame,
        pandas.core.series.Series,
    ],
    y: typing.Union[
        bigframes.dataframe.DataFrame,
        bigframes.series.Series,
        pandas.core.frame.DataFrame,
        pandas.core.series.Series,
    ],
    X_eval: typing.Optional[
        typing.Union[
            bigframes.dataframe.DataFrame,
            bigframes.series.Series,
            pandas.core.frame.DataFrame,
            pandas.core.series.Series,
        ]
    ] = None,
    y_eval: typing.Optional[
        typing.Union[
            bigframes.dataframe.DataFrame,
            bigframes.series.Series,
            pandas.core.frame.DataFrame,
            pandas.core.series.Series,
        ]
    ] = None,
) -> bigframes.ml.base._T

Build a forest of trees from the training set (X, y).

Parameters
Name	Description
`X`	`bigframes.dataframe.DataFrame or bigframes.series.Series or pandas.core.frame.DataFrame or pandas.core.series.Series` Series or DataFrame of shape (n_samples, n_features). Training data.
`y`	`bigframes.dataframe.DataFrame or bigframes.series.Series or pandas.core.frame.DataFrame or pandas.core.series.Series` Series or DataFrame of shape (n_samples,) or (n_samples, n_targets). Target values. Will be cast to X's dtype if necessary.
`X_eval`	`bigframes.dataframe.DataFrame or bigframes.series.Series or pandas.core.frame.DataFrame or pandas.core.series.Series` Series or DataFrame of shape (n_samples, n_features). Evaluation data.
`y_eval`	`bigframes.dataframe.DataFrame or bigframes.series.Series or pandas.core.frame.DataFrame or pandas.core.series.Series` Series or DataFrame of shape (n_samples,) or (n_samples, n_targets). Evaluation target values. Will be cast to X_eval's dtype if necessary.

Returns
Type	Description
`ForestModel`	Fitted estimator.

get_params

get_params(deep: bool = True) -> typing.Dict[str, typing.Any]

Get parameters for this estimator.

Parameter
Name	Description
`deep`	`bool, default True` If True, return the parameters for this estimator and for any contained subobjects that are estimators.

Returns
Type	Description
`Dictionary`	A dictionary of parameter names mapped to their values.
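The effect of `deep` can be sketched with scikit-learn's convention, which exposes a nested estimator's parameters under keys prefixed with `<attribute>__`. Whether bigframes uses exactly these key names is an assumption here, and `ParamsMixin`, `Inner`, and `Outer` are hypothetical classes.

```python
import inspect
from typing import Any, Dict


class ParamsMixin:
    """Hypothetical mixin mirroring scikit-learn's get_params convention."""

    def get_params(self, deep: bool = True) -> Dict[str, Any]:
        out: Dict[str, Any] = {}
        for name in inspect.signature(type(self).__init__).parameters:
            if name == "self":
                continue
            value = getattr(self, name)
            out[name] = value
            # With deep=True, a nested estimator's params appear as "<name>__<param>".
            if deep and hasattr(value, "get_params"):
                for sub_name, sub_value in value.get_params(deep=True).items():
                    out[f"{name}__{sub_name}"] = sub_value
        return out


class Inner(ParamsMixin):
    def __init__(self, max_depth: int = 15):
        self.max_depth = max_depth


class Outer(ParamsMixin):
    def __init__(self, model=None, tol: float = 0.01):
        self.model = model
        self.tol = tol


outer = Outer(model=Inner(max_depth=6))
print(sorted(outer.get_params(deep=False)))  # ['model', 'tol']
print(sorted(outer.get_params(deep=True)))  # ['model', 'model__max_depth', 'tol']
```

Under this convention, an estimator whose parameters are all plain values, such as `RandomForestClassifier`, returns the same dictionary for either setting of `deep`.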

predict

predict(
    X: typing.Union[
        bigframes.dataframe.DataFrame,
        bigframes.series.Series,
        pandas.core.frame.DataFrame,
        pandas.core.series.Series,
    ],
) -> bigframes.dataframe.DataFrame

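Predict class labels for X.

The prediction for an input sample combines the outputs of all the trees in the forest, and the predicted label is the class with the highest predicted probability.

Parameters
Name	Description
`X`	`bigframes.dataframe.DataFrame or bigframes.series.Series or pandas.core.frame.DataFrame or pandas.core.series.Series` Series or DataFrame of shape (n_samples, n_features). The data to predict on.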

Returns
Type	Description
`bigframes.dataframe.DataFrame`	The predicted values.

register

register(vertex_ai_model_id: typing.Optional[str] = None) -> bigframes.ml.base._T

Register the model to Vertex AI. After registering, go to the Google Cloud console (https://console.cloud.google.com/vertex-ai/models) to manage the model registry. Refer to https://cloud.google.com/vertex-ai/docs/model-registry/introduction for more options.

Parameter
Name	Description
`vertex_ai_model_id`	`Optional[str], default None` Optional ID for the model in Vertex AI. If not set, defaults to 'bigframes_{bq_model_id}'. The Vertex AI model ID is truncated to 63 characters because of Vertex AI's length limit.
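The default-and-truncate rule for `vertex_ai_model_id` can be expressed as a small function. This sketch assumes the 63-character cut applies whether the ID is supplied or generated, and that `bq_model_id` is the BigQuery model's ID. `resolve_vertex_ai_model_id` is a hypothetical helper, not a bigframes API.

```python
from typing import Optional

# Length limit stated for vertex_ai_model_id.
VERTEX_AI_MODEL_ID_MAX_LEN = 63


def resolve_vertex_ai_model_id(
    bq_model_id: str, vertex_ai_model_id: Optional[str] = None
) -> str:
    """Hypothetical helper: pick the Vertex AI model ID that register() would use."""
    if vertex_ai_model_id is None:
        # Default naming pattern from the docs: 'bigframes_{bq_model_id}'.
        vertex_ai_model_id = f"bigframes_{bq_model_id}"
    # Characters beyond the limit are dropped rather than rejected.
    return vertex_ai_model_id[:VERTEX_AI_MODEL_ID_MAX_LEN]


print(resolve_vertex_ai_model_id("churn_rf"))  # bigframes_churn_rf
print(len(resolve_vertex_ai_model_id("m" * 80)))  # 63
```

Because long generated IDs share the `bigframes_` prefix, truncation could in principle make two IDs collide; passing an explicit, short `vertex_ai_model_id` avoids that.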

score

score(
    X: typing.Union[
        bigframes.dataframe.DataFrame,
        bigframes.series.Series,
        pandas.core.frame.DataFrame,
        pandas.core.series.Series,
    ],
    y: typing.Union[
        bigframes.dataframe.DataFrame,
        bigframes.series.Series,
        pandas.core.frame.DataFrame,
        pandas.core.series.Series,
    ],
)

Calculate evaluation metrics of the model.

Parameters
Name	Description
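`X`	`bigframes.dataframe.DataFrame or bigframes.series.Series or pandas.core.frame.DataFrame or pandas.core.series.Series` Series or DataFrame of shape (n_samples, n_features). Evaluation data.
`y`	`bigframes.dataframe.DataFrame or bigframes.series.Series or pandas.core.frame.DataFrame or pandas.core.series.Series` Series or DataFrame of shape (n_samples,) or (n_samples, n_targets). Evaluation labels.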

Returns
Type	Description
`bigframes.dataframe.DataFrame`	A DataFrame containing the evaluation metrics.
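BigQuery ML computes these metrics server side. For a classifier they typically include precision, recall, accuracy, F1 score, log loss, and ROC AUC. To make the label-based ones concrete, here is a plain-Python sketch for binary labels. `binary_classification_metrics` is hypothetical, and the exact columns of the returned DataFrame are determined by BigQuery ML, not by this code.

```python
from typing import Dict, Sequence


def binary_classification_metrics(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Dict[str, float]:
    """Hypothetical helper: accuracy, precision, recall, and F1 for binary labels."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    # Guard each ratio against a zero denominator.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": correct / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1_score": f1,
    }


print(binary_classification_metrics([1, 0, 1, 1, 0], [1, 0, 0, 1, 1]))
```

For the sample above there are 2 true positives, 1 false positive, and 1 false negative, so accuracy is 0.6 and precision, recall, and F1 are each 2/3.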

to_gbq

to_gbq(
    model_name: str, replace: bool = False
) -> bigframes.ml.ensemble.RandomForestClassifier

Save the model to BigQuery.

Parameters
Name	Description
`model_name`	`str` The name of the model.
`replace`	`bool, default False` Whether to replace the model if it already exists.

Returns
Type	Description
`RandomForestClassifier`	Saved model.
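The `replace` flag follows a create-or-fail contract: saving under an existing name fails unless overwriting is requested. The in-memory sketch below illustrates that contract under stated assumptions. `InMemoryModelStore` is hypothetical, the model name `"my_dataset.my_rf"` is a placeholder, and the exception the real method raises for an existing model is not specified here, so `ValueError` is an assumption.

```python
from typing import Any, Dict


class InMemoryModelStore:
    """Hypothetical stand-in for a BigQuery dataset; illustrates `replace` only."""

    def __init__(self) -> None:
        self._models: Dict[str, Any] = {}

    def to_gbq(self, model_name: str, model: Any, replace: bool = False) -> Any:
        # An existing name is an error unless the caller opts in to overwriting it.
        if model_name in self._models and not replace:
            raise ValueError(f"model {model_name!r} already exists; pass replace=True")
        self._models[model_name] = model
        return model

    def get(self, model_name: str) -> Any:
        return self._models[model_name]


store = InMemoryModelStore()
store.to_gbq("my_dataset.my_rf", "model_v1")
store.to_gbq("my_dataset.my_rf", "model_v2", replace=True)
print(store.get("my_dataset.my_rf"))  # model_v2
```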