Class KMeans (0.16.0)

KMeans(n_clusters: int = 8)

K-Means clustering.

Parameter
Name	Description
`n_clusters`	`int, default 8` The number of clusters to form, which is also the number of centroids to generate.

Properties

cluster_centers_

Information of cluster centers.

Returns

Type	Description
`bigframes.dataframe.DataFrame`	DataFrame of cluster centers, with one row per feature per centroid. It contains the following columns:
centroid_id: An integer that identifies the centroid.
feature: The column name that contains the feature.
numerical_value: If feature is numeric, the value of feature for the centroid that centroid_id identifies. If feature is not numeric, the value is NULL.
categorical_value: A list of mappings containing information about categorical features. Each mapping contains the following fields:
    categorical_value.category: The name of each category.
    categorical_value.value: The value of categorical_value.category for the centroid that centroid_id identifies.
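Because the output is in long format (one row per feature per centroid), it is often easier to read once grouped by centroid. The following is a minimal sketch in plain Python, assuming the rows have already been materialized as dictionaries keyed by the column names above. The helper name `centers_by_centroid` and the sample rows are hypothetical and purely illustrative.

```python
from collections import defaultdict


def centers_by_centroid(rows):
    """Group long-format center rows into one dict per centroid.

    rows: iterable of dicts with the keys centroid_id, feature,
    numerical_value and categorical_value.
    """
    centers = defaultdict(dict)
    for row in rows:
        if row["numerical_value"] is not None:
            # Numeric feature: the center coordinate is a single number.
            value = row["numerical_value"]
        else:
            # Categorical feature: flatten the list of
            # {"category": ..., "value": ...} mappings into one dict.
            value = {
                item["category"]: item["value"]
                for item in (row["categorical_value"] or [])
            }
        centers[row["centroid_id"]][row["feature"]] = value
    return dict(centers)


# Hypothetical rows: 2 centroids x 2 features = 4 rows.
example_rows = [
    {"centroid_id": 1, "feature": "length",
     "numerical_value": 4.5, "categorical_value": []},
    {"centroid_id": 1, "feature": "color", "numerical_value": None,
     "categorical_value": [{"category": "red", "value": 0.75},
                           {"category": "blue", "value": 0.25}]},
    {"centroid_id": 2, "feature": "length",
     "numerical_value": 9.0, "categorical_value": []},
    {"centroid_id": 2, "feature": "color", "numerical_value": None,
     "categorical_value": [{"category": "red", "value": 0.5},
                           {"category": "blue", "value": 0.5}]},
]

grouped = centers_by_centroid(example_rows)
```

When the rows come from a real fitted model, missing values may arrive as NaN rather than `None`, in which case the `is not None` check would need adjusting.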

Methods

__repr__

__repr__()

Print the estimator's constructor with all non-default parameter values.
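To illustrate what "non-default parameter values" means, here is a small sketch of one way such a representation could be built. It is not the library's implementation: `repr_non_default` and `ToyKMeans` are hypothetical names, and the sketch assumes the estimator stores each constructor argument as an attribute of the same name.

```python
import inspect


def repr_non_default(estimator):
    """Format an estimator as ClassName(param=value, ...), listing only
    parameters whose current value differs from the constructor default."""
    signature = inspect.signature(type(estimator).__init__)
    parts = []
    for name, param in signature.parameters.items():
        if name == "self":
            continue
        value = getattr(estimator, name)
        # Parameters without a default are always shown.
        if param.default is inspect.Parameter.empty or value != param.default:
            parts.append(f"{name}={value!r}")
    return f"{type(estimator).__name__}({', '.join(parts)})"


class ToyKMeans:
    """Hypothetical stand-in that keeps constructor arguments as attributes."""

    def __init__(self, n_clusters: int = 8):
        self.n_clusters = n_clusters


print(repr_non_default(ToyKMeans()))              # default value is omitted
print(repr_non_default(ToyKMeans(n_clusters=3)))  # non-default value is shown
```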

fit

fit(
    X: typing.Union[bigframes.dataframe.DataFrame, bigframes.series.Series],
    y: typing.Optional[
        typing.Union[bigframes.dataframe.DataFrame, bigframes.series.Series]
    ] = None,
) -> bigframes.ml.base._T

Compute k-means clustering.

Parameters
Name	Description
`X`	`bigframes.dataframe.DataFrame or bigframes.series.Series` DataFrame of shape (n_samples, n_features). Training data.
`y`	`default None` Not used, present here for API consistency by convention.

Returns
Type	Description
`KMeans`	Fitted Estimator.
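A minimal sketch of fitting, assuming the class is importable as `bigframes.ml.cluster.KMeans` (the module path that appears in the `to_gbq` return type) and that the training data is already loaded as a BigQuery DataFrames object. The helper name `fit_kmeans` and its `estimator_cls` argument are hypothetical. The argument exists only so the sketch can be exercised without a BigQuery connection.

```python
def fit_kmeans(X, n_clusters=8, estimator_cls=None):
    """Build a KMeans estimator and fit it on X.

    X is expected to be a bigframes DataFrame or Series of features.
    estimator_cls is a hypothetical hook for testing; leave it as None
    to use bigframes.ml.cluster.KMeans.
    """
    if estimator_cls is None:
        # Imported lazily so this module loads without bigframes installed.
        from bigframes.ml.cluster import KMeans
        estimator_cls = KMeans

    model = estimator_cls(n_clusters=n_clusters)
    # y is not used by k-means, so only X is passed. fit() returns the
    # fitted estimator, so the result can be returned directly.
    return model.fit(X)
```

With a live session, `X` could be, for example, a frame loaded with `bigframes.pandas.read_gbq` and restricted to the feature columns.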

get_params

get_params(deep: bool = True) -> typing.Dict[str, typing.Any]

Get parameters for this estimator.

Parameter
Name	Description
`deep`	`bool, default True` If True, return the parameters for this estimator and for any contained subobjects that are estimators.

Returns
Type	Description
`Dictionary`	A dictionary of parameter names mapped to their values.

predict

predict(
    X: typing.Union[bigframes.dataframe.DataFrame, bigframes.series.Series]
) -> bigframes.dataframe.DataFrame

Predict the closest cluster each sample in X belongs to.

Parameter
Name	Description
`X`	`bigframes.dataframe.DataFrame or bigframes.series.Series` DataFrame of shape (n_samples, n_features). New data to predict.

Returns
Type	Description
`bigframes.dataframe.DataFrame`	DataFrame of shape (n_samples, n_input_columns + n_prediction_columns). Returns predicted labels.

register

register(vertex_ai_model_id: typing.Optional[str] = None) -> bigframes.ml.base._T

Register the model to Vertex AI. After registering, go to the Google Cloud Console (https://console.cloud.google.com/vertex-ai/models) to manage the model registry. See https://cloud.google.com/vertex-ai/docs/model-registry/introduction for more options.

Parameter
Name	Description
`vertex_ai_model_id`	`Optional[str], default None` Optional string to use as the model ID in Vertex AI. If not set, defaults to 'bigframes_{bq_model_id}'. The Vertex AI model ID is truncated to 63 characters because of a Vertex AI limit.
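The defaulting and truncation rule described above can be sketched as follows. This is an illustration of the documented behavior rather than the library's code: the function name `effective_vertex_model_id` is hypothetical, and the sketch assumes truncation keeps the first 63 characters.

```python
VERTEX_ID_MAX_LEN = 63  # limit stated in the parameter description


def effective_vertex_model_id(bq_model_id, vertex_ai_model_id=None):
    """Return the model ID that would end up in Vertex AI."""
    if vertex_ai_model_id is None:
        # Default documented above: 'bigframes_{bq_model_id}'.
        vertex_ai_model_id = f"bigframes_{bq_model_id}"
    # Assumption: truncation keeps the leading characters.
    return vertex_ai_model_id[:VERTEX_ID_MAX_LEN]
```

One practical consequence is that two long IDs sharing their first 63 characters would map to the same truncated ID, so a short explicit `vertex_ai_model_id` may be the safer choice.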

score

score(
    X: typing.Union[bigframes.dataframe.DataFrame, bigframes.series.Series], y=None
) -> bigframes.dataframe.DataFrame

Calculate evaluation metrics of the model.

Parameters
Name	Description
`X`	`bigframes.dataframe.DataFrame or bigframes.series.Series` DataFrame of shape (n_samples, n_features). New data.
`y`	`default None` Not used, present here for API consistency by convention.

Returns
Type	Description
`bigframes.dataframe.DataFrame`	DataFrame of the metrics.

to_gbq

to_gbq(model_name: str, replace: bool = False) -> bigframes.ml.cluster.KMeans

Save the model to BigQuery.

Parameters
Name	Description
`model_name`	`str` The name of the model.
`replace`	`bool, default False` Whether to replace the model if it already exists.

Returns
Type	Description
`KMeans`	Saved model.
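The `predict`, `score`, and `to_gbq` methods are typically called in sequence on a fitted estimator. The sketch below shows that flow using only the signatures documented on this page. The helper name `evaluate_and_save` is hypothetical, and `model` is assumed to be an already fitted `KMeans` instance.

```python
def evaluate_and_save(model, X, model_name, replace=False):
    """Predict, score, then save a fitted KMeans-like estimator.

    Returns a tuple (predictions, metrics, saved_model).
    """
    # One output row per input row: the input columns plus the
    # prediction columns.
    predictions = model.predict(X)
    # DataFrame of evaluation metrics; y is unused for k-means.
    metrics = model.score(X)
    # With replace=False, an existing model of the same name is not replaced.
    saved_model = model.to_gbq(model_name, replace=replace)
    return predictions, metrics, saved_model
```

A call might look like `evaluate_and_save(model, new_data, "my_dataset.my_kmeans", replace=True)`, where the dataset and model names are placeholders.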