PCA(
n_components: typing.Optional[typing.Union[int, float]] = None,
*,
svd_solver: typing.Literal["full", "randomized", "auto"] = "auto"
)
Principal component analysis (PCA).
Parameters | |
---|---|
Name | Description |
n_components |
int, float or None, default None
Number of components to keep. If n_components is not set, all components are kept, that is, n_components = min(n_samples, n_features). If 0 < n_components < 1, the number of components is chosen so that the fraction of variance explained is greater than the value of n_components. |
svd_solver |
"full", "randomized" or "auto", default "auto"
The solver to use to calculate the principal components. Details: https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-create-pca#pca_solver. |
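The fractional form of n_components can be pictured with a small standalone sketch. It is plain Python, not the BigQuery ML implementation; the function name `components_for_variance` and the eigenvalues are made up for illustration.

```python
from itertools import accumulate


def components_for_variance(eigenvalues, threshold):
    """Return the smallest k whose top-k eigenvalues explain more than
    `threshold` of the total variance (illustrative helper, not a bigframes API)."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must be strictly between 0 and 1")
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    for k, cumulative in enumerate(accumulate(ordered), start=1):
        if cumulative / total > threshold:
            return k
    # Fallback for floating-point edge cases: keep every component.
    return len(ordered)


# Hypothetical eigenvalues; cumulative shares are 0.5, 0.75, 0.875, 1.0.
print(components_for_variance([4.0, 2.0, 1.0, 1.0], 0.7))  # 2
```

With these made-up values, asking for more than 70% of the variance keeps two components, because the first one alone explains only 50%.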
Properties
components_
Principal axes in feature space, representing the directions of maximum variance in the data.
Returns | |
---|---|
Type | Description |
bigframes.dataframe.DataFrame | DataFrame of principal components, containing the following columns: principal_component_id: An integer that identifies the principal component. feature: The column name that contains the feature. numerical_value: If feature is numeric, the value of feature for the principal component that principal_component_id identifies. If feature isn't numeric, the value is NULL. categorical_value: A list of mappings containing information about categorical features. Each mapping contains the following fields: categorical_value.category: The name of each category. categorical_value.value: The value of categorical_value.category for the principal component that principal_component_id identifies. The output contains one row per feature per component. |
explained_variance_
The amount of variance explained by each of the selected components.
Returns | |
---|---|
Type | Description |
bigframes.dataframe.DataFrame | DataFrame containing the following columns: principal_component_id: An integer that identifies the principal component. explained_variance: The factor by which the eigenvector is scaled. Eigenvalue and explained variance are the same concept in PCA. |
explained_variance_ratio_
Percentage of variance explained by each of the selected components.
Returns | |
---|---|
Type | Description |
bigframes.dataframe.DataFrame | DataFrame containing the following columns: principal_component_id: An integer that identifies the principal component. explained_variance_ratio: The ratio between the variance (eigenvalue) of that principal component and the total variance, where the total variance is the sum of the variances (eigenvalues) of all of the individual principal components. |
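As a worked illustration of the ratio described above, the following sketch divides each eigenvalue by the sum of all eigenvalues. The helper name and the numbers are hypothetical; the real values come from the fitted model's explained_variance_ and explained_variance_ratio_ properties.

```python
def explained_variance_ratio(eigenvalues):
    """Divide each eigenvalue by the total variance (sum of all eigenvalues).

    Illustrative helper only; it is not part of bigframes.
    """
    total = sum(eigenvalues)
    return [value / total for value in eigenvalues]


# Hypothetical eigenvalues for four principal components.
print(explained_variance_ratio([4.0, 2.0, 1.0, 1.0]))  # [0.5, 0.25, 0.125, 0.125]
```

The ratios sum to 1 only when every component is included; if fewer components are kept, the denominator is still the total over all components.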
Methods
__repr__
__repr__()
Print the estimator's constructor with all non-default parameter values.
detect_anomalies
detect_anomalies(
X: typing.Union[bigframes.dataframe.DataFrame, bigframes.series.Series],
*,
contamination: float = 0.1
) -> bigframes.dataframe.DataFrame
Detect anomalous data points in the input.
Parameters | |
---|---|
Name | Description |
X |
bigframes.dataframe.DataFrame or bigframes.series.Series
Series or DataFrame in which to detect anomalies. |
contamination |
float, default 0.1
The proportion of anomalies in the training dataset that was used to create the model. The value must be in the range [0, 0.5]. |
Returns | |
---|---|
Type | Description |
bigframes.dataframe.DataFrame | DataFrame containing the anomaly detection results. |
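One way to build intuition for contamination is to treat it as the share of training rows that end up above an anomaly cutoff. The sketch below is an assumption made for illustration, not BigQuery ML's actual implementation: the per-row error scores, the helper names, and the exact cutoff rule are all hypothetical.

```python
import math


def anomaly_threshold(training_errors, contamination=0.1):
    """Pick a cutoff so that roughly `contamination` of training rows exceed it.

    Hypothetical helper; the real cutoff is computed by the service.
    """
    if not 0 <= contamination <= 0.5:
        raise ValueError("contamination must be in the range [0, 0.5]")
    ordered = sorted(training_errors, reverse=True)
    n_anomalies = math.floor(contamination * len(ordered))
    if n_anomalies == 0:
        return math.inf  # nothing is flagged
    return ordered[n_anomalies - 1]


def flag_anomalies(errors, threshold):
    """Mark each row whose error reaches the cutoff."""
    return [error >= threshold for error in errors]


# Made-up reconstruction errors for ten training rows.
errors = [0.10, 0.20, 0.15, 0.12, 0.90, 0.11, 0.13, 0.14, 0.16, 0.18]
cutoff = anomaly_threshold(errors, contamination=0.1)
print(cutoff)                               # 0.9
print(flag_anomalies(errors, cutoff).count(True))  # 1
```

Raising contamination lowers the cutoff, so more rows are reported as anomalies.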
fit
fit(
X: typing.Union[bigframes.dataframe.DataFrame, bigframes.series.Series],
y: typing.Optional[
typing.Union[bigframes.dataframe.DataFrame, bigframes.series.Series]
] = None,
) -> bigframes.ml.base._T
Fit the model according to the given training data.
Parameters | |
---|---|
Name | Description |
X |
bigframes.dataframe.DataFrame or bigframes.series.Series
Series or DataFrame of shape (n_samples, n_features). Training vector, where n_samples is the number of samples and n_features is the number of features. |
y |
default None
Ignored. |
Returns | |
---|---|
Type | Description |
PCA | Fitted estimator. |
get_params
get_params(deep: bool = True) -> typing.Dict[str, typing.Any]
Get parameters for this estimator.
Parameter | |
---|---|
Name | Description |
deep |
bool, default True
If True, returns the parameters for this estimator and for contained subobjects that are estimators. |
Returns | |
---|---|
Type | Description |
Dictionary | A dictionary of parameter names mapped to their values. |
predict
predict(
X: typing.Union[bigframes.dataframe.DataFrame, bigframes.series.Series]
) -> bigframes.dataframe.DataFrame
Apply the dimensionality reduction to X by projecting each sample onto the principal components.
Parameter | |
---|---|
Name | Description |
X |
bigframes.dataframe.DataFrame or bigframes.series.Series
Series or DataFrame to transform. |
Returns | |
---|---|
Type | Description |
bigframes.dataframe.DataFrame | Predicted DataFrame. |
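A minimal sketch of the fit-then-predict flow, using only the constructor, fit, and predict signatures documented on this page. The wrapper `fit_and_project` and its `estimator_cls` hook are hypothetical conveniences so the flow can be exercised without a BigQuery session; running it against the real PCA class requires the bigframes package and Google Cloud credentials.

```python
def fit_and_project(X, n_components=3, estimator_cls=None):
    """Fit a PCA-style estimator on X and return (model, projected X).

    `estimator_cls` is a hypothetical hook for swapping in a stand-in class;
    by default the real bigframes PCA is used.
    """
    if estimator_cls is None:
        # Imported lazily: needs bigframes installed and cloud credentials.
        from bigframes.ml.decomposition import PCA
        estimator_cls = PCA
    model = estimator_cls(n_components=n_components)
    model.fit(X)  # y is ignored by PCA, so it is omitted
    return model, model.predict(X)
```

Assuming `df` is a BigQuery DataFrames DataFrame of feature columns that you have already loaded, `model, projected = fit_and_project(df, n_components=2)` returns the fitted estimator and the projected rows; `model.explained_variance_ratio_` can then be inspected.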
register
register(vertex_ai_model_id: typing.Optional[str] = None) -> bigframes.ml.base._T
Register the model to Vertex AI.
After registering, go to the Google Cloud console (https://console.cloud.google.com/vertex-ai/models) to manage the model registry. Refer to https://cloud.google.com/vertex-ai/docs/model-registry/introduction for more options.
Parameter | |
---|---|
Name | Description |
vertex_ai_model_id |
Optional[str], default None
Optional string to use as the model ID in Vertex AI. If not set, defaults to 'bigframes_{bq_model_id}'. The Vertex AI model ID is truncated to 63 characters because of a Vertex AI length limit. |
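The defaulting and truncation rule described for vertex_ai_model_id can be sketched as follows. This is an illustration of the documented behavior, not the library's code; the helper name `resolve_vertex_model_id` is hypothetical.

```python
VERTEX_MODEL_ID_MAX_LENGTH = 63  # limit stated in the parameter description


def resolve_vertex_model_id(bq_model_id, vertex_ai_model_id=None):
    """Return the ID that would be used in Vertex AI (illustrative only)."""
    if vertex_ai_model_id is None:
        vertex_ai_model_id = f"bigframes_{bq_model_id}"
    return vertex_ai_model_id[:VERTEX_MODEL_ID_MAX_LENGTH]


print(resolve_vertex_model_id("my_model"))         # bigframes_my_model
print(len(resolve_vertex_model_id("x" * 100)))     # 63
```

Because long IDs are cut off, two models whose BigQuery IDs share the same first 53 characters would map to the same default Vertex AI ID under this rule, so passing an explicit short ID may be safer in that case.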
score
score(X=None, y=None) -> bigframes.dataframe.DataFrame
Calculate evaluation metrics of the model.
Parameters | |
---|---|
Name | Description |
X |
default None
Ignored. |
y |
default None
Ignored. |
Returns | |
---|---|
Type | Description |
bigframes.dataframe.DataFrame | DataFrame that represents model metrics. |
to_gbq
to_gbq(model_name: str, replace: bool = False) -> bigframes.ml.decomposition.PCA
Save the model to BigQuery.
Parameters | |
---|---|
Name | Description |
model_name |
str
The name of the model. |
replace |
bool, default False
Whether to replace the model if it already exists. |
Returns | |
---|---|
Type | Description |
PCA | Saved model. |