Class AutoMLTabularTrainingJob (1.3.0)

AutoMLTabularTrainingJob(
    display_name: str,
    optimization_prediction_type: str,
    optimization_objective: Optional[str] = None,
    column_specs: Optional[Dict[str, str]] = None,
    column_transformations: Optional[Union[Dict, List[Dict]]] = None,
    optimization_objective_recall_value: Optional[float] = None,
    optimization_objective_precision_value: Optional[float] = None,
    project: Optional[str] = None,
    location: Optional[str] = None,
    credentials: Optional[google.auth.credentials.Credentials] = None,
    training_encryption_spec_key_name: Optional[str] = None,
    model_encryption_spec_key_name: Optional[str] = None,
)

Constructs an AutoML Tabular Training Job.

Example usage:

job = training_jobs.AutoMLTabularTrainingJob(
    display_name="my_display_name",
    optimization_prediction_type="classification",
    optimization_objective="minimize-log-loss",
    column_specs={"column_1": "auto", "column_2": "numeric"},
)

Parameters

Name Description
display_name str

Required. The user-defined name of this TrainingPipeline.

optimization_prediction_type str

The type of prediction the Model is to produce. "classification" - Predict one out of multiple target values for each row. "regression" - Predict a value based on its relation to other values. This type is available only for columns that contain semantically numeric values, i.e. integers or floating-point numbers, even if stored as e.g. strings.

optimization_objective str

Optional. Objective function the Model is to be optimized towards. The training task creates a Model that maximizes/minimizes the value of the objective function over the validation set. The supported optimization objectives depend on the prediction type, and in the case of classification also on the number of distinct values in the target column (two distinct values -> binary, 3 or more distinct values -> multi class). If the field is not set, the default objective function is used.

Classification (binary):
"maximize-au-roc" (default) - Maximize the area under the receiver operating characteristic (ROC) curve.
"minimize-log-loss" - Minimize log loss.
"maximize-au-prc" - Maximize the area under the precision-recall curve.
"maximize-precision-at-recall" - Maximize precision for a specified recall value.
"maximize-recall-at-precision" - Maximize recall for a specified precision value.

Classification (multi class):
"minimize-log-loss" (default) - Minimize log loss.

Regression:
"minimize-rmse" (default) - Minimize root-mean-squared error (RMSE).
"minimize-mae" - Minimize mean-absolute error (MAE).
"minimize-rmsle" - Minimize root-mean-squared log error (RMSLE).

column_specs Dict[str, str]

Optional. Alternative to column_transformations where the keys of the dict are column names and their respective values are one of AutoMLTabularTrainingJob.column_data_types. When creating a transformation for a BigQuery Struct column, the column should be flattened using "." as the delimiter. Only columns with no child should have a transformation. If an input column has no transformations on it, that column is ignored by the training, except for the targetColumn, which should have no transformations defined on it. Only one of column_transformations or column_specs should be passed.

column_transformations Union[Dict, List[Dict]]

Optional. Transformations to apply to the input columns (i.e. columns other than the targetColumn). Each transformation may produce multiple result values from the column's value, and all are used for training. When creating a transformation for a BigQuery Struct column, the column should be flattened using "." as the delimiter. Only columns with no child should have a transformation. If an input column has no transformations on it, that column is ignored by the training, except for the targetColumn, which should have no transformations defined on it. Only one of column_transformations or column_specs should be passed. Consider using column_specs, as column_transformations will eventually be deprecated.

optimization_objective_recall_value float

Optional. Required when the "maximize-precision-at-recall" optimization objective is picked. The recall value at which the optimization is done. The minimum value is 0 and the maximum is 1.0.

optimization_objective_precision_value float

Optional. Required when the "maximize-recall-at-precision" optimization objective is picked. The precision value at which the optimization is done. The minimum value is 0 and the maximum is 1.0.

project str

Optional. Project to run training in. Overrides project set in aiplatform.init.

location str

Optional. Location to run training in. Overrides location set in aiplatform.init.

credentials auth_credentials.Credentials

Optional. Custom credentials to use to call the training service. Overrides credentials set in aiplatform.init.

training_encryption_spec_key_name Optional[str]

Optional. The Cloud KMS resource identifier of the customer-managed encryption key used to protect the training pipeline. Has the form: projects/my-project/locations/my-region/keyRings/my-kr/cryptoKeys/my-key. The key needs to be in the same region as the compute resource. If set, this TrainingPipeline will be secured by this key. Note: A Model trained by this TrainingPipeline is also secured by this key if model_to_upload is not set separately. Overrides encryption_spec_key_name set in aiplatform.init.

model_encryption_spec_key_name Optional[str]

Optional. The Cloud KMS resource identifier of the customer-managed encryption key used to protect the model. Has the form: projects/my-project/locations/my-region/keyRings/my-kr/cryptoKeys/my-key. The key needs to be in the same region as the compute resource. If set, the trained Model will be secured by this key. Overrides encryption_spec_key_name set in aiplatform.init.
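
The rules for optimization_objective and its two threshold parameters can be easier to follow as code. The sketch below is not part of the SDK: resolve_objective and its distinct_target_values argument are hypothetical names used only for illustration, and the tables are copied from the parameter descriptions above. It picks the default objective when none is given and checks that the matching recall or precision value is present and within 0 to 1.0.

```python
from typing import Optional

# Supported objectives per problem family, default first.
# Copied from the optimization_objective description; not read from the SDK.
_OBJECTIVES = {
    "binary": [
        "maximize-au-roc",
        "minimize-log-loss",
        "maximize-au-prc",
        "maximize-precision-at-recall",
        "maximize-recall-at-precision",
    ],
    "multiclass": ["minimize-log-loss"],
    "regression": ["minimize-rmse", "minimize-mae", "minimize-rmsle"],
}


def resolve_objective(
    prediction_type: str,
    distinct_target_values: Optional[int] = None,  # hypothetical input
    objective: Optional[str] = None,
    recall_value: Optional[float] = None,
    precision_value: Optional[float] = None,
) -> str:
    """Return the objective that would apply, or raise ValueError."""
    if prediction_type == "regression":
        family = "regression"
    elif prediction_type == "classification":
        if distinct_target_values is None or distinct_target_values < 2:
            raise ValueError("classification needs at least 2 distinct target values")
        family = "binary" if distinct_target_values == 2 else "multiclass"
    else:
        raise ValueError(f"unknown prediction type: {prediction_type!r}")

    allowed = _OBJECTIVES[family]
    chosen = objective if objective is not None else allowed[0]
    if chosen not in allowed:
        raise ValueError(f"{chosen!r} is not supported for {family}")

    # Each threshold objective requires its own value in [0, 1.0].
    thresholds = {
        "maximize-precision-at-recall": recall_value,
        "maximize-recall-at-precision": precision_value,
    }
    if chosen in thresholds:
        value = thresholds[chosen]
        if value is None or not 0.0 <= value <= 1.0:
            raise ValueError(f"{chosen!r} needs a threshold between 0 and 1.0")
    return chosen
```

For example, resolve_objective("classification", distinct_target_values=2) falls back to the binary default, while asking for "maximize-precision-at-recall" without a recall_value raises ValueError. How the service itself reports such errors is not covered by this sketch.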

Inheritance

builtins.object > google.cloud.aiplatform.base.VertexAiResourceNoun > builtins.object > google.cloud.aiplatform.base.FutureManager > google.cloud.aiplatform.base.VertexAiResourceNounWithFutureManager > google.cloud.aiplatform.training_jobs._TrainingJob > AutoMLTabularTrainingJob

Methods

get_auto_column_specs

get_auto_column_specs(
    dataset: google.cloud.aiplatform.datasets.tabular_dataset.TabularDataset,
    target_column: str,
)

Returns a dict with all non-target columns as keys and 'auto' as values.

Example usage:

column_specs = training_jobs.AutoMLTabularTrainingJob.get_auto_column_specs(
    dataset=my_dataset,
    target_column="my_target_column",
)

Parameters
Name Description
dataset datasets.TabularDataset

Required. Intended dataset.

target_column str

Required. Intended target column.
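
To make the shape of the returned dict concrete, here is a minimal stand-in that works on a plain list of column names instead of a TabularDataset. The function name auto_column_specs and the sample column names are hypothetical; the real method reads the columns from the dataset, and its handling of edge cases (such as a target column that is not present) is not described here.

```python
from typing import Dict, Iterable


def auto_column_specs(column_names: Iterable[str], target_column: str) -> Dict[str, str]:
    """Map every non-target column name to the "auto" data type."""
    return {name: "auto" for name in column_names if name != target_column}


# "address.city" stands for a BigQuery Struct leaf flattened with "."
# as the delimiter, as the column_specs description requires.
specs = auto_column_specs(["age", "address.city", "label"], target_column="label")
print(specs)
```

The resulting dict can then be adjusted per column (for instance, setting one value to "numeric") before being passed as column_specs.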

run

run(
    dataset: google.cloud.aiplatform.datasets.tabular_dataset.TabularDataset,
    target_column: str,
    training_fraction_split: float = 0.8,
    validation_fraction_split: float = 0.1,
    test_fraction_split: float = 0.1,
    predefined_split_column_name: Optional[str] = None,
    weight_column: Optional[str] = None,
    budget_milli_node_hours: int = 1000,
    model_display_name: Optional[str] = None,
    disable_early_stopping: bool = False,
    sync: bool = True,
)

Runs the training job and returns a model.

Data fraction splits: Any of training_fraction_split, validation_fraction_split and test_fraction_split may optionally be provided; together they must sum to at most 1. If the provided ones sum to less than 1, the remainder is assigned to sets as decided by Vertex AI. If none of the fractions are set, by default roughly 80% of the data will be used for training, 10% for validation, and 10% for test.
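
The split rule can be checked locally before submitting a job. The helper below is an illustrative sketch, not an SDK function: check_fraction_splits is a hypothetical name, and it uses None to mean "not provided", which differs from the numeric defaults in the run signature. It returns the remainder that would be left for Vertex AI to assign.

```python
import math
from typing import Optional


def check_fraction_splits(
    training: Optional[float] = None,
    validation: Optional[float] = None,
    test: Optional[float] = None,
) -> float:
    """Validate the provided fractions and return the unassigned remainder."""
    provided = [f for f in (training, validation, test) if f is not None]
    for fraction in provided:
        if not 0.0 <= fraction <= 1.0:
            raise ValueError(f"fraction {fraction} is outside [0, 1]")

    total = math.fsum(provided)
    # Tolerate floating-point noise around exactly 1.
    if total > 1.0 and not math.isclose(total, 1.0):
        raise ValueError(f"fractions sum to {total}, which exceeds 1")
    return max(0.0, 1.0 - total)
```

With 0.6 for training and 0.2 for validation, the helper reports a remainder of about 0.2; how that remainder is distributed is decided by Vertex AI and is not modeled here.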

Parameters
Name Description
dataset datasets.TabularDataset

Required. The dataset within the same Project from which data will be used to train the Model. The Dataset must use a schema compatible with the Model being trained; what is compatible should be described in the TrainingPipeline's training_task_definition (google.cloud.aiplatform.v1beta1.TrainingPipeline.training_task_definition). For tabular Datasets, all their data is exported to training, to pick and choose from.

target_column str

Required. The name of the column whose values the Model is to predict.

training_fraction_split float

Optional. The fraction of the input data that is to be used to train the Model. Defaults to 0.8. This is ignored if Dataset is not provided.

validation_fraction_split float

Optional. The fraction of the input data that is to be used to validate the Model. Defaults to 0.1. This is ignored if Dataset is not provided.

test_fraction_split float

Optional. The fraction of the input data that is to be used to evaluate the Model. Defaults to 0.1. This is ignored if Dataset is not provided.

predefined_split_column_name str

Optional. The key is a name of one of the Dataset's data columns. The value of the key (either the label's value or value in the column) must be one of {training, validation, test}, and it defines to which set the given piece of data is assigned. If for a piece of data the key is not present or has an invalid value, that piece is ignored by the pipeline. Supported only for tabular and time series Datasets.

weight_column str

Optional. Name of the column that should be used as the weight column. Higher values in this column give more importance to the row during Model training. The column must have numeric values between 0 and 10000 inclusive; a value of 0 means that the row is ignored. If the weight column field is not set, then all rows are assumed to have an equal weight of 1.

budget_milli_node_hours int

Optional. The training budget for creating this Model, expressed in milli node hours, i.e. a value of 1,000 in this field means 1 node hour. The training cost of the model will not exceed this budget. The service tries to bring the final cost close to the budget, though it may end up noticeably smaller, at the backend's discretion. This is especially likely when further model training ceases to provide any improvements. If the budget is set to a value known to be insufficient to train a Model for the given training set, the training won't be attempted and will error. The minimum value is 1000 and the maximum is 72000.

model_display_name str

Optional. The display name of the managed Vertex AI Model, if the job produces one. The name can be up to 128 characters long and can consist of any UTF-8 characters. If not provided upon creation, the job's display_name is used.

disable_early_stopping bool

Optional. If true, the entire budget is used. This disables the early stopping feature. By default, the early stopping feature is enabled, which means that training might stop before the entire training budget has been used, if further training no longer brings significant improvement to the model.

sync bool

Whether to execute this method synchronously. If False, this method will be executed in a concurrent Future, and any downstream object will be immediately returned and synced when the Future has completed.
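
Two of the numeric constraints above, the budget range and the weight range, are easy to get wrong by a factor of 1,000 or by an out-of-range row. The following sketch shows local checks for both. The function names are hypothetical and are not part of the SDK; the limits are the ones stated in the budget_milli_node_hours and weight_column descriptions.

```python
from typing import Iterable

MIN_BUDGET = 1000    # 1 node hour
MAX_BUDGET = 72000   # 72 node hours
MAX_WEIGHT = 10000


def node_hours_to_budget(node_hours: float) -> int:
    """Convert node hours to milli node hours and enforce the documented range."""
    budget = round(node_hours * 1000)
    if not MIN_BUDGET <= budget <= MAX_BUDGET:
        raise ValueError(
            f"budget {budget} is outside [{MIN_BUDGET}, {MAX_BUDGET}] milli node hours"
        )
    return budget


def count_ignored_rows(weights: Iterable[float]) -> int:
    """Check every weight is within [0, 10000]; return how many rows have weight 0."""
    ignored = 0
    for weight in weights:
        if not 0 <= weight <= MAX_WEIGHT:
            raise ValueError(f"weight {weight} is outside [0, {MAX_WEIGHT}]")
        if weight == 0:
            ignored += 1
    return ignored
```

For instance, node_hours_to_budget(1.5) gives 1500, a value that could be passed as budget_milli_node_hours, and count_ignored_rows reports how many rows a weight column would exclude from training.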

Exceptions

Type Description
RuntimeError If Training job has already been run or is waiting to run.

Returns

Type Description
model The trained Vertex AI Model resource or None if training did not produce a Vertex AI Model.