Reference documentation and code samples for the Google Cloud AI Platform V1 Client class InputDataConfig.
Specifies Vertex AI owned input data to be used for training, and possibly evaluating, the Model.
Generated from protobuf message google.cloud.aiplatform.v1.InputDataConfig
Methods
__construct
Constructor.
Parameters | |
---|---|
Name | Description |
data |
array
Optional. Data for populating the Message object. |
↳ fraction_split |
Google\Cloud\AIPlatform\V1\FractionSplit
Split based on fractions defining the size of each set. |
↳ filter_split |
Google\Cloud\AIPlatform\V1\FilterSplit
Split based on the provided filters for each set. |
↳ predefined_split |
Google\Cloud\AIPlatform\V1\PredefinedSplit
Supported only for tabular Datasets. Split based on a predefined key. |
↳ timestamp_split |
Google\Cloud\AIPlatform\V1\TimestampSplit
Supported only for tabular Datasets. Split based on the timestamp of the input data pieces. |
↳ stratified_split |
Google\Cloud\AIPlatform\V1\StratifiedSplit
Supported only for tabular Datasets. Split based on the distribution of the specified column. |
↳ gcs_destination |
Google\Cloud\AIPlatform\V1\GcsDestination
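The Cloud Storage location where the training data is to be written. In the given directory, a new directory is created with the name dataset-<dataset-id>-<annotation-type>-<timestamp-of-training-call>, where the timestamp is in YYYY-MM-DDThh:mm:ss.sssZ ISO-8601 format. |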
↳ bigquery_destination |
Google\Cloud\AIPlatform\V1\BigQueryDestination
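Only applicable to custom training with a tabular Dataset with a BigQuery source. The BigQuery project location where the training data is to be written. In the given project, a new dataset is created with the name dataset_<dataset-id>_<annotation-type>_<timestamp-of-training-call>, where the timestamp is in YYYY_MM_DDThh_mm_ss_sssZ format. |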
↳ dataset_id |
string
Required. The ID of the Dataset, in the same Project and Location, whose data will be used to train the Model. The Dataset must use a schema compatible with the Model being trained; what counts as compatible should be described in the TrainingPipeline's [training_task_definition][google.cloud.aiplatform.v1.TrainingPipeline.training_task_definition]. For tabular Datasets, all of their data is exported to training, to pick and choose from. |
↳ annotations_filter |
string
Applicable only to Datasets that have DataItems and Annotations. A filter on Annotations of the Dataset. Only Annotations that both match this filter and belong to DataItems not ignored by the split method are used, in the training, validation, or test role, depending on the role of the DataItem they are on (for auto-assigned DataItems, that role is decided by Vertex AI). A filter with the same syntax as the one used in ListAnnotations may be used, but note that here it filters across all Annotations of the Dataset, not just within a single DataItem. |
↳ annotation_schema_uri |
string
Applicable only to custom training with Datasets that have DataItems and Annotations. Cloud Storage URI that points to a YAML file describing the annotation schema. The schema is defined as an OpenAPI 3.0.2 Schema Object. The schema files that can be used here are found in gs://google-cloud-aiplatform/schema/dataset/annotation/. Note that the chosen schema must be consistent with the metadata of the Dataset specified by dataset_id. Only Annotations that both match this schema and belong to DataItems not ignored by the split method are used, in the training, validation, or test role, depending on the role of the DataItem they are on. When used in conjunction with annotations_filter, the Annotations used for training are filtered by both annotations_filter and annotation_schema_uri. |
↳ saved_query_id |
string
Only applicable to Datasets that have SavedQueries. The ID of a SavedQuery (annotation set) under the Dataset specified by dataset_id, used for filtering Annotations for training. Only Annotations that are associated with this SavedQuery are used in training. When used in conjunction with annotations_filter, the Annotations used for training are filtered by both saved_query_id and annotations_filter. Only one of saved_query_id and annotation_schema_uri should be specified, as both of them represent the same thing: the problem type. |
↳ persist_ml_use_assignment |
bool
Whether to persist the ML use assignment to data item system labels. |
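As a rough illustration of the constructor's data array, the sketch below builds an InputDataConfig with a fraction split and a Cloud Storage destination. It assumes the google/cloud-ai-platform Composer package is installed. The dataset ID, bucket path, and fraction values are placeholders, and the training_fraction, validation_fraction, test_fraction, and output_uri_prefix keys belong to the FractionSplit and GcsDestination messages rather than to this page.

```php
<?php
// Assumption: the library was installed with `composer require google/cloud-ai-platform`.
require __DIR__ . '/vendor/autoload.php';

use Google\Cloud\AIPlatform\V1\FractionSplit;
use Google\Cloud\AIPlatform\V1\GcsDestination;
use Google\Cloud\AIPlatform\V1\InputDataConfig;

// Placeholder values: replace with your own Dataset ID and bucket.
$config = new InputDataConfig([
    'dataset_id' => '1234567890',
    'fraction_split' => new FractionSplit([
        'training_fraction' => 0.8,
        'validation_fraction' => 0.1,
        'test_fraction' => 0.1,
    ]),
    'gcs_destination' => new GcsDestination([
        'output_uri_prefix' => 'gs://example-bucket/training-output/',
    ]),
]);

// Read the values back through the generated getters.
echo $config->getDatasetId(), PHP_EOL;
echo $config->getGcsDestination()->getOutputUriPrefix(), PHP_EOL;
```

Each key in the array corresponds to one of the setters documented below, so the same object could also be built by calling setDatasetId, setFractionSplit, and setGcsDestination in turn.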
getFractionSplit
Split based on fractions defining the size of each set.
Generated from protobuf field .google.cloud.aiplatform.v1.FractionSplit fraction_split = 2;
Returns | |
---|---|
Type | Description |
Google\Cloud\AIPlatform\V1\FractionSplit|null |
hasFractionSplit
setFractionSplit
Split based on fractions defining the size of each set.
Generated from protobuf field .google.cloud.aiplatform.v1.FractionSplit fraction_split = 2;
Parameter | |
---|---|
Name | Description |
var |
Google\Cloud\AIPlatform\V1\FractionSplit
|
Returns | |
---|---|
Type | Description |
$this |
getFilterSplit
Split based on the provided filters for each set.
Generated from protobuf field .google.cloud.aiplatform.v1.FilterSplit filter_split = 3;
Returns | |
---|---|
Type | Description |
Google\Cloud\AIPlatform\V1\FilterSplit|null |
hasFilterSplit
setFilterSplit
Split based on the provided filters for each set.
Generated from protobuf field .google.cloud.aiplatform.v1.FilterSplit filter_split = 3;
Parameter | |
---|---|
Name | Description |
var |
Google\Cloud\AIPlatform\V1\FilterSplit
|
Returns | |
---|---|
Type | Description |
$this |
getPredefinedSplit
Supported only for tabular Datasets.
Split based on a predefined key.
Generated from protobuf field .google.cloud.aiplatform.v1.PredefinedSplit predefined_split = 4;
Returns | |
---|---|
Type | Description |
Google\Cloud\AIPlatform\V1\PredefinedSplit|null |
hasPredefinedSplit
setPredefinedSplit
Supported only for tabular Datasets.
Split based on a predefined key.
Generated from protobuf field .google.cloud.aiplatform.v1.PredefinedSplit predefined_split = 4;
Parameter | |
---|---|
Name | Description |
var |
Google\Cloud\AIPlatform\V1\PredefinedSplit
|
Returns | |
---|---|
Type | Description |
$this |
getTimestampSplit
Supported only for tabular Datasets.
Split based on the timestamp of the input data pieces.
Generated from protobuf field .google.cloud.aiplatform.v1.TimestampSplit timestamp_split = 5;
Returns | |
---|---|
Type | Description |
Google\Cloud\AIPlatform\V1\TimestampSplit|null |
hasTimestampSplit
setTimestampSplit
Supported only for tabular Datasets.
Split based on the timestamp of the input data pieces.
Generated from protobuf field .google.cloud.aiplatform.v1.TimestampSplit timestamp_split = 5;
Parameter | |
---|---|
Name | Description |
var |
Google\Cloud\AIPlatform\V1\TimestampSplit
|
Returns | |
---|---|
Type | Description |
$this |
getStratifiedSplit
Supported only for tabular Datasets.
Split based on the distribution of the specified column.
Generated from protobuf field .google.cloud.aiplatform.v1.StratifiedSplit stratified_split = 12;
Returns | |
---|---|
Type | Description |
Google\Cloud\AIPlatform\V1\StratifiedSplit|null |
hasStratifiedSplit
setStratifiedSplit
Supported only for tabular Datasets.
Split based on the distribution of the specified column.
Generated from protobuf field .google.cloud.aiplatform.v1.StratifiedSplit stratified_split = 12;
Parameter | |
---|---|
Name | Description |
var |
Google\Cloud\AIPlatform\V1\StratifiedSplit
|
Returns | |
---|---|
Type | Description |
$this |
getGcsDestination
The Cloud Storage location where the training data is to be written. In the given directory, a new directory is created with the name:
dataset-<dataset-id>-<annotation-type>-<timestamp-of-training-call>
where the timestamp is in YYYY-MM-DDThh:mm:ss.sssZ ISO-8601 format.
All training input data is written into that directory. The Vertex AI environment variables that hold Cloud Storage data URIs use the Cloud Storage wildcard format to support sharded data, e.g. "gs://.../training-*.jsonl".
- AIP_DATA_FORMAT = "jsonl" for non-tabular data, "csv" for tabular data
- AIP_TRAINING_DATA_URI = "gcs_destination/dataset-
- AIP_VALIDATION_DATA_URI = "gcs_destination/dataset-
- AIP_TEST_DATA_URI = "gcs_destination/dataset-
Generated from protobuf field .google.cloud.aiplatform.v1.GcsDestination gcs_destination = 8;
Returns | |
---|---|
Type | Description |
Google\Cloud\AIPlatform\V1\GcsDestination|null |
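The directory name pattern described for gcs_destination can be reproduced locally, for example to predict where output will land. The helper below is a hypothetical sketch, not part of the client library. It assumes the timestamp is rendered in UTC (suggested by the trailing Z), and the annotation type string is a placeholder.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: formats
 * dataset-<dataset-id>-<annotation-type>-<timestamp-of-training-call>.
 */
function gcsTrainingDirectoryName(
    string $datasetId,
    string $annotationType,
    DateTimeImmutable $trainingCall
): string {
    // 'v' yields milliseconds; 'T' and 'Z' are escaped so they print literally.
    // Assumption: the timestamp is expressed in UTC.
    $timestamp = $trainingCall
        ->setTimezone(new DateTimeZone('UTC'))
        ->format('Y-m-d\TH:i:s.v\Z');

    return sprintf('dataset-%s-%s-%s', $datasetId, $annotationType, $timestamp);
}

echo gcsTrainingDirectoryName(
    '1234567890',
    'example_annotation_type',
    new DateTimeImmutable('2024-01-02T03:04:05.678+00:00')
), PHP_EOL;
// prints "dataset-1234567890-example_annotation_type-2024-01-02T03:04:05.678Z"
```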
hasGcsDestination
setGcsDestination
The Cloud Storage location where the training data is to be written. In the given directory, a new directory is created with the name:
dataset-<dataset-id>-<annotation-type>-<timestamp-of-training-call>
where the timestamp is in YYYY-MM-DDThh:mm:ss.sssZ ISO-8601 format.
All training input data is written into that directory. The Vertex AI environment variables that hold Cloud Storage data URIs use the Cloud Storage wildcard format to support sharded data, e.g. "gs://.../training-*.jsonl".
- AIP_DATA_FORMAT = "jsonl" for non-tabular data, "csv" for tabular data
- AIP_TRAINING_DATA_URI = "gcs_destination/dataset-
- AIP_VALIDATION_DATA_URI = "gcs_destination/dataset-
- AIP_TEST_DATA_URI = "gcs_destination/dataset-
Generated from protobuf field .google.cloud.aiplatform.v1.GcsDestination gcs_destination = 8;
Parameter | |
---|---|
Name | Description |
var |
Google\Cloud\AIPlatform\V1\GcsDestination
|
Returns | |
---|---|
Type | Description |
$this |
getBigqueryDestination
Only applicable to custom training with a tabular Dataset with a BigQuery source.
The BigQuery project location where the training data is to be written. In the given project, a new dataset is created with the name
dataset_<dataset-id>_<annotation-type>_<timestamp-of-training-call>
where the timestamp is in YYYY_MM_DDThh_mm_ss_sssZ format. All training input data is written into that dataset. In the dataset, three tables are created: training, validation, and test.
- AIP_DATA_FORMAT = "bigquery".
- AIP_TRAINING_DATA_URI = "bigquery_destination.dataset_
- AIP_VALIDATION_DATA_URI = "bigquery_destination.dataset_
- AIP_TEST_DATA_URI = "bigquery_destination.dataset_
Generated from protobuf field .google.cloud.aiplatform.v1.BigQueryDestination bigquery_destination = 10;
Returns | |
---|---|
Type | Description |
Google\Cloud\AIPlatform\V1\BigQueryDestination|null |
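The BigQuery dataset name uses underscores in place of the hyphens, colons, and dot found in the Cloud Storage directory name. A minimal sketch of that pattern follows. The helper is hypothetical, the annotation type is a placeholder, and UTC is assumed for the timestamp.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: formats
 * dataset_<dataset-id>_<annotation-type>_<timestamp-of-training-call>.
 */
function bigQueryTrainingDatasetName(
    string $datasetId,
    string $annotationType,
    DateTimeImmutable $trainingCall
): string {
    // YYYY_MM_DDThh_mm_ss_sssZ: underscores are literal, 'v' is milliseconds.
    // Assumption: the timestamp is expressed in UTC.
    $timestamp = $trainingCall
        ->setTimezone(new DateTimeZone('UTC'))
        ->format('Y_m_d\TH_i_s_v\Z');

    return sprintf('dataset_%s_%s_%s', $datasetId, $annotationType, $timestamp);
}

echo bigQueryTrainingDatasetName(
    '1234567890',
    'example_annotation_type',
    new DateTimeImmutable('2024-01-02T03:04:05.678+00:00')
), PHP_EOL;
// prints "dataset_1234567890_example_annotation_type_2024_01_02T03_04_05_678Z"
```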
hasBigqueryDestination
setBigqueryDestination
Only applicable to custom training with a tabular Dataset with a BigQuery source.
The BigQuery project location where the training data is to be written. In the given project, a new dataset is created with the name
dataset_<dataset-id>_<annotation-type>_<timestamp-of-training-call>
where the timestamp is in YYYY_MM_DDThh_mm_ss_sssZ format. All training input data is written into that dataset. In the dataset, three tables are created: training, validation, and test.
- AIP_DATA_FORMAT = "bigquery".
- AIP_TRAINING_DATA_URI = "bigquery_destination.dataset_
- AIP_VALIDATION_DATA_URI = "bigquery_destination.dataset_
- AIP_TEST_DATA_URI = "bigquery_destination.dataset_
Generated from protobuf field .google.cloud.aiplatform.v1.BigQueryDestination bigquery_destination = 10;
Parameter | |
---|---|
Name | Description |
var |
Google\Cloud\AIPlatform\V1\BigQueryDestination
|
Returns | |
---|---|
Type | Description |
$this |
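Training code that runs inside the job might read the AIP_* variables named in the gcs_destination and bigquery_destination descriptions along these lines. This is a sketch under assumptions: the helper name is hypothetical, and only the AIP_DATA_FORMAT values "jsonl", "csv", and "bigquery" mentioned on this page are handled.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: collects the AIP_* variables named in this reference.
 * Variables that are not set are returned as null.
 */
function readTrainingDataEnv(): array
{
    $names = [
        'AIP_DATA_FORMAT',
        'AIP_TRAINING_DATA_URI',
        'AIP_VALIDATION_DATA_URI',
        'AIP_TEST_DATA_URI',
    ];

    $values = [];
    foreach ($names as $name) {
        $value = getenv($name); // string, or false when the variable is unset
        $values[$name] = ($value === false) ? null : $value;
    }

    return $values;
}

$env = readTrainingDataEnv();

switch ($env['AIP_DATA_FORMAT']) {
    case 'jsonl':
    case 'csv':
        // URIs are Cloud Storage wildcard patterns that may match several shards.
        echo 'Cloud Storage data: ', $env['AIP_TRAINING_DATA_URI'] ?? '(unset)', PHP_EOL;
        break;
    case 'bigquery':
        echo 'BigQuery data: ', $env['AIP_TRAINING_DATA_URI'] ?? '(unset)', PHP_EOL;
        break;
    default:
        echo 'AIP_DATA_FORMAT is missing or not one of the documented values.', PHP_EOL;
}
```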
getDatasetId
Required. The ID of the Dataset, in the same Project and Location, whose data will be used to train the Model. The Dataset must use a schema compatible with the Model being trained; what counts as compatible should be described in the TrainingPipeline's [training_task_definition][google.cloud.aiplatform.v1.TrainingPipeline.training_task_definition].
For tabular Datasets, all their data is exported to training, to pick and choose from.
Generated from protobuf field string dataset_id = 1 [(.google.api.field_behavior) = REQUIRED];
Returns | |
---|---|
Type | Description |
string |
setDatasetId
Required. The ID of the Dataset, in the same Project and Location, whose data will be used to train the Model. The Dataset must use a schema compatible with the Model being trained; what counts as compatible should be described in the TrainingPipeline's [training_task_definition][google.cloud.aiplatform.v1.TrainingPipeline.training_task_definition].
For tabular Datasets, all their data is exported to training, to pick and choose from.
Generated from protobuf field string dataset_id = 1 [(.google.api.field_behavior) = REQUIRED];
Parameter | |
---|---|
Name | Description |
var |
string
|
Returns | |
---|---|
Type | Description |
$this |
getAnnotationsFilter
Applicable only to Datasets that have DataItems and Annotations.
A filter on Annotations of the Dataset. Only Annotations that both match this filter and belong to DataItems not ignored by the split method are used, in the training, validation, or test role, depending on the role of the DataItem they are on (for auto-assigned DataItems, that role is decided by Vertex AI). A filter with the same syntax as the one used in ListAnnotations may be used, but note that here it filters across all Annotations of the Dataset, not just within a single DataItem.
Generated from protobuf field string annotations_filter = 6;
Returns | |
---|---|
Type | Description |
string |
setAnnotationsFilter
Applicable only to Datasets that have DataItems and Annotations.
A filter on Annotations of the Dataset. Only Annotations that both match this filter and belong to DataItems not ignored by the split method are used, in the training, validation, or test role, depending on the role of the DataItem they are on (for auto-assigned DataItems, that role is decided by Vertex AI). A filter with the same syntax as the one used in ListAnnotations may be used, but note that here it filters across all Annotations of the Dataset, not just within a single DataItem.
Generated from protobuf field string annotations_filter = 6;
Parameter | |
---|---|
Name | Description |
var |
string
|
Returns | |
---|---|
Type | Description |
$this |
getAnnotationSchemaUri
Applicable only to custom training with Datasets that have DataItems and Annotations.
Cloud Storage URI that points to a YAML file describing the annotation schema. The schema is defined as an OpenAPI 3.0.2 Schema Object. The schema files that can be used here are found in gs://google-cloud-aiplatform/schema/dataset/annotation/. Note that the chosen schema must be consistent with the metadata of the Dataset specified by dataset_id. Only Annotations that both match this schema and belong to DataItems not ignored by the split method are used, in the training, validation, or test role, depending on the role of the DataItem they are on. When used in conjunction with annotations_filter, the Annotations used for training are filtered by both annotations_filter and annotation_schema_uri.
Generated from protobuf field string annotation_schema_uri = 9;
Returns | |
---|---|
Type | Description |
string |
setAnnotationSchemaUri
Applicable only to custom training with Datasets that have DataItems and Annotations.
Cloud Storage URI that points to a YAML file describing the annotation schema. The schema is defined as an OpenAPI 3.0.2 Schema Object. The schema files that can be used here are found in gs://google-cloud-aiplatform/schema/dataset/annotation/. Note that the chosen schema must be consistent with the metadata of the Dataset specified by dataset_id. Only Annotations that both match this schema and belong to DataItems not ignored by the split method are used, in the training, validation, or test role, depending on the role of the DataItem they are on. When used in conjunction with annotations_filter, the Annotations used for training are filtered by both annotations_filter and annotation_schema_uri.
Generated from protobuf field string annotation_schema_uri = 9;
Parameter | |
---|---|
Name | Description |
var |
string
|
Returns | |
---|---|
Type | Description |
$this |
getSavedQueryId
Only applicable to Datasets that have SavedQueries.
The ID of a SavedQuery (annotation set) under the Dataset specified by dataset_id, used for filtering Annotations for training. Only Annotations that are associated with this SavedQuery are used in training. When used in conjunction with annotations_filter, the Annotations used for training are filtered by both saved_query_id and annotations_filter. Only one of saved_query_id and annotation_schema_uri should be specified, as both of them represent the same thing: the problem type.
Generated from protobuf field string saved_query_id = 7;
Returns | |
---|---|
Type | Description |
string |
setSavedQueryId
Only applicable to Datasets that have SavedQueries.
The ID of a SavedQuery (annotation set) under the Dataset specified by dataset_id, used for filtering Annotations for training. Only Annotations that are associated with this SavedQuery are used in training. When used in conjunction with annotations_filter, the Annotations used for training are filtered by both saved_query_id and annotations_filter. Only one of saved_query_id and annotation_schema_uri should be specified, as both of them represent the same thing: the problem type.
Generated from protobuf field string saved_query_id = 7;
Parameter | |
---|---|
Name | Description |
var |
string
|
Returns | |
---|---|
Type | Description |
$this |
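Because only one of saved_query_id and annotation_schema_uri should be specified, a caller may want to check the constructor array before building the message. The function below is a hypothetical client-side guard, not something the library provides. The ID and URI values are placeholders.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical pre-flight check on an InputDataConfig data array:
 * rejects arrays that set both saved_query_id and annotation_schema_uri.
 */
function assertSingleProblemTypeSource(array $data): void
{
    // An absent key and an empty string are both treated as "not specified".
    $hasSavedQuery = ($data['saved_query_id'] ?? '') !== '';
    $hasSchemaUri = ($data['annotation_schema_uri'] ?? '') !== '';

    if ($hasSavedQuery && $hasSchemaUri) {
        throw new InvalidArgumentException(
            'Specify only one of saved_query_id and annotation_schema_uri.'
        );
    }
}

// Passes: only saved_query_id is set (placeholder IDs).
assertSingleProblemTypeSource([
    'dataset_id' => '1234567890',
    'saved_query_id' => '987654321',
]);

// Throws: both fields are set (placeholder URI).
try {
    assertSingleProblemTypeSource([
        'dataset_id' => '1234567890',
        'saved_query_id' => '987654321',
        'annotation_schema_uri' => 'gs://example-bucket/annotation-schema.yaml',
    ]);
} catch (InvalidArgumentException $e) {
    echo $e->getMessage(), PHP_EOL;
}
```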
getPersistMlUseAssignment
Whether to persist the ML use assignment to data item system labels.
Generated from protobuf field bool persist_ml_use_assignment = 11;
Returns | |
---|---|
Type | Description |
bool |
setPersistMlUseAssignment
Whether to persist the ML use assignment to data item system labels.
Generated from protobuf field bool persist_ml_use_assignment = 11;
Parameter | |
---|---|
Name | Description |
var |
bool
|
Returns | |
---|---|
Type | Description |
$this |
getSplit
Returns | |
---|---|
Type | Description |
string |
getDestination
Returns | |
---|---|
Type | Description |
string |
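getSplit and getDestination carry no description on this page. In PHP classes generated from protobuf, accessors of this kind normally report which member of a oneof group is currently set, by field name, and return an empty string when none is set. The sketch below assumes that behavior, assumes the google/cloud-ai-platform Composer package is installed, and uses placeholder values throughout. The key field of PredefinedSplit is defined on that message, not here.

```php
<?php
// Assumption: the library was installed with `composer require google/cloud-ai-platform`.
require __DIR__ . '/vendor/autoload.php';

use Google\Cloud\AIPlatform\V1\FractionSplit;
use Google\Cloud\AIPlatform\V1\InputDataConfig;
use Google\Cloud\AIPlatform\V1\PredefinedSplit;

$config = new InputDataConfig(['dataset_id' => '1234567890']);

// Set one member of the split group.
$config->setFractionSplit(new FractionSplit([
    'training_fraction' => 0.8,
    'validation_fraction' => 0.1,
    'test_fraction' => 0.1,
]));
$first = $config->getSplit(); // expected under the assumption above: "fraction_split"

// Setting a different member of the same group replaces the previous one.
$config->setPredefinedSplit(new PredefinedSplit(['key' => 'example_split_column']));
$second = $config->getSplit(); // expected: "predefined_split"

echo $first, ' -> ', $second, PHP_EOL;
var_dump($config->hasFractionSplit()); // the earlier member is no longer set
var_dump($config->getDestination());   // no destination was set in this sketch
```

Branching on the returned name avoids calling every has* method in turn when the code needs to know which split strategy a config carries.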