Vertex AI V1 API - Class Google::Cloud::AIPlatform::V1::InputDataConfig (v0.63.0)

Reference documentation and code samples for the Vertex AI V1 API class Google::Cloud::AIPlatform::V1::InputDataConfig.

Specifies Vertex AI owned input data to be used for training, and possibly evaluating, the Model.

Inherits

  • Object

Extended By

  • Google::Protobuf::MessageExts::ClassMethods

Includes

  • Google::Protobuf::MessageExts

Methods

#annotation_schema_uri

def annotation_schema_uri() -> ::String
Returns
  • (::String) — Applicable only to custom training with Datasets that have DataItems and Annotations.

    Cloud Storage URI that points to a YAML file describing the annotation schema. The schema is defined as an OpenAPI 3.0.2 Schema Object. The schema files that can be used here are found in gs://google-cloud-aiplatform/schema/dataset/annotation/. Note that the chosen schema must be consistent with the metadata of the Dataset specified by dataset_id.

    Only Annotations that both match this schema and belong to DataItems not ignored by the split method are used; each takes the training, validation, or test role of the DataItem it is on.

    When used in conjunction with annotations_filter, the Annotations used for training are filtered by both annotations_filter and annotation_schema_uri.

#annotation_schema_uri=

def annotation_schema_uri=(value) -> ::String
Parameter
  • value (::String) — Applicable only to custom training with Datasets that have DataItems and Annotations.

    Cloud Storage URI that points to a YAML file describing the annotation schema. The schema is defined as an OpenAPI 3.0.2 Schema Object. The schema files that can be used here are found in gs://google-cloud-aiplatform/schema/dataset/annotation/. Note that the chosen schema must be consistent with the metadata of the Dataset specified by dataset_id.

    Only Annotations that both match this schema and belong to DataItems not ignored by the split method are used; each takes the training, validation, or test role of the DataItem it is on.

    When used in conjunction with annotations_filter, the Annotations used for training are filtered by both annotations_filter and annotation_schema_uri.

Returns
  • (::String) — Applicable only to custom training with Datasets that have DataItems and Annotations.

    Cloud Storage URI that points to a YAML file describing the annotation schema. The schema is defined as an OpenAPI 3.0.2 Schema Object. The schema files that can be used here are found in gs://google-cloud-aiplatform/schema/dataset/annotation/. Note that the chosen schema must be consistent with the metadata of the Dataset specified by dataset_id.

    Only Annotations that both match this schema and belong to DataItems not ignored by the split method are used; each takes the training, validation, or test role of the DataItem it is on.

    When used in conjunction with annotations_filter, the Annotations used for training are filtered by both annotations_filter and annotation_schema_uri.
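The selection rule for annotation_schema_uri, including its interaction with annotations_filter, can be modeled in plain Ruby. This is a minimal sketch, not the Vertex AI implementation: the hash fields, the SPLIT_ROLES table, and the schema file name are hypothetical, and the filter is an ordinary Ruby lambda rather than a ListAnnotations filter string.

```ruby
# Hypothetical model of how annotation_schema_uri, annotations_filter, and the
# split method combine. None of these structures are Vertex AI types.

# Illustrative schema file name; list the bucket to see the real ones.
SCHEMA_URI =
  "gs://google-cloud-aiplatform/schema/dataset/annotation/image_classification_1.0.0.yaml"

# Role assigned to each DataItem by the split method; nil means "ignored".
SPLIT_ROLES = { "item-1" => :training, "item-2" => :test, "item-3" => nil }.freeze

ANNOTATIONS = [
  { id: "a1", data_item: "item-1", schema_uri: SCHEMA_URI, label: "cat" },
  { id: "a2", data_item: "item-1", schema_uri: "gs://example-bucket/other.yaml", label: "cat" },
  { id: "a3", data_item: "item-2", schema_uri: SCHEMA_URI, label: "dog" },
  { id: "a4", data_item: "item-3", schema_uri: SCHEMA_URI, label: "cat" }
].freeze

# An Annotation is used only if it matches the schema, passes the filter,
# and belongs to a DataItem that the split method did not ignore.
def used_annotations(annotations, schema_uri:, filter:)
  annotations.select do |ann|
    ann[:schema_uri] == schema_uri &&
      filter.call(ann) &&
      !SPLIT_ROLES[ann[:data_item]].nil?
  end
end

only_cats = ->(ann) { ann[:label] == "cat" }
used = used_annotations(ANNOTATIONS, schema_uri: SCHEMA_URI, filter: only_cats)
puts used.map { |ann| ann[:id] }.inspect
```

Only a1 survives: a2 fails the schema check, a3 fails the filter, and a4 sits on an ignored DataItem.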

#annotations_filter

def annotations_filter() -> ::String
Returns
  • (::String) — Applicable only to Datasets that have DataItems and Annotations.

    A filter on Annotations of the Dataset. Only Annotations that both match this filter and belong to DataItems not ignored by the split method are used; each takes the training, validation, or test role of the DataItem it is on (for auto-assigned DataItems, that role is decided by Vertex AI). A filter with the same syntax as the one used in ListAnnotations may be used, but note that here it filters across all Annotations of the Dataset, not just within a single DataItem.

#annotations_filter=

def annotations_filter=(value) -> ::String
Parameter
  • value (::String) — Applicable only to Datasets that have DataItems and Annotations.

    A filter on Annotations of the Dataset. Only Annotations that both match this filter and belong to DataItems not ignored by the split method are used; each takes the training, validation, or test role of the DataItem it is on (for auto-assigned DataItems, that role is decided by Vertex AI). A filter with the same syntax as the one used in ListAnnotations may be used, but note that here it filters across all Annotations of the Dataset, not just within a single DataItem.

Returns
  • (::String) — Applicable only to Datasets that have DataItems and Annotations.

    A filter on Annotations of the Dataset. Only Annotations that both match this filter and belong to DataItems not ignored by the split method are used; each takes the training, validation, or test role of the DataItem it is on (for auto-assigned DataItems, that role is decided by Vertex AI). A filter with the same syntax as the one used in ListAnnotations may be used, but note that here it filters across all Annotations of the Dataset, not just within a single DataItem.
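Two points in the annotations_filter description, that the filter runs over every Annotation in the Dataset and that each kept Annotation inherits its DataItem's role, can be sketched as follows. The DataItem struct, the labels, and the lambda filter are hypothetical stand-ins; real filters use the ListAnnotations syntax and are evaluated by Vertex AI.

```ruby
# Hypothetical model: the filter runs over every Annotation in the Dataset,
# and each kept Annotation takes the role of the DataItem it is on.
# role is :training, :validation, :test, or nil when the split ignores the item.
DataItem = Struct.new(:id, :role, :annotations)

items = [
  DataItem.new("img-1", :training, [{ label: "cat" }, { label: "dog" }]),
  DataItem.new("img-2", :validation, [{ label: "cat" }]),
  DataItem.new("img-3", :test, [{ label: "bird" }]),
  DataItem.new("img-4", nil, [{ label: "cat" }])
]

# Stand-in for a ListAnnotations-style filter string.
filter = ->(annotation) { annotation[:label] == "cat" }

by_role = Hash.new { |hash, role| hash[role] = [] }
items.each do |item|
  next if item.role.nil? # ignored DataItems contribute no Annotations

  item.annotations.each do |annotation|
    by_role[item.role] << [item.id, annotation[:label]] if filter.call(annotation)
  end
end

by_role.each { |role, kept| puts "#{role}: #{kept.inspect}" }
```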

#bigquery_destination

def bigquery_destination() -> ::Google::Cloud::AIPlatform::V1::BigQueryDestination
Returns
  • (::Google::Cloud::AIPlatform::V1::BigQueryDestination) — Only applicable to custom training with tabular Dataset with BigQuery source.

    The BigQuery project location where the training data is to be written. In the given project, a new dataset is created with the name dataset_<dataset-id>_<annotation-type>_<timestamp-of-training-call>, where the timestamp is in YYYY_MM_DDThh_mm_ss_sssZ format. All training input data is written into that dataset. Three tables are created in the dataset: training, validation, and test.

    • AIP_DATA_FORMAT = "bigquery".
    • AIP_TRAINING_DATA_URI = "bigquery_destination.dataset_

    • AIP_VALIDATION_DATA_URI = "bigquery_destination.dataset_

    • AIP_TEST_DATA_URI = "bigquery_destination.dataset_

    Note: The following fields are mutually exclusive: bigquery_destination, gcs_destination. If a field in that set is populated, all other fields in the set will automatically be cleared.

#bigquery_destination=

def bigquery_destination=(value) -> ::Google::Cloud::AIPlatform::V1::BigQueryDestination
Parameter
  • value (::Google::Cloud::AIPlatform::V1::BigQueryDestination) — Only applicable to custom training with tabular Dataset with BigQuery source.

    The BigQuery project location where the training data is to be written. In the given project, a new dataset is created with the name dataset_<dataset-id>_<annotation-type>_<timestamp-of-training-call>, where the timestamp is in YYYY_MM_DDThh_mm_ss_sssZ format. All training input data is written into that dataset. Three tables are created in the dataset: training, validation, and test.

    • AIP_DATA_FORMAT = "bigquery".
    • AIP_TRAINING_DATA_URI = "bigquery_destination.dataset_

    • AIP_VALIDATION_DATA_URI = "bigquery_destination.dataset_

    • AIP_TEST_DATA_URI = "bigquery_destination.dataset_

    Note: The following fields are mutually exclusive: bigquery_destination, gcs_destination. If a field in that set is populated, all other fields in the set will automatically be cleared.

Returns
  • (::Google::Cloud::AIPlatform::V1::BigQueryDestination) — Only applicable to custom training with tabular Dataset with BigQuery source.

    The BigQuery project location where the training data is to be written. In the given project, a new dataset is created with the name dataset_<dataset-id>_<annotation-type>_<timestamp-of-training-call>, where the timestamp is in YYYY_MM_DDThh_mm_ss_sssZ format. All training input data is written into that dataset. Three tables are created in the dataset: training, validation, and test.

    • AIP_DATA_FORMAT = "bigquery".
    • AIP_TRAINING_DATA_URI = "bigquery_destination.dataset_

    • AIP_VALIDATION_DATA_URI = "bigquery_destination.dataset_

    • AIP_TEST_DATA_URI = "bigquery_destination.dataset_

    Note: The following fields are mutually exclusive: bigquery_destination, gcs_destination. If a field in that set is populated, all other fields in the set will automatically be cleared.
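The BigQuery dataset naming scheme can be reproduced with Time#strftime, where %L yields milliseconds. A minimal sketch: the dataset ID and annotation type below are placeholder values, and Vertex AI generates the real name itself.

```ruby
# Sketch of the BigQuery dataset name format described above.
# The dataset ID and annotation type are placeholder values.
def bigquery_dataset_name(dataset_id, annotation_type, called_at)
  # YYYY_MM_DDThh_mm_ss_sssZ: "_" replaces "-", ":" and "."; %L is milliseconds.
  timestamp = called_at.getutc.strftime("%Y_%m_%dT%H_%M_%S_%LZ")
  "dataset_#{dataset_id}_#{annotation_type}_#{timestamp}"
end

called_at = Time.utc(2024, 3, 5, 14, 7, 9, 250_000) # last argument is microseconds
name = bigquery_dataset_name("1234567890", "tabular", called_at)
puts name
```

For this call time the sketch prints dataset_1234567890_tabular_2024_03_05T14_07_09_250Z. Using getutc rather than utc avoids mutating a caller's local Time.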

#dataset_id

def dataset_id() -> ::String
Returns
  • (::String) — Required. The ID of the Dataset, in the same Project and Location, whose data will be used to train the Model. The Dataset must use a schema compatible with the Model being trained; what is compatible should be described in the training_task_definition of the TrainingPipeline being used. For tabular Datasets, all of their data is exported to training, to pick and choose from.

#dataset_id=

def dataset_id=(value) -> ::String
Parameter
  • value (::String) — Required. The ID of the Dataset, in the same Project and Location, whose data will be used to train the Model. The Dataset must use a schema compatible with the Model being trained; what is compatible should be described in the training_task_definition of the TrainingPipeline being used. For tabular Datasets, all of their data is exported to training, to pick and choose from.

Returns
  • (::String) — Required. The ID of the Dataset, in the same Project and Location, whose data will be used to train the Model. The Dataset must use a schema compatible with the Model being trained; what is compatible should be described in the training_task_definition of the TrainingPipeline being used. For tabular Datasets, all of their data is exported to training, to pick and choose from.

#filter_split

def filter_split() -> ::Google::Cloud::AIPlatform::V1::FilterSplit
Returns
  • (::Google::Cloud::AIPlatform::V1::FilterSplit) — Split based on the provided filters for each set.

    Note: The following fields are mutually exclusive: filter_split, fraction_split, predefined_split, timestamp_split, stratified_split. If a field in that set is populated, all other fields in the set will automatically be cleared.

#filter_split=

def filter_split=(value) -> ::Google::Cloud::AIPlatform::V1::FilterSplit
Parameter
  • value (::Google::Cloud::AIPlatform::V1::FilterSplit) — Split based on the provided filters for each set.

    Note: The following fields are mutually exclusive: filter_split, fraction_split, predefined_split, timestamp_split, stratified_split. If a field in that set is populated, all other fields in the set will automatically be cleared.

Returns
  • (::Google::Cloud::AIPlatform::V1::FilterSplit) — Split based on the provided filters for each set.

    Note: The following fields are mutually exclusive: filter_split, fraction_split, predefined_split, timestamp_split, stratified_split. If a field in that set is populated, all other fields in the set will automatically be cleared.
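The mutual-exclusivity note describes protobuf oneof semantics. The generated InputDataConfig message enforces this itself; the stdlib-only class below merely imitates the documented behavior so it can be run without the gem. The SplitOneof class and the sample values are hypothetical.

```ruby
# Stdlib-only model of the documented oneof behavior: populating one split
# field clears the others. The real InputDataConfig is a generated protobuf
# message that enforces this itself; this class only imitates it.
class SplitOneof
  MEMBERS = %i[fraction_split filter_split predefined_split timestamp_split stratified_split].freeze

  attr_reader :which # name of the populated member, or nil

  def initialize
    @which = nil
    @value = nil
  end

  MEMBERS.each do |member|
    define_method(member) { @which == member ? @value : nil }

    define_method(:"#{member}=") do |value|
      if value.nil?
        # Clearing the populated member empties the oneof; clearing another is a no-op.
        @which = @value = nil if @which == member
      else
        @which = member # any previously populated member is cleared implicitly
        @value = value
      end
    end
  end
end

split = SplitOneof.new
split.filter_split = { training_filter: "hypothetical filter" }
split.fraction_split = { training_fraction: 0.8, validation_fraction: 0.1, test_fraction: 0.1 }
puts split.which.inspect
puts split.filter_split.inspect
```

After the second assignment, which reports :fraction_split and filter_split reads back as nil.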

#fraction_split

def fraction_split() -> ::Google::Cloud::AIPlatform::V1::FractionSplit
Returns
  • (::Google::Cloud::AIPlatform::V1::FractionSplit) — Split based on fractions defining the size of each set.

    Note: The following fields are mutually exclusive: fraction_split, filter_split, predefined_split, timestamp_split, stratified_split. If a field in that set is populated, all other fields in the set will automatically be cleared.

#fraction_split=

def fraction_split=(value) -> ::Google::Cloud::AIPlatform::V1::FractionSplit
Parameter
  • value (::Google::Cloud::AIPlatform::V1::FractionSplit) — Split based on fractions defining the size of each set.

    Note: The following fields are mutually exclusive: fraction_split, filter_split, predefined_split, timestamp_split, stratified_split. If a field in that set is populated, all other fields in the set will automatically be cleared.

Returns
  • (::Google::Cloud::AIPlatform::V1::FractionSplit) — Split based on fractions defining the size of each set.

    Note: The following fields are mutually exclusive: fraction_split, filter_split, predefined_split, timestamp_split, stratified_split. If a field in that set is populated, all other fields in the set will automatically be cleared.
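A fraction-based split can be illustrated by shuffling once and cutting the result by size. This is a sketch under stated assumptions: the fractions, the seeded shuffle, and the rule that fractions must sum to 1 are choices made here, not Vertex AI's actual assignment algorithm.

```ruby
# Illustration of a fraction-based split: shuffle once, then cut by size.
# The fractions and the seeded shuffle are example choices, not Vertex AI's
# actual assignment algorithm. This sketch requires the fractions to sum to 1.
def fraction_split(items, training:, validation:, test:, seed: 42)
  unless (training + validation + test - 1.0).abs < 1e-9
    raise ArgumentError, "fractions must sum to 1"
  end

  shuffled = items.shuffle(random: Random.new(seed))
  n_train = (items.size * training).floor
  n_valid = (items.size * validation).floor
  {
    training: shuffled[0, n_train],
    validation: shuffled[n_train, n_valid],
    test: shuffled[(n_train + n_valid)..] # the remainder, so no item is dropped
  }
end

sets = fraction_split((1..10).to_a, training: 0.8, validation: 0.1, test: 0.1)
sets.each { |role, rows| puts "#{role}: #{rows.size}" }
```

Giving the remainder to the test set means rounding never drops an item.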

#gcs_destination

def gcs_destination() -> ::Google::Cloud::AIPlatform::V1::GcsDestination
Returns
  • (::Google::Cloud::AIPlatform::V1::GcsDestination) — The Cloud Storage location where the training data is to be written. In the given directory, a new directory is created with the name dataset-<dataset-id>-<annotation-type>-<timestamp-of-training-call>, where the timestamp is in ISO-8601 format (YYYY-MM-DDThh:mm:ss.sssZ). All training input data is written into that directory.

    The Vertex AI environment variables representing Cloud Storage data URIs use the Cloud Storage wildcard format to support sharded data, e.g. "gs://.../training-*.jsonl".

    • AIP_DATA_FORMAT = "jsonl" for non-tabular data, "csv" for tabular data
    • AIP_TRAINING_DATA_URI = "gcs_destination/dataset-

    • AIP_VALIDATION_DATA_URI = "gcs_destination/dataset-

    • AIP_TEST_DATA_URI = "gcs_destination/dataset-

    Note: The following fields are mutually exclusive: gcs_destination, bigquery_destination. If a field in that set is populated, all other fields in the set will automatically be cleared.

#gcs_destination=

def gcs_destination=(value) -> ::Google::Cloud::AIPlatform::V1::GcsDestination
Parameter
  • value (::Google::Cloud::AIPlatform::V1::GcsDestination) — The Cloud Storage location where the training data is to be written. In the given directory, a new directory is created with the name dataset-<dataset-id>-<annotation-type>-<timestamp-of-training-call>, where the timestamp is in ISO-8601 format (YYYY-MM-DDThh:mm:ss.sssZ). All training input data is written into that directory.

    The Vertex AI environment variables representing Cloud Storage data URIs use the Cloud Storage wildcard format to support sharded data, e.g. "gs://.../training-*.jsonl".

    • AIP_DATA_FORMAT = "jsonl" for non-tabular data, "csv" for tabular data
    • AIP_TRAINING_DATA_URI = "gcs_destination/dataset-

    • AIP_VALIDATION_DATA_URI = "gcs_destination/dataset-

    • AIP_TEST_DATA_URI = "gcs_destination/dataset-

    Note: The following fields are mutually exclusive: gcs_destination, bigquery_destination. If a field in that set is populated, all other fields in the set will automatically be cleared.

Returns
  • (::Google::Cloud::AIPlatform::V1::GcsDestination) — The Cloud Storage location where the training data is to be written. In the given directory, a new directory is created with the name dataset-<dataset-id>-<annotation-type>-<timestamp-of-training-call>, where the timestamp is in ISO-8601 format (YYYY-MM-DDThh:mm:ss.sssZ). All training input data is written into that directory.

    The Vertex AI environment variables representing Cloud Storage data URIs use the Cloud Storage wildcard format to support sharded data, e.g. "gs://.../training-*.jsonl".

    • AIP_DATA_FORMAT = "jsonl" for non-tabular data, "csv" for tabular data
    • AIP_TRAINING_DATA_URI = "gcs_destination/dataset-

    • AIP_VALIDATION_DATA_URI = "gcs_destination/dataset-

    • AIP_TEST_DATA_URI = "gcs_destination/dataset-

    Note: The following fields are mutually exclusive: gcs_destination, bigquery_destination. If a field in that set is populated, all other fields in the set will automatically be cleared.
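Both the directory naming scheme and the wildcard format can be checked with the Ruby standard library: Time#iso8601(3) produces the millisecond ISO-8601 timestamp, and File.fnmatch tests shard names against a pattern such as training-*.jsonl. The dataset ID, annotation type, and shard file names are hypothetical; Vertex AI chooses the real names.

```ruby
require "time" # Time#iso8601

# Sketch of the directory name format described above, plus matching shard
# file names against the wildcard pattern used by the AIP_*_DATA_URI variables.
# The dataset ID, annotation type, and shard names are hypothetical.
def gcs_dataset_dir(dataset_id, annotation_type, called_at)
  # iso8601(3) keeps milliseconds; a UTC Time renders its offset as "Z".
  "dataset-#{dataset_id}-#{annotation_type}-#{called_at.getutc.iso8601(3)}"
end

dir = gcs_dataset_dir("1234567890", "image_classification",
                      Time.utc(2024, 3, 5, 14, 7, 9, 250_000))
puts dir

shards = %w[
  training-00000-of-00002.jsonl
  training-00001-of-00002.jsonl
  validation-00000-of-00001.jsonl
]
training_shards = shards.select { |name| File.fnmatch("training-*.jsonl", name) }
puts training_shards.inspect
```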

#persist_ml_use_assignment

def persist_ml_use_assignment() -> ::Boolean
Returns
  • (::Boolean) — Whether to persist the ML use assignment to data item system labels.

#persist_ml_use_assignment=

def persist_ml_use_assignment=(value) -> ::Boolean
Parameter
  • value (::Boolean) — Whether to persist the ML use assignment to data item system labels.

Returns
  • (::Boolean) — Whether to persist the ML use assignment to data item system labels.

#predefined_split

def predefined_split() -> ::Google::Cloud::AIPlatform::V1::PredefinedSplit
Returns
  • (::Google::Cloud::AIPlatform::V1::PredefinedSplit) — Supported only for tabular Datasets.

    Split based on a predefined key.

    Note: The following fields are mutually exclusive: predefined_split, fraction_split, filter_split, timestamp_split, stratified_split. If a field in that set is populated, all other fields in the set will automatically be cleared.

#predefined_split=

def predefined_split=(value) -> ::Google::Cloud::AIPlatform::V1::PredefinedSplit
Parameter
  • value (::Google::Cloud::AIPlatform::V1::PredefinedSplit) — Supported only for tabular Datasets.

    Split based on a predefined key.

    Note: The following fields are mutually exclusive: predefined_split, fraction_split, filter_split, timestamp_split, stratified_split. If a field in that set is populated, all other fields in the set will automatically be cleared.

Returns
  • (::Google::Cloud::AIPlatform::V1::PredefinedSplit) — Supported only for tabular Datasets.

    Split based on a predefined key.

    Note: The following fields are mutually exclusive: predefined_split, fraction_split, filter_split, timestamp_split, stratified_split. If a field in that set is populated, all other fields in the set will automatically be cleared.
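A predefined split reads each row's set directly from a key column rather than computing it. The sketch below assumes the key column holds one of "training", "validation", or "test"; the column name split_role and the rows are hypothetical.

```ruby
# Sketch of a predefined split: every row names its own set in a key column.
# Assumption: the key column holds "training", "validation", or "test"; the
# column name "split_role" and the rows are hypothetical.
ALLOWED_ROLES = %w[training validation test].freeze

def predefined_split(rows, key:)
  rows.group_by do |row|
    role = row.fetch(key)
    unless ALLOWED_ROLES.include?(role)
      raise ArgumentError, "unexpected #{key} value: #{role.inspect}"
    end

    role
  end
end

rows = [
  { "id" => 1, "split_role" => "training" },
  { "id" => 2, "split_role" => "test" },
  { "id" => 3, "split_role" => "training" },
  { "id" => 4, "split_role" => "validation" }
]
sets = predefined_split(rows, key: "split_role")
sets.each { |role, members| puts "#{role}: #{members.map { |row| row["id"] }.inspect}" }
```

Rejecting unknown values early surfaces data problems before any training starts.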

#saved_query_id

def saved_query_id() -> ::String
Returns
  • (::String) — Only applicable to Datasets that have SavedQueries.

    The ID of a SavedQuery (annotation set) under the Dataset specified by dataset_id, used to filter Annotations for training.

    Only Annotations associated with this SavedQuery are used for training. When used in conjunction with annotations_filter, the Annotations used for training are filtered by both saved_query_id and annotations_filter.

    Only one of saved_query_id and annotation_schema_uri should be specified, because both represent the same thing: the problem type.

#saved_query_id=

def saved_query_id=(value) -> ::String
Parameter
  • value (::String) — Only applicable to Datasets that have SavedQueries.

    The ID of a SavedQuery (annotation set) under the Dataset specified by dataset_id, used to filter Annotations for training.

    Only Annotations associated with this SavedQuery are used for training. When used in conjunction with annotations_filter, the Annotations used for training are filtered by both saved_query_id and annotations_filter.

    Only one of saved_query_id and annotation_schema_uri should be specified, because both represent the same thing: the problem type.

Returns
  • (::String) — Only applicable to Datasets that have SavedQueries.

    The ID of a SavedQuery (annotation set) under the Dataset specified by dataset_id, used to filter Annotations for training.

    Only Annotations associated with this SavedQuery are used for training. When used in conjunction with annotations_filter, the Annotations used for training are filtered by both saved_query_id and annotations_filter.

    Only one of saved_query_id and annotation_schema_uri should be specified, because both represent the same thing: the problem type.
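Because saved_query_id and annotation_schema_uri are ordinary string fields rather than a oneof, nothing in the message itself stops both from being set. A caller could add its own check; the guard below is a hypothetical helper, not part of the gem, and relies on unset protobuf string fields reading back as "".

```ruby
# Hypothetical client-side guard, not part of the gem: the docs say only one of
# saved_query_id and annotation_schema_uri should be specified.
def check_problem_type_fields!(saved_query_id: "", annotation_schema_uri: "")
  # Unset protobuf string fields read back as "", so treat empty (or nil) as unset.
  if !saved_query_id.to_s.empty? && !annotation_schema_uri.to_s.empty?
    raise ArgumentError, "specify only one of saved_query_id and annotation_schema_uri"
  end

  true
end

puts check_problem_type_fields!(saved_query_id: "987654321") # prints true
begin
  check_problem_type_fields!(saved_query_id: "987654321",
                             annotation_schema_uri: "gs://example-bucket/schema.yaml")
rescue ArgumentError => e
  puts "rejected: #{e.message}"
end
```

With an InputDataConfig instance held in a (hypothetical) variable named config, the guard could be called as check_problem_type_fields!(saved_query_id: config.saved_query_id, annotation_schema_uri: config.annotation_schema_uri).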

#stratified_split

def stratified_split() -> ::Google::Cloud::AIPlatform::V1::StratifiedSplit
Returns
  • (::Google::Cloud::AIPlatform::V1::StratifiedSplit) — Supported only for tabular Datasets.

    Split based on the distribution of the specified column.

    Note: The following fields are mutually exclusive: stratified_split, fraction_split, filter_split, predefined_split, timestamp_split. If a field in that set is populated, all other fields in the set will automatically be cleared.

#stratified_split=

def stratified_split=(value) -> ::Google::Cloud::AIPlatform::V1::StratifiedSplit
Parameter
  • value (::Google::Cloud::AIPlatform::V1::StratifiedSplit) — Supported only for tabular Datasets.

    Split based on the distribution of the specified column.

    Note: The following fields are mutually exclusive: stratified_split, fraction_split, filter_split, predefined_split, timestamp_split. If a field in that set is populated, all other fields in the set will automatically be cleared.

Returns
  • (::Google::Cloud::AIPlatform::V1::StratifiedSplit) — Supported only for tabular Datasets.

    Split based on the distribution of the specified column.

    Note: The following fields are mutually exclusive: stratified_split, fraction_split, filter_split, predefined_split, timestamp_split. If a field in that set is populated, all other fields in the set will automatically be cleared.
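Splitting on the distribution of a column usually means stratification: each value group is split separately so every set keeps roughly the same mix of values. This is a sketch of that general technique, not Vertex AI's implementation; the column name, fractions, seed, and rows are hypothetical.

```ruby
# Sketch of stratified splitting: each value group of one column is split
# separately, so every set roughly keeps that column's distribution.
# Column name, fractions, seed, and rows are hypothetical; this is not
# Vertex AI's implementation.
def stratified_split(rows, column:, training: 0.8, validation: 0.1, seed: 7)
  rng = Random.new(seed)
  sets = { training: [], validation: [], test: [] }
  rows.group_by { |row| row.fetch(column) }.each_value do |group|
    shuffled = group.shuffle(random: rng)
    n_train = (group.size * training).floor
    n_valid = (group.size * validation).floor
    sets[:training].concat(shuffled[0, n_train])
    sets[:validation].concat(shuffled[n_train, n_valid])
    sets[:test].concat(shuffled[(n_train + n_valid)..]) # remainder of the group
  end
  sets
end

rows = Array.new(10) { |i| { id: i, label: "yes" } } +
       Array.new(20) { |i| { id: 10 + i, label: "no" } }
sets = stratified_split(rows, column: :label)
sets.each { |role, members| puts "#{role}: #{members.map { |row| row[:label] }.tally}" }
```

Each set ends up with the same 1:2 ratio of "yes" to "no" rows as the input.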

#timestamp_split

def timestamp_split() -> ::Google::Cloud::AIPlatform::V1::TimestampSplit
Returns
  • (::Google::Cloud::AIPlatform::V1::TimestampSplit) — Supported only for tabular Datasets.

    Split based on the timestamp of the input data pieces.

    Note: The following fields are mutually exclusive: timestamp_split, fraction_split, filter_split, predefined_split, stratified_split. If a field in that set is populated, all other fields in the set will automatically be cleared.

#timestamp_split=

def timestamp_split=(value) -> ::Google::Cloud::AIPlatform::V1::TimestampSplit
Parameter
  • value (::Google::Cloud::AIPlatform::V1::TimestampSplit) — Supported only for tabular Datasets.

    Split based on the timestamp of the input data pieces.

    Note: The following fields are mutually exclusive: timestamp_split, fraction_split, filter_split, predefined_split, stratified_split. If a field in that set is populated, all other fields in the set will automatically be cleared.

Returns
  • (::Google::Cloud::AIPlatform::V1::TimestampSplit) — Supported only for tabular Datasets.

    Split based on the timestamp of the input data pieces.

    Note: The following fields are mutually exclusive: timestamp_split, fraction_split, filter_split, predefined_split, stratified_split. If a field in that set is populated, all other fields in the set will automatically be cleared.
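A timestamp-based split orders the data by time before cutting it into sets. In this sketch, sorting ascending (earliest rows into training) is an assumption made here; the order Vertex AI actually applies is defined by the TimestampSplit message. The column name event_time and the rows are hypothetical.

```ruby
require "time" # Time.iso8601

# Sketch of a timestamp-based split: order rows by a timestamp column, then cut
# the ordered list by fraction. Sorting ascending (earliest rows into training)
# is an assumption of this sketch; check TimestampSplit for the order Vertex AI
# applies. The column name and rows are hypothetical.
def timestamp_split(rows, key:, training: 0.8, validation: 0.1)
  ordered = rows.sort_by { |row| Time.iso8601(row.fetch(key)) }
  n_train = (ordered.size * training).floor
  n_valid = (ordered.size * validation).floor
  {
    training: ordered[0, n_train],
    validation: ordered[n_train, n_valid],
    test: ordered[(n_train + n_valid)..]
  }
end

# Ten daily events, deliberately listed newest first.
rows = (1..10).map { |day| { "id" => day, "event_time" => format("2024-01-%02dT00:00:00Z", day) } }.reverse
sets = timestamp_split(rows, key: "event_time")
sets.each { |role, members| puts "#{role}: #{members.map { |row| row["id"] }.inspect}" }
```

Parsing with Time.iso8601 rather than comparing strings keeps the order correct even if timestamps use different UTC offsets.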