ImageDataset(
dataset_name: str,
project: typing.Optional[str] = None,
location: typing.Optional[str] = None,
credentials: typing.Optional[google.auth.credentials.Credentials] = None,
)
A managed image dataset resource for Vertex AI.
Use this class to work with a managed image dataset. To create a managed image dataset, you need a datasource file in CSV format and a schema file in YAML format. A schema is optional for a custom model. You put the CSV file and the schema into Cloud Storage buckets.
Use image data for the following objectives:
- Single-label classification. For more information, see Prepare image training data for single-label classification.
- Multi-label classification. For more information, see Prepare image training data for multi-label classification.
- Object detection. For more information, see Prepare image training data for object detection.
The following code shows you how to create an image dataset by importing data from a CSV datasource file and a YAML schema file. The schema file you use depends on whether your image dataset is used for single-label classification, multi-label classification, or object detection.
from google.cloud import aiplatform

# import_schema_uri takes a single string, not a list.
my_dataset = aiplatform.ImageDataset.create(
    display_name="my-image-dataset",
    gcs_source=['gs://path/to/my/image-dataset.csv'],
    import_schema_uri='gs://path/to/my/schema.yaml'
)
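The CSV datasource file mentioned above can be generated with Python's standard csv module. The following is a minimal sketch, assuming the single-label classification layout in which each row holds an image URI followed by its label. The bucket paths, the labels, and the helper name `write_classification_csv` are hypothetical; check the data preparation guide for the exact columns your objective requires.

```python
import csv
import io


def write_classification_csv(rows, out):
    """Write (image_uri, label) pairs as CSV rows to a text stream."""
    writer = csv.writer(out, lineterminator="\n")
    for image_uri, label in rows:
        # Images are expected to live in Cloud Storage.
        if not image_uri.startswith("gs://"):
            raise ValueError(f"expected a Cloud Storage URI, got {image_uri!r}")
        writer.writerow([image_uri, label])


# Hypothetical image URIs and labels, for illustration only.
example_rows = [
    ("gs://my-bucket/images/daisy_001.jpg", "daisy"),
    ("gs://my-bucket/images/tulip_001.jpg", "tulip"),
]

buffer = io.StringIO()
write_classification_csv(example_rows, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

The resulting text would then be uploaded to a Cloud Storage bucket and its URI passed as gcs_source.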
Properties
create_time
Time this resource was created.
display_name
Display name of this resource.
encryption_spec
Customer-managed encryption key options for this Vertex AI resource.
If this is set, then all resources created by this Vertex AI resource will be encrypted with the provided encryption key.
gca_resource
The underlying resource proto representation.
labels
User-defined labels containing metadata about this resource.
Read more about labels at https://goo.gl/xmQnxf
metadata_schema_uri
The metadata schema uri of this dataset resource.
name
Name of this resource.
resource_name
Fully qualified resource name.
update_time
Time this resource was last updated.
Methods
ImageDataset
ImageDataset(
dataset_name: str,
project: typing.Optional[str] = None,
location: typing.Optional[str] = None,
credentials: typing.Optional[google.auth.credentials.Credentials] = None,
)
Retrieves an existing managed dataset given a dataset name or ID.
Parameters | |
---|---|
Name | Description |
dataset_name |
str
Required. A fully-qualified dataset resource name or dataset ID. Example: "projects/123/locations/us-central1/datasets/456" or "456" when project and location are initialized or passed. |
project |
str
Optional project to retrieve dataset from. If not set, project set in aiplatform.init will be used. |
location |
str
Optional location to retrieve dataset from. If not set, location set in aiplatform.init will be used. |
credentials |
auth_credentials.Credentials
Custom credentials to use to retrieve this Dataset. Overrides credentials set in aiplatform.init. |
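The two accepted forms of dataset_name can be illustrated with a small helper. This is a sketch only: `dataset_resource_name` and `get_image_dataset` are hypothetical names, and the SDK import is deferred so the string helper can be tried without the library installed.

```python
def dataset_resource_name(project, location, dataset_id):
    """Build a fully qualified dataset resource name from its parts."""
    return f"projects/{project}/locations/{location}/datasets/{dataset_id}"


def get_image_dataset(dataset_name, project=None, location=None):
    """Retrieve an existing dataset by resource name or bare ID."""
    # Imported lazily so the helper above works without the SDK installed.
    from google.cloud import aiplatform

    return aiplatform.ImageDataset(
        dataset_name=dataset_name, project=project, location=location
    )


# Values taken from the example in the parameter description.
full_name = dataset_resource_name("123", "us-central1", "456")
print(full_name)
```

Passing the bare ID "456" instead relies on project and location being supplied here or through aiplatform.init.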
create
create(
display_name: typing.Optional[str] = None,
gcs_source: typing.Optional[typing.Union[str, typing.Sequence[str]]] = None,
import_schema_uri: typing.Optional[str] = None,
data_item_labels: typing.Optional[typing.Dict] = None,
project: typing.Optional[str] = None,
location: typing.Optional[str] = None,
credentials: typing.Optional[google.auth.credentials.Credentials] = None,
request_metadata: typing.Optional[typing.Sequence[typing.Tuple[str, str]]] = (),
labels: typing.Optional[typing.Dict[str, str]] = None,
encryption_spec_key_name: typing.Optional[str] = None,
sync: bool = True,
create_request_timeout: typing.Optional[float] = None,
) -> google.cloud.aiplatform.datasets.image_dataset.ImageDataset
Creates a new image dataset.
Optionally imports data into the dataset when a source and import_schema_uri are passed in.
Parameters | |
---|---|
Name | Description |
display_name |
str
Optional. The user-defined name of the dataset. The name must contain 128 or fewer UTF-8 characters. |
gcs_source |
Union[str, Sequence[str]]
Optional. The URI to one or more Google Cloud Storage buckets that contain your datasets. For example, |
import_schema_uri |
str
Optional. A URI for a YAML file stored in Cloud Storage that describes the import schema used to validate the dataset. The schema is an OpenAPI 3.0.2 Schema object. |
data_item_labels |
Dict
Optional. A dictionary of label information. Each dictionary item contains a label and a label key. Each image in the dataset includes one dictionary of label information. If a data item is added or merged into a dataset, and that data item contains an image that's identical to an image that's already in the dataset, then the data items are merged. If two identical labels are detected during the merge, each with a different label key, then one of the label and label key dictionary items is randomly chosen to be included in the merged data item. Images and documents are compared using their binary data (bytes), not their content. If annotation labels are referenced in a schema specified by the |
project |
str
Optional. The name of the Google Cloud project to which this |
location |
str
Optional. The Google Cloud region where this dataset is uploaded. This region overrides the region that was set by |
credentials |
auth_credentials.Credentials
Optional. The credentials that are used to upload the |
request_metadata |
Sequence[Tuple[str, str]]
Optional. Strings that contain metadata that's sent with the request. |
labels |
Dict[str, str]
Optional. Labels with user-defined metadata to organize your datasets. The maximum length of a key and of a value is 64 Unicode characters. Label keys and values can contain only lowercase letters, numeric characters, underscores, and dashes. International characters are allowed. No more than 64 user labels can be associated with one dataset (system labels are excluded). For more information and examples of using labels, see Using labels to organize Google Cloud Platform resources. System reserved label keys are prefixed with |
encryption_spec_key_name |
Optional[str]
Optional. The Cloud KMS resource identifier of the customer managed encryption key that's used to protect the dataset. The format of the key is |
sync |
bool
If |
create_request_timeout |
float
Optional. The number of seconds for the timeout of the create request. |
Returns | |
---|---|
Type | Description |
image_dataset (ImageDataset) |
An instantiated representation of the managed ImageDataset resource. |
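The label constraints described for the labels parameter can be checked locally before calling create. The following is a rough sketch, not the service's own validation: the helper name `label_problems` is hypothetical, the character test is an approximation of "lowercase letters, numeric characters, underscores, and dashes, with international characters allowed", and the reserved prefix is the one quoted in the update method's description.

```python
RESERVED_PREFIX = "aiplatform.googleapis.com/"
MAX_LENGTH = 64
MAX_LABELS = 64


def label_problems(labels):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    if len(labels) > MAX_LABELS:
        problems.append(f"{len(labels)} labels exceed the limit of {MAX_LABELS}")
    for key, value in labels.items():
        if key.startswith(RESERVED_PREFIX):
            problems.append(f"key {key!r} uses the system-reserved prefix")
            continue
        for kind, text in (("key", key), ("value", value)):
            if len(text) > MAX_LENGTH:
                problems.append(
                    f"{kind} {text!r} is longer than {MAX_LENGTH} characters"
                )
            if any(ch.isupper() for ch in text):
                problems.append(f"{kind} {text!r} contains uppercase letters")
            # Approximation: any Unicode letter or digit, underscore, or dash.
            if any(not (ch.isalnum() or ch in "_-") for ch in text):
                problems.append(f"{kind} {text!r} contains unsupported characters")
    return problems


print(label_problems({"team": "vision", "env": "test-1"}))
print(label_problems({"Team": "vision"}))
```

A check like this only catches obvious mistakes early; the service remains the authority on what it accepts.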
delete
delete(sync: bool = True) -> None
Deletes this Vertex AI resource. WARNING: This deletion is permanent.
Parameter | |
---|---|
Name | Description |
sync |
bool
Whether to execute this deletion synchronously. If False, this method is executed in a concurrent Future, and any downstream object is returned immediately and synced when the Future has completed. |
export_data
export_data(output_dir: str) -> typing.Sequence[str]
Exports data to an output directory in Google Cloud Storage.
Parameter | |
---|---|
Name | Description |
output_dir |
str
Required. The Google Cloud Storage location where the output is to be written to. In the given directory a new directory will be created with name: |
Returns | |
---|---|
Type | Description |
exported_files (Sequence[str]) |
All of the files that are exported in this export operation. |
export_data_for_custom_training
export_data_for_custom_training(
output_dir: str,
annotation_filter: typing.Optional[str] = None,
saved_query_id: typing.Optional[str] = None,
annotation_schema_uri: typing.Optional[str] = None,
split: typing.Optional[
typing.Union[typing.Dict[str, str], typing.Dict[str, float]]
] = None,
) -> typing.Dict[str, typing.Any]
Exports data to an output directory in Google Cloud Storage for custom training use cases.
Example annotation_schema_uri (image classification): gs://google-cloud-aiplatform/schema/dataset/annotation/image_classification_1.0.0.yaml
Example split (filter split):
{ "training_filter": "labels.aiplatform.googleapis.com/ml_use=training", "validation_filter": "labels.aiplatform.googleapis.com/ml_use=validation", "test_filter": "labels.aiplatform.googleapis.com/ml_use=test", }
Example split (fraction split):
{ "training_fraction": 0.7, "validation_fraction": 0.2, "test_fraction": 0.1, }
Parameters | |
---|---|
Name | Description |
output_dir |
str
Required. The Google Cloud Storage location where the output is to be written to. In the given directory a new directory will be created with name: |
annotation_filter |
str
Optional. An expression for filtering what part of the Dataset is to be exported. Only Annotations that match this filter will be exported. The filter syntax is the same as in |
saved_query_id |
str
Optional. The ID of a SavedQuery (annotation set) under this Dataset used for filtering Annotations for training. Only used for custom training data export use cases. Only applicable to Datasets that have SavedQueries. Only Annotations that are associated with this SavedQuery are used for training. When used in conjunction with annotation_filter, the Annotations used for training are filtered by both saved_query_id and annotation_filter. Specify only one of saved_query_id and annotation_schema_uri, because both represent the same thing: the problem type. |
annotation_schema_uri |
str
Optional. The Cloud Storage URI that points to a YAML file describing the annotation schema. The schema is defined as an OpenAPI 3.0.2 Schema Object. The schema files that can be used here are found in gs://google-cloud-aiplatform/schema/dataset/annotation/. The chosen schema must be consistent with the metadata_schema_uri of this Dataset. Only used for custom training data export use cases. Only applicable to Datasets that have DataItems and Annotations. Only Annotations that both match this schema and belong to DataItems not ignored by the split method are used in the training, validation, or test role, depending on the role of the DataItem they are on. When used in conjunction with annotation_filter, the Annotations used for training are filtered by both annotation_filter and annotation_schema_uri. |
split |
Union[Dict[str, str], Dict[str, float]]
Instructions for how the exported data should be split between the training, validation, and test sets. |
Returns | |
---|---|
Type | Description |
export_data_response (Dict) |
Response message for DatasetService.ExportData in Dictionary format. |
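The two split styles shown in the examples for this method can be built with small helpers. This is a sketch under assumptions: the helper names are hypothetical, the fractions are assumed to sum to 1 as they do in the example, and `export_for_training` expects an ImageDataset that has already been retrieved.

```python
import math


def fraction_split(training, validation, test):
    """Build a fraction split dict, checking that the parts sum to 1."""
    fractions = (training, validation, test)
    if any(f < 0 for f in fractions):
        raise ValueError("fractions must be non-negative")
    if not math.isclose(sum(fractions), 1.0, abs_tol=1e-9):
        raise ValueError(f"fractions sum to {sum(fractions)}, expected 1.0")
    return {
        "training_fraction": training,
        "validation_fraction": validation,
        "test_fraction": test,
    }


def filter_split(label_key="aiplatform.googleapis.com/ml_use"):
    """Build a filter split dict keyed on a data item label."""
    return {
        f"{role}_filter": f"labels.{label_key}={role}"
        for role in ("training", "validation", "test")
    }


def export_for_training(dataset, output_dir, split):
    """Call the export method on an already retrieved ImageDataset."""
    return dataset.export_data_for_custom_training(
        output_dir=output_dir,
        annotation_schema_uri=(
            "gs://google-cloud-aiplatform/schema/dataset/annotation/"
            "image_classification_1.0.0.yaml"
        ),
        split=split,
    )


print(fraction_split(0.7, 0.2, 0.1))
print(filter_split())
```

Only annotation_schema_uri is passed here, in line with the guidance that saved_query_id and annotation_schema_uri should not both be specified.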
import_data
import_data(
gcs_source: typing.Union[str, typing.Sequence[str]],
import_schema_uri: str,
data_item_labels: typing.Optional[typing.Dict] = None,
sync: bool = True,
import_request_timeout: typing.Optional[float] = None,
) -> google.cloud.aiplatform.datasets.dataset._Dataset
Uploads data to an existing managed dataset.
Parameters | |
---|---|
Name | Description |
gcs_source |
Union[str, Sequence[str]]
Required. Google Cloud Storage URI(s) of the input file(s). May contain wildcards. For more information on wildcards, see https://cloud.google.com/storage/docs/gsutil/addlhelp/WildcardNames. Examples: str: "gs://bucket/file.csv"; Sequence[str]: ["gs://bucket/file1.csv", "gs://bucket/file2.csv"] |
import_schema_uri |
str
Required. Points to a YAML file stored on Google Cloud Storage describing the import format. Validation will be done against the schema. The schema is defined as an |
data_item_labels |
Dict
Labels that will be applied to newly imported DataItems. If a DataItem identical to one being imported already exists in the Dataset, then these labels are appended to those of the existing DataItem, and if a label with an identical key was imported before, the old label value is overwritten. If two DataItems are identical in the same import data operation, their labels are combined, and if a key collision happens, one of the values is picked randomly. Two DataItems are considered identical if their content bytes are identical (e.g. image bytes or pdf bytes). These labels will be overridden by Annotation labels specified inside the index file referenced by |
sync |
bool
Whether to execute this method synchronously. If False, this method is executed in a concurrent Future, and any downstream object is returned immediately and synced when the Future has completed. |
import_request_timeout |
float
Optional. The timeout for the import request in seconds. |
Returns | |
---|---|
Type | Description |
dataset (Dataset) |
Instantiated representation of the managed dataset resource. |
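Because gcs_source accepts either a single string or a sequence of strings, it can help to normalize the value before importing. The following is a minimal sketch; `normalize_gcs_source` and `import_into` are hypothetical helper names, and `import_into` expects an ImageDataset that has already been retrieved.

```python
def normalize_gcs_source(gcs_source):
    """Return gcs_source as a list of Cloud Storage URIs."""
    uris = [gcs_source] if isinstance(gcs_source, str) else list(gcs_source)
    if not uris:
        raise ValueError("at least one Cloud Storage URI is required")
    for uri in uris:
        if not uri.startswith("gs://"):
            raise ValueError(f"expected a Cloud Storage URI, got {uri!r}")
    return uris


def import_into(dataset, gcs_source, import_schema_uri):
    """Import data into an already retrieved dataset."""
    return dataset.import_data(
        gcs_source=normalize_gcs_source(gcs_source),
        import_schema_uri=import_schema_uri,
    )


print(normalize_gcs_source("gs://bucket/file.csv"))
print(normalize_gcs_source(["gs://bucket/file1.csv", "gs://bucket/file2.csv"]))
```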
list
list(
filter: typing.Optional[str] = None,
order_by: typing.Optional[str] = None,
project: typing.Optional[str] = None,
location: typing.Optional[str] = None,
credentials: typing.Optional[google.auth.credentials.Credentials] = None,
) -> typing.List[google.cloud.aiplatform.base.VertexAiResourceNoun]
List all instances of this Dataset resource.
Example Usage:
aiplatform.ImageDataset.list( filter='labels.my_key="my_value"', order_by='display_name' )
Parameters | |
---|---|
Name | Description |
filter |
str
Optional. An expression for filtering the results of the request. For field names both snake_case and camelCase are supported. |
order_by |
str
Optional. A comma-separated list of fields to order by, sorted in ascending order. Use "desc" after a field name for descending. Supported fields: |
project |
str
Optional. Project to retrieve list from. If not set, project set in aiplatform.init will be used. |
location |
str
Optional. Location to retrieve list from. If not set, location set in aiplatform.init will be used. |
credentials |
auth_credentials.Credentials
Optional. Custom credentials to use to retrieve list. Overrides credentials set in aiplatform.init. |
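The filter and order_by strings can be assembled with small helpers before calling list. This is a sketch only: the helper names are hypothetical, the filter form mirrors the example usage above, and the SDK import is deferred so the string helpers can be tried on their own.

```python
def label_filter(key, value):
    """Build a filter expression that matches a label key and value."""
    return f'labels.{key}="{value}"'


def order_by(field, descending=False):
    """Build an order_by clause; ascending is the default order."""
    return f"{field} desc" if descending else field


def list_image_datasets(filter_expr=None, order_expr=None):
    """List image datasets using the project and location from aiplatform.init."""
    # Imported lazily so the helpers above work without the SDK installed.
    from google.cloud import aiplatform

    return aiplatform.ImageDataset.list(filter=filter_expr, order_by=order_expr)


print(label_filter("my_key", "my_value"))
print(order_by("display_name", descending=True))
```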
to_dict
to_dict() -> typing.Dict[str, typing.Any]
Returns the resource proto as a dictionary.
update
update(
*,
display_name: typing.Optional[str] = None,
labels: typing.Optional[typing.Dict[str, str]] = None,
description: typing.Optional[str] = None,
update_request_timeout: typing.Optional[float] = None
) -> google.cloud.aiplatform.datasets.dataset._Dataset
Updates the dataset. Updatable fields:
- display_name
- description
- labels
Parameters | |
---|---|
Name | Description |
display_name |
str
Optional. The user-defined name of the Dataset. The name can be up to 128 characters long and can consist of any UTF-8 characters. |
labels |
Dict[str, str]
Optional. Labels with user-defined metadata to organize your datasets. Label keys and values can be no longer than 64 characters (Unicode codepoints) and can contain only lowercase letters, numeric characters, underscores, and dashes. International characters are allowed. No more than 64 user labels can be associated with one dataset (system labels are excluded). See https://goo.gl/xmQnxf for more information and examples of labels. System reserved label keys are prefixed with "aiplatform.googleapis.com/" and are immutable. |
description |
str
Optional. The description of the Dataset. |
update_request_timeout |
float
Optional. The timeout for the update request in seconds. |
Returns | |
---|---|
Type | Description |
dataset (Dataset) |
Updated dataset. |
wait
wait()
Helper method that blocks until all futures are complete.