Package google.cloud.aiplatform.v1.schema.predict.instance

Index

TextEmbeddingPredictionInstance (message)
TextEmbeddingPredictionInstance.TaskType (enum)
VideoGenerationModelInstance (message)
VideoGenerationModelInstance.Image (message)
VideoGenerationModelInstance.Mask (message)
VideoGenerationModelInstance.ReferenceImage (message)
VideoGenerationModelInstance.Video (message)
VirtualTryOnModelInstance (message)
VirtualTryOnModelInstance.Image (message)
VirtualTryOnModelInstance.PersonImage (message)
VirtualTryOnModelInstance.ProductImage (message)
VirtualTryOnModelInstance.ProductImageConfig (message)
VirtualTryOnModelInstance.ProductImageConfig.MaskMode (enum)
VisionEmbeddingModelInstance (message)
VisionEmbeddingModelInstance.Image (message)
VisionEmbeddingModelInstance.Video (message)
VisionEmbeddingModelInstance.Video.VideoSegmentConfig (message)
VisionGenerativeModelInstance (message)
VisionGenerativeModelInstance.ControlImageConfig (message)
VisionGenerativeModelInstance.ControlImageConfig.ControlType (enum)
VisionGenerativeModelInstance.Image (message)
VisionGenerativeModelInstance.Mask (message)
VisionGenerativeModelInstance.Mask.BoundingPolyList (message)
VisionGenerativeModelInstance.MaskImageConfig (message)
VisionGenerativeModelInstance.MaskImageConfig.MaskMode (enum)
VisionGenerativeModelInstance.ReferenceImage (message)
VisionGenerativeModelInstance.ReferenceImage.ReferenceType (enum)
VisionGenerativeModelInstance.StyleImageConfig (message)
VisionGenerativeModelInstance.SubjectImageConfig (message)
VisionGenerativeModelInstance.SubjectImageConfig.SubjectType (enum)
VisionReasoningModelInstance (message)
VisionReasoningModelInstance.Image (message)
VisionReasoningModelInstance.Video (message)

TextEmbeddingPredictionInstance

Prediction input format for Text Embedding.

Fields
`content`	`string` The main text content to embed.
`title`	`string` Optional identifier of the text content.
`task_type`	`TaskType` Optional downstream task the embeddings will be used for.

TaskType

Represents a downstream task the embeddings will be used for.

Enums
`DEFAULT`	Unset value, which will default to one of the other enum values.
`RETRIEVAL_QUERY`	Specifies the given text is a query in a search/retrieval setting.
`RETRIEVAL_DOCUMENT`	Specifies the given text is a document from the corpus being searched.
`SEMANTIC_SIMILARITY`	Specifies the given text will be used for semantic textual similarity (STS).
`CLASSIFICATION`	Specifies that the given text will be classified.
`CLUSTERING`	Specifies that the embeddings will be used for clustering.
`QUESTION_ANSWERING`	Specifies that the embeddings will be used for question answering.
`FACT_VERIFICATION`	Specifies that the embeddings will be used for fact verification.
`CODE_RETRIEVAL_QUERY`	Specifies that the embeddings will be used for code retrieval.
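As a minimal sketch of how the three fields above fit together, the helper below builds a plain dict shaped like a `TextEmbeddingPredictionInstance`. The function name `text_embedding_instance` is hypothetical, and the keys simply mirror the proto field names in the table; the casing used on the wire may differ.

```python
from typing import Optional

# Enum value names copied from the TaskType table above.
TASK_TYPES = {
    "DEFAULT",
    "RETRIEVAL_QUERY",
    "RETRIEVAL_DOCUMENT",
    "SEMANTIC_SIMILARITY",
    "CLASSIFICATION",
    "CLUSTERING",
    "QUESTION_ANSWERING",
    "FACT_VERIFICATION",
    "CODE_RETRIEVAL_QUERY",
}


def text_embedding_instance(
    content: str,
    title: Optional[str] = None,
    task_type: Optional[str] = None,
) -> dict:
    """Build a dict shaped like TextEmbeddingPredictionInstance (hypothetical helper)."""
    if task_type is not None and task_type not in TASK_TYPES:
        raise ValueError(f"unknown task_type: {task_type!r}")
    instance = {"content": content}
    # Optional fields are left out entirely when they are not set.
    if title is not None:
        instance["title"] = title
    if task_type is not None:
        instance["task_type"] = task_type
    return instance
```

A query and the documents it is matched against would typically use `RETRIEVAL_QUERY` and `RETRIEVAL_DOCUMENT` respectively, following the enum descriptions above.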

VideoGenerationModelInstance

Video generation input format for video generation model.

Fields
`prompt`	`string` The text prompt for generating the videos.
`image`	`Image` An image to use as the first frame of the generated video. If an input image is provided, an input video is not supported.
`video`	`Video` An input video. If this field is provided, an input image is not supported. If a mask is provided along with the video, this video will be edited using the mask. Otherwise, this video will be extended by the given duration.
`last_frame`	`Image` Image to use as the last frame of generated videos. An input image must also be provided.
`camera_control`	`string` Camera motion to use in generated videos. An input image must also be provided. Valid values are: - fixed - pan_left - pan_right - tilt_up - tilt_down - truck_left - truck_right - pedestal_up - pedestal_down - push_in - pull_out
`mask`	`Mask` Mask to use in generated videos.
`reference_images[]`	`ReferenceImage` The images to use as references to generate the videos. If this field is provided, the text prompt field must also be provided, and the image, video, and last_frame fields are not supported. Each image must be associated with a type. Veo 2 supports up to 3 asset images or 1 style image.
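The field descriptions above imply several combination rules. The sketch below collects them into one client-side check; `check_video_generation_instance` is a hypothetical helper, the instance is assumed to be a plain dict keyed by the proto field names, and the service may enforce these rules differently.

```python
# Values listed for camera_control in the table above.
CAMERA_CONTROLS = {
    "fixed", "pan_left", "pan_right", "tilt_up", "tilt_down",
    "truck_left", "truck_right", "pedestal_up", "pedestal_down",
    "push_in", "pull_out",
}


def check_video_generation_instance(instance: dict) -> list:
    """Return human-readable rule violations; an empty list means none were found."""

    def present(key: str) -> bool:
        # Treat missing keys, None, empty strings and empty lists as "not set".
        return instance.get(key) not in (None, "", [])

    problems = []
    if present("image") and present("video"):
        problems.append("image and video cannot both be set")
    if present("last_frame") and not present("image"):
        problems.append("last_frame requires image")
    if present("camera_control"):
        if not present("image"):
            problems.append("camera_control requires image")
        if instance["camera_control"] not in CAMERA_CONTROLS:
            problems.append("camera_control has an unlisted value")
    if present("reference_images"):
        if not present("prompt"):
            problems.append("reference_images requires prompt")
        for key in ("image", "video", "last_frame"):
            if present(key):
                problems.append(f"reference_images cannot be combined with {key}")
    return problems
```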

Image

Image input format for the prediction.

Fields
`mime_type`	`string` The MIME type of the content of the image. Only the following MIME types are supported: - image/jpeg - image/png
Union field `data`. The image data. `data` can be only one of the following:
`bytes_base64_encoded`	`string` Base64 encoded bytes string representing the image.
`gcs_uri`	`string` The Google Cloud Storage location of the image.
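To illustrate the `data` union, the sketch below builds an `Image` dict from either raw bytes or a Cloud Storage URI. `image_message` is a hypothetical helper. Requiring exactly one of the two is an assumption: the table only states that at most one member of the union can be set.

```python
import base64
from typing import Optional

# MIME types listed in the table above.
SUPPORTED_IMAGE_MIME_TYPES = {"image/jpeg", "image/png"}


def image_message(
    mime_type: str,
    raw_bytes: Optional[bytes] = None,
    gcs_uri: Optional[str] = None,
) -> dict:
    """Build a dict shaped like the Image message (hypothetical helper)."""
    if mime_type not in SUPPORTED_IMAGE_MIME_TYPES:
        raise ValueError(f"unsupported mime_type: {mime_type!r}")
    # Assumption: a usable request sets exactly one member of the data union.
    if (raw_bytes is None) == (gcs_uri is None):
        raise ValueError("set exactly one of raw_bytes or gcs_uri")
    message = {"mime_type": mime_type}
    if raw_bytes is not None:
        # The field carries a base64 string, not raw bytes.
        message["bytes_base64_encoded"] = base64.b64encode(raw_bytes).decode("ascii")
    else:
        message["gcs_uri"] = gcs_uri
    return message
```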

Mask

Mask input format for the prediction.

Fields
`mime_type`	`string` Valid values: - image/png - image/jpeg - image/webp - video/mov - video/mpeg - video/mp4 - video/mpg - video/avi - video/wmv - video/mpegps - video/flv
`mask_mode`	`string` Describes how the mask will be used. Inpainting masks must match the aspect ratio of the input video. Outpainting masks can be either 9:16 or 16:9. Available options are: - insert: The image mask contains a masked rectangular region which is applied on the first frame of the input video. The object described in the prompt is inserted into this region and will appear in subsequent frames. - remove: The image mask is used to determine an object in the first video frame to track. This object is removed from the video. - remove_static: The image mask is used to determine a region in the video. Objects in this region will be removed. - outpaint: The image mask contains a masked rectangular region where the input video will go. The remaining area will be generated. Video masks are not supported.
Union field `data`. The mask data. `data` can be only one of the following:
`bytes_base64_encoded`	`string` Base64 encoded bytes string representing the mask.
`gcs_uri`	`string` The Google Cloud Storage location of the mask.
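The aspect-ratio rules for `mask_mode` can be checked with integer arithmetic before a request is sent. This is a rough sketch: `mask_aspect_ok` is a hypothetical helper, sizes are assumed to be `(width, height)` pixel pairs, the comparison is exact (the service may tolerate rounding), and treating `insert`, `remove`, and `remove_static` as the inpainting modes is an assumption.

```python
# Outpainting masks can be either 9:16 or 16:9 (width:height).
OUTPAINT_RATIOS = ((9, 16), (16, 9))
# Assumption: "inpainting" covers the three modes that edit inside the frame.
INPAINT_MODES = {"insert", "remove", "remove_static"}


def mask_aspect_ok(mask_mode: str, mask_size: tuple, video_size: tuple) -> bool:
    """Return True if the mask's aspect ratio fits the rule for mask_mode."""
    mask_w, mask_h = mask_size
    if mask_mode == "outpaint":
        # Cross-multiply to compare ratios without floating-point error.
        return any(mask_w * rh == mask_h * rw for rw, rh in OUTPAINT_RATIOS)
    if mask_mode in INPAINT_MODES:
        video_w, video_h = video_size
        return mask_w * video_h == mask_h * video_w
    raise ValueError(f"unknown mask_mode: {mask_mode!r}")
```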

ReferenceImage

Reference image input format for the prediction. A ReferenceImage is an image that is used to provide additional context for the video generation.

Fields
`image`	`Image` The image data to be used as the reference image.
`reference_type`	`string` The type of the reference image, which defines how the reference image will be used to generate the video. Supported types are: - asset: The reference image provides assets to the generated video, such as the scene, an object, a character, etc. - style: The aesthetics of the reference image, including colors, lighting, texture, etc., are used as the style of the generated video, such as 'anime', 'photography', 'origami', etc.
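A small check for the reference image limits might look like the sketch below. `check_reference_images` is a hypothetical helper, and it reads "up to 3 asset images or 1 style image" (stated for Veo 2 in the `reference_images[]` description) as an either-or rule that disallows mixing the two types, which is an interpretation rather than something the source spells out.

```python
from collections import Counter

SUPPORTED_REFERENCE_TYPES = ("asset", "style")


def check_reference_images(reference_images: list) -> list:
    """Return rule violations for a list of ReferenceImage-shaped dicts."""
    counts = Counter(ref.get("reference_type") for ref in reference_images)
    problems = []
    for ref_type in counts:
        if ref_type not in SUPPORTED_REFERENCE_TYPES:
            problems.append(f"unsupported reference_type: {ref_type!r}")
    if counts["asset"] > 3:
        problems.append("more than 3 asset images")
    if counts["style"] > 1:
        problems.append("more than 1 style image")
    # Assumption: asset and style references are not combined in one request.
    if counts["asset"] and counts["style"]:
        problems.append("asset and style images are mixed")
    return problems
```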

Video

Video input format for the prediction.

Fields
`mime_type`	`string` The MIME type of the content of the video. Only the following MIME types are supported: - video/mov - video/mpeg - video/mp4 - video/mpg - video/avi - video/wmv - video/mpegps - video/flv
Union field `data`. The video data. `data` can be only one of the following:
`gcs_uri`	`string` The Google Cloud Storage location of the video on which to perform the prediction.
`bytes_base64_encoded`	`string` Base64 encoded bytes string representing the video.

VirtualTryOnModelInstance

Media generation input format for the Virtual Try On model.

Fields
`prompt`	`string` The text prompt for generating the images. This is required for both editing and generation.
`product_images[]`	`ProductImage` The image of the products to wear on the person.
`person_image`	`PersonImage` The image of the person to be edited with the product images.

Image

Input image and metadata.

Fields
`mime_type`	`string` The MIME type of the content of the image. Only the following MIME types are supported: - image/jpeg - image/png
Union field `data`. The image bytes or Cloud Storage URI to make the prediction on. `data` can be only one of the following:
`bytes_base64_encoded`	`string` Base64 encoded bytes string representing the image.
`gcs_uri`	`string` The Cloud Storage URI of the image.

PersonImage

A PersonImage is used to provide the person image and its associated configuration options for Virtual Try On.

Fields
`image`	`Image` The image bytes or Cloud Storage URI of the person or subject that will be edited using the product images.

ProductImage

A ProductImage is used to provide the product image and its associated configuration options for Virtual Try On.

Fields
`image`	`Image` The actual image data of the reference image.
`mask_image`	`Image` The mask image associated with this product. If provided, the mask image will be used to guide the image editing.
`product_image_config`	`ProductImageConfig` A config for the product image.

ProductImageConfig

Config for the product image.

Fields
`mask_mode`	`MaskMode` Mode used to control the segmentation logic.
`dilation`	`float` Dilation to be used with this Mask.
`product_description`	`string` Description of the product.

MaskMode

Mode used to generate the mask if mask is not provided.

Enums
`MASK_MODE_DEFAULT`	Default value for mask mode.
`MASK_MODE_USER_PROVIDED`	User provided mask. No segmentation needed.
`MASK_MODE_DETECTION_BOX`	Mask from detected bounding boxes.
`MASK_MODE_CLOTHING_AREA`	Masks from segmenting the clothing area with open-vocab segmentation.
`MASK_MODE_PARSED_PERSON`	Masks from segmenting the person body and clothing using the person-parsing model.
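The nesting of `VirtualTryOnModelInstance`, `PersonImage`, `ProductImage`, and `ProductImageConfig` is easier to see assembled. The sketch below builds the structure as plain dicts; `product_image` and `try_on_instance` are hypothetical helpers, images are passed by Cloud Storage URI only for brevity, and the default MIME types are arbitrary choices.

```python
from typing import Optional

# Enum value names copied from the MaskMode table above.
PRODUCT_MASK_MODES = {
    "MASK_MODE_DEFAULT",
    "MASK_MODE_USER_PROVIDED",
    "MASK_MODE_DETECTION_BOX",
    "MASK_MODE_CLOTHING_AREA",
    "MASK_MODE_PARSED_PERSON",
}


def product_image(
    gcs_uri: str,
    mime_type: str = "image/png",
    mask_mode: Optional[str] = None,
    dilation: Optional[float] = None,
    product_description: Optional[str] = None,
) -> dict:
    """Build a ProductImage-shaped dict, adding a config only when needed."""
    message = {"image": {"mime_type": mime_type, "gcs_uri": gcs_uri}}
    config = {}
    if mask_mode is not None:
        if mask_mode not in PRODUCT_MASK_MODES:
            raise ValueError(f"unknown mask_mode: {mask_mode!r}")
        config["mask_mode"] = mask_mode
    if dilation is not None:
        config["dilation"] = float(dilation)
    if product_description is not None:
        config["product_description"] = product_description
    if config:
        message["product_image_config"] = config
    return message


def try_on_instance(prompt: str, person_gcs_uri: str, products: list,
                    person_mime_type: str = "image/jpeg") -> dict:
    """Build a VirtualTryOnModelInstance-shaped dict."""
    return {
        "prompt": prompt,
        "person_image": {
            "image": {"mime_type": person_mime_type, "gcs_uri": person_gcs_uri}
        },
        "product_images": list(products),
    }
```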

VisionEmbeddingModelInstance

Media embedding input format for the large vision model embedding API.

Fields
`image`	`Image` The image bytes or Cloud Storage URI to generate the image embedding.
`text`	`string` The text for generating the text embedding.
`video`	`Video` The video bytes or Cloud Storage URI to generate the video embedding.

Image

The image bytes or Cloud Storage URI to make the prediction on.

Fields
`mime_type`	`string` The MIME type of the content of the image. Only the following MIME types are supported: - image/jpeg - image/png
Union field `data`. `data` can be only one of the following:
`bytes_base64_encoded`	`string` Base64 encoded bytes string representing the image.
`gcs_uri`	`string`

Video

The video bytes or Cloud Storage URI to make the prediction on.

Fields
`video_segment_config`	`VideoSegmentConfig` Video configurations.
Union field `data`. `data` can be only one of the following:
`bytes_base64_encoded`	`string` Base64 encoded bytes string representing the video.
`gcs_uri`	`string`

VideoSegmentConfig

Video segment configurations.

Fields
`start_offset_sec`	`int32` The start offset of the video segment in seconds.
`end_offset_sec`	`int32` The end offset of the video segment in seconds.
`interval_sec`	`int32` The interval of the video for which the embedding will be generated. The minimum value for interval_sec is 4. If the interval is less than 4, an InvalidArgumentError is returned. There is no limit on the maximum value of the interval. However, if the interval is larger than min(video length, 120s), it will affect the quality of the generated embeddings.
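The `interval_sec` constraints can be mirrored on the client before a request is sent. In this sketch, `video_segment_config` is a hypothetical helper and `video_length_sec` is a value the caller is assumed to know already; it is not a field of the message. The helper raises for intervals below the documented minimum and returns a note, rather than failing, when the interval exceeds min(video length, 120s).

```python
MIN_INTERVAL_SEC = 4      # documented minimum for interval_sec
QUALITY_CAP_SEC = 120     # documented cap used in min(video length, 120s)


def video_segment_config(start_offset_sec: int, end_offset_sec: int,
                         interval_sec: int, video_length_sec: int) -> tuple:
    """Return (config dict, list of notes) for a VideoSegmentConfig."""
    if interval_sec < MIN_INTERVAL_SEC:
        # The service is documented to return InvalidArgumentError here.
        raise ValueError(f"interval_sec must be at least {MIN_INTERVAL_SEC}")
    notes = []
    if interval_sec > min(video_length_sec, QUALITY_CAP_SEC):
        notes.append("interval_sec exceeds min(video length, 120s); "
                     "embedding quality may suffer")
    config = {
        "start_offset_sec": start_offset_sec,
        "end_offset_sec": end_offset_sec,
        "interval_sec": interval_sec,
    }
    return config, notes
```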

VisionGenerativeModelInstance

Media generation input format for large vision model.

Fields
`image`	`Image` The image bytes or Cloud Storage URI to make the prediction on. It is required for editing. Not needed for generation. This field will be used to determine whether the call is editing or generation.
`prompt`	`string` The text prompt for generating the images. This is required for both editing and generation.
`mask`	`Mask` The masked area will be edited based on the text content provided. The mask can be either an image or a polygon. It should not be provided without an image. Optional field for editing the images.
`reference_images[]`	`ReferenceImage` The reference images to be used for editing and customization capabilities. Imagen 3 Capability adds support for multiple reference images, each of which can be a mask, control, style, or subject image. Depending on the reference type, the reference_config field will be populated with the corresponding config.
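Since the presence of `image` determines whether a call is editing or generation, that decision can be sketched as a small function. `request_kind` is a hypothetical helper operating on a plain dict keyed by the proto field names; the actual service logic is not shown in this reference.

```python
def request_kind(instance: dict) -> str:
    """Classify a VisionGenerativeModelInstance-shaped dict as editing or generation."""
    if not instance.get("prompt"):
        # The prompt is required for both editing and generation.
        raise ValueError("prompt is required")
    has_image = instance.get("image") is not None
    if instance.get("mask") is not None and not has_image:
        raise ValueError("mask should not be provided without an image")
    return "editing" if has_image else "generation"
```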

ControlImageConfig

Config for control image used for editing.

Fields
`control_type`	`ControlType` Type of control image.
`enable_control_image_computation`	`bool` Whether to compute the control image for the request.
`superpixel_region_size`	`int32` Region size of the superpixel control image.
`superpixel_ruler`	`float` Ruler of the superpixel control image.

ControlType

Type of control image.

Enums
`CONTROL_TYPE_DEFAULT`	Default value for control image.
`CONTROL_TYPE_CANNY`	Canny sketch control image.
`CONTROL_TYPE_SCRIBBLE`	Scribble sketch control image using HED model.
`CONTROL_TYPE_FACE_MESH`	Control mode for using Face mesh style editing.
`CONTROL_TYPE_COLOR_SUPERPIXEL`	Color superpixel control image.

Image

Fields
`mime_type`	`string` The MIME type of the content of the image. Only the following MIME types are supported: - image/jpeg - image/png
Union field `data`. The image bytes or Cloud Storage URI to make the prediction on. `data` can be only one of the following:
`bytes_base64_encoded`	`string` Base64 encoded bytes string representing the image.
`gcs_uri`	`string`

Mask

Fields
Union field `data`. `data` can be only one of the following:
`image`	`Image`
`polygon_list`	`BoundingPolyList`

BoundingPolyList

Fields
`polygons[]`	`BoundingPoly`

MaskImageConfig

Config for masked image editing using Imagen 3 Capability.

Fields
`mask_mode`	`MaskMode` Mode used to generate the mask if mask is not provided.
`dilation`	`float` Dilation to be used with this Mask. This value is used to dilate the mask before applying the edit mode.
`mask_classes[]`	`int32` The segmentation classes which are used in the MASK_MODE_SEMANTIC mode.

MaskMode

Mode used to generate the mask if mask is not provided.

Enums
`MASK_MODE_DEFAULT`	Default value for mask mode.
`MASK_MODE_USER_PROVIDED`	User provided mask. No generation needed.
`MASK_MODE_BACKGROUND`	Background mask. All elements detected as background will be masked.
`MASK_MODE_FOREGROUND`	Foreground mask. All elements detected as foreground will be masked.
`MASK_MODE_SEMANTIC`	Semantic mask. Objects identified as one of the classes defined in mask_classes will be masked.
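A `MaskImageConfig` could be assembled as in the sketch below. `mask_image_config` is a hypothetical helper. Rejecting `mask_classes` outside `MASK_MODE_SEMANTIC` is stricter than the reference states (it only says the classes are used in that mode), so treat that check as an assumption. The class IDs in the usage line are placeholders, not documented values.

```python
from typing import Optional, Sequence

# Enum value names copied from the MaskMode table above.
MASK_MODES = {
    "MASK_MODE_DEFAULT",
    "MASK_MODE_USER_PROVIDED",
    "MASK_MODE_BACKGROUND",
    "MASK_MODE_FOREGROUND",
    "MASK_MODE_SEMANTIC",
}


def mask_image_config(
    mask_mode: str,
    dilation: Optional[float] = None,
    mask_classes: Optional[Sequence[int]] = None,
) -> dict:
    """Build a MaskImageConfig-shaped dict (hypothetical helper)."""
    if mask_mode not in MASK_MODES:
        raise ValueError(f"unknown mask_mode: {mask_mode!r}")
    # Assumption: classes are meaningless outside the semantic mode.
    if mask_classes and mask_mode != "MASK_MODE_SEMANTIC":
        raise ValueError("mask_classes only applies to MASK_MODE_SEMANTIC")
    config = {"mask_mode": mask_mode}
    if dilation is not None:
        config["dilation"] = float(dilation)
    if mask_classes:
        config["mask_classes"] = [int(c) for c in mask_classes]
    return config


# Placeholder class IDs, for illustration only.
semantic = mask_image_config("MASK_MODE_SEMANTIC", dilation=0.01, mask_classes=[1, 2])
```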

ReferenceImage

A ReferenceImage is an image that is used to provide additional context for the image generation or editing.

Fields
`reference_image`	`Image` The actual image data of the reference image.
`reference_id`	`int32` The id of the reference image. This must be unique within the request.
`reference_type`	`ReferenceType` The type of the reference image.
Union field `reference_config`. A config describing the reference image. `reference_config` can be only one of the following:
`mask_image_config`	`MaskImageConfig` A config for a mask image.
`control_image_config`	`ControlImageConfig` A config for a control image.
`style_image_config`	`StyleImageConfig` A config for a style image.
`subject_image_config`	`SubjectImageConfig` A config for a subject image.

ReferenceType

The type of the reference image.

Enums
`REFERENCE_TYPE_DEFAULT`	Default value for reference in image.
`REFERENCE_TYPE_RAW`	A normal RGB image.
`REFERENCE_TYPE_MASK`	A mask image.
`REFERENCE_TYPE_CONTROL`	A control (line sketch) image.
`REFERENCE_TYPE_STYLE`	A style image.
`REFERENCE_TYPE_SUBJECT`	A subject image.
`REFERENCE_TYPE_CONTENT`	A content image for R2I.
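The relationship between `reference_type` and the `reference_config` union can be sketched as a lookup table. The pairing below is inferred from the field and enum names and is not stated explicitly in this reference; `reference_image` and `duplicate_reference_ids` are hypothetical helpers.

```python
from typing import Optional

# Assumed pairing of reference types with reference_config union members.
CONFIG_FIELD_BY_TYPE = {
    "REFERENCE_TYPE_MASK": "mask_image_config",
    "REFERENCE_TYPE_CONTROL": "control_image_config",
    "REFERENCE_TYPE_STYLE": "style_image_config",
    "REFERENCE_TYPE_SUBJECT": "subject_image_config",
}


def reference_image(reference_id: int, reference_type: str, image: dict,
                    config: Optional[dict] = None) -> dict:
    """Build a ReferenceImage-shaped dict with at most one config set."""
    message = {
        "reference_image": image,
        "reference_id": reference_id,
        "reference_type": reference_type,
    }
    if config is not None:
        field = CONFIG_FIELD_BY_TYPE.get(reference_type)
        if field is None:
            raise ValueError(f"{reference_type} has no matching config field")
        message[field] = config
    return message


def duplicate_reference_ids(reference_images: list) -> list:
    """Return the reference_id values that occur more than once, sorted."""
    seen, dupes = set(), set()
    for ref in reference_images:
        rid = ref["reference_id"]
        if rid in seen:
            dupes.add(rid)
        seen.add(rid)
    return sorted(dupes)
```

Because `reference_id` must be unique within the request, a caller might run `duplicate_reference_ids` over the full `reference_images[]` list and refuse to send the request if the result is non-empty.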

StyleImageConfig

Config for style image used for editing.

Fields
`style_description`	`string` Description of the style image.

SubjectImageConfig

Config for subject image used for editing.

Fields
`subject_description`	`string` Description of the subject image.
`subject_type`	`SubjectType` Type of subject image.

SubjectType

Type of subject image.

Enums
`SUBJECT_TYPE_DEFAULT`	Default value for subject image.
`SUBJECT_TYPE_PERSON`	The subject of the image is a person.
`SUBJECT_TYPE_ANIMAL`	The subject of the image is an animal.
`SUBJECT_TYPE_PRODUCT`	The subject of the image is a product/object.

VisionReasoningModelInstance

Vision reasoning input format for large vision model. Model only supports one instance at a time.

Fields
`prompt`	`string` The text prompt for guiding the response in QA.
`mask`	`Image` Text responses will be generated from the masked area if mask is provided.
Union field `content`. `content` can be only one of the following:
`image`	`Image` The image bytes or Cloud Storage URI to make the prediction on.
`video`	`Video` The video bytes or Cloud Storage URI to make the prediction on.
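The `content` union and the one-instance limit can be illustrated as follows. `reasoning_instance` and `check_single_instance` are hypothetical helpers; requiring exactly one of `image` or `video` is an assumption, since the union itself only rules out setting both.

```python
from typing import Optional


def reasoning_instance(prompt: str, image: Optional[dict] = None,
                       video: Optional[dict] = None,
                       mask: Optional[dict] = None) -> dict:
    """Build a VisionReasoningModelInstance-shaped dict."""
    # Assumption: a request needs exactly one member of the content union.
    if (image is None) == (video is None):
        raise ValueError("set exactly one of image or video")
    instance = {"prompt": prompt}
    if image is not None:
        instance["image"] = image
    else:
        instance["video"] = video
    if mask is not None:
        instance["mask"] = mask
    return instance


def check_single_instance(instances: list) -> None:
    """Raise if the batch does not hold exactly one instance."""
    if len(instances) != 1:
        raise ValueError("the model only supports one instance at a time")
```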

Image

Fields
`mime_type`	`string` Optional. The MIME type of the content of the image. Only the following MIME types are supported: - image/jpeg - image/png
Union field `data`. The image bytes or Cloud Storage URI to make the prediction on. `data` can be only one of the following:
`bytes_base64_encoded`	`string` Base64 encoded bytes string representing the image.
`gcs_uri`	`string` Cloud Storage URI representing the image in user project.

Video

Fields
Union field `data`. The video bytes or Cloud Storage URI to make the prediction on. `data` can be only one of the following:
`bytes_base64_encoded`	`string` Base64 encoded bytes string representing the video.
`gcs_uri`	`string`
