Package google.cloud.aiplatform.v1.schema.predict.instance

TextEmbeddingPredictionInstance

Prediction input format for Text Embedding.

Fields
content

string

The main text content to embed.

title

string

Optional identifier of the text content.

task_type

TaskType

Optional downstream task the embeddings will be used for.

TaskType

Represents a downstream task the embeddings will be used for.

Enums
DEFAULT Unset value, which will default to one of the other enum values.
RETRIEVAL_QUERY Specifies the given text is a query in a search/retrieval setting.
RETRIEVAL_DOCUMENT Specifies the given text is a document from the corpus being searched.
SEMANTIC_SIMILARITY Specifies the given text will be used for semantic textual similarity (STS).
CLASSIFICATION Specifies that the given text will be classified.
CLUSTERING Specifies that the embeddings will be used for clustering.
QUESTION_ANSWERING Specifies that the embeddings will be used for question answering.
FACT_VERIFICATION Specifies that the embeddings will be used for fact verification.
CODE_RETRIEVAL_QUERY Specifies that the embeddings will be used for code retrieval.
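
As an illustration of the fields above, the following minimal sketch builds a dict shaped like a TextEmbeddingPredictionInstance. The helper name text_embedding_instance is hypothetical, and the snake_case keys simply follow this reference; a given client library may expect a different wire format.

```python
# Task types copied from the TaskType enum in this reference.
TASK_TYPES = {
    "DEFAULT", "RETRIEVAL_QUERY", "RETRIEVAL_DOCUMENT", "SEMANTIC_SIMILARITY",
    "CLASSIFICATION", "CLUSTERING", "QUESTION_ANSWERING", "FACT_VERIFICATION",
    "CODE_RETRIEVAL_QUERY",
}


def text_embedding_instance(content, title=None, task_type=None):
    """Hypothetical helper: build a TextEmbeddingPredictionInstance-shaped dict."""
    if task_type is not None and task_type not in TASK_TYPES:
        raise ValueError(f"unknown task_type: {task_type}")
    instance = {"content": content}
    # title and task_type are optional, so they are only set when given.
    if title is not None:
        instance["title"] = title
    if task_type is not None:
        instance["task_type"] = task_type
    return instance


query = text_embedding_instance("How do I rotate a key?", task_type="RETRIEVAL_QUERY")
```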

VideoGenerationModelInstance

Input format for the video generation model.

Fields
prompt

string

The text prompt for generating the videos.

image

Image

An image to use as the first frame of the generated video. If an input image is provided, an input video is not supported.

video

Video

An input video. If this field is provided, an input image is not supported. If a mask is provided along with the video, the video will be edited using the mask. Otherwise, the video will be extended by the given duration.

last_frame

Image

Image to use as the last frame of generated videos. An input image must also be provided.

camera_control

string

Camera motion to use in generated videos. An input image must also be provided. Valid values are:
- fixed
- pan_left
- pan_right
- tilt_up
- tilt_down
- truck_left
- truck_right
- pedestal_up
- pedestal_down
- push_in
- pull_out

mask

Mask

Mask to use in generated videos.

reference_images[]

ReferenceImage

The images to use as references when generating the videos. If this field is provided, the text prompt field must also be provided, and the image, video, and last_frame fields are not supported. Each image must be associated with a type. Veo 2 supports up to 3 asset images or 1 style image.
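
The field descriptions above imply several combination rules. The sketch below collects them in one client-side check; the function name check_video_generation_instance is hypothetical, and the check only reflects the rules stated in this reference, not the service's actual validation.

```python
# Valid camera_control values as listed in this reference.
CAMERA_CONTROLS = {
    "fixed", "pan_left", "pan_right", "tilt_up", "tilt_down", "truck_left",
    "truck_right", "pedestal_up", "pedestal_down", "push_in", "pull_out",
}


def check_video_generation_instance(instance):
    """Hypothetical helper: list rule violations in a VideoGenerationModelInstance dict."""

    def has(name):
        # Missing, None, and empty values all count as "not provided".
        return bool(instance.get(name))

    problems = []
    if has("image") and has("video"):
        problems.append("image and video cannot both be provided")
    if has("last_frame") and not has("image"):
        problems.append("last_frame requires image")
    if has("camera_control"):
        if not has("image"):
            problems.append("camera_control requires image")
        if instance["camera_control"] not in CAMERA_CONTROLS:
            problems.append("unknown camera_control value")
    if has("reference_images"):
        if not has("prompt"):
            problems.append("reference_images requires prompt")
        for name in ("image", "video", "last_frame"):
            if has(name):
                problems.append(f"{name} is not supported with reference_images")
    return problems
```

An empty list means the instance passes these checks; the service may still reject it for other reasons.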

Image

Image input format for the prediction.

Fields
mime_type

string

The MIME type of the image content. Only the following MIME types are supported:
- image/jpeg
- image/png

Union field data. The image data. data can be only one of the following:
bytes_base64_encoded

string

Base64 encoded bytes string representing the image.

gcs_uri

string

The Google Cloud Storage location of the image.
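
Because data is a union field, an Image carries either inline bytes or a Cloud Storage location, never both. A minimal sketch of a builder that enforces this follows; the name make_image is hypothetical, and requiring that one of the two is present is an assumption of this sketch rather than a documented rule.

```python
import base64

# MIME types listed as supported for Image in this reference.
IMAGE_MIME_TYPES = {"image/jpeg", "image/png"}


def make_image(mime_type, *, data=None, gcs_uri=None):
    """Hypothetical helper: build an Image-shaped dict with exactly one data member."""
    if mime_type not in IMAGE_MIME_TYPES:
        raise ValueError(f"unsupported MIME type: {mime_type}")
    # Assumption: an image with neither member set is treated as an error here.
    if (data is None) == (gcs_uri is None):
        raise ValueError("provide exactly one of data or gcs_uri")
    image = {"mime_type": mime_type}
    if data is not None:
        # Raw bytes are Base64 encoded into an ASCII string.
        image["bytes_base64_encoded"] = base64.b64encode(data).decode("ascii")
    else:
        image["gcs_uri"] = gcs_uri
    return image


inline = make_image("image/png", data=b"\x89PNG")
remote = make_image("image/jpeg", gcs_uri="gs://example-bucket/frame.jpg")
```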

Mask

Mask input format for the prediction.

Fields
mime_type

string

Valid values:
- image/png
- image/jpeg
- image/webp
- video/mov
- video/mpeg
- video/mp4
- video/mpg
- video/avi
- video/wmv
- video/mpegps
- video/flv

mask_mode

string

Describes how the mask will be used. Inpainting masks must match the aspect ratio of the input video. Outpainting masks can be either 9:16 or 16:9. Available options are:
- insert: The image mask contains a masked rectangular region which is applied on the first frame of the input video. The object described in the prompt is inserted into this region and will appear in subsequent frames.
- remove: The image mask is used to determine an object in the first video frame to track. This object is removed from the video.
- remove_static: The image mask is used to determine a region in the video. Objects in this region will be removed.
- outpaint: The image mask contains a masked rectangular region where the input video will go. The remaining area will be generated.

Video masks are not supported.

Union field data. The mask data. data can be only one of the following:
bytes_base64_encoded

string

Base64 encoded bytes string representing the mask.

gcs_uri

string

The Google Cloud Storage location of the mask.
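
The Mask constraints above can be checked before a request is sent. The sketch below is illustrative only: check_mask is a hypothetical name, and rejecting video MIME types is an interpretation of the note that video masks are not supported, even though the mime_type field lists video types among its valid values.

```python
# mask_mode options listed in this reference.
MASK_MODES = {"insert", "remove", "remove_static", "outpaint"}


def check_mask(mask):
    """Hypothetical helper: list rule violations in a Mask-shaped dict."""
    problems = []
    if mask.get("mask_mode") not in MASK_MODES:
        problems.append("unknown mask_mode")
    # Interpretation: the reference states that video masks are not supported.
    if str(mask.get("mime_type", "")).startswith("video/"):
        problems.append("video masks are not supported")
    # data is a union field, so at most one member may be set;
    # this sketch additionally assumes one of them is required.
    present = [k for k in ("bytes_base64_encoded", "gcs_uri") if mask.get(k)]
    if len(present) != 1:
        problems.append("exactly one of bytes_base64_encoded or gcs_uri is expected")
    return problems
```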

ReferenceImage

Reference image input format for the prediction. A ReferenceImage is an image that is used to provide additional context for the video generation.

Fields
image

Image

The image data to be used as the reference image.

reference_type

string

The type of the reference image, which defines how the reference image is used to generate the video. Supported types are:
- asset: The reference image provides assets for the generated video, such as the scene, an object, or a character.
- style: The aesthetics of the reference image, including colors, lighting, and texture, are used as the style of the generated video, such as 'anime', 'photography', or 'origami'.
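
The Veo 2 limit mentioned for reference_images (up to 3 asset images or 1 style image) can be expressed as a small check. This is a sketch under assumptions: check_reference_images is a hypothetical name, and treating a mix of asset and style images as invalid is one reading of "or" in that limit, not a documented rule.

```python
from collections import Counter

MAX_ASSET_IMAGES = 3  # Veo 2 limit stated in this reference
MAX_STYLE_IMAGES = 1  # Veo 2 limit stated in this reference


def check_reference_images(reference_images):
    """Hypothetical helper: list violations of the Veo 2 reference image limits."""
    counts = Counter(ref.get("reference_type") for ref in reference_images)
    problems = [
        f"unsupported reference_type: {kind!r}"
        for kind in counts
        if kind not in ("asset", "style")
    ]
    # Assumption: asset and style images are not combined in one request.
    if counts["asset"] and counts["style"]:
        problems.append("asset and style images are mixed")
    if counts["asset"] > MAX_ASSET_IMAGES:
        problems.append("more than 3 asset images")
    if counts["style"] > MAX_STYLE_IMAGES:
        problems.append("more than 1 style image")
    return problems
```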

Video

Video input format for the prediction.

Fields
mime_type

string

The MIME type of the video content. Only the following MIME types are supported:
- video/mov
- video/mpeg
- video/mp4
- video/mpg
- video/avi
- video/wmv
- video/mpegps
- video/flv

Union field data. The video data. data can be only one of the following:
gcs_uri

string

The Google Cloud Storage location of the video on which to perform the prediction.

bytes_base64_encoded

string

Base64 encoded bytes string representing the video.

VirtualTryOnModelInstance

Media generation input format for the Virtual Try On model.

Fields
prompt

string

The text prompt for generating the images. This is required for both editing and generation.

product_images[]

ProductImage

The images of the products to be worn by the person.

person_image

PersonImage

The image of the person to be edited with the product images.

Image

Input image and metadata.

Fields
mime_type

string

The MIME type of the image content. Only the following MIME types are supported:
- image/jpeg
- image/png

Union field data. The image bytes or Cloud Storage URI to make the prediction on. data can be only one of the following:
bytes_base64_encoded

string

Base64 encoded bytes string representing the image.

gcs_uri

string

The Cloud Storage URI of the image.

PersonImage

A PersonImage is used to provide the person image and its associated configuration options for Virtual Try On.

Fields
image

Image

The image bytes or Cloud Storage URI of the person or subject that will be edited using the product images.

ProductImage

A ProductImage is used to provide the product image and its associated configuration options for Virtual Try On.

Fields
image

Image

The image data of the product image.

mask_image

Image

The mask image associated with this product. If provided, the mask image will be used to guide the image editing.

product_image_config

ProductImageConfig

A config for the product image.

ProductImageConfig

Config for the product image.

Fields
mask_mode

MaskMode

Mode used to control the segmentation logic.

dilation

float

Dilation to be used with this Mask.

product_description

string

Description of the product.

MaskMode

Mode used to generate the mask if mask is not provided.

Enums
MASK_MODE_DEFAULT Default value for mask mode.
MASK_MODE_USER_PROVIDED User provided mask. No segmentation needed.
MASK_MODE_DETECTION_BOX Mask from detected bounding boxes.
MASK_MODE_CLOTHING_AREA Masks from segmenting the clothing area with open-vocab segmentation.
MASK_MODE_PARSED_PERSON Masks from segmenting the person body and clothing using the person-parsing model.
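
To show how the Virtual Try On messages nest, the sketch below assembles a VirtualTryOnModelInstance-shaped dict from Cloud Storage URIs. The helper names, the default MIME type, and the example bucket paths are assumptions made for illustration.

```python
# MaskMode values listed for ProductImageConfig in this reference.
TRY_ON_MASK_MODES = {
    "MASK_MODE_DEFAULT", "MASK_MODE_USER_PROVIDED", "MASK_MODE_DETECTION_BOX",
    "MASK_MODE_CLOTHING_AREA", "MASK_MODE_PARSED_PERSON",
}


def gcs_image(uri, mime_type="image/jpeg"):
    """Hypothetical helper: an Image-shaped dict that points at Cloud Storage."""
    return {"mime_type": mime_type, "gcs_uri": uri}


def virtual_try_on_instance(prompt, person_uri, product_uris,
                            mask_mode="MASK_MODE_DEFAULT"):
    """Hypothetical helper: build a VirtualTryOnModelInstance-shaped dict."""
    if mask_mode not in TRY_ON_MASK_MODES:
        raise ValueError(f"unknown mask_mode: {mask_mode}")
    return {
        "prompt": prompt,
        # PersonImage wraps a single Image.
        "person_image": {"image": gcs_image(person_uri)},
        # Each ProductImage wraps an Image plus its ProductImageConfig.
        "product_images": [
            {
                "image": gcs_image(uri),
                "product_image_config": {"mask_mode": mask_mode},
            }
            for uri in product_uris
        ],
    }


instance = virtual_try_on_instance(
    "person wearing the jacket",
    "gs://example-bucket/person.jpg",
    ["gs://example-bucket/jacket.jpg"],
    mask_mode="MASK_MODE_CLOTHING_AREA",
)
```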

VisionEmbeddingModelInstance

Media embedding input format for the large vision model embedding API.

Fields
image

Image

The image bytes or Cloud Storage URI to generate the image embedding.

text

string

The text for generating the text embedding.

video

Video

The video bytes or Cloud Storage URI to generate the video embedding.

Image

The image bytes or Cloud Storage URI to make the prediction on.

Fields
mime_type

string

The MIME type of the image content. Only the following MIME types are supported:
- image/jpeg
- image/png

Union field data.

data can be only one of the following:

bytes_base64_encoded

string

Base64 encoded bytes string representing the image.

gcs_uri

string

Video

The video bytes or Cloud Storage URI to make the prediction on.

Fields
video_segment_config

VideoSegmentConfig

Video configurations.

Union field data.

data can be only one of the following:

bytes_base64_encoded

string

Base64 encoded bytes string representing the video.

gcs_uri

string

VideoSegmentConfig

Video segment configurations.

Fields
start_offset_sec

int32

The start offset of the video segment in seconds.

end_offset_sec

int32

The end offset of the video segment in seconds.

interval_sec

int32

The interval of the video for which the embedding will be generated. The minimum value for interval_sec is 4. If the interval is less than 4, an InvalidArgumentError is returned. There is no limit on the maximum value of the interval. However, if the interval is larger than min(video length, 120s), the quality of the generated embeddings is affected.
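
The interval_sec rules above can be mirrored on the client side. In this sketch the function name video_segment_config and the video_length_sec parameter are hypothetical; the local ValueError only stands in for the InvalidArgumentError that the service is described as returning.

```python
import warnings

MIN_INTERVAL_SEC = 4        # minimum interval stated in this reference
QUALITY_CEILING_SEC = 120   # intervals above min(video length, 120s) affect quality


def video_segment_config(start_offset_sec, end_offset_sec, interval_sec,
                         video_length_sec):
    """Hypothetical helper: build a VideoSegmentConfig-shaped dict with basic checks."""
    if interval_sec < MIN_INTERVAL_SEC:
        raise ValueError("interval_sec must be at least 4")
    if interval_sec > min(video_length_sec, QUALITY_CEILING_SEC):
        # Allowed by the API, but the reference says embedding quality is affected.
        warnings.warn("interval_sec exceeds min(video length, 120s)")
    return {
        "start_offset_sec": start_offset_sec,
        "end_offset_sec": end_offset_sec,
        "interval_sec": interval_sec,
    }


config = video_segment_config(0, 60, 8, video_length_sec=60)
```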

VisionGenerativeModelInstance

Media generation input format for large vision model.

Fields
image

Image

The image bytes or Cloud Storage URI to make the prediction on. Required for editing; not needed for generation. The presence of this field determines whether the call is treated as editing or generation.

prompt

string

The text prompt for generating the images. This is required for both editing and generation.

mask

Mask

Optional field for editing images. The masked region is edited based on the text prompt provided. The mask can be either an image or a polygon list. It should not be provided without an image.

reference_images[]

ReferenceImage

The reference images to be used for editing and customization capabilities. Imagen 3 Capability adds support for multiple reference images, each of which can be a mask, control, style, or subject image. Depending on the reference type, the reference_config field will be populated with the corresponding config.
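
Since the image field decides whether a call is editing or generation, a client can classify an instance before sending it. The sketch below assumes nothing beyond the rules stated above; the function name request_kind is hypothetical.

```python
def request_kind(instance):
    """Hypothetical helper: classify a VisionGenerativeModelInstance-shaped dict."""
    # The prompt is required for both editing and generation.
    if not instance.get("prompt"):
        raise ValueError("prompt is required")
    has_image = instance.get("image") is not None
    # A mask only makes sense together with an image to edit.
    if instance.get("mask") is not None and not has_image:
        raise ValueError("mask should not be provided without an image")
    return "editing" if has_image else "generation"
```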

ControlImageConfig

Config for control image used for editing.

Fields
control_type

ControlType

Type of control image.

enable_control_image_computation

bool

Whether to compute the control image for the request.

superpixel_region_size

int32

Region size of the superpixel control image.

superpixel_ruler

float

Ruler of the superpixel control image.

ControlType

Type of control image.

Enums
CONTROL_TYPE_DEFAULT Default value for control image.
CONTROL_TYPE_CANNY Canny sketch control image.
CONTROL_TYPE_SCRIBBLE Scribble sketch control image using HED model.
CONTROL_TYPE_FACE_MESH Control mode for face mesh style editing.
CONTROL_TYPE_COLOR_SUPERPIXEL Color superpixel control image.

Image

Fields
mime_type

string

The MIME type of the image content. Only the following MIME types are supported:
- image/jpeg
- image/png

Union field data. The image bytes or Cloud Storage URI to make the prediction on. data can be only one of the following:
bytes_base64_encoded

string

Base64 encoded bytes string representing the image.

gcs_uri

string

Mask

Fields

Union field data.

data can be only one of the following:

image

Image

polygon_list

BoundingPolyList

BoundingPolyList

Fields
polygons[]

BoundingPoly

MaskImageConfig

Config for masked image editing using Imagen 3 Capability.

Fields
mask_mode

MaskMode

Mode used to generate the mask if mask is not provided.

dilation

float

Dilation to be used with this Mask. This value is used to dilate the mask before applying the edit mode.

mask_classes[]

int32

The segmentation classes which are used in the MASK_MODE_SEMANTIC mode.

MaskMode

Mode used to generate the mask if mask is not provided.

Enums
MASK_MODE_DEFAULT Default value for mask mode.
MASK_MODE_USER_PROVIDED User provided mask. No generation needed.
MASK_MODE_BACKGROUND Background mask. All elements detected as background will be masked.
MASK_MODE_FOREGROUND Foreground mask. All elements detected as foreground will be masked.
MASK_MODE_SEMANTIC Semantic mask. Objects identified as one of the classes defined in mask_classes will be masked.
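
A MaskImageConfig ties mask_classes to the MASK_MODE_SEMANTIC mode. The sketch below builds such a config; mask_image_config is a hypothetical helper, the class IDs in the usage line are placeholders, and rejecting mask_classes in other modes is an assumption of this sketch rather than documented behavior.

```python
# MaskMode values listed for MaskImageConfig in this reference.
EDIT_MASK_MODES = {
    "MASK_MODE_DEFAULT", "MASK_MODE_USER_PROVIDED", "MASK_MODE_BACKGROUND",
    "MASK_MODE_FOREGROUND", "MASK_MODE_SEMANTIC",
}


def mask_image_config(mask_mode, dilation=None, mask_classes=None):
    """Hypothetical helper: build a MaskImageConfig-shaped dict."""
    if mask_mode not in EDIT_MASK_MODES:
        raise ValueError(f"unknown mask_mode: {mask_mode}")
    # Assumption: mask_classes is only meaningful for semantic masks.
    if mask_classes and mask_mode != "MASK_MODE_SEMANTIC":
        raise ValueError("mask_classes only applies to MASK_MODE_SEMANTIC")
    config = {"mask_mode": mask_mode}
    if dilation is not None:
        config["dilation"] = float(dilation)
    if mask_classes:
        config["mask_classes"] = list(mask_classes)
    return config


# The class IDs 7 and 12 are placeholders, not documented values.
semantic = mask_image_config("MASK_MODE_SEMANTIC", dilation=0.01, mask_classes=[7, 12])
```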

ReferenceImage

A ReferenceImage is an image that is used to provide additional context for the image generation or editing.

Fields
reference_image

Image

The actual image data of the reference image.

reference_id

int32

The id of the reference image. This must be unique within the request.

reference_type

ReferenceType

The type of the reference image.

Union field reference_config. A config describing the reference image. reference_config can be only one of the following:
mask_image_config

MaskImageConfig

A config for a mask image.

control_image_config

ControlImageConfig

A config for a control image.

style_image_config

StyleImageConfig

A config for a style image.

subject_image_config

SubjectImageConfig

A config for a subject image.

ReferenceType

The type of the reference image.

Enums
REFERENCE_TYPE_DEFAULT Default value for the reference image type.
REFERENCE_TYPE_RAW A normal RGB image.
REFERENCE_TYPE_MASK A mask image.
REFERENCE_TYPE_CONTROL A control (line sketch) image.
REFERENCE_TYPE_STYLE A style image.
REFERENCE_TYPE_SUBJECT A subject image.
REFERENCE_TYPE_CONTENT A content image for R2I.
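
Two ReferenceImage rules lend themselves to a quick check: reference_id must be unique within the request, and reference_config holds at most one config, which should correspond to the reference type. The sketch below is illustrative; check_edit_reference_images is a hypothetical name, and the type-to-config pairing is inferred from the field names in this reference.

```python
# Pairing inferred from the names of the reference types and config fields.
CONFIG_FIELD_FOR_TYPE = {
    "REFERENCE_TYPE_MASK": "mask_image_config",
    "REFERENCE_TYPE_CONTROL": "control_image_config",
    "REFERENCE_TYPE_STYLE": "style_image_config",
    "REFERENCE_TYPE_SUBJECT": "subject_image_config",
}


def check_edit_reference_images(reference_images):
    """Hypothetical helper: list rule violations across ReferenceImage-shaped dicts."""
    problems = []
    seen_ids = set()
    for ref in reference_images:
        ref_id = ref.get("reference_id")
        if ref_id in seen_ids:
            problems.append(f"duplicate reference_id: {ref_id}")
        seen_ids.add(ref_id)
        # reference_config is a union: at most one config field may be set.
        present = sorted(f for f in CONFIG_FIELD_FOR_TYPE.values() if f in ref)
        if len(present) > 1:
            problems.append(f"reference {ref_id}: more than one reference_config set")
        expected = CONFIG_FIELD_FOR_TYPE.get(ref.get("reference_type"))
        for field in present:
            if field != expected:
                problems.append(
                    f"reference {ref_id}: {field} does not match {ref.get('reference_type')}"
                )
    return problems
```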

StyleImageConfig

Config for style image used for editing.

Fields
style_description

string

Description of the style image.

SubjectImageConfig

Config for subject image used for editing.

Fields
subject_description

string

Description of the subject image.

subject_type

SubjectType

Type of subject image.

SubjectType

Type of subject image.

Enums
SUBJECT_TYPE_DEFAULT Default value for subject image.
SUBJECT_TYPE_PERSON The subject of the image is a person.
SUBJECT_TYPE_ANIMAL The subject of the image is an animal.
SUBJECT_TYPE_PRODUCT The subject of the image is a product/object.

VisionReasoningModelInstance

Vision reasoning input format for large vision model. Model only supports one instance at a time.

Fields
prompt

string

The text prompt for guiding the response in QA.

mask

Image

Text responses will be generated from the masked area if mask is provided.

Union field content.

content can be only one of the following:

image

Image

The image bytes or Cloud Storage URI to make the prediction on.

video

Video

The video bytes or Cloud Storage URI to make the prediction on.
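
The content union means an instance carries an image or a video, but not both, and the model accepts one instance at a time. A minimal sketch of both constraints follows; the helper names are hypothetical, and the "instances" wrapper key is an assumption about the request body that this reference does not describe.

```python
def vision_reasoning_instance(prompt, *, image=None, video=None, mask=None):
    """Hypothetical helper: build a VisionReasoningModelInstance-shaped dict."""
    # content is a union field: image and video are mutually exclusive.
    if image is not None and video is not None:
        raise ValueError("content accepts only one of image or video")
    instance = {"prompt": prompt}
    if image is not None:
        instance["image"] = image
    if video is not None:
        instance["video"] = video
    if mask is not None:
        instance["mask"] = mask
    return instance


def single_instance_request(instance):
    """Wrap exactly one instance, since the model supports one instance at a time."""
    # Assumption: the request body lists instances under an "instances" key.
    return {"instances": [instance]}


request = single_instance_request(
    vision_reasoning_instance(
        "What is shown in this picture?",
        image={"gcs_uri": "gs://example-bucket/photo.png"},
    )
)
```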

Image

Fields
mime_type

string

Optional. The MIME type of the image content. Only the following MIME types are supported:
- image/jpeg
- image/png

Union field data. The image bytes or Cloud Storage URI to make the prediction on. data can be only one of the following:
bytes_base64_encoded

string

Base64 encoded bytes string representing the image.

gcs_uri

string

Cloud Storage URI representing the image in user project.

Video

Fields
Union field data. The video bytes or Cloud Storage URI to make the prediction on. data can be only one of the following:
bytes_base64_encoded

string

Base64 encoded bytes string representing the video.

gcs_uri

string