This guide shows you how to create and use multimodal datasets in Vertex AI for generative AI tasks, such as model tuning and batch prediction.
With multimodal datasets on Vertex AI, you can create, manage, share, and use multimodal datasets for generative AI. You can work with multimodal datasets through the Vertex AI SDK for Python or the REST API.

Multimodal datasets are a type of managed dataset on Vertex AI. They differ from other managed datasets in that they support the assemble and assess operations for previewing data, validating requests, and estimating costs.

Before you begin

In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

Verify that billing is enabled for your Google Cloud project.

Enable the Vertex AI, BigQuery, and Cloud Storage APIs.

In the Vertex AI SDK for Python, import the multimodal datasets module:
from google.cloud.aiplatform.preview import datasets
# Depending on which features you use, you may also need the following imports:
from vertexai.preview.tuning import sft
from vertexai.batch_prediction import BatchPredictionJob
from vertexai.generative_models import Content, Part, Tool, ToolConfig, SafetySetting, GenerationConfig, FunctionDeclaration
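The SDK also needs to know which Google Cloud project and region to use. A minimal initialization sketch, assuming placeholder values for the project ID and region:

import vertexai

# Initialize the Vertex AI SDK with your project and region (placeholder values).
vertexai.init(project="PROJECT_ID", location="us-central1")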
Create a dataset
You can create a multimodal dataset from different sources:
Source: Pandas or BigQuery DataFrame
Description: Create a dataset from an in-memory DataFrame object.
Use case: Ideal for interactive development in notebooks where data is already loaded or manipulated as a DataFrame.

Source: BigQuery table
Description: Create a dataset directly from an existing table in BigQuery using its URI.
Use case: Best for large, persistent datasets already stored in BigQuery, avoiding data movement.

Source: JSONL file in Cloud Storage
Description: Create a dataset from a JSONL file where each line is a pre-formatted request for Gemini.
Use case: Useful when your data is already structured in the final Gemini request format and stored in Cloud Storage.

Source: Existing multimodal dataset
Description: Load a previously created multimodal dataset by its name or by listing available datasets.
Use case: For reusing or continuing work on an existing dataset within Vertex AI.
Vertex AI SDK for Python

From a Pandas DataFrame
my_dataset = datasets.MultimodalDataset.from_pandas(
dataframe=my_dataframe,
target_table_id=table_id # optional
)
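For context, here is a sketch of what my_dataframe might contain for the flower-classification example used throughout this page. The column names (image_uris, labels) are assumptions and must match the placeholders, or the field_mapping, in your template:

import pandas as pd

# Hypothetical example rows; the URI is the sample image used elsewhere on this page.
my_dataframe = pd.DataFrame({
    "image_uris": ["gs://cloud-samples-data/ai-platform/flowers/daisy/1396526833_fb867165be_n.jpg"],
    "labels": ["daisy"],
})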
From a BigQuery DataFrame

my_dataset = datasets.MultimodalDataset.from_bigframes(
dataframe=my_dataframe,
target_table_id=table_id # optional
)
From a BigQuery table

my_dataset_from_bigquery = datasets.MultimodalDataset.from_bigquery(
    bigquery_uri="bq://projectId.datasetId.tableId"
)
From a JSONL file in Cloud Storage

In the following example, the JSONL file contains requests that are already formatted for Gemini, so no assembly is required.

my_dataset = datasets.MultimodalDataset.from_gemini_request_jsonl(
    gcs_uri=gcs_uri_of_jsonl_file,
)
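For reference, a single line of such a JSONL file might look like the following sketch, which mirrors the assembled request body shown later on this page. The prompt text is made up for illustration; verify the exact expected schema against the current reference documentation:

{"contents": [{"role": "user", "parts": [{"text": "What flower is this?"}, {"fileData": {"fileUri": "gs://cloud-samples-data/ai-platform/flowers/daisy/1396526833_fb867165be_n.jpg", "mimeType": "image/jpeg"}}]}]}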
From an existing multimodal dataset

# Get the most recently created dataset
first_dataset = datasets.MultimodalDataset.list()[0]
# Load dataset based on its name
same_dataset = datasets.MultimodalDataset(first_dataset.name)
REST

From a BigQuery table
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
"https://LOCATION-aiplatform.googleapis.com/v1beta1/projects/PROJECT/locations/LOCATION/datasets" \
-d '{
"display_name": "TestDataset",
"metadataSchemaUri": "gs://google-cloud-aiplatform/schema/dataset/metadata/multimodal_1.0.0.yaml",
"metadata": {
"inputConfig": {
"bigquery_source": {
"uri": "bq://projectId.datasetId.tableId"
}
}
}
}'
Construct and attach a template

A template defines how to transform your multimodal dataset into a format that a generative model can use. You must attach a template to your dataset before you can run a tuning or batch prediction job.

1. Construct a template

There are two ways to construct a template:
Method: construct_single_turn_template
Description: A helper method to create a template for simple, single-turn (prompt-response) conversations.
Pros: Simple and concise for basic use cases.
Cons: Limited to single-turn conversations; less flexible.

Method: Manual construction with GeminiExample
Description: Manually define the conversation structure, including multi-turn interactions and system instructions, using Content and Part objects.
Pros: Highly flexible; supports complex, multi-turn conversations and fine-grained control over the prompt structure.
Cons: More verbose and complex to set up.
Vertex AI SDK for Python

Use the construct_single_turn_template helper method:

template_config = datasets.construct_single_turn_template(
prompt="This is the image: {image_uris}",
response="{labels}",
system_instruction='You are a botanical image classifier. Analyze the provided image '
'and determine the most accurate classification of the flower.'
'These are the only flower categories: [\'daisy\', \'dandelion\', \'roses\', \'sunflowers\', \'tulips\'].'
'Return only one category per image.'
)
Manually construct a template from a GeminiExample: This allows for finer granularity, such as multi-turn conversations. The following code sample also includes optional commented code for specifying a field_mapping, which lets you use a placeholder name that is different from the column name of the dataset.

# Define a GeminiExample
gemini_example = datasets.GeminiExample(
contents=[
Content(role="user", parts=[Part.from_text("This is the image: {image_uris}")]),
Content(role="model", parts=[Part.from_text("This is the flower class: {label}.")]),
Content(role="user", parts=[Part.from_text("Your response should only contain the class label.")]),
Content(role="model", parts=[Part.from_text("{label}")]),
# Optional: If you specify a field_mapping, you can use different placeholder values. For example:
# Content(role="user", parts=[Part.from_text("This is the image: {uri_placeholder}")]),
# Content(role="model", parts=[Part.from_text("This is the flower class: {flower_placeholder}.")]),
# Content(role="user", parts=[Part.from_text("Your response should only contain the class label.")]),
# Content(role="model", parts=[Part.from_text("{flower_placeholder}")]),
],
system_instruction=Content(
parts=[
Part.from_text(
'You are a botanical image classifier. Analyze the provided image '
'and determine the most accurate classification of the flower.'
'These are the only flower categories: [\'daisy\', \'dandelion\', \'roses\', \'sunflowers\', \'tulips\'].'
'Return only one category per image.'
)
]
),
)
# construct the template, specifying a map for the placeholder
template_config = datasets.GeminiTemplateConfig(
gemini_example=gemini_example,
# Optional: Map the template placeholders to the column names of your dataset.
    # Not required if the template placeholders are column names of the dataset.
# field_mapping={"uri_placeholder": "image_uris", "flower_placeholder": "labels"},
)
2. Attach the template to the dataset

Call the attach_template_config method:

my_dataset.attach_template_config(template_config=template_config)
REST

Call the patch method and update the metadata field with the following:

The bigquery_uri. For datasets created from other sources, such as JSONL or DataFrame, this is the BigQuery table where your data was copied.
The gemini_template_config.

curl -X PATCH \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-d $'{
"metadata": {
"input_config": {
"bigquery_source": {
"uri": "bq://projectId.datasetId.tableId"
}
},
"gemini_template_config_source": {
"gemini_template_config": {
"gemini_example": {
"contents": [
{
"role": "user",
"parts": [
{
"text": "This is the image: {image_uris}"
}
]
},
{
"role": "model",
"parts": [
{
"text": "response"
}
]
}
],
"systemInstruction": {
"parts": [
{
"text": "You are a botanical image classifier."
}
]
}
}
}
}
}
}' \
"https://LOCATION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION/datasets/DATASET_ID?updateMask=metadata"
(Optional) Assemble the dataset
The assemble method applies the template to transform your dataset and stores the output in a new BigQuery table. This lets you preview the data before you use it for a tuning or batch prediction job. By default, the dataset's attached template_config is used, but you can specify a template to override the default behavior.

Vertex AI SDK for Python
table_id, assembly = my_dataset.assemble(template_config=template_config)
# Inspect the results
assembly.head()
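To inspect more than the first few rows, you can also query the assembled table directly. A minimal sketch, assuming the google-cloud-bigquery client library is installed and that table_id identifies the assembled output table:

from google.cloud import bigquery

# Hypothetical inspection step: read assembled requests straight from BigQuery.
# Depending on the SDK version, table_id may carry a "bq://" prefix.
bq_table = table_id.removeprefix("bq://")
bq_client = bigquery.Client()
for row in bq_client.query(f"SELECT * FROM `{bq_table}` LIMIT 10").result():
    print(dict(row))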
REST
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
"https://LOCATION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION/datasets/DATASET_ID:assemble" \
-d '{}'
For example, assume that your multimodal dataset contains the following data:

Row 1
image_uris: gs://cloud-samples-data/ai-platform/flowers/daisy/1396526833_fb867165be_n.jpg
labels: daisy

The assemble method creates a new BigQuery table with the name table_id, where each row contains the request body. For example:

{
"contents": [
{
"parts": [
{
"text": "This is the image: "
},
{
"fileData": {
"fileUri": "gs://cloud-samples-data/ai-platform/flowers/daisy/1396526833_fb867165be_n.jpg",
"mimeType": "image/jpeg"
}
}
],
"role": "user"
},
{
"parts": [
{
"text": "daisy"
}
],
"role": "model"
}
],
"systemInstruction": {
"parts": [
{
"text": "You are a botanical image classifier. Analyze the provided image and determine the most accurate classification of the flower.These are the only flower categories: ['daisy', 'dandelion', 'roses', 'sunflowers', 'tulips'].Return only one category per image."
}
]
}
}
Tune your model

You can tune Gemini models using a multimodal dataset.

(Optional) Validate the dataset

Assess the dataset to check for errors, such as dataset formatting or model errors.

Vertex AI SDK for Python

Call assess_tuning_validity(). By default, the dataset's attached template_config is used, but you can specify a template to override this behavior.

# Attach template
my_dataset.attach_template_config(template_config=template_config)
# Validation for tuning
validation = my_dataset.assess_tuning_validity(
model_name="gemini-2.0-flash-001",
dataset_usage="SFT_TRAINING"
)
# Inspect validation result
validation.errors
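As a sketch of how you might act on the result, assuming that errors is empty when the dataset is valid:

# Hypothetical guard before starting an expensive tuning job.
if validation.errors:
    for error in validation.errors:
        print(error)
else:
    print("Dataset is valid for tuning.")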
REST

Call the assess method and provide a TuningValidationAssessmentConfig.

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
"https://LOCATION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION/datasets/DATASET_ID:assess" \
-d '{
"tuningValidationAssessmentConfig": {
"modelName": "projects/PROJECT_ID/locations/LOCATION/models/gemini-2.0-flash-001",
"datasetUsage": "SFT_TRAINING"
}
}'
(Optional) Estimate resource usage

Assess the dataset to get the token and billable character count for your tuning job.

Vertex AI SDK for Python

Call assess_tuning_resources().

# Resource estimation for tuning.
tuning_resources = my_dataset.assess_tuning_resources(
model_name="gemini-2.0-flash-001"
)
print(tuning_resources)
# For example, TuningResourceUsageAssessmentResult(token_count=362688, billable_character_count=122000)
REST

Call the assess method and provide a TuningResourceUsageAssessmentConfig.

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
"https://LOCATION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION/datasets/DATASET_ID:assess" \
-d '{
"tuningResourceUsageAssessmentConfig": {
"modelName": "projects/PROJECT_ID/locations/LOCATION/models/gemini-2.0-flash-001"
}
}'
Run the tuning job

You can pass the multimodal dataset directly to a tuning job. For more information, see Create a tuning job.
Vertex AI SDK for Python
from vertexai.tuning import sft
sft_tuning_job = sft.train(
source_model="gemini-2.0-flash-001",
# Pass the Vertex Multimodal Datasets directly
train_dataset=my_multimodal_dataset,
validation_dataset=my_multimodal_validation_dataset,
)
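The call returns when the job is created, not when tuning finishes. A minimal waiting sketch, assuming the returned job object exposes refresh(), has_ended, and the tuned model properties used in other Vertex AI tuning examples:

import time

# Poll until the tuning job finishes, then print the tuned model references.
while not sft_tuning_job.has_ended:
    time.sleep(60)
    sft_tuning_job.refresh()

print(sft_tuning_job.tuned_model_name)
print(sft_tuning_job.tuned_model_endpoint_name)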
Google Gen AI SDK
from google import genai
from google.genai.types import HttpOptions, CreateTuningJobConfig
client = genai.Client(http_options=HttpOptions(api_version="v1"))
tuning_job = client.tunings.tune(
base_model="gemini-2.0-flash-001",
# Pass the resource name of the Vertex Multimodal Dataset, not the dataset object
training_dataset={
"vertex_dataset_resource": my_multimodal_dataset.resource_name
},
# Optional
config=CreateTuningJobConfig(
tuned_model_display_name="Example tuning job"),
)
Get batch predictions

You can get batch predictions using a multimodal dataset.

(Optional) Validate the dataset

Assess the dataset to check for errors, such as dataset formatting or model errors.

Vertex AI SDK for Python

Call assess_batch_prediction_validity(). By default, the dataset's attached template_config is used, but you can specify a template to override this behavior.

# Attach template
my_dataset.attach_template_config(template_config=template_config)
# Validation for batch prediction
validation = my_dataset.assess_batch_prediction_validity(
model_name="gemini-2.0-flash-001",
dataset_usage="SFT_TRAINING"
)
# Inspect validation result
validation.errors
REST

Call the assess method and provide a batchPredictionValidationAssessmentConfig.

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
"https://LOCATION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION/datasets/DATASET_ID:assess" \
-d '{
"batchPredictionValidationAssessmentConfig": {
"modelName": "projects/PROJECT_ID/locations/LOCATION/models/gemini-2.0-flash-001",
}
}'
(Optional) Estimate resource usage

Assess the dataset to get the token count for your job.

Vertex AI SDK for Python

Call assess_batch_prediction_resources().

batch_prediction_resources = my_dataset.assess_batch_prediction_resources(
model_name="gemini-2.0-flash"
)
print(batch_prediction_resources)
# For example, BatchPredictionResourceUsageAssessmentResult(token_count=362688, audio_token_count=122000)
REST

Call the assess method and provide a batchPredictionResourceUsageAssessmentConfig.

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
"https://LOCATION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION/datasets/DATASET_ID:assess" \
-d '{
"batchPredictionResourceUsageAssessmentConfig": {
"modelName": "projects/PROJECT_ID/locations/LOCATION/models/gemini-2.0-flash-001"
}
}'
Run the batch prediction job

You can use your multimodal dataset to get batch predictions by passing the BigQuery table_id of the assembled output. For more information, see Request a batch prediction job.

Vertex AI SDK for Python
from vertexai.batch_prediction import BatchPredictionJob
# The dataset needs an attached template_config before running batch prediction
my_dataset.attach_template_config(template_config=template_config)
# assemble dataset to get assembly table id
assembly_table_id, _ = my_dataset.assemble()
batch_prediction_job = BatchPredictionJob.submit(
source_model="gemini-2.0-flash-001",
input_dataset=assembly_table_id,
)
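Submission also returns before the job completes. A minimal monitoring sketch, assuming the job object exposes refresh(), has_ended, has_succeeded, and output_location as in other Vertex AI batch prediction examples:

import time

# Poll until the batch prediction job finishes, then print where the results were written.
while not batch_prediction_job.has_ended:
    time.sleep(60)
    batch_prediction_job.refresh()

if batch_prediction_job.has_succeeded:
    print("Results written to:", batch_prediction_job.output_location)
else:
    print("Job failed:", batch_prediction_job.error)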
Google Gen AI SDK
from google import genai
from google.genai.types import HttpOptions

client = genai.Client(http_options=HttpOptions(api_version="v1"))
# Attach template_config and assemble dataset
my_dataset.attach_template_config(template_config=template_config)
assembly_table_id, _ = my_dataset.assemble()
job = client.batches.create(
model="gemini-2.0-flash-001",
src=assembly_table_id,
)
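The job runs asynchronously here as well. A polling sketch, assuming client.batches.get and JOB_STATE_* state names; check the Gen AI SDK reference for the exact values:

import time

# Hypothetical polling loop: re-fetch the job until it leaves the pending/running states.
while job.state in ("JOB_STATE_PENDING", "JOB_STATE_RUNNING"):
    time.sleep(60)
    job = client.batches.get(name=job.name)

print(job.state)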
Limitations
Pricing

When you tune a model or run a batch prediction job, you are billed for Generative AI usage and for querying the dataset in BigQuery.

When you create, assemble, or assess your multimodal dataset, you are billed for storing and querying multimodal datasets in BigQuery. Specifically, the following operations use those underlying services:

Create dataset
Assemble dataset
Assess dataset

Use the Pricing Calculator to generate a cost estimate based on your projected usage.