Multimodal datasets

This guide shows you how to create and use multimodal datasets in Vertex AI for generative AI tasks.

This page describes the following topics:

  • Create a dataset: Learn how to create a multimodal dataset from various sources like BigQuery, DataFrames, or JSONL files.
  • Construct and attach a template: Define how to transform your dataset into a format that a generative model can understand.
  • Assemble the dataset: Optionally, apply the template to your dataset to preview the data before it is passed to the model.
  • Tune your model: Use your prepared dataset to perform supervised fine-tuning on a Gemini model.
  • Get batch predictions: Use your dataset to get batch predictions from a Gemini model.

The following diagram summarizes the overall workflow for using multimodal datasets:

With multimodal datasets on Vertex AI, you can create, manage, share, and use multimodal datasets for Generative AI. Key features include:

  • Flexible data sources: Load datasets from BigQuery, DataFrames, or JSONL files in Cloud Storage.
  • Reusability: Create your dataset once and use it across different job types, such as supervised fine-tuning and batch prediction, which helps prevent data duplication and formatting issues.
  • Centralized management: Keep all your generative AI datasets in a single, managed location.
  • Pre-run validation: Validate your schema and structure and quantify the resources needed for downstream tasks, helping you catch errors and estimate costs before you start a task.

You can use multimodal datasets through the Vertex AI SDK for Python or the REST API.

Multimodal datasets are a type of managed dataset on Vertex AI. They differ from other managed datasets in the following ways:

  • Modality support: Multimodal datasets can include data of any modality (text, image, audio, video), while other managed datasets are for a single modality.
  • Use case: Multimodal datasets are exclusively for Generative AI services on Vertex AI, such as tuning and batch prediction. Other managed dataset types are for Vertex AI predictive models.
  • Supported methods: Multimodal datasets support additional methods like assemble and assess for previewing data, validating requests, and estimating costs.
  • Storage: Multimodal datasets are stored in BigQuery, which is optimized for large datasets.

Before you begin

  1. Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
  2. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

    Go to project selector

  3. Verify that billing is enabled for your Google Cloud project.

  4. Enable the Vertex AI, BigQuery, and Cloud Storage APIs.

    Enable the APIs

  5. Install and initialize the Vertex AI SDK for Python.
  6. Import the following libraries:
    from google.cloud.aiplatform.preview import datasets
    
    # Depending on the features you use, you might also need some of the following imports:
    from vertexai.preview.tuning import sft
    from vertexai.batch_prediction import BatchPredictionJob
    
    from vertexai.generative_models import Content, Part, Tool, ToolConfig, SafetySetting, GenerationConfig, FunctionDeclaration
    

Create a dataset

You can create a multimodal dataset from different sources:

Source | Description | Use case
Pandas or BigQuery DataFrame | Create a dataset from an in-memory DataFrame object. | Ideal for interactive development in notebooks where data is already loaded or manipulated as a DataFrame.
BigQuery table | Create a dataset directly from an existing table in BigQuery using its URI. | Best for large, persistent datasets already stored in BigQuery, avoiding data movement.
JSONL file in Cloud Storage | Create a dataset from a JSONL file where each line is a pre-formatted request for Gemini. | Useful when your data is already structured in the final Gemini request format and stored in Cloud Storage.
Existing multimodal dataset | Load a previously created multimodal dataset by its name or by listing available datasets. | For reusing or continuing work on an existing dataset within Vertex AI.

Vertex AI SDK for Python

From a Pandas DataFrame

my_dataset = datasets.MultimodalDataset.from_pandas(
    dataframe=my_dataframe,
    target_table_id=table_id    # optional
)

From a BigQuery DataFrame

my_dataset = datasets.MultimodalDataset.from_bigframes(
    dataframe=my_dataframe,
    target_table_id=table_id    # optional
)

From a BigQuery table

my_dataset_from_bigquery = datasets.MultimodalDataset.from_bigquery(
    bigquery_uri="bq://projectId.datasetId.tableId"
)
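
A bq:// URI names the project, dataset, and table, in that order. The following is a minimal sketch of a helper that builds and checks such a URI before you pass it as bigquery_uri. The function names are hypothetical and not part of the Vertex AI SDK, and the check assumes that none of the three identifiers contains a dot.

```python
import re

# Shape taken from the example above: bq://projectId.datasetId.tableId
# Assumption: no identifier contains a dot.
_BQ_URI = re.compile(r"^bq://(?P<project>[^.]+)\.(?P<dataset>[^.]+)\.(?P<table>[^.]+)$")


def make_bigquery_uri(project_id: str, dataset_id: str, table_id: str) -> str:
    """Join the three identifiers into a bq:// URI (hypothetical helper)."""
    uri = f"bq://{project_id}.{dataset_id}.{table_id}"
    if not _BQ_URI.match(uri):
        raise ValueError(f"not a bq://project.dataset.table URI: {uri!r}")
    return uri


def split_bigquery_uri(uri: str) -> tuple[str, str, str]:
    """Split a bq:// URI back into (project, dataset, table)."""
    match = _BQ_URI.match(uri)
    if match is None:
        raise ValueError(f"not a bq://project.dataset.table URI: {uri!r}")
    return match["project"], match["dataset"], match["table"]


uri = make_bigquery_uri("my-project", "my_dataset", "flowers")
```

The resulting string can then be passed as the bigquery_uri argument of from_bigquery.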

From a JSONL file in Cloud Storage

In the following example, the JSONL file contains requests that are already formatted for Gemini, so no assembly is required.

my_dataset = datasets.MultimodalDataset.from_gemini_request_jsonl(
    gcs_uri=gcs_uri_of_jsonl_file,
)
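
If you need to produce such a JSONL file yourself, the sketch below writes one request per line using only the Python standard library. It assumes that each line has the same shape as the assembled request shown later in this guide; check the SDK reference for the exact format that from_gemini_request_jsonl expects. build_request and write_jsonl are hypothetical helper names, and uploading the file to Cloud Storage isn't shown.

```python
import json
import tempfile
from pathlib import Path


def build_request(image_uri: str, label: str) -> dict:
    """Build one single-turn request (assumed shape, see lead-in)."""
    return {
        "contents": [
            {
                "role": "user",
                "parts": [
                    {"text": "This is the image: "},
                    {"fileData": {"fileUri": image_uri, "mimeType": "image/jpeg"}},
                ],
            },
            {"role": "model", "parts": [{"text": label}]},
        ]
    }


def write_jsonl(requests: list[dict], path: Path) -> None:
    """Write one JSON object per line."""
    with path.open("w", encoding="utf-8") as f:
        for request in requests:
            f.write(json.dumps(request) + "\n")


rows = [
    ("gs://cloud-samples-data/ai-platform/flowers/daisy/1396526833_fb867165be_n.jpg", "daisy"),
]
jsonl_path = Path(tempfile.mkdtemp()) / "requests.jsonl"
write_jsonl([build_request(uri, label) for uri, label in rows], jsonl_path)
```

After you upload the file to a Cloud Storage bucket, pass its gs:// URI as gcs_uri.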

From an existing multimodal dataset

# Get the most recently created dataset
first_dataset = datasets.MultimodalDataset.list()[0]

# Load dataset based on its name
same_dataset = datasets.MultimodalDataset(first_dataset.name)

REST

From a BigQuery table

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
"https://LOCATION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION/datasets" \
-d '{
  "display_name": "TestDataset",
  "metadataSchemaUri": "gs://google-cloud-aiplatform/schema/dataset/metadata/multimodal_1.0.0.yaml",
  "metadata": {
    "inputConfig": {
      "bigquery_source": {
        "uri": "bq://projectId.datasetId.tableId"
      }
    }
  }
}'
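
If you prefer to issue the same call from Python without the SDK, the following sketch builds an equivalent request object with the standard library. It mirrors the URL, headers, and body of the curl command above; build_create_dataset_request is a hypothetical helper name, and the access token is a placeholder that you would obtain yourself (for example, from gcloud auth print-access-token). The request is only constructed here, not sent.

```python
import json
import urllib.request


def build_create_dataset_request(
    project_id: str,
    location: str,
    bigquery_uri: str,
    display_name: str,
    access_token: str,
) -> urllib.request.Request:
    """Build (but do not send) the dataset creation request (hypothetical helper)."""
    url = (
        f"https://{location}-aiplatform.googleapis.com/v1beta1/"
        f"projects/{project_id}/locations/{location}/datasets"
    )
    body = {
        "display_name": display_name,
        "metadataSchemaUri": (
            "gs://google-cloud-aiplatform/schema/dataset/metadata/multimodal_1.0.0.yaml"
        ),
        "metadata": {"inputConfig": {"bigquery_source": {"uri": bigquery_uri}}},
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_create_dataset_request(
    project_id="my-project",
    location="us-central1",
    bigquery_uri="bq://my-project.my_dataset.flowers",
    display_name="TestDataset",
    access_token="ACCESS_TOKEN",  # placeholder, not a real token
)
# urllib.request.urlopen(request) would send it; omitted so the sketch has no side effects.
```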

Construct and attach a template

A template defines how to transform your multimodal dataset into a format that a generative model can use. You must attach a template to your dataset before you can run a tuning or batch prediction job.

Method | Description | Pros | Cons
construct_single_turn_template | A helper method to create a template for simple, single-turn (prompt-response) conversations. | Simple and concise for basic use cases. | Limited to single-turn conversations; less flexible.
Manual construction with GeminiExample | Manually define the conversation structure, including multi-turn interactions and system instructions, using Content and Part objects. | Highly flexible; supports complex, multi-turn conversations and fine-grained control over the prompt structure. | More verbose and complex to set up.

Vertex AI SDK for Python

1. Construct a template

There are two ways to construct a template:

  • Use the construct_single_turn_template helper method:

    template_config = datasets.construct_single_turn_template(
            prompt="This is the image: {image_uris}",
            response="{labels}",
            system_instruction='You are a botanical image classifier. Analyze the provided image '
                    'and determine the most accurate classification of the flower.'
                    'These are the only flower categories: [\'daisy\', \'dandelion\', \'roses\', \'sunflowers\', \'tulips\'].'
                    'Return only one category per image.'
    )
    
  • Manually construct a template from a GeminiExample: This allows for finer granularity, such as multi-turn conversations. The following code sample also includes optional commented code for specifying a field_mapping, which lets you use a placeholder name that is different from the column name of the dataset.

    # Define a GeminiExample
    gemini_example = datasets.GeminiExample(
        contents=[
            # Placeholders must match dataset column names (here: image_uris and labels)
            # unless you provide a field_mapping.
            Content(role="user", parts=[Part.from_text("This is the image: {image_uris}")]),
            Content(role="model", parts=[Part.from_text("This is the flower class: {labels}.")]),
            Content(role="user", parts=[Part.from_text("Your response should only contain the class label.")]),
            Content(role="model", parts=[Part.from_text("{labels}")]),
    
            # Optional: If you specify a field_mapping, you can use different placeholder names. For example:
            # Content(role="user", parts=[Part.from_text("This is the image: {uri_placeholder}")]),
            # Content(role="model", parts=[Part.from_text("This is the flower class: {flower_placeholder}.")]),
            # Content(role="user", parts=[Part.from_text("Your response should only contain the class label.")]),
            # Content(role="model", parts=[Part.from_text("{flower_placeholder}")]),
        ],
        system_instruction=Content(
            parts=[
                Part.from_text(
                    'You are a botanical image classifier. Analyze the provided image '
                    'and determine the most accurate classification of the flower.'
                    'These are the only flower categories: [\'daisy\', \'dandelion\', \'roses\', \'sunflowers\', \'tulips\'].'
                    'Return only one category per image.'
                )
            ]
        ),
    )
    
    # Construct the template, optionally mapping placeholders to column names
    template_config = datasets.GeminiTemplateConfig(
        gemini_example=gemini_example,
    
        # Optional: Map the template placeholders to the column names of your dataset.
        # Not required if the template placeholders are column names of the dataset.
        # field_mapping={"uri_placeholder": "image_uris", "flower_placeholder": "labels"},
    )
    

2. Attach the template to the dataset

my_dataset.attach_template_config(template_config=template_config)
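
Before you attach a template, it can help to check locally that every placeholder resolves to a dataset column, either directly or through field_mapping. The following is a minimal sketch of such a pre-check. It is not part of the SDK: placeholders_in and missing_columns are hypothetical names, and the sketch assumes that placeholders follow Python's {name} format syntax, as in the examples above.

```python
from string import Formatter
from typing import Optional


def placeholders_in(template_text: str) -> set[str]:
    """Return the {name} placeholders used in one templated string."""
    return {name for _, name, _, _ in Formatter().parse(template_text) if name}


def missing_columns(
    template_texts: list[str],
    columns: set[str],
    field_mapping: Optional[dict[str, str]] = None,
) -> set[str]:
    """Return the column names that the template needs but the dataset lacks."""
    field_mapping = field_mapping or {}
    missing = set()
    for text in template_texts:
        for placeholder in placeholders_in(text):
            # Without a mapping entry, the placeholder itself is the column name.
            column = field_mapping.get(placeholder, placeholder)
            if column not in columns:
                missing.add(column)
    return missing


dataset_columns = {"image_uris", "labels"}
direct = missing_columns(["This is the image: {image_uris}", "{labels}"], dataset_columns)
mapped = missing_columns(
    ["This is the image: {uri_placeholder}", "{flower_placeholder}"],
    dataset_columns,
    field_mapping={"uri_placeholder": "image_uris", "flower_placeholder": "labels"},
)
```

Both calls return an empty set for the flowers example. A non-empty result names the columns to add or the mapping entries to fix.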

REST

Call the patch method and update the metadata field with the following:

  • The URI of the BigQuery table. For datasets created from a BigQuery table, this is your source bigquery_uri. For datasets created from other sources, such as JSONL or DataFrame, this is the BigQuery table where your data was copied.
  • A gemini_template_config.

curl -X PATCH \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-d '{
  "metadata": {
    "input_config": {
      "bigquery_source": {
        "uri": "bq://projectId.datasetId.tableId"
      }
    },
    "gemini_template_config_source": {
      "gemini_template_config": {
        "gemini_example": {
          "contents": [
            {
              "role": "user",
              "parts": [
                {
                  "text": "This is the image: {image_uris}"
                }
              ]
            },
            {
              "role": "model",
              "parts": [
                {
                  "text": "response"
                }
              ]
            }
          ],
          "systemInstruction": {
            "parts": [
              {
                "text": "You are a botanical image classifier."
              }
            ]
          }
        }
      }
    }
  }
}' \
"https://LOCATION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION/datasets/DATASET_ID?updateMask=metadata"

(Optional) Assemble the dataset

The assemble method applies the template to transform your dataset and stores the output in a new BigQuery table. This lets you preview the data before you use it for a tuning or batch prediction job.

By default, the dataset's attached template_config is used, but you can specify a template to override the default behavior.

Vertex AI SDK for Python

table_id, assembly = my_dataset.assemble(template_config=template_config)

# Inspect the results
assembly.head()

REST

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
"https://LOCATION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION/datasets/DATASET_ID:assemble" \
-d '{}'

For example, assume that your multimodal dataset contains the following data:

Row | image_uris | labels
1 | gs://cloud-samples-data/ai-platform/flowers/daisy/1396526833_fb867165be_n.jpg | daisy

The assemble method creates a new BigQuery table with the name table_id where each row contains the request body. For example:

{
  "contents": [
    {
      "parts": [
        {
          "text": "This is the image: "
        },
        {
          "fileData": {
            "fileUri": "gs://cloud-samples-data/ai-platform/flowers/daisy/1396526833_fb867165be_n.jpg",
            "mimeType": "image/jpeg"
          }
        }
      ],
      "role": "user"
    },
    {
      "parts": [
        {
          "text": "daisy"
        }
      ],
      "role": "model"
    }
  ],
  "systemInstruction": {
    "parts": [
      {
        "text": "You are a botanical image classifier. Analyze the provided image and determine the most accurate classification of the flower.These are the only flower categories: ['daisy', 'dandelion', 'roses', 'sunflowers', 'tulips'].Return only one category per image."
      }
    ]
  }
}
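
To make the mapping from a row to a request concrete, the following sketch reproduces the contents of the example above from the example row. It is an illustration only, not the service's implementation. The rule that gs:// values become fileData parts and the MIME type lookup by file extension are assumptions made to match the example, and fill_parts is a hypothetical name.

```python
import mimetypes
from string import Formatter


def fill_parts(template_text: str, row: dict) -> list[dict]:
    """Expand one templated string into request parts (illustrative only)."""
    parts: list[dict] = []

    def add_text(text: str) -> None:
        # Merge neighbouring text so a sentence around a placeholder stays one part.
        if parts and "text" in parts[-1]:
            parts[-1]["text"] += text
        else:
            parts.append({"text": text})

    for literal, column, _spec, _conversion in Formatter().parse(template_text):
        if literal:
            add_text(literal)
        if column is None:
            continue  # trailing literal text with no placeholder after it
        value = str(row[column])
        if value.startswith("gs://"):
            # Assumption: Cloud Storage URIs are referenced as files, not inlined as text.
            mime_type, _encoding = mimetypes.guess_type(value)
            parts.append({"fileData": {"fileUri": value, "mimeType": mime_type}})
        else:
            add_text(value)
    return parts


row = {
    "image_uris": "gs://cloud-samples-data/ai-platform/flowers/daisy/1396526833_fb867165be_n.jpg",
    "labels": "daisy",
}
request = {
    "contents": [
        {"role": "user", "parts": fill_parts("This is the image: {image_uris}", row)},
        {"role": "model", "parts": fill_parts("{labels}", row)},
    ]
}
```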

Tune your model

You can tune Gemini models using a multimodal dataset.

(Optional) Validate the dataset

Assess the dataset to check for errors, such as dataset formatting or model errors.

Vertex AI SDK for Python

Call assess_tuning_validity(). By default, the dataset's attached template_config is used, but you can specify a template to override this behavior.

# Attach template
my_dataset.attach_template_config(template_config=template_config)

# Validation for tuning
validation = my_dataset.assess_tuning_validity(
    model_name="gemini-2.0-flash-001",
    dataset_usage="SFT_TRAINING"
)

# Inspect validation result
validation.errors
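
In a script, you might want to stop before launching a job when validation reports problems. The following sketch shows one way to do that. It assumes only that the result exposes an iterable errors attribute, as used above; require_no_errors is a hypothetical helper, and the SimpleNamespace objects are stand-ins for a real validation result so that the example runs without Vertex AI access.

```python
from types import SimpleNamespace


def require_no_errors(validation) -> None:
    """Raise if the validation result lists any errors (hypothetical helper)."""
    errors = list(validation.errors)
    if errors:
        details = "; ".join(str(error) for error in errors)
        raise ValueError(f"{len(errors)} validation error(s): {details}")


# Stand-ins for real results; the error text is made up for illustration.
clean_result = SimpleNamespace(errors=[])
failed_result = SimpleNamespace(errors=["example error (made up)"])

require_no_errors(clean_result)  # passes silently

try:
    require_no_errors(failed_result)
    blocked = False
except ValueError:
    blocked = True  # the job should not be started
```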

REST

Call the assess method and provide a TuningValidationAssessmentConfig.

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
"https://LOCATION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION/datasets/DATASET_ID:assess" \
-d '{
  "tuningValidationAssessmentConfig": {
    "modelName": "projects/PROJECT_ID/locations/LOCATION/models/gemini-2.0-flash-001",
    "datasetUsage": "SFT_TRAINING"
  }
}'

(Optional) Estimate resource usage

Assess the dataset to get the token and billable character count for your tuning job.

Vertex AI SDK for Python

Call assess_tuning_resources().

# Resource estimation for tuning.
tuning_resources = my_dataset.assess_tuning_resources(
    model_name="gemini-2.0-flash-001"
)

print(tuning_resources)
# For example, TuningResourceUsageAssessmentResult(token_count=362688, billable_character_count=122000)

REST

Call the assess method and provide a TuningResourceUsageAssessmentConfig.

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
"https://LOCATION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION/datasets/DATASET_ID:assess" \
-d '{
  "tuningResourceUsageAssessmentConfig": {
    "modelName": "projects/PROJECT_ID/locations/LOCATION/models/gemini-2.0-flash-001"
  }
}'

Run the tuning job

Vertex AI SDK for Python

from vertexai.tuning import sft

sft_tuning_job = sft.train(
  source_model="gemini-2.0-flash-001",
  # Pass the Vertex Multimodal Datasets directly
  train_dataset=my_multimodal_dataset,
  validation_dataset=my_multimodal_validation_dataset,
)

Google Gen AI SDK

from google import genai
from google.genai.types import HttpOptions, CreateTuningJobConfig

client = genai.Client(http_options=HttpOptions(api_version="v1"))

tuning_job = client.tunings.tune(
  base_model="gemini-2.0-flash-001",
  # Pass the resource name of the Vertex Multimodal Dataset, not the dataset object
  training_dataset={
      "vertex_dataset_resource": my_multimodal_dataset.resource_name
  },
  # Optional
  config=CreateTuningJobConfig(
      tuned_model_display_name="Example tuning job"),
)

For more information, see Create a tuning job.

Get batch predictions

You can get batch predictions using a multimodal dataset.

(Optional) Validate the dataset

Assess the dataset to check for errors, such as dataset formatting or model errors.

Vertex AI SDK for Python

Call assess_batch_prediction_validity(). By default, the dataset's attached template_config is used, but you can specify a template to override this behavior.

# Attach template
my_dataset.attach_template_config(template_config=template_config)

# Validation for batch prediction
validation = my_dataset.assess_batch_prediction_validity(
    model_name="gemini-2.0-flash-001"
)

# Inspect validation result
validation.errors

REST

Call the assess method and provide a BatchPredictionValidationAssessmentConfig.

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
"https://LOCATION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION/datasets/DATASET_ID:assess" \
-d '{
  "batchPredictionValidationAssessmentConfig": {
    "modelName": "projects/PROJECT_ID/locations/LOCATION/models/gemini-2.0-flash-001"
  }
}'

(Optional) Estimate resource usage

Assess the dataset to get the token count for your job.

Vertex AI SDK for Python

Call assess_batch_prediction_resources().

batch_prediction_resources = my_dataset.assess_batch_prediction_resources(
    model_name="gemini-2.0-flash-001"
)

print(batch_prediction_resources)
# For example, BatchPredictionResourceUsageAssessmentResult(token_count=362688, audio_token_count=122000)

REST

Call the assess method and provide a BatchPredictionResourceUsageAssessmentConfig.

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
"https://LOCATION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION/datasets/DATASET_ID:assess" \
-d '{
  "batchPredictionResourceUsageAssessmentConfig": {
    "modelName": "projects/PROJECT_ID/locations/LOCATION/models/gemini-2.0-flash-001"
  }
}'
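
The four assess calls in this guide differ only in the top-level config key and in whether datasetUsage is set. The following sketch builds the request body for any of them. The config keys and field names are taken from the curl examples; build_assess_body and the (task, check) lookup are hypothetical conveniences, not part of any SDK.

```python
from typing import Optional

# Top-level keys as they appear in the curl examples in this guide.
ASSESS_CONFIG_KEYS = {
    ("tuning", "validity"): "tuningValidationAssessmentConfig",
    ("tuning", "resources"): "tuningResourceUsageAssessmentConfig",
    ("batch_prediction", "validity"): "batchPredictionValidationAssessmentConfig",
    ("batch_prediction", "resources"): "batchPredictionResourceUsageAssessmentConfig",
}


def build_assess_body(
    task: str,
    check: str,
    project_id: str,
    location: str,
    model: str,
    dataset_usage: Optional[str] = None,
) -> dict:
    """Build the JSON body for a datasets:assess call (hypothetical helper)."""
    config = {
        "modelName": f"projects/{project_id}/locations/{location}/models/{model}"
    }
    if dataset_usage is not None:
        # Only the tuning validation example in this guide sets datasetUsage.
        config["datasetUsage"] = dataset_usage
    return {ASSESS_CONFIG_KEYS[(task, check)]: config}


tuning_body = build_assess_body(
    "tuning", "validity", "my-project", "us-central1",
    "gemini-2.0-flash-001", dataset_usage="SFT_TRAINING",
)
batch_body = build_assess_body(
    "batch_prediction", "resources", "my-project", "us-central1",
    "gemini-2.0-flash-001",
)
```

Serialize the result with json.dumps and send it as the body of the POST request to the :assess endpoint.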

Run the batch prediction job

You can use your multimodal dataset to get batch predictions by passing the BigQuery table_id of the assembled output.

Vertex AI SDK for Python

from vertexai.batch_prediction import BatchPredictionJob

# The dataset needs an attached template_config to run batch prediction
my_dataset.attach_template_config(template_config=template_config)

# Assemble the dataset to get the ID of the assembled BigQuery table
assembly_table_id, _ = my_dataset.assemble()

batch_prediction_job = BatchPredictionJob.submit(
    source_model="gemini-2.0-flash-001",
    input_dataset=assembly_table_id,
)

Google Gen AI SDK

from google import genai
from google.genai.types import HttpOptions

client = genai.Client(http_options=HttpOptions(api_version="v1"))

# Attach template_config and assemble dataset
my_dataset.attach_template_config(template_config=template_config)
assembly_table_id, _ = my_dataset.assemble()

job = client.batches.create(
    model="gemini-2.0-flash-001",
    src=assembly_table_id,
)

For more information, see Request a batch prediction job.

Limitations

  • You can use multimodal datasets only with generative AI features. You can't use them with non-generative AI features such as AutoML training and custom training.
  • You can use multimodal datasets only with Google models such as Gemini. You can't use them with third-party models.

Pricing

When you tune a model or run a batch prediction job, you are billed for Generative AI usage and querying the dataset in BigQuery.

When you create, assemble, or assess your multimodal dataset, you are billed for storing and querying multimodal datasets in BigQuery. Specifically, the following operations use those underlying services:

  • Create dataset
    • Datasets created from either an existing BigQuery table or DataFrame incur no additional storage costs because a logical view is used instead of storing another copy of the data.
    • Datasets created from other sources copy the data to a new BigQuery table, which incurs storage costs in BigQuery. For example, active logical storage costs $0.02 per GiB per month.
  • Assemble dataset
    • This method creates a new BigQuery table that contains the full dataset in model request format, which incurs storage costs in BigQuery. For example, active logical storage costs $0.02 per GiB per month.
    • This method also reads the dataset once, which incurs query costs in BigQuery. For example, on-demand compute pricing is $6.25 per TiB.
  • Assess dataset
    • This method reads the dataset once, which incurs query costs in BigQuery. For example, on-demand compute pricing is $6.25 per TiB.
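
As a rough worked example of the arithmetic, the following sketch estimates BigQuery costs from table sizes using the two example rates quoted above. It is a back-of-the-envelope aid only. The function names are hypothetical, the rates are the examples from this page rather than current prices for your region, and free tiers, minimum billed bytes, and Generative AI usage charges are ignored.

```python
GIB = 2**30
TIB = 2**40

# Example rates quoted in this guide; actual prices may differ.
STORAGE_USD_PER_GIB_MONTH = 0.02
QUERY_USD_PER_TIB = 6.25


def monthly_storage_cost(table_bytes: int) -> float:
    """Active logical storage cost for one month, in USD."""
    return table_bytes / GIB * STORAGE_USD_PER_GIB_MONTH


def query_cost(bytes_scanned: int) -> float:
    """On-demand query cost for one full read, in USD."""
    return bytes_scanned / TIB * QUERY_USD_PER_TIB


def assemble_cost(source_bytes: int, assembled_bytes: int, months: float = 1.0) -> float:
    """One read of the source plus storage of the assembled table."""
    return query_cost(source_bytes) + monthly_storage_cost(assembled_bytes) * months


def assess_cost(source_bytes: int) -> float:
    """Assess reads the dataset once and stores nothing new."""
    return query_cost(source_bytes)


# Example: a 50 GiB source table whose assembled output is also 50 GiB.
size = 50 * GIB
print(f"assemble: ${assemble_cost(size, size):.2f} for the first month")
print(f"assess:   ${assess_cost(size):.2f} per call")
```

With these example rates, one read of 50 GiB costs about $0.31 and one month of storage for a 50 GiB assembled table costs $1.00.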

Use the Pricing Calculator to generate a cost estimate based on your projected usage.