Starting April 29, 2025, Gemini 1.5 Pro and Gemini 1.5 Flash models are not available in projects that have no prior usage of these models, including new projects. For details, see Model versions and lifecycle.

Create a context cache

This guide shows you how to create a context cache, which lets you store a large amount of data for use in multiple requests to a Gemini model. This page covers the following topics:

Limits and requirements: Learn about the constraints for caching content, including size and duration.
Create a context cache: Follow examples using the REST API, Python SDK, or Go SDK to create a standard context cache.
Create a context cache with CMEK: Use your own encryption keys for added security when creating a cache.

Before you can use a context cache, you must create one. Key characteristics of a context cache include the following:

Content type: Cached content can be any of the MIME types supported by Gemini multimodal models, such as text, audio, or video. You can specify more than one file to cache. For more information, see the following media requirements:
Content source: You can specify content to cache using a blob, text, or a path to a file in a Cloud Storage bucket. For content larger than 10 MB, you must use a file URI from a Cloud Storage bucket.
Location: The cached content is stored in the same region where you make the request to create the cache.
Expiration: A context cache has a finite lifespan. The default expiration time is 60 minutes. You can specify a different expiration time using the ttl or expire_time property when you create the cache. You can also update the expiration time for an unexpired cache. After a cache expires, you must recreate it to use its content again.

Limits

The content that you cache must adhere to the limits shown in the following table:

Context caching limits
Minimum cache token count	`2,048` (Gemini 2.5 Pro) `1,024` (Gemini 2.5 Flash) `1,024` (Gemini 2.0 Flash) `1,024` (Gemini 2.0 Flash-Lite)
Maximum size of content you can cache using a blob or text	10 MB
Minimum time before a cache expires after it's created	1 minute
Maximum time before a cache expires after it's created	There isn't a maximum cache duration

Before you begin

Location support

Context caching isn't supported in the Sydney, Australia (australia-southeast1) region.

Encryption key support

Context caching supports Customer-Managed Encryption Keys (CMEKs), allowing you to control the encryption of your cached data and protect your sensitive information with encryption keys that you manage and own. This provides an additional layer of security and compliance.

Refer to the example for more details.

Access Transparency support

Context caching supports Access Transparency.

Create a context cache

The following examples show how to create a context cache using the REST API or the Google Gen AI SDK for Python and Go. Use this table to help you decide which option is best for you.

Method	Description	Use Case
REST API	Send requests directly to the Vertex AI API endpoint using tools like `curl` or a custom HTTP client.	Best for quick testing, environments without a supported SDK, or integrating with non-Python/Go applications.
Gen AI SDK for Python	Use an idiomatic Python library to interact with the API.	Best for Python developers who want a streamlined development experience.
Gen AI SDK for Go	Use an idiomatic Go library to interact with the API.	Best for Go developers who want to integrate context caching into their Go applications.

Python

Install

pip install --upgrade google-genai

To learn more, see the SDK reference documentation.

Set environment variables to use the Gen AI SDK with Vertex AI:

# Replace the `GOOGLE_CLOUD_PROJECT` and `GOOGLE_CLOUD_LOCATION` values
# with appropriate values for your project.
export GOOGLE_CLOUD_PROJECT=GOOGLE_CLOUD_PROJECT
export GOOGLE_CLOUD_LOCATION=us-central1
export GOOGLE_GENAI_USE_VERTEXAI=True

from google import genai
from google.genai.types import Content, CreateCachedContentConfig, HttpOptions, Part

client = genai.Client(http_options=HttpOptions(api_version="v1"))

system_instruction = """
You are an expert researcher. You always stick to the facts in the sources provided, and never make up new facts.
Now look at these research papers, and answer the following questions.
"""

contents = [
    Content(
        role="user",
        parts=[
            Part.from_uri(
                file_uri="gs://cloud-samples-data/generative-ai/pdf/2312.11805v3.pdf",
                mime_type="application/pdf",
            ),
            Part.from_uri(
                file_uri="gs://cloud-samples-data/generative-ai/pdf/2403.05530.pdf",
                mime_type="application/pdf",
            ),
        ],
    )
]

content_cache = client.caches.create(
    model="gemini-2.5-flash",
    config=CreateCachedContentConfig(
        contents=contents,
        system_instruction=system_instruction,
        # (Optional) For enhanced security, the content cache can be encrypted using a Cloud KMS key
        # kms_key_name = "projects/.../locations/us-central1/keyRings/.../cryptoKeys/..."
        display_name="example-cache",
        ttl="86400s",
    ),
)

print(content_cache.name)
print(content_cache.usage_metadata)
# Example response:
#   projects/111111111111/locations/us-central1/cachedContents/1111111111111111111
#   CachedContentUsageMetadata(audio_duration_seconds=None, image_count=167,
#       text_count=153, total_token_count=43130, video_duration_seconds=None)

Go

Learn how to install or update the Go.

To learn more, see the SDK reference documentation.

Set environment variables to use the Gen AI SDK with Vertex AI:

# Replace the `GOOGLE_CLOUD_PROJECT` and `GOOGLE_CLOUD_LOCATION` values
# with appropriate values for your project.
export GOOGLE_CLOUD_PROJECT=GOOGLE_CLOUD_PROJECT
export GOOGLE_CLOUD_LOCATION=us-central1
export GOOGLE_GENAI_USE_VERTEXAI=True

import (
	"context"
	"encoding/json"
	"fmt"
	"io"
	"time"

	genai "google.golang.org/genai"
)

// createContentCache shows how to create a content cache with an expiration parameter.
func createContentCache(w io.Writer) (string, error) {
	ctx := context.Background()

	client, err := genai.NewClient(ctx, &genai.ClientConfig{
		HTTPOptions: genai.HTTPOptions{APIVersion: "v1"},
	})
	if err != nil {
		return "", fmt.Errorf("failed to create genai client: %w", err)
	}

	modelName := "gemini-2.5-flash"

	systemInstruction := "You are an expert researcher. You always stick to the facts " +
		"in the sources provided, and never make up new facts. " +
		"Now look at these research papers, and answer the following questions."

	cacheContents := []*genai.Content{
		{
			Parts: []*genai.Part{
				{FileData: &genai.FileData{
					FileURI:  "gs://cloud-samples-data/generative-ai/pdf/2312.11805v3.pdf",
					MIMEType: "application/pdf",
				}},
				{FileData: &genai.FileData{
					FileURI:  "gs://cloud-samples-data/generative-ai/pdf/2403.05530.pdf",
					MIMEType: "application/pdf",
				}},
			},
			Role: "user",
		},
	}
	config := &genai.CreateCachedContentConfig{
		Contents: cacheContents,
		SystemInstruction: &genai.Content{
			Parts: []*genai.Part{
				{Text: systemInstruction},
			},
		},
		DisplayName: "example-cache",
		TTL:         time.Duration(time.Duration.Seconds(86400)),
	}

	res, err := client.Caches.Create(ctx, modelName, config)
	if err != nil {
		return "", fmt.Errorf("failed to create content cache: %w", err)
	}

	cachedContent, err := json.MarshalIndent(res, "", "  ")
	if err != nil {
		return "", fmt.Errorf("failed to marshal cache info: %w", err)
	}

	// See the documentation: https://pkg.go.dev/google.golang.org/genai#CachedContent
	fmt.Fprintln(w, string(cachedContent))

	// Example response:
	// {
	//   "name": "projects/111111111111/locations/us-central1/cachedContents/1111111111111111111",
	//   "displayName": "example-cache",
	//   "model": "projects/111111111111/locations/us-central1/publishers/google/models/gemini-2.5-flash",
	//   "createTime": "2025-02-18T15:05:08.29468Z",
	//   "updateTime": "2025-02-18T15:05:08.29468Z",
	//   "expireTime": "2025-02-19T15:05:08.280828Z",
	//   "usageMetadata": {
	//     "imageCount": 167,
	//     "textCount": 153,
	//     "totalTokenCount": 43125
	//   }
	// }

	return res.Name, nil
}

REST

To create a context cache with the REST API, send a POST request to the publisher model endpoint. The following example shows how to create a context cache by using a file stored in a Cloud Storage bucket.

Before using any of the request data, make the following replacements:

PROJECT_ID: Your project ID.
LOCATION: The region to process the request and where the cached content is stored. For a list of supported regions, see Available regions.
CACHE_DISPLAY_NAME: A meaningful display name to describe and to help you identify each context cache.
MIME_TYPE: The MIME type of the content to cache.
CONTENT_TO_CACHE_URI: The Cloud Storage URI of the content to cache.
MODEL_ID: The model to use for caching.

HTTP method and URL:

POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/cachedContents

Request JSON body:

{
  "model": "projects/PROJECT_ID/locations/LOCATION/publishers/google/models/MODEL_ID",
  "displayName": "CACHE_DISPLAY_NAME",
  "contents": [{
    "role": "user",
      "parts": [{
        "fileData": {
          "mimeType": "MIME_TYPE",
          "fileUri": "CONTENT_TO_CACHE_URI"
        }
      }]
  },
  {
    "role": "model",
      "parts": [{
        "text": "This is sample text to demonstrate explicit caching."
      }]
  }]
}

To send your request, choose one of these options:

curl

Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login , or by using Cloud Shell, which automatically logs you into the gcloud CLI . You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command:

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/cachedContents"

PowerShell

Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login . You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/cachedContents" | Select-Object -Expand Content

You should receive a JSON response similar to the following:

Response

{
  "name": "projects/PROJECT_NUMBER/locations/us-central1/cachedContents/CACHE_ID",
  "model": "projects/PROJECT_ID/locations/us-central1/publishers/google/models/gemini-2.0-flash-001",
  "createTime": "2024-06-04T01:11:50.808236Z",
  "updateTime": "2024-06-04T01:11:50.808236Z",
  "expireTime": "2024-06-04T02:11:50.794542Z"
}

Example curl command

LOCATION="us-central1"
MODEL_ID="gemini-2.0-flash-001"
PROJECT_ID="test-project"
MIME_TYPE="video/mp4"
CACHED_CONTENT_URI="gs://path-to-bucket/video-file-name.mp4"

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
https://${LOCATION}-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/${LOCATION}/cachedContents -d \
'{
  "model":"projects/${PROJECT_ID}/locations/${LOCATION}/publishers/google/models/${MODEL_ID}",
  "contents": [
    {
      "role": "user",
      "parts": [
        {
          "fileData": {
            "mimeType": "${MIME_TYPE}",
            "fileUri": "${CACHED_CONTENT_URI}"
          }
        }
      ]
    }
  ]
}'

Create a context cache with CMEK

To implement context caching with CMEKs, create a CMEK by following the instructions and make sure the Vertex AI per-product, per-project service account (P4SA) has the necessary Cloud KMS CryptoKey Encrypter/Decrypter permissions on the key. This lets you securely create and manage cached content as well make other calls like {List, Update, Delete, Get} CachedContent(s) without repeatedly specifying a KMS key.

REST

You can use REST to create a context cache by using the Vertex AI API to send a POST request to the publisher model endpoint. The following example shows how to create a context cache using a file stored in a Cloud Storage bucket.

Before using any of the request data, make the following replacements:

PROJECT_ID: .
LOCATION: The region to process the request and where the cached content is stored. For a list of supported regions, see Available regions.
MODEL_ID: gemini-2.0-flash-001.
CACHE_DISPLAY_NAME: A meaningful display name to describe and to help you identify each context cache.
MIME_TYPE: The MIME type of the content to cache.
CACHED_CONTENT_URI: The Cloud Storage URI of the content to cache.
KMS_KEY_NAME: The Cloud KMS key name.

HTTP method and URL:

POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/cachedContents

Request JSON body:

{
  "model": "projects/PROJECT_ID/locations/LOCATION/publishers/google/models/gemini-2.0-flash-001",
  "displayName": "CACHE_DISPLAY_NAME",
  "contents": [{
    "role": "user",
      "parts": [{
        "fileData": {
          "mimeType": "MIME_TYPE",
          "fileUri": "CONTENT_TO_CACHE_URI"
        }
      }]}],
    "encryptionSpec": {
      "kmsKeyName": "KMS_KEY_NAME"
    }
}

To send your request, choose one of these options:

curl

Save the request body in a file named request.json, and execute the following command:

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/cachedContents"

PowerShell

Save the request body in a file named request.json, and execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/cachedContents" | Select-Object -Expand Content

You should receive a JSON response similar to the following:

Response

{
  "name": "projects/PROJECT_NUMBER/locations/us-central1/cachedContents/CACHE_ID",
  "model": "projects/PROJECT_ID/locations/us-central1/publishers/google/models/gemini-2.0-flash-001",
  "createTime": "2024-06-04T01:11:50.808236Z",
  "updateTime": "2024-06-04T01:11:50.808236Z",
  "expireTime": "2024-06-04T02:11:50.794542Z"
}

Example curl command

LOCATION="us-central1"
MODEL_ID="gemini-2.0-flash-001"
PROJECT_ID="test-project"
MIME_TYPE="video/mp4"
CACHED_CONTENT_URI="gs://path-to-bucket/video-file-name.mp4"
KMS_KEY_NAME="projects/${PROJECT_ID}/locations/{LOCATION}/keyRings/your-key-ring/cryptoKeys/your-key"

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
https://${LOCATION}-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/${LOCATION}/cachedContents -d \
'{

"model": "projects/{PROJECT_ID}}/locations/{LOCATION}/publishers/google/models/{MODEL_ID}",
  "contents" : [
    {
      "role": "user",
      "parts": [
        {
          "file_data": {
            "mime_type":"{MIME_TYPE}",
            "file_uri":"{CACHED_CONTENT_URI}"
          }
        }
      ]
    }
  ],
  "encryption_spec" :
  {
    "kms_key_name":"{KMS_KEY_NAME}"
  }
}'

GenAI SDK for Python

Install

pip install --upgrade google-genai

To learn more, see the SDK reference documentation.

Set environment variables to use the Gen AI SDK with Vertex AI:

# Replace the `GOOGLE_CLOUD_PROJECT` and `GOOGLE_CLOUD_LOCATION` values
# with appropriate values for your project.
export GOOGLE_CLOUD_PROJECT=GOOGLE_CLOUD_PROJECT
export GOOGLE_CLOUD_LOCATION=us-central1
export GOOGLE_GENAI_USE_VERTEXAI=True

import os
from google import genai
from google.genai.types import Content, CreateCachedContentConfig, HttpOptions, Part

os.environ['GOOGLE_CLOUD_PROJECT'] = 'vertexsdk'
os.environ['GOOGLE_CLOUD_LOCATION'] = 'us-central1'
os.environ['GOOGLE_GENAI_USE_VERTEXAI'] = 'True'
  
client = genai.Client(http_options=HttpOptions(api_version="v1"))

system_instruction = """
You are an expert researcher. You always stick to the facts in the sources provided, and never make up new facts.
Now look at these research papers, and answer the following questions.
"""

contents = [
    Content(
        role="user",
        parts=[
            Part.from_uri(
                file_uri="gs://cloud-samples-data/generative-ai/pdf/2312.11805v3.pdf",
                mime_type="application/pdf",
            ),
            Part.from_uri(
                file_uri="gs://cloud-samples-data/generative-ai/pdf/2403.05530.pdf",
                mime_type="application/pdf",
            ),
        ],
    )
]

content_cache = client.caches.create(
    model="gemini-2.0-flash-001",
    config=CreateCachedContentConfig(
        contents=contents,
        system_instruction=system_instruction,
        display_name="example-cache",
        kms_key_name="projects/vertexsdk/locations/us-central1/keyRings/your-project/cryptoKeys/your-key",
        ttl="86400s",
    ),
)

print(content_cache.name)
print(content_cache.usage_metadata)

GenAI SDK for Go

Learn how to install or update the Gen AI SDK for Go.

To learn more, see the SDK reference documentation.

Set environment variables to use the Gen AI SDK with Vertex AI:


import (
    "context"
    "encoding/json"
    "fmt"
    "io"

    genai "google.golang.org/genai"
)

// createContentCache shows how to create a content cache with an expiration parameter.
func createContentCache(w io.Writer) (string, error) {
    ctx := context.Background()

    client, err := genai.NewClient(ctx, &genai.ClientConfig{
        HTTPOptions: genai.HTTPOptions{APIVersion: "v1beta1"},
    })
    if err != nil {
        return "", fmt.Errorf("failed to create genai client: %w", err)
    }

    modelName := "gemini-2.0-flash-001"

    systemInstruction := "You are an expert researcher. You always stick to the facts " +
        "in the sources provided, and never make up new facts. " +
        "Now look at these research papers, and answer the following questions."

    cacheContents := []*genai.Content{
        {
            Parts: []*genai.Part{
                {FileData: &genai.FileData{
                    FileURI:  "gs://cloud-samples-data/generative-ai/pdf/2312.11805v3.pdf",
                    MIMEType: "application/pdf",
                }},
                {FileData: &genai.FileData{
                    FileURI:  "gs://cloud-samples-data/generative-ai/pdf/2403.05530.pdf",
                    MIMEType: "application/pdf",
                }},
            },
            Role: "user",
        },
    }
    config := &genai.CreateCachedContentConfig{
        Contents: cacheContents,
        SystemInstruction: &genai.Content{
            Parts: []*genai.Part{
                {Text: systemInstruction},
            },
        },
        DisplayName: "example-cache",
        KmsKeyName:  "projects/vertexsdk/locations/us-central1/keyRings/your-project/cryptoKeys/your-key",
        TTL:         "86400s",
    }

    res, err := client.Caches.Create(ctx, modelName, config)
    if err != nil {
        return "", fmt.Errorf("failed to create content cache: %w", err)
    }

    cachedContent, err := json.MarshalIndent(res, "", "  ")
    if err != nil {
        return "", fmt.Errorf("failed to marshal cache info: %w", err)
    }

    // See the documentation: https://pkg.go.dev/google.golang.org/genai#CachedContent
    fmt.Fprintln(w, string(cachedContent))

    return res.Name, nil
}

What's next

Learn how to use a context cache.
Learn how to update the expiration time of a context cache.

Create a context cache Stay organized with collections Save and categorize content based on your preferences.

Limits

Before you begin

Location support

Encryption key support

Access Transparency support

Create a context cache

Python

Install

Go

REST

curl

PowerShell

Response

Example curl command

Create a context cache with CMEK

REST

curl

PowerShell

Response

Example curl command

GenAI SDK for Python

GenAI SDK for Go

What's next

Create a context cache