Use a context cache

You can use REST APIs or the Python SDK to reference content stored in a context cache in a generative AI application. Before it can be used, you must first create the context cache.

The context cache object you use in your code includes the following properties:

  • name - The context cache resource name. Its format is projects/PROJECT_NUMBER/locations/LOCATION/cachedContents/CACHE_ID. When you create a context cache, you can find its resource name is in the response. The project number is a unique identifier for your project. The cache ID is an ID for your cache. When you specify a context cache in your code, you must use the full context cache resource name. The following is an example that shows how you specify a cached content resource name in a request body:

    "cached_content": "projects/123456789012/locations/us-central1/123456789012345678"
    
  • model - The resource name for the model used to create the cache. Its format is projects/PROJECT_NUMBER/locations/LOCATION/publishers/PUBLISHER_NAME/models/MODEL_ID.

  • createTime - A Timestamp that specifies the create time of the context cache.

  • updateTime - A Timestamp that specifies the most recent update time of a context cache. After a context cache is created, and before it's updated, its createTime and updateTime are the same.

  • expireTime - A Timestamp that specifies when a context cache expires. The default expireTime is 60 minutes after the createTime. You can update the cache with a new expiration time. For more information, see Update the context cache. After a cache expires, it's marked for deletion and you shouldn't assume that it can be used or updated. If you need to use a context cache that expired, you need to recreate it with an appropriate expiration time.

Context cache use restrictions

The following features can be specified when you create a context cache. You shouldn't specify them again in your request:

  • The GenerativeModel.system_instructions property. This property is used to specify instructions to the model before the model receives instructions from a user. For more information, see System instructions.

  • The GenerativeModel.tool_config property. The tool_config property is used to specify tools used by the Gemini model, such as a tool used by the function calling feature.

  • The GenerativeModel.tools property. The GenerativeModel.tools property is used to specify functions to create a function calling application. For more information, see Function calling.

Use a context cache sample

The following shows how to use a context cache. When you use a context cache, you can't specify the following properties:

  • GenerativeModel.system_instructions
  • GenerativeModel.tool_config
  • GenerativeModel.tools

Gen AI SDK for Python

Install

pip install --upgrade google-genai
To learn more, see the SDK reference documentation.

Set environment variables to use the Gen AI SDK with Vertex AI:

# Replace the `GOOGLE_CLOUD_PROJECT` and `GOOGLE_CLOUD_LOCATION` values
# with appropriate values for your project.
export GOOGLE_CLOUD_PROJECT=GOOGLE_CLOUD_PROJECT
export GOOGLE_CLOUD_LOCATION=us-central1
export GOOGLE_GENAI_USE_VERTEXAI=True

from google import genai
from google.genai.types import GenerateContentConfig, HttpOptions

client = genai.Client(http_options=HttpOptions(api_version="v1beta1"))

# Use content cache to generate text response
# E.g cache_name = 'projects/111111111111/locations/us-central1/cachedContents/1111111111111111111'
response = client.models.generate_content(
    model="gemini-1.5-pro-002",
    contents="Summarize the pdfs",
    config=GenerateContentConfig(
        cached_content=cache_name,
    ),
)
print(response.text)
# Example response
#   The Gemini family of multimodal models from Google DeepMind demonstrates remarkable capabilities across various
#   modalities, including image, audio, video, and text....

REST

You can use REST to use a context cache with a prompt by using the Vertex AI API to send a POST request to the publisher model endpoint.

Before using any of the request data, make the following replacements:

HTTP method and URL:

POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/gemini-1.5-pro-002:generateContent

Request JSON body:

{
  "cachedContent": "projects/PROJECT_NUMBER/locations/LOCATION/cachedContents/CACHE_ID",
  "contents": [
      {"role":"user","parts":[{"text":"PROMPT_TEXT"}]}
  ],
  "generationConfig": {
      "maxOutputTokens": 8192,
      "temperature": 1,
      "topP": 0.95,
  },
  "safetySettings": [
      {
          "category": "HARM_CATEGORY_HATE_SPEECH",
          "threshold": "BLOCK_MEDIUM_AND_ABOVE"
      },
      {
          "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
          "threshold": "BLOCK_MEDIUM_AND_ABOVE"
      },
      {
          "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
          "threshold": "BLOCK_MEDIUM_AND_ABOVE"
      },
      {
          "category": "HARM_CATEGORY_HARASSMENT",
          "threshold": "BLOCK_MEDIUM_AND_ABOVE"
      }
  ],
}

To send your request, choose one of these options:

curl

Save the request body in a file named request.json, and execute the following command:

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/gemini-1.5-pro-002:generateContent"

PowerShell

Save the request body in a file named request.json, and execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/gemini-1.5-pro-002:generateContent" | Select-Object -Expand Content

You should receive a JSON response similar to the following.

Example curl command

LOCATION="us-central1"
MODEL_ID="gemini-1.5-pro-002"
PROJECT_ID="test-project"

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
"https://${LOCATION}-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/${LOCATION}/publishers/google/models/${MODEL_ID}:generateContent" -d \
'{
  "cachedContent": "projects/${PROJECT_NUMBER}/locations/${LOCATION}/cachedContents/${CACHE_ID}",
  "contents": [
      {"role":"user","parts":[{"text":"What are the benefits of exercise?"}]}
  ],
  "generationConfig": {
      "maxOutputTokens": 8192,
      "temperature": 1,
      "topP": 0.95,
  },
  "safetySettings": [
    {
      "category": "HARM_CATEGORY_HATE_SPEECH",
      "threshold": "BLOCK_MEDIUM_AND_ABOVE"
    },
    {
      "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
      "threshold": "BLOCK_MEDIUM_AND_ABOVE"
    },
    {
      "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
      "threshold": "BLOCK_MEDIUM_AND_ABOVE"
    },
    {
      "category": "HARM_CATEGORY_HARASSMENT",
      "threshold": "BLOCK_MEDIUM_AND_ABOVE"
    }
  ],
}'