컨텍스트 캐시 만들기

컨텍스트 캐시를 사용하려면 먼저 만들어야 합니다. 생성하는 컨텍스트 캐시에는 Gemini 모델에 대한 여러 요청에서 사용할 수 있는 대량의 데이터가 포함되어 있습니다. 캐시된 콘텐츠는 캐시 생성을 요청한 리전에 저장됩니다.

캐시된 콘텐츠는 Gemini 멀티모달 모델에서 지원하는 모든 MIME 유형이 될 수 있습니다. 예를 들어 대량의 텍스트, 오디오 또는 동영상을 캐시할 수 있습니다. 캐시할 파일을 두 개 이상 지정할 수 있습니다. 자세한 내용은 다음 미디어 요구사항을 참조하세요.

Cloud Storage 버킷에 저장된 blob, 텍스트 또는 파일 경로를 사용하여 캐시할 콘텐츠를 지정합니다. 캐시하는 콘텐츠의 크기가 10MB를 초과하는 경우 Cloud Storage 버킷에 저장된 파일의 URI를 사용하여 지정해야 합니다.

캐시된 콘텐츠의 수명은 유한합니다. 컨텍스트 캐시의 기본 만료 시간은 생성 후 60분입니다. 다른 만료 시간을 원하는 경우 컨텍스트 캐시를 만들 때 ttl 또는 expire_time 속성을 사용하여 다른 만료 시간을 지정할 수 있습니다. 만료되지 않은 컨텍스트 캐시의 만료 시간을 업데이트할 수도 있습니다. ttl 및 expire_time을 지정하는 방법에 관한 자세한 내용은 만료 시간 업데이트를 참조하세요.

컨텍스트 캐시가 만료되면 더 이상 사용할 수 없습니다. 향후 프롬프트 요청에서 만료된 컨텍스트 캐시의 콘텐츠를 참조하려는 경우 컨텍스트 캐시를 다시 만들어야 합니다.

한도

캐시하는 콘텐츠는 다음 표에 표시된 한도를 준수해야 합니다.

컨텍스트 캐싱 한도
최소 캐시 토큰 수	`2,048`(Gemini 2.5 Pro) `1,024`(Gemini 2.5 Flash) `1,024`(Gemini 2.0 Flash) `1,024`(Gemini 2.0 Flash-Lite)
blob 또는 텍스트를 사용하여 캐시할 수 있는 최대 콘텐츠 크기	10MB
캐시가 생성된 후 만료되기 전까지의 최소 시간	1분
캐시가 생성된 후 만료되기 전까지의 최대 시간	최대 캐시 기간이 없습니다.

위치 지원

오스트레일리아 시드니(australia-southeast1) 리전에서는 컨텍스트 캐싱이 지원되지 않습니다.

암호화 키 지원

컨텍스트 캐싱은 고객 관리 암호화 키(CMEK)를 지원하므로 캐시된 데이터의 암호화를 제어하고 관리 및 소유한 암호화 키로 민감한 정보를 보호할 수 있습니다. 이를 통해 보안 및 규정 준수 기능이 한층 강화됩니다.

자세한 내용은 예시를 참조하세요.

액세스 투명성 지원

컨텍스트 캐싱은 액세스 투명성을 지원합니다.

컨텍스트 캐시 만들기 예시

다음 예는 컨텍스트 캐시를 만드는 방법을 보여줍니다.

Python

설치

pip install --upgrade google-genai

자세한 내용은 SDK 참고 문서를 참조하세요.

Vertex AI에서 Gen AI SDK를 사용하도록 환경 변수를 설정합니다.

# Replace the `GOOGLE_CLOUD_PROJECT` and `GOOGLE_CLOUD_LOCATION` values
# with appropriate values for your project.
export GOOGLE_CLOUD_PROJECT=GOOGLE_CLOUD_PROJECT
export GOOGLE_CLOUD_LOCATION=us-central1
export GOOGLE_GENAI_USE_VERTEXAI=True

from google import genai
from google.genai.types import Content, CreateCachedContentConfig, HttpOptions, Part

client = genai.Client(http_options=HttpOptions(api_version="v1"))

system_instruction = """
You are an expert researcher. You always stick to the facts in the sources provided, and never make up new facts.
Now look at these research papers, and answer the following questions.
"""

contents = [
    Content(
        role="user",
        parts=[
            Part.from_uri(
                file_uri="gs://cloud-samples-data/generative-ai/pdf/2312.11805v3.pdf",
                mime_type="application/pdf",
            ),
            Part.from_uri(
                file_uri="gs://cloud-samples-data/generative-ai/pdf/2403.05530.pdf",
                mime_type="application/pdf",
            ),
        ],
    )
]

content_cache = client.caches.create(
    model="gemini-2.5-flash",
    config=CreateCachedContentConfig(
        contents=contents,
        system_instruction=system_instruction,
        # (Optional) For enhanced security, the content cache can be encrypted using a Cloud KMS key
        # kms_key_name = "projects/.../locations/us-central1/keyRings/.../cryptoKeys/..."
        display_name="example-cache",
        ttl="86400s",
    ),
)

print(content_cache.name)
print(content_cache.usage_metadata)
# Example response:
#   projects/111111111111/locations/us-central1/cachedContents/1111111111111111111
#   CachedContentUsageMetadata(audio_duration_seconds=None, image_count=167,
#       text_count=153, total_token_count=43130, video_duration_seconds=None)

Go

Go를 설치하거나 업데이트하는 방법을 알아보세요.

자세한 내용은 SDK 참고 문서를 참조하세요.

Vertex AI에서 Gen AI SDK를 사용하도록 환경 변수를 설정합니다.

# Replace the `GOOGLE_CLOUD_PROJECT` and `GOOGLE_CLOUD_LOCATION` values
# with appropriate values for your project.
export GOOGLE_CLOUD_PROJECT=GOOGLE_CLOUD_PROJECT
export GOOGLE_CLOUD_LOCATION=us-central1
export GOOGLE_GENAI_USE_VERTEXAI=True

import (
	"context"
	"encoding/json"
	"fmt"
	"io"
	"time"

	genai "google.golang.org/genai"
)

// createContentCache shows how to create a content cache with an expiration parameter.
func createContentCache(w io.Writer) (string, error) {
	ctx := context.Background()

	client, err := genai.NewClient(ctx, &genai.ClientConfig{
		HTTPOptions: genai.HTTPOptions{APIVersion: "v1"},
	})
	if err != nil {
		return "", fmt.Errorf("failed to create genai client: %w", err)
	}

	modelName := "gemini-2.0-flash-001"

	systemInstruction := "You are an expert researcher. You always stick to the facts " +
		"in the sources provided, and never make up new facts. " +
		"Now look at these research papers, and answer the following questions."

	cacheContents := []*genai.Content{
		{
			Parts: []*genai.Part{
				{FileData: &genai.FileData{
					FileURI:  "gs://cloud-samples-data/generative-ai/pdf/2312.11805v3.pdf",
					MIMEType: "application/pdf",
				}},
				{FileData: &genai.FileData{
					FileURI:  "gs://cloud-samples-data/generative-ai/pdf/2403.05530.pdf",
					MIMEType: "application/pdf",
				}},
			},
			Role: "user",
		},
	}
	config := &genai.CreateCachedContentConfig{
		Contents: cacheContents,
		SystemInstruction: &genai.Content{
			Parts: []*genai.Part{
				{Text: systemInstruction},
			},
		},
		DisplayName: "example-cache",
		TTL:         time.Duration(time.Duration.Seconds(86400)),
	}

	res, err := client.Caches.Create(ctx, modelName, config)
	if err != nil {
		return "", fmt.Errorf("failed to create content cache: %w", err)
	}

	cachedContent, err := json.MarshalIndent(res, "", "  ")
	if err != nil {
		return "", fmt.Errorf("failed to marshal cache info: %w", err)
	}

	// See the documentation: https://pkg.go.dev/google.golang.org/genai#CachedContent
	fmt.Fprintln(w, string(cachedContent))

	// Example response:
	// {
	//   "name": "projects/111111111111/locations/us-central1/cachedContents/1111111111111111111",
	//   "displayName": "example-cache",
	//   "model": "projects/111111111111/locations/us-central1/publishers/google/models/gemini-2.0-flash-001",
	//   "createTime": "2025-02-18T15:05:08.29468Z",
	//   "updateTime": "2025-02-18T15:05:08.29468Z",
	//   "expireTime": "2025-02-19T15:05:08.280828Z",
	//   "usageMetadata": {
	//     "imageCount": 167,
	//     "textCount": 153,
	//     "totalTokenCount": 43125
	//   }
	// }

	return res.Name, nil
}

REST

Vertex AI API를 사용하여 게시자 모델 엔드포인트에 POST 요청을 보내면 REST를 사용하여 컨텍스트 캐시를 만들 수 있습니다. 다음 예시에서는 Cloud Storage 버킷에 저장된 파일을 사용하여 컨텍스트 캐시를 만드는 방법을 보여줍니다.

요청 데이터를 사용하기 전에 다음을 바꿉니다.

PROJECT_ID: 프로젝트 ID입니다.
LOCATION: 요청을 처리하고 캐시된 콘텐츠가 저장되는 리전입니다. 지원되는 리전 목록은 사용 가능한 리전을 참조하세요.
CACHE_DISPLAY_NAME: 각 컨텍스트 캐시를 설명하고 식별하는 데 도움이 되는 의미 있는 표시 이름입니다.
MIME_TYPE: 캐시할 콘텐츠의 MIME 유형입니다.
CONTENT_TO_CACHE_URI: 캐시할 콘텐츠의 Cloud Storage URI입니다.
MODEL_ID: 캐싱에 사용할 모델입니다.

HTTP 메서드 및 URL:

POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/cachedContents

JSON 요청 본문:

{
  "model": "projects/PROJECT_ID/locations/LOCATION/publishers/google/models/MODEL_ID",
  "displayName": "CACHE_DISPLAY_NAME",
  "contents": [{
    "role": "user",
      "parts": [{
        "fileData": {
          "mimeType": "MIME_TYPE",
          "fileUri": "CONTENT_TO_CACHE_URI"
        }
      }]
  },
  {
    "role": "model",
      "parts": [{
        "text": "This is sample text to demonstrate explicit caching."
      }]
  }]
}

요청을 보내려면 다음 옵션 중 하나를 선택합니다.

curl

참고: 다음 명령어는 gcloud init 또는 gcloud auth login을 실행하거나 gcloud CLI에 자동으로 로그인하는 Cloud Shell을 사용하여 사용자 계정으로 gcloud CLI에 로그인했다고 가정합니다. gcloud auth list를 실행하면 현재 활성 계정을 확인할 수 있습니다.

요청 본문을 request.json 파일에 저장하고 다음 명령어를 실행합니다.

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/cachedContents"

PowerShell

참고: 다음 명령어는 gcloud init 또는 gcloud auth login을 실행하여 사용자 계정으로 gcloud CLI에 로그인했다고 가정합니다. gcloud auth list를 실행하면 현재 활성 계정을 확인할 수 있습니다.

요청 본문을 request.json 파일에 저장하고 다음 명령어를 실행합니다.

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/cachedContents" | Select-Object -Expand Content

다음과 비슷한 JSON 응답이 표시됩니다.

응답

{
  "name": "projects/PROJECT_NUMBER/locations/us-central1/cachedContents/CACHE_ID",
  "model": "projects/PROJECT_ID/locations/us-central1/publishers/google/models/gemini-2.0-flash-001",
  "createTime": "2024-06-04T01:11:50.808236Z",
  "updateTime": "2024-06-04T01:11:50.808236Z",
  "expireTime": "2024-06-04T02:11:50.794542Z"
}

curl 명령어 예시

LOCATION="us-central1"
MODEL_ID="gemini-2.0-flash-001"
PROJECT_ID="test-project"
MIME_TYPE="video/mp4"
CACHED_CONTENT_URI="gs://path-to-bucket/video-file-name.mp4"

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
https://${LOCATION}-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/${LOCATION}/cachedContents -d \
'{
  "model":"projects/${PROJECT_ID}/locations/${LOCATION}/publishers/google/models/${MODEL_ID}",
  "contents": [
    {
      "role": "user",
      "parts": [
        {
          "fileData": {
            "mimeType": "${MIME_TYPE}",
            "fileUri": "${CACHED_CONTENT_URI}"
          }
        }
      ]
    }
  ]
}'

CMEK로 컨텍스트 캐시 만들기

CMEK로 컨텍스트 캐싱을 구현하려면 안내에 따라 CMEK를 만들고 Vertex AI 제품별 프로젝트별 서비스 계정(P4SA)에 키에 필요한 Cloud KMS CryptoKey 암호화/복호화 권한이 있는지 확인합니다. 이렇게 하면 KMS 키를 반복적으로 지정하지 않고도 캐시된 콘텐츠를 안전하게 만들고 관리할 수 있으며 {List, Update, Delete, Get} CachedContent와 같은 다른 호출을 실행할 수 있습니다.

REST

요청 데이터를 사용하기 전에 다음을 바꿉니다.

PROJECT_ID: 프로젝트 ID입니다.
LOCATION: 요청을 처리하고 캐시된 콘텐츠가 저장되는 리전입니다. 지원되는 리전 목록은 사용 가능한 리전을 참조하세요.
MODEL_ID: gemini-2.0-flash-001.
CACHE_DISPLAY_NAME: 각 컨텍스트 캐시를 설명하고 식별하는 데 도움이 되는 의미 있는 표시 이름입니다.
MIME_TYPE: 캐시할 콘텐츠의 MIME 유형입니다.
CACHED_CONTENT_URI: 캐시할 콘텐츠의 Cloud Storage URI입니다.
KMS_KEY_NAME: Cloud KMS 키 이름입니다.

HTTP 메서드 및 URL:

POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/cachedContents

JSON 요청 본문:

{
  "model": "projects/PROJECT_ID/locations/LOCATION/publishers/google/models/gemini-2.0-flash-001",
  "displayName": "CACHE_DISPLAY_NAME",
  "contents": [{
    "role": "user",
      "parts": [{
        "fileData": {
          "mimeType": "MIME_TYPE",
          "fileUri": "CONTENT_TO_CACHE_URI"
        }
      }]}],
    "encryptionSpec": {
      "kmsKeyName": "KMS_KEY_NAME"
    }
}

요청을 보내려면 다음 옵션 중 하나를 선택합니다.

curl

요청 본문을 request.json 파일에 저장하고 다음 명령어를 실행합니다.

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/cachedContents"

PowerShell

요청 본문을 request.json 파일에 저장하고 다음 명령어를 실행합니다.

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/cachedContents" | Select-Object -Expand Content

다음과 비슷한 JSON 응답이 표시됩니다.

응답

{
  "name": "projects/PROJECT_NUMBER/locations/us-central1/cachedContents/CACHE_ID",
  "model": "projects/PROJECT_ID/locations/us-central1/publishers/google/models/gemini-2.0-flash-001",
  "createTime": "2024-06-04T01:11:50.808236Z",
  "updateTime": "2024-06-04T01:11:50.808236Z",
  "expireTime": "2024-06-04T02:11:50.794542Z"
}

curl 명령어 예시

LOCATION="us-central1"
MODEL_ID="gemini-2.0-flash-001"
PROJECT_ID="test-project"
MIME_TYPE="video/mp4"
CACHED_CONTENT_URI="gs://path-to-bucket/video-file-name.mp4"
KMS_KEY_NAME="projects/${PROJECT_ID}/locations/{LOCATION}/keyRings/your-key-ring/cryptoKeys/your-key"

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
https://${LOCATION}-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/${LOCATION}/cachedContents -d \
'{

"model": "projects/{PROJECT_ID}}/locations/{LOCATION}/publishers/google/models/{MODEL_ID}",
  "contents" : [
    {
      "role": "user",
      "parts": [
        {
          "file_data": {
            "mime_type":"{MIME_TYPE}",
            "file_uri":"{CACHED_CONTENT_URI}"
          }
        }
      ]
    }
  ],
  "encryption_spec" :
  {
    "kms_key_name":"{KMS_KEY_NAME}"
  }
}'

Python용 Gen AI SDK

설치

pip install --upgrade google-genai

자세한 내용은 SDK 참고 문서를 참조하세요.

Vertex AI에서 Gen AI SDK를 사용하도록 환경 변수를 설정합니다.

# Replace the `GOOGLE_CLOUD_PROJECT` and `GOOGLE_CLOUD_LOCATION` values
# with appropriate values for your project.
export GOOGLE_CLOUD_PROJECT=GOOGLE_CLOUD_PROJECT
export GOOGLE_CLOUD_LOCATION=us-central1
export GOOGLE_GENAI_USE_VERTEXAI=True

import os
from google import genai
from google.genai.types import Content, CreateCachedContentConfig, HttpOptions, Part

os.environ['GOOGLE_CLOUD_PROJECT'] = 'vertexsdk'
os.environ['GOOGLE_CLOUD_LOCATION'] = 'us-central1'
os.environ['GOOGLE_GENAI_USE_VERTEXAI'] = 'True'
  
client = genai.Client(http_options=HttpOptions(api_version="v1"))

system_instruction = """
You are an expert researcher. You always stick to the facts in the sources provided, and never make up new facts.
Now look at these research papers, and answer the following questions.
"""

contents = [
    Content(
        role="user",
        parts=[
            Part.from_uri(
                file_uri="gs://cloud-samples-data/generative-ai/pdf/2312.11805v3.pdf",
                mime_type="application/pdf",
            ),
            Part.from_uri(
                file_uri="gs://cloud-samples-data/generative-ai/pdf/2403.05530.pdf",
                mime_type="application/pdf",
            ),
        ],
    )
]

content_cache = client.caches.create(
    model="gemini-2.0-flash-001",
    config=CreateCachedContentConfig(
        contents=contents,
        system_instruction=system_instruction,
        display_name="example-cache",
        kms_key_name="projects/vertexsdk/locations/us-central1/keyRings/your-project/cryptoKeys/your-key",
        ttl="86400s",
    ),
)

print(content_cache.name)
print(content_cache.usage_metadata)

Go용 Gen AI SDK

Go용 Gen AI SDK를 설치하거나 업데이트하는 방법을 알아보세요.

자세한 내용은 SDK 참고 문서를 참조하세요.

Vertex AI에서 Gen AI SDK를 사용하도록 환경 변수를 설정합니다.


import (
    "context"
    "encoding/json"
    "fmt"
    "io"

    genai "google.golang.org/genai"
)

// createContentCache shows how to create a content cache with an expiration parameter.
func createContentCache(w io.Writer) (string, error) {
    ctx := context.Background()

    client, err := genai.NewClient(ctx, &genai.ClientConfig{
        HTTPOptions: genai.HTTPOptions{APIVersion: "v1beta1"},
    })
    if err != nil {
        return "", fmt.Errorf("failed to create genai client: %w", err)
    }

    modelName := "gemini-2.0-flash-001"

    systemInstruction := "You are an expert researcher. You always stick to the facts " +
        "in the sources provided, and never make up new facts. " +
        "Now look at these research papers, and answer the following questions."

    cacheContents := []*genai.Content{
        {
            Parts: []*genai.Part{
                {FileData: &genai.FileData{
                    FileURI:  "gs://cloud-samples-data/generative-ai/pdf/2312.11805v3.pdf",
                    MIMEType: "application/pdf",
                }},
                {FileData: &genai.FileData{
                    FileURI:  "gs://cloud-samples-data/generative-ai/pdf/2403.05530.pdf",
                    MIMEType: "application/pdf",
                }},
            },
            Role: "user",
        },
    }
    config := &genai.CreateCachedContentConfig{
        Contents: cacheContents,
        SystemInstruction: &genai.Content{
            Parts: []*genai.Part{
                {Text: systemInstruction},
            },
        },
        DisplayName: "example-cache",
        KmsKeyName:  "projects/vertexsdk/locations/us-central1/keyRings/your-project/cryptoKeys/your-key",
        TTL:         "86400s",
    }

    res, err := client.Caches.Create(ctx, modelName, config)
    if err != nil {
        return "", fmt.Errorf("failed to create content cache: %w", err)
    }

    cachedContent, err := json.MarshalIndent(res, "", "  ")
    if err != nil {
        return "", fmt.Errorf("failed to marshal cache info: %w", err)
    }

    // See the documentation: https://pkg.go.dev/google.golang.org/genai#CachedContent
    fmt.Fprintln(w, string(cachedContent))

    return res.Name, nil
}

다음 단계

컨텍스트 캐시 사용 방법 알아보기
컨텍스트 캐시 만료 시간 업데이트 방법 알아보기

컨텍스트 캐시 만들기 컬렉션을 사용해 정리하기 내 환경설정을 기준으로 콘텐츠를 저장하고 분류하세요.

한도

위치 지원

암호화 키 지원

액세스 투명성 지원

컨텍스트 캐시 만들기 예시

Python

설치

Go

REST

curl

PowerShell

응답

curl 명령어 예시

CMEK로 컨텍스트 캐시 만들기

REST

curl

PowerShell

응답

curl 명령어 예시

Python용 Gen AI SDK

Go용 Gen AI SDK

다음 단계

컨텍스트 캐시 만들기