Examples

This guide provides examples of how to use the OpenAI-compatible Chat Completions API with Gemini models, covering the following topics:

You can call the Chat Completions API in two ways:

Method Description Use Case
Call a managed Gemini model Send requests to a Google-managed endpoint for a specific Gemini model. Best for general use cases, quick setup, and accessing the latest Google models without managing infrastructure.
Call a self-deployed model Send requests to an endpoint that you create by deploying a model on Vertex AI. Ideal when you need a dedicated endpoint for a fine-tuned model or require specific configurations not available on the default endpoint.

Call Gemini with the Chat Completions API

You can send requests as either non-streaming or streaming.

Request Type Description Pros Cons
Non-streaming The full response is generated and then sent back in a single chunk. Simpler to implement; the complete response is available at once. Higher perceived latency because the user waits for the entire response to be generated.
Streaming The response is sent back in small chunks as it's being generated. To enable streaming, set "stream": true in the request body. Lower perceived latency; provides a more interactive experience as the response appears incrementally. Requires more complex client-side logic to handle the incoming stream of data.

Send a non-streaming request

The following sample shows how to send a non-streaming request.

REST

  curl -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
  https://${LOCATION}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/${LOCATION}/endpoints/openapi/chat/completions \
  -d '{
    "model": "google/${MODEL_ID}",
    "messages": [{
      "role": "user",
      "content": "Write a story about a magic backpack."
    }]
  }'
  

Python

Before trying this sample, follow the Python setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Python API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

from google.auth import default
import google.auth.transport.requests

import openai

# TODO(developer): Update and un-comment below lines
# project_id = "PROJECT_ID"
# location = "us-central1"

# Programmatically get an access token
credentials, _ = default(scopes=["https://www.googleapis.com/auth/cloud-platform"])
credentials.refresh(google.auth.transport.requests.Request())

# OpenAI Client
client = openai.OpenAI(
    base_url=f"https://{location}-aiplatform.googleapis.com/v1/projects/{project_id}/locations/{location}/endpoints/openapi",
    api_key=credentials.token,
)

response = client.chat.completions.create(
    model="google/gemini-2.0-flash-001",
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
)

print(response)

Send a streaming request

The following sample shows how to send a streaming request by setting "stream": true.

REST

  curl -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
  https://${LOCATION}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/${LOCATION}/endpoints/openapi/chat/completions \
  -d '{
    "model": "google/${MODEL_ID}",
    "stream": true,
    "messages": [{
      "role": "user",
      "content": "Write a story about a magic backpack."
    }]
  }'
  

Python

Before trying this sample, follow the Python setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Python API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

from google.auth import default
import google.auth.transport.requests

import openai

# TODO(developer): Update and un-comment below lines
# project_id = "PROJECT_ID"
# location = "us-central1"

# Programmatically get an access token
credentials, _ = default(scopes=["https://www.googleapis.com/auth/cloud-platform"])
credentials.refresh(google.auth.transport.requests.Request())

# OpenAI Client
client = openai.OpenAI(
    base_url=f"https://{location}-aiplatform.googleapis.com/v1/projects/{project_id}/locations/{location}/endpoints/openapi",
    api_key=credentials.token,
)

response = client.chat.completions.create(
    model="google/gemini-2.0-flash-001",
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
    stream=True,
)
for chunk in response:
    print(chunk)

Send a prompt and an image to the Gemini API in Vertex AI

The following sample shows how to send a multimodal request that includes text and an image.

Python

Before trying this sample, follow the Python setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Python API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.


from google.auth import default
import google.auth.transport.requests

import openai

# TODO(developer): Update and un-comment below lines
# project_id = "PROJECT_ID"
# location = "us-central1"

# Programmatically get an access token
credentials, _ = default(scopes=["https://www.googleapis.com/auth/cloud-platform"])
credentials.refresh(google.auth.transport.requests.Request())

# OpenAI Client
client = openai.OpenAI(
    base_url=f"https://{location}-aiplatform.googleapis.com/v1/projects/{project_id}/locations/{location}/endpoints/openapi",
    api_key=credentials.token,
)

response = client.chat.completions.create(
    model="google/gemini-2.0-flash-001",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe the following image:"},
                {
                    "type": "image_url",
                    "image_url": "gs://cloud-samples-data/generative-ai/image/scones.jpg",
                },
            ],
        }
    ],
)

print(response)

Call a self-deployed model with the Chat Completions API

Send a non-streaming request

The following sample shows how to send a non-streaming request to a self-deployed model.

REST

  curl -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
  https://aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/global/endpoints/${ENDPOINT}/chat/completions \
  -d '{
    "messages": [{
      "role": "user",
      "content": "Write a story about a magic backpack."
    }]
  }'

Python

Before trying this sample, follow the Python setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Python API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

from google.auth import default
import google.auth.transport.requests

import openai

# TODO(developer): Update and un-comment below lines
# project_id = "PROJECT_ID"
# location = "us-central1"
# model_id = "gemma-2-9b-it"
# endpoint_id = "YOUR_ENDPOINT_ID"

# Programmatically get an access token
credentials, _ = default(scopes=["https://www.googleapis.com/auth/cloud-platform"])
credentials.refresh(google.auth.transport.requests.Request())

# OpenAI Client
client = openai.OpenAI(
    base_url=f"https://{location}-aiplatform.googleapis.com/v1/projects/{project_id}/locations/{location}/endpoints/{endpoint_id}",
    api_key=credentials.token,
)

response = client.chat.completions.create(
    model=model_id,
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
)
print(response)

Send a streaming request

The following sample shows how to send a streaming request to a self-deployed model.

REST

    curl -X POST \
      -H "Authorization: Bearer $(gcloud auth print-access-token)" \
      -H "Content-Type: application/json" \
    https://aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/global/endpoints/${ENDPOINT}/chat/completions \
    -d '{
      "stream": true,
      "messages": [{
        "role": "user",
        "content": "Write a story about a magic backpack."
      }]
    }'
  

Python

Before trying this sample, follow the Python setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Python API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

from google.auth import default
import google.auth.transport.requests

import openai

# TODO(developer): Update and un-comment below lines
# project_id = "PROJECT_ID"
# location = "us-central1"
# model_id = "gemma-2-9b-it"
# endpoint_id = "YOUR_ENDPOINT_ID"

# Programmatically get an access token
credentials, _ = default(scopes=["https://www.googleapis.com/auth/cloud-platform"])
credentials.refresh(google.auth.transport.requests.Request())

# OpenAI Client
client = openai.OpenAI(
    base_url=f"https://{location}-aiplatform.googleapis.com/v1/projects/{project_id}/locations/{location}/endpoints/{endpoint_id}",
    api_key=credentials.token,
)

response = client.chat.completions.create(
    model=model_id,
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
    stream=True,
)
for chunk in response:
    print(chunk)

extra_body examples

You can use the extra_body field to pass Google-specific parameters in your request.

  • REST API: To pass parameters using the REST API, add them within a google object.

    {
      ...,
      "extra_body": {
         "google": {
           ...,
           "thought_tag_marker": "..."
         }
       }
    }
    
  • Python SDK: To pass parameters using the Python SDK, provide them in a dictionary to the extra_body argument.

    client.chat.completions.create(
      ...,
      extra_body = {
        'extra_body': { 'google': { ... } }
      },
    )
    

extra_content examples

You can use the extra_content field with the REST API to add extra information to messages or tool calls.

  • With a string content field

    {
      "messages": [
        { "role": "...", "content": "...", "extra_content": { "google": { ... } } }
      ]
    }
    
  • Per message in a multipart content field

    {
      "messages": [
        {
          "role": "...",
          "content": [
            { "type": "...", ..., "extra_content": { "google": { ... } } }
          ]
        }
    }
    
  • Per tool call

    {
      "messages": [
        {
          "role": "...",
          "tool_calls": [
            {
              ...,
              "extra_content": { "google": { ... } }
            }
          ]
        }
      ]
    }
    

Sample curl requests

You can use these curl requests to interact with the API directly, without using an SDK.

Use thinking_config with extra_body

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://us-central1-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/us-central1/endpoints/openapi/chat/completions \
  -d '{ \
    "model": "google/gemini-2.5-flash-preview-04-17", \
    "messages": [ \
      { "role": "user", \
      "content": [ \
        { "type": "text", \
          "text": "Are there any primes number of the form n*ceil(log(n))" \
        }] }], \
    "extra_body": { \
      "google": { \
          "thinking_config": { \
          "include_thoughts": true, "thinking_budget": 10000 \
        }, \
        "thought_tag_marker": "think" } }, \
    "stream": true }'

Multimodal requests

The Chat Completions API supports a variety of multimodal input, including audio and video.

Pass image data with image_url

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://us-central1-aiplatform.googleapis.com/v1/projects/${PROJECT}/locations/us-central1/endpoints/openapi/chat/completions \
  -d '{ \
    "model": "google/gemini-2.0-flash-001", \
    "messages": [{ "role": "user", "content": [ \
      { "type": "text", "text": "Describe this image" }, \
      { "type": "image_url", "image_url": "gs://cloud-samples-data/generative-ai/image/scones.jpg" }] }] }'

Pass audio data with input_audio

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://us-central1-aiplatform.googleapis.com/v1/projects/${PROJECT}/locations/us-central1/endpoints/openapi/chat/completions \
  -d '{ \
    "model": "google/gemini-2.0-flash-001", \
    "messages": [ \
      { "role": "user", \
        "content": [ \
          { "type": "text", "text": "Describe this: " }, \
          { "type": "input_audio", "input_audio": { \
            "format": "audio/mp3", \
            "data": "gs://cloud-samples-data/generative-ai/audio/pixel.mp3" } }] }] }'

Structured output

You can use the response_format parameter to request structured JSON output from the model.

Example with the Python SDK

from pydantic import BaseModel
from openai import OpenAI

client = OpenAI()

class CalendarEvent(BaseModel):
    name: str
    date: str
    participants: list[str]

completion = client.beta.chat.completions.parse(
    model="google/gemini-2.5-flash-preview-04-17",
    messages=[
        {"role": "system", "content": "Extract the event information."},
        {"role": "user", "content": "Alice and Bob are going to a science fair on Friday."},
    ],
    response_format=CalendarEvent,
)

print(completion.choices[0].message.parsed)

What's next