This guide provides examples of how to use the OpenAI-compatible Chat Completions API with Gemini models, covering the following topics:

- Call Gemini with the Chat Completions API: Shows how to send streaming and non-streaming requests to managed Gemini models.
- Call a self-deployed model with the Chat Completions API: Provides examples for sending requests to your own deployed models.
- `extra_body` examples: Explains how to pass additional Google-specific parameters in your requests.
- `extra_content` examples: Demonstrates how to add extra content to messages or tool calls.
- Sample `curl` requests: Offers direct `curl` examples for advanced use cases like multimodal input.
- Structured output: Shows how to request structured JSON output from the model.
You can call the Chat Completions API in two ways:
Method | Description | Use Case |
---|---|---|
Call a managed Gemini model | Send requests to a Google-managed endpoint for a specific Gemini model. | Best for general use cases, quick setup, and accessing the latest Google models without managing infrastructure. |
Call a self-deployed model | Send requests to an endpoint that you create by deploying a model on Vertex AI. | Ideal when you need a dedicated endpoint for a fine-tuned model or require specific configurations not available on the default endpoint. |
Call Gemini with the Chat Completions API
You can send requests as either non-streaming or streaming.
Request Type | Description | Pros | Cons |
---|---|---|---|
Non-streaming | The full response is generated and then sent back in a single chunk. | Simpler to implement; the complete response is available at once. | Higher perceived latency because the user waits for the entire response to be generated. |
Streaming | The response is sent back in small chunks as it's being generated. To enable streaming, set `"stream": true` in the request body. | Lower perceived latency; provides a more interactive experience as the response appears incrementally. | Requires more complex client-side logic to handle the incoming stream of data. |
Send a non-streaming request
The following sample shows how to send a non-streaming request.
REST
```shell
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://${LOCATION}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/${LOCATION}/endpoints/openapi/chat/completions" \
  -d '{
    "model": "google/'"${MODEL_ID}"'",
    "messages": [{
      "role": "user",
      "content": "Write a story about a magic backpack."
    }]
  }'
```
Python
Before trying this sample, follow the Python setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Python API reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Send a streaming request
The following sample shows how to send a streaming request by setting `"stream": true`.
REST
```shell
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://${LOCATION}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/${LOCATION}/endpoints/openapi/chat/completions" \
  -d '{
    "model": "google/'"${MODEL_ID}"'",
    "stream": true,
    "messages": [{
      "role": "user",
      "content": "Write a story about a magic backpack."
    }]
  }'
```
Python
Before trying this sample, follow the Python setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Python API reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Send a prompt and an image to the Gemini API in Vertex AI
The following sample shows how to send a multimodal request that includes text and an image.
Python
Before trying this sample, follow the Python setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Python API reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Call a self-deployed model with the Chat Completions API
Send a non-streaming request
The following sample shows how to send a non-streaming request to a self-deployed model.
REST
```shell
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/global/endpoints/${ENDPOINT}/chat/completions" \
  -d '{
    "messages": [{
      "role": "user",
      "content": "Write a story about a magic backpack."
    }]
  }'
```
Python
Before trying this sample, follow the Python setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Python API reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Send a streaming request
The following sample shows how to send a streaming request to a self-deployed model.
REST
```shell
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/global/endpoints/${ENDPOINT}/chat/completions" \
  -d '{
    "stream": true,
    "messages": [{
      "role": "user",
      "content": "Write a story about a magic backpack."
    }]
  }'
```
Python
Before trying this sample, follow the Python setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Python API reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
`extra_body` examples

You can use the `extra_body` field to pass Google-specific parameters in your request.

REST API: To pass parameters using the REST API, add them within a `google` object.

```json
{
  ...,
  "extra_body": {
    "google": {
      ...,
      "thought_tag_marker": "..."
    }
  }
}
```

Python SDK: To pass parameters using the Python SDK, provide them in a dictionary to the `extra_body` argument.

```python
client.chat.completions.create(
    ...,
    extra_body={
        'extra_body': {
            'google': {
                ...
            }
        }
    },
)
```
`extra_content` examples

You can use the `extra_content` field with the REST API to add extra information to messages or tool calls.

With a string `content` field:

```json
{ "messages": [ { "role": "...", "content": "...", "extra_content": { "google": { ... } } } ] }
```

Per message in a multipart `content` field:

```json
{ "messages": [ { "role": "...", "content": [ { "type": "...", ..., "extra_content": { "google": { ... } } } ] } ] }
```

Per tool call:

```json
{ "messages": [ { "role": "...", "tool_calls": [ { ..., "extra_content": { "google": { ... } } } ] } ] }
```
Sample `curl` requests

You can use these `curl` requests to interact with the API directly, without using an SDK.
Use `thinking_config` with `extra_body`

```shell
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://us-central1-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/us-central1/endpoints/openapi/chat/completions" \
  -d '{
    "model": "google/gemini-2.5-flash-preview-04-17",
    "messages": [{
      "role": "user",
      "content": [{
        "type": "text",
        "text": "Are there any primes of the form n*ceil(log(n))?"
      }]
    }],
    "extra_body": {
      "google": {
        "thinking_config": {
          "include_thoughts": true,
          "thinking_budget": 10000
        },
        "thought_tag_marker": "think"
      }
    },
    "stream": true
  }'
```
Multimodal requests
The Chat Completions API supports a variety of multimodal input, including audio and video.
Pass image data with `image_url`

```shell
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://us-central1-aiplatform.googleapis.com/v1/projects/${PROJECT}/locations/us-central1/endpoints/openapi/chat/completions" \
  -d '{
    "model": "google/gemini-2.0-flash-001",
    "messages": [{
      "role": "user",
      "content": [
        { "type": "text", "text": "Describe this image" },
        { "type": "image_url", "image_url": "gs://cloud-samples-data/generative-ai/image/scones.jpg" }
      ]
    }]
  }'
```
Pass audio data with `input_audio`

```shell
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://us-central1-aiplatform.googleapis.com/v1/projects/${PROJECT}/locations/us-central1/endpoints/openapi/chat/completions" \
  -d '{
    "model": "google/gemini-2.0-flash-001",
    "messages": [{
      "role": "user",
      "content": [
        { "type": "text", "text": "Describe this: " },
        { "type": "input_audio", "input_audio": {
          "format": "audio/mp3",
          "data": "gs://cloud-samples-data/generative-ai/audio/pixel.mp3"
        } }
      ]
    }]
  }'
```
Structured output

You can use the `response_format` parameter to request structured JSON output from the model.
Example with the Python SDK:

```python
from pydantic import BaseModel
from openai import OpenAI

client = OpenAI()

class CalendarEvent(BaseModel):
    name: str
    date: str
    participants: list[str]

completion = client.beta.chat.completions.parse(
    model="google/gemini-2.5-flash-preview-04-17",
    messages=[
        {"role": "system", "content": "Extract the event information."},
        {"role": "user", "content": "Alice and Bob are going to a science fair on Friday."},
    ],
    response_format=CalendarEvent,
)

print(completion.choices[0].message.parsed)
```
What's next
- Learn how to call the Inference API with the OpenAI-compatible syntax.
- Learn how to call the Function Calling API with OpenAI-compatible syntax.
- Learn more about the Gemini API.
- Learn more about migrating from Azure OpenAI to the Gemini API.