Call Gemini with the Chat Completions API
The following sample shows you how to send non-streaming requests:
REST
curl -X POST \ -H "Authorization: Bearer $(gcloud auth print-access-token)" \ -H "Content-Type: application/json" \ https://${LOCATION}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/${LOCATION}/endpoints/openapi/chat/completions \ -d '{ "model": "google/${MODEL_ID}", "messages": [{ "role": "user", "content": "Write a story about a magic backpack." }] }'
Python
Before trying this sample, follow the Python setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Python API reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
The following sample shows you how to send streaming requests to a Gemini model by using the Chat Completions API:
REST
curl -X POST \ -H "Authorization: Bearer $(gcloud auth print-access-token)" \ -H "Content-Type: application/json" \ https://${LOCATION}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/${LOCATION}/endpoints/openapi/chat/completions \ -d '{ "model": "google/${MODEL_ID}", "stream": true, "messages": [{ "role": "user", "content": "Write a story about a magic backpack." }] }'
Python
Before trying this sample, follow the Python setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Python API reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Call a self-deployed model with the Chat Completions API
The following sample shows you how to send non-streaming requests:
REST
curl -X POST \ -H "Authorization: Bearer $(gcloud auth print-access-token)" \ -H "Content-Type: application/json" \ https://aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/global/endpoints/${ENDPOINT}/chat/completions \ -d '{ "messages": [{ "role": "user", "content": "Write a story about a magic backpack." }] }'
Python
Before trying this sample, follow the Python setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Python API reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
The following sample shows you how to send streaming requests to a self-deployed model by using the Chat Completions API:
REST
curl -X POST \ -H "Authorization: Bearer $(gcloud auth print-access-token)" \ -H "Content-Type: application/json" \ https://aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/global/endpoints/${ENDPOINT}/chat/completions \ -d '{ "stream": true, "messages": [{ "role": "user", "content": "Write a story about a magic backpack." }] }'
Python
Before trying this sample, follow the Python setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Python API reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
extra_body
examples
You can use either the SDK or the REST API to pass in extra_body
.
Add thought_tag_marker
{
...,
"extra_body": {
"google": {
...,
"thought_tag_marker": "..."
}
}
}
Add extra_body
using the SDK
client.chat.completions.create(
...,
extra_body = {
'extra_body': { 'google': { ... } }
},
)
extra_content
examples
You can populate this field by using the REST API directly.
extra_content
with string content
{
"messages": [
{ "role": "...", "content": "...", "extra_content": { "google": { ... } } }
]
}
Per-message extra_content
{
"messages": [
{
"role": "...",
"content": [
{ "type": "...", ..., "extra_content": { "google": { ... } } }
]
}
}
Per-tool call extra_content
{
"messages": [
{
"role": "...",
"tool_calls": [
{
...,
"extra_content": { "google": { ... } }
}
]
}
]
}
Sample curl
requests
You can use these curl
requests directly, rather than going through the SDK.
Multimodal requests
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
https://aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/global/endpoints/openapi/chat/completions \
-d '{ \
"model": "google/gemini-2.0-flash-001", \
"messages": [ \
{ "role": "user", \
"content": [ \
{ "type": "text", "text": "Summarize the main points of this audio file: " }, \
{ "type": "input_audio", "input_audio": { \
"format": "audio/mp3", \
"data": "gs://cloud-samples-data/generative-ai/audio/pixel.mp3" } }] }] }'
Structured output
You can use the response_format
parameter to get structured output.
Example using SDK
from pydantic import BaseModel
from openai import OpenAI
client = OpenAI()
class CalendarEvent(BaseModel):
name: str
date: str
participants: list[str]
completion = client.beta.chat.completions.parse(
model="google/gemini-2.5-flash-preview-04-17",
messages=[
{"role": "system", "content": "Extract the event information."},
{"role": "user", "content": "Alice and Bob are going to a science fair on Friday."},
],
response_format=CalendarEvent,
)
print(completion.choices[0].message.parsed)
What's next
- See examples of calling the Inference API with the OpenAI-compatible syntax.
- See examples of calling the Function Calling API with OpenAI-compatible syntax.
- Learn more about the Gemini API.
- Learn more about migrating from Azure OpenAI to the Gemini API.