Starting April 29, 2025, Gemini 1.5 Pro and Gemini 1.5 Flash models are not available in projects that have no prior usage of these models, including new projects. For details, see Model versions and lifecycle.
This document shows how to use the OpenAI-compatible Chat Completions API to interact with Vertex AI models. This document covers the following topics:
Supported models: Learn which Gemini and self-deployed Model Garden models are compatible with the API.
Supported parameters: Review the list of standard OpenAI parameters that you can use.
Gemini-specific parameters: Discover how to use Gemini-specific features through the extra_body and extra_part fields.
The Chat Completions API is an OpenAI-compatible endpoint that lets you use OpenAI Python and REST libraries to interact with Gemini on Vertex AI. If you already use the OpenAI libraries, this API offers a way to switch between OpenAI models and Vertex AI hosted models to compare output, cost, and scalability with minimal changes to your existing code. If you don't use the OpenAI libraries, we recommend using the Google Gen AI SDK.
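To illustrate the switch, the following sketch shows the endpoint URL form and request body shape used when pointing the OpenAI client at Vertex AI. The project ID, location, and model name are placeholders, and the exact URL form and token flow are assumptions; see the authentication documentation for the definitive setup. In practice you pass a short-lived access token (for example, from `gcloud auth print-access-token`) as the OpenAI `api_key`.

```python
# Sketch: the OpenAI-compatible base URL and request body for Vertex AI.
# URL form and model naming are assumptions; consult the auth docs.

def vertex_openai_base_url(project_id: str, location: str = "us-central1") -> str:
    """Base URL for the OpenAI SDK's `base_url` argument (assumed form)."""
    return (
        f"https://{location}-aiplatform.googleapis.com/v1beta1/projects/"
        f"{project_id}/locations/{location}/endpoints/openapi"
    )

request_body = {
    # Gemini models are addressed with a "google/" prefix in this API.
    "model": "google/gemini-2.0-flash",
    "messages": [{"role": "user", "content": "Why is the sky blue?"}],
}
```

With the OpenAI Python library installed, you would pass `base_url=vertex_openai_base_url(...)` and your access token as `api_key` when constructing the client, then call `client.chat.completions.create(**request_body)` unchanged.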
Supported models
The Chat Completions API supports both Gemini models and select self-deployed models from Model Garden.
Gemini models
The Chat Completions API supports the following Gemini models:
Gemini 2.5 Pro
Gemini 2.5 Flash
Gemini 2.0 Flash
Gemini 2.0 Flash-Lite
Self-deployed models from Model Garden
The Hugging Face Text Generation Interface (HF TGI) and Vertex AI Model Garden prebuilt vLLM containers support the Chat Completions API. However, not every model deployed to these containers supports the Chat Completions API. To see whether a model does, consult the model's documentation.
Supported parameters
For Google models, the Chat Completions API supports the following OpenAI
parameters. For a description of each parameter, see OpenAI's documentation on
Creating chat completions.
Parameter support for third-party models varies by model. To see which parameters
are supported, consult the model's documentation.
messages
System message
User message: The text and
image_url types are supported. The
image_url type supports images stored as a
Cloud Storage URI or as a base64 encoding in the form
"data:<MIME-TYPE>;base64,<BASE64-ENCODED-BYTES>". To
learn how to create a Cloud Storage bucket and upload a file to it,
see
Discover object storage.
The detail option is not supported.
Assistant message
Tool message
Function message: This field is deprecated, but supported for backwards compatibility.
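Putting the message types together, a user message that mixes text with an image stored in Cloud Storage might look like the following sketch; the bucket and object names are placeholders.

```python
# Sketch: a user message mixing text and image_url content parts.
# The Cloud Storage URI is a placeholder; the detail option is not supported.
user_message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "Describe this image."},
        {
            "type": "image_url",
            "image_url": {"url": "gs://my-bucket/cat.png"},
        },
    ],
}
```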
model
max_completion_tokens
Alias for max_tokens.
max_tokens
n
frequency_penalty
presence_penalty
reasoning_effort
Configures how much time and how many tokens the model spends thinking before it responds. Each level maps to a thinking token budget:
low: 1024
medium: 8192
high: 24576
Thoughts are not included in the response. Specify only one of
reasoning_effort or extra_body.google.thinking_config.
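The two mutually exclusive ways to control the thinking budget can be sketched as follows. The field names inside thinking_config follow the Gemini API's thinking configuration and are assumptions here; the model name is a placeholder.

```python
# Sketch: set the thinking budget with reasoning_effort (OpenAI-style)
# OR extra_body.google.thinking_config -- never both in one request.
openai_style = {
    "model": "google/gemini-2.5-flash",
    "messages": [{"role": "user", "content": "Plan a proof outline."}],
    "reasoning_effort": "medium",  # maps to a budget of 8192 tokens
}

gemini_style = {
    "model": "google/gemini-2.5-flash",
    "messages": [{"role": "user", "content": "Plan a proof outline."}],
    # Field names assumed from the Gemini API's thinkingConfig.
    "extra_body": {"google": {"thinking_config": {"thinking_budget": 8192}}},
}
```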
response_format
json_object: Interpreted as passing "application/json" to the
Gemini API.
text: Interpreted as passing "text/plain" to the Gemini
API.
Any other MIME type is passed as is to the model, such as passing
"application/json" directly.
seed
Corresponds to GenerationConfig.seed.
stop
stream
temperature
top_p
tools
type
function
name
description
parameters: Specify parameters by using the
OpenAPI specification.
This differs from the OpenAI parameters field, which is
described as a JSON Schema object. To learn about keyword
differences between OpenAPI and JSON Schema, see the
OpenAPI guide.
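A tool definition following the structure above might look like this sketch. The get_weather function is hypothetical, and the schema keywords shown are ones shared by OpenAPI and JSON Schema.

```python
# Sketch: a function tool whose parameters use OpenAPI-style schema keywords.
# get_weather is a hypothetical function for illustration.
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a location.",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City name"},
            },
            "required": ["location"],
        },
    },
}
```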
tool_choice
none
auto
required: Corresponds to the mode ANY in the
FunctionCallingConfig.
validated: Corresponds to the mode VALIDATED
in the FunctionCallingConfig. This is Google-specific.
web_search_options
Corresponds to the GoogleSearch tool. No sub-options are
supported.
function_call
This field is deprecated, but supported for backwards
compatibility.
functions
This field is deprecated, but supported for backwards
compatibility.
If you pass any unsupported parameter, it is ignored.
Multimodal input parameters
The Chat Completions API supports select multimodal inputs.
input_audio
data: Any URI or valid blob format. We support all blob types,
including image, audio, and video. Anything supported by
GenerateContent is supported (HTTP, Cloud Storage, etc.).
format: OpenAI supports both wav (audio/wav)
and mp3 (audio/mp3). With Gemini, all valid MIME
types are supported.
image_url
data: Like input_audio, any URI or valid blob
format is supported.
An image_url passed as a URL defaults to the image/* MIME type,
while an image_url passed as blob data can carry any multimodal input.
detail: Similar to
media resolution,
this determines the maximum tokens per image for the request. While
OpenAI's field is per-image, Gemini enforces the same detail setting across
the entire request; passing multiple detail values in one request results in
an error.
In general, the data parameter can be a URI or a combination of MIME type and
base64 encoded bytes in the form "data:<MIME-TYPE>;base64,<BASE64-ENCODED-BYTES>".
For a full list of MIME types, see GenerateContent.
For more information on OpenAI's base64 encoding, see their documentation.
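Building the data-URI form described above from raw bytes is a one-liner with the standard library, as in this sketch:

```python
import base64

# Sketch: build the "data:<MIME-TYPE>;base64,<BASE64-ENCODED-BYTES>" form.
def to_data_uri(mime_type: str, raw_bytes: bytes) -> str:
    encoded = base64.b64encode(raw_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"
```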
To use features that are supported by Gemini but not by OpenAI models, pass them as parameters within an extra_content or extra_body field. If you pass these features outside of these fields, they are ignored.
extra_body features
To use Gemini-specific extra_body features, include them in a google field.
{...,"extra_body":{"google":{...,// Add extra_body features here.}}}
thought_tag_marker
Used to separate a model's thoughts from its responses for models with Thinking available.
If not specified, no tags are returned around the model's thoughts. If present, subsequent
queries strip the thought tags and mark the thoughts appropriately, which preserves the
appropriate context for subsequent queries.
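For example, a request that sets a thought tag marker via extra_body might look like this sketch; the marker string is an arbitrary value of your choosing, and the field name follows the thought_tag_marker reference in the extra_content section.

```python
# Sketch: setting a thought tag marker through extra_body.google.
# The marker value "think" is a placeholder of your choosing.
thought_request = {
    "model": "google/gemini-2.5-flash",
    "messages": [{"role": "user", "content": "Plan a three-day trip."}],
    "extra_body": {"google": {"thought_tag_marker": "think"}},
}
```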
extra_part features
The extra_part field lets you specify additional settings for each Part. To use Gemini-specific extra_part features, include them in a google field.
{...,"extra_part":{"google":{...,// Add extra_part features here.}}}
extra_content
A field for adding Gemini-specific content that shouldn't be
ignored.
thought
Explicitly marks whether a field is a thought (taking precedence over
thought_tag_marker). Use it to specify whether a tool call
is part of a thought.
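Marking an assistant tool call as part of a thought might look like the following sketch. The placement of the extra_content field on the message follows the description above and is an assumption; the tool call itself is illustrative.

```python
# Sketch: explicitly marking an assistant tool call as part of a thought.
# Field placement is an assumption; the tool call values are placeholders.
assistant_turn = {
    "role": "assistant",
    "tool_calls": [
        {
            "id": "call_1",
            "type": "function",
            "function": {"name": "get_weather", "arguments": '{"location": "Paris"}'},
        }
    ],
    "extra_content": {"google": {"thought": True}},
}
```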
[[["Easy to understand","easyToUnderstand","thumb-up"],["Solved my problem","solvedMyProblem","thumb-up"],["Other","otherUp","thumb-up"]],[["Hard to understand","hardToUnderstand","thumb-down"],["Incorrect information or sample code","incorrectInformationOrSampleCode","thumb-down"],["Missing the information/samples I need","missingTheInformationSamplesINeed","thumb-down"],["Other","otherDown","thumb-down"]],["Last updated 2025-08-27 UTC."],[],[],null,["# Using OpenAI libraries with Vertex AI\n\n| To see an example of using the Chat Completions API,\n| run the \"Call Gemini with the OpenAI Library\" notebook in one of the following\n| environments:\n|\n| [Open in Colab](https://colab.research.google.com/github/GoogleCloudPlatform/generative-ai/blob/main/gemini/chat-completions/intro_chat_completions_api.ipynb)\n|\n|\n| \\|\n|\n| [Open in Colab Enterprise](https://console.cloud.google.com/vertex-ai/colab/import/https%3A%2F%2Fraw.githubusercontent.com%2FGoogleCloudPlatform%2Fgenerative-ai%2Fmain%2Fgemini%2Fchat-completions%2Fintro_chat_completions_api.ipynb)\n|\n|\n| \\|\n|\n| [Open\n| in Vertex AI Workbench](https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https%3A%2F%2Fraw.githubusercontent.com%2FGoogleCloudPlatform%2Fgenerative-ai%2Fmain%2Fgemini%2Fchat-completions%2Fintro_chat_completions_api.ipynb)\n|\n|\n| \\|\n|\n| [View on GitHub](https://github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/chat-completions/intro_chat_completions_api.ipynb)\n\nThe Chat Completions API works as an Open AI-compatible endpoint, designed to\nmake it easier to interface with Gemini on Vertex AI by\nusing the OpenAI libraries for Python and REST. 
If you're already using the\nOpenAI libraries, you can use this API as a low-cost way to switch between\ncalling OpenAI models and Vertex AI hosted models to compare\noutput, cost, and scalability, without changing your existing code.\nIf you aren't already using the OpenAI libraries, we recommend that you\n[use the Google Gen AI SDK](/vertex-ai/generative-ai/docs/start/quickstarts/quickstart-multimodal).\n\nSupported models\n----------------\n\nThe Chat Completions API supports both Gemini models and select\nself-deployed models from Model Garden.\n\n### Gemini models\n\nThe following models provide support for the Chat Completions API:\n\n- [Gemini 2.5 Pro](/vertex-ai/generative-ai/docs/models/gemini/2-5-pro)\n- [Gemini 2.5 Flash](/vertex-ai/generative-ai/docs/models/gemini/2-5-flash)\n- [Gemini 2.0 Flash](/vertex-ai/generative-ai/docs/models/gemini/2-0-flash)\n- [Gemini 2.0 Flash-Lite](/vertex-ai/generative-ai/docs/models/gemini/2-0-flash-lite)\n\n### Self-deployed models from Model Garden\n\nThe\n[Hugging Face Text Generation Interface (HF TGI)](https://huggingface.co/docs/text-generation-inference/en/index)\nand\n[Vertex AI Model Garden prebuilt vLLM](http://us-docker.pkg.dev/vertex-ai/vertex-vision-model-garden-dockers/pytorch-vllm-serve)\ncontainers support the Chat Completions API. However,\nnot every model deployed to these containers supports the Chat Completions API.\nThe following table includes the most popular supported models by container:\n\nSupported parameters\n--------------------\n\nFor Google models, the Chat Completions API supports the following OpenAI\nparameters. For a description of each parameter, see OpenAI's documentation on\n[Creating chat completions](https://platform.openai.com/docs/api-reference/chat/create).\nParameter support for third-party models varies by model. 
To see which parameters\nare supported, consult the model's documentation.\n\nIf you pass any unsupported parameter, it is ignored.\n\n### Multimodal input parameters\n\nThe Chat Completions API supports select multimodal inputs.\n\nIn general, the `data` parameter can be a URI or a combination of MIME type and\nbase64 encoded bytes in the form `\"data:\u003cMIME-TYPE\u003e;base64,\u003cBASE64-ENCODED-BYTES\u003e\"`.\nFor a full list of MIME types, see [`GenerateContent`](/vertex-ai/generative-ai/docs/model-reference/inference#blob).\nFor more information on OpenAI's base64 encoding, see [their documentation](https://platform.openai.com/docs/guides/images-vision#giving-a-model-images-as-input).\n\nFor usage, see our [multimodal input examples](/vertex-ai/generative-ai/docs/migrate/openai/examples#multimodal_input_examples).\n\n### Gemini-specific parameters\n\nThere are several features supported by Gemini that are not available in OpenAI models.\nThese features can still be passed in as parameters, but must be contained within an\n`extra_content` or `extra_body` or they will be ignored.\n\n### `extra_body` features\n\nInclude a `google` field to contain any Gemini-specific\n`extra_body` features. \n\n {\n ...,\n \"extra_body\": {\n \"google\": {\n ...,\n // Add extra_body features here.\n }\n }\n }\n\n### `extra_part` features\n\n`extra_part` lets you specify additional settings at a per-`Part` level.\n\nInclude a `google` field to contain any Gemini-specific\n`extra_part` features. 
\n\n {\n ...,\n \"extra_part\": {\n \"google\": {\n ...,\n // Add extra_part features here.\n }\n }\n }\n\nWhat's next\n-----------\n\n- Learn more about [authentication and credentialing](/vertex-ai/generative-ai/docs/migrate/openai/auth-and-credentials) with the OpenAI-compatible syntax.\n- See examples of calling the [Chat Completions API](/vertex-ai/generative-ai/docs/migrate/openai/examples) with the OpenAI-compatible syntax.\n- See examples of calling the [Inference API](/vertex-ai/generative-ai/docs/model-reference/inference#examples) with the OpenAI-compatible syntax.\n- See examples of calling the [Function Calling API](/vertex-ai/generative-ai/docs/model-reference/function-calling#examples) with OpenAI-compatible syntax.\n- Learn more about the [Gemini API](/vertex-ai/generative-ai/docs/overview).\n- Learn more about [migrating from Azure OpenAI to the Gemini API](/vertex-ai/generative-ai/docs/migrate/migrate-from-azure-to-gemini)."]]