Using OpenAI libraries with Vertex AI

The Chat Completions API works as an OpenAI-compatible endpoint, designed to make it easier to interface with Gemini on Vertex AI by using the OpenAI libraries for Python and REST. If you're already using the OpenAI libraries, you can use this API as a low-cost way to switch between calling OpenAI models and Vertex AI hosted models to compare output, cost, and scalability, without changing your existing code. This helps ensure compatibility across providers and consistency with community standards. If you aren't already using the OpenAI libraries, we recommend that you use the Google Gen AI SDK.
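To illustrate the switch, the sketch below points a standard OpenAI-style request at a Vertex AI URL instead of OpenAI's servers. The project ID, region, model name, and the /endpoints/openapi path are placeholder assumptions to verify against the current Vertex AI reference; in practice the API key is a short-lived access token obtained through the google-auth library.

```python
# Sketch: pointing an OpenAI-style client at the Vertex AI Chat Completions
# endpoint. All identifiers below are hypothetical placeholders.
PROJECT_ID = "my-project"      # hypothetical Google Cloud project
LOCATION = "us-central1"       # hypothetical region
ACCESS_TOKEN = "ya29.example"  # in practice: a refreshed google-auth token

# For Gemini models, the shared endpoint path ends in /endpoints/openapi
# (an assumption to check against the current reference).
base_url = (
    f"https://{LOCATION}-aiplatform.googleapis.com/v1/"
    f"projects/{PROJECT_ID}/locations/{LOCATION}/endpoints/openapi"
)

# With the openai Python library, the call would then look like:
#   client = openai.OpenAI(base_url=base_url, api_key=ACCESS_TOKEN)
#   client.chat.completions.create(
#       model="google/gemini-2.0-flash-001",  # assumed model name format
#       messages=[{"role": "user", "content": "Why is the sky blue?"}],
#   )
print(base_url)
```

Because only the base URL and credentials change, the same request code can target either provider, which is the low-cost comparison path described above.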

Supported models

The Chat Completions API supports both Gemini models and select self-deployed models from Model Garden.

Gemini models

The following models provide support for the Chat Completions API:

Self-deployed models from Model Garden

The Hugging Face Text Generation Inference (HF TGI) and Vertex AI Model Garden prebuilt vLLM containers support the Chat Completions API. However, not every model deployed to these containers supports the Chat Completions API. The following table includes the most popular supported models by container:

HF TGI

vLLM
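For self-deployed models in these containers, requests target the specific deployed Vertex AI endpoint rather than the shared Gemini path. A minimal sketch, assuming a v1 URL scheme and placeholder project, region, and endpoint ID:

```python
# Sketch: base URL for a self-deployed Model Garden endpoint (HF TGI or
# vLLM container). All identifiers below are hypothetical placeholders,
# and the URL shape is an assumption to verify against the reference.
PROJECT_ID = "my-project"
LOCATION = "us-central1"
ENDPOINT_ID = "1234567890"  # ID of the deployed Vertex AI endpoint

base_url = (
    f"https://{LOCATION}-aiplatform.googleapis.com/v1/"
    f"projects/{PROJECT_ID}/locations/{LOCATION}/endpoints/{ENDPOINT_ID}"
)
print(base_url)
```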

Supported parameters

For Google models, the Chat Completions API supports the following OpenAI parameters. For a description of each parameter, see OpenAI's documentation on Creating chat completions. Parameter support for third-party models varies by model. To see which parameters are supported, consult the model's documentation.

messages
  • System message
  • User message: The text and image_url types are supported. The image_url type supports images stored as a Cloud Storage URI or as base64-encoded data in the form "data:<MIME-TYPE>;base64,<BASE64-ENCODED-BYTES>". To learn how to create a Cloud Storage bucket and upload a file to it, see Discover object storage. The detail option is not supported.
  • Assistant message
  • Tool message
  • Function message: This field is deprecated, but supported for backwards compatibility.
model
max_tokens
n
frequency_penalty
presence_penalty
response_format
  • json_object: Interpreted as passing "application/json" to the Gemini API.
  • text: Interpreted as passing "text/plain" to the Gemini API.
  • Any other MIME type is passed as-is to the model, such as passing "application/json" directly.
stop
stream
temperature
top_p
tools
  • type
  • function
    • name
    • description
    • parameters: Specify parameters by using the OpenAPI specification. This differs from the OpenAI parameters field, which is described as a JSON Schema object. To learn about keyword differences between OpenAPI and JSON Schema, see the OpenAPI guide.
tool_choice
  • none
  • auto
  • required: Corresponds to the mode ANY in the FunctionCallingConfig.
function_call: This field is deprecated, but supported for backwards compatibility.
functions: This field is deprecated, but supported for backwards compatibility.
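Putting the parameters above together, the sketch below builds a request body in the OpenAI chat-completions shape. The model and function names are hypothetical, and the inline image is placeholder bytes used only to show the data-URI format; function parameters follow the OpenAPI specification rather than OpenAI's JSON Schema description.

```python
import base64

# Hypothetical inline image: the image_url type accepts a data URI of the
# form "data:<MIME-TYPE>;base64,<BASE64-ENCODED-BYTES>".
fake_png_bytes = b"\x89PNG\r\n\x1a\n"  # placeholder bytes, not a real image
data_uri = "data:image/png;base64," + base64.b64encode(fake_png_bytes).decode("ascii")

request_body = {
    "model": "google/gemini-2.0-flash-001",  # assumed model name format
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is in this image?"},
                {"type": "image_url", "image_url": {"url": data_uri}},
            ],
        },
    ],
    # Function parameters use the OpenAPI specification (see note above).
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",  # hypothetical function
                "description": "Look up the weather for a city.",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
    "tool_choice": "auto",
}
```

With the openai Python library, each top-level key maps directly to a keyword argument of client.chat.completions.create.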

If you pass any unsupported parameter, it is ignored.

Gemini-specific parameters

Gemini supports several features that are not available in OpenAI models. You can still pass these features as parameters, but they must be contained within an extra_content or extra_body field, or they will be ignored.

extra_body features

safety_settings: This corresponds to Gemini's SafetySetting.
cached_content: This corresponds to Gemini's GenerateContentRequest.cached_content.
thought_tag_marker: Used to separate a model's thoughts from its responses for models with Thinking available. If not specified, no tags are returned around the model's thoughts. If a marker is specified, subsequent queries strip the thought tags and mark the thoughts appropriately, which preserves the right context for follow-up queries.
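The Gemini-specific fields above ride along inside the OpenAI client's extra_body argument. The sketch below shows one plausible wrapping; the exact nesting under "extra_body" and "google", and all field values, are assumptions to verify against the current Vertex AI reference.

```python
# Sketch of Gemini-specific fields wrapped for the OpenAI client's
# extra_body argument. The nesting below is an assumption to verify
# against the current Vertex AI reference; values are examples only.
extra_body = {
    "extra_body": {
        "google": {
            "safety_settings": [
                {
                    "category": "HARM_CATEGORY_HARASSMENT",  # a Gemini SafetySetting category
                    "threshold": "BLOCK_ONLY_HIGH",
                }
            ],
            # Hypothetical cached-content resource name:
            "cached_content": "projects/my-project/locations/us-central1/cachedContents/123",
            "thought_tag_marker": "think",  # marker used to delimit the model's thoughts
        }
    }
}

# With the openai library this would be passed as:
#   client.chat.completions.create(model=..., messages=..., extra_body=extra_body)
```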

What's next