Authenticate

This page shows you how to authenticate with the Vertex AI Chat Completions API using the OpenAI Python library.

Install the OpenAI SDK

To use the OpenAI Python libraries, install the OpenAI SDK:

pip install openai

Authentication methods

You can authenticate with the Chat Completions API by either modifying your client setup or by changing your environment configuration to use Google authentication and a Vertex AI endpoint. The following table compares these methods to help you choose the one that best suits your use case.

| Method | Description | Pros | Cons | Use case |
| --- | --- | --- | --- | --- |
| Client setup | Programmatically configure the OpenAI client with Google credentials and the Vertex AI endpoint within your application code. | Configuration is explicit and self-contained within your application code, and does not rely on external environment settings. | Requires you to hardcode credentials and endpoint URLs or use a separate configuration management system. | Applications where environment variables are not easily managed or when you need to control configuration entirely within the code. |
| Environment variables | Set standard OpenAI environment variables (OPENAI_API_KEY, OPENAI_BASE_URL) that the library reads automatically. | Keeps credentials and configuration separate from code. Easy to switch between environments (dev, prod). | Requires managing environment variables on the host system, which can be complex in some deployment scenarios. | Recommended for most applications, especially those deployed in containerized or cloud environments where setting environment variables is standard practice. |

Some models, such as those in Model Garden and supported Hugging Face models, must be deployed to a Vertex AI endpoint before they can serve requests. When you call these self-deployed models from the Chat Completions API, you must specify the endpoint ID. To list your existing Vertex AI endpoints, use the gcloud ai endpoints list command.
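The endpoint ID ends up as the last path segment of the client's base URL. As a minimal sketch, the helper below assembles that URL following the pattern used in the client setup sample on this page. The function name build_base_url and its parameters are hypothetical and not part of any library, and the project and endpoint values are placeholders.

```python
def build_base_url(project_id: str, location: str, endpoint_id: str = "openapi") -> str:
    """Return a Chat Completions base URL for a Vertex AI endpoint.

    Hypothetical helper: the URL pattern is copied from the samples on this
    page. Pass "openapi" for Gemini models, or the ID of your own endpoint
    for a self-deployed model.
    """
    if not (project_id and location and endpoint_id):
        raise ValueError("project_id, location, and endpoint_id are all required")
    return (
        f"https://{location}-aiplatform.googleapis.com/v1/"
        f"projects/{project_id}/locations/{location}/endpoints/{endpoint_id}"
    )


# Gemini models use the shared "openapi" endpoint.
gemini_url = build_base_url("my-project", "us-central1")

# Self-deployed models use an ID reported by `gcloud ai endpoints list`.
# "1234567890" is a placeholder, not a real endpoint ID.
custom_url = build_base_url("my-project", "us-central1", "1234567890")
```

Keeping the URL construction in one place makes it harder to mix up the two cases, because only the final path segment differs between them.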

Client setup

To get Google credentials programmatically in Python, use the google-auth Python SDK:

pip install google-auth requests

Python

Before trying this sample, follow the Python setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Python API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

import openai

from google.auth import default
import google.auth.transport.requests

# TODO(developer): Update and un-comment below lines
# project_id = "PROJECT_ID"
# location = "us-central1"

# Programmatically get an access token
credentials, _ = default(scopes=["https://www.googleapis.com/auth/cloud-platform"])
credentials.refresh(google.auth.transport.requests.Request())
# Note: the credential lives for 1 hour by default (https://cloud.google.com/docs/authentication/token-types#at-lifetime); after expiration, it must be refreshed.

##############################
# Choose one of the following:
##############################

# If you are calling a Gemini model, set the ENDPOINT_ID variable to use openapi.
ENDPOINT_ID = "openapi"

# If you are calling a self-deployed model from Model Garden, set the
# ENDPOINT_ID variable and set the client's base URL to use your endpoint.
# ENDPOINT_ID = "YOUR_ENDPOINT_ID"

# OpenAI Client
client = openai.OpenAI(
    base_url=f"https://{location}-aiplatform.googleapis.com/v1/projects/{project_id}/locations/{location}/endpoints/{ENDPOINT_ID}",
    api_key=credentials.token,
)

By default, access tokens last for 1 hour. To keep a long-running client working, either extend the lifetime of your access token or refresh the token periodically and assign the new value to the client's api_key attribute (client.api_key in the sample above).
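One way to decide when to refresh is to compare the token's expiry time with the current time, leaving a safety margin. The sketch below uses only the standard library and takes the expiry as a datetime. It assumes that google-auth credentials expose this value as credentials.expiry, which is worth confirming against the google-auth reference. The helper name token_needs_refresh and the five-minute margin are illustrative choices, not part of any library.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def token_needs_refresh(
    expiry: Optional[datetime],
    now: Optional[datetime] = None,
    margin: timedelta = timedelta(minutes=5),
) -> bool:
    """Return True if a token expiring at `expiry` should be refreshed now.

    Hypothetical helper. Naive datetimes are interpreted as UTC (an
    assumption); `None` means "no known expiry", so no refresh is requested.
    """
    if expiry is None:
        return False
    if expiry.tzinfo is None:
        expiry = expiry.replace(tzinfo=timezone.utc)
    if now is None:
        now = datetime.now(timezone.utc)
    elif now.tzinfo is None:
        now = now.replace(tzinfo=timezone.utc)
    # Refresh once we are inside the safety margin before expiry.
    return now >= expiry - margin


issued = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
expiry = issued + timedelta(hours=1)  # the default 1-hour lifetime

fresh = token_needs_refresh(expiry, now=issued + timedelta(minutes=10))
stale = token_needs_refresh(expiry, now=issued + timedelta(minutes=56))
```

In the client setup sample, a check like this could run before each request; when it returns True, call credentials.refresh(...) again and assign the new credentials.token to client.api_key.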

Environment variables

Install the Google Cloud CLI; the commands below use it to obtain an access token. The OpenAI library automatically reads the OPENAI_API_KEY and OPENAI_BASE_URL environment variables to configure authentication and the endpoint for its default client. Set the following variables:

$ export PROJECT_ID=PROJECT_ID
$ export LOCATION=LOCATION
$ export OPENAI_API_KEY="$(gcloud auth application-default print-access-token)"

To call a Gemini model, set the MODEL_ID variable and use the openapi endpoint:

$ export MODEL_ID=MODEL_ID
$ export OPENAI_BASE_URL="https://${LOCATION}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/${LOCATION}/endpoints/openapi"

To call a self-deployed model from Model Garden, set the ENDPOINT variable and use that in your URL instead:

$ export ENDPOINT=ENDPOINT_ID
$ export OPENAI_BASE_URL="https://${LOCATION}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/${LOCATION}/endpoints/${ENDPOINT}"

Next, initialize the client:

import openai
client = openai.OpenAI()
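Because the default client is configured entirely from the environment, a missing or empty variable can be hard to spot. As a sketch, a small pre-flight check can report the problem before the client is created. The function name missing_openai_env is hypothetical, and the check only covers the two variables named on this page.

```python
import os
from typing import List, Mapping

# The two variables this page relies on.
REQUIRED_VARS = ("OPENAI_API_KEY", "OPENAI_BASE_URL")


def missing_openai_env(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the names of required variables that are unset or blank."""
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


# Example with an explicit mapping instead of the real environment.
# The token value is a placeholder.
example_env = {"OPENAI_API_KEY": "placeholder-token", "OPENAI_BASE_URL": ""}
problems = missing_openai_env(example_env)
```

Calling missing_openai_env() with no arguments checks the real process environment; an empty result means both variables are set.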

The Gemini Chat Completions API uses OAuth to authenticate with a short-lived access token. By default, access tokens last for 1 hour. You can extend the life of your access token or periodically refresh your token and update the OPENAI_API_KEY environment variable.
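The periodic refresh described above can be sketched as a function that reruns the gcloud command from this section and stores the result in OPENAI_API_KEY. The helper names refresh_api_key_env and _run_gcloud are hypothetical; the command itself is the one shown earlier. The runner is injectable so that the logic can be exercised without the gcloud CLI installed.

```python
import os
import subprocess
from typing import Callable, List, MutableMapping

# Same command as in the export statement shown earlier in this section.
GCLOUD_TOKEN_CMD = ["gcloud", "auth", "application-default", "print-access-token"]


def _run_gcloud(cmd: List[str]) -> str:
    """Run a command and return its stdout; raises if the command fails."""
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout


def refresh_api_key_env(
    run: Callable[[List[str]], str] = _run_gcloud,
    env: MutableMapping[str, str] = os.environ,
) -> str:
    """Fetch a new access token and store it in OPENAI_API_KEY.

    Hypothetical helper. `run` and `env` are parameters so the function
    can be tested with stand-ins instead of the real CLI and environment.
    """
    token = run(GCLOUD_TOKEN_CMD).strip()
    if not token:
        raise RuntimeError("gcloud returned an empty access token")
    env["OPENAI_API_KEY"] = token
    return token
```

Updating os.environ affects only the current Python process and any child processes it starts. Assuming the client reads OPENAI_API_KEY once at construction, create a new client (or assign the returned token to client.api_key) after each refresh.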

Refresh your credentials

Access tokens are short-lived and expire after one hour by default. To maintain authentication for sessions longer than one hour, you can create a utility that automatically refreshes your credentials. The following Python example shows you how to implement this utility.

Python

from typing import Any

import google.auth
import google.auth.transport.requests
import openai


class OpenAICredentialsRefresher:
    def __init__(self, **kwargs: Any) -> None:
        # Set a placeholder key here
        self.client = openai.OpenAI(**kwargs, api_key="PLACEHOLDER")
        self.creds, self.project = google.auth.default(
            scopes=["https://www.googleapis.com/auth/cloud-platform"]
        )

    def __getattr__(self, name: str) -> Any:
        if not self.creds.valid:
            self.creds.refresh(google.auth.transport.requests.Request())

            if not self.creds.valid:
                raise RuntimeError("Unable to refresh auth")

            self.client.api_key = self.creds.token
        return getattr(self.client, name)



# TODO(developer): Update and un-comment below lines
# project_id = "PROJECT_ID"
# location = "us-central1"

# Create the wrapper at module level, after the class definition.
client = OpenAICredentialsRefresher(
    base_url=f"https://{location}-aiplatform.googleapis.com/v1/projects/{project_id}/locations/{location}/endpoints/openapi",
)

# Attribute access on the wrapper refreshes the token when needed.
response = client.chat.completions.create(
    model="google/gemini-2.0-flash-001",
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
)

print(response)
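To see the delegation pattern in isolation, the following sketch replaces the Google credentials and the OpenAI client with hypothetical stand-in classes (FakeCredentials, FakeClient) so that it runs without network access or extra packages. It illustrates the __getattr__ technique only and is not the library's behavior; in particular, the real refresh method takes a request object, which the stand-in omits.

```python
from typing import Any, Optional


class FakeCredentials:
    """Stand-in for google-auth credentials; performs no network access."""

    def __init__(self) -> None:
        self.valid = False
        self.token: Optional[str] = None
        self.refresh_count = 0

    def refresh(self) -> None:
        # Issue a new, distinguishable token on every refresh.
        self.refresh_count += 1
        self.token = f"token-{self.refresh_count}"
        self.valid = True


class FakeClient:
    """Stand-in for openai.OpenAI; only stores an API key."""

    def __init__(self) -> None:
        self.api_key = "PLACEHOLDER"

    def whoami(self) -> str:
        return f"calling with {self.api_key}"


class RefreshingProxy:
    def __init__(self) -> None:
        self.client = FakeClient()
        self.creds = FakeCredentials()

    def __getattr__(self, name: str) -> Any:
        # __getattr__ runs only for names not found on the proxy itself,
        # so reading self.client and self.creds does not recurse.
        if not self.creds.valid:
            self.creds.refresh()
            self.client.api_key = self.creds.token
        return getattr(self.client, name)


proxy = RefreshingProxy()
first = proxy.whoami()      # credentials start invalid: triggers a refresh
second = proxy.whoami()     # token still valid: no refresh
proxy.creds.valid = False   # simulate expiry
third = proxy.whoami()      # triggers a second refresh
```

The refresh happens at attribute lookup time, so a request that starts just before expiry can still carry a token that expires in flight; checking validity with a safety margin is one way to reduce that window.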

What's next