Deployments

Google and partner models and generative AI features on Vertex AI are available in specific regions and through a global endpoint. The global endpoint serves requests from any location and can provide higher availability and reliability than a single region.

For information on data residency, see Data residency.

Global endpoint

Selecting the global endpoint for your requests can improve overall availability while reducing resource exhausted (429) errors. Don't use the global endpoint if you have ML processing location requirements, because you can't control or know which region processes a given request.

Supported models

The global endpoint is supported for the following models:

Use the global endpoint

To use the global endpoint, exclude the region from the endpoint name and set the location of the resource to global. For example, the following is a global endpoint URL:

https://aiplatform.googleapis.com/v1/projects/test-project/locations/global/publishers/google/models/gemini-2.0-flash-001:generateContent
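As a minimal sketch, you can call this URL directly with any HTTP client, assuming Application Default Credentials are configured and test-project is replaced with your project ID. The request body shows the standard generateContent fields:

import google.auth
import google.auth.transport.requests
import requests

# Obtain an access token from Application Default Credentials.
credentials, _ = google.auth.default(
    scopes=['https://www.googleapis.com/auth/cloud-platform']
)
credentials.refresh(google.auth.transport.requests.Request())

# The location segment of the URL is global instead of a specific region.
url = (
    'https://aiplatform.googleapis.com/v1/projects/test-project/'
    'locations/global/publishers/google/models/'
    'gemini-2.0-flash-001:generateContent'
)

response = requests.post(
    url,
    headers={'Authorization': f'Bearer {credentials.token}'},
    json={'contents': [{'role': 'user', 'parts': [{'text': 'Hello'}]}]},
)
print(response.json())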

For the Google Gen AI SDK, create a client that uses the global location:

from google import genai

client = genai.Client(
    vertexai=True, project='your-project-id', location='global'
)
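
You can then send requests through this client as usual. For example, a minimal request to the gemini-2.0-flash-001 model shown above:

response = client.models.generate_content(
    model='gemini-2.0-flash-001',
    contents='Say hello.',
)
print(response.text)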

Limitations

The following capabilities are not available when using the global endpoint:

  • Tuning
  • Batch prediction
  • Context caching
  • Retrieval-augmented generation (RAG) corpus creation (RAG requests are supported)
  • VPC Service Controls
  • Provisioned Throughput

What's next