For information on data residency, see Data residency.
Global endpoint
Selecting the global endpoint for your requests can improve overall availability and reduce resource exhausted (429) errors. Don't use the global endpoint if you have ML processing location requirements, because you can't control or know which region your requests are sent to for processing.
Supported models
The global endpoint is supported for the following models:
Use the global endpoint
To use the global endpoint, exclude the location from the endpoint name and set the resource location to global. For example, the following is a global endpoint URL:
https://aiplatform.googleapis.com/v1/projects/test-project/locations/global/publishers/google/models/gemini-2.0-flash-001:generateContent
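The URL pattern above can be sketched in code. This is an illustrative helper, not part of any SDK: a regional endpoint prefixes the hostname with the region (for example, us-central1-aiplatform.googleapis.com), while the global endpoint drops that prefix and uses the literal location global in the resource path.

```python
# Build a Vertex AI generateContent endpoint URL for a given location.
# Regional endpoints prefix the host with "<region>-"; the global endpoint
# uses the bare host and "global" as the location in the resource path.
# (endpoint_url is an illustrative helper, not an SDK function.)
def endpoint_url(project: str, model: str, location: str) -> str:
    host_prefix = "" if location == "global" else f"{location}-"
    return (
        f"https://{host_prefix}aiplatform.googleapis.com/v1/"
        f"projects/{project}/locations/{location}/"
        f"publishers/google/models/{model}:generateContent"
    )

print(endpoint_url("test-project", "gemini-2.0-flash-001", "global"))
print(endpoint_url("test-project", "gemini-2.0-flash-001", "us-central1"))
```

For "global", this reproduces the example URL above; for a region such as us-central1, only the host prefix and the location path segment change.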
For the Google Gen AI SDK, create a client that uses the global location:
from google import genai

client = genai.Client(
    vertexai=True, project='your-project-id', location='global'
)
Limitations
The following capabilities are not available when using the global endpoint:
- Tuning
- Batch prediction
- Context caching
- Retrieval-augmented generation (RAG) corpus (RAG requests are supported)
- VPC Service Controls
- Provisioned Throughput
What's next
- For a notebook tutorial that demonstrates the global endpoint, see Intro to the Vertex AI global endpoint.
- Learn more about Generative AI on Vertex AI data residency.
- Learn about Google Cloud regions.
- Learn more about security controls by feature.
- Learn about the models that support Generative AI on Vertex AI in the Generative AI foundational model reference.
- Learn about Vertex AI locations.