A dedicated public endpoint is a public endpoint for online prediction. It offers the following benefits:
- Dedicated networking: When you send a prediction request to a dedicated public endpoint, it is isolated from other users' traffic.
- Optimized network latency
- Larger payload support: Up to 10 MB.
- Longer request timeouts: Configurable up to 1 hour.
- Generative AI-ready: Streaming and gRPC are supported.
For these reasons, dedicated public endpoints are the recommended endpoint type for serving Vertex AI online predictions.
To learn more, see Choose an endpoint type.
Create a dedicated public endpoint and deploy a model to it
You can create a dedicated endpoint and deploy a model to it by using the Google Cloud console. For details, see Deploy a model by using the Google Cloud console.
You can also create a dedicated public endpoint and deploy a model to it by using the Vertex AI API as follows:
- Create a dedicated public endpoint. When creating the endpoint, enable dedicated DNS (in the Google Cloud console, select the Enable dedicated DNS checkbox; in the API, set the endpoint's `dedicatedEndpointEnabled` field to `true`).
- Deploy the model by using the Vertex AI API.
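As a sketch of the first step, the REST form of creating the endpoint is a `POST` to the regional `endpoints` collection with `dedicatedEndpointEnabled` set. The helper below only assembles that request's URL and JSON body; actually sending it requires an authenticated HTTP client and a real project, and the project, region, and display name shown are placeholder values:

```python
import json


def build_create_endpoint_request(project: str, region: str, display_name: str):
    """Assemble the URL and JSON body for creating a dedicated public endpoint.

    This sketch only builds the request; executing it requires an
    authenticated client (for example, via google-auth) and a real project.
    """
    url = (
        f"https://{region}-aiplatform.googleapis.com/v1/"
        f"projects/{project}/locations/{region}/endpoints"
    )
    body = {
        "displayName": display_name,
        # Requests a dedicated DNS name for the endpoint.
        "dedicatedEndpointEnabled": True,
    }
    return url, json.dumps(body)


# Placeholder values for illustration only.
url, body = build_create_endpoint_request("my-project", "us-central1", "my-endpoint")
```

After the endpoint exists, the deploy step targets it with a `deployModel` call (or `Endpoint.deploy` in the Vertex AI SDK for Python); see the deployment documentation for the full request shape.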
Get online predictions from a dedicated public endpoint
You can send online prediction requests to a dedicated public endpoint by using the Vertex AI SDK for Python. For details, see Send an online prediction request to a dedicated public endpoint.
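As a minimal sketch of what such a request looks like on the wire: prediction traffic goes to the endpoint's dedicated DNS name (reported on the endpoint resource after creation) rather than the shared regional host. The helper below only assembles the `:predict` URL and body; the hostname, project, and endpoint ID are illustrative placeholders, not real values:

```python
import json


def build_predict_request(dedicated_dns: str, project: str, region: str,
                          endpoint_id: str, instances: list):
    """Assemble the URL and JSON body for a :predict call against a
    dedicated public endpoint's DNS name.

    dedicated_dns is the hostname reported on the endpoint resource after
    creation; this sketch does not construct or validate it.
    """
    url = (
        f"https://{dedicated_dns}/v1/projects/{project}/"
        f"locations/{region}/endpoints/{endpoint_id}:predict"
    )
    body = {"instances": instances}
    return url, json.dumps(body)


# Illustrative placeholder hostname and IDs.
url, body = build_predict_request(
    "1234567890.us-central1-111111111111.prediction.vertexai.goog",
    "my-project", "us-central1", "1234567890", [{"feature": 1.0}],
)
```

With the Vertex AI SDK for Python, the `Endpoint.predict` method handles the request construction and authentication for you; consult the SDK reference for the exact parameters for dedicated endpoints.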
Limitations
- VPC Service Controls isn't supported. Use a Private Service Connect endpoint instead.
What's next
- Learn about Vertex AI online prediction endpoint types.