Choose an endpoint type

To deploy a model for online prediction, you need an endpoint. Endpoints can be divided into the following types:

  • Public endpoints can be accessed over the public internet. They are easier to use because no private network infrastructure is required. There are two types of public endpoints: dedicated and shared. A dedicated public endpoint offers lower latency, production isolation, larger payload sizes, and longer request timeouts than a shared public endpoint. In addition, prediction requests sent to a dedicated public endpoint are isolated from other users' traffic. For these reasons, dedicated public endpoints are recommended as a best practice.

  • Private Service Connect endpoints provide a secure connection for private communication between on-premises networks and Google Cloud. They let you control Google API traffic by routing it through Private Service Connect. They are also recommended as a best practice.

  • Private endpoints likewise provide a secure connection to your model and support private communication between on-premises networks and Google Cloud. They use private services access over a VPC Network Peering connection.

For more information about deploying a model to an endpoint, see Deploy a model to an endpoint.

The following table compares the supported endpoint types for serving Vertex AI online predictions.

| Feature | Dedicated public endpoint (recommended) | Shared public endpoint | Private Service Connect endpoint (recommended) | Private endpoint |
| --- | --- | --- | --- | --- |
| Purpose | Default networking experience. Enables submitting requests from the public internet (if VPC Service Controls isn't enabled). | Default networking experience. Enables submitting requests from the public internet (if VPC Service Controls isn't enabled). | Recommended for production enterprise applications. Improves network latency and security by ensuring requests and responses are routed privately. | Recommended for production enterprise applications. Improves network latency and security by ensuring requests and responses are routed privately. |
| Inbound networking | Public internet using a dedicated networking plane | Public internet using a shared networking plane | Private networking using a Private Service Connect endpoint | Private networking using private services access (VPC Network Peering) |
| Outbound networking | Public internet | Public internet | Not supported | Private networking using private services access (VPC Network Peering) |
| VPC Service Controls | Not supported. Use a Private Service Connect endpoint instead. | Supported | Supported | Supported |
| Cost | Vertex AI Prediction | Vertex AI Prediction | Vertex AI Prediction + Private Service Connect endpoint | Vertex AI Prediction + private services access (see "Using a Private Service Connect endpoint (forwarding rule) to access a published service") |
| Network latency | Optimized | Unoptimized | Optimized | Optimized |
| Encryption in transit | TLS with CA-signed certificate | TLS with CA-signed certificate | Optional TLS with self-signed certificate | None |
| Inference timeout | Configurable up to 1 hour | 60 seconds | Configurable up to 1 hour | 60 seconds |
| Payload size limit | 10 MB | 1.5 MB | 10 MB | 10 MB |
| QPM quota | Unlimited | 30,000 | Unlimited | Unlimited |
| Protocol support | HTTP or gRPC | HTTP | HTTP or gRPC | HTTP |
| Streaming support | Yes (SSE) | No | Yes (SSE) | No |
| Traffic split | Yes | Yes | Yes | No |
| Request and response logging | Yes | Yes | Yes | No |
| Access logging | Yes | Yes | Yes | No |
| Tuned Gemini model deployment | No | Yes | No | No |
| AutoML models and explainability | No | Yes | No | No |
| Client libraries supported | Vertex AI SDK for Python | Vertex AI client libraries, Vertex AI SDK for Python | Vertex AI SDK for Python | Vertex AI SDK for Python |
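Two of the limits in the table, payload size and inference timeout, are easy to check client-side before sending a request. The following hypothetical helper encodes those values from the table; it is a convenience sketch, not part of the Vertex AI SDK, and the limits should be re-checked against current quotas.

```python
# Hypothetical client-side guard encoding two limits from the table above:
# maximum payload size (MB) and maximum inference timeout (seconds) per
# endpoint type. Not part of the Vertex AI SDK.

LIMITS = {
    # endpoint type: (max payload in MB, max timeout in seconds)
    "dedicated-public": (10, 3600),         # timeout configurable up to 1 hour
    "shared-public": (1.5, 60),
    "private-service-connect": (10, 3600),  # timeout configurable up to 1 hour
    "private": (10, 60),
}


def check_request(endpoint_type: str, payload_mb: float, timeout_s: int) -> None:
    """Raise ValueError if a request would exceed the documented limits."""
    max_mb, max_s = LIMITS[endpoint_type]
    if payload_mb > max_mb:
        raise ValueError(f"payload {payload_mb} MB exceeds the {max_mb} MB limit")
    if timeout_s > max_s:
        raise ValueError(f"timeout {timeout_s} s exceeds the {max_s} s limit")
```

Failing fast on the client avoids paying the round trip for a request the service would reject anyway, for example a 5 MB payload sent to a shared public endpoint.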

What's next