Choose an endpoint type

The following table compares the supported endpoint types for serving Vertex AI online predictions and marks the types that are recommended.

| | Dedicated public endpoint (recommended) | Public endpoint | Private Service Connect endpoint (recommended) | Private endpoint |
|---|---|---|---|---|
| Purpose | Default networking experience. Enables submitting requests from public internet (if VPC Service Controls isn't enabled). | Default networking experience. Enables submitting requests from public internet (if VPC Service Controls isn't enabled). | Recommended for production enterprise applications. Improves network latency and security by ensuring requests and responses are routed privately. | Recommended for production enterprise applications. Improves network latency and security by ensuring requests and responses are routed privately. |
| Inbound networking | Public internet using dedicated networking plane | Public internet using shared networking plane | Private networking using Private Service Connect endpoint | Private networking using Private services access (VPC Network Peering) |
| Outbound networking | Public internet | Public internet | Not supported | Private networking using Private services access (VPC Network Peering) |
| VPC Service Controls | Not supported. Use Private Service Connect endpoint instead. | Supported | Supported | Supported |
| Cost | Vertex AI Prediction | Vertex AI Prediction | Vertex AI Prediction + Private Service Connect endpoint | Vertex AI Prediction + Private services access (see: "Using a Private Service Connect endpoint (forwarding rule) to access a published service") |
| Network latency | Optimized | Unoptimized | Optimized | Lower P50 latency, higher P99 (tail) latency |
| Encryption in transit | TLS with CA-signed certificate | TLS with CA-signed certificate | Optional TLS with self-signed certificate | None |
| Inference timeout | Configurable up to 1 hour | 60 seconds | Configurable up to 1 hour | 60 seconds |
| Payload size limit | 10 MB | 1.5 MB | 10 MB | 10 MB |
| QPM quota | Unlimited | 30,000 | Unlimited | Unlimited |
| Protocol support | HTTP or gRPC | HTTP | HTTP or gRPC | HTTP |
| Streaming support | Yes (SSE) | No | Yes (SSE) | No |
| Traffic split | Yes | Yes | Yes | No |
| Request and response logging | Yes | Yes | Yes | No |
| Access logging | Yes | Yes | Yes | No |
| AutoML models and explainability | No | Yes | No | No |
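
One way to apply the comparison above is to encode a few of its rows as data and filter on your requirements. The following is a minimal sketch, not part of any Vertex AI SDK: the `EndpointType` class, the `candidates` function, and all field names are hypothetical, and the values are copied from the table (a QPM quota of `None` stands for "Unlimited").

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class EndpointType:
    """Hypothetical record holding a subset of the table's rows."""
    name: str
    recommended: bool
    vpc_service_controls: bool
    max_timeout_seconds: int
    payload_limit_mb: float
    qpm_quota: Optional[int]  # None means "Unlimited" in the table
    grpc: bool
    streaming: bool
    automl_and_explainability: bool


# Values transcribed from the comparison table above.
ENDPOINT_TYPES = [
    EndpointType("Dedicated public endpoint", recommended=True,
                 vpc_service_controls=False, max_timeout_seconds=3600,
                 payload_limit_mb=10, qpm_quota=None, grpc=True,
                 streaming=True, automl_and_explainability=False),
    EndpointType("Public endpoint", recommended=False,
                 vpc_service_controls=True, max_timeout_seconds=60,
                 payload_limit_mb=1.5, qpm_quota=30000, grpc=False,
                 streaming=False, automl_and_explainability=True),
    EndpointType("Private Service Connect endpoint", recommended=True,
                 vpc_service_controls=True, max_timeout_seconds=3600,
                 payload_limit_mb=10, qpm_quota=None, grpc=True,
                 streaming=True, automl_and_explainability=False),
    EndpointType("Private endpoint", recommended=False,
                 vpc_service_controls=True, max_timeout_seconds=60,
                 payload_limit_mb=10, qpm_quota=None, grpc=False,
                 streaming=False, automl_and_explainability=False),
]


def candidates(*, needs_vpc_sc: bool = False, needs_streaming: bool = False,
               needs_grpc: bool = False, needs_automl: bool = False,
               payload_mb: float = 0.0, timeout_seconds: int = 0) -> List[str]:
    """Return the names of endpoint types that satisfy every requirement."""
    result = []
    for ep in ENDPOINT_TYPES:
        if needs_vpc_sc and not ep.vpc_service_controls:
            continue
        if needs_streaming and not ep.streaming:
            continue
        if needs_grpc and not ep.grpc:
            continue
        if needs_automl and not ep.automl_and_explainability:
            continue
        if payload_mb > ep.payload_limit_mb:
            continue
        if timeout_seconds > ep.max_timeout_seconds:
            continue
        result.append(ep.name)
    return result


if __name__ == "__main__":
    # Streaming responses from inside a VPC Service Controls perimeter.
    print(candidates(needs_streaming=True, needs_vpc_sc=True))
```

Under these assumptions, requiring both streaming and VPC Service Controls leaves only the Private Service Connect endpoint, while requiring AutoML models and explainability leaves only the public endpoint, whose 1.5 MB payload limit then becomes the binding constraint.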