The following table compares the endpoint types supported for serving Vertex AI online predictions and indicates which types are recommended.
Feature | Dedicated public endpoint (recommended) | Public endpoint | Private Service Connect endpoint (recommended) | Private endpoint |
---|---|---|---|---|
Purpose | Default networking experience. Enables submitting requests from public internet (if VPC Service Controls isn't enabled). | Default networking experience. Enables submitting requests from public internet (if VPC Service Controls isn't enabled). | Recommended for production enterprise applications. Improves network latency and security by ensuring requests and responses are routed privately. | Recommended for production enterprise applications. Improves network latency and security by ensuring requests and responses are routed privately. |
Inbound networking | Public internet using dedicated networking plane | Public internet using shared networking plane | Private networking using Private Service Connect endpoint | Private networking using Private services access (VPC Network Peering) |
Outbound networking | Public internet | Public internet | Not supported | Private networking using Private services access (VPC Network Peering) |
VPC Service Controls | Not supported. Use Private Service Connect endpoint instead. | Supported | Supported | Supported |
Cost | Vertex AI Prediction | Vertex AI Prediction | Vertex AI Prediction + Private Service Connect endpoint | Vertex AI Prediction + Private services access (see: "Using a Private Service Connect endpoint (forwarding rule) to access a published service") |
Network latency | Optimized | Unoptimized | Optimized | Lower P50 latency, higher P99 (tail) latency |
Encryption in transit | TLS with CA-signed certificate | TLS with CA-signed certificate | Optional TLS with self-signed certificate | None |
Inference timeout | Configurable up to 1 hour | 60 seconds | Configurable up to 1 hour | 60 seconds |
Payload size limit | 10 MB | 1.5 MB | 10 MB | 10 MB |
QPM quota | Unlimited | 30,000 | Unlimited | Unlimited |
Protocol support | HTTP or gRPC | HTTP | HTTP or gRPC | HTTP |
Streaming support | Yes (SSE) | No | Yes (SSE) | No |
Traffic split | Yes | Yes | Yes | No |
Request and response logging | Yes | Yes | Yes | No |
Access logging | Yes | Yes | Yes | No |
AutoML models and explainability | No | Yes | No | No |
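
The comparison above can also be read as a filter: state the requirements, then drop every endpoint type that fails one of them. The following is a minimal sketch of that idea, not part of any Vertex AI SDK. The `EndpointType` class, the `candidates` helper, and all field names are hypothetical; the values are transcribed from a subset of the table rows (VPC Service Controls, protocol support, streaming, payload size limit, inference timeout, and AutoML support).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EndpointType:
    """One column of the comparison table (hypothetical helper type)."""
    name: str
    recommended: bool
    vpc_service_controls: bool  # "VPC Service Controls" row
    grpc: bool                  # "Protocol support" row (HTTP is always available)
    streaming: bool             # "Streaming support" row (SSE)
    payload_limit_mb: float     # "Payload size limit" row
    max_timeout_s: int          # "Inference timeout" row, upper bound in seconds
    automl: bool                # "AutoML models and explainability" row


# Values copied from the table; 1 hour is written as 3600 seconds.
ENDPOINT_TYPES = (
    EndpointType("Dedicated public endpoint", True, False, True, True, 10, 3600, False),
    EndpointType("Public endpoint", False, True, False, False, 1.5, 60, True),
    EndpointType("Private Service Connect endpoint", True, True, True, True, 10, 3600, False),
    EndpointType("Private endpoint", False, True, False, False, 10, 60, False),
)


def candidates(*, vpc_service_controls=False, grpc=False, streaming=False,
               automl=False, payload_mb=0.0, timeout_s=0):
    """Return names of endpoint types that meet every stated requirement.

    Recommended types are listed first; otherwise table order is kept.
    """
    matches = [
        e for e in ENDPOINT_TYPES
        if (e.vpc_service_controls or not vpc_service_controls)
        and (e.grpc or not grpc)
        and (e.streaming or not streaming)
        and (e.automl or not automl)
        and payload_mb <= e.payload_limit_mb
        and timeout_s <= e.max_timeout_s
    ]
    # list.sort is stable, so ties keep their table order.
    matches.sort(key=lambda e: not e.recommended)
    return [e.name for e in matches]


print(candidates(vpc_service_controls=True, streaming=True))
# ['Private Service Connect endpoint']
print(candidates(automl=True))
# ['Public endpoint']
```

An empty result means that no single endpoint type in the table satisfies the combination, for example AutoML support together with a payload above 1.5 MB.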