Choose an endpoint type

To deploy a model for online inference, you need an endpoint. Endpoints can be divided into the following types:

Public endpoints are accessible over the public internet. They are easier to use because they require no private network infrastructure. There are two types of public endpoints: dedicated and shared. Compared with a shared public endpoint, a dedicated public endpoint is faster and supports larger payload sizes and longer request timeouts. Inference requests sent to a dedicated public endpoint are also isolated from other users' traffic, which provides production isolation. For these reasons, dedicated public endpoints are recommended as a best practice.
Dedicated private endpoints using Private Service Connect provide a secure connection for private communication between on-premises networks and Google Cloud. They can also be used to control Google API traffic through Private Service Connect APIs. They are recommended as a best practice.
Private endpoints also provide a secure connection to your model and can be used for private communication between on-premises networks and Google Cloud. They use private services access over a VPC Network Peering connection.

For more information about deploying a model to an endpoint, see Deploy a model to an endpoint.

The following table compares the supported endpoint types for serving Vertex AI online inferences.

	Dedicated public endpoint (recommended)	Shared public endpoint	Dedicated private endpoint using Private Service Connect (recommended)	Private endpoint
Purpose	Default networking experience. Enables submitting requests from public internet.	Default networking experience. Enables submitting requests from public internet.	Recommended for production enterprise applications. Improves network latency and security by making sure requests and responses are routed privately.	Recommended for production enterprise applications. Improves network latency and security by making sure requests and responses are routed privately.
Networking access	Public internet using dedicated networking plane	Public internet using shared networking plane	Private networking using Private Service Connect endpoint	Private networking using Private services access (VPC Network Peering)
VPC Service Controls	Not supported. Use a dedicated private endpoint instead.	Supported	Supported	Supported
Cost	Vertex AI Inference	Vertex AI Inference	Vertex AI Inference + Private Service Connect endpoint	Vertex AI Inference + Private services access (see: "Using a Private Service Connect endpoint (forwarding rule) to access a published service")
Network latency	Optimized	Unoptimized	Optimized	Optimized
Encryption in transit	TLS with CA-signed certificate	TLS with CA-signed certificate	Optional TLS with self-signed certificate	None
Inference timeout	Configurable up to 1 hour	60 seconds	Configurable up to 1 hour	60 seconds
Payload size limit	10 MB	1.5 MB	10 MB	10 MB
QPM quota	Unlimited	30,000	Unlimited	Unlimited
Protocol support	HTTP or gRPC	HTTP	HTTP or gRPC	HTTP
Streaming support	Yes (SSE)	No	Yes (SSE)	No
Traffic split	Yes	Yes	Yes	No
Request and response logging	Yes	Yes	Yes	No
Access logging	Yes	Yes	Yes	No
Tuned Gemini model deployment	No	Yes	No	No
AutoML models and explainability	No	Yes	No	No
Client libraries supported	Vertex AI SDK for Python	Vertex AI client libraries, Vertex AI SDK for Python	Vertex AI SDK for Python	Vertex AI SDK for Python
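
The comparison above can be turned into a quick shortlisting aid. The following is a minimal sketch, not part of any Vertex AI library: the `EndpointType` class, the `ENDPOINT_TYPES` list, and the `candidates` function are hypothetical names introduced for illustration, and the values are transcribed from a subset of the table rows (networking access, VPC Service Controls, inference timeout, payload size limit, protocol support, streaming support, and traffic split). Treat the table itself as the source of truth if the limits change.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class EndpointType:
    """One column of the comparison table (illustrative subset of rows)."""
    name: str
    private: bool          # True if requests are routed over private networking
    vpc_sc: bool           # VPC Service Controls supported
    max_timeout_s: int     # upper bound of the inference timeout, in seconds
    max_payload_mb: float  # payload size limit, in MB
    grpc: bool             # gRPC supported in addition to HTTP
    streaming: bool        # streaming (SSE) supported
    traffic_split: bool    # traffic split supported


# Values transcribed from the table above (1 hour = 3600 seconds).
ENDPOINT_TYPES = [
    EndpointType("Dedicated public endpoint",
                 False, False, 3600, 10, True, True, True),
    EndpointType("Shared public endpoint",
                 False, True, 60, 1.5, False, False, True),
    EndpointType("Dedicated private endpoint using Private Service Connect",
                 True, True, 3600, 10, True, True, True),
    EndpointType("Private endpoint",
                 True, True, 60, 10, False, False, False),
]


def candidates(*, private: Optional[bool] = None, vpc_sc: bool = False,
               timeout_s: int = 60, payload_mb: float = 1.5,
               grpc: bool = False, streaming: bool = False,
               traffic_split: bool = False) -> List[str]:
    """Return the names of endpoint types that satisfy every requirement.

    private=None means "no preference"; True or False restricts the result
    to private or public networking respectively.
    """
    matches = []
    for ep in ENDPOINT_TYPES:
        if private is not None and ep.private != private:
            continue
        if vpc_sc and not ep.vpc_sc:
            continue
        if timeout_s > ep.max_timeout_s or payload_mb > ep.max_payload_mb:
            continue
        if grpc and not ep.grpc:
            continue
        if streaming and not ep.streaming:
            continue
        if traffic_split and not ep.traffic_split:
            continue
        matches.append(ep.name)
    return matches


if __name__ == "__main__":
    # Example: a private workload that streams responses and sends 5 MB payloads.
    print(candidates(private=True, streaming=True, payload_mb=5))
```

An empty result means that no single endpoint type in the table meets all of the stated requirements. For example, requiring both public networking and VPC Service Controls with a 5 MB payload returns nothing, because the only public type that supports VPC Service Controls is limited to 1.5 MB.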

What's next

Learn more about deploying a model to an endpoint.
