Private Service Connect lets you access Vertex AI online predictions securely from multiple consumer projects and VPC networks without the need for public IP addresses, public internet access, or an explicitly peered internal IP address range.
We recommend Private Service Connect for online prediction use cases that have the following requirements:
- Require private and secure connections
- Require low latency
- Don't need to be publicly accessible
Private Service Connect uses a forwarding rule in your VPC network to send traffic unidirectionally to the Vertex AI online prediction service. The forwarding rule connects to a service attachment that exposes the Vertex AI service to your VPC network. For more information, see About accessing Vertex AI services through Private Service Connect. To learn more about setting up Private Service Connect, see the Private Service Connect overview in the Virtual Private Cloud (VPC) documentation.
Create the online prediction endpoint
Use one of the following methods to create an online prediction endpoint with Private Service Connect enabled:
Console
In the Google Cloud console, in Vertex AI, go to the Online prediction page.
Click Create.
Provide a display name for the endpoint.
Select Private.
Select Private Service Connect.
Click Select project IDs.
Select projects to add to the allowlist for the endpoint.
Click Continue.
Choose your model specifications. For more information, see Deploy a model to an endpoint.
Click Create to create your endpoint and deploy your model to it.
Make a note of the endpoint ID in the response.
API
REST
Before using any of the request data, make the following replacements:
- VERTEX_AI_PROJECT_ID: the ID of the Google Cloud project where you're creating the online prediction endpoint.
- REGION: the region where you're using Vertex AI.
- VERTEX_AI_ENDPOINT_NAME: the display name for the online prediction endpoint.
- ALLOWED_PROJECTS: a comma-separated list of Google Cloud project IDs, each enclosed in quotation marks, for example, ["PROJECTID1", "PROJECTID2"]. If a project isn't contained in this list, you won't be able to send prediction requests to the Vertex AI endpoint from it. Make sure to include VERTEX_AI_PROJECT_ID in this list so that you can call the endpoint from the same project it's in.
HTTP method and URL:
POST https://REGION-aiplatform.googleapis.com/v1/projects/VERTEX_AI_PROJECT_ID/locations/REGION/endpoints
Request JSON body:
{
  "displayName": "VERTEX_AI_ENDPOINT_NAME",
  "privateServiceConnectConfig": {
    "enablePrivateServiceConnect": true,
    "projectAllowlist": ["ALLOWED_PROJECTS"]
  }
}
You should receive a JSON response similar to the following:
{
  "name": "projects/VERTEX_AI_PROJECT_NUMBER/locations/REGION/endpoints/ENDPOINT_ID/operations/OPERATION_ID",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.aiplatform.v1.CreateEndpointOperationMetadata",
    "genericMetadata": {
      "createTime": "2020-11-05T17:45:42.812656Z",
      "updateTime": "2020-11-05T17:45:42.812656Z"
    }
  }
}
Make a note of the ENDPOINT_ID, which appears in the name field of the response.
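As a rough illustration of the REST call above, the following Python sketch assembles the request URL and JSON body, and pulls the ENDPOINT_ID out of the operation name in the response. The function names and sample values are hypothetical. The sketch only builds and parses strings; it doesn't send the request or handle authentication.

```python
import json
import re


def build_create_endpoint_request(project_id, region, display_name, allowed_projects):
    """Return the (url, body) pair for the endpoint creation request shown above."""
    url = (
        f"https://{region}-aiplatform.googleapis.com/v1/"
        f"projects/{project_id}/locations/{region}/endpoints"
    )
    body = {
        "displayName": display_name,
        "privateServiceConnectConfig": {
            "enablePrivateServiceConnect": True,
            "projectAllowlist": list(allowed_projects),
        },
    }
    return url, body


def endpoint_id_from_operation(operation_name):
    """Extract ENDPOINT_ID from .../endpoints/ENDPOINT_ID/operations/OPERATION_ID."""
    match = re.search(r"/endpoints/([^/]+)/operations/[^/]+$", operation_name)
    if match is None:
        raise ValueError(f"unexpected operation name: {operation_name!r}")
    return match.group(1)


# Hypothetical sample values, for illustration only.
url, body = build_create_endpoint_request(
    "my-project", "us-central1", "my-endpoint", ["my-project", "consumer-project"]
)
print(url)
print(json.dumps(body, indent=2))
print(endpoint_id_from_operation(
    "projects/123/locations/us-central1/endpoints/456/operations/789"
))
```

Passing the allowlist as a Python list and letting `json.dumps` serialize it avoids hand-quoting each project ID.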
Python
Before trying this sample, follow the Python setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Python API reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
from google.cloud import aiplatform

PROJECT_ID = "VERTEX_AI_PROJECT_ID"
REGION = "REGION"
VERTEX_AI_ENDPOINT_NAME = "VERTEX_AI_ENDPOINT_NAME"

aiplatform.init(project=PROJECT_ID, location=REGION)

# Create the online prediction endpoint with Private Service Connect enabled
psc_endpoint = aiplatform.PrivateEndpoint.create(
    display_name=VERTEX_AI_ENDPOINT_NAME,
    project=PROJECT_ID,
    location=REGION,
    private_service_connect_config=aiplatform.PrivateEndpoint.PrivateServiceConnectConfig(
        project_allowlist=["ALLOWED_PROJECTS"],
    ),
)
Replace the following:
- VERTEX_AI_PROJECT_ID: the ID of the Google Cloud project where you're creating the online prediction endpoint
- REGION: the region where you're using Vertex AI
- VERTEX_AI_ENDPOINT_NAME: the display name for the online prediction endpoint
- ALLOWED_PROJECTS: a comma-separated list of Google Cloud project IDs, each enclosed in quotation marks. For example, ["PROJECTID1", "PROJECTID2"]. If a project isn't contained in this list, you won't be able to send prediction requests to the Vertex AI endpoint from it. Make sure to include VERTEX_AI_PROJECT_ID in this list so that you can call the endpoint from the same project it's in.
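Because the endpoint's own project must appear in the allowlist, it can help to build the list programmatically. The following is a minimal sketch: the helper name `build_project_allowlist` and the project IDs are hypothetical, and the resulting list is the kind of value you would pass as project_allowlist.

```python
def build_project_allowlist(vertex_ai_project_id, consumer_project_ids):
    """Return a de-duplicated allowlist that always contains the endpoint's own project."""
    allowlist = [vertex_ai_project_id]
    for project_id in consumer_project_ids:
        if project_id not in allowlist:
            allowlist.append(project_id)
    return allowlist


# Hypothetical project IDs, for illustration only.
allowlist = build_project_allowlist(
    "vertex-project", ["consumer-a", "vertex-project", "consumer-b"]
)
print(allowlist)
```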
Make a note of the ENDPOINT_ID at the end of the returned endpoint URI:
INFO:google.cloud.aiplatform.models:To use this PrivateEndpoint in another session:
INFO:google.cloud.aiplatform.models:endpoint = aiplatform.PrivateEndpoint('projects/VERTEX_AI_PROJECT_ID/locations/REGION/endpoints/ENDPOINT_ID')
Deploy the model
After you create your online prediction endpoint with Private Service Connect enabled, deploy your model to it, following the steps outlined in Deploy a model to an endpoint.
Get the service attachment URI
When you deploy your model, a service attachment is created for the online prediction endpoint. This service attachment represents the Vertex AI online prediction service that's being exposed to your VPC network. Run the gcloud ai endpoints describe command to get the service attachment URI.
List only the serviceAttachment value from the endpoint details:
gcloud ai endpoints describe ENDPOINT_ID \
  --project=VERTEX_AI_PROJECT_ID \
  --region=REGION \
  | grep -i serviceAttachment
Replace the following:
- ENDPOINT_ID: the ID of your online prediction endpoint
- VERTEX_AI_PROJECT_ID: the ID of the Google Cloud project where you created your online prediction endpoint
- REGION: the region for this request
The output is similar to the following:
serviceAttachment: projects/ac74a9f84c2e5f2a1-tp/regions/us-central1/serviceAttachments/gkedpm-c6e6a854a634dc99472bb802f503c1
Make a note of the entire string in the serviceAttachment field. This is the service attachment URI.
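If you capture the command output in a script, the URI can be extracted and split into its parts. The sketch below assumes output in the "serviceAttachment: projects/.../regions/.../serviceAttachments/..." form shown above. The function name and the regular expression are illustrative and aren't part of any Google Cloud library.

```python
import re

SERVICE_ATTACHMENT_PATTERN = re.compile(
    r"projects/(?P<project>[^/\s]+)/regions/(?P<region>[^/\s]+)"
    r"/serviceAttachments/(?P<name>[^/\s]+)"
)


def parse_service_attachment(describe_output):
    """Find the serviceAttachment URI in describe output and split it into parts."""
    for line in describe_output.splitlines():
        key, _, value = line.strip().partition(":")
        if key.strip().lower() != "serviceattachment":
            continue
        match = SERVICE_ATTACHMENT_PATTERN.fullmatch(value.strip())
        if match:
            return {"uri": match.group(0), **match.groupdict()}
    raise ValueError("no serviceAttachment field found")


# Sample line taken from the example output above.
sample = (
    "serviceAttachment: projects/ac74a9f84c2e5f2a1-tp/regions/us-central1/"
    "serviceAttachments/gkedpm-c6e6a854a634dc99472bb802f503c1"
)
attachment = parse_service_attachment(sample)
print(attachment["uri"])
print(attachment["region"])
```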
Create a forwarding rule
You can reserve an internal IP address and create a forwarding rule with that address. To create the forwarding rule, you need the service attachment URI from the previous step.
To reserve an internal IP address for the forwarding rule, use the gcloud compute addresses create command:
gcloud compute addresses create ADDRESS_NAME \
  --project=VPC_PROJECT_ID \
  --region=REGION \
  --subnet=SUBNETWORK \
  --addresses=INTERNAL_IP_ADDRESS
Replace the following:
- ADDRESS_NAME: a name for the internal IP address
- VPC_PROJECT_ID: the ID of the Google Cloud project that hosts your VPC network. If your online prediction endpoint and your Private Service Connect forwarding rule are hosted in the same project, use VERTEX_AI_PROJECT_ID for this parameter.
- REGION: the Google Cloud region where the Private Service Connect forwarding rule is to be created
- SUBNETWORK: the name of the VPC subnet that contains the IP address
- INTERNAL_IP_ADDRESS: the internal IP address to reserve. This parameter is optional.
  - If this parameter is specified, the IP address must be within the subnet's primary IP address range. The IP address can be an RFC 1918 address or an address from a subnet with non-RFC ranges.
  - If this parameter is omitted, an internal IP address is allocated automatically.
  - For more information, see Reserve a new static internal IPv4 or IPv6 address.
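Before reserving a specific address, you can check locally that it falls inside the subnet's primary range. This is a minimal sketch using Python's standard ipaddress module. The subnet range and addresses are hypothetical, and the check covers range membership only. It doesn't know which addresses in the subnet are already in use or reserved.

```python
import ipaddress


def is_in_subnet_primary_range(internal_ip_address, subnet_primary_range):
    """Return True if the address falls inside the subnet's primary CIDR range."""
    address = ipaddress.ip_address(internal_ip_address)
    network = ipaddress.ip_network(subnet_primary_range)
    return address in network


# Hypothetical subnet range and candidate addresses.
inside = is_in_subnet_primary_range("10.128.0.25", "10.128.0.0/20")
outside = is_in_subnet_primary_range("10.200.0.25", "10.128.0.0/20")
print(inside, outside)
```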
To verify that the IP address is reserved, use the gcloud compute addresses list command:
gcloud compute addresses list --filter="name=(ADDRESS_NAME)" \
  --project=VPC_PROJECT_ID
In the response, verify that a RESERVED status appears for the IP address.
To create the forwarding rule and point it to the online prediction service attachment, use the gcloud compute forwarding-rules create command:
gcloud compute forwarding-rules create PSC_FORWARDING_RULE_NAME \
  --address=ADDRESS_NAME \
  --project=VPC_PROJECT_ID \
  --region=REGION \
  --network=VPC_NETWORK_NAME \
  --target-service-attachment=SERVICE_ATTACHMENT_URI
Replace the following:
- PSC_FORWARDING_RULE_NAME: a name for the forwarding rule
- VPC_NETWORK_NAME: the name of the VPC network where the endpoint is to be created
- SERVICE_ATTACHMENT_URI: the service attachment that you made a note of earlier
To verify that the service attachment accepts the endpoint, use the gcloud compute forwarding-rules describe command:
gcloud compute forwarding-rules describe PSC_FORWARDING_RULE_NAME \
  --project=VPC_PROJECT_ID \
  --region=REGION
In the response, verify that an ACCEPTED status appears in the pscConnectionStatus field.
Optional: Get the internal IP address
If you didn't specify a value for INTERNAL_IP_ADDRESS when you reserved the IP address, you can get the address that was allocated automatically by using the gcloud compute forwarding-rules describe command:
gcloud compute forwarding-rules describe PSC_FORWARDING_RULE_NAME \
  --project=VPC_PROJECT_ID \
  --region=REGION \
  | grep -i IPAddress
Replace the following:
- VPC_PROJECT_ID: the ID of the Google Cloud project that hosts your VPC network and forwarding rule
- REGION: the region name for this request
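Both checks on the forwarding rule (the pscConnectionStatus value and the allocated IP address) can also be read from structured output instead of grep. The sketch below assumes you ran the describe command with gcloud's --format=json flag and captured the output as a string. The function name and the trimmed sample are hypothetical.

```python
import json


def summarize_forwarding_rule(describe_json):
    """Return (accepted, ip_address) from forwarding-rule describe output in JSON form."""
    rule = json.loads(describe_json)
    accepted = rule.get("pscConnectionStatus") == "ACCEPTED"
    return accepted, rule.get("IPAddress")


# Hypothetical, heavily trimmed describe output.
sample = '{"IPAddress": "10.128.0.25", "pscConnectionStatus": "ACCEPTED"}'
accepted, ip_address = summarize_forwarding_rule(sample)
print(accepted, ip_address)
```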
Get online predictions
Getting online predictions from an endpoint with Private Service Connect is similar to getting online predictions from public endpoints, except for the following considerations:
- The request must be sent from a project that was specified in the projectAllowlist when the online prediction endpoint was created.
- If global access isn't enabled, the request must be sent from the same region.
To get predictions using REST, you must connect using the endpoint's static IP address, unless you create a DNS record for the internal IP address. For example, you must send your predict requests to the following endpoint:
https://INTERNAL_IP_ADDRESS/v1/projects/VERTEX_AI_PROJECT_ID/locations/REGION/endpoints/ENDPOINT_ID:predict
Replace INTERNAL_IP_ADDRESS with the internal IP address that you reserved earlier.
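To avoid hand-editing the URL, you could assemble it from its parts. The following is a minimal sketch of the URL pattern above. The function name `build_predict_url` and the sample values are hypothetical.

```python
def build_predict_url(host, project_id, region, endpoint_id):
    """Build the :predict URL for an endpoint reached through Private Service Connect."""
    return (
        f"https://{host}/v1/projects/{project_id}"
        f"/locations/{region}/endpoints/{endpoint_id}:predict"
    )


# Hypothetical values, for illustration only.
predict_url = build_predict_url("10.128.0.25", "my-project", "us-central1", "1234567890")
print(predict_url)
```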
The following sections provide examples of how you can send the predict request using Python.
First example
import json

import urllib3
from google.cloud import aiplatform

REQUEST_FILE = "PATH_TO_INPUT_FILE"

# Suppress the warning that urllib3 emits for unverified HTTPS requests
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

psc_endpoint = aiplatform.PrivateEndpoint(
    "projects/VERTEX_AI_PROJECT_ID/locations/REGION/endpoints/ENDPOINT_ID"
)

with open(REQUEST_FILE) as json_file:
    data = json.load(json_file)

# Send the request to the forwarding rule's internal IP address
response = psc_endpoint.predict(
    instances=data["instances"], endpoint_override="INTERNAL_IP_ADDRESS"
)
print(response)
Replace PATH_TO_INPUT_FILE with a path to a JSON file containing the request input.
Second example
import json

import google.auth
import google.auth.transport.requests
import requests
import urllib3

urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

REQUEST_FILE = "PATH_TO_INPUT_FILE"

# Programmatically get credentials and generate an access token
creds, project = google.auth.default()
auth_req = google.auth.transport.requests.Request()
creds.refresh(auth_req)
access_token = creds.token
# Note: the credential lives for 1 hour by default
# After expiration, it must be refreshed
# See https://cloud.google.com/docs/authentication/token-types#at-lifetime

with open(REQUEST_FILE) as json_file:
    data = json.load(json_file)

url = "https://INTERNAL_IP_ADDRESS/v1/projects/VERTEX_AI_PROJECT_ID/locations/REGION/endpoints/ENDPOINT_ID:predict"
headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {access_token}",  # Add access token to headers
}
payload = {
    "instances": data["instances"],
}

# verify=False skips TLS certificate verification for this request
response = requests.post(url, headers=headers, json=payload, verify=False)
print(response.json())
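Because the access token expires, a long-running client needs to refresh it before reusing it. The sketch below shows one way to do that, relying on the `valid`, `token`, and `refresh()` members that google-auth credentials expose. The helper name `get_valid_token` and the `FakeCredentials` stand-in are hypothetical, and exist only so that the example runs without real credentials.

```python
def get_valid_token(creds, auth_request):
    """Return an access token, refreshing the credentials first if they aren't valid."""
    if not creds.valid:
        creds.refresh(auth_request)
    return creds.token


class FakeCredentials:
    """Stand-in for google-auth credentials so the sketch runs offline."""

    def __init__(self):
        self.token = None
        self.valid = False
        self.refresh_count = 0

    def refresh(self, request):
        self.refresh_count += 1
        self.token = f"token-{self.refresh_count}"
        self.valid = True


creds = FakeCredentials()
first = get_valid_token(creds, auth_request=None)   # no token yet, so refresh
second = get_valid_token(creds, auth_request=None)  # still valid, no refresh
creds.valid = False                                 # simulate expiry
third = get_valid_token(creds, auth_request=None)   # refreshed again
print(first, second, third)
```

In real use, you would pass the credentials returned by google.auth.default() and a google.auth.transport.requests.Request() object, and call the helper before building the Authorization header for each request.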
Optional: Create a DNS record for the internal IP address
We recommend that you create a DNS record so that you can get online predictions from your endpoint without needing to specify the internal IP address.
For more information, see Other ways to configure DNS.
Create a private DNS zone by using the gcloud dns managed-zones create command. This zone is associated with the VPC network that the forwarding rule was created in.
DNS_NAME_SUFFIX="prediction.p.vertexai.goog."  # DNS names have "." at the end.
gcloud dns managed-zones create ZONE_NAME \
  --project=VPC_PROJECT_ID \
  --dns-name=$DNS_NAME_SUFFIX \
  --networks=VPC_NETWORK_NAME \
  --visibility=private \
  --description="A DNS zone for Vertex AI endpoints using Private Service Connect."
Replace the following:
- ZONE_NAME: the name of the DNS zone
To create a DNS record in the zone, use the gcloud dns record-sets create command:
DNS_NAME=ENDPOINT_ID.REGION-VERTEX_AI_PROJECT_NUMBER.$DNS_NAME_SUFFIX
gcloud dns record-sets create $DNS_NAME \
  --rrdatas=INTERNAL_IP_ADDRESS \
  --zone=ZONE_NAME \
  --type=A \
  --ttl=60 \
  --project=VPC_PROJECT_ID
Replace the following:
- VERTEX_AI_PROJECT_NUMBER: the project number for your VERTEX_AI_PROJECT_ID project. You can locate this project number in the Google Cloud console. For more information, see Identifying projects.
- INTERNAL_IP_ADDRESS: the internal IP address of your online prediction endpoint
Now you can send your predict requests to:
https://ENDPOINT_ID.REGION-VERTEX_AI_PROJECT_NUMBER.prediction.p.vertexai.goog/v1/projects/VERTEX_AI_PROJECT_ID/locations/REGION/endpoints/ENDPOINT_ID:predict
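The naming pattern for the DNS record and the resulting URL can be expressed as a small helper. This is a sketch under the pattern shown above. The function names and sample values are hypothetical. The DNS record name carries a trailing dot, and the host in the URL doesn't.

```python
DNS_NAME_SUFFIX = "prediction.p.vertexai.goog"


def endpoint_dns_name(endpoint_id, region, project_number):
    """Return ENDPOINT_ID.REGION-PROJECT_NUMBER.<suffix> without a trailing dot."""
    return f"{endpoint_id}.{region}-{project_number}.{DNS_NAME_SUFFIX}"


def dns_predict_url(endpoint_id, region, project_number, project_id):
    """Build the :predict URL that uses the DNS name instead of the internal IP address."""
    host = endpoint_dns_name(endpoint_id, region, project_number)
    return (
        f"https://{host}/v1/projects/{project_id}"
        f"/locations/{region}/endpoints/{endpoint_id}:predict"
    )


# Hypothetical values, for illustration only.
record_name = endpoint_dns_name("1234567890", "us-central1", "987654321") + "."
dns_url = dns_predict_url("1234567890", "us-central1", "987654321", "my-project")
print(record_name)
print(dns_url)
```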
The following is an example of how you can send the predict request to the DNS zone using Python:
import json

import urllib3

REQUEST_FILE = "PATH_TO_INPUT_FILE"

urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

with open(REQUEST_FILE) as json_file:
    data = json.load(json_file)

# psc_endpoint is the aiplatform.PrivateEndpoint object created earlier
response = psc_endpoint.predict(
    instances=data["instances"], endpoint_override="DNS_NAME"
)
print(response)
Replace DNS_NAME with the DNS name that you specified in the gcloud dns record-sets create command.
Limitations
Vertex AI endpoints with Private Service Connect are subject to the following limitations:
- Private egress from within the endpoint isn't supported. Because Private Service Connect forwarding rules are unidirectional, other private Google Cloud workloads aren't accessible inside your container.
- An endpoint's projectAllowlist configuration can't be changed.
- Access logging isn't supported.
- Request and response logging isn't supported.
- Vertex Explainable AI isn't supported.
Preview limitations
In the Preview, the following additional limitations apply:
- If you undeploy all Private Service Connect models and redeploy them, you must recreate the forwarding rule, even if the service attachment name is the same.
- All endpoints must have the same projectAllowlist configuration.