The information in this page applies to custom-trained models and AutoML models. For Model Garden deployment, see Use models in Model Garden.
Private Service Connect lets you deploy your custom-trained Vertex AI model and serve online inferences securely to multiple consumer projects and VPC networks without the need for public IP addresses, public internet access, or an explicitly peered internal IP address range.
We recommend Private Service Connect for online inference use cases that have the following requirements:
- Require private and secure connections
- Require low latency
- Don't need to be publicly accessible
Private Service Connect uses a forwarding rule in your VPC network to send traffic unidirectionally to the Vertex AI online inference service. The forwarding rule connects to a service attachment that exposes the Vertex AI service to your VPC network. For more information, see About accessing Vertex AI services through Private Service Connect. To learn more about setting up Private Service Connect, see the Private Service Connect overview in the Virtual Private Cloud (VPC) documentation.
Dedicated private endpoints support both HTTP and gRPC communication protocols. For gRPC requests, the x-vertex-ai-endpoint-id header must be included for proper endpoint identification. The following APIs are supported:
- Predict
- RawPredict
- StreamRawPredict
- Chat Completion (Model Garden only)
You can send online inference requests to a dedicated private endpoint by using the Vertex AI SDK for Python. For details, see Get online inferences.
Required roles
To get the permission that you need to create a Private Service Connect endpoint, ask your administrator to grant you the Vertex AI User (roles/aiplatform.user) IAM role on your project. This predefined role contains the aiplatform.endpoints.create permission, which is required to create a Private Service Connect endpoint.
For more information about granting roles, see Manage access to projects, folders, and organizations.
You might also be able to get this permission with custom roles or other predefined roles.
For more information about Vertex AI roles and permissions, see Vertex AI access control with IAM and Vertex AI IAM permissions.
Create the online inference endpoint
Use one of the following methods to create an online inference endpoint with Private Service Connect enabled.
The default request timeout for a Private Service Connect endpoint is 10 minutes. In the Vertex AI SDK for Python, you can optionally set a different request timeout by passing a new inference_timeout value, as shown in the following example. The maximum timeout value is 3600 seconds (1 hour).
Console
- In the Google Cloud console, in Vertex AI, go to the Online prediction page. 
- Click Create. 
- Provide a display name for the endpoint. 
- Select Private. 
- Select Private Service Connect. 
- Click Select project IDs. 
- Select projects to add to the allowlist for the endpoint. 
- Click Continue. 
- Choose your model specifications. For more information, see Deploy a model to an endpoint. 
- Click Create to create your endpoint and deploy your model to it. 
- Make a note of the endpoint ID in the response. 
API
REST
Before using any of the request data, make the following replacements:
- VERTEX_AI_PROJECT_ID: the ID of the Google Cloud project where you're creating the online prediction endpoint.
- REGION: the region where you're using Vertex AI.
- VERTEX_AI_ENDPOINT_NAME: the display name for the online prediction endpoint.
- ALLOWED_PROJECTS: a comma-separated list of Google Cloud project IDs, each enclosed in quotation marks, for example, ["PROJECTID1", "PROJECTID2"]. If a project isn't in this list, you can't send prediction requests to the Vertex AI endpoint from it. Make sure to include VERTEX_AI_PROJECT_ID in this list so that you can call the endpoint from the same project it's in.
- INFERENCE_TIMEOUT_SECS: (Optional) the number of seconds for the optional inferenceTimeout field.
HTTP method and URL:
POST https://REGION-aiplatform.googleapis.com/v1/projects/VERTEX_AI_PROJECT_ID/locations/REGION/endpoints
Request JSON body:
{
  "displayName": "VERTEX_AI_ENDPOINT_NAME",
  "privateServiceConnectConfig": {
    "enablePrivateServiceConnect": true,
    "projectAllowlist": ["ALLOWED_PROJECTS"],
    "clientConnectionConfig": {
      "inferenceTimeout": {
        "seconds": INFERENCE_TIMEOUT_SECS
      }
    }
  }
}
Send the request using a tool such as curl.
You should receive a JSON response similar to the following:
{
  "name": "projects/VERTEX_AI_PROJECT_NUMBER/locations/REGION/endpoints/ENDPOINT_ID/operations/OPERATION_ID",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.aiplatform.v1.CreateEndpointOperationMetadata",
    "genericMetadata": {
      "createTime": "2020-11-05T17:45:42.812656Z",
      "updateTime": "2020-11-05T17:45:42.812656Z"
    }
  }
}
Make a note of the ENDPOINT_ID in the returned operation name.
Python
Before trying this sample, follow the Python setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Python API reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Replace the following:
- VERTEX_AI_PROJECT_ID: the ID of the Google Cloud project where you're creating the online inference endpoint
- REGION: the region where you're using Vertex AI
- VERTEX_AI_ENDPOINT_NAME: the display name for the online inference endpoint
- ALLOWED_PROJECTS: a comma-separated list of Google Cloud project IDs, each enclosed in quotation marks, for example, ["PROJECTID1", "PROJECTID2"]. If a project isn't in this list, you can't send inference requests to the Vertex AI endpoint from it. Make sure to include VERTEX_AI_PROJECT_ID in this list so that you can call the endpoint from the same project it's in.
- INFERENCE_TIMEOUT_SECS: (Optional) the number of seconds for the optional inference_timeout value.
PROJECT_ID = "VERTEX_AI_PROJECT_ID"
REGION = "REGION"
VERTEX_AI_ENDPOINT_NAME = "VERTEX_AI_ENDPOINT_NAME"
INFERENCE_TIMEOUT_SECS = INFERENCE_TIMEOUT_SECS  # optional; an integer number of seconds

from google.cloud import aiplatform

aiplatform.init(project=PROJECT_ID, location=REGION)

# Create the PSC-enabled endpoint in the Vertex AI project.
psc_endpoint = aiplatform.PrivateEndpoint.create(
    display_name=VERTEX_AI_ENDPOINT_NAME,
    project=PROJECT_ID,
    location=REGION,
    private_service_connect_config=aiplatform.PrivateEndpoint.PrivateServiceConnectConfig(
        project_allowlist=["ALLOWED_PROJECTS"],
    ),
    inference_timeout=INFERENCE_TIMEOUT_SECS,
)
Make a note of the ENDPOINT_ID at the end of the returned
endpoint URI:
INFO:google.cloud.aiplatform.models:To use this PrivateEndpoint in another session:
INFO:google.cloud.aiplatform.models:endpoint = aiplatform.PrivateEndpoint('projects/VERTEX_AI_PROJECT_ID/locations/REGION/endpoints/ENDPOINT_ID')
Create the online inference endpoint with PSC automation (Preview)
Online inference integrates with service connectivity automation, which lets you configure inference endpoints with PSC automation. This simplifies the process by automatically creating PSC endpoints, and is particularly beneficial for ML developers who lack permissions to create network resources such as forwarding rules within a project.
To get started, your network administrator must establish a service connection
policy. This policy is a one-time
configuration per project and network that lets
Vertex AI (service class gcp-vertexai) generate PSC endpoints
within your projects and networks.
Next, you can create endpoints using the PSC automation configuration and then deploy your models. After deployment completes, the relevant PSC endpoint information is available on the endpoint resource.
Limitations
- VPC Service Controls aren't supported.
- A regional limit of 500 endpoints applies to PSC automation configurations.
- PSC automation results are purged when no model is deployed, or being deployed, to the endpoint. After cleanup, any subsequent model deployment produces new automation results with different IP addresses and forwarding rules.
Create a service connection policy
You must be a network administrator to create the
service connection policy.
A service connection policy is required to let Vertex AI
create PSC endpoints in your networks. Without a valid policy, the automation
fails with a CONNECTION_POLICY_MISSING error.
- Create your service connection policy:
  gcloud network-connectivity service-connection-policies create POLICY_NAME \
      --project=VPC_PROJECT \
      --network=projects/PROJECT_ID/global/networks/NETWORK_NAME \
      --service-class=gcp-vertexai \
      --region=REGION \
      --subnets=PSC_SUBNETS
  Replace the following:
  - POLICY_NAME: a user-specified name for the policy.
  - PROJECT_ID: the ID of the service project where you're creating Vertex AI resources.
  - VPC_PROJECT: the project ID where your client VPC is located. For a single-VPC setup, this is the same as PROJECT_ID. For a Shared VPC setup, this is the VPC host project.
  - NETWORK_NAME: the name of the network to deploy to.
  - REGION: the network's region.
  - PSC_SUBNETS: the Private Service Connect subnets to use.
- View your service connection policy:
  gcloud network-connectivity service-connection-policies list \
      --project=VPC_PROJECT \
      --region=REGION
  For a single-VPC setup, a sample policy looks like the following:
  gcloud network-connectivity service-connection-policies create test-policy \
      --network=default \
      --project=YOUR_PROJECT_ID \
      --region=us-central1 \
      --service-class=gcp-vertexai \
      --subnets=default \
      --psc-connection-limit=500 \
      --description=test
Create the online inference endpoint with PSC automation config
In the PSCAutomationConfig, make sure that the projectId is included in the project allowlist.
REST
Before using any of the request data, make the following replacements:
- REGION: the region where you're using Vertex AI.
- VERTEX_AI_PROJECT_ID: the ID of the Google Cloud project where you're creating the online inference endpoint.
- VERTEX_AI_ENDPOINT_NAME: the display name for the online inference endpoint.
- NETWORK_NAME: the name of the network where the PSC endpoint is created. In the network's full resource name, use the project ID, not the project number.
HTTP method and URL:
POST https://REGION-aiplatform.googleapis.com/v1/projects/VERTEX_AI_PROJECT_ID/locations/REGION/endpoints
Request JSON body:
{
  "displayName": "VERTEX_AI_ENDPOINT_NAME",
  "privateServiceConnectConfig": {
    "enablePrivateServiceConnect": true,
    "projectAllowlist": ["VERTEX_AI_PROJECT_ID"],
    "pscAutomationConfigs": [
      {
        "projectId": "VERTEX_AI_PROJECT_ID",
        "network": "projects/VERTEX_AI_PROJECT_ID/global/networks/NETWORK_NAME"
      }
    ]
  }
}
Send the request using a tool such as curl.
You should receive a JSON response similar to the following:
{
  "name": "projects/VERTEX_AI_PROJECT_NUMBER/locations/REGION/endpoints/ENDPOINT_ID/operations/OPERATION_ID",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.aiplatform.v1.CreateEndpointOperationMetadata",
    "genericMetadata": {
      "createTime": "2020-11-05T17:45:42.812656Z",
      "updateTime": "2020-11-05T17:45:42.812656Z"
    }
  }
}
Make a note of the ENDPOINT_ID in the returned operation name.
Python
Before trying this sample, follow the Python setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Python API reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Replace the following:
- VERTEX_AI_PROJECT_ID: the ID of the Google Cloud project where you're creating the online inference endpoint
- REGION: the region where you're using Vertex AI
- VERTEX_AI_ENDPOINT_NAME: the display name for the online inference endpoint
- NETWORK_NAME: the name of the network where the PSC endpoint is created. In the network's full resource name, use the project ID, not the project number.
PROJECT_ID = "VERTEX_AI_PROJECT_ID"
REGION = "REGION"
VERTEX_AI_ENDPOINT_NAME = "VERTEX_AI_ENDPOINT_NAME"
NETWORK_NAME = "NETWORK_NAME"

from google.cloud import aiplatform

aiplatform.init(project=PROJECT_ID, location=REGION)

# Configure PSC automation for the project and network pair.
config = aiplatform.compat.types.service_networking.PrivateServiceConnectConfig(
    enable_private_service_connect=True,
    project_allowlist=[PROJECT_ID],
    psc_automation_configs=[
        aiplatform.compat.types.service_networking.PSCAutomationConfig(
            project_id=PROJECT_ID,
            network=f"projects/{PROJECT_ID}/global/networks/{NETWORK_NAME}",
        )
    ],
)

psc_endpoint = aiplatform.PrivateEndpoint.create(
    display_name=VERTEX_AI_ENDPOINT_NAME,
    private_service_connect_config=config,
)
Deploy the model
After you create your online inference endpoint with Private Service Connect enabled, deploy your model to it, following the steps outlined in Deploy a model to an endpoint.
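If you use the Vertex AI SDK for Python, deploying to a PSC-enabled endpoint looks the same as deploying to any other endpoint. The following is a minimal sketch; the model resource name and machine type are placeholder assumptions, not values from this page:
from google.cloud import aiplatform

aiplatform.init(project="VERTEX_AI_PROJECT_ID", location="REGION")

# MODEL_ID and ENDPOINT_ID are placeholders for your uploaded model and
# the PSC-enabled endpoint that you created earlier.
model = aiplatform.Model("projects/VERTEX_AI_PROJECT_ID/locations/REGION/models/MODEL_ID")
psc_endpoint = aiplatform.PrivateEndpoint("ENDPOINT_ID")

# machine_type is an example value; choose one that fits your model.
model.deploy(
    endpoint=psc_endpoint,
    machine_type="n1-standard-4",
)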
Create a PSC endpoint manually
Get the service attachment URI
When you deploy your model, a service attachment is created for the online
inference endpoint. This service attachment represents the
Vertex AI online inference service that's being exposed to your
VPC network. Run the
gcloud ai endpoints describe command
to get the service attachment URI.
- List only the serviceAttachment value from the endpoint details:
  gcloud ai endpoints describe ENDPOINT_ID \
      --project=VERTEX_AI_PROJECT_ID \
      --region=REGION \
      | grep -i serviceAttachment
  Replace the following:
  - ENDPOINT_ID: the ID of your online inference endpoint
  - VERTEX_AI_PROJECT_ID: the ID of the Google Cloud project where you created your online inference endpoint
  - REGION: the region for this request
  The output is similar to the following:
  serviceAttachment: projects/ac74a9f84c2e5f2a1-tp/regions/us-central1/serviceAttachments/gkedpm-c6e6a854a634dc99472bb802f503c1
- Make a note of the entire string in the serviceAttachment field. This is the service attachment URI.
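If you prefer the Vertex AI SDK for Python, the service attachment URI should also be readable from the endpoint resource. The following is a minimal sketch; it assumes that each deployed model's private_endpoints field exposes a service_attachment value, which you should verify against your SDK version:
from google.cloud import aiplatform

aiplatform.init(project="VERTEX_AI_PROJECT_ID", location="REGION")
endpoint = aiplatform.PrivateEndpoint("ENDPOINT_ID")

# Print the PSC service attachment URI for each deployed model.
for deployed_model in endpoint.gca_resource.deployed_models:
    print(deployed_model.private_endpoints.service_attachment)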
Create a forwarding rule
You can reserve an internal IP address and create a forwarding rule with that address. To create the forwarding rule, you need the service attachment URI from the previous step.
- To reserve an internal IP address for the forwarding rule, use the gcloud compute addresses create command:
  gcloud compute addresses create ADDRESS_NAME \
      --project=VPC_PROJECT_ID \
      --region=REGION \
      --subnet=SUBNETWORK \
      --addresses=INTERNAL_IP_ADDRESS
  Replace the following:
  - ADDRESS_NAME: a name for the internal IP address
  - VPC_PROJECT_ID: the ID of the Google Cloud project that hosts your VPC network. If your online inference endpoint and your Private Service Connect forwarding rule are hosted in the same project, use VERTEX_AI_PROJECT_ID for this parameter.
  - REGION: the Google Cloud region where the Private Service Connect forwarding rule is to be created
  - SUBNETWORK: the name of the VPC subnet that contains the IP address
  - INTERNAL_IP_ADDRESS: the internal IP address to reserve. This parameter is optional.
    - If this parameter is specified, the IP address must be within the subnet's primary IP address range. The IP address can be an RFC 1918 address or an address in a subnet with non-RFC ranges.
    - If this parameter is omitted, an internal IP address is allocated automatically.
    - For more information, see Reserve a new static internal IPv4 or IPv6 address.
- To verify that the IP address is reserved, use the gcloud compute addresses list command:
  gcloud compute addresses list --filter="name=(ADDRESS_NAME)" \
      --project=VPC_PROJECT_ID
  In the response, verify that a RESERVED status appears for the IP address.
- To create the forwarding rule and point it to the online inference service attachment, use the gcloud compute forwarding-rules create command:
  gcloud compute forwarding-rules create PSC_FORWARDING_RULE_NAME \
      --address=ADDRESS_NAME \
      --project=VPC_PROJECT_ID \
      --region=REGION \
      --network=VPC_NETWORK_NAME \
      --target-service-attachment=SERVICE_ATTACHMENT_URI
  Replace the following:
  - PSC_FORWARDING_RULE_NAME: a name for the forwarding rule
  - VPC_NETWORK_NAME: the name of the VPC network where the endpoint is to be created
  - SERVICE_ATTACHMENT_URI: the service attachment that you made a note of earlier
- To verify that the service attachment accepts the endpoint, use the gcloud compute forwarding-rules describe command:
  gcloud compute forwarding-rules describe PSC_FORWARDING_RULE_NAME \
      --project=VPC_PROJECT_ID \
      --region=REGION
  In the response, verify that an ACCEPTED status appears in the pscConnectionStatus field.
Optional: Get the internal IP address
If you didn't specify a value for INTERNAL_IP_ADDRESS when you
created the forwarding rule, you can get the address
that was allocated automatically by using the
gcloud compute forwarding-rules describe command:
gcloud compute forwarding-rules describe PSC_FORWARDING_RULE_NAME \
--project=VERTEX_AI_PROJECT_ID \
--region=REGION \
| grep -i IPAddress
Replace the following:
- VERTEX_AI_PROJECT_ID: your project ID
- REGION: the region name for this request
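As an alternative to gcloud, you can read the forwarding rule with the google-cloud-compute client library. The following is a minimal sketch; the I_p_address attribute name follows that library's generated naming and is an assumption to verify against your installed version:
from google.cloud import compute_v1

# Fetch the forwarding rule and print its allocated internal IP address.
rule = compute_v1.ForwardingRulesClient().get(
    project="VERTEX_AI_PROJECT_ID",
    region="REGION",
    forwarding_rule="PSC_FORWARDING_RULE_NAME",
)
print(rule.I_p_address)  # assumed attribute name for the rule's IP address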
Optional: Get PSC endpoint from PSC automation result
You can get the generated IP address and forwarding rule from the inference endpoint. Here's an example:
"privateServiceConnectConfig": {
  "enablePrivateServiceConnect": true,
  "projectAllowlist": [
    "your-project-id",
  ],
  "pscAutomationConfigs": [
    {
      "projectId": "your-project-id",
      "network": "projects/your-project-id/global/networks/default",
      "ipAddress": "10.128.15.209",
      "forwardingRule": "https://www.googleapis.com/compute/v1/projects/your-project-id/regions/us-central1/forwardingRules/sca-auto-fr-47b0d6a4-eaff-444b-95e6-e4dc1d10101e",
      "state": "PSC_AUTOMATION_STATE_SUCCESSFUL"
    },
  ]
}
Here are some error handling details:
- Automation failure doesn't affect the outcome of the model deployment.
  - The success or failure of the automation is indicated in the state field.
  - If successful, the IP address and forwarding rule are displayed.
  - If unsuccessful, an error message is displayed.
- Automation configurations are removed when no models are deployed, or being deployed, to the endpoint. This results in a change to the IP address and forwarding rule if a model is deployed later.
- Failed automation doesn't recover on its own. If automation fails, you can still create the PSC endpoint manually; see Create a PSC endpoint manually.
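To read the automation results programmatically, you can inspect the endpoint resource with the Vertex AI SDK for Python. The following is a minimal sketch, assuming psc_automation_configs is populated on the endpoint's privateServiceConnectConfig as in the JSON above:
from google.cloud import aiplatform

aiplatform.init(project="VERTEX_AI_PROJECT_ID", location="REGION")
endpoint = aiplatform.PrivateEndpoint("ENDPOINT_ID")

# Print the automation result for each configured project/network pair.
psc_config = endpoint.gca_resource.private_service_connect_config
for automation in psc_config.psc_automation_configs:
    print(automation.ip_address, automation.forwarding_rule, automation.state)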
Get online inferences
Getting online inferences from an endpoint with Private Service Connect is similar to getting online inferences from public endpoints, except for the following considerations:
- The request must be sent from a project that was specified in the projectAllowlist when the online inference endpoint was created.
- If global access isn't enabled, the request must be sent from the same region.
- Two ports are open: port 443, which uses TLS with a self-signed certificate, and port 80, which doesn't use TLS. Both ports support HTTP and gRPC. All traffic stays on your private network and doesn't traverse the public internet.
- To obtain inferences, a connection must be established using the endpoint's static IP address, unless a DNS record is created for the internal IP address. For example, send predict requests to the following endpoint:
  https://INTERNAL_IP_ADDRESS/v1/projects/VERTEX_AI_PROJECT_ID/locations/REGION/endpoints/ENDPOINT_ID:predict
  Replace INTERNAL_IP_ADDRESS with the internal IP address that you reserved earlier.
- For gRPC requests: to ensure proper endpoint identification, include the x-vertex-ai-endpoint-id header. This is required because endpoint information isn't conveyed in the request path for gRPC communication.
- Secure connections (port 443): when you establish a secure connection over port 443, the server uses a self-signed certificate. To proceed with the connection, use one of the following approaches:
  - Option 1: certificate validation bypass. Configure the client to ignore certificate validation, and establish the connection using either the IP address of the server or a preferred DNS resolution method.
  - Option 2: trust store integration. Obtain the server's self-signed certificate, add it to the local trust store of the client system, and use a DNS name in the format *.prediction.p.vertexai.goog to establish the connection. This method ensures secure communication through certificate validation; a request sketch that validates against the downloaded certificate follows this list. You can write the server's certificate to PSC_CERTIFICATE_FILE.pem with the following command:
    openssl s_client -showcerts -connect INTERNAL_IP_ADDRESS:443 \
        -servername *.prediction.p.vertexai.goog \
        </dev/null 2>/dev/null | sed -n \
        '/-----BEGIN CERTIFICATE-----/,/-----END CERTIFICATE-----/p' > PSC_CERTIFICATE_FILE.pem
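For Option 2, the following is a minimal sketch using the requests library. It assumes that you saved the certificate as PSC_CERTIFICATE_FILE.pem, that a DNS record resolves DNS_NAME to the internal IP address (see the optional DNS section later on this page), and that the example instance matches your model's input schema:
import requests

url = "https://DNS_NAME/v1/projects/VERTEX_AI_PROJECT_ID/locations/REGION/endpoints/ENDPOINT_ID:predict"
headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer ACCESS_TOKEN",  # replace with a valid access token
}
payload = {"instances": [{"feature_1": 1.0}]}  # example instance; replace with your input

# Validate the server against the downloaded self-signed certificate
# instead of disabling TLS verification.
response = requests.post(url, headers=headers, json=payload, verify="PSC_CERTIFICATE_FILE.pem")
print(response.json())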
The following sections provide examples of how you can send the predict request using Python.
First example
from google.cloud import aiplatform
import json
import urllib3

urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

psc_endpoint = aiplatform.PrivateEndpoint("projects/VERTEX_AI_PROJECT_ID/locations/REGION/endpoints/ENDPOINT_ID")

REQUEST_FILE = "PATH_TO_INPUT_FILE"
with open(REQUEST_FILE) as json_file:
    data = json.load(json_file)
    # endpoint_override routes the call to the internal IP address.
    response = psc_endpoint.predict(
        instances=data["instances"], endpoint_override=INTERNAL_IP_ADDRESS
    )
print(response)
Replace PATH_TO_INPUT_FILE with a path to a JSON file
containing the request input.
Second example
import json
import requests
import urllib3
import google.auth
import google.auth.transport.requests

urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

REQUEST_FILE = "PATH_TO_INPUT_FILE"

# Programmatically get credentials and generate an access token.
# Note: the access token lives for 1 hour by default; after expiration,
# it must be refreshed. See
# https://cloud.google.com/docs/authentication/token-types#access-tokens
# for token lifetimes.
creds, project = google.auth.default()
auth_req = google.auth.transport.requests.Request()
creds.refresh(auth_req)
access_token = creds.token

with open(REQUEST_FILE) as json_file:
    data = json.load(json_file)

url = "https://INTERNAL_IP_ADDRESS/v1/projects/VERTEX_AI_PROJECT_ID/locations/REGION/endpoints/ENDPOINT_ID:predict"
headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {access_token}",  # Add the access token to the headers
}
payload = {
    "instances": data["instances"],
}

response = requests.post(url, headers=headers, json=payload, verify=False)
print(response.json())
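Third example
The Predict gRPC method is also reachable through the forwarding rule. The following is a minimal sketch using the aiplatform_v1 gRPC transport; the required part is the x-vertex-ai-endpoint-id metadata header, while the plaintext port 80 channel, the authorization metadata, and the example instance are assumptions for illustration:
import grpc
import google.auth
import google.auth.transport.requests
from google.cloud import aiplatform_v1
from google.cloud.aiplatform_v1.services.prediction_service.transports import (
    PredictionServiceGrpcTransport,
)
from google.protobuf import json_format
from google.protobuf.struct_pb2 import Value

# Obtain an access token; a plain gRPC channel doesn't attach credentials
# automatically, so pass the token as authorization metadata.
creds, _ = google.auth.default()
creds.refresh(google.auth.transport.requests.Request())

# Port 80 carries gRPC without TLS; port 443 works with the self-signed certificate.
channel = grpc.insecure_channel("INTERNAL_IP_ADDRESS:80")
client = aiplatform_v1.PredictionServiceClient(
    transport=PredictionServiceGrpcTransport(channel=channel)
)

endpoint_name = "projects/VERTEX_AI_PROJECT_ID/locations/REGION/endpoints/ENDPOINT_ID"
instance = json_format.ParseDict({"feature_1": 1.0}, Value())  # example input

response = client.predict(
    endpoint=endpoint_name,
    instances=[instance],
    metadata=[
        ("x-vertex-ai-endpoint-id", "ENDPOINT_ID"),  # required for gRPC routing
        ("authorization", f"Bearer {creds.token}"),
    ],
)
print(response)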
Optional: Create a DNS record for the internal IP address
We recommend that you create a DNS record so that you can get online inferences from your endpoint without needing to specify the internal IP address.
For more information, see Other ways to configure DNS.
- Create a private DNS zone by using the gcloud dns managed-zones create command. This zone is associated with the VPC network that the forwarding rule was created in.
  DNS_NAME_SUFFIX="prediction.p.vertexai.goog."  # DNS names have "." at the end.
  gcloud dns managed-zones create ZONE_NAME \
      --project=VPC_PROJECT_ID \
      --dns-name=$DNS_NAME_SUFFIX \
      --networks=VPC_NETWORK_NAME \
      --visibility=private \
      --description="A DNS zone for Vertex AI endpoints using Private Service Connect."
  Replace ZONE_NAME with the name of the DNS zone.
- To create a DNS record in the zone, use the gcloud dns record-sets create command:
  DNS_NAME=ENDPOINT_ID-REGION-VERTEX_AI_PROJECT_NUMBER.$DNS_NAME_SUFFIX
  gcloud dns record-sets create $DNS_NAME \
      --rrdatas=INTERNAL_IP_ADDRESS \
      --zone=ZONE_NAME \
      --type=A \
      --ttl=60 \
      --project=VPC_PROJECT_ID
  Replace the following:
  - VERTEX_AI_PROJECT_NUMBER: the project number for your VERTEX_AI_PROJECT_ID project. You can locate this project number in the Google Cloud console. For more information, see Identifying projects.
  - INTERNAL_IP_ADDRESS: the internal IP address of your online inference endpoint
  Now you can send your predict requests to:
  https://ENDPOINT_ID-REGION-VERTEX_AI_PROJECT_NUMBER.prediction.p.vertexai.goog/v1/projects/VERTEX_AI_PROJECT_ID/locations/REGION/endpoints/ENDPOINT_ID:predict
The following is an example of how you can send the predict request to the DNS zone using Python:
from google.cloud import aiplatform
import json
import urllib3

urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

psc_endpoint = aiplatform.PrivateEndpoint("projects/VERTEX_AI_PROJECT_ID/locations/REGION/endpoints/ENDPOINT_ID")

REQUEST_FILE = "PATH_TO_INPUT_FILE"
with open(REQUEST_FILE) as json_file:
    data = json.load(json_file)
    # endpoint_override routes the call to the DNS name instead of the IP address.
    response = psc_endpoint.predict(
        instances=data["instances"], endpoint_override=DNS_NAME
    )
print(response)
Replace DNS_NAME with the DNS name that you specified in the
gcloud dns record-sets create command.
Limitations
Vertex AI endpoints with Private Service Connect are subject to the following limitations:
- Deployment of tuned Gemini models isn't supported.
- Private egress from within the endpoint isn't supported. Because Private Service Connect forwarding rules are unidirectional, other private Google Cloud workloads aren't accessible from inside your container.
- An endpoint's projectAllowlist value can't be changed.
- Vertex Explainable AI isn't supported.
- Before you delete an endpoint, you must undeploy your model from that endpoint.
- If all models are undeployed for more than 10
minutes, the service attachment might be deleted. Check the
Private Service Connect connection status;
if it's CLOSED, recreate the forwarding rule.
- After you've deleted your endpoint, you won't be able to reuse that endpoint name for up to 7 days.
- A project can have up to 10 different projectAllowlist values in its Private Service Connect configurations.