This quickstart guides the Application Operator (AO) through the process of using the Vertex AI Online Predictions API on Google Distributed Cloud (GDC) air-gapped.
Before you begin
Before trying online predictions, perform the following steps:
- Create and train a prediction model targeting one of the supported containers.
- If you don't have a project, work with your Platform Administrator (PA) to create one.
- Work with your Infrastructure Operator (IO) to ensure the Prediction user cluster exists and your user project allows incoming external traffic.
- Export your model artifacts for prediction.
- Deploy your model to an endpoint.
- Format your input for online prediction.
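As an illustration of the expected input format, a request body for online prediction wraps the input rows in an `instances` array. The feature values below are hypothetical placeholders, not values from a real model:

```json
{
  "instances": [
    [25.6, 13.1, 0.72, 4],
    [18.2, 9.4, 0.33, 2]
  ]
}
```

Save the request body to a JSON file; you reference that file when you send the prediction request.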
Get an authentication token
You must get a token to authenticate requests to the Online Prediction service. This step is necessary if you use the `curl` tool to make requests.
Follow these steps to get an authentication token:
gdcloud CLI
Export the identity token for the specified account to an environment variable:

```shell
export TOKEN="$($HOME/gdcloud auth print-identity-token --audiences=https://ENDPOINT)"
```

Replace `ENDPOINT` with the Online Predictions endpoint. For more information, view service statuses and endpoints.
Python
Install the `google-auth` client library:

```shell
pip install google-auth
```

Add the following code to a Python script:

```python
import google.auth
from google.auth.transport import requests

api_endpoint = "https://ENDPOINT"

creds, project_id = google.auth.default()
creds = creds.with_gdch_audience(api_endpoint)

def test_get_token():
    req = requests.Request()
    creds.refresh(req)
    print(creds.token)

if __name__ == "__main__":
    test_get_token()
```
Replace `ENDPOINT` with the Online Predictions endpoint that you use for your organization. For more information, view service status and endpoints.

Save the Python script with a name such as `prediction.py`.

Run the Python script to fetch the token:

```shell
python SCRIPT_NAME
```

Replace `SCRIPT_NAME` with the name you gave to your Python script, such as `prediction.py`.
The output shows the authentication token.
Add the token to the header of the `curl` requests you make, as in the following example:

```
-H "Authorization: Bearer TOKEN"
```
Send an online prediction request
Send an online prediction request to the model's endpoint URL using HTTP or gRPC.
HTTP
The following example uses HTTP to send an online prediction request.
Use the `curl` tool to call the HTTP endpoint. For example:

```shell
curl -X POST \
  -H "Content-Type: application/json; charset=utf-8" \
  -H "Authorization: Bearer TOKEN" \
  "https://ENDPOINT_URL_PATH.GDC_URL:443/v1/model:predict" \
  -d @JSON_FILE_NAME.json
```

The output is similar to the following:

```json
{
  "predictions": [[-357.10849], [-171.621658]]
}
```
Replace the following:

- `ENDPOINT_URL_PATH`: the endpoint URL path for the online prediction request.
- `GDC_URL`: the URL of your organization in Distributed Cloud, for example, `org-1.zone1.gdch.test`.
- `JSON_FILE_NAME`: the name of the JSON file with the request body details for your online prediction.
- `TOKEN`: the authentication token you obtained.
The output follows the command, and the API response is in JSON format.
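If you prefer Python to `curl`, the same HTTP call can be sketched with the standard library. The function names and parameter values below are illustrative placeholders, not part of the documented API:

```python
import json
import urllib.request

def build_predict_request(endpoint_url_path, gdc_url, token):
    """Assemble the URL and headers for the v1 model:predict HTTP call."""
    url = f"https://{endpoint_url_path}.{gdc_url}:443/v1/model:predict"
    headers = {
        "Content-Type": "application/json; charset=utf-8",
        "Authorization": f"Bearer {token}",
    }
    return url, headers

def predict(endpoint_url_path, gdc_url, token, json_file_name):
    """POST the JSON request body and return the parsed JSON response."""
    url, headers = build_predict_request(endpoint_url_path, gdc_url, token)
    with open(json_file_name, "rb") as f:
        body = f.read()
    req = urllib.request.Request(url, data=body, headers=headers, method="POST")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

For example, `predict("my-endpoint-path", "org-1.zone1.gdch.test", token, "request.json")` returns the parsed prediction response as a Python dictionary.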
gRPC
The following example uses gRPC to send an online prediction request:
Install the `google-cloud-aiplatform` Python client library by following the instructions from Install Vertex AI client libraries. When downloading the client library, choose one of the following library files, depending on your operating system:

- CentOS: `centos-google-cloud-aiplatform-1.34.0.tar.gz`
- Ubuntu: `ubuntu-google-cloud-aiplatform-1.34.0.tar.gz`

Use the following URL to download the client library:

```
https://GDC_URL/.well-known/static/client-libraries/LIBRARY_FILE
```

Replace the following:

- `GDC_URL`: the URL of your organization in Distributed Cloud.
- `LIBRARY_FILE`: the name of the library file for your operating system, for example, `ubuntu-google-cloud-aiplatform-1.34.0.tar.gz`.
Save the following code to a Python script:
```python
import json
import os
from typing import Sequence

import google.auth
import grpc
from absl import app
from absl import flags
from google.auth.transport import requests
from google.protobuf import json_format
from google.protobuf.struct_pb2 import Value

from google.cloud.aiplatform_v1.services import prediction_service

_INPUT = flags.DEFINE_string("input", None, "input", required=True)
_HOST = flags.DEFINE_string("host", None, "Prediction endpoint", required=True)
_ENDPOINT_ID = flags.DEFINE_string("endpoint_id", None, "endpoint id", required=True)

os.environ["GRPC_DEFAULT_SSL_ROOTS_FILE_PATH"] = "path-to-ca-cert-file.cert"

# ENDPOINT_RESOURCE_NAME is a placeholder value that doesn't affect prediction behavior.
ENDPOINT_RESOURCE_NAME = "projects/000000000000/locations/us-central1/endpoints/00000000000000"

def get_sts_token(host):
    creds = None
    try:
        creds, _ = google.auth.default()
        creds = creds.with_gdch_audience(host + ":443")
        req = requests.Request()
        creds.refresh(req)
        print("Got token: ")
        print(creds.token)
    except Exception as e:
        print("Caught exception " + str(e))
        raise e
    return creds.token

# predict_client_secure builds a client that requires TLS.
def predict_client_secure(host, token):
    with open(os.environ["GRPC_DEFAULT_SSL_ROOTS_FILE_PATH"], "rb") as f:
        channel_creds = grpc.ssl_channel_credentials(f.read())
    call_creds = grpc.access_token_call_credentials(token)
    creds = grpc.composite_channel_credentials(
        channel_creds,
        call_creds,
    )
    client = prediction_service.PredictionServiceClient(
        transport=prediction_service.transports.grpc.PredictionServiceGrpcTransport(
            channel=grpc.secure_channel(target=host + ":443", credentials=creds)))
    return client

def predict_func(client, instances):
    resp = client.predict(
        endpoint=ENDPOINT_RESOURCE_NAME,
        instances=instances,
        metadata=[("x-vertex-ai-endpoint-id", _ENDPOINT_ID.value)])
    print(resp)

def main(argv: Sequence[str]):
    del argv  # Unused.
    with open(_INPUT.value) as json_file:
        data = json.load(json_file)
        instances = [json_format.ParseDict(s, Value()) for s in data["instances"]]
    token = get_sts_token(_HOST.value)
    client = predict_client_secure(_HOST.value, token)
    predict_func(client=client, instances=instances)

if __name__ == "__main__":
    app.run(main)
```
Make the gRPC call to the prediction server:
```shell
python PYTHON_FILE_NAME.py --input JSON_FILE_NAME.json \
  --host ENDPOINT_URL_PATH.GDC_URL \
  --endpoint_id ENDPOINT_ID
```
Replace the following:

- `PYTHON_FILE_NAME`: the name of the Python file where you saved the script.
- `JSON_FILE_NAME`: the name of the JSON file with the request body details for your online prediction.
- `ENDPOINT_URL_PATH`: the endpoint URL path for the online prediction request.
- `GDC_URL`: the URL of your organization in Distributed Cloud, for example, `org-1.zone1.gdch.test`.
- `ENDPOINT_ID`: the value of the endpoint ID.
If successful, you receive a JSON response similar to one of the responses in Response body examples.
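As a sketch of consuming the response, the `predictions` field parses into one entry per instance in the request. The sample payload below mirrors the response shown in the HTTP example:

```python
import json

# Sample response body, matching the HTTP example output.
response_body = '{"predictions": [[-357.10849], [-171.621658]]}'
response = json.loads(response_body)

# Each entry in "predictions" corresponds to one instance in the request.
for i, prediction in enumerate(response["predictions"]):
    print(f"instance {i}: {prediction[0]}")
```

The exact shape of each prediction entry depends on the model you deployed; a regression model typically returns a single numeric value per instance, as shown here.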