Try Online Predictions

This quickstart guides the Application Operator (AO) through the process of using the Vertex AI Online Predictions API on Google Distributed Cloud (GDC) air-gapped.

Before you begin

Before trying online predictions, perform the following steps:

  1. Create and train a prediction model targeting one of the supported containers.
  2. If you don't have a project, work with your Platform Administrator (PA) to create one.
  3. Work with your Infrastructure Operator (IO) to ensure the Prediction user cluster exists and your user project allows incoming external traffic.
  4. Export your model artifacts for prediction.
  5. Deploy your model to an endpoint.
  6. Format your input for online prediction.
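A request body for online prediction wraps the model inputs in an `instances` array. As a reference for the formatting step above, the following is a minimal, hypothetical example; the actual element shape depends on the model you trained:

```json
{
  "instances": [
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0]
  ]
}
```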

Get an authentication token

You must get a token to authenticate the requests to the Online Prediction service. This step is necessary if you use the curl tool to make requests.

Follow these steps to get an authentication token:

gdcloud CLI

Export the identity token for the specified account to an environment variable:

export TOKEN="$($HOME/gdcloud auth print-identity-token --audiences=https://ENDPOINT)"

Replace ENDPOINT with the Online Predictions endpoint. For more information, view service statuses and endpoints.

Python

  1. Install the google-auth client library:

    pip install google-auth
    
  2. Add the following code to a Python script:

    import google.auth
    from google.auth.transport import requests
    
    api_endpoint = "https://ENDPOINT"
    
    creds, project_id = google.auth.default()
    creds = creds.with_gdch_audience(api_endpoint)
    
    def test_get_token():
      req = requests.Request()
      creds.refresh(req)
      print(creds.token)
    
    if __name__ == "__main__":
      test_get_token()
    

    Replace ENDPOINT with the Online Predictions endpoint that you use for your organization. For more information, view service statuses and endpoints.

  3. Save the Python script with a name such as prediction.py.

  4. Run the Python script to fetch the token:

    python SCRIPT_NAME
    

    Replace SCRIPT_NAME with the name you gave to your Python script, such as prediction.py.

The output shows the authentication token.
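One convenient pattern is to capture the printed token directly into the TOKEN environment variable used in the curl examples that follow. The sketch below simulates the script's output with echo for illustration; in practice, substitute the python command that runs your script:

```shell
# Illustration only: replace the echo with `python prediction.py`
# (the script created above) to capture the real token.
TOKEN="$(echo "example.jwt.token")"
export TOKEN

echo "Authorization: Bearer $TOKEN"
```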

Add the token to the header of the curl requests you make, as in the following example:

-H "Authorization: Bearer TOKEN"
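If you prefer Python to curl, the same header can be attached with the standard library. The sketch below only builds the request object; the URL and payload are placeholders that follow the pattern of the HTTP example in the next section:

```python
import json
import urllib.request

token = "TOKEN"  # the authentication token obtained above

# Placeholder URL following the pattern used in the curl example.
url = "https://ENDPOINT_URL_PATH.GDC_URL:443/v1/model:predict"

# Hypothetical request body; the element shape depends on your model.
payload = {"instances": [[1.0, 2.0, 3.0]]}

request = urllib.request.Request(
    url,
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json; charset=utf-8",
        "Authorization": f"Bearer {token}",
    },
    method="POST",
)

# urllib.request.urlopen(request) would send the call and return the JSON
# response body; it is not executed here because the URL is a placeholder.
print(request.get_header("Authorization"))  # prints "Bearer TOKEN"
```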

Send an online prediction request

Send an online prediction request to the model's endpoint URL using HTTP or gRPC.

HTTP

The following example uses HTTP to send an online prediction request.

Use the curl tool to call the HTTP endpoint. For example:

curl -X POST \
    -H "Content-Type: application/json; charset=utf-8" \
    -H "Authorization: Bearer TOKEN" \
    https://ENDPOINT_URL_PATH.GDC_URL:443/v1/model:predict \
    -d @JSON_FILE_NAME.json

The command returns a response similar to the following:

{
    "predictions": [[-357.10849], [-171.621658]]
}

Replace the following:

  • ENDPOINT_URL_PATH: the endpoint URL path for the online prediction request.
  • GDC_URL: the URL of your organization in Distributed Cloud, for example, org-1.zone1.gdch.test.
  • JSON_FILE_NAME: the name of the JSON file with the request body details for your online prediction.
  • TOKEN: the authentication token you obtained.

The API response is in JSON format.
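You can read the predictions field with any JSON parser. A small sketch using the example response shown above:

```python
import json

# Example response body from the curl call above.
response_body = '{"predictions": [[-357.10849], [-171.621658]]}'

data = json.loads(response_body)
for prediction in data["predictions"]:
    print(prediction[0])
```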

gRPC

The following example uses gRPC to send an online prediction request:

  1. Install the google-cloud-aiplatform Python client library by following the instructions from Install Vertex AI client libraries.

    Choose one of the following library files to download, depending on your operating system:

    • CentOS: centos-google-cloud-aiplatform-1.34.0.tar.gz
    • Ubuntu: ubuntu-google-cloud-aiplatform-1.34.0.tar.gz

    Use the following URL to download the client library:

    https://GDC_URL/.well-known/static/client-libraries/LIBRARY_FILE
    

    Replace the following:

    • GDC_URL: the URL of your organization in Distributed Cloud.
    • LIBRARY_FILE: the name of the library file depending on the operating system, for example, ubuntu-google-cloud-aiplatform-1.34.0.tar.gz.
  2. Save the following code to a Python script:

    import json
    import os
    from typing import Sequence
    
    import grpc
    from absl import app
    from absl import flags
    
    import google.auth
    from google.auth.transport import requests
    from google.protobuf import json_format
    from google.protobuf.struct_pb2 import Value
    from google.cloud.aiplatform_v1.services import prediction_service
    
    _INPUT = flags.DEFINE_string("input", None, "input", required=True)
    _HOST = flags.DEFINE_string("host", None, "Prediction endpoint", required=True)
    _ENDPOINT_ID = flags.DEFINE_string("endpoint_id", None, "endpoint id", required=True)
    
    os.environ["GRPC_DEFAULT_SSL_ROOTS_FILE_PATH"] = "path-to-ca-cert-file.cert"
    
    # ENDPOINT_RESOURCE_NAME is a placeholder value that doesn't affect prediction behavior.
    ENDPOINT_RESOURCE_NAME="projects/000000000000/locations/us-central1/endpoints/00000000000000"
    
    def get_sts_token(host):
      creds = None
      try:
        creds, _ = google.auth.default()
        creds = creds.with_gdch_audience(host+":443")
        req = requests.Request()
        creds.refresh(req)
        print("Got token: ")
        print(creds.token)
      except Exception as e:
        print("Caught exception " + str(e))
        raise e
      return creds.token
    
    # predict_client_secure builds a client that requires TLS
    def predict_client_secure(host, token):
      with open(os.environ["GRPC_DEFAULT_SSL_ROOTS_FILE_PATH"], 'rb') as f:
        channel_creds = grpc.ssl_channel_credentials(f.read())
    
      call_creds = grpc.access_token_call_credentials(token)
    
      creds = grpc.composite_channel_credentials(
        channel_creds,
        call_creds,
      )
    
      client = prediction_service.PredictionServiceClient(
          transport=prediction_service.transports.grpc.PredictionServiceGrpcTransport(
           channel=grpc.secure_channel(target=host+":443", credentials=creds)))
    
      return client
    
    def predict_func(client, instances):
      resp = client.predict(
        endpoint=ENDPOINT_RESOURCE_NAME,
        instances=instances,
        metadata=[("x-vertex-ai-endpoint-id", _ENDPOINT_ID.value)]
      )
      print(resp)
    
    def main(argv: Sequence[str]):
      del argv  # Unused.
      with open(_INPUT.value) as json_file:
          data = json.load(json_file)
          instances = [json_format.ParseDict(s, Value()) for s in data["instances"]]
    
      token = get_sts_token(_HOST.value)
      client = predict_client_secure(_HOST.value, token)
      predict_func(client=client, instances=instances)
    
    if __name__ == "__main__":
      app.run(main)
    
  3. Make the gRPC call to the prediction server:

    python PYTHON_FILE_NAME.py --input JSON_FILE_NAME.json \
        --host ENDPOINT_URL_PATH.GDC_URL \
        --endpoint_id ENDPOINT_ID
    

    Replace the following:

    • PYTHON_FILE_NAME: the name of the Python file where you saved the script.
    • JSON_FILE_NAME: the name of the JSON file with the request body details for your online prediction.
    • ENDPOINT_URL_PATH: the endpoint URL path for the online prediction request.
    • GDC_URL: the URL of your organization in Distributed Cloud, for example, org-1.zone1.gdch.test.
    • ENDPOINT_ID: the value of the endpoint ID.

If successful, you receive a JSON response similar to one of the responses in Response body examples.