This page shows you how to get online (real-time) inferences and explanations from your tabular classification or regression models using the Google Cloud console or the Vertex AI API.
An online inference is a synchronous request as opposed to a batch inference, which is an asynchronous request. Use online inferences when making requests in response to application input or in other situations where you require timely inference.
You must deploy a model to an endpoint before you can use that model to serve online inferences. Deploying a model associates physical resources with the model so it can serve online inferences with low latency.
The topics covered are:
- Deploy a model to an endpoint
- Get an online inference using your deployed model
- Get an online explanation using your deployed model
Before you begin
Before you can get online inferences, you must first train a classification or regression model.
Deploy a model to an endpoint
You can deploy more than one model to an endpoint, and you can deploy a model to more than one endpoint. For more information about options and use cases for deploying models, see About deploying models.
Use one of the following methods to deploy a model:
Google Cloud console
- In the Google Cloud console, in the Vertex AI section, go to the Models page. 
- Click the name of the model you want to deploy to open its details page. 
- Select the Deploy & Test tab.
  - If your model is already deployed to any endpoints, they are listed in the Deploy your model section.
- Click Deploy to endpoint. 
- In the Define your endpoint page, configure as follows:
  - You can choose to deploy your model to a new endpoint or an existing endpoint.
  - To deploy your model to a new endpoint, select Create new endpoint and provide a name for the new endpoint.
  - To deploy your model to an existing endpoint, select Add to existing endpoint and select the endpoint from the drop-down list.
  - You can add more than one model to an endpoint, and you can add a model to more than one endpoint.
- Click Continue.
- In the Model settings page, configure as follows:
  - If you're deploying your model to a new endpoint, accept 100 for the Traffic split. If you're deploying your model to an existing endpoint that has one or more models deployed to it, you must update the Traffic split percentage for the model you are deploying and the already deployed models so that all of the percentages add up to 100%.
  - Enter the Minimum number of compute nodes you want to provide for your model. This is the number of nodes available to this model at all times. You are charged for the nodes used, whether to handle inference load or for standby (minimum) nodes, even without inference traffic. See the pricing page.
  - Select your Machine type. Larger machine resources will increase your inference performance and increase costs.
  - Learn how to change the default settings for inference logging.
  - Click Continue.
- In the Model monitoring page, click Continue. 
- In the Monitoring objectives page, configure as follows:
  - Enter the location of your training data.
  - Enter the name of the target column.
- Click Deploy to deploy your model to the endpoint. 
API
When you deploy a model using the Vertex AI API, complete the following steps:
- Create an endpoint if needed.
- Get the endpoint ID.
- Deploy the model to the endpoint.
Create an endpoint
If you are deploying a model to an existing endpoint, you can skip this step.
gcloud
The following example uses the gcloud ai endpoints create command:
  gcloud ai endpoints create \
    --region=LOCATION_ID \
    --display-name=ENDPOINT_NAME
Replace the following:
- LOCATION_ID: The region where you are using Vertex AI.
- ENDPOINT_NAME: The display name for the endpoint.
The Google Cloud CLI tool might take a few seconds to create the endpoint.
REST
Before using any of the request data, make the following replacements:
- LOCATION_ID: Your region.
- PROJECT_ID: Your project ID.
- ENDPOINT_NAME: The display name for the endpoint.
HTTP method and URL:
POST https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints
Request JSON body:
{
  "display_name": "ENDPOINT_NAME"
}
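As a rough Python counterpart to the request above, the following sketch builds (but does not send) the same POST with the standard library. The helper name `build_create_endpoint_request`, the `access_token` parameter, and the sample values are illustrative assumptions, not part of the Vertex AI API; in practice you would pass a token such as the output of `gcloud auth print-access-token`.

```python
import json
import urllib.request


def build_create_endpoint_request(project_id, location_id, endpoint_name, access_token):
    """Build the endpoint-creation POST request without sending it."""
    url = (
        f"https://{location_id}-aiplatform.googleapis.com/v1/"
        f"projects/{project_id}/locations/{location_id}/endpoints"
    )
    body = json.dumps({"display_name": endpoint_name}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json; charset=utf-8",
        },
    )


# Hypothetical values, for illustration only.
request = build_create_endpoint_request("my-project", "us-central1", "my-endpoint", "TOKEN")
# To actually send it: urllib.request.urlopen(request)
```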
You should receive a JSON response similar to the following:
{
  "name": "projects/PROJECT_NUMBER/locations/LOCATION_ID/endpoints/ENDPOINT_ID/operations/OPERATION_ID",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.aiplatform.v1.CreateEndpointOperationMetadata",
    "genericMetadata": {
      "createTime": "2020-11-05T17:45:42.812656Z",
      "updateTime": "2020-11-05T17:45:42.812656Z"
    }
  }
}
You can poll for the status of the operation until the response includes "done": true.
Java
Before trying this sample, follow the Java setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Java API reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Node.js
Before trying this sample, follow the Node.js setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Node.js API reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Python
To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.
Get the endpoint ID
You need the endpoint ID to deploy the model.
gcloud
The following example uses the gcloud ai endpoints list command:
  gcloud ai endpoints list \
    --region=LOCATION_ID \
    --filter=display_name=ENDPOINT_NAME
Replace the following:
- LOCATION_ID: The region where you are using Vertex AI.
- ENDPOINT_NAME: The display name for the endpoint.
Note the number that appears in the ENDPOINT_ID column. Use this ID in the following step.
REST
Before using any of the request data, make the following replacements:
- LOCATION_ID: The region where you are using Vertex AI.
- PROJECT_ID: Your project ID.
- ENDPOINT_NAME: The display name for the endpoint.
HTTP method and URL:
GET https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints?filter=display_name=ENDPOINT_NAME
You should receive a JSON response similar to the following:
{
  "endpoints": [
    {
      "name": "projects/PROJECT_NUMBER/locations/LOCATION_ID/endpoints/ENDPOINT_ID",
      "displayName": "ENDPOINT_NAME",
      "etag": "AMEw9yPz5pf4PwBHbRWOGh0PcAxUdjbdX2Jm3QO_amguy3DbZGP5Oi_YUKRywIE-BtLx",
      "createTime": "2020-04-17T18:31:11.585169Z",
      "updateTime": "2020-04-17T18:35:08.568959Z"
    }
  ]
}
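The endpoint ID is the last path segment of each `name` value in the response. A minimal sketch for pulling it out of a parsed response follows; the function name and the sample values are hypothetical.

```python
def endpoint_ids_from_list_response(response):
    """Return the trailing ID segment of each endpoint's resource name."""
    ids = []
    for endpoint in response.get("endpoints", []):
        # name has the form projects/.../locations/.../endpoints/ENDPOINT_ID
        ids.append(endpoint["name"].rsplit("/", 1)[-1])
    return ids


# Sample shaped like the response above, with made-up identifiers.
sample = {
    "endpoints": [
        {
            "name": "projects/123/locations/us-central1/endpoints/456",
            "displayName": "my-endpoint",
        }
    ]
}
endpoint_ids = endpoint_ids_from_list_response(sample)
```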
Deploy the model
Select the tab below for your language or environment:
gcloud
The following examples use the gcloud ai endpoints deploy-model command.
The following example deploys a Model to an Endpoint without using GPUs
  to accelerate prediction serving and without splitting traffic between multiple
  DeployedModel resources:
Before using any of the command data below, make the following replacements:
- ENDPOINT_ID: The ID for the endpoint.
- LOCATION_ID: The region where you are using Vertex AI.
- MODEL_ID: The ID for the model to be deployed.
- DEPLOYED_MODEL_NAME: A name for the DeployedModel. You can use the display name of the Model for the DeployedModel as well.
- MACHINE_TYPE: Optional. The machine resources used for each node of this deployment. Its default setting is n1-standard-2. Learn more about machine types.
- MIN_REPLICA_COUNT: The minimum number of nodes for this deployment. The node count can be increased or decreased as required by the inference load, up to the maximum number of nodes and never fewer than this number of nodes. This value must be greater than or equal to 1. If the --min-replica-count flag is omitted, the value defaults to 1.
- MAX_REPLICA_COUNT: The maximum number of nodes for this deployment. The node count can be increased or decreased as required by the inference load, up to this number of nodes and never fewer than the minimum number of nodes. If you omit the --max-replica-count flag, then the maximum number of nodes is set to the value of --min-replica-count.
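The replica-count rules above can be mirrored in a small local pre-check before you run the command. This is only a sketch of the documented defaults (minimum defaults to 1, maximum defaults to the minimum); the function name is hypothetical, and it does not reproduce whatever validation the gcloud CLI performs itself.

```python
def resolve_replica_counts(min_replica_count=None, max_replica_count=None):
    """Apply the documented defaults and bounds for replica counts."""
    # An omitted minimum defaults to 1.
    minimum = 1 if min_replica_count is None else min_replica_count
    if minimum < 1:
        raise ValueError("minimum replica count must be greater than or equal to 1")
    # An omitted maximum is set to the minimum.
    maximum = minimum if max_replica_count is None else max_replica_count
    if maximum < minimum:
        raise ValueError("maximum replica count must not be below the minimum")
    return minimum, maximum
```

For example, `resolve_replica_counts(min_replica_count=2)` yields a fixed-size deployment of two nodes, because the maximum falls back to the minimum.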
Execute the gcloud ai endpoints deploy-model command:
Linux, macOS, or Cloud Shell
gcloud ai endpoints deploy-model ENDPOINT_ID \
  --region=LOCATION_ID \
  --model=MODEL_ID \
  --display-name=DEPLOYED_MODEL_NAME \
  --machine-type=MACHINE_TYPE \
  --min-replica-count=MIN_REPLICA_COUNT \
  --max-replica-count=MAX_REPLICA_COUNT \
  --traffic-split=0=100
Windows (PowerShell)
gcloud ai endpoints deploy-model ENDPOINT_ID `
  --region=LOCATION_ID `
  --model=MODEL_ID `
  --display-name=DEPLOYED_MODEL_NAME `
  --machine-type=MACHINE_TYPE `
  --min-replica-count=MIN_REPLICA_COUNT `
  --max-replica-count=MAX_REPLICA_COUNT `
  --traffic-split=0=100
Windows (cmd.exe)
gcloud ai endpoints deploy-model ENDPOINT_ID ^
  --region=LOCATION_ID ^
  --model=MODEL_ID ^
  --display-name=DEPLOYED_MODEL_NAME ^
  --machine-type=MACHINE_TYPE ^
  --min-replica-count=MIN_REPLICA_COUNT ^
  --max-replica-count=MAX_REPLICA_COUNT ^
  --traffic-split=0=100
Splitting traffic
The --traffic-split=0=100 flag in the preceding examples sends 100% of prediction
traffic that the Endpoint receives to the new DeployedModel, which is
represented by the temporary ID 0. If your Endpoint already has other
DeployedModel resources, then you can split traffic between the new
DeployedModel and the old ones.
For example, to send 20% of traffic to the new DeployedModel and 80% to an older one,
run the following command.
Before using any of the command data below, make the following replacements:
- OLD_DEPLOYED_MODEL_ID: the ID of the existing
    DeployedModel.
Execute the gcloud ai endpoints deploy-model command:
Linux, macOS, or Cloud Shell
gcloud ai endpoints deploy-model ENDPOINT_ID \
  --region=LOCATION_ID \
  --model=MODEL_ID \
  --display-name=DEPLOYED_MODEL_NAME \
  --machine-type=MACHINE_TYPE \
  --min-replica-count=MIN_REPLICA_COUNT \
  --max-replica-count=MAX_REPLICA_COUNT \
  --traffic-split=0=20,OLD_DEPLOYED_MODEL_ID=80
Windows (PowerShell)
gcloud ai endpoints deploy-model ENDPOINT_ID `
  --region=LOCATION_ID `
  --model=MODEL_ID `
  --display-name=DEPLOYED_MODEL_NAME `
  --machine-type=MACHINE_TYPE `
  --min-replica-count=MIN_REPLICA_COUNT `
  --max-replica-count=MAX_REPLICA_COUNT `
  --traffic-split=0=20,OLD_DEPLOYED_MODEL_ID=80
Windows (cmd.exe)
gcloud ai endpoints deploy-model ENDPOINT_ID ^
  --region=LOCATION_ID ^
  --model=MODEL_ID ^
  --display-name=DEPLOYED_MODEL_NAME ^
  --machine-type=MACHINE_TYPE ^
  --min-replica-count=MIN_REPLICA_COUNT ^
  --max-replica-count=MAX_REPLICA_COUNT ^
  --traffic-split=0=20,OLD_DEPLOYED_MODEL_ID=80
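If you generate the --traffic-split value from a script, it can help to check that the percentages add up to 100 before calling gcloud. The sketch below assembles the flag value in the `0=20,OLD_DEPLOYED_MODEL_ID=80` form shown above; the function name and the sample deployed model ID are hypothetical.

```python
def traffic_split_flag(new_model_percent, existing=None):
    """Build a --traffic-split value; key "0" stands for the model being deployed."""
    split = {"0": new_model_percent}
    split.update(existing or {})
    if any(percent < 0 for percent in split.values()):
        raise ValueError("traffic percentages must not be negative")
    total = sum(split.values())
    if total != 100:
        raise ValueError(f"traffic percentages add up to {total}, expected 100")
    return ",".join(f"{model_id}={percent}" for model_id, percent in split.items())


# 20% to the new model, 80% to an existing one (made-up ID).
flag_value = traffic_split_flag(20, {"1234567890": 80})
```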
REST
You use the endpoints.deployModel method to deploy the model.
Before using any of the request data, make the following replacements:
- LOCATION_ID: The region where you are using Vertex AI.
- PROJECT_ID: Your project ID.
- ENDPOINT_ID: The ID for the endpoint.
- MODEL_ID: The ID for the model to be deployed.
- DEPLOYED_MODEL_NAME: A name for the DeployedModel. You can use the display name of the Model for the DeployedModel as well.
- MACHINE_TYPE: Optional. The machine resources used for each node of this deployment. Its default setting is n1-standard-2. Learn more about machine types.
- ACCELERATOR_TYPE: The type of accelerator to be attached to the machine. Optional if ACCELERATOR_COUNT is not specified or is zero. Not recommended for AutoML models or custom-trained models that are using non-GPU images. Learn more.
- ACCELERATOR_COUNT: The number of accelerators for each replica to use. Optional. Should be zero or unspecified for AutoML models or custom-trained models that are using non-GPU images.
- MIN_REPLICA_COUNT: The minimum number of nodes for this deployment. The node count can be increased or decreased as required by the inference load, up to the maximum number of nodes and never fewer than this number of nodes. This value must be greater than or equal to 1.
- MAX_REPLICA_COUNT: The maximum number of nodes for this deployment. The node count can be increased or decreased as required by the inference load, up to this number of nodes and never fewer than the minimum number of nodes.
- REQUIRED_REPLICA_COUNT: Optional. The required number of nodes for this deployment to be marked as successful. Must be greater than or equal to 1 and fewer than or equal to the minimum number of nodes. If not specified, the default value is the minimum number of nodes.
- TRAFFIC_SPLIT_THIS_MODEL: The percentage of the prediction traffic to this endpoint to be routed to the model being deployed with this operation. Defaults to 100. All traffic percentages must add up to 100. Learn more about traffic splits.
- DEPLOYED_MODEL_ID_N: Optional. If other models are deployed to this endpoint, you must update their traffic split percentages so that all percentages add up to 100.
- TRAFFIC_SPLIT_MODEL_N: The traffic split percentage value for the deployed model id key.
- PROJECT_NUMBER: Your project's automatically generated project number.
HTTP method and URL:
POST https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints/ENDPOINT_ID:deployModel
Request JSON body:
{
  "deployedModel": {
    "model": "projects/PROJECT_ID/locations/LOCATION_ID/models/MODEL_ID",
    "displayName": "DEPLOYED_MODEL_NAME",
    "dedicatedResources": {
      "machineSpec": {
        "machineType": "MACHINE_TYPE",
        "acceleratorType": "ACCELERATOR_TYPE",
        "acceleratorCount": "ACCELERATOR_COUNT"
      },
      "minReplicaCount": MIN_REPLICA_COUNT,
      "maxReplicaCount": MAX_REPLICA_COUNT,
      "requiredReplicaCount": REQUIRED_REPLICA_COUNT
    }
  },
  "trafficSplit": {
    "0": TRAFFIC_SPLIT_THIS_MODEL,
    "DEPLOYED_MODEL_ID_1": TRAFFIC_SPLIT_MODEL_1,
    "DEPLOYED_MODEL_ID_2": TRAFFIC_SPLIT_MODEL_2
  }
}
You should receive a JSON response similar to the following:
{
  "name": "projects/PROJECT_ID/locations/LOCATION_ID/endpoints/ENDPOINT_ID/operations/OPERATION_ID",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.aiplatform.v1.DeployModelOperationMetadata",
    "genericMetadata": {
      "createTime": "2020-10-19T17:53:16.502088Z",
      "updateTime": "2020-10-19T17:53:16.502088Z"
    }
  }
}
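The deployModel request body can also be assembled in code. The following is a minimal sketch that leaves out the optional accelerator and requiredReplicaCount fields; the function name and sample values are assumptions for illustration, and the sum-to-100 check simply mirrors the rule stated for traffic splits.

```python
import json


def build_deploy_model_body(project_id, location_id, model_id, display_name,
                            machine_type, min_replica_count, max_replica_count,
                            traffic_split=None):
    """Assemble a deployModel request body without accelerators."""
    # Key "0" refers to the model being deployed by this request.
    split = dict(traffic_split) if traffic_split else {"0": 100}
    if sum(split.values()) != 100:
        raise ValueError("traffic split percentages must add up to 100")
    return {
        "deployedModel": {
            "model": f"projects/{project_id}/locations/{location_id}/models/{model_id}",
            "displayName": display_name,
            "dedicatedResources": {
                "machineSpec": {"machineType": machine_type},
                "minReplicaCount": min_replica_count,
                "maxReplicaCount": max_replica_count,
            },
        },
        "trafficSplit": split,
    }


# Hypothetical identifiers, for illustration only.
deploy_body = build_deploy_model_body(
    "my-project", "us-central1", "987", "my-deployed-model", "n1-standard-2", 1, 2
)
deploy_json = json.dumps(deploy_body, indent=2)
```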
Java
Before trying this sample, follow the Java setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Java API reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Python
To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.
Node.js
Before trying this sample, follow the Node.js setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Node.js API reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Learn how to change the default settings for inference logging.
Get operation status
Some requests start long-running operations that require time to complete. These requests return an operation name, which you can use to view the operation's status or cancel the operation. Vertex AI provides helper methods to make calls against long-running operations. For more information, see Working with long-running operations.
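A polling loop for such an operation might look like the following sketch. It assumes the standard long-running operation shape, in which the returned object carries a `done` flag and, on failure, an `error` field; `get_operation` is a hypothetical callable that you supply to fetch the current operation as a dictionary (for example, by issuing a GET on the operation name).

```python
import time


def wait_for_operation(get_operation, operation_name, poll_seconds=5.0,
                       max_polls=60, sleep=time.sleep):
    """Poll until the operation reports done, then return it."""
    for _ in range(max_polls):
        operation = get_operation(operation_name)
        if operation.get("done"):
            if "error" in operation:
                raise RuntimeError(f"operation failed: {operation['error']}")
            return operation
        sleep(poll_seconds)
    raise TimeoutError(f"{operation_name} did not finish after {max_polls} polls")
```

Passing `sleep` as a parameter keeps the helper easy to exercise with a fake fetcher, without waiting in real time.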
Get an online inference using your deployed model
To make an online inference, submit one or more test items to a model for analysis. The model returns results that are based on your model's objective. Use the Google Cloud console or the Vertex AI API to request an online inference.
Google Cloud console
- In the Google Cloud console, in the Vertex AI section, go to the Models page. 
- From the list of models, click the name of the model to request inferences from. 
- Select the Deploy & Test tab.
- Under the Test your model section, add test items to request an inference. The baseline inference data is filled in for you, or you can enter your own inference data and click Predict.
  - After the inference is complete, Vertex AI returns the results in the console.
API: Classification
gcloud
- Create a file named request.json with the following contents:
  { "instances": [ { PREDICTION_DATA_ROW } ] }
  Replace the following:
  - PREDICTION_DATA_ROW: A JSON object with keys as the feature names and values as the corresponding feature values. For example, for a dataset with a number, an array of strings, and a category, the row of data might look like the following example request:
    "length":3.6, "material":"cotton", "tag_array": ["abc","def"]
    A value must be provided for every feature included in training. The format of the data used for prediction must match the format used for training. Refer to Data format for predictions for details.
- Run the following command:
  gcloud ai endpoints predict ENDPOINT_ID \
    --region=LOCATION_ID \
    --json-request=request.json
  Replace the following:
  - ENDPOINT_ID: The ID for the endpoint.
  - LOCATION_ID: The region where you are using Vertex AI.
REST
You use the endpoints.predict method to request an online inference.
Before using any of the request data, make the following replacements:
- LOCATION_ID: Region where Endpoint is located. For example, us-central1.
- PROJECT_ID: Your project ID.
- ENDPOINT_ID: The ID for the endpoint.
- PREDICTION_DATA_ROW: A JSON object with keys as the feature names and values as the corresponding feature values. For example, for a dataset with a number, an array of strings, and a category, the row of data might look like the following example request:
  "length":3.6, "material":"cotton", "tag_array": ["abc","def"]
  A value must be provided for every feature included in training. The format of the data used for prediction must match the format used for training. Refer to Data format for predictions for details.
- DEPLOYED_MODEL_ID: Output by the predict method, and accepted as input by the explain method. The ID of the model used to generate the inference. If you need to request explanations for a previously requested inference, and you have more than one model deployed, you can use this ID to ensure that the explanations are returned for the same model that provided the previous inference.
HTTP method and URL:
POST https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints/ENDPOINT_ID:predict
Request JSON body:
{
  "instances": [
    {
      PREDICTION_DATA_ROW
    }
  ]
}
To send your request, choose one of these options:
curl
      Save the request body in a file named request.json,
      and execute the following command:
    
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints/ENDPOINT_ID:predict"
PowerShell
      Save the request body in a file named request.json,
      and execute the following command:
    
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints/ENDPOINT_ID:predict" | Select-Object -Expand Content
You should receive a JSON response similar to the following:
{
  "predictions": [
    {
      "scores": [
        0.96771615743637085,
        0.032283786684274673
      ],
      "classes": [
        "0",
        "1"
      ]
    }
  ],
  "deployedModelId": "2429510197"
}
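Because a value must be provided for every feature included in training, it can be useful to check rows locally before writing request.json. The sketch below does this for the example row used on this page; the function name and the `training_features` list are assumptions, since the real feature names come from your own training data.

```python
import json


def build_predict_request(rows, training_features):
    """Return a predict request body after checking each row for missing features."""
    required = set(training_features)
    for index, row in enumerate(rows):
        missing = required - set(row)
        if missing:
            raise ValueError(f"instance {index} is missing features: {sorted(missing)}")
    return {"instances": list(rows)}


# Feature names taken from the example row; yours will differ.
training_features = ["length", "material", "tag_array"]
rows = [{"length": 3.6, "material": "cotton", "tag_array": ["abc", "def"]}]
request_json = json.dumps(build_predict_request(rows, training_features), indent=2)
```

The resulting string can be saved as request.json and passed to the gcloud or curl commands shown earlier.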
   Java
Before trying this sample, follow the Java setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Java API reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Node.js
Before trying this sample, follow the Node.js setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Node.js API reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Python
To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.
API: Regression
gcloud
- Create a file named request.json with the following contents:
  { "instances": [ { PREDICTION_DATA_ROW } ] }
  Replace the following:
  - PREDICTION_DATA_ROW: A JSON object with keys as the feature names and values as the corresponding feature values. For example, for a dataset with two numbers and a category, the row of data might look like the following example request:
    "age":3.6, "sq_ft":5392, "code": "90331"
    A value must be provided for every feature included in training. The format of the data used for prediction must match the format used for training. Refer to Data format for predictions for details.
- Run the following command:
  gcloud ai endpoints predict ENDPOINT_ID \
    --region=LOCATION_ID \
    --json-request=request.json
  Replace the following:
  - ENDPOINT_ID: The ID for the endpoint.
  - LOCATION_ID: The region where you are using Vertex AI.
REST
You use the endpoints.predict method to request an online inference.
Before using any of the request data, make the following replacements:
- LOCATION_ID: Region where Endpoint is located. For example, us-central1.
- PROJECT_ID: Your project ID.
- ENDPOINT_ID: The ID for the endpoint.
- PREDICTION_DATA_ROW: A JSON object with keys as the feature names and values as the corresponding feature values. For example, for a dataset with two numbers and a category, the row of data might look like the following example request:
  "age":3.6, "sq_ft":5392, "code": "90331"
  A value must be provided for every feature included in training. The format of the data used for prediction must match the format used for training. Refer to Data format for predictions for details.
- DEPLOYED_MODEL_ID: Output by the predict method, and accepted as input by the explain method. The ID of the model used to generate the inference. If you need to request explanations for a previously requested inference, and you have more than one model deployed, you can use this ID to ensure that the explanations are returned for the same model that provided the previous inference.
HTTP method and URL:
POST https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints/ENDPOINT_ID:predict
Request JSON body:
{
  "instances": [
    {
      PREDICTION_DATA_ROW
    }
  ]
}
To send your request, choose one of these options:
curl
      Save the request body in a file named request.json,
      and execute the following command:
    
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints/ENDPOINT_ID:predict"
PowerShell
      Save the request body in a file named request.json,
      and execute the following command:
    
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints/ENDPOINT_ID:predict" | Select-Object -Expand Content
You should receive a JSON response similar to the following:
{
  "predictions": [
    [
      {
        "value": 65.14233
      }
    ]
  ],
  "deployedModelId": "DEPLOYED_MODEL_ID"
}
Java
Before trying this sample, follow the Java setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Java API reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Node.js
Before trying this sample, follow the Node.js setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Node.js API reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Python
To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.
Interpret prediction results
Classification
Classification models return a confidence score.
The confidence score communicates how strongly your model associates each class or label with a test item. The higher the number, the higher the model's confidence that the label should be applied to that item. You decide how high the confidence score must be for you to accept the model's results.
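One way to apply such a cutoff is sketched below: pair each class with its score, take the highest-scoring class, and accept it only if it clears a threshold you choose. The function name and the default threshold of 0.5 are assumptions, not recommended values.

```python
def top_class(prediction, threshold=0.5):
    """Return (label, score) for the best class, or (None, score) if below threshold."""
    label, score = max(
        zip(prediction["classes"], prediction["scores"]),
        key=lambda pair: pair[1],
    )
    return (label, score) if score >= threshold else (None, score)


# One entry from the "predictions" list in the classification response shown earlier.
prediction = {
    "scores": [0.96771615743637085, 0.032283786684274673],
    "classes": ["0", "1"],
}
accepted_label, accepted_score = top_class(prediction)
```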
Regression
Regression models return an inference value.
If your model uses probabilistic inference, the value field contains the
minimizer of the optimization objective. For example, if your optimization
objective is minimize-rmse, the value field contains the mean value.
If it is minimize-mae, the value field contains the median value.
If your model uses probabilistic inference with quantiles, Vertex AI provides quantile values and inferences in addition to the minimizer of the optimization objective. Quantile values are set during model training. Quantile inferences are the inference values associated with the quantile values.
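The link between the objective and the returned value (mean for minimize-rmse, median for minimize-mae) can be checked numerically on a small sample. The snippet below is only an illustration of that statistical fact with made-up numbers; it says nothing about how Vertex AI computes the value field.

```python
import statistics


def squared_error(values, guess):
    """Total squared error of a single guess (minimizing this also minimizes RMSE)."""
    return sum((value - guess) ** 2 for value in values)


def absolute_error(values, guess):
    """Total absolute error of a single guess (minimizing this also minimizes MAE)."""
    return sum(abs(value - guess) for value in values)


# Made-up sample with one outlier, so the mean and median differ clearly.
values = [1.0, 2.0, 3.0, 4.0, 100.0]
mean = statistics.mean(values)
median = statistics.median(values)

# Compare against a grid of alternative guesses from 0 to 100 in steps of 0.5.
candidates = [step / 2 for step in range(0, 201)]
mean_is_best_for_rmse = all(
    squared_error(values, mean) <= squared_error(values, c) for c in candidates
)
median_is_best_for_mae = all(
    absolute_error(values, median) <= absolute_error(values, c) for c in candidates
)
```

The outlier pulls the mean well above the median, which is why the two objectives can produce noticeably different inference values for skewed targets.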
Get an online explanation using your deployed model
You can request an inference with explanations (also called feature attributions) to see how your model arrived at an inference. The local feature importance values tell you how much each feature contributed to the inference result. Feature attributions are included in Vertex AI inferences through Vertex Explainable AI.
Console
When you use the Google Cloud console to request an online inference, the local feature importance values are automatically returned.
If you used the pre-filled prediction values, the local feature importance values are all zero. This is because the pre-filled values are the baseline prediction data, so the prediction returned is the baseline prediction value.
gcloud
- Create a file named request.json with the following contents:
  { "instances": [ { PREDICTION_DATA_ROW } ] }
  Replace the following:
  - PREDICTION_DATA_ROW: A JSON object with keys as the feature names and values as the corresponding feature values. For example, for a dataset with a number, an array of strings, and a category, the row of data might look like the following example request:
    "length":3.6, "material":"cotton", "tag_array": ["abc","def"]
    A value must be provided for every feature included in training. The format of the data used for prediction must match the format used for training. Refer to Data format for predictions for details.
- Run the following command:
  gcloud ai endpoints explain ENDPOINT_ID \
    --region=LOCATION_ID \
    --json-request=request.json
  Replace the following:
  - ENDPOINT_ID: The ID for the endpoint.
  - LOCATION_ID: The region where you are using Vertex AI.
  Optionally, if you want to send an explanation request to a specific DeployedModel on the Endpoint, you can specify the --deployed-model-id flag:
  gcloud ai endpoints explain ENDPOINT_ID \
    --region=LOCATION_ID \
    --deployed-model-id=DEPLOYED_MODEL_ID \
    --json-request=request.json
  In addition to the placeholders described previously, replace the following:
  - DEPLOYED_MODEL_ID: Optional. The ID of the deployed model for which you want to get explanations. The ID is included in the predict method's response. If you need to request explanations for a particular model and you have more than one model deployed to the same endpoint, you can use this ID to ensure that the explanations are returned for that particular model.
REST
The following example shows an online inference request for a tabular classification model with local feature attributions. The request format is the same for regression models.
Before using any of the request data, make the following replacements:
- LOCATION: Region where Endpoint is located. For example, us-central1.
- PROJECT: Your project ID.
- ENDPOINT_ID: The ID for the endpoint.
- PREDICTION_DATA_ROW: A JSON object with keys as the feature names and values as the corresponding feature values. For example, for a dataset with a number, an array of strings, and a category, the row of data might look like the following example request:
  "length":3.6, "material":"cotton", "tag_array": ["abc","def"]
  A value must be provided for every feature included in training. The format of the data used for prediction must match the format used for training. Refer to Data format for predictions for details.
- DEPLOYED_MODEL_ID (optional): The ID of the deployed model for which you want to get explanations. The ID is included in the predict method's response. If you need to request explanations for a particular model and you have more than one model deployed to the same endpoint, you can use this ID to ensure that the explanations are returned for that particular model.
HTTP method and URL:
POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT/locations/LOCATION/endpoints/ENDPOINT_ID:explain
Request JSON body:
{
  "instances": [
    {
      PREDICTION_DATA_ROW
    }
  ],
  "deployedModelId": "DEPLOYED_MODEL_ID"
}
To send your request, choose one of these options:
curl
      Save the request body in a file named request.json,
      and execute the following command:
    
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT/locations/LOCATION/endpoints/ENDPOINT_ID:explain"
PowerShell
      Save the request body in a file named request.json,
      and execute the following command:
    
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT/locations/LOCATION/endpoints/ENDPOINT_ID:explain" | Select-Object -Expand Content
Python
To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.
Get explanations for a previously returned prediction
Because explanations increase resource usage, you might want to reserve requesting explanations for situations when you specifically need them. Sometimes, it can be helpful to request explanations for an inference result you've already received, perhaps because the inference was an outlier or did not make sense.
If all of your inferences come from the same model, you can resend the request data with explanations requested this time. However, if you have multiple models returning inferences, you must make sure that you send the explanation request to the correct model. To get explanations from a particular model, include that deployed model's ID, deployedModelId, in your request. This ID is included in the response to the original inference request.
Note that the deployed model ID is different from the model ID.
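A minimal sketch of that flow: keep the deployedModelId from the earlier predict response and copy it into the explain request body, so the explanation comes from the same deployed model. The function name and sample values are hypothetical.

```python
def build_explain_request(instances, predict_response=None):
    """Build an explain request body, pinning it to the model that served the prediction."""
    body = {"instances": list(instances)}
    deployed_model_id = (predict_response or {}).get("deployedModelId")
    if deployed_model_id:
        # Route the explanation to the deployed model that produced the inference.
        body["deployedModelId"] = deployed_model_id
    return body


# Made-up earlier predict response and the row that was sent with it.
earlier_response = {"predictions": [{"value": 65.14233}], "deployedModelId": "2429510197"}
row = {"age": 3.6, "sq_ft": 5392, "code": "90331"}
explain_body = build_explain_request([row], earlier_response)
```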
Interpret explanation results
To calculate local feature importance, first the baseline inference score is calculated. Baseline values are computed from the training data, using the median value for numeric features and the mode for categorical features. The inference generated from the baseline values is the baseline inference score. Baseline values are calculated once for a model and do not change.
For a specific inference, the local feature importance for each feature tells you how much that feature added to or subtracted from the result as compared with the baseline inference score. The sum of all of the feature importance values equals the difference between the baseline inference score and the inference result.
For classification models, the score is always between 0.0 and 1.0, inclusive. Therefore, local feature importance values for classification models are always between -1.0 and 1.0 (inclusive).
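The additive relationship described above can be checked directly on one attribution object: the feature attributions should sum to the instance output minus the baseline output. The sketch below uses the numbers from the regression example on this page; the function name and the tolerance are assumptions, and because responses also report an approximationError, an exact match should not be expected in general.

```python
import math


def attributions_are_consistent(attribution, abs_tol=1e-6):
    """Check that attributions sum to instanceOutputValue minus baselineOutputValue."""
    total = sum(attribution["featureAttributions"].values())
    expected = attribution["instanceOutputValue"] - attribution["baselineOutputValue"]
    return math.isclose(total, expected, abs_tol=abs_tol)


attribution = {
    "baselineOutputValue": 1788.7423095703125,
    "instanceOutputValue": 1795.1246466281819,
    "featureAttributions": {
        "feature_1": 0.123949222312359,
        "feature_2": 0.802123412345569,
        "feature_3": 5.456264423211472,
    },
}
consistent = attributions_are_consistent(attribution)
```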
For examples of feature attribution queries and to learn more, see Feature Attributions for Classification and Regression.
Example output for inferences and explanations
Classification
The return payload for an online inference from a tabular classification model with feature importance looks similar to the following example.
The instanceOutputValue of 0.928652400970459 is the
confidence score of the highest-scoring class, in this case
class_a. The baselineOutputValue field contains
the baseline inference score, 0.808652400970459. The feature that
contributed most strongly to this result was feature_3.
{
  "predictions": [
    {
      "scores": [
        0.928652400970459,
        0.071347599029541
      ],
      "classes": [
        "class_a",
        "class_b"
      ]
    }
  ],
  "explanations": [
    {
      "attributions": [
        {
          "baselineOutputValue": 0.808652400970459,
          "instanceOutputValue": 0.928652400970459,
          "approximationError": 0.0058915703929231,
          "featureAttributions": {
            "feature_1": 0.012394922231235,
            "feature_2": 0.050212341234556,
            "feature_3": 0.057392736534209
          },
          "outputIndex": [
            0
          ],
          "outputName": "scores"
        }
      ]
    }
  ],
  "deployedModelId": "234567"
}
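To find the most influential feature in a payload like this one, you can rank the featureAttributions entries by absolute value. The sketch below reads only the first explanation and first attribution, which is an assumption that fits this single-instance example; the function name is hypothetical.

```python
def strongest_feature(response):
    """Return (feature name, attribution) with the largest absolute attribution."""
    attribution = response["explanations"][0]["attributions"][0]
    features = attribution["featureAttributions"]
    # Use the absolute value so strong negative contributions are also found.
    name = max(features, key=lambda feature: abs(features[feature]))
    return name, features[name]


# Trimmed copy of the classification payload above.
response = {
    "explanations": [
        {
            "attributions": [
                {
                    "featureAttributions": {
                        "feature_1": 0.012394922231235,
                        "feature_2": 0.050212341234556,
                        "feature_3": 0.057392736534209,
                    }
                }
            ]
        }
    ]
}
top_name, top_value = strongest_feature(response)
```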
Regression
The return payload for an online inference with feature importance from a tabular regression model looks similar to the following example.
The instanceOutputValue of 1795.1246466281819 is the
predicted value. The baselineOutputValue field contains
the baseline inference score, 1788.7423095703125. The feature that
contributed most strongly to this result was feature_3.
{
  "predictions": [
    {
      "value": 1795.1246466281819
    }
  ],
  "explanations": [
    {
      "attributions": [
        {
          "baselineOutputValue": 1788.7423095703125,
          "instanceOutputValue": 1795.1246466281819,
          "approximationError": 0.0038215703911553,
          "featureAttributions": {
            "feature_1": 0.123949222312359,
            "feature_2": 0.802123412345569,
            "feature_3": 5.456264423211472
          },
          "outputIndex": [
            -1
          ]
        }
      ]
    }
  ],
  "deployedModelId": "345678"
}
What's next
- Learn how to export your model.
- Learn about pricing for online inferences.
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2025-10-24 UTC.