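Get online inferences and explanations

This page shows you how to get online (real-time) inferences and explanations from your tabular classification or regression models by using the Google Cloud console or the Vertex AI API.

An online inference is a synchronous request, as opposed to a batch inference, which is an asynchronous request. Use online inferences when you make requests in response to application input or in other situations that require timely inference.

You must deploy a model to an endpoint before you can use it to serve online inferences. Deploying a model associates physical resources with the model so that it can serve online inferences with low latency.

This page covers the following topics:

Deploy a model to an endpoint
Get an online inference using your deployed model
Get an online explanation using your deployed model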

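Before you begin

Before you can get online inferences, you must first train a classification or regression model and evaluate it for accuracy.

Deploy a model to an endpoint

You can deploy more than one model to an endpoint, and you can deploy a model to more than one endpoint. For more information about options and use cases for deploying models, see About deploying models.

Use one of the following methods to deploy a model: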

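Google Cloud console

In the Google Cloud console, in the Vertex AI section, go to the Models page.

Go to the Models page
Click the name of the model you want to deploy to open its details page.
Select the Deploy & Test tab.

If your model is already deployed to any endpoints, they are listed in the Deploy your model section.
Click Deploy to endpoint.
On the Define your endpoint page, configure as follows:
1. Choose whether to deploy your model to a new endpoint or an existing endpoint.
  - To deploy your model to a new endpoint, select Create new endpoint and provide a name for the new endpoint.
  - To deploy your model to an existing endpoint, select Add to existing endpoint and select the endpoint from the drop-down list.
  - You can add more than one model to an endpoint, and you can add a model to more than one endpoint.
2. Click Continue.
On the Model settings page, configure as follows:
1. If you're deploying your model to a new endpoint, accept 100 for the Traffic split. If you're deploying your model to an existing endpoint that has one or more models deployed to it, you must update the Traffic split percentage for the model you are deploying and for the already deployed models so that all of the percentages add up to 100%.
2. Enter the Minimum number of compute nodes you want to provide for your model.
  
  This is the number of nodes that are always available to this model. You are charged for the nodes used, whether to handle inference load or for standby (minimum) nodes, even without inference traffic. See the pricing page.
3. Select your Machine type.
  
  Larger machine resources increase your inference performance and increase costs.
4. Learn how to change the default settings for inference logging.
5. Click Continue.
On the Model monitoring page, click Continue.
On the Monitoring objectives page, configure as follows:
1. Enter the location of your training data.
2. Enter the name of the target column.
Click Deploy to deploy your model to the endpoint.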

API

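To deploy a model using the Vertex AI API, complete the following steps:

Create an endpoint if needed.
Get the endpoint ID.
Deploy the model to the endpoint.

Create an endpoint

If you are deploying a model to an existing endpoint, you can skip this step.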

gcloud

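The following example uses the gcloud ai endpoints create command:

  gcloud ai endpoints create \
    --region=LOCATION_ID \
    --display-name=ENDPOINT_NAME

Replace the following:

LOCATION_ID: The region where you are using Vertex AI.
ENDPOINT_NAME: The display name for the endpoint.

The Google Cloud CLI tool might take a few seconds to create the endpoint.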

REST

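Before using any of the request data, make the following replacements:

LOCATION_ID: Your region.
PROJECT_ID: Your project ID.
ENDPOINT_NAME: The display name for the endpoint.

HTTP method and URL:

POST https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints

Request JSON body: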

{
  "display_name": "ENDPOINT_NAME"
}

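To send your request, choose one of these options:

curl (Linux, macOS, or Cloud Shell)

Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login, or by using Cloud Shell, which automatically logs you into the gcloud CLI. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command: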

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints"

PowerShell (Windows)

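Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command: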

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints" | Select-Object -Expand Content

You should receive a JSON response similar to the following:

{
  "name": "projects/PROJECT_NUMBER/locations/LOCATION_ID/endpoints/ENDPOINT_ID/operations/OPERATION_ID",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.aiplatform.v1.CreateEndpointOperationMetadata",
    "genericMetadata": {
      "createTime": "2020-11-05T17:45:42.812656Z",
      "updateTime": "2020-11-05T17:45:42.812656Z"
    }
  }
}

You can poll for the status of the operation until the response includes "done": true.
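The polling loop can be sketched as follows. This is a minimal illustration rather than an official helper: the function name `wait_for_operation`, its parameters, and the injected `get_operation` callable are hypothetical, and the callable stands in for whatever code fetches the operation resource as a dictionary.

```python
import time
from typing import Any, Callable, Dict


def wait_for_operation(
    get_operation: Callable[[], Dict[str, Any]],
    interval_seconds: float = 5.0,
    timeout_seconds: float = 300.0,
) -> Dict[str, Any]:
    """Call get_operation() repeatedly until the result contains "done": true."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        operation = get_operation()
        if operation.get("done"):
            # A finished long-running operation carries either a response or an error.
            if "error" in operation:
                raise RuntimeError(f"Operation failed: {operation['error']}")
            return operation
        if time.monotonic() >= deadline:
            raise TimeoutError("Operation did not finish before the timeout")
        time.sleep(interval_seconds)


# Stand-in for a real status request; it reports completion on the third poll.
_fake_responses = iter([{"name": "op"}, {"name": "op"}, {"name": "op", "done": True}])
finished = wait_for_operation(lambda: next(_fake_responses), interval_seconds=0.01)
print(finished["name"])
```

Injecting the fetch function keeps the loop independent of any particular HTTP client or authentication method.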

Java

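Before trying this sample, follow the Java setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Java API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.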


import com.google.api.gax.longrunning.OperationFuture;
import com.google.cloud.aiplatform.v1.CreateEndpointOperationMetadata;
import com.google.cloud.aiplatform.v1.Endpoint;
import com.google.cloud.aiplatform.v1.EndpointServiceClient;
import com.google.cloud.aiplatform.v1.EndpointServiceSettings;
import com.google.cloud.aiplatform.v1.LocationName;
import java.io.IOException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class CreateEndpointSample {

  public static void main(String[] args)
      throws IOException, InterruptedException, ExecutionException, TimeoutException {
    // TODO(developer): Replace these variables before running the sample.
    String project = "YOUR_PROJECT_ID";
    String endpointDisplayName = "YOUR_ENDPOINT_DISPLAY_NAME";
    createEndpointSample(project, endpointDisplayName);
  }

  static void createEndpointSample(String project, String endpointDisplayName)
      throws IOException, InterruptedException, ExecutionException, TimeoutException {
    EndpointServiceSettings endpointServiceSettings =
        EndpointServiceSettings.newBuilder()
            .setEndpoint("us-central1-aiplatform.googleapis.com:443")
            .build();

    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (EndpointServiceClient endpointServiceClient =
        EndpointServiceClient.create(endpointServiceSettings)) {
      String location = "us-central1";
      LocationName locationName = LocationName.of(project, location);
      Endpoint endpoint = Endpoint.newBuilder().setDisplayName(endpointDisplayName).build();

      OperationFuture<Endpoint, CreateEndpointOperationMetadata> endpointFuture =
          endpointServiceClient.createEndpointAsync(locationName, endpoint);
      System.out.format("Operation name: %s\n", endpointFuture.getInitialFuture().get().getName());
      System.out.println("Waiting for operation to finish...");
      Endpoint endpointResponse = endpointFuture.get(300, TimeUnit.SECONDS);

      System.out.println("Create Endpoint Response");
      System.out.format("Name: %s\n", endpointResponse.getName());
      System.out.format("Display Name: %s\n", endpointResponse.getDisplayName());
      System.out.format("Description: %s\n", endpointResponse.getDescription());
      System.out.format("Labels: %s\n", endpointResponse.getLabelsMap());
      System.out.format("Create Time: %s\n", endpointResponse.getCreateTime());
      System.out.format("Update Time: %s\n", endpointResponse.getUpdateTime());
    }
  }
}

Node.js

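Before trying this sample, follow the Node.js setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Node.js API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.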

/**
 * TODO(developer): Uncomment these variables before running the sample.
 * (Not necessary if passing values as arguments)
 */

// const endpointDisplayName = 'YOUR_ENDPOINT_DISPLAY_NAME';
// const project = 'YOUR_PROJECT_ID';
// const location = 'YOUR_PROJECT_LOCATION';

// Imports the Google Cloud Endpoint Service Client library
const {EndpointServiceClient} = require('@google-cloud/aiplatform');

// Specifies the location of the api endpoint
const clientOptions = {
  apiEndpoint: 'us-central1-aiplatform.googleapis.com',
};

// Instantiates a client
const endpointServiceClient = new EndpointServiceClient(clientOptions);

async function createEndpoint() {
  // Configure the parent resource
  const parent = `projects/${project}/locations/${location}`;
  const endpoint = {
    displayName: endpointDisplayName,
  };
  const request = {
    parent,
    endpoint,
  };

  // Start the long-running operation that creates the endpoint
  const [response] = await endpointServiceClient.createEndpoint(request);
  console.log(`Long running operation : ${response.name}`);

  // Wait for operation to complete
  await response.promise();
  const result = response.result;

  console.log('Create endpoint response');
  console.log(`\tName : ${result.name}`);
  console.log(`\tDisplay name : ${result.displayName}`);
  console.log(`\tDescription : ${result.description}`);
  console.log(`\tLabels : ${JSON.stringify(result.labels)}`);
  console.log(`\tCreate time : ${JSON.stringify(result.createTime)}`);
  console.log(`\tUpdate time : ${JSON.stringify(result.updateTime)}`);
}
createEndpoint();

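Vertex AI SDK for Python

To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Vertex AI SDK for Python API reference documentation.

from google.cloud import aiplatform


def create_endpoint_sample(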
    project: str,
    display_name: str,
    location: str,
):
    aiplatform.init(project=project, location=location)

    endpoint = aiplatform.Endpoint.create(
        display_name=display_name,
        project=project,
        location=location,
    )

    print(endpoint.display_name)
    print(endpoint.resource_name)
    return endpoint

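Get the endpoint ID

You need the endpoint ID to deploy the model.

gcloud

The following example uses the gcloud ai endpoints list command:

  gcloud ai endpoints list \
    --region=LOCATION_ID \
    --filter=display_name=ENDPOINT_NAME

Replace the following:

LOCATION_ID: The region where you are using Vertex AI.
ENDPOINT_NAME: The display name for the endpoint.

Note the number that appears in the ENDPOINT_ID column. Use this ID in the following step.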

REST

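Before using any of the request data, make the following replacements:

LOCATION_ID: The region where you are using Vertex AI.
PROJECT_ID: Your project ID.
ENDPOINT_NAME: The display name for the endpoint.

HTTP method and URL: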

GET https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints?filter=display_name=ENDPOINT_NAME

To send your request, choose one of these options:

curl (Linux, macOS, or Cloud Shell)

Execute the following command:

curl -X GET \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     "https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints?filter=display_name=ENDPOINT_NAME"

PowerShell (Windows)

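Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. You can check the currently active account by running gcloud auth list.

Execute the following command: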

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method GET `
    -Headers $headers `
    -Uri "https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints?filter=display_name=ENDPOINT_NAME" | Select-Object -Expand Content

You should receive a JSON response similar to the following:

{
  "endpoints": [
    {
      "name": "projects/PROJECT_NUMBER/locations/LOCATION_ID/endpoints/ENDPOINT_ID",
      "displayName": "ENDPOINT_NAME",
      "etag": "AMEw9yPz5pf4PwBHbRWOGh0PcAxUdjbdX2Jm3QO_amguy3DbZGP5Oi_YUKRywIE-BtLx",
      "createTime": "2020-04-17T18:31:11.585169Z",
      "updateTime": "2020-04-17T18:35:08.568959Z"
    }
  ]
}

Note the ENDPOINT_ID.
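As an illustration, the ENDPOINT_ID is the last path segment of each endpoint's `name` field in the list response. The following minimal sketch extracts it; the function name `endpoint_ids_from_list_response` is hypothetical, and the sample payload mirrors the shape of the response shown above with made-up IDs.

```python
import json
from typing import List


def endpoint_ids_from_list_response(response_text: str) -> List[str]:
    """Extract the trailing ENDPOINT_ID from each endpoint resource name."""
    payload = json.loads(response_text)
    return [
        endpoint["name"].rsplit("/", 1)[-1]
        for endpoint in payload.get("endpoints", [])
    ]


# Illustrative payload only; the project number and endpoint ID are invented.
sample = json.dumps(
    {
        "endpoints": [
            {
                "name": "projects/123/locations/us-central1/endpoints/456",
                "displayName": "my-endpoint",
            }
        ]
    }
)
print(endpoint_ids_from_list_response(sample))
```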

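Deploy the model

Select a tab for your language or environment:

gcloud

The following examples use the gcloud ai endpoints deploy-model command.

The following example deploys a Model to an Endpoint without using GPUs to accelerate prediction serving and without splitting traffic between multiple DeployedModel resources:

Before using any of the command data below, make the following replacements:

ENDPOINT_ID: The ID for the endpoint.
LOCATION_ID: The region where you are using Vertex AI.
MODEL_ID: The ID for the model to be deployed.
DEPLOYED_MODEL_NAME: A name for the DeployedModel. You can also use the display name of the Model for the DeployedModel.
MACHINE_TYPE: Optional. The machine resources used for each node of this deployment. Its default setting is n1-standard-2. Learn more about machine types.
MIN_REPLICA_COUNT: The minimum number of nodes for this deployment. The node count can be increased or decreased as required by the inference load, up to the maximum number of nodes and never fewer than this number of nodes. This value must be greater than or equal to 1. If the --min-replica-count flag is omitted, the value defaults to 1.
MAX_REPLICA_COUNT: The maximum number of nodes for this deployment. The node count can be increased or decreased as required by the inference load, up to this number of nodes and never fewer than the minimum number of nodes. If you omit the --max-replica-count flag, then the maximum number of nodes is set to the value of --min-replica-count.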
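The default and constraint rules for the two replica-count flags can be sketched in a few lines. This is only an illustration of the rules described above, not the CLI's actual implementation; the function name `resolve_replica_counts` is hypothetical, and the check that the maximum is not below the minimum is an assumption inferred from the descriptions.

```python
from typing import Optional, Tuple


def resolve_replica_counts(
    min_replica_count: Optional[int] = None,
    max_replica_count: Optional[int] = None,
) -> Tuple[int, int]:
    """Apply the documented defaults and validate the documented constraints."""
    # An omitted minimum defaults to 1, and the minimum must be at least 1.
    minimum = 1 if min_replica_count is None else min_replica_count
    if minimum < 1:
        raise ValueError("min replica count must be greater than or equal to 1")
    # An omitted maximum defaults to the minimum.
    maximum = minimum if max_replica_count is None else max_replica_count
    # Assumption: a maximum below the minimum is treated as invalid.
    if maximum < minimum:
        raise ValueError("max replica count must not be less than min replica count")
    return minimum, maximum


print(resolve_replica_counts())       # both flags omitted
print(resolve_replica_counts(2))      # only the minimum given
print(resolve_replica_counts(2, 5))   # both given
```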

Execute the gcloud ai endpoints deploy-model command:

Linux, macOS, or Cloud Shell

gcloud ai endpoints deploy-model ENDPOINT_ID \
  --region=LOCATION_ID \
  --model=MODEL_ID \
  --display-name=DEPLOYED_MODEL_NAME \
  --machine-type=MACHINE_TYPE \
  --min-replica-count=MIN_REPLICA_COUNT \
  --max-replica-count=MAX_REPLICA_COUNT \
  --traffic-split=0=100

Windows (PowerShell)

gcloud ai endpoints deploy-model ENDPOINT_ID `
  --region=LOCATION_ID `
  --model=MODEL_ID `
  --display-name=DEPLOYED_MODEL_NAME `
  --machine-type=MACHINE_TYPE `
  --min-replica-count=MIN_REPLICA_COUNT `
  --max-replica-count=MAX_REPLICA_COUNT `
  --traffic-split=0=100

Windows (cmd.exe)

gcloud ai endpoints deploy-model ENDPOINT_ID^
  --region=LOCATION_ID ^
  --model=MODEL_ID ^
  --display-name=DEPLOYED_MODEL_NAME ^
  --machine-type=MACHINE_TYPE ^
  --min-replica-count=MIN_REPLICA_COUNT ^
  --max-replica-count=MAX_REPLICA_COUNT ^
  --traffic-split=0=100

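Splitting traffic

The --traffic-split=0=100 flag in the preceding examples sends 100% of the prediction traffic that the Endpoint receives to the new DeployedModel, which is represented by the temporary ID 0. If your Endpoint already has other DeployedModel resources, you can split traffic between the new DeployedModel and the old ones. For example, to send 20% of traffic to the new DeployedModel and 80% to an older one, run the following command.

Before using any of the command data below, make the following replacements:

OLD_DEPLOYED_MODEL_ID: The ID of the existing DeployedModel.

Execute the gcloud ai endpoints deploy-model command:

Linux, macOS, or Cloud Shell

gcloud ai endpoints deploy-model ENDPOINT_ID \
  --region=LOCATION_ID \
  --model=MODEL_ID \
  --display-name=DEPLOYED_MODEL_NAME \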
  --machine-type=MACHINE_TYPE \
  --min-replica-count=MIN_REPLICA_COUNT \
  --max-replica-count=MAX_REPLICA_COUNT \
  --traffic-split=0=20,OLD_DEPLOYED_MODEL_ID=80

Windows (PowerShell)

gcloud ai endpoints deploy-model ENDPOINT_ID `
  --region=LOCATION_ID `
  --model=MODEL_ID `
  --display-name=DEPLOYED_MODEL_NAME `
  --machine-type=MACHINE_TYPE `
  --min-replica-count=MIN_REPLICA_COUNT `
  --max-replica-count=MAX_REPLICA_COUNT `
  --traffic-split=0=20,OLD_DEPLOYED_MODEL_ID=80

Windows (cmd.exe)

gcloud ai endpoints deploy-model ENDPOINT_ID^
  --region=LOCATION_ID ^
  --model=MODEL_ID ^
  --display-name=DEPLOYED_MODEL_NAME ^
  --machine-type=MACHINE_TYPE ^
  --min-replica-count=MIN_REPLICA_COUNT ^
  --max-replica-count=MAX_REPLICA_COUNT ^
  --traffic-split=0=20,OLD_DEPLOYED_MODEL_ID=80
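To make the traffic-split rule concrete, the following minimal sketch builds a split for a new model (temporary ID 0) alongside already deployed models and checks that the percentages add up to 100. The helper names `build_traffic_split` and `to_gcloud_flag` are hypothetical, and the deployed model ID `1234` is made up for the example.

```python
from typing import Dict


def build_traffic_split(new_model_percent: int, existing: Dict[str, int]) -> Dict[str, int]:
    """Combine the new model (temporary ID "0") with existing deployed models."""
    if "0" in existing:
        raise ValueError('"0" is reserved for the model being deployed')
    split = {"0": new_model_percent, **existing}
    if any(percent < 0 for percent in split.values()):
        raise ValueError("traffic percentages must not be negative")
    total = sum(split.values())
    if total != 100:
        raise ValueError(f"traffic percentages add up to {total}, expected 100")
    return split


def to_gcloud_flag(split: Dict[str, int]) -> str:
    """Render the split in the KEY=VALUE,KEY=VALUE form used by --traffic-split."""
    return "--traffic-split=" + ",".join(f"{key}={value}" for key, value in split.items())


split = build_traffic_split(20, {"1234": 80})
print(to_gcloud_flag(split))
```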

REST

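Use the endpoints.deployModel method to deploy the model.

Before using any of the request data, make the following replacements:

LOCATION_ID: The region where you are using Vertex AI.
PROJECT_ID: Your project ID.
ENDPOINT_ID: The ID for the endpoint.
MODEL_ID: The ID for the model to be deployed.
DEPLOYED_MODEL_NAME: A name for the DeployedModel. You can also use the display name of the Model for the DeployedModel.
MACHINE_TYPE: Optional. The machine resources used for each node of this deployment. Its default setting is n1-standard-2. Learn more about machine types.
ACCELERATOR_TYPE: The type of accelerator to attach to the machine. Optional if ACCELERATOR_COUNT is not specified or is zero. Not recommended for AutoML models or custom-trained models that use non-GPU images. Learn more.
ACCELERATOR_COUNT: The number of accelerators for each replica to use. Optional. Should be zero or unspecified for AutoML models or custom-trained models that use non-GPU images.
MIN_REPLICA_COUNT: The minimum number of nodes for this deployment. The node count can be increased or decreased as required by the inference load, up to the maximum number of nodes and never fewer than this number of nodes. This value must be greater than or equal to 1.
MAX_REPLICA_COUNT: The maximum number of nodes for this deployment. The node count can be increased or decreased as required by the inference load, up to this number of nodes and never fewer than the minimum number of nodes.
REQUIRED_REPLICA_COUNT: Optional. The number of nodes required for this deployment to be marked as successful. Must be greater than or equal to 1 and less than or equal to the minimum number of nodes. If not specified, the default value is the minimum number of nodes.
TRAFFIC_SPLIT_THIS_MODEL: The percentage of the prediction traffic to this endpoint to route to the model being deployed with this operation. Defaults to 100. All traffic percentages must add up to 100. Learn more about traffic splits.
DEPLOYED_MODEL_ID_N: Optional. If other models are deployed to this endpoint, you must update their traffic split percentages so that all percentages add up to 100.
TRAFFIC_SPLIT_MODEL_N: The traffic split percentage value for the deployed model ID key.
PROJECT_NUMBER: Your project's automatically generated project number.

HTTP method and URL:

POST https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints/ENDPOINT_ID:deployModel

Request JSON body:

{
  "deployedModel": {
    "model": "projects/PROJECT_ID/locations/LOCATION_ID/models/MODEL_ID",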
    "displayName": "DEPLOYED_MODEL_NAME",
    "dedicatedResources": {
       "machineSpec": {
         "machineType": "MACHINE_TYPE",
         "acceleratorType": "ACCELERATOR_TYPE",
         "acceleratorCount": "ACCELERATOR_COUNT"
       },
       "minReplicaCount": MIN_REPLICA_COUNT,
       "maxReplicaCount": MAX_REPLICA_COUNT,
       "requiredReplicaCount": REQUIRED_REPLICA_COUNT
     }
  },
  "trafficSplit": {
    "0": TRAFFIC_SPLIT_THIS_MODEL,
    "DEPLOYED_MODEL_ID_1": TRAFFIC_SPLIT_MODEL_1,
    "DEPLOYED_MODEL_ID_2": TRAFFIC_SPLIT_MODEL_2
  }
}
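The request body can also be assembled programmatically. The following is a minimal sketch under stated assumptions: the helper name `build_deploy_model_body` and its parameters are hypothetical, accelerator fields are left out when no accelerator is requested, and an omitted maximum replica count is set to the minimum to mirror the gcloud default described earlier.

```python
import json
from typing import Any, Dict, Optional


def build_deploy_model_body(
    project_id: str,
    location_id: str,
    model_id: str,
    display_name: str,
    machine_type: str = "n1-standard-2",
    min_replica_count: int = 1,
    max_replica_count: Optional[int] = None,
    traffic_split: Optional[Dict[str, int]] = None,
    accelerator_type: Optional[str] = None,
    accelerator_count: int = 0,
) -> Dict[str, Any]:
    """Build a deployModel request body as a plain dictionary."""
    machine_spec: Dict[str, Any] = {"machineType": machine_type}
    # Only include accelerator fields when an accelerator is actually requested.
    if accelerator_type and accelerator_count > 0:
        machine_spec["acceleratorType"] = accelerator_type
        machine_spec["acceleratorCount"] = accelerator_count

    # Key "0" refers to the model being deployed by this request.
    split = traffic_split if traffic_split is not None else {"0": 100}
    if sum(split.values()) != 100:
        raise ValueError("traffic percentages must add up to 100")

    return {
        "deployedModel": {
            "model": f"projects/{project_id}/locations/{location_id}/models/{model_id}",
            "displayName": display_name,
            "dedicatedResources": {
                "machineSpec": machine_spec,
                "minReplicaCount": min_replica_count,
                "maxReplicaCount": max_replica_count or min_replica_count,
            },
        },
        "trafficSplit": split,
    }


body = build_deploy_model_body("my-project", "us-central1", "456", "my-deployed-model")
print(json.dumps(body, indent=2))
```

Serializing with `json.dumps` also avoids hand-written syntax slips such as trailing commas.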

To send your request, choose one of these options:

curl (Linux, macOS, or Cloud Shell)

Save the request body in a file named request.json, and execute the following command:

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints/ENDPOINT_ID:deployModel"

PowerShell (Windows)

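Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command: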

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints/ENDPOINT_ID:deployModel" | Select-Object -Expand Content

You should receive a JSON response similar to the following:

{
  "name": "projects/PROJECT_ID/locations/LOCATION/endpoints/ENDPOINT_ID/operations/OPERATION_ID",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.aiplatform.v1.DeployModelOperationMetadata",
    "genericMetadata": {
      "createTime": "2020-10-19T17:53:16.502088Z",
      "updateTime": "2020-10-19T17:53:16.502088Z"
    }
  }
}

Java

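Before trying this sample, follow the Java setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Java API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.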

import com.google.api.gax.longrunning.OperationFuture;
import com.google.cloud.aiplatform.v1.DedicatedResources;
import com.google.cloud.aiplatform.v1.DeployModelOperationMetadata;
import com.google.cloud.aiplatform.v1.DeployModelResponse;
import com.google.cloud.aiplatform.v1.DeployedModel;
import com.google.cloud.aiplatform.v1.EndpointName;
import com.google.cloud.aiplatform.v1.EndpointServiceClient;
import com.google.cloud.aiplatform.v1.EndpointServiceSettings;
import com.google.cloud.aiplatform.v1.MachineSpec;
import com.google.cloud.aiplatform.v1.ModelName;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutionException;

public class DeployModelCustomTrainedModelSample {

  public static void main(String[] args)
      throws IOException, ExecutionException, InterruptedException {
    // TODO(developer): Replace these variables before running the sample.
    String project = "PROJECT";
    String endpointId = "ENDPOINT_ID";
    String modelName = "MODEL_NAME";
    String deployedModelDisplayName = "DEPLOYED_MODEL_DISPLAY_NAME";
    deployModelCustomTrainedModelSample(project, endpointId, modelName, deployedModelDisplayName);
  }

  static void deployModelCustomTrainedModelSample(
      String project, String endpointId, String model, String deployedModelDisplayName)
      throws IOException, ExecutionException, InterruptedException {
    EndpointServiceSettings settings =
        EndpointServiceSettings.newBuilder()
            .setEndpoint("us-central1-aiplatform.googleapis.com:443")
            .build();
    String location = "us-central1";

    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (EndpointServiceClient client = EndpointServiceClient.create(settings)) {
      MachineSpec machineSpec = MachineSpec.newBuilder().setMachineType("n1-standard-2").build();
      DedicatedResources dedicatedResources =
          DedicatedResources.newBuilder().setMinReplicaCount(1).setMachineSpec(machineSpec).build();

      String modelName = ModelName.of(project, location, model).toString();
      DeployedModel deployedModel =
          DeployedModel.newBuilder()
              .setModel(modelName)
              .setDisplayName(deployedModelDisplayName)
              // `dedicated_resources` must be used for non-AutoML models
              .setDedicatedResources(dedicatedResources)
              .build();
      // key '0' assigns traffic for the newly deployed model
      // Traffic percentage values must add up to 100
      // Leave dictionary empty if endpoint should not accept any traffic
      Map<String, Integer> trafficSplit = new HashMap<>();
      trafficSplit.put("0", 100);
      EndpointName endpoint = EndpointName.of(project, location, endpointId);
      OperationFuture<DeployModelResponse, DeployModelOperationMetadata> response =
          client.deployModelAsync(endpoint, deployedModel, trafficSplit);

      // You can use OperationFuture.getInitialFuture to get a future representing the initial
      // response to the request, which contains information while the operation is in progress.
      System.out.format("Operation name: %s\n", response.getInitialFuture().get().getName());

      // OperationFuture.get() will block until the operation is finished.
      DeployModelResponse deployModelResponse = response.get();
      System.out.format("deployModelResponse: %s\n", deployModelResponse);
    }
  }
}

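Vertex AI SDK for Python

To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Vertex AI SDK for Python API reference documentation.

from typing import Dict, Optional, Sequence, Tuple

from google.cloud import aiplatform
from google.cloud.aiplatform import explain


def deploy_model_with_dedicated_resources_sample(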
    project,
    location,
    model_name: str,
    machine_type: str,
    endpoint: Optional[aiplatform.Endpoint] = None,
    deployed_model_display_name: Optional[str] = None,
    traffic_percentage: Optional[int] = 0,
    traffic_split: Optional[Dict[str, int]] = None,
    min_replica_count: int = 1,
    max_replica_count: int = 1,
    accelerator_type: Optional[str] = None,
    accelerator_count: Optional[int] = None,
    explanation_metadata: Optional[explain.ExplanationMetadata] = None,
    explanation_parameters: Optional[explain.ExplanationParameters] = None,
    metadata: Optional[Sequence[Tuple[str, str]]] = (),
    sync: bool = True,
):
    """
    model_name: A fully-qualified model resource name or model ID.
          Example: "projects/123/locations/us-central1/models/456" or
          "456" when project and location are initialized or passed.
    """

    aiplatform.init(project=project, location=location)

    model = aiplatform.Model(model_name=model_name)

    # The explanation_metadata and explanation_parameters should only be
    # provided for a custom trained model and not an AutoML model.
    model.deploy(
        endpoint=endpoint,
        deployed_model_display_name=deployed_model_display_name,
        traffic_percentage=traffic_percentage,
        traffic_split=traffic_split,
        machine_type=machine_type,
        min_replica_count=min_replica_count,
        max_replica_count=max_replica_count,
        accelerator_type=accelerator_type,
        accelerator_count=accelerator_count,
        explanation_metadata=explanation_metadata,
        explanation_parameters=explanation_parameters,
        metadata=metadata,
        sync=sync,
    )

    model.wait()

    print(model.display_name)
    print(model.resource_name)
    return model

Node.js

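Before trying this sample, follow the Node.js setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Node.js API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.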

/**
 * TODO(developer): Uncomment these variables before running the sample.
 * (Not necessary if passing values as arguments)
 */

// const modelId = 'YOUR_MODEL_ID';
// const deployedModelDisplayName = 'YOUR_DEPLOYED_MODEL_DISPLAY_NAME';
// const endpointId = 'YOUR_ENDPOINT_ID';
// const project = 'YOUR_PROJECT_ID';
// const location = 'YOUR_PROJECT_LOCATION';

// Imports the Google Cloud Endpoint Service Client library
const {EndpointServiceClient} = require('@google-cloud/aiplatform');

// Specifies the location of the api endpoint
const clientOptions = {
  apiEndpoint: 'us-central1-aiplatform.googleapis.com',
};

// Instantiates a client
const endpointServiceClient = new EndpointServiceClient(clientOptions);

async function deployModel() {
  // Configure the model and endpoint resource names
  const modelName = `projects/${project}/locations/${location}/models/${modelId}`;
  const endpoint = `projects/${project}/locations/${location}/endpoints/${endpointId}`;

  // Key '0' assigns traffic to the newly deployed model.
  // Traffic percentage values must add up to 100.
  const trafficSplit = {0: 100};
  const deployedModel = {
    model: modelName,
    displayName: deployedModelDisplayName,
    dedicatedResources: {
      minReplicaCount: 1,
      machineSpec: {
        machineType: 'n1-standard-2',
      },
    },
  };
  const request = {
    endpoint,
    deployedModel,
    trafficSplit,
  };

  // Start the long-running deploy operation
  const [response] = await endpointServiceClient.deployModel(request);
  console.log(`Long running operation : ${response.name}`);

  // Wait for operation to complete
  await response.promise();
  const result = response.result;

  console.log('Deploy model response');
  console.log(`\tDeployed model id : ${result.deployedModel.id}`);
}
deployModel();

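Learn how to change the default settings for inference logging.

Get operation status

Some requests start long-running operations that take time to complete. These requests return an operation name, which you can use to view the operation's status or cancel the operation. Vertex AI provides helper methods to make calls against long-running operations. For more information, see Working with long-running operations.

Get an online inference using your deployed model

To make an online inference, submit one or more test items to a model for analysis, and the model returns results that are based on your model's objective. Use the Google Cloud console or the Vertex AI API to request an online inference.

Google Cloud console

In the Google Cloud console, in the Vertex AI section, go to the Models page.

Go to the Models page
From the list of models, click the name of the model to request inferences from.
Select the Deploy & test tab.
Under the Test your model section, add test items to request an inference. The baseline inference data is filled in for you, or you can enter your own inference data and click Predict.

After the inference is complete, Vertex AI returns the results in the console.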

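API: Classification

gcloud

Create a file named request.json with the following contents:
```
{
  "instances": [
    {
      PREDICTION_DATA_ROW
    }
  ]
}
```
Replace the following:
- PREDICTION_DATA_ROW: A JSON object with keys as the feature names and values as the corresponding feature values. For example, for a dataset with a number, an array of strings, and a category, the data row might look like the following example request:
```
"length":3.6,
"material":"cotton",
"tag_array": ["abc","def"]
```
  A value must be provided for every feature included in training. The format of the data used for prediction must match the format used for training. For details, see Data format for predictions.
Run the following command:
```
gcloud ai endpoints predict ENDPOINT_ID \
  --region=LOCATION_ID \
  --json-request=request.json
```
Replace the following:
- ENDPOINT_ID: The ID for the endpoint.
- LOCATION_ID: The region where you are using Vertex AI.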
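As an illustration, the request body can be generated from Python dictionaries instead of being written by hand. This is a minimal sketch; the function name `build_predict_request` is hypothetical, and the feature names are the illustrative ones from the example data row.

```python
import json
from typing import Any, Dict, List


def build_predict_request(rows: List[Dict[str, Any]]) -> str:
    """Serialize feature rows into the {"instances": [...]} request body."""
    if not rows:
        raise ValueError("at least one instance is required")
    return json.dumps({"instances": rows}, indent=2)


# The printed text can be saved as request.json and passed to --json-request.
request_body = build_predict_request(
    [{"length": 3.6, "material": "cotton", "tag_array": ["abc", "def"]}]
)
print(request_body)
```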

REST

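Use the endpoints.predict method to request an online inference.

Before using any of the request data, make the following replacements:

LOCATION_ID: The region where the endpoint is located. For example, us-central1.
PROJECT_ID: Your project ID.
ENDPOINT_ID: The ID for the endpoint.
PREDICTION_DATA_ROW: A JSON object with keys as the feature names and values as the corresponding feature values. For example, for a dataset with a number, an array of strings, and a category, the data row might look like the following example request:
```
"length":3.6,
"material":"cotton",
"tag_array": ["abc","def"]
```
A value must be provided for every feature included in training. The format of the data used for prediction must match the format used for training. For details, see Data format for predictions.
DEPLOYED_MODEL_ID: Output by the predict method and accepted as input by the explain method. The ID of the model used to generate the inference. If you need to request explanations for a previously requested inference and you have more than one model deployed, you can use this ID to make sure that the explanations are returned for the same model that provided the previous inference.

HTTP method and URL:

POST https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints/ENDPOINT_ID:predict

Request JSON body: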

{
  "instances": [
    {
      PREDICTION_DATA_ROW
    }
  ]
}

To send your request, choose one of these options:

curl

Save the request body in a file named request.json, and execute the following command:

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints/ENDPOINT_ID:predict"

PowerShell

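Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command: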

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints/ENDPOINT_ID:predict" | Select-Object -Expand Content

You should receive a JSON response similar to the following:

   {
     "predictions": [
      {
         "scores": [
           0.96771615743637085,
           0.032283786684274673
         ],
         "classes": [
           "0",
           "1"
         ]
      }
     ],
     "deployedModelId": "2429510197"
   }
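To show how such a response might be consumed, the following minimal sketch picks the highest-scoring class for each prediction. It assumes that `classes` and `scores` are parallel lists, as in the response above; the function name `top_classes` is hypothetical.

```python
from typing import Any, Dict, List, Tuple


def top_classes(response: Dict[str, Any]) -> List[Tuple[str, float]]:
    """Return the highest-scoring (class, score) pair for each prediction."""
    results = []
    for prediction in response.get("predictions", []):
        # Pair each class label with the score at the same position.
        pairs = zip(prediction["classes"], prediction["scores"])
        results.append(max(pairs, key=lambda pair: pair[1]))
    return results


sample_response = {
    "predictions": [
        {
            "scores": [0.96771615743637085, 0.032283786684274673],
            "classes": ["0", "1"],
        }
    ],
    "deployedModelId": "2429510197",
}
for label, score in top_classes(sample_response):
    print(f"class={label} score={score:.4f}")
```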

Java

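Before trying this sample, follow the Java setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Java API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.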


import com.google.cloud.aiplatform.util.ValueConverter;
import com.google.cloud.aiplatform.v1.EndpointName;
import com.google.cloud.aiplatform.v1.PredictResponse;
import com.google.cloud.aiplatform.v1.PredictionServiceClient;
import com.google.cloud.aiplatform.v1.PredictionServiceSettings;
import com.google.cloud.aiplatform.v1.schema.predict.prediction.TabularClassificationPredictionResult;
import com.google.protobuf.ListValue;
import com.google.protobuf.Value;
import com.google.protobuf.util.JsonFormat;
import java.io.IOException;
import java.util.List;

public class PredictTabularClassificationSample {

  public static void main(String[] args) throws IOException {
    // TODO(developer): Replace these variables before running the sample.
    String project = "YOUR_PROJECT_ID";
    String instance = "[{ \"feature_column_a\": \"value\", \"feature_column_b\": \"value\"}]";
    String endpointId = "YOUR_ENDPOINT_ID";
    predictTabularClassification(instance, project, endpointId);
  }

  static void predictTabularClassification(String instance, String project, String endpointId)
      throws IOException {
    PredictionServiceSettings predictionServiceSettings =
        PredictionServiceSettings.newBuilder()
            .setEndpoint("us-central1-aiplatform.googleapis.com:443")
            .build();

    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (PredictionServiceClient predictionServiceClient =
        PredictionServiceClient.create(predictionServiceSettings)) {
      String location = "us-central1";
      EndpointName endpointName = EndpointName.of(project, location, endpointId);

      ListValue.Builder listValue = ListValue.newBuilder();
      JsonFormat.parser().merge(instance, listValue);
      List<Value> instanceList = listValue.getValuesList();

      Value parameters = Value.newBuilder().setListValue(listValue).build();
      PredictResponse predictResponse =
          predictionServiceClient.predict(endpointName, instanceList, parameters);
      System.out.println("Predict Tabular Classification Response");
      System.out.format("\tDeployed Model Id: %s\n", predictResponse.getDeployedModelId());

      System.out.println("Predictions");
      for (Value prediction : predictResponse.getPredictionsList()) {
        TabularClassificationPredictionResult.Builder resultBuilder =
            TabularClassificationPredictionResult.newBuilder();
        TabularClassificationPredictionResult result =
            (TabularClassificationPredictionResult)
                ValueConverter.fromValue(resultBuilder, prediction);

        for (int i = 0; i < result.getClassesCount(); i++) {
          System.out.printf("\tClass: %s", result.getClasses(i));
          System.out.printf("\tScore: %f", result.getScores(i));
        }
      }
    }
  }
}

Node.js

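Before trying this sample, follow the Node.js setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Node.js API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.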

/**
 * TODO(developer): Uncomment these variables before running the sample.
 * (Not necessary if passing values as arguments)
 */

// const endpointId = 'YOUR_ENDPOINT_ID';
// const project = 'YOUR_PROJECT_ID';
// const location = 'YOUR_PROJECT_LOCATION';
const aiplatform = require('@google-cloud/aiplatform');
const {prediction} =
  aiplatform.protos.google.cloud.aiplatform.v1.schema.predict;

// Imports the Google Cloud Prediction service client
const {PredictionServiceClient} = aiplatform.v1;

// Import the helper module for converting arbitrary protobuf.Value objects.
const {helpers} = aiplatform;

// Specifies the location of the api endpoint
const clientOptions = {
  apiEndpoint: 'us-central1-aiplatform.googleapis.com',
};

// Instantiates a client
const predictionServiceClient = new PredictionServiceClient(clientOptions);

async function predictTablesClassification() {
  // Configure the endpoint resource
  const endpoint = `projects/${project}/locations/${location}/endpoints/${endpointId}`;
  const parameters = helpers.toValue({});

  const instance = helpers.toValue({
    petal_length: '1.4',
    petal_width: '1.3',
    sepal_length: '5.1',
    sepal_width: '2.8',
  });

  const instances = [instance];
  const request = {
    endpoint,
    instances,
    parameters,
  };

  // Predict request
  const [response] = await predictionServiceClient.predict(request);

  console.log('Predict tabular classification response');
  console.log(`\tDeployed model id : ${response.deployedModelId}\n`);
  const predictions = response.predictions;
  console.log('Predictions :');
  for (const predictionResultVal of predictions) {
    const predictionResultObj =
      prediction.TabularClassificationPredictionResult.fromValue(
        predictionResultVal
      );
    for (const [i, class_] of predictionResultObj.classes.entries()) {
      console.log(`\tClass: ${class_}`);
      console.log(`\tScore: ${predictionResultObj.scores[i]}\n\n`);
    }
  }
}
predictTablesClassification();

Vertex AI SDK for Python

To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Vertex AI SDK for Python API reference documentation.

from typing import Dict, List

from google.cloud import aiplatform


def predict_tabular_classification_sample(
    project: str,
    location: str,
    endpoint_name: str,
    instances: List[Dict],
):
    """
    Args
        project: Your project ID or project number.
        location: Region where Endpoint is located. For example, 'us-central1'.
        endpoint_name: A fully qualified endpoint name or endpoint ID. Example: "projects/123/locations/us-central1/endpoints/456" or
               "456" when project and location are initialized or passed.
        instances: A list of one or more instances (examples) to return a prediction for.
    """
    aiplatform.init(project=project, location=location)

    endpoint = aiplatform.Endpoint(endpoint_name)

    response = endpoint.predict(instances=instances)

    for prediction_ in response.predictions:
        print(prediction_)

API: Regression

gcloud

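Create a file named request.json with the following contents:
```
{
  "instances": [
    {
      PREDICTION_DATA_ROW
    }
  ]
}
```
Replace the following:
- PREDICTION_DATA_ROW: A JSON object with feature names as keys and the corresponding feature values as values. For example, for a dataset with numeric and categorical features, the data row might look like the following: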
```
"age":3.6,
"sq_ft":5392,
"code": "90331"
```
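  You must provide a value for every feature included in training. The format of the data used for inference must match the format used for training. For details, see Data format for predictions.
Run the following command: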
```
gcloud ai endpoints predict ENDPOINT_ID \
  --region=LOCATION_ID \
  --json-request=request.json
```
Replace the following:
- ENDPOINT_ID: The ID of the endpoint.
- LOCATION_ID: The region where you are using Vertex AI.
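
The request file can also be generated from application data rather than written by hand. The following is a minimal sketch, assuming the illustrative feature names from the example row above; `write_request_file` is a hypothetical helper and is not part of the gcloud CLI or the Vertex AI SDK.

```python
import json
import os
import tempfile
from typing import Any, Dict, List


def write_request_file(rows: List[Dict[str, Any]], path: str) -> str:
    """Write rows as the {"instances": [...]} body and return the JSON text."""
    body = {"instances": rows}
    text = json.dumps(body, indent=2)
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(text)
    return text


# Illustrative row; a real row needs one value per feature used in training.
example_row = {"age": 3.6, "sq_ft": 5392, "code": "90331"}

# Written to a temporary directory here; pass "request.json" to match the command above.
request_path = os.path.join(tempfile.gettempdir(), "request.json")
request_text = write_request_file([example_row], request_path)
print(request_text)
```

The resulting file can then be passed to the command through the --json-request flag.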

REST

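You can use the endpoints.predict method to request online inference.

Before using any of the request data, make the following replacements:

LOCATION_ID: The region where the endpoint is located. For example, us-central1.
PROJECT_ID: Your project ID.
ENDPOINT_ID: The ID of the endpoint.
PREDICTION_DATA_ROW: A JSON object with feature names as keys and the corresponding feature values as values. For example, for a dataset with numeric and categorical features, the data row might look like the following: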
```
"age":3.6,
"sq_ft":5392,
"code": "90331"
```
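You must provide a value for every feature included in training. The format of the data used for inference must match the format used for training. For details, see Data format for predictions.
DEPLOYED_MODEL_ID: Output by the predict method and accepted as input by the explain method. This is the ID of the model that generated the inference. If you need to request explanations for an earlier inference and you have more than one model deployed, you can use this ID to make sure the explanations come from the same model that produced that inference.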

HTTP method and URL:

POST https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints/ENDPOINT_ID:predict

Request JSON body:

{
  "instances": [
    {
      PREDICTION_DATA_ROW
    }
  ]
}

To send your request, choose one of these options:

curl

Save the request body in a file named request.json, and execute the following command:

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints/ENDPOINT_ID:predict"

PowerShell

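Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command: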

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints/ENDPOINT_ID:predict" | Select-Object -Expand Content

You should receive a JSON response similar to the following:


{
  "predictions": [
    [
      {
        "value": 65.14233,
        "lower_bound": 4.6572,
        "upper_bound": 164.0279
      }
    ]
  ],
  "deployedModelId": "DEPLOYED_MODEL_ID"
}
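
To use the response in application code, the prediction objects need to be pulled out of the predictions array. The following is a minimal sketch that parses the sample response shown above; `iter_regression_results` is a hypothetical helper, and flattening one level of list nesting is an assumption based on the shape of that sample.

```python
import json
from typing import Any, Dict, Iterator

# The sample response from above, embedded as text for illustration.
SAMPLE_RESPONSE = """
{
  "predictions": [
    [
      {"value": 65.14233, "lower_bound": 4.6572, "upper_bound": 164.0279}
    ]
  ],
  "deployedModelId": "DEPLOYED_MODEL_ID"
}
"""


def iter_regression_results(response: Dict[str, Any]) -> Iterator[Dict[str, float]]:
    """Yield each prediction object, flattening one level of nesting if present."""
    for item in response.get("predictions", []):
        if isinstance(item, list):
            yield from item
        else:
            yield item


response = json.loads(SAMPLE_RESPONSE)
results = list(iter_regression_results(response))
for result in results:
    print(
        f"value={result['value']} "
        f"interval=[{result['lower_bound']}, {result['upper_bound']}]"
    )
```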

Java

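Before trying this sample, follow the Java setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Java API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.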


import com.google.cloud.aiplatform.util.ValueConverter;
import com.google.cloud.aiplatform.v1.EndpointName;
import com.google.cloud.aiplatform.v1.PredictResponse;
import com.google.cloud.aiplatform.v1.PredictionServiceClient;
import com.google.cloud.aiplatform.v1.PredictionServiceSettings;
import com.google.cloud.aiplatform.v1.schema.predict.prediction.TabularRegressionPredictionResult;
import com.google.protobuf.ListValue;
import com.google.protobuf.Value;
import com.google.protobuf.util.JsonFormat;
import java.io.IOException;
import java.util.List;

public class PredictTabularRegressionSample {

  public static void main(String[] args) throws IOException {
    // TODO(developer): Replace these variables before running the sample.
    String project = "YOUR_PROJECT_ID";
    String instance = "[{ \"feature_column_a\": \"value\", \"feature_column_b\": \"value\"}]";
    String endpointId = "YOUR_ENDPOINT_ID";
    predictTabularRegression(instance, project, endpointId);
  }

  static void predictTabularRegression(String instance, String project, String endpointId)
      throws IOException {
    PredictionServiceSettings predictionServiceSettings =
        PredictionServiceSettings.newBuilder()
            .setEndpoint("us-central1-aiplatform.googleapis.com:443")
            .build();

    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (PredictionServiceClient predictionServiceClient =
        PredictionServiceClient.create(predictionServiceSettings)) {
      String location = "us-central1";
      EndpointName endpointName = EndpointName.of(project, location, endpointId);

      ListValue.Builder listValue = ListValue.newBuilder();
      JsonFormat.parser().merge(instance, listValue);
      List<Value> instanceList = listValue.getValuesList();

      Value parameters = Value.newBuilder().setListValue(listValue).build();
      PredictResponse predictResponse =
          predictionServiceClient.predict(endpointName, instanceList, parameters);
      System.out.println("Predict Tabular Regression Response");
      System.out.format("\tDisplay Model Id: %s\n", predictResponse.getDeployedModelId());

      System.out.println("Predictions");
      for (Value prediction : predictResponse.getPredictionsList()) {
        TabularRegressionPredictionResult.Builder resultBuilder =
            TabularRegressionPredictionResult.newBuilder();

        TabularRegressionPredictionResult result =
            (TabularRegressionPredictionResult) ValueConverter.fromValue(resultBuilder, prediction);

        System.out.printf("\tUpper bound: %f\n", result.getUpperBound());
        System.out.printf("\tLower bound: %f\n", result.getLowerBound());
        System.out.printf("\tValue: %f\n", result.getValue());
      }
    }
  }
}

Node.js

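Before trying this sample, follow the Node.js setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Node.js API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.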

/**
 * TODO(developer): Uncomment these variables before running the sample.
 * (Not necessary if passing values as arguments)
 */

// const endpointId = 'YOUR_ENDPOINT_ID';
// const project = 'YOUR_PROJECT_ID';
// const location = 'YOUR_PROJECT_LOCATION';
const aiplatform = require('@google-cloud/aiplatform');
const {prediction} =
  aiplatform.protos.google.cloud.aiplatform.v1.schema.predict;

// Imports the Google Cloud Prediction service client
const {PredictionServiceClient} = aiplatform.v1;

// Import the helper module for converting arbitrary protobuf.Value objects.
const {helpers} = aiplatform;

// Specifies the location of the api endpoint
const clientOptions = {
  apiEndpoint: 'us-central1-aiplatform.googleapis.com',
};

// Instantiates a client
const predictionServiceClient = new PredictionServiceClient(clientOptions);

async function predictTablesRegression() {
  // Configure the endpoint resource
  const endpoint = `projects/${project}/locations/${location}/endpoints/${endpointId}`;
  const parameters = helpers.toValue({});

  // Example instance; replace the keys and values with your model's feature columns.
  const instance = helpers.toValue({
    BOOLEAN_2unique_NULLABLE: false,
    DATETIME_1unique_NULLABLE: '2019-01-01 00:00:00',
    DATE_1unique_NULLABLE: '2019-01-01',
    FLOAT_5000unique_NULLABLE: 1611,
    FLOAT_5000unique_REPEATED: [2320, 1192],
    INTEGER_5000unique_NULLABLE: '8',
    NUMERIC_5000unique_NULLABLE: 16,
    STRING_5000unique_NULLABLE: 'str-2',
    STRUCT_NULLABLE: {
      BOOLEAN_2unique_NULLABLE: false,
      DATE_1unique_NULLABLE: '2019-01-01',
      DATETIME_1unique_NULLABLE: '2019-01-01 00:00:00',
      FLOAT_5000unique_NULLABLE: 1308,
      FLOAT_5000unique_REPEATED: [2323, 1178],
      FLOAT_5000unique_REQUIRED: 3089,
      INTEGER_5000unique_NULLABLE: '1777',
      NUMERIC_5000unique_NULLABLE: 3323,
      TIME_1unique_NULLABLE: '23:59:59.999999',
      STRING_5000unique_NULLABLE: 'str-49',
      TIMESTAMP_1unique_NULLABLE: '1546387199999999',
    },
    TIMESTAMP_1unique_NULLABLE: '1546387199999999',
    TIME_1unique_NULLABLE: '23:59:59.999999',
  });

  const instances = [instance];
  const request = {
    endpoint,
    instances,
    parameters,
  };

  // Predict request
  const [response] = await predictionServiceClient.predict(request);

  console.log('Predict tabular regression response');
  console.log(`\tDeployed model id : ${response.deployedModelId}`);
  const predictions = response.predictions;
  console.log('\tPredictions :');
  for (const predictionResultVal of predictions) {
    const predictionResultObj =
      prediction.TabularRegressionPredictionResult.fromValue(
        predictionResultVal
      );
    console.log(`\tUpper bound: ${predictionResultObj.upper_bound}`);
    console.log(`\tLower bound: ${predictionResultObj.lower_bound}`);
    console.log(`\tValue: ${predictionResultObj.value}`);
  }
}
predictTablesRegression();

Vertex AI SDK for Python

To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Vertex AI SDK for Python API reference documentation.

from typing import Dict, List

from google.cloud import aiplatform


def predict_tabular_regression_sample(
    project: str,
    location: str,
    endpoint_name: str,
    instances: List[Dict],
):
    aiplatform.init(project=project, location=location)

    endpoint = aiplatform.Endpoint(endpoint_name)

    response = endpoint.predict(instances=instances)

    for prediction_ in response.predictions:
        print(prediction_)

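Interpret prediction results

Classification

Classification models return a confidence score.

The confidence score communicates how strongly the model associates each class or label with a test item. The higher the number, the higher the model's confidence that the label applies to that item. You decide how high the confidence score must be before you accept the model's result.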
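
Applying a confidence threshold can be sketched in a few lines. This is a minimal illustration, assuming a prediction shaped like the classification payload later on this page (parallel "classes" and "scores" lists); `top_class` and the threshold values are hypothetical choices, not part of the Vertex AI API.

```python
from typing import Any, Dict, Optional, Tuple


def top_class(
    prediction: Dict[str, Any], threshold: float = 0.5
) -> Optional[Tuple[str, float]]:
    """Return (class, score) for the highest score, or None if it is below threshold."""
    pairs = list(zip(prediction["classes"], prediction["scores"]))
    if not pairs:
        return None
    best = max(pairs, key=lambda pair: pair[1])
    return best if best[1] >= threshold else None


# Illustrative prediction with two classes.
example = {"classes": ["class_a", "class_b"], "scores": [0.93, 0.07]}

accepted = top_class(example, threshold=0.8)   # score 0.93 clears the bar
rejected = top_class(example, threshold=0.95)  # score 0.93 does not
print(accepted, rejected)
```

Where to set the threshold depends on the relative cost of false positives and false negatives in your application.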

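Regression

Regression models return an inference value. For BigQuery destinations, they also return an inference interval. The inference interval is a range of values that the model is 95% confident contains the actual result.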
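
The interval can be used to judge how precise an inference is, or to check an observed value against it afterwards. The following is a minimal sketch, assuming the value, lower_bound, and upper_bound field names from the sample response earlier on this page; the helper functions are hypothetical.

```python
from typing import Dict


def interval_width(prediction: Dict[str, float]) -> float:
    """Width of the interval; a wider interval means a less precise inference."""
    return prediction["upper_bound"] - prediction["lower_bound"]


def within_interval(prediction: Dict[str, float], observed: float) -> bool:
    """True if an observed value falls inside the interval, bounds included."""
    return prediction["lower_bound"] <= observed <= prediction["upper_bound"]


# Values taken from the sample regression response on this page.
example = {"value": 65.14233, "lower_bound": 4.6572, "upper_bound": 164.0279}

width = interval_width(example)
print(f"interval width: {width:.4f}")
print("50.0 inside interval:", within_interval(example, 50.0))
print("200.0 inside interval:", within_interval(example, 200.0))
```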

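Get an online explanation using your deployed model

You can request an inference with explanations (also called feature attributions) to see how your model arrived at an inference. The local feature importance values tell you how much each feature contributed to the inference result. Feature attributions are included in Vertex AI inferences through Vertex Explainable AI.

Console

When you use the Google Cloud console to request an online inference, local feature importance values are returned automatically.

If you use the prefilled prediction values, the local feature importance values are all zero. This is because the prefilled values are the baseline prediction data, so the returned prediction is the baseline prediction value.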

gcloud

Create a file named request.json with the following contents:
```
{
  "instances": [
    {
      PREDICTION_DATA_ROW
    }
  ]
}
```
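Replace the following:
- PREDICTION_DATA_ROW: A JSON object with feature names as keys and the corresponding feature values as values. For example, for a dataset with a number, an array of strings, and a category, the data row might look like the following: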
```
"length":3.6,
"material":"cotton",
"tag_array": ["abc","def"]
```
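  You must provide a value for every feature included in training. The format of the data used for inference must match the format used for training. For details, see Data format for predictions.
Run the following command: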
```
gcloud ai endpoints explain ENDPOINT_ID \
  --region=LOCATION_ID \
  --json-request=request.json
```
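Replace the following:
- ENDPOINT_ID: The ID of the endpoint.
- LOCATION_ID: The region where you are using Vertex AI.
(Optional) If you want to send the explanation request to a specific DeployedModel on the Endpoint, you can specify the --deployed-model-id flag:
```
gcloud ai endpoints explain ENDPOINT_ID \
  --region=LOCATION_ID \
  --deployed-model-id=DEPLOYED_MODEL_ID \
  --json-request=request.json
```
In addition to the placeholders described previously, replace the following:
- DEPLOYED_MODEL_ID (optional): The ID of the deployed model that you want explanations from. The ID is included in the predict method's response. If you need explanations from a particular model and you have more than one model deployed to the same endpoint, use this ID to make sure the explanations are returned for that model.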

REST

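The following example shows an online inference request for a tabular classification model with local feature attributions. The request format is the same for regression models.

Before using any of the request data, make the following replacements:

LOCATION: The region where the endpoint is located. For example, us-central1.
PROJECT: Your project ID.
ENDPOINT_ID: The ID of the endpoint.
PREDICTION_DATA_ROW: A JSON object with feature names as keys and the corresponding feature values as values. For example, for a dataset with a number, an array of strings, and a category, the data row might look like the following: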
```
"length":3.6,
"material":"cotton",
"tag_array": ["abc","def"]
```
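You must provide a value for every feature included in training. The format of the data used for inference must match the format used for training. For details, see Data format for predictions.
DEPLOYED_MODEL_ID (optional): The ID of the deployed model that you want explanations from. The ID is included in the predict method's response. If you need explanations from a particular model and you have more than one model deployed to the same endpoint, use this ID to make sure the explanations are returned for that model.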

HTTP method and URL:

POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT/locations/LOCATION/endpoints/ENDPOINT_ID:explain

Request JSON body:

{
  "instances": [
    {
      PREDICTION_DATA_ROW
    }
  ],
  "deployedModelId": "DEPLOYED_MODEL_ID"
}

To send your request, choose one of these options:

curl

Save the request body in a file named request.json, and execute the following command:

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT/locations/LOCATION/endpoints/ENDPOINT_ID:explain"

PowerShell

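Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command: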

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT/locations/LOCATION/endpoints/ENDPOINT_ID:explain" | Select-Object -Expand Content

Vertex AI SDK for Python

To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Vertex AI SDK for Python API reference documentation.

from typing import Dict

from google.cloud import aiplatform


def explain_sample(project: str, location: str, endpoint_id: str, instance_dict: Dict):

    aiplatform.init(project=project, location=location)

    endpoint = aiplatform.Endpoint(endpoint_id)

    response = endpoint.explain(instances=[instance_dict], parameters={})

    for explanation in response.explanations:
        print(" explanation")
        # Feature attributions.
        attributions = explanation.attributions
        for attribution in attributions:
            print("  attribution")
            print("   baseline_output_value:", attribution.baseline_output_value)
            print("   instance_output_value:", attribution.instance_output_value)
            print("   output_display_name:", attribution.output_display_name)
            print("   approximation_error:", attribution.approximation_error)
            print("   output_name:", attribution.output_name)
            # output_index is a repeated field; print each index it contains.
            for index in attribution.output_index:
                print("   output_index:", index)

    for prediction in response.predictions:
        print(prediction)

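Get explanations for a previously returned prediction

Because explanations increase resource usage, you might want to request them only when you specifically need them. Sometimes it is useful to request an explanation for an inference result you have already received, for example because the result is an outlier or does not make sense.

If all of your inferences come from the same model, you can resend the request data and ask for explanations this time. However, if multiple models return inferences, you must make sure that the explanation request goes to the correct model. To get explanations from a specific model, include that model's deployed model ID, deployedModelId, in your request. This ID is returned in the response to the original inference request. Note that the deployed model ID is different from the model ID.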
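
Carrying the ID from the earlier response into the explanation request can be sketched as follows. This is a minimal illustration that only builds the request body; `build_explain_request` is a hypothetical helper, and the instance and response values are made up to match the examples on this page.

```python
from typing import Any, Dict, List


def build_explain_request(
    instances: List[Dict[str, Any]], predict_response: Dict[str, Any]
) -> Dict[str, Any]:
    """Build an explain body that targets the model that served predict_response."""
    body: Dict[str, Any] = {"instances": instances}
    deployed_model_id = predict_response.get("deployedModelId")
    if deployed_model_id:
        # Pin the explanation request to the same deployed model.
        body["deployedModelId"] = deployed_model_id
    return body


# The instance that was sent originally, and the response that came back.
original_instances = [
    {"length": 3.6, "material": "cotton", "tag_array": ["abc", "def"]}
]
earlier_response = {
    "predictions": [{"classes": ["class_a", "class_b"], "scores": [0.93, 0.07]}],
    "deployedModelId": "234567",
}

explain_body = build_explain_request(original_instances, earlier_response)
print(explain_body)
```

The resulting dictionary has the same shape as the request JSON body shown for the explain method.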

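Interpret explanation results

To calculate local feature importance, the baseline inference score is calculated first. Baseline values are computed from the training data, using the median for numeric features and the mode for categorical features. The inference generated from the baseline values is the baseline inference score. Baseline values are calculated once for a model and do not change.

For a specific inference, the local feature importance of each feature tells you how much that feature added to or subtracted from the result compared with the baseline inference score. The sum of all feature importance values equals the difference between the baseline inference score and the inference result.

For classification models, the score is always between 0.0 and 1.0, inclusive. Therefore, local feature importance values for classification models are always between -1.0 and 1.0, inclusive.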
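
The sum relationship can be checked directly on an attribution object. The following is a minimal sketch using the numbers from the classification example on this page; `attribution_gap` is a hypothetical helper, and the tolerance used in the comparison is an arbitrary choice for illustration.

```python
from typing import Any, Dict


def attribution_gap(attribution: Dict[str, Any]) -> float:
    """Sum of feature attributions minus (instance output - baseline output)."""
    total = sum(attribution["featureAttributions"].values())
    expected = attribution["instanceOutputValue"] - attribution["baselineOutputValue"]
    return total - expected


# Attribution values from the classification output example.
example_attribution = {
    "baselineOutputValue": 0.808652400970459,
    "instanceOutputValue": 0.928652400970459,
    "featureAttributions": {
        "feature_1": 0.012394922231235,
        "feature_2": 0.050212341234556,
        "feature_3": 0.057392736534209,
    },
}

gap = attribution_gap(example_attribution)
# A gap near zero means the attributions account for the whole shift from baseline.
print("attributions sum to the difference:", abs(gap) < 1e-9)
```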

For examples of feature attribution queries and to learn more, see Feature attributions for classification and regression.

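Example output for inferences and explanations

Classification

The return payload for an online inference with feature importance from a tabular classification model looks similar to the following example.

The instanceOutputValue of 0.928652400970459 is the confidence score of the highest-scoring class, in this case class_a. The baselineOutputValue field contains the baseline inference score, 0.808652400970459. The feature that contributed most to this result is feature_3.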

{
  "predictions": [
    {
      "scores": [
        0.928652400970459,
        0.071347599029541
      ],
      "classes": [
        "class_a",
        "class_b"
      ]
    }
  ],
  "explanations": [
    {
      "attributions": [
        {
          "baselineOutputValue": 0.808652400970459,
          "instanceOutputValue": 0.928652400970459,
          "approximationError": 0.0058915703929231,
          "featureAttributions": {
            "feature_1": 0.012394922231235,
            "feature_2": 0.050212341234556,
            "feature_3": 0.057392736534209
          },
          "outputIndex": [
            0
          ],
          "outputName": "scores"
        }
      ]
    }
  ],
  "deployedModelId": "234567"
}

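Regression

The return payload for an online inference with feature importance from a tabular regression model looks similar to the following example.

The instanceOutputValue of 1795.1246466281819 is the predicted value, and the lower_bound and upper_bound fields give the 95% confidence interval. The baselineOutputValue field contains the baseline inference score, 1788.7423095703125. The feature that contributed most to this result is feature_3.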

{
  "predictions": [
    {
      "value": 1795.1246466281819,
      "lower_bound": 246.32196807861328,
      "upper_bound": 8677.51904296875
    }
  ],
  "explanations": [
    {
      "attributions": [
        {
          "baselineOutputValue": 1788.7423095703125,
          "instanceOutputValue": 1795.1246466281819,
          "approximationError": 0.0038215703911553,
          "featureAttributions": {
            "feature_1": 0.123949222312359,
            "feature_2": 0.802123412345569,
            "feature_3": 5.456264423211472
          },
          "outputIndex": [
            -1
          ]
        }
      ]
    }
  ],
  "deployedModelId": "345678"
}
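
Finding the feature that contributed most, as described for both payloads, amounts to ranking the featureAttributions entries by magnitude. The following is a minimal sketch using the regression values above; `rank_features` is a hypothetical helper, and ranking by absolute value is an assumption so that strongly negative contributions also rank high.

```python
from typing import Any, Dict, List, Tuple


def rank_features(attribution: Dict[str, Any]) -> List[Tuple[str, float]]:
    """Return (feature, attribution) pairs sorted by absolute attribution, largest first."""
    items = attribution["featureAttributions"].items()
    return sorted(items, key=lambda item: abs(item[1]), reverse=True)


# Feature attributions from the regression example.
regression_attribution = {
    "featureAttributions": {
        "feature_1": 0.123949222312359,
        "feature_2": 0.802123412345569,
        "feature_3": 5.456264423211472,
    }
}

ranked = rank_features(regression_attribution)
for name, contribution in ranked:
    print(f"{name}: {contribution:+.4f}")
```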

What's next

Learn how to export your model.
