从图片分类模型获取推理结果

本页面介绍了如何使用 Google Cloud 控制台或 Vertex AI API 通过图片分类模型获取在线（实时）推理和批量推理。

在线推理与批量推理之间的区别

在线推理是指向模型端点发出同步请求。如果您要发出请求以响应应用输入，或者在其他需要及时推理的情况下，可以使用在线推理。

批量推理是异步请求。您可以直接从模型资源请求批量推理，而无需将模型部署到端点。对于图片数据，如果您不需要获得即时响应并且希望通过一个请求处理累积数据，则可以使用批量推理。

获取在线推理结果

将模型部署到端点

您必须先将模型部署到端点，然后才能使用该模型执行在线推理。部署模型的操作会将物理资源与模型相关联，以便它以较低的延迟执行在线推理。

您可以将多个模型部署到一个端点，也可以将一个模型部署到多个端点。如需详细了解部署模型的方法和使用场景，请参阅部署模型简介。

请使用以下方法之一部署模型：

Google Cloud 控制台

在 Google Cloud 控制台的 Vertex AI 部分中，前往模型页面。

转到“模型”页面
点击要部署的模型的名称以打开其模型说明页面。
在版本 ID 列中，点击要部署的模型版本的 ID
点击部署和测试。

如果模型已部署到任何端点，部署模型 (Deploy your model) 部分中会列出这些端点。
点击部署到端点。
如需将模型部署到新的端点，请点击 创建新端点并为新端点输入名称。如需将模型部署到现有端点，请点击 添加到现有端点，然后选择端点名称。

您可以将多个模型添加到一个端点，也可以将一个模型添加到多个端点。了解详情。

如果您将模型部署到新端点，请选择访问端点的方式：
- 点击标准以使端点可通过 REST API 进行推理。
- 点击专用，使端点使用专用连接。
如果您将模型部署到已部署有一个或多个模型的现有端点，请更新要部署的模型和已部署模型的流量拆分百分比，以使百分比的总和为 100%。
选择 AutoML 图片，然后按如下方式进行配置：
1. 如果您要将模型部署到新端点，请接受 100 的流量拆分值。否则，请为端点上所有模型调整流量拆分值，使它们的总和为 100。
2. 输入要为模型提供的计算节点数量。
  
  这是此模型始终可用的节点数。即使没有推理流量，您也需要为节点付费。请参阅价格页面。
3. 了解如何更改推理日志记录的默认设置。
4. 仅限分类模型（可选）：在可解释性选项部分，选择 为此模型启用特征归因以启用 Vertex Explainable AI。接受现有的可视化设置或选择新值，然后点击完成。
  部署配置有 Vertex Explainable AI 的 AutoML 图片分类模型并执行提供解释的推理属于可选步骤。在部署时启用 Vertex Explainable AI 会产生基于已部署节点数量和部署时间的额外费用。如需了解详情，请参阅价格。
5. 为模型点击完成，当所有流量分配百分比均正确无误时，点击继续。
  随即将显示在其中部署模型的区域。此区域必须是您在其中创建模型的区域。
6. 点击部署，将模型部署到端点。

API

使用 Vertex AI API 部署模型时，请完成以下步骤：

根据需要创建端点。
获取端点 ID。
将模型部署到端点。

创建端点

如果要将模型部署到现有端点，您可以跳过此步骤。

gcloud

以下示例使用 gcloud ai endpoints create 命令：

gcloud ai endpoints create \
  --region=LOCATION \
  --display-name=ENDPOINT_NAME

替换以下内容：

LOCATION_ID：您在其中使用 Vertex AI 的区域。
ENDPOINT_NAME：端点的显示名称。

Google Cloud CLI 工具可能需要几秒钟才能创建端点。

REST

在使用任何请求数据之前，请先进行以下替换：

LOCATION_ID：您的区域。
PROJECT_ID：您的项目 ID。
ENDPOINT_NAME：端点的显示名称。

HTTP 方法和网址：

POST https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints

请求 JSON 正文：

{
  "display_name": "ENDPOINT_NAME"
}

如需发送您的请求，请展开以下选项之一：

curl（Linux、macOS 或 Cloud Shell）

注意：以下命令假定您已使用您的用户账号通过运行 gcloud init 或 gcloud auth login 登录 gcloud CLI，或者使用了 Cloud Shell，这会使您自动登录 gcloud CLI。您可以运行 gcloud auth list 来检查当前活跃的账号。

将请求正文保存在名为 request.json 的文件中，然后执行以下命令：

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints"

PowerShell (Windows)

注意：以下命令假定您已使用您的用户账号通过运行 gcloud init 或 gcloud auth login 登录 gcloud CLI。您可以运行 gcloud auth list 来检查当前活跃的账号。

将请求正文保存在名为 request.json 的文件中，然后执行以下命令：

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints" | Select-Object -Expand Content

您应该收到类似以下内容的 JSON 响应：

{
  "name": "projects/PROJECT_NUMBER/locations/LOCATION_ID/endpoints/ENDPOINT_ID/operations/OPERATION_ID",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.aiplatform.v1.CreateEndpointOperationMetadata",
    "genericMetadata": {
      "createTime": "2020-11-05T17:45:42.812656Z",
      "updateTime": "2020-11-05T17:45:42.812656Z"
    }
  }
}

您可以轮询操作状态，直到响应包含 "done": true。

Java

在尝试此示例之前，请按照《Vertex AI 快速入门：使用客户端库》中的 Java 设置说明执行操作。如需了解详情，请参阅 Vertex AI Java API 参考文档。

如需向 Vertex AI 进行身份验证，请设置应用默认凭证。如需了解详情，请参阅为本地开发环境设置身份验证。


import com.google.api.gax.longrunning.OperationFuture;
import com.google.cloud.aiplatform.v1.CreateEndpointOperationMetadata;
import com.google.cloud.aiplatform.v1.Endpoint;
import com.google.cloud.aiplatform.v1.EndpointServiceClient;
import com.google.cloud.aiplatform.v1.EndpointServiceSettings;
import com.google.cloud.aiplatform.v1.LocationName;
import java.io.IOException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class CreateEndpointSample {

  public static void main(String[] args)
      throws IOException, InterruptedException, ExecutionException, TimeoutException {
    // TODO(developer): Replace these variables before running the sample.
    String project = "YOUR_PROJECT_ID";
    String endpointDisplayName = "YOUR_ENDPOINT_DISPLAY_NAME";
    createEndpointSample(project, endpointDisplayName);
  }

  static void createEndpointSample(String project, String endpointDisplayName)
      throws IOException, InterruptedException, ExecutionException, TimeoutException {
    EndpointServiceSettings endpointServiceSettings =
        EndpointServiceSettings.newBuilder()
            .setEndpoint("us-central1-aiplatform.googleapis.com:443")
            .build();

    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (EndpointServiceClient endpointServiceClient =
        EndpointServiceClient.create(endpointServiceSettings)) {
      String location = "us-central1";
      LocationName locationName = LocationName.of(project, location);
      Endpoint endpoint = Endpoint.newBuilder().setDisplayName(endpointDisplayName).build();

      OperationFuture<Endpoint, CreateEndpointOperationMetadata> endpointFuture =
          endpointServiceClient.createEndpointAsync(locationName, endpoint);
      System.out.format("Operation name: %s\n", endpointFuture.getInitialFuture().get().getName());
      System.out.println("Waiting for operation to finish...");
      Endpoint endpointResponse = endpointFuture.get(300, TimeUnit.SECONDS);

      System.out.println("Create Endpoint Response");
      System.out.format("Name: %s\n", endpointResponse.getName());
      System.out.format("Display Name: %s\n", endpointResponse.getDisplayName());
      System.out.format("Description: %s\n", endpointResponse.getDescription());
      System.out.format("Labels: %s\n", endpointResponse.getLabelsMap());
      System.out.format("Create Time: %s\n", endpointResponse.getCreateTime());
      System.out.format("Update Time: %s\n", endpointResponse.getUpdateTime());
    }
  }
}

Node.js

在尝试此示例之前，请按照《Vertex AI 快速入门：使用客户端库》中的 Node.js 设置说明执行操作。如需了解详情，请参阅 Vertex AI Node.js API 参考文档。

如需向 Vertex AI 进行身份验证，请设置应用默认凭证。如需了解详情，请参阅为本地开发环境设置身份验证。

/**
 * TODO(developer): Uncomment these variables before running the sample.\
 * (Not necessary if passing values as arguments)
 */

// const endpointDisplayName = 'YOUR_ENDPOINT_DISPLAY_NAME';
// const project = 'YOUR_PROJECT_ID';
// const location = 'YOUR_PROJECT_LOCATION';

// Imports the Google Cloud Endpoint Service Client library
const {EndpointServiceClient} = require('@google-cloud/aiplatform');

// Specifies the location of the api endpoint
const clientOptions = {
  apiEndpoint: 'us-central1-aiplatform.googleapis.com',
};

// Instantiates a client
const endpointServiceClient = new EndpointServiceClient(clientOptions);

async function createEndpoint() {
  // Configure the parent resource
  const parent = `projects/${project}/locations/${location}`;
  const endpoint = {
    displayName: endpointDisplayName,
  };
  const request = {
    parent,
    endpoint,
  };

  // Get and print out a list of all the endpoints for this resource
  const [response] = await endpointServiceClient.createEndpoint(request);
  console.log(`Long running operation : ${response.name}`);

  // Wait for operation to complete
  await response.promise();
  const result = response.result;

  console.log('Create endpoint response');
  console.log(`\tName : ${result.name}`);
  console.log(`\tDisplay name : ${result.displayName}`);
  console.log(`\tDescription : ${result.description}`);
  console.log(`\tLabels : ${JSON.stringify(result.labels)}`);
  console.log(`\tCreate time : ${JSON.stringify(result.createTime)}`);
  console.log(`\tUpdate time : ${JSON.stringify(result.updateTime)}`);
}
createEndpoint();

Python

如需了解如何安装或更新 Vertex AI SDK for Python，请参阅安装 Vertex AI SDK for Python。如需了解详情，请参阅 Python API 参考文档。

def create_endpoint_sample(
    project: str,
    display_name: str,
    location: str,
):
    aiplatform.init(project=project, location=location)

    endpoint = aiplatform.Endpoint.create(
        display_name=display_name,
        project=project,
        location=location,
    )

    print(endpoint.display_name)
    print(endpoint.resource_name)
    return endpoint

检索端点 ID

您需要端点 ID 才能部署模型。

gcloud

以下示例使用 gcloud ai endpoints list 命令：

gcloud ai endpoints list \
  --region=LOCATION \
  --filter=display_name=ENDPOINT_NAME

替换以下内容：

LOCATION_ID：您在其中使用 Vertex AI 的区域。
ENDPOINT_NAME：端点的显示名称。

请注意 ENDPOINT_ID 列中显示的数字。请在以下步骤中使用此 ID。

REST

在使用任何请求数据之前，请先进行以下替换：

LOCATION_ID：您在其中使用 Vertex AI 的区域。
PROJECT_ID：。
ENDPOINT_NAME：端点的显示名称。

HTTP 方法和网址：

GET https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints?filter=display_name=ENDPOINT_NAME

如需发送您的请求，请展开以下选项之一：

curl（Linux、macOS 或 Cloud Shell）

执行以下命令：

curl -X GET \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     "https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints?filter=display_name=ENDPOINT_NAME"

PowerShell (Windows)

注意：以下命令假定您已使用您的用户账号通过运行 gcloud init 或 gcloud auth login 登录 gcloud CLI。您可以运行 gcloud auth list 来检查当前活跃的账号。

执行以下命令：

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method GET `
    -Headers $headers `
    -Uri "https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints?filter=display_name=ENDPOINT_NAME" | Select-Object -Expand Content

您应该收到类似以下内容的 JSON 响应：

{
  "endpoints": [
    {
      "name": "projects/PROJECT_NUMBER/locations/LOCATION_ID/endpoints/ENDPOINT_ID",
      "displayName": "ENDPOINT_NAME",
      "etag": "AMEw9yPz5pf4PwBHbRWOGh0PcAxUdjbdX2Jm3QO_amguy3DbZGP5Oi_YUKRywIE-BtLx",
      "createTime": "2020-04-17T18:31:11.585169Z",
      "updateTime": "2020-04-17T18:35:08.568959Z"
    }
  ]
}

请记下 ENDPOINT_ID。

部署模型

在下面选择您的语言或环境对应的标签页：

gcloud

以下示例使用 gcloud ai endpoints deploy-model 命令。

以下示例将 Model 部署到 Endpoint，但没有在多个 DeployedModel 资源之间拆分流量：

在使用下面的命令数据之前，请先进行以下替换：

ENDPOINT_ID：端点的 ID。
LOCATION_ID：您在其中使用 Vertex AI 的区域。
MODEL_ID：要部署的模型的 ID。
DEPLOYED_MODEL_NAME：DeployedModel 的名称。您还可以将 Model 的显示名用于 DeployedModel。
MIN_REPLICA_COUNT：此部署的最小节点数。节点数可根据推理负载的需要而增加或减少，直至达到节点数上限并且绝不会少于此节点数。
MAX_REPLICA_COUNT：此部署的节点数上限。节点数可根据推理负载的需要而增加或减少，直至达到此节点数并且绝不会少于节点数下限。如果您省略 --max-replica-count 标志，则节点数上限将设置为 --min-replica-count 的值。

执行 gcloud ai endpoints deploy-model 命令：

Linux、macOS 或 Cloud Shell

gcloud ai endpoints deploy-model ENDPOINT_ID\
  --region=LOCATION_ID \
  --model=MODEL_ID \
  --display-name=DEPLOYED_MODEL_NAME \
  --min-replica-count=MIN_REPLICA_COUNT \
  --max-replica-count=MAX_REPLICA_COUNT \
  --traffic-split=0=100

Windows (PowerShell)

gcloud ai endpoints deploy-model ENDPOINT_ID`
  --region=LOCATION_ID `
  --model=MODEL_ID `
  --display-name=DEPLOYED_MODEL_NAME `
  --min-replica-count=MIN_REPLICA_COUNT `
  --max-replica-count=MAX_REPLICA_COUNT `
  --traffic-split=0=100

Windows (cmd.exe)

gcloud ai endpoints deploy-model ENDPOINT_ID^
  --region=LOCATION_ID ^
  --model=MODEL_ID ^
  --display-name=DEPLOYED_MODEL_NAME ^
  --min-replica-count=MIN_REPLICA_COUNT ^
  --max-replica-count=MAX_REPLICA_COUNT ^
  --traffic-split=0=100

拆分流量

上述示例中的 --traffic-split=0=100 标志会将 Endpoint 接收的 100% 预测流量发送到新 DeployedModel（使用临时 ID 0 表示）。如果您的 Endpoint 已有其他 DeployedModel 资源，那么您可以在新 DeployedModel 和旧资源之间拆分流量。例如，如需将 20% 的流量发送到新 DeployedModel，将 80% 发送到旧版本，请运行以下命令。

在使用下面的命令数据之前，请先进行以下替换：

OLD_DEPLOYED_MODEL_ID：现有 DeployedModel 的 ID。

执行 gcloud ai endpoints deploy-model 命令：

Linux、macOS 或 Cloud Shell

gcloud ai endpoints deploy-model ENDPOINT_ID\
  --region=LOCATION_ID \
  --model=MODEL_ID \
  --display-name=DEPLOYED_MODEL_NAME \ 
  --min-replica-count=MIN_REPLICA_COUNT \
  --max-replica-count=MAX_REPLICA_COUNT \
  --traffic-split=0=20,OLD_DEPLOYED_MODEL_ID=80

Windows (PowerShell)

gcloud ai endpoints deploy-model ENDPOINT_ID`
  --region=LOCATION_ID `
  --model=MODEL_ID `
  --display-name=DEPLOYED_MODEL_NAME \ 
  --min-replica-count=MIN_REPLICA_COUNT `
  --max-replica-count=MAX_REPLICA_COUNT `
  --traffic-split=0=20,OLD_DEPLOYED_MODEL_ID=80

Windows (cmd.exe)

gcloud ai endpoints deploy-model ENDPOINT_ID^
  --region=LOCATION_ID ^
  --model=MODEL_ID ^
  --display-name=DEPLOYED_MODEL_NAME \ 
  --min-replica-count=MIN_REPLICA_COUNT ^
  --max-replica-count=MAX_REPLICA_COUNT ^
  --traffic-split=0=20,OLD_DEPLOYED_MODEL_ID=80

REST

部署此模型。

在使用任何请求数据之前，请先进行以下替换：

LOCATION_ID：您在其中使用 Vertex AI 的区域。
PROJECT_ID：。
ENDPOINT_ID：端点的 ID。
MODEL_ID：要部署的模型的 ID。
DEPLOYED_MODEL_NAME：DeployedModel 的名称。您还可以将 Model 的显示名用于 DeployedModel。
MIN_REPLICA_COUNT：此部署的最小节点数。节点数可根据推理负载的需要而增加或减少，直至达到节点数上限并且绝不会少于此节点数。
MAX_REPLICA_COUNT：此部署的节点数上限。节点数可根据推理负载的需要而增加或减少，直至达到此节点数并且绝不会少于节点数下限。
TRAFFIC_SPLIT_THIS_MODEL：流向此端点的要路由到使用此操作部署的模型的预测流量百分比。默认值为 100。所有流量百分比之和必须为 100。详细了解流量拆分。
DEPLOYED_MODEL_ID_N：可选。如果将其他模型部署到此端点，您必须更新其流量拆分百分比，以便所有百分比之和等于 100。
TRAFFIC_SPLIT_MODEL_N：已部署模型 ID 密钥的流量拆分百分比值。
PROJECT_NUMBER：自动生成的项目编号

HTTP 方法和网址：

POST https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints/ENDPOINT_ID:deployModel

请求 JSON 正文：

{
  "deployedModel": {
    "model": "projects/PROJECT_ID/locations/LOCATION_ID/models/MODEL_ID",
    "displayName": "DEPLOYED_MODEL_NAME",
    "automaticResources": {
       "minReplicaCount": MIN_REPLICA_COUNT,
       "maxReplicaCount": MAX_REPLICA_COUNT
     }
  },
  "trafficSplit": {
    "0": TRAFFIC_SPLIT_THIS_MODEL,
    "DEPLOYED_MODEL_ID_1": TRAFFIC_SPLIT_MODEL_1,
    "DEPLOYED_MODEL_ID_2": TRAFFIC_SPLIT_MODEL_2
  },
}

如需发送您的请求，请展开以下选项之一：

curl（Linux、macOS 或 Cloud Shell）

将请求正文保存在名为 request.json 的文件中，然后执行以下命令：

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints/ENDPOINT_ID:deployModel"

PowerShell (Windows)

注意：以下命令假定您已使用您的用户账号通过运行 gcloud init 或 gcloud auth login 登录 gcloud CLI。您可以运行 gcloud auth list 来检查当前活跃的账号。

将请求正文保存在名为 request.json 的文件中，然后执行以下命令：

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints/ENDPOINT_ID:deployModel" | Select-Object -Expand Content

您应该收到类似以下内容的 JSON 响应：

{
  "name": "projects/PROJECT_ID/locations/LOCATION_ID/endpoints/ENDPOINT_ID/operations/OPERATION_ID",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.aiplatform.v1.DeployModelOperationMetadata",
    "genericMetadata": {
      "createTime": "2020-10-19T17:53:16.502088Z",
      "updateTime": "2020-10-19T17:53:16.502088Z"
    }
  }
}

Java

在尝试此示例之前，请按照《Vertex AI 快速入门：使用客户端库》中的 Java 设置说明执行操作。如需了解详情，请参阅 Vertex AI Java API 参考文档。

如需向 Vertex AI 进行身份验证，请设置应用默认凭证。如需了解详情，请参阅为本地开发环境设置身份验证。


import com.google.api.gax.longrunning.OperationFuture;
import com.google.api.gax.longrunning.OperationTimedPollAlgorithm;
import com.google.api.gax.retrying.RetrySettings;
import com.google.cloud.aiplatform.v1.AutomaticResources;
import com.google.cloud.aiplatform.v1.DedicatedResources;
import com.google.cloud.aiplatform.v1.DeployModelOperationMetadata;
import com.google.cloud.aiplatform.v1.DeployModelResponse;
import com.google.cloud.aiplatform.v1.DeployedModel;
import com.google.cloud.aiplatform.v1.EndpointName;
import com.google.cloud.aiplatform.v1.EndpointServiceClient;
import com.google.cloud.aiplatform.v1.EndpointServiceSettings;
import com.google.cloud.aiplatform.v1.MachineSpec;
import com.google.cloud.aiplatform.v1.ModelName;
import com.google.cloud.aiplatform.v1.stub.EndpointServiceStubSettings;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import org.threeten.bp.Duration;

public class DeployModelSample {

  public static void main(String[] args)
      throws IOException, InterruptedException, ExecutionException, TimeoutException {
    // TODO(developer): Replace these variables before running the sample.
    String project = "YOUR_PROJECT_ID";
    String deployedModelDisplayName = "YOUR_DEPLOYED_MODEL_DISPLAY_NAME";
    String endpointId = "YOUR_ENDPOINT_NAME";
    String modelId = "YOUR_MODEL_ID";
    int timeout = 900;
    deployModelSample(project, deployedModelDisplayName, endpointId, modelId, timeout);
  }

  static void deployModelSample(
      String project,
      String deployedModelDisplayName,
      String endpointId,
      String modelId,
      int timeout)
      throws IOException, InterruptedException, ExecutionException, TimeoutException {

    // Set long-running operations (LROs) timeout
    final OperationTimedPollAlgorithm operationTimedPollAlgorithm =
        OperationTimedPollAlgorithm.create(
            RetrySettings.newBuilder()
                .setInitialRetryDelay(Duration.ofMillis(5000L))
                .setRetryDelayMultiplier(1.5)
                .setMaxRetryDelay(Duration.ofMillis(45000L))
                .setInitialRpcTimeout(Duration.ZERO)
                .setRpcTimeoutMultiplier(1.0)
                .setMaxRpcTimeout(Duration.ZERO)
                .setTotalTimeout(Duration.ofSeconds(timeout))
                .build());

    EndpointServiceStubSettings.Builder endpointServiceStubSettingsBuilder =
        EndpointServiceStubSettings.newBuilder();
    endpointServiceStubSettingsBuilder
        .deployModelOperationSettings()
        .setPollingAlgorithm(operationTimedPollAlgorithm);
    EndpointServiceStubSettings endpointStubSettings = endpointServiceStubSettingsBuilder.build();
    EndpointServiceSettings endpointServiceSettings =
        EndpointServiceSettings.create(endpointStubSettings);
    endpointServiceSettings =
        endpointServiceSettings.toBuilder()
            .setEndpoint("us-central1-aiplatform.googleapis.com:443")
            .build();

    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (EndpointServiceClient endpointServiceClient =
        EndpointServiceClient.create(endpointServiceSettings)) {
      String location = "us-central1";
      EndpointName endpointName = EndpointName.of(project, location, endpointId);
      // key '0' assigns traffic for the newly deployed model
      // Traffic percentage values must add up to 100
      // Leave dictionary empty if endpoint should not accept any traffic
      Map<String, Integer> trafficSplit = new HashMap<>();
      trafficSplit.put("0", 100);
      ModelName modelName = ModelName.of(project, location, modelId);
      AutomaticResources automaticResourcesInput =
          AutomaticResources.newBuilder().setMinReplicaCount(1).setMaxReplicaCount(1).build();
      DeployedModel deployedModelInput =
          DeployedModel.newBuilder()
              .setModel(modelName.toString())
              .setDisplayName(deployedModelDisplayName)
              .setAutomaticResources(automaticResourcesInput)
              .build();

      OperationFuture<DeployModelResponse, DeployModelOperationMetadata> deployModelResponseFuture =
          endpointServiceClient.deployModelAsync(endpointName, deployedModelInput, trafficSplit);
      System.out.format(
          "Operation name: %s\n", deployModelResponseFuture.getInitialFuture().get().getName());
      System.out.println("Waiting for operation to finish...");
      DeployModelResponse deployModelResponse = deployModelResponseFuture.get(20, TimeUnit.MINUTES);

      System.out.println("Deploy Model Response");
      DeployedModel deployedModel = deployModelResponse.getDeployedModel();
      System.out.println("\tDeployed Model");
      System.out.format("\t\tid: %s\n", deployedModel.getId());
      System.out.format("\t\tmodel: %s\n", deployedModel.getModel());
      System.out.format("\t\tDisplay Name: %s\n", deployedModel.getDisplayName());
      System.out.format("\t\tCreate Time: %s\n", deployedModel.getCreateTime());

      DedicatedResources dedicatedResources = deployedModel.getDedicatedResources();
      System.out.println("\t\tDedicated Resources");
      System.out.format("\t\t\tMin Replica Count: %s\n", dedicatedResources.getMinReplicaCount());

      MachineSpec machineSpec = dedicatedResources.getMachineSpec();
      System.out.println("\t\t\tMachine Spec");
      System.out.format("\t\t\t\tMachine Type: %s\n", machineSpec.getMachineType());
      System.out.format("\t\t\t\tAccelerator Type: %s\n", machineSpec.getAcceleratorType());
      System.out.format("\t\t\t\tAccelerator Count: %s\n", machineSpec.getAcceleratorCount());

      AutomaticResources automaticResources = deployedModel.getAutomaticResources();
      System.out.println("\t\tAutomatic Resources");
      System.out.format("\t\t\tMin Replica Count: %s\n", automaticResources.getMinReplicaCount());
      System.out.format("\t\t\tMax Replica Count: %s\n", automaticResources.getMaxReplicaCount());
    }
  }
}

Node.js

在尝试此示例之前，请按照《Vertex AI 快速入门：使用客户端库》中的 Node.js 设置说明执行操作。如需了解详情，请参阅 Vertex AI Node.js API 参考文档。

如需向 Vertex AI 进行身份验证，请设置应用默认凭证。如需了解详情，请参阅为本地开发环境设置身份验证。

/**
 * TODO(developer): Uncomment these variables before running the sample.\
 * (Not necessary if passing values as arguments)
 */

// const modelId = "YOUR_MODEL_ID";
// const endpointId = 'YOUR_ENDPOINT_ID';
// const deployedModelDisplayName = 'YOUR_DEPLOYED_MODEL_DISPLAY_NAME';
// const project = 'YOUR_PROJECT_ID';
// const location = 'YOUR_PROJECT_LOCATION';

const modelName = `projects/${project}/locations/${location}/models/${modelId}`;
const endpoint = `projects/${project}/locations/${location}/endpoints/${endpointId}`;
// Imports the Google Cloud Endpoint Service Client library
const {EndpointServiceClient} = require('@google-cloud/aiplatform');

// Specifies the location of the api endpoint:
const clientOptions = {
  apiEndpoint: 'us-central1-aiplatform.googleapis.com',
};

// Instantiates a client
const endpointServiceClient = new EndpointServiceClient(clientOptions);

async function deployModel() {
  // Configure the parent resource
  // key '0' assigns traffic for the newly deployed model
  // Traffic percentage values must add up to 100
  // Leave dictionary empty if endpoint should not accept any traffic
  const trafficSplit = {0: 100};
  const deployedModel = {
    // format: 'projects/{project}/locations/{location}/models/{model}'
    model: modelName,
    displayName: deployedModelDisplayName,
    automaticResources: {minReplicaCount: 1, maxReplicaCount: 1},
  };
  const request = {
    endpoint,
    deployedModel,
    trafficSplit,
  };

  // Get and print out a list of all the endpoints for this resource
  const [response] = await endpointServiceClient.deployModel(request);
  console.log(`Long running operation : ${response.name}`);

  // Wait for operation to complete
  await response.promise();
  const result = response.result;

  console.log('Deploy model response');
  const modelDeployed = result.deployedModel;
  console.log('\tDeployed model');
  if (!modelDeployed) {
    console.log('\t\tId : {}');
    console.log('\t\tModel : {}');
    console.log('\t\tDisplay name : {}');
    console.log('\t\tCreate time : {}');

    console.log('\t\tDedicated resources');
    console.log('\t\t\tMin replica count : {}');
    console.log('\t\t\tMachine spec {}');
    console.log('\t\t\t\tMachine type : {}');
    console.log('\t\t\t\tAccelerator type : {}');
    console.log('\t\t\t\tAccelerator count : {}');

    console.log('\t\tAutomatic resources');
    console.log('\t\t\tMin replica count : {}');
    console.log('\t\t\tMax replica count : {}');
  } else {
    console.log(`\t\tId : ${modelDeployed.id}`);
    console.log(`\t\tModel : ${modelDeployed.model}`);
    console.log(`\t\tDisplay name : ${modelDeployed.displayName}`);
    console.log(`\t\tCreate time : ${modelDeployed.createTime}`);

    const dedicatedResources = modelDeployed.dedicatedResources;
    console.log('\t\tDedicated resources');
    if (!dedicatedResources) {
      console.log('\t\t\tMin replica count : {}');
      console.log('\t\t\tMachine spec {}');
      console.log('\t\t\t\tMachine type : {}');
      console.log('\t\t\t\tAccelerator type : {}');
      console.log('\t\t\t\tAccelerator count : {}');
    } else {
      console.log(
        `\t\t\tMin replica count : \
          ${dedicatedResources.minReplicaCount}`
      );
      const machineSpec = dedicatedResources.machineSpec;
      console.log('\t\t\tMachine spec');
      console.log(`\t\t\t\tMachine type : ${machineSpec.machineType}`);
      console.log(
        `\t\t\t\tAccelerator type : ${machineSpec.acceleratorType}`
      );
      console.log(
        `\t\t\t\tAccelerator count : ${machineSpec.acceleratorCount}`
      );
    }

    const automaticResources = modelDeployed.automaticResources;
    console.log('\t\tAutomatic resources');
    if (!automaticResources) {
      console.log('\t\t\tMin replica count : {}');
      console.log('\t\t\tMax replica count : {}');
    } else {
      console.log(
        `\t\t\tMin replica count : \
          ${automaticResources.minReplicaCount}`
      );
      console.log(
        `\t\t\tMax replica count : \
          ${automaticResources.maxReplicaCount}`
      );
    }
  }
}
deployModel();

Python

如需了解如何安装或更新 Vertex AI SDK for Python，请参阅安装 Vertex AI SDK for Python。如需了解详情，请参阅 Python API 参考文档。

def deploy_model_with_automatic_resources_sample(
    project,
    location,
    model_name: str,
    endpoint: Optional[aiplatform.Endpoint] = None,
    deployed_model_display_name: Optional[str] = None,
    traffic_percentage: Optional[int] = 0,
    traffic_split: Optional[Dict[str, int]] = None,
    min_replica_count: int = 1,
    max_replica_count: int = 1,
    metadata: Optional[Sequence[Tuple[str, str]]] = (),
    sync: bool = True,
):
    """
    model_name: A fully-qualified model resource name or model ID.
          Example: "projects/123/locations/us-central1/models/456" or
          "456" when project and location are initialized or passed.
    """

    aiplatform.init(project=project, location=location)

    model = aiplatform.Model(model_name=model_name)

    model.deploy(
        endpoint=endpoint,
        deployed_model_display_name=deployed_model_display_name,
        traffic_percentage=traffic_percentage,
        traffic_split=traffic_split,
        min_replica_count=min_replica_count,
        max_replica_count=max_replica_count,
        metadata=metadata,
        sync=sync,
    )

    model.wait()

    print(model.display_name)
    print(model.resource_name)
    return model

了解如何更改推理日志记录的默认设置。

获取操作状态

某些请求会启动需要一些时间才能完成的长时间运行的操作。这些请求会返回操作名称，您可以使用该名称查看操作状态或取消操作。Vertex AI 提供辅助方法来调用长时间运行的操作。如需了解详情，请参阅使用长时间运行的操作。

使用所部署的模型进行在线推理

如需进行在线推理，请向模型提交一个或多个测试项进行分析，模型会返回基于模型目标的结果。如需详细了解推理结果，请参阅解读结果页面。

控制台

使用 Google Cloud 控制台请求在线推理。您的模型必须部署到端点。

在 Google Cloud 控制台的 Vertex AI 部分中，前往模型页面。

转到“模型”页面
从模型列表中，点击要通过其请求推理的模型的名称。
选择部署和测试标签页。
在测试模型部分下，添加测试项以请求推理。

用于图片目标的 AutoML 模型要求您上传图片才能请求推理。

如需了解局部特征重要性，请参阅获取说明。

推理完成后，Vertex AI 会在控制台中返回结果。

API

使用 Vertex AI API 来请求在线推理。您的模型必须部署到端点。如需查看更详细的示例，请参阅使用 AutoML 图片分类模型进行在线预测的教程笔记本。

获取批量推理结果

如需发出批量推理请求，请指定输入源和 Vertex AI 存储推理结果所采用的输出格式。AutoML 图片模型类型的批量推理需要您提供一个输入 JSON Lines 文件和用于存储输出的 Cloud Storage 存储桶的名称。

输入数据要求

批量请求的输入指定要发送到模型进行推理的内容。对于图片分类模型，您可以使用 JSON 行文件指定用于进行推理的图片列表，然后将 JSON 行文件存储在 Cloud Storage 存储桶中。以下示例展示了输入 JSON 行文件中的一行：

{"content": "gs://sourcebucket/datasets/images/source_image.jpg", "mimeType": "image/jpeg"}

请求批量推理

您可以使用 Google Cloud 控制台或 Vertex AI API 发出批量推理请求。批量推理任务可能需要一些时间才能完成，具体取决于提交的输入数据项数量。

Google Cloud 控制台

使用 Google Cloud 控制台请求批量推理。

在 Google Cloud 控制台的 Vertex AI 部分中，前往批量预测页面。

前往“批量预测”页面
点击创建以打开新建批量预测窗口，完成以下步骤：
1. 为批量推理输入一个名称。
2. 在模型名称部分，选择要用来执行该批量推理的模型的名称。
3. 对于来源路径，指定 JSON 行输入文件所在的 Cloud Storage 位置。
4. 在目标路径部分，指定要用来存储批量推理结果的 Cloud Storage 位置。输出格式取决于模型的目标。用于图片目标的 AutoML 模型会输出 JSON 行文件。

API

使用 Vertex AI API 发送批量推理请求。

REST

在使用任何请求数据之前，请先进行以下替换：

LOCATION_ID：存储模型和执行批量推理作业的区域。例如 us-central1。
PROJECT_ID：
BATCH_JOB_NAME：批处理作业的显示名
MODEL_ID：用于执行推理的模型的 ID
THRESHOLD_VALUE（可选）：Vertex AI 仅返回置信度得分至少为该值的推理。默认值为 0.0。
MAX_PREDICTIONS（可选）：Vertex AI 会将推理结果按置信度得分从高到低排序，返回最多此数量的推理结果。默认值为 10。
URI：输入 JSON 行文件所在的 Cloud Storage URI。
BUCKET：您的 Cloud Storage 存储桶
PROJECT_NUMBER：自动生成的项目编号

HTTP 方法和网址：

POST https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/batchPredictionJobs

请求 JSON 正文：

{
    "displayName": "BATCH_JOB_NAME",
    "model": "projects/PROJECT/locations/LOCATION/models/MODEL_ID",
    "modelParameters": {
      "confidenceThreshold": THRESHOLD_VALUE,
      "maxPredictions": MAX_PREDICTIONS
    },
    "inputConfig": {
        "instancesFormat": "jsonl",
        "gcsSource": {
            "uris": ["URI"],
        },
    },
    "outputConfig": {
        "predictionsFormat": "jsonl",
        "gcsDestination": {
            "outputUriPrefix": "OUTPUT_BUCKET",
        },
    },
}

如需发送请求，请选择以下方式之一：

curl

将请求正文保存在名为 request.json 的文件中，然后执行以下命令：

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/batchPredictionJobs"

PowerShell

注意：以下命令假定您已使用您的用户账号通过运行 gcloud init 或 gcloud auth login 登录 gcloud CLI。您可以运行 gcloud auth list 来检查当前活跃的账号。

将请求正文保存在名为 request.json 的文件中，然后执行以下命令：

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/batchPredictionJobs" | Select-Object -Expand Content

您应该收到类似以下内容的 JSON 响应：

{
  "name": "projects/PROJECT_NUMBER/locations/LOCATION_ID/batchPredictionJobs/BATCH_JOB_ID",
  "displayName": "BATCH_JOB_NAME",
  "model": "projects/PROJECT_ID/locations/LOCATION_ID/models/MODEL_ID",
  "inputConfig": {
    "instancesFormat": "jsonl",
    "gcsSource": {
      "uris": [
        "CONTENT"
      ]
    }
  },
  "outputConfig": {
    "predictionsFormat": "jsonl",
    "gcsDestination": {
      "outputUriPrefix": "BUCKET"
    }
  },
  "state": "JOB_STATE_PENDING",
  "createTime": "2020-05-30T02:58:44.341643Z",
  "updateTime": "2020-05-30T02:58:44.341643Z",
  "modelDisplayName": "MODEL_NAME",
  "modelObjective": "MODEL_OBJECTIVE"
}

您可以使用 BATCH_JOB_ID 轮询批量作业的状态，直到作业 state 为 JOB_STATE_SUCCEEDED。

Python

如需了解如何安装或更新 Vertex AI SDK for Python，请参阅安装 Vertex AI SDK for Python。如需了解详情，请参阅 Python API 参考文档。

def create_batch_prediction_job_sample(
    project: str,
    location: str,
    model_resource_name: str,
    job_display_name: str,
    gcs_source: Union[str, Sequence[str]],
    gcs_destination: str,
    sync: bool = True,
):
    aiplatform.init(project=project, location=location)

    my_model = aiplatform.Model(model_resource_name)

    batch_prediction_job = my_model.batch_predict(
        job_display_name=job_display_name,
        gcs_source=gcs_source,
        gcs_destination_prefix=gcs_destination,
        sync=sync,
    )

    batch_prediction_job.wait()

    print(batch_prediction_job.display_name)
    print(batch_prediction_job.resource_name)
    print(batch_prediction_job.state)
    return batch_prediction_job

检索批量推理结果

Vertex AI 会将批量推理输出发送到您指定的目标位置。

批量推理任务完成后，推理输出会存储在您在请求中指定的 Cloud Storage 存储桶中。

批量推理结果示例

以下示例展示了来自图片分类模型的批量推理结果。

{
  "instance": {"content": "gs://bucket/image.jpg", "mimeType": "image/jpeg"},
  "prediction": {
    "ids": [1, 2],
    "displayNames": ["cat", "dog"],
    "confidences": [0.7, 0.5]
  }
}

评估模型

解读结果

从图片分类模型获取推理结果 使用集合让一切井井有条 根据您的偏好保存内容并对其进行分类。

在线推理与批量推理之间的区别

获取在线推理结果

将模型部署到端点

Google Cloud 控制台

API

创建端点

gcloud

REST

curl（Linux、macOS 或 Cloud Shell）

PowerShell (Windows)

Java

Node.js

Python

检索端点 ID

gcloud

REST

curl（Linux、macOS 或 Cloud Shell）

PowerShell (Windows)

部署模型

gcloud

Linux、macOS 或 Cloud Shell

Windows (PowerShell)

Windows (cmd.exe)

拆分流量

Linux、macOS 或 Cloud Shell

Windows (PowerShell)

Windows (cmd.exe)

REST

curl（Linux、macOS 或 Cloud Shell）

PowerShell (Windows)

Java

Node.js

Python

获取操作状态

使用所部署的模型进行在线推理

控制台

API

获取批量推理结果

输入数据要求

请求批量推理

Google Cloud 控制台

API

REST

curl

PowerShell

Python

检索批量推理结果

批量推理结果示例

从图片分类模型获取推理结果