Deploy a model to an endpoint

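You must deploy a model to an endpoint before you can use that model to serve online predictions. Deploying a model associates physical resources with it so that it can serve online predictions with low latency.

This page describes the steps to deploy a model to an endpoint using Online Prediction.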

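Before you begin

Before you deploy your model to an endpoint, export your model artifacts for prediction and make sure that you meet all the prerequisites listed on that page.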

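Create a resource pool

A ResourcePool custom resource gives you fine-grained control over the behavior of your model. You can define the following settings:

- Autoscaling configuration.
- The machine type, which defines CPU and memory requirements.
- Accelerator options, such as GPU resources.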

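The machine type is an essential part of the node pool specification request that you send to create the prediction cluster.

For the resource pool of a deployed model, the accelerator count and type determine GPU usage. The machine type determines only the requested CPU and memory resources. So when you include GPU accelerators in the ResourcePool specification, the machineType field controls the CPU and memory requirements of the model, the acceleratorType field controls the GPU, and the acceleratorCount field controls the number of GPU slices.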
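As a rough illustration of the slice arithmetic, the following Python sketch estimates the GPU memory implied by an acceleratorCount value. It assumes, based on the comment in the GPU example on this page, that each count is one-seventh of an 80 GB GPU. The function name and constants are hypothetical and are not part of any GDC API.

```python
from fractions import Fraction

# Assumptions taken from the comment in the GPU example on this page:
# an 80 GB GPU split into seven slices, with acceleratorCount selecting
# how many of those slices are requested.
GPU_MEMORY_GB = 80
SLICES_PER_GPU = 7


def estimated_gpu_memory_gb(accelerator_count: int) -> float:
    """Return the approximate GPU memory (in GB) for a given slice count."""
    if accelerator_count < 1:
        raise ValueError("accelerator_count must be at least 1")
    return float(Fraction(GPU_MEMORY_GB * accelerator_count, SLICES_PER_GPU))


if __name__ == "__main__":
    for count in (1, 2, 7):
        print(f"acceleratorCount={count}: ~{estimated_gpu_memory_gb(count):.1f} GB")
```

Under these assumptions, acceleratorCount: 2 works out to roughly 22.9 GB of GPU memory.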

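To create a ResourcePool custom resource, follow these steps:

Create a YAML file that defines the ResourcePool custom resource. The following examples show YAML files for resource pools with GPU accelerators (GPU-based models) and without GPU accelerators (CPU-based models):

GPU-based models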

  apiVersion: prediction.aiplatform.gdc.goog/v1
  kind: ResourcePool
  metadata:
    name: RESOURCE_POOL_NAME
    namespace: PROJECT_NAMESPACE
  spec:
    resourcePoolID: RESOURCE_POOL_NAME
    enableContainerLogging: false
    dedicatedResources:
      machineSpec:
        # The system adds computing overhead to the nodes for mandatory components.
        # Choose a machineType value that allocates fewer CPU and memory resources
        # than those used by the nodes in the prediction cluster.
        machineType: a2-highgpu-1g-gdc
        acceleratorType: nvidia-a100-80gb
        # The accelerator count is a slice of the requested virtualized GPUs.
        # The value corresponds to one-seventh of 80 GB of GPUs for each count.
        acceleratorCount: 2
      autoscaling:
        minReplica: 2
        maxReplica: 10

CPU-based models

  apiVersion: prediction.aiplatform.gdc.goog/v1
  kind: ResourcePool
  metadata:
    name: RESOURCE_POOL_NAME
    namespace: PROJECT_NAMESPACE
  spec:
    resourcePoolID: RESOURCE_POOL_NAME
    enableContainerLogging: false
    dedicatedResources:
      machineSpec:
        # The system adds computing overhead to the nodes for mandatory components.
        # Choose a machineType value that allocates fewer CPU and memory resources
        # than those used by the nodes in the prediction cluster.
        machineType: n2-highcpu-8-gdc
      autoscaling:
        minReplica: 2
        maxReplica: 10

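Replace the following:

- RESOURCE_POOL_NAME: the name that you want to give to the ResourcePool definition file.
- PROJECT_NAMESPACE: the name of the project namespace associated with the prediction cluster.

Modify the values in the dedicatedResources field according to your resource needs and the resources available in your prediction cluster.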
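If you prefer to generate the definition file rather than edit it by hand, the following Python sketch builds a manifest with the same shape as the examples on this page. The helper name, the sample values `my-pool` and `my-project`, and the sanity checks are assumptions made for illustration; the validation rules that the cluster actually enforces are not described here.

```python
import json


def build_resource_pool(name, namespace, machine_type,
                        min_replica=2, max_replica=10,
                        accelerator_type=None, accelerator_count=None):
    """Build a ResourcePool manifest shaped like the examples on this page."""
    # Assumed sanity checks; the real webhook rules may differ.
    if min_replica > max_replica:
        raise ValueError("minReplica must not exceed maxReplica")
    if (accelerator_type is None) != (accelerator_count is None):
        raise ValueError("set acceleratorType and acceleratorCount together")

    machine_spec = {"machineType": machine_type}
    if accelerator_type is not None:
        machine_spec["acceleratorType"] = accelerator_type
        machine_spec["acceleratorCount"] = accelerator_count

    return {
        "apiVersion": "prediction.aiplatform.gdc.goog/v1",
        "kind": "ResourcePool",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "resourcePoolID": name,
            "enableContainerLogging": False,
            "dedicatedResources": {
                "machineSpec": machine_spec,
                "autoscaling": {
                    "minReplica": min_replica,
                    "maxReplica": max_replica,
                },
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical sample values; substitute your own names.
    pool = build_resource_pool(
        "my-pool", "my-project", "a2-highgpu-1g-gdc",
        accelerator_type="nvidia-a100-80gb", accelerator_count=2,
    )
    # JSON is valid YAML, so the output can be saved as the definition file.
    print(json.dumps(pool, indent=2))
```

Omitting both accelerator arguments produces the CPU-based form, in which machineSpec contains only machineType.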

Apply the ResourcePool definition file to the prediction cluster:
```
kubectl --kubeconfig PREDICTION_CLUSTER_KUBECONFIG apply -f RESOURCE_POOL_NAME.yaml
```
Replace the following:
- PREDICTION_CLUSTER_KUBECONFIG: the path to the kubeconfig file in the prediction cluster.
- RESOURCE_POOL_NAME: the name of the ResourcePool definition file.

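When you create the ResourcePool custom resource, the Kubernetes API and the webhook service validate the YAML file and report success or failure. When you deploy your models to an endpoint, the prediction operator provisions and reserves resources from the resource pool.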
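To check a definition file before creating the resource, one option is a server-side dry run. The following Python sketch wraps the `kubectl apply` command from this page and optionally adds the standard `--dry-run=server` flag. Whether the validation webhook in your prediction cluster supports dry-run requests is an assumption, and the function names are hypothetical.

```python
import subprocess


def kubectl_apply_command(kubeconfig, manifest_path, dry_run=False):
    """Assemble the kubectl apply command from this page as an argument list."""
    command = ["kubectl", "--kubeconfig", kubeconfig, "apply", "-f", manifest_path]
    if dry_run:
        # Ask the API server to validate the request without persisting it.
        command.append("--dry-run=server")
    return command


def apply_resource_pool(kubeconfig, manifest_path, dry_run=False):
    """Run kubectl and return (succeeded, combined output)."""
    try:
        result = subprocess.run(
            kubectl_apply_command(kubeconfig, manifest_path, dry_run),
            capture_output=True, text=True, check=False,
        )
    except FileNotFoundError:
        return False, "kubectl was not found on PATH"
    return result.returncode == 0, (result.stdout + result.stderr).strip()


if __name__ == "__main__":
    # Placeholder paths; replace them with your kubeconfig and definition file.
    ok, output = apply_resource_pool(
        "PREDICTION_CLUSTER_KUBECONFIG", "RESOURCE_POOL_NAME.yaml", dry_run=True
    )
    print("accepted" if ok else "rejected", output, sep=": ")
```

Passing the arguments as a list rather than a shell string avoids quoting problems when a path contains spaces.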