
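Configure container settings for custom training

When you perform custom training, you must specify which machine learning (ML) code you want Vertex AI to run. To do this, configure training container settings for either a custom container or a Python training application that runs in a prebuilt container.

To decide whether to use a custom container or a prebuilt container, read "Training code requirements".

This document describes the Vertex AI API fields that you must specify in either case.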

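Where to specify container settings

Specify the configuration details in a WorkerPoolSpec. Depending on how you run custom training, place this WorkerPoolSpec in one of the following API fields:

- If you are creating a CustomJob resource, specify the WorkerPoolSpec in CustomJob.jobSpec.workerPoolSpecs.

  If you are using the Google Cloud CLI, you can specify worker pool options with the --worker-pool-spec flag or the --config flag of the gcloud ai custom-jobs create command.

  Learn more about creating a CustomJob.
- If you are creating a HyperparameterTuningJob resource, specify the WorkerPoolSpec in HyperparameterTuningJob.trialJobSpec.workerPoolSpecs.

  If you are using the gcloud CLI, you can specify worker pool options with the --config flag of the gcloud ai hp-tuning-jobs create command.

  Learn more about creating a HyperparameterTuningJob.
- If you are creating a TrainingPipeline resource without hyperparameter tuning, specify the WorkerPoolSpec in TrainingPipeline.trainingTaskInputs.workerPoolSpecs.

  Learn more about creating a custom TrainingPipeline.
- If you are creating a TrainingPipeline with hyperparameter tuning, specify the WorkerPoolSpec in TrainingPipeline.trainingTaskInputs.trialJobSpec.workerPoolSpecs.

If you are performing distributed training, you can use different settings for each worker pool.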
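
As a rough illustration of the field paths above, the following sketch builds request bodies as plain Python dictionaries, using the snake_case field names that the Python client accepts. The helper make_pool, the image URI placeholders, and the machine types are assumptions made for this example and do not come from the guide. Other fields that a HyperparameterTuningJob requires are omitted.

```python
# Sketch only: plain dictionaries that mirror the API field paths described above.
# make_pool, the image URIs, and the machine types are illustrative placeholders.


def make_pool(machine_type: str, replica_count: int, image_uri: str) -> dict:
    """Return one WorkerPoolSpec that runs a custom container."""
    return {
        "machine_spec": {"machine_type": machine_type},
        "replica_count": replica_count,
        "container_spec": {"image_uri": image_uri},
    }


# Distributed training: each worker pool can carry its own settings.
worker_pool_specs = [
    make_pool("n1-standard-8", 1, "FIRST_POOL_IMAGE_URI"),
    make_pool("n1-standard-4", 3, "SECOND_POOL_IMAGE_URI"),
]

# Corresponds to CustomJob.jobSpec.workerPoolSpecs
custom_job = {
    "display_name": "JOB_NAME",
    "job_spec": {"worker_pool_specs": worker_pool_specs},
}

# Corresponds to HyperparameterTuningJob.trialJobSpec.workerPoolSpecs
# (other required fields of the tuning job are left out of this sketch)
hp_tuning_job = {
    "display_name": "JOB_NAME",
    "trial_job_spec": {"worker_pool_specs": worker_pool_specs},
}
```

The same list of worker pool specs is reused in both bodies; only the enclosing field path changes with the resource type.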

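Configure container settings

Depending on whether you are using a prebuilt container or a custom container, you must specify different fields within the WorkerPoolSpec. Follow the subsection that matches your scenario:

Prebuilt containers

- Select a prebuilt container that supports the ML framework you plan to use for training. Specify one of the container image's URIs in the pythonPackageSpec.executorImageUri field.
- Specify the Cloud Storage URIs of your Python training application in the pythonPackageSpec.packageUris field.
- Specify the entry point module of your training application in the pythonPackageSpec.pythonModule field.
- Optionally, specify a list of command-line arguments to pass to your training application's entry point module in the pythonPackageSpec.args field.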
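
To make the four fields above concrete, here is a minimal sketch that assembles a WorkerPoolSpec for a prebuilt container as a plain Python dictionary. The helper name prebuilt_worker_pool_spec, the bucket path, the module name trainer.task, and the --epochs argument are hypothetical values chosen for illustration. The snake_case keys follow the naming that the Python client uses for these API fields.

```python
# Sketch only: a WorkerPoolSpec for a prebuilt container, as a plain dictionary.
# The helper name and all example values are hypothetical.
from typing import Optional, Sequence


def prebuilt_worker_pool_spec(
    executor_image_uri: str,
    package_uris: Sequence[str],
    python_module: str,
    args: Optional[Sequence[str]] = None,
    machine_type: str = "n1-standard-4",
    replica_count: int = 1,
) -> dict:
    """Build a WorkerPoolSpec whose python_package_spec holds the prebuilt container settings."""
    python_package_spec = {
        "executor_image_uri": executor_image_uri,  # pythonPackageSpec.executorImageUri
        "package_uris": list(package_uris),        # pythonPackageSpec.packageUris
        "python_module": python_module,            # pythonPackageSpec.pythonModule
    }
    if args:
        # pythonPackageSpec.args is optional, so include it only when provided.
        python_package_spec["args"] = list(args)
    return {
        "machine_spec": {"machine_type": machine_type},
        "replica_count": replica_count,
        "python_package_spec": python_package_spec,
    }


pool = prebuilt_worker_pool_spec(
    executor_image_uri="PYTHON_PACKAGE_EXECUTOR_IMAGE_URI",
    package_uris=["gs://BUCKET/trainer-0.1.tar.gz"],
    python_module="trainer.task",
    args=["--epochs=10"],
)
```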

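The following examples highlight where you specify these container settings when you create a CustomJob:

Console

You can't create a CustomJob directly in the Google Cloud console. However, you can create a TrainingPipeline that creates a CustomJob. When you create a TrainingPipeline in the Google Cloud console, you can specify prebuilt container settings in certain fields on the "Training container" step:

pythonPackageSpec.executorImageUri: Use the "Model framework" and "Model framework version" drop-down lists.
pythonPackageSpec.packageUris: Use the "Package location" field.
pythonPackageSpec.pythonModule: Use the "Python module" field.
pythonPackageSpec.args: Use the "Arguments" field.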

gcloud

gcloud ai custom-jobs create \
  --region=LOCATION \
  --display-name=JOB_NAME \
  --python-package-uris=PYTHON_PACKAGE_URIS \
  --worker-pool-spec=machine-type=MACHINE_TYPE,replica-count=REPLICA_COUNT,executor-image-uri=PYTHON_PACKAGE_EXECUTOR_IMAGE_URI,python-module=PYTHON_MODULE

For more context, read the guide to creating a CustomJob.

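Custom containers

Specify the Artifact Registry or Docker Hub URI of your custom container in the containerSpec.imageUri field.
Optionally, to override the ENTRYPOINT or CMD instructions in your container, specify the containerSpec.command or containerSpec.args fields. These fields affect how your container runs according to the following rules:
- If you specify neither field: Your container runs according to its ENTRYPOINT instruction and CMD instruction (if it exists). Refer to the Docker documentation about how CMD and ENTRYPOINT interact.
- If you specify only containerSpec.command: Your container runs with the value of containerSpec.command replacing its ENTRYPOINT instruction. If the container has a CMD instruction, it is ignored.
- If you specify only containerSpec.args: Your container runs according to its ENTRYPOINT instruction, with the value of containerSpec.args replacing its CMD instruction.
- If you specify both fields: Your container runs with containerSpec.command replacing its ENTRYPOINT instruction and containerSpec.args replacing its CMD instruction.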
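
The four rules above can be modeled with a small function. This is only a sketch of the documented behavior, not code that Vertex AI runs. The function name effective_command and the example image (an ENTRYPOINT of python -m trainer.task with a CMD of --epochs=5) are hypothetical. The sketch also assumes that an empty list counts as "not specified".

```python
# Sketch only: models the documented override rules for ENTRYPOINT and CMD.
# effective_command and the example image values are hypothetical.
from typing import List, Optional, Sequence


def effective_command(
    image_entrypoint: Sequence[str],
    image_cmd: Sequence[str],
    command: Optional[Sequence[str]] = None,
    args: Optional[Sequence[str]] = None,
) -> List[str]:
    """Return the argv the container would run, given containerSpec.command and containerSpec.args."""
    if command:
        # containerSpec.command replaces ENTRYPOINT, and the image's CMD is ignored.
        entrypoint = list(command)
        trailing = list(args) if args else []
    else:
        # ENTRYPOINT is kept; containerSpec.args, if set, replaces CMD.
        entrypoint = list(image_entrypoint)
        trailing = list(args) if args else list(image_cmd)
    return entrypoint + trailing


image_entrypoint = ["python", "-m", "trainer.task"]
image_cmd = ["--epochs=5"]

neither = effective_command(image_entrypoint, image_cmd)
# ['python', '-m', 'trainer.task', '--epochs=5']
command_only = effective_command(image_entrypoint, image_cmd, command=["bash", "run.sh"])
# ['bash', 'run.sh']
args_only = effective_command(image_entrypoint, image_cmd, args=["--epochs=20"])
# ['python', '-m', 'trainer.task', '--epochs=20']
both = effective_command(
    image_entrypoint, image_cmd, command=["bash", "run.sh"], args=["--epochs=20"]
)
# ['bash', 'run.sh', '--epochs=20']
```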

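The following examples highlight where you can specify some of these container settings when you create a CustomJob:

Console

You can't create a CustomJob directly in the Google Cloud console. However, you can create a TrainingPipeline that creates a CustomJob. When you create a TrainingPipeline in the Google Cloud console, you can specify custom container settings in certain fields on the "Training container" step:

containerSpec.imageUri: Use the "Container image" field.
containerSpec.command: You can't configure this API field in the Google Cloud console.
containerSpec.args: Use the "Arguments" field.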

gcloud

gcloud ai custom-jobs create \
  --region=LOCATION \
  --display-name=JOB_NAME \
  --worker-pool-spec=machine-type=MACHINE_TYPE,replica-count=REPLICA_COUNT,container-image-uri=CUSTOM_CONTAINER_IMAGE_URI

Java

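Before trying this sample, follow the Java setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Java API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see "Set up authentication for a local development environment".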


import com.google.cloud.aiplatform.v1.AcceleratorType;
import com.google.cloud.aiplatform.v1.ContainerSpec;
import com.google.cloud.aiplatform.v1.CustomJob;
import com.google.cloud.aiplatform.v1.CustomJobSpec;
import com.google.cloud.aiplatform.v1.JobServiceClient;
import com.google.cloud.aiplatform.v1.JobServiceSettings;
import com.google.cloud.aiplatform.v1.LocationName;
import com.google.cloud.aiplatform.v1.MachineSpec;
import com.google.cloud.aiplatform.v1.WorkerPoolSpec;
import java.io.IOException;

// Create a custom job to run machine learning training code in Vertex AI
public class CreateCustomJobSample {

  public static void main(String[] args) throws IOException {
    // TODO(developer): Replace these variables before running the sample.
    String project = "PROJECT";
    String displayName = "DISPLAY_NAME";

    // Vertex AI runs your training application in a Docker container image. A Docker container
    // image is a self-contained software package that includes code and all dependencies. Learn
    // more about preparing your training application at
    // https://cloud.google.com/vertex-ai/docs/training/overview#prepare_your_training_application
    String containerImageUri = "CONTAINER_IMAGE_URI";
    createCustomJobSample(project, displayName, containerImageUri);
  }

  static void createCustomJobSample(String project, String displayName, String containerImageUri)
      throws IOException {
    JobServiceSettings settings =
        JobServiceSettings.newBuilder()
            .setEndpoint("us-central1-aiplatform.googleapis.com:443")
            .build();
    String location = "us-central1";

    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests.
    try (JobServiceClient client = JobServiceClient.create(settings)) {
      MachineSpec machineSpec =
          MachineSpec.newBuilder()
              .setMachineType("n1-standard-4")
              .setAcceleratorType(AcceleratorType.NVIDIA_TESLA_T4)
              .setAcceleratorCount(1)
              .build();

      ContainerSpec containerSpec =
          ContainerSpec.newBuilder().setImageUri(containerImageUri).build();

      WorkerPoolSpec workerPoolSpec =
          WorkerPoolSpec.newBuilder()
              .setMachineSpec(machineSpec)
              .setReplicaCount(1)
              .setContainerSpec(containerSpec)
              .build();

      CustomJobSpec customJobSpec =
          CustomJobSpec.newBuilder().addWorkerPoolSpecs(workerPoolSpec).build();

      CustomJob customJob =
          CustomJob.newBuilder()
              .setDisplayName(displayName)
              .setJobSpec(customJobSpec)
              .build();
      LocationName parent = LocationName.of(project, location);
      CustomJob response = client.createCustomJob(parent, customJob);
      System.out.format("response: %s\n", response);
      System.out.format("Name: %s\n", response.getName());
    }
  }
}

Node.js

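Before trying this sample, follow the Node.js setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Node.js API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see "Set up authentication for a local development environment".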

/**
 * TODO(developer): Uncomment these variables before running the sample.
 * (Not necessary if passing values as arguments)
 */

// const customJobDisplayName = 'YOUR_CUSTOM_JOB_DISPLAY_NAME';
// const containerImageUri = 'YOUR_CONTAINER_IMAGE_URI';
// const project = 'YOUR_PROJECT_ID';
// const location = 'YOUR_PROJECT_LOCATION';

// Imports the Google Cloud Job Service Client library
const {JobServiceClient} = require('@google-cloud/aiplatform');

// Specifies the location of the api endpoint
const clientOptions = {
  apiEndpoint: 'us-central1-aiplatform.googleapis.com',
};

// Instantiates a client
const jobServiceClient = new JobServiceClient(clientOptions);

async function createCustomJob() {
  // Configure the parent resource
  const parent = `projects/${project}/locations/${location}`;
  const customJob = {
    displayName: customJobDisplayName,
    jobSpec: {
      workerPoolSpecs: [
        {
          machineSpec: {
            machineType: 'n1-standard-4',
            acceleratorType: 'NVIDIA_TESLA_T4',
            acceleratorCount: 1,
          },
          replicaCount: 1,
          containerSpec: {
            imageUri: containerImageUri,
            command: [],
            args: [],
          },
        },
      ],
    },
  };
  const request = {parent, customJob};

  // Create custom job request
  const [response] = await jobServiceClient.createCustomJob(request);

  console.log('Create custom job response:\n', JSON.stringify(response));
}
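// Report a failed request instead of leaving the promise rejection unhandled.
createCustomJob().catch(err => {
  console.error(err);
  process.exitCode = 1;
});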

Vertex AI SDK for Python

To learn how to install or update the Vertex AI SDK for Python, see "Install the Vertex AI SDK for Python". For more information, see the Vertex AI SDK for Python API reference documentation.

from google.cloud import aiplatform


def create_custom_job_sample(
    project: str,
    display_name: str,
    container_image_uri: str,
    location: str = "us-central1",
    api_endpoint: str = "us-central1-aiplatform.googleapis.com",
):
    # The AI Platform services require regional API endpoints.
    client_options = {"api_endpoint": api_endpoint}
    # Initialize client that will be used to create and send requests.
    # This client only needs to be created once, and can be reused for multiple requests.
    client = aiplatform.gapic.JobServiceClient(client_options=client_options)
    custom_job = {
        "display_name": display_name,
        "job_spec": {
            "worker_pool_specs": [
                {
                    "machine_spec": {
                        "machine_type": "n1-standard-4",
                        "accelerator_type": aiplatform.gapic.AcceleratorType.NVIDIA_TESLA_T4,
                        "accelerator_count": 1,
                    },
                    "replica_count": 1,
                    "container_spec": {
                        "image_uri": container_image_uri,
                        "command": [],
                        "args": [],
                    },
                }
            ]
        },
    }
    parent = f"projects/{project}/locations/{location}"
    response = client.create_custom_job(parent=parent, custom_job=custom_job)
    print("response:", response)

For more context, read the guide to creating a CustomJob.

What's next

Learn how to perform custom training by creating a CustomJob.
