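Create and run a job that uses GPUs

This document explains how to create and run a job that uses graphics processing units (GPUs). To learn more about the features and restrictions for GPUs, see About GPUs in the Compute Engine documentation.

When you create a Batch job, you can optionally use GPUs to accelerate specific workloads. Common use cases for jobs that use GPUs include intensive data processing and artificial intelligence (AI) workloads such as machine learning (ML).

Before you begin

If you haven't used Batch before, review Get started with Batch and enable Batch by completing the prerequisites for projects and users.
To get the permissions that you need to create a job, ask your administrator to grant you the following IAM roles:
- Batch Job Editor (roles/batch.jobsEditor) on the project
- Service Account User (roles/iam.serviceAccountUser) on the job's service account, which by default is the default Compute Engine service account
For more information about granting roles, see Manage access to projects, folders, and organizations.

You might also be able to get the required permissions through custom roles or other predefined roles.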

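Create a job that uses GPUs

To create a job that uses GPUs, do the following:

1. Plan the requirements for a job that uses GPUs.
2. Create the job with the requirements and methods that you identified. For an example of how to create a job using the recommended options, see the "Create an example job that uses GPUs" section in this document.

Plan the requirements for a job that uses GPUs

Before you create a job that uses GPUs, plan the job's requirements as described in the following sections:

- Step 1: Select the GPU machine type and provisioning method
- Step 2: Install GPU drivers
- Step 3: Define compatible VM resources

Step 1: Select the GPU machine type and provisioning method

A job's requirements vary based on your preferred GPU machine type and provisioning method, and the options for each can be interdependent. Depending on your requirements and priorities, you can select either the GPU machine type or the provisioning method first. Generally, the GPU machine type primarily affects performance and base pricing, whereas the provisioning method primarily affects resource availability and any additional costs or discounts.

Select the GPU machine type

To learn about the available GPU machine types (the valid combinations of GPU type, number of GPUs, and machine type (vCPUs and memory)) and their use cases, see the GPU machine types page in the Compute Engine documentation.

The fields that a job requires to specify a GPU machine type vary based on the following categories: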

GPU machine types and their job requirements

GPUs for accelerator-optimized VMs: VMs with a machine type from the accelerator-optimized machine family automatically have a specific type and number of GPUs attached.

To use GPUs for accelerator-optimized VMs, we recommend specifying the machine type. Each accelerator-optimized machine type supports only a specific type and number of GPUs, so the result is functionally the same whether or not you also specify those values alongside the machine type.

Batch also supports specifying only the GPU type and number of GPUs for accelerator-optimized VMs, but the resulting vCPU and memory options are often very limited. If you use that approach, verify that the available vCPU and memory options are compatible with the job's task requirements.

GPUs for N1 VMs: for these GPUs, you must specify the type and number of GPUs to attach to each VM, and the GPUs must be attached to VMs with a machine type from the N1 machine series.

To use GPUs for N1 VMs, we recommend specifying at least the GPU type and number of GPUs. Make sure that the combination of values matches one of the valid GPU options for N1 machine types. For any specific type and number of GPUs, the vCPU and memory options for N1 VMs are fairly flexible. Unless you create the job using the Google Cloud console, Batch automatically selects a machine type that meets the job's task requirements.

Note: If a job specifies an N1 machine type but doesn't specify a GPU type and number of GPUs, Batch doesn't use GPUs.
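The two categories differ mainly in which fields go into a job's instances[].policy. The following Python sketch builds that policy for each category, using the machineType and accelerators[] (type, count) field names from the JSON examples later in this document. The helper names accelerator_optimized_policy and n1_gpu_policy are hypothetical. The machine type and GPU type strings are example values only, so check the GPU machine types page for valid combinations.

```python
import json


def accelerator_optimized_policy(machine_type: str) -> dict:
    """instances[].policy for accelerator-optimized VMs (hypothetical helper).

    The machine type alone determines the attached GPU type and count.
    """
    return {"machineType": machine_type}


def n1_gpu_policy(gpu_type: str, gpu_count: int) -> dict:
    """instances[].policy for N1 VMs (hypothetical helper).

    GPU type and count are set explicitly. Without a machineType,
    Batch selects an N1 machine type that fits the task requirements.
    """
    if gpu_count < 1:
        raise ValueError("gpu_count must be at least 1")
    return {"accelerators": [{"type": gpu_type, "count": gpu_count}]}


if __name__ == "__main__":
    # Example values only.
    print(json.dumps(accelerator_optimized_policy("g2-standard-4")))
    print(json.dumps(n1_gpu_policy("nvidia-tesla-t4", 1)))
```

Leaving machineType out of the N1 policy matches the recommendation above to let Batch pick the exact N1 machine type.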

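Select the provisioning method

Batch uses different methods to provision the VM resources for jobs that use GPUs, based on the type of resources that the job requests. The available provisioning methods and their requirements are described below by use case, ordered from highest to lowest resource availability.

In summary, we recommend the following for most users:

- If you want to use A3 GPU machine types without reservations, use Dynamic Workload Scheduler for Batch (Preview).

  Note: To use Dynamic Workload Scheduler for Batch with other GPU machine types, contact your Google Cloud sales team or your account team.
- For all other GPU machine types, use the default provisioning method. The default provisioning method is typically on-demand, unless your project has unused reservations that the job can automatically consume.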

Provisioning methods and their job requirements

Reservations

Use case: We recommend reservations for jobs if you want a very high level of assurance of resource availability, or if you have existing reservations that might be unused.
Details: A reservation incurs the costs of the specified VMs, at the same price as running those VMs, until you delete the reservation. VMs that consume a reservation don't incur separate costs, but reservations incur costs whether or not they are consumed.

Batch uses reservations for jobs that can consume unused reservations. For more information about reservations and their requirements, see the "Ensure resource availability using VM reservations" page.

Dynamic Workload Scheduler for Batch (Preview)

Use case: We recommend Dynamic Workload Scheduler if you want to use GPUs for VMs with a machine type from the A3 machine series without consuming reservations.
Details: Dynamic Workload Scheduler can make it easier to access many resources at once, which can accelerate AI and ML workloads. For example, Dynamic Workload Scheduler can help with job scheduling by mitigating delays or problems caused by resource unavailability.

Important: Unlike other jobs, Batch jobs that use GPUs through Dynamic Workload Scheduler make resize requests to Compute Engine managed instance groups (MIGs), so their behavior differs slightly. Specifically, jobs that use GPUs through Dynamic Workload Scheduler can consume preemptible allocation quota, which we recommend to reduce quota friction for Dynamic Workload Scheduler GPUs. For more information, see GPU VMs and preemptible allocation quotas.

Batch uses Dynamic Workload Scheduler if the job meets all of the following conditions:

- Specifies an A3 GPU machine type.
- Blocks reservations. Specifically, the job must set the reservation field to NO_RESERVATION. For more information, see Create and run a job that can't consume reserved VMs.
- Doesn't use Spot VMs. Specifically, the job can either omit the provisioningModel field or set the provisioningModel field to STANDARD.

Tip: Although you can run the job in any location where A3 VMs are available, we recommend the us-central1 location because it has dedicated Dynamic Workload Scheduler capacity.

On-demand

Use case: We recommend on-demand for all other jobs.
Details: On-demand is usually the default way to access Compute Engine VMs. With on-demand, you request resources for one VM at a time and access them immediately, if they are available.

Batch uses on-demand for all other jobs.

Spot VMs

Use case: We recommend trying Spot VMs to lower the costs of fault-tolerant workloads.

Note: Spot VMs might not always be available. Following the Spot VM best practices might improve resource availability. However, if the problem persists, you might need to switch to a different provisioning method.
Details: Spot VMs offer significant discounts, but they might not always be available and can be preempted at any time. For more information, see Spot VMs in the Compute Engine documentation.

Batch uses Spot VMs if the job sets the provisioningModel field to SPOT.
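The rules above can be summarized as a small decision function. The following Python sketch uses a hypothetical helper that isn't part of Batch. It inspects the machineType, reservation, and provisioningModel fields of a policy and reports which method the rules in this section point to. It assumes that A3 machine type names start with "a3-". It also can't tell whether a matching unused reservation exists, so in that case it reports only that a reservation might be used.

```python
def expected_provisioning(policy: dict) -> str:
    """Apply this section's rules to an instances[].policy dict (hypothetical helper).

    Returns "spot", "dws" (Dynamic Workload Scheduler), "on-demand",
    "reservation-or-on-demand", or "unknown". Whether a reservation is
    actually consumed depends on your project's reservations, which this
    sketch can't see.
    """
    machine_type = policy.get("machineType", "")
    # Omitting provisioningModel is treated like STANDARD by these rules.
    model = policy.get("provisioningModel", "STANDARD")
    blocks_reservations = policy.get("reservation") == "NO_RESERVATION"

    if model == "SPOT":
        return "spot"
    if model != "STANDARD":
        return "unknown"  # not covered by the rules in this section
    # Assumption: A3 machine type names start with "a3-".
    if machine_type.startswith("a3-") and blocks_reservations:
        return "dws"
    if blocks_reservations:
        return "on-demand"
    return "reservation-or-on-demand"
```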

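Step 2: Install GPU drivers

To use GPUs for a job, you must install GPU drivers. To install GPU drivers, select one of the following methods:

- Install GPU drivers automatically (recommended if possible): As shown in the examples, to let Batch fetch the required GPU drivers from a third-party location and install them for you, set the job's installGpuDrivers field to true. We recommend this method if your job doesn't require you to install drivers manually.

  Optionally, to specify which GPU driver version Batch installs, also set the driverVersion field.
- Install GPU drivers manually: This method is required if any of the following conditions apply:
  - The job uses both script and container runnables and doesn't have access to the internet. For more information about the access that a job has, see Batch networking overview.
  - The job uses a custom VM image. For more information about VM OS images and which VM OS images you can use, see VM OS environment overview.

  Important: Due to a known issue, you might also need to install drivers manually for jobs that specify certain Compute Engine images. For more information, see "Jobs with GPUs and VM OS images with outdated kernels fail only when automatically installing drivers".

  To manually install the required GPU drivers, we recommend the following method:
  1. Create a custom VM image that includes the GPU drivers.
    1. To install the GPU drivers, run an installation script based on the OS that you want to use:
      - GPU drivers for Container-Optimized OS
      - GPU drivers for other OSes
    2. If the job has any container runnables and doesn't use Container-Optimized OS, you must also install the NVIDIA Container Toolkit.
  2. When you create and submit a job that uses GPUs, specify the custom VM image that includes the GPU drivers, and set the job's installGpuDrivers field to false (default).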
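The two driver installation methods produce different allocationPolicy.instances[] entries. The following Python sketch shows both. The instance_entry helper and the template name my-gpu-template are hypothetical. The automatic method sets installGpuDrivers to true on a directly defined policy. The manual method points instanceTemplate at a VM instance template that uses your custom image (see Step 3) and leaves installGpuDrivers at false.

```python
from typing import Optional


def instance_entry(policy: Optional[dict] = None,
                   instance_template: Optional[str] = None,
                   install_gpu_drivers: bool = False) -> dict:
    """Build one allocationPolicy.instances[] entry (hypothetical helper).

    Give exactly one of `policy` (define VM resources directly) or
    `instance_template` (name of a VM instance template, for example one
    whose custom image already contains the GPU drivers).
    """
    if (policy is None) == (instance_template is None):
        raise ValueError("specify exactly one of policy or instance_template")
    entry = {"installGpuDrivers": install_gpu_drivers}
    if policy is not None:
        entry["policy"] = policy
    else:
        entry["instanceTemplate"] = instance_template
    return entry


# Automatic installation: Batch fetches and installs the drivers.
automatic = instance_entry(policy={"machineType": "g2-standard-4"},
                           install_gpu_drivers=True)
# Manual installation: the drivers come from the template's custom image.
manual = instance_entry(instance_template="my-gpu-template")  # hypothetical name
```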

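Step 3: Define compatible VM resources

To learn about the requirements and options for defining the VM resources for a job, see Job resources.

In summary, you must do all of the following when you define the VM resources for a job that uses GPUs:

- Make sure that the GPU machine type is available in the location of the job's VMs.

  To learn where GPU machine types are available, see GPU availability by regions and zones in the Compute Engine documentation.
- If you specify the job's machine type, make sure that the machine type has enough vCPUs and memory for the job's task requirements. Specifying the machine type is required when you create a job using the Google Cloud console, and recommended when you create a job that uses GPUs for accelerator-optimized VMs.
- Make sure that you define the VM resources for the job using a valid method:
  - Define VM resources directly by using the instances[].policy field (recommended if possible), as shown in the examples.
  - Define VM resources through a template by using the instances[].instanceTemplate field. This method is required if you install GPU drivers manually through a custom image. For more information, see Define job resources using a VM instance template.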
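To sanity-check the vCPU and memory requirement, you can compare each task's computeResource values (cpuMilli and memoryMib, which the Python sample later in this document sets as cpu_milli and memory_mib) against the machine type's vCPUs and memory. The following Python sketch gives only a rough estimate, and both helpers are hypothetical. It ignores any resources that the OS or Batch reserve on the VM, and you supply the machine's vCPU count and memory (in GiB) yourself. It also shows the regions/REGION and zones/ZONE formats for allowedLocations[] entries.

```python
import math
from typing import Optional


def tasks_per_vm(vm_vcpus: int, vm_memory_gib: float,
                 cpu_milli: int, memory_mib: int) -> int:
    """Rough upper bound on tasks that fit on one VM at the same time.

    cpu_milli and memory_mib are a task's computeResource values
    (1000 cpuMilli = 1 vCPU). Resources reserved by the OS and the
    Batch agent are ignored, so the real number can be lower.
    """
    by_cpu = (vm_vcpus * 1000) // cpu_milli
    by_memory = math.floor(vm_memory_gib * 1024) // memory_mib
    return min(by_cpu, by_memory)


def allowed_location(region: str, zone: Optional[str] = None) -> str:
    """Format an allowedLocations[] entry for a whole region or a single zone."""
    return f"zones/{zone}" if zone else f"regions/{region}"


if __name__ == "__main__":
    # Example: a VM with 4 vCPUs and 16 GiB; each task needs 2 vCPUs and 16 MiB.
    print(tasks_per_vm(4, 16, cpu_milli=2000, memory_mib=16))
    print(allowed_location("us-central1"), allowed_location("us-central1", "us-central1-a"))
```

A result of 0 means that a single task doesn't fit on the machine type, so you need a larger machine type or smaller task requirements.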

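Create an example job that uses GPUs

The following sections explain how to create an example job for each GPU machine type using the recommended options. Specifically, the example jobs all install GPU drivers automatically, define VM resources directly, and either specify a provisioning method or use the default provisioning method.

- Use GPUs for A3 VMs through Dynamic Workload Scheduler (Preview)
- Use GPUs for accelerator-optimized VMs
- Use GPUs for N1 VMs

Use GPUs for A3 VMs through Dynamic Workload Scheduler for Batch (Preview)

You can create a job that uses GPUs for A3 VMs through Dynamic Workload Scheduler using the gcloud CLI or the Batch API.

gcloud

Create a JSON file that installs GPU drivers, specifies a machine type from the A3 machine series, blocks reservations, and runs in a location that offers the GPU machine type.

For example, to create a basic script job that uses GPUs for A3 VMs through Dynamic Workload Scheduler, create a JSON file with the following content: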

{
    "taskGroups": [
        {
            "taskSpec": {
                "runnables": [
                    {
                        "script": {
                            "text": "echo Hello world from task ${BATCH_TASK_INDEX}."
                        }
                    }
                ]
            },
            "taskCount": 3,
            "parallelism": 1
        }
    ],
    "allocationPolicy": {
        "instances": [
            {
                "installGpuDrivers": INSTALL_GPU_DRIVERS,
                "policy": {
                    "machineType": "MACHINE_TYPE",
                    "reservation": "NO_RESERVATION"
                }
            }
        ],
        "location": {
            "allowedLocations": [
                "ALLOWED_LOCATIONS"
            ]
        }
    },
    "logsPolicy": {
        "destination": "CLOUD_LOGGING"
    }
}

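Replace the following:

- INSTALL_GPU_DRIVERS: when set to true, Batch fetches the drivers required for the GPU type that you specify in the policy field from a third-party location and installs them for you. If you set this field to false (default), you must install GPU drivers manually to use any GPUs for this job.
- MACHINE_TYPE: a machine type from the A3 machine series.
- ALLOWED_LOCATIONS: optionally, you can use the allowedLocations[] field to specify a region, or specific zones in a region, where the job's VMs are allowed to run. For example, regions/us-central1 allows all zones in the region us-central1. Make sure that you specify locations that offer the GPU machine type that you want for this job. If you omit this field, make sure that the job's location offers the GPU machine type.

To create and run the job, use the gcloud batch jobs submit command:
```
gcloud batch jobs submit JOB_NAME \
    --location LOCATION \
    --config JSON_CONFIGURATION_FILE
```
Replace the following:
- JOB_NAME: the name of the job.
- LOCATION: the location of the job.
- JSON_CONFIGURATION_FILE: the path to a JSON file with the job's configuration details.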
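If you generate job configurations from code instead of editing the JSON file by hand, a script like the following Python sketch can produce the same structure as the example above, along with the matching gcloud batch jobs submit command. The function names are hypothetical, and the machine type a3-highgpu-8g and the job name are illustrative values. The policy sets reservation to NO_RESERVATION and omits provisioningModel, which covers two of the conditions for Dynamic Workload Scheduler.

```python
import json


def dws_job_config(machine_type: str, allowed_locations: list,
                   install_gpu_drivers: bool = True) -> dict:
    """Same structure as the Dynamic Workload Scheduler JSON example (hypothetical helper)."""
    return {
        "taskGroups": [{
            "taskSpec": {
                "runnables": [{"script": {
                    "text": "echo Hello world from task ${BATCH_TASK_INDEX}."
                }}]
            },
            "taskCount": 3,
            "parallelism": 1,
        }],
        "allocationPolicy": {
            "instances": [{
                "installGpuDrivers": install_gpu_drivers,
                "policy": {
                    "machineType": machine_type,
                    # Blocking reservations is required for Dynamic Workload
                    # Scheduler; provisioningModel is omitted (standard VMs).
                    "reservation": "NO_RESERVATION",
                },
            }],
            "location": {"allowedLocations": allowed_locations},
        },
        "logsPolicy": {"destination": "CLOUD_LOGGING"},
    }


def submit_command(job_name: str, location: str, config_path: str) -> list:
    """argv for `gcloud batch jobs submit`, for example for subprocess.run()."""
    return ["gcloud", "batch", "jobs", "submit", job_name,
            "--location", location, "--config", config_path]


if __name__ == "__main__":
    config = dws_job_config("a3-highgpu-8g", ["regions/us-central1"])
    print(json.dumps(config, indent=2))
    print(" ".join(submit_command("example-dws-job", "us-central1", "dws-job.json")))
```

To submit the job, write json.dumps(config, indent=2) to the file named in the command, then pass the list returned by submit_command() to subprocess.run(..., check=True).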

API

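Make a POST request to the jobs.create method that installs GPU drivers, specifies a machine type from the A3 machine series, blocks reservations, and runs in a location that offers the GPU machine type.

For example, to create a basic script job that uses GPUs for A3 VMs through Dynamic Workload Scheduler, make the following request: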

POST https://batch.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/jobs?job_id=JOB_NAME

{
    "taskGroups": [
        {
            "taskSpec": {
                "runnables": [
                    {
                        "script": {
                            "text": "echo Hello world from task ${BATCH_TASK_INDEX}."
                        }
                    }
                ]
            },
            "taskCount": 3,
            "parallelism": 1
        }
    ],
    "allocationPolicy": {
        "instances": [
            {
                "installGpuDrivers": INSTALL_GPU_DRIVERS,
                "policy": {
                    "machineType": "MACHINE_TYPE",
                    "reservation": "NO_RESERVATION"
                }
            }
        ],
        "location": {
            "allowedLocations": [
                "ALLOWED_LOCATIONS"
            ]
        }
    },
    "logsPolicy": {
        "destination": "CLOUD_LOGGING"
    }
}

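Replace the following:

- PROJECT_ID: the project ID of your project.
- LOCATION: the location of the job.
- JOB_NAME: the name of the job.
- INSTALL_GPU_DRIVERS: when set to true, Batch fetches the drivers required for the GPU type that you specify in the policy field from a third-party location and installs them for you. If you set this field to false (default), you must install GPU drivers manually to use any GPUs for this job.
- MACHINE_TYPE: a machine type from the A3 machine series.
- ALLOWED_LOCATIONS: optionally, you can use the allowedLocations[] field to specify a region, or specific zones in a region, where the job's VMs are allowed to run. For example, regions/us-central1 allows all zones in the region us-central1. Make sure that you specify locations that offer the GPU machine type that you want for this job. If you omit this field, make sure that the job's location offers the GPU machine type.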
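You can also send the request above without a client library. The following Python sketch builds the request with the standard library only and doesn't send it. The create_job_request helper is hypothetical, and the sketch assumes that you obtain an OAuth 2.0 access token separately, for example with gcloud auth print-access-token. Passing the returned object to urllib.request.urlopen() would send the request and create the job.

```python
import json
import urllib.parse
import urllib.request


def create_job_request(project_id: str, location: str, job_name: str,
                       job: dict, access_token: str) -> urllib.request.Request:
    """Build, but don't send, a jobs.create POST request (hypothetical helper)."""
    base = (f"https://batch.googleapis.com/v1/projects/{project_id}"
            f"/locations/{location}/jobs")
    url = f"{base}?{urllib.parse.urlencode({'job_id': job_name})}"
    return urllib.request.Request(
        url,
        data=json.dumps(job).encode("utf-8"),  # the JSON request body
        method="POST",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
    )
```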

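Use GPUs for accelerator-optimized VMs

You can create a job that uses GPUs for accelerator-optimized VMs using the Google Cloud console, gcloud CLI, Batch API, Java, Node.js, or Python.

Console

To create a job that uses GPUs using the Google Cloud console, do the following:

In the Google Cloud console, go to the Job list page.

Click Create. The Create batch job page opens. In the left pane, the Job details page is selected.
Configure the Job details page:
1. Optional: In the Job name field, customize the job name.

  For example, enter example-gpu-job.
2. Configure the Task details section:
  1. In the New runnable window, add at least one script or container for this job to run.

    For example, to create a basic script job, do the following:
    1. Select the Script checkbox. A field appears.
    2. In the field, enter the following script:
      echo Hello world from task ${BATCH_TASK_INDEX}.
    3. Click Done.
  2. In the Task count field, enter the number of tasks for this job.

    For example, enter 3.
  3. Optional: In the Parallelism field, enter the number of tasks to run concurrently.

    For example, enter 1 (default).
Configure the Resource specifications page:
1. In the left pane, click Resource specifications. The Resource specifications page opens.
2. Optional: In the VM provisioning model section, select one of the following provisioning model options for this job's VMs:
  - If your job can withstand preemption and you want discounted VMs, select Spot.
  - Otherwise, select Standard (default).
3. Select the location for this job.
  1. In the Region field, select a region.
  2. In the Zone field, do one of the following:
    - If you want to restrict this job to run in a specific zone only, select a zone.
    - Otherwise, select any (default).
  Important: Make sure that you specify only locations that offer the GPU machine type that you want for this job.
4. Select the GPU machine type for this job's VMs:
  1. In the machine family options, click GPU.
  2. In the GPU type field, select the GPU type. Then, in the Number of GPUs field, select the number of GPUs for each VM.

    If you selected one of the GPU types for accelerator-optimized VMs, the Machine type field offers only one machine type option, based on the type and number of GPUs that you selected.
  3. To install GPU drivers automatically, select GPU driver installation (default).
5. Configure the amount of VM resources required for each task:

  Important: Make sure that the GPU machine type has enough VM resources for the job's task requirements.
  1. In the Cores field, enter the number of vCPUs for each task.

    For example, enter 1 (default).
  2. In the Memory field, enter the amount of RAM in GB for each task.

    For example, enter 0.5 (default).
6. Click Done.
Optional: Configure the other fields for this job.
Optional: To review the job configuration, in the left pane, click Preview.
Click Create.

The Job details page displays the job that you created.

gcloud

Create a JSON file that installs GPU drivers, specifies a machine type from the accelerator-optimized machine family, and runs in a location that offers the GPU machine type.

For example, to create a basic script job that uses GPUs for accelerator-optimized VMs, create a JSON file with the following content: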

{
    "taskGroups": [
        {
            "taskSpec": {
                "runnables": [
                    {
                        "script": {
                            "text": "echo Hello world from task ${BATCH_TASK_INDEX}."
                        }
                    }
                ]
            },
            "taskCount": 3,
            "parallelism": 1
        }
    ],
    "allocationPolicy": {
        "instances": [
            {
                "installGpuDrivers": INSTALL_GPU_DRIVERS,
                "policy": {
                    "machineType": "MACHINE_TYPE"
                }
            }
        ],
        "location": {
            "allowedLocations": [
                "ALLOWED_LOCATIONS"
            ]
        }
    },
    "logsPolicy": {
        "destination": "CLOUD_LOGGING"
    }
}

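Replace the following:

- INSTALL_GPU_DRIVERS: when set to true, Batch fetches the drivers required for the GPU type that you specify in the policy field from a third-party location and installs them for you. If you set this field to false (default), you must install GPU drivers manually to use any GPUs for this job.
- MACHINE_TYPE: a machine type from the accelerator-optimized machine family.
- ALLOWED_LOCATIONS: optionally, you can use the allowedLocations[] field to specify a region, or specific zones in a region, where the job's VMs are allowed to run. For example, regions/us-central1 allows all zones in the region us-central1. Make sure that you specify locations that offer the GPU machine type that you want for this job. If you omit this field, make sure that the job's location offers the GPU machine type.

To create and run the job, use the gcloud batch jobs submit command:
```
gcloud batch jobs submit JOB_NAME \
    --location LOCATION \
    --config JSON_CONFIGURATION_FILE
```
Replace the following:
- JOB_NAME: the name of the job.
- LOCATION: the location of the job.
- JSON_CONFIGURATION_FILE: the path to a JSON file with the job's configuration details.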
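The JSON examples in this document aren't valid JSON until you replace the placeholders, because INSTALL_GPU_DRIVERS appears unquoted. The following Python sketch substitutes placeholder values as raw text and parses the result, so that a typo or a forgotten placeholder fails locally before you run gcloud batch jobs submit. The render_job_config helper is hypothetical. Pass boolean values as the JSON literals true or false.

```python
import json
import re

# Placeholders used in this section's accelerator-optimized JSON example.
PLACEHOLDERS = ("INSTALL_GPU_DRIVERS", "MACHINE_TYPE", "ALLOWED_LOCATIONS")


def render_job_config(template: str, values: dict,
                      placeholders=PLACEHOLDERS) -> dict:
    """Replace each placeholder with raw text and parse the result (hypothetical helper).

    Raises KeyError if a placeholder has no value, and
    json.JSONDecodeError if the rendered text isn't valid JSON.
    """
    missing = [name for name in placeholders if name not in values]
    if missing:
        raise KeyError("no value for: " + ", ".join(missing))
    for name in placeholders:
        # \b prevents matches inside longer identifiers; a function
        # replacement avoids backslash escapes in the value.
        template = re.sub(rf"\b{re.escape(name)}\b",
                          lambda _match, value=values[name]: value, template)
    return json.loads(template)
```

For example, read the template file, call render_job_config(text, {"INSTALL_GPU_DRIVERS": "true", "MACHINE_TYPE": "g2-standard-4", "ALLOWED_LOCATIONS": "regions/us-central1"}), and write json.dumps(result, indent=2) to the file that you pass to --config. For the N1 example later in this document, pass placeholders=("INSTALL_GPU_DRIVERS", "GPU_TYPE", "GPU_COUNT", "ALLOWED_LOCATIONS") instead.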

API

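Make a POST request to the jobs.create method that installs GPU drivers, specifies a machine type from the accelerator-optimized machine family, and runs in a location that offers the GPU machine type.

For example, to create a basic script job that uses GPUs for accelerator-optimized VMs, make the following request: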

POST https://batch.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/jobs?job_id=JOB_NAME

{
    "taskGroups": [
        {
            "taskSpec": {
                "runnables": [
                    {
                        "script": {
                            "text": "echo Hello world from task ${BATCH_TASK_INDEX}."
                        }
                    }
                ]
            },
            "taskCount": 3,
            "parallelism": 1
        }
    ],
    "allocationPolicy": {
        "instances": [
            {
                "installGpuDrivers": INSTALL_GPU_DRIVERS,
                "policy": {
                    "machineType": "MACHINE_TYPE"
                }
            }
        ],
        "location": {
            "allowedLocations": [
                "ALLOWED_LOCATIONS"
            ]
        }
    },
    "logsPolicy": {
        "destination": "CLOUD_LOGGING"
    }
}

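Replace the following:

- PROJECT_ID: the project ID of your project.
- LOCATION: the location of the job.
- JOB_NAME: the name of the job.
- INSTALL_GPU_DRIVERS: when set to true, Batch fetches the drivers required for the GPU type that you specify in the policy field from a third-party location and installs them for you. If you set this field to false (default), you must install GPU drivers manually to use any GPUs for this job.
- MACHINE_TYPE: a machine type from the accelerator-optimized machine family.
- ALLOWED_LOCATIONS: optionally, you can use the allowedLocations[] field to specify a region, or specific zones in a region, where the job's VMs are allowed to run. For example, regions/us-central1 allows all zones in the region us-central1. Make sure that you specify locations that offer the GPU machine type that you want for this job. If you omit this field, make sure that the job's location offers the GPU machine type.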

Java


import com.google.cloud.batch.v1.AllocationPolicy;
import com.google.cloud.batch.v1.AllocationPolicy.Accelerator;
import com.google.cloud.batch.v1.AllocationPolicy.InstancePolicy;
import com.google.cloud.batch.v1.AllocationPolicy.InstancePolicyOrTemplate;
import com.google.cloud.batch.v1.BatchServiceClient;
import com.google.cloud.batch.v1.CreateJobRequest;
import com.google.cloud.batch.v1.Job;
import com.google.cloud.batch.v1.LogsPolicy;
import com.google.cloud.batch.v1.Runnable;
import com.google.cloud.batch.v1.Runnable.Script;
import com.google.cloud.batch.v1.TaskGroup;
import com.google.cloud.batch.v1.TaskSpec;
import com.google.protobuf.Duration;
import java.io.IOException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class CreateGpuJob {

  public static void main(String[] args)
      throws IOException, ExecutionException, InterruptedException, TimeoutException {
    // TODO(developer): Replace these variables before running the sample.
    // Project ID or project number of the Google Cloud project you want to use.
    String projectId = "YOUR_PROJECT_ID";
    // Name of the region you want to use to run the job. Regions that are
    // available for Batch are listed on: https://cloud.google.com/batch/docs/get-started#locations
    String region = "europe-central2";
    // The name of the job that will be created.
    // It needs to be unique for each project and region pair.
    String jobName = "JOB_NAME";
    // Optional. When set to true, Batch fetches the drivers required for the GPU type
    // that you specify in the policy field from a third-party location,
    // and Batch installs them on your behalf. If you set this field to false (default),
    // you need to install GPU drivers manually to use any GPUs for this job.
    boolean installGpuDrivers = false;
    // Accelerator-optimized machine types are available to Batch jobs. See the list
    // of available types on: https://cloud.google.com/compute/docs/accelerator-optimized-machines
    String machineType = "g2-standard-4";

    createGpuJob(projectId, region, jobName, installGpuDrivers, machineType);
  }

  // Create a job that uses GPUs
  public static Job createGpuJob(String projectId, String region, String jobName,
                                  boolean installGpuDrivers, String machineType)
      throws IOException, ExecutionException, InterruptedException, TimeoutException {
    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests.
    try (BatchServiceClient batchServiceClient = BatchServiceClient.create()) {
      // Define what will be done as part of the job.
      Runnable runnable =
          Runnable.newBuilder()
              .setScript(
                  Script.newBuilder()
                      .setText(
                          "echo Hello world! This is task ${BATCH_TASK_INDEX}. "
                                  + "This job has a total of ${BATCH_TASK_COUNT} tasks.")
                      // You can also run a script from a file. Just remember, that needs to be a
                      // script that's already on the VM that will be running the job.
                      // Using setText() and setPath() is mutually exclusive.
                      // .setPath("/tmp/test.sh")
                      .build())
              .build();

      TaskSpec task = TaskSpec.newBuilder()
                  // Jobs can be divided into tasks. This TaskSpec defines what each task runs.
                  .addRunnables(runnable)
                  .setMaxRetryCount(2)
                  .setMaxRunDuration(Duration.newBuilder().setSeconds(3600).build())
                  .build();

      // Tasks are grouped inside a job using TaskGroups.
      // Currently, it's possible to have only one task group.
      TaskGroup taskGroup = TaskGroup.newBuilder()
          .setTaskCount(3)
          .setParallelism(1)
          .setTaskSpec(task)
          .build();

      // Policies are used to define on what kind of virtual machines the tasks will run.
      // Read more about machine types here: https://cloud.google.com/compute/docs/machine-types
      InstancePolicy instancePolicy =
          InstancePolicy.newBuilder().setMachineType(machineType).build();  

      // Policies are used to define on what kind of virtual machines the tasks will run on.
      AllocationPolicy allocationPolicy =
          AllocationPolicy.newBuilder()
              .addInstances(
                  InstancePolicyOrTemplate.newBuilder()
                      .setInstallGpuDrivers(installGpuDrivers)
                      .setPolicy(instancePolicy)
                      .build())
              .build();

      Job job =
          Job.newBuilder()
              .addTaskGroups(taskGroup)
              .setAllocationPolicy(allocationPolicy)
              .putLabels("env", "testing")
              .putLabels("type", "script")
              // We use Cloud Logging as it's an out of the box available option.
              .setLogsPolicy(
                  LogsPolicy.newBuilder().setDestination(LogsPolicy.Destination.CLOUD_LOGGING))
              .build();

      CreateJobRequest createJobRequest =
          CreateJobRequest.newBuilder()
              // The job's parent is the region in which the job will run.
              .setParent(String.format("projects/%s/locations/%s", projectId, region))
              .setJob(job)
              .setJobId(jobName)
              .build();

      Job result =
          batchServiceClient
              .createJobCallable()
              .futureCall(createJobRequest)
              .get(5, TimeUnit.MINUTES);

      System.out.printf("Successfully created the job: %s", result.getName());

      return result;
    }
  }
}

Node.js

// Imports the Batch library
const batchLib = require('@google-cloud/batch');
const batch = batchLib.protos.google.cloud.batch.v1;

// Instantiates a client
const batchClient = new batchLib.v1.BatchServiceClient();

/**
 * TODO(developer): Update these variables before running the sample.
 */
// Project ID or project number of the Google Cloud project you want to use.
const projectId = await batchClient.getProjectId();
// Name of the region you want to use to run the job. Regions that are
// available for Batch are listed on: https://cloud.google.com/batch/docs/get-started#locations
const region = 'europe-central2';
// The name of the job that will be created.
// It needs to be unique for each project and region pair.
const jobName = 'batch-gpu-job';
// The GPU type. You can view a list of the available GPU types
// by using the `gcloud compute accelerator-types list` command.
const gpuType = 'nvidia-l4';
// The number of GPUs of the specified type.
const gpuCount = 1;
// Optional. When set to true, Batch fetches the drivers required for the GPU type
// that you specify in the policy field from a third-party location,
// and Batch installs them on your behalf. If you set this field to false (default),
// you need to install GPU drivers manually to use any GPUs for this job.
const installGpuDrivers = false;
// Accelerator-optimized machine types are available to Batch jobs. See the list
// of available types on: https://cloud.google.com/compute/docs/accelerator-optimized-machines
const machineType = 'g2-standard-4';

// Define what will be done as part of the job.
const runnable = new batch.Runnable({
  script: new batch.Runnable.Script({
    // Script takes `text` (or `path`); `commands` belongs to Runnable.Container.
    text: 'echo Hello world! This is task ${BATCH_TASK_INDEX}.',
  }),
});

const task = new batch.TaskSpec({
  runnables: [runnable],
  maxRetryCount: 2,
  maxRunDuration: {seconds: 3600},
});

// Tasks are grouped inside a job using TaskGroups.
const group = new batch.TaskGroup({
  taskCount: 3,
  taskSpec: task,
});

// Policies are used to define on what kind of virtual machines the tasks will run on.
// In this case, we tell the system to use "g2-standard-4" machine type.
// Read more about machine types here: https://cloud.google.com/compute/docs/machine-types
const instancePolicy = new batch.AllocationPolicy.InstancePolicy({
  machineType,
  // Accelerator describes Compute Engine accelerators to be attached to the VM
  accelerators: [
    new batch.AllocationPolicy.Accelerator({
      type: gpuType,
      count: gpuCount,
      installGpuDrivers,
    }),
  ],
});

// The job-level AllocationPolicy holds the list of InstancePolicyOrTemplate entries.
const allocationPolicy = new batch.AllocationPolicy({
  instances: [{installGpuDrivers, policy: instancePolicy}],
});

const job = new batch.Job({
  name: jobName,
  taskGroups: [group],
  labels: {env: 'testing', type: 'script'},
  allocationPolicy,
  // We use Cloud Logging as it's an option available out of the box
  logsPolicy: new batch.LogsPolicy({
    destination: batch.LogsPolicy.Destination.CLOUD_LOGGING,
  }),
});
// The job's parent is the project and region in which the job will run
const parent = `projects/${projectId}/locations/${region}`;

async function callCreateBatchGPUJob() {
  // Construct request
  const request = {
    parent,
    jobId: jobName,
    job,
  };

  // Run request
  const [response] = await batchClient.createJob(request);
  console.log(JSON.stringify(response));
}

await callCreateBatchGPUJob();

Python

from google.cloud import batch_v1


def create_gpu_job(project_id: str, region: str, job_name: str) -> batch_v1.Job:
    """
    This method shows how to create a sample Batch Job that will run
    a simple command on Cloud Compute instances on GPU machines.

    Args:
        project_id: project ID or project number of the Cloud project you want to use.
        region: name of the region you want to use to run the job. Regions that are
            available for Batch are listed on: https://cloud.google.com/batch/docs/get-started#locations
        job_name: the name of the job that will be created.
            It needs to be unique for each project and region pair.

    Returns:
        A job object representing the job created.
    """
    client = batch_v1.BatchServiceClient()

    # Define what will be done as part of the job.
    task = batch_v1.TaskSpec()
    runnable = batch_v1.Runnable()
    runnable.script = batch_v1.Runnable.Script()
    runnable.script.text = "echo Hello world! This is task ${BATCH_TASK_INDEX}. This job has a total of ${BATCH_TASK_COUNT} tasks."
    # You can also run a script from a file. Just remember, that needs to be a script that's
    # already on the VM that will be running the job. Using runnable.script.text and runnable.script.path is mutually
    # exclusive.
    # runnable.script.path = '/tmp/test.sh'
    task.runnables = [runnable]

    # We can specify what resources are requested by each task.
    resources = batch_v1.ComputeResource()
    resources.cpu_milli = 2000  # in milliseconds per cpu-second. This means the task requires 2 whole CPUs.
    resources.memory_mib = 16  # in MiB
    task.compute_resource = resources

    task.max_retry_count = 2
    task.max_run_duration = "3600s"

    # Tasks are grouped inside a job using TaskGroups.
    # Currently, it's possible to have only one task group.
    group = batch_v1.TaskGroup()
    group.task_count = 4
    group.task_spec = task

    # Policies are used to define on what kind of virtual machines the tasks will run on.
    # In this case, we tell the system to use "g2-standard-4" machine type.
    # Read more about machine types here: https://cloud.google.com/compute/docs/machine-types
    policy = batch_v1.AllocationPolicy.InstancePolicy()
    policy.machine_type = "g2-standard-4"

    instances = batch_v1.AllocationPolicy.InstancePolicyOrTemplate()
    instances.policy = policy
    instances.install_gpu_drivers = True
    allocation_policy = batch_v1.AllocationPolicy()
    allocation_policy.instances = [instances]

    job = batch_v1.Job()
    job.task_groups = [group]
    job.allocation_policy = allocation_policy
    job.labels = {"env": "testing", "type": "script"}
    # We use Cloud Logging as it's an out of the box available option
    job.logs_policy = batch_v1.LogsPolicy()
    job.logs_policy.destination = batch_v1.LogsPolicy.Destination.CLOUD_LOGGING

    create_request = batch_v1.CreateJobRequest()
    create_request.job = job
    create_request.job_id = job_name
    # The job's parent is the region in which the job will run
    create_request.parent = f"projects/{project_id}/locations/{region}"

    return client.create_job(create_request)

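Use GPUs for N1 VMs

You can create a job that uses GPUs for N1 VMs using the Google Cloud console, gcloud CLI, Batch API, Java, Node.js, or Python.

Console

To create a job that uses GPUs using the Google Cloud console, do the following:

In the Google Cloud console, go to the Job list page.

Click Create. The Create batch job page opens. In the left pane, the Job details page is selected.
Configure the Job details page:
1. Optional: In the Job name field, customize the job name.

  For example, enter example-gpu-job.
2. Configure the Task details section:
  1. In the New runnable window, add at least one script or container for this job to run.

    For example, to create a basic script job, do the following:
    1. Select the Script checkbox. A field appears.
    2. In the field, enter the following script:
      echo Hello world from task ${BATCH_TASK_INDEX}.
    3. Click Done.
  2. In the Task count field, enter the number of tasks for this job.

    For example, enter 3.
  3. Optional: In the Parallelism field, enter the number of tasks to run concurrently.

    For example, enter 1 (default).
Configure the Resource specifications page:
1. In the left pane, click Resource specifications. The Resource specifications page opens.
2. Optional: In the VM provisioning model section, select one of the following provisioning model options for this job's VMs:
  - If your job can withstand preemption and you want discounted VMs, select Spot.
  - Otherwise, select Standard (default).
3. Select the location for this job.
  1. In the Region field, select a region.
  2. In the Zone field, do one of the following:
    - If you want to restrict this job to run in a specific zone only, select a zone.
    - Otherwise, select any (default).
  Important: Make sure that you specify only locations that offer the GPU machine type that you want for this job.
4. Select the GPU machine type for this job's VMs:
  1. In the machine family options, click GPU.
  2. In the GPU type field, select the GPU type.

    If you selected one of the GPU types for N1 VMs, the Series field is set to N1.
  3. In the Number of GPUs field, select the number of GPUs for each VM.
  4. In the Machine type field, select the machine type.
  5. To install GPU drivers automatically, select GPU driver installation (default).
5. Configure the amount of VM resources required for each task:

  Important: Make sure that the GPU machine type has enough VM resources for the job's task requirements.
  1. In the Cores field, enter the number of vCPUs for each task.

    For example, enter 1 (default).
  2. In the Memory field, enter the amount of RAM in GB for each task.

    For example, enter 0.5 (default).
6. Click Done.
Optional: Configure the other fields for this job.
Optional: To review the job configuration, in the left pane, click Preview.
Click Create.

The Job details page displays the job that you created.

gcloud

Create a JSON file that installs GPU drivers, defines the type and count subfields of the accelerators[] field, and runs in a location that offers the GPU machine type.

For example, to create a basic script job that uses GPUs for N1 VMs and lets Batch select the exact N1 machine type, create a JSON file with the following content: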

{
    "taskGroups": [
        {
            "taskSpec": {
                "runnables": [
                    {
                        "script": {
                            "text": "echo Hello world from task ${BATCH_TASK_INDEX}."
                        }
                    }
                ]
            },
            "taskCount": 3,
            "parallelism": 1
        }
    ],
    "allocationPolicy": {
        "instances": [
            {
                "installGpuDrivers": INSTALL_GPU_DRIVERS,
                "policy": {
                    "accelerators": [
                        {
                            "type": "GPU_TYPE",
                            "count": GPU_COUNT
                        }
                    ]
                }
            }
        ],
        "location": {
            "allowedLocations": [
                "ALLOWED_LOCATIONS"
            ]
        }
    },
    "logsPolicy": {
        "destination": "CLOUD_LOGGING"
    }
}

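Replace the following:

- INSTALL_GPU_DRIVERS: when set to true, Batch fetches the drivers required for the GPU type that you specify in the policy field from a third-party location and installs them for you. If you set this field to false (default), you must install GPU drivers manually to use any GPUs for this job.
- GPU_TYPE: the GPU type. You can view a list of the available GPU types by using the gcloud compute accelerator-types list command. Use this field only for GPUs for N1 VMs.
- GPU_COUNT: the number of GPUs of the specified type. For more information about the valid options, see the GPU machine types for the N1 machine series. Use this field only for GPUs for N1 VMs.
- ALLOWED_LOCATIONS: optionally, you can use the allowedLocations[] field to specify a region, or specific zones in a region, where the job's VMs are allowed to run. For example, regions/us-central1 allows all zones in the region us-central1. Make sure that you specify locations that offer the GPU machine type that you want for this job. If you omit this field, make sure that the job's location offers the GPU machine type.

To create and run the job, use the gcloud batch jobs submit command:
```
gcloud batch jobs submit JOB_NAME \
    --location LOCATION \
    --config JSON_CONFIGURATION_FILE
```
Replace the following:
- JOB_NAME: the name of the job.
- LOCATION: the location of the job.
- JSON_CONFIGURATION_FILE: the path to a JSON file with the job's configuration details.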
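Before you submit an N1 job, you might want to check the policy against the rules in this document, in particular that an N1 machine type without accelerators[] doesn't get GPUs. The following Python sketch is a heuristic lint only, and gpu_policy_warnings is a hypothetical helper. It assumes that N1 machine type names start with "n1-", and it doesn't know which GPU type and count combinations Compute Engine accepts.

```python
def gpu_policy_warnings(policy: dict) -> list:
    """Heuristic checks for an instances[].policy that should use GPUs (hypothetical helper).

    Assumes N1 machine type names start with "n1-". Doesn't validate
    GPU type and count combinations against Compute Engine.
    """
    warnings = []
    accelerators = policy.get("accelerators", [])
    if policy.get("machineType", "").startswith("n1-") and not accelerators:
        warnings.append("N1 machine type without accelerators: Batch won't use GPUs")
    for accelerator in accelerators:
        if not accelerator.get("type"):
            warnings.append("accelerator is missing a GPU type")
        try:
            # Integer fields can appear as JSON numbers or as strings.
            count = int(accelerator.get("count", 0))
        except (TypeError, ValueError):
            count = 0
        if count < 1:
            warnings.append("accelerator needs a GPU count of at least 1")
    return warnings
```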

API

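Make a POST request to the jobs.create method that installs GPU drivers, defines the type and count subfields of the accelerators[] field, and uses a location that offers the GPU machine type.

For example, to create a basic script job that uses GPUs for N1 VMs and lets Batch select the exact N1 machine type, make the following request: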

POST https://batch.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/jobs?job_id=JOB_NAME

{
    "taskGroups": [
        {
            "taskSpec": {
                "runnables": [
                    {
                        "script": {
                            "text": "echo Hello world from task ${BATCH_TASK_INDEX}."
                        }
                    }
                ]
            },
            "taskCount": 3,
            "parallelism": 1
        }
    ],
    "allocationPolicy": {
        "instances": [
            {
                "installGpuDrivers": INSTALL_GPU_DRIVERS,
                "policy": {
                    "accelerators": [
                        {
                            "type": "GPU_TYPE",
                            "count": GPU_COUNT
                        }
                    ]
                }
            }
        ],
        "location": {
            "allowedLocations": [
                "ALLOWED_LOCATIONS"
            ]
        }
    },
    "logsPolicy": {
        "destination": "CLOUD_LOGGING"
    }
}

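Replace the following:

- PROJECT_ID: the project ID of your project.
- LOCATION: the location of the job.
- JOB_NAME: the name of the job.
- INSTALL_GPU_DRIVERS: when set to true, Batch fetches the drivers required for the GPU type that you specify in the policy field from a third-party location and installs them for you. If you set this field to false (default), you must install GPU drivers manually to use any GPUs for this job.
- GPU_TYPE: the GPU type. You can view a list of the available GPU types by using the gcloud compute accelerator-types list command. Use this field only for GPUs for N1 VMs.
- GPU_COUNT: the number of GPUs of the specified type. For more information about the valid options, see the GPU machine types for the N1 machine series. Use this field only for GPUs for N1 VMs.
- ALLOWED_LOCATIONS: optionally, you can use the allowedLocations[] field to specify a region, or specific zones in a region, where the job's VMs are allowed to run. For example, regions/us-central1 allows all zones in the region us-central1. Make sure that you specify locations that offer the GPU machine type that you want for this job. If you omit this field, make sure that the job's location offers the GPU machine type.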

Java


```java
import com.google.cloud.batch.v1.AllocationPolicy;
import com.google.cloud.batch.v1.AllocationPolicy.Accelerator;
import com.google.cloud.batch.v1.AllocationPolicy.InstancePolicy;
import com.google.cloud.batch.v1.AllocationPolicy.InstancePolicyOrTemplate;
import com.google.cloud.batch.v1.BatchServiceClient;
import com.google.cloud.batch.v1.CreateJobRequest;
import com.google.cloud.batch.v1.Job;
import com.google.cloud.batch.v1.LogsPolicy;
import com.google.cloud.batch.v1.Runnable;
import com.google.cloud.batch.v1.Runnable.Script;
import com.google.cloud.batch.v1.TaskGroup;
import com.google.cloud.batch.v1.TaskSpec;
import com.google.protobuf.Duration;
import java.io.IOException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class CreateGpuJobN1 {

  public static void main(String[] args)
      throws IOException, ExecutionException, InterruptedException, TimeoutException {
    // TODO(developer): Replace these variables before running the sample.
    // Project ID or project number of the Google Cloud project you want to use.
    String projectId = "YOUR_PROJECT_ID";
    // Name of the region you want to use to run the job. Regions that are
    // available for Batch are listed on: https://cloud.google.com/batch/docs/get-started#locations
    String region = "europe-central2";
    // The name of the job that will be created.
    // It needs to be unique for each project and region pair.
    String jobName = "JOB_NAME";
    // Optional. When set to true, Batch fetches the drivers required for the GPU type
    // that you specify in the policy field from a third-party location,
    // and Batch installs them on your behalf. If you set this field to false (default),
    // you need to install GPU drivers manually to use any GPUs for this job.
    boolean installGpuDrivers = false;
    // The GPU type. You can view a list of the available GPU types
    // by using the `gcloud compute accelerator-types list` command.
    String gpuType = "nvidia-tesla-t4";
    // The number of GPUs of the specified type.
    int gpuCount = 2;

    createGpuJob(projectId, region, jobName, installGpuDrivers, gpuType, gpuCount);
  }

  // Create a job that uses GPUs
  public static Job createGpuJob(String projectId, String region, String jobName,
                                  boolean installGpuDrivers, String gpuType, int gpuCount)
      throws IOException, ExecutionException, InterruptedException, TimeoutException {
    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests.
    try (BatchServiceClient batchServiceClient = BatchServiceClient.create()) {
      // Define what will be done as part of the job.
      Runnable runnable =
          Runnable.newBuilder()
              .setScript(
                  Script.newBuilder()
                      .setText(
                          "echo Hello world! This is task ${BATCH_TASK_INDEX}. "
                                  + "This job has a total of ${BATCH_TASK_COUNT} tasks.")
                      // You can also run a script from a file. The script must already exist
                      // on the VM that runs the job. setText() and setPath() are
                      // mutually exclusive.
                      // .setPath("/tmp/test.sh")
                      .build())
              .build();

      // The task spec defines what each task runs; the task group below runs it as 3 tasks.
      TaskSpec task = TaskSpec.newBuilder()
          .addRunnables(runnable)
          .setMaxRetryCount(2)
          .setMaxRunDuration(Duration.newBuilder().setSeconds(3600).build())
          .build();

      // Tasks are grouped inside a job using TaskGroups.
      // Currently, it's possible to have only one task group.
      TaskGroup taskGroup = TaskGroup.newBuilder()
          .setTaskCount(3)
          .setParallelism(1)
          .setTaskSpec(task)
          .build();

      // Accelerator describes Compute Engine accelerators to be attached to the VM.
      Accelerator accelerator = Accelerator.newBuilder()
          .setType(gpuType)
          .setCount(gpuCount)
          .build();

      // Policies are used to define on what kind of virtual machines the tasks will run on.
      // No machine type is set, so Batch selects an N1 machine type that fits the GPUs.
      AllocationPolicy allocationPolicy =
          AllocationPolicy.newBuilder()
              .addInstances(
                  InstancePolicyOrTemplate.newBuilder()
                      .setInstallGpuDrivers(installGpuDrivers)
                      .setPolicy(InstancePolicy.newBuilder().addAccelerators(accelerator))
                      .build())
              .build();

      Job job =
          Job.newBuilder()
              .addTaskGroups(taskGroup)
              .setAllocationPolicy(allocationPolicy)
              .putLabels("env", "testing")
              .putLabels("type", "script")
              // We use Cloud Logging as it's an out of the box available option.
              .setLogsPolicy(
                  LogsPolicy.newBuilder().setDestination(LogsPolicy.Destination.CLOUD_LOGGING))
              .build();

      CreateJobRequest createJobRequest =
          CreateJobRequest.newBuilder()
              // The job's parent is the region in which the job will run.
              .setParent(String.format("projects/%s/locations/%s", projectId, region))
              .setJob(job)
              .setJobId(jobName)
              .build();

      Job result =
          batchServiceClient
              .createJobCallable()
              .futureCall(createJobRequest)
              .get(5, TimeUnit.MINUTES);

      System.out.printf("Successfully created the job: %s%n", result.getName());

      return result;
    }
  }
}
```
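When `installGpuDrivers` is false and the VM has no preinstalled drivers, a task can start but still be unable to use its GPUs. The following minimal sketch shows a check that a task could run before its GPU work. It assumes that the NVIDIA driver provides `nvidia-smi` on `PATH`, which is typical but not stated on this page, and it uses that tool's `--query-gpu` and `--format` options. The function names are hypothetical.

```python
import shutil
import subprocess


def parse_gpu_csv(output: str):
    """Parse `nvidia-smi --query-gpu=name,driver_version --format=csv,noheader` output."""
    gpus = []
    for line in output.strip().splitlines():
        # Each line is "<name>, <driver_version>"; split on the first comma only.
        name, driver = (field.strip() for field in line.split(",", 1))
        gpus.append({"name": name, "driver_version": driver})
    return gpus


def visible_gpus():
    """Return the GPUs that the NVIDIA driver reports, or [] if nvidia-smi is missing."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return []  # No driver tooling on PATH: drivers were likely not installed.
    # check=True raises CalledProcessError if the driver is present but broken.
    result = subprocess.run(
        [exe, "--query-gpu=name,driver_version", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_csv(result.stdout)


if __name__ == "__main__":
    gpus = visible_gpus()
    if not gpus:
        raise SystemExit("No GPUs visible; set installGpuDrivers or install drivers manually.")
    for gpu in gpus:
        print(f"{gpu['name']} (driver {gpu['driver_version']})")
```

Exiting with a nonzero status makes the task fail visibly in Batch, instead of letting the task silently fall back to running on the CPU.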

Node.js

```javascript
// Imports the Batch library
const batchLib = require('@google-cloud/batch');
const batch = batchLib.protos.google.cloud.batch.v1;

// Instantiates a client
const batchClient = new batchLib.v1.BatchServiceClient();

async function callCreateBatchGPUJobN1() {
  /**
   * TODO(developer): Update these variables before running the sample.
   */
  // Project ID or project number of the Google Cloud project you want to use.
  // getProjectId() is asynchronous, so it is awaited inside this async function.
  const projectId = await batchClient.getProjectId();
  // Name of the region you want to use to run the job. Regions that are
  // available for Batch are listed on: https://cloud.google.com/batch/docs/get-started#locations
  const region = 'europe-central2';
  // The name of the job that will be created.
  // It needs to be unique for each project and region pair.
  const jobName = 'batch-gpu-job-n1';
  // The GPU type. You can view a list of the available GPU types
  // by using the `gcloud compute accelerator-types list` command.
  const gpuType = 'nvidia-tesla-t4';
  // The number of GPUs of the specified type.
  const gpuCount = 1;
  // Optional. When set to true, Batch fetches the drivers required for the GPU type
  // that you specify in the policy field from a third-party location,
  // and Batch installs them on your behalf. If you set this field to false (default),
  // you need to install GPU drivers manually to use any GPUs for this job.
  const installGpuDrivers = false;
  // The N1 machine type that the GPUs are attached to. Read more about
  // machine types here: https://cloud.google.com/compute/docs/machine-types
  const machineType = 'n1-standard-16';

  // Define what will be done as part of the job.
  const runnable = new batch.Runnable({
    script: new batch.Runnable.Script({
      // A script takes either `text` or `path`. Single quotes keep
      // ${BATCH_TASK_INDEX} literal so the shell on the VM expands it.
      text: 'echo Hello world! This is task ${BATCH_TASK_INDEX}.',
    }),
  });

  const task = new batch.TaskSpec({
    runnables: [runnable],
    maxRetryCount: 2,
    maxRunDuration: {seconds: 3600},
  });

  // Tasks are grouped inside a job using TaskGroups.
  const group = new batch.TaskGroup({
    taskCount: 3,
    taskSpec: task,
  });

  // Policies are used to define on what kind of virtual machines the tasks will run on.
  // In this case, we tell the system to use the N1 machine type defined above.
  const instancePolicy = new batch.AllocationPolicy.InstancePolicy({
    machineType,
    // Accelerator describes Compute Engine accelerators to be attached to the VM
    accelerators: [
      new batch.AllocationPolicy.Accelerator({
        type: gpuType,
        count: gpuCount,
      }),
    ],
  });

  // The allocation policy wraps the instance policy; installGpuDrivers is set here.
  const allocationPolicy = new batch.AllocationPolicy({
    instances: [{installGpuDrivers, policy: instancePolicy}],
  });

  const job = new batch.Job({
    name: jobName,
    taskGroups: [group],
    labels: {env: 'testing', type: 'script'},
    allocationPolicy,
    // We use Cloud Logging as it's an option available out of the box
    logsPolicy: new batch.LogsPolicy({
      destination: batch.LogsPolicy.Destination.CLOUD_LOGGING,
    }),
  });
  // The job's parent is the project and region in which the job will run
  const parent = `projects/${projectId}/locations/${region}`;

  // Construct request
  const request = {
    parent,
    jobId: jobName,
    job,
  };

  // Run request
  const [response] = await batchClient.createJob(request);
  console.log(JSON.stringify(response));
}

callCreateBatchGPUJobN1().catch(err => {
  console.error(err);
  process.exitCode = 1;
});
```
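The samples hard-code a region and a GPU type, but not every zone offers every GPU type. The following minimal sketch runs the `gcloud compute accelerator-types list` command that the code comments mention and groups the results by GPU type, so that you can pick a matching region or allowedLocations entry. It makes several assumptions: that gcloud is installed and authenticated, that `--format=json` returns records with `name` and `zone` fields, and that `zone` may be either a full resource URL or a bare zone name. The function names are hypothetical.

```python
import json
import subprocess
from collections import defaultdict


def zones_by_gpu_type(accelerator_types):
    """Group accelerator-type records into {gpu_type: sorted list of zone names}."""
    zones = defaultdict(set)
    for item in accelerator_types:
        # ASSUMPTION: "zone" may be a full resource URL; keep only the last segment.
        zones[item["name"]].add(item["zone"].rsplit("/", 1)[-1])
    return {name: sorted(found) for name, found in zones.items()}


def list_accelerator_types():
    """Run gcloud and decode its JSON output (requires an authenticated gcloud)."""
    out = subprocess.run(
        ["gcloud", "compute", "accelerator-types", "list", "--format=json"],
        capture_output=True, text=True, check=True,
    ).stdout
    return json.loads(out)


if __name__ == "__main__":
    available = zones_by_gpu_type(list_accelerator_types())
    # Zones that offer the GPU type used in the samples above.
    print(available.get("nvidia-tesla-t4", []))
```

Filtering on the client side avoids depending on gcloud's `--filter` expression syntax. The trade-off is that the full list is downloaded every time.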

Python

```python
from google.cloud import batch_v1


def create_gpu_job(
    project_id: str, region: str, zone: str, job_name: str
) -> batch_v1.Job:
    """
    This method shows how to create a sample Batch job that runs
    a simple command on Compute Engine VMs with GPUs attached.

    Args:
        project_id: project ID or project number of the Cloud project you want to use.
        region: name of the region you want to use to run the job. Regions that are
            available for Batch are listed on: https://cloud.google.com/batch/docs/get-started#locations
        zone: name of a zone in `region` where the job's VMs may run. GPU availability differs
            by zone; see https://cloud.google.com/compute/docs/gpus/gpu-regions-zones
        job_name: the name of the job that will be created.
            It needs to be unique for each project and region pair.

    Returns:
        A job object representing the job created.
    """
    client = batch_v1.BatchServiceClient()

    # Define what will be done as part of the job.
    task = batch_v1.TaskSpec()
    runnable = batch_v1.Runnable()
    runnable.script = batch_v1.Runnable.Script()
    runnable.script.text = "echo Hello world! This is task ${BATCH_TASK_INDEX}. This job has a total of ${BATCH_TASK_COUNT} tasks."
    # You can also run a script from a file. The script must already exist on the VM
    # that runs the job. runnable.script.text and runnable.script.path are mutually
    # exclusive.
    # runnable.script.path = '/tmp/test.sh'
    task.runnables = [runnable]

    # We can specify what resources are requested by each task.
    resources = batch_v1.ComputeResource()
    resources.cpu_milli = 2000  # in milliCPU; 2000 means the task requires 2 whole vCPUs.
    resources.memory_mib = 16  # in MiB
    task.compute_resource = resources

    task.max_retry_count = 2
    task.max_run_duration = "3600s"

    # Tasks are grouped inside a job using TaskGroups.
    # Currently, it's possible to have only one task group.
    group = batch_v1.TaskGroup()
    group.task_count = 4
    group.task_spec = task

    # Policies are used to define on what kind of virtual machines the tasks will run on.
    # Read more about machine types here: https://cloud.google.com/compute/docs/machine-types
    policy = batch_v1.AllocationPolicy.InstancePolicy()
    policy.machine_type = "n1-standard-16"

    accelerator = batch_v1.AllocationPolicy.Accelerator()
    # Note: not every accelerator is compatible with every machine type.
    # Read more here: https://cloud.google.com/compute/docs/gpus#t4-gpus
    accelerator.type_ = "nvidia-tesla-t4"
    accelerator.count = 1

    policy.accelerators = [accelerator]
    instances = batch_v1.AllocationPolicy.InstancePolicyOrTemplate()
    instances.policy = policy
    instances.install_gpu_drivers = True
    allocation_policy = batch_v1.AllocationPolicy()
    allocation_policy.instances = [instances]

    # Restrict the job's VMs to the requested zone, which must offer the GPU type.
    location = batch_v1.AllocationPolicy.LocationPolicy()
    location.allowed_locations = [f"zones/{zone}"]
    allocation_policy.location = location

    job = batch_v1.Job()
    job.task_groups = [group]
    job.allocation_policy = allocation_policy
    job.labels = {"env": "testing", "type": "script"}
    # We use Cloud Logging as it's an out of the box available option
    job.logs_policy = batch_v1.LogsPolicy()
    job.logs_policy.destination = batch_v1.LogsPolicy.Destination.CLOUD_LOGGING

    create_request = batch_v1.CreateJobRequest()
    create_request.job = job
    create_request.job_id = job_name
    # The job's parent is the region in which the job will run
    create_request.parent = f"projects/{project_id}/locations/{region}"

    return client.create_job(create_request)
```
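The runnables in these samples only echo `BATCH_TASK_INDEX` and `BATCH_TASK_COUNT`, which Batch sets for each task. In a real job, each task typically uses those variables to select its own share of the work. The following minimal sketch splits a list of inputs round-robin across tasks. The input file names and the function names are hypothetical, and the sketch assumes that the script and its inputs are already available on the VM.

```python
import os


def shard(items, index, count):
    """Return the items assigned to task `index` out of `count` tasks (round-robin)."""
    if not 0 <= index < count:
        raise ValueError(f"task index {index} out of range for {count} tasks")
    return items[index::count]


def current_task_shard(items, env=None):
    """Pick this task's share by using the variables that Batch sets for each task."""
    env = os.environ if env is None else env
    # Defaults let the script also run locally as a single task.
    index = int(env.get("BATCH_TASK_INDEX", "0"))
    count = int(env.get("BATCH_TASK_COUNT", "1"))
    return shard(items, index, count)


if __name__ == "__main__":
    inputs = [f"input-{i:02d}.dat" for i in range(10)]  # hypothetical input files
    for name in current_task_shard(inputs):
        print("processing", name)
```

With `task_count = 4` as in the Python sample, task 1 would process inputs 1, 5, and 9. A round-robin split keeps shard sizes within one item of each other, even when the number of inputs is not a multiple of the task count.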

What's next

- If you have problems creating or running a job, see Troubleshooting.
- View jobs and tasks.
- Learn more about job creation options.
