This document explains how to create virtual machine (VM) instances in bulk that use A4X accelerator-optimized machine types. To learn more about the machine type, see A4X series in the Compute Engine documentation.
To learn about VM and cluster creation options, see Deployment options overview page.
Before you begin
Before creating VMs in bulk, if you haven't already done so, complete the following steps:
- Choose a consumption option: the option that you pick determines how you get and use GPU resources. To learn more, see Choose a consumption option.
- Obtain capacity: secure capacity for your chosen consumption option. To learn more, see Capacity overview.
Select the tab for how you plan to use the samples on this page:
Console
When you use the Google Cloud console to access Google Cloud services and APIs, you don't need to set up authentication.
gcloud
In the Google Cloud console, activate Cloud Shell.
At the bottom of the Google Cloud console, a Cloud Shell session starts and displays a command-line prompt. Cloud Shell is a shell environment with the Google Cloud CLI already installed and with values already set for your current project. It can take a few seconds for the session to initialize.
REST
To use the REST API samples on this page in a local development environment, you use the credentials you provide to the gcloud CLI.
Install the Google Cloud CLI. After installation, initialize the Google Cloud CLI by running the following command:
gcloud init
If you're using an external identity provider (IdP), you must first sign in to the gcloud CLI with your federated identity.
For more information, see Authenticate for using REST in the Google Cloud authentication documentation.
Required roles
To get the permissions that you need to create VMs in bulk, ask your administrator to grant you the Compute Instance Admin (v1) (roles/compute.instanceAdmin.v1) IAM role on the project.
For more information about granting roles, see Manage access to projects, folders, and organizations.
This predefined role contains the permissions required to create VMs in bulk. To see the exact permissions that are required, expand the Required permissions section:
Required permissions
The following permissions are required to create VMs in bulk:
- compute.instances.create on the project
- To use a custom image to create the VM: compute.images.useReadOnly on the image
- To use a snapshot to create the VM: compute.snapshots.useReadOnly on the snapshot
- To use an instance template to create the VM: compute.instanceTemplates.useReadOnly on the instance template
- To assign a legacy network to the VM: compute.networks.use on the project
- To specify a static IP address for the VM: compute.addresses.use on the project
- To assign an external IP address to the VM when using a legacy network: compute.networks.useExternalIp on the project
- To specify a subnet for your VM: compute.subnetworks.use on the project or on the chosen subnet
- To assign an external IP address to the VM when using a VPC network: compute.subnetworks.useExternalIp on the project or on the chosen subnet
- To set VM instance metadata for the VM: compute.instances.setMetadata on the project
- To set tags for the VM: compute.instances.setTags on the VM
- To set labels for the VM: compute.instances.setLabels on the VM
- To set a service account for the VM to use: compute.instances.setServiceAccount on the VM
- To create a new disk for the VM: compute.disks.create on the project
- To attach an existing disk in read-only or read-write mode: compute.disks.use on the disk
- To attach an existing disk in read-only mode: compute.disks.useReadOnly on the disk
You might also be able to get these permissions with custom roles or other predefined roles.
A4X fundamentals
An A4X cluster is organized into a hierarchy of blocks and subblocks to facilitate large-scale, non-blocking network performance. Understanding this topology is key when reserving capacity and deploying workloads.
- A4X instance: a single instance of an A4X machine type, which has 4 GPUs attached.
- NVLink domain or subblock: an NVLink domain, also referred to as a subblock, is the fundamental unit of A4X capacity. An NVLink domain consists of 18 A4X instances (72 GPUs) connected using a multi-node NVLink system. You create an A4X NVLink domain, or subblock, by applying a compact placement policy that specifies a 1x72 topology.
- Block: an A4X block is composed of 25 subblocks (NVLink domains), totaling up to 450 A4X instances (1,800 GPUs). The subblocks are rail-aligned for efficient scaling. Each subblock requires its own compact placement policy, so a single A4X block can use up to 25 compact placement policies.
The following table shows the supported topology options for A4X instances:
Topology (gpuTopology) | Number of GPUs | Number of instances
---|---|---
1x72 | 72 | 18
Overview
Creating instances in bulk with the A4X machine type includes the following steps:
1. Create VPC networks.
2. Create a compact placement policy.
3. Create A4X instances in bulk.
Create VPC networks
To set up networking for the A4X machine type, create three VPC networks for the following network interfaces:
- Two VPC networks for the gVNIC network interfaces (NICs). These are used for host-to-host communication.
- One VPC network with the RDMA network profile, which is required for the CX-7 NICs when you create multiple A4X subblocks. This VPC network needs four subnets, one for each CX-7 NIC. These NICs use RDMA over Converged Ethernet (RoCE), providing the high-bandwidth, low-latency communication that's essential for scaling out to multiple A4X subblocks. For a single A4X subblock, you can skip this VPC network because, within a single subblock, direct GPU-to-GPU communication is handled by the multi-node NVLink.
For more information about NIC arrangement, see Review network bandwidth and NIC arrangement.
Create the networks either manually by following the instruction guides or automatically by using the provided script.
Instruction guides
To create the networks, you can use the following instructions:
- To create the VPC networks for the gVNICs, see Create and manage Virtual Private Cloud networks.
- To create the VPC network with the RDMA network profile, see Create a Virtual Private Cloud network for RDMA NICs.
For these VPC networks, we recommend setting the maximum transmission unit (MTU) to a larger value. For the A4X machine type, the recommended MTU is 8896 bytes. To review the recommended MTU settings for other GPU machine types, see MTU settings for GPU machine types.
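If you create the gVNIC networks manually by following the guides above, you can set the MTU at network creation time. The following sketch shows the relevant flag; the network name my-gvnic-net-0 is illustrative, not required:

```shell
# Create a custom-mode VPC network with the recommended MTU for A4X.
# "my-gvnic-net-0" is an example name; substitute your own naming scheme.
gcloud compute networks create my-gvnic-net-0 \
    --subnet-mode=custom \
    --mtu=8896
```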
Script
To create the networks, follow these steps.
For these VPC networks, we recommend setting the maximum transmission unit (MTU) to a larger value. For the A4X machine type, the recommended MTU is 8896 bytes. To review the recommended MTU settings for other GPU machine types, see MTU settings for GPU machine types.
Use the following script to create VPC networks for the gVNICs.
#!/bin/bash
# Create standard VPCs (network and subnets) for the gVNICs
for N in $(seq 0 1); do
  gcloud compute networks create GVNIC_NAME_PREFIX-net-$N \
    --subnet-mode=custom \
    --mtu=8896
  gcloud compute networks subnets create GVNIC_NAME_PREFIX-sub-$N \
    --network=GVNIC_NAME_PREFIX-net-$N \
    --region=REGION \
    --range=192.168.$N.0/24
  gcloud compute firewall-rules create GVNIC_NAME_PREFIX-internal-$N \
    --network=GVNIC_NAME_PREFIX-net-$N \
    --action=ALLOW \
    --rules=tcp:0-65535,udp:0-65535,icmp \
    --source-ranges=192.168.0.0/16
done

# Create SSH firewall rules
gcloud compute firewall-rules create GVNIC_NAME_PREFIX-ssh \
  --network=GVNIC_NAME_PREFIX-net-0 \
  --action=ALLOW \
  --rules=tcp:22 \
  --source-ranges=IP_RANGE

# Assumes that an external IP is only created for vNIC 0
gcloud compute firewall-rules create GVNIC_NAME_PREFIX-allow-ping-net-0 \
  --network=GVNIC_NAME_PREFIX-net-0 \
  --action=ALLOW \
  --rules=icmp \
  --source-ranges=IP_RANGE
If you require multiple A4X subblocks, use the following script to create the VPC network and subnets for the four CX-7 NICs on each A4X instance.
# List and make sure network profiles exist in the machine type's zone
gcloud compute network-profiles list --filter "location.name=ZONE"

# Create network for CX-7
gcloud compute networks create RDMA_NAME_PREFIX-mrdma \
  --network-profile=ZONE-vpc-roce \
  --subnet-mode=custom \
  --mtu=8896

# Create subnets in parallel
for N in $(seq 0 3); do
  gcloud compute networks subnets create RDMA_NAME_PREFIX-mrdma-sub-$N \
    --network=RDMA_NAME_PREFIX-mrdma \
    --region=REGION \
    --range=192.168.$((N+2)).0/24 & # offset to avoid overlap with gVNICs
done
wait
Replace the following:
- GVNIC_NAME_PREFIX: the custom name prefix to use for the standard VPC networks and subnets for the gVNICs.
- RDMA_NAME_PREFIX: the custom name prefix to use for the VPC network and subnets with the RDMA network profile for the CX-7 NICs.
- ZONE: a zone in which the machine type that you want to use is available, such as us-central1-a. For information about regions, see GPU availability by regions and zones.
- REGION: the region where you want to create the subnets. This region must correspond to the specified zone. For example, if your zone is us-central1-a, then your region is us-central1.
- IP_RANGE: the IP range to use for the SSH firewall rules.
- Optional: To verify that the VPC network resources were created successfully, check the network settings in the Google Cloud console:
- In the Google Cloud console, go to the VPC networks page.
- Search the list for the networks that you created in the previous step.
- To view the subnets, firewall rules, and other network settings, click the name of the network.
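Alternatively, you can verify the networks from the command line. A quick check, assuming the name prefixes used in the scripts above, might look like the following:

```shell
# List the networks you created, along with their MTU
gcloud compute networks list \
    --filter="name~'^GVNIC_NAME_PREFIX-net|^RDMA_NAME_PREFIX-mrdma'" \
    --format="table(name,mtu)"

# List the subnets in the region, with their CIDR ranges
gcloud compute networks subnets list \
    --filter="region:REGION" \
    --format="table(name,network,ipCidrRange)"
```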
Create a compact placement policy
To create a compact placement policy, select one of the following options:
gcloud
To create a compact placement policy, use the gcloud beta compute resource-policies create group-placement command:
gcloud beta compute resource-policies create group-placement POLICY_NAME \
    --collocation=collocated \
    --gpu-topology=1x72 \
    --region=REGION
Replace the following:
- POLICY_NAME: the name of the compact placement policy.
- REGION: the region where you want to create the compact placement policy. Specify a region in which the machine type that you want to use is available. For information about regions, see GPU availability by regions and zones.
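To confirm that the policy was created with the expected topology, you can describe it; the fields shown in the output format are indicative:

```shell
# Inspect the placement policy; the output includes the
# groupPlacementPolicy section with collocation and gpuTopology.
gcloud beta compute resource-policies describe POLICY_NAME \
    --region=REGION
```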
REST
To create a compact placement policy, make a POST request to the beta resourcePolicies.insert method:
POST https://compute.googleapis.com/compute/beta/projects/PROJECT_ID/regions/REGION/resourcePolicies

{
  "name": "POLICY_NAME",
  "groupPlacementPolicy": {
    "collocation": "COLLOCATED",
    "gpuTopology": "1x72"
  }
}
Replace the following:
- PROJECT_ID: your project ID.
- POLICY_NAME: the name of the compact placement policy.
- REGION: the region where you want to create the compact placement policy. Specify a region in which the machine type that you want to use is available. For information about regions, see GPU availability by regions and zones.
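From a shell, one way to send this request is with curl, using the gcloud CLI to supply an access token. This is a sketch; replace the placeholders as described above:

```shell
# Send the resourcePolicies.insert request with an OAuth token
# obtained from the gcloud CLI.
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://compute.googleapis.com/compute/beta/projects/PROJECT_ID/regions/REGION/resourcePolicies" \
  -d '{
    "name": "POLICY_NAME",
    "groupPlacementPolicy": {
      "collocation": "COLLOCATED",
      "gpuTopology": "1x72"
    }
  }'
```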
Create A4X instances in bulk
To obtain a GPU topology of 1x72, create 18 instances. When you create the instances, apply the compact placement policy that specifies the gpuTopology field. Applying the policy ensures that Compute Engine creates all 18 instances in one subblock to use an NVLink domain. If a subblock lacks capacity for all 18 instances, then the bulk creation fails and doesn't create any instances.
If your workload can operate with fewer than 18 instances, then you can set the minCount field to the minimum number of instances required for your workload. If you want to use any available capacity, then set the minCount field to 1.
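For example, with the gcloud CLI, the bulk create command accepts a --min-count flag, which lets the request succeed even when fewer than --count instances can be placed:

```shell
# Sketch: request 18 instances but accept as few as 9 if capacity is short.
# All other required flags from the full bulk create command still apply.
gcloud compute instances bulk create \
    --name-pattern=vm-# \
    --count=18 \
    --min-count=9 \
    --machine-type=a4x-highgpu-4g \
    --region=REGION
```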
To create A4X instances in bulk, select one of the following options.
The following commands also set the access scope for your instances. To simplify permissions management, Google recommends that you set the access scope on an instance to cloud-platform access and then use IAM roles to define what services the instance can access. For more information, see Scopes best practice.
gcloud
To create VMs in bulk, use the gcloud compute instances bulk create command:
gcloud compute instances bulk create \
    --name-pattern=NAME_PATTERN \
    --count=COUNT \
    --machine-type=a4x-highgpu-4g \
    --image-family=IMAGE_FAMILY \
    --image-project=IMAGE_PROJECT \
    --region=REGION \
    --boot-disk-type=hyperdisk-balanced \
    --boot-disk-size=DISK_SIZE \
    --scopes=cloud-platform \
    --network-interface=nic-type=GVNIC,network=GVNIC_NAME_PREFIX-net-0,subnet=GVNIC_NAME_PREFIX-sub-0 \
    --network-interface=nic-type=GVNIC,network=GVNIC_NAME_PREFIX-net-1,subnet=GVNIC_NAME_PREFIX-sub-1,no-address \
    --network-interface=nic-type=MRDMA,network=RDMA_NAME_PREFIX-mrdma,subnet=RDMA_NAME_PREFIX-mrdma-sub-0,no-address \
    --network-interface=nic-type=MRDMA,network=RDMA_NAME_PREFIX-mrdma,subnet=RDMA_NAME_PREFIX-mrdma-sub-1,no-address \
    --network-interface=nic-type=MRDMA,network=RDMA_NAME_PREFIX-mrdma,subnet=RDMA_NAME_PREFIX-mrdma-sub-2,no-address \
    --network-interface=nic-type=MRDMA,network=RDMA_NAME_PREFIX-mrdma,subnet=RDMA_NAME_PREFIX-mrdma-sub-3,no-address \
    --reservation-affinity=specific \
    --reservation=RESERVATION \
    --provisioning-model=RESERVATION_BOUND \
    --instance-termination-action=TERMINATION_ACTION \
    --maintenance-policy=TERMINATE \
    --resource-policies=POLICY_NAME
Replace the following:
- NAME_PATTERN: the name pattern of the VMs. For example, using vm-# for the name pattern generates VMs with names such as vm-1 and vm-2, up to the number of VMs specified by --count.
- COUNT: the number of VMs to create.
- IMAGE_FAMILY: the image family of the OS image that you want to use. For a list of supported operating systems, see Supported operating systems.
- IMAGE_PROJECT: the project ID of the OS image.
- REGION: a region in which the machine type that you want to use is available. You must use the same region as the compact placement policy. For information about regions, see GPU availability by regions and zones.
- DISK_SIZE: the size of the boot disk in GB.
- GVNIC_NAME_PREFIX: the name prefix that you specified when creating the standard VPC networks and subnets that use gVNIC NICs.
- RDMA_NAME_PREFIX: the name prefix that you specified when creating the VPC network and subnets that use RDMA NICs.
- RESERVATION: the reservation name, a block, or a subblock within a reservation. To get the reservation name or the available blocks, see View reserved capacity. Based on your requirements for instance placement, choose one of the following:
  - To create the instances on any single block: projects/RESERVATION_OWNER_PROJECT_ID/reservations/RESERVATION_NAME
  - To create the instances on a specific block: projects/RESERVATION_OWNER_PROJECT_ID/reservations/RESERVATION_NAME/reservationBlocks/RESERVATION_BLOCK_NAME
  - To create the instances in a specific subblock: projects/RESERVATION_OWNER_PROJECT_ID/reservations/RESERVATION_NAME/reservationBlocks/RESERVATION_BLOCK_NAME/subblock/RESERVATION_SUBBLOCK_NAME
- TERMINATION_ACTION: whether Compute Engine stops (STOP) or deletes (DELETE) the VM at the end of the reservation period.
- POLICY_NAME: the name of the compact placement policy.
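After the operation completes, you can confirm that all of the instances were created and are running. The filter below assumes the example name pattern vm-#; adjust it to match your NAME_PATTERN:

```shell
# List the newly created instances with their zone and status.
gcloud compute instances list \
    --filter="name~'^vm-'" \
    --format="table(name,zone,status,machineType.basename())"
```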
REST
To create VMs in bulk, make a POST request to the instances.bulkInsert method:
POST https://compute.googleapis.com/compute/v1/projects/PROJECT_ID/zones/ZONE/instances/bulkInsert

{
  "namePattern": "NAME_PATTERN",
  "count": "COUNT",
  "instanceProperties": {
    "machineType": "a4x-highgpu-4g",
    "disks": [
      {
        "boot": true,
        "initializeParams": {
          "diskSizeGb": "DISK_SIZE",
          "diskType": "hyperdisk-balanced",
          "sourceImage": "projects/IMAGE_PROJECT/global/images/family/IMAGE_FAMILY"
        },
        "mode": "READ_WRITE",
        "type": "PERSISTENT"
      }
    ],
    "serviceAccounts": [
      {
        "email": "default",
        "scopes": ["https://www.googleapis.com/auth/cloud-platform"]
      }
    ],
    "networkInterfaces": [
      {
        "accessConfigs": [
          {
            "name": "external-nat",
            "type": "ONE_TO_ONE_NAT"
          }
        ],
        "network": "projects/NETWORK_PROJECT_ID/global/networks/GVNIC_NAME_PREFIX-net-0",
        "nicType": "GVNIC",
        "subnetwork": "projects/NETWORK_PROJECT_ID/regions/REGION/subnetworks/GVNIC_NAME_PREFIX-sub-0"
      },
      {
        "network": "projects/NETWORK_PROJECT_ID/global/networks/GVNIC_NAME_PREFIX-net-1",
        "nicType": "GVNIC",
        "subnetwork": "projects/NETWORK_PROJECT_ID/regions/REGION/subnetworks/GVNIC_NAME_PREFIX-sub-1"
      },
      {
        "network": "projects/NETWORK_PROJECT_ID/global/networks/RDMA_NAME_PREFIX-mrdma",
        "nicType": "MRDMA",
        "subnetwork": "projects/NETWORK_PROJECT_ID/regions/REGION/subnetworks/RDMA_NAME_PREFIX-mrdma-sub-0"
      },
      {
        "network": "projects/NETWORK_PROJECT_ID/global/networks/RDMA_NAME_PREFIX-mrdma",
        "nicType": "MRDMA",
        "subnetwork": "projects/NETWORK_PROJECT_ID/regions/REGION/subnetworks/RDMA_NAME_PREFIX-mrdma-sub-1"
      },
      {
        "network": "projects/NETWORK_PROJECT_ID/global/networks/RDMA_NAME_PREFIX-mrdma",
        "nicType": "MRDMA",
        "subnetwork": "projects/NETWORK_PROJECT_ID/regions/REGION/subnetworks/RDMA_NAME_PREFIX-mrdma-sub-2"
      },
      {
        "network": "projects/NETWORK_PROJECT_ID/global/networks/RDMA_NAME_PREFIX-mrdma",
        "nicType": "MRDMA",
        "subnetwork": "projects/NETWORK_PROJECT_ID/regions/REGION/subnetworks/RDMA_NAME_PREFIX-mrdma-sub-3"
      }
    ],
    "reservationAffinity": {
      "consumeReservationType": "SPECIFIC_RESERVATION",
      "key": "compute.googleapis.com/reservation-name",
      "values": ["RESERVATION"]
    },
    "scheduling": {
      "provisioningModel": "RESERVATION_BOUND",
      "instanceTerminationAction": "TERMINATION_ACTION",
      "onHostMaintenance": "TERMINATE",
      "automaticRestart": true
    },
    "resourcePolicies": [
      "projects/PROJECT_ID/regions/REGION/resourcePolicies/POLICY_NAME"
    ]
  }
}
Replace the following:
- PROJECT_ID: the project ID of the project where you want to create the VMs. This project must also contain the compact placement policy.
- ZONE: a zone in which the machine type that you want to use is available. You must use a zone in the same region as the compact placement policy. For information about regions, see GPU availability by regions and zones.
- NAME_PATTERN: the name pattern of the VMs. For example, using vm-# for the name pattern generates VMs with names such as vm-1 and vm-2, up to the number of VMs specified by the count field.
- COUNT: the number of VMs to create.
- DISK_SIZE: the size of the boot disk in GB.
- IMAGE_PROJECT: the project ID of the OS image.
- IMAGE_FAMILY: the image family of the OS image that you want to use. For a list of supported operating systems, see Supported operating systems.
- NETWORK_PROJECT_ID: the project ID of the network.
- GVNIC_NAME_PREFIX: the name prefix that you specified when creating the standard VPC networks and subnets that use gVNIC NICs.
- REGION: the region of the subnetworks and of the compact placement policy.
- RDMA_NAME_PREFIX: the name prefix that you specified when creating the VPC network and subnets that use RDMA NICs.
- RESERVATION: the reservation name, a block, or a subblock within a reservation. To get the reservation name or the available blocks, see View reserved capacity. Based on your requirements for instance placement, choose one of the following:
  - To create the instances on any single block: projects/RESERVATION_OWNER_PROJECT_ID/reservations/RESERVATION_NAME
  - To create the instances on a specific block: projects/RESERVATION_OWNER_PROJECT_ID/reservations/RESERVATION_NAME/reservationBlocks/RESERVATION_BLOCK_NAME
  - To create the instances in a specific subblock: projects/RESERVATION_OWNER_PROJECT_ID/reservations/RESERVATION_NAME/reservationBlocks/RESERVATION_BLOCK_NAME/subblock/RESERVATION_SUBBLOCK_NAME
- TERMINATION_ACTION: whether Compute Engine stops (STOP) or deletes (DELETE) the VM at the end of the reservation period.
- POLICY_NAME: the name of the compact placement policy.
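As with the placement policy, one way to send the bulkInsert request from a shell is with curl; here the request body is kept in a file (request.json is an assumed filename):

```shell
# Save the JSON request body shown above to request.json, then send it
# with an OAuth token obtained from the gcloud CLI.
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://compute.googleapis.com/compute/v1/projects/PROJECT_ID/zones/ZONE/instances/bulkInsert" \
  -d @request.json
```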
For more information about the configuration options when creating VMs in bulk, see Create VMs in bulk in the Compute Engine documentation.