This page shows you how to create your own Hypercompute Cluster with Google Kubernetes Engine (GKE) to support your AI and ML workloads, using A4 or A3 Ultra virtual machines (VMs).
GKE is the open, portable, extensible, and highly scalable platform for Hypercompute Cluster. GKE provides a single platform surface to run a diverse set of workloads for your organization's needs. This includes high-performance distributed pre-training, model fine-tuning, model inference, application serving, and supporting services. GKE reduces the operational burden of managing multiple platforms.
Choose how to create a cluster
The following options for cluster creation each provide varying degrees of ease and flexibility in cluster configuration and workload scheduling:
Create clusters with the default configuration for compute, storage, and networking resources, and with GPUDirect RDMA-over-Converged-Ethernet (RoCE) enabled:
- Use Cluster Toolkit to quickly create production-ready GKE clusters.
- Use Accelerated Processing Kit (XPK) to quickly create GKE clusters for proofs-of-concept and testing.
Alternatively, you can create your GKE cluster manually for precise customization or expansion of existing production GKE environments. To create a Hypercompute Cluster with GKE manually, see Create a custom Hypercompute Cluster with GKE.
Before you begin
Before you start, make sure you have performed the following tasks:
- Enable the Google Kubernetes Engine API.
- If you want to use the Google Cloud CLI for this task, install and then initialize the gcloud CLI. If you previously installed the gcloud CLI, get the latest version by running gcloud components update.
- Ensure that you have enough quota for A4 or A3 Ultra VMs. To request more quota, follow the instructions in GPU quota. To ensure that your cluster has capacity, you can follow the instructions to reserve capacity.
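If you want to check your current quotas from the command line before requesting more, one way is to describe your region and review the quotas reported in the output. This is a general-purpose check; the exact GPU quota metric names vary by machine family, so look for the entries relevant to A4 or A3 Ultra:
gcloud compute regions describe COMPUTE_REGION \
    --project=PROJECT_ID \
    --format="yaml(quotas)"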
Requirements
The following requirements apply to GKE Hypercompute Cluster:
Ensure you use the minimum GPU driver version, depending on the machine type:
- A4: The B200 GPUs in A4 VMs require a minimum of the 570 GPU driver version. By default, GKE automatically installs this driver version on all A4 nodes that run the minimum required GKE version for A4, 1.32.1-gke.1729000 or later.
- A3 Ultra: The H200 GPUs in A3 Ultra VMs require a minimum of the 550 GPU driver version, which is available in GKE 1.31 as the latest driver version. For A3 Ultra with GKE 1.31, you must set gpu-driver-version=latest. For GKE version 1.31.5-gke.1169000 or later, GKE automatically installs the 550 GPU driver version on A3 Ultra nodes by default.
For A3 Ultra node pools, you must set the disk type to hyperdisk-balanced.
To use GPUDirect RDMA, use the following minimum GKE versions depending on the machine type:
- A4: Use 1.32.2-gke.1475000 or later.
- A3 Ultra: Use 1.31.4-gke.1183000 or later.
To use GPUDirect RDMA, the GKE nodes must use a Container-Optimized OS node image. Ubuntu and Windows node images are not supported.
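For example, when creating an A3 Ultra node pool manually, the driver version and disk type requirements translate into flags similar to the following. This is a minimal sketch: the accelerator type name (nvidia-h200-141gb) and the machine type shown are assumptions for illustration, and a working node pool also needs the reservation and RDMA networking settings described in the manual creation guide.
gcloud container node-pools create NODE_POOL_NAME \
    --cluster=CLUSTER_NAME \
    --zone=COMPUTE_ZONE \
    --machine-type=a3-ultragpu-8g \
    --accelerator=type=nvidia-h200-141gb,count=8,gpu-driver-version=latest \
    --disk-type=hyperdisk-balanced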
Reserve capacity
To ensure that your workloads have the A4 or A3 Ultra VM resources required for these instructions, you can create a future reservation request. With this request, you reserve blocks of capacity for a defined duration in the future. At the requested start time, Compute Engine automatically provisions the blocks of capacity by creating on-demand reservations, which you can consume immediately by provisioning node pools for this cluster.
By default, GKE node pools link to the entire reserved capacity using just the name of your reservation (RESERVATION_NAME). Optionally, you can create GKE nodes on a specific block within your reservation.
Complete the following steps to request capacity and gather the information required to create nodes on a specific block within your reservation:
To get the names of the blocks that are available for your reservation, run the following command:
gcloud beta compute reservations blocks list RESERVATION_NAME \
    --zone=COMPUTE_ZONE --format "value(name)"
Replace the following:
- RESERVATION_NAME: the name of your reservation.
- COMPUTE_ZONE: the compute zone of your reservation.
The output has the following format: BLOCK_NAME. For example, the output might be similar to the following: example-res1-block-0001.
If you want to target specific blocks within a reservation when provisioning GKE node pools, you must specify the full reference to your block as follows:
For example, using the example output in the preceding step, the full path is as follows:
example-res1/reservationBlocks/example-res1-block-0001
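When you create node pools manually, you can pass this full block reference to the reservation flags. The following is a hedged sketch that shows only the reservation-related flags; the other flags required for A4 or A3 Ultra node pools are omitted:
gcloud container node-pools create NODE_POOL_NAME \
    --cluster=CLUSTER_NAME \
    --zone=COMPUTE_ZONE \
    --reservation-affinity=specific \
    --reservation=RESERVATION_NAME/reservationBlocks/BLOCK_NAME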
Create a cluster
Use the following instructions to create a cluster either using Cluster Toolkit or XPK.
Create a cluster using Cluster Toolkit
This section guides you through the cluster creation process, ensuring that your project follows best practices and meets the requirements for GKE Hypercompute Cluster.
A4
- Launch Cloud Shell. You can use a different environment; however, we recommend Cloud Shell because the dependencies for Cluster Toolkit are preinstalled. If you don't want to use Cloud Shell, follow the instructions to install dependencies to prepare a different environment.
Clone the Cluster Toolkit from the git repository:
cd ~
git clone https://github.com/GoogleCloudPlatform/cluster-toolkit.git
Install the Cluster Toolkit:
cd cluster-toolkit && git checkout main && make
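You can optionally confirm that the build succeeded by printing the gcluster help from the cluster-toolkit directory:
./gcluster --help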
Create a Cloud Storage bucket to store the state of the Terraform deployment:
gcloud storage buckets create gs://BUCKET_NAME \
    --default-storage-class=STANDARD \
    --location=COMPUTE_REGION \
    --uniform-bucket-level-access
gcloud storage buckets update gs://BUCKET_NAME --versioning
Replace the following variables:
- BUCKET_NAME: the name of the new Cloud Storage bucket.
- COMPUTE_REGION: the compute region where you want to store the state of the Terraform deployment.
In the examples/gke-a4-highgpu/gke-a4-highgpu-deployment.yaml file, replace the following variables in the terraform_backend_defaults and vars sections to match the specific values for your deployment (a sketch of these two sections follows this list):
- BUCKET_NAME: the name of the Cloud Storage bucket you created in the previous step.
- PROJECT_ID: your Google Cloud project ID.
- COMPUTE_REGION: the compute region for the cluster.
- COMPUTE_ZONE: the compute zone for the node pool of A4 machines.
- IP_ADDRESS/SUFFIX: the IP address range that you want to allow to connect to the cluster. This CIDR block must include the IP address of the machine that calls Terraform.
- RESERVATION_NAME: the name of your reservation.
- BLOCK_NAME: the name of a specific block within the reservation.
- NODE_COUNT: the number of A4 nodes in your cluster.
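The following is a rough sketch of how those two sections might look after you substitute your values. The exact key names under vars can differ between Cluster Toolkit releases, so treat this as an illustration and keep the keys that are already present in the file:
terraform_backend_defaults:
  type: gcs
  configuration:
    bucket: BUCKET_NAME                # bucket created in the earlier step
vars:
  project_id: PROJECT_ID
  region: COMPUTE_REGION
  zone: COMPUTE_ZONE
  authorized_cidr: IP_ADDRESS/SUFFIX   # key name is illustrative
  extended_reservation: RESERVATION_NAME/reservationBlocks/BLOCK_NAME   # key name is illustrative
  static_node_count: NODE_COUNT        # key name is illustrative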
To modify advanced settings, edit examples/gke-a4-highgpu/gke-a4-highgpu.yaml.
Generate Application Default Credentials (ADC) to provide access to Terraform.
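For example, you can generate ADC with the gcloud CLI, which Terraform picks up automatically:
gcloud auth application-default login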
Deploy the blueprint to provision the GKE infrastructure using A4 machine types:
cd ~/cluster-toolkit
./gcluster deploy -d \
    examples/gke-a4-highgpu/gke-a4-highgpu-deployment.yaml \
    examples/gke-a4-highgpu/gke-a4-highgpu.yaml
A3 Ultra
- Launch Cloud Shell. You can use a different environment; however, we recommend Cloud Shell because the dependencies for Cluster Toolkit are preinstalled. If you don't want to use Cloud Shell, follow the instructions to install dependencies to prepare a different environment.
Clone the Cluster Toolkit from the git repository:
cd ~
git clone https://github.com/GoogleCloudPlatform/cluster-toolkit.git
Install the Cluster Toolkit:
cd cluster-toolkit && git checkout main && make
Create a Cloud Storage bucket to store the state of the Terraform deployment:
gcloud storage buckets create gs://BUCKET_NAME \
    --default-storage-class=STANDARD \
    --location=COMPUTE_REGION \
    --uniform-bucket-level-access
gcloud storage buckets update gs://BUCKET_NAME --versioning
Replace the following variables:
- BUCKET_NAME: the name of the new Cloud Storage bucket.
- COMPUTE_REGION: the compute region where you want to store the state of the Terraform deployment.
In the examples/gke-a3-ultragpu/gke-a3-ultragpu-deployment.yaml file, replace the following variables in the terraform_backend_defaults and vars sections to match the specific values for your deployment:
- BUCKET_NAME: the name of the Cloud Storage bucket you created in the previous step.
- PROJECT_ID: your Google Cloud project ID.
- COMPUTE_REGION: the compute region for the cluster.
- COMPUTE_ZONE: the compute zone for the node pool of A3 Ultra machines.
- IP_ADDRESS/SUFFIX: the IP address range that you want to allow to connect to the cluster. This CIDR block must include the IP address of the machine that calls Terraform.
- RESERVATION_NAME: the name of your reservation.
- BLOCK_NAME: the name of a specific block within the reservation.
- NODE_COUNT: the number of A3 Ultra nodes in your cluster.
To modify advanced settings, edit examples/gke-a3-ultragpu/gke-a3-ultragpu.yaml.
Generate Application Default Credentials (ADC) to provide access to Terraform.
Deploy the blueprint to provision the GKE infrastructure using A3 Ultra machine types:
cd ~/cluster-toolkit
./gcluster deploy -d \
    examples/gke-a3-ultragpu/gke-a3-ultragpu-deployment.yaml \
    examples/gke-a3-ultragpu/gke-a3-ultragpu.yaml
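After the deployment finishes (for either the A4 or the A3 Ultra blueprint), a quick way to confirm that the GPU nodes registered with the cluster is to fetch cluster credentials, as described in the NCCL test section later on this page, and then list the nodes that carry the GKE accelerator label:
kubectl get nodes -l cloud.google.com/gke-accelerator -o wide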
Create a cluster and run workloads using XPK
Accelerated Processing Kit (XPK) lets you quickly provision and use clusters. XPK generates preconfigured, training-optimized infrastructure, which is ideal when workload execution is your primary focus.
- Install the required tools to meet the XPK prerequisites.
Open a shell window on a Linux machine, and enter the following commands to clone XPK from the Git repository and install the required packages:
## Setup virtual environment.
VENV_DIR=~/venvp3
python3 -m venv $VENV_DIR
source $VENV_DIR/bin/activate

## Clone the repository.
git clone --branch v0.6.0 https://github.com/google/xpk.git
cd xpk

## Install required packages
make install && export PATH=$PATH:$PWD/bin
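You can optionally verify the installation by printing the XPK help from the repository directory:
python3 xpk.py --help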
Create a Standard cluster using A4 or A3 Ultra. You can provision the cluster's nodes with reserved capacity, on-demand standard VMs, or Spot VMs:
Provision nodes using reserved capacity:
python3 xpk.py cluster create \
    --cluster=CLUSTER_NAME \
    --device-type=DEVICE_TYPE \
    --zone=COMPUTE_ZONE \
    --project=PROJECT_ID \
    --num-nodes=NUM_NODES \
    --reservation=RESERVATION_NAME
Provision nodes using on-demand, standard VMs or Spot VMs capacity:
python3 xpk.py cluster create \
    --cluster=CLUSTER_NAME \
    --device-type=DEVICE_TYPE \
    --zone=COMPUTE_ZONE \
    --project=PROJECT_ID \
    --num-nodes=NUM_NODES \
    --PROVISIONING_MODE
For these commands, replace the following variables:
- CLUSTER_NAME: a name for the cluster.
- DEVICE_TYPE: the machine type, either b200-8 for A4 or h200-141gb-8 for A3 Ultra.
- COMPUTE_ZONE: the compute zone for the node pool of A4 or A3 Ultra machines. To use reserved capacity, ensure that you use the zone where you reserved the capacity. We also generally recommend choosing a zone close to your users to minimize latency.
- PROJECT_ID: your Google Cloud project ID.
- NUM_NODES: the number of worker nodes in the node pool.
- RESERVATION_NAME: the name of your reservation.
- PROVISIONING_MODE: on-demand for standard VMs, or spot for Spot VMs.
XPK offers additional arguments for cluster creation, including arguments for creating private clusters, creating Vertex AI TensorBoards, and using node auto-provisioning. For more information, refer to the cluster creation guide for XPK.
Verify that the cluster was created successfully:
python3 xpk.py cluster list --zone=COMPUTE_ZONE --project=PROJECT_ID
Optional: Run a workload to test the cluster environment:
python3 xpk.py workload create \
    --workload WORKLOAD_NAME --command "echo goodbye" \
    --cluster CLUSTER_NAME \
    --device-type=DEVICE_TYPE \
    --num-nodes=WORKLOAD_NUM_NODES
Replace the following variables:
- WORKLOAD_NAME: the name of your workload.
- CLUSTER_NAME: the name of the cluster.
- DEVICE_TYPE: the machine type you selected when creating the cluster, either b200-8 for A4 or h200-141gb-8 for A3 Ultra.
- WORKLOAD_NUM_NODES: the number of worker nodes used for workload execution.
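To check the status of the workload, XPK also provides a workload list command. The flags shown here mirror the cluster commands above and may vary by XPK version, so check the XPK documentation if they differ:
python3 xpk.py workload list \
    --cluster=CLUSTER_NAME \
    --zone=COMPUTE_ZONE \
    --project=PROJECT_ID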
Deploy and run NCCL test with Topology Aware Scheduling (TAS)
To validate the functionality of the provisioned cluster, you can run an NCCL test. To run an NCCL test with Topology Aware Scheduling, complete the following steps.
Connect to your cluster:
gcloud container clusters get-credentials CLUSTER_NAME
Replace CLUSTER_NAME with the name of your cluster. For clusters created with Cluster Toolkit, the cluster name is gke-a4-high or gke-a3-ultra, depending on your selected machine type.
Deploy an all-gather NCCL performance test with Topology Aware Scheduling enabled by using the nccl-jobset-example.yaml file.
By default, this test uses four nodes. To change the number of nodes, modify the YAML file to change the following values from 4 to your required number of nodes (see the sketch after this list for where these values typically appear):
- parallelism
- completions
- N_NODES
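For orientation, the following hedged sketch shows roughly where these values usually live in a JobSet manifest. It is not a verbatim excerpt of nccl-jobset-example.yaml, so match the field names and container name against the actual file:
apiVersion: jobset.x-k8s.io/v1alpha2
kind: JobSet
metadata:
  generateName: all-gather
spec:
  replicatedJobs:
  - name: w
    template:
      spec:
        parallelism: 4       # number of nodes
        completions: 4       # number of nodes
        template:
          spec:
            containers:
            - name: nccl-test          # container name is illustrative
              env:
              - name: N_NODES
                value: "4"             # number of nodes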
Create the resources to run the test:
kubectl create -f ~/cluster-toolkit/examples/CLUSTER_NAME/nccl-jobset-example.yaml
This command returns a JobSet name.
The output should be similar to the following:
jobset.jobset.x-k8s.io/all-gather8t7dt created
To view the results of the NCCL test, run this command to view all of the running Pods:
kubectl get pods
The output should be similar to the following:
NAME                          READY   STATUS      RESTARTS   AGE
all-gather8t7dt-w-0-0-n9s6j   0/1     Completed   0          9m34s
all-gather8t7dt-w-0-1-rsf7r   0/1     Completed   0          9m34s
Find a Pod name matching the pattern jobset-name-w-0-0-*. The logs of this Pod contain the results of the NCCL test.
To fetch the logs for this Pod, run this command:
kubectl logs all-gather8t7dt-w-0-0-n9s6j
The output should be similar to the following:
#           size         count    type   redop    root     time    algbw   busbw  #wrong     time    algbw   busbw  #wrong
#            (B)    (elements)                              (us)   (GB/s)  (GB/s)            (us)   (GB/s)  (GB/s)
            1024            16   float    none      -1    54.07     0.02    0.02       0    55.80     0.02    0.02       0
            2048            32   float    none      -1    55.46     0.04    0.03       0    55.31     0.04    0.03       0
            4096            64   float    none      -1    55.59     0.07    0.07       0    55.38     0.07    0.07       0
            8192           128   float    none      -1    56.05     0.15    0.14       0    55.92     0.15    0.14       0
           16384           256   float    none      -1    57.08     0.29    0.27       0    57.75     0.28    0.27       0
           32768           512   float    none      -1    57.49     0.57    0.53       0    57.22     0.57    0.54       0
           65536          1024   float    none      -1    59.20     1.11    1.04       0    59.20     1.11    1.04       0
          131072          2048   float    none      -1    59.58     2.20    2.06       0    63.57     2.06    1.93       0
          262144          4096   float    none      -1    63.87     4.10    3.85       0    63.61     4.12    3.86       0
          524288          8192   float    none      -1    64.83     8.09    7.58       0    64.40     8.14    7.63       0
         1048576         16384   float    none      -1    79.74    13.15   12.33       0    76.66    13.68   12.82       0
         2097152         32768   float    none      -1    78.41    26.74   25.07       0    79.05    26.53   24.87       0
         4194304         65536   float    none      -1    83.21    50.41   47.26       0    81.25    51.62   48.39       0
         8388608        131072   float    none      -1    94.35    88.91   83.35       0    99.07    84.68   79.38       0
        16777216        262144   float    none      -1    122.9   136.55  128.02       0    121.7   137.83  129.21       0
        33554432        524288   float    none      -1    184.2   182.19  170.80       0    178.1   188.38  176.60       0
        67108864       1048576   float    none      -1    294.7   227.75  213.51       0    277.7   241.62  226.52       0
       134217728       2097152   float    none      -1    495.4   270.94  254.00       0    488.8   274.60  257.43       0
       268435456       4194304   float    none      -1    877.5   305.92  286.80       0    861.3   311.65  292.17       0
       536870912       8388608   float    none      -1   1589.8   337.71  316.60       0   1576.2   340.61  319.33       0
      1073741824      16777216   float    none      -1   3105.7   345.74  324.13       0   3069.2   349.85  327.98       0
      2147483648      33554432   float    none      -1   6161.7   348.52  326.74       0   6070.7   353.75  331.64       0
      4294967296      67108864   float    none      -1    12305   349.03  327.22       0    12053   356.35  334.08       0
      8589934592     134217728   float    none      -1    24489   350.77  328.85       0    23991   358.05  335.67       0
# Out of bounds values : 0 OK
# Avg bus bandwidth    : 120.248
Run reproducible benchmarks
You can reproduce pre-training benchmarks for large open machine learning models on A4 and A3 Ultra VMs on GKE.
Each recipe provides you with the instructions to complete the following tasks:
- Prepare your environment.
- Run the benchmark.
- Analyze the benchmark results, including detailed logs for further analysis.
To view all the recipes available, see the GPU recipes repository.
Models | Framework | Recipe |
---|---|---|
Llama-3.1-70B | MaxText | 32 node workload |
Llama-3.1-70B | NeMo | 32 node workload |
Mixtral-8-7B | MaxText | 32 node workload |
Mixtral-8-7B | NeMo | 32 node workload |
Clean up resources created by Cluster Toolkit
To avoid recurring charges for the resources used on this page, clean up the resources provisioned by Cluster Toolkit, including the VPC networks and GKE cluster:
./gcluster destroy CLUSTER_NAME/
Replace CLUSTER_NAME with the name of your cluster. For clusters created with Cluster Toolkit, the cluster name is gke-a4-high or gke-a3-ultra, depending on your selected machine type.
What's next
- To learn about scheduling workloads on your Hypercompute Cluster with GKE using Topology Aware Scheduling (TAS) and Kueue, see Schedule GKE workloads with Topology Aware Scheduling.
- To learn about managing common events relevant to GKE clusters and AI workloads, see Manage Hypercompute Clusters with GKE.