This page explains how to deploy dynamic resource allocation (DRA) workloads
on your Google Kubernetes Engine (GKE) clusters. On this page, you create a ResourceClaimTemplate
to request hardware with DRA and then deploy a basic workload to demonstrate
how Kubernetes flexibly allocates hardware to your Pods.
This page is intended for Application operators and Data engineers who run workloads such as AI/ML or high-performance computing (HPC).
About dynamic resource allocation
DRA is a built-in Kubernetes feature that lets you flexibly request, allocate, and share hardware in your cluster among Pods and containers. For more information, see About dynamic resource allocation.
About requesting devices with DRA
When you set up your GKE infrastructure for DRA, the DRA drivers on your
nodes create DeviceClass objects in the cluster. A DeviceClass defines a
category of devices, such as GPUs, that are available to request for
workloads. A platform administrator can optionally deploy additional
DeviceClasses that limit which devices you can request in specific workloads.
To request devices within a DeviceClass, you create one of the following objects:

- ResourceClaim: A ResourceClaim lets a Pod or a user request hardware resources by filtering for certain parameters within a DeviceClass.
- ResourceClaimTemplate: A ResourceClaimTemplate defines a template that Pods can use to automatically create new per-Pod ResourceClaims.

For more information about ResourceClaim and ResourceClaimTemplate objects,
see When to use ResourceClaims and ResourceClaimTemplates.
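For example, a standalone ResourceClaim that a user creates and manages directly might look like the following sketch. The DeviceClass name `gpu.example.com` is a placeholder for illustration; use a DeviceClass that actually exists in your cluster:

```yaml
# Sketch: a standalone ResourceClaim that you create yourself,
# rather than letting a ResourceClaimTemplate create one per Pod.
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaim
metadata:
  name: shared-gpu-claim
spec:
  devices:
    requests:
    - name: single-gpu
      deviceClassName: gpu.example.com  # placeholder DeviceClass name
      allocationMode: ExactCount
      count: 1
```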
The examples on this page use a basic ResourceClaimTemplate to request the
specified device configuration. For more detailed information, see the
ResourceClaimTemplateSpec Kubernetes documentation.
Limitations
- Node auto-provisioning isn't supported.
- Autopilot clusters don't support DRA.
- You can't use the following GPU sharing features:
- Time-sharing GPUs
- Multi-instance GPUs
- Multi-process Service (MPS)
Requirements
To use DRA, your GKE version must be 1.32.1-gke.1489001 or later.
You should also be familiar with the limitations in the preceding section.
Before you begin
Before you start, make sure you have performed the following tasks:
- Enable the Google Kubernetes Engine API.
- If you want to use the Google Cloud CLI for this task, install and then
initialize the gcloud CLI. If you previously installed the gcloud CLI, get
the latest version by running `gcloud components update`.
- Ensure that your GKE clusters are configured for DRA workloads.
Use DRA to deploy workloads
To request per-Pod device allocation, you first create a ResourceClaimTemplate
that describes your request for GPUs or TPUs. Kubernetes uses this template to
create a new ResourceClaim object for each Pod in a workload.
When you specify the ResourceClaimTemplate in a workload, Kubernetes
allocates the requested resources and schedules the Pods on corresponding
nodes.
GPU
Save the following manifest as `claim-template.yaml`:

```yaml
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaimTemplate
metadata:
  name: gpu-claim-template
spec:
  spec:
    devices:
      requests:
      - name: single-gpu
        deviceClassName: gpu.nvidia.com
        allocationMode: ExactCount
        count: 1
```
Create the ResourceClaimTemplate:

```shell
kubectl create -f claim-template.yaml
```
To create a workload that references the ResourceClaimTemplate, save the
following manifest as `dra-gpu-example.yaml`:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: dra-gpu-example
spec:
  replicas: 1
  selector:
    matchLabels:
      app: dra-gpu-example
  template:
    metadata:
      labels:
        app: dra-gpu-example
    spec:
      containers:
      - name: ctr
        image: ubuntu:22.04
        command: ["bash", "-c"]
        args: ["while [ 1 ]; do date; echo $(nvidia-smi -L || echo Waiting...); sleep 60; done"]
        resources:
          claims:
          - name: single-gpu
      resourceClaims:
      - name: single-gpu
        resourceClaimTemplateName: gpu-claim-template
      tolerations:
      - key: "nvidia.com/gpu"
        operator: "Exists"
        effect: "NoSchedule"
```
Deploy the workload:

```shell
kubectl create -f dra-gpu-example.yaml
```
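If you instead want multiple Pods to share one allocated device, the Kubernetes DRA API lets a Pod reference a pre-created ResourceClaim by name rather than a template. The following sketch shows only the relevant Pod spec fields, assuming a ResourceClaim named `shared-gpu-claim` already exists in the same namespace (a hypothetical name for illustration):

```yaml
# Sketch: Pod-level fields only, not a complete manifest.
spec:
  containers:
  - name: ctr
    image: ubuntu:22.04
    resources:
      claims:
      - name: single-gpu
  resourceClaims:
  - name: single-gpu
    # Reference an existing ResourceClaim directly instead of a template.
    resourceClaimName: shared-gpu-claim
```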
TPU
Save the following manifest as `claim-template.yaml`:

```yaml
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaimTemplate
metadata:
  name: tpu-claim-template
spec:
  spec:
    devices:
      requests:
      - name: all-tpus
        deviceClassName: tpu.google.com
        allocationMode: All
```
This ResourceClaimTemplate requests that GKE allocate an entire TPU node pool
to every ResourceClaim.

Create the ResourceClaimTemplate:

```shell
kubectl create -f claim-template.yaml
```
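The Kubernetes DRA API also supports requesting a fixed number of devices instead of all of them. As a sketch, assuming your TPU driver and node topology can satisfy a partial allocation (verify this for your configuration; the count of 4 is only an example), the template could use the ExactCount allocation mode:

```yaml
# Sketch: request exactly four TPU chips per ResourceClaim,
# rather than all chips with allocationMode: All.
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaimTemplate
metadata:
  name: tpu-partial-claim-template
spec:
  spec:
    devices:
      requests:
      - name: some-tpus
        deviceClassName: tpu.google.com
        allocationMode: ExactCount
        count: 4
```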
To create a workload that references the ResourceClaimTemplate, save the
following manifest as `dra-tpu-example.yaml`:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: dra-tpu-example
spec:
  replicas: 1
  selector:
    matchLabels:
      app: dra-tpu-example
  template:
    metadata:
      labels:
        app: dra-tpu-example
    spec:
      containers:
      - name: ctr
        image: ubuntu:22.04
        command:
        - /bin/sh
        - -c
        - |
          echo "Environment Variables:"
          env
          echo "Sleeping indefinitely..."
          sleep infinity
        resources:
          claims:
          - name: all-tpus
      resourceClaims:
      - name: all-tpus
        resourceClaimTemplateName: tpu-claim-template
      tolerations:
      - key: "google.com/tpu"
        operator: "Exists"
        effect: "NoSchedule"
```
Deploy the workload:

```shell
kubectl create -f dra-tpu-example.yaml
```
Verify the hardware allocation
You can verify that your workloads have been allocated hardware by checking
the ResourceClaim or by looking at the logs for your Pod.
GPU
Get the ResourceClaim associated with the workload that you deployed:

```shell
kubectl get resourceclaims
```
The output should resemble the following:

```
NAME                                               STATE                AGE
dra-gpu-example-64b75dc6b-x8bd6-single-gpu-jwwdh   allocated,reserved   9s
```
To get more details about the hardware assigned to the Pod, run the following command:
```shell
kubectl describe resourceclaims RESOURCECLAIM
```
Replace RESOURCECLAIM with the full name of the ResourceClaim that you got
from the output of the previous step.

The output should resemble the following:
```
Name:         dra-gpu-example-64b75dc6b-x8bd6-single-gpu-jwwdh
Namespace:    default
Labels:       <none>
Annotations:  resource.kubernetes.io/pod-claim-name: single-gpu
API Version:  resource.k8s.io/v1beta1
Kind:         ResourceClaim
Metadata:
  Creation Timestamp:  2025-03-31T17:11:37Z
  Finalizers:
    resource.kubernetes.io/delete-protection
  Generate Name:  dra-gpu-example-64b75dc6b-x8bd6-single-gpu-
  Owner References:
    API Version:           v1
    Block Owner Deletion:  true
    Controller:            true
    Kind:                  Pod
    Name:                  dra-gpu-example-64b75dc6b-x8bd6
    UID:                   cb3cb1db-e62a-4961-9967-cdc7d599105b
  Resource Version:        12953269
  UID:                     3e0c3925-e15a-40e9-b552-d03610fff040
Spec:
  Devices:
    Requests:
      Allocation Mode:    ExactCount
      Count:              1
      Device Class Name:  gpu.nvidia.com
      Name:               single-gpu
Status:
  Allocation:
    Devices:
      Results:
        Admin Access:  <nil>
        Device:        gpu-0
        Driver:        gpu.nvidia.com
        Pool:          gke-cluster-gpu-pool-11026a2e-zgt1
        Request:       single-gpu
    Node Selector:
      # lines omitted for clarity
    Reserved For:
      Name:      dra-gpu-example-64b75dc6b-x8bd6
      Resource:  pods
      UID:       cb3cb1db-e62a-4961-9967-cdc7d599105b
Events:  <none>
```
To get logs for the workload that you deployed, run the following command:
```shell
kubectl logs deployment/dra-gpu-example --all-pods=true | grep "GPU"
```
The output should resemble the following:
```
[pod/dra-gpu-example-64b75dc6b-x8bd6/ctr] GPU 0: Tesla T4 (UUID: GPU-2087ac7a-f781-8cd7-eb6b-b00943cc13ef)
```
The output of these steps shows that GKE allocated one GPU to the Pod.
TPU
Get the ResourceClaim associated with the workload that you deployed:

```shell
kubectl get resourceclaims | grep dra-tpu-example
```
The output should resemble the following:

```
NAME                                             STATE                AGE
dra-tpu-example-64b75dc6b-x8bd6-all-tpus-jwwdh   allocated,reserved   9s
```
To get more details about the hardware assigned to the Pod, run the following command:
```shell
kubectl get resourceclaims RESOURCECLAIM -o yaml
```
Replace RESOURCECLAIM with the full name of the ResourceClaim that you got
from the output of the previous step.

The output should resemble the following:
```yaml
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaim
metadata:
  annotations:
    resource.kubernetes.io/pod-claim-name: all-tpus
  creationTimestamp: "2025-03-04T21:00:54Z"
  finalizers:
  - resource.kubernetes.io/delete-protection
  generateName: dra-tpu-example-59b8785697-k9kzd-all-gpus-
  name: dra-tpu-example-59b8785697-k9kzd-all-gpus-gnr7z
  namespace: default
  ownerReferences:
  - apiVersion: v1
    blockOwnerDeletion: true
    controller: true
    kind: Pod
    name: dra-tpu-example-59b8785697-k9kzd
    uid: c2f4fe66-9a73-4bd3-a574-4c3eea5fda3f
  resourceVersion: "12189603"
  uid: 279b5014-340b-4ef6-9dda-9fbf183fbb71
spec:
  devices:
    requests:
    - allocationMode: All
      deviceClassName: tpu.google.com
      name: all-tpus
status:
  allocation:
    devices:
      results:
      - adminAccess: null
        device: "0"
        driver: tpu.google.com
        pool: gke-tpu-2ec29193-bcc0
        request: all-tpus
      - adminAccess: null
        device: "1"
        driver: tpu.google.com
        pool: gke-tpu-2ec29193-bcc0
        request: all-tpus
      - adminAccess: null
        device: "2"
        driver: tpu.google.com
        pool: gke-tpu-2ec29193-bcc0
        request: all-tpus
      - adminAccess: null
        device: "3"
        driver: tpu.google.com
        pool: gke-tpu-2ec29193-bcc0
        request: all-tpus
      - adminAccess: null
        device: "4"
        driver: tpu.google.com
        pool: gke-tpu-2ec29193-bcc0
        request: all-tpus
      - adminAccess: null
        device: "5"
        driver: tpu.google.com
        pool: gke-tpu-2ec29193-bcc0
        request: all-tpus
      - adminAccess: null
        device: "6"
        driver: tpu.google.com
        pool: gke-tpu-2ec29193-bcc0
        request: all-tpus
      - adminAccess: null
        device: "7"
        driver: tpu.google.com
        pool: gke-tpu-2ec29193-bcc0
        request: all-tpus
    nodeSelector:
      nodeSelectorTerms:
      - matchFields:
        - key: metadata.name
          operator: In
          values:
          - gke-tpu-2ec29193-bcc0
    reservedFor:
    - name: dra-tpu-example-59b8785697-k9kzd
      resource: pods
      uid: c2f4fe66-9a73-4bd3-a574-4c3eea5fda3f
```
To get logs for the workload that you deployed, run the following command:
```shell
kubectl logs deployment/dra-tpu-example --all-pods=true | grep "TPU"
```
The output should resemble the following:
```
[pod/dra-tpu-example-59b8785697-tm2lc/ctr] TPU_CHIPS_PER_HOST_BOUNDS=2,4,1
[pod/dra-tpu-example-59b8785697-tm2lc/ctr] TPU_TOPOLOGY_WRAP=false,false,false
[pod/dra-tpu-example-59b8785697-tm2lc/ctr] TPU_SKIP_MDS_QUERY=true
[pod/dra-tpu-example-59b8785697-tm2lc/ctr] TPU_RUNTIME_METRICS_PORTS=8431,8432,8433,8434,8435,8436,8437,8438
[pod/dra-tpu-example-59b8785697-tm2lc/ctr] TPU_WORKER_ID=0
[pod/dra-tpu-example-59b8785697-tm2lc/ctr] TPU_WORKER_HOSTNAMES=localhost
[pod/dra-tpu-example-59b8785697-tm2lc/ctr] TPU_TOPOLOGY=2x4
[pod/dra-tpu-example-59b8785697-tm2lc/ctr] TPU_ACCELERATOR_TYPE=v6e-8
[pod/dra-tpu-example-59b8785697-tm2lc/ctr] TPU_HOST_BOUNDS=1,1,1
[pod/dra-tpu-example-59b8785697-tm2lc/ctr] TPU_TOPOLOGY_ALT=false
[pod/dra-tpu-example-59b8785697-tm2lc/ctr] TPU_DEVICE_0_RESOURCE_CLAIM=77e68f15-fa2f-4109-9a14-6c91da1a38d3
```
The output of these steps indicates that all of the TPUs in a node pool were allocated to the Pod.