This page provides information about dynamic resource allocation (DRA) in Google Kubernetes Engine (GKE). On this page, you learn about the fundamentals of DRA, how it works in GKE, and the benefits of using DRA to allocate hardware like GPUs and TPUs.
This page is intended for the following roles:
- Platform administrators who want to reduce the complexity and overhead of setting up infrastructure with specialized hardware devices.
- App operators and data engineers who run workloads like AI/ML or high-performance computing (HPC).
Before reading this page, ensure that you're familiar with the following resources:
Introduction to DRA
DRA is a built-in Kubernetes feature that lets you flexibly request, allocate, and share hardware in your cluster among Pods and containers. DRA improves the experience of allocating attached hardware, such as accelerators, by letting device vendors and platform administrators declare classes of devices that can be requested and allocated. App operators can ask for specific device configurations within those classes and then request those configurations in their workloads. Kubernetes and GKE manage Pod scheduling, node assignments, and device allocation based on workload requests.
For example, a platform administrator might define a device class that has only NVIDIA A100 GPUs. App operators can then filter the devices in that device class based on workload requirements, such as filtering for a minimum of 80 GB of GPU memory. When the app operator deploys a workload that requests the filtered configuration, GKE places the Pods on nodes that meet the selected criteria. In this example, GKE finds nodes that have available A100 (80 GB) GPUs. The app operator doesn't need to select specific nodes or device configurations in the workload manifest.
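As a concrete sketch, a DeviceClass like the following could implement that policy. The manifest assumes the resource.k8s.io/v1beta1 API and the attribute names that the NVIDIA DRA driver publishes; the class name is hypothetical, and you should inspect your driver's ResourceSlices for the exact attribute names:

```yaml
apiVersion: resource.k8s.io/v1beta1
kind: DeviceClass
metadata:
  name: a100-gpus   # hypothetical class name
spec:
  selectors:
  - cel:
      # Match devices published by the NVIDIA DRA driver whose product
      # name contains "A100". Attribute names are driver-specific.
      expression: |-
        device.driver == "gpu.nvidia.com" &&
        device.attributes["gpu.nvidia.com"].productName.contains("A100")
```

App operators can then narrow this class further in their claims, for example with a CEL expression on device.capacity to require a minimum amount of GPU memory.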
Benefits of DRA
Without DRA, allocating hardware devices in Kubernetes relies on device plugins. To attach hardware resources to Pods by using device plugins, you use node labels to place Pods on specific nodes. Additionally, to dedicate an entire node's resources to a single Pod, you request the exact number of devices that are attached to that node.
With DRA, the experience of allocating devices to Pods is similar to allocating volumes for storage. You define classes of devices, request devices within those classes, and then assign those requested devices to workloads. DRA provides a significantly more extensible surface to filter for devices based on workload and business needs. The DRA approach of using expressions and templates to claim hardware and schedule Pods has the following benefits:
- Declarative device allocation: platform administrators can define device configurations for specific types of workloads or teams.
- Reduced cross-team complexity: when platform administrators provision nodes that have specialized hardware configurations, app operators don't need to know which nodes have specific configurations. Platform administrators don't need to label nodes or communicate information about specific nodes and devices to operators.
- Reduced developer complexity: Kubernetes schedules Pods based on the referenced device configuration. App operators don't need to select specific nodes in their workloads and don't need to ensure that each Pod requests exactly the number of devices that are attached to those nodes.
- Centralized infrastructure management: platform administrators can centrally define hardware configurations that meet specific business requirements. For example, a platform administrator could declare a high-performance configuration that has H100 GPUs alongside a small inference configuration that has Tesla T4 GPUs.
- Flexible hardware selection: DRA lets you use CEL expressions to filter for devices that have specific attributes. Using expressions provides the flexibility to filter for devices that are optimal for specific workloads.
When to use DRA
During the Preview, the primary reason to use DRA in GKE is the flexibility with which you can request devices for workloads. You can write a manifest once and deploy the workload to different clusters with different device types without needing to change the manifest. This flexibility is ideal for use cases like the following:
- Improve GPU obtainability: for workloads that need access to GPU hardware, you can use DRA to request any available GPU in the cluster instead of needing to specify a GPU model. If those workloads have specific GPU memory (VRAM) requirements, you can request any GPU in the cluster that has a minimum amount of memory. This type of flexible request expands the set of GPU nodes that a workload can run on, which reduces the risk of the workload not being scheduled because of unavailable resources.
- Optimize GPU node availability during scaling: the number of attached GPUs that a workload requires might change depending on the GPU type. You can use a GKE compute class to provision nodes based on GPU availability, quota, or capacity reservations. You can then use DRA in your workloads to configure the Pods to run on any node that GKE provisions for the compute class. Using DRA with compute classes lets you minimize the risk of unscheduled workloads while also ensuring that the workloads run on optimized hardware.
Terminology
Open source Kubernetes and managed Kubernetes providers like GKE use the following DRA terms:
- ResourceSlice
- A ResourceSlice lists one or more hardware devices in the cluster that nodes can access. For example, in a node that can access a single GPU, the ResourceSlice lists the GPU and the name of the node. The DRA device drivers on each node create ResourceSlices. The Kubernetes scheduler uses ResourceSlices to decide which devices to allocate to satisfy workload requests.
- DeviceClass
- A DeviceClass defines a category of devices, such as GPUs, that are available to request for workloads. Some device drivers provide built-in DeviceClasses, such as the gpu.nvidia.com DeviceClass for NVIDIA GPUs. Platform administrators can also create custom DeviceClasses that define specific device configurations.
- ResourceClaim
- A ResourceClaim lets a Pod or a user request hardware resources by filtering for certain parameters within a DeviceClass. When a workload references a ResourceClaim, Kubernetes assigns devices that match the specified parameters to that ResourceClaim.
For example, consider a scenario in which you create a ResourceClaim for one A100 (40 GB) GPU and then deploy a workload that selects that ResourceClaim. Kubernetes assigns an available A100 (40 GB) GPU to the ResourceClaim and schedules your Pod on a node that can access that GPU.
- ResourceClaimTemplate
- A ResourceClaimTemplate defines a template that Pods can use to automatically create new per-Pod ResourceClaims. ResourceClaimTemplates are useful when you have multiple workloads that need access to similar device configurations, especially when you use workload controllers like Deployments or StatefulSets.
App operators deploy ResourceClaimTemplates and then reference the templates in workloads. Kubernetes creates ResourceClaims for each Pod based on the specified template, allocates devices, and schedules the Pods. When the Pods terminate, Kubernetes cleans up the corresponding ResourceClaims.
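The A100 (40 GB) scenario described under ResourceClaim might look like the following manifest. The resource.k8s.io/v1beta1 API version, the gpu.nvidia.com DeviceClass, and the capacity name are assumptions about what your DRA driver exposes:

```yaml
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaim
metadata:
  name: one-a100-40gb
spec:
  devices:
    requests:
    - name: gpu
      deviceClassName: gpu.nvidia.com   # built-in class from the NVIDIA driver
      selectors:
      - cel:
          # Require at least 40Gi of GPU memory; the capacity name is
          # driver-specific and shown here as an assumption.
          expression: >-
            device.capacity["gpu.nvidia.com"].memory.compareTo(quantity("40Gi")) >= 0
```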
How DRA works
Using DRA in your clusters and workloads is a similar experience to using StorageClasses, PersistentVolumeClaims, and PersistentVolumes to dynamically provision volumes for Pods.
The following diagram shows the steps that cluster administrators and app operators take to allocate devices by using DRA:
In this diagram, cluster administrators and app operators do the following:
- Cluster administrators install device drivers that support DRA in the nodes.
- Cluster administrators create DeviceClasses that filter for hardware that meets specific requirements, such as all GPUs with more than 40 GB of memory. Some device drivers also provide built-in DeviceClasses.
- Application operators create ResourceClaimTemplates or ResourceClaims that request device configurations. The primary use case for each type of claim is as follows:
- A ResourceClaim lets multiple Pods share access to the same device.
- A ResourceClaimTemplate lets multiple Pods access separate, similar devices by automatically generating per-Pod ResourceClaims.
- Application operators add the ResourceClaimTemplates or ResourceClaims to their workload manifests.
- Application operators deploy the workload.
When you deploy a workload that references a ResourceClaimTemplate or a ResourceClaim, Kubernetes performs the following scheduling steps:
- If the workload references a ResourceClaimTemplate, Kubernetes creates a new ResourceClaim object for every instance of the workload (for example, every replica in a Deployment).
- The Kubernetes scheduler uses the ResourceSlices in the cluster to allocate available, eligible devices to each Pod's ResourceClaim.
- The scheduler places each Pod on a node that has access to the devices that were allocated to the Pod's ResourceClaim.
- The kubelet on the destination node calls the on-node DRA driver to attach the allocated hardware to the Pod to satisfy its resource request.
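The preceding steps can be sketched end to end. The following manifests pair a ResourceClaimTemplate with a Deployment that references it, so Kubernetes creates one ResourceClaim per replica. All names are hypothetical, and the gpu.nvidia.com DeviceClass assumes that the NVIDIA DRA driver is installed:

```yaml
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaimTemplate
metadata:
  name: gpu-claim-template
spec:
  spec:
    devices:
      requests:
      - name: gpu
        deviceClassName: gpu.nvidia.com
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gpu-workload
spec:
  replicas: 2
  selector:
    matchLabels:
      app: gpu-workload
  template:
    metadata:
      labels:
        app: gpu-workload
    spec:
      resourceClaims:
      - name: gpu
        # Kubernetes creates a separate ResourceClaim for each replica
        resourceClaimTemplateName: gpu-claim-template
      containers:
      - name: app
        image: ubuntu:22.04
        command: ["sleep", "infinity"]
        resources:
          claims:
          - name: gpu   # the container consumes the claimed device
```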
When to use ResourceClaims and ResourceClaimTemplates
Both ResourceClaims and ResourceClaimTemplates let you indicate to Kubernetes that you want devices that meet specific requirements. When a ResourceClaim is referenced in a Pod, Kubernetes allocates devices to the corresponding ResourceClaim API resource in the Kubernetes API server. This allocation happens regardless of whether you created the ResourceClaim or Kubernetes created the ResourceClaim from a ResourceClaimTemplate.
If you create a ResourceClaim and then reference it in multiple Pods, all of those Pods can access the devices that Kubernetes allocates for that ResourceClaim. For example, this shared access might happen if you reference a specific ResourceClaim in a Deployment manifest that has multiple replicas. However, if the allocated devices aren't configured to be shared by multiple processes, this shared device access across Pods might result in unintended behavior.
A ResourceClaimTemplate lets you define templates that Kubernetes uses to automatically create individual ResourceClaims for Pods. For example, if you reference a ResourceClaimTemplate in a Deployment that has multiple replicas, Kubernetes creates a separate ResourceClaim for each replicated Pod. As a result, each Pod gets its own allocated device instead of sharing access to the device with other Pods. These auto-generated ResourceClaims are bound to the lifetime of the corresponding Pod, and are deleted when the Pod terminates. If you have independent Pods that need access to similar device configurations, use a ResourceClaimTemplate to allocate devices to each Pod separately.
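For the shared case, the Pod template references an existing ResourceClaim by name instead of a template. In this fragment, shared-gpu-claim is a hypothetical ResourceClaim that you created beforehand; every replica binds to the same allocated device:

```yaml
# Fragment of a Deployment Pod template (spec.template.spec)
spec:
  resourceClaims:
  - name: gpu
    resourceClaimName: shared-gpu-claim   # pre-created, shared by all replicas
  containers:
  - name: app
    image: ubuntu:22.04
    resources:
      claims:
      - name: gpu
```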
The following table describes some differences between manually creating ResourceClaims and letting Kubernetes create ResourceClaims from a ResourceClaimTemplate:
| Manually created ResourceClaims | Automatically created ResourceClaims |
| --- | --- |
| Managed by you | Managed by Kubernetes |
| Provides access to the same devices from multiple Pods | Provides access to devices from a single Pod |
| Exists in the cluster independently of Pods | Bound to the lifecycle of the corresponding Pod |
| Ideal for multiple workloads that need to share a specific device | Ideal for multiple workloads that need independent device access |
Comparison of DRA with manual device allocation
DRA makes allocating attached devices a similar experience to dynamically provisioning PersistentVolumes. Kubernetes also supports allocating devices by using device plugins. This method involves the following steps:
- A cluster administrator creates nodes that have attached devices, like GPUs.
- The cluster administrator communicates information about specific nodes and their attached devices to workload operators.
- A workload operator requests devices in the workload manifest as follows:
  - Select a node that has the required device configuration, like the GPU model or the TPU type and topology, by using a nodeSelector field.
  - Specify the exact number of devices for the containers to consume by using the resources field in the Pod specification.
This manual allocation method requires the application operators and cluster administrators to communicate about which specific nodes or node pools have certain device configurations. They must coordinate workload requests to match the devices on the nodes, or the deployment fails. In comparison, DRA lets you use expressions to flexibly filter for devices based on attributes, and doesn't require workload operators to know the exact configuration of nodes in the cluster.
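For comparison, a device plugin request in GKE typically pins the Pod to a node pool by accelerator label and asks for an exact device count, as in this sketch:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  nodeSelector:
    # Requires knowing which node pools have this exact GPU model
    cloud.google.com/gke-accelerator: nvidia-tesla-a100
  containers:
  - name: app
    image: ubuntu:22.04
    command: ["sleep", "infinity"]
    resources:
      limits:
        # Must match the number of GPUs attached to the selected nodes
        nvidia.com/gpu: 2
```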
The following table compares DRA with device plugins:
| DRA | Manual allocation |
| --- | --- |
| Flexible device selection using CEL expressions | Specific node selection using selectors and resource requests |
| Scheduling decisions made by Kubernetes | Scheduling decisions made by the operator using node selectors |
| Device filtering is separate from workload creation | Device filtering has to be done in the workload manifest |
| Centralized device filtering and needs-based classes, managed by platform administrators | Isolated device filtering by application operators |
| App operators don't need to know node capacity, node label information, or the attached device models for each node | App operators must know which nodes have specific models and quantities of certain devices attached |
Supported GKE devices for DRA
You can use DRA to allocate GPUs or TPUs to GKE workloads. You can allocate any of the GPU and TPU models that GKE supports. For details about the GPUs and TPUs that GKE supports, see the following resources:
Limitations of DRA in GKE
DRA has the following limitations in GKE clusters:
- You can't use DRA with node auto-provisioning.
- You can't use DRA with the following GPU sharing features:
- Time-shared GPUs.
- Multi-instance GPUs.
- Multi-Process Service (MPS).
- You can't use DRA in Autopilot clusters.
- You must use GKE version 1.32.1-gke.1489001 or later.
Recommended skills for understanding and using DRA
This section provides recommendations for platform administrators or app operators who want to use DRA to allocate devices to workloads. DRA significantly changes the method by which you request attached devices, both in GKE and in Kubernetes. To benefit from more advanced use cases, such as cross-device fallback or fine-grained device filtering and selection, consider the following guidance:
- Learn CEL: DRA lets you use CEL expressions to perform fine-grained device filtering in your resource allocation requests and DeviceClasses. The following resources might help you to learn CEL:
- Learn about ComputeClasses in GKE: ComputeClasses let you define priority-based fallback node configurations for GKE to use when scaling a cluster up. You can use ComputeClasses with DRA to meet business needs like provisioning Spot VMs to run inference workloads that request cost-efficient GPUs. The following resources help you to learn about ComputeClasses:
Improve node availability during scaling
ComputeClasses in GKE let you define priority-based fallback behavior that GKE follows when it creates new nodes in clusters. You can use ComputeClasses to configure a prioritized series of node and device configurations that GKE uses when it creates nodes to run workloads. You can then use DRA to ensure that your workload can run on any node within the ComputeClass without needing to manually select nodes by label.
For example, a workload might need either two NVIDIA L4 GPUs or one NVIDIA A100 (40 GB) GPU to run optimally. You could create a ComputeClass that prioritizes creating nodes with one A100 (40 GB) GPU, but can fall back to creating nodes with two L4 GPUs per node. You could then use DRA to request any available GPU for your workload. When you deploy the workload and select that ComputeClass, GKE creates nodes that have one of the specified GPU configurations. With DRA, GKE can place the workload on the first available node, regardless of GPU model, node label, or GPU count.
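A sketch of that fallback policy as a ComputeClass follows. The priority rules mirror the example in this section; treat the exact field values as illustrative and check the ComputeClass reference for your GKE version:

```yaml
apiVersion: cloud.google.com/v1
kind: ComputeClass
metadata:
  name: gpu-fallback
spec:
  priorities:
  # Preferred: nodes with one A100 (40 GB) GPU
  - gpu:
      type: nvidia-tesla-a100
      count: 1
  # Fallback: nodes with two L4 GPUs
  - gpu:
      type: nvidia-l4
      count: 2
  nodePoolAutoCreation:
    enabled: true
```

A workload that selects this ComputeClass and requests any available GPU through DRA can then run on whichever configuration GKE provisions first.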
To learn more, see the following pages:
What's next
- Prepare your GKE infrastructure for DRA workloads
- Dynamically allocate devices to workloads with DRA