This page describes how to configure your GPU workloads to use dynamic resource allocation in your Google Distributed Cloud bare metal clusters. Dynamic resource allocation is a Kubernetes API that lets you request and share generic resources, such as GPUs, among Pods and containers. Third-party drivers manage these resources.
With dynamic resource allocation, Kubernetes schedules Pods based on the referenced device configuration. App operators don't need to select specific nodes in their workloads and don't need to ensure that each Pod requests exactly the number of devices that are attached to those nodes. This process is similar to allocating volumes for storage.
This capability helps you run AI workloads by dynamically and precisely allocating the GPU resources within your bare metal clusters, improving resource utilization and performance for demanding workloads.
This page is for Admins and architects and Operators who manage the lifecycle of the underlying tech infrastructure. To learn more about common roles and example tasks that we reference in Google Cloud content, see Common GKE Enterprise user roles and tasks.
Before you begin
Before you configure your GPU workloads to use dynamic resource allocation, verify that the following prerequisites are met:
- Your bare metal cluster is at version 1.33.0 or later.
- Your operating system is either Ubuntu 22.04 or Red Hat Enterprise Linux (RHEL) 9.4.
- You have updated your cluster to enable dynamic resource allocation as described in Enable dynamic resource allocation.
- You have at least one node machine with a GPU attached and the NVIDIA GPU driver installed. For more information, see Install or uninstall the bundled NVIDIA GPU Operator.
- You have followed the instructions in NVIDIA DRA Driver for GPUs to install the NVIDIA DRA driver on all GPU-attached nodes.
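To spot-check the driver prerequisites, you can list the DeviceClasses that dynamic resource allocation drivers advertise. This is an optional check, and it assumes that the NVIDIA DRA driver exposes the gpu.nvidia.com DeviceClass that the examples on this page reference; CLUSTER_KUBECONFIG is the path of your user cluster kubeconfig file:

# With the NVIDIA DRA driver installed, the output is expected to
# include a DeviceClass named gpu.nvidia.com.
kubectl get deviceclasses --kubeconfig=CLUSTER_KUBECONFIG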
Create GPU workloads that use dynamic resource allocation
For your GPU workloads to take advantage of dynamic resource allocation to request GPUs, they must be in a shared namespace with a ResourceClaim that describes the request for GPU device allocation. Your workloads must reference the ResourceClaim for Kubernetes to assign GPU resources.
The following steps set up an environment in which your workloads use dynamic resource allocation to request GPU resources:
1. To create resources related to dynamic resource allocation, create a new Namespace in your cluster:

cat <<EOF | kubectl apply --kubeconfig=CLUSTER_KUBECONFIG -f -
apiVersion: v1
kind: Namespace
metadata:
  name: NAMESPACE_NAME
EOF

Replace the following:

- CLUSTER_KUBECONFIG: the path of the user cluster kubeconfig file.
- NAMESPACE_NAME: the name for your dynamic resource allocation namespace.
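Optionally, confirm that the namespace was created before you continue:

kubectl get namespace NAMESPACE_NAME --kubeconfig=CLUSTER_KUBECONFIG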
2. Create a ResourceClaim to describe the request for GPU access:

cat <<EOF | kubectl apply --kubeconfig=CLUSTER_KUBECONFIG -f -
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaim
metadata:
  namespace: NAMESPACE_NAME
  name: RESOURCE_CLAIM_NAME
spec:
  devices:
    requests:
    - name: gpu
      deviceClassName: gpu.nvidia.com
EOF

Replace RESOURCE_CLAIM_NAME with the name of your resource claim for GPU requests.
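Optionally, check that the claim was created. With dynamic resource allocation, the claim typically isn't allocated until a Pod that references it is scheduled:

kubectl get resourceclaim RESOURCE_CLAIM_NAME --namespace NAMESPACE_NAME --kubeconfig=CLUSTER_KUBECONFIG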
3. Create workloads that reference the ResourceClaim created in the preceding step.

The following workload examples show how to reference a ResourceClaim named gpu-claim in the dra-test namespace. The containers in the pod1 Pod are NVIDIA compute unified device architecture (CUDA) samples designed to run CUDA workloads on the GPUs. When the pod1 Pod completes successfully, it indicates that dynamic resource allocation is working properly and is ready to manage GPU resources in your cluster.

Ubuntu
Use the following command to apply the manifest to your cluster:
cat <<EOF | kubectl apply --kubeconfig=CLUSTER_KUBECONFIG -f -
apiVersion: v1
kind: Pod
metadata:
  name: pod1
  namespace: dra-test
spec:
  restartPolicy: OnFailure
  resourceClaims:
  - name: gpu
    resourceClaimName: gpu-claim
  containers:
  - name: ctr0
    image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda12.5.0
    resources:
      claims:
      - name: gpu
  - name: ctr1
    image: nvcr.io/nvidia/k8s/cuda-sample:devicequery
    resources:
      claims:
      - name: gpu
EOF
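After you apply the manifest, you can optionally watch pod1 until it completes and then inspect the output of the two sample containers. These commands use the example names from the preceding manifest:

# Watch the Pod until its status shows that it completed.
kubectl get pod pod1 --namespace dra-test --kubeconfig=CLUSTER_KUBECONFIG --watch

# Inspect the output of each CUDA sample container.
kubectl logs pod1 -c ctr0 --namespace dra-test --kubeconfig=CLUSTER_KUBECONFIG
kubectl logs pod1 -c ctr1 --namespace dra-test --kubeconfig=CLUSTER_KUBECONFIG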
RHEL
Download and install the SELinux policy module nvidia_container_t, which is required to access GPUs. For more information, refer to the NVIDIA dgx-selinux repository.
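The build and install steps are documented in that repository. The following outline is only a hypothetical sketch; the repository location, package names, and policy file name are assumptions that you should check against the repository's instructions:

# Install SELinux policy build tools (package names can vary by RHEL release).
sudo dnf install -y git selinux-policy-devel

# Get the policy sources and build the module as described in the repository,
# then install the resulting policy package with semodule.
git clone DGX_SELINUX_REPOSITORY_URL
cd dgx-selinux
# ...build the policy package per the repository's instructions...
sudo semodule -i BUILT_POLICY_MODULE.pp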
Use the following command to apply the manifest to your cluster:
cat <<EOF | kubectl apply --kubeconfig=CLUSTER_KUBECONFIG -f -
apiVersion: v1
kind: Pod
metadata:
  name: pod1
  namespace: dra-test
spec:
  restartPolicy: OnFailure
  securityContext:
    seLinuxOptions:
      type: nvidia_container_t
  resourceClaims:
  - name: gpu
    resourceClaimName: gpu-claim
  containers:
  - name: ctr0
    image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda12.5.0
    resources:
      claims:
      - name: gpu
  - name: ctr1
    image: nvcr.io/nvidia/k8s/cuda-sample:devicequery
    resources:
      claims:
      - name: gpu
EOF
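After pod1 completes successfully on either operating system, you can optionally delete the test resources. The names below match the examples on this page:

kubectl delete pod pod1 --namespace dra-test --kubeconfig=CLUSTER_KUBECONFIG
kubectl delete resourceclaim gpu-claim --namespace dra-test --kubeconfig=CLUSTER_KUBECONFIG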
Limitations
Consider the following limitations when you use dynamic resource allocation:
- When you use RHEL OS, SELinux policy can interfere with containers that try to access GPUs. For more information, see How to use GPUs in containers on bare metal RHEL 8.
- This feature uses the resource.k8s.io/v1beta1 API group, which differs from the open source Kubernetes API group for this feature, resource.k8s.io/v1. The v1 open source API group provides more features and better stability than the v1beta1 API group.
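To see which versions of this API group your cluster serves, you can list the served API versions. This optional check should show resource.k8s.io/v1beta1 on clusters that meet the prerequisites on this page:

kubectl api-versions --kubeconfig=CLUSTER_KUBECONFIG | grep resource.k8s.io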
What's next
- Refer to the Kubernetes documentation for more information about dynamic resource allocation.
- Learn how to serve third-party LLMs on bare metal.