About GPU obtainability with Dynamic Workload Scheduler


This page describes Dynamic Workload Scheduler in Google Kubernetes Engine (GKE). Dynamic Workload Scheduler improves access to GPUs and optimizes scheduling costs.

The information on this page can help you to do the following:

  • Understand how Dynamic Workload Scheduler in GKE works.
  • Decide whether Dynamic Workload Scheduler is right for your use case.
  • Manage disruptions by using Dynamic Workload Scheduler.
  • Understand the limitations of Dynamic Workload Scheduler in GKE.

This page is intended for Platform admins and operators and for Machine learning (ML) engineers who want to ensure that accelerator infrastructure is optimized for their workloads.

Requirements

To use Dynamic Workload Scheduler in GKE, your clusters must meet the following version requirements, which depend on the mode of operation:

How Dynamic Workload Scheduler works

Dynamic Workload Scheduler is a resource management and Job scheduling platform that optimizes how you get access to GPUs. To use Dynamic Workload Scheduler, you configure your node pools to work with Dynamic Workload Scheduler and specify the required GPU capacity in your workloads. For more information, see Deploy GPUs for batch and AI workloads with Dynamic Workload Scheduler.
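For example, the following command is a minimal sketch of creating such a node pool with queued provisioning, autoscaling from zero nodes, the ANY location policy, and no reservation affinity. The node pool name dws-pool, the CLUSTER_NAME and LOCATION values, and the machine and accelerator types are placeholders; verify the flags against the gcloud reference for your GKE version:

    # Create a GPU node pool whose nodes are provisioned through Dynamic Workload Scheduler.
    gcloud container node-pools create dws-pool \
        --cluster=CLUSTER_NAME \
        --location=LOCATION \
        --enable-queued-provisioning \
        --machine-type=a3-highgpu-8g \
        --accelerator="type=nvidia-h100-80gb,count=8" \
        --enable-autoscaling \
        --num-nodes=0 \
        --total-max-nodes=10 \
        --location-policy=ANY \
        --reservation-affinity=none \
        --no-enable-autorepair \
        --no-enable-autoupgrade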

To provision VMs as soon as the requested capacity becomes available, GKE completes the following process:

  1. GKE identifies that your node pool has Dynamic Workload Scheduler enabled and that the workload can wait for an indeterminate amount of time, until all the required nodes are ready to use at the same time.
  2. The cluster autoscaler accepts your request and calculates the number of necessary nodes, treating them as a single unit.
  3. The scheduler waits until all needed resources are available in a single zone. Clusters running GKE version 1.29.1-gke.1708000 and later optimize zone selection for lower wait times; earlier versions might experience longer queueing.
  4. The cluster autoscaler provisions the necessary nodes when they are all available at the same time. All Pods of the workload can then run together on the newly provisioned nodes. These nodes run for a maximum of seven days, or for a shorter duration if you specify a value in the maxRunDurationSeconds parameter (for an example, see the sketch after this list). This parameter is available in GKE version 1.28.5-gke.1355000 and later.
  5. After the running time you defined in the maxRunDurationSeconds parameter ends, the nodes and the Pods are preempted.
  6. The provisioned nodes aren't reused between workload executions. Each provisioning request triggers the creation of new nodes with the new seven-day duration. If the Pods finish sooner and the nodes are no longer utilized, the cluster autoscaler removes them according to the autoscaling profile.
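The following manifest is a minimal sketch of a capacity request that sets the maxRunDurationSeconds parameter mentioned in step 4. The names dws-pod-template, dws-request, and dws-pool, the image, the namespace, and the GPU and Pod counts are placeholders, and the ProvisioningRequest API version and provisioning class name can differ between GKE versions, so verify them against the deployment guide that this page links to:

    # PodTemplate that describes the Pods that the requested nodes must run.
    apiVersion: v1
    kind: PodTemplate
    metadata:
      name: dws-pod-template          # hypothetical name
      namespace: default
    template:
      spec:
        nodeSelector:
          cloud.google.com/gke-nodepool: dws-pool   # hypothetical node pool name
        tolerations:
        - key: "nvidia.com/gpu"
          operator: "Exists"
          effect: "NoSchedule"
        containers:
        - name: training
          image: IMAGE                # placeholder container image
          resources:
            limits:
              nvidia.com/gpu: 8       # example GPU count per Pod
    ---
    # ProvisioningRequest that queues the capacity request with Dynamic Workload Scheduler.
    apiVersion: autoscaling.x-k8s.io/v1beta1
    kind: ProvisioningRequest
    metadata:
      name: dws-request               # hypothetical name
      namespace: default
    spec:
      provisioningClassName: queued-provisioning.gke.io
      parameters:
        maxRunDurationSeconds: "86400"   # run for at most 24 hours instead of the 7-day maximum
      podSets:
      - count: 4                      # example: all 4 Pods must be provisioned together
        podTemplateRef:
          name: dws-pod-template

The Pods that later consume this capacity typically reference the request by name, as described in Deploy GPUs for batch and AI workloads with Dynamic Workload Scheduler.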

GKE counts the duration of each Dynamic Workload Scheduler request at the node level, and counts the duration of each request separately. The time available for running Pods might be slightly shorter because of delays during node startup. Pod retries share this duration, which means that less time is available for Pods after a retry.
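For example, if you set maxRunDurationSeconds to 86400 (24 hours), node startup consumes about 10 minutes, and a Pod fails and is retried after running for 2 hours, then roughly 21 hours and 50 minutes of the request's duration remain for the retried Pod. These numbers are only illustrative.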

When to use Dynamic Workload Scheduler

We recommend that you use Dynamic Workload Scheduler if your workloads meet all of the following conditions:

  • You request GPUs to run your workloads.
  • You have limited or no reserved GPU capacity and you need more reliable access to GPUs.
  • Your workload is time-flexible and your use case can afford to wait to get all the requested capacity, for example, when GKE allocates the GPU resources outside of the busiest hours.
  • Your workload requires multiple nodes and can't start running until all GPU nodes are provisioned and ready at the same time (for example, if you are running distributed ML training workloads).

Manage disruptions in workloads that use Dynamic Workload Scheduler

Workloads that require the availability of all nodes, or most nodes, in a node pool are sensitive to evictions. In addition, nodes provisioned using Dynamic Workload Scheduler requests don't support automatic repair or upgrade operations. These operations remove all workloads from a node, which prevents the workloads from running.
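To confirm how an existing node pool is configured, you can check its management settings. This is a sketch that assumes the gcloud CLI and the NodePool API's management.autoRepair and management.autoUpgrade fields, with placeholder names:

    # Show whether auto-repair and auto-upgrade are enabled for the node pool.
    gcloud container node-pools describe dws-pool \
        --cluster=CLUSTER_NAME \
        --location=LOCATION \
        --format="value(management.autoRepair, management.autoUpgrade)"

For node pools that you create for Dynamic Workload Scheduler, both settings are expected to be disabled.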

Best practices to minimize workload disruptions

To minimize disruptions to running workloads that use Dynamic Workload Scheduler, perform the following tasks:

Limitations

  • Inter-pod anti-affinity is not supported. The cluster autoscaler doesn't consider inter-pod anti-affinity rules during node provisioning, which might leave workloads unschedulable. This situation can happen when nodes for two or more Dynamic Workload Scheduler requests are provisioned in the same node pool.
  • Only GPU nodes are supported.
  • Reservations aren't supported with Dynamic Workload Scheduler. You must specify --reservation-affinity=none when you create the node pool. Dynamic Workload Scheduler requires and supports only the ANY location policy for cluster autoscaling.
  • A single Dynamic Workload Scheduler request can create up to 1,000 virtual machines (VMs), which is the maximum number of nodes per zone for a single node pool.
  • GKE uses the Compute Engine ACTIVE_RESIZE_REQUESTS quota to control the number of Dynamic Workload Scheduler requests that are pending in a queue. By default, this quota has a limit of 100 requests per Google Cloud project. If creating a new Dynamic Workload Scheduler request would exceed this quota, the new request fails. To check how many requests are queued in a cluster, see the example after this list.
  • Node pools that use Dynamic Workload Scheduler are sensitive to disruption because the nodes are provisioned together. To learn more, see Manage disruptions in workloads that use Dynamic Workload Scheduler.
  • You might see additional short-lived VMs listed in the Google Cloud console. This behavior is expected: Compute Engine might create and then promptly remove VMs until enough capacity is available to provision all of the required machines at the same time.
  • Spot VMs aren't supported.
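To see how many Dynamic Workload Scheduler requests are queued or provisioned in a cluster, and therefore how close you are to the limits in this list, you can inspect the ProvisioningRequest objects. This sketch assumes the autoscaling.x-k8s.io ProvisioningRequest API that queued provisioning uses, and a hypothetical request named dws-request:

    # List all ProvisioningRequest objects in the cluster, across namespaces.
    kubectl get provisioningrequests --all-namespaces

    # Inspect one request, including its status conditions.
    kubectl describe provisioningrequest dws-request --namespace=default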

What's next