This page describes Dynamic Workload Scheduler in Google Kubernetes Engine (GKE). Dynamic Workload Scheduler improves access to GPUs and optimizes scheduling costs.
The information on this page can help you to do the following:
- Understand how Dynamic Workload Scheduler in GKE works.
- Decide whether Dynamic Workload Scheduler is right for your use case.
- Manage disruptions by using Dynamic Workload Scheduler.
- Understand the limitations of Dynamic Workload Scheduler in GKE.
This page is intended for Platform admins and operators and for Machine learning (ML) engineers who want to ensure that accelerator infrastructure is optimized for their workloads.
Requirements
To use Dynamic Workload Scheduler in GKE, your clusters must meet the following version requirements, based on their mode of operation:
- For Standard clusters, use version 1.28.3-gke.1098000 or later.
- For Autopilot clusters, use version 1.30.3-gke.1451000 or later.
How Dynamic Workload Scheduler works
Dynamic Workload Scheduler is a resource management and Job scheduling platform that optimizes how you get access to GPUs. To use Dynamic Workload Scheduler, you configure your node pools to work with Dynamic Workload Scheduler and specify the required GPU capacity in your workloads. For more information, see Deploy GPUs for batch and AI workloads with Dynamic Workload Scheduler.
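For illustration, the following command is a minimal sketch of creating a Standard node pool that works with Dynamic Workload Scheduler. The `--reservation-affinity=none` flag and the `ANY` location policy match the requirements described in the Limitations section of this page; the `--enable-queued-provisioning` flag is what marks the node pool for Dynamic Workload Scheduler requests in recent gcloud CLI versions. The node pool name, cluster name, location, machine type, accelerator type, and node counts are placeholders, and the exact set of flags can vary by GKE and gcloud CLI version:

```
# Create a node pool that autoscales from zero and accepts
# Dynamic Workload Scheduler (queued provisioning) requests.
gcloud container node-pools create dws-pool \
    --cluster=CLUSTER_NAME \
    --location=LOCATION \
    --machine-type=a3-highgpu-8g \
    --accelerator=type=nvidia-h100-80gb,count=8 \
    --enable-queued-provisioning \
    --reservation-affinity=none \
    --location-policy=ANY \
    --enable-autoscaling \
    --num-nodes=0 \
    --total-max-nodes=10 \
    --no-enable-autorepair
```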
GKE completes the following process to automatically provision VMs when capacity becomes available:
- GKE identifies that your node pool has Dynamic Workload Scheduler enabled and that the workload can wait for an indeterminate amount of time, until all the required nodes are ready to use at the same time.
- The cluster autoscaler accepts your request and calculates the number of necessary nodes, treating them as a single unit.
- The scheduler waits until all needed resources are available in a single zone. Clusters running GKE version 1.29.1-gke.1708000 and later optimize zone selection for lower wait times; earlier versions might experience longer queueing.
- The cluster autoscaler provisions the necessary nodes when they are available at the same time, so that all Pods of the workload can run together on the newly provisioned nodes. These nodes run for a maximum of seven days, or for a shorter duration if you specify a value in the `maxRunDurationSeconds` parameter. This parameter is available in GKE version 1.28.5-gke.1355000 or later. For an example request that sets this parameter, see the sketch after this list.
- After the running time that you defined in the `maxRunDurationSeconds` parameter ends, the nodes and the Pods are preempted.
- The provisioned nodes aren't reused between workload executions. Each provisioning request triggers the creation of new nodes with a new seven-day duration. If the Pods finish sooner and the nodes are no longer utilized, the cluster autoscaler removes them according to the autoscaling profile.
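As an illustration of the `maxRunDurationSeconds` parameter, the following manifest is a minimal sketch of a ProvisioningRequest that asks for GPU capacity with a 24-hour run limit. It assumes the `autoscaling.x-k8s.io/v1beta1` API group and the `queued-provisioning.gke.io` provisioning class (newer GKE versions might serve the `v1` API); `my-provreq` and `my-pod-template` are placeholder names for the request and for a PodTemplate that describes your workload's Pods:

```
apiVersion: autoscaling.x-k8s.io/v1beta1
kind: ProvisioningRequest
metadata:
  name: my-provreq
  namespace: default
spec:
  provisioningClassName: queued-provisioning.gke.io
  parameters:
    # Preempt the nodes after 24 hours instead of the seven-day maximum.
    maxRunDurationSeconds: "86400"
  podSets:
  - count: 2
    podTemplateRef:
      name: my-pod-template
```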
GKE counts the duration for each Dynamic Workload Scheduler request at the node level. The time available for running Pods might be slightly shorter because of delays during startup. Pod retries share this duration, so less time is available for Pods after a retry. GKE counts the duration for each Dynamic Workload Scheduler request separately.
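Because the run duration is counted per request, the workload Pods are typically created by a Job that references the provisioning request, so all retries draw from the same time budget. The following Job is a hedged sketch of that pattern: the annotation keys, the toleration, and the container image are assumptions that can differ by GKE version, and `my-provreq` and `dws-pool` refer to the placeholder names used in the earlier examples:

```
apiVersion: batch/v1
kind: Job
metadata:
  name: dws-training-job
spec:
  parallelism: 2
  completions: 2
  backoffLimit: 0
  template:
    metadata:
      annotations:
        # Assumed annotation keys that tie the Pods to the ProvisioningRequest.
        autoscaling.x-k8s.io/consume-provisioning-request: my-provreq
        autoscaling.x-k8s.io/provisioning-class-name: queued-provisioning.gke.io
    spec:
      nodeSelector:
        # Schedule only on the node pool that has Dynamic Workload Scheduler enabled.
        cloud.google.com/gke-nodepool: dws-pool
      tolerations:
      # Tolerate the taint that queued-provisioning nodes carry (assumption).
      - key: cloud.google.com/gke-queued
        operator: Exists
        effect: NoSchedule
      containers:
      - name: trainer
        image: us-docker.pkg.dev/PROJECT_ID/REPO/trainer:latest  # placeholder image
        resources:
          limits:
            nvidia.com/gpu: 8
      restartPolicy: Never
```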
When to use Dynamic Workload Scheduler
We recommend that you use Dynamic Workload Scheduler if your workloads meet all of the following conditions:
- You request GPUs to run your workloads.
- You have limited or no reserved GPU capacity and you need more reliable access to GPUs.
- Your workload is time-flexible and your use case can afford to wait to get all the requested capacity, for example, when GKE allocates the GPU resources outside of the busiest hours.
- Your workload requires multiple nodes and can't start running until all GPU nodes are provisioned and ready at the same time (for example, if you are running distributed ML training workloads).
Manage disruptions in workloads that use Dynamic Workload Scheduler
Workloads that require the availability of all nodes, or most nodes, in a node pool are sensitive to evictions. In addition, nodes provisioned using Dynamic Workload Scheduler requests don't support automatic repair or upgrade operations. These operations remove all workloads from a node, which prevents the workloads from running.
Best practices to minimize workload disruptions
To minimize disruptions to running workloads that use Dynamic Workload Scheduler, perform the following tasks:
- Depending on your cluster's release channel enrollment, use the following best practices to prevent node auto-upgrades from disrupting your workloads:
  - If your cluster isn't enrolled in a release channel, disable node auto-upgrades.
  - If your cluster is enrolled in a release channel, use maintenance windows and exclusions to prevent GKE from automatically upgrading your nodes while your workload is running.
- Disable node auto-repair.
- Use maintenance windows and exclusions to minimize disruption to running workloads, while ensuring that GKE still has the time to do automatic maintenance. Be sure to schedule that time for when there are no running workloads.
- To ensure that your node pool remains up to date, manually upgrade your node pool when there are no active Dynamic Workload Scheduler requests and the node pool is empty, as shown in the sketch after this list.
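If your cluster isn't enrolled in a release channel, the node pool settings described in this list might look like the following minimal sketch; the node pool name, cluster name, and location are placeholders:

```
# Turn off node auto-upgrade and auto-repair on the Dynamic Workload Scheduler
# node pool (auto-upgrade can only be disabled outside of release channels).
gcloud container node-pools update dws-pool \
    --cluster=CLUSTER_NAME --location=LOCATION \
    --no-enable-autoupgrade
gcloud container node-pools update dws-pool \
    --cluster=CLUSTER_NAME --location=LOCATION \
    --no-enable-autorepair

# Later, manually upgrade the node pool while it has no active
# Dynamic Workload Scheduler requests and no running nodes.
gcloud container node-pools upgrade dws-pool \
    --cluster=CLUSTER_NAME --location=LOCATION
```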
Limitations
- Inter-pod anti-affinity isn't supported. The cluster autoscaler doesn't consider inter-pod anti-affinity rules during node provisioning, which might lead to unschedulable workloads. This situation might happen when nodes for two or more Dynamic Workload Scheduler objects are provisioned in the same node pool.
- Only GPU nodes are supported.
- Reservations aren't supported with Dynamic Workload Scheduler. You have to specify the `--reservation-affinity=none` flag when you create the node pool. Dynamic Workload Scheduler requires and supports only the `ANY` location policy for cluster autoscaling.
- A single Dynamic Workload Scheduler request can create up to 1,000 virtual machines (VMs), which is the maximum number of nodes per zone for a single node pool.
- GKE uses the Compute Engine `ACTIVE_RESIZE_REQUESTS` quota to control the number of Dynamic Workload Scheduler requests that are pending in a queue. By default, this quota has a limit of 100 requests per Google Cloud project. If you attempt to create a Dynamic Workload Scheduler request that would exceed this quota, the new request fails.
- Node pools that use Dynamic Workload Scheduler are sensitive to disruption because the nodes are provisioned together. To learn more, see Manage disruptions in workloads that use Dynamic Workload Scheduler.
- You might see additional short-lived VMs listed in the Google Cloud console. This behavior is intended because Compute Engine might create and then promptly remove VMs until the capacity to provision all of the required machines is available.
- Spot VMs aren't supported.