This page explains what Spot VMs are and how they work in Google Kubernetes Engine (GKE). To learn how to use Spot VMs, refer to Use Spot VMs.
Overview of Spot VMs in GKE
Spot VMs are Compute Engine virtual machine (VM) instances that are priced lower than standard Compute Engine VMs and provide no guarantee of availability. Spot VMs offer the same machine types and options as standard VMs.
You can use Spot VMs in your clusters and node pools to run stateless, batch, or fault-tolerant workloads that can tolerate disruptions caused by the ephemeral nature of Spot VMs.
Spot VMs remain available until Compute Engine requires the resources for standard VMs.
To learn more about Spot VMs, see Spot VMs in the Compute Engine documentation.
Benefits of Spot VMs
Spot VMs and preemptible VMs share many benefits, including the following:
- Lower pricing than standard Compute Engine VMs.
- Useful for stateless, fault-tolerant workloads that are resilient to the ephemeral nature of these VMs.
- Works with the cluster autoscaler and node auto-provisioning.
In contrast to preemptible VMs, which expire after 24 hours, Spot VMs have no expiration time. Spot VMs are only terminated when Compute Engine needs the resources elsewhere.
How Spot VMs work in GKE
When you create a cluster or node pool with Spot VMs, GKE creates underlying Compute Engine Spot VMs that behave like a managed instance group (MIG). Nodes that use Spot VMs behave like standard GKE nodes, but with no guarantee of availability. When the resources used by Spot VMs are required to run standard VMs, Compute Engine terminates those Spot VMs to use the resources elsewhere.
Termination and graceful shutdown of Spot VMs
When Compute Engine needs to reclaim the resources used by Spot VMs, a termination notice is sent to GKE. Spot VMs terminate 30 seconds after receiving a termination notice.
By default, clusters use graceful node shutdown. The kubelet detects the termination notice and gracefully terminates Pods that are running on the node. If the Pods are part of a managed workload, such as a Deployment, the controller creates and schedules new Pods to replace the terminated Pods.
On a best-effort basis, the kubelet grants a graceful termination period of 15 seconds for non-system Pods, after which system Pods (with the system-cluster-critical or system-node-critical priorityClasses) have 15 seconds to gracefully terminate. During graceful node termination, the kubelet updates the status of the Pods and assigns a Failed phase and a Terminated reason to the terminated Pods.
The VM shuts down 30 seconds after the termination notice is sent even if you specify a value greater than 15 seconds in the terminationGracePeriodSeconds field of your Pod manifest.
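As an illustration of the timing described above, the following is a minimal sketch of a Pod that keeps its grace period within the 15-second window and exits when it receives SIGTERM. The Pod name, container name, image, and shell loop are placeholders chosen for this example, not values from GKE:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: spot-worker              # hypothetical name
spec:
  # Values above 15 seconds do not extend the time the Pod actually gets
  # on a Spot VM, because the VM shuts down 30 seconds after the notice.
  terminationGracePeriodSeconds: 15
  nodeSelector:
    cloud.google.com/gke-spot: "true"
  containers:
  - name: worker                 # hypothetical container
    image: busybox:1.36          # placeholder image
    command:
    - sh
    - -c
    - |
      # Exit cleanly on SIGTERM instead of waiting to be killed.
      trap 'echo "SIGTERM received, flushing state"; exit 0' TERM
      while true; do sleep 1; done
```

A real workload would replace the loop with its own shutdown logic, such as checkpointing progress or closing connections, and should be tested to confirm that it finishes inside the window.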
When the number of terminated Pods reaches a threshold of 1000 for clusters with fewer than 100 nodes or 5000 for clusters with 100 nodes or more, garbage collection cleans up the Pods.
You can also delete terminated Pods manually using the following commands:
kubectl get pods --all-namespaces | grep -i NodeShutdown | awk '{print $1, $2}' | xargs -n2 kubectl delete pod -n
kubectl get pods --all-namespaces | grep -i Terminated | awk '{print $1, $2}' | xargs -n2 kubectl delete pod -n
Scheduling workloads on Spot VMs
GKE automatically adds the cloud.google.com/gke-spot=true label to nodes that use Spot VMs. Nodes running GKE version 1.25.5-gke.2500 or later also get the cloud.google.com/gke-provisioning=spot label. You can schedule specific Pods on nodes that use Spot VMs by using the nodeSelector field in your Pod spec. The following examples use the cloud.google.com/gke-spot label:
apiVersion: v1
kind: Pod
spec:
  nodeSelector:
    cloud.google.com/gke-spot: "true"
Alternatively, you can use node affinity to tell GKE to schedule Pods on Spot VMs, similar to the following example:
apiVersion: v1
kind: Pod
spec:
...
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: cloud.google.com/gke-spot
            operator: In
            values:
            - "true"
...
You can also use nodeAffinity.preferredDuringSchedulingIgnoredDuringExecution
to prefer that GKE places Pods on nodes that use Spot VMs.
Preferring Spot VMs is not recommended, because GKE might
schedule the Pods onto existing viable nodes that use standard VMs instead.
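For comparison with the required rule, a preferred rule might look like the following sketch. The Pod name, container, image, and weight are illustrative assumptions. Because this is only a preference, the scheduler can still place the Pod on a standard VM, which is why this form is not recommended when the workload must run on Spot VMs:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: prefers-spot             # hypothetical name
spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100              # valid range is 1-100; higher means stronger preference
        preference:
          matchExpressions:
          - key: cloud.google.com/gke-spot
            operator: In
            values:
            - "true"
  containers:
  - name: app                    # hypothetical container
    image: busybox:1.36          # placeholder image
    command: ["sleep", "3600"]
```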
Using taints and tolerations for scheduling
To avoid system disruptions, use a node taint to ensure that GKE doesn't schedule critical workloads onto Spot VMs. When you taint nodes that use Spot VMs, GKE only schedules Pods that have the corresponding toleration onto those nodes.
If you use node taints, ensure that your cluster also has at least one node pool that uses standard Compute Engine VMs. Node pools that use standard VMs provide a reliable place for GKE to schedule critical system components like DNS.
For information on using a node taint for Spot VMs, see Use taints and tolerations for Spot VMs.
Using Spot VMs with GPU node pools
Spot VMs support using GPUs.
When you create a new GPU node pool, GKE automatically adds the nvidia.com/gpu=present:NoSchedule taint to the new nodes. Only Pods with the corresponding toleration can run on these nodes. GKE automatically adds this toleration to Pods that request GPUs.
Your cluster must have at least one existing non-GPU node pool that uses standard VMs before you create a GPU node pool that uses Spot VMs. If your cluster only has a GPU node pool with Spot VMs, GKE doesn't add the nvidia.com/gpu=present:NoSchedule taint to those nodes. As a result, GKE might schedule system workloads onto the GPU node pools with Spot VMs, which can lead to disruptions when the Spot VMs are reclaimed and can increase your costs because GPU nodes are more expensive than non-GPU nodes.
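A Pod that targets a GPU node pool on Spot VMs could be sketched as follows. The Pod name, container name, image, and command are placeholders for this example. The manifest declares no GPU toleration, on the basis that GKE adds it automatically to Pods that request GPUs:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-on-spot              # hypothetical name
spec:
  restartPolicy: OnFailure       # the example command runs once and exits
  nodeSelector:
    cloud.google.com/gke-spot: "true"
  containers:
  - name: cuda-job               # hypothetical container
    image: nvidia/cuda:12.2.0-base-ubuntu22.04   # placeholder CUDA image
    command: ["nvidia-smi"]
    resources:
      limits:
        nvidia.com/gpu: 1        # requesting a GPU triggers the automatic toleration
```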
Cluster autoscaler and node auto-provisioning
You can use the cluster autoscaler and node auto-provisioning to automatically scale your clusters and node pools based on the demands of your workloads. Both the cluster autoscaler and node auto-provisioning support using Spot VMs.
Spot VMs and node auto-provisioning
Node auto-provisioning automatically creates and deletes node pools in your cluster to meet the demands of your workloads. When you schedule workloads that require Spot VMs by using a nodeSelector or node affinity, node auto-provisioning creates new node pools to accommodate the workloads' Pods. GKE automatically adds the cloud.google.com/gke-spot=true:NoSchedule taint to nodes in the new node pools. Only Pods with the corresponding toleration can run on nodes in those node pools. You must add the corresponding toleration to your deployments to allow GKE to place the Pods on Spot VMs:
tolerations:
- key: cloud.google.com/gke-spot
  operator: Equal
  value: "true"
  effect: NoSchedule
You can ensure that GKE only schedules your Pods on Spot VMs by using both a toleration and either a nodeSelector or node affinity rule to filter for Spot VMs.
If you schedule a workload using only a toleration, GKE can schedule the Pods onto either Spot VMs or existing standard VMs with capacity. If you require a workload to be scheduled on Spot VMs, use a nodeSelector or a node affinity in addition to a toleration. To learn more, see Scheduling workloads on Spot VMs.
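Putting the two pieces together, a Deployment that both tolerates the Spot taint and selects Spot nodes might look like this sketch. The Deployment name, labels, replica count, and container are hypothetical. The toleration and the nodeSelector use the key and value described in this section:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spot-only-batch          # hypothetical name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: spot-only-batch
  template:
    metadata:
      labels:
        app: spot-only-batch
    spec:
      # The nodeSelector restricts the Pods to Spot VM nodes.
      nodeSelector:
        cloud.google.com/gke-spot: "true"
      # The toleration lets the Pods run on nodes that carry the Spot taint.
      tolerations:
      - key: cloud.google.com/gke-spot
        operator: Equal
        value: "true"
        effect: NoSchedule
      containers:
      - name: worker             # hypothetical container
        image: busybox:1.36      # placeholder image
        command: ["sleep", "3600"]
```

With only the tolerations block, the same Pods could also land on standard VMs that have spare capacity.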
Spot VMs and cluster autoscaler
Cluster autoscaler automatically adds and removes nodes in your node pools based on demand. You can configure cluster autoscaler to add new nodes with a preference for Spot VMs. To learn more, see Spot VMs and cluster autoscaler.
Default policy
Starting in GKE version 1.24.1-gke.800, you can define the autoscaler location policy. Cluster autoscaler attempts to provision Spot VM node pools when resources are available, and the default location policy for these node pools is ANY. With this policy, Spot VMs have a lower risk of being preempted. For other VM types, the default cluster autoscaler location policy is BALANCED.
Upgrade Standard node pools using Spot VMs
If your Standard cluster node pools using Spot VMs are configured to use surge upgrades, GKE creates surge nodes with Spot VMs. However, GKE doesn't wait for the Spot VMs to be ready before cordoning and draining the existing nodes, as Spot VMs provide no guarantee of availability. To learn more, see Surge upgrades.
Modifications to Kubernetes behavior
Using Spot VMs on GKE modifies some guarantees and constraints that Kubernetes provides, such as the following:
- Reclamation of Spot VMs is involuntary and is not covered by the guarantees of PodDisruptionBudgets. You might experience greater unavailability than your configured PodDisruptionBudget allows.
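To make the limitation concrete, consider a PodDisruptionBudget like the following sketch, in which the name, label selector, and minAvailable value are hypothetical. Kubernetes honors this budget for voluntary disruptions such as node drains, but reclamation of Spot VMs can still take the number of available Pods below minAvailable:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: spot-workers-pdb         # hypothetical name
spec:
  minAvailable: 2                # applies to voluntary evictions only
  selector:
    matchLabels:
      app: spot-workers          # hypothetical label on the protected Pods
```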
Best practices for Spot VMs
When designing a system that uses Spot VMs, you can avoid major disruptions by using the following guidelines:
- Spot VMs have no availability guarantees. Design your systems under the assumption that Compute Engine might reclaim any or all of your Spot VMs at any time, with no guarantee of when new instances become available.
- To ensure that your workloads and Jobs are processed even when no Spot VMs are available, ensure that your clusters have a mix of node pools that use Spot VMs and node pools that use standard Compute Engine VMs.
- Ensure that your cluster has at least one non-GPU node pool that uses standard VMs before you add a GPU node pool that uses Spot VMs.
- While the node names do not usually change when nodes are recreated, the internal and external IP addresses used by Spot VMs might change after recreation.
- Use node taints and tolerations to ensure that critical Pods aren't scheduled onto node pools that use Spot VMs.
- To run stateful workloads on Spot VMs, test to ensure that your workloads can gracefully terminate within the 15-second grace period that non-system Pods receive at shutdown, to minimize the risk of persistent volume data corruption.
- Follow the Kubernetes Pod termination best practices.
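In addition to the taint-based guidance above, one possible way to keep a critical workload off Spot VMs is a node affinity rule that excludes nodes carrying the Spot label. The following sketch assumes that only Spot VM nodes carry the cloud.google.com/gke-spot label. The Deployment name, labels, and container are hypothetical:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: critical-api             # hypothetical name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: critical-api
  template:
    metadata:
      labels:
        app: critical-api
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              # Match only nodes that do NOT have the Spot label.
              - key: cloud.google.com/gke-spot
                operator: DoesNotExist
      containers:
      - name: api                # hypothetical container
        image: busybox:1.36      # placeholder image
        command: ["sleep", "3600"]
```

This rule requires the cluster to have at least one node pool that uses standard VMs. Otherwise the Pods stay pending.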
What's next
- Learn how to use Spot VMs in your node pools.
- Learn about autoscaling your clusters.
- Learn how to scale your deployed apps.
- Learn more about Spot VMs in the Compute Engine documentation.
- Take a tutorial about deploying a batch workload using Spot VMs in GKE.