Configure Pod bursting in GKE


This page shows you how to configure Pods to burst into available unused capacity on Google Kubernetes Engine (GKE) nodes.

What is bursting?

Bursting describes the action of Pods temporarily using more compute capacity on the node than they originally requested.

Kubernetes lets you request specific capacities of resources like CPU or memory for your Pods. You set these requests in your Pod manifest. The Kubernetes scheduler places your Pods on nodes that have enough capacity to accommodate those resource requests.

Some workloads don't use 100% of the requested resources for their entire run time. For example, a workload that consumes extra CPU during its boot period might not require the same amount of resources for normal operations. In these situations, you can set the resource limits for your workload to a higher value than the resource requests or leave the limits unset. GKE allows the workload to temporarily use more resources than you specified in the requests, if that capacity is available.

For more information about how this process works in GKE, see Burstable capacity in GKE on this page.

Benefits of Pod bursting

Bursting is useful when your Pods only need additional resources for short periods of time to accommodate spikes in resource usage. Example scenarios include the following:

  • You have groups of workloads that are often idle and send a small number of requests per second, but occasionally experience spikes in traffic and would benefit from additional resources to process those requests.
  • Your workloads need more resources during startup than during normal operations.
  • You want to maximize the usage of the compute capacity that you provision.

Bursting lets you request only the resources that your Pod needs for the majority of its runtime, while also ensuring that your Pod can consume more resources if needed. The benefits of bursting include the following:

  • Lower running costs: You don't need to request the expected peak resource consumption of the workload. Your requests can be for the lower steady-state values. In Autopilot, you pay for the sum of your Pod resource requests, so your running costs are lower.
  • More efficient resource usage: You avoid idle compute capacity because your Pods burst into unused capacity. Your workloads are more likely to use all of your paid-for resources.
  • Improved performance: Pods can use extra resources as needed to reduce the time to process incoming requests, or to boot up faster during scale-up events.

When not to use bursting

Kubernetes assigns the Burstable Quality of Service (QoS) class to Pods that specify higher resource limits than their requests, or that set requests without limits. Burstable QoS Pods are more likely to be evicted when Kubernetes needs to reclaim resources on the node, so avoid bursting for Pods that can't tolerate eviction. For more information, see Burstable QoS class in the Kubernetes documentation.

Before you begin

Before you start, make sure you have performed the following tasks:

  • Enable the Google Kubernetes Engine API.
  • If you want to use the Google Cloud CLI for this task, install and then initialize the gcloud CLI. If you previously installed the gcloud CLI, get the latest version by running gcloud components update.
  • Ensure that you have a GKE Autopilot cluster running version 1.30.2-gke.1394000 or later, or any version of a GKE Standard cluster. To create a new cluster, see Create an Autopilot cluster.

Bursting availability in GKE

Workloads can burst in the following situations:

  • GKE Autopilot mode:

    Pods that use the Performance compute class or the Accelerator compute class can burst in any GKE version that supports that compute class.

    In any other compute class and for Pods that don't specify a compute class, bursting is only available if the cluster meets both of the following conditions:

    • You originally created the cluster with GKE version 1.26 or later.
    • The cluster is running GKE version 1.30.2-gke.1394000 or later.

    This restriction exists because, in Autopilot clusters, bursting requires cgroup v2. cgroup v2 is only available in clusters that were originally created with version 1.26 and later.

  • GKE Standard mode: Pods can burst in any GKE version.

Autopilot clusters that were originally created with a version earlier than 1.26 and were later upgraded to version 1.29.2-gke.1060000 or later don't support bursting. To check the original cluster version, run the following command:

gcloud container clusters describe CLUSTER_NAME \
    --location=LOCATION \
    --format="value(initialClusterVersion)"

The output must be GKE version 1.26 or later.

Limitations

  • Autopilot workloads can only use bursting for CPU and memory requests.
  • When you upgrade an Autopilot cluster to a supported version, GKE upgrades the worker nodes to match the control plane version over time. A control plane restart is required to enable bursting, and must happen after all the nodes run a supported version. The control plane restarts automatically about once a week during operations like scaling, upgrades, or maintenance.

    To trigger a control plane restart manually, do the following:

    1. Check whether all of your nodes run version 1.30.2-gke.1394000 or later:

      kubectl get nodes
      

      The output is similar to the following:

      NAME                                          STATUS   ROLES    AGE     VERSION
      gk3-ap-cluster-1-default-pool-18092e49-mllk   Ready    <none>   4m26s   v1.30.2-gke.1394000
      

      All of the nodes in the output must show the required version or later.

    2. Manually start a control plane upgrade to the same version that the cluster already uses. For instructions, see Manually upgrading the control plane.
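
The node version check in step 1 can be scripted. The following sketch assumes GNU sort (for the -V version sort option) and uses two hypothetical helper functions, gke_version_at_least and list_outdated_nodes, that aren't part of kubectl or the gcloud CLI:

```shell
# Minimum version for Autopilot bursting, taken from this page.
REQUIRED_VERSION="1.30.2-gke.1394000"

# Hypothetical helper: succeeds when GKE version $1 is at least version $2.
# Relies on GNU sort's -V (version sort) option; a leading "v" is ignored.
gke_version_at_least() {
  local have="${1#v}" want="${2#v}"
  printf '%s\n%s\n' "$want" "$have" | sort -V -C
}

# Hypothetical helper: prints the name of every node whose kubelet version
# is older than REQUIRED_VERSION. Prints nothing when all nodes qualify.
list_outdated_nodes() {
  kubectl get nodes --no-headers \
    -o custom-columns=NAME:.metadata.name,VERSION:.status.nodeInfo.kubeletVersion |
  while read -r name version; do
    gke_version_at_least "$version" "$REQUIRED_VERSION" || echo "$name"
  done
}
```

Running list_outdated_nodes prints nothing when every node runs the required version. At that point, one way to start the same-version control plane upgrade is gcloud container clusters upgrade CLUSTER_NAME --master --location=LOCATION --cluster-version=VERSION, where VERSION is the version that the control plane already runs. Confirm the exact procedure in Manually upgrading the control plane.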

Connect to the cluster

Run the following command:

gcloud container clusters get-credentials CLUSTER_NAME \
    --location=LOCATION

Replace the following:

  • CLUSTER_NAME: the name of your existing cluster.
  • LOCATION: the location of your cluster.

Deploy a burstable workload

  1. Save the following manifest as burstable-deployment.yaml:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: helloweb
      labels:
        app: hello
    spec:
      selector:
        matchLabels:
          app: hello
          tier: web
      template:
        metadata:
          labels:
            app: hello
            tier: web
        spec:
          containers:
          - name: hello-app
            image: us-docker.pkg.dev/google-samples/containers/gke/hello-app:1.0
            ports:
            - containerPort: 8080
            resources:
              requests:
                cpu: 250m
              limits:
                cpu: 350m
    

    This manifest has the following fields to enable bursting:

    • resources.requests: The resources that the container requires to function. Set this value to the capacity that your container needs in the steady state.
    • resources.limits: The maximum resource capacity that the container can use. Setting the limits higher than the requests lets Pods burst up to the specified limit if that capacity is available on the node. If you omit this field, the Pods can burst up to the available burstable capacity on the node. This capacity is calculated as follows:
      • Autopilot mode: Unused capacity in the sum of the resource requests of Pods on the node.
      • Standard mode: Unused capacity in the node resources.
    • spec.nodeSelector and spec.tolerations: Optional, and not included in the preceding manifest. Add these fields with custom labels like pod-type: "non-critical" to tell GKE to create new nodes to run the burstable Pods. GKE applies taints to these new nodes to prevent other Pods, like critical workloads, from running on the same nodes. Autopilot enforces higher minimum resource requests for Pods that use workload separation. For details, see Configure workload separation in GKE and Resource requests in Autopilot.
  2. Deploy the workload:

    kubectl apply -f burstable-deployment.yaml
    

    The workload might take a few minutes to start.

  3. Check the QoS class of a Pod:

    kubectl describe pod helloweb | grep -m 1 "QoS"
    

    The output is similar to the following:

    QoS Class: Burstable
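
The optional spec.nodeSelector and spec.tolerations fields aren't part of the burstable-deployment.yaml manifest. As a minimal sketch, the following commands write a patch file that adds them to the Pod template of the helloweb Deployment. The file name separation-patch.yaml is hypothetical, and pod-type: "non-critical" is the illustrative label from the field description, not a required name:

```shell
# Write a strategic merge patch that adds workload separation fields
# to the Pod template of the Deployment.
cat > separation-patch.yaml <<'EOF'
spec:
  template:
    spec:
      nodeSelector:
        pod-type: "non-critical"
      tolerations:
      - key: pod-type
        operator: Equal
        value: "non-critical"
        effect: NoSchedule
EOF
```

To apply the patch, you can run kubectl patch deployment helloweb --patch-file separation-patch.yaml. The toleration key and value must match the node selector label so that the Pods can run on the nodes that GKE taints for them.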
    

Burstable capacity in GKE

To facilitate Pod bursting, GKE calculates the burstable capacity for each node in a cluster. This calculation for a specific node is as follows:

  • Autopilot clusters:

    • Pods that request accelerators or request specific machine series: The node allocatable resource capacity, which is the capacity that's available for workload usage. For details, see Node allocatable resources.
    • All other Pods: The sum of resource requests of all Pods on that node, regardless of the actual resource capacity of the node. If a Pod is terminated, the burstable capacity reduces by that Pod's requests. The portion of the burstable capacity that isn't in use by running Pods is available to allocate if one of the Pods needs to burst.

    Autopilot also adds a predefined buffer to the burstable capacity so that any system Pods on the node that burst beyond their requests don't affect your own burstable Pods.

  • Standard clusters: The node allocatable resource capacity, which is the capacity that's available for workload usage. For details, see Node allocatable resources.
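
For Autopilot Pods that don't request accelerators or a specific machine series, the CPU part of this calculation is a sum of requests. The following sketch approximates that sum for one node. The helper names to_millicores, sum_cpu_requests, and node_cpu_requests are hypothetical, and the result ignores init containers and the predefined Autopilot buffer, so treat it as an estimate only:

```shell
# Hypothetical helper: converts a Kubernetes CPU quantity such as "250m",
# "2", or "0.5" to whole millicores.
to_millicores() {
  case "$1" in
    *m) echo "${1%m}" ;;
    *)  awk -v cores="$1" 'BEGIN { printf "%d\n", cores * 1000 + 0.5 }' ;;
  esac
}

# Hypothetical helper: reads one CPU quantity per line from stdin and prints
# the total in millicores. Blank lines (containers without requests) are skipped.
sum_cpu_requests() {
  local total=0 quantity
  while read -r quantity; do
    [ -n "$quantity" ] || continue
    total=$((total + $(to_millicores "$quantity")))
  done
  echo "$total"
}

# Hypothetical helper: prints the CPU request of every container that runs
# on the node named in $1, one quantity per line.
node_cpu_requests() {
  kubectl get pods --all-namespaces \
    --field-selector "spec.nodeName=$1" \
    -o jsonpath='{range .items[*]}{range .spec.containers[*]}{.resources.requests.cpu}{"\n"}{end}{end}'
}
```

For example, node_cpu_requests NODE_NAME | sum_cpu_requests prints the estimated total in millicores, where NODE_NAME is a name from the output of kubectl get nodes.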

Best practices for bursting

Use the following practices with Pod bursting:

  • Set your resource requests equal to your limits for any Pods that provide critical functionality in your environment. This ensures that those Pods get the Guaranteed Kubernetes Quality of Service (QoS) class.
  • Ensure that you only configure memory bursting on Pods that can handle being evicted when Kubernetes needs to reclaim memory on the node.
  • Always request enough memory for your Pod to boot up. Don't rely on memory bursting to meet your boot requirements.
  • To prevent burstable Pods that consistently burst into multiples of their CPU requests from potentially disrupting critical workloads, use workload separation to avoid placing those Pods alongside your critical Pods.
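
The first practice can be sketched as follows. For a Pod to get the Guaranteed QoS class, every container must set CPU and memory requests that equal its limits. The Pod name critical-app, the file name guaranteed-pod.yaml, and the resource values are illustrative assumptions:

```shell
# Write a Pod manifest in which requests equal limits for both CPU and memory.
cat > guaranteed-pod.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: critical-app   # hypothetical name
spec:
  containers:
  - name: hello-app
    image: us-docker.pkg.dev/google-samples/containers/gke/hello-app:1.0
    resources:
      requests:
        cpu: 250m
        memory: 512Mi
      limits:
        cpu: 250m
        memory: 512Mi
EOF
```

After you run kubectl apply -f guaranteed-pod.yaml, you can confirm the class with kubectl get pod critical-app -o jsonpath='{.status.qosClass}'. If the requests and limits in the running Pod are still equal, the command reports Guaranteed.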

Optimize burstable capacity in Autopilot nodes

Autopilot calculates the burstable capacity as the sum of resource requests of all the Pods on a specific node, including system Pods and DaemonSets. You can optimize the burstable capacity on a node in the following ways. However, bursting is opportunistic and isn't guaranteed.

  • To increase the burstable capacity on nodes for specific workloads, use Pod affinity to place specific Pods together on the same node.
  • To ensure that a specific burstable capacity is always available on every node, create DaemonSets to run on all nodes in the cluster.
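
As a sketch of the Pod affinity approach, the following commands write a patch that asks the scheduler to prefer nodes that already run Pods with the app: hello label. The file name affinity-patch.yaml and the choice of a preferred rule instead of a required rule are assumptions for illustration:

```shell
# Write a strategic merge patch that adds a preferred Pod affinity rule
# to the Pod template of a Deployment.
cat > affinity-patch.yaml <<'EOF'
spec:
  template:
    spec:
      affinity:
        podAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  app: hello
              topologyKey: kubernetes.io/hostname
EOF
```

To apply the patch, you can run kubectl patch deployment DEPLOYMENT_NAME --patch-file affinity-patch.yaml. A preferred rule is a hint only, so the scheduler can still place Pods on other nodes, which is consistent with bursting being opportunistic.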

Example of how bursting works

This section uses an example Deployment that has the following burstable Pods to demonstrate how Pod bursting works in GKE Autopilot clusters:

  • Pod 1 requests 250m CPU and has no CPU limit. Pod 1 uses 100m CPU to run.
  • Pod 2 requests 200m CPU and has a 250m CPU limit. Pod 2 uses 100m CPU to run.

Both Pods run on the same node. The total burstable capacity on the node is 450m CPU (the sum of resource requests). Each Pod only uses 100m CPU to run, which means that the node has a remaining available burstable capacity of 250m CPU (450m minus 200m).

Consider the following scenarios in which a traffic spike occurs. Each scenario is independent and starts with 250m CPU of available burstable capacity:

  • Pod 1 needs an additional 300m CPU: it can burst and use 250m CPU, which is the available burstable capacity. The node no longer has any available burstable capacity.
  • Pod 2 needs an additional 150m CPU: it can burst and use an extra 150m CPU. The node then has 100m CPU remaining of available burstable capacity.
  • Pod 2 needs an additional 200m CPU: it can burst and use 150m CPU, which brings the total usage to 250m CPU for Pod 2. Pod 2 has a 250m CPU limit and can't burst beyond that limit.
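
The arithmetic in these scenarios can be reproduced with a short script. This is a simplified model of the example on this page, not a description of how the node actually schedules CPU time. The burst_grant function is hypothetical:

```shell
# Hypothetical helper: prints how many extra millicores a Pod can burst into.
#   $1 = extra CPU that the Pod wants
#   $2 = CPU that the Pod currently uses
#   $3 = the Pod's CPU limit, or "none" when no limit is set
#   $4 = burstable capacity that is still available on the node
burst_grant() {
  local wanted="$1" usage="$2" limit="$3" available="$4"
  local grant="$wanted"
  # Never grant more than the node has available.
  [ "$grant" -le "$available" ] || grant="$available"
  # Never let usage exceed the Pod's own limit, if one is set.
  if [ "$limit" != "none" ] && [ $((usage + grant)) -gt "$limit" ]; then
    grant=$((limit - usage))
  fi
  echo "$grant"
}

capacity=$((250 + 200))              # sum of CPU requests: 450m
available=$((capacity - 100 - 100))  # each Pod uses 100m: 250m remains

burst_grant 300 100 none "$available"  # Pod 1 needs 300m more
burst_grant 150 100 250 "$available"   # Pod 2 needs 150m more
burst_grant 200 100 250 "$available"   # Pod 2 needs 200m more
```

The three calls print 250, 150, and 150, which match the extra CPU that each Pod gets in the corresponding scenario.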

How GKE handles Pods that exceed burstable capacity

If your burstable Pods try to use more resources than the burstable capacity on the node, GKE takes the following actions:

  • CPU: If the CPU usage exceeds the burstable capacity, GKE throttles the CPU usage of some containers so that all containers on the node get the CPU that they request.
  • Memory: If the memory usage exceeds the burstable capacity, GKE terminates containers to reclaim memory on the node. GKE starts by terminating resource-intensive containers in Pods with a lower QoS.

We recommend that you always request enough memory for normal Pod operation. If a container depends on memory bursting to function normally, it might crash repeatedly if that memory isn't available.

Use Pod bursting with spare capacity provisioning

GKE lets you deploy idle Pods to reserve extra compute capacity for faster Pod scaling during future high-traffic events like online store flash sales. Other Pods on the same node can burst into this unused reserved capacity so that the capacity isn't idle in the time leading up to your high-traffic event. You can reserve this capacity by using various Kubernetes mechanisms. For example, you can deploy Pods that have a low PriorityClass. For details, see Provision extra compute capacity for rapid Pod scaling.
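
As a rough sketch of this approach, the following commands write a manifest that defines a low-priority PriorityClass and a placeholder Deployment that uses it. All names, the replica count, the priority value, the pause image tag, and the CPU request are illustrative assumptions. For the recommended configuration, see Provision extra compute capacity for rapid Pod scaling:

```shell
# Write a PriorityClass and a placeholder Deployment that reserves capacity.
cat > capacity-placeholder.yaml <<'EOF'
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: low-priority            # hypothetical name
value: -10                      # lower than the default priority of 0
preemptionPolicy: Never         # placeholder Pods never preempt other Pods
globalDefault: false
description: "Placeholder Pods that higher-priority Pods can preempt."
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: capacity-placeholder    # hypothetical name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: capacity-placeholder
  template:
    metadata:
      labels:
        app: capacity-placeholder
    spec:
      priorityClassName: low-priority
      terminationGracePeriodSeconds: 0
      containers:
      - name: pause
        image: registry.k8s.io/pause:3.9
        resources:
          requests:
            cpu: 500m           # capacity to reserve per placeholder Pod
EOF
```

You can deploy the manifest with kubectl apply -f capacity-placeholder.yaml. While the placeholder Pods sit idle, their CPU requests count toward the capacity that other Pods on the same node can burst into.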

Pod bursting in GKE Standard clusters

GKE Standard clusters also support Pod bursting when you set limits higher than requests or omit limits. However, in Standard clusters, you must create and configure node pools with appropriate resource capacity to support bursting. Realizing the potential cost reduction of burstable Pods in Standard clusters requires more careful node planning and Pod bin-packing, because you pay for the underlying Compute Engine VMs.

Consider the following in Standard clusters:

  • The maximum resource consumption limit that triggers Kubernetes eviction or CPU throttling is the allocatable resource capacity on the node. To determine this value, see Plan GKE Standard node sizes.

  • Node resource usage in Standard clusters is more likely to reach a Kubernetes eviction threshold because GKE doesn't automatically limit resource consumption if you don't specify limits. Pods that burst into memory are therefore more likely to be terminated by Kubernetes node-pressure eviction.

What's next