Manually upgrading a cluster or node pool


By default, automatic upgrades are enabled for Google Kubernetes Engine (GKE) clusters and for GKE Standard node pools.

This page explains how to manually request an upgrade or downgrade for the control plane or nodes of a GKE cluster.

To upgrade a cluster, GKE updates the version the control plane and nodes are running. Clusters are upgraded to either a newer minor version (for example, 1.24 to 1.25) or newer patch version (for example, 1.24.2-gke.100 to 1.24.5-gke.200). For more information, see GKE versioning and support.

You can learn more about how automatic and manual cluster upgrades work. You can also control when auto-upgrades can and cannot occur by configuring maintenance windows and exclusions.

New versions of GKE are announced regularly, and you can receive notice about the new versions available for each specific cluster with cluster notifications. To find specific auto-upgrade targets for clusters, get information about a cluster's upgrades (Preview).

To learn about available versions, see Versioning. To learn more about clusters, see Cluster architecture. For guidance on upgrading clusters, see Best practices for upgrading clusters.

Before you begin

Before you start, make sure you have performed the following tasks:

  • Enable the Google Kubernetes Engine API.
  • If you want to use the Google Cloud CLI for this task, install and then initialize the gcloud CLI. If you previously installed the gcloud CLI, get the latest version by running gcloud components update.

Save your data to persistent disks

Before upgrading a node pool, make sure that any data you want to keep is stored by your Pods on persistent volumes that use persistent disks. Persistent disks are unmounted, rather than erased, during upgrades, and their data is "handed off" between Pods.

The following restrictions pertain to persistent disks:

  • The nodes on which Pods are running must be Compute Engine VMs.
  • Those VMs must be in the same Compute Engine project and zone as the persistent disk.

To learn how to add a persistent disk to an existing node instance, see Adding or resizing zonal persistent disks in the Compute Engine documentation.

About upgrading

A cluster's control plane and nodes are upgraded separately.

Cluster control planes are always upgraded on a regular basis, regardless of whether your cluster is enrolled in a release channel or not.

To receive upgrade notifications proactively, refer to Receive cluster notifications.

Limitations

Alpha clusters cannot be upgraded.

Supported versions

The release notes announce when new versions become available and when older versions are no longer available. At any time, you can list all supported cluster and node versions using this command:

gcloud container get-server-config

If your cluster is enrolled in a release channel, you can upgrade to a patch version in a different release channel with the same minor version as your control plane. For example, you can upgrade your cluster from version 1.21.12-gke.1700 in the Regular channel to 1.21.13-gke.900 in the Rapid channel. For more information, refer to Running patch versions from a newer channel. All Autopilot clusters are enrolled in a release channel.
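
To see which versions a particular channel currently offers before you pick a patch version, you can filter the server configuration by channel. The following Bash function is a minimal sketch, not an official procedure: the function name list_channel_versions and the example location are illustrative assumptions.

```shell
# Print the versions that one release channel currently offers.
# Arguments: CHANNEL (RAPID, REGULAR, or STABLE) and LOCATION (for
# example us-central1). Both values are placeholders that you supply.
list_channel_versions() {
  local channel="$1" location="$2"
  gcloud container get-server-config \
    --location="$location" \
    --flatten="channels" \
    --filter="channels.channel=${channel}" \
    --format="yaml(channels.channel,channels.validVersions)"
}
```

For example, list_channel_versions RAPID us-central1 prints the Rapid channel entries, which you can compare against your control plane's minor version.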

Downgrading limitations

You can downgrade the version of your cluster to an earlier version in certain scenarios.

To mitigate an unsuccessful cluster control plane upgrade, you can downgrade your control plane to a previous patch release if the version is an earlier patch release within the same minor version. For example, if your cluster's control plane is running GKE 1.25.3-gke.400, you can downgrade the control plane to 1.25.2-gke.100, if that version is still available.

You can't downgrade a Kubernetes cluster control plane to an earlier minor version. For example, if your control plane runs GKE version 1.25, you cannot downgrade to 1.24. If you attempt to do this, the following error message appears:

ERROR: (gcloud.container.clusters.upgrade) ResponseError: code=400,
message=Master cannot be upgraded to "1.24.3-gke.100": specified version is not
newer than the current version.

You can't downgrade the minor version of a cluster's control plane, so we recommend that you test and qualify minor version upgrades with clusters in a testing environment when a new minor version becomes available but before the version becomes default. This is especially recommended if your cluster might be affected by significant changes in the next minor version, such as deprecated APIs or features being removed.

To mitigate an unsuccessful node pool upgrade, you can downgrade a node pool to an earlier patch release or minor version. Ensure that you don't downgrade nodes to a version that is more than two minor versions behind the cluster control plane version.
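
The downgrade rules in this section can be checked locally before you run a command. The following Bash helpers are a sketch under stated assumptions: the function names are hypothetical, the major version is ignored because the examples on this page all use major version 1, and sort -V (version sort, as provided by GNU coreutils) is assumed to be available. The helpers only compare version strings; they don't check whether a version is still available.

```shell
# Extract the minor version number, for example "25" from "1.25.3-gke.400".
minor_of() {
  local v="${1#*.}"   # drop the major version and the first dot
  echo "${v%%.*}"     # keep everything before the next dot
}

# Succeeds when TARGET is an earlier patch release in the same minor
# version as CURRENT, which is the only control plane downgrade allowed.
can_downgrade_control_plane() {
  local current="$1" target="$2"
  [ "$(minor_of "$current")" = "$(minor_of "$target")" ] || return 1
  [ "$current" != "$target" ] || return 1
  [ "$(printf '%s\n%s\n' "$current" "$target" | sort -V | head -n 1)" = "$target" ]
}

# Succeeds when the node version is not newer than the control plane and
# is at most two minor versions behind it.
node_version_allowed() {
  local cp node
  cp="$(minor_of "$1")"
  node="$(minor_of "$2")"
  [ "$node" -le "$cp" ] && [ $((cp - node)) -le 2 ]
}
```

For example, can_downgrade_control_plane 1.25.3-gke.400 1.25.2-gke.100 succeeds, while the same call with a 1.24 target fails.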

Upgrading the cluster

Google upgrades clusters and nodes automatically. For more control over which auto-upgrades your cluster and its nodes receive, you can enroll it in a release channel. All Autopilot clusters are automatically enrolled in a release channel.

To learn more about managing your cluster's GKE version, see Upgrades.

You can initiate a manual upgrade any time after a new version becomes available.

Manually upgrading the control plane

When you initiate a control plane upgrade, you can't modify the cluster's configuration for several minutes, until the control plane is accessible again. If you need to prevent downtime during control plane upgrades, consider using an Autopilot cluster or a regional Standard cluster. The upgrade does not affect the availability of the worker nodes that your workloads run on; they remain available during control plane upgrades.

You can manually upgrade your Autopilot or Standard control plane using the Google Cloud console or the Google Cloud CLI.

gcloud

To see the available versions for your cluster's control plane, run the following command:

gcloud container get-server-config

To upgrade to the default cluster version, run the following command:

gcloud container clusters upgrade CLUSTER_NAME --master

To upgrade to a specific version that is not the default, specify the --cluster-version flag as in the following command:

gcloud container clusters upgrade CLUSTER_NAME --master \
    --cluster-version VERSION

Replace VERSION with the version that you want to upgrade your cluster to. You can use a specific version, such as 1.18.17-gke.100, or a version alias, such as latest. For more information, see Specifying cluster version.
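
If you script the upgrade, it can help to read the current control plane version first and skip the upgrade when the target is already in place. The following Bash function is a minimal sketch: the function name and the location argument are assumptions, and the --quiet flag is used only to suppress the interactive confirmation prompt.

```shell
# Upgrade the control plane to TARGET unless it already runs that version.
# Arguments: CLUSTER_NAME, LOCATION, TARGET (a full version string).
upgrade_control_plane() {
  local cluster="$1" location="$2" target="$3" current
  # currentMasterVersion is the control plane version reported by GKE.
  current="$(gcloud container clusters describe "$cluster" \
    --location="$location" --format="value(currentMasterVersion)")"
  if [ "$current" = "$target" ]; then
    echo "Control plane already runs ${target}; nothing to do."
    return 0
  fi
  echo "Upgrading control plane from ${current} to ${target}"
  gcloud container clusters upgrade "$cluster" \
    --location="$location" --master --cluster-version="$target" --quiet
}
```

The comparison is a plain string match, so pass a full version string rather than an alias such as latest.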

Console

To manually update your cluster control plane, perform the following steps:

  1. Go to the Google Kubernetes Engine page in Google Cloud console.

  2. Click the desired cluster name.

  3. Under Cluster basics, click Upgrade Available next to Version.

  4. Select the desired version, then click Save Changes.

After upgrading a Standard control plane, you can upgrade its nodes. By default, Standard nodes created using the Google Cloud console have auto-upgrade enabled, so this happens automatically. Autopilot always upgrades nodes automatically.

Downgrading clusters

  1. Set a maintenance exclusion before downgrading to prevent GKE from automatically upgrading the control plane after you downgrade it.
  2. Downgrade the cluster control plane to an earlier patch version:

     gcloud container clusters upgrade CLUSTER_NAME \
         --master --cluster-version VERSION
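
Step 1 can be sketched with the maintenance exclusion flags of gcloud container clusters update. In the following Bash function, the function name and the exclusion name hold-after-downgrade are hypothetical, and the start and end times are values you supply as RFC 3339 timestamps (for example 2030-01-01T00:00:00Z).

```shell
# Add a maintenance exclusion that blocks upgrades between START and END.
# Arguments: CLUSTER_NAME, LOCATION, START, END.
add_no_upgrades_exclusion() {
  local cluster="$1" location="$2" start="$3" end="$4"
  gcloud container clusters update "$cluster" \
    --location="$location" \
    --add-maintenance-exclusion-name="hold-after-downgrade" \
    --add-maintenance-exclusion-start="$start" \
    --add-maintenance-exclusion-end="$end" \
    --add-maintenance-exclusion-scope="no_upgrades"
}
```

After the exclusion is in place, run the downgrade command from step 2.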
    

Disabling cluster auto-upgrades

Infrastructure security is a high priority for GKE, so control planes are upgraded on a regular basis, and these upgrades cannot be disabled. However, you can apply maintenance windows and exclusions to temporarily suspend upgrades for control planes and nodes.

Although it is not recommended, you can disable node auto-upgrade.

Check recent control plane upgrade history

For a snapshot of a cluster's recent auto-upgrade history, get information about a cluster's upgrades (Preview).

Alternatively, you can list recent operations to see when the control plane was upgraded:

gcloud container operations list --filter="TYPE:UPGRADE_MASTER AND TARGET:CLUSTER_NAME"

Replace CLUSTER_NAME with the name of your cluster.

Upgrading node pools

By default, a cluster's nodes have auto-upgrade enabled. Node auto-upgrades ensure that your cluster's control plane and node version remain in sync and in compliance with the Kubernetes version skew policy, which ensures that control planes are compatible with nodes up to two minor versions older than the control plane. For example, Kubernetes 1.29 control planes are compatible with Kubernetes 1.27 nodes.

Best practice:

Avoid disabling node auto-upgrades so that your cluster's nodes stay within the supported version skew described in the preceding paragraph.

With GKE node pool upgrades, you can choose between two configurable upgrade strategies, namely surge upgrades and blue-green upgrades.

Choose a strategy and use the parameters to tune the strategy to best fit your cluster environment's needs.

How node upgrades work

While a node is being upgraded, GKE stops scheduling new Pods onto it, and attempts to schedule its running Pods onto other nodes. This is similar to other events that re-create the node, such as enabling or disabling a feature on the node pool.

During automatic or manual node upgrades, PodDisruptionBudgets (PDBs) and Pod termination grace period are respected for a maximum of 1 hour. If Pods running on the node can't be scheduled onto new nodes after one hour, GKE initiates the upgrade anyway. This behavior applies even if you configure your PDBs to always have all of your replicas available by setting the maxUnavailable field to 0 or 0% or by setting the minAvailable field to 100% or to the number of replicas. In all of these scenarios, GKE deletes the Pods after one hour so that the node deletion can happen.
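
The PDB settings named above can be recognized with a small check. The following Bash function is an illustrative sketch with a hypothetical name: it takes the values of maxUnavailable, minAvailable, and the replica count as plain strings (pass an empty string for the field that the PDB doesn't set) and succeeds when the configuration never allows a voluntary disruption.

```shell
# Succeeds when a PDB configuration keeps every replica available, which
# means node drains wait for the full one-hour limit.
# Arguments: MAX_UNAVAILABLE, MIN_AVAILABLE, REPLICAS.
pdb_blocks_all_evictions() {
  local max_unavailable="$1" min_available="$2" replicas="$3"
  # maxUnavailable set to 0 or 0%
  case "$max_unavailable" in
    0|0%) return 0 ;;
  esac
  # minAvailable set to 100%
  case "$min_available" in
    100%) return 0 ;;
  esac
  # minAvailable equal to the number of replicas
  [ -n "$min_available" ] && [ "$min_available" = "$replicas" ]
}
```

For example, pdb_blocks_all_evictions "" 3 3 succeeds, while pdb_blocks_all_evictions 1 "" 3 fails because one Pod at a time can be evicted.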

Best practice:

If a workload requires more flexibility with graceful termination, use blue-green upgrades which provide settings for additional soak time to extend PDB checks beyond the one hour default.

To learn more about what to expect during node termination in general, see the topic about Pods.

The upgrade is only complete when all nodes have been recreated and the cluster is in the desired state. When a newly-upgraded node registers with the control plane, GKE marks the node as schedulable.

New node instances run the desired Kubernetes version.

Manually upgrade a node pool

You can manually upgrade a node pool version to match the version of the control plane or to a previous version that is still available and is compatible with the control plane. You can manually upgrade multiple node pools in parallel, whereas GKE automatically upgrades only one node pool at a time.

When you manually upgrade a node pool, GKE removes any labels you added to individual nodes using kubectl. To avoid this, apply labels to node pools instead.

Before you manually upgrade your node pool, consider the following conditions:

  • Upgrading a node pool may disrupt workloads running in that node pool. To avoid this, you can create a new node pool with the desired version and migrate the workload. After migration, you can delete the old node pool.
  • If you upgrade a node pool with an Ingress in an errored state, the instance group does not sync. To work around this issue, first check the status using the kubectl get ing command. If the instance group is not synced, re-apply the manifest that you used to create the Ingress.

You can manually upgrade your node pools to a version compatible with the control plane, using the Google Cloud console or the Google Cloud CLI.

gcloud

The following variables are used in the commands in this section:

  • CLUSTER_NAME: the name of the cluster of the node pool to be upgraded.
  • NODE_POOL_NAME: the name of the node pool to be upgraded.
  • VERSION: the Kubernetes version to which the nodes are upgraded. For example, --cluster-version=1.7.2 or --cluster-version=latest.

Upgrade a node pool:

gcloud container clusters upgrade CLUSTER_NAME \
  --node-pool=NODE_POOL_NAME

To specify a different version of GKE on nodes, use the optional --cluster-version flag:

gcloud container clusters upgrade CLUSTER_NAME \
  --node-pool=NODE_POOL_NAME \
  --cluster-version VERSION

For more information about specifying versions, see Versioning.

For more information, refer to the gcloud container clusters upgrade documentation.
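
Because manual upgrades can run on several node pools in parallel, you might start them in a loop. The following Bash function is a sketch under assumptions: the function name is hypothetical, --async is used so that each command returns without waiting for the operation to finish, and --quiet suppresses the confirmation prompt.

```shell
# Start an upgrade on each listed node pool without waiting for completion.
# Arguments: CLUSTER_NAME, VERSION, then one or more node pool names.
upgrade_node_pools() {
  local cluster="$1" version="$2" pool
  shift 2
  for pool in "$@"; do
    gcloud container clusters upgrade "$cluster" \
      --node-pool="$pool" \
      --cluster-version="$version" \
      --async --quiet
  done
}
```

For example, upgrade_node_pools CLUSTER_NAME VERSION pool-a pool-b starts two upgrade operations, where pool-a and pool-b are placeholder node pool names. You can then follow the operations with gcloud container operations list.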

Console

To upgrade a node pool using the Google Cloud console, perform the following steps:

  1. Go to the Google Kubernetes Engine page in Google Cloud console.

  2. Next to the cluster you want to edit, click Actions, then click Edit.

  3. On the Cluster details page, click the Nodes tab.

  4. In the Node Pools section, click the name of the node pool that you want to upgrade.

  5. Click Edit.

  6. Click Change under Node version.

  7. Select the desired version from the Node version drop-down list, then click Change.

It may take several minutes for the node version to change.

Downgrading node pools

You can downgrade a node pool, for example, to mitigate an unsuccessful node pool upgrade. Review the limitations before downgrading a node pool.

Best practice:

Use the blue-green node upgrade strategy if you need to optimize for risk mitigation for node pool upgrades impacting your workloads. With this strategy, you can roll back an in-progress upgrade to the original nodes if the upgrade is unsuccessful.

  1. Set a maintenance exclusion for the cluster to prevent the node pool from being automatically upgraded by GKE after being downgraded.
  2. To downgrade a node pool, specify an earlier version while following the instructions to Manually upgrade a node pool.

Changing surge upgrade parameters

To learn more about changing surge upgrade parameters, see Configure surge upgrades.

Checking node pool upgrade status

You can check the status of an upgrade using gcloud container operations.

View a list of every running and completed operation in the cluster:

gcloud container operations list

Each operation is assigned an operation ID and an operation type as well as start and end times, target cluster, and status. The list appears similar to the following example:

NAME                              TYPE                ZONE           TARGET              STATUS_MESSAGE  STATUS  START_TIME                      END_TIME
operation-1505407677851-8039e369  CREATE_CLUSTER      us-west1-a     my-cluster                          DONE    20xx-xx-xxT16:47:57.851933021Z  20xx-xx-xxT16:50:52.898305883Z
operation-1505500805136-e7c64af4  UPGRADE_CLUSTER     us-west1-a     my-cluster                          DONE    20xx-xx-xxT18:40:05.136739989Z  20xx-xx-xxT18:41:09.321483832Z
operation-1505500913918-5802c989  DELETE_CLUSTER      us-west1-a     my-cluster                          DONE    20xx-xx-xxT18:41:53.918825764Z  20xx-xx-xxT18:43:48.639506814Z

To get more information about a specific operation, specify the operation ID as shown in the following command:

gcloud container operations describe OPERATION_ID

For example:

gcloud container operations describe operation-1507325726639-981f0ed6
endTime: '20xx-xx-xxT21:40:05.324124385Z'
name: operation-1507325726639-981f0ed6
operationType: UPGRADE_CLUSTER
selfLink: https://container.googleapis.com/v1/projects/.../kubernetes-engine/docs/zones/us-central1-a/operations/operation-1507325726639-981f0ed6
startTime: '20xx-xx-xxT21:35:26.639453776Z'
status: DONE
targetLink: https://container.googleapis.com/v1/projects/.../kubernetes-engine/docs/zones/us-central1-a/clusters/...
zone: us-central1-a

Checking node pool upgrade settings

You can see details on the node upgrade strategy being used for your node pools using the gcloud container node-pools describe command. For blue-green upgrades, the command also returns the current phase of the upgrade.

Run the following command:

gcloud container node-pools describe NODE_POOL_NAME \
  --cluster=CLUSTER_NAME

Replace the following:

  • NODE_POOL_NAME: the name of the node pool to describe.
  • CLUSTER_NAME: the name of the cluster of the node pool to describe.

This command outputs the current upgrade settings. The following example shows the output if you are using the blue-green upgrade strategy.

upgradeSettings:
  blueGreenSettings:
    nodePoolSoakDuration: 1800s
    standardRolloutPolicy:
      batchNodeCount: 1
      batchSoakDuration: 10s
  strategy: BLUE_GREEN

If you are using the blue-green upgrade strategy, the output also includes details about the blue-green upgrade settings and its current intermediate phase. The following example shows what this might look like:

updateInfo:
  blueGreenInfo:
    blueInstanceGroupUrls:
    - https://www.googleapis.com/compute/v1/projects/{PROJECT_ID}/zones/{LOCATION}/instanceGroupManagers/{BLUE_INSTANCE_GROUP_NAME}
    bluePoolDeletionStartTime: {BLUE_POOL_DELETION_TIME}
    greenInstanceGroupUrls:
    - https://www.googleapis.com/compute/v1/projects/{PROJECT_ID}/zones/{LOCATION}/instanceGroupManagers/{GREEN_INSTANCE_GROUP_NAME} 
    greenPoolVersion: {GREEN_POOL_VERSION}
    phase: DRAINING_BLUE_POOL
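
If you only need the strategy and the current blue-green phase, you can ask gcloud to print just those two fields. The following Bash function is a minimal sketch: the function name and the location argument are assumptions, and the field paths are taken from the example output above.

```shell
# Print the upgrade strategy and, for blue-green upgrades, the current
# phase of a node pool as tab-separated values.
# Arguments: NODE_POOL_NAME, CLUSTER_NAME, LOCATION.
node_pool_upgrade_phase() {
  local pool="$1" cluster="$2" location="$3"
  gcloud container node-pools describe "$pool" \
    --cluster="$cluster" \
    --location="$location" \
    --format="value(upgradeSettings.strategy,updateInfo.blueGreenInfo.phase)"
}
```

When no blue-green upgrade is in progress, the phase field is expected to be empty.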

Canceling a node pool upgrade

You can cancel an upgrade at any time. To learn more about what happens when you cancel a surge upgrade, see Cancel a surge upgrade. To learn more about what happens when you cancel a blue-green upgrade, see Cancel a blue-green upgrade.

  1. Get the upgrade's operation ID:

    gcloud container operations list
    
  2. Cancel the upgrade:

    gcloud container operations cancel OPERATION_ID
    

Refer to the gcloud container operations cancel documentation.
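
The two steps above can be combined in a script. The following Bash function is a sketch under assumptions: the function name is hypothetical, and the filter keys operationType and status are assumed to match the fields shown in the gcloud container operations describe output. It cancels the first running node upgrade that it finds, so review the operation list yourself if several upgrades might be running.

```shell
# Find the first running node pool upgrade in LOCATION and cancel it.
# Returns a non-zero status when no running upgrade is found.
cancel_running_node_upgrade() {
  local location="$1" op
  op="$(gcloud container operations list \
    --location="$location" \
    --filter="operationType=UPGRADE_NODES AND status=RUNNING" \
    --format="value(name)" \
    --limit=1)"
  if [ -z "$op" ]; then
    echo "No running node pool upgrade found."
    return 1
  fi
  gcloud container operations cancel "$op" --location="$location"
}
```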

Resuming a node pool upgrade

You can resume an upgrade by manually initiating the upgrade again, specifying the target version from the original upgrade.

For example, if an upgrade failed or you canceled an ongoing upgrade, you can resume it by starting the same upgrade again on the node pool, specifying the target version from the initial upgrade operation.

To learn more about what happens when you resume an upgrade, see Resume a surge upgrade and blue-green upgrade.

To resume an upgrade, use the following command:

    gcloud container clusters upgrade CLUSTER_NAME \
      --node-pool=NODE_POOL_NAME \
      --cluster-version VERSION

Replace the following:

  • NODE_POOL_NAME: the name of the node pool for which you want to resume the node pool upgrade.
  • CLUSTER_NAME: the name of the cluster of the node pool for which you want to resume the upgrade.
  • VERSION: the target version of the canceled node pool upgrade.

For more information, refer to the gcloud container clusters upgrade documentation.

Rolling back a node pool upgrade

You can roll back a node pool to downgrade the upgraded nodes to their original state from before the node pool upgrade started.

Use the rollback command if an in-progress upgrade was canceled, the upgrade failed, or the upgrade is incomplete due to a maintenance window timing out. Alternatively, if you want to specify the version, follow the instructions to downgrade the node pool.

To learn more about what happens when you roll back a node pool upgrade, see Roll back a surge upgrade or Roll back a blue-green upgrade.

To roll back an upgrade, run the following command:

gcloud container node-pools rollback NODE_POOL_NAME \
  --cluster CLUSTER_NAME

Replace the following:

  • NODE_POOL_NAME: the name of the node pool for which to roll back the node pool upgrade.
  • CLUSTER_NAME: the name of the cluster of the node pool for which to roll back the upgrade.

Refer to the gcloud container node-pools rollback documentation.

Completing a node pool upgrade

If you are using the blue-green upgrade strategy, you can complete a node pool upgrade during the Soak phase, skipping the rest of the soak time.

To learn how completing a node pool upgrade works, see Complete a node pool upgrade.

To complete an upgrade when using the blue-green upgrade strategy, run the following command:

gcloud container node-pools complete-upgrade NODE_POOL_NAME \
  --cluster CLUSTER_NAME

Replace the following:

  • NODE_POOL_NAME: the name of the node pool for which you want to complete the upgrade.
  • CLUSTER_NAME: the name of the cluster of the node pool for which you want to complete the upgrade.

Refer to the gcloud container node-pools complete-upgrade documentation.

Known issues

If you have PodDisruptionBudget objects configured that are unable to allow any additional disruptions, node upgrades might fail to upgrade to the control plane version after repeated attempts. To prevent this failure, we recommend that you scale up the Deployment or HorizontalPodAutoscaler to allow the node to drain while still respecting the PodDisruptionBudget configuration.

To see all PodDisruptionBudget objects that do not allow any disruptions:

kubectl get poddisruptionbudget --all-namespaces -o jsonpath='{range .items[?(@.status.disruptionsAllowed==0)]}{.metadata.name}/{.metadata.namespace}{"\n"}{end}'

Although automatic upgrades might encounter the issue, the automatic upgrade process forces the nodes to upgrade. However, the upgrade takes an extra hour for every node in the istio-system namespace that violates the PodDisruptionBudget.

Troubleshooting

Resume or roll back an incomplete node pool upgrade

If GKE didn't complete a node pool upgrade, and the nodes are partially upgraded to the new version, you can either resume or roll back the upgrade. This is relevant for node pool upgrades using either node upgrade strategy, surge upgrades or blue-green upgrades.

A node pool might be partially upgraded if, for example, an in-progress upgrade was canceled, the upgrade failed, or the upgrade didn't finish before a maintenance window timed out.

Follow the instructions to resume or roll back the upgrade so that all the nodes in the node pool run a consistent version. If you do nothing, GKE eventually tries to upgrade the node pool again when there is maintenance availability.
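
To confirm that a node pool is partially upgraded, you can compare the kubelet versions that its nodes report. The following Bash functions are a minimal sketch: the function names are hypothetical, and the nodes are selected with the cloud.google.com/gke-nodepool label that GKE applies to nodes.

```shell
# Print the distinct kubelet versions reported by the nodes of one pool.
node_pool_kubelet_versions() {
  local pool="$1"
  kubectl get nodes -l "cloud.google.com/gke-nodepool=${pool}" \
    -o jsonpath='{range .items[*]}{.status.nodeInfo.kubeletVersion}{"\n"}{end}' \
    | sort -u
}

# Succeeds when the pool's nodes report more than one version, which
# suggests that an upgrade stopped partway through.
node_pool_is_partially_upgraded() {
  [ "$(node_pool_kubelet_versions "$1" | grep -c .)" -gt 1 ]
}
```

If node_pool_is_partially_upgraded NODE_POOL_NAME succeeds, resume or roll back the upgrade as described above.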

Node CPU usage higher than expected

You might encounter an issue where some nodes use more CPU than is expected from the running Pods.

This can occur if your cluster or nodes are not running a supported version. Review the release notes to ensure the versions you are using are available and supported. You can also run the following command to list all supported cluster and node versions:

gcloud container get-server-config

What's next