Troubleshoot concurrent operations


This page helps you resolve errors caused by concurrent operations in Google Kubernetes Engine (GKE).

This page is for Application developers who want to understand the cause of deployment failures, such as the "Cluster is running incompatible operation" error. It's also for Platform admins and operators who want to diagnose and resolve these concurrent operation errors at the cluster or node pool level. To learn more about common roles and example tasks that we reference in Google Cloud content, see Common GKE user roles and tasks.

Understand concurrent operation errors

In GKE, a cluster operation is an action that modifies the state of your cluster's control plane or its node pools. These operations are managed by GKE and can be initiated by you or by GKE for maintenance purposes. Common cluster operations include the following:

  • Creating or deleting the cluster.
  • Upgrading the cluster's control plane version.
  • Creating, updating, resizing, or deleting node pools.
  • Modifying cluster-level settings, such as enabling or disabling features.
  • Automatic control plane repairs initiated by GKE.

When you perform operations on your clusters, you might see error messages similar to the following:

Cluster is running incompatible operation OPERATION_NAME
Cluster is currently being created, deleted, updated or repaired and cannot be updated
Operation OPERATION_NAME is currently ACTIONING cluster CLUSTER_NAME. Please wait and try again once it is done

These errors can include the following values:

  • OPERATION_NAME: the unique ID for an operation that's already running on your cluster. Use this name to track the status of this pre-existing operation that's blocking your new operation from starting.
  • ACTIONING: the action that is being performed on the cluster. For example, Creating or Updating.
  • CLUSTER_NAME: the name of the cluster that the operation is targeting.
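
If you handle these errors in automation, you can extract the placeholder values from the message text. The following is a minimal sketch, not an official parser: the function name parse_concurrent_operation_error is hypothetical, and the patterns assume that the message wording matches the examples above exactly. The second example message carries no operation name, so the sketch returns empty fields for it.

```python
import re
from typing import Dict, Optional

# Patterns derived from the example messages above (assumed wording).
_PATTERNS = [
    re.compile(r"Cluster is running incompatible operation (?P<operation>[\w-]+)"),
    re.compile(
        r"Operation (?P<operation>[\w-]+) is currently (?P<action>\w+) "
        r"cluster (?P<cluster>[\w-]+)"
    ),
]
# This message variant does not name the blocking operation.
_NO_NAME = "Cluster is currently being created, deleted, updated or repaired"


def parse_concurrent_operation_error(message: str) -> Optional[Dict[str, Optional[str]]]:
    """Return the operation, action, and cluster found in a concurrent
    operation error, or None if the message is not one of the known forms."""
    fields: Dict[str, Optional[str]] = {"operation": None, "action": None, "cluster": None}
    for pattern in _PATTERNS:
        match = pattern.search(message)
        if match:
            fields.update(match.groupdict())
            return fields
    if _NO_NAME in message:
        # Known error, but the operation name must be looked up separately.
        return fields
    return None
```

A result whose operation field is None means that the error was recognized but you still need to look up the blocking operation yourself.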

These errors occur because GKE limits how many operations can run at the same time to prevent conflicts. Generally, GKE allows only one operation at a time at the cluster level, or one operation at a time for each node pool. GKE also performs its own automatic actions, such as control plane upgrades, which count toward this limit and can temporarily block you from starting a new task.
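
To tell whether a running operation occupies the cluster level or a specific node pool, one option is to inspect the operation's targetLink field. The following sketch assumes that the link ends in /clusters/CLUSTER_NAME for cluster-level operations and in /clusters/CLUSTER_NAME/nodePools/NODE_POOL_NAME for node pool operations. The helper name operation_scope is hypothetical.

```python
import re
from typing import Optional, Tuple

# Assumed URL shape: .../clusters/CLUSTER_NAME[/nodePools/NODE_POOL_NAME]
_TARGET = re.compile(
    r"/clusters/(?P<cluster>[^/]+)(?:/nodePools/(?P<node_pool>[^/]+))?/?$"
)


def operation_scope(target_link: str) -> Optional[Tuple[str, Optional[str]]]:
    """Return (cluster, node_pool) for a targetLink.

    node_pool is None for a cluster-level target. The function returns None
    if the link doesn't match the assumed shape.
    """
    match = _TARGET.search(target_link)
    if match is None:
        return None
    return match.group("cluster"), match.group("node_pool")
```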

Resolve concurrent operation errors

If you receive an error that indicates that another operation is in progress, identify the ongoing task and wait for it to finish:

  1. If you don't know the name of the blocking operation, list all ongoing and pending operations for your cluster:

     gcloud container operations list \
         --location=LOCATION \
         --filter '(targetLink~/clusters/CLUSTER_NAME$ OR targetLink~/clusters/CLUSTER_NAME/) AND status!=DONE' \
         --format json
    

    Replace the following:

    • LOCATION: the Compute Engine region or zone (for example, us-central1 or us-central1-a) for the cluster, depending on whether your cluster is regional or zonal.
    • CLUSTER_NAME: the name of the cluster with the failing operation.

    The output is similar to the following:

    {
       "name": "operation-0978307200000-00112233-4455-6677-8899-aabbccddeeff",
       "operationType": "UPDATE_CLUSTER",
       "selfLink": "https://container.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/operations/operation-0978307200000-00112233-4455-6677-8899-aabbccddeeff",
       "startTime": "2001-01-01T00:00:00.000000000Z",
       "status": "RUNNING",
       "targetLink": "https://container.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/clusters/CLUSTER_NAME/nodePools/NODE_POOL_NAME",
       "zone": "LOCATION"
    }

    In the output, review the name field for each operation. This value is the name of the operation that's blocking your new operation from starting. You need this value for the next step.

    For more information about the other fields in the output, see the API documentation for projects.locations.operations.

  2. Wait for the operation to complete:

    gcloud container operations wait OPERATION_NAME \
        --location=LOCATION
    

    Replace OPERATION_NAME with the name of a blocking operation from an error message or the preceding step.

    This command actively monitors the operation and exits when the operation is complete.

  3. After the blocking operation has a status of DONE, retry the operation that caused the error.
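
If you run into these errors often, you can script the first two steps. The following is a minimal sketch that wraps the same gcloud commands shown in the steps above. It assumes that the gcloud CLI is installed and authenticated, and that the --format json output is a JSON array of operations (a single JSON object is also accepted). All function names are hypothetical.

```python
import json
import subprocess
from typing import List


def build_list_command(cluster_name: str, location: str) -> List[str]:
    """Build the 'operations list' command from step 1."""
    operation_filter = (
        f"(targetLink~/clusters/{cluster_name}$ OR "
        f"targetLink~/clusters/{cluster_name}/) AND status!=DONE"
    )
    return [
        "gcloud", "container", "operations", "list",
        f"--location={location}",
        "--filter", operation_filter,
        "--format", "json",
    ]


def blocking_operation_names(list_output: str) -> List[str]:
    """Extract the names of operations that aren't DONE from the JSON output."""
    if not list_output.strip():
        return []
    operations = json.loads(list_output)
    if isinstance(operations, dict):
        operations = [operations]
    return [op["name"] for op in operations if op.get("status") != "DONE"]


def build_wait_command(operation_name: str, location: str) -> List[str]:
    """Build the 'operations wait' command from step 2."""
    return [
        "gcloud", "container", "operations", "wait", operation_name,
        f"--location={location}",
    ]


def wait_for_blocking_operations(cluster_name: str, location: str) -> List[str]:
    """List the unfinished operations for a cluster and wait for each one."""
    listing = subprocess.run(
        build_list_command(cluster_name, location),
        check=True, capture_output=True, text=True,
    )
    names = blocking_operation_names(listing.stdout)
    for name in names:
        subprocess.run(build_wait_command(name, location), check=True)
    return names
```

For example, wait_for_blocking_operations("my-cluster", "us-central1") returns the names of the operations that it waited for. After the function returns, retry the operation that caused the error, as described in step 3.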

What's next