Troubleshoot permission errors in Backup for GKE


This page describes permission errors you might encounter when using Backup for GKE, things to consider when performing the action, and how to resolve the error.

Error 100010107: Failed to backup PVC - Missing IAM binding - agent service account (KCP)

Error 100010107 occurs when you try to perform a Backup for GKE backup operation and the Google Kubernetes Engine cluster service agent doesn't have access to your customer-managed encryption key (CMEK), resulting in a message stating Failed to backup PVC - Missing IAM binding - agent service account (KCP).

The Google Kubernetes Engine cluster service agent, typically in the format of service-PROJECT_NUMBER@container-engine-robot.iam.gserviceaccount.com, is essential for your GKE cluster to interact with Google Cloud services. When your backup plan uses a customer-managed encryption key (CMEK), this service agent needs permissions to encrypt and decrypt your backup data using your CMEK. If the service agent is missing the roles/cloudkms.cryptoKeyEncrypterDecrypter role on your CMEK, backup operations initiated from the cluster fail with a permission denied error.

To resolve this error, use the following troubleshooting instructions:

  1. Verify that you have the correct permissions to modify IAM policies on the Cloud Key Management Service key. For example, roles/cloudkms.admin or roles/owner.

  2. Identify the Google Kubernetes Engine cluster service agent. This service agent is automatically created and managed by Google Cloud for your GKE clusters. For example, service-PROJECT_NUMBER@container-engine-robot.iam.gserviceaccount.com. You need the project number to construct the full service agent email address. You can find your project number by using one of the following methods:

    • Use the Google Cloud project dashboard in the Google Cloud console.

    • Run the gcloud projects describe command using Google Cloud CLI:

      gcloud projects describe PROJECT_ID --format="value(projectNumber)"
      

      Replace PROJECT_ID with your project ID.

  3. Locate the following CMEK information:

    • Key name: the name of your encryption key.

    • Key ring: the name of the key ring where your key resides.

    • Location: the Google Cloud location where your key is located. For example, global or us-central1.

  4. Grant the roles/cloudkms.cryptoKeyEncrypterDecrypter role at the CMEK level. The Google Kubernetes Engine service agent needs permissions on your encryption key. To grant the roles/cloudkms.cryptoKeyEncrypterDecrypter role on your CMEK, use Google Cloud CLI to run the gcloud kms keys add-iam-policy-binding command:

    gcloud kms keys add-iam-policy-binding KEY_NAME \
        --keyring KEY_RING \
        --location LOCATION \
        --member "serviceAccount:service-PROJECT_NUMBER@container-engine-robot.iam.gserviceaccount.com" \
        --role roles/cloudkms.cryptoKeyEncrypterDecrypter
    

    Replace the following:

    • KEY_NAME: the name of your encryption key.

    • KEY_RING: the name of the key ring.

    • LOCATION: the Google Cloud location of your key. For example, global or us-central1.

    • PROJECT_NUMBER: your Google Cloud project number.

    The output is similar to the following:

    - members:
      - serviceAccount:service-123456789012@container-engine-robot.iam.gserviceaccount.com
      role: roles/cloudkms.cryptoKeyEncrypterDecrypter
    
  5. Retry the Backup for GKE operation. If the operation continues to fail, contact Cloud Customer Care for further assistance.
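
The steps above can be combined into a small helper script. The following is a minimal sketch, not an official tool: the function names are hypothetical, and it assumes that the key lives in the project that is active in your gcloud configuration. The gcloud commands themselves are the ones shown in the steps.

```shell
# Sketch: grant the GKE cluster service agent access to a CMEK.
# Function names are hypothetical; gcloud commands match the steps above.

# Print the GKE cluster service agent email for a project number.
gke_service_agent() {
  printf 'service-%s@container-engine-robot.iam.gserviceaccount.com\n' "$1"
}

# Look up the project number for a project ID (calls gcloud).
project_number() {
  gcloud projects describe "$1" --format="value(projectNumber)"
}

# Grant roles/cloudkms.cryptoKeyEncrypterDecrypter on the key (calls gcloud).
# Arguments: PROJECT_ID KEY_NAME KEY_RING LOCATION
grant_cmek_to_gke_agent() {
  number=$(project_number "$1") || return 1
  gcloud kms keys add-iam-policy-binding "$2" \
    --keyring "$3" \
    --location "$4" \
    --member "serviceAccount:$(gke_service_agent "$number")" \
    --role roles/cloudkms.cryptoKeyEncrypterDecrypter
}
```

After you source these functions, a call such as grant_cmek_to_gke_agent PROJECT_ID KEY_NAME KEY_RING LOCATION runs the same binding command as step 4, with the service agent email built from the looked-up project number.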

Error 100010106: Failed to backup PVC - Missing IAM binding for Backup for GKE service agent

Error 100010106 occurs when an attempt to back up a PersistentVolumeClaim fails due to a missing Identity and Access Management binding for your Backup for GKE service agent, resulting in an error message stating Failed to backup PVC - Missing IAM binding for Backup for GKE service agent.

Backup for GKE requires permissions to use your BackupPlan's customer-managed encryption key (CMEK) for encrypting and decrypting Persistent Disk volumes. When the Backup for GKE service agent lacks the roles/cloudkms.cryptoKeyEncrypterDecrypter role on your BackupPlan CMEK, backup operations fail.

To resolve this error, use the following instructions:

  1. Identify the Google-managed Backup for GKE service agent specific to your project. For example, service-PROJECT_NUMBER@gcp-sa-gkebackup.iam.gserviceaccount.com. You can find your project number by using one of the following methods:

    • Use the Google Cloud project dashboard in the Google Cloud console.

    • Run the gcloud projects describe command using Google Cloud CLI:

      gcloud projects describe PROJECT_ID --format="value(projectNumber)"
      

      Replace PROJECT_ID with your project ID.

  2. Identify the following CMEK details:

    • Key name: the name of your encryption key.

    • Key ring: the name of the key ring where your key resides.

    • Location: the Google Cloud location where your BackupPlan CMEK is located. For example, global or us-central1.

  3. To grant the Backup for GKE service agent the roles/cloudkms.cryptoKeyEncrypterDecrypter role on your CMEK, use Google Cloud CLI to run the gcloud kms keys add-iam-policy-binding command:

    gcloud kms keys add-iam-policy-binding KEY_NAME \
        --keyring KEY_RING \
        --location LOCATION \
        --member "serviceAccount:service-PROJECT_NUMBER@gcp-sa-gkebackup.iam.gserviceaccount.com" \
        --role roles/cloudkms.cryptoKeyEncrypterDecrypter
    

    Replace the following:

    • KEY_NAME: the name of your encryption key.

    • KEY_RING: the name of the key ring.

    • LOCATION: the Google Cloud location of your key. For example, global or us-central1.

    • PROJECT_NUMBER: your Google Cloud project number.

  4. Verify that you have the required Identity and Access Management permissions on the Cloud Key Management Service key. For example, roles/cloudkms.admin or roles/owner.

  5. Verify that the permissions were granted. In the output of the previous gcloud kms keys add-iam-policy-binding command, look for an entry similar to the following:

    - members:
      - serviceAccount:service-123456789012@gcp-sa-gkebackup.iam.gserviceaccount.com
      role: roles/cloudkms.cryptoKeyEncrypterDecrypter
    
  6. Retest the backup operation after you grant the necessary permissions. If the operation doesn't complete successfully, contact Cloud Customer Care for further assistance.
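
If the output of the binding command is no longer available, the key's IAM policy can be read back instead. The following is a sketch under stated assumptions: the function names are hypothetical, the key is assumed to be in your active gcloud project, and the --flatten and --filter expressions follow the commonly used pattern for listing the members of a single role.

```shell
# Sketch: check whether the Backup for GKE service agent holds the role.
# Function names are hypothetical.

# Print the Backup for GKE service agent email for a project number.
backup_service_agent() {
  printf 'service-%s@gcp-sa-gkebackup.iam.gserviceaccount.com\n' "$1"
}

# List members holding the Encrypter/Decrypter role on the key (calls gcloud).
# Arguments: KEY_NAME KEY_RING LOCATION
cmek_role_members() {
  gcloud kms keys get-iam-policy "$1" \
    --keyring "$2" \
    --location "$3" \
    --flatten="bindings[].members" \
    --filter="bindings.role=roles/cloudkms.cryptoKeyEncrypterDecrypter" \
    --format="value(bindings.members)"
}

# Succeed only if the Backup for GKE service agent is among those members.
# Arguments: PROJECT_NUMBER KEY_NAME KEY_RING LOCATION
backup_agent_has_cmek_role() {
  cmek_role_members "$2" "$3" "$4" |
    grep -qxF "serviceAccount:$(backup_service_agent "$1")"
}
```

A call such as backup_agent_has_cmek_role PROJECT_NUMBER KEY_NAME KEY_RING LOCATION returns a zero exit status when the binding is present, which makes it usable in an if statement before retesting the backup.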

Error 100010101: Failed to backup PersistentVolumeClaim - Missing IAM binding for tenant project

Error 100010101 occurs when an attempt to back up a PersistentVolumeClaim fails due to a missing Identity and Access Management binding for your tenant project, resulting in an error message stating Failed to backup PersistentVolumeClaim - Missing IAM binding for tenant project.

Backup for GKE operates by creating snapshots of your GKE cluster's Persistent Disk. The snapshots reside in your Google Cloud project, also known as the consumer project, and are created within a tenant project that is managed by Google Cloud. The tenant project exists within the google.com organization, separate from your own organization.

The service agent within the tenant project requires specific permissions to use the customer-managed encryption key (CMEK) that encrypts the Persistent Disk that is referenced by your cluster's PersistentVolumeClaim. This permission is needed for encrypting and decrypting the snapshot data. If the service-TENANT_PROJECT_NUMBER@compute-system.iam.gserviceaccount.com service agent lacks the roles/cloudkms.cryptoKeyEncrypterDecrypter role on your disk's CMEK, the backup operation fails.

To resolve this error, use the following instructions:

  1. Verify that you have sufficient permissions to modify IAM policies on the Cloud Key Management Service key. For example, roles/cloudkms.admin or roles/owner.

  2. Locate the tenant project's Compute Engine service agent by using the TENANT_PROJECT_NUMBER value that's in the status reason message of your failed backup operation. For example, service-TENANT_PROJECT_NUMBER@compute-system.iam.gserviceaccount.com.

  3. Locate the following CMEK information used for your encrypted Persistent Disk:

    • Key name: the name of your encryption key.

    • Key ring: the name of the key ring where your key resides.

    • Location: the Google Cloud location where your key is located. For example, global or us-central1.

  4. To grant the tenant project's Compute Engine service agent the roles/cloudkms.cryptoKeyEncrypterDecrypter role on your CMEK, run the following command:

    gcloud kms keys add-iam-policy-binding KEY_NAME \
        --keyring KEY_RING \
        --location LOCATION \
        --member "serviceAccount:service-TENANT_PROJECT_NUMBER@compute-system.iam.gserviceaccount.com" \
        --role roles/cloudkms.cryptoKeyEncrypterDecrypter
    

    Replace the following:

    • KEY_NAME: the name of your encryption key.

    • KEY_RING: the name of the key ring.

    • LOCATION: the Google Cloud location of your key. For example, global or us-central1.

    • TENANT_PROJECT_NUMBER: the tenant project number that you obtained from the status reason message of your failed backup operation.

    If the command is successful, the output looks like the following:

    - members:
      - serviceAccount:service-987654321098@compute-system.iam.gserviceaccount.com
      role: roles/cloudkms.cryptoKeyEncrypterDecrypter
    
  5. Retest the backup operation. If the operation is still unsuccessful, contact Cloud Customer Care for further assistance.
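
The CMEK details in step 3 can also be read from the disk itself. The following is a minimal sketch with hypothetical function names. It assumes a zonal Persistent Disk (a regional disk would need --region instead of --zone) and relies on the standard Cloud KMS resource name form projects/PROJECT/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY_NAME, optionally followed by a key version.

```shell
# Sketch: find the key name, key ring, and location for a disk's CMEK.
# Function names are hypothetical.

# Print the CMEK resource name that protects a zonal disk (calls gcloud).
# Arguments: DISK_NAME ZONE
disk_kms_key() {
  gcloud compute disks describe "$1" --zone "$2" \
    --format="value(diskEncryptionKey.kmsKeyName)"
}

# Split a Cloud KMS key resource name into its parts.
# Fails if the input does not have the expected form.
parse_kms_key() {
  printf '%s\n' "$1" | awk -F/ '
    $1 == "projects" && $3 == "locations" && $5 == "keyRings" &&
    $7 == "cryptoKeys" && NF >= 8 {
      printf "LOCATION=%s KEY_RING=%s KEY_NAME=%s\n", $4, $6, $8
      found = 1
    }
    END { exit !found }'
}
```

For example, parse_kms_key "$(disk_kms_key DISK_NAME ZONE)" prints the three values that the gcloud kms keys add-iam-policy-binding command in step 4 expects.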

Error 100010104: Failed to backup PersistentVolumeClaim - Org policy constraint violation while creating snapshot

Error 100010104 occurs when an attempt to back up a PersistentVolumeClaim fails due to an organization policy constraint violation during snapshot creation, resulting in an error message stating Failed to backup PersistentVolumeClaim - Org policy constraint violation while creating snapshot.

Backup for GKE operates by creating snapshots of your GKE cluster's Persistent Disk. The snapshots reside in your Google Cloud project, also known as the consumer project, and are created within a tenant project that is managed by Google Cloud. The tenant project exists within the google.com organization, separate from your own organization.

Your organization policy dictates where storage resources can be created. The Constraint constraints/compute.storageResourceUseRestrictions violated error means that a resource or snapshot is violating the policy by being created in a tenant project that isn't part of your allowed organizational structure. Because the tenant project is within Google's organization, it falls outside of your defined policy, which leads to the backup failure.

To resolve this error, use the following instructions:

  1. Locate the organization policy that implements the constraints/compute.storageResourceUseRestrictions constraint. For more information about how to view organization policies using the Google Cloud console, see Viewing organization policies.

  2. Modify the constraints/compute.storageResourceUseRestrictions policy to include the folders/77620796932 tenant project folder used by Backup for GKE in its allowlist.

  3. Save the policy changes after you add the folder to the allowlist.

  4. Retest the backup operation after the organization policy updates and propagates, which usually takes a few minutes. The backup should proceed without violating the storage resource use restrictions. If the operation is still unsuccessful, contact Cloud Customer Care for further assistance.
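
If you prefer the command line to the Google Cloud console, the steps above might be sketched as follows. Treat this as an illustration under assumptions rather than a verified procedure: the function names are hypothetical, the policy is assumed to be set at the organization level, and the allowlist entry is assumed to use the under:folders/FOLDER_ID form. Check the value format against your existing policy before you apply it.

```shell
# Sketch: allow the Backup for GKE tenant folder in the org policy.
# Function names are hypothetical; the constraint and folder come from the steps above.
CONSTRAINT="compute.storageResourceUseRestrictions"
BACKUP_TENANT_FOLDER="folders/77620796932"

# Show the current policy for the constraint on an organization (calls gcloud).
# Argument: ORGANIZATION_ID
show_storage_policy() {
  gcloud resource-manager org-policies describe "$CONSTRAINT" \
    --organization "$1"
}

# Add the tenant folder to the allowlist (calls gcloud).
# Assumes the under:folders/FOLDER_ID value form.
# Argument: ORGANIZATION_ID
allow_backup_tenant_folder() {
  gcloud resource-manager org-policies allow "$CONSTRAINT" \
    "under:$BACKUP_TENANT_FOLDER" --organization "$1"
}
```

Running show_storage_policy ORGANIZATION_ID before and after allow_backup_tenant_folder ORGANIZATION_ID lets you confirm that the folder appears in the allowed values.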