Troubleshoot 4xx errors


This page helps you resolve 400, 401, 403, and 404 errors that you might encounter when using Google Kubernetes Engine (GKE).

Issue: Authentication and authorization errors

When connecting to GKE clusters, you can get an authentication and authorization error with HTTP status code 401 (Unauthorized). This issue might occur when you try to run a kubectl command in your GKE cluster from a local environment.

The cause of this issue might be one of the following:

  • The gke-gcloud-auth-plugin authentication plugin is not correctly installed or configured.
  • You lack the permissions to connect to the cluster API server and run kubectl commands.

To diagnose the cause, complete the steps in the following sections:

  1. Connect to the cluster using curl
  2. Configure the plugin in kubeconfig

Connect to the cluster using curl

To diagnose the cause of the authentication and authorization error, connect to the cluster using curl. Using curl bypasses the kubectl command-line tool and the gke-gcloud-auth-plugin plugin.

  1. Set environment variables:

    APISERVER=https://$(gcloud container clusters describe CLUSTER_NAME \
        --location=COMPUTE_LOCATION --format "value(endpoint)")
    TOKEN=$(gcloud auth print-access-token)
    
  2. Verify that your access token is valid:

    curl "https://oauth2.googleapis.com/tokeninfo?access_token=$TOKEN"
    

    This command sends a request to Google's OAuth 2.0 server. If your access token is valid, the server responds with information about the token.

  3. Try to connect to the core API endpoint in the API server:

    # Get cluster CA certificate
    gcloud container clusters describe CLUSTER_NAME \
        --location=COMPUTE_LOCATION \
        --format "value(masterAuth.clusterCaCertificate)" | \
        base64 -d > /tmp/ca.crt
    
    # Make API call with authentication and CA certificate
    curl -s -X GET "${APISERVER}/api/v1/namespaces" \
        --header "Authorization: Bearer $TOKEN" \
        --cacert /tmp/ca.crt
    

    If the curl command succeeds, you'll see a list of namespaces. Proceed to check whether the plugin is the cause using the steps in the Configure the plugin in kubeconfig section.

    If the curl command fails with an output that is similar to the following, then you don't have the correct permissions to access the cluster:

    {
      "kind": "Status",
      "apiVersion": "v1",
      "metadata": {},
      "status": "Failure",
      "message": "Unauthorized",
      "reason": "Unauthorized",
      "code": 401
    }
    

    To resolve this issue, consult your administrator to get the correct permissions to access the cluster.
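
The decision logic in the preceding steps can be summarized in a small helper. The following is a minimal sketch, not part of any official tooling: the function name interpret_status is hypothetical, and it assumes that you captured the HTTP status code of the API server request by adding curl's -o /dev/null and -w "%{http_code}" options to the command in step 3.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map the HTTP status code returned by the API server
# to the next troubleshooting step described on this page.
interpret_status() {
  case "$1" in
    200) echo "curl succeeded: check the plugin configuration in kubeconfig" ;;
    401) echo "unauthorized: ask your administrator for access to the cluster" ;;
    *)   echo "unexpected status $1: inspect the full response body" ;;
  esac
}

# Example usage (assumes APISERVER, TOKEN, and /tmp/ca.crt from the steps above):
#   status=$(curl -s -o /dev/null -w "%{http_code}" \
#       "${APISERVER}/api/v1/namespaces" \
#       --header "Authorization: Bearer $TOKEN" \
#       --cacert /tmp/ca.crt)
#   interpret_status "$status"
```

Capturing only the status code keeps the check scriptable. Rerun the request without -o /dev/null when you need the full response body.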

Configure the plugin in kubeconfig

If you're getting authentication and authorization errors when connecting to your clusters but were able to connect to the cluster using curl, then ensure that you can access your cluster without needing the gke-gcloud-auth-plugin plugin.

To resolve this issue, configure your local environment to ignore the gke-gcloud-auth-plugin binary when authenticating to the cluster. In Kubernetes clients running version 1.25 and later, the gke-gcloud-auth-plugin binary is required, so you need to use a version of 1.24 or earlier for the kubectl command-line tool.

Follow these steps to access your cluster without needing the plugin:

  1. Install the kubectl command-line tool with version 1.24 or earlier using curl. The following example installs the tool with version 1.24:

    curl -LO https://dl.k8s.io/release/v1.24.0/bin/linux/amd64/kubectl
    
  2. Open your shell startup script file in a text editor. For example, open .bashrc for the Bash shell:

    vi ~/.bashrc
    

    If you are using macOS, use ~/.bash_profile instead of .bashrc in these instructions.

  3. Add the following line to the startup script file and save it:

    export USE_GKE_GCLOUD_AUTH_PLUGIN=False
    
  4. Run the startup script:

    source ~/.bashrc
    
  5. Get credentials for your cluster, which sets up your .kube/config file:

    gcloud container clusters get-credentials CLUSTER_NAME \
        --location=COMPUTE_LOCATION
    

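    Replace the following:

      • CLUSTER_NAME: the name of your cluster.
      • COMPUTE_LOCATION: the Compute Engine location of your cluster.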

  6. Run a kubectl command. For example:

    kubectl cluster-info
    

    If you get a 401 error or a similar authorization error after running these commands, ensure that you have the correct permissions, then rerun the step that returned the error.
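
The version requirement in these steps is easy to get wrong, so it can help to check a kubectl version string against the 1.25 threshold stated above. The following is a sketch only, and the function name needs_auth_plugin is hypothetical. The usage comment also assumes that the binary downloaded in step 1 still needs to be made executable.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit 0) if the given kubectl version string,
# such as "v1.26.3", is 1.25 or later and therefore requires the plugin.
needs_auth_plugin() {
  local version="${1#v}"          # strip the leading "v"
  local major="${version%%.*}"    # text before the first dot
  local rest="${version#*.}"      # text after the first dot
  local minor="${rest%%.*}"       # text before the next dot
  [ "$major" -gt 1 ] || { [ "$major" -eq 1 ] && [ "$minor" -ge 25 ]; }
}

# Example usage after downloading the binary in step 1:
#   chmod +x ./kubectl
#   if needs_auth_plugin "v1.24.0"; then
#     echo "this client requires gke-gcloud-auth-plugin"
#   else
#     echo "this client can authenticate without the plugin"
#   fi
```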

Error 400: Node pool requires recreation

The following error can occur when you try to perform an action that recreates your control plane and nodes:

ERROR: (gcloud.container.clusters.update) ResponseError: code=400, message=Node pool "test-pool-1" requires recreation.

For example, this error can occur when you complete an ongoing credential rotation.

On the backend, node pools are marked for recreation, but the actual recreation operation might take some time to begin. As a result, the operation fails if GKE has not yet recreated one or more node pools in your cluster.

To resolve this issue, choose one of the following solutions:

  • Wait for the recreation to happen. This might take hours, days, or weeks depending on factors such as existing maintenance windows and exclusions.
  • Manually start a recreation of the affected node pools by starting a version upgrade to the same version as the control plane.

    To start a recreation, run the following command:

    gcloud container clusters upgrade CLUSTER_NAME \
        --node-pool=POOL_NAME
    

    After the upgrade completes, try the operation again.
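
If several node pools are affected, you can repeat the upgrade command for each one. The following sketch reads node pool names from standard input and prints one upgrade command per pool, so that you can review the commands before running them. The function name print_recreate_commands is hypothetical, and the --location flag is added on the assumption that no default location is configured.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read node pool names from stdin, one per line, and
# print the upgrade command that would recreate each pool.
print_recreate_commands() {
  local cluster="$1" location="$2" pool
  while IFS= read -r pool; do
    [ -n "$pool" ] || continue    # skip empty lines
    echo "gcloud container clusters upgrade ${cluster} --location=${location} --node-pool=${pool}"
  done
}

# Example usage (lists the pools of the cluster, then prints the commands):
#   gcloud container node-pools list --cluster=CLUSTER_NAME \
#       --location=COMPUTE_LOCATION --format="value(name)" |
#     print_recreate_commands CLUSTER_NAME COMPUTE_LOCATION
```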

Error 401: Unauthorized

GKE uses IAM service accounts that are attached to your nodes to run system tasks like logging and monitoring. At a minimum, these node service accounts must have the Kubernetes Engine Default Node Service Account (roles/container.defaultNodeServiceAccount) role on your project. By default, GKE uses the Compute Engine default service account, which is automatically created in your project, as the node service account.

If your organization enforces the iam.automaticIamGrantsForDefaultServiceAccounts organization policy constraint, the default Compute Engine service account in your project might not automatically get the required permissions for GKE.

  1. Find the name of the service account that your nodes use:

    Console

    1. Go to the Kubernetes clusters page in the Google Cloud console.

    2. In the cluster list, click the name of the cluster that you want to inspect.
    3. Depending on the cluster mode of operation, do one of the following:
      • For Autopilot mode clusters, in the Security section, find the Service account field.
      • For Standard mode clusters, do the following:
        1. Click the Nodes tab.
        2. In the Node pools table, click a node pool name. The Node pool details page opens.
        3. In the Security section, find the Service account field.

    If the value in the Service account field is default, your nodes use the Compute Engine default service account. If the value in this field is not default, your nodes use a custom service account. To grant the required role to a custom service account, see Use least privilege IAM service accounts.

    gcloud

    For Autopilot mode clusters, run the following command:

    gcloud container clusters describe CLUSTER_NAME \
        --location=LOCATION \
        --flatten=autoscaling.autoprovisioningNodePoolDefaults.serviceAccount

    For Standard mode clusters, run the following command:

    gcloud container clusters describe CLUSTER_NAME \
        --location=LOCATION \
        --format="table(nodePools.name,nodePools.config.serviceAccount)"

    If the output is default, your nodes use the Compute Engine default service account. If the output is not default, your nodes use a custom service account. To grant the required role to a custom service account, see Use least privilege IAM service accounts.

  2. To grant the roles/container.defaultNodeServiceAccount role to the Compute Engine default service account, complete the following steps:

    Console

    1. Go to the Welcome page in the Google Cloud console.

    2. In the Project number field, click Copy to clipboard.
    3. Go to the IAM page in the Google Cloud console.

    4. Click Grant access.
    5. In the New principals field, specify the following value:
      PROJECT_NUMBER-compute@developer.gserviceaccount.com
      Replace PROJECT_NUMBER with the project number that you copied.
    6. In the Select a role menu, select the Kubernetes Engine Default Node Service Account role.
    7. Click Save.

    gcloud

    1. Find your Google Cloud project number:
      gcloud projects describe PROJECT_ID \
          --format="value(projectNumber)"

      Replace PROJECT_ID with your project ID.

      The output is similar to the following:

      12345678901
      
    2. Grant the roles/container.defaultNodeServiceAccount role to the Compute Engine default service account:
      gcloud projects add-iam-policy-binding PROJECT_ID \
          --member="serviceAccount:PROJECT_NUMBER-compute@developer.gserviceaccount.com" \
          --role="roles/container.defaultNodeServiceAccount"

      Replace PROJECT_NUMBER with the project number from the previous step.
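
To confirm that the binding took effect, you can list the members that hold the role and look for the default service account. The following is a sketch under stated assumptions: the function names default_compute_sa and has_member are hypothetical, and the query in the usage comment relies on gcloud's --flatten, --filter, and --format flags.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the IAM member string for the Compute Engine
# default service account from a project number.
default_compute_sa() {
  echo "serviceAccount:$1-compute@developer.gserviceaccount.com"
}

# Hypothetical helper: succeed if the member given as $1 appears as a whole
# line in the list of members read from stdin.
has_member() {
  grep -Fxq -- "$1"
}

# Example usage:
#   gcloud projects get-iam-policy PROJECT_ID \
#       --flatten="bindings[].members" \
#       --filter="bindings.role:roles/container.defaultNodeServiceAccount" \
#       --format="value(bindings.members)" |
#     has_member "$(default_compute_sa PROJECT_NUMBER)" &&
#     echo "role is granted"
```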

Error 403: Insufficient permissions

The following error occurs when you try to connect to a GKE cluster using gcloud container clusters get-credentials, but the account doesn't have permission to access the Kubernetes API server:

ERROR: (gcloud.container.clusters.get-credentials) ResponseError: code=403, message=Required "container.clusters.get" permission(s) for "projects/<your-project>/locations/<region>/clusters/<your-cluster>".

To resolve this issue, complete the following steps:

  1. Identify the account that has the access issue:

    gcloud auth list
    
  2. Grant the required access to the account using the instructions in Authenticating to the Kubernetes API server.
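
When you ask an administrator for access, it helps to state exactly which permission is missing. The following sketch extracts the permission name from an error message like the one shown above. The function name extract_permission is hypothetical, and the pattern assumes the message format shown in this section.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read a gcloud error message from stdin and print the
# permission named in: Required "PERMISSION" permission(s) for "RESOURCE".
extract_permission() {
  sed -n 's/.*Required "\([^"]*\)" permission(s).*/\1/p'
}

# Example usage (gcloud writes errors to stderr, so redirect it to stdout):
#   gcloud container clusters get-credentials CLUSTER_NAME \
#       --location=COMPUTE_LOCATION 2>&1 | extract_permission
```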

Error 403: Retry budget exhausted

The following error can occur when you try to create a GKE cluster:

Error: googleapi: Error 403: Retry budget exhausted: Google Compute Engine:
Required permission 'PERMISSION_NAME' for 'RESOURCE_NAME'.

In this error message, the following variables apply:

  • PERMISSION_NAME: the name of a permission, like compute.regions.get.
  • RESOURCE_NAME: the path to the Google Cloud resource that you were trying to access, like a Compute Engine region.

This error occurs if the IAM service account attached to the cluster doesn't have the minimum required permissions to create the cluster.

To resolve this issue, do the following:

  1. Create or modify an IAM service account to have all of the required permissions to run a GKE cluster. For instructions, see Use least privilege IAM service accounts.
  2. Specify the updated IAM service account in your cluster creation command by using the --service-account flag. For instructions, see Create an Autopilot cluster.

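Alternatively, omit the --service-account flag to let GKE use the Compute Engine default service account in the project. This service account typically has the required permissions by default, unless your organization enforces the iam.automaticIamGrantsForDefaultServiceAccounts organization policy constraint.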
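
Scripted, the two steps above might look like the following. This is a sketch, not the documented procedure: the function name print_node_sa_plan and the example service account name are hypothetical, and it assumes that the roles/container.defaultNodeServiceAccount role mentioned earlier on this page covers the permissions that your cluster needs. The function only prints the commands so that you can review them before running anything.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the commands that create a node service account,
# grant it the default node service account role, and create an Autopilot
# cluster that uses it. Nothing is executed.
print_node_sa_plan() {
  local project="$1" sa_name="$2" cluster="$3" location="$4"
  # Assumes the standard service account email format for the project.
  local email="${sa_name}@${project}.iam.gserviceaccount.com"
  echo "gcloud iam service-accounts create ${sa_name} --project=${project}"
  echo "gcloud projects add-iam-policy-binding ${project} --member=serviceAccount:${email} --role=roles/container.defaultNodeServiceAccount"
  echo "gcloud container clusters create-auto ${cluster} --location=${location} --service-account=${email} --project=${project}"
}

# Example usage:
#   print_node_sa_plan PROJECT_ID gke-node-sa CLUSTER_NAME COMPUTE_LOCATION
```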

Error 404: Resource not found

If you get an error 404, resource not found, when calling gcloud container commands, resolve the issue by re-authenticating to the Google Cloud CLI:

gcloud auth login

Error 400/403: Missing edit permissions on account

A missing edit permissions on account error (error 400 or 403) indicates that one of the service accounts or service agents that GKE depends on has been deleted or edited manually.

When you enable the Compute Engine or Kubernetes Engine API, Google Cloud creates the following service accounts and agents:

  • Compute Engine default service account in your project. GKE attaches this service account to nodes by default for system tasks like logging and monitoring.
  • Google APIs Service Agent in a Google-managed project, with edit permissions on your project.
  • Google Kubernetes Engine service agent in a Google-managed project, with the Kubernetes Engine Service Agent role on your project.

Cluster creation and all cluster management operations fail if, at any point, someone edits those permissions, removes the role bindings on the project, removes the service account entirely, or disables the API.

Verify permissions for the GKE service agent

To verify whether the Google Kubernetes Engine service account has the Kubernetes Engine Service Agent role assigned on the project, complete the following steps:

  1. Determine the name of your Google Kubernetes Engine service account. The service account name has the following format:

    service-PROJECT_NUMBER@container-engine-robot.iam.gserviceaccount.com
    

    Replace PROJECT_NUMBER with your project number.

  2. Check whether your Google Kubernetes Engine service account has the Kubernetes Engine Service Agent role assigned on the project:

    gcloud projects get-iam-policy PROJECT_ID
    

    Replace PROJECT_ID with your project ID.
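
Because the full IAM policy can be long, it can help to narrow the output to the roles held by the service account. The following sketch assumes gcloud's --flatten, --filter, and --format flags. The function names gke_service_agent and has_role are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the Google Kubernetes Engine service account
# email from a project number, using the format shown in step 1.
gke_service_agent() {
  echo "service-$1@container-engine-robot.iam.gserviceaccount.com"
}

# Hypothetical helper: succeed if the role given as $1 appears as a whole
# line in the list of roles read from stdin.
has_role() {
  grep -Fxq -- "$1"
}

# Example usage:
#   gcloud projects get-iam-policy PROJECT_ID \
#       --flatten="bindings[].members" \
#       --filter="bindings.members:$(gke_service_agent PROJECT_NUMBER)" \
#       --format="value(bindings.role)" |
#     has_role "roles/container.serviceAgent" || echo "role is missing"
```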

To fix the issue, if someone removed the Kubernetes Engine Service Agent role from your Google Kubernetes Engine service account, add it back. Otherwise, use the following instructions to re-enable the Kubernetes Engine API, which restores your service accounts and permissions:

Console

  1. Go to the APIs & Services page in the Google Cloud console.


  2. Select your project.

  3. Click Enable APIs and Services.

  4. Search for Kubernetes, then select the API from the search results.

  5. Click Enable. If you have previously enabled the API, you must first disable it and then enable it again. It can take several minutes for the API and related services to be enabled.

gcloud

Run the following commands in the gcloud CLI to grant the Kubernetes Engine Service Agent role to your Google Kubernetes Engine service account:

PROJECT_NUMBER=$(gcloud projects describe "PROJECT_ID" \
    --format 'get(projectNumber)')
gcloud projects add-iam-policy-binding PROJECT_ID \
    --member "serviceAccount:service-${PROJECT_NUMBER?}@container-engine-robot.iam.gserviceaccount.com" \
    --role roles/container.serviceAgent

What's next

If you need additional assistance, reach out to Cloud Customer Care.