Troubleshoot the kubectl command-line tool


This page shows you how to resolve issues with the kubectl command-line tool when you're working in Google Kubernetes Engine (GKE). For more general advice, see Troubleshooting kubectl in the Kubernetes documentation.

Authentication and authorization errors

If you're experiencing errors related to authentication and authorization when using kubectl commands, read the following sections for advice.

Error: 401 (Unauthorized)

When connecting to GKE clusters, you can get an authentication and authorization error with HTTP status code 401 (Unauthorized). This issue might occur when you try to run a kubectl command in your GKE cluster from a local environment. To learn more, see Issue: Authentication and authorization errors.

Error: Insufficient authentication scopes

When you run gcloud container clusters get-credentials, you might receive the following error:

ERROR: (gcloud.container.clusters.get-credentials) ResponseError: code=403, message=Request had insufficient authentication scopes.

This error occurs because you are attempting to access the Kubernetes Engine API from a Compute Engine VM that doesn't have the cloud-platform scope.

To resolve this error, grant the missing cloud-platform scope. For instructions on changing the scopes on your Compute Engine VM instance, see Creating and enabling service accounts for instances in the Compute Engine documentation.
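If you're not sure whether the VM has the scope, you can check from inside the VM by querying the metadata server. The following sketch uses the standard Compute Engine metadata endpoint; it assumes curl is available and prints a hint if you run it outside a Compute Engine VM:

```shell
# Query the Compute Engine metadata server for the VM's OAuth scopes.
# This only returns real data when run on a Compute Engine VM.
scopes=$(curl -s --max-time 2 -H "Metadata-Flavor: Google" \
  "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/scopes" \
  2>/dev/null || true)
if [ -z "$scopes" ]; then
  msg="metadata server unreachable (not running on a Compute Engine VM?)"
elif printf '%s' "$scopes" | grep -q "cloud-platform"; then
  msg="cloud-platform scope present"
else
  msg="cloud-platform scope missing - change the VM's access scopes"
fi
echo "$msg"
```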

Error: Executable gke-gcloud-auth-plugin not found

Error messages similar to the following can occur when you run kubectl commands or use custom clients that interact with GKE:

Unable to connect to the server: getting credentials: exec: executable gke-gcloud-auth-plugin not found

It looks like you are trying to use a client-go credential plugin that is not installed.

To learn more about this feature, consult the documentation available at:
      https://kubernetes.io/docs/reference/access-authn-authz/authentication/#client-go-credential-plugins

Visit cloud.google.com/kubernetes-engine/docs/how-to/cluster-access-for-kubectl#install_plugin to install gke-gcloud-auth-plugin.
Unable to connect to the server: getting credentials: exec: fork/exec /usr/lib/google-cloud-sdk/bin/gke-gcloud-auth-plugin: no such file or directory

To resolve the issue, install the gke-gcloud-auth-plugin as described in Install required plugins.
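To confirm whether the plugin is visible to your shell, you can check whether it resolves on your PATH. This sketch only inspects PATH; `gcloud components install gke-gcloud-auth-plugin` is the install command suggested by the error message itself:

```shell
# Check whether gke-gcloud-auth-plugin is resolvable on PATH.
if command -v gke-gcloud-auth-plugin >/dev/null 2>&1; then
  status="installed: $(command -v gke-gcloud-auth-plugin)"
else
  status="not installed - run: gcloud components install gke-gcloud-auth-plugin"
fi
echo "$status"
```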

Error: No auth provider found

The following error occurs if kubectl or custom Kubernetes clients have been built with Kubernetes client-go version 1.26 or later:

no Auth Provider found for name "gcp"

To resolve this issue, complete the following steps:

  1. Install gke-gcloud-auth-plugin as described in Install required plugins.

  2. Update to the latest version of the gcloud CLI:

    gcloud components update
    
  3. Update the kubeconfig file:

    gcloud container clusters get-credentials CLUSTER_NAME \
        --region=COMPUTE_REGION
    

    Replace the following:

    • CLUSTER_NAME: the name of your cluster.
    • COMPUTE_REGION: the Compute Engine region for your cluster. For zonal clusters, use --zone=COMPUTE_ZONE.
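After updating the kubeconfig file, you can confirm that it now references the exec-based plugin rather than the removed gcp auth provider. A minimal check (it assumes a single kubeconfig path and falls back to the kubectl default location):

```shell
# Verify that the refreshed kubeconfig uses the exec-based auth plugin.
KUBECONFIG_FILE="${KUBECONFIG:-$HOME/.kube/config}"
if [ -f "$KUBECONFIG_FILE" ] && grep -q "gke-gcloud-auth-plugin" "$KUBECONFIG_FILE"; then
  result="kubeconfig uses gke-gcloud-auth-plugin"
else
  result="kubeconfig missing or not yet updated - rerun get-credentials"
fi
echo "$result"
```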

Error: The gcp auth plugin is deprecated, use gcloud instead

You might see the following warning message after you install the gke-gcloud-auth-plugin and run a kubectl command against a GKE cluster:

WARNING: the gcp auth plugin is deprecated in v1.22+, unavailable in v1.25+; use gcloud instead.

This message appears if your client version is earlier than 1.26.

To resolve this issue, tell your client to use the gke-gcloud-auth-plugin authentication plugin instead:

  1. Open your shell login script in a text editor:

    Bash

    vi ~/.bashrc

    Zsh

    vi ~/.zshrc

    If you're using PowerShell, skip this step.

  2. Set the following environment variable:

    Bash

    export USE_GKE_GCLOUD_AUTH_PLUGIN=True
    

    Zsh

    export USE_GKE_GCLOUD_AUTH_PLUGIN=True
    

    PowerShell

    [Environment]::SetEnvironmentVariable('USE_GKE_GCLOUD_AUTH_PLUGIN', 'True', 'Machine')
    
  3. Apply the variable in your environment:

    Bash

    source ~/.bashrc

    Zsh

    source ~/.zshrc
    

    PowerShell

    Exit the terminal and open a new terminal session.

  4. Update the gcloud CLI:

    gcloud components update
    
  5. Authenticate to your cluster:

    gcloud container clusters get-credentials CLUSTER_NAME \
        --region=COMPUTE_REGION
    

    Replace the following:

    • CLUSTER_NAME: the name of your cluster.
    • COMPUTE_REGION: the Compute Engine region for your cluster. For zonal clusters, use --zone=COMPUTE_ZONE.
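To confirm the variable is active in your current Bash or Zsh session, you can echo it back (the value mirrors the one set in step 2):

```shell
# Confirm the variable is set in the current shell session.
export USE_GKE_GCLOUD_AUTH_PLUGIN=True
echo "USE_GKE_GCLOUD_AUTH_PLUGIN=${USE_GKE_GCLOUD_AUTH_PLUGIN}"
# prints USE_GKE_GCLOUD_AUTH_PLUGIN=True
```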

Issue: The kubectl command isn't found

If you receive a message that the kubectl command isn't found, reinstall the kubectl binary and set your $PATH environment variable:

  1. Install the kubectl binary:

    gcloud components update kubectl
    
  2. When the installer prompts you to modify your $PATH environment variable, enter y to proceed. Modifying this variable lets you use kubectl commands without typing their full path.

    Alternatively, add the following line to wherever your shell stores environment variables, such as ~/.bashrc (or ~/.bash_profile on macOS):

    export PATH=$PATH:/usr/local/share/google/google-cloud-sdk/bin/
    
  3. Run the following command to load your updated file. The following example uses .bashrc:

    source ~/.bashrc
    

    If you are using macOS, use ~/.bash_profile instead of .bashrc.
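To confirm the fix took effect, you can check whether the shell now resolves kubectl. The sketch below only inspects PATH and is safe to run anywhere:

```shell
# Check whether kubectl resolves on PATH and report where it lives.
if command -v kubectl >/dev/null 2>&1; then
  location="kubectl found at $(command -v kubectl)"
else
  location="kubectl not on PATH"
fi
echo "$location"
# If it's missing, inspect the PATH entries one per line:
printf '%s\n' "$PATH" | tr ':' '\n'
```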

Issue: kubectl commands return "connection refused" error

If kubectl commands return a "connection refused" error, then you need to set the cluster context with the following command:

gcloud container clusters get-credentials CLUSTER_NAME

Replace CLUSTER_NAME with the name of your cluster. If you're unsure of what to enter for the cluster name, use the following command to list your clusters:

gcloud container clusters list
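Before running get-credentials, it can help to confirm which context kubectl is currently pointing at. A small sketch that degrades gracefully when kubectl isn't installed or no context is set:

```shell
# Show which context kubectl will use, with a fallback message.
if command -v kubectl >/dev/null 2>&1; then
  ctx=$(kubectl config current-context 2>/dev/null || echo "none set - run get-credentials")
else
  ctx="kubectl not installed"
fi
echo "current context: $ctx"
```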

Error: kubectl command timed out

If you create a cluster and then run a kubectl command against it, and the command times out, you see an error similar to the following:

  • Unable to connect to the server: dial tcp IP_ADDRESS: connect: connection timed out
  • Unable to connect to the server: dial tcp IP_ADDRESS: i/o timeout

These errors indicate that kubectl is unable to communicate with the cluster control plane.

To resolve this issue, verify the context where the cluster is set and ensure that you have connectivity to the cluster:

  1. Go to $HOME/.kube/config or run the command kubectl config view to verify that the config file contains the cluster context and the external IP address of the control plane.

  2. Set the cluster credentials:

    gcloud container clusters get-credentials CLUSTER_NAME \
        --location=COMPUTE_LOCATION \
        --project=PROJECT_ID
    

    Replace the following:

    • CLUSTER_NAME: the name of your cluster.
    • COMPUTE_LOCATION: the Compute Engine location.
    • PROJECT_ID: the ID of the project that the cluster was created in.
  3. If the cluster is a private GKE cluster, then ensure that its list of authorized networks includes the outgoing IP address of the machine that you're attempting to connect from. You can find the existing authorized networks in the Google Cloud console or by running the following command:

    gcloud container clusters describe CLUSTER_NAME \
        --location=COMPUTE_LOCATION \
        --project=PROJECT_ID \
        --format "flattened(masterAuthorizedNetworksConfig.cidrBlocks[])"
    

    If the outgoing IP address of the machine doesn't appear in the list of authorized networks in the preceding output, add it to the cluster's list of authorized networks.
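As a sketch, updating the authorized networks might look like the following. The gcloud flags are standard flags for this operation; CLUSTER_NAME, COMPUTE_LOCATION, and the IP address are placeholders:

```shell
# Placeholder IP; replace with your machine's outgoing IP address.
OUTGOING_IP="203.0.113.7"
if command -v gcloud >/dev/null 2>&1; then
  # Note: --master-authorized-networks replaces the whole list, so include
  # any existing CIDR blocks alongside the new one.
  gcloud container clusters update CLUSTER_NAME \
      --location=COMPUTE_LOCATION \
      --enable-master-authorized-networks \
      --master-authorized-networks="${OUTGOING_IP}/32" || true
else
  echo "gcloud not installed - would authorize ${OUTGOING_IP}/32"
fi
```

Because the flag replaces the entire list, first read the current blocks with the describe command above and pass them along with the new one.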

Error: kubectl commands return "failed to negotiate an API version"

If kubectl commands return a "failed to negotiate an API version" error, then ensure that kubectl has authentication credentials:

gcloud auth application-default login

Issue: kubectl logs, attach, exec, or port-forward command stops responding

If the kubectl logs, attach, exec, or port-forward commands stop responding, this typically means that the API server is unable to communicate with the nodes.

First, check if your cluster has any nodes. If you've scaled down the number of nodes in your cluster to zero, the commands won't work. To resolve this issue, resize your cluster to have at least one node.

If your cluster has at least one node, then check whether you are using SSH or Konnectivity proxy tunnels to enable secure communication. The following sections describe the troubleshooting steps for each tunnel type:

Troubleshoot SSH issues

If you're using SSH, GKE saves an SSH public key file in your Compute Engine project metadata. All Compute Engine VMs using Google-provided images regularly check their project's common metadata and their instance's metadata for SSH keys to add to the VM's list of authorized users. GKE also adds a firewall rule to your Compute Engine network for allowing SSH access from the control plane's IP address to each node in the cluster.

The following settings can cause issues with SSH communication:

  • Your network's firewall rules don't allow SSH access from the control plane.

    All Compute Engine networks are created with a firewall rule called default-allow-ssh that allows SSH access from all IP addresses (requiring a valid private key). GKE also inserts an SSH rule for each public cluster of the form gke-CLUSTER_NAME-RANDOM_CHARACTERS-ssh that allows SSH access specifically from the cluster's control plane to the cluster's nodes.

    If neither of these rules exists, then the control plane can't open SSH tunnels.

    To verify that this is the cause of the issue, check whether your configuration has these rules.

    To resolve this issue, identify the tag that's on all of the cluster's nodes, then re-add a firewall rule allowing access to VMs with that tag from the IP address of the control plane.

  • Your project's common metadata entry for ssh-keys is full.

    If the project's metadata entry named ssh-keys is close to its maximum size limit, then GKE isn't able to add its own SSH key for opening SSH tunnels.

    To verify that this is the issue, check the length of the list of ssh-keys. You can see your project's metadata by running the following command, optionally including the --project flag:

    gcloud compute project-info describe [--project=PROJECT_ID]
    

    To resolve this issue, delete some of the SSH keys that are no longer needed.

  • You have set a metadata field with the key ssh-keys on the VMs in the cluster.

    The node agent on VMs prefers per-instance SSH keys to project-wide SSH keys, so if you've set any SSH keys specifically on the cluster's nodes, then the control plane's SSH key in the project metadata won't be respected by the nodes.

    To verify that this is the issue, run gcloud compute instances describe VM_NAME and look for an ssh-keys field in the metadata.

    To resolve this issue, delete the per-instance SSH keys from the instance metadata.
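As a sketch, removing the per-instance entry might look like the following; `gcloud compute instances remove-metadata` is the standard command for this, and VM_NAME and COMPUTE_ZONE are placeholders:

```shell
# Remove the per-instance ssh-keys metadata entry so the node falls back
# to project-wide SSH keys.
if command -v gcloud >/dev/null 2>&1; then
  gcloud compute instances remove-metadata VM_NAME \
      --keys=ssh-keys \
      --zone=COMPUTE_ZONE || true
  result="removal attempted"
else
  result="gcloud not installed - command skipped"
fi
echo "$result"
```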

Troubleshoot Konnectivity proxy issues

You can determine whether your cluster uses the Konnectivity proxy by checking for the following system Deployment:

kubectl get deployments konnectivity-agent --namespace kube-system

The following settings can cause issues with the Konnectivity proxy:

  • Your network's firewall rules don't allow the Konnectivity agent to access the control plane.

    On cluster creation, Konnectivity agent Pods establish and maintain a connection to the control plane on port 8132. When one of the kubectl commands is run, the API server uses this connection to communicate with the cluster.

    If your network's firewall rules contain egress deny rules, those rules can prevent the agent from connecting.

    To verify that this is the cause of the issue, check whether your network's firewall rules contain egress deny rules.

    To resolve this issue, allow Egress traffic to the cluster control plane on port 8132. (For comparison, the API server uses 443).

  • Your cluster's network policy blocks ingress from the kube-system namespace to your workload namespace.

    The kubectl features affected here aren't required for the correct functioning of the cluster. If you prefer to keep your cluster's network locked down from all outside access, be aware that features like these won't work.

    To verify that this is the cause of the issue, find the network policies in the affected namespace by running the following command:

    kubectl get networkpolicy --namespace AFFECTED_NAMESPACE
    

    To resolve this issue, add the following to the spec.ingress field of the network policies:

    - from:
      - namespaceSelector:
          matchLabels:
            kubernetes.io/metadata.name: kube-system
        podSelector:
          matchLabels:
            k8s-app: konnectivity-agent
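For context, that ingress rule would sit inside a complete NetworkPolicy object similar to the following sketch. The policy name is hypothetical, AFFECTED_NAMESPACE is the namespace from the previous command, and the empty podSelector applies the policy to all Pods in that namespace:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-konnectivity-agent   # hypothetical name
  namespace: AFFECTED_NAMESPACE
spec:
  podSelector: {}                  # applies to all Pods in the namespace
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: konnectivity-agent
```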
    

What's next

If you need additional assistance, reach out to Cloud Customer Care.