This page shows you how to resolve errors with your deployed workloads in Google Kubernetes Engine (GKE).
For more general advice about troubleshooting your applications, see Troubleshooting Applications in the Kubernetes documentation.
All errors: Check Pod status
If there are issues with a workload's Pods, Kubernetes updates the Pod status with an error message. View these errors by checking the status of a Pod using the Google Cloud console or the kubectl command-line tool.
Console
Perform the following steps:
In the Google Cloud console, go to the Workloads page.
Select the workload that you want to investigate. The Overview tab displays the status of the workload.
From the Managed Pods section, click any error status message.
kubectl
To see all Pods running in your cluster, run the following command:
kubectl get pods
The output is similar to the following:
NAME READY STATUS RESTARTS AGE
POD_NAME 0/1 CrashLoopBackOff 23 8d
Potential errors are listed in the Status column.
To get more detailed information about a specific Pod, run the following command:
kubectl describe pod POD_NAME
Replace POD_NAME with the name of the Pod that you want to investigate.
In the output, the Events field shows more information about errors.
If you'd like more information, view the container logs:
kubectl logs POD_NAME
These logs can help you identify if a command or code in the container caused the Pod to crash.
After you identify the error, use the following sections to try to resolve the issue.
Error: CrashLoopBackOff
A status of CrashLoopBackOff doesn't indicate a specific error. Instead, it indicates that a container is repeatedly crashing after restarting.
When a container crashes or exits shortly after starting (CrashLoop), Kubernetes attempts to restart the container. With each failed restart, the delay (BackOff) before the next attempt increases exponentially (10s, 20s, 40s, and so on), up to a maximum of five minutes.
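The delay schedule described above can be sketched as a small shell function. This is an illustrative model only: the function name backoff_delay is hypothetical, and the sketch assumes that the delay starts at 10 seconds, doubles after each failed restart, and is capped at 300 seconds (five minutes), as stated in the text. It doesn't query Kubernetes.

```shell
# Hypothetical helper: prints the delay, in seconds, before the Nth restart
# attempt (N starts at 1), assuming a 10s delay that doubles up to 300s.
backoff_delay() (
  attempt=$1
  delay=10
  i=1
  while [ "$i" -lt "$attempt" ]; do
    delay=$((delay * 2))
    # Cap the delay at five minutes.
    if [ "$delay" -gt 300 ]; then
      delay=300
    fi
    i=$((i + 1))
  done
  echo "$delay"
)
```

For example, backoff_delay 3 prints 40, and every attempt from the sixth onward prints 300.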
The following sections help you identify why your container might be crashing.
Use the Crashlooping Pods interactive playbook
Begin troubleshooting what's causing a CrashLoopBackOff status by using the interactive playbook in the Google Cloud console:
Go to the Crashlooping Pods interactive playbook.
In the Cluster drop-down list, select the cluster that you want to troubleshoot. If you can't find your cluster, enter the name of the cluster in the Filter field.
In the Namespace drop-down list, select the namespace that you want to troubleshoot. If you can't find your namespace, enter the namespace in the Filter field.
Work through each of the sections to help you identify the cause:
- Identify Application Errors
- Investigate Out Of Memory Issues
- Investigate Node Disruptions
- Investigate Liveness Probe Failures
- Correlate Change Events
Optional: To get notifications about future CrashLoopBackOff errors, in the Future Mitigation Tips section, select Create an Alert.
Inspect logs
A container might crash for many reasons, and checking a Pod's logs can aid you in troubleshooting the root cause.
You can check the logs with the Google Cloud console or the kubectl command-line tool.
Console
Perform the following steps:
Go to the Workloads page in the Google Cloud console.
Select the workload that you want to investigate. The Overview tab displays the status of the workload.
From the Managed Pods section, click the problematic Pod.
From the Pod's menu, click the Logs tab.
kubectl
View all Pods running in your cluster:
kubectl get pods
In the output of the preceding command, look for a Pod with the CrashLoopBackOff error in the Status column.
Get the Pod's logs:
kubectl logs POD_NAME
Replace POD_NAME with the name of the problematic Pod.
You can also pass in the -p flag to get the logs for the previous instance of a Pod's container, if it exists.
Check the exit code of the crashed container
To better understand why your container crashed, find the exit code:
Describe the Pod:
kubectl describe pod POD_NAME
Replace POD_NAME with the name of the problematic Pod.
Review the value in the containers: CONTAINER_NAME: last state: exit code field:
- If the exit code is 1, the container crashed because the application crashed.
- If the exit code is 0, check how long your app was running. Containers exit when your application's main process exits. If your app finishes execution very quickly, the container might continue to restart. If you experience this error, one solution is to set the restartPolicy field to OnFailure. After you make this change, the app only restarts when the exit code isn't 0.
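To read the exit code without scanning the full describe output, you can query the Pod status directly. The following is a sketch rather than an official procedure. It assumes that the crashed container has a recorded previous termination under status.containerStatuses[].lastState.terminated and that the exit code is numeric. The function names last_exit_codes and explain_exit_code are hypothetical. The reading of codes above 128 as "terminated by signal (code minus 128)" is a general container convention, not something this page states.

```shell
# Hypothetical helper: prints "CONTAINER_NAME EXIT_CODE" for each container
# in the Pod. Requires kubectl access to the cluster.
last_exit_codes() {
  kubectl get pod "$1" -o jsonpath='{range .status.containerStatuses[*]}{.name}{" "}{.lastState.terminated.exitCode}{"\n"}{end}'
}

# Hypothetical helper: maps a numeric exit code to the guidance in this section.
explain_exit_code() {
  case "$1" in
    0) echo "main process exited normally; check run time and restartPolicy" ;;
    1) echo "application crashed; inspect the container logs" ;;
    *)
      if [ "$1" -gt 128 ]; then
        # Assumption: codes above 128 mean the process was ended by a signal.
        echo "terminated by signal $(( $1 - 128 ))"
      else
        echo "application-specific error; inspect the container logs"
      fi
      ;;
  esac
}
```

For example, explain_exit_code 137 prints "terminated by signal 9", which corresponds to SIGKILL.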
Connect to a running container
To run bash commands from the container so that you can test the network or check if you have access to files or databases used by your application, open a shell to the Pod:
kubectl exec -it POD_NAME -- /bin/bash
If there's more than one container in your Pod, add -c CONTAINER_NAME.
Errors: ImagePullBackOff and ErrImagePull
A status of ImagePullBackOff or ErrImagePull indicates that the image used by a container cannot be loaded from the image registry.
For guidance on troubleshooting these statuses, see Troubleshoot image pulls.
Error: Pod unschedulable
A status of PodUnschedulable indicates that your Pod cannot be scheduled because of insufficient resources or some configuration error.
If you have configured control plane metrics, you can find more information about these errors in scheduler metrics and API server metrics.
Use the unschedulable Pods interactive playbook
You can troubleshoot PodUnschedulable errors using the interactive playbook in the Google Cloud console:
Go to the unschedulable Pods interactive playbook.
In the Cluster drop-down list, select the cluster that you want to troubleshoot. If you can't find your cluster, enter the name of the cluster in the Filter field.
In the Namespace drop-down list, select the namespace that you want to troubleshoot. If you can't find your namespace, enter the namespace in the Filter field.
To help you identify the cause, work through each of the sections in the playbook:
- Investigate CPU and Memory
- Investigate Max Pods per Node
- Investigate Autoscaler Behavior
- Investigate Other Failure Modes
- Correlate Change Events
Optional: To get notifications about future PodUnschedulable errors, in the Future Mitigation Tips section, select Create an Alert.
Error: Insufficient resources
You might encounter an error indicating a lack of CPU, memory, or another resource. For example, the following message indicates that, on two nodes, there isn't enough CPU available to fulfill a Pod's requests:
No nodes are available that match all of the predicates: Insufficient cpu (2)
If your Pod's resource requests exceed the capacity of a single node in every eligible node pool, GKE does not schedule the Pod and also does not trigger scale up to add a new node. For GKE to schedule the Pod, you must either request fewer resources for the Pod, or create a new node pool with sufficient resources.
You can also enable node auto-provisioning so that GKE can automatically create node pools with nodes where the unscheduled Pods can run.
The default CPU request is 100m, or 10% of one CPU core.
If you want to request more or fewer resources, specify the value in the Pod specification under spec: containers: resources: requests.
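Kubernetes expresses CPU either in whole or fractional cores (for example, 0.5) or in millicores with an m suffix (for example, 100m). The following sketch converts either form to millicores so that a Pod's request can be compared against node capacity. The function name cpu_to_millicores is hypothetical, and the kubectl command in the comment is one possible way to list each node's allocatable resources.

```shell
# Hypothetical helper: converts a CPU quantity such as 100m, 2, or 0.5
# to millicores (1 core = 1000 millicores).
cpu_to_millicores() {
  case "$1" in
    *m) echo "${1%m}" ;;
    *)  awk -v cores="$1" 'BEGIN { printf "%.0f\n", cores * 1000 }' ;;
  esac
}

# One way to list allocatable CPU and memory per node (requires cluster access):
# kubectl get nodes -o custom-columns=NAME:.metadata.name,CPU:.status.allocatable.cpu,MEMORY:.status.allocatable.memory
```

For example, cpu_to_millicores 100m prints 100, which is 10% of the 1000 millicores in one core.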
Error: MatchNodeSelector
MatchNodeSelector indicates that there are no nodes that match the Pod's label selector.
To verify this, check the labels specified in the Pod specification's nodeSelector field, under spec: nodeSelector.
To see how nodes in your cluster are labeled, run the following command:
kubectl get nodes --show-labels
To attach a label to a node, run the following command:
kubectl label nodes NODE_NAME LABEL_KEY=LABEL_VALUE
Replace the following:
NODE_NAME: the node that you want to add a label to.
LABEL_KEY: the label's key.
LABEL_VALUE: the label's value.
For more information, refer to Assigning Pods to Nodes in the Kubernetes documentation.
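To compare a Pod's selector with the labels on your nodes, you can read the nodeSelector field and then filter nodes by label. The following is a minimal sketch. The function names pod_node_selector and nodes_with_label are hypothetical, and both functions require kubectl access to the cluster.

```shell
# Hypothetical helper: prints the nodeSelector of a Pod (empty if none is set).
pod_node_selector() {
  kubectl get pod "$1" -o jsonpath='{.spec.nodeSelector}'
}

# Hypothetical helper: lists the nodes that carry the label KEY=VALUE.
# $1 is the label key and $2 is the label value.
nodes_with_label() {
  kubectl get nodes -l "$1=$2"
}
```

If nodes_with_label returns no nodes for a key and value from the Pod's nodeSelector, no node currently satisfies that part of the selector.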
Error: PodToleratesNodeTaints
PodToleratesNodeTaints indicates that the Pod can't be scheduled to any node because the Pod doesn't have tolerations that correspond to existing node taints.
To verify that this is the case, run the following command:
kubectl describe nodes NODE_NAME
In the output, check the Taints field, which lists key-value pairs and scheduling effects.
If the effect listed is NoSchedule, then no Pod can be scheduled on that node unless it has a matching toleration.
One way to resolve this issue is to remove the taint. For example, to remove a NoSchedule taint, run the following command:
kubectl taint nodes NODE_NAME key:NoSchedule-
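Before you remove a taint, it can help to see the taints on every node next to the tolerations that the Pod already defines. The following sketch prints both. The function names list_node_taints and pod_tolerations are hypothetical, and both functions require kubectl access to the cluster.

```shell
# Hypothetical helper: prints each node name followed by its taints,
# one node per line (the taints column is empty for untainted nodes).
list_node_taints() {
  kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.taints}{"\n"}{end}'
}

# Hypothetical helper: prints the tolerations defined on a Pod.
pod_tolerations() {
  kubectl get pod "$1" -o jsonpath='{.spec.tolerations}'
}
```

If a node's taint has no corresponding entry in the Pod's tolerations, the Pod can't be scheduled on that node.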
Error: PodFitsHostPorts
The PodFitsHostPorts error means that a Pod requests a host port that's already occupied on the node.
To resolve the issue, consider following Kubernetes best practices and use a NodePort instead of a hostPort.
If you must use a hostPort, check the manifests of the Pods and make sure that all Pods on the same node have unique values defined for hostPort.
Error: Does not have minimum availability
If a node has adequate resources but you still see the Does not have minimum availability message, check the node's status. If the status is SchedulingDisabled or Cordoned, the node cannot schedule new Pods. You can check the status of a node using the Google Cloud console or the kubectl command-line tool.
Console
Perform the following steps:
Go to the Google Kubernetes Engine page in the Google Cloud console.
Select the cluster that you want to investigate. The Nodes tab displays the Nodes and their status.
To enable scheduling on the node, perform the following steps:
From the list, click the node that you want to investigate.
From the Node Details section, click Uncordon.
kubectl
To get statuses of your nodes, run the following command:
kubectl get nodes
To enable scheduling on the node, run:
kubectl uncordon NODE_NAME
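In the kubectl get nodes output, a cordoned node shows SchedulingDisabled in the STATUS column (for example, Ready,SchedulingDisabled). The following sketch filters that output so that only cordoned nodes are listed. The function name cordoned_nodes is hypothetical, and the sketch assumes the default column order of NAME followed by STATUS.

```shell
# Hypothetical helper: reads `kubectl get nodes --no-headers` output on stdin
# and prints the names of nodes whose STATUS includes SchedulingDisabled.
cordoned_nodes() {
  awk '$2 ~ /SchedulingDisabled/ { print $1 }'
}

# Usage (requires cluster access):
# kubectl get nodes --no-headers | cordoned_nodes
```

Each node name that the function prints is a candidate for kubectl uncordon.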
Error: Maximum Pods per node limit reached
If all nodes in the cluster reach the Maximum Pods per node limit, Pods are stuck in the Unschedulable state. Under the Pod Events tab, you see a message including the phrase Too many pods.
To resolve this error, complete the following steps:
Check the Maximum pods per node configuration from the Nodes tab in GKE cluster details in the Google Cloud console.
Get a list of nodes:
kubectl get nodes
For each node, verify the number of Pods running on the node:
kubectl get pods --all-namespaces -o wide | grep NODE_NAME | wc -l
If the limit is reached, add a new node pool or add nodes to the existing node pool.
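Instead of checking nodes one at a time, you can count Pods for every node in a single pass. The following is a sketch under stated assumptions: the function name count_pods_per_node is hypothetical, the input is one node name per Pod (as produced by the custom-columns command in the comment), and Pods that aren't assigned to a node appear as <none>. The count includes completed Pods, so it might overstate how close a node is to its limit.

```shell
# Hypothetical helper: reads one node name per line on stdin (one line per Pod)
# and prints "NODE_NAME POD_COUNT" for each node, sorted by node name.
# Pods that aren't assigned to a node (<none>) are skipped.
count_pods_per_node() {
  awk 'NF && $1 != "<none>" { count[$1]++ } END { for (node in count) print node, count[node] }' | sort
}

# Usage (requires cluster access):
# kubectl get pods --all-namespaces --no-headers -o custom-columns=NODE:.spec.nodeName | count_pods_per_node
#
# To compare against the Pod capacity that each node reports:
# kubectl get nodes -o custom-columns=NAME:.metadata.name,MAX_PODS:.status.allocatable.pods
```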
Issue: Maximum node pool size reached with cluster autoscaler enabled
If the node pool has reached its maximum size according to its cluster autoscaler configuration, GKE does not trigger scale up for the Pod that would otherwise be scheduled with this node pool. If you want the Pod to be scheduled with this node pool, change the cluster autoscaler configuration.
Issue: Maximum node pool size reached with cluster autoscaler disabled
If the node pool has reached its maximum number of nodes, and cluster autoscaler is disabled, GKE cannot schedule the Pod with the node pool. Increase the size of your node pool or enable cluster autoscaler for GKE to resize your cluster automatically.
Error: Unbound PersistentVolumeClaims
Unbound PersistentVolumeClaims indicates that the Pod references a PersistentVolumeClaim that is not bound. This error might happen if your PersistentVolume failed to provision. You can verify that provisioning failed by getting the events for your PersistentVolumeClaim and examining them for failures.
To get events, run the following command:
kubectl describe pvc STATEFULSET_NAME-PVC_NAME-0
Replace the following:
STATEFULSET_NAME: the name of the StatefulSet object.
PVC_NAME: the name of the PersistentVolumeClaim object.
This can also happen if there was a configuration error during your manual pre-provisioning of a PersistentVolume and its binding to a PersistentVolumeClaim.
To resolve this error, try to pre-provision the volume again.
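If you aren't sure which PersistentVolumeClaim is unbound, you can list every claim with its phase and keep only the ones that aren't Bound. The following is a minimal sketch. The function name unbound_pvcs is hypothetical, and the input format (namespace, name, phase) is the one produced by the custom-columns command in the comment.

```shell
# Hypothetical helper: reads "NAMESPACE NAME PHASE" lines on stdin and prints
# the PersistentVolumeClaims whose phase isn't Bound.
unbound_pvcs() {
  awk 'NF >= 3 && $3 != "Bound" { print $1 "/" $2 " (" $3 ")" }'
}

# Usage (requires cluster access):
# kubectl get pvc --all-namespaces --no-headers -o custom-columns=NAMESPACE:.metadata.namespace,NAME:.metadata.name,PHASE:.status.phase | unbound_pvcs
```

You can then pass each reported claim name to kubectl describe pvc to review its events.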
Error: Insufficient quota
Verify that your project has sufficient Compute Engine quota for GKE to scale up your cluster. If GKE attempts to add a node to your cluster to schedule the Pod, and scaling up would exceed your project's available quota, you receive the scale.up.error.quota.exceeded error message.
To learn more, see ScaleUp errors.
Issue: Deprecated APIs
Ensure that you are not using deprecated APIs that are removed with your cluster's minor version. To learn more, see GKE deprecations.
Error: Didn't have free ports for the requested Pod ports
If you see an error similar to the following, you likely have multiple Pods on the same node with the same value defined in the hostPort field:
0/1 nodes are available: 1 node(s) didn't have free ports for the requested pod ports. preemption: 0/1 nodes are available: 1 No preemption victims found for incoming pod.
Binding a Pod to a hostPort limits where GKE can schedule the Pod because each hostIP, hostPort, and protocol combination must be unique.
To resolve the issue, consider following Kubernetes best practices and using a NodePort instead of a hostPort.
If you must use a hostPort, check the manifests of the Pods and make sure that all Pods on the same node have unique values defined for hostPort.
What's next
If you need additional assistance, reach out to Cloud Customer Care.