Debugging node issues
This page explains how to debug node issues on Google Distributed Cloud using a suite of preinstalled
debugging tools.
Overview
Each Google Distributed Cloud cluster you create is composed of several
nodes. Each node includes a distribution of
CoreOS' toolbox, a shell
script that unpacks and runs a debugging container, debug-toolbox.
debug-toolbox is a container image that includes several useful debugging
tools.
If you encounter issues with a specific node, you can debug it by connecting
to the affected node, running the toolbox script to unpack and start the
debug-toolbox container, and then running the tools included in the container.
Tools included in debug-toolbox container
The debug-toolbox container runs a Debian base image that includes the
following packages:
bash
curl
dnsutils
hping3
iperf3
lsof
netcat
mtr
procps
strace
tcpdump
traceroute
util-linux
Because these tools are included in the container, you can use them without an
internet connection. To install additional debugging tools, you can use
apt-get, which does require an internet connection.
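Using toolbox
SSH into the cluster node.
Run the toolbox command:
sudo toolbox
This command starts a debug-toolbox container.
While inside the container, run one of the tools listed above, for example
tcpdump.
When you're finished, exit the container and close the SSH connection to the
node.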
Node Problem Detector
Beginning with Google Distributed Cloud version 1.4, Node Problem Detector is
enabled for all the nodes in a cluster and helps you quickly detect some common
node problems. Node Problem Detector continuously checks for possible problems
and reports them as events and conditions on the node. If a node misbehaves,
you can check whether Node Problem Detector detected the problem by running
kubectl describe on the node and looking for the corresponding events and
conditions.
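If you want to check for these conditions from a script rather than by reading kubectl describe output, the following minimal Python sketch reads the node object as JSON (for example, from kubectl get node NODE_NAME -o json, where NODE_NAME is a placeholder) and lists the problem conditions whose status is "True". It assumes that, as with other Node Problem Detector conditions, "True" means the problem is present, and the PROBLEM_CONDITIONS set is an illustrative choice limited to the two condition types discussed on this page.

```python
"""Sketch: list problem conditions on a node from `kubectl get node -o json`."""
import json
import subprocess
from typing import List, Tuple

# Illustrative subset (an assumption): only the condition types named on this page.
PROBLEM_CONDITIONS = {"KubeletUnhealthy", "ContainerRuntimeUnhealthy"}


def fetch_node_json(node_name: str) -> str:
    """Return the node object as JSON text (requires kubectl and cluster access)."""
    return subprocess.run(
        ["kubectl", "get", "node", node_name, "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout


def active_problems(node_json: str) -> List[Tuple[str, str]]:
    """Return (type, message) for each listed condition whose status is "True"."""
    node = json.loads(node_json)
    conditions = node.get("status", {}).get("conditions", [])
    return [
        (c["type"], c.get("message", ""))
        for c in conditions
        if c.get("type") in PROBLEM_CONDITIONS and c.get("status") == "True"
    ]
```

A possible usage, from a workstation that can reach the cluster: for ctype, msg in active_problems(fetch_node_json("NODE_NAME")): print(ctype, msg). Separating the kubectl call from the parsing keeps the parsing testable without a cluster; kubectl describe remains the documented way to inspect events as well as conditions.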
Node Problem Detector's monitors set several conditions on the node. If the
reported condition is KubeletUnhealthy or ContainerRuntimeUnhealthy, restarting
the corresponding systemd service (kubelet or Docker) might make the node
healthy again.
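The condition-to-service mapping in the preceding paragraph can be captured in a small lookup. This Python sketch assumes the systemd unit names are literally kubelet and docker, matching the service names used on this page; the returned command is meant to be run on the affected node itself (for example, in your SSH session), and the helper name restart_command is hypothetical.

```python
"""Sketch: map a Node Problem Detector condition to a systemd restart command."""
import shlex
from typing import Dict, List, Optional

# Mapping described on this page; the unit names are assumed.
SERVICE_FOR_CONDITION: Dict[str, str] = {
    "KubeletUnhealthy": "kubelet",
    "ContainerRuntimeUnhealthy": "docker",
}


def restart_command(condition_type: str) -> Optional[List[str]]:
    """Return the argv that restarts the matching service, or None if none applies."""
    service = SERVICE_FOR_CONDITION.get(condition_type)
    if service is None:
        return None
    return ["sudo", "systemctl", "restart", service]


if __name__ == "__main__":
    # Prints: sudo systemctl restart kubelet
    print(shlex.join(restart_command("KubeletUnhealthy")))
```

Returning None for any other condition keeps the sketch from suggesting a restart for conditions this page does not associate with a service.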
Beginning with Google Distributed Cloud version 1.5, automatic repair of the
kubelet and Docker systemd services is enabled in Node Problem Detector. If
Node Problem Detector detects a KubeletUnhealthy or ContainerRuntimeUnhealthy
condition on the node, it automatically tries to restart the kubelet or Docker
service, provided that the time since the last restart exceeds a certain
threshold.
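The restart rule described above can be sketched as a small decision function. This is not Node Problem Detector's code: the page does not state the threshold, so MIN_RESTART_INTERVAL is a hypothetical placeholder, and treating "no recorded restart" as eligible is also an assumption.

```python
"""Sketch of the auto-repair decision: restart only if the last restart is old enough."""
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical placeholder; the real threshold is not documented on this page.
MIN_RESTART_INTERVAL = timedelta(minutes=10)
REPAIRABLE = {"KubeletUnhealthy", "ContainerRuntimeUnhealthy"}


def should_restart(
    condition_type: str,
    last_restart: Optional[datetime],
    now: datetime,
    min_interval: timedelta = MIN_RESTART_INTERVAL,
) -> bool:
    """Return True if the service tied to condition_type may be restarted now."""
    if condition_type not in REPAIRABLE:
        return False
    if last_restart is None:
        # Assumption: with no recorded restart, there is nothing to rate-limit against.
        return True
    # "Above a certain threshold": strictly greater than the minimum interval.
    return now - last_restart > min_interval


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(should_restart("KubeletUnhealthy", now - timedelta(minutes=30), now))
```

A minimum interval between restarts presumably keeps a persistently failing service from being restarted in a tight loop.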
Last updated 2025-08-25 UTC.