You can diagnose or check clusters to debug issues and capture a snapshot of the cluster state. Additionally, if an installation partially succeeds but the cluster returns errors or doesn't perform properly, you can try resetting the cluster.
Diagnosing clusters with bmctl check cluster
You can capture the state of your created clusters with the
bmctl check cluster command. The command's flags let you choose the
diagnostic scope so you can get focused information.
The diagnostic information can help you discover issues and debug your deployments more effectively. The command captures all relevant cluster and node configuration files for your defined scope, and then packages the information into a single tar archive.
bmctl check cluster --snapshot --cluster CLUSTER_NAME --admin-kubeconfig ADMIN_KUBECONFIG
This command outputs a tar archive that includes relevant debug information from all system components and machines in the cluster you specified.
ADMIN_KUBECONFIG specifies the path to the kubeconfig file, and CLUSTER_NAME specifies the name of the cluster.
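For example, for a hypothetical cluster named my-cluster, and assuming the admin kubeconfig was written to the default bmctl workspace path (adjust the path to match your environment), the invocation might look like this:
bmctl check cluster --snapshot --cluster my-cluster --admin-kubeconfig bmctl-workspace/my-cluster/my-cluster-kubeconfig
Because the output is a single tar archive, you can list its contents with tar -tf ARCHIVE_NAME before sharing it.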
You can change the scope of the diagnostic information collected with the following command flags:
- The --snapshot-scenario all flag increases the scope of the diagnostic snapshot to include all the Pods in the specified cluster:
bmctl check cluster --snapshot --snapshot-scenario all --cluster CLUSTER_NAME --admin-kubeconfig ADMIN_KUBECONFIG
- The --snapshot-dry-run flag works in conjunction with the --snapshot-config flag. Use the --snapshot-dry-run flag to output a configuration file that you can modify to define a custom diagnostic scope. Your scope can include specific Pods, namespaces, or node commands.
After you modify the output file created with the --snapshot-dry-run flag,
you can use it as input to diagnose your specific scope with the
--snapshot-config flag, described below. If you omit this flag, a
default configuration is applied.
bmctl check cluster --snapshot --snapshot-dry-run --cluster CLUSTER_NAME --admin-kubeconfig ADMIN_KUBECONFIG
- The --snapshot-config flag tells the bmctl command to use the scope options specified in a snapshot configuration file. Generally, you create the snapshot configuration file with the --snapshot-dry-run flag.
bmctl check cluster --snapshot --snapshot-config SNAPSHOT_CONFIG_FILE --cluster CLUSTER_NAME --admin-kubeconfig ADMIN_KUBECONFIG
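Putting the two flags together, a typical custom-scope workflow looks like the following sketch. The file name snapshot-config.yaml is an illustrative assumption; substitute the configuration file the dry run produces for you:
# 1. Output an editable configuration file that captures the default scope.
bmctl check cluster --snapshot --snapshot-dry-run --cluster CLUSTER_NAME --admin-kubeconfig ADMIN_KUBECONFIG
# 2. Edit the configuration file (snapshot-config.yaml in this sketch) to list
#    the specific Pods, namespaces, or node commands you want.
# 3. Capture a snapshot that uses the custom scope.
bmctl check cluster --snapshot --snapshot-config snapshot-config.yaml --cluster CLUSTER_NAME --admin-kubeconfig ADMIN_KUBECONFIG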
Resetting clusters with bmctl reset cluster
When a cluster fails to install correctly, you can try to return the nodes to a clean state by resetting the cluster. Then you can reinstall the cluster after making configuration changes.
To reset a cluster, issue the following command:
bmctl reset --cluster CLUSTER_NAME
The reset command applies to the entire cluster. There is no option to target a subset of nodes in a cluster.
Output from the command looks similar to this sample:
bmctl reset -c cluster1
Creating bootstrap cluster... OK
Deleting GKE Hub member admin in project my-gcp-project...
Successfully deleted GKE Hub member admin in project my-gcp-project
Loading images... OK
Starting reset jobs...
Resetting: 1 Completed: 0 Failed: 0
...
Resetting: 0 Completed: 1 Failed: 0
Flushing logs... OK
Reset cluster details
During the reset operation, bmctl first attempts to delete the GKE Hub
membership registration, and then cleans up the affected nodes.
During the reset, storage mounts and data from the anthos-system StorageClass
are also deleted.
For all nodes, bmctl runs kubeadm reset, removes the tunnel interfaces
used for cluster networking, and deletes the following directories:
- /etc/kubernetes
- /etc/cni/net.d
- /root/.kube
- /var/lib/kubelet
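As a rough illustration (not an official procedure), the per-node cleanup is approximately equivalent to running the following commands as root. The tunnel interface name cilium_vxlan is a hypothetical example; the actual interface depends on your cluster networking:
# Undo the changes made by kubeadm when bootstrapping the node.
kubeadm reset --force
# Remove the tunnel interface used for cluster networking (interface name is illustrative).
ip link delete cilium_vxlan
# Delete the directories that bmctl removes on every node.
rm -rf /etc/kubernetes /etc/cni/net.d /root/.kube /var/lib/kubelet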
For load balancer nodes, bmctl also performs the following actions:
- Disables the keepalived and haproxy services
- Deletes the configuration files for keepalived and haproxy
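A sketch of the equivalent manual steps on a load balancer node, assuming keepalived and haproxy run as systemd services with their conventional configuration paths (both assumptions; your deployment may differ):
# Stop and disable the load balancing services.
systemctl disable --now keepalived haproxy
# Remove their configuration files (conventional default paths shown).
rm -f /etc/keepalived/keepalived.conf /etc/haproxy/haproxy.cfg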
The reset tool expects the cluster configuration file to be at the following location under the current working directory:
bmctl-workspace/CLUSTER_NAME/CLUSTER_NAME.yaml
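For example, for the cluster named cluster1 from the sample output above, the reset command expects the configuration file at:
bmctl-workspace/cluster1/cluster1.yaml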