Create diagnostic snapshots when advanced cluster is enabled

This document shows how to use the gkectl diagnose command to create diagnostic snapshots for troubleshooting issues in your clusters created using Google Distributed Cloud (software only) for VMware when advanced cluster is enabled. Advanced cluster is enabled when enableAdvancedClusters is set to true in the admin cluster configuration file and the user cluster configuration file. If advanced cluster isn't enabled, see Create diagnostic snapshots when advanced cluster isn't enabled.

`gkectl diagnose snapshot`

This command compresses a cluster's status, configurations, and logs into a tar file. The default configuration of the gkectl diagnose snapshot command captures the following information about your cluster:

Kubernetes version.
Status of Kubernetes resources in the kube-system and gke-system namespaces: cluster, machine, nodes, Services, Endpoints, ConfigMaps, ReplicaSets, CronJobs, Pods, and the owners of those Pods, including Deployments, DaemonSets, and StatefulSets.
Status of the control plane.
Details about each node configuration including IP addresses, iptables rules, mount points, file system, network connections, and running processes.
Container logs from the admin cluster's control-plane node, when the Kubernetes API server isn't available.
Logs from the gkectl diagnose snapshot command.
Logs of preflight jobs.
Logs of containers in namespaces based on the scenarios.

Credentials, including those for vSphere and F5, are removed before the tar file is created.

Capture admin cluster state

To capture an admin cluster's state, run the following command:

gkectl diagnose snapshot --kubeconfig=ADMIN_CLUSTER_KUBECONFIG

Replace ADMIN_CLUSTER_KUBECONFIG with the path to the admin cluster kubeconfig file.

The output includes a list of files and the name of a tar file, as shown in the following example output:

Using "system" snapshot configuration...
Creating ssh signer for "[ADMIN_CLUSTER_NAME]"...
Taking snapshots in 10 thread(s)...
    gkectlLogs/gkectl-xxx.log
...
    kubectlCommands/[NAMESPACE]/kubectl_get_pods
    kubectlCommands/[NAMESPACE]/kubectl_get_deployments
    kubectlCommands/[NAMESPACE]/kubectl_get_daemonsets
...
    nodes/[ADMIN_CLUSTER_NODE]/commands/journalctl_-u_kubelet
    nodes/[ADMIN_CLUSTER_NODE]/files/var/log/startup.log
...
Snapshot succeeded. Output saved in [FILENAME].tar.gz.

To extract the tar file to a directory, run the following command:

tar -zxf TAR_FILE_NAME --directory EXTRACTION_DIRECTORY_NAME

Replace the following:

TAR_FILE_NAME: the name of the tar file.
EXTRACTION_DIRECTORY_NAME: the directory into which you want to extract the tar file archive.
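
The extraction step can be wrapped in a small helper that creates the target directory first, because tar expects the directory passed to --directory to already exist. This is a minimal sketch; extract_snapshot is a hypothetical function name and isn't part of gkectl.

```shell
#!/bin/sh
# Sketch: extract a snapshot tar file into a directory, creating the
# directory if needed. extract_snapshot is a hypothetical helper name.
extract_snapshot() {
    tar_file=$1
    dest_dir=$2

    if [ ! -f "$tar_file" ]; then
        echo "tar file not found: $tar_file" >&2
        return 1
    fi

    # tar --directory fails if the target directory doesn't exist yet.
    mkdir -p "$dest_dir" || return 1
    tar -zxf "$tar_file" --directory "$dest_dir"
}
```

For example, extract_snapshot TAR_FILE_NAME EXTRACTION_DIRECTORY_NAME performs the same extraction as the command above.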

To look at the list of files produced by the snapshot, run the following commands:

cd EXTRACTION_DIRECTORY_NAME/EXTRACTED_SNAPSHOT_DIRECTORY
ls kubectlCommands
ls nodes/NODE_NAME/commands
ls nodes/NODE_NAME/files

Replace NODE_NAME with the name of the node that you want to view the files for.

To see the details of a particular operation, open one of the files.
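
Instead of opening files one at a time, you can search the whole extracted snapshot for a string. The following sketch lists every file whose contents match a pattern; find_in_snapshot is a hypothetical helper name, and it assumes a grep that supports recursive search (-r), as GNU and BSD grep do.

```shell
#!/bin/sh
# Sketch: list files in an extracted snapshot whose contents match a pattern.
# find_in_snapshot is a hypothetical helper name, not part of gkectl.
find_in_snapshot() {
    snapshot_dir=$1
    pattern=$2

    # -r: recurse into subdirectories
    # -i: ignore case
    # -l: print only the names of matching files
    grep -ril -- "$pattern" "$snapshot_dir" | sort
}
```

For example, find_in_snapshot EXTRACTION_DIRECTORY_NAME "error" prints the paths of the command outputs and collected files that mention errors.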

Specify the SSH key for the admin cluster

When you get a snapshot of the admin cluster, gkectl finds the private SSH key for the admin cluster automatically. You can also specify the key explicitly by using the --admin-ssh-key-path parameter.

Follow the instructions for Using SSH to connect to a cluster node to download the SSH keys.

In your gkectl diagnose snapshot command, set --admin-ssh-key-path to your decoded key path:

gkectl diagnose snapshot --kubeconfig=ADMIN_CLUSTER_KUBECONFIG \
    --admin-ssh-key-path=PATH_TO_DECODED_KEY

Capture user cluster state

To capture a user cluster's state, run the following command:

gkectl diagnose snapshot --kubeconfig=ADMIN_CLUSTER_KUBECONFIG \
    --cluster-name=USER_CLUSTER_NAME

The following example output includes a list of files and the name of a tar file:

Using "system" snapshot configuration...
Creating ssh signer for "[USER_CLUSTER_NAME]"...
Taking snapshots in 10 thread(s)...
    gkectlLogs/gkectl-xxx.log
...
    kubectlCommands/[NAMESPACE]/kubectl_get_pods
    kubectlCommands/[NAMESPACE]/kubectl_get_deployments
    kubectlCommands/[NAMESPACE]/kubectl_get_daemonsets
...
    nodes/[USER_CLUSTER_NODE_IP_ADDRESS]/commands/journalctl_-u_kubelet
    nodes/[USER_CLUSTER_NODE_IP_ADDRESS]/files/var/log/startup.log
...
Snapshot succeeded. Output saved in [FILENAME].tar.gz.
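
If you manage several user clusters, the same command can be run in a loop. The following is a minimal sketch that uses only the --kubeconfig and --cluster-name flags shown above. The function name snapshot_user_clusters is hypothetical, and the sketch assumes that gkectl returns a non-zero exit status when a snapshot fails.

```shell
#!/bin/sh
# Sketch: take a snapshot of each named user cluster in turn.
# snapshot_user_clusters is a hypothetical helper name.
# Assumption: gkectl exits with a non-zero status on failure.
snapshot_user_clusters() {
    admin_kubeconfig=$1
    shift

    failed=0
    for cluster_name in "$@"; do
        echo "Taking snapshot of user cluster: $cluster_name"
        if ! gkectl diagnose snapshot \
            --kubeconfig="$admin_kubeconfig" \
            --cluster-name="$cluster_name"; then
            echo "Snapshot failed for: $cluster_name" >&2
            failed=1
        fi
    done

    # Report failure if any single snapshot failed.
    return "$failed"
}
```

For example, snapshot_user_clusters ADMIN_CLUSTER_KUBECONFIG USER_CLUSTER_1 USER_CLUSTER_2 takes one snapshot per cluster and continues past individual failures.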

Snapshot scenarios

Snapshot scenarios let you control the information that is included in a snapshot. To specify a scenario, use the --scenario flag. The following list shows the possible values:

system (default): Collect a snapshot with logs from the supported system namespaces.
all: Collect a snapshot with logs from all namespaces, including user-defined namespaces.

To create a snapshot of the admin cluster, you don't need to specify a scenario:

gkectl diagnose snapshot \
    --kubeconfig=ADMIN_CLUSTER_KUBECONFIG

To create a snapshot of a user cluster using the system scenario:

gkectl diagnose snapshot \
    --kubeconfig=ADMIN_CLUSTER_KUBECONFIG \
    --cluster-name=USER_CLUSTER_NAME \
    --scenario=system

To create a snapshot of a user cluster using the all scenario:

gkectl diagnose snapshot \
    --kubeconfig=ADMIN_CLUSTER_KUBECONFIG \
    --cluster-name=USER_CLUSTER_NAME \
    --scenario=all
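
When the scenario comes from a script argument or a variable, it can help to check the value before calling gkectl. The following sketch accepts only the two values listed above and defaults to system. The function name snapshot_with_scenario is hypothetical.

```shell
#!/bin/sh
# Sketch: validate the scenario, then run the user cluster snapshot.
# snapshot_with_scenario is a hypothetical helper name.
snapshot_with_scenario() {
    admin_kubeconfig=$1
    cluster_name=$2
    scenario=${3:-system}   # system is the documented default

    case "$scenario" in
        system|all) ;;
        *)
            echo "unsupported scenario: $scenario (expected system or all)" >&2
            return 2
            ;;
    esac

    gkectl diagnose snapshot \
        --kubeconfig="$admin_kubeconfig" \
        --cluster-name="$cluster_name" \
        --scenario="$scenario"
}
```

For example, snapshot_with_scenario ADMIN_CLUSTER_KUBECONFIG USER_CLUSTER_NAME all runs the same command as the preceding example.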

Use `--log-since` to limit a snapshot

You can use the --log-since flag to limit log collection to a recent time period. For example, you could collect only the logs from the last two days or the last three hours. By default, diagnose snapshot collects all logs.

gkectl diagnose snapshot --kubeconfig=ADMIN_CLUSTER_KUBECONFIG \
    --cluster-name=CLUSTER_NAME \
    --scenario=system \
    --log-since=DURATION

Replace DURATION with a time value like 120m or 48h.

The following considerations apply:

The --log-since flag is supported only for kubectl and journalctl logs.
Command flags like --log-since are not allowed in the customized snapshot configuration.
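
The duration examples in this section use minutes and hours, so a period expressed in days has to be converted first. The following sketch converts a whole number of days into an hour value such as 48h. The function name days_to_log_since is hypothetical, and the sketch assumes only the hour format shown above (for example, 48h).

```shell
#!/bin/sh
# Sketch: convert a whole number of days into an hours value for --log-since.
# days_to_log_since is a hypothetical helper name.
days_to_log_since() {
    days=$1

    # Accept only plain non-negative integers without leading zeros.
    case "$days" in
        ''|*[!0-9]*|0?*)
            echo "expected a whole number of days, got: $days" >&2
            return 1
            ;;
    esac

    echo "$((days * 24))h"
}
```

For example, --log-since="$(days_to_log_since 2)" passes 48h, which limits collection to the last two days.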

Perform a dry run for a snapshot

You can use the --dry-run flag to show the actions to be taken and the snapshot configuration.

To perform a dry run on your admin cluster, enter the following command:

gkectl diagnose snapshot --kubeconfig=ADMIN_CLUSTER_KUBECONFIG \
    --cluster-name=ADMIN_CLUSTER_NAME \
    --dry-run

To perform a dry run on a user cluster, enter the following command:

gkectl diagnose snapshot --kubeconfig=ADMIN_CLUSTER_KUBECONFIG \
    --cluster-name=USER_CLUSTER_NAME \
    --dry-run

Use a snapshot configuration

If neither scenario (system or all) meets your needs, you can create a customized snapshot by passing a snapshot configuration file with the --snapshot-config flag:

gkectl diagnose snapshot --kubeconfig=ADMIN_CLUSTER_KUBECONFIG \
    --cluster-name=USER_CLUSTER_NAME \
    --snapshot-config=SNAPSHOT_CONFIG_FILE
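
A snapshot configuration file is YAML. The following sketch writes a small configuration that collects only a few node commands and one node file. The field names and values are taken from the generated configuration shown later in this document. Whether gkectl accepts a configuration this short is an assumption, so run it with --dry-run first. The function name write_minimal_snapshot_config is hypothetical.

```shell
#!/bin/sh
# Sketch: write a small custom snapshot configuration file.
# write_minimal_snapshot_config is a hypothetical helper name.
# Assumption: gkectl accepts a configuration that lists only these sections.
write_minimal_snapshot_config() {
    config_file=$1

    # Empty "nodes" lists mean all nodes in the target cluster.
    cat > "$config_file" <<'EOF'
numOfParallelThreads: 10
excludeWords:
- password
nodeCommands:
- nodes: []
  commands:
  - uptime
  - ip addr
nodeFiles:
- nodes: []
  files:
  - /proc/sys/fs/file-nr
EOF
}
```

After you run write_minimal_snapshot_config SNAPSHOT_CONFIG_FILE, pass the same path to the --snapshot-config flag.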

Generate a snapshot configuration

You can generate a snapshot configuration for a given scenario by passing in the --scenario and --dry-run flags. For example, to see the snapshot configuration for the default scenario (system) of a user cluster, enter the following command:

gkectl diagnose snapshot \
    --kubeconfig=ADMIN_CLUSTER_KUBECONFIG \
    --cluster-name=USER_CLUSTER_NAME \
    --scenario=system \
    --dry-run

The output is similar to the following example:

The snapshot configuration is:
numOfParallelThreads: 10
excludeWords:
- password
kubectlCommands:
- commands:
  - kubectl version
  - kubectl cluster-info
  - kubectl top node
  - kubectl top pod -A --containers
  namespaces: []
- commands:
  - kubectl get -o wide --ignore-not-found
  - kubectl get -o yaml --ignore-not-found
  - kubectl describe
  namespaces:
  - kube-system
...
  expandResources: true
- commands:
  - kubectl logs
  namespaces:
  - kube-system
...
kindNodeCommands: []
nodeCommands:
- nodes: []
  commands:
  - uptime
  - df --all --inodes
  - ip addr
  - ip neigh
  - iptables-save --counters
  - mount
  - ip route list table all
  - top -bn1 || true
  - docker info || true
...
- nodes: []
  commands:
  - dmesg
  - systemctl status -l kubelet
  - journalctl --utc -u kubelet
  - journalctl --utc -u kubelet-monitor.service
...
nodeFiles:
- nodes: []
  files:
  - /proc/sys/fs/file-nr
  - /proc/sys/net/netfilter/nf_conntrack_max
...
- nodes: []
  files:
  - /var/log/apiserver/audit-proxy-buffer/* || true

The following information is displayed in the output:

numOfParallelThreads: Number of parallel threads used to take snapshots.
excludeWords: List of words to be excluded from the snapshot (case-insensitive). Lines containing these words are removed from snapshot results. "password" is always excluded, whether or not you specify it.
kubectlCommands: List of kubectl commands to run. The results are saved. The commands run against the corresponding namespaces. For kubectl logs commands, all Pods and containers in the corresponding namespaces are added automatically. Regular expressions are supported for specifying namespaces. If you don't specify a namespace, the default namespace is assumed.
nodeCommands: List of commands to run on the corresponding nodes. The results are saved. When nodes are not specified, all nodes in the target cluster are considered.
nodeFiles: List of files to be collected from the corresponding nodes. The files are saved. When nodes are not specified, all nodes in the target cluster are considered.
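
To use a generated configuration as a starting point for a custom one, you can save the dry-run output without its header line. The following sketch prints everything after the line "The snapshot configuration is:". It assumes that the dry run writes the configuration to standard output and that nothing except the YAML follows that line. The function name extract_snapshot_config is hypothetical.

```shell
#!/bin/sh
# Sketch: keep only the YAML that follows the header line in dry-run output.
# extract_snapshot_config is a hypothetical helper name.
# Assumption: the configuration is the last thing printed to standard output.
extract_snapshot_config() {
    # The first rule prints lines once the header has been seen; the second
    # rule sets the flag on the header line itself, so the header is skipped.
    awk 'found { print } /^The snapshot configuration is:/ { found = 1 }'
}
```

For example, you can pipe the dry-run command shown earlier into extract_snapshot_config, redirect the result to SNAPSHOT_CONFIG_FILE, edit that file, and then pass it to the --snapshot-config flag.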

What's next

If you need additional assistance, reach out to Cloud Customer Care.

You can also see Getting support for more information about support resources, including the following:

Requirements for opening a support case.
Tools to help you troubleshoot, such as logs and metrics.
Supported components, versions, and features of Google Distributed Cloud for VMware (software only).
