Learn about steps that you might find helpful when troubleshooting using GKE on AWS.
Overview
This topic is divided into the following sections:
- Getting information on your cluster to understand the operations of your GKE on AWS installation.
- Troubleshooting steps for common issues.
- "snapshot" of your cluster's status to provide information to Google Cloud Support.
Getting information on your cluster
This section explains how to retrieve information on your GKE on AWS installation for troubleshooting problems.
Getting information from the management cluster
You can query your management service for information on your user clusters.
From your
anthos-aws
directory, useanthos-gke
to switch context to your management service.cd
anthos-aws anthos-gke aws management get-credentialsUse
kubectl get
to retrieve a cluster's basic status information.env HTTPS_PROXY=http://localhost:8118 \ kubectl get AWSCluster
The output includes each cluster's name, current state, age, version, and endpoint.
NAME STATE AGE VERSION ENDPOINT cluster-0 Provisioning 2m41s 1.25.5-gke.2100 gke-<var>endpoint</var>.elb.us-east-1.amazonaws.com
You can retrieve more information about a cluster with kubectl describe
.
env HTTPS_PROXY=http://localhost:8118 \
kubectl describe AWSCluster cluster-0
There are three main sections in this output:
Spec
lists the cluster's initial declarative configuration.Status
contains the cluster's state and the AWS resources created by the management service.Events
includes all recent actions or errors. This log is invaluable for debugging your cluster.
The output resembles the following.
Name: cluster-0
Namespace: default
Kind: AWSCluster
Metadata:
Creation Timestamp: 0000-00-00T00:00:00Z
...
Spec:
Control Plane:
Etcd:
Main Volume:
Size GiB: 10
Iam Instance Profile: gke-node
Instance Type: t2.medium
Key Name: gke-key
Root Volume:
Size GiB: 10
Subnet IDs:
subnet-0000
Version: 0.00.0-gke.00
Networking:
Pod Address CIDR Blocks:
0.0.0.0/0
...
Status:
Admin Cert Secret Name: gke-0000-admin-cert
API DNS Name: gke-0000-controlplane-0000.elb.region.amazonaws.com
Control Plane Security Group ID: sg-0000
Gke Hub Membership Name: projects/global/memberships/gke-0000-cluster
Listener ARN: arn:aws:elasticloadbalancing:region:0000:listener/net/gke-0000
Load Balancer Name: gke-0000-controlplane
Node Pool Security Group ID: sg-0000
Provisioning Info:
Addons Installed: false
Gke Hub Membership Installed: false
Target Version: 0.00.0-gke.0
Replica Status:
Auto Scaling Group Name: gke-0000-controlplane-0
Etcd Main Volume ID: vol-0000
Launch Template Name: gke-0000-controlplane-0-0.00.0-gke.0
Network Interface ID: eni-0000
Private IP Address: 0.0.0.0
Replica: 0
...
Root CA Secret Name: gke-0000-root-ca
State: Provisioning
Target Group Name: gke-0000-controlplane
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal CreatedSecurityGroup 1m36s cluster-operator Created security group named "gke-0000"
Normal CreatedEtcdVolume 1m35s cluster-operator Created etcd volume on replica 0
Normal CreatedEtcdVolume 1m35s cluster-operator Created etcd volume on replica 2
Normal CreatedEtcdVolume 1m35s cluster-operator Created etcd volume on replica 1
Normal CreatedNetworkLoadBalancer 1m34s cluster-operator Created network load balancer named "gke-0000-controlplane"
Normal CreatedTargetGroup 1m34s cluster-operator Created target group named "gke-0000-controlplane"
Normal CreatedRootCASecret 1m34s cluster-operator Created root CA secret named "default/gke-0000-root-ca"
Normal CreatedGKEHubMembership 1m33s cluster-operator Created GKE Hub membership named "projects/global/memberships/gke-0000-cluster"
Normal CreatedNetworkInterface 1m30s cluster-operator Created network interface on replica 2
Getting your user cluster IDs
To configure certain GKE on AWS features, you might need to get your user cluster IDs.
To get your GKE on AWS user cluster ID, perform the following steps:
From your
anthos-aws
directory, useanthos-gke
to switch context to your management service.cd
anthos-aws anthos-gke aws management get-credentialsUse
kubectl get awscluster
to retrieve your cluster IDs.env HTTPS_PROXY=http://localhost:8118 \ kubectl get awscluster -o jsonpath={.items..status.clusterID}
The output contains your cluster IDs.
Events
An Event is a message attached to a Kubernetes object. Events describe actions
that affect the resource's status. The management service attaches Events to
your AWSCluster
and AWSNodePool
objects. Events provide a log of the steps
taken to create, upgrade, resize, or delete your cluster. Use kubectl get
events
to list events.
env HTTPS_PROXY=http://localhost:8118 \
kubectl get events
Example Output
LAST SEEN TYPE REASON OBJECT MESSAGE
27s Normal CreatingCluster awscluster/cluster-0 Cluster version 1.25.5-gke.2100 is being created
24s Normal CreatedSecurityGroup awscluster/cluster-0 Created security group named "gke-123456a7-controlplane"
24s Normal CreatedSecurityGroup awscluster/cluster-0 Created security group named "gke-123456a7-nodepool"
23s Normal CreatedEtcdVolume awscluster/cluster-0 Created etcd volume on replica 0
23s Normal CreatedEtcdVolume awscluster/cluster-0 Created etcd volume on replica 1
23s Normal CreatedEtcdVolume awscluster/cluster-0 Created etcd volume on replica 2
23s Normal CreatedNetworkLoadBalancer awscluster/cluster-0 Created network load balancer named "gke-123456a7-controlplane"
23s Normal CreatedTargetGroup awscluster/cluster-0 Created target group named "gke-123456a7-controlplane"
23s Normal CreatedRootCASecret awscluster/cluster-0 Created root CA secret named "default/gke-123456a7-api-server-ca"
22s Normal CreatedGKEHubMembership awscluster/cluster-0 Created GKE Hub membership named "projects/global/memberships/gke-123456a7-cluster"
20s Normal CreatedNetworkInterface awscluster/cluster-0 Created network interface on replica 0
20s Normal CreatedNetworkInterface awscluster/cluster-0 Created network interface on replica 1
20s Normal CreatedNetworkInterface awscluster/cluster-0 Created network interface on replica 2
20s Normal CreatedAdminCertSecret awscluster/cluster-0 Created admin certificate secret named "default/gke-123456a7-admin-cert"
27s Normal StartedNodePoolProvisioning awsnodepool/pool-0 Started node pool provisioning
13s Normal CreatedAutoScalingGroup awsnodepool/pool-0 Created auto scaling group named "gke-123456a7-nodepool-8b269fb0"
Operator Logs
Events list high-level actions taken by the operator. You
might need to observe lower-level activity such as stack traces. The
management service runs a static pod called gke-aws-cluster-operator
. This
application is an
operator
that contains the core cluster management logic.
In the following steps, you use the
crictl
tool to examine logs from the gke-aws-cluster-operator
.
Change to the directory with your GKE on AWS configuration. You created this directory when Installing the management service.
cd
anthos-aws Locate the management EC2 instance's DNS. Choose your version of Terraform, then run the following commands:
export CLUSTER_ID=$(terraform output cluster_id) export MANAGEMENT_IP=$(aws ec2 describe-instances \ --filters "Name=tag:Name,Values=$CLUSTER_ID-management-0" \ --query "Reservations[*].Instances[*].PrivateIpAddress" \ --output text)
export CLUSTER_ID=$(terraform output -raw cluster_id) export MANAGEMENT_IP=$(aws ec2 describe-instances \ --filters "Name=tag:Name,Values=$CLUSTER_ID-management-0" \ --query "Reservations[*].Instances[*].PrivateIpAddress" \ --output text)
If you are using a bastion host, find the bastion host's DNS.
export BASTION_DNS=$(terraform output bastion_dns_name)
export BASTION_DNS=$(terraform output -raw bastion_dns_name)
SSH through to the management instance. Choose whether you have a direct connection or use a bastion host.
ssh -i ~/.ssh/anthos-gke ubuntu@$MANAGEMENT_IP
ssh -i ~/.ssh/anthos-gke -J ubuntu@$BASTION_DNS ubuntu@$MANAGEMENT_IP
Get the container ID of the
gke-aws-cluster-operator
pod.export POD_ID=$(sudo crictl pods --name gke-aws-cluster-operator --quiet) export CONTAINER_ID=$(sudo crictl ps --pod $POD_ID --latest --quiet)
Print the pod's logs.
sudo crictl logs $CONTAINER_ID
Terraform
The anthos-gke
command outputs Terraform to
provision a management service.
You can use the terraform state
command to list infrastructure managed in
Terraform's
state.
terraform state list
Example Output
module.gke_dedicated_vpc.module.gke_bastion.aws_security_group.this
module.gke_dedicated_vpc.module.gke_bastion_security_group_rules.aws_security_group_rule.allow_http_outbound
module.gke_dedicated_vpc.module.gke_bastion_security_group_rules.aws_security_group_rule.allow_https_outbound
module.gke_dedicated_vpc.module.gke_bastion_security_group_rules.aws_security_group_rule.allow_ssh_inbound
module.gke_dedicated_vpc.module.gke_bastion_security_group_rules.aws_security_group_rule.allow_ssh_outbound
module.gke_dedicated_vpc.module.gke_controlplane_iam_policies.data.aws_iam_policy_document.this
module.gke_dedicated_vpc.module.gke_controlplane_iam_policies.aws_iam_role_policy.this
module.gke_dedicated_vpc.module.gke_controlplane_iam_role.data.aws_iam_policy_document.assume_role_policy
module.gke_dedicated_vpc.module.gke_controlplane_iam_role.aws_iam_instance_profile.this
module.gke_dedicated_vpc.module.gke_controlplane_iam_role.aws_iam_role.this
module.gke_dedicated_vpc.module.gke_management.data.aws_ami.this
module.gke_dedicated_vpc.module.gke_management.data.aws_iam_policy_document.assume_role_policy
module.gke_dedicated_vpc.module.gke_management.data.aws_subnet.this[0]
module.gke_dedicated_vpc.module.gke_management.aws_autoscaling_group.this[0]
module.gke_dedicated_vpc.module.gke_management.aws_ebs_volume.main[0]
module.gke_dedicated_vpc.module.gke_management.aws_iam_instance_profile.this
module.gke_dedicated_vpc.module.gke_management.aws_iam_role.this
module.gke_dedicated_vpc.module.gke_management.aws_launch_template.this[0]
module.gke_dedicated_vpc.module.gke_management.aws_lb.this
module.gke_dedicated_vpc.module.gke_management.aws_lb_listener.this
module.gke_dedicated_vpc.module.gke_management.aws_lb_target_group.this
module.gke_dedicated_vpc.module.gke_management.aws_security_group.this
module.gke_dedicated_vpc.module.gke_management_iam_policies.data.aws_iam_policy_document.this
module.gke_dedicated_vpc.module.gke_management_iam_policies.aws_iam_role_policy.this
module.gke_dedicated_vpc.module.gke_management_security_group_rules.aws_security_group_rule.allow_cidr_https_inbound[0]
module.gke_dedicated_vpc.module.gke_management_security_group_rules.aws_security_group_rule.allow_http_outbound
module.gke_dedicated_vpc.module.gke_management_security_group_rules.aws_security_group_rule.allow_https_inbound[0]
module.gke_dedicated_vpc.module.gke_management_security_group_rules.aws_security_group_rule.allow_https_outbound
module.gke_dedicated_vpc.module.gke_management_security_group_rules.aws_security_group_rule.allow_ssh_inbound[0]
module.gke_dedicated_vpc.module.gke_nodepool_iam_policies.data.aws_iam_policy_document.this
module.gke_dedicated_vpc.module.gke_nodepool_iam_policies.aws_iam_role_policy.this
module.gke_dedicated_vpc.module.gke_nodepool_iam_role.data.aws_iam_policy_document.assume_role_policy
module.gke_dedicated_vpc.module.gke_nodepool_iam_role.aws_iam_instance_profile.this
module.gke_dedicated_vpc.module.gke_nodepool_iam_role.aws_iam_role.this
module.gke_dedicated_vpc.module.gke_vpc.aws_eip.nat[0]
module.gke_dedicated_vpc.module.gke_vpc.aws_internet_gateway.this
module.gke_dedicated_vpc.module.gke_vpc.aws_nat_gateway.this[0]
module.gke_dedicated_vpc.module.gke_vpc.aws_route.private_nat_gateway[0]
module.gke_dedicated_vpc.module.gke_vpc.aws_route.public_internet_gateway
module.gke_dedicated_vpc.module.gke_vpc.aws_route_table.private[0]
module.gke_dedicated_vpc.module.gke_vpc.aws_route_table.public
module.gke_dedicated_vpc.module.gke_vpc.aws_route_table_association.private[0]
module.gke_dedicated_vpc.module.gke_vpc.aws_route_table_association.public[0]
module.gke_dedicated_vpc.module.gke_vpc.aws_subnet.private[0]
module.gke_dedicated_vpc.module.gke_vpc.aws_subnet.public[0]
module.gke_dedicated_vpc.module.gke_vpc.aws_vpc.this
We can take a closer look at any resource with terraform state show
command.
For example, let's examine the load balancer in front of the management service.
terraform state show module.gke_dedicated_vpc.module.gke_management.aws_lb.this
Example Output
resource "aws_lb" "this" {
arn = "arn:aws:elasticloadbalancing:aws-region :aws-account :loadbalancer/net/gke-12345678-management/arn-id "
arn_suffix = "net/gke-12345678-management/abcde1234"
dns_name = "exampledns .elb.ca-central-1.amazonaws.com"
enable_cross_zone_load_balancing = true
enable_deletion_protection = false
id = "arn:aws:elasticloadbalancing:aws-region :aws-account :loadbalancer/net/gke-12345678-management/arn-id "
internal = true
ip_address_type = "ipv4"
load_balancer_type = "network"
name = "gke-12345678-management"
security_groups = []
subnets = [
"subnet-0f77f5a97beb42e6d",
]
vpc_id = "vpc-0a123456789b"
zone_id = "Z2EPGBW3API2WT"
access_logs {
enabled = false
}
subnet_mapping {
subnet_id = "subnet-0f77f5a97beb42e6d"
}
}
Troubleshooting steps
This section lists specific troubleshooting steps for issues that might arise with GKE on AWS.
Terraform errors
This section lists errors that might arise in setting up your management service
with the anthos-gke
and terraform
command line tools.
NoCredentialProviders
If you receive the following error, confirm you have a version of Terraform higher than v0.12.28. If you do not have a compatible version of Terraform, download and install it.
Error: cannot determine availability zones: NoCredentialProviders: no valid providers in chain. Deprecated. For verbose messaging see aws.Config.CredentialsChainVerboseErrors
Failed to query available providers
If you receive the following error, you might have upgraded your version of Terraform.
exit status 1: Error: Failed to query available provider packages
Could not retrieve the list of available versions for provider hashicorp/aws:
locked provider registry.terraform.io/hashicorp/aws 2.70.0 does not match
configured version constraint 3.26.0; must use terraform init -upgrade to
allow selection of new versions
To fix this error, delete the .terraform.lock.hcl
file from your anthos-aws
directory.
Unknown token
If you receive the following error when creating a management service:
Error: error running 'terraform init -input=false -no-color' exit status 1:
There are some problems with the configuration, described below.
The Terraform configuration must be valid before initialization so that
Terraform can determine which modules and providers need to be installed.
Error: Error parsing /home/user/aws/main.tf: At 15:12: Unknown token:
15:12 IDENT var.region
Confirm you have a version of Terraform higher than v0.12.28. If you do not have a compatible version of Terraform, download and install it.
Invalid legacy provider address
If you receive the following error, you must upgrade your Terraform binary through each minor version in order.
Error: error running 'terraform init -input=false -no-color'.
exit status 1: Error: Invalid legacy provider address
This configuration or its associated state refers to the unqualified provider
"aws".
You must complete the Terraform 0.13 upgrade process before upgrading to later
versions.
For example, if you want to upgrade Terraform from v0.12.x to v0.14.x, you must
install v0.13.x temporarily. After installing a v0.13.x, run anthos-gke aws
management init
and anthos-gke aws management apply
. You can then upgrade to
v0.14.x.
Cannot connect to bastion host
If you receive the error bind [::1]:8118: Cannot assign requested address
when trying to connect to your bastion host, you may not be able to connect
using IPv4. Force an IPv4 connection with the following command:
./bastion-tunnel.sh -N -4
Unable to connect to the server
If you receive the error Unable to connect to the server
when trying to run
a kubectl
command on your user cluster, check the following:
- Your private subnets all have routes between them.
- Your private subnets have access to an AWS NAT gateway.
- If you are using
kubectl
1.21, change theHTTP_PROXY
variable toHTTPS_PROXY
.
Connect cannot access the project
During some Connect operations when registering your cluster, you might observe an error similar to the following:
ERROR: (gcloud.container.hub.memberships.register) failed to initialize Default Feature "authorizer", the fleet service account (service-PROJECT_NUMBER@gcp-sa-gkehub.iam.gserviceaccount.com) may not have access to your project
For more information, see the Fleet cannot access the project entry under Multicluster Management Troubleshooting.
Permissions errors
If you receive an authentication error when using the anthos-gke
command-line
tool, for example:
oauth2: cannot fetch token: 400 Bad Request
Re-authenticate your Google Cloud CLI installation with
gcloud auth application-default login
.
kubectl
cannot connect to your cluster
If you receive the following error when trying to connect to your cluster with
kubectl
:
Error: could not get token: NoCredentialProviders: no valid providers in chain. Deprecated.
For verbose messaging see aws.Config.CredentialsChainVerboseErrors
Solution
Your AWS CLI credentials might have expired. Confirm the credentials are valid
with aws ec2 describe instances
.
Regenerate your credentials with aws configure
. If you use multi-factor
authentication (MFA),
regenerate your AWS MFA token.
An AWSCluster
stays Provisioning
If the state of a cluster in kubectl get awsclusters
or
kubectl describe AWSCluster cluster-name
does not change
from Provisioning
, check the value of your AWSCluster
's
spec.controlPlane.hub.membershipName
field. The value of this field must be
unique.
kubectl get awsclusters
NAME STATE AGE VERSION ENDPOINT
awscluster.multicloud.cluster.gke.io/cluster-0 Provisioning 8h 1.25.5-gke.2100 gke-123456a7-controlplane-abcdefg12345.elb.us-east-1.amazonaws.com
Solutions
To fix the problem, try the following steps:
- Confirm that you are running a supported version of Terraform.
- If you created more than one user cluster in your AWS account, edit your
AWSCluster definition.
Change
spec.controlPlane.hub.membershipName
to a unique value. Apply the change withkubectl
.
TLS errors when connecting to your cluster
If you receive the following error when connecting to your cluster:
error dialing backend: remote error: tls: internal error
Check that your AWS VPC
DHCP options set
does not have a custom value for domain-name
. When a custom domain is set,
GKE on AWS cannot generate a certificate signing request (CSR).
Solution
To fix the problem, perform one of the following steps:
Set
enableDnsHostnames
to false on your VPC. If you create a dedicated AWS VPC withanthos-gke
, you can set this value in the.terraform/modules/gke_vpc/modules/gke-vpc/main.tf
file.Update the value of
domain-name
to the default.
Rebooting your control plane
If your control plane becomes unresponsive, you can reboot its instances. When you reboot control plane instances, your configuration remains as state is saved on persistent disks.
- From your
anthos-aws
directory, useanthos-gke
to switch context to your management service.cd
anthos-aws anthos-gke aws management get-credentials Use
kubectl
to get your control plane's AWS EC2 target group from your AWSCluster.env HTTPS_PROXY=http://localhost:8118 \ kubectl get awscluster
cluster-name \ -o jsonpath='{.status.targetGroupName}{"\n"}'The output includes the name of your control plane's EC2 target group. For example,
gke-123456a7-controlplane
.Open the AWS EC2 Console. Choose Target Groups from the left pane.
Click the search bar and find your target group. Click the Name of your target group and then click Targets. The list of your control plane instances appears.
For each instance in the target group, perform the following steps:
Click the instance's Instance ID. The AWS EC2 Instances console appears.
Click on the Instance ID.
Select Actions -> Instance state -> terminate to remove the instance. EC2 automatically creates a new instance with the same EBS volume.
Return to the Target Groups page.
After you have terminated all the instances in the group, return to the Target Groups page.
In the Registered Targets section of the page, find the Status column. Each of your instances should have a Status of Healthy. If any of the instances are healthy, wait several minutes and click the Refresh (
) icon.After all instances in the target group are healthy, proceed to the next step.
Tagging subnets
GKE on AWS requires tags on subnets that contain load balancer
endpoints. GKE on AWS automatically tags all subnets specified in the
spec.Networking.ServiceLoadBalancerSubnetIDs
field of the AWSCluster
resource.
If you would like to add additional subnets to your user cluster, or if you need to re-apply tags to existing subnets, perform the following steps.
From your
anthos-aws
directory, useanthos-gke
to switch context to your management service.cd
anthos-aws anthos-gke aws management get-credentialsGet the ID of your cluster's AWS VPC with
kubectl
and store it as a variable.export VPC_ID=$(\ env HTTPS_PROXY=http://localhost:8118 \ kubectl get awscluster cluster-0 -o jsonpath='{.spec.networking.vpcID}')
Check the variables content with
echo
. The output resemblesvpc-12345678abcdef0
.echo $VPC_ID
Save your cluster ID into an environment variable.
export CLUSTER_ID=$(\ env HTTPS_PROXY=http://localhost:8118 \ kubectl get awscluster cluster-0 -o jsonpath='{.status.clusterID}')
You can check the variable with
echo
:echo $CLUSTER_ID
The response includes your cluster ID.
gke-12345678
If you installed GKE on AWS into a dedicated VPC, you can use the
aws
command-line tool to retrieve the subnet ID.Select from the following options:
- Public, if you want to expose Services on your public subnet.
- Private, if you want to expose Services on your private subnet.
Multiple subnets, if you want to expose Services on multiple subnets.
export SUBNET_ID=$(aws ec2 describe-subnets \ --filters "Name=vpc-id,Values=$VPC_ID" "Name=tag:Name,Values=*public*" \ --query "Subnets[*].SubnetId" \ --output text)
The output is an object that contains your subnet ID. It resembles
subnet-1234abcdefg
. You can check the variable withecho
:echo $SUBNET_ID
The response includes your subnet ID.
subnet-012345678abcdef
export SUBNET_ID=$(aws ec2 describe-subnets \ --filters "Name=vpc-id,Values=$VPC_ID" "Name=tag:Name,Values=*private*" \ --query "Subnets[*].SubnetId" \ --output text)
The output is an object that contains your subnet ID. It resembles
subnet-1234abcdefg
. You can check the variable withecho
:echo $SUBNET_ID
The response includes your subnet ID.
subnet-012345678abcdef
If you are using multiple subnets for your AWSNodePools (for example, if you use multiple availability zones), you need to tag your subnet IDs individually.
Retrieve your list of subnet IDs with
aws ec2 describe-subnets
.To get a list of all public subnets, run the following command:
aws ec2 describe-subnets \ --filters "Name=vpc-id,Values=$VPC_ID" "Name=tag:Name,Values=*public*" \ --query "Subnets[*].SubnetId" \ --output text
To get a list of all private subnets, run the following command:
aws ec2 describe-subnets \ --filters "Name=vpc-id,Values=$VPC_ID" "Name=tag:Name,Values=*private*" \ --query "Subnets[*].SubnetId" \ --output text
The response includes your subnet IDs.
subnet-012345678abcdef subnet-abcdef123456789 subnet-123456789abcdef
Tag your subnet with your cluster ID. If you have multiple subnets, select Multiple subnets.
aws ec2 create-tags \ --resources $SUBNET_ID \ --tags Key=kubernetes.io/cluster/$CLUSTER_ID,Value=shared
For each of your subnets, run the following command:
aws ec2 create-tags \ --resources
subnet-ids \ --tags Key=kubernetes.io/cluster/$CLUSTER_ID,Value=sharedReplace subnet-ids with the list of subnet IDs, separated by spaces. For example,
subnet-012345678abcdef subnet-abcdef123456789 subnet-123456789abcdef
.
Error when deleting your clusters
If kubectl
does not respond when you try to
delete a user cluster,
get your cluster's events. You might see the following event.
Could not delete security group: resource SECURITY_GROUP_ID has a dependent object.
Solution
Delete any AWS Elastic File System mount targets in SECURITY_GROUP_ID.
API errors
Kubernetes 1.22 deprecates and replaces several APIs. If you've upgraded your cluster to version 1.22 or later, any calls your application makes to one of the deprecated APIs will fail.
Solution
Upgrade your application to replace the deprecated API calls with their newer counterparts.
Snapshots
The anthos-gke
tool supports generating snapshots of your
GKE on AWS environment. A snapshot contains information to help
Google Cloud support reproduce and debug issues offline.
The anthos-gke
tool can generate a snapshot from either a management service
or user clusters. By default, a snapshot includes CRDs, Events, cluster
information, cluster logs, instance logs, and instance files. You can customize
the information included with a configuration file. The bundle also contains an
index.html
with links to the included files.
If you would like to troubleshoot your GKE on AWS installation before contacting Google Cloud support, skip to the following section.
Generating a snapshot
This section shows how to create a snapshot from a management or user cluster using the default configuration or a custom configuration.
Before you begin
To connect to your GKE on AWS resources, perform the following steps. Select if you have an existing AWS VPC (or direct connection to your VPC) or created a dedicated VPC when creating your management service.
If you have a direct or VPN connection to an existing VPC, omit the line
env HTTP_PROXY=http://localhost:8118
from commands in this topic.
When you create a management service in a dedicated VPC, GKE on AWS includes a bastion host in a public subnet.
To connect to your management service, perform the following steps:
Change to the directory with your GKE on AWS configuration. You created this directory when Installing the management service.
cd
anthos-aws To open the tunnel, run the
bastion-tunnel.sh
script. The tunnel forwards tolocalhost:8118
.To open a tunnel to the bastion host, run the following command:
./bastion-tunnel.sh -N
Messages from the SSH tunnel appear in this window. When you are ready to close the connection, stop the process by using Control+C or closing the window.
Open a new terminal and change into your
anthos-aws
directory.cd
anthos-aws Check that you're able to connect to the cluster with
kubectl
.env HTTPS_PROXY=http://localhost:8118 \ kubectl cluster-info
The output includes the URL for the management service API server.
From your
anthos-aws
directory, useanthos-gke
to switch context to your management service.cd
anthos-aws anthos-gke aws management get-credentials
Capture a snapshot of a management service
To capture a snapshot of a management service, use
anthos-gke aws management diagnose snapshot
.
env HTTPS_PROXY=http://localhost:8118 \
anthos-gke aws management diagnose snapshot \
--ssh-key-path ssh-key-path \
--workspace workspace
Replace:
- ssh-key-path with the path to your GKE on AWS ssh key.
By default the file is created in
.ssh/anthos-gke
. - workspace with the path to the directory where you created your GKE on AWS deployment. If a bastion host was created as part of a dedicated VPC installation, GKE on AWS uses the bastion to connect to your management service.
The output includes the log of actions and the name of the snapshot file:
2020/06/15 15:39:48 Found bastion instance in tfworkspace. IP: bastion-ip.aws-zone.compute.amazonaws.com
writing file: /tmp/kubeconfig-mgmt679794004/kubeconfig.conf
snapshot: 2020/06/15 15:39:50 Getting snapshot of controlPlane...
snapshot: 2020/06/15 15:39:50 Getting snapshot of kubectl command...
snapshot: 2020/06/15 15:39:52 Getting snapshot of control plane managed aws resources
...
ip-10-0-1-44/commands/ip_route_list_table_all.out
/tmp/tmp.Z26niLmVfU/snapshot.tar.gz
2020/06/15 15:40:04 Snapshot saved in snapshot-1592260783.tar.gz.
Capture a snapshot of a user cluster
To capture a snapshot of a user cluster, use
anthos-gke aws clusters diagnose snapshot
.
env HTTPS_PROXY=http://localhost:8118 \
anthos-gke aws clusters diagnose snapshot user-cluster-name \
--ssh-key-path ssh-key-path --workspace terraform-workspace
Replace:
- user-cluster-name is the name of your user cluster. By default,
your first user cluster is named
cluster-0
. - ssh-key-path with the path to your GKE on AWS ssh key.
By default the file is created in
.ssh/anthos-gke
. - terraform-workspace with the path to the Terraform workspace where you created your GKE on AWS deployment.
The output includes the log of actions and the name of the snapshot file:
2020/06/15 15:43:37 Found bastion instance in tfworkspace. IP: bastion-ip.aws-zone.compute.amazonaws.com
writing file: /tmp/kubeconfig-mgmt616204648/kubeconfig.conf
snapshot: 2020/06/15 15:43:40 Getting snapshot of controlPlane...
snapshot: 2020/06/15 15:43:40 Getting snapshot of kubectl command...
writing file: /tmp/kubeconfig-clustercluster-0620691367/kubeconfig.conf
snapshot: 2020/06/15 15:43:43 Getting snapshot of cluster default/cluster-0
snapshot: 2020/06/15 15:43:43 Getting snapshot of controlPlane...
snapshot: 2020/06/15 15:43:43 Getting snapshot of kubectl command...
snapshot: 2020/06/15 15:43:46 Getting snapshot of control plane managed aws resources
...
snapshot: 2020/06/15 15:43:48 Getting snapshot of node pools
snapshot: 2020/06/15 15:43:48 Getting snapshot of node pool default/pool-0
snapshot: 2020/06/15 15:43:48 Getting snapshot of node pool managed aws resources
...
2020/06/15 15:44:00 Snapshot saved in snapshot-1592261012.tar.gz.
Modifying the snapshot configuration
To modify the default snapshot configuration, first you use anthos-gke
with
the --dry-run
option and save the output to a file. Then, edit your
configuration and re-run anthos-gke
with the new configuration.
Create a configuration file for the management service
Use
anthos-gke
with the--dry-run
option and redirect the output to a file named management-config.yaml.env HTTPS_PROXY=http://localhost:8118 \ anthos-gke aws management diagnose snapshot --ssh-key-path
ssh-key-path \ --workspaceworkspace --dry-run > management-config.yaml 2>&1Replace:
- ssh-key-path with the path to your GKE on AWS ssh key.
By default the file is created in
~/.ssh/anthos-gke
. - workspace with the path to the directory where you created your GKE on AWS deployment.
- ssh-key-path with the path to your GKE on AWS ssh key.
By default the file is created in
Edit the
management-config.yaml
file in a text editor. The file contains two sections. The first section is a timestamped log fromanthos-gke
, which looks like the following:2020/06/15 15:26:51 Found bastion instance in tfworkspace. IP: bastion-ip.aws-zone.compute.amazonaws.com 2020/06/15 15:26:51 Running in dry-run mode... ...
Delete the log section. Keep the configuration after the line
The snapshot configuration is:
.The remaining contents of the file are a YAML configuration that looks like the following:
mgmtCluster:
instanceCommands:
- dmesg
- sudo crictl ps a
- systemctl status -l containerd
- journalctl --since '1 hour ago' --utc -u containerd
- systemctl status -l kubelet
- journalctl --since '1 hour ago' --utc -u kubelet
- journalctl --since '1 hour ago' --utc --boot --dmesg
- uptime
- df --all --inodes
- ip addr
- sudo iptables-save --counters
- mount
- ip route list table all
- top -bn1
- ps -edF
- ps -eo pid,tid,ppid,class,rtprio,ni,pri,psr,pcpu,stat,wchan:14,comm,args,cgroup
- chronyc sources -v
- journalctl --since '1 hour ago' --utc -u chrony
instanceFiles:
- /etc/kubernetes/manifests/
- /proc/sys/fs/file-nr
- /proc/sys/net/nf_conntrack_max
- /var/log/startup.log
- /var/log/cloud-init-output.log
- /var/log/containers
kubectlCommands:
- commands:
- kubectl get events
- kubectl version
- kubectl cluster-info
- kubectl get clusterroles -o wide
- kubectl get clusterrolebindings -o wide
- kubectl get crd -o wide
- kubectl describe clusterroles
- kubectl describe clusterrolebindings
- kubectl describe crd
- kubectl get all -o yaml
- kubectl describe all
managedAWSResources: true
numOfThreads: 10
truncate: 2000
Remove any commands you believe expose sensitive data.
Run the
anthos-gke
command and with the--snapshot-config
option andmanagement-config.yaml
.env HTTPS_PROXY=http://localhost:8118 \ anthos-gke aws management diagnose snapshot --ssh-key-path
ssh-key-path \ --snapshot-config management-config.yaml --workspaceterraform-workspace Replace:
- ssh-key-path with the path to your GKE on AWS ssh key.
By default the file is created in
.ssh/anthos-gke
. - terraform-workspace with the path to the Terraform workspace where you created your GKE on AWS deployment.
- ssh-key-path with the path to your GKE on AWS ssh key.
By default the file is created in
Create a configuration file for a user cluster
Use
anthos-gke
with the--dry-run
option and redirect the output to a file named user-config.yaml.env HTTPS_PROXY=http://localhost:8118 \ anthos-gke aws clusters diagnose snapshot
user-cluster-name \ --ssh-key-pathssh-key-path --snapshot-configsnapshot-config-path \ --workspaceworkspace --dry-run > user-config.yaml 2>&1Replace:
- user-cluster-name is the name of your user cluster. By default,
your first cluster is named
cluster-0
. - ssh-key-path with the path to your GKE on AWS ssh key.
By default the file is created in
~/.ssh/anthos-gke
. - workspace with the path to the directory where you created your GKE on AWS deployment.
- user-cluster-name is the name of your user cluster. By default,
your first cluster is named
Edit the
management-config.yaml
file in a text editor. The file contains two sections. The first section is a timestamped log fromanthos-gke
, which looks like the following:2020/06/15 15:26:51 Found bastion instance in tfworkspace. IP: bastion-ip.aws-zone.compute.amazonaws.com 2020/06/15 15:26:51 Running in dry-run mode... ...
Delete the log section. Keep the configuration after the line
The snapshot configuration is:
.The remaining contents of the file are a YAML configuration that looks like the following:
clusters:
- clusterName: cluster-0
controlPlane:
instanceCommands:
- dmesg
- sudo crictl ps a
- systemctl status -l containerd
- journalctl --since '1 hour ago' --utc -u containerd
- systemctl status -l kubelet
- journalctl --since '1 hour ago' --utc -u kubelet
- journalctl --since '1 hour ago' --utc --boot --dmesg
- uptime
- df --all --inodes
- ip addr
- sudo iptables-save --counters
- mount
- ip route list table all
- top -bn1
- sudo docker ps -a
- ps -edF
- ps -eo pid,tid,ppid,class,rtprio,ni,pri,psr,pcpu,stat,wchan:14,comm,args,cgroup
instanceFiles:
- /etc/kubernetes/manifests/
- /proc/sys/fs/file-nr
- /proc/sys/net/nf_conntrack_max
- /var/log/startup.log
- /var/log/cloud-init-output.log
- /var/log/containers
kubectlCommands:
- commands:
- kubectl get events
- kubectl version
- kubectl cluster-info
- kubectl get clusterroles -o wide
- kubectl get clusterrolebindings -o wide
- kubectl get crd -o wide
- kubectl describe clusterroles
- kubectl describe clusterrolebindings
- kubectl describe crd
- kubectl get all -o yaml
- kubectl describe all
- kubectl logs --namespace=kube-system -l k8s-app=aws-ebs-csi-driver-node --all-containers
- kubectl logs --namespace=kube-system -l k8s-app=aws-efs-csi-driver-node --all-containers
- kubectl logs --namespace=kube-system -l k8s-app=calico-node --all-containers
- kubectl logs --namespace=kube-system -l k8s-app=node-local-dns --all-containers
- kubectl logs --namespace=kube-system -l k8s-app=kube-proxy --all-containers
- kubectl describe nodes
managedAWSResources: true
nodePools:
- NodePoolName: ""
instanceCommands:
- dmesg
- sudo crictl ps a
- systemctl status -l containerd
- journalctl --since '1 hour ago' --utc -u containerd
- systemctl status -l kubelet
- journalctl --since '1 hour ago' --utc -u kubelet
- journalctl --since '1 hour ago' --utc --boot --dmesg
- uptime
- df --all --inodes
- ip addr
- sudo iptables-save --counters
- mount
- ip route list table all
- top -bn1
- sudo docker ps -a
- ps -edF
- ps -eo pid,tid,ppid,class,rtprio,ni,pri,psr,pcpu,stat,wchan:14,comm,args,cgroup
instanceFiles:
- /etc/kubernetes/manifests/
- /proc/sys/fs/file-nr
- /proc/sys/net/nf_conntrack_max
- /var/log/startup.log
- /var/log/cloud-init-output.log
- /var/log/containers
managedAWSResources: true
mgmtCluster:
kubectlCommands:
- commands:
- kubectl get awscluster -oyaml
numOfThreads: 10
truncate: 2000
Remove any commands you believe expose sensitive data.
Run the
anthos-gke
command and with the--snapshot-config
option anduser-config.yaml
.env HTTPS_PROXY=http://localhost:8118 \ anthos-gke aws clusters diagnose snapshot
user-cluster-name --ssh-key-path <ssh-key-path> \ --snapshot-config user-config.yaml --workspace <terraform-workspace>Replace:
- user-cluster-name is the name of your user cluster. By default,
your first cluster is named
cluster-0
. - ssh-key-path with the path to your GKE on AWS ssh key.
By default the file is created in
.ssh/anthos-gke
. - terraform-workspace with the path to the Terraform workspace where you created your GKE on AWS deployment.
- user-cluster-name is the name of your user cluster. By default,
your first cluster is named
Default snapshot contents
This section shows example file trees contained within the default snapshot for a management service and a user cluster.
The following tree is an example of the files within a snapshot for a management cluster.
.
├── index.html
├── mgmt_cluster
│ ├── controlplane-0-10.0.1.44
│ │ └── ip-10-0-1-44
│ │ ├── commands
│ │ │ ├── df_--all_--inodes.out
│ │ │ ├── dmesg.out
│ │ │ ├── ip_addr.out
│ │ │ ├── ip_route_list_table_all.out
│ │ │ ├── journalctl_--since__1_hour_ago__--utc_--boot_--dmesg.out
│ │ │ ├── journalctl_--since__1_hour_ago__--utc_-u_containerd.out
│ │ │ ├── journalctl_--since__1_hour_ago__--utc_-u_kubelet.out
│ │ │ ├── mount.out
│ │ │ ├── ps_-edF.out
│ │ │ ├── ps_-eo_pid,tid,ppid,class,rtprio,ni,pri,psr,pcpu,stat,wchan:14,comm,args,cgroup.out
│ │ │ ├── sudo_crictl_ps_a.out
│ │ │ ├── sudo_docker_ps_-a.err
│ │ │ ├── sudo_docker_ps_-a.out
│ │ │ ├── sudo_iptables-save_--counters.out
│ │ │ ├── systemctl_status_-l_containerd.out
│ │ │ ├── systemctl_status_-l_kubelet.out
│ │ │ ├── top_-bn1.out
│ │ │ └── uptime.out
│ │ └── files
│ │ ├── etc
│ │ │ └── kubernetes
│ │ │ └── manifests
│ │ │ ├── etcd.yaml
│ │ │ ├── gke-aws-cluster-operator.yaml
│ │ │ ├── kube-apiserver.yaml
│ │ │ └── kube-controller-manager.yaml
│ │ ├── proc
│ │ │ └── sys
│ │ │ ├── fs
│ │ │ │ └── file-nr
│ │ │ └── net
│ │ │ └── nf_conntrack_max
│ │ └── var
│ │ └── log
│ │ ├── cloud-init-output.log
│ │ ├── containers
│ │ │ ├── etcd-ip-10-0-1-44.ap-southeast-2.compute.internal_kube-system_kube-etcd-149e96d0b0da2250505a6b41603e57f42a5386701fa0033840e8f3b211b49733.log
│ │ │ ├── gke-aws-cluster-operator-ip-10-0-1-44.ap-southeast-2.compute.internal_kube-system_gke-aws-cluster-operator-d423d3191ce1a8c65c4a0f30f1d7598a8739c0aba65784355b28dee0d694626a.log
│ │ │ ├── kube-apiserver-ip-10-0-1-44.ap-southeast-2.compute.internal_kube-system_kube-apiserver-48061659a4b77f4b50eed5819dbfab5586dc9086fa24217cc16486bd852dfbf6.log
│ │ │ ├── kube-apiserver-ip-10-0-1-44.ap-southeast-2.compute.internal_kube-system_kube-apiserver-baf60859cd807e9325295fde7a8d3cd16c3d5e73abca87acc107cee5e08f4c1c.log
│ │ │ └── kube-controller-manager-ip-10-0-1-44.ap-southeast-2.compute.internal_kube-system_kube-controller-manager-af9b4ffb40ada3383630090948ec8133ca0e3e54c232dd3f068b3bd8bbee8f92.log
│ │ └── startup.log
│ ├── kubectl
│ │ ├── kubectl_cluster-info_--kubeconfig_.tmp.kubeconfig-mgmt609381529.kubeconfig.conf.out
│ │ ├── kubectl_describe_all_--kubeconfig_.tmp.kubeconfig-mgmt609381529.kubeconfig.conf.out
│ │ ├── kubectl_describe_clusterrolebindings_--kubeconfig_.tmp.kubeconfig-mgmt609381529.kubeconfig.conf.out
│ │ ├── kubectl_describe_clusterroles_--kubeconfig_.tmp.kubeconfig-mgmt609381529.kubeconfig.conf.out
│ │ ├── kubectl_describe_crd_--kubeconfig_.tmp.kubeconfig-mgmt609381529.kubeconfig.conf.out
│ │ ├── kubectl_get_all_-o_yaml_--kubeconfig_.tmp.kubeconfig-mgmt609381529.kubeconfig.conf.out
│ │ ├── kubectl_get_clusterrolebindings_-o_wide_--kubeconfig_.tmp.kubeconfig-mgmt609381529.kubeconfig.conf.out
│ │ ├── kubectl_get_clusterroles_-o_wide_--kubeconfig_.tmp.kubeconfig-mgmt609381529.kubeconfig.conf.out
│ │ ├── kubectl_get_crd_-o_wide_--kubeconfig_.tmp.kubeconfig-mgmt609381529.kubeconfig.conf.out
│ │ ├── kubectl_get_events_--kubeconfig_.tmp.kubeconfig-mgmt609381529.kubeconfig.conf.out
│ │ └── kubectl_version_--kubeconfig_.tmp.kubeconfig-mgmt609381529.kubeconfig.conf.out
│ └── managed_resources
│ ├── controlplane-0
│ │ ├── asg.out
│ │ └── instance-0.out
│ ├── elb.out
│ └── target_group.out
├── snapshot.config
└── snapshot.log
The following tree is an example of the files within a snapshot for a user
cluster named cluster-0
.
.
├── cluster
│ └── cluster-0
│ ├── control_plane
│ │ ├── controlplane-0-10.0.1.7
│ │ │ └── ip-10-0-1-7
│ │ │ ├── commands
│ │ │ │ ├── df_--all_--inodes.out
│ │ │ │ ├── dmesg.out
│ │ │ │ ├── ip_addr.out
│ │ │ │ ├── ip_route_list_table_all.out
│ │ │ │ ├── journalctl_--since__1_hour_ago__--utc_--boot_--dmesg.out
│ │ │ │ ├── journalctl_--since__1_hour_ago__--utc_-u_containerd.out
│ │ │ │ ├── journalctl_--since__1_hour_ago__--utc_-u_kubelet.out
│ │ │ │ ├── mount.out
│ │ │ │ ├── ps_-edF.out
│ │ │ │ ├── ps_-eo_pid,tid,ppid,class,rtprio,ni,pri,psr,pcpu,stat,wchan:14,comm,args,cgroup.out
│ │ │ │ ├── sudo_crictl_ps_a.out
│ │ │ │ ├── sudo_docker_ps_-a.err
│ │ │ │ ├── sudo_docker_ps_-a.out
│ │ │ │ ├── sudo_iptables-save_--counters.out
│ │ │ │ ├── systemctl_status_-l_containerd.out
│ │ │ │ ├── systemctl_status_-l_kubelet.out
│ │ │ │ ├── top_-bn1.out
│ │ │ │ └── uptime.out
│ │ │ └── files
│ │ │ ├── etc
│ │ │ │ └── kubernetes
│ │ │ │ └── manifests
│ │ │ │ ├── aws-ebs-csi-driver-controller.yaml
│ │ │ │ ├── aws-encryption-provider.yaml
│ │ │ │ ├── cluster-autoscaler.yaml
│ │ │ │ ├── etcd-events.yaml
│ │ │ │ ├── etcd.yaml
│ │ │ │ ├── kube-apiserver.yaml
│ │ │ │ ├── kube-controller-manager.yaml
│ │ │ │ └── kube-scheduler.yaml
│ │ │ ├── proc
│ │ │ │ └── sys
│ │ │ │ ├── fs
│ │ │ │ │ └── file-nr
│ │ │ │ └── net
│ │ │ │ └── nf_conntrack_max
│ │ │ └── var
│ │ │ └── log
│ │ │ ├── cloud-init-output.log
│ │ │ ├── containers
│ │ │ │ ├── aws-ebs-csi-driver-controller-ip-10-0-1-7.ap-southeast-2.compute.internal_kube-system_csi-attacher-218b7834cda8b4ae0f6687e06b33426ca39669a6c2652948e17746d49ed4c7c9.log
│ │ │ │ ├── aws-ebs-csi-driver-controller-ip-10-0-1-7.ap-southeast-2.compute.internal_kube-system_csi-provisioner-ff1ba1960712a00065db1e036e1aaf5aeaca0979c833d020ad1cafdea05a76c7.log
│ │ │ │ ├── aws-ebs-csi-driver-controller-ip-10-0-1-7.ap-southeast-2.compute.internal_kube-system_ebs-plugin-697389a6c73bdb4a0370a644a28617b3b8a12862341b91ca2d640aa66724affd.log
│ │ │ │ ├── aws-encryption-provider-ip-10-0-1-7.ap-southeast-2.compute.internal_kube-system_aws-encryption-provider-b08216cbca004f71e68145b9a38b931276dd9ef92d26c53b85275587ce28f3ca.log
│ │ │ │ ├── cluster-autoscaler-ip-10-0-1-7.ap-southeast-2.compute.internal_kube-system_cluster-autoscaler-57f9ca6abec10a76b42449dababea6c963853b1aa30f1db2b87d963311d03629.log
│ │ │ │ ├── cluster-autoscaler-ip-10-0-1-7.ap-southeast-2.compute.internal_kube-system_cluster-autoscaler-9c09937ddbe3220c896f857a8b8c02c84062f13092b39ebac3ab1ce26f13b317.log
│ │ │ │ ├── etcd-events-ip-10-0-1-7.ap-southeast-2.compute.internal_kube-system_kube-etcd-events-4e79334e69f670a3a4637c20635944abb71ed93d6e802407ef5881478ee78dc1.log
│ │ │ │ ├── etcd-ip-10-0-1-7.ap-southeast-2.compute.internal_kube-system_kube-etcd-e6dbe39ef969fb2f049292d4f3a66a41d22f963b40f72f5f91ad6acd9e9cde77.log
│ │ │ │ ├── kube-apiserver-ip-10-0-1-7.ap-southeast-2.compute.internal_kube-system_kube-apiserver-e61770a46518313306e1668c34e4efbdb3ed81b7f451dc3278a00a40fee09e0d.log
│ │ │ │ ├── kube-controller-manager-ip-10-0-1-7.ap-southeast-2.compute.internal_kube-system_kube-controller-manager-3b33df6a4d4cca8fd63f90d4fcbee65595e71c0390a5c29c81670d0232b98edc.log
│ │ │ │ └── kube-scheduler-ip-10-0-1-7.ap-southeast-2.compute.internal_kube-system_kube-scheduler-0aae214e17741189db8d3608275e71551f62f43619e07a37a11017b88a611970.log
│ │ │ └── startup.log
│ │ ├── controlplane-1-10.0.1.61
│ │ │ └── ip-10-0-1-61
│ │ │ ├── commands
│ │ │ │ ├── df_--all_--inodes.out
│ │ │ │ ├── dmesg.out
│ │ │ │ ├── ip_addr.out
│ │ │ │ ├── ip_route_list_table_all.out
│ │ │ │ ├── journalctl_--since__1_hour_ago__--utc_--boot_--dmesg.out
│ │ │ │ ├── journalctl_--since__1_hour_ago__--utc_-u_containerd.out
│ │ │ │ ├── journalctl_--since__1_hour_ago__--utc_-u_kubelet.out
│ │ │ │ ├── mount.out
│ │ │ │ ├── ps_-edF.out
│ │ │ │ ├── ps_-eo_pid,tid,ppid,class,rtprio,ni,pri,psr,pcpu,stat,wchan:14,comm,args,cgroup.out
│ │ │ │ ├── sudo_crictl_ps_a.out
│ │ │ │ ├── sudo_docker_ps_-a.err
│ │ │ │ ├── sudo_docker_ps_-a.out
│ │ │ │ ├── sudo_iptables-save_--counters.out
│ │ │ │ ├── systemctl_status_-l_containerd.out
│ │ │ │ ├── systemctl_status_-l_kubelet.out
│ │ │ │ ├── top_-bn1.out
│ │ │ │ └── uptime.out
│ │ │ └── files
│ │ │ ├── etc
│ │ │ │ └── kubernetes
│ │ │ │ └── manifests
│ │ │ │ ├── aws-ebs-csi-driver-controller.yaml
│ │ │ │ ├── aws-encryption-provider.yaml
│ │ │ │ ├── cluster-autoscaler.yaml
│ │ │ │ ├── etcd-events.yaml
│ │ │ │ ├── etcd.yaml
│ │ │ │ ├── kube-apiserver.yaml
│ │ │ │ ├── kube-controller-manager.yaml
│ │ │ │ └── kube-scheduler.yaml
│ │ │ ├── proc
│ │ │ │ └── sys
│ │ │ │ ├── fs
│ │ │ │ │ └── file-nr
│ │ │ │ └── net
│ │ │ │ └── nf_conntrack_max
│ │ │ └── var
│ │ │ └── log
│ │ │ ├── cloud-init-output.log
│ │ │ ├── containers
│ │ │ │ ├── aws-ebs-csi-driver-controller-ip-10-0-1-61.ap-southeast-2.compute.internal_kube-system_csi-attacher-63345efd65ea584c35f4b0d2de443bf42e83e65324e899be27335a25fe07a72c.log
│ │ │ │ ├── aws-ebs-csi-driver-controller-ip-10-0-1-61.ap-southeast-2.compute.internal_kube-system_csi-provisioner-6f66e7479c319fbcbcaf53f9b5398cd8e53bcd646fa9788afbc25a69fc9291fe.log
│ │ │ │ ├── aws-ebs-csi-driver-controller-ip-10-0-1-61.ap-southeast-2.compute.internal_kube-system_ebs-plugin-2ee649206dd099e29b8bb3cbf27bef499b851682c07590a34c2e08d9545ca51b.log
│ │ │ │ ├── aws-encryption-provider-ip-10-0-1-61.ap-southeast-2.compute.internal_kube-system_aws-encryption-provider-3d2b5c28b7389e1303d2e36dd510ec40cef99f2ea63823901ea9806869def8fa.log
│ │ │ │ ├── cluster-autoscaler-ip-10-0-1-61.ap-southeast-2.compute.internal_kube-system_cluster-autoscaler-ebc572523516845d023884810f721865c2f0a76e34aaf92babdffacf4c95f75b.log
│ │ │ │ ├── cluster-autoscaler-ip-10-0-1-61.ap-southeast-2.compute.internal_kube-system_cluster-autoscaler-f7a7480c9adb08077b9a07d2f14e2b14dda7b4d50027cf105d90f37c63945bfa.log
│ │ │ │ ├── etcd-events-ip-10-0-1-61.ap-southeast-2.compute.internal_kube-system_kube-etcd-events-60301eef32b7c10d0aea3de19549bfdcc3e4152cf3ca8ca7d5e10785e2e232fd.log
│ │ │ │ ├── etcd-ip-10-0-1-61.ap-southeast-2.compute.internal_kube-system_kube-etcd-25a369f08d0d2bbc9f8f83337724e14c9878a1a0249cc5e9c7c63cae3d3657a1.log
│ │ │ │ ├── kube-apiserver-ip-10-0-1-61.ap-southeast-2.compute.internal_kube-system_kube-apiserver-13a276422964d2674f16d971bafcd25555eee3eb10b6a6f60686e8b8810a5def.log
│ │ │ │ ├── kube-controller-manager-ip-10-0-1-61.ap-southeast-2.compute.internal_kube-system_kube-controller-manager-18625e69e9604fbacbe82aebc6dc18dd4427269941a25a9bdef3fc0e5a4dfb9e.log
│ │ │ │ └── kube-scheduler-ip-10-0-1-61.ap-southeast-2.compute.internal_kube-system_kube-scheduler-12f48aad99ecc18b450ebcda85ffb7f138bbb6bc261fb06e75ae1548647eaa45.log
│ │ │ └── startup.log
│ │ ├── controlplane-2-10.0.1.161
│ │ │ └── ip-10-0-1-161
│ │ │ ├── commands
│ │ │ │ ├── df_--all_--inodes.out
│ │ │ │ ├── dmesg.out
│ │ │ │ ├── ip_addr.out
│ │ │ │ ├── ip_route_list_table_all.out
│ │ │ │ ├── journalctl_--since__1_hour_ago__--utc_--boot_--dmesg.out
│ │ │ │ ├── journalctl_--since__1_hour_ago__--utc_-u_containerd.out
│ │ │ │ ├── journalctl_--since__1_hour_ago__--utc_-u_kubelet.out
│ │ │ │ ├── mount.out
│ │ │ │ ├── ps_-edF.out
│ │ │ │ ├── ps_-eo_pid,tid,ppid,class,rtprio,ni,pri,psr,pcpu,stat,wchan:14,comm,args,cgroup.out
│ │ │ │ ├── sudo_crictl_ps_a.out
│ │ │ │ ├── sudo_docker_ps_-a.err
│ │ │ │ ├── sudo_docker_ps_-a.out
│ │ │ │ ├── sudo_iptables-save_--counters.out
│ │ │ │ ├── systemctl_status_-l_containerd.out
│ │ │ │ ├── systemctl_status_-l_kubelet.out
│ │ │ │ ├── top_-bn1.out
│ │ │ │ └── uptime.out
│ │ │ └── files
│ │ │ ├── etc
│ │ │ │ └── kubernetes
│ │ │ │ └── manifests
│ │ │ │ ├── aws-ebs-csi-driver-controller.yaml
│ │ │ │ ├── aws-encryption-provider.yaml
│ │ │ │ ├── cluster-autoscaler.yaml
│ │ │ │ ├── etcd-events.yaml
│ │ │ │ ├── etcd.yaml
│ │ │ │ ├── kube-apiserver.yaml
│ │ │ │ ├── kube-controller-manager.yaml
│ │ │ │ └── kube-scheduler.yaml
│ │ │ ├── proc
│ │ │ │ └── sys
│ │ │ │ ├── fs
│ │ │ │ │ └── file-nr
│ │ │ │ └── net
│ │ │ │ └── nf_conntrack_max
│ │ │ └── var
│ │ │ └── log
│ │ │ ├── cloud-init-output.log
│ │ │ ├── containers
│ │ │ │ ├── aws-ebs-csi-driver-controller-ip-10-0-1-161.ap-southeast-2.compute.internal_kube-system_csi-attacher-0d66d0e6d7ead9a0af3ee2b9ea7769669a33636639549571ed10eaacf7ddd85b.log
│ │ │ │ ├── aws-ebs-csi-driver-controller-ip-10-0-1-161.ap-southeast-2.compute.internal_kube-system_csi-provisioner-c68d3acbdf8f319fe1d700eb3584fd07016d7a8b507e05261b1596fb96ca7598.log
│ │ │ │ ├── aws-ebs-csi-driver-controller-ip-10-0-1-161.ap-southeast-2.compute.internal_kube-system_ebs-plugin-66c248fc6c21021355ad5aa20ec98894b3370d1b58d86d3bf4b794bfb971eaef.log
│ │ │ │ ├── aws-encryption-provider-ip-10-0-1-161.ap-southeast-2.compute.internal_kube-system_aws-encryption-provider-f79e0367f399331652f7beb9145e97eefe95a635a3101ffb73bb8c29d449304e.log
│ │ │ │ ├── cluster-autoscaler-ip-10-0-1-161.ap-southeast-2.compute.internal_kube-system_cluster-autoscaler-4ce584e9f04e3c545168a01a84b4a697a199e4ff536477d8cb884f89ab65872c.log
│ │ │ │ ├── cluster-autoscaler-ip-10-0-1-161.ap-southeast-2.compute.internal_kube-system_cluster-autoscaler-a384f6202e6b7f0a3d5918adc87a8acf158a4e5d13401825a129188663cf32d7.log
│ │ │ │ ├── etcd-events-ip-10-0-1-161.ap-southeast-2.compute.internal_kube-system_kube-etcd-events-4975f4f7ea692237be1016e2c03e024ca58cc78745b482ca41fe80481c425f28.log
│ │ │ │ ├── etcd-ip-10-0-1-161.ap-southeast-2.compute.internal_kube-system_kube-etcd-92a56adf23ce887f032335ccc2ebd48e39de6ddd600302fe985d3b088e8d4eea.log
│ │ │ │ ├── kube-apiserver-ip-10-0-1-161.ap-southeast-2.compute.internal_kube-system_kube-apiserver-76551958905114e0eaa056c0e3eb7cc0af7d9f6291af9efe49bbab95250500ce.log
│ │ │ │ ├── kube-controller-manager-ip-10-0-1-161.ap-southeast-2.compute.internal_kube-system_kube-controller-manager-d50c38a03f5e01ca438508db11091e9421fa8eb7231f484303a350a7b0538439.log
│ │ │ │ └── kube-scheduler-ip-10-0-1-161.ap-southeast-2.compute.internal_kube-system_kube-scheduler-7ebaccccbf67c06d379b1541f1970e4e987de138556542469cc24aacea1c9213.log
│ │ │ └── startup.log
│ │ ├── kubectl
│ │ │ ├── kubectl_cluster-info_--kubeconfig_.tmp.kubeconfig-clustercluster-0904143419.kubeconfig.conf.out
│ │ │ ├── kubectl_describe_all_--kubeconfig_.tmp.kubeconfig-clustercluster-0904143419.kubeconfig.conf.out
│ │ │ ├── kubectl_describe_clusterrolebindings_--kubeconfig_.tmp.kubeconfig-clustercluster-0904143419.kubeconfig.conf.out
│ │ │ ├── kubectl_describe_clusterroles_--kubeconfig_.tmp.kubeconfig-clustercluster-0904143419.kubeconfig.conf.out
│ │ │ ├── kubectl_describe_crd_--kubeconfig_.tmp.kubeconfig-clustercluster-0904143419.kubeconfig.conf.out
│ │ │ ├── kubectl_get_all_-o_yaml_--kubeconfig_.tmp.kubeconfig-clustercluster-0904143419.kubeconfig.conf.out
│ │ │ ├── kubectl_get_clusterrolebindings_-o_wide_--kubeconfig_.tmp.kubeconfig-clustercluster-0904143419.kubeconfig.conf.out
│ │ │ ├── kubectl_get_clusterroles_-o_wide_--kubeconfig_.tmp.kubeconfig-clustercluster-0904143419.kubeconfig.conf.out
│ │ │ ├── kubectl_get_crd_-o_wide_--kubeconfig_.tmp.kubeconfig-clustercluster-0904143419.kubeconfig.conf.out
│ │ │ ├── kubectl_get_events_--kubeconfig_.tmp.kubeconfig-clustercluster-0904143419.kubeconfig.conf.out
│ │ │ └── kubectl_version_--kubeconfig_.tmp.kubeconfig-clustercluster-0904143419.kubeconfig.conf.out
│ │ └── managed_resources
│ │ ├── controlplane-0
│ │ │ ├── asg.out
│ │ │ ├── eni.out
│ │ │ ├── etcd.out
│ │ │ └── instance-0.out
│ │ ├── controlplane-1
│ │ │ ├── asg.out
│ │ │ ├── eni.out
│ │ │ ├── etcd.out
│ │ │ └── instance-0.out
│ │ ├── controlplane-2
│ │ │ ├── asg.out
│ │ │ ├── eni.out
│ │ │ ├── etcd.out
│ │ │ └── instance-0.out
│ │ ├── elb.out
│ │ └── target_group.out
│ └── nodepools
│ ├── default
│ │ └── pool-0
│ │ ├── i-03a23df438ac8278e
│ │ │ └── ip-10-0-1-53
│ │ │ ├── commands
│ │ │ │ ├── df_--all_--inodes.out
│ │ │ │ ├── dmesg.out
│ │ │ │ ├── ip_addr.out
│ │ │ │ ├── ip_route_list_table_all.out
│ │ │ │ ├── journalctl_--since__1_hour_ago__--utc_--boot_--dmesg.out
│ │ │ │ ├── journalctl_--since__1_hour_ago__--utc_-u_containerd.out
│ │ │ │ ├── journalctl_--since__1_hour_ago__--utc_-u_kubelet.out
│ │ │ │ ├── mount.out
│ │ │ │ ├── ps_-edF.out
│ │ │ │ ├── ps_-eo_pid,tid,ppid,class,rtprio,ni,pri,psr,pcpu,stat,wchan:14,comm,args,cgroup.out
│ │ │ │ ├── sudo_crictl_ps_a.out
│ │ │ │ ├── sudo_docker_ps_-a.err
│ │ │ │ ├── sudo_docker_ps_-a.out
│ │ │ │ ├── sudo_iptables-save_--counters.out
│ │ │ │ ├── systemctl_status_-l_containerd.out
│ │ │ │ ├── systemctl_status_-l_kubelet.out
│ │ │ │ ├── top_-bn1.out
│ │ │ │ └── uptime.out
│ │ │ └── files
│ │ │ ├── proc
│ │ │ │ └── sys
│ │ │ │ ├── fs
│ │ │ │ │ └── file-nr
│ │ │ │ └── net
│ │ │ │ └── nf_conntrack_max
│ │ │ └── var
│ │ │ └── log
│ │ │ ├── cloud-init-output.log
│ │ │ ├── containers
│ │ │ │ ├── aws-ebs-csi-driver-node-mwxhs_kube-system_ebs-plugin-696201b4997d5cc72d85e6b005faa544ab2311571b50d2b1402b6e967a9364f0.log
│ │ │ │ ├── aws-ebs-csi-driver-node-mwxhs_kube-system_node-driver-registrar-fcde7a18980aee3c690a84953f44341df9755e28ada6a42a6aea4c1b9d6cdd8e.log
│ │ │ │ ├── calico-node-2g6zt_kube-system_calico-node-f6c22e30079cff40bef7deafbdfa2a97d0c3a4a95e7b68499c917adb1aa24b09.log
│ │ │ │ ├── calico-node-2g6zt_kube-system_flexvol-driver-9c6e02ad10d342a91e2c3c3d924f747cfab756e719ffc580c073c6e3640b7515.log
│ │ │ │ ├── calico-node-2g6zt_kube-system_install-cni-3649827b98cb5b2f79f8e9204b07247ca8d2768e4d13b1a8a1359278741ed156.log
│ │ │ │ ├── calico-node-2g6zt_kube-system_upgrade-ipam-3a2fe7afee90bfe4e09b0939300be89f4795bc7a57e8085a57cb714e015092f6.log
│ │ │ │ ├── coredns-88cd756b8-thm49_kube-system_coredns-6485d1e189c7b11fdc6249316ab6082360737c67edb77ab0d426eb26dba261ee.log
│ │ │ │ ├── gke-connect-agent-20200605-02-00-8647455579-jh2r2_gke-connect_gke-connect-agent-20200605-02-00-369a43ce1bccb57bf3abfd503b9b25c81dbcd73d60a8642b68c0bb89b1b8e9fd.log
│ │ │ │ └── kube-proxy-sg5nr_kube-system_kube-proxy-44f9171a644c8d7d0497900f361faa22fc673adc8336608ced096e655ccde762.log
│ │ │ └── startup.log
│ │ ├── i-0569a9f23d49f59ea
│ │ │ └── ip-10-0-1-137
│ │ │ ├── commands
│ │ │ │ ├── df_--all_--inodes.out
│ │ │ │ ├── dmesg.out
│ │ │ │ ├── ip_addr.out
│ │ │ │ ├── ip_route_list_table_all.out
│ │ │ │ ├── journalctl_--since__1_hour_ago__--utc_--boot_--dmesg.out
│ │ │ │ ├── journalctl_--since__1_hour_ago__--utc_-u_containerd.out
│ │ │ │ ├── journalctl_--since__1_hour_ago__--utc_-u_kubelet.out
│ │ │ │ ├── mount.out
│ │ │ │ ├── ps_-edF.out
│ │ │ │ ├── ps_-eo_pid,tid,ppid,class,rtprio,ni,pri,psr,pcpu,stat,wchan:14,comm,args,cgroup.out
│ │ │ │ ├── sudo_crictl_ps_a.out
│ │ │ │ ├── sudo_docker_ps_-a.err
│ │ │ │ ├── sudo_docker_ps_-a.out
│ │ │ │ ├── sudo_iptables-save_--counters.out
│ │ │ │ ├── systemctl_status_-l_containerd.out
│ │ │ │ ├── systemctl_status_-l_kubelet.out
│ │ │ │ ├── top_-bn1.out
│ │ │ │ └── uptime.out
│ │ │ └── files
│ │ │ ├── proc
│ │ │ │ └── sys
│ │ │ │ ├── fs
│ │ │ │ │ └── file-nr
│ │ │ │ └── net
│ │ │ │ └── nf_conntrack_max
│ │ │ └── var
│ │ │ └── log
│ │ │ ├── cloud-init-output.log
│ │ │ ├── containers
│ │ │ │ ├── aws-ebs-csi-driver-node-zxxqg_kube-system_ebs-plugin-da9a84b2e45e2ad18d08cbab5260d4cee17636d868a645ab7343f50c25c64ece.log
│ │ │ │ ├── aws-ebs-csi-driver-node-zxxqg_kube-system_node-driver-registrar-f96a7dbf1bac95c41e022b2ede129c664caafa6eff37caa52f6763c1e737be1a.log
│ │ │ │ ├── calico-kube-controllers-56cd854695-mjfwx_kube-system_calico-kube-controllers-0ed4316450f5f2e6c4abfb5cc430ed18e2d530525e2ab0ed69a150eed5b3c860.log
│ │ │ │ ├── calico-node-n5klf_kube-system_calico-node-36101112d423636164c236eacca76d6814c167203cfaf89754984cd79f3b6bbf.log
│ │ │ │ ├── calico-node-n5klf_kube-system_flexvol-driver-5837e2ba75d549373ca0a3032d1be0c75c0dd442d2e25e286e9006e604794da2.log
│ │ │ │ ├── calico-node-n5klf_kube-system_install-cni-0e19cfa737dcaaf8fbc40ee2e68460ea8888829b7fab4b8733d5322c339cf838.log
│ │ │ │ ├── calico-node-n5klf_kube-system_upgrade-ipam-10c94c2fa5f67a69ad9ebeedf9764bbf566c99b50ef60f2f268d484bd028eb76.log
│ │ │ │ └── kube-proxy-pzh87_kube-system_kube-proxy-5df6d54f9ff2dd036687e064186bcfc2b7c0536fd88586b5cac9e140ffa16658.log
│ │ │ └── startup.log
│ │ └── i-05bfb8fc961337cc7
│ │ └── ip-10-0-1-167
│ │ ├── commands
│ │ │ ├── df_--all_--inodes.out
│ │ │ ├── dmesg.out
│ │ │ ├── ip_addr.out
│ │ │ ├── ip_route_list_table_all.out
│ │ │ ├── journalctl_--since__1_hour_ago__--utc_--boot_--dmesg.out
│ │ │ ├── journalctl_--since__1_hour_ago__--utc_-u_containerd.out
│ │ │ ├── journalctl_--since__1_hour_ago__--utc_-u_kubelet.out
│ │ │ ├── mount.out
│ │ │ ├── ps_-edF.out
│ │ │ ├── ps_-eo_pid,tid,ppid,class,rtprio,ni,pri,psr,pcpu,stat,wchan:14,comm,args,cgroup.out
│ │ │ ├── sudo_crictl_ps_a.out
│ │ │ ├── sudo_docker_ps_-a.err
│ │ │ ├── sudo_docker_ps_-a.out
│ │ │ ├── sudo_iptables-save_--counters.out
│ │ │ ├── systemctl_status_-l_containerd.out
│ │ │ ├── systemctl_status_-l_kubelet.out
│ │ │ ├── top_-bn1.out
│ │ │ └── uptime.out
│ │ └── files
│ │ ├── proc
│ │ │ └── sys
│ │ │ ├── fs
│ │ │ │ └── file-nr
│ │ │ └── net
│ │ │ └── nf_conntrack_max
│ │ └── var
│ │ └── log
│ │ ├── cloud-init-output.log
│ │ ├── containers
│ │ │ ├── aws-ebs-csi-driver-node-kdghk_kube-system_ebs-plugin-3e107a145cc86ac24014b1bf4670b26cb9372fd8022bc0698ca68b27e79a9bfe.log
│ │ │ ├── aws-ebs-csi-driver-node-kdghk_kube-system_node-driver-registrar-25874dd7063db875a27f170e13e74267749c0d9846781ac8ab7568ac5f940a11.log
│ │ │ ├── calico-node-b98tq_kube-system_calico-node-1dd735ce87fe6f0f73761d2d97c07ea6f908d0cd088e23f6b916b13b6805f828.log
│ │ │ ├── calico-node-b98tq_kube-system_flexvol-driver-e9951b1a3de0ed9426de260c5591b4c161b4917873f2eaaf1cbdbd3926c9f933.log
│ │ │ ├── calico-node-b98tq_kube-system_install-cni-58a2f1a5bfb16951a4b012b5ed30751d24c3380f489011274e3ca8de7eb1e1aa.log
│ │ │ ├── calico-node-b98tq_kube-system_upgrade-ipam-f9804f2eef0d18122219dbb2843880a392f55493dab0edc16bce2dc5e186fa2a.log
│ │ │ ├── kube-proxy-pf7sv_kube-system_kube-proxy-473ffafc30368b1cb370cd5cbbe4b20e77dfc383da04386d3ec02948f04bc97d.log
│ │ │ ├── metrics-server-v0.3.3-85dfcbb78-fmklb_kube-system_metrics-server-4570fa1bd82d238d0ab11fc4256e5cc9fa97accece05f7f0c02d5edab884468e.log
│ │ │ └── metrics-server-v0.3.3-85dfcbb78-fmklb_kube-system_metrics-server-nanny-71eeba81fb3cf128066ca965aabc5bfdf8e045790a46a9ec7e56e73ad3859218.log
│ │ └── startup.log
│ └── managed_resources
│ ├── asg.out
│ ├── instance-0.out
│ ├── instance-1.out
│ └── instance-2.out
├── index.html
├── mgmt_cluster
│ └── kubectl
│ └── kubectl_get_awscluster_-oyaml_--kubeconfig_.tmp.kubeconfig-mgmt786666316.kubeconfig.conf.out
├── snapshot.config
└── snapshot.log