Troubleshooting

This page describes steps that can help you troubleshoot GKE on AWS.

Overview

This topic includes the following sections:

  • Getting information on your cluster
  • Troubleshooting steps
  • Snapshots

Getting information on your cluster

This section explains how to retrieve information on your GKE on AWS installation for troubleshooting problems.

Getting information from the management cluster

You can query your management service for information on your user clusters.

  1. From your anthos-aws directory, use anthos-gke to switch context to your management service.

    cd anthos-aws
    anthos-gke aws management get-credentials

  2. Use kubectl get to retrieve a cluster's basic status information.

    env HTTPS_PROXY=http://localhost:8118 \
      kubectl get AWSCluster
    

    The output includes each cluster's name, current state, age, version, and endpoint.

    NAME        STATE          AGE     VERSION           ENDPOINT
    cluster-0   Provisioning   2m41s   1.25.5-gke.2100   gke-endpoint.elb.us-east-1.amazonaws.com
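
If the management service runs several user clusters, you might want to list only the clusters in a particular state. The following is a minimal sketch, not part of the GKE on AWS tooling. The helper name clusters_in_state is hypothetical, and the sketch assumes the column layout shown in the example output (name in the first column, state in the second).

```shell
# Print the names of clusters whose STATE column equals the given state.
# Reads the table printed by `kubectl get AWSCluster` on standard input.
clusters_in_state() {
  # NR > 1 skips the header row; $1 is NAME and $2 is STATE.
  awk -v state="$1" 'NR > 1 && $2 == state { print $1 }'
}

# Example invocation against a live management service:
#   env HTTPS_PROXY=http://localhost:8118 \
#     kubectl get AWSCluster | clusters_in_state Provisioning
```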
    

You can retrieve more information about a cluster with kubectl describe.

env HTTPS_PROXY=http://localhost:8118 \
  kubectl describe AWSCluster cluster-0

There are three main sections in this output:

  • Spec lists the cluster's initial declarative configuration.
  • Status contains the cluster's state and the AWS resources created by the management service.
  • Events includes all recent actions or errors. This log is invaluable for debugging your cluster.

The output resembles the following.

Name:         cluster-0
Namespace:    default
Kind:         AWSCluster
Metadata:
  Creation Timestamp:  0000-00-00T00:00:00Z
  ...
Spec:
  Control Plane:
    Etcd:
      Main Volume:
        Size GiB: 10
    Iam Instance Profile:  gke-node
    Instance Type:         t2.medium
    Key Name:              gke-key
    Root Volume:
      Size GiB: 10
    Subnet IDs:
      subnet-0000
    Version:  0.00.0-gke.00
  Networking:
    Pod Address CIDR Blocks:
      0.0.0.0/0
  ...
Status:
  Admin Cert Secret Name:           gke-0000-admin-cert
  API DNS Name:                     gke-0000-controlplane-0000.elb.region.amazonaws.com
  Control Plane Security Group ID:  sg-0000
  Gke Hub Membership Name:          projects/global/memberships/gke-0000-cluster
  Listener ARN:                     arn:aws:elasticloadbalancing:region:0000:listener/net/gke-0000
  Load Balancer Name:               gke-0000-controlplane
  Node Pool Security Group ID:      sg-0000
  Provisioning Info:
    Addons Installed:              false
    Gke Hub Membership Installed:  false
    Target Version:                0.00.0-gke.0
  Replica Status:
    Auto Scaling Group Name:  gke-0000-controlplane-0
    Etcd Main Volume ID:      vol-0000
    Launch Template Name:     gke-0000-controlplane-0-0.00.0-gke.0
    Network Interface ID:     eni-0000
    Private IP Address:       0.0.0.0
    Replica:                  0
    ...
  Root CA Secret Name:        gke-0000-root-ca
  State:                      Provisioning
  Target Group Name:          gke-0000-controlplane
Events:
  Type    Reason                      Age    From              Message
  ----    ------                      ----   ----              -------
  Normal  CreatedSecurityGroup        1m36s  cluster-operator  Created security group named "gke-0000"
  Normal  CreatedEtcdVolume           1m35s  cluster-operator  Created etcd volume on replica 0
  Normal  CreatedEtcdVolume           1m35s  cluster-operator  Created etcd volume on replica 2
  Normal  CreatedEtcdVolume           1m35s  cluster-operator  Created etcd volume on replica 1
  Normal  CreatedNetworkLoadBalancer  1m34s  cluster-operator  Created network load balancer named "gke-0000-controlplane"
  Normal  CreatedTargetGroup          1m34s  cluster-operator  Created target group named "gke-0000-controlplane"
  Normal  CreatedRootCASecret         1m34s  cluster-operator  Created root CA secret named "default/gke-0000-root-ca"
  Normal  CreatedGKEHubMembership     1m33s  cluster-operator  Created GKE Hub membership named "projects/global/memberships/gke-0000-cluster"
  Normal  CreatedNetworkInterface     1m30s  cluster-operator  Created network interface on replica 2
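
Because the Events section is the last of the three sections in the kubectl describe output, you can trim the Spec and Status sections when you only need recent actions and errors. The following is a small sketch. The helper name events_section is hypothetical, and it assumes that the Events: heading starts at the beginning of a line, as in the example output.

```shell
# Print everything from the "Events:" heading to the end of the input.
# Reads `kubectl describe AWSCluster ...` output on standard input.
events_section() {
  sed -n '/^Events:/,$p'
}

# Example invocation against a live management service:
#   env HTTPS_PROXY=http://localhost:8118 \
#     kubectl describe AWSCluster cluster-0 | events_section
```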

Getting your user cluster IDs

To configure certain GKE on AWS features, you might need to get your user cluster IDs.

To get your GKE on AWS user cluster ID, perform the following steps:

  1. From your anthos-aws directory, use anthos-gke to switch context to your management service.

    cd anthos-aws
    anthos-gke aws management get-credentials

  2. Use kubectl get awscluster to retrieve your cluster IDs.

    env HTTPS_PROXY=http://localhost:8118 \
      kubectl get awscluster -o jsonpath='{.items..status.clusterID}'
    

    The output contains your cluster IDs.
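
When you have more than one user cluster, kubectl prints the JSONPath results on a single line, separated by spaces. The following sketch splits that line so that each ID appears on its own line, which is easier to loop over in a script. The helper name one_per_line is hypothetical.

```shell
# Convert space-separated values on standard input to one value per line.
one_per_line() {
  # -s squeezes repeated separators so that no empty lines are produced.
  tr -s ' ' '\n'
}

# Example invocation against a live management service:
#   env HTTPS_PROXY=http://localhost:8118 \
#     kubectl get awscluster -o jsonpath='{.items..status.clusterID}' | one_per_line
```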

Events

An Event is a message attached to a Kubernetes object. Events describe actions that affect the resource's status. The management service attaches Events to your AWSCluster and AWSNodePool objects. Events provide a log of the steps taken to create, upgrade, resize, or delete your cluster. Use kubectl get events to list events.

env HTTPS_PROXY=http://localhost:8118 \
    kubectl get events

Example Output

LAST SEEN   TYPE     REASON                        OBJECT                   MESSAGE
27s         Normal   CreatingCluster               awscluster/cluster-0     Cluster version 1.25.5-gke.2100 is being created
24s         Normal   CreatedSecurityGroup          awscluster/cluster-0     Created security group named "gke-123456a7-controlplane"
24s         Normal   CreatedSecurityGroup          awscluster/cluster-0     Created security group named "gke-123456a7-nodepool"
23s         Normal   CreatedEtcdVolume             awscluster/cluster-0     Created etcd volume on replica 0
23s         Normal   CreatedEtcdVolume             awscluster/cluster-0     Created etcd volume on replica 1
23s         Normal   CreatedEtcdVolume             awscluster/cluster-0     Created etcd volume on replica 2
23s         Normal   CreatedNetworkLoadBalancer    awscluster/cluster-0     Created network load balancer named "gke-123456a7-controlplane"
23s         Normal   CreatedTargetGroup            awscluster/cluster-0     Created target group named "gke-123456a7-controlplane"
23s         Normal   CreatedRootCASecret           awscluster/cluster-0     Created root CA secret named "default/gke-123456a7-api-server-ca"
22s         Normal   CreatedGKEHubMembership       awscluster/cluster-0     Created GKE Hub membership named "projects/global/memberships/gke-123456a7-cluster"
20s         Normal   CreatedNetworkInterface       awscluster/cluster-0     Created network interface on replica 0
20s         Normal   CreatedNetworkInterface       awscluster/cluster-0     Created network interface on replica 1
20s         Normal   CreatedNetworkInterface       awscluster/cluster-0     Created network interface on replica 2
20s         Normal   CreatedAdminCertSecret        awscluster/cluster-0     Created admin certificate secret named "default/gke-123456a7-admin-cert"
27s         Normal   StartedNodePoolProvisioning   awsnodepool/pool-0       Started node pool provisioning
13s         Normal   CreatedAutoScalingGroup       awsnodepool/pool-0       Created auto scaling group named "gke-123456a7-nodepool-8b269fb0"
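
The event list mixes AWSCluster and AWSNodePool objects. To follow a single object, you can filter the table on its OBJECT column. The following is a minimal sketch that assumes the column layout in the example output. The helper name events_for_object is hypothetical.

```shell
# Keep the header row and the rows whose OBJECT column matches the argument.
# Reads the table printed by `kubectl get events` on standard input.
events_for_object() {
  # In data rows, the columns are LAST SEEN, TYPE, REASON, OBJECT, MESSAGE,
  # so OBJECT is the fourth whitespace-separated field.
  awk -v obj="$1" 'NR == 1 || $4 == obj'
}

# Example invocation against a live management service:
#   env HTTPS_PROXY=http://localhost:8118 \
#     kubectl get events | events_for_object awsnodepool/pool-0
```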

Operator Logs

Events list high-level actions taken by the operator. You might need to observe lower-level activity such as stack traces. The management service runs a static pod called gke-aws-cluster-operator. This application is an operator that contains the core cluster management logic.

In the following steps, you use the crictl tool to examine logs from the gke-aws-cluster-operator.

  1. Change to the directory with your GKE on AWS configuration. You created this directory when installing the management service.

    cd anthos-aws

  2. Find the private IP address of the management EC2 instance. Run the commands that match your version of Terraform.

    If terraform output prints plain strings (Terraform 0.12 or 0.13):

    export CLUSTER_ID=$(terraform output cluster_id)
    export MANAGEMENT_IP=$(aws ec2 describe-instances \
        --filters "Name=tag:Name,Values=$CLUSTER_ID-management-0" \
        --query "Reservations[*].Instances[*].PrivateIpAddress" \
        --output text)

    If terraform output prints quoted strings (Terraform 0.14 or later), add the -raw flag:

    export CLUSTER_ID=$(terraform output -raw cluster_id)
    export MANAGEMENT_IP=$(aws ec2 describe-instances \
        --filters "Name=tag:Name,Values=$CLUSTER_ID-management-0" \
        --query "Reservations[*].Instances[*].PrivateIpAddress" \
        --output text)
    
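  3. If you are using a bastion host, find the bastion host's DNS name. Run the command that matches your version of Terraform.

    Terraform 0.12 or 0.13:

    export BASTION_DNS=$(terraform output bastion_dns_name)

    Terraform 0.14 or later:

    export BASTION_DNS=$(terraform output -raw bastion_dns_name)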
    
  4. Connect to the management instance with SSH. Use the command that matches your connection type.

    Direct connection:

    ssh -i ~/.ssh/anthos-gke ubuntu@$MANAGEMENT_IP

    Bastion host:

    ssh -i ~/.ssh/anthos-gke -J ubuntu@$BASTION_DNS ubuntu@$MANAGEMENT_IP
    
  5. Get the container ID of the gke-aws-cluster-operator pod.

    export POD_ID=$(sudo crictl pods --name gke-aws-cluster-operator --quiet)
    export CONTAINER_ID=$(sudo crictl ps --pod $POD_ID --latest --quiet)
    
  6. Print the pod's logs.

    sudo crictl logs $CONTAINER_ID
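
The operator log can be long. The following sketch narrows it to the most recent lines that mention a problem. The helper name error_lines is hypothetical, and the keywords error, panic, and fatal are assumptions, because this topic does not specify the operator's log format. Adjust the pattern to match what you see in your logs.

```shell
# Print the last N lines (default 20) that contain a problem keyword.
# Reads log text on standard input.
error_lines() {
  grep -iE 'error|panic|fatal' | tail -n "${1:-20}"
}

# Example invocation on the management instance. The redirection merges the
# container's stderr stream into stdout so that both are searched:
#   sudo crictl logs "$CONTAINER_ID" 2>&1 | error_lines 50
```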
    

Terraform

The anthos-gke command generates Terraform configuration to provision a management service. You can use the terraform state list command to list the infrastructure tracked in Terraform's state.

terraform state list

Example Output

module.gke_dedicated_vpc.module.gke_bastion.aws_security_group.this
module.gke_dedicated_vpc.module.gke_bastion_security_group_rules.aws_security_group_rule.allow_http_outbound
module.gke_dedicated_vpc.module.gke_bastion_security_group_rules.aws_security_group_rule.allow_https_outbound
module.gke_dedicated_vpc.module.gke_bastion_security_group_rules.aws_security_group_rule.allow_ssh_inbound
module.gke_dedicated_vpc.module.gke_bastion_security_group_rules.aws_security_group_rule.allow_ssh_outbound
module.gke_dedicated_vpc.module.gke_controlplane_iam_policies.data.aws_iam_policy_document.this
module.gke_dedicated_vpc.module.gke_controlplane_iam_policies.aws_iam_role_policy.this
module.gke_dedicated_vpc.module.gke_controlplane_iam_role.data.aws_iam_policy_document.assume_role_policy
module.gke_dedicated_vpc.module.gke_controlplane_iam_role.aws_iam_instance_profile.this
module.gke_dedicated_vpc.module.gke_controlplane_iam_role.aws_iam_role.this
module.gke_dedicated_vpc.module.gke_management.data.aws_ami.this
module.gke_dedicated_vpc.module.gke_management.data.aws_iam_policy_document.assume_role_policy
module.gke_dedicated_vpc.module.gke_management.data.aws_subnet.this[0]
module.gke_dedicated_vpc.module.gke_management.aws_autoscaling_group.this[0]
module.gke_dedicated_vpc.module.gke_management.aws_ebs_volume.main[0]
module.gke_dedicated_vpc.module.gke_management.aws_iam_instance_profile.this
module.gke_dedicated_vpc.module.gke_management.aws_iam_role.this
module.gke_dedicated_vpc.module.gke_management.aws_launch_template.this[0]
module.gke_dedicated_vpc.module.gke_management.aws_lb.this
module.gke_dedicated_vpc.module.gke_management.aws_lb_listener.this
module.gke_dedicated_vpc.module.gke_management.aws_lb_target_group.this
module.gke_dedicated_vpc.module.gke_management.aws_security_group.this
module.gke_dedicated_vpc.module.gke_management_iam_policies.data.aws_iam_policy_document.this
module.gke_dedicated_vpc.module.gke_management_iam_policies.aws_iam_role_policy.this
module.gke_dedicated_vpc.module.gke_management_security_group_rules.aws_security_group_rule.allow_cidr_https_inbound[0]
module.gke_dedicated_vpc.module.gke_management_security_group_rules.aws_security_group_rule.allow_http_outbound
module.gke_dedicated_vpc.module.gke_management_security_group_rules.aws_security_group_rule.allow_https_inbound[0]
module.gke_dedicated_vpc.module.gke_management_security_group_rules.aws_security_group_rule.allow_https_outbound
module.gke_dedicated_vpc.module.gke_management_security_group_rules.aws_security_group_rule.allow_ssh_inbound[0]
module.gke_dedicated_vpc.module.gke_nodepool_iam_policies.data.aws_iam_policy_document.this
module.gke_dedicated_vpc.module.gke_nodepool_iam_policies.aws_iam_role_policy.this
module.gke_dedicated_vpc.module.gke_nodepool_iam_role.data.aws_iam_policy_document.assume_role_policy
module.gke_dedicated_vpc.module.gke_nodepool_iam_role.aws_iam_instance_profile.this
module.gke_dedicated_vpc.module.gke_nodepool_iam_role.aws_iam_role.this
module.gke_dedicated_vpc.module.gke_vpc.aws_eip.nat[0]
module.gke_dedicated_vpc.module.gke_vpc.aws_internet_gateway.this
module.gke_dedicated_vpc.module.gke_vpc.aws_nat_gateway.this[0]
module.gke_dedicated_vpc.module.gke_vpc.aws_route.private_nat_gateway[0]
module.gke_dedicated_vpc.module.gke_vpc.aws_route.public_internet_gateway
module.gke_dedicated_vpc.module.gke_vpc.aws_route_table.private[0]
module.gke_dedicated_vpc.module.gke_vpc.aws_route_table.public
module.gke_dedicated_vpc.module.gke_vpc.aws_route_table_association.private[0]
module.gke_dedicated_vpc.module.gke_vpc.aws_route_table_association.public[0]
module.gke_dedicated_vpc.module.gke_vpc.aws_subnet.private[0]
module.gke_dedicated_vpc.module.gke_vpc.aws_subnet.public[0]
module.gke_dedicated_vpc.module.gke_vpc.aws_vpc.this
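
The state list is grouped by module, so you can narrow it to one module, such as gke_management, before you inspect individual resources. The following is a minimal sketch. The helper name resources_in_module is hypothetical, and the sketch assumes the module.NAME. address layout shown in the example output.

```shell
# Keep only the resource addresses that belong to the named module.
# Reads `terraform state list` output on standard input.
resources_in_module() {
  # -F matches the text literally; the trailing dot keeps gke_management
  # from also matching gke_management_iam_policies.
  grep -F ".module.$1."
}

# Example invocation from your anthos-aws directory:
#   terraform state list | resources_in_module gke_management
```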

You can take a closer look at any resource with the terraform state show command. For example, examine the load balancer in front of the management service.

terraform state show module.gke_dedicated_vpc.module.gke_management.aws_lb.this

Example Output

resource "aws_lb" "this" {
    arn                              = "arn:aws:elasticloadbalancing:aws-region:aws-account:loadbalancer/net/gke-12345678-management/arn-id"
    arn_suffix                       = "net/gke-12345678-management/abcde1234"
    dns_name                         = "exampledns.elb.ca-central-1.amazonaws.com"
    enable_cross_zone_load_balancing = true
    enable_deletion_protection       = false
    id                               = "arn:aws:elasticloadbalancing:aws-region:aws-account:loadbalancer/net/gke-12345678-management/arn-id"
    internal                         = true
    ip_address_type                  = "ipv4"
    load_balancer_type               = "network"
    name                             = "gke-12345678-management"
    security_groups                  = []
    subnets                          = [
        "subnet-0f77f5a97beb42e6d",
    ]
    vpc_id                           = "vpc-0a123456789b"
    zone_id                          = "Z2EPGBW3API2WT"

    access_logs {
        enabled = false
    }

    subnet_mapping {
        subnet_id = "subnet-0f77f5a97beb42e6d"
    }
}
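
When you need a single value from this output, such as the load balancer's DNS name, you can extract it rather than read the whole block. The following sketch handles only simple name = "value" lines like the ones in the example output, not lists or nested blocks. The helper name state_attribute is hypothetical.

```shell
# Print the value of a top-level scalar attribute, without quotes.
# Reads `terraform state show ...` output on standard input.
state_attribute() {
  awk -v key="$1" '$1 == key && $2 == "=" { gsub(/"/, "", $3); print $3 }'
}

# Example invocation from your anthos-aws directory:
#   terraform state show module.gke_dedicated_vpc.module.gke_management.aws_lb.this \
#     | state_attribute dns_name
```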

Troubleshooting steps

This section lists specific troubleshooting steps for issues that might arise with GKE on AWS.

Terraform errors

This section lists errors that might arise in setting up your management service with the anthos-gke and terraform command line tools.

NoCredentialProviders

If you receive the following error, confirm you have a version of Terraform higher than v0.12.28. If you do not have a compatible version of Terraform, download and install it.

Error: cannot determine availability zones: NoCredentialProviders: no valid providers in chain. Deprecated. For verbose messaging see aws.Config.CredentialsChainVerboseErrors
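
To check whether your installed Terraform version is higher than v0.12.28, you can compare version strings in the shell. The following is a sketch, not an official check. The helper name version_gt is hypothetical, and it relies on the -V (version sort) option of sort, which is available in GNU coreutils and might be missing on other systems.

```shell
# Succeed when version $1 is strictly higher than version $2.
version_gt() {
  [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$1" ]
}

# Example invocation. The first line printed by `terraform version`
# has the form "Terraform v0.12.29":
#   installed=$(terraform version | head -n 1 | sed 's/^Terraform v//')
#   version_gt "$installed" 0.12.28 || echo "Terraform $installed is too old"
```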

Failed to query available providers

If you receive the following error, you might have upgraded your version of Terraform.

exit status 1: Error: Failed to query available provider packages

Could not retrieve the list of available versions for provider hashicorp/aws:
locked provider registry.terraform.io/hashicorp/aws 2.70.0 does not match
configured version constraint 3.26.0; must use terraform init -upgrade to
allow selection of new versions

To fix this error, delete the .terraform.lock.hcl file from your anthos-aws directory.

Unknown token

If you receive the following error when creating a management service:

Error: error running 'terraform init -input=false -no-color' exit status 1:
There are some problems with the configuration, described below.

The Terraform configuration must be valid before initialization so that
Terraform can determine which modules and providers need to be installed.

Error: Error parsing /home/user/aws/main.tf: At 15:12: Unknown token:
15:12 IDENT var.region

Confirm you have a version of Terraform higher than v0.12.28. If you do not have a compatible version of Terraform, download and install it.

Invalid legacy provider address

If you receive the following error, you must upgrade your Terraform binary through each minor version in order.

Error: error running 'terraform init -input=false -no-color'.
exit status 1: Error: Invalid legacy provider address

This configuration or its associated state refers to the unqualified provider
"aws".

You must complete the Terraform 0.13 upgrade process before upgrading to later
versions.

For example, if you want to upgrade Terraform from v0.12.x to v0.14.x, you must install v0.13.x temporarily. After installing v0.13.x, run anthos-gke aws management init and anthos-gke aws management apply. You can then upgrade to v0.14.x.

Cannot connect to bastion host

If you receive the error bind [::1]:8118: Cannot assign requested address when trying to connect to your bastion host, the tunnel might not be able to bind to your machine's IPv6 loopback address. Force an IPv4 connection with the following command:

./bastion-tunnel.sh -N -4

Unable to connect to the server

If you receive the error Unable to connect to the server when trying to run a kubectl command on your user cluster, check the following:

  • Your private subnets all have routes between them.
  • Your private subnets have access to an AWS NAT gateway.
  • If you are using kubectl 1.21, change the HTTP_PROXY variable to HTTPS_PROXY.

Connect cannot access the project

During some Connect operations when registering your cluster, you might observe an error similar to the following:

  ERROR: (gcloud.container.hub.memberships.register) failed to initialize Default Feature
  "authorizer", the fleet service account (service-PROJECT_NUMBER@gcp-sa-gkehub.iam.gserviceaccount.com) may not have access to your project
  

For more information, see the Fleet cannot access the project entry under Multicluster Management Troubleshooting.

Permissions errors

If you receive an authentication error when using the anthos-gke command-line tool, for example:

oauth2: cannot fetch token: 400 Bad Request

Re-authenticate your Google Cloud CLI installation with gcloud auth application-default login.

kubectl cannot connect to your cluster

If you receive the following error when trying to connect to your cluster with kubectl:

Error: could not get token: NoCredentialProviders: no valid providers in chain. Deprecated.
        For verbose messaging see aws.Config.CredentialsChainVerboseErrors

Solution

Your AWS CLI credentials might have expired. Confirm the credentials are valid with aws ec2 describe-instances.

Regenerate your credentials with aws configure. If you use multi-factor authentication (MFA), regenerate your AWS MFA token.
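
In a script, you can test the credentials before you call kubectl. The following sketch uses aws sts get-caller-identity instead of the aws ec2 describe-instances command mentioned above, because it needs no EC2 permissions and only reports which identity the credentials belong to. The helper name aws_credentials_ok is hypothetical.

```shell
# Succeed when the AWS CLI can authenticate with the current credentials.
aws_credentials_ok() {
  aws sts get-caller-identity --output text >/dev/null 2>&1
}

# Example invocation:
#   aws_credentials_ok || echo "AWS credentials are missing or expired; run aws configure"
```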

An AWSCluster stays Provisioning

If the state of a cluster in kubectl get awsclusters or kubectl describe AWSCluster cluster-name does not change from Provisioning, check the value of your AWSCluster's spec.controlPlane.hub.membershipName field. The value of this field must be unique.

kubectl get awsclusters

NAME                                             STATE          AGE   VERSION           ENDPOINT
awscluster.multicloud.cluster.gke.io/cluster-0   Provisioning   8h    1.25.5-gke.2100   gke-123456a7-controlplane-abcdefg12345.elb.us-east-1.amazonaws.com
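
To look for membership names that are used by more than one AWSCluster, you can list the field for every cluster and print the repeated values. The following is a minimal sketch. The helper name duplicate_values is hypothetical, and the JSONPath expression in the comment assumes the spec.controlPlane.hub.membershipName field path named above.

```shell
# Print each value that appears more than once on standard input.
# Expects one value per line and ignores empty lines.
duplicate_values() {
  grep -v '^$' | sort | uniq -d
}

# Example invocation against a live management service:
#   env HTTPS_PROXY=http://localhost:8118 \
#     kubectl get awsclusters \
#     -o jsonpath='{range .items[*]}{.spec.controlPlane.hub.membershipName}{"\n"}{end}' \
#     | duplicate_values
```

If the command prints nothing, no membership name is repeated among the clusters in this management service.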

Solutions

To fix the problem, make sure that each AWSCluster uses a unique value for spec.controlPlane.hub.membershipName.

TLS errors when connecting to your cluster

If you receive the following error when connecting to your cluster:

error dialing backend: remote error: tls: internal error

Check that your AWS VPC DHCP options set does not have a custom value for domain-name. When a custom domain is set, GKE on AWS cannot generate a certificate signing request (CSR).

Solution

To fix the problem, perform one of the following steps:

  • Set enableDnsHostnames to false on your VPC. If you create a dedicated AWS VPC with anthos-gke, you can set this value in the .terraform/modules/gke_vpc/modules/gke-vpc/main.tf file.

  • Update the value of domain-name to the default.

Rebooting your control plane

If your control plane becomes unresponsive, you can reboot its instances. When you reboot control plane instances, your configuration remains because state is saved on persistent disks.

  1. From your anthos-aws directory, use anthos-gke to switch context to your management service.

    cd anthos-aws
    anthos-gke aws management get-credentials

  2. Use kubectl to get your control plane's AWS EC2 target group from your AWSCluster.

    env HTTPS_PROXY=http://localhost:8118 \
      kubectl get awscluster cluster-name \
      -o jsonpath='{.status.targetGroupName}{"\n"}'
    

    The output includes the name of your control plane's EC2 target group. For example, gke-123456a7-controlplane.

  3. Open the AWS EC2 Console. Choose Target Groups from the left pane.

  4. Click the search bar and find your target group. Click the Name of your target group and then click Targets. The list of your control plane instances appears.

  5. For each instance in the target group, perform the following steps:

    1. Click the instance's Instance ID. The AWS EC2 Instances console appears.

    2. Click on the Instance ID.

    3. Select Actions -> Instance state -> terminate to remove the instance. EC2 automatically creates a new instance with the same EBS volume.

    4. Return to the Target Groups page.

  6. After you have terminated all the instances in the group, return to the Target Groups page.

  7. In the Registered Targets section of the page, find the Status column. Each of your instances should have a Status of Healthy. If any of the instances are not healthy, wait several minutes and click the Refresh icon.

  8. After all instances in the target group are healthy, the reboot is complete.

Tagging subnets

GKE on AWS requires tags on subnets that contain load balancer endpoints. GKE on AWS automatically tags all subnets specified in the spec.networking.serviceLoadBalancerSubnetIDs field of the AWSCluster resource.

If you would like to add additional subnets to your user cluster, or if you need to re-apply tags to existing subnets, perform the following steps.

  1. From your anthos-aws directory, use anthos-gke to switch context to your management service.

    cd anthos-aws
    anthos-gke aws management get-credentials

  2. Get the ID of your cluster's AWS VPC with kubectl and store it as a variable.

     export VPC_ID=$(\
     env HTTPS_PROXY=http://localhost:8118 \
     kubectl get awscluster cluster-0 -o jsonpath='{.spec.networking.vpcID}')
    
  3. Check the variable's content with echo. The output resembles vpc-12345678abcdef0.

    echo $VPC_ID
    
  4. Save your cluster ID into an environment variable.

    export CLUSTER_ID=$(\
    env HTTPS_PROXY=http://localhost:8118 \
    kubectl get awscluster cluster-0 -o jsonpath='{.status.clusterID}')
    

    You can check the variable with echo:

    echo $CLUSTER_ID
    

    The response includes your cluster ID.

    gke-12345678
    
  5. If you installed GKE on AWS into a dedicated VPC, you can use the aws command-line tool to retrieve the subnet ID.

    Follow the instructions for one of the following options:

    • Public, if you want to expose Services on your public subnet.
    • Private, if you want to expose Services on your private subnet.
    • Multiple subnets, if you want to expose Services on multiple subnets.

    Public

     export SUBNET_ID=$(aws ec2 describe-subnets \
     --filters "Name=vpc-id,Values=$VPC_ID" "Name=tag:Name,Values=*public*" \
     --query "Subnets[*].SubnetId" \
     --output text)

    The command stores your subnet ID in the SUBNET_ID variable. You can check the variable with echo:

    echo $SUBNET_ID

    The response includes your subnet ID.

    subnet-012345678abcdef

    Private

     export SUBNET_ID=$(aws ec2 describe-subnets \
     --filters "Name=vpc-id,Values=$VPC_ID" "Name=tag:Name,Values=*private*" \
     --query "Subnets[*].SubnetId" \
     --output text)

    The command stores your subnet ID in the SUBNET_ID variable. You can check the variable with echo:

    echo $SUBNET_ID

    The response includes your subnet ID.

    subnet-012345678abcdef

    Multiple subnets

    If you are using multiple subnets for your AWSNodePools (for example, if you use multiple availability zones), you need to tag your subnet IDs individually.

    Retrieve your list of subnet IDs with aws ec2 describe-subnets.

    To get a list of all public subnets, run the following command:

    aws ec2 describe-subnets \
     --filters "Name=vpc-id,Values=$VPC_ID" "Name=tag:Name,Values=*public*" \
     --query "Subnets[*].SubnetId" \
     --output text

    To get a list of all private subnets, run the following command:

    aws ec2 describe-subnets \
     --filters "Name=vpc-id,Values=$VPC_ID" "Name=tag:Name,Values=*private*" \
     --query "Subnets[*].SubnetId" \
     --output text

    The response includes your subnet IDs.

    subnet-012345678abcdef
    subnet-abcdef123456789
    subnet-123456789abcdef
    
  6. Tag your subnets with your cluster ID. Use the command that matches the option you chose in the previous step.

    Single subnet (Public or Private):

    aws ec2 create-tags \
    --resources $SUBNET_ID \
    --tags Key=kubernetes.io/cluster/$CLUSTER_ID,Value=shared

    Multiple subnets:

    aws ec2 create-tags \
    --resources subnet-ids \
    --tags Key=kubernetes.io/cluster/$CLUSTER_ID,Value=shared

    Replace subnet-ids with the list of subnet IDs, separated by spaces. For example, subnet-012345678abcdef subnet-abcdef123456789 subnet-123456789abcdef.
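
If you prefer to review the tagging commands before you run them, for example when the subnet IDs come from a script, you can generate one command per subnet first. The following is a minimal sketch. The helper name print_tag_commands is hypothetical, and it only prints commands; it does not call AWS.

```shell
# Print one `aws ec2 create-tags` command per subnet ID.
# Usage: print_tag_commands CLUSTER_ID SUBNET_ID...
print_tag_commands() {
  cluster_id=$1
  shift
  for subnet_id in "$@"; do
    printf 'aws ec2 create-tags --resources %s --tags Key=kubernetes.io/cluster/%s,Value=shared\n' \
      "$subnet_id" "$cluster_id"
  done
}

# Example invocation, using the variable from the earlier step:
#   print_tag_commands "$CLUSTER_ID" subnet-012345678abcdef subnet-abcdef123456789
```

After you review the printed commands, you can paste them into your terminal to apply the tags.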

Error when deleting your clusters

If kubectl does not respond when you try to delete a user cluster, get your cluster's events. You might see the following event.

Could not delete security group: resource SECURITY_GROUP_ID has a dependent object.

Solution

Delete any AWS Elastic File System mount targets in SECURITY_GROUP_ID.

API errors

Kubernetes 1.22 removes several deprecated APIs and replaces them with newer versions. If you've upgraded your cluster to version 1.22 or later, any calls your application makes to one of the removed APIs fail.

Solution

Upgrade your application to replace the deprecated API calls with their newer counterparts.

Snapshots

The anthos-gke tool supports generating snapshots of your GKE on AWS environment. A snapshot contains information to help Google Cloud support reproduce and debug issues offline.

The anthos-gke tool can generate a snapshot from either a management service or user clusters. By default, a snapshot includes CRDs, Events, cluster information, cluster logs, instance logs, and instance files. You can customize the information included with a configuration file. The bundle also contains an index.html with links to the included files.

If you would like to troubleshoot your GKE on AWS installation before contacting Google Cloud support, skip to the following section.

Generating a snapshot

This section shows how to create a snapshot from a management or user cluster using the default configuration or a custom configuration.

Before you begin

To connect to your GKE on AWS resources, follow the instructions that match your setup: an existing AWS VPC (or a direct connection to your VPC), or a dedicated VPC that you created with your management service.

Existing VPC

If you have a direct or VPN connection to an existing VPC, omit the line env HTTPS_PROXY=http://localhost:8118 from commands in this topic.

Dedicated VPC

When you create a management service in a dedicated VPC, GKE on AWS includes a bastion host in a public subnet.

To connect to your management service, perform the following steps:

  1. Change to the directory with your GKE on AWS configuration. You created this directory when installing the management service.

    cd anthos-aws

  2. To open the tunnel, run the bastion-tunnel.sh script. The tunnel forwards to localhost:8118.

    To open a tunnel to the bastion host, run the following command:

    ./bastion-tunnel.sh -N
    

    Messages from the SSH tunnel appear in this window. When you are ready to close the connection, stop the process by using Control+C or closing the window.

  3. Open a new terminal and change into your anthos-aws directory.

    cd anthos-aws

  4. Check that you're able to connect to the cluster with kubectl.

    env HTTPS_PROXY=http://localhost:8118 \
    kubectl cluster-info
    

    The output includes the URL for the management service API server.

  • From your anthos-aws directory, use anthos-gke to switch context to your management service.

    cd anthos-aws
    anthos-gke aws management get-credentials

Capture a snapshot of a management service

To capture a snapshot of a management service, use anthos-gke aws management diagnose snapshot.

env HTTPS_PROXY=http://localhost:8118 \
  anthos-gke aws management diagnose snapshot \
  --ssh-key-path ssh-key-path \
  --workspace workspace

Replace:

  • ssh-key-path with the path to your GKE on AWS SSH key. By default, the file is created in ~/.ssh/anthos-gke.
  • workspace with the path to the directory where you created your GKE on AWS deployment. If a bastion host was created as part of a dedicated VPC installation, GKE on AWS uses the bastion to connect to your management service.

The output includes the log of actions and the name of the snapshot file:

2020/06/15 15:39:48 Found bastion instance in tfworkspace. IP: bastion-ip.aws-zone.compute.amazonaws.com
writing file: /tmp/kubeconfig-mgmt679794004/kubeconfig.conf
snapshot: 2020/06/15 15:39:50 Getting snapshot of controlPlane...
snapshot: 2020/06/15 15:39:50 Getting snapshot of kubectl command...
snapshot: 2020/06/15 15:39:52 Getting snapshot of control plane managed aws resources
...
ip-10-0-1-44/commands/ip_route_list_table_all.out
/tmp/tmp.Z26niLmVfU/snapshot.tar.gz
2020/06/15 15:40:04 Snapshot saved in snapshot-1592260783.tar.gz.

Capture a snapshot of a user cluster

To capture a snapshot of a user cluster, use anthos-gke aws clusters diagnose snapshot.

env HTTPS_PROXY=http://localhost:8118 \
  anthos-gke aws clusters diagnose snapshot user-cluster-name \
  --ssh-key-path ssh-key-path --workspace terraform-workspace

Replace:

  • user-cluster-name with the name of your user cluster. By default, your first user cluster is named cluster-0.
  • ssh-key-path with the path to your GKE on AWS SSH key. By default, the file is created in ~/.ssh/anthos-gke.
  • terraform-workspace with the path to the Terraform workspace where you created your GKE on AWS deployment.

The output includes the log of actions and the name of the snapshot file:

2020/06/15 15:43:37 Found bastion instance in tfworkspace. IP: bastion-ip.aws-zone.compute.amazonaws.com
writing file: /tmp/kubeconfig-mgmt616204648/kubeconfig.conf
snapshot: 2020/06/15 15:43:40 Getting snapshot of controlPlane...
snapshot: 2020/06/15 15:43:40 Getting snapshot of kubectl command...
writing file: /tmp/kubeconfig-clustercluster-0620691367/kubeconfig.conf
snapshot: 2020/06/15 15:43:43 Getting snapshot of cluster default/cluster-0
snapshot: 2020/06/15 15:43:43 Getting snapshot of controlPlane...
snapshot: 2020/06/15 15:43:43 Getting snapshot of kubectl command...
snapshot: 2020/06/15 15:43:46 Getting snapshot of control plane managed aws resources
...
snapshot: 2020/06/15 15:43:48 Getting snapshot of node pools
snapshot: 2020/06/15 15:43:48 Getting snapshot of node pool default/pool-0
snapshot: 2020/06/15 15:43:48 Getting snapshot of node pool managed aws resources
...
2020/06/15 15:44:00 Snapshot saved in snapshot-1592261012.tar.gz.
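
The snapshot is a gzip-compressed tar archive, so you can inspect it with standard tools before you share it. The following sketch lists or unpacks an archive. The helper names list_snapshot_files and extract_snapshot are hypothetical, and the file name in the comments is taken from the example output.

```shell
# List the files inside a snapshot archive without unpacking it.
list_snapshot_files() {
  tar -tzf "$1"
}

# Unpack a snapshot archive into a directory, creating the directory if needed.
# Usage: extract_snapshot ARCHIVE DIRECTORY
extract_snapshot() {
  mkdir -p "$2" && tar -xzf "$1" -C "$2"
}

# Example invocations:
#   list_snapshot_files snapshot-1592261012.tar.gz | grep index.html
#   extract_snapshot snapshot-1592261012.tar.gz ./snapshot
```

After you unpack the archive, you can open its index.html file in a browser to follow the links to the included files.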

Modifying the snapshot configuration

To modify the default snapshot configuration, first run anthos-gke with the --dry-run option and save the output to a file. Then, edit your configuration and re-run anthos-gke with the new configuration.

Create a configuration file for the management service

  1. Use anthos-gke with the --dry-run option and redirect the output to a file named management-config.yaml.

    env HTTPS_PROXY=http://localhost:8118 \
      anthos-gke aws management diagnose snapshot --ssh-key-path ssh-key-path \
      --workspace workspace --dry-run > management-config.yaml 2>&1
    

    Replace:

    • ssh-key-path with the path to your GKE on AWS SSH key. By default, the file is created in ~/.ssh/anthos-gke.
    • workspace with the path to the directory where you created your GKE on AWS deployment.

  2. Edit the management-config.yaml file in a text editor. The file contains two sections. The first section is a timestamped log from anthos-gke, which looks like the following:

    2020/06/15 15:26:51 Found bastion instance in tfworkspace. IP: bastion-ip.aws-zone.compute.amazonaws.com
    2020/06/15 15:26:51 Running in dry-run mode...
    ...
    

    Delete the log section. Keep the configuration after the line The snapshot configuration is:.

    The remaining contents of the file are a YAML configuration that looks like the following:

mgmtCluster:
  instanceCommands:
  - dmesg
  - sudo crictl ps a
  - systemctl status -l containerd
  - journalctl --since '1 hour ago' --utc -u containerd
  - systemctl status -l kubelet
  - journalctl --since '1 hour ago' --utc -u kubelet
  - journalctl --since '1 hour ago' --utc --boot --dmesg
  - uptime
  - df --all --inodes
  - ip addr
  - sudo iptables-save --counters
  - mount
  - ip route list table all
  - top -bn1
  - ps -edF
  - ps -eo pid,tid,ppid,class,rtprio,ni,pri,psr,pcpu,stat,wchan:14,comm,args,cgroup
  - chronyc sources -v
  - journalctl --since '1 hour ago' --utc -u chrony
  instanceFiles:
  - /etc/kubernetes/manifests/
  - /proc/sys/fs/file-nr
  - /proc/sys/net/nf_conntrack_max
  - /var/log/startup.log
  - /var/log/cloud-init-output.log
  - /var/log/containers
  kubectlCommands:
  - commands:
    - kubectl get events
    - kubectl version
    - kubectl cluster-info
    - kubectl get clusterroles -o wide
    - kubectl get clusterrolebindings -o wide
    - kubectl get crd -o wide
    - kubectl describe clusterroles
    - kubectl describe clusterrolebindings
    - kubectl describe crd
    - kubectl get all -o yaml
    - kubectl describe all
  managedAWSResources: true
numOfThreads: 10
truncate: 2000

  3. Remove any commands you believe expose sensitive data.

  4. Run the anthos-gke command with the --snapshot-config option and management-config.yaml.

    env HTTPS_PROXY=http://localhost:8118 \
      anthos-gke aws management diagnose snapshot --ssh-key-path ssh-key-path \
      --snapshot-config management-config.yaml --workspace terraform-workspace
    

    Replace:

    • ssh-key-path with the path to your GKE on AWS SSH key. By default, the file is created in .ssh/anthos-gke.
    • terraform-workspace with the path to the Terraform workspace where you created your GKE on AWS deployment.
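
The step that deletes the log section can also be done without a text editor. The following is a minimal sketch, assuming the marker text "The snapshot configuration is:" appears on a single line of the dry-run output, possibly after a timestamp. The function name strip_dry_run_log is a hypothetical helper, not part of anthos-gke.

```shell
#!/bin/sh
# Hypothetical helper: print only the lines that follow the marker line.
# Assumption: the marker text appears on one line of the dry-run output.
strip_dry_run_log() {
  # The first rule prints a line only after the marker has been seen.
  # The second rule sets the flag on the marker line itself, so the
  # marker line is not printed.
  awk 'found { print } /The snapshot configuration is:/ { found = 1 }' "$1"
}
```

For example, strip_dry_run_log management-config.yaml > management-config.clean.yaml writes the YAML portion to a new file. Write to a different file than the one being read, because redirecting to the input file truncates it before awk reads it.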

Create a configuration file for a user cluster

  1. Use anthos-gke with the --dry-run option and redirect the output to a file named user-config.yaml.

    env HTTPS_PROXY=http://localhost:8118 \
      anthos-gke aws clusters diagnose snapshot user-cluster-name \
      --ssh-key-path ssh-key-path --snapshot-config snapshot-config-path \
      --workspace workspace --dry-run > user-config.yaml 2>&1
    

    Replace:

    • user-cluster-name with the name of your user cluster. By default, your first cluster is named cluster-0.
    • ssh-key-path with the path to your GKE on AWS SSH key. By default, the file is created in ~/.ssh/anthos-gke.
    • workspace with the path to the directory where you created your GKE on AWS deployment.

  2. Edit the user-config.yaml file in a text editor. The file contains two sections. The first section is a timestamped log from anthos-gke, which looks like the following:

    2020/06/15 15:26:51 Found bastion instance in tfworkspace. IP: bastion-ip.aws-zone.compute.amazonaws.com
    2020/06/15 15:26:51 Running in dry-run mode...
    ...
    

    Delete the log section. Keep the configuration after the line The snapshot configuration is:.

    The remaining contents of the file are a YAML configuration that looks like the following:

clusters:
- clusterName: cluster-0
  controlPlane:
    instanceCommands:
    - dmesg
    - sudo crictl ps a
    - systemctl status -l containerd
    - journalctl --since '1 hour ago' --utc -u containerd
    - systemctl status -l kubelet
    - journalctl --since '1 hour ago' --utc -u kubelet
    - journalctl --since '1 hour ago' --utc --boot --dmesg
    - uptime
    - df --all --inodes
    - ip addr
    - sudo iptables-save --counters
    - mount
    - ip route list table all
    - top -bn1
    - sudo docker ps -a
    - ps -edF
    - ps -eo pid,tid,ppid,class,rtprio,ni,pri,psr,pcpu,stat,wchan:14,comm,args,cgroup
    instanceFiles:
    - /etc/kubernetes/manifests/
    - /proc/sys/fs/file-nr
    - /proc/sys/net/nf_conntrack_max
    - /var/log/startup.log
    - /var/log/cloud-init-output.log
    - /var/log/containers
    kubectlCommands:
    - commands:
      - kubectl get events
      - kubectl version
      - kubectl cluster-info
      - kubectl get clusterroles -o wide
      - kubectl get clusterrolebindings -o wide
      - kubectl get crd -o wide
      - kubectl describe clusterroles
      - kubectl describe clusterrolebindings
      - kubectl describe crd
      - kubectl get all -o yaml
      - kubectl describe all
      - kubectl logs --namespace=kube-system -l k8s-app=aws-ebs-csi-driver-node --all-containers
      - kubectl logs --namespace=kube-system -l k8s-app=aws-efs-csi-driver-node --all-containers
      - kubectl logs --namespace=kube-system -l k8s-app=calico-node --all-containers
      - kubectl logs --namespace=kube-system -l k8s-app=node-local-dns --all-containers
      - kubectl logs --namespace=kube-system -l k8s-app=kube-proxy --all-containers
      - kubectl describe nodes
    managedAWSResources: true
  nodePools:
  - NodePoolName: ""
    instanceCommands:
    - dmesg
    - sudo crictl ps a
    - systemctl status -l containerd
    - journalctl --since '1 hour ago' --utc -u containerd
    - systemctl status -l kubelet
    - journalctl --since '1 hour ago' --utc -u kubelet
    - journalctl --since '1 hour ago' --utc --boot --dmesg
    - uptime
    - df --all --inodes
    - ip addr
    - sudo iptables-save --counters
    - mount
    - ip route list table all
    - top -bn1
    - sudo docker ps -a
    - ps -edF
    - ps -eo pid,tid,ppid,class,rtprio,ni,pri,psr,pcpu,stat,wchan:14,comm,args,cgroup
    instanceFiles:
    - /etc/kubernetes/manifests/
    - /proc/sys/fs/file-nr
    - /proc/sys/net/nf_conntrack_max
    - /var/log/startup.log
    - /var/log/cloud-init-output.log
    - /var/log/containers
    managedAWSResources: true
mgmtCluster:
  kubectlCommands:
  - commands:
    - kubectl get awscluster -oyaml
numOfThreads: 10
truncate: 2000

  3. Remove any commands you believe expose sensitive data.

  4. Run the anthos-gke command with the --snapshot-config option and user-config.yaml.

    env HTTPS_PROXY=http://localhost:8118 \
      anthos-gke aws clusters diagnose snapshot user-cluster-name \
      --ssh-key-path ssh-key-path \
      --snapshot-config user-config.yaml --workspace terraform-workspace
    

    Replace:

    • user-cluster-name with the name of your user cluster. By default, your first cluster is named cluster-0.
    • ssh-key-path with the path to your GKE on AWS SSH key. By default, the file is created in .ssh/anthos-gke.
    • terraform-workspace with the path to the Terraform workspace where you created your GKE on AWS deployment.
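
Removing a command from the configuration amounts to deleting its list entry from the YAML. The following is a minimal sketch, assuming each command sits on its own line as a "- command" list item, as in the example configuration. The function name drop_command is a hypothetical helper, and which commands count as sensitive is your own decision.

```shell
#!/bin/sh
# Hypothetical helper: drop every YAML list item that contains a fixed string.
# $1 = configuration file, $2 = text to look for (for example "iptables-save").
# Assumption: each command is a single-line "- command" list item.
drop_command() {
  # A line is skipped only if it is a list item ("- ...") AND contains the
  # text; index() does a plain substring match, not a regex match.
  awk -v pat="$2" '!( $0 ~ /^[ \t]*- / && index($0, pat) ) { print }' "$1"
}
```

For example, drop_command user-config.yaml 'iptables-save' > user-config.filtered.yaml removes the iptables-save entries from both the control plane and node pool sections. Review the result before passing it to --snapshot-config, because the match is purely textual.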

Default snapshot contents

This section shows example file trees contained within the default snapshot for a management service and a user cluster.

The following tree is an example of the files within a snapshot for a management cluster.

.
├── index.html
├── mgmt_cluster
│   ├── controlplane-0-10.0.1.44
│   │   └── ip-10-0-1-44
│   │       ├── commands
│   │       │   ├── df_--all_--inodes.out
│   │       │   ├── dmesg.out
│   │       │   ├── ip_addr.out
│   │       │   ├── ip_route_list_table_all.out
│   │       │   ├── journalctl_--since__1_hour_ago__--utc_--boot_--dmesg.out
│   │       │   ├── journalctl_--since__1_hour_ago__--utc_-u_containerd.out
│   │       │   ├── journalctl_--since__1_hour_ago__--utc_-u_kubelet.out
│   │       │   ├── mount.out
│   │       │   ├── ps_-edF.out
│   │       │   ├── ps_-eo_pid,tid,ppid,class,rtprio,ni,pri,psr,pcpu,stat,wchan:14,comm,args,cgroup.out
│   │       │   ├── sudo_crictl_ps_a.out
│   │       │   ├── sudo_docker_ps_-a.err
│   │       │   ├── sudo_docker_ps_-a.out
│   │       │   ├── sudo_iptables-save_--counters.out
│   │       │   ├── systemctl_status_-l_containerd.out
│   │       │   ├── systemctl_status_-l_kubelet.out
│   │       │   ├── top_-bn1.out
│   │       │   └── uptime.out
│   │       └── files
│   │           ├── etc
│   │           │   └── kubernetes
│   │           │       └── manifests
│   │           │           ├── etcd.yaml
│   │           │           ├── gke-aws-cluster-operator.yaml
│   │           │           ├── kube-apiserver.yaml
│   │           │           └── kube-controller-manager.yaml
│   │           ├── proc
│   │           │   └── sys
│   │           │       ├── fs
│   │           │       │   └── file-nr
│   │           │       └── net
│   │           │           └── nf_conntrack_max
│   │           └── var
│   │               └── log
│   │                   ├── cloud-init-output.log
│   │                   ├── containers
│   │                   │   ├── etcd-ip-10-0-1-44.ap-southeast-2.compute.internal_kube-system_kube-etcd-149e96d0b0da2250505a6b41603e57f42a5386701fa0033840e8f3b211b49733.log
│   │                   │   ├── gke-aws-cluster-operator-ip-10-0-1-44.ap-southeast-2.compute.internal_kube-system_gke-aws-cluster-operator-d423d3191ce1a8c65c4a0f30f1d7598a8739c0aba65784355b28dee0d694626a.log
│   │                   │   ├── kube-apiserver-ip-10-0-1-44.ap-southeast-2.compute.internal_kube-system_kube-apiserver-48061659a4b77f4b50eed5819dbfab5586dc9086fa24217cc16486bd852dfbf6.log
│   │                   │   ├── kube-apiserver-ip-10-0-1-44.ap-southeast-2.compute.internal_kube-system_kube-apiserver-baf60859cd807e9325295fde7a8d3cd16c3d5e73abca87acc107cee5e08f4c1c.log
│   │                   │   └── kube-controller-manager-ip-10-0-1-44.ap-southeast-2.compute.internal_kube-system_kube-controller-manager-af9b4ffb40ada3383630090948ec8133ca0e3e54c232dd3f068b3bd8bbee8f92.log
│   │                   └── startup.log
│   ├── kubectl
│   │   ├── kubectl_cluster-info_--kubeconfig_.tmp.kubeconfig-mgmt609381529.kubeconfig.conf.out
│   │   ├── kubectl_describe_all_--kubeconfig_.tmp.kubeconfig-mgmt609381529.kubeconfig.conf.out
│   │   ├── kubectl_describe_clusterrolebindings_--kubeconfig_.tmp.kubeconfig-mgmt609381529.kubeconfig.conf.out
│   │   ├── kubectl_describe_clusterroles_--kubeconfig_.tmp.kubeconfig-mgmt609381529.kubeconfig.conf.out
│   │   ├── kubectl_describe_crd_--kubeconfig_.tmp.kubeconfig-mgmt609381529.kubeconfig.conf.out
│   │   ├── kubectl_get_all_-o_yaml_--kubeconfig_.tmp.kubeconfig-mgmt609381529.kubeconfig.conf.out
│   │   ├── kubectl_get_clusterrolebindings_-o_wide_--kubeconfig_.tmp.kubeconfig-mgmt609381529.kubeconfig.conf.out
│   │   ├── kubectl_get_clusterroles_-o_wide_--kubeconfig_.tmp.kubeconfig-mgmt609381529.kubeconfig.conf.out
│   │   ├── kubectl_get_crd_-o_wide_--kubeconfig_.tmp.kubeconfig-mgmt609381529.kubeconfig.conf.out
│   │   ├── kubectl_get_events_--kubeconfig_.tmp.kubeconfig-mgmt609381529.kubeconfig.conf.out
│   │   └── kubectl_version_--kubeconfig_.tmp.kubeconfig-mgmt609381529.kubeconfig.conf.out
│   └── managed_resources
│       ├── controlplane-0
│       │   ├── asg.out
│       │   └── instance-0.out
│       ├── elb.out
│       └── target_group.out
├── snapshot.config
└── snapshot.log
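
The tree contains both .out and .err files under each commands directory. The following is a minimal sketch for locating non-empty .err files in an extracted snapshot, assuming (from the file names only) that a .err file holds the error output of the corresponding command. The function name find_command_errors is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: list non-empty *.err files under an extracted snapshot.
# Assumption: a non-empty .err file means the command wrote error output.
find_command_errors() {
  # -size +0 matches files larger than zero blocks, that is, non-empty files.
  find "$1" -type f -name '*.err' -size +0 | sort
}
```

For example, running find_command_errors . from the root of the extracted snapshot prints the path of each .err file that has content, which can be a quick starting point before reading individual logs.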

The following tree is an example of the files within a snapshot for a user cluster named cluster-0.

.
├── cluster
│   └── cluster-0
│       ├── control_plane
│       │   ├── controlplane-0-10.0.1.7
│       │   │   └── ip-10-0-1-7
│       │   │       ├── commands
│       │   │       │   ├── df_--all_--inodes.out
│       │   │       │   ├── dmesg.out
│       │   │       │   ├── ip_addr.out
│       │   │       │   ├── ip_route_list_table_all.out
│       │   │       │   ├── journalctl_--since__1_hour_ago__--utc_--boot_--dmesg.out
│       │   │       │   ├── journalctl_--since__1_hour_ago__--utc_-u_containerd.out
│       │   │       │   ├── journalctl_--since__1_hour_ago__--utc_-u_kubelet.out
│       │   │       │   ├── mount.out
│       │   │       │   ├── ps_-edF.out
│       │   │       │   ├── ps_-eo_pid,tid,ppid,class,rtprio,ni,pri,psr,pcpu,stat,wchan:14,comm,args,cgroup.out
│       │   │       │   ├── sudo_crictl_ps_a.out
│       │   │       │   ├── sudo_docker_ps_-a.err
│       │   │       │   ├── sudo_docker_ps_-a.out
│       │   │       │   ├── sudo_iptables-save_--counters.out
│       │   │       │   ├── systemctl_status_-l_containerd.out
│       │   │       │   ├── systemctl_status_-l_kubelet.out
│       │   │       │   ├── top_-bn1.out
│       │   │       │   └── uptime.out
│       │   │       └── files
│       │   │           ├── etc
│       │   │           │   └── kubernetes
│       │   │           │       └── manifests
│       │   │           │           ├── aws-ebs-csi-driver-controller.yaml
│       │   │           │           ├── aws-encryption-provider.yaml
│       │   │           │           ├── cluster-autoscaler.yaml
│       │   │           │           ├── etcd-events.yaml
│       │   │           │           ├── etcd.yaml
│       │   │           │           ├── kube-apiserver.yaml
│       │   │           │           ├── kube-controller-manager.yaml
│       │   │           │           └── kube-scheduler.yaml
│       │   │           ├── proc
│       │   │           │   └── sys
│       │   │           │       ├── fs
│       │   │           │       │   └── file-nr
│       │   │           │       └── net
│       │   │           │           └── nf_conntrack_max
│       │   │           └── var
│       │   │               └── log
│       │   │                   ├── cloud-init-output.log
│       │   │                   ├── containers
│       │   │                   │   ├── aws-ebs-csi-driver-controller-ip-10-0-1-7.ap-southeast-2.compute.internal_kube-system_csi-attacher-218b7834cda8b4ae0f6687e06b33426ca39669a6c2652948e17746d49ed4c7c9.log
│       │   │                   │   ├── aws-ebs-csi-driver-controller-ip-10-0-1-7.ap-southeast-2.compute.internal_kube-system_csi-provisioner-ff1ba1960712a00065db1e036e1aaf5aeaca0979c833d020ad1cafdea05a76c7.log
│       │   │                   │   ├── aws-ebs-csi-driver-controller-ip-10-0-1-7.ap-southeast-2.compute.internal_kube-system_ebs-plugin-697389a6c73bdb4a0370a644a28617b3b8a12862341b91ca2d640aa66724affd.log
│       │   │                   │   ├── aws-encryption-provider-ip-10-0-1-7.ap-southeast-2.compute.internal_kube-system_aws-encryption-provider-b08216cbca004f71e68145b9a38b931276dd9ef92d26c53b85275587ce28f3ca.log
│       │   │                   │   ├── cluster-autoscaler-ip-10-0-1-7.ap-southeast-2.compute.internal_kube-system_cluster-autoscaler-57f9ca6abec10a76b42449dababea6c963853b1aa30f1db2b87d963311d03629.log
│       │   │                   │   ├── cluster-autoscaler-ip-10-0-1-7.ap-southeast-2.compute.internal_kube-system_cluster-autoscaler-9c09937ddbe3220c896f857a8b8c02c84062f13092b39ebac3ab1ce26f13b317.log
│       │   │                   │   ├── etcd-events-ip-10-0-1-7.ap-southeast-2.compute.internal_kube-system_kube-etcd-events-4e79334e69f670a3a4637c20635944abb71ed93d6e802407ef5881478ee78dc1.log
│       │   │                   │   ├── etcd-ip-10-0-1-7.ap-southeast-2.compute.internal_kube-system_kube-etcd-e6dbe39ef969fb2f049292d4f3a66a41d22f963b40f72f5f91ad6acd9e9cde77.log
│       │   │                   │   ├── kube-apiserver-ip-10-0-1-7.ap-southeast-2.compute.internal_kube-system_kube-apiserver-e61770a46518313306e1668c34e4efbdb3ed81b7f451dc3278a00a40fee09e0d.log
│       │   │                   │   ├── kube-controller-manager-ip-10-0-1-7.ap-southeast-2.compute.internal_kube-system_kube-controller-manager-3b33df6a4d4cca8fd63f90d4fcbee65595e71c0390a5c29c81670d0232b98edc.log
│       │   │                   │   └── kube-scheduler-ip-10-0-1-7.ap-southeast-2.compute.internal_kube-system_kube-scheduler-0aae214e17741189db8d3608275e71551f62f43619e07a37a11017b88a611970.log
│       │   │                   └── startup.log
│       │   ├── controlplane-1-10.0.1.61
│       │   │   └── ip-10-0-1-61
│       │   │       ├── commands
│       │   │       │   ├── df_--all_--inodes.out
│       │   │       │   ├── dmesg.out
│       │   │       │   ├── ip_addr.out
│       │   │       │   ├── ip_route_list_table_all.out
│       │   │       │   ├── journalctl_--since__1_hour_ago__--utc_--boot_--dmesg.out
│       │   │       │   ├── journalctl_--since__1_hour_ago__--utc_-u_containerd.out
│       │   │       │   ├── journalctl_--since__1_hour_ago__--utc_-u_kubelet.out
│       │   │       │   ├── mount.out
│       │   │       │   ├── ps_-edF.out
│       │   │       │   ├── ps_-eo_pid,tid,ppid,class,rtprio,ni,pri,psr,pcpu,stat,wchan:14,comm,args,cgroup.out
│       │   │       │   ├── sudo_crictl_ps_a.out
│       │   │       │   ├── sudo_docker_ps_-a.err
│       │   │       │   ├── sudo_docker_ps_-a.out
│       │   │       │   ├── sudo_iptables-save_--counters.out
│       │   │       │   ├── systemctl_status_-l_containerd.out
│       │   │       │   ├── systemctl_status_-l_kubelet.out
│       │   │       │   ├── top_-bn1.out
│       │   │       │   └── uptime.out
│       │   │       └── files
│       │   │           ├── etc
│       │   │           │   └── kubernetes
│       │   │           │       └── manifests
│       │   │           │           ├── aws-ebs-csi-driver-controller.yaml
│       │   │           │           ├── aws-encryption-provider.yaml
│       │   │           │           ├── cluster-autoscaler.yaml
│       │   │           │           ├── etcd-events.yaml
│       │   │           │           ├── etcd.yaml
│       │   │           │           ├── kube-apiserver.yaml
│       │   │           │           ├── kube-controller-manager.yaml
│       │   │           │           └── kube-scheduler.yaml
│       │   │           ├── proc
│       │   │           │   └── sys
│       │   │           │       ├── fs
│       │   │           │       │   └── file-nr
│       │   │           │       └── net
│       │   │           │           └── nf_conntrack_max
│       │   │           └── var
│       │   │               └── log
│       │   │                   ├── cloud-init-output.log
│       │   │                   ├── containers
│       │   │                   │   ├── aws-ebs-csi-driver-controller-ip-10-0-1-61.ap-southeast-2.compute.internal_kube-system_csi-attacher-63345efd65ea584c35f4b0d2de443bf42e83e65324e899be27335a25fe07a72c.log
│       │   │                   │   ├── aws-ebs-csi-driver-controller-ip-10-0-1-61.ap-southeast-2.compute.internal_kube-system_csi-provisioner-6f66e7479c319fbcbcaf53f9b5398cd8e53bcd646fa9788afbc25a69fc9291fe.log
│       │   │                   │   ├── aws-ebs-csi-driver-controller-ip-10-0-1-61.ap-southeast-2.compute.internal_kube-system_ebs-plugin-2ee649206dd099e29b8bb3cbf27bef499b851682c07590a34c2e08d9545ca51b.log
│       │   │                   │   ├── aws-encryption-provider-ip-10-0-1-61.ap-southeast-2.compute.internal_kube-system_aws-encryption-provider-3d2b5c28b7389e1303d2e36dd510ec40cef99f2ea63823901ea9806869def8fa.log
│       │   │                   │   ├── cluster-autoscaler-ip-10-0-1-61.ap-southeast-2.compute.internal_kube-system_cluster-autoscaler-ebc572523516845d023884810f721865c2f0a76e34aaf92babdffacf4c95f75b.log
│       │   │                   │   ├── cluster-autoscaler-ip-10-0-1-61.ap-southeast-2.compute.internal_kube-system_cluster-autoscaler-f7a7480c9adb08077b9a07d2f14e2b14dda7b4d50027cf105d90f37c63945bfa.log
│       │   │                   │   ├── etcd-events-ip-10-0-1-61.ap-southeast-2.compute.internal_kube-system_kube-etcd-events-60301eef32b7c10d0aea3de19549bfdcc3e4152cf3ca8ca7d5e10785e2e232fd.log
│       │   │                   │   ├── etcd-ip-10-0-1-61.ap-southeast-2.compute.internal_kube-system_kube-etcd-25a369f08d0d2bbc9f8f83337724e14c9878a1a0249cc5e9c7c63cae3d3657a1.log
│       │   │                   │   ├── kube-apiserver-ip-10-0-1-61.ap-southeast-2.compute.internal_kube-system_kube-apiserver-13a276422964d2674f16d971bafcd25555eee3eb10b6a6f60686e8b8810a5def.log
│       │   │                   │   ├── kube-controller-manager-ip-10-0-1-61.ap-southeast-2.compute.internal_kube-system_kube-controller-manager-18625e69e9604fbacbe82aebc6dc18dd4427269941a25a9bdef3fc0e5a4dfb9e.log
│       │   │                   │   └── kube-scheduler-ip-10-0-1-61.ap-southeast-2.compute.internal_kube-system_kube-scheduler-12f48aad99ecc18b450ebcda85ffb7f138bbb6bc261fb06e75ae1548647eaa45.log
│       │   │                   └── startup.log
│       │   ├── controlplane-2-10.0.1.161
│       │   │   └── ip-10-0-1-161
│       │   │       ├── commands
│       │   │       │   ├── df_--all_--inodes.out
│       │   │       │   ├── dmesg.out
│       │   │       │   ├── ip_addr.out
│       │   │       │   ├── ip_route_list_table_all.out
│       │   │       │   ├── journalctl_--since__1_hour_ago__--utc_--boot_--dmesg.out
│       │   │       │   ├── journalctl_--since__1_hour_ago__--utc_-u_containerd.out
│       │   │       │   ├── journalctl_--since__1_hour_ago__--utc_-u_kubelet.out
│       │   │       │   ├── mount.out
│       │   │       │   ├── ps_-edF.out
│       │   │       │   ├── ps_-eo_pid,tid,ppid,class,rtprio,ni,pri,psr,pcpu,stat,wchan:14,comm,args,cgroup.out
│       │   │       │   ├── sudo_crictl_ps_a.out
│       │   │       │   ├── sudo_docker_ps_-a.err
│       │   │       │   ├── sudo_docker_ps_-a.out
│       │   │       │   ├── sudo_iptables-save_--counters.out
│       │   │       │   ├── systemctl_status_-l_containerd.out
│       │   │       │   ├── systemctl_status_-l_kubelet.out
│       │   │       │   ├── top_-bn1.out
│       │   │       │   └── uptime.out
│       │   │       └── files
│       │   │           ├── etc
│       │   │           │   └── kubernetes
│       │   │           │       └── manifests
│       │   │           │           ├── aws-ebs-csi-driver-controller.yaml
│       │   │           │           ├── aws-encryption-provider.yaml
│       │   │           │           ├── cluster-autoscaler.yaml
│       │   │           │           ├── etcd-events.yaml
│       │   │           │           ├── etcd.yaml
│       │   │           │           ├── kube-apiserver.yaml
│       │   │           │           ├── kube-controller-manager.yaml
│       │   │           │           └── kube-scheduler.yaml
│       │   │           ├── proc
│       │   │           │   └── sys
│       │   │           │       ├── fs
│       │   │           │       │   └── file-nr
│       │   │           │       └── net
│       │   │           │           └── nf_conntrack_max
│       │   │           └── var
│       │   │               └── log
│       │   │                   ├── cloud-init-output.log
│       │   │                   ├── containers
│       │   │                   │   ├── aws-ebs-csi-driver-controller-ip-10-0-1-161.ap-southeast-2.compute.internal_kube-system_csi-attacher-0d66d0e6d7ead9a0af3ee2b9ea7769669a33636639549571ed10eaacf7ddd85b.log
│       │   │                   │   ├── aws-ebs-csi-driver-controller-ip-10-0-1-161.ap-southeast-2.compute.internal_kube-system_csi-provisioner-c68d3acbdf8f319fe1d700eb3584fd07016d7a8b507e05261b1596fb96ca7598.log
│       │   │                   │   ├── aws-ebs-csi-driver-controller-ip-10-0-1-161.ap-southeast-2.compute.internal_kube-system_ebs-plugin-66c248fc6c21021355ad5aa20ec98894b3370d1b58d86d3bf4b794bfb971eaef.log
│       │   │                   │   ├── aws-encryption-provider-ip-10-0-1-161.ap-southeast-2.compute.internal_kube-system_aws-encryption-provider-f79e0367f399331652f7beb9145e97eefe95a635a3101ffb73bb8c29d449304e.log
│       │   │                   │   ├── cluster-autoscaler-ip-10-0-1-161.ap-southeast-2.compute.internal_kube-system_cluster-autoscaler-4ce584e9f04e3c545168a01a84b4a697a199e4ff536477d8cb884f89ab65872c.log
│       │   │                   │   ├── cluster-autoscaler-ip-10-0-1-161.ap-southeast-2.compute.internal_kube-system_cluster-autoscaler-a384f6202e6b7f0a3d5918adc87a8acf158a4e5d13401825a129188663cf32d7.log
│       │   │                   │   ├── etcd-events-ip-10-0-1-161.ap-southeast-2.compute.internal_kube-system_kube-etcd-events-4975f4f7ea692237be1016e2c03e024ca58cc78745b482ca41fe80481c425f28.log
│       │   │                   │   ├── etcd-ip-10-0-1-161.ap-southeast-2.compute.internal_kube-system_kube-etcd-92a56adf23ce887f032335ccc2ebd48e39de6ddd600302fe985d3b088e8d4eea.log
│       │   │                   │   ├── kube-apiserver-ip-10-0-1-161.ap-southeast-2.compute.internal_kube-system_kube-apiserver-76551958905114e0eaa056c0e3eb7cc0af7d9f6291af9efe49bbab95250500ce.log
│       │   │                   │   ├── kube-controller-manager-ip-10-0-1-161.ap-southeast-2.compute.internal_kube-system_kube-controller-manager-d50c38a03f5e01ca438508db11091e9421fa8eb7231f484303a350a7b0538439.log
│       │   │                   │   └── kube-scheduler-ip-10-0-1-161.ap-southeast-2.compute.internal_kube-system_kube-scheduler-7ebaccccbf67c06d379b1541f1970e4e987de138556542469cc24aacea1c9213.log
│       │   │                   └── startup.log
│       │   ├── kubectl
│       │   │   ├── kubectl_cluster-info_--kubeconfig_.tmp.kubeconfig-clustercluster-0904143419.kubeconfig.conf.out
│       │   │   ├── kubectl_describe_all_--kubeconfig_.tmp.kubeconfig-clustercluster-0904143419.kubeconfig.conf.out
│       │   │   ├── kubectl_describe_clusterrolebindings_--kubeconfig_.tmp.kubeconfig-clustercluster-0904143419.kubeconfig.conf.out
│       │   │   ├── kubectl_describe_clusterroles_--kubeconfig_.tmp.kubeconfig-clustercluster-0904143419.kubeconfig.conf.out
│       │   │   ├── kubectl_describe_crd_--kubeconfig_.tmp.kubeconfig-clustercluster-0904143419.kubeconfig.conf.out
│       │   │   ├── kubectl_get_all_-o_yaml_--kubeconfig_.tmp.kubeconfig-clustercluster-0904143419.kubeconfig.conf.out
│       │   │   ├── kubectl_get_clusterrolebindings_-o_wide_--kubeconfig_.tmp.kubeconfig-clustercluster-0904143419.kubeconfig.conf.out
│       │   │   ├── kubectl_get_clusterroles_-o_wide_--kubeconfig_.tmp.kubeconfig-clustercluster-0904143419.kubeconfig.conf.out
│       │   │   ├── kubectl_get_crd_-o_wide_--kubeconfig_.tmp.kubeconfig-clustercluster-0904143419.kubeconfig.conf.out
│       │   │   ├── kubectl_get_events_--kubeconfig_.tmp.kubeconfig-clustercluster-0904143419.kubeconfig.conf.out
│       │   │   └── kubectl_version_--kubeconfig_.tmp.kubeconfig-clustercluster-0904143419.kubeconfig.conf.out
│       │   └── managed_resources
│       │       ├── controlplane-0
│       │       │   ├── asg.out
│       │       │   ├── eni.out
│       │       │   ├── etcd.out
│       │       │   └── instance-0.out
│       │       ├── controlplane-1
│       │       │   ├── asg.out
│       │       │   ├── eni.out
│       │       │   ├── etcd.out
│       │       │   └── instance-0.out
│       │       ├── controlplane-2
│       │       │   ├── asg.out
│       │       │   ├── eni.out
│       │       │   ├── etcd.out
│       │       │   └── instance-0.out
│       │       ├── elb.out
│       │       └── target_group.out
│       └── nodepools
│           ├── default
│           │   └── pool-0
│           │       ├── i-03a23df438ac8278e
│           │       │   └── ip-10-0-1-53
│           │       │       ├── commands
│           │       │       │   ├── df_--all_--inodes.out
│           │       │       │   ├── dmesg.out
│           │       │       │   ├── ip_addr.out
│           │       │       │   ├── ip_route_list_table_all.out
│           │       │       │   ├── journalctl_--since__1_hour_ago__--utc_--boot_--dmesg.out
│           │       │       │   ├── journalctl_--since__1_hour_ago__--utc_-u_containerd.out
│           │       │       │   ├── journalctl_--since__1_hour_ago__--utc_-u_kubelet.out
│           │       │       │   ├── mount.out
│           │       │       │   ├── ps_-edF.out
│           │       │       │   ├── ps_-eo_pid,tid,ppid,class,rtprio,ni,pri,psr,pcpu,stat,wchan:14,comm,args,cgroup.out
│           │       │       │   ├── sudo_crictl_ps_a.out
│           │       │       │   ├── sudo_docker_ps_-a.err
│           │       │       │   ├── sudo_docker_ps_-a.out
│           │       │       │   ├── sudo_iptables-save_--counters.out
│           │       │       │   ├── systemctl_status_-l_containerd.out
│           │       │       │   ├── systemctl_status_-l_kubelet.out
│           │       │       │   ├── top_-bn1.out
│           │       │       │   └── uptime.out
│           │       │       └── files
│           │       │           ├── proc
│           │       │           │   └── sys
│           │       │           │       ├── fs
│           │       │           │       │   └── file-nr
│           │       │           │       └── net
│           │       │           │           └── nf_conntrack_max
│           │       │           └── var
│           │       │               └── log
│           │       │                   ├── cloud-init-output.log
│           │       │                   ├── containers
│           │       │                   │   ├── aws-ebs-csi-driver-node-mwxhs_kube-system_ebs-plugin-696201b4997d5cc72d85e6b005faa544ab2311571b50d2b1402b6e967a9364f0.log
│           │       │                   │   ├── aws-ebs-csi-driver-node-mwxhs_kube-system_node-driver-registrar-fcde7a18980aee3c690a84953f44341df9755e28ada6a42a6aea4c1b9d6cdd8e.log
│           │       │                   │   ├── calico-node-2g6zt_kube-system_calico-node-f6c22e30079cff40bef7deafbdfa2a97d0c3a4a95e7b68499c917adb1aa24b09.log
│           │       │                   │   ├── calico-node-2g6zt_kube-system_flexvol-driver-9c6e02ad10d342a91e2c3c3d924f747cfab756e719ffc580c073c6e3640b7515.log
│           │       │                   │   ├── calico-node-2g6zt_kube-system_install-cni-3649827b98cb5b2f79f8e9204b07247ca8d2768e4d13b1a8a1359278741ed156.log
│           │       │                   │   ├── calico-node-2g6zt_kube-system_upgrade-ipam-3a2fe7afee90bfe4e09b0939300be89f4795bc7a57e8085a57cb714e015092f6.log
│           │       │                   │   ├── coredns-88cd756b8-thm49_kube-system_coredns-6485d1e189c7b11fdc6249316ab6082360737c67edb77ab0d426eb26dba261ee.log
│           │       │                   │   ├── gke-connect-agent-20200605-02-00-8647455579-jh2r2_gke-connect_gke-connect-agent-20200605-02-00-369a43ce1bccb57bf3abfd503b9b25c81dbcd73d60a8642b68c0bb89b1b8e9fd.log
│           │       │                   │   └── kube-proxy-sg5nr_kube-system_kube-proxy-44f9171a644c8d7d0497900f361faa22fc673adc8336608ced096e655ccde762.log
│           │       │                   └── startup.log
│           │       ├── i-0569a9f23d49f59ea
│           │       │   └── ip-10-0-1-137
│           │       │       ├── commands
│           │       │       │   ├── df_--all_--inodes.out
│           │       │       │   ├── dmesg.out
│           │       │       │   ├── ip_addr.out
│           │       │       │   ├── ip_route_list_table_all.out
│           │       │       │   ├── journalctl_--since__1_hour_ago__--utc_--boot_--dmesg.out
│           │       │       │   ├── journalctl_--since__1_hour_ago__--utc_-u_containerd.out
│           │       │       │   ├── journalctl_--since__1_hour_ago__--utc_-u_kubelet.out
│           │       │       │   ├── mount.out
│           │       │       │   ├── ps_-edF.out
│           │       │       │   ├── ps_-eo_pid,tid,ppid,class,rtprio,ni,pri,psr,pcpu,stat,wchan:14,comm,args,cgroup.out
│           │       │       │   ├── sudo_crictl_ps_a.out
│           │       │       │   ├── sudo_docker_ps_-a.err
│           │       │       │   ├── sudo_docker_ps_-a.out
│           │       │       │   ├── sudo_iptables-save_--counters.out
│           │       │       │   ├── systemctl_status_-l_containerd.out
│           │       │       │   ├── systemctl_status_-l_kubelet.out
│           │       │       │   ├── top_-bn1.out
│           │       │       │   └── uptime.out
│           │       │       └── files
│           │       │           ├── proc
│           │       │           │   └── sys
│           │       │           │       ├── fs
│           │       │           │       │   └── file-nr
│           │       │           │       └── net
│           │       │           │           └── nf_conntrack_max
│           │       │           └── var
│           │       │               └── log
│           │       │                   ├── cloud-init-output.log
│           │       │                   ├── containers
│           │       │                   │   ├── aws-ebs-csi-driver-node-zxxqg_kube-system_ebs-plugin-da9a84b2e45e2ad18d08cbab5260d4cee17636d868a645ab7343f50c25c64ece.log
│           │       │                   │   ├── aws-ebs-csi-driver-node-zxxqg_kube-system_node-driver-registrar-f96a7dbf1bac95c41e022b2ede129c664caafa6eff37caa52f6763c1e737be1a.log
│           │       │                   │   ├── calico-kube-controllers-56cd854695-mjfwx_kube-system_calico-kube-controllers-0ed4316450f5f2e6c4abfb5cc430ed18e2d530525e2ab0ed69a150eed5b3c860.log
│           │       │                   │   ├── calico-node-n5klf_kube-system_calico-node-36101112d423636164c236eacca76d6814c167203cfaf89754984cd79f3b6bbf.log
│           │       │                   │   ├── calico-node-n5klf_kube-system_flexvol-driver-5837e2ba75d549373ca0a3032d1be0c75c0dd442d2e25e286e9006e604794da2.log
│           │       │                   │   ├── calico-node-n5klf_kube-system_install-cni-0e19cfa737dcaaf8fbc40ee2e68460ea8888829b7fab4b8733d5322c339cf838.log
│           │       │                   │   ├── calico-node-n5klf_kube-system_upgrade-ipam-10c94c2fa5f67a69ad9ebeedf9764bbf566c99b50ef60f2f268d484bd028eb76.log
│           │       │                   │   └── kube-proxy-pzh87_kube-system_kube-proxy-5df6d54f9ff2dd036687e064186bcfc2b7c0536fd88586b5cac9e140ffa16658.log
│           │       │                   └── startup.log
│           │       └── i-05bfb8fc961337cc7
│           │           └── ip-10-0-1-167
│           │               ├── commands
│           │               │   ├── df_--all_--inodes.out
│           │               │   ├── dmesg.out
│           │               │   ├── ip_addr.out
│           │               │   ├── ip_route_list_table_all.out
│           │               │   ├── journalctl_--since__1_hour_ago__--utc_--boot_--dmesg.out
│           │               │   ├── journalctl_--since__1_hour_ago__--utc_-u_containerd.out
│           │               │   ├── journalctl_--since__1_hour_ago__--utc_-u_kubelet.out
│           │               │   ├── mount.out
│           │               │   ├── ps_-edF.out
│           │               │   ├── ps_-eo_pid,tid,ppid,class,rtprio,ni,pri,psr,pcpu,stat,wchan:14,comm,args,cgroup.out
│           │               │   ├── sudo_crictl_ps_a.out
│           │               │   ├── sudo_docker_ps_-a.err
│           │               │   ├── sudo_docker_ps_-a.out
│           │               │   ├── sudo_iptables-save_--counters.out
│           │               │   ├── systemctl_status_-l_containerd.out
│           │               │   ├── systemctl_status_-l_kubelet.out
│           │               │   ├── top_-bn1.out
│           │               │   └── uptime.out
│           │               └── files
│           │                   ├── proc
│           │                   │   └── sys
│           │                   │       ├── fs
│           │                   │       │   └── file-nr
│           │                   │       └── net
│           │                   │           └── nf_conntrack_max
│           │                   └── var
│           │                       └── log
│           │                           ├── cloud-init-output.log
│           │                           ├── containers
│           │                           │   ├── aws-ebs-csi-driver-node-kdghk_kube-system_ebs-plugin-3e107a145cc86ac24014b1bf4670b26cb9372fd8022bc0698ca68b27e79a9bfe.log
│           │                           │   ├── aws-ebs-csi-driver-node-kdghk_kube-system_node-driver-registrar-25874dd7063db875a27f170e13e74267749c0d9846781ac8ab7568ac5f940a11.log
│           │                           │   ├── calico-node-b98tq_kube-system_calico-node-1dd735ce87fe6f0f73761d2d97c07ea6f908d0cd088e23f6b916b13b6805f828.log
│           │                           │   ├── calico-node-b98tq_kube-system_flexvol-driver-e9951b1a3de0ed9426de260c5591b4c161b4917873f2eaaf1cbdbd3926c9f933.log
│           │                           │   ├── calico-node-b98tq_kube-system_install-cni-58a2f1a5bfb16951a4b012b5ed30751d24c3380f489011274e3ca8de7eb1e1aa.log
│           │                           │   ├── calico-node-b98tq_kube-system_upgrade-ipam-f9804f2eef0d18122219dbb2843880a392f55493dab0edc16bce2dc5e186fa2a.log
│           │                           │   ├── kube-proxy-pf7sv_kube-system_kube-proxy-473ffafc30368b1cb370cd5cbbe4b20e77dfc383da04386d3ec02948f04bc97d.log
│           │                           │   ├── metrics-server-v0.3.3-85dfcbb78-fmklb_kube-system_metrics-server-4570fa1bd82d238d0ab11fc4256e5cc9fa97accece05f7f0c02d5edab884468e.log
│           │                           │   └── metrics-server-v0.3.3-85dfcbb78-fmklb_kube-system_metrics-server-nanny-71eeba81fb3cf128066ca965aabc5bfdf8e045790a46a9ec7e56e73ad3859218.log
│           │                           └── startup.log
│           └── managed_resources
│               ├── asg.out
│               ├── instance-0.out
│               ├── instance-1.out
│               └── instance-2.out
├── index.html
├── mgmt_cluster
│   └── kubectl
│       └── kubectl_get_awscluster_-oyaml_--kubeconfig_.tmp.kubeconfig-mgmt786666316.kubeconfig.conf.out
├── snapshot.config
└── snapshot.log