Back up and restore advanced clusters with gkectl

This document describes how to back up and restore Google Distributed Cloud version 1.32 and higher admin and user clusters that have advanced clusters enabled.

The gkectl backup and restore process does not include persistent volumes. Any volumes created by the local volume provisioner (LVP) are left unaltered.

Back up a cluster

The gkectl backup cluster command adds the cluster information from the etcd store and the PKI certificates for the specified cluster to a tar file. The etcd store is the Kubernetes backing store for all cluster data and contains all the Kubernetes objects and custom objects required to manage cluster state. The PKI certificates are used for authentication over Transport Layer Security (TLS). This data is backed up from the cluster's control plane or from one of the control planes for a high-availability (HA) deployment.

The backup tar file contains sensitive credentials, including your service account keys and the SSH key. Store backup files in a secure location. To prevent unintended file exposure, the backup process uses in-memory files only.
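
For example, one option is to copy each backup file to a restricted-access Cloud Storage bucket after it is created. In the following sketch, BACKUP_FILE is the path of the backup tar file and the bucket name is only an illustration:

    # Copy the backup tar file to a restricted-access Cloud Storage bucket.
    gcloud storage cp BACKUP_FILE gs://example-secure-cluster-backups/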

Back up your clusters regularly to ensure your snapshot data is relatively current. Adjust the rate of backups to reflect the frequency of significant changes to your clusters.
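
For example, you can schedule backups with cron on the admin workstation. The following crontab entry is a minimal sketch; the schedule, the gkectl path, the kubeconfig path, and the log file are assumptions that you would adapt to your environment:

    # Back up the user cluster every Sunday at 02:00.
    0 2 * * 0 /usr/local/bin/gkectl backup cluster --cluster-name CLUSTER_NAME --kubeconfig /home/ubuntu/kubeconfig >> /var/log/gkectl-backup.log 2>&1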

Before you begin, ensure your cluster is operating properly, with working credentials and SSH connectivity to all nodes. The intent of the backup process is to capture your cluster in a known good state so that you can restore operation if a catastrophic failure occurs.

To back up a cluster:

  1. Run the following command to check your cluster:

    gkectl diagnose cluster --cluster-name CLUSTER_NAME \
        --kubeconfig ADMIN_KUBECONFIG
    

    Replace the following:

    • CLUSTER_NAME: the name of the cluster you plan to back up.

    • ADMIN_KUBECONFIG: the path of the kubeconfig file for the admin cluster.

  2. Run the applicable command to back up the cluster:

    Admin cluster

    gkectl backup admin --kubeconfig ADMIN_KUBECONFIG
    

    User cluster

    gkectl backup cluster --cluster-name CLUSTER_NAME \
        --kubeconfig ADMIN_KUBECONFIG
    

By default, the backup tar file is saved to the directory gkectl-workspace/backups on your admin workstation. The tar file is named CLUSTER_NAME_backup_TIMESTAMP.tar.gz, where CLUSTER_NAME is the name of the cluster being backed up and TIMESTAMP is the date and time the backup was made. For example, if the cluster name is testuser, the backup file has a name like testuser_backup_2006-01-02T150405Z0700.tar.gz. To specify a different name and location for your backup file, use the --backup-file flag.
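
For example, the following command is a sketch of writing a user cluster backup to a custom location; the destination path is only an illustration:

    gkectl backup cluster --cluster-name CLUSTER_NAME \
        --kubeconfig ADMIN_KUBECONFIG \
        --backup-file /backups/CLUSTER_NAME-backup.tar.gz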

The backup file expires after a year and the cluster restore process doesn't work with expired backup files.
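
Because expired backup files can't be restored, it helps to check the age of existing backups periodically and replace any that are approaching the one-year limit. The following command is a simple sketch that lists backup files in the default directory that are older than roughly 11 months:

    # List backup tar files last modified more than ~330 days ago.
    find gkectl-workspace/backups -name '*_backup_*.tar.gz' -mtime +330 -print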

The backup file can also be uploaded to your vCenter Server if the following section is added to the admin cluster configuration file:

    clusterBackup:
      datastore: DATASTORE

Replace DATASTORE with the name of the datastore where you want to store the backup. The datastore must be in the same datacenter as the admin cluster. The backups are located in the anthos/CLUSTER_NAME/backup directory of the specified datastore.
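
To verify that a backup was uploaded, you can browse the datastore in the vSphere Client or use a tool such as govc. The following command is a sketch that assumes govc is installed on your workstation and that the GOVC_URL, GOVC_USERNAME, and GOVC_PASSWORD environment variables are set for your vCenter Server:

    # List the uploaded backups for the cluster in the configured datastore.
    govc datastore.ls -ds DATASTORE anthos/CLUSTER_NAME/backup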

Restore a cluster

Restoring a cluster from a backup is a last resort and should only be used when a cluster has failed catastrophically and can't be returned to service any other way. For example, the etcd data is corrupted or the etcd Pod is in a crash loop.

The backup tar file contains sensitive credentials, including your service account keys and the SSH key. To prevent unintended file exposure, the Google Distributed Cloud restore process uses in-memory files only.

Before you restore a cluster, make sure that the following conditions are met:

  • Ensure that all control-plane node machines that were available for the cluster at the time of the backup are operating properly and reachable.
  • Ensure that SSH connectivity between nodes works with the SSH keys that were used at the time of the backup; a spot check is sketched after this list. These SSH keys are reinstated as part of the restore process.
  • Ensure that the service-account keys that were used at the time of the backup are still active. These service-account keys are reinstated for the restored cluster.
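
One way to spot-check reachability and the backed-up SSH key is to connect from the admin workstation. The following loop is a minimal sketch; the node IP addresses, SSH user, and key path are placeholders that you replace with the values for your control-plane nodes:

    # Check that each control-plane node responds to ping and accepts the backed-up SSH key.
    for NODE_IP in 10.0.0.11 10.0.0.12 10.0.0.13; do
      ping -c 1 "$NODE_IP" > /dev/null && echo "$NODE_IP is reachable" || echo "$NODE_IP is NOT reachable"
      ssh -i ~/.ssh/NODE_SSH_KEY -o ConnectTimeout=5 ubuntu@"$NODE_IP" hostname || echo "SSH to $NODE_IP failed"
    done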

To restore a cluster:

  1. Run the applicable command to restore the cluster:

    Admin cluster

    gkectl restore admin --backup-file BACKUP_FILE \
        --config ADMIN_CONFIG
    

    Replace the following:

    • BACKUP_FILE: the path and name of the backup file you are using.

    • ADMIN_CONFIG: the path to the admin cluster config file.

    User cluster

    gkectl restore cluster --cluster-name CLUSTER_NAME \
        --backup-file BACKUP_FILE \
        --kubeconfig ADMIN_KUBECONFIG
    

    Replace the following:

    • CLUSTER_NAME: the name of the cluster you are restoring.

    • BACKUP_FILE: the path and name of the backup file you are using.

    • ADMIN_KUBECONFIG: the path to the admin cluster kubeconfig file.

    At the end of the restore process, a new kubeconfig file is generated for the restored cluster in the workspace directory gkectl-workspace.

  2. When the restore finishes, run the following command to verify that it was successful:

    gkectl diagnose cluster --cluster-name CLUSTER_NAME \
        --kubeconfig GENERATED_KUBECONFIG
    

    Replace GENERATED_KUBECONFIG with the path of the kubeconfig file generated during the restore.
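
As an additional quick check, you can also inspect the restored cluster directly with kubectl and the same generated kubeconfig. This is a supplementary sanity check, not a replacement for gkectl diagnose:

    # Confirm that nodes are Ready and that system Pods are running.
    kubectl get nodes --kubeconfig GENERATED_KUBECONFIG
    kubectl get pods --all-namespaces --kubeconfig GENERATED_KUBECONFIG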

Troubleshoot

If you have problems with the backup or restore process, the following sections might help you to troubleshoot the issue.

If you need additional assistance, reach out to the Cloud Customer Care team.

Running out of memory during a backup or restore

If the workstation where you run the gkectl command doesn't have much RAM, you might not have enough memory to complete the backup or restore process. If needed, add the --use-disk flag to the backup command to create and use a temporary scratch disk for the backup or restore operations. To keep the files on the scratch disk protected, this flag modifies their permissions, so you must run the command as the root user (or use sudo).
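
For example, the following command is a sketch of running a user cluster backup with the scratch-disk option; it shows --use-disk without an argument and assumes that sudo can locate gkectl and the kubeconfig path on your workstation:

    # Use a temporary scratch disk instead of in-memory files for the backup.
    sudo gkectl backup cluster --cluster-name CLUSTER_NAME \
        --kubeconfig ADMIN_KUBECONFIG \
        --use-disk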

Refresh of SSH key after a backup breaks the restore process

SSH-related operations during the restore process might fail if the SSH key was rotated after the backup was taken, because the key captured in the backup is no longer accepted by the nodes. To resolve this issue, temporarily add the original SSH key back, perform the restore, and then rotate the SSH key again after the restore process is complete.
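
The following is a minimal sketch of temporarily re-adding the original key, assuming you still have the original public key file and can reach the nodes with the new key; the file names, user name, and node address are illustrative:

    # Append the original public key to authorized_keys on a control-plane node so that
    # restore operations using the backed-up private key can connect. Repeat for each node.
    cat ~/.ssh/original_key.pub | \
      ssh -i ~/.ssh/new_key ubuntu@CONTROL_PLANE_NODE_IP 'cat >> ~/.ssh/authorized_keys'

After the restore finishes and you rotate the SSH key, remove the temporary entry from authorized_keys on each node.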