Renew expired cluster certificates manually

This document describes how to manually renew expired certificates for your Google Distributed Cloud clusters. The control plane components of Google Distributed Cloud use Transport Layer Security (TLS) certificates. When these certificates expire, your ability to manage workloads and cluster lifecycles is blocked until the certificates are renewed. For more information about the impact of expired certificates, see Certificate expiration.

By default, TLS certificates have a 1-year expiration period. Google Distributed Cloud renews these certificates automatically during cluster upgrades and when you Rotate certificate authorities. We recommend that you upgrade your clusters regularly to keep them secure and supported, and to prevent TLS certificates from expiring.
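
For a quick spot check of whether the control plane certificates on a node are still valid, you can read the expiry date of a single certificate with openssl. This is a sketch, assuming the default kubeadm certificate layout on a control plane node:

```shell
# Print the expiry (notAfter) date of the API server certificate.
# The path assumes the standard kubeadm PKI directory.
sudo openssl x509 -noout -enddate -in /etc/kubernetes/pki/apiserver.crt
```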

Errors caused by certificate expiration

If the TLS certificates on your cluster expire, the core controllers can't establish TLS connections with the Kubernetes API server. This lack of connectivity causes the following errors:

  • Unable to connect to the server: x509: certificate has expired or is not yet valid

    When you use kubectl to get your cluster nodes, the response includes an error that references certificate expiration:

    kubectl get nodes --kubeconfig KUBECONFIG_PATH
    

    Replace KUBECONFIG_PATH with the path to the kubeconfig file for your cluster.

    When certificates have expired, the response is something like the following:

    Unable to connect to the server: x509: certificate has expired or is not yet valid
    
  • could not connect: x509 or rejected connection

    Expired certificates block access to the etcd cluster, as peers can't communicate with each other. The etcd logs may contain error entries like these:

    W | rafthttp: health check for peer 6221a1d241bb2d0a could not connect: x509: certificate
    has expired or is not yet valid
    I | embed: rejected connection from "10.200.0.4:46108" (error "remote error: tls: bad
    certificate", ServerName "")
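
To search for these entries yourself, you can read the etcd container logs on a control plane node with crictl. The grep pattern below is a suggestion; the spaces around etcd keep the match from also catching the etcd-defrag container:

```shell
# Find the etcd container ID, then search its logs for certificate errors.
sudo crictl ps | grep ' etcd '
sudo crictl logs CONTAINER_ID 2>&1 | grep -E 'x509|bad certificate'
```

Replace CONTAINER_ID with the ID from the first column of the crictl ps output.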
    

Check certificate expiration times

This section has instructions for checking the expiration times for the certificates used by your cluster. Perform the following steps on each control plane node.

To check certificate expiration times:

  1. Log in to one of the control plane node machines and run the following command:

    sudo kubeadm certs check-expiration
    

    The command output lists the certificates created by kubeadm for the control plane components and their expiry:

    CERTIFICATE                EXPIRES                  RESIDUAL TIME   CERTIFICATE AUTHORITY   EXTERNALLY MANAGED
    admin.conf                 Nov 28, 2021 19:09 UTC   53m                                     no      
    apiserver                  Nov 28, 2021 19:09 UTC   53m             ca                      no      
    apiserver-etcd-client      Nov 28, 2021 19:09 UTC   53m             etcd-ca                 no      
    apiserver-kubelet-client   Nov 28, 2021 19:09 UTC   53m             ca                      no      
    controller-manager.conf    Nov 28, 2021 19:09 UTC   53m                                     no      
    etcd-healthcheck-client    Nov 28, 2021 19:09 UTC   53m             etcd-ca                 no      
    etcd-peer                  Nov 28, 2021 19:09 UTC   53m             etcd-ca                 no      
    etcd-server                Nov 28, 2021 19:09 UTC   53m             etcd-ca                 no      
    front-proxy-client         Nov 28, 2021 19:09 UTC   53m             front-proxy-ca          no      
    scheduler.conf             Nov 28, 2021 19:09 UTC   53m                                     no      
    
    CERTIFICATE AUTHORITY   EXPIRES                  RESIDUAL TIME   EXTERNALLY MANAGED
    ca                      Nov 26, 2031 18:06 UTC   9y              no      
    etcd-ca                 Nov 26, 2031 18:06 UTC   9y              no      
    front-proxy-ca          Nov 26, 2031 18:06 UTC   9y              no    
    
  2. Run the following command to check the expiration times for kubelet certificates:

    sudo openssl x509 -in /var/lib/kubelet/pki/kubelet-client-current.pem -text | grep Validity -A2
    sudo openssl x509 -in /var/lib/kubelet/pki/kubelet-server-current.pem -text | grep Validity -A2
    

    The response for each command looks like the following output:

    Validity
        Not Before: Sep 17 22:27:53 2021 GMT
        Not After : Sep 17 22:33:16 2022 GMT
    

    If all control plane nodes were bootstrapped at the same time, the certificate expiry times are within minutes of each other on every node. You can verify this by running the preceding commands on each control plane node.

  3. Run the following command on the admin workstation to check the expiration time of the client certificate in the kubeconfig file of the cluster:

    grep 'client-certificate-data' KUBECONFIG_PATH | \
        awk '{print $2}' | base64 -d | openssl x509 -text | grep Validity -A2
    

    The response looks like this sample output:

    Validity
        Not Before: Sep 17 22:27:53 2021 GMT
        Not After : Sep 17 22:33:16 2022 GMT
    
  4. Run the following command to look up the certificate expiration for the cluster kubeconfig in the admin cluster:

    kubectl get secret/CLUSTER_NAME-kubeconfig -n CLUSTER_NAMESPACE --kubeconfig=ADMIN_KUBECONFIG -o jsonpath='{.data.value}' | base64 --decode | grep client-certificate-data | awk '{print $2}' | base64 -d | openssl x509 -text | grep Validity -A2
    
    Validity
        Not Before: Sep 17 22:27:53 2021 GMT
        Not After : Sep 17 22:33:16 2022 GMT
    

    The kubeconfig certificate in the admin cluster and the certificate in the kubeconfig file on the admin workstation are the same. Therefore, the output for this command and the command from the previous step must match.
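
As an alternative to reading the expiry dates one certificate at a time, openssl's -checkend option can flag certificates that are close to expiry: it exits nonzero when the certificate expires within the given number of seconds. This sketch assumes the default kubeadm PKI paths on a control plane node:

```shell
# Report any kubeadm-managed certificate that expires within 30 days.
# -checkend N exits nonzero if the certificate expires within N seconds.
for cert in /etc/kubernetes/pki/*.crt /etc/kubernetes/pki/etcd/*.crt; do
  if ! sudo openssl x509 -checkend $((30*24*3600)) -noout -in "$cert" >/dev/null; then
    echo "EXPIRING SOON: $cert"
  fi
done
```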

Renew certificates manually

To manually renew TLS certificates for a cluster, use the instructions in the following sections.

Renew certificates on each control plane node

Perform the following steps on each control plane node of the affected cluster:

  1. Back up the /etc/kubernetes folder.
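
A timestamped copy is usually sufficient for this backup. For example (the backup destination path is illustrative):

```shell
# Copy /etc/kubernetes, preserving ownership and permissions.
sudo cp -a /etc/kubernetes "/etc/kubernetes.bak-$(date +%Y%m%d%H%M)"
```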

  2. Run the following kubeadm command to renew all the certificates:

    The command renews the certificates using the existing certificate authorities (CAs) on the machine.

    sudo kubeadm certs renew all 
    

    The command output is similar to the following example:

    certificate embedded in the kubeconfig file for the admin to use and for kubeadm itself renewed
    certificate for serving the Kubernetes API renewed
    certificate the apiserver uses to access etcd renewed
    certificate for the API server to connect to kubelet renewed
    certificate embedded in the kubeconfig file for the controller manager to use renewed
    certificate for liveness probes to healthcheck etcd renewed
    certificate for etcd nodes to communicate with each other renewed
    certificate for serving etcd renewed
    certificate for the front proxy client renewed
    certificate embedded in the kubeconfig file for the scheduler manager to use renewed
    
  3. Verify that the certificates have a new expiry time by running the following command:

    sudo kubeadm certs check-expiration
    
  4. Restart containers with the following commands:

    Not all control plane components support dynamic certificate reload. This step restarts the kube-apiserver, kube-scheduler, kube-controller-manager, and etcd containers so that they pick up the renewed certificates.

    Repeat the following steps for each of the four containers:

    1. Find the container's ID:

      sudo crictl ps | grep CONTAINER_NAME
      

      Replace CONTAINER_NAME with one of the following container names: kube-apiserver, kube-scheduler, kube-controller-manager, or etcd (not etcd-defrag).

      The response is similar to the following output:

      c331ade490cb6       28df10594cd92      26 hours ago       Running          kube-apiserver ...
      

      The container ID is the value in the first column.

    2. Stop the container:

      sudo crictl stop CONTAINER_ID
      

      Replace CONTAINER_ID with the container ID from the preceding step.

      When the stopped container exits, kubelet creates a new container in its place and deletes the stopped one. If you encounter an error, such as context deadline exceeded (error code DeadlineExceeded), rerun the command.
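
The find-and-stop sequence above can be combined into a single loop. Treat this as a convenience sketch rather than part of the official procedure; the space-padded grep pattern keeps etcd from matching the etcd-defrag container:

```shell
# Stop the four static control plane containers so that kubelet
# recreates them with the renewed certificates.
for name in kube-apiserver kube-scheduler kube-controller-manager etcd; do
  id=$(sudo crictl ps | grep " ${name} " | awk '{print $1}')
  if [ -n "$id" ]; then
    sudo crictl stop "$id"
  fi
done
```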

Verify that connectivity is restored

At this point, the kubeadm-managed certificates are renewed on all control plane nodes. If you're renewing expired certificates, perform the following step.

  • To verify the connection with the Kubernetes API server, run the following kubectl command on any control plane node:

    kubectl get nodes --kubeconfig=/etc/kubernetes/admin.conf
    

The response should return the list of nodes for the cluster. If your certificates are properly renewed, no TLS or certificate errors are returned.

Replace the cluster kubeconfig file

To replace the kubeconfig file for your cluster with one that has the renewed certificates, use the following steps:

  1. To create the new kubeconfig file, run the following kubectl command on the admin workstation:

    kubectl --kubeconfig="ADMIN_KUBECONFIG" get secret/CLUSTER_NAME-kubeconfig  \
        -n "CLUSTER_NAMESPACE"  -o jsonpath='{.data.value}'  | base64 --decode > new_kubeconfig.conf
    

    Replace the following:

    • ADMIN_KUBECONFIG: the path to the admin cluster kubeconfig file.

    • CLUSTER_NAME: the name of the cluster that you're renewing certificates for.

    • CLUSTER_NAMESPACE: the namespace of the cluster that you're renewing certificates for.

    The new_kubeconfig.conf file contains the updated certificate data.

  2. Verify that the new kubeconfig works by running any kubectl command, using the new credentials:

    kubectl get nodes --kubeconfig new_kubeconfig.conf
    
  3. Replace the contents of the old kubeconfig file, saved in the cluster directory on the admin workstation, with the contents of the new kubeconfig file, new_kubeconfig.conf.

    By default, the path to the cluster kubeconfig file is bmctl-workspace/CLUSTER_NAME/CLUSTER_NAME-kubeconfig.
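
Assuming the default bmctl workspace layout, the replacement is a single copy:

```shell
# Overwrite the stored cluster kubeconfig with the renewed one.
# CLUSTER_NAME is a placeholder for the name of your cluster.
cp new_kubeconfig.conf bmctl-workspace/CLUSTER_NAME/CLUSTER_NAME-kubeconfig
```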

Verify kubelet certificates and restart etcd-defrag

To finish the process of renewing cluster certificates manually, perform the following steps for each control plane node:

  1. Log in to the control plane node and verify the kubelet client and serving certificate expiry time by running the following commands:

    Kubelet certificates are rotated automatically as long as the control plane is reachable. The period for automatic renewal of kubelet certificates is shorter than the expiration period for control plane component certificates, so the kubelet certificates have most likely been renewed already.

    sudo openssl x509 -in /var/lib/kubelet/pki/kubelet-client-current.pem -text | grep Validity -A2
    sudo openssl x509 -in /var/lib/kubelet/pki/kubelet-server-current.pem -text | grep Validity -A2
    

    The output of either command looks something like the following example:

    Validity
        Not Before: Nov 28 18:04:57 2022 GMT
        Not After : Nov 28 19:04:57 2023 GMT
    
  2. Use the following command to restart the etcd-defrag container:

    The etcd-defrag container uses the apiserver-etcd client certificate to talk to etcd and must be restarted to pick up the updated certificates.

    kubectl rollout restart daemonset etcd-defrag -n kube-system --kubeconfig KUBECONFIG_PATH
    

You've completed the manual steps to renew cluster certificates. Verify that all pods are running properly and that no TLS errors are reported for control plane containers.