This document describes how to manually renew expired certificates for your Google Distributed Cloud. Transport Layer Security (TLS) certificates are used by the control plane components of Google Distributed Cloud. When these certificates expire, your ability to manage workloads and cluster lifecycles is blocked until the certificates can be renewed. For more information about the impact of expired certificates, see Certificate expiration.
By default, TLS certificates have a 1-year expiration period. Google Distributed Cloud renews these certificates automatically during cluster upgrades and when you rotate certificate authorities. We recommend that you upgrade your clusters regularly to keep them secure and supported, and to prevent TLS certificates from expiring.
Errors caused by certificate expiration
If the TLS certificates on your cluster expire, the core controllers can't establish TLS connections with the Kubernetes API server. This lack of connectivity causes the following errors:
Unable to connect to the server: x509: Unable to connect to the server
When you use kubectl to get your cluster nodes, the response includes an error that references certificate expiration:
kubectl get nodes --kubeconfig KUBECONFIG_PATH
Replace KUBECONFIG_PATH with the path to the kubeconfig file for your cluster.
When certificates have expired, the response is something like the following:
Unable to connect to the server: x509: certificate has expired or is not yet valid
could not connect: x509 or rejected connection
Expired certificates block access to the etcd cluster, as peers can't communicate with each other. The etcd logs may contain error entries like these:
W | rafthttp: health check for peer 6221a1d241bb2d0a could not connect: x509: certificate has expired or is not yet valid
I | embed: rejected connection from "10.200.0.4:46108" (error "remote error: tls: bad certificate", ServerName "")
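To look for these entries quickly, a small filter can help. The following is a minimal sketch and not part of the product tooling: the function name find_cert_errors is hypothetical, and the patterns are taken only from the sample messages shown above, so errors with different wording would be missed.

```shell
# Hypothetical helper: print log lines from stdin that look like the
# certificate errors quoted above. Exits non-zero when nothing matches.
find_cert_errors() {
  grep -E 'x509: certificate has expired or is not yet valid|tls: bad certificate'
}
```

For example, you might pipe container logs into it with something like sudo crictl logs CONTAINER_ID 2>&1 | find_cert_errors, where CONTAINER_ID is the ID of the etcd container.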
Check certificate expiration times
This section has instructions for checking the expiration times for the certificates used by your cluster. Perform the following steps on each control plane node.
To check certificate expiration times:
Log in to one of the control plane node machines and run the following command:
sudo kubeadm certs check-expiration
The command output lists the certificates created by kubeadm for the control plane components and their expiry:
CERTIFICATE                EXPIRES                  RESIDUAL TIME   CERTIFICATE AUTHORITY   EXTERNALLY MANAGED
admin.conf                 Nov 28, 2021 19:09 UTC   53m                                     no
apiserver                  Nov 28, 2021 19:09 UTC   53m             ca                      no
apiserver-etcd-client      Nov 28, 2021 19:09 UTC   53m             etcd-ca                 no
apiserver-kubelet-client   Nov 28, 2021 19:09 UTC   53m             ca                      no
controller-manager.conf    Nov 28, 2021 19:09 UTC   53m                                     no
etcd-healthcheck-client    Nov 28, 2021 19:09 UTC   53m             etcd-ca                 no
etcd-peer                  Nov 28, 2021 19:09 UTC   53m             etcd-ca                 no
etcd-server                Nov 28, 2021 19:09 UTC   53m             etcd-ca                 no
front-proxy-client         Nov 28, 2021 19:09 UTC   53m             front-proxy-ca          no
scheduler.conf             Nov 28, 2021 19:09 UTC   53m                                     no

CERTIFICATE AUTHORITY   EXPIRES                  RESIDUAL TIME   EXTERNALLY MANAGED
ca                      Nov 26, 2031 18:06 UTC   9y              no
etcd-ca                 Nov 26, 2031 18:06 UTC   9y              no
front-proxy-ca          Nov 26, 2031 18:06 UTC   9y              no
Run the following commands to check the expiration times for kubelet certificates:
sudo openssl x509 -in /var/lib/kubelet/pki/kubelet-client-current.pem -text | grep Validity -A2
sudo openssl x509 -in /var/lib/kubelet/pki/kubelet-server-current.pem -text | grep Validity -A2
The response for each command looks like the following output:
Validity
    Not Before: Sep 17 22:27:53 2021 GMT
    Not After : Sep 17 22:33:16 2022 GMT
If all control plane nodes were bootstrapped at the same time, their certificate expiry times are within minutes of each other. To confirm this, run the preceding commands on each control plane node.
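The Not After value in the openssl output can be turned into a number of days remaining, which is easier to compare across nodes. The following is a rough sketch under stated assumptions: the function name days_until_expiry is hypothetical, and it relies on GNU date (the -d option) to parse the timestamp format that openssl prints.

```shell
# Hypothetical helper: print whole days between "now" and a certificate's
# Not After timestamp. Negative output means the date has already passed.
# The result truncates toward zero, so less than one day prints 0.
# Requires GNU date for the -d option.
days_until_expiry() {
  local not_after now expiry
  not_after="$1"
  # Optional second argument: "now" as epoch seconds (useful for testing).
  now="${2:-$(date -u +%s)}"
  expiry="$(date -u -d "$not_after" +%s)" || return 1
  echo $(( (expiry - now) / 86400 ))
}
```

One possible way to feed it is with the -enddate option of openssl x509, which prints a line of the form notAfter=TIMESTAMP: days_until_expiry "$(sudo openssl x509 -enddate -noout -in /var/lib/kubelet/pki/kubelet-client-current.pem | cut -d= -f2)".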
Run the following command on the admin workstation to check the expiration time of the client certificate in the kubeconfig file of the cluster:
grep 'client-certificate-data' KUBECONFIG_PATH | \
    awk '{print $2}' | base64 -d | openssl x509 -text | grep Validity -A2
The response looks like this sample output:
Validity
    Not Before: Sep 17 22:27:53 2021 GMT
    Not After : Sep 17 22:33:16 2022 GMT
Run the following command to look up the certificate expiration for the cluster kubeconfig in the admin cluster:
kubectl get secret/CLUSTER_NAME-kubeconfig -n CLUSTER_NAMESPACE \
    --kubeconfig=ADMIN_KUBECONFIG -o jsonpath='{.data.value}' | \
    base64 --decode | grep client-certificate-data | awk '{print $2}' | \
    base64 -d | openssl x509 -text | grep Validity -A2
The response looks like this sample output:
Validity
    Not Before: Sep 17 22:27:53 2021 GMT
    Not After : Sep 17 22:33:16 2022 GMT
The kubeconfig certificate in the admin cluster and the certificate in the kubeconfig file on the admin workstation are the same. Therefore, the output for this command and the command from the previous step must match.
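Rather than comparing the Validity output by eye, you could compare the embedded client certificates directly. The following sketch assumes that both kubeconfig files are available locally (for example, after saving the decoded secret to a file) and that each contains a single client-certificate-data entry. The function names client_cert_digest and same_client_cert are hypothetical.

```shell
# Hypothetical helper: print the SHA-256 digest of the client certificate
# embedded in a kubeconfig file. Returns 1 if no certificate data is found.
client_cert_digest() {
  local data
  data="$(grep 'client-certificate-data' "$1" | awk '{print $2}' | head -n 1)"
  [ -n "$data" ] || return 1
  printf '%s' "$data" | base64 -d | sha256sum | awk '{print $1}'
}

# Returns 0 if both kubeconfig files embed the same client certificate,
# 1 if the certificates differ, and 2 if either file has no certificate data.
same_client_cert() {
  local a b
  a="$(client_cert_digest "$1")" || return 2
  b="$(client_cert_digest "$2")" || return 2
  [ "$a" = "$b" ]
}
```

A call such as same_client_cert KUBECONFIG_PATH secret_kubeconfig.conf would then report a match through its exit status, where secret_kubeconfig.conf is an illustrative name for the file you saved from the admin cluster secret.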
Renew certificates manually
To manually renew TLS certificates for a cluster, use the instructions in the following sections.
Renew certificates on each control plane node
Perform the following steps on each control plane node of the affected cluster:
Back up the /etc/kubernetes folder.
Run the following kubeadm command to renew all the certificates. The command renews the certificates using the existing Certificate Authorities (CAs) on the machine:
sudo kubeadm certs renew all
The command output is similar to the following example:
certificate embedded in the kubeconfig file for the admin to use and for kubeadm itself renewed
certificate for serving the Kubernetes API renewed
certificate the apiserver uses to access etcd renewed
certificate for the API server to connect to kubelet renewed
certificate embedded in the kubeconfig file for the controller manager to use renewed
certificate for liveness probes to healthcheck etcd renewed
certificate for etcd nodes to communicate with each other renewed
certificate for serving etcd renewed
certificate for the front proxy client renewed
certificate embedded in the kubeconfig file for the scheduler manager to use renewed
Verify that the certificates have a new expiry time by running the following command:
sudo kubeadm certs check-expiration
Restart containers with the following commands:
Not all control plane components support dynamic certificate reload, so this step restarts the following containers to pick up the renewed certificates: kube-apiserver, kube-scheduler, kube-controller-manager, and etcd.
Repeat the following steps for each of the four containers:
Find the container ID for each container:
sudo crictl ps | grep CONTAINER_NAME
Replace CONTAINER_NAME with the name of one of the following containers: kube-apiserver, kube-scheduler, kube-controller-manager, or etcd (not etcd-defrag).
The response is similar to the following output:
c331ade490cb6 28df10594cd92 26 hours ago Running kube-apiserver ...
The container ID is the value in the first column.
Stop each container:
sudo crictl stop CONTAINER_ID
Replace CONTAINER_ID with the container ID from the preceding step.
When the stopped container exits, kubelet creates a new container in its place and deletes the stopped one. If you encounter an error, such as context deadline exceeded (error code DeadlineExceeded), rerun the command.
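The find-and-stop steps for the four containers can be sketched as a loop. This is an illustration only, not the documented procedure: the CRICTL and RETRY_DELAY variables and both function names are hypothetical, and the sketch assumes that the --name filter of crictl ps accepts a regular expression and that --quiet prints only container IDs.

```shell
# Sketch only. CRICTL and RETRY_DELAY are hypothetical overrides, not
# settings that crictl or Google Distributed Cloud define.
CRICTL="${CRICTL:-sudo crictl}"

# Stop one container, rerunning the stop command if it fails
# (for example, with "context deadline exceeded").
stop_with_retry() {
  local id attempts i
  id="$1"
  attempts="${2:-3}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    if $CRICTL stop "$id"; then
      return 0
    fi
    echo "stop failed for $id (attempt $i of $attempts)" >&2
    i=$((i + 1))
    sleep "${RETRY_DELAY:-5}"
  done
  return 1
}

# Find and stop each of the four containers; kubelet then recreates them.
restart_control_plane_containers() {
  local name id
  for name in kube-apiserver kube-scheduler kube-controller-manager etcd; do
    # The anchored pattern keeps "etcd" from also matching "etcd-defrag".
    id="$($CRICTL ps --name "^${name}\$" --quiet | head -n 1)"
    if [ -z "$id" ]; then
      echo "no running container found for $name" >&2
      continue
    fi
    stop_with_retry "$id" || return 1
  done
}
```

Running restart_control_plane_containers on a control plane node would stop the containers one at a time. Stopping them manually, as described in the steps, gives you a chance to confirm that each container comes back before you move on.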
Verify that connectivity is restored
At this point, the kubeadm certificates should be renewed on all control plane nodes. If you're renewing expired certificates, perform the following step.
To verify the connection with the Kubernetes API server, run the following kubectl command on any control plane node:
kubectl get nodes --kubeconfig=/etc/kubernetes/admin.conf
The response should return the list of nodes for the cluster. If your certificates are properly renewed, no TLS or certificate errors are returned.
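Because the control plane containers were just restarted, the API server might need a short time before it answers. That delay is an assumption and isn't stated in this document. The following sketch polls the same kubectl get nodes command until it succeeds; the KUBECTL variable and the wait_for_api function are hypothetical names.

```shell
# Sketch only. KUBECTL is a hypothetical override for the kubectl binary.
KUBECTL="${KUBECTL:-kubectl}"

# Poll "kubectl get nodes" until it succeeds or the attempts run out.
# Arguments: kubeconfig path, number of attempts, delay in seconds.
wait_for_api() {
  local kubeconfig attempts delay i
  kubeconfig="$1"
  attempts="${2:-10}"
  delay="${3:-6}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    if $KUBECTL get nodes --kubeconfig="$kubeconfig" >/dev/null 2>&1; then
      echo "API server reachable after $i attempt(s)"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "API server still unreachable after $attempts attempts" >&2
  return 1
}
```

For example, wait_for_api /etc/kubernetes/admin.conf would try for about a minute with the default values. If it gives up, run the kubectl command directly to see the actual error message.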
Replace the cluster kubeconfig file
To replace the kubeconfig file for your cluster with one that has the renewed certificates, use the following steps:
To create the new kubeconfig file, run the following kubectl command on the admin workstation:
kubectl --kubeconfig="ADMIN_KUBECONFIG" get secret/CLUSTER_NAME-kubeconfig \
    -n "CLUSTER_NAMESPACE" -o jsonpath='{.data.value}' | base64 --decode > new_kubeconfig.conf
Replace the following:
ADMIN_KUBECONFIG: the path to the admin cluster kubeconfig file.
CLUSTER_NAME: the name of the cluster that you're renewing certificates for.
CLUSTER_NAMESPACE: the namespace of the cluster that you're renewing certificates for.
The new_kubeconfig.conf file contains the updated certificate data.
Verify that the new kubeconfig works by running any kubectl command, using the new credentials:
kubectl get nodes --kubeconfig new_kubeconfig.conf
Replace the contents of the old kubeconfig file saved in the cluster directory on the admin workstation with the contents of the new kubeconfig file, new_kubeconfig.conf. By default, the path to the cluster kubeconfig file is bmctl-workspace/CLUSTER_NAME/CLUSTER_NAME-kubeconfig.
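When you overwrite the old kubeconfig file, keeping a copy of it first makes the change easy to undo. The following is a minimal sketch: the function name replace_kubeconfig and the backup naming scheme are hypothetical, and the backup step is a precaution that this document doesn't require.

```shell
# Hypothetical helper: copy a new kubeconfig over an old one, keeping a
# timestamped backup of the old file next to it.
# Arguments: path to the new kubeconfig, path to the old kubeconfig.
replace_kubeconfig() {
  local new old backup
  new="$1"
  old="$2"
  # Refuse to overwrite with a missing or empty file.
  if [ ! -s "$new" ]; then
    echo "new kubeconfig $new is missing or empty" >&2
    return 1
  fi
  if [ -e "$old" ]; then
    backup="${old}.bak.$(date +%Y%m%d%H%M%S)"
    cp -p "$old" "$backup" || return 1
  fi
  # Restrict permissions because the file holds client credentials.
  cp "$new" "$old" && chmod 600 "$old"
}
```

With the default workspace layout, the call would look something like replace_kubeconfig new_kubeconfig.conf bmctl-workspace/CLUSTER_NAME/CLUSTER_NAME-kubeconfig.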
Verify kubelet certificates and restart etcd-defrag
To finish the process of renewing cluster certificates manually, perform the following steps for each control plane node:
Log in to the control plane node and verify the kubelet client and serving certificate expiry times. Kubelet certificates are rotated automatically as long as the control plane is reachable. The period for automatic renewal of kubelet certificates is shorter than the expiration period for control plane component certificates. Therefore, it's likely that the kubelet certificates have already been renewed. Run the following commands:
sudo openssl x509 -in /var/lib/kubelet/pki/kubelet-client-current.pem -text | grep Validity -A2
sudo openssl x509 -in /var/lib/kubelet/pki/kubelet-server-current.pem -text | grep Validity -A2
The output of either command looks something like the following example:
Validity
    Not Before: Nov 28 18:04:57 2022 GMT
    Not After : Nov 28 19:04:57 2023 GMT
Use the following command to restart the etcd-defrag container. The etcd-defrag container uses the apiserver-etcd client certificate to talk to etcd and must be restarted to pick up the updated certificates:
kubectl rollout restart daemonset etcd-defrag -n kube-system --kubeconfig KUBECONFIG_PATH
You've completed the manual steps to renew cluster certificates. Verify that all pods are running properly and that no TLS errors are reported for control plane containers.
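One rough way to spot pods that aren't running is to filter the pod list by its STATUS column. The following sketch uses a hypothetical function name, unhealthy_pods, and assumes the column order that kubectl get pods -A --no-headers prints (NAMESPACE, NAME, READY, STATUS, RESTARTS, AGE). It doesn't catch pods that are Running but not ready.

```shell
# Hypothetical helper: read "kubectl get pods -A --no-headers" output on
# stdin and print namespace/name (status) for each pod whose STATUS column
# is neither Running nor Completed.
unhealthy_pods() {
  awk '$4 != "Running" && $4 != "Completed" { print $1 "/" $2 " (" $4 ")" }'
}
```

For example: kubectl get pods -A --no-headers --kubeconfig KUBECONFIG_PATH | unhealthy_pods. Empty output suggests that every pod reports Running or Completed.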