This document describes how to replace a failed etcd replica in a high availability (HA) user cluster for Google Distributed Cloud.

The instructions given here apply to an HA user cluster that uses [kubeception](/kubernetes-engine/distributed-cloud/vmware/docs/how-to/create-user-cluster); that is, a user cluster that does not have [Controlplane V2](/kubernetes-engine/distributed-cloud/vmware/docs/how-to/user-cluster-configuration-file-latest#enablecontrolplanev2-field) enabled. If you need to replace an etcd replica in a user cluster that has Controlplane V2 enabled, [contact Cloud Customer Care](/kubernetes-engine/distributed-cloud/vmware/docs/getting-support).
Before you begin

- Make sure the admin cluster is working correctly.

- Make sure the other two etcd members in the user cluster are working correctly. If more than one etcd member has failed, see [Recovery from etcd data corruption or loss](/kubernetes-engine/distributed-cloud/vmware/docs/concepts/high-availability-disaster-recovery#recovery_from_etcd_data_corruption_or_loss).
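The check on the remaining etcd members can be scripted before you start. The sketch below counts kube-etcd Pods that are not `Running`; the `kubectl get pods` output, Pod names, and statuses shown are invented sample data, not output from a real cluster:

```shell
# Invented sample output of a command such as:
#   kubectl --kubeconfig ADMIN_CLUSTER_KUBECONFIG -n USER_CLUSTER_NAME get pods
# filtered to the kube-etcd Pods (names and statuses are illustrative assumptions).
pods='NAME          READY   STATUS             RESTARTS   AGE
kube-etcd-0   1/1     Running            0          12d
kube-etcd-1   0/1     CrashLoopBackOff   57         12d
kube-etcd-2   1/1     Running            0          12d'

# Count replicas that are not Running; this procedure applies only when
# exactly one of the three etcd members has failed.
failed=$(echo "$pods" | awk 'NR > 1 && $3 != "Running" { n++ } END { print n+0 }')

if [ "$failed" -eq 1 ]; then
  echo "exactly one failed member: proceed with the replacement"
else
  echo "expected exactly one failed member, found $failed" >&2
fi
```

If more than one member is down, stop here and follow the data corruption or loss recovery procedure instead.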
Replacing a failed etcd replica
1. Back up a copy of the etcd PodDisruptionBudget (PDB) so you can restore it later.

   ```
   kubectl --kubeconfig ADMIN_CLUSTER_KUBECONFIG -n USER_CLUSTER_NAME get pdb kube-etcd-pdb -o yaml > PATH_TO_PDB_FILE
   ```

   Where:

   - ADMIN_CLUSTER_KUBECONFIG is the path to the kubeconfig file for the admin cluster.

   - USER_CLUSTER_NAME is the name of the user cluster that contains the failed etcd replica.

   - PATH_TO_PDB_FILE is the path where you want to save the etcd PDB file, for instance `/tmp/etcdpdb.yaml`.

2. Delete the etcd PodDisruptionBudget (PDB).

   ```
   kubectl --kubeconfig ADMIN_CLUSTER_KUBECONFIG -n USER_CLUSTER_NAME delete pdb kube-etcd-pdb
   ```

3. Run the following command to open the kube-etcd [StatefulSet](https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/) in your text editor:

   ```
   kubectl --kubeconfig ADMIN_CLUSTER_KUBECONFIG -n USER_CLUSTER_NAME edit statefulset kube-etcd
   ```

   Change the value of the `--initial-cluster-state` flag to `existing`.

   ```
   containers:
   - name: kube-etcd
     ...
     args:
     - --initial-cluster-state=existing
     ...
   ```

4. Drain the failed etcd replica node.

   ```
   kubectl --kubeconfig ADMIN_CLUSTER_KUBECONFIG drain NODE_NAME --ignore-daemonsets --delete-local-data
   ```

   Where NODE_NAME is the name of the failed etcd replica node.

5. Create a new shell in the container of one of the working kube-etcd Pods.

   ```
   kubectl --kubeconfig ADMIN_CLUSTER_KUBECONFIG exec -it \
       KUBE_ETCD_POD --container kube-etcd --namespace USER_CLUSTER_NAME \
       -- bin/sh
   ```

   Where KUBE_ETCD_POD is the name of the working kube-etcd Pod. For example, `kube-etcd-0`.

   From this new shell, run the following commands:

   1. Remove the failed etcd replica node from the etcd cluster.

      First, list all the members of the etcd cluster:

      ```
      etcdctl member list -w table
      ```

      The output shows all the member IDs. Determine the member ID of the failed replica.

      Next, remove the failed replica:

      ```
      export ETCDCTL_CACERT=/etcd.local.config/certificates/etcdCA.crt
      export ETCDCTL_CERT=/etcd.local.config/certificates/etcd.crt
      export ETCDCTL_KEY=/etcd.local.config/certificates/etcd.key
      export ETCDCTL_ENDPOINTS=https://127.0.0.1:2379
      etcdctl member remove MEMBER_ID
      ```
      Where MEMBER_ID is the hex member ID of the failed etcd replica Pod.
   2. Add a new member with the same name and peer URL as the failed replica node.

      ```
      etcdctl member add MEMBER_NAME --peer-urls=https://MEMBER_NAME.kube-etcd:2380
      ```

      Where MEMBER_NAME is the identifier of the failed kube-etcd replica node. For example, `kube-etcd-1` or `kube-etcd-2`.
6. Follow steps 1-3 of [Deploying the utility Pods](/kubernetes-engine/distributed-cloud/vmware/docs/how-to/backing-up#deploy_utility_pods) to create a utility Pod in the admin cluster. This Pod is used to access the PersistentVolume (PV) of the failed etcd member in the user cluster.
7. Clean up the etcd data directory from within the utility Pod.

   ```
   kubectl --kubeconfig ADMIN_CLUSTER_KUBECONFIG exec -it -n USER_CLUSTER_NAME etcd-utility-MEMBER_NUMBER -- /bin/bash -c 'rm -rf /var/lib/etcd/*'
   ```

8. Delete the utility Pod.

   ```
   kubectl --kubeconfig ADMIN_CLUSTER_KUBECONFIG delete pod -n USER_CLUSTER_NAME etcd-utility-MEMBER_NUMBER
   ```

9. Uncordon the failed node.

   ```
   kubectl --kubeconfig ADMIN_CLUSTER_KUBECONFIG uncordon NODE_NAME
   ```

10. Open the kube-etcd StatefulSet in your text editor.

    ```
    kubectl --kubeconfig ADMIN_CLUSTER_KUBECONFIG -n USER_CLUSTER_NAME edit statefulset kube-etcd
    ```

    Change the value of the `--initial-cluster-state` flag to `new`.

    ```
    containers:
    - name: kube-etcd
      ...
      args:
      - --initial-cluster-state=new
      ...
    ```

11. Restore the etcd PDB which was deleted in step 1.

    ```
    kubectl --kubeconfig ADMIN_CLUSTER_KUBECONFIG apply -f PATH_TO_PDB_FILE
    ```
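In step 5 above, the hex member ID has to be read out of the `etcdctl member list -w table` output by hand. As a minimal convenience sketch, the ID for a given member name can also be extracted from that table with `awk`; the table below is invented sample output, and the IDs in it are illustrative, not real:

```shell
# Invented sample output of `etcdctl member list -w table` (IDs are illustrative).
table='+------------------+---------+-------------+------------------------------------+
|        ID        | STATUS  |    NAME     |             PEER ADDRS             |
+------------------+---------+-------------+------------------------------------+
| 8e9e05c52164694d | started | kube-etcd-0 | https://kube-etcd-0.kube-etcd:2380 |
| 91bc3c398fb3c146 | started | kube-etcd-1 | https://kube-etcd-1.kube-etcd:2380 |
| fd422379fda50e48 | started | kube-etcd-2 | https://kube-etcd-2.kube-etcd:2380 |
+------------------+---------+-------------+------------------------------------+'

# Print the hex member ID whose NAME column matches the first argument.
member_id_for() {
  printf '%s\n' "$table" | awk -F'|' -v name="$1" '
    { gsub(/ /, "", $2); gsub(/ /, "", $4) }   # strip cell padding from ID and NAME
    $4 == name { print $2 }'
}

MEMBER_ID=$(member_id_for kube-etcd-1)
echo "$MEMBER_ID"   # prints 91bc3c398fb3c146
```

This only automates reading the table; the authoritative value is whatever `etcdctl member list` reports in your own cluster.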
Last updated: 2025-05-01 (UTC).