This document describes how to replace a failed etcd replica in a high-availability
(HA) user cluster in Google Distributed Cloud.
The instructions given here apply to an HA user cluster that uses kubeception;
that is, a user cluster that does not have Controlplane V2 enabled. If you need
to replace an etcd replica in a user cluster that has Controlplane V2 enabled,
contact Cloud Customer Care.
Before you begin
Make sure the admin cluster is working correctly.

Make sure the other two etcd members in the user cluster are working correctly. If more than one etcd member has failed, see Recovery from etcd data corruption or loss.
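Before starting, it can help to confirm that exactly one replica is down. The following is a minimal sketch, assuming the default `kubectl get pods` column layout; the pod names shown in the comment are illustrative:

```shell
# Count kube-etcd pods whose STATUS column is not "Running".
# Expects `kubectl get pods` output on stdin, with lines such as:
#   kube-etcd-1   0/1   CrashLoopBackOff   12   3h
count_failed_etcd() {
  awk '$1 ~ /^kube-etcd-/ && $3 != "Running" { n++ } END { print n + 0 }'
}

# Illustrative invocation (requires cluster access):
#   kubectl --kubeconfig ADMIN_CLUSTER_KUBECONFIG \
#     -n USER_CLUSTER_NAME get pods | count_failed_etcd
```

If the count is greater than 1, follow Recovery from etcd data corruption or loss instead of this procedure.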
Replacing a failed etcd replica
1. Back up a copy of the etcd PodDisruptionBudget (PDB) so you can restore it later.

       kubectl --kubeconfig ADMIN_CLUSTER_KUBECONFIG -n USER_CLUSTER_NAME get pdb kube-etcd-pdb -o yaml > PATH_TO_PDB_FILE

   Where:

   ADMIN_CLUSTER_KUBECONFIG is the path to the kubeconfig file for the admin cluster.

   USER_CLUSTER_NAME is the name of the user cluster that contains the failed etcd replica.

   PATH_TO_PDB_FILE is the path where you want to save the etcd PDB file, for instance /tmp/etcdpdb.yaml.

2. Delete the etcd PodDisruptionBudget (PDB).

       kubectl --kubeconfig ADMIN_CLUSTER_KUBECONFIG -n USER_CLUSTER_NAME delete pdb kube-etcd-pdb

3. Run the following command to open the kube-etcd StatefulSet in your text editor:

       kubectl --kubeconfig ADMIN_CLUSTER_KUBECONFIG -n USER_CLUSTER_NAME edit statefulset kube-etcd

   Change the value of the --initial-cluster-state flag to existing.

4. Drain the failed etcd replica node.

       kubectl --kubeconfig ADMIN_CLUSTER_KUBECONFIG drain NODE_NAME --ignore-daemonsets --delete-local-data

   Where NODE_NAME is the name of the failed etcd replica node.

5. Create a new shell in the container of one of the working kube-etcd pods.

       kubectl --kubeconfig ADMIN_CLUSTER_KUBECONFIG exec -it \
           KUBE_ETCD_POD --container kube-etcd --namespace USER_CLUSTER_NAME \
           -- bin/sh

   Where KUBE_ETCD_POD is the name of the working kube-etcd pod. For example, kube-etcd-0.

   From this new shell, remove the failed etcd replica node from the etcd cluster.

   First, list all the members of the etcd cluster:

       etcdctl member list -w table

   The output shows all the member IDs. Determine the member ID of the failed replica.

   Next, remove the failed replica:

       export ETCDCTL_CACERT=/etcd.local.config/certificates/etcdCA.crt
       export ETCDCTL_CERT=/etcd.local.config/certificates/etcd.crt
       export ETCDCTL_KEY=/etcd.local.config/certificates/etcd.key
       export ETCDCTL_ENDPOINTS=https://127.0.0.1:2379
       etcdctl member remove MEMBER_ID

   Where MEMBER_ID is the hex member ID of the failed etcd replica pod.
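Copying the hex ID out of the `etcdctl member list -w table` output by hand is error-prone; a small parser can pick it out by member name. This is a sketch that assumes the standard table layout (columns ID, STATUS, NAME, PEER ADDRS):

```shell
# Print the hex member ID for a given member name, reading
# `etcdctl member list -w table` output on stdin.
member_id_of() {
  awk -F'|' -v name="$1" '
    { gsub(/ /, "", $2); gsub(/ /, "", $4) }  # strip padding in ID and NAME columns
    $4 == name { print $2 }
  '
}

# Illustrative invocation from inside the kube-etcd container:
#   etcdctl member list -w table | member_id_of kube-etcd-1
```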
Still from the same shell, add a new member with the same name and peer URL as the failed replica node.

    etcdctl member add MEMBER_NAME --peer-urls=https://MEMBER_NAME.kube-etcd:2380

Where MEMBER_NAME is the identifier of the failed kube-etcd replica node. For example, kube-etcd-1 or kube-etcd-2.
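After `member add`, the new member is registered but has not rejoined; in `etcdctl member list -w table` output it appears with status `unstarted` until the replica pod comes back with a clean data directory. A status-check sketch, assuming the same table layout as above:

```shell
# Print the STATUS column for a given member name from
# `etcdctl member list -w table` output on stdin.
member_status() {
  awk -F'|' -v name="$1" '
    { gsub(/ /, "", $3); gsub(/ /, "", $4) }  # strip padding in STATUS and NAME columns
    $4 == name { print $3 }
  '
}

# Illustrative invocation from inside the kube-etcd container:
#   etcdctl member list -w table | member_status kube-etcd-1
```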
Follow steps 1-3 of Deploying the utility Pods to create a utility Pod in the admin cluster. This Pod is used to access the PersistentVolume (PV) of the failed etcd member in the user cluster.
Clean up the etcd data directory from within the utility Pod.

    kubectl --kubeconfig ADMIN_CLUSTER_KUBECONFIG exec -it -n USER_CLUSTER_NAME etcd-utility-MEMBER_NUMBER -- /bin/bash -c 'rm -rf /var/lib/etcd/*'

Delete the utility Pod.

    kubectl --kubeconfig ADMIN_CLUSTER_KUBECONFIG delete pod -n USER_CLUSTER_NAME etcd-utility-MEMBER_NUMBER

Uncordon the failed node.

    kubectl --kubeconfig ADMIN_CLUSTER_KUBECONFIG uncordon NODE_NAME

Open the kube-etcd StatefulSet in your text editor.

    kubectl --kubeconfig ADMIN_CLUSTER_KUBECONFIG -n USER_CLUSTER_NAME edit statefulset kube-etcd

Change the value of the --initial-cluster-state flag to new.

Restore the etcd PDB which was deleted earlier.

    kubectl --kubeconfig ADMIN_CLUSTER_KUBECONFIG apply -f PATH_TO_PDB_FILE
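Once the replaced replica has rejoined and the PDB is restored, all three members should report `started`. A final sanity check, again assuming the `etcdctl member list -w table` layout:

```shell
# Count members whose STATUS column is "started" in
# `etcdctl member list -w table` output read from stdin.
started_members() {
  awk -F'|' '{ gsub(/ /, "", $3) } $3 == "started" { n++ } END { print n + 0 }'
}

# Illustrative invocation from inside a kube-etcd container;
# for a healthy three-replica HA cluster the result should be 3:
#   etcdctl member list -w table | started_members
```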
Last updated 2025-05-01 UTC.