Sometimes, to remove a broken node for repair or replacement, you may have to
force its removal from the cluster. Force removal removes the broken node only
from the cluster management's perspective; it bypasses the cleanup jobs for the
components installed on the node itself. After the node recovers, run
bmctl reset nodes to clean up the installed components so that the node can be
reused.
Force-removing nodes
The following methods apply to both control plane nodes and worker nodes. For
control plane nodes, controllers in Google Distributed Cloud also take care of the
bookkeeping of etcd memberships.
Using bmctl
You can use bmctl to remove the node from the cluster. Normally, bmctl reset
triggers a reset job that tries to clean up the installed components on the node.
To remove the node from the cluster without being blocked on cleaning up
installed packages, run the bmctl command with the --force flag:
bmctl reset nodes --addresses NODE_IP --force --kubeconfig ADMIN_KUBECONFIG --cluster CLUSTER_NAME
Replace the following:
NODE_IP: the IP address of the node to reset, such as 10.200.0.8.
ADMIN_KUBECONFIG: the path to the admin cluster kubeconfig file.
CLUSTER_NAME: the name of the target cluster that contains the nodes.
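As a minimal sketch, the placeholders can be kept in shell variables so that the command is assembled in one place and reviewed before it is run. The variable values and the force_remove_cmd helper are hypothetical and not part of bmctl; the sketch only prints the command (a dry run) and does not execute it.

```shell
#!/usr/bin/env bash
# Dry-run sketch: prints the force-removal command instead of running it.
# All values below are hypothetical placeholders; replace them with your own.
NODE_IP="10.200.0.8"
ADMIN_KUBECONFIG="/path/to/admin-kubeconfig"
CLUSTER_NAME="example-cluster"

# force_remove_cmd is an illustrative helper, not a bmctl feature.
# $1: node IP address, $2: admin kubeconfig path, $3: cluster name
force_remove_cmd() {
  printf 'bmctl reset nodes --addresses %s --force --kubeconfig %s --cluster %s\n' \
    "$1" "$2" "$3"
}

force_remove_cmd "$NODE_IP" "$ADMIN_KUBECONFIG" "$CLUSTER_NAME"
```

Once the printed command looks right, you can copy it into a terminal on the admin workstation and run it there.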
Using kubectl
In Google Distributed Cloud, you can add an annotation to mark a node for force
removal. After removing the node from the parent node pool, run the following
command to annotate the corresponding failing machine with the
baremetal.cluster.gke.io/force-remove annotation. The value of the annotation
itself doesn't matter:
kubectl --kubeconfig ADMIN_KUBECONFIG -n CLUSTER_NAMESPACE \
    annotate machine 10.200.0.8 baremetal.cluster.gke.io/force-remove=true
After the annotation is applied, Google Distributed Cloud removes the node from the cluster.
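The annotation command can be parameterized in the same way. The following is a rough sketch that assumes the machine resource is named after the node's IP address, as in the example command; the variable values and the annotate_cmd helper are hypothetical. It prints the command rather than running it.

```shell
#!/usr/bin/env bash
# Dry-run sketch: prints the annotate command instead of running it.
# All values below are hypothetical placeholders; replace them with your own.
ADMIN_KUBECONFIG="/path/to/admin-kubeconfig"
CLUSTER_NAMESPACE="example-cluster-namespace"
MACHINE_NAME="10.200.0.8"  # assumed to match the node IP, as in the example

# annotate_cmd is an illustrative helper, not a kubectl feature.
# $1: admin kubeconfig path, $2: cluster namespace, $3: machine name
annotate_cmd() {
  printf 'kubectl --kubeconfig %s -n %s annotate machine %s %s\n' \
    "$1" "$2" "$3" "baremetal.cluster.gke.io/force-remove=true"
}

annotate_cmd "$ADMIN_KUBECONFIG" "$CLUSTER_NAMESPACE" "$MACHINE_NAME"
```

Because the annotation value does not matter, the sketch keeps the value true from the example command.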