This document describes how a cluster behaves if vCenter Server is down.
While vCenter Server is down:
The machines are in the Available state.
The nodes are in the Ready state.
The Pods are in the Running state.
Pods that connect to vCenter Server, such as the vsphere-controller-manager
and cluster-health-controller Pods, report errors. These errors are
expected.
Stateless Pods can be created and deleted.
Creating a stateful Pod fails, because attaching a disk requires access to
vCenter Server. These Pods stay in the Pending state.
The gkectl diagnose command will fail with an error similar to the
following:
Exit with error:
failed to prepare diagnose parameters: failed to create vSphere client: Post "https://my-server": dial tcp 203.0.113.1:443: connect: connection timed out
Auto repair is not triggered. This is because the machine and node states do
not change when connections to vCenter Server fail.
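The connection timeout in the gkectl diagnose error above suggests a quick way to confirm that vCenter Server itself is unreachable before investigating the cluster. The following is a minimal sketch, not part of gkectl or any Google tooling: the function name vcenter_reachable, the default port 443 (taken from the error output), and the 5-second timeout are all assumptions made for illustration. It only tests whether a TCP connection can be opened; it does not verify that vCenter Server is healthy.

```python
import socket


def vcenter_reachable(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    Hypothetical helper: it checks network reachability only, not whether
    vCenter Server is actually serving API requests.
    """
    try:
        # create_connection resolves the host name and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and name-resolution failures.
        return False


# Example call, using the placeholder address from the error output above:
#   vcenter_reachable("my-server")
```

If the function returns False while the nodes are still Ready and the Pods are still Running, the symptoms are consistent with the outage behavior described in this list.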
After vCenter Server comes back online (versions < 7.0U2)
The machines go to the Unavailable state, and auto repair or a manual
workaround is needed to restore the correct states.
The cluster functions correctly even though the machines are in the
Unavailable state.
After vCenter Server comes back online (versions >= 7.0U2)
No extra steps are needed, and the cluster is healthy again.
Last updated 2025-03-05 UTC.