This document describes how a cluster behaves if vCenter Server is down.
While vCenter Server is down:
The machines are in the Available state.
The nodes are in the Ready state.
The Pods are in the Running state.
Errors are expected in Pods that connect to vCenter Server; for
example, the vsphere-controller-manager and cluster-health-controller
Pods.
Stateless Pods can be created and deleted.
Creating a stateful Pod fails, because attaching a disk requires
access to vCenter Server. These Pods stay in the Pending state.
The gkectl diagnose command will fail with an error similar to the
following:
Exit with error:
failed to prepare diagnose parameters: failed to create vSphere client: Post "https://my-server": dial tcp 203.0.113.1:443: connect: connection timed out
Auto repair is not triggered. This is because the machine and node states do
not change on connection errors to vCenter Server.
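The gkectl diagnose error shown above is a TCP connection timeout to the vCenter Server address. As a rough illustration, the following Python sketch probes that kind of connectivity before you run a command that needs vCenter Server. The function name vcenter_reachable, the default port 443, and the 5-second timeout are assumptions made for this example; they are not part of gkectl or of any vSphere tooling.

```python
import socket


def vcenter_reachable(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper: it only checks that the TCP handshake completes.
    It does not validate TLS certificates or vCenter Server credentials.
    """
    try:
        # create_connection resolves the host name and performs the handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and name-resolution failures.
        return False
```

A False result is consistent with the timeout in the error message, but a True result only shows that the port accepts connections, not that vCenter Server is fully operational.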
After vCenter Server comes back online (versions < 7.0U2)
The machines go to the Unavailable state, and auto repair or a manual
workaround is needed to restore the correct states.
The cluster functions correctly even though the machines are in the
Unavailable state.
After vCenter Server comes back online (versions >= 7.0U2)
No extra steps are needed, and the cluster is healthy again.
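One possible way to confirm the node and Pod states described in this document is to inspect the JSON that kubectl get nodes -o json and kubectl get pods --all-namespaces -o json produce. The Python sketch below assumes that output has already been parsed into dictionaries (for example with json.loads); the function names not_ready_nodes and pending_pods are hypothetical and exist only for this illustration.

```python
from typing import List


def not_ready_nodes(node_list: dict) -> List[str]:
    """Return the names of nodes whose Ready condition is not "True"."""
    names = []
    for node in node_list.get("items", []):
        conditions = node.get("status", {}).get("conditions", [])
        ready = any(
            c.get("type") == "Ready" and c.get("status") == "True"
            for c in conditions
        )
        if not ready:
            names.append(node["metadata"]["name"])
    return names


def pending_pods(pod_list: dict) -> List[str]:
    """Return namespace/name for every Pod whose phase is Pending."""
    return [
        f'{pod["metadata"]["namespace"]}/{pod["metadata"]["name"]}'
        for pod in pod_list.get("items", [])
        if pod.get("status", {}).get("phase") == "Pending"
    ]
```

If both functions return empty lists after vCenter Server is back online, no node is reported as not Ready and no Pod is stuck in Pending. This sketch does not inspect machine states, which are reported separately from node and Pod states.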