Configure node pool update policy

Node pools are updated in parallel by default. This can cause two issues:

  • Quota issue

    Each node pool creates one surge node during an update, so N node pools updated in parallel create N surge nodes. This can exhaust resources if there is little spare capacity for creating those surge nodes.

  • PDB deadlock issue

    Draining more than one node at a time can cause PodDisruptionBudget (PDB) deadlocks.

For N node pools to be updated in parallel, there must be N extra IP addresses available for the surge nodes. If your worker nodes get their IP addresses from a DHCP server, then your DHCP server must be able to provide N extra IP addresses. If your worker nodes use static IP addresses, then your IP block file must contain N extra IP addresses in addition to those needed for the worker nodes.

If there aren't enough extra IP addresses available to update all N node pools in parallel, then we update as many node pools as possible in parallel. As IP addresses become available, we update the remaining node pools.
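The IP address requirement above comes down to simple arithmetic. The following is a minimal sketch, assuming one surge node (and therefore one extra IP address) per node pool being updated, as described above. The function name and parameters are hypothetical and are not part of any product tooling.

```python
def pools_updatable_in_parallel(node_pools: int, spare_ips: int) -> int:
    """Estimate how many node pools can start updating at the same time.

    Assumption (from the text above): each node pool being updated needs
    exactly one surge node, and each surge node needs one extra IP address.
    """
    if node_pools < 0 or spare_ips < 0:
        raise ValueError("counts must be non-negative")
    # The number of spare IP addresses caps the number of surge nodes,
    # which in turn caps the number of pools updated in parallel.
    return min(node_pools, spare_ips)


def pools_left_waiting(node_pools: int, spare_ips: int) -> int:
    """Node pools that must wait until an IP address becomes available."""
    return node_pools - pools_updatable_in_parallel(node_pools, spare_ips)
```

For example, with 5 node pools and 3 spare IP addresses, `pools_updatable_in_parallel(5, 3)` returns 3 and `pools_left_waiting(5, 3)` returns 2.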

This document shows how to specify a node pool update policy that sets the maximum number of node pools to be updated concurrently during a node pool update, which can help avoid the two issues described above. A value of 0, which is the default, preserves the current parallel behavior. A value of 1 updates the node pools sequentially.

Note that although sequential updates avoid the two issues, they can make the node pool update process take longer than updating in parallel.

Configure node pool update policy

In the user cluster seed config file, user-cluster.yaml, you can configure node pool updates to be done sequentially as follows:

  maximumConcurrentNodePoolUpdate: 1

The maximumConcurrentNodePoolUpdate field can be set to an arbitrary integer value to specify how many node pools you want to update at the same time.
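To make the meaning of the field values concrete, here is a small illustrative sketch of how a setting might translate into the number of node pools updated at once. It only models the semantics described in this document (0 keeps the parallel behavior, 1 is sequential, other values cap concurrency). The function name is hypothetical, and rejecting negative values is an assumption, not documented behavior.

```python
def effective_concurrency(maximum_concurrent_node_pool_update: int,
                          node_pools: int) -> int:
    """Return how many node pools would be updated at the same time.

    Models the documented meaning of maximumConcurrentNodePoolUpdate:
      0 -> default parallel behavior (all node pools at once)
      1 -> sequential (one node pool at a time)
      k -> at most k node pools at a time
    """
    if node_pools < 0:
        raise ValueError("node_pools must be non-negative")
    if maximum_concurrent_node_pool_update < 0:
        # Assumption: this document does not say how negative values
        # are handled, so this sketch simply rejects them.
        raise ValueError("setting must be non-negative in this sketch")
    if maximum_concurrent_node_pool_update == 0:
        return node_pools
    # A limit larger than the number of node pools has no extra effect.
    return min(maximum_concurrent_node_pool_update, node_pools)
```

With 4 node pools, a setting of 0 gives a concurrency of 4, a setting of 1 gives 1, and a setting of 2 gives 2. The concurrency actually achieved can still be lower if too few extra IP addresses are available for surge nodes.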

The node pool update policy applies to both node pool updates and upgrades, but not to node pool creation. Also, if a node pool update or upgrade encounters an issue and gets stuck, the current behavior is to block at that node pool and not move on to the next one.