You can set the bounds for Dataproc on GKE node pool autoscaling when you
create a Dataproc on GKE virtual cluster. If you do not specify bounds,
Dataproc on GKE node pools are autoscaled with default values (at the
Dataproc on GKE GA release, the defaults were minimum = 1 and maximum = 10;
these values are subject to change). To guarantee specific minimum and
maximum node pool autoscaling values, set them explicitly when you create
your Dataproc on GKE virtual cluster.
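As a minimal sketch of setting bounds at creation time, the following script prints a gcloud dataproc clusters gke create command whose --pools value carries min and max keys. All resource names and numbers are hypothetical placeholders, and the flag and key names are assumptions based on the gcloud reference; check them against your gcloud version before running the command.

```shell
# Hypothetical placeholder values; replace them with your own.
DP_CLUSTER="my-virtual-cluster"
REGION="us-central1"
GKE_CLUSTER="my-gke-cluster"
BUCKET="my-staging-bucket"
POOL_NAME="dp-default-pool"
MIN_NODES=2
MAX_NODES=20

# DRY_RUN is unset by default, so the command is only echoed for review.
# Set DRY_RUN="" in the environment to execute it.
# Assumption: the min and max keys of --pools set the autoscaling bounds.
${DRY_RUN-echo} gcloud dataproc clusters gke create "${DP_CLUSTER}" \
    --region="${REGION}" \
    --gke-cluster="${GKE_CLUSTER}" \
    --spark-engine-version=latest \
    --staging-bucket="${BUCKET}" \
    --pools="name=${POOL_NAME},roles=default,min=${MIN_NODES},max=${MAX_NODES}"
```

Printing the command first makes it easy to confirm the bounds before any cluster is created.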
Update node pool autoscaling
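Use the GKE gcloud container node-pools update command to change the
autoscaling configuration of a Dataproc on GKE node pool. The command takes
the node pool name, the GKE cluster that contains it, and the new minimum
and maximum node counts.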
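A minimal sketch of such an update is shown here. The node pool name, cluster name, region, and node counts are hypothetical placeholders; the script prints the command by default so that you can review it before running it.

```shell
# Hypothetical placeholder values; replace them with your own.
NODE_POOL="dp-executor-pool"
GKE_CLUSTER="my-gke-cluster"
REGION="us-central1"
MIN_NODES=1
MAX_NODES=10

# DRY_RUN is unset by default, so the command is only echoed for review.
# Set DRY_RUN="" in the environment to execute it.
${DRY_RUN-echo} gcloud container node-pools update "${NODE_POOL}" \
    --cluster="${GKE_CLUSTER}" \
    --region="${REGION}" \
    --enable-autoscaling \
    --min-nodes="${MIN_NODES}" \
    --max-nodes="${MAX_NODES}"
```

For a zonal GKE cluster, --zone would take the place of --region.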
When a job is submitted, the driver pod is scheduled to run on the node pool
associated with the Spark driver role.
The driver pod calls the GKE scheduler to create
executor pods.
Executor pods are scheduled on the node pool associated with the
Spark executor role.
If the node pools have capacity for the pods, the pods start running immediately.
If capacity is insufficient, the GKE cluster autoscaler scales up
the node pool to provide the requested resources, up to the user-specified
maximum. When node pools have excess capacity, the GKE cluster autoscaler
scales down the node pool, but not below its user-specified minimum.
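The effect of the bounds can be illustrated with a simplified model. The clamp_nodes helper below is hypothetical and is not the GKE cluster autoscaler's actual algorithm; it only shows that the node count the autoscaler settles on stays between the configured minimum and maximum.

```shell
# Hypothetical helper: clamp a desired node count to the range [min, max].
# Usage: clamp_nodes DESIRED MIN MAX
clamp_nodes() {
  desired=$1
  min=$2
  max=$3
  if [ "${desired}" -lt "${min}" ]; then
    echo "${min}"       # scale-down stops at the minimum
  elif [ "${desired}" -gt "${max}" ]; then
    echo "${max}"       # scale-up is capped at the maximum
  else
    echo "${desired}"   # within bounds, unchanged
  fi
}

# Example with the GA-release defaults (minimum = 1, maximum = 10).
clamp_nodes 14 1 10   # pods need 14 nodes: prints 10
clamp_nodes 0 1 10    # no pods running: prints 1
clamp_nodes 4 1 10    # pods need 4 nodes: prints 4
```

In the first case, pods that do not fit on the 10 available nodes stay pending until capacity frees up or the maximum is raised.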
Last updated 2025-04-02 UTC.