# Create a Dataproc partial cluster
To mitigate the effects of the unavailability of user-specified VMs in specific
regions at specific times
(stockouts),
Dataproc lets you request the creation of a `partial cluster` by specifying
a **minimum number** of primary workers that is acceptable to allow cluster creation.

Note: See Dataproc secondary workers to understand the difference between
primary and secondary workers.
| Standard cluster | Partial cluster |
|---|---|
| If one or more primary workers can't be created and initialized, cluster creation fails. Workers that are created continue to run and incur charges until deleted by the user. | If the specified minimum number of workers can be created, the cluster is created. Failed (uninitialized) workers are deleted and don't incur charges. If the specified minimum number of workers can't be created and initialized, the cluster is not created. Workers that are created aren't deleted, to allow for debugging. |
| Cluster creation time is optimized. | Longer cluster creation time can occur since all nodes must report provisioning status. |
### Autoscaling

Use autoscaling
with partial cluster creation to make sure that the target (full) number
of primary workers is created. Autoscaling tries to acquire failed workers
in the background if the workload requires them.
The following is a sample autoscaling policy that retries until the total number
of primary worker instances reaches a target size of 10.
The policy's minInstances and maxInstances match the minimum and total
number of primary workers specified at cluster creation time (see
Create a partial cluster).
Setting the `scaleDownFactor` to 0 prevents the cluster from scaling down
from 10 to 8 workers, and helps keep the number of workers at the 10-worker
target.
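The sample policy, in autoscaling policy YAML form:

```
workerConfig:
  minInstances: 8
  maxInstances: 10
basicAlgorithm:
  cooldownPeriod: 2m
  yarnConfig:
    scaleUpFactor: 1
    scaleDownFactor: 0
    gracefulDecommissionTimeout: 1h
```

Create a partial cluster
------------------------

You can use the Google Cloud CLI or the Dataproc API to
create a Dataproc partial cluster. Dataproc partial cluster creation
is not available in the Google Cloud console.

### gcloud

To create a Dataproc partial cluster on the command line, run the
following `gcloud dataproc clusters create`
command locally in a terminal window or in Cloud Shell:

```
gcloud dataproc clusters create CLUSTER_NAME \
    --project=PROJECT \
    --region=REGION \
    --num-workers=NUM_WORKERS \
    --min-num-workers=MIN_NUM_WORKERS \
    other args ...
```

Replace the following: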
- CLUSTER_NAME: The cluster name must start with a lowercase letter
  followed by up to 51 lowercase letters, numbers, and hyphens, and can't end with a hyphen.
- PROJECT: Specify the project associated with the job cluster.
- REGION: Specify the Compute Engine region where the job cluster will be located.
- NUM_WORKERS: The total number of primary workers in the cluster to
  create if available.
- MIN_NUM_WORKERS: The minimum number of primary workers to create
  if the specified total number of workers (`NUM_WORKERS`) can't be created.
  Cluster creation fails if this minimum number of primary workers can't be created
  (workers that are created aren't deleted, to allow for debugging).
  If this flag is omitted, standard cluster creation with the total number of
  primary workers (`NUM_WORKERS`) is attempted.
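### REST

To create a Dataproc partial cluster with the REST API, specify the minimum
number of primary workers in the `workerConfig.minNumInstances` field as part
of a `clusters.create` request. As a minimal sketch (the cluster name and the
10/8 worker counts are illustrative, and other `clusters.create` fields are
omitted), the request body might look like:

```
{
  "clusterName": "my-partial-cluster",
  "config": {
    "workerConfig": {
      "numInstances": 10,
      "minNumInstances": 8
    }
  }
}
```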
Display the number of provisioned workers
-----------------------------------------

After creating a cluster, you can run the following gcloud CLI
command to list the number of workers, including any secondary workers,
provisioned in your cluster.
```
gcloud dataproc clusters list \
    --project=PROJECT \
    --region=REGION \
    --filter=clusterName=CLUSTER_NAME
```
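If you want the counts programmatically, one approach (a sketch, assuming you
capture the output of `gcloud dataproc clusters describe CLUSTER_NAME
--region=REGION --format=json`; the sample JSON below is illustrative) is to
parse the cluster's `workerConfig` and `secondaryWorkerConfig` fields:

```python
import json

# Illustrative, abbreviated output of:
#   gcloud dataproc clusters describe CLUSTER_NAME --region=REGION --format=json
describe_output = """
{
  "clusterName": "my-partial-cluster",
  "config": {
    "workerConfig": {"numInstances": 9},
    "secondaryWorkerConfig": {"numInstances": 2}
  }
}
"""

cluster = json.loads(describe_output)
config = cluster["config"]

# Count primary and secondary workers; either group may be absent from the output.
primary = config.get("workerConfig", {}).get("numInstances", 0)
secondary = config.get("secondaryWorkerConfig", {}).get("numInstances", 0)

print(f"primary={primary} secondary={secondary} total={primary + secondary}")
```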
Last updated 2025-08-28 UTC.