Dataproc Quotas

This page lists Dataproc API quota limits, which are enforced at the project and region level. The quotas reset every sixty seconds (one minute).

The following table lists the specific and default per-project Dataproc API quota types, quota limits, and the methods to which they apply.

Quota Type | Limit | Applicable API Methods
AutoscalingOperationRequestsPerMinutePerProjectPerRegion | 400 | CreateAutoscalingPolicy, GetAutoscalingPolicy, ListAutoscalingPolicies, UpdateAutoscalingPolicy, DeleteAutoscalingPolicy
ClusterOperationRequestsPerMinutePerProjectPerRegion | 200 | CreateCluster, DeleteCluster, UpdateCluster, StopCluster, StartCluster, DiagnoseCluster, RepairCluster
NodeGroupOperationRequestsPerMinutePerProjectPerRegion | 600 | CreateNodeGroup, DeleteNodeGroup, ResizeNodeGroup, RepairNodeGroup, UpdateLabelsNodeGroup, StartNodeGroup, StopNodeGroup
GetJobRequestsPerMinutePerProjectPerRegion | 7500 | GetJob
JobOperationRequestsPerMinutePerProjectPerRegion | 400 | SubmitJob, UpdateJob, CancelJob, DeleteJob
WorkflowOperationRequestsPerMinutePerProjectPerRegion | 400 | CreateWorkflowTemplate, InstantiateWorkflowTemplate, InstantiateInlineWorkflowTemplate, UpdateWorkflowTemplate, DeleteWorkflowTemplate
DefaultRequestsPerMinutePerProjectPerRegion | 7500 | All other operations (primarily Get operations)
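
The per-minute limits above can be tracked on the client side before requests are sent. The following Python sketch is illustrative only: it maps a few of the listed methods to their quota types and counts calls in local 60-second windows. The names quota_for and MinuteBudget are hypothetical and are not part of any Dataproc client library. The local window boundaries are an assumption and may not line up with the service's own reset times.

```python
import time

# Per-minute limits copied from the table above (per project, per region).
QUOTA_LIMITS = {
    "AutoscalingOperationRequestsPerMinutePerProjectPerRegion": 400,
    "ClusterOperationRequestsPerMinutePerProjectPerRegion": 200,
    "NodeGroupOperationRequestsPerMinutePerProjectPerRegion": 600,
    "GetJobRequestsPerMinutePerProjectPerRegion": 7500,
    "JobOperationRequestsPerMinutePerProjectPerRegion": 400,
    "WorkflowOperationRequestsPerMinutePerProjectPerRegion": 400,
    "DefaultRequestsPerMinutePerProjectPerRegion": 7500,
}

DEFAULT_QUOTA = "DefaultRequestsPerMinutePerProjectPerRegion"

# A subset of the methods in the table; unlisted methods use the default quota.
METHOD_QUOTA = {
    "CreateCluster": "ClusterOperationRequestsPerMinutePerProjectPerRegion",
    "DeleteCluster": "ClusterOperationRequestsPerMinutePerProjectPerRegion",
    "UpdateCluster": "ClusterOperationRequestsPerMinutePerProjectPerRegion",
    "GetJob": "GetJobRequestsPerMinutePerProjectPerRegion",
    "SubmitJob": "JobOperationRequestsPerMinutePerProjectPerRegion",
    "CancelJob": "JobOperationRequestsPerMinutePerProjectPerRegion",
}


def quota_for(method):
    """Return the quota type that a method counts against."""
    return METHOD_QUOTA.get(method, DEFAULT_QUOTA)


class MinuteBudget:
    """Hypothetical client-side counter using fixed 60-second windows."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._window = None
        self._counts = {}

    def try_acquire(self, method):
        """Record one call and return True, or return False if the
        local count for this minute has reached the quota limit."""
        window = int(self._clock() // 60)
        if window != self._window:
            # A new minute has started locally: reset all counters.
            self._window = window
            self._counts = {}
        quota = quota_for(method)
        used = self._counts.get(quota, 0)
        if used >= QUOTA_LIMITS[quota]:
            return False
        self._counts[quota] = used + 1
        return True
```

A caller would check budget.try_acquire("CreateCluster") before issuing the request and wait for the next window when it returns False. This only reduces the chance of hitting the limit; the service remains the authority on quota usage, especially when several clients share one project and region.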

The following table lists additional limits on total active operations and jobs at the project and region level.

Quota Type | Limit | Description
ActiveOperationsPerProjectPerRegion | 5000 | Limit on the total number of concurrent active operations of all types in a single project in a single regional database
ActiveJobsPerProjectPerRegion | 5000 | Limit on the total number of active jobs in NON_TERMINAL state in a single project in a single regional database
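
As a rough illustration of the active jobs limit, the following Python sketch estimates how many more jobs could be active before the limit of 5000 is reached. It assumes that DONE, ERROR, and CANCELLED are the terminal job states and that every other state counts as active; the function name active_job_headroom is hypothetical.

```python
ACTIVE_JOBS_LIMIT = 5000  # ActiveJobsPerProjectPerRegion, from the table above

# Assumption: these are the terminal job states; all others count as active.
TERMINAL_STATES = {"DONE", "ERROR", "CANCELLED"}


def active_job_headroom(job_states, limit=ACTIVE_JOBS_LIMIT):
    """Return how many more jobs can be active before the limit is reached."""
    active = sum(1 for state in job_states if state not in TERMINAL_STATES)
    return max(limit - active, 0)


# Two of these four jobs are still active (RUNNING and PENDING).
print(active_job_headroom(["RUNNING", "DONE", "PENDING", "ERROR"]))  # 4998
```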

Other Google Cloud quotas

Dataproc clusters utilize other Google Cloud products. These products have project-level quotas, which include quotas that apply to Dataproc use. Some services are required to use Dataproc, such as Compute Engine and Cloud Storage. Other services, such as BigQuery and Bigtable, can optionally be used with Dataproc.

Required services

The following services, which enforce quota limits, are required to create Dataproc clusters and are used automatically.

Compute Engine

Dataproc clusters use Compute Engine virtual machines. The Compute Engine quotas are split into regional and global limits, and these limits apply to the clusters you create. For example, a cluster with one n1-standard-4 master node and two n1-standard-4 worker nodes uses 12 virtual CPUs (4 * 3). This usage counts against the regional quota limit of 24 virtual CPUs.
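
The arithmetic in this example can be written as a short Python sketch. The function names cluster_vcpus and fits_regional_quota are hypothetical helpers, and the already_used parameter is an added assumption that stands for virtual CPUs consumed by other resources in the same region.

```python
def cluster_vcpus(master_nodes, worker_nodes, vcpus_per_node):
    """Total virtual CPUs for a cluster whose nodes share one machine type."""
    return (master_nodes + worker_nodes) * vcpus_per_node


def fits_regional_quota(requested, quota_limit, already_used=0):
    """Return True if the requested vCPUs fit within the regional quota."""
    return already_used + requested <= quota_limit


# One n1-standard-4 master and two n1-standard-4 workers, 4 vCPUs each.
needed = cluster_vcpus(master_nodes=1, worker_nodes=2, vcpus_per_node=4)
print(needed)                                            # 12
print(fits_regional_quota(needed, 24))                   # True
print(fits_regional_quota(needed, 24, already_used=16))  # False
```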

Default clusters

When you create a Dataproc cluster with default settings, the following Compute Engine resources are used.

Item | Resources used
Virtual CPUs | 12
Virtual Machine (VM) Instances | 3
Persistent disk | 1500 GB

Cloud Logging

Dataproc saves driver output and cluster logs in Cloud Logging. The Cloud Logging quota applies to your Dataproc clusters.

Optional services

The following services, which have quota limits, can optionally be used with Dataproc clusters.

BigQuery

When reading data from or writing data to BigQuery, the BigQuery quota applies.

Bigtable

When reading data from or writing data to Bigtable, the Bigtable quota applies.