Attach GPUs to Dataproc clusters
Dataproc lets you attach graphics processing units (GPUs) to the master and worker Compute Engine nodes in a Dataproc cluster. You can use these GPUs to accelerate specific workloads on your instances, such as machine learning and data processing.
For more information about what you can do with GPUs and what types of GPU hardware are available, read GPUs on Compute Engine.
Note: There are no additional Dataproc charges for GPUs used in Dataproc clusters; standard Compute Engine GPU charges apply.
Before you begin
- GPUs require special drivers and software. These items are not pre-installed on Dataproc clusters.
- Read about GPU pricing on Compute Engine to understand the cost of using GPUs in your instances.
- Read about restrictions for instances with GPUs to learn how these instances function differently from non-GPU instances.
- Check the quotas page for your project to ensure that you have sufficient GPU quota (NVIDIA_T4_GPUS, NVIDIA_P100_GPUS, or NVIDIA_V100_GPUS) available in your project. If GPUs are not listed on the quotas page or you require additional GPU quota, request a quota increase.
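Before requesting an increase, it helps to tally how many GPUs a planned cluster actually needs against the regional quota. This is a minimal sketch; the node counts, per-node GPU counts, and quota value below are hypothetical placeholders you would take from your own cluster plan and the quotas page.

```python
def gpus_needed(node_plan):
    """Total GPUs a cluster plan requires.

    node_plan maps a node role to (node_count, gpus_per_node).
    """
    return sum(nodes * gpus for nodes, gpus in node_plan.values())

# Hypothetical plan: 1 master, 4 primary workers, 2 secondary workers,
# each node with one GPU attached (all counts are placeholders).
plan = {
    "master": (1, 1),
    "worker": (4, 1),
    "secondary-worker": (2, 1),
}

needed = gpus_needed(plan)
quota = 8  # e.g. the NVIDIA_T4_GPUS limit shown on the quotas page
print(f"need {needed} GPUs, quota is {quota}: "
      f"{'OK' if needed <= quota else 'request an increase'}")
```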
Types of GPUs
Dataproc nodes support the following GPU types. You must specify
the GPU type when attaching GPUs to your Dataproc cluster.

- nvidia-tesla-l4 - NVIDIA® Tesla® L4
- nvidia-tesla-a100 - NVIDIA® Tesla® A100
- nvidia-tesla-p100 - NVIDIA® Tesla® P100
- nvidia-tesla-v100 - NVIDIA® Tesla® V100
- nvidia-tesla-p4 - NVIDIA® Tesla® P4
- nvidia-tesla-t4 - NVIDIA® Tesla® T4
- nvidia-tesla-p100-vws - NVIDIA® Tesla® P100 Virtual Workstations
- nvidia-tesla-p4-vws - NVIDIA® Tesla® P4 Virtual Workstations
- nvidia-tesla-t4-vws - NVIDIA® Tesla® T4 Virtual Workstations

Attach GPUs to clusters

gcloud

Attach GPUs to the master and primary and secondary worker nodes in a Dataproc cluster when
creating the cluster using the --master-accelerator, --worker-accelerator, and
--secondary-worker-accelerator flags. These flags take the following two values:

1. the type of GPU to attach to a node, and
2. the number of GPUs to attach to the node.

The type of GPU is required, and the number of GPUs is optional (the default
is 1 GPU).

Example:

gcloud dataproc clusters create cluster-name \
    --region=region \
    --master-accelerator type=nvidia-tesla-t4 \
    --worker-accelerator type=nvidia-tesla-t4,count=4 \
    --secondary-worker-accelerator type=nvidia-tesla-t4,count=4 \
    ... other flags

To use GPUs in your cluster, you must install GPU drivers.

REST API

Attach GPUs to the master and primary and secondary worker nodes in a Dataproc cluster
by filling in the InstanceGroupConfig.AcceleratorConfig
acceleratorTypeUri and acceleratorCount fields as part of the
cluster.create
API request.
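The accelerator-related part of the request body can be sketched as follows. This is a minimal sketch assuming a hypothetical project, zone, and cluster name; the field names (accelerators, acceleratorTypeUri, acceleratorCount) come from the API reference mentioned above, but check the cluster.create documentation for the full set of required fields.

```python
import json

# Hypothetical identifiers; substitute your own.
project, zone = "my-project", "us-central1-a"
accel_type = f"projects/{project}/zones/{zone}/acceleratorTypes/nvidia-tesla-t4"

# Partial clusters.create request body: only the accelerator-related
# pieces of each InstanceGroupConfig are shown.
cluster = {
    "clusterName": "my-cluster",
    "config": {
        "masterConfig": {
            "accelerators": [
                {"acceleratorTypeUri": accel_type, "acceleratorCount": 1}
            ]
        },
        "workerConfig": {
            "accelerators": [
                {"acceleratorTypeUri": accel_type, "acceleratorCount": 4}
            ]
        },
    },
}

print(json.dumps(cluster, indent=2))
```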
Console
On the Create a cluster page in the Google Cloud console, click CPU PLATFORM AND GPU → GPUs → ADD GPU in the master and worker node sections of the Configure nodes panel to specify the number of GPUs and the GPU type for the nodes.
Install GPU drivers
GPU drivers are required to use the GPUs attached to Dataproc nodes.
To install GPU drivers, see the following instructions:

- Spark RAPIDS (https://github.com/GoogleCloudDataproc/initialization-actions/tree/master/spark-rapids)
- GPU ML Libraries (https://github.com/GoogleCloudDataproc/initialization-actions/tree/master/gpu)

Verify GPU driver install
After you have finished installing the GPU driver on your Dataproc nodes, you can verify
that the driver is functioning properly. SSH into the master node of your Dataproc cluster
and run the following command:
nvidia-smi
If the driver is functioning properly, the output displays the driver version and GPU statistics (see Verifying the GPU driver install).

Note: The driver may not work correctly after a VM restart that follows a Linux Unattended Upgrade. Possible solutions: disable unattended upgrades, or exclude kernel updates by editing the unattended-upgrades service configuration.
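If you want to check the driver programmatically rather than read the nvidia-smi table by eye, the query output mode is easy to parse. A minimal sketch, using nvidia-smi's standard --query-gpu/--format=csv options; the sample output below is illustrative, not taken from a real node.

```python
import subprocess

def gpu_driver_info(sample=None):
    """Return (gpu_name, driver_version) pairs from nvidia-smi query output."""
    if sample is None:
        # Runs on the node itself; requires the driver to be installed.
        sample = subprocess.check_output(
            ["nvidia-smi", "--query-gpu=name,driver_version",
             "--format=csv,noheader"], text=True)
    return [tuple(field.strip() for field in line.split(","))
            for line in sample.strip().splitlines()]

# Illustrative output from a node with a single T4 attached.
sample = "Tesla T4, 535.104.05\n"
print(gpu_driver_info(sample))  # [('Tesla T4', '535.104.05')]
```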
Spark configuration
When you submit a job to Spark,
you can use the spark.executorEnv runtime environment property
with the LD_PRELOAD environment variable to preload needed libraries.

Example:

gcloud dataproc jobs submit spark --cluster=CLUSTER_NAME \
    --region=REGION \
    --class=org.apache.spark.examples.SparkPi \
    --jars=file:///usr/lib/spark/examples/jars/spark-examples.jar \
    --properties=spark.executorEnv.LD_PRELOAD=libnvblas.so,spark.task.resource.gpu.amount=1,spark.executor.resource.gpu.amount=1,spark.executor.resource.gpu.discoveryScript=/usr/lib/spark/scripts/gpu/getGpusResources.sh
Example GPU job

You can test GPUs on Dataproc by running any of the following jobs, which benefit when run with GPUs:

1. Run one of the Spark ML examples.
2. Run the following example with spark-shell to run a matrix computation:
import org.apache.spark.mllib.linalg._
import org.apache.spark.mllib.linalg.distributed._
import java.util.Random

def makeRandomSquareBlockMatrix(rowsPerBlock: Int, nBlocks: Int): BlockMatrix = {
  // Build an nBlocks x nBlocks grid of random square dense blocks.
  val range = sc.parallelize(1 to nBlocks)
  val indices = range.cartesian(range)
  new BlockMatrix(
    indices.map(
      ij => (ij, Matrices.rand(rowsPerBlock, rowsPerBlock, new Random()))),
    rowsPerBlock, rowsPerBlock, 0, 0)
}

val N = 1024 * 4
val n = 2
val mat1 = makeRandomSquareBlockMatrix(N, n)
val mat2 = makeRandomSquareBlockMatrix(N, n)
val mat3 = mat1.multiply(mat2)
mat3.blocks.persist.count
println("Processing complete!")
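To get a feel for what the Spark job computes without a cluster, the same kind of block-matrix product can be sketched locally with NumPy at a much smaller size. This is only a rough illustrative sketch (NumPy is not part of the Dataproc example, and the sizes below are tiny stand-ins): it assembles an n x n grid of random square blocks into a dense matrix and multiplies the two results.

```python
import numpy as np

def random_square_block_matrix(rows_per_block, n_blocks, rng):
    """Dense stand-in for the Spark example's random BlockMatrix."""
    blocks = [[rng.random((rows_per_block, rows_per_block))
               for _ in range(n_blocks)] for _ in range(n_blocks)]
    return np.block(blocks)  # stitch the grid into one dense matrix

rng = np.random.default_rng(0)
N, n = 64, 2  # tiny stand-ins for the example's 4096 and 2
mat1 = random_square_block_matrix(N, n, rng)
mat2 = random_square_block_matrix(N, n, rng)
mat3 = mat1 @ mat2
print(mat3.shape)  # (128, 128): N * n rows and columns
```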
What's Next

- Learn how to create a Compute Engine instance with attached GPUs.
- Learn more about GPU machine types.

Last updated 2025-08-25 UTC.