Dataproc on GKE allows you to execute Big Data applications using the
Dataproc jobs API on GKE clusters.
Use the Google Cloud console, Google Cloud CLI, or the Dataproc API
(HTTP request or Cloud Client Libraries) to
create a Dataproc on GKE virtual cluster,
then submit a Spark, PySpark, SparkR, or Spark-SQL job to the Dataproc
service.
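As a sketch of this workflow with the gcloud CLI, the following creates a virtual cluster and submits a Spark job to it. All names and values (project region, cluster names, bucket, node pool) are placeholder assumptions; substitute your own, and check `gcloud dataproc clusters gke create --help` for the flags supported by your CLI version:

```shell
# Create a Dataproc on GKE virtual cluster on an existing GKE cluster.
# Cluster, region, bucket, and pool names below are placeholders.
gcloud dataproc clusters gke create my-dp-cluster \
    --region=us-central1 \
    --gke-cluster=my-gke-cluster \
    --spark-engine-version=latest \
    --staging-bucket=my-staging-bucket \
    --pools='name=dp-default,roles=default'

# Submit a Spark job to the virtual cluster through the Dataproc jobs API.
gcloud dataproc jobs submit spark \
    --cluster=my-dp-cluster \
    --region=us-central1 \
    --class=org.apache.spark.examples.SparkPi \
    --jars=local:///usr/lib/spark/examples/jars/spark-examples.jar \
    -- 1000
```

The job submission command is the same shape used for Dataproc on Compute Engine clusters; the `--cluster` flag simply points at the virtual cluster instead.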
Dataproc on GKE deploys Dataproc virtual clusters on
a GKE cluster. Unlike
Dataproc on Compute Engine clusters,
Dataproc on GKE virtual clusters do not include separate
master and worker VMs. Instead, when you create a Dataproc on GKE virtual cluster,
Dataproc on GKE creates node pools within a GKE cluster. Dataproc on GKE
jobs are run as pods on these node pools. The node pools and
scheduling of pods on the node pools are managed by GKE.
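Because GKE owns the node pools and pod scheduling, you can observe a running job with standard GKE tooling. A sketch, assuming `kubectl` credentials for the GKE cluster; the cluster, location, and namespace values are placeholders:

```shell
# List the node pools that Dataproc on GKE created in the GKE cluster.
gcloud container node-pools list \
    --cluster=my-gke-cluster \
    --location=us-central1

# Watch the Spark driver and executor pods for a running job
# (the namespace defaults to the virtual cluster name unless overridden).
kubectl get pods --namespace=my-dp-cluster --watch
```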
Dataproc on GKE supports Spark 3.5 versions.

Last updated 2025-03-21 UTC.