# Dataproc on GKE overview

Dataproc on GKE lets you run Big Data applications using the
Dataproc `jobs` API on GKE clusters.
Use the Google Cloud console, the Google Cloud CLI, or the Dataproc API
(HTTP request or Cloud Client Libraries) to
[create a Dataproc on GKE virtual cluster](/dataproc/docs/guides/dpgke/quickstarts/dataproc-gke-quickstart-create-cluster),
then submit a Spark, PySpark, SparkR, or Spark-SQL job to the Dataproc
service.

Dataproc on GKE supports
[Spark 3.5 versions](/dataproc/docs/guides/dpgke/dataproc-gke-versions).
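As a sketch of that workflow with the Google Cloud CLI, the commands below create a Dataproc on GKE virtual cluster and then submit the SparkPi example job to it. The project, region, cluster names, and node-pool name are placeholder assumptions; adapt them to your environment.

```shell
# Assumed placeholders: my-project, us-central1, my-gke-cluster,
# my-dpgke-cluster, and the dp-default node pool name.
# Create a Dataproc on GKE virtual cluster on an existing GKE cluster.
gcloud dataproc clusters gke create my-dpgke-cluster \
    --project=my-project \
    --region=us-central1 \
    --gke-cluster=my-gke-cluster \
    --spark-engine-version=latest \
    --pools='name=dp-default,roles=default'

# Submit a Spark job to the virtual cluster through the Dataproc jobs API.
gcloud dataproc jobs submit spark \
    --cluster=my-dpgke-cluster \
    --region=us-central1 \
    --class=org.apache.spark.examples.SparkPi \
    --jars=local:///usr/lib/spark/examples/jars/spark-examples.jar \
    -- 1000
```

The job runs as pods on the cluster's node pools; job status and driver output can then be inspected with `gcloud dataproc jobs list` and `gcloud dataproc jobs wait`.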
## How Dataproc on GKE works

Dataproc on GKE deploys Dataproc **virtual** clusters on
a GKE cluster. Unlike
[Dataproc on Compute Engine clusters](/dataproc/docs/guides/create-cluster),
Dataproc on GKE virtual clusters do not include separate
master and worker VMs. Instead, when you create a Dataproc on GKE virtual
cluster, Dataproc on GKE creates node pools within a GKE cluster.
Dataproc on GKE jobs run as pods on these node pools. The node pools and
the scheduling of pods on them are managed by GKE.
Last updated 2025-08-22 UTC.