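Dataproc on GKE deploys Dataproc virtual clusters on a GKE cluster. Unlike Dataproc on Compute Engine clusters, Dataproc on GKE virtual clusters do not include separate master and worker VMs. Instead, when you create a Dataproc on GKE virtual cluster, Dataproc on GKE creates node pools within the GKE cluster. Dataproc on GKE jobs run as pods on these node pools. GKE manages the node pools and the scheduling of pods on them.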
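As a rough illustration of how a virtual cluster maps onto GKE node pools, the following sketch assembles a request body for creating a virtual cluster. It is a minimal sketch, not the official procedure. The field names (`virtualClusterConfig`, `gkeClusterTarget`, `nodePoolTarget`, `roles`) reflect one reading of the Dataproc v1 REST API and should be checked against the API reference before use. The helper function, project, cluster, node pool, and bucket names, and the Spark version string are all hypothetical placeholders.

```python
import json


def build_virtual_cluster_body(project, location, gke_cluster,
                               virtual_cluster_name, node_pool_roles,
                               spark_version, staging_bucket):
    """Assemble a clusters.create-style body for a Dataproc on GKE virtual cluster.

    Hypothetical helper: it only builds a dictionary and makes no API call.
    node_pool_roles maps a node pool name to the list of roles it serves.
    """
    # Resource path of the existing GKE cluster that hosts the virtual cluster.
    gke_target = f"projects/{project}/locations/{location}/clusters/{gke_cluster}"

    # One entry per node pool. There are no master or worker VMs here:
    # each pool is tagged with the roles of the pods GKE schedules on it.
    node_pool_targets = [
        {"nodePool": f"{gke_target}/nodePools/{pool}", "roles": list(roles)}
        for pool, roles in node_pool_roles.items()
    ]

    return {
        "projectId": project,
        "clusterName": virtual_cluster_name,
        "virtualClusterConfig": {
            "stagingBucket": staging_bucket,
            "kubernetesClusterConfig": {
                "gkeClusterConfig": {
                    "gkeClusterTarget": gke_target,
                    "nodePoolTarget": node_pool_targets,
                },
                "kubernetesSoftwareConfig": {
                    "componentVersion": {"SPARK": spark_version},
                },
            },
        },
    }


if __name__ == "__main__":
    body = build_virtual_cluster_body(
        project="my-project",                # placeholder
        location="us-central1",              # placeholder
        gke_cluster="my-gke-cluster",        # placeholder
        virtual_cluster_name="my-virtual-cluster",
        node_pool_roles={
            "dp-default": ["DEFAULT"],
            "dp-executors": ["SPARK_EXECUTOR"],
        },
        spark_version="3.5-placeholder",     # not a real version string
        staging_bucket="my-staging-bucket",  # placeholder
    )
    print(json.dumps(body, indent=2))
```

The point of the sketch is the shape of the request: the virtual cluster names a target GKE cluster and a set of node pools with roles, and everything after that (node pool lifecycle, pod placement) is left to GKE.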
Last updated (UTC): 2025-08-27.