# Dataproc on GKE overview

Dataproc on GKE lets you run big data applications using the
Dataproc `jobs` API on GKE clusters.
Use the Google Cloud console, the Google Cloud CLI, or the Dataproc API
(HTTP requests or the Cloud Client Libraries) to
create a Dataproc on GKE virtual cluster,
then submit a Spark, PySpark, SparkR, or Spark-SQL job to the Dataproc
service.
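As a rough sketch of the CLI flow described above, the following creates a virtual cluster and submits a Spark job with `gcloud`. The cluster, region, node-pool, and GKE cluster names are hypothetical placeholders; check `gcloud dataproc clusters gke create --help` for the flags your environment requires (for example, a staging bucket or namespace).

```shell
# Create a Dataproc on GKE virtual cluster inside an existing GKE cluster.
# Names below (my-virtual-cluster, my-gke-cluster, dp-default) are examples only.
gcloud dataproc clusters gke create my-virtual-cluster \
    --region=us-central1 \
    --gke-cluster=my-gke-cluster \
    --spark-engine-version=latest \
    --pools='name=dp-default,roles=default'

# Submit a Spark job to the Dataproc service; it runs on the virtual cluster.
gcloud dataproc jobs submit spark \
    --cluster=my-virtual-cluster \
    --region=us-central1 \
    --class=org.apache.spark.examples.SparkPi \
    --jars=local:///usr/lib/spark/examples/jars/spark-examples.jar \
    -- 1000
```

The same create-then-submit sequence can also be driven through the console or the Dataproc API.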
Dataproc on GKE supports Spark 3.5 versions.

How Dataproc on GKE works
-------------------------

Dataproc on GKE deploys Dataproc **virtual** clusters on a
GKE cluster. Unlike Dataproc on Compute Engine clusters,
Dataproc on GKE virtual clusters do not include separate master and
worker VMs. Instead, when you create a Dataproc on GKE virtual cluster,
Dataproc on GKE creates node pools within the GKE cluster. Dataproc on GKE
jobs run as pods on these node pools. The node pools, and the scheduling
of pods on them, are managed by GKE.
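To see the effect described above, you can inspect the underlying GKE cluster directly. This is a sketch with hypothetical names (`my-gke-cluster`, `us-central1`, `my-dataproc-namespace`); the actual namespace depends on how the virtual cluster was created.

```shell
# List the node pools that Dataproc on GKE created in the GKE cluster.
gcloud container node-pools list \
    --cluster=my-gke-cluster \
    --region=us-central1

# Watch the driver and executor pods that GKE schedules onto those node pools
# while a Dataproc on GKE job is running.
kubectl get pods --namespace=my-dataproc-namespace --watch
```

Because GKE owns pod scheduling, standard Kubernetes tooling such as `kubectl describe pod` applies to Dataproc on GKE job pods as well.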
Last updated 2025-08-22 UTC.