When you create or update a Dataproc on GKE virtual cluster, you specify one or more node pools that the virtual cluster will use to run jobs (this cluster is referred to as the cluster "used by" or "associated with" the specified node pools). If a specified node pool does not exist on your GKE cluster, Dataproc on GKE creates the node pool on the GKE cluster with the settings you specify. If the node pool already exists and was created by Dataproc, it is validated to confirm that its settings match the specified settings.
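For orientation, here is a minimal sketch of that flow using the gcloud CLI; the cluster, region, bucket, and pool names are placeholders, and the multi-pool example at the end of this page shows the same command with separate pools per role:

```
# Minimal sketch (placeholder names): create a Dataproc on GKE virtual cluster
# that uses a single node pool with the default role. If dp-default-pool does
# not yet exist on the GKE cluster, Dataproc on GKE creates it with this
# machine type.
gcloud dataproc clusters gke create my-dp-cluster \
    --region=us-central1 \
    --gke-cluster=my-gke-cluster \
    --spark-engine-version=latest \
    --staging-bucket=my-staging-bucket \
    --setup-workload-identity \
    --pools="name=dp-default-pool,roles=default,machineType=e2-standard-4"
```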
Dataproc on GKE node pool settings
You can specify the following settings on the node pools used by your Dataproc on GKE virtual clusters (these settings are a subset of the GKE node pool settings):

- accelerators
- acceleratorCount
- acceleratorType
- gpuPartitionSize*
- localSsdCount
- machineType
- minCpuPlatform
- minNodeCount
- maxNodeCount
- preemptible
- spot*

Notes:

- gpuPartitionSize can be set in the Dataproc API GkeNodePoolAcceleratorConfig.
- spot can be set in the Dataproc API GkeNodeConfig.
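To make the mapping concrete, here is a sketch of a single --pools spec; the pool name and sizes are placeholders, and the min/max keys appear to correspond to minNodeCount/maxNodeCount (the keys used here are the ones visible in the creation example at the end of this page):

```
# Sketch (placeholder pool name): one --pools value that exercises several of
# the settings listed above; min/max appear to correspond to
# minNodeCount/maxNodeCount. Whether keys such as preemptible or spot are also
# accepted in the --pools spec should be checked against the gcloud reference.
EXEC_POOL_SPEC="name=dp-exec-pool,roles=spark-executor,machineType=n2-standard-8,min=1,max=10"

# Passed to cluster creation as: --pools="${EXEC_POOL_SPEC}"
```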
Node pool deletion
When a Dataproc on GKE cluster is deleted, the node pools used by the cluster are not deleted. See Delete a node pool to delete node pools that are no longer used by Dataproc on GKE clusters.
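Since cleanup is manual, a command along these lines removes a pool once no Dataproc on GKE cluster uses it; the pool, cluster, and region names are placeholders (use --zone instead of --region for a zonal GKE cluster):

```
# Sketch (placeholder names): delete a GKE node pool that is no longer used by
# any Dataproc on GKE virtual cluster.
gcloud container node-pools delete dp-exec-pool \
    --cluster=my-gke-cluster \
    --region=us-central1
```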
Node pool location
You can specify the zone location of the node pools associated with your Dataproc on GKE virtual cluster when you create or update the virtual cluster. The node pool zones must be located in the region of the associated virtual cluster.
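As an illustration only, and assuming the --pools spec accepts a locations key that mirrors the GkeNodePoolConfig locations field (verify this against the gcloud reference before relying on it), pinning a pool to a zone in the cluster's region would look like this:

```
# Assumption (not confirmed here): the --pools spec accepts a "locations" key
# mirroring GkeNodePoolConfig.locations. The zone is a placeholder and must be
# inside the virtual cluster's region (us-central1 in this sketch).
DRIVER_POOL_SPEC="name=dp-driver-pool,roles=spark-driver,locations=us-central1-a"
```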
Role to node pool mapping
Node pool roles are defined for Spark driver and executor work, with a default role defined for all types of work by a node pool. A Dataproc on GKE cluster must have at least one node pool that is assigned the default role. Assigning other roles is optional.
Recommendation: Create separate node pools for each role type, with node type and size based on role requirements.
[[["Mudah dipahami","easyToUnderstand","thumb-up"],["Memecahkan masalah saya","solvedMyProblem","thumb-up"],["Lainnya","otherUp","thumb-up"]],[["Sulit dipahami","hardToUnderstand","thumb-down"],["Informasi atau kode contoh salah","incorrectInformationOrSampleCode","thumb-down"],["Informasi/contoh yang saya butuhkan tidak ada","missingTheInformationSamplesINeed","thumb-down"],["Masalah terjemahan","translationIssue","thumb-down"],["Lainnya","otherDown","thumb-down"]],["Terakhir diperbarui pada 2025-09-02 UTC."],[[["\u003cp\u003eDataproc on GKE virtual clusters utilize node pools to run jobs, and if a specified node pool doesn't exist, Dataproc on GKE will create it with the designated settings.\u003c/p\u003e\n"],["\u003cp\u003eYou can define various settings for Dataproc on GKE node pools, including \u003ccode\u003eaccelerators\u003c/code\u003e, \u003ccode\u003emachineType\u003c/code\u003e, \u003ccode\u003eminNodeCount\u003c/code\u003e, \u003ccode\u003emaxNodeCount\u003c/code\u003e, \u003ccode\u003epreemptible\u003c/code\u003e, and \u003ccode\u003espot\u003c/code\u003e, among others.\u003c/p\u003e\n"],["\u003cp\u003eWhen a Dataproc on GKE cluster is deleted, the associated node pools are not automatically deleted, and they need to be deleted separately.\u003c/p\u003e\n"],["\u003cp\u003eNode pool zones must reside within the same region as the associated virtual cluster, and they can be specified when creating or updating the virtual cluster.\u003c/p\u003e\n"],["\u003cp\u003eNode pools are assigned roles such as \u003ccode\u003edefault\u003c/code\u003e, \u003ccode\u003espark-driver\u003c/code\u003e, and \u003ccode\u003espark-executor\u003c/code\u003e, with at least one node pool needing to be assigned the \u003ccode\u003edefault\u003c/code\u003e role.\u003c/p\u003e\n"]]],[],null,["When you\n[create](/dataproc/docs/guides/dpgke/quickstarts/dataproc-gke-quickstart-create-cluster) or\n[update](/dataproc/docs/guides/dpgke/dataproc-gke-recreate-cluster)\na Dataproc on GKE virtual cluster, you specify one or more node pools that\nthe virtual cluster will use to run jobs (this cluster is referred to as the\ncluster \"used by\" or \"associated\" with the specified node pools). If a specified node pool\ndoes not exist on your GKE cluster, Dataproc on GKE\nwill create the node pool on the GKE cluster with settings\nyou specify. If the node pool exists and was created by Dataproc,\nit will be validated to confirm that its settings match the specified settings.\n\nDataproc on GKE node pool settings\n\nYou can specify the following\n[settings](/dataproc/docs/reference/rest/v1/projects.regions.clusters#gkenodeconfig)\non node pools used by your Dataproc on GKE virtual clusters (these\nsettings are a subset of\n[GKE node pool settings](/kubernetes-engine/docs/reference/rest/v1/NodeConfig)):\n\n- `accelerators`\n- `acceleratorCount`\n- `acceleratorType`\n- `gpuPartitionSize`\\*\n- `localSsdCount`\n- `machineType`\n- `minCpuPlatform`\n- `minNodeCount`\n- `maxNodeCount`\n- `preemptible`\n- `spot`\\*\n\nNotes:\n\n- `gpuPartitionSize` can be set in the Dataproc API [`GkeNodePoolAcceleratorConfig`](/dataproc/docs/reference/rest/v1/GkeClusterConfig#gkenodepoolacceleratorconfig).\n- `spot` can be set in the Dataproc API [GkeNodeConfig](/dataproc/docs/reference/rest/v1/projects.regions.clusters#gkenodeconfig).\n\nNode pool deletion\n\nWhen a Dataproc on GKE cluster is deleted, the node pools used by the cluster\nare not deleted. 
See [Delete a node pool](/kubernetes-engine/docs/how-to/node-pools#deleting_a_node_pool)\nto delete node pools no longer in use by Dataproc on GKE clusters.\n\nNode pool location\n\nYou can specify the\n[zone](/dataproc/docs/reference/rest/v1/projects.regions.clusters#GkeNodePoolConfig.FIELDS.locations)\nlocation of node pools associated with your Dataproc on GKE virtual cluster\nwhen you create or update the virtual cluster. The node pool zones must be\nlocated in the region of the associated virtual cluster.\n\nRole to node pool mapping\n\nNode pool [roles](/dataproc/docs/reference/rest/v1/projects.regions.clusters#Role)\nare defined for Spark driver and executor work, with a default role\ndefined for all types of work by a node pool. Dataproc on GKE clusters must have\nat least one a node pool that is assigned the `default` role.\nAssigning other roles is optional.\n\n**Recommendation:** Create separate node pools for each role type, with node type\nand size based on role requirements.\n\ngcloud CLI virtual cluster creation example: \n\n```\ngcloud dataproc clusters gke create \"${DP_CLUSTER}\" \\\n --region=${REGION} \\\n --gke-cluster=${GKE_CLUSTER} \\\n --spark-engine-version=latest \\\n --staging-bucket=${BUCKET} \\\n --pools=\"name=${DP_POOLNAME},roles=default \\\n --setup-workload-identity\n --pools=\"name=${DP_CTRL_POOLNAME},roles=default,machineType=e2-standard-4\" \\\n --pools=\"name=${DP_DRIVER_POOLNAME},min=1,max=3,roles=spark-driver,machineType=n2-standard-4\" \\\n --pools=\"name=${DP_EXEC_POOLNAME},min=1,max=10,roles=spark-executor,machineType=n2-standard-8\"\n```"]]