Dataproc clusters have the following types of components:
Installed components: Components that are installed in the image and activated when the cluster is created.
Optional components: Components that you select to install and use on your cluster when you create the cluster. Dataproc installs and activates optional components depending on the cluster image version, as follows:
2.2 and earlier image versions: Optional components are automatically installed. Selected optional components are activated, and non-selected optional components are uninstalled at cluster creation.
2.3 and later image versions: All optional components are installed during cluster creation, except the Jupyter, Iceberg, and Delta Lake optional components, which are pre-installed in 2.3 and later image versions. Pre-installed optional components are removed from a 2.3 or later image version cluster if they are not enabled when the cluster is created. For more information, see Dataproc 2.3.x release versions.
To avoid increased startup time for 2.3 and later image version clusters, create a custom image with optional components pre-installed. You can do this by running generate_custom_image.py with the --optional-components flag.
Initialization action components: Components installed on a cluster as part of an initialization action that you specify when you create the cluster.
Optional components are installed on a cluster before initialization actions run on the cluster.
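The image-version rules above can be easier to follow as a small model. The following Python sketch is only an illustration of the described behavior, not Dataproc code: the function names, the `available` set, and the version strings are hypothetical, and it assumes that on 2.3 and later images only the selected components that are not pre-installed get installed at cluster creation.

```python
import re

# Components the text names as pre-installed in 2.3 and later images.
PREINSTALLED_IN_2_3 = frozenset({"Jupyter", "Iceberg", "Delta Lake"})


def parse_major_minor(image_version):
    """Extract (major, minor) from strings such as '2.3' or '2.3-debian12'."""
    match = re.match(r"(\d+)\.(\d+)", image_version)
    if match is None:
        raise ValueError(f"unrecognized image version: {image_version!r}")
    return int(match.group(1)), int(match.group(2))


def plan_components(image_version, available, selected):
    """Model which optional components are activated, installed, or removed.

    `available` is a hypothetical set of optional components offered for the
    image; `selected` is the set chosen at cluster creation.
    """
    available, selected = set(available), set(selected)
    unknown = selected - available
    if unknown:
        raise ValueError(f"not available for this image: {sorted(unknown)}")

    if parse_major_minor(image_version) <= (2, 2):
        # 2.2 and earlier: every optional component ships in the image, so
        # nothing is installed at creation; unselected ones are uninstalled.
        preinstalled = available
    else:
        # 2.3 and later: only the three named components ship in the image.
        preinstalled = PREINSTALLED_IN_2_3 & available

    return {
        "activated": selected,
        "installed_at_creation": selected - preinstalled,
        "removed": preinstalled - selected,
    }


# Hypothetical component set and version strings, for illustration only.
available = {"Jupyter", "Iceberg", "Delta Lake", "Zeppelin", "Docker"}
for version in ("2.2-debian12", "2.3-debian12"):
    plan = plan_components(version, available, {"Jupyter", "Zeppelin"})
    print(version, {key: sorted(value) for key, value in plan.items()})
```

In this model, selecting a component that is not pre-installed on a 2.3 or later image is what adds installation work at cluster creation, which is the startup cost the text attributes to those image versions.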
The Dataproc image version pages list the components and component types available in the latest Dataproc image releases.
Optional components have the following advantages over initialization actions used to install components:
Optional components are tested for compatibility with specific Dataproc versions.
Optional components are enabled with a cluster creation parameter; initialization actions require a script.
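Available optional components
Each optional component is identified by a component name in Google Cloud CLI commands and API requests. Available optional components include Docker, Flink, HBase, Hive WebHCat, Hudi, Jupyter, Presto, Ranger, Solr, Trino, Zeppelin, and Zookeeper; availability of some components depends on the image version.
Notes:
Apache Pig is an optional component in image versions 2.3 and later. It was pre-installed in 2.2 and earlier image versions.
See Cluster web interfaces for connecting to component web interfaces running on clusters. Also see the Dataproc Component Gateway, which lets you connect to the web interfaces of Dataproc core and optional components, including the YARN, HDFS, Jupyter, and Zeppelin UIs, without using SSH tunnels or modifying firewall rules to allow inbound traffic.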
Add optional components
Note: The following usage examples apply to General Availability (GA) components.
Console
In the Google Cloud console, go to the Dataproc Create a cluster page. The Set up cluster panel is selected.
In the Components section, under Optional components, select one or more components to install on your cluster.
Google Cloud CLI
To create a Dataproc cluster and install one or more optional components on the cluster, use the gcloud dataproc clusters create CLUSTER_NAME command with the --optional-components flag.
```
gcloud dataproc clusters create CLUSTER_NAME \
    --optional-components=COMPONENT-NAME(s) \
    ... other flags
```
REST API
Optional components can be specified through the Dataproc API using SoftwareConfig.Component as part of a clusters.create request.
Last updated 2025-08-22 UTC.
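As a rough illustration of the CLI and REST API routes, the Python sketch below builds a clusters.create request body and the equivalent gcloud argument list from one component list. It assumes the optional components are listed under `config.softwareConfig.optionalComponents` in the request body, and that `JUPYTER` and `ZEPPELIN` are the component names for Jupyter and Zeppelin; confirm both against the API reference and the component name table. The helper names, project ID, cluster name, and region are hypothetical. The code only prints what it builds; it sends nothing and creates no cluster.

```python
import json


def build_create_cluster_body(project_id, cluster_name, components, image_version=None):
    """Return a clusters.create request body that enables optional components."""
    software_config = {"optionalComponents": list(components)}
    if image_version is not None:
        software_config["imageVersion"] = image_version
    return {
        "projectId": project_id,
        "clusterName": cluster_name,
        "config": {"softwareConfig": software_config},
    }


def build_gcloud_args(cluster_name, region, components):
    """Return the equivalent gcloud invocation as an argument list."""
    return [
        "gcloud", "dataproc", "clusters", "create", cluster_name,
        f"--region={region}",
        # The flag takes a comma-separated list of component names.
        "--optional-components=" + ",".join(components),
    ]


# Assumed component names and hypothetical project, cluster, and region.
components = ["JUPYTER", "ZEPPELIN"]
body = build_create_cluster_body("example-project", "example-cluster", components)
print(json.dumps(body, indent=2))
print(" ".join(build_gcloud_args("example-cluster", "us-central1", components)))
```

Building both forms from the same list keeps the CLI and API paths consistent when the set of components changes.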