# Dataproc components

Last updated (UTC): 2025-08-27.

Dataproc clusters feature the following types of components:

- Installed components: Components that are installed in the image and activated
  when the cluster is created.

- Optional components: Components that you select to install and use on
  your cluster when you create the cluster. Dataproc installs and
  activates optional components depending on the cluster image version as follows:

  - **`2.2` and earlier image versions**: Optional components are pre-installed
    in the image.
    Selected optional components are activated, and non-selected
    optional components are uninstalled, at cluster creation.

  - **`2.3` and later image versions**: Optional components that you select are installed
    during cluster creation. The exceptions are the Jupyter, Iceberg, and Delta Lake
    optional components, which are pre-installed in `2.3` and later images. A
    pre-installed component is removed from the cluster if it is not enabled when
    the cluster is created. For more information, see
    [Dataproc 2.3.x release versions](/dataproc/docs/concepts/versioning/dataproc-release-2.3).

    **Note:** Installing optional components during cluster creation increases startup time for `2.3` and later image version clusters. To avoid this, create a [custom image](/dataproc/docs/guides/dataproc-images#generate_a_custom_image) with the optional components pre-installed by running [`generate_custom_image.py`](https://github.com/GoogleCloudDataproc/custom-images?tab=readme-ov-file#generate-custom-image) with the [`--optional-components`](/dataproc/docs/guides/dataproc-images#run_the_code) flag.

- Initialization action components: Components installed on a cluster as part
  of an [initialization action](/dataproc/docs/concepts/configuring-clusters/init-actions)
  that you specify when you create a cluster.

Optional components are installed on a cluster before
[initialization actions](/dataproc/docs/concepts/configuring-clusters/init-actions)
are run on the cluster.

The [Dataproc image version pages](/dataproc/docs/concepts/versioning/dataproc-version-clusters#supported-dataproc-image-versions)
list the components and component types available in the latest
Dataproc image releases.

Optional components have the following advantages over initialization actions
used to install components:

- Optional components are tested for compatibility with specific Dataproc image versions.
- Optional components are enabled with a cluster creation parameter; initialization
  actions require a script.

Available optional components
-----------------------------

Available optional components include Docker, Flink, HBase, Hive WebHCat, Hudi, Jupyter, Presto, Ranger, Solr, Trino, Zeppelin, and ZooKeeper. The availability of some components depends on the image version.

Notes:

- Apache Pig is an optional component in `2.3` and later image versions. It is pre-installed in `2.2` and earlier image versions.

See [Cluster web interfaces](/dataproc/docs/concepts/accessing/cluster-web-interfaces) for connecting to the web interfaces of components running on clusters. Also see the Dataproc [Component Gateway](/dataproc/docs/concepts/accessing/dataproc-gateways), which lets you connect to the web interfaces of Dataproc core and optional components, including the YARN, HDFS, Jupyter, and Zeppelin UIs, without using [SSH tunnels](/dataproc/docs/concepts/accessing/cluster-web-interfaces#create_an_ssh_tunnel) or [modifying firewall rules](/dataproc/docs/concepts/configuring-clusters/network) to allow inbound traffic.

Add optional components
-----------------------

**Note:** The following usage examples apply to [General Availability (GA)](/products#product-launch-stages) components.

### Console

1. In the Google Cloud console, go to the Dataproc **Create a cluster** page.

   [Go to Create a cluster](https://console.cloud.google.com/dataproc/clustersAdd)

   The **Set up cluster** panel is selected.
2. In the **Components** section, under **Optional components**, select one or more components to install on your cluster.

### Google Cloud CLI

To create a Dataproc cluster and install one or more
optional components on the cluster, use the
`gcloud dataproc clusters create CLUSTER_NAME`
command with the `--optional-components` flag. To install several components, pass them as a comma-separated list.

```
gcloud dataproc clusters create CLUSTER_NAME \
    --region=REGION \
    --optional-components=COMPONENT[,COMPONENT,...] \
    ... other flags
```

### REST API

To install optional components through the Dataproc API, list
[SoftwareConfig.Component](/dataproc/docs/reference/rest/v1/ClusterConfig#Component)
values in the `config.softwareConfig.optionalComponents` field of a
[clusters.create](/dataproc/docs/reference/rest/v1/projects.regions.clusters/create)
request.
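The gcloud and REST forms above request the same cluster configuration. The following Python sketch builds both side by side without calling Google Cloud. The project ID, region, cluster name, image version, and component list are hypothetical placeholders. The helper names `gcloud_create_args` and `rest_create_request` are illustrative only and are not part of any Google library.

```python
import json
import shlex

# Hypothetical placeholder values; substitute your own project, region,
# cluster name, image version, and components.
PROJECT_ID = "my-project"
REGION = "us-central1"
CLUSTER_NAME = "my-cluster"
IMAGE_VERSION = "2.2-debian12"
COMPONENTS = ["JUPYTER", "ZOOKEEPER"]


def gcloud_create_args(cluster_name, region, image_version, components):
    """Build the argv for `gcloud dataproc clusters create` with optional components."""
    return [
        "gcloud", "dataproc", "clusters", "create", cluster_name,
        f"--region={region}",
        f"--image-version={image_version}",
        # The flag takes a comma-separated list of component names.
        "--optional-components=" + ",".join(components),
    ]


def rest_create_request(project_id, region, cluster_name, image_version, components):
    """Build the URL and JSON body for a Dataproc v1 clusters.create request."""
    url = (
        "https://dataproc.googleapis.com/v1/"
        f"projects/{project_id}/regions/{region}/clusters"
    )
    body = {
        "projectId": project_id,
        "clusterName": cluster_name,
        "config": {
            "softwareConfig": {
                "imageVersion": image_version,
                # Values from the Component enum, such as "JUPYTER".
                "optionalComponents": list(components),
            },
        },
    }
    return url, body


if __name__ == "__main__":
    # Print a shell-quoted gcloud command line.
    print(shlex.join(gcloud_create_args(CLUSTER_NAME, REGION, IMAGE_VERSION, COMPONENTS)))
    # Print the equivalent REST request.
    url, body = rest_create_request(PROJECT_ID, REGION, CLUSTER_NAME, IMAGE_VERSION, COMPONENTS)
    print("POST", url)
    print(json.dumps(body, indent=2))
```

To send the REST request, POST the body to the URL with an OAuth 2.0 access token in an `Authorization: Bearer` header, for example a token printed by `gcloud auth print-access-token`.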