When you create a Dataproc cluster, standard Apache Hadoop ecosystem components are automatically installed on it (see Dataproc cluster image version lists). You can also install additional components, called "optional components", at cluster creation time. Installing optional components is similar to adding components with initialization actions, but has the following advantages:
- Faster cluster startup times
- Tested compatibility with specific Dataproc versions
- Use of a cluster parameter instead of an initialization action script
Available optional components
Optional component | COMPONENT_NAME in Google Cloud CLI commands and API requests | Image Version | Release Stage |
---|---|---|---|
Delta Lake | DELTA | 2.2.46 and later | GA |
Docker | DOCKER | 1.5 and later | GA |
Flink | FLINK | 1.5 and later | GA |
HBase | HBASE | 1.5 and later (not available in 2.1 and later) | Beta |
Hive WebHCat | HIVE_WEBHCAT | 1.3 and later | GA |
Hudi | HUDI | 1.5 and later | GA |
Iceberg | ICEBERG | 2.2 and later | GA |
Jupyter Notebook | JUPYTER | 1.3 and later | GA |
Presto | PRESTO | 1.3 and later (not available in 2.1 and later) | GA |
Ranger | RANGER | 1.3 and later | GA |
Solr | SOLR | 1.3 and later | GA |
Trino | TRINO | 2.1 and later | GA |
Zeppelin Notebook | ZEPPELIN | 1.3 and later | GA |
Zookeeper | ZOOKEEPER | 1.0 and later | GA |
Add optional components
Console
- In the Google Cloud console, go to the Dataproc Create a cluster page.
The Set up cluster panel is selected.
- In the Components section, under Optional components, select one or more components to install on your cluster.
Google Cloud CLI
To create a Dataproc cluster and install one or more optional components on the cluster, use the gcloud dataproc clusters create cluster-name command with the --optional-components flag. To install several components, pass their COMPONENT_NAME values as a comma-separated list.
gcloud dataproc clusters create cluster-name \
    --optional-components=COMPONENT-NAME(s) \
    ... other flags
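As a rough illustration of how the flag value is assembled, the sketch below checks each requested name against the COMPONENT_NAME column of the table above (written in uppercase, which is an assumption about how the names should be spelled) and prints the resulting command instead of running it. The helper names `is_valid_component` and `build_create_command`, the cluster name `example-cluster`, and the region `us-central1` are hypothetical and chosen only for this example; the sketch does not check image version compatibility.

```shell
#!/bin/sh
# COMPONENT_NAME values from the "Available optional components" table,
# written in uppercase (assumption: this is the spelling the CLI expects).
VALID_COMPONENTS="DELTA DOCKER FLINK HBASE HIVE_WEBHCAT HUDI ICEBERG JUPYTER PRESTO RANGER SOLR TRINO ZEPPELIN ZOOKEEPER"

# Return 0 if $1 is one of the names listed above, 1 otherwise.
is_valid_component() {
  for name in $VALID_COMPONENTS; do
    [ "$name" = "$1" ] && return 0
  done
  return 1
}

# Print (do not execute) the gcloud command for a cluster name, a region,
# and one or more component names. Hypothetical helper for illustration.
build_create_command() {
  cluster="$1"; region="$2"; shift 2
  components=""
  for c in "$@"; do
    if ! is_valid_component "$c"; then
      echo "unknown optional component: $c" >&2
      return 1
    fi
    # Join names with commas, without a leading comma on the first one.
    components="${components:+$components,}$c"
  done
  [ -n "$components" ] || { echo "no components given" >&2; return 1; }
  echo "gcloud dataproc clusters create $cluster --region=$region --optional-components=$components"
}

# Example cluster name and region are placeholders, not recommendations.
build_create_command example-cluster us-central1 JUPYTER ZEPPELIN
```

Printing the command first makes it easy to review the component list before creating anything; once it looks right, the printed line can be run as is, with any other flags your cluster needs appended.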