Important changes in 2.3:
Version 2.3 is a lightweight image that contains only core components, which reduces exposure to Common Vulnerabilities and Exposures (CVEs). To meet stricter security compliance requirements, use image version 2.3 or later when creating a Dataproc cluster.
If you choose to install optional components when creating a Dataproc cluster with a 2.3 image, they are downloaded and installed during cluster creation. This might increase the cluster startup time. To avoid this delay, you can create a custom image with the optional components pre-installed by running generate_custom_image.py with the --optional-components flag.
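As a rough sketch of the custom-image approach described above, the following Python helper assembles a generate_custom_image.py command line. Only the --optional-components flag comes from the text above. The other flags (--image-name, --dataproc-version, --customization-script, --zone, --gcs-bucket) and the comma-separated, upper-case component format are assumptions that should be checked against the script's --help output. All argument values are hypothetical placeholders.

```python
import shlex


def build_custom_image_command(image_name, dataproc_version, components,
                               customization_script, zone, gcs_bucket):
    """Return the argv list for a generate_custom_image.py invocation.

    Flags other than --optional-components are assumptions; verify them
    with `python generate_custom_image.py --help` before relying on them.
    """
    if not components:
        raise ValueError("at least one optional component is required")
    return [
        "python", "generate_custom_image.py",
        "--image-name", image_name,
        "--dataproc-version", dataproc_version,
        "--customization-script", customization_script,
        "--zone", zone,
        "--gcs-bucket", gcs_bucket,
        # Assumed format: comma-separated, upper-case component names.
        "--optional-components", ",".join(c.upper() for c in components),
    ]


# Placeholder values only; substitute your own image name, version, and bucket.
example = build_custom_image_command(
    image_name="MY_CUSTOM_IMAGE",
    dataproc_version="IMAGE_VERSION",
    components=["docker", "zeppelin"],
    customization_script="customize.sh",
    zone="ZONE",
    gcs_bucket="gs://MY_BUCKET",
)
print(shlex.join(example))
```

Building the command as a list rather than a single string avoids shell-quoting problems if the list is later passed to subprocess.run.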
Notes:
The following are the optional components in 2.3 images:
- Apache Flink
- Apache Hive WebHCat
- Apache Hudi
- Apache Iceberg
- Apache Pig
- Delta Lake
- Docker
- JupyterLab Notebook
- Ranger
- Solr
- Zeppelin Notebook
- Zookeeper
yarn.nodemanager.recovery.enabled and HDFS Audit Logging are enabled by default in 2.3 images.
micromamba is installed as part of the Python installation, replacing the conda used in previous image versions.
Docker and Zeppelin installation issues:
- Installation fails if the cluster has no public internet access. As a
  workaround, create a cluster that uses a custom image with optional
  components pre-installed. You can do this by running
  generate_custom_image.py with the --optional-components flag.
- Installation can fail if the cluster is pinned to an older sub-minor image
  version: packages are installed on demand from public OSS repositories, and a
  package might not be available upstream to support the installation.
  As a workaround, create a cluster that uses a custom image with optional
  components pre-installed. To do this, run generate_custom_image.py
  with the --optional-components flag.
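Both workarounds end with creating a cluster from the custom image. A minimal sketch of that step, assuming the gcloud CLI's `dataproc clusters create` command with its --region, --image, and --optional-components flags: the helper below only builds the argument list and does not run anything. The cluster name, region, and image URI are hypothetical placeholders. The text above does not say whether components must still be listed at creation time when they are pre-installed in the image, so the helper leaves that optional.

```python
def build_cluster_create_command(cluster_name, region, custom_image_uri,
                                 components=()):
    """Return the argv list for creating a cluster from a custom image.

    Nothing is executed here; pass the result to subprocess.run if desired.
    """
    cmd = [
        "gcloud", "dataproc", "clusters", "create", cluster_name,
        f"--region={region}",
        # Custom image built beforehand with the components pre-installed.
        f"--image={custom_image_uri}",
    ]
    if components:
        # Assumed format: comma-separated, upper-case component names.
        cmd.append("--optional-components="
                   + ",".join(c.upper() for c in components))
    return cmd


# Placeholder values only.
print(" ".join(build_cluster_create_command(
    "my-cluster", "REGION", "projects/MY_PROJECT/global/images/MY_CUSTOM_IMAGE",
    components=["docker", "zeppelin"],
)))
```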