Dataproc on GKE release versions

Spark Engine 3.5

Component                3.5-dataproc-22  3.5-dataproc-17
Apache Spark             3.5.0            3.5.0
Hadoop Libraries         3.3.6            3.3.6
Cloud Storage Connector  3.0.0            3.0.0
Java                     11               11
Python                   3.8.5            3.8.5
Conda                    4.9.2            4.9.2
R                        4.3.0            4.3.0

Spark Engine 3.1 (Deprecated)

Component                3.1-dataproc-17  3.1-dataproc-16  3.1-dataproc-15  3.1-dataproc-14
Apache Spark             3.1.3            3.1.3            3.1.3            3.1.3
Hadoop Libraries         3.2.3            3.2.3            3.2.3            3.2.3
Cloud Storage Connector  hadoop3-2.2.13   hadoop3-2.2.13   hadoop3-2.2.11   hadoop3-2.2.11
Java                     8                8                8                8
Python                   3.8.5            3.8.5            3.8.5            3.8.5
Conda                    4.9.2            23.5.0           4.9.2            4.9.2
R                        4.3.0            4.3.0            4.2.3            4.2.3

Spark Engine 2.4 (Deprecated)

Spark 2.4 has reached end of life (EOL) in Dataproc on GKE. The public image remains available but is no longer supported.

Component                2.4-dataproc-17  2.4-dataproc-16  2.4-dataproc-15  2.4-dataproc-14
Apache Spark             2.4.8            2.4.8            2.4.8            2.4.8
Hadoop Libraries         2.10.2           2.10.2           2.10.2           2.10.2
Cloud Storage Connector  hadoop2-2.1.9    hadoop2-2.1.9    hadoop2-2.1.9    hadoop2-2.1.9
Java                     8                8                8                8
Python                   3.7.4            3.7.4            3.7.4            3.7.4
Conda                    4.7.12           22.11.1          22.11.1          22.1.0
R                        3.6.3            3.6.3            3.6.3            3.6.3

A fully qualified Spark engine release version has the form 3.1-dataproc-[NUMBER] or 3.5-dataproc-[NUMBER]; for example, 3.1-dataproc-17 or 3.5-dataproc-17.
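
As an illustration of the fully qualified format, the following Python sketch splits a version string into its parts. The function name parse_engine_version, the EngineVersion type, and its field names are hypothetical, and the pattern assumes the general shape MAJOR.MINOR-dataproc-NUMBER. This is a client-side convenience check only, not how the service validates versions.

```python
import re
from typing import NamedTuple, Optional

# Assumed shape of a fully qualified version: MAJOR.MINOR-dataproc-NUMBER
_VERSION_PATTERN = re.compile(r"(\d+)\.(\d+)-dataproc-(\d+)")


class EngineVersion(NamedTuple):
    """Hypothetical container for the parts of a release version."""
    major: int
    minor: int
    number: int


def parse_engine_version(text: str) -> Optional[EngineVersion]:
    """Return the parsed parts, or None if text is not fully qualified."""
    match = _VERSION_PATTERN.fullmatch(text)
    if match is None:
        return None
    return EngineVersion(*(int(group) for group in match.groups()))
```

With this sketch, parse_engine_version("3.5-dataproc-17") yields the parts 3, 5, and 17, while an alias such as "3.5" or "latest" yields None because it is not fully qualified.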

Spark engine versions can also be expressed as aliases, as shown in the following examples:

  • 3 - Most recent version of Spark engine with a Spark major version of 3.
  • 3.5 - Most recent version of Spark engine with a Spark major.minor version of 3.5.
  • dataproc-2.2 - Most recent version of Spark engine that is compatible with Dataproc on Compute Engine 2.2 images.
  • latest - Most recent version of Spark engine.
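
The alias rules above can be sketched in Python as follows. This is a rough illustration, not the service's actual resolution logic: the function resolve_alias and the list KNOWN_RELEASES are hypothetical, "most recent" is assumed to mean the highest (major, minor, number) tuple, and deprecated releases are assumed to remain eligible. The dataproc-2.2 style alias is deliberately left unhandled because this page does not state which engine versions are compatible with which Compute Engine images.

```python
import re

# Release versions copied from the tables on this page.
KNOWN_RELEASES = [
    "3.5-dataproc-22", "3.5-dataproc-17",
    "3.1-dataproc-17", "3.1-dataproc-16", "3.1-dataproc-15", "3.1-dataproc-14",
    "2.4-dataproc-17", "2.4-dataproc-16", "2.4-dataproc-15", "2.4-dataproc-14",
]


def _sort_key(release):
    """Turn 'MAJOR.MINOR-dataproc-NUMBER' into a sortable tuple of ints."""
    match = re.fullmatch(r"(\d+)\.(\d+)-dataproc-(\d+)", release)
    if match is None:
        raise ValueError(f"not a fully qualified version: {release!r}")
    return tuple(int(group) for group in match.groups())


def resolve_alias(alias, releases=KNOWN_RELEASES):
    """Pick the highest release matching a 'latest', 'N', or 'N.M' alias."""
    if alias == "latest":
        candidates = list(releases)
    elif re.fullmatch(r"\d+", alias):
        # Major-only alias such as "3".
        candidates = [r for r in releases if _sort_key(r)[0] == int(alias)]
    elif re.fullmatch(r"\d+\.\d+", alias):
        # Major.minor alias such as "3.5".
        wanted = tuple(int(part) for part in alias.split("."))
        candidates = [r for r in releases if _sort_key(r)[:2] == wanted]
    else:
        # Includes 'dataproc-2.2' style aliases: mapping not given here.
        raise ValueError(f"unsupported alias in this sketch: {alias!r}")
    if not candidates:
        raise ValueError(f"no known release matches alias {alias!r}")
    return max(candidates, key=_sort_key)
```

Under these assumptions, resolve_alias("3"), resolve_alias("3.5"), and resolve_alias("latest") all select 3.5-dataproc-22 from the releases listed on this page.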