Create and secure a Hive metastore cluster

Overview

When you create a Dataproc cluster, the Apache Hive application and its components, including Hive metastore, are installed on the cluster, and a default password is set in the hive-site.xml file located on the cluster master node.

Specifying your own Hive metastore password is recommended for the following reasons:

  • As a security best practice, setting your own password ensures that you control access to the local Hive metastore

  • It lets you set a known password that controls access to external Hive metastores backed by external databases shared among different clusters

Set the Hive metastore password

Run the following Google Cloud CLI gcloud dataproc clusters create command to create a Dataproc cluster and specify a Hive metastore password.

gcloud dataproc clusters create CLUSTER_NAME \
    --properties="hive:javax.jdo.option.ConnectionPassword=HIVE_METASTORE_PASSWORD"
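As a fuller sketch, the password can be supplied through an environment variable so it does not appear directly in shell history or scripts. The cluster name, region, and variable value below are illustrative assumptions, not values from this page:

```shell
# Illustrative values: the cluster name, region, and password are assumptions.
# Reading the password from an environment variable keeps it out of shell
# history and version-controlled scripts.
export HIVE_METASTORE_PASSWORD='a-strong-password'

gcloud dataproc clusters create example-hive-cluster \
    --region=us-central1 \
    --properties="hive:javax.jdo.option.ConnectionPassword=${HIVE_METASTORE_PASSWORD}"
```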

Notes:

  • To create a key in Cloud Key Management Service, see Create a key.
  • The Hive metastore password is stored only on cluster master node(s), not on worker nodes.
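Because the password is written to hive-site.xml on the master node, one way to confirm the setting is to SSH to the master and inspect that file. A sketch, assuming an illustrative cluster name and zone, and the standard /etc/hive/conf location for Hive configuration:

```shell
# Assumes an illustrative cluster named example-hive-cluster with its master
# node in us-central1-a; /etc/hive/conf/hive-site.xml is the usual location
# of the Hive configuration on a Dataproc master node.
gcloud compute ssh example-hive-cluster-m \
    --zone=us-central1-a \
    --command="sudo grep -A 1 'javax.jdo.option.ConnectionPassword' /etc/hive/conf/hive-site.xml"
```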

For additional information on securing Dataproc clusters, see Dataproc security best practices.

Unsupported scenarios

Dataproc does not support the following Hive metastore scenarios, regardless of whether you use the default or a user-supplied Hive metastore password:

  • You use an embedded metastore client in the Spark driver running in cluster mode, which requires Hive passwords on the worker nodes. This scenario can cause connectivity problems with the metastore database because the connection is not made through the HiveMetaStore process running on the Dataproc master node.

  • You deactivate the Hive metastore and hive-server2 services in order to use your own MySQL database. In this scenario, the spark.hadoop.javax.jdo.option.ConnectionURL=jdbc:mysql://CLUSTER_NAME-m/metastore property has no effect.
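By contrast, the supported pattern is to reach the metastore through the HiveMetaStore process on the master node. A hedged sketch of submitting a Spark job that points at that service; the thrift URI scheme and port 9083 are standard Hive defaults rather than values stated on this page, and the class and JAR names are hypothetical:

```shell
# CLUSTER_NAME is a placeholder; port 9083 is Hive's default metastore thrift
# port (an assumption about the cluster's configuration). The job class and
# JAR path are hypothetical examples.
gcloud dataproc jobs submit spark \
    --cluster=CLUSTER_NAME \
    --region=us-central1 \
    --class=org.example.MySparkJob \
    --jars=gs://my-bucket/my-spark-job.jar \
    --properties="spark.hadoop.hive.metastore.uris=thrift://CLUSTER_NAME-m:9083"
```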