This document describes how to use Secret Manager as a credential store with Dataproc Serverless to safely store and access sensitive data processed by serverless workloads.
Overview
Secret Manager can safeguard your sensitive data, such as API keys, passwords, and certificates. You can use it to manage, access, and audit your secrets across Google Cloud.
When you run a Dataproc Serverless batch workload, you can configure it to use a Secret Manager secret by using the Dataproc Secret Manager Credential Provider.
Availability
This feature is available for Dataproc Serverless for Spark runtime versions 1.2.29+, 2.2.29+, or later major runtime versions.
Terminology
The following table describes the terms used in this document.
Term | Description |
---|---|
Secret | A Secret Manager secret is a global project object that contains a collection of metadata and secret versions. You can store, manage, and access secrets as binary blobs or text strings. |
Credential | In Hadoop and other Dataproc workloads, a credential consists of a credential name (ID) and credential value (password). A credential ID and value map to a Secret Manager secret ID and secret value (secret version). |
Usage
You can configure supported Hadoop and other OSS components to work with Secret Manager by setting the following properties when you submit a Dataproc Serverless workload:
Provider path (required): The provider path property, hadoop.security.credential.provider.path, is a comma-separated list of one or more credential provider URIs that is traversed to resolve a credential.
--properties=hadoop.security.credential.provider.path=gsm://projects/PROJECT_ID
- The scheme in the provider path indicates the credential provider type. Hadoop schemes include jceks://, user://, and localjceks://. Use the gsm:// scheme to search for credentials in Secret Manager.
Substitute dot operator: The Secret Manager service does not allow dots (.) in secret names. However, some open source software (OSS) components use dots in their credential keys. To work around this limitation, enable this property to replace dots (.) with hyphens (-) in credential names. This ensures that OSS credentials with dots in their names can be stored in and retrieved from Secret Manager correctly. For example, if an OSS credential key is a.b.c, you must name the secret a-b-c when you store it in Secret Manager.
--properties=hadoop.security.credstore.google-secret-manager.secret-id.substitute-dot-operator=true
This is an optional property. By default, the value is false. For credential keys that have no dot (.) in their names, this property can be safely ignored.
Secret version: Secrets in Secret Manager can have multiple versions (values). Use this property to access a specific secret version for stable access in production environments.
--properties=hadoop.security.credstore.google-secret-manager.secret-version=1
This is an optional property. By default, Secret Manager accesses the LATEST version, which resolves to the latest value of the secret at runtime. If your use case is to always access the LATEST version of a secret, this property can be safely ignored.
Run a batch workload with Secret Manager Credential Provider
To submit a batch workload that uses Secret Manager Credential Provider, run the following command locally or in Cloud Shell.
gcloud dataproc batches submit spark \
    --region=REGION \
    --jars=JARS \
    --class=MAIN_CLASS \
    --properties="spark.hive.hadoop.security.credential.provider.path=gsm://projects/PROJECT_ID,spark.hive.hadoop.security.credstore.google-secret-manager.secret-id.substitute-dot-operator=true" \
    ...other flags as needed...
Replace the following:
- REGION: a Compute Engine region where your workload runs
- JARS: the path to the workload JAR
- MAIN_CLASS: the main class in the JAR
- PROJECT_ID: your project ID, listed in the Project info section of the Google Cloud console dashboard
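When you script batch submissions, the --properties value can be assembled from variables instead of typed by hand. The following Bash sketch shows one way to do that. build_properties is a hypothetical helper, and adding the secret-version property with the same spark.hive. prefix used in the example command is an assumption, not a documented requirement.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the --properties value for the batch submit
# command. The spark.hive. prefix mirrors the example command; using it for
# secret-version as well is an assumption.
build_properties() {
  local project="$1" version="${2:-}"
  local prefix="spark.hive.hadoop.security"
  local props="${prefix}.credential.provider.path=gsm://projects/${project}"
  props+=",${prefix}.credstore.google-secret-manager.secret-id.substitute-dot-operator=true"
  # Pin a secret version only when one is given; otherwise LATEST is used.
  if [[ -n "$version" ]]; then
    props+=",${prefix}.credstore.google-secret-manager.secret-version=${version}"
  fi
  printf '%s\n' "$props"
}

build_properties my-project
```

The result could then be passed as --properties="$(build_properties PROJECT_ID)" to gcloud dataproc batches submit spark.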