Secret Manager Credential Provider

This document describes how to use Secret Manager as a credential store with Dataproc Serverless to safely store and access sensitive data processed by serverless workloads.

Overview

Secret Manager safeguards sensitive data such as API keys, passwords, and certificates. You can use it to manage, access, and audit secrets across Google Cloud.

When you run a Dataproc Serverless batch workload, you can configure it to use a Secret Manager secret by using the Dataproc Secret Manager Credential Provider.

Availability

This feature is available for Dataproc Serverless for Spark runtime versions 1.2.29+, 2.2.29+, or later major runtime versions.

Terminology

The following table describes the terms used in this document.

Term        Description
----------  -----------
Secret      A Secret Manager secret is a global project object that contains a collection of metadata and secret versions. You can store, manage, and access secrets as binary blobs or text strings.
Credential  In Hadoop and other Dataproc workloads, a credential consists of a credential name (ID) and credential value (password). A credential ID and value map to a Secret Manager secret ID and secret value (secret version).
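The following bash sketch makes this mapping concrete by storing one credential as a secret: the secret ID is the credential name, and the secret's first version holds the credential value. It relies on the gcloud secrets create and gcloud secrets versions add commands. The project ID, secret name, value, and the DRY_RUN switch are hypothetical illustrations; with DRY_RUN=1 the script only prints the gcloud commands (without the secret value) instead of running them.

```bash
#!/usr/bin/env bash
# Sketch: store one Hadoop-style credential as a Secret Manager secret.
# The secret ID is the credential name; the first version holds the value.
# Hypothetical: the DRY_RUN switch (DRY_RUN=1 prints commands instead of running them).
set -euo pipefail

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# store_credential PROJECT_ID CREDENTIAL_NAME CREDENTIAL_VALUE
store_credential() {
  local project="$1" name="$2" value="$3"
  # Create the secret container (credential ID).
  run gcloud secrets create "$name" --project="$project" --replication-policy=automatic
  # Add the value as a new secret version, read from stdin so it never
  # appears on the gcloud command line.
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    printf '%s\n' "gcloud secrets versions add $name --project=$project --data-file=-"
  else
    printf '%s' "$value" | gcloud secrets versions add "$name" --project="$project" --data-file=-
  fi
}
```

For example, DRY_RUN=1 store_credential my-project db-password 's3cr3t' prints the two commands; without DRY_RUN, the same call creates the secret db-password in the hypothetical project my-project and stores the value as its first version.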

Usage

You can configure supported Hadoop and other OSS components to work with Secret Manager by setting the following properties when you submit a Dataproc Serverless workload:

  • Provider path (required): The provider path property, hadoop.security.credential.provider.path, is a comma-separated list of one or more credential provider URIs that is traversed to resolve a credential.

    --properties=hadoop.security.credential.provider.path=gsm://projects/PROJECT_ID
    
    • The scheme in the provider path indicates the credential provider type. Hadoop schemes include jceks://, user://, and localjceks://. Use the gsm:// scheme to search for credentials in Secret Manager.
  • Substitute dot operator: Secret Manager does not allow dots (.) in secret names, but some open source software (OSS) components use dots in their credential keys. To work around this limitation, enable this property to replace dots (.) with hyphens (-) in credential names. This lets OSS credentials whose names contain dots be stored in, and retrieved from, Secret Manager.

    For example, if an OSS credential key is a.b.c, you must store it in Secret Manager as a-b-c.

    --properties=hadoop.security.credstore.google-secret-manager.secret-id.substitute-dot-operator=true
    

    This property is optional and defaults to false. If none of your credential keys contain a dot (.), you can leave it unset.

  • Secret version: A Secret Manager secret can have multiple versions (values). Use this property to pin a specific secret version so that production workloads read a stable value.

    --properties=hadoop.security.credstore.google-secret-manager.secret-version=1
    

    This property is optional. By default, the LATEST version is accessed, which resolves to the most recent value of the secret at runtime. If you always want the latest version of a secret, you can leave this property unset.
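The following bash sketch ties the three properties together. The to_secret_id function shows the dot-to-hyphen naming rule for secrets, and build_properties assembles a value for the --properties flag. Because gcloud normally splits --properties on commas, the sketch starts the value with gcloud's ^#^ escape so that # becomes the delimiter and a comma-separated provider path stays intact. The function names, the prefix argument, and the choice to always enable substitute-dot-operator are assumptions for illustration; the spark.hive. prefix matches the batch example in this document, and other components may need a different prefix.

```bash
#!/usr/bin/env bash
# Sketch only: helpers for the provider path, substitute-dot-operator,
# and secret-version properties. Function names are hypothetical.
set -euo pipefail

# Secret Manager does not allow dots in secret IDs, so an OSS credential key
# such as a.b.c is stored as a-b-c. ${1//./-} replaces every dot.
to_secret_id() {
  printf '%s\n' "${1//./-}"
}

# build_properties PREFIX PROVIDER_PATH [SECRET_VERSION]
#   PREFIX          property prefix, for example spark.hive. (may be empty)
#   PROVIDER_PATH   one or more comma-separated credential provider URIs
#   SECRET_VERSION  optional version to pin; omit it to use the latest version
# The leading ^#^ tells gcloud to split entries on '#' instead of ','.
build_properties() {
  local prefix="$1" provider_path="$2" version="${3:-}"
  local base="${prefix}hadoop.security"
  local gsm="${base}.credstore.google-secret-manager"
  local props="^#^${base}.credential.provider.path=${provider_path}"
  # Assumption: always enable dot substitution; it is harmless for dot-free keys.
  props+="#${gsm}.secret-id.substitute-dot-operator=true"
  if [[ -n "$version" ]]; then
    props+="#${gsm}.secret-version=${version}"
  fi
  printf '%s\n' "$props"
}
```

For example, you might run props="$(build_properties spark.hive. gsm://projects/PROJECT_ID 1)" and then pass --properties="$props" to gcloud dataproc batches submit spark. Omitting the third argument leaves the secret version unpinned, so the latest version is used.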

Run a batch workload with Secret Manager Credential Provider

To submit a batch workload that uses Secret Manager Credential Provider, run the following command locally or in Cloud Shell.

gcloud dataproc batches submit spark \
    --region=REGION \
    --jars=JARS \
    --class=MAIN_CLASS \
    --properties="spark.hive.hadoop.security.credential.provider.path=gsm://projects/PROJECT_ID,spark.hive.hadoop.security.credstore.google-secret-manager.secret-id.substitute-dot-operator=true" \
    ...other flags as needed...

Replace the following: