Use CMEK with Google Cloud Serverless for Apache Spark

By default, Google Cloud Serverless for Apache Spark encrypts customer content at rest. Serverless for Apache Spark handles encryption for you without any additional actions on your part. This option is called Google default encryption.

If you want to control your encryption keys, then you can use customer-managed encryption keys (CMEKs) in Cloud KMS with CMEK-integrated services including Serverless for Apache Spark. Using Cloud KMS keys gives you control over their protection level, location, rotation schedule, usage and access permissions, and cryptographic boundaries. Using Cloud KMS also lets you track key usage, view audit logs, and control key lifecycles. Instead of Google owning and managing the symmetric key encryption keys (KEKs) that protect your data, you control and manage these keys in Cloud KMS.

After you set up your resources with CMEKs, the experience of accessing your Serverless for Apache Spark resources is similar to using Google default encryption. For more information about your encryption options, see Customer-managed encryption keys (CMEK).

Use CMEK

Follow the steps in this section to use CMEK to encrypt data that Google Cloud Serverless for Apache Spark writes to persistent disk and to the Dataproc staging bucket.

Beginning April 23, 2024:

Serverless for Apache Spark also uses your CMEK to encrypt batch job arguments. To enable this behavior, the Cloud KMS CryptoKey Encrypter/Decrypter IAM role must be granted to the Dataproc Service Agent service account. If the Dataproc Service Agent role is not attached to the Dataproc Service Agent service account, then add the serviceusage.services.use permission to a custom role attached to that service account. The Cloud KMS API must also be enabled on the project that runs Serverless for Apache Spark resources.
batches.list returns an unreachable field that lists any batches with job arguments that couldn't be decrypted. You can issue batches.get requests to obtain more information on unreachable batches.
The key (CMEK) must be in the same location as the encrypted resource. For example, the CMEK used to encrypt a batch that runs in the us-central1 region must also be located in the us-central1 region.

Create a key using the Cloud Key Management Service (Cloud KMS).
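As a rough sketch of this step from the command line, the shell function below wraps the two `gcloud kms` commands that create a key ring and then a symmetric encryption key inside it. The function name `create_cmek` and every argument value are illustrative placeholders, not names from this guide. Choose the location to match the region where your workloads run.

```shell
# Hypothetical helper: create a key ring, then a symmetric encryption key in it.
# $1=KMS project ID, $2=location, $3=key ring name, $4=key name
create_cmek() {
  # The key is created only if the key ring command succeeds.
  gcloud kms keyrings create "$3" \
    --location "$2" \
    --project "$1" &&
  gcloud kms keys create "$4" \
    --keyring "$3" \
    --location "$2" \
    --purpose encryption \
    --project "$1"
}

# Example call (placeholder values):
#   create_cmek my-kms-project us-central1 my-key-ring my-key
```

If the key ring already exists, the first command fails and the function stops; in that case, run only the `gcloud kms keys create` command.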

Copy the resource name.

The resource name is constructed as follows:

projects/PROJECT_ID/locations/REGION/keyRings/KEY_RING_NAME/cryptoKeys/KEY_NAME
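To make the format concrete, here is a minimal shell sketch that assembles a resource name from its parts and checks that the key's location matches the region of the workload, as the location requirement above demands. The function names and all sample values are hypothetical; the sketch only manipulates strings and does not call any Google Cloud API.

```shell
# Build a Cloud KMS key resource name from its parts.
# $1=project ID, $2=region, $3=key ring name, $4=key name
kms_key_name() {
  printf 'projects/%s/locations/%s/keyRings/%s/cryptoKeys/%s\n' "$1" "$2" "$3" "$4"
}

# Extract the location segment from a key resource name.
kms_key_location() {
  # Drop everything up to and including ".../locations/",
  # then drop everything from the next "/" onward.
  rest="${1#*/locations/}"
  printf '%s\n' "${rest%%/*}"
}

# Succeed only if the key is in the same location as the workload region.
# $1=key resource name, $2=workload region
kms_key_matches_region() {
  [ "$(kms_key_location "$1")" = "$2" ]
}

# Example with placeholder values.
KEY="$(kms_key_name my-project us-central1 my-key-ring my-key)"
if kms_key_matches_region "$KEY" us-central1; then
  echo "key location matches workload region"
fi
```

Running a check like this before you submit a workload can catch a mismatched location early.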

Enable the Compute Engine, Dataproc, and Cloud Storage Service Agent service accounts to use your key:
1. See Protect resources by using Cloud KMS keys > Required Roles to assign the Cloud KMS CryptoKey Encrypter/Decrypter role to the Compute Engine Service Agent service account. If this service account is not listed on the IAM page in Google Cloud console, click Include Google-provided role grants to list it.
2. Assign the Cloud KMS CryptoKey Encrypter/Decrypter role to the Dataproc Service Agent service account. You can use the Google Cloud CLI to assign the role:
```
gcloud projects add-iam-policy-binding KMS_PROJECT_ID \
    --member serviceAccount:service-PROJECT_NUMBER@dataproc-accounts.iam.gserviceaccount.com \
    --role roles/cloudkms.cryptoKeyEncrypterDecrypter
```
  Replace the following:
  
  KMS_PROJECT_ID: the ID of your Google Cloud project that runs Cloud KMS. This project can also be the project that runs Dataproc resources.
  
  PROJECT_NUMBER: the project number (not the project ID) of your Google Cloud project that runs Dataproc resources.
3. Enable the Cloud KMS API on the project that runs Serverless for Apache Spark resources.
4. If the Dataproc Service Agent role is not attached to the Dataproc Service Agent service account, then add the serviceusage.services.use permission to the custom role attached to the Dataproc Service Agent service account. If the Dataproc Service Agent role is attached to the Dataproc Service Agent service account, you can skip this step.
5. Follow the Cloud Storage steps to add your key to the staging bucket.
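Steps 3 and 5 can also be done with the Google Cloud CLI. The sketch below is illustrative only: the function names and arguments are placeholders, and it assumes that "adding your key on the bucket" means setting the key as the bucket's default Cloud KMS key.

```shell
# Hypothetical helpers for steps 3 and 5; all arguments are placeholders.

# Step 3: enable the Cloud KMS API on the project that runs the workloads.
# $1=project ID
enable_kms_api() {
  gcloud services enable cloudkms.googleapis.com --project "$1"
}

# Step 5 (assumed interpretation): set the key as the bucket's default
# Cloud KMS key.
# $1=bucket name (no gs:// prefix), $2=key resource name
set_bucket_default_key() {
  gcloud storage buckets update "gs://$1" --default-encryption-key="$2"
}

# Example calls (placeholder values):
#   enable_kms_api my-project
#   set_bucket_default_key my-staging-bucket \
#     projects/my-project/locations/us-central1/keyRings/my-key-ring/cryptoKeys/my-key
```

The bucket update succeeds only if the Cloud Storage Service Agent service account is already allowed to use the key.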
When you submit a batch workload:
1. Specify your key in the batch kmsKey parameter.
2. Specify the name of your Cloud Storage bucket in the batch stagingBucket parameter.
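For a batch submitted with the Google Cloud CLI, the two settings might be passed as shown in this sketch. It assumes that the `--kms-key` and `--staging-bucket` flags of `gcloud dataproc batches submit` set the batch kmsKey and stagingBucket parameters; confirm this with `gcloud dataproc batches submit pyspark --help` for your CLI version. The function name and all arguments are placeholders.

```shell
# Hypothetical helper: submit a PySpark batch that uses a CMEK.
# Assumption: --kms-key and --staging-bucket correspond to the batch
# kmsKey and stagingBucket parameters.
# $1=region, $2=key resource name, $3=staging bucket name, $4=main Python file URI
submit_cmek_batch() {
  gcloud dataproc batches submit pyspark "$4" \
    --region "$1" \
    --kms-key "$2" \
    --staging-bucket "$3"
}

# Example call (placeholder values):
#   submit_cmek_batch us-central1 \
#     projects/my-project/locations/us-central1/keyRings/my-key-ring/cryptoKeys/my-key \
#     my-staging-bucket gs://my-code-bucket/job.py
```

The region and the key location in the example are the same, which the location requirement makes mandatory.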
When you create an interactive session or session template:
1. Specify your key in the session kmsKey parameter.
2. Specify the name of your Cloud Storage bucket in the session stagingBucket parameter.
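If you create the session or session template through the REST API instead, both parameters go in the request body. The sketch below prints a JSON fragment under the assumption that kmsKey and stagingBucket are nested under environmentConfig.executionConfig in the resource; the function name and sample values are hypothetical, and the function does no JSON escaping, so it suits only values without quotes or backslashes.

```shell
# Hypothetical helper: print the part of a request body that carries
# the CMEK settings.
# Assumption: both fields live under environmentConfig.executionConfig.
# $1=key resource name, $2=staging bucket name (a bare name, not a gs:// URI)
execution_config_json() {
  printf '{"environmentConfig":{"executionConfig":{"kmsKey":"%s","stagingBucket":"%s"}}}\n' "$1" "$2"
}

# Example with placeholder values.
execution_config_json \
  "projects/my-project/locations/us-central1/keyRings/my-key-ring/cryptoKeys/my-key" \
  "my-staging-bucket"
```

Merge the printed fragment with the remaining fields of your request before you send it.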