Serverless for Apache Spark staging buckets

This document provides information about Serverless for Apache Spark staging buckets. Serverless for Apache Spark creates a Cloud Storage staging bucket in your project, or reuses an existing staging bucket from a previous batch creation request. This is the same default staging bucket that Dataproc on Compute Engine clusters create. For more information, see Dataproc staging and temp buckets.
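If you prefer to manage the bucket yourself, you can point a batch at an existing staging bucket instead of letting Serverless for Apache Spark create one. The following sketch uses the google-cloud-dataproc Python client and the ExecutionConfig.staging_bucket field of the Batches API; the project ID, region, bucket names, script URI, and batch ID are placeholder values.

```python
from google.cloud import dataproc_v1

# Placeholder values -- replace with your own project, region, and buckets.
project_id = "my-project"
region = "us-central1"
staging_bucket = "my-existing-staging-bucket"

# The Batches API is regional, so point the client at the regional endpoint.
client = dataproc_v1.BatchControllerClient(
    client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
)

batch = dataproc_v1.Batch()
batch.pyspark_batch.main_python_file_uri = "gs://my-bucket/job.py"  # placeholder
# Stage dependencies, config files, and driver output in an existing bucket
# instead of the bucket Serverless for Apache Spark would otherwise create.
batch.environment_config.execution_config.staging_bucket = staging_bucket

operation = client.create_batch(
    parent=f"projects/{project_id}/locations/{region}",
    batch=batch,
    batch_id="example-batch",  # placeholder
)
result = operation.result()  # blocks until the batch finishes
print(f"Batch state: {result.state.name}")
```

Specifying your own staging bucket keeps workload artifacts in a bucket whose location, retention, and lifecycle settings you control.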

Serverless for Apache Spark stores workload dependencies, config files, and job driver console output in the staging bucket.
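For example, the job driver console output of a batch is written under the staging bucket, and the batch resource reports its exact location through the runtime_info.output_uri field. A minimal sketch, assuming a hypothetical project, region, and batch ID:

```python
from google.cloud import dataproc_v1

project_id = "my-project"   # placeholder
region = "us-central1"      # placeholder
batch_id = "example-batch"  # placeholder

client = dataproc_v1.BatchControllerClient(
    client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
)

batch = client.get_batch(
    name=f"projects/{project_id}/locations/{region}/batches/{batch_id}"
)

# output_uri points to the stdout and stderr of the workload driver,
# which live in the staging bucket.
print(batch.runtime_info.output_uri)
```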

Serverless for Apache Spark creates regional staging buckets in the Cloud Storage location that corresponds to the Compute Engine region where your workload is deployed, and then manages these project-level, per-location buckets. Staging buckets created by Serverless for Apache Spark are shared among workloads in the same region, and are created with a Cloud Storage soft delete retention duration set to 0 seconds.
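You can verify the soft delete setting on a bucket that Serverless for Apache Spark created. A minimal sketch with the google-cloud-storage Python client, assuming a recent client version that exposes bucket soft delete policies (roughly 2.16 or later); the bucket name is a made-up placeholder, so look up the actual name in your project first.

```python
from google.cloud import storage

# Placeholder name; staging buckets follow a generated naming pattern.
bucket_name = "dataproc-staging-us-central1-1234567890-abcdefgh"

client = storage.Client()
bucket = client.get_bucket(bucket_name)

policy = bucket.soft_delete_policy
# A retention duration of 0 seconds means deleted objects are removed
# immediately rather than held for soft delete recovery.
print(f"Soft delete retention: {policy.retention_duration_seconds} seconds")
```

Disabling soft delete on these buckets avoids soft delete storage charges on ephemeral workload data, at the cost of making deleted staging objects unrecoverable.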