Last updated (UTC): 2025-08-25.

# Serverless for Apache Spark FAQ

This page answers frequently asked questions about Google Cloud Serverless for Apache Spark.

### When should I use Serverless for Apache Spark instead of Dataproc on Compute Engine?

- Serverless for Apache Spark:

  - Supports Spark batch workloads and interactive sessions in PySpark kernel Jupyter notebooks.
  - Creates and manages your workload and
interactive session infrastructure for you.

- Dataproc on Compute Engine:

  - Supports the submission of different types of Spark jobs, and jobs based on other open source components, such as Flink, Hadoop, Hive, Pig, Presto, and others.
  - Does not create and manage infrastructure for you. You create and manage your Dataproc clusters.

### What can I do with Serverless for Apache Spark?

- [Run batch jobs](/dataproc-serverless/docs/quickstarts/spark-batch).
- [Use the Dataproc JupyterLab plugin for serverless batch and interactive notebook sessions](/dataproc-serverless/docs/quickstarts/jupyterlab-sessions).
- Run streaming jobs using Spark streaming libraries. Note: Streaming is not a managed service, so you must manage checkpointing and restarts.
- Train models using Spark MLlib.
- Use interactive SQL notebooks for data exploration, graph, time series, and geospatial analytics.
- Orchestrate Serverless for Apache Spark workloads with Cloud Composer, a managed Apache Airflow service.

### How should I set up a workload execution plan?

You can run workloads concurrently or sequentially. Your execution plan impacts your Google Cloud resource quota. You can run as many workloads in parallel as your [batch resource](/dataproc-serverless/quotas#default_batch_resources) quotas allow.

### Can I use a custom image with Serverless for Apache Spark?

Yes. You can use a custom container image instead of the default container image. See [Use custom containers with Serverless for Apache Spark](/dataproc-serverless/docs/guides/custom-containers).

### Can I specify memory and disk resources for Serverless for Apache Spark workloads?

Yes.
You can specify premium executor and driver compute and disk tiers, and the amount of driver and executor compute and disk resources to allocate, when you submit a workload (see [Resource allocation properties](/dataproc-serverless/docs/concepts/properties#resource_allocation_properties)).

### How can I specify the IP address range for my Serverless for Apache Spark VPC network?

Serverless for Apache Spark workloads run within your environment. Each Spark driver and Spark executor in a Serverless Spark workload consumes one internal IP address in your [Serverless for Apache Spark VPC network](/dataproc-serverless/docs/concepts/network). `/16` is a typical user-specified [CIDR](https://en.wikipedia.org/wiki/Classless_Inter-Domain_Routing) address range for a [Serverless for Apache Spark VPC network](/dataproc-serverless/docs/concepts/network). You can limit your network's IP address range based on the number of concurrent workloads you plan to run.

### Does Serverless for Apache Spark support data residency?

Yes. You specify the region where your workload is processed. Locate your input and output datasets in the specified region.

### How does Serverless for Apache Spark select a zone within your specified region to run the workload?

Serverless for Apache Spark selects the Compute Engine zone where it executes a workload based on capacity and availability. If a zone becomes unavailable after a workload starts, the workload fails, and you must resubmit the failed workload.

### How do Serverless for Apache Spark workloads use compute resources?

Each workload executes on its own compute resources.
Multiple batch submissions don't share or reuse compute resources.

**Best practices:**

- Optimize your workload for medium-running jobs, not short-running jobs.
- Persist data that is accessed by multiple workloads in Cloud Storage.

### Where can I find information on Serverless for Apache Spark announcements, features, bug fixes, known issues, and deprecations?

See the [Serverless for Apache Spark release notes](/dataproc-serverless/docs/release-notes).

### Do concurrent workloads compete for resources?

Serverless for Apache Spark workloads compete for resources only if your resource quota is insufficient to run all concurrently running workloads. Otherwise, workloads are fully isolated from each other.

### How is Serverless for Apache Spark quota allocated?

Serverless for Apache Spark batches consume Google Cloud resources. See [Dataproc Serverless quotas](/dataproc-serverless/quotas) for more information.

### Do I need to set up a Dataproc Persistent History Server?

Setting up a [Persistent History Server (PHS)](/dataproc/docs/concepts/jobs/history-server) to use with Serverless for Apache Spark is optional. You can use the PHS to view Spark event and other logs in a specified Cloud Storage bucket both during and after the standard 90-day retention (TTL) period of the [Serverless for Apache Spark staging and temp buckets](/dataproc-serverless/docs/concepts/buckets).

**Note:** The PHS must be located in the region where you run batch workloads.

### What Serverless for Apache Spark logs are available?

Spark executor and driver logs are available in Cloud Logging during and after Spark workload
execution. Also, Spark applications are visible in the [Persistent History Server (PHS)](/dataproc/docs/concepts/jobs/history-server) web interface while the workload is running (select **PHS** > **Incomplete Applications** in the PHS UI).

If you set up a Dataproc PHS, it provides persistent access to Spark event logs saved in Cloud Storage, which provide insight into Spark app execution, such as DAG and executor events.

**Note:** The PHS must be located in the region where you run batch workloads.

### Can I set the number of executors for my Spark workload?

Yes. You can set the number of executors for a Spark workload using the [`spark.executor.instances`](/dataproc-serverless/docs/concepts/properties#resource_allocation_properties) property. However, the total number of cores that a workload can use matters more than the number of executors, because Spark runs one task per core. For example, a workload with four executors of two cores each runs `4 * 2 = 8` tasks at a time, and a workload with two executors of four cores each also runs eight tasks at a time, because the total core count is the same.
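The core arithmetic can be sketched directly (a minimal illustration of the task-parallelism rule, not a product-specific calculation):

```shell
# Spark runs one task per core, so peak task parallelism is
# executors * cores, regardless of how the cores are grouped.
tasks_a=$((4 * 2))   # four executors with two cores each
tasks_b=$((2 * 4))   # two executors with four cores each
echo "$tasks_a"      # 8
echo "$tasks_b"      # 8
```

When you submit a batch, these two values correspond to the `spark.executor.instances` and `spark.executor.cores` properties discussed in this answer.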
You can use the [`spark.executor.cores`](/dataproc-serverless/docs/concepts/properties#resource_allocation_properties) property to set the number of cores per executor for your Serverless for Apache Spark workload.

### What Spark metrics does Serverless for Apache Spark use for autoscaling?

Serverless for Apache Spark looks at Spark's `maximum-needed` and `running` dynamic allocation metrics to determine whether to scale up or down. See [Serverless for Apache Spark autoscaling](/dataproc-serverless/docs/concepts/autoscaling).

### Can I configure Serverless for Apache Spark autoscaling behavior using Spark properties?

Yes. Serverless for Apache Spark autoscaling is based on Spark dynamic allocation and is enabled by default. You can adjust the following [Spark properties](/dataproc-serverless/docs/concepts/properties#supported_spark_properties) and [Spark dynamic allocation properties](/dataproc-serverless/docs/concepts/autoscaling#spark_dynamic_allocation_properties):

- `spark.executor.instances`
- `spark.dynamicAllocation.initialExecutors`
- `spark.dynamicAllocation.minExecutors`
- `spark.dynamicAllocation.maxExecutors`

### Why do I need to package my code in a JAR file to submit my Spark workload?

Spark is written in Scala, which means that both the driver and the worker processes operate as JVM processes. In JVM languages, the JAR file is the primary way to package code. You pass the JAR file to Serverless for Apache Spark when you submit a workload.
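As a hedged sketch of that flow, the steps below package a JVM Spark application and submit the resulting JAR as a batch workload. The project layout, artifact path, bucket name, region, and class name are all placeholders, and the example assumes you build with sbt (Maven's `mvn package` works the same way) and stage the JAR in Cloud Storage:

```shell
# Build the Scala/Java Spark application into a JAR.
sbt package

# Stage the JAR in Cloud Storage (artifact path is a placeholder).
gsutil cp target/scala-2.13/my-spark-app_2.13-0.1.jar \
    gs://your-bucket/my-spark-app.jar

# Submit the JAR as a Serverless for Apache Spark batch workload.
gcloud dataproc batches submit spark \
    --region=us-central1 \
    --jars=gs://your-bucket/my-spark-app.jar \
    --class=com.example.MySparkApp
```

PySpark workloads skip the packaging step; you submit the `.py` file directly with `gcloud dataproc batches submit pyspark`.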