Google Cloud Serverless for Apache Spark components
The Serverless for Apache Spark components let you run Apache Spark batch
workloads from a pipeline within Vertex AI Pipelines.
Serverless for Apache Spark runs the batch workloads on a managed compute
infrastructure, autoscaling resources as needed.

Learn more about [Google Cloud Serverless for Apache Spark](/dataproc-serverless/docs/overview) and [supported Spark workloads](/dataproc-serverless/docs/overview#for_spark_workload_capabilities).

In Serverless for Apache Spark, a `Batch` resource represents a batch workload.
The Google Cloud Pipeline Components SDK includes the following operators to
create `Batch` resources and monitor their execution:

- [`DataprocPySparkBatchOp`](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-2.19.0/api/v1/dataproc.html#v1.dataproc.DataprocPySparkBatchOp)
- [`DataprocSparkBatchOp`](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-2.19.0/api/v1/dataproc.html#v1.dataproc.DataprocSparkBatchOp)
- [`DataprocSparkRBatchOp`](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-2.19.0/api/v1/dataproc.html#v1.dataproc.DataprocSparkRBatchOp)
- [`DataprocSparkSqlBatchOp`](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-2.19.0/api/v1/dataproc.html#v1.dataproc.DataprocSparkSqlBatchOp)
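The following Python sketch shows one way to use `DataprocPySparkBatchOp` in a Kubeflow Pipelines (KFP) pipeline definition and compile it for Vertex AI Pipelines. It is a minimal illustration, not the official sample: the project ID, region, and Cloud Storage URI are placeholders, and only a small subset of the operator's parameters is shown. See the component reference later on this page for the full parameter list.

```python
# Minimal sketch: run a PySpark batch workload from a Vertex AI pipeline.
# PROJECT_ID, LOCATION, and MAIN_PYTHON_FILE are placeholder values.
from google_cloud_pipeline_components.v1.dataproc import DataprocPySparkBatchOp
from kfp import compiler, dsl

PROJECT_ID = "my-project"                                 # placeholder project ID
LOCATION = "us-central1"                                  # placeholder region
MAIN_PYTHON_FILE = "gs://my-bucket/spark/word_count.py"   # placeholder PySpark script URI


@dsl.pipeline(name="pyspark-batch-pipeline")
def pyspark_batch_pipeline(batch_id: str):
    # Creates a Serverless for Apache Spark Batch resource and waits for the
    # batch workload to finish before the pipeline step completes.
    DataprocPySparkBatchOp(
        project=PROJECT_ID,
        location=LOCATION,
        batch_id=batch_id,
        main_python_file_uri=MAIN_PYTHON_FILE,
    )


if __name__ == "__main__":
    # Compile the pipeline to a spec that you can submit to Vertex AI Pipelines.
    compiler.Compiler().compile(
        pipeline_func=pyspark_batch_pipeline,
        package_path="pyspark_batch_pipeline.json",
    )
```

You can then submit the compiled `pyspark_batch_pipeline.json` specification to Vertex AI Pipelines, for example with the `PipelineJob` class in the Vertex AI SDK for Python, passing a unique `batch_id` as a parameter value.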
[[["Easy to understand","easyToUnderstand","thumb-up"],["Solved my problem","solvedMyProblem","thumb-up"],["Other","otherUp","thumb-up"]],[["Hard to understand","hardToUnderstand","thumb-down"],["Incorrect information or sample code","incorrectInformationOrSampleCode","thumb-down"],["Missing the information/samples I need","missingTheInformationSamplesINeed","thumb-down"],["Other","otherDown","thumb-down"]],["Last updated 2025-08-29 UTC."],[],[],null,["The Serverless for Apache Spark components let you run Apache Spark batch\nworkloads from a pipeline within Vertex AI Pipelines.\nServerless for Apache Spark runs the batch workloads on a managed compute\ninfrastructure, autoscaling resources as needed.\n\nLearn more about [Google Cloud Serverless for Apache Spark](/dataproc-serverless/docs/overview) and [supported Spark workloads](/dataproc-serverless/docs/overview#for_spark_workload_capabilities).\n\nIn Serverless for Apache Spark, a `Batch` resource represents a batch workload.\nThe Google Cloud SDK includes the following operators to\ncreate `Batch` resources and monitor their execution:\n\n\n- [`DataprocPySparkBatchOp`](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-2.19.0/api/v1/dataproc.html#v1.dataproc.DataprocPySparkBatchOp)\n- [`DataprocSparkBatchOp`](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-2.19.0/api/v1/dataproc.html#v1.dataproc.DataprocSparkBatchOp)\n- [`DataprocSparkRBatchOp`](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-2.19.0/api/v1/dataproc.html#v1.dataproc.DataprocSparkRBatchOp)\n- [`DataprocSparkSqlBatchOp`](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-2.19.0/api/v1/dataproc.html#v1.dataproc.DataprocSparkSqlBatchOp)\n\n\u003cbr /\u003e\n\nAPI reference\n\n- For component reference, see the\n [Google Cloud SDK reference for Google Cloud Serverless for Apache Spark components](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-2.19.0/api/v1/dataproc.html) .\n\n- For Serverless for Apache Spark resource reference, see the following API\n reference page:\n\n - [`Batch`](/dataproc-serverless/docs/reference/rest/v1/projects.locations.batches#resource:-batch) resource\n\nTutorials\n\n- [Get started with Google Cloud Serverless for Apache Spark pipeline components](https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/community/ml_ops/stage3/get_started_with_dataproc_serverless_pipeline_components.ipynb)\n\nVersion history and release notes\n\nTo learn more about the version history and changes to the Google Cloud Pipeline Components SDK, see the [Google Cloud Pipeline Components SDK Release Notes](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-2.19.0/release.html).\n\nTechnical support contacts\n\nIf you have any questions, reach out to\n[kfp-dataproc-components@google.com](mailto: kfp-dataproc-components@google.com)."]]