Serverless for Apache Spark network configuration
This document describes the network configuration requirements for
Google Cloud Serverless for Apache Spark.
Virtual Private Cloud subnetwork requirements
This section explains the Virtual Private Cloud (VPC) subnetwork requirements for
Serverless for Apache Spark batch workloads and interactive sessions.
Private Google Access
Serverless for Apache Spark batch workloads and interactive sessions
run on VMs that have internal IP addresses only. The VMs run on a regional
subnet, and Private Google Access (PGA) is automatically enabled on that subnet.
If you don't specify a subnet, Serverless for Apache Spark uses the
default subnet in the region of the batch workload or session.
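As an illustration of the subnet behavior described above, the following sketch builds two Google Cloud CLI commands: one that reports whether Private Google Access is enabled on a subnet, and one that submits a PySpark batch workload on that subnet. The region, subnet name, and Cloud Storage path are hypothetical placeholders, not values from this document. The script only prints the commands so that you can review them before you run them yourself.

```shell
# Hypothetical placeholder values; replace them with your own.
REGION="us-central1"
SUBNET="my-spark-subnet"
MAIN_FILE="gs://my-bucket/my_job.py"

# Command that reports whether Private Google Access is enabled on the subnet.
describe_cmd="gcloud compute networks subnets describe $SUBNET --region=$REGION --format='get(privateIpGoogleAccess)'"

# Command that submits a PySpark batch workload on that subnet.
submit_cmd="gcloud dataproc batches submit pyspark $MAIN_FILE --region=$REGION --subnet=$SUBNET"

# Print the commands for review instead of running them.
printf '%s\n' "$describe_cmd" "$submit_cmd"
```

If you omit the `--subnet` flag from the second command, the workload runs on the default subnet in the region, as described above.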
If your workload requires external network or internet
access, for example to download resources such as ML models from
PyTorch Hub or Hugging Face,
you can set up Cloud NAT to allow outbound traffic
from VMs that use internal IP addresses on your VPC network.
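A Cloud NAT setup of this kind could look roughly like the following sketch, which creates a Cloud Router and then a NAT gateway on it. The network, region, router, and gateway names are hypothetical placeholders, and the flag choices (automatically allocated external IP addresses, NAT for all subnet ranges) are assumptions that might not match your environment. The script only prints the commands so that you can review them before you run them yourself.

```shell
# Hypothetical placeholder values; replace them with your own.
REGION="us-central1"
NETWORK="my-vpc-network"
ROUTER="my-nat-router"
NAT_GATEWAY="my-nat-gateway"

# Command that creates a Cloud Router in the workload region.
router_cmd="gcloud compute routers create $ROUTER --network=$NETWORK --region=$REGION"

# Command that creates a Cloud NAT gateway on that router.
# Assumption: NAT applies to all subnet IP ranges in the region and uses
# automatically allocated external IP addresses.
nat_cmd="gcloud compute routers nats create $NAT_GATEWAY --router=$ROUTER --region=$REGION --auto-allocate-nat-external-ips --nat-all-subnet-ip-ranges"

# Print the commands for review instead of running them.
printf '%s\n' "$router_cmd" "$nat_cmd"
```

To restrict NAT to the subnet that your Spark workloads use, you would replace `--nat-all-subnet-ip-ranges` with a narrower option. See the Cloud NAT documentation for the available choices.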
Open subnet connectivity
The VPC subnet for the region selected for the
Serverless for Apache Spark batch workload or interactive session must
allow internal subnet communication on all ports between VM instances.
Note: To prevent malicious scripts in one workload from affecting other workloads, Serverless for Apache Spark deploys [default security measures](/dataproc-serverless/docs/concepts/security).

The following Google Cloud CLI command creates a firewall rule on the VPC
network that allows internal ingress communication among the VMs in the subnet,
using all protocols on all ports:

```
gcloud compute firewall-rules create allow-internal-ingress \
    --network=NETWORK_NAME \
    --source-ranges=SUBNET_RANGES \
    --destination-ranges=SUBNET_RANGES \
    --direction=ingress \
    --action=allow \
    --rules=all
```

Notes:

- NETWORK_NAME: The name of the VPC network that contains the subnet.

- SUBNET_RANGES: See
  [Allow internal ingress connections between VMs](/firewall/docs/using-firewalls#common-use-cases-allow-internal).
  The `default` VPC network in a project with the
  `default-allow-internal` firewall rule, which allows ingress communication on
  all ports (protocols:ports `tcp:0-65535`, `udp:0-65535`, and `icmp`),
  meets the open-subnet-connectivity requirement. However, this rule also allows
  ingress by any VM instance on the network.

  Use network tags to limit connectivity. In production, the recommended
  practice is to limit firewall rules to the IP addresses used by your Spark workloads.

Serverless for Apache Spark and VPC-SC networks
With [VPC Service Controls](/vpc-service-controls/docs),
network administrators can define a security perimeter around resources of
Google-managed services to control communication to and between those services.
Note the following strategies when using VPC-SC
networks with Serverless for Apache Spark:

- [Set up private connectivity](/vpc-service-controls/docs/set-up-private-connectivity).

- Create a [custom container image](/dataproc-serverless/docs/guides/custom-containers)
  that pre-installs dependencies outside the VPC-SC perimeter,
  and then [submit a Spark batch workload](/dataproc-serverless/docs/guides/custom-containers#submit_a_spark_batch_workload_using_a_custom_container_image)
  that uses your custom container image.

For more information, see
[VPC Service Controls: Serverless for Apache Spark](/vpc-service-controls/docs/supported-products#table_dataproc_serverless).

Last updated 2025-08-25 UTC.