This page provides guidance on configuring network connectivity for Dataproc clusters when using Private Service Connect. It explains the interaction between Private Service Connect and Virtual Private Cloud peering for different Dataproc use cases. It also summarizes feature similarities and differences among Private Google Access, Private Service Connect, and Cloud NAT.
Overview
Dataproc clusters require network connectivity to Google Cloud APIs and services, such as the Dataproc API, Cloud Storage, and Cloud Logging, and to user resources, such as data sources in other Virtual Private Cloud networks or on-premises environments.
By default, Dataproc clusters created with image versions 2.2 and later are created with internal IP addresses only. Dataproc automatically enables Private Google Access on the regional subnet used by the internal-IP-only cluster to enable connections to Google APIs and services without connecting to the public internet.
To provide more granular network control, you can configure a cluster to use Private Service Connect, which routes traffic to supported Google APIs and services through a private endpoint within your VPC network. This can be beneficial for security and compliance.
Common private networking options
This section describes Private Google Access, Private Service Connect, and Cloud NAT features and differences.
Private Google Access is a one-way path for VMs to reach Google public services without using the internet. It is similar to a special roadway exit from your neighborhood (VPC subnet) that leads directly to the Google services mall, bypassing public roads. Everyone in the neighborhood can use it. Dataproc automatically enables Private Google Access on the regional subnet used by Dataproc clusters created with image version 2.2 and later.
Private Service Connect creates a private, two-way endpoint for a service that is located within your VPC network. It is similar to a dedicated private path from your location (VPC network) directly into a service. It has an address at your location (an internal IP address in your VPC network) and only you can use it.
Cloud NAT allows VMs with private IP addresses to access the internet.
Features and differences
Feature | Private Google Access (PGA) | Private Service Connect (PSC) |
---|---|---|
How it works | Directs traffic from a VM to a special Google IP address range (private.googleapis.com). | Creates a forwarding rule (endpoint) inside your VPC network that represents the Google service. |
IP address | Your VM connects to a Google-owned IP address. | Your VM connects to an internal IP address that you own within your VPC network. |
Direction | Outbound only: Your VM initiates a connection to Google. | Bi-directional: Your VM connects to the service, and the service can initiate return traffic. |
Scope | Enabled or disabled for an entire subnet. | Deployed as a specific endpoint resource. |
Services | Connects only to Google APIs, such as Cloud Storage, BigQuery, or Dataproc API. | Connects to Google APIs, services from other companies, and your own services. |
For Dataproc, Private Google Access is the simpler, traditional method to allow cluster VMs to contact the Dataproc control plane. Private Service Connect is a more recent and flexible approach that gives you fine-grained control, particularly in complex or multi-tenant networks.
Why use Private Service Connect? Even if your Dataproc cluster has internal-only IP addresses with Private Google Access enabled (the default configuration for 2.2+ image version clusters), Private Service Connect offers the following advantages:
Instead of using the Private Google Access shared set of endpoints to connect to Google APIs and services, Private Service Connect lets you create a private endpoint with an internal IP address inside your VPC network that maps directly to a specific Google service.
You can create firewall rules that allow traffic only to the Private Service Connect endpoint IP address. For example, you can configure a rule that allows egress traffic from Dataproc cluster VMs exclusively to the internal IP address of the Private Service Connect endpoint for BigQuery, while denying all other egress traffic. This is a more secure approach than creating broader firewall rules with Private Google Access.
Using the Private Service Connect endpoint within your VPC network makes the network path explicit and easier to audit for security and compliance since traffic to a service such as Cloud Storage doesn't share a path with other API traffic.
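The firewall advantage described above can be illustrated with a small model. The following is a minimal Python sketch, not a real firewall configuration: the endpoint address 10.10.0.5, the rule priorities, and the names EgressRule and evaluate_egress are hypothetical. It mimics how VPC firewall rules are evaluated: the lowest priority number is checked first, a deny rule wins a priority tie, and egress is allowed by the implied rule when nothing matches.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class EgressRule:
    priority: int      # lower number is evaluated first
    action: str        # "allow" or "deny"
    destination: str   # destination CIDR range


def evaluate_egress(rules, dest_ip):
    """Return the action of the first matching rule for dest_ip."""
    addr = ipaddress.ip_address(dest_ip)
    # Sort by priority; at equal priority, deny rules sort before allow rules.
    ordered = sorted(rules, key=lambda r: (r.priority, r.action != "deny"))
    for rule in ordered:
        if addr in ipaddress.ip_network(rule.destination):
            return rule.action
    return "allow"  # implied allow-egress rule of a VPC network


# Hypothetical internal IP address of a Private Service Connect endpoint.
PSC_ENDPOINT_IP = "10.10.0.5"

RULES = [
    EgressRule(priority=1000, action="allow", destination=f"{PSC_ENDPOINT_IP}/32"),
    EgressRule(priority=65534, action="deny", destination="0.0.0.0/0"),
]

if __name__ == "__main__":
    for ip in (PSC_ENDPOINT_IP, "142.250.72.10"):
        print(ip, evaluate_egress(RULES, ip))
```

With these two rules, only traffic to the endpoint address is allowed. A real cluster also needs traffic between its own VMs to be allowed, which this model leaves out.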
Private and public paths
Private Google Access, Private Service Connect, and Cloud NAT allow hosts with RFC 1918 addresses to reach Google Cloud services. They also allow Google Cloud resources with private RFC 1918 addresses to initiate connections to Google Cloud services.
An important distinction to make when assessing different connection options is whether traffic using the connection remains private or travels over the public internet.
Private Google Access and Private Service Connect keep traffic within Google's private network. Data does not travel over the public internet to reach Google Cloud services, which is ideal for security and predictable performance.
Cloud NAT reaches a Google Cloud service by connecting to a public endpoint for the service. The traffic leaves your VPC network through the NAT gateway and travels over the internet.
How each option works
Here's a breakdown of each connection mechanism:
Method | Path to service | Destination endpoint | Primary use case |
---|---|---|---|
Private Google Access | Google private network | Special Google IP addresses (private.googleapis.com) | Simple, subnet-level access for VMs to reach Google APIs privately. |
Private Service Connect | Google private network | A private IP address endpoint inside your VPC network | Granular, secure access to Google APIs, third-party or your own services. |
Cloud NAT | Public internet | Service public IP address | General-purpose outbound internet access for VMs with private IP addresses. |
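As a rough way to tell which path a resolved address suggests, the destination endpoints in the table can be mapped to address ranges. The sketch below is a heuristic under stated assumptions, not an official tool: the function name classify_destination and its labels are hypothetical, 199.36.153.8/30 is the range documented for private.googleapis.com at the time of writing, and an RFC 1918 address is only treated as a possible Private Service Connect endpoint because an endpoint address is chosen by you and this code cannot confirm it.

```python
import ipaddress

# Range documented for private.googleapis.com (verify against current docs).
PRIVATE_GOOGLEAPIS = ipaddress.ip_network("199.36.153.8/30")

# RFC 1918 private address ranges.
RFC1918 = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def classify_destination(ip):
    """Guess which connection path a destination IP address suggests."""
    addr = ipaddress.ip_address(ip)
    if addr in PRIVATE_GOOGLEAPIS:
        return "private.googleapis.com"   # Private Google Access path
    if any(addr in net for net in RFC1918):
        return "internal"                 # possibly a PSC endpoint
    return "public"                       # reachable through Cloud NAT


if __name__ == "__main__":
    for ip in ("199.36.153.10", "10.10.0.5", "142.250.72.10"):
        print(ip, classify_destination(ip))
```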
Configure Private Service Connect
To use Private Service Connect with your Dataproc cluster, you must configure the necessary Private Service Connect endpoints and DNS in your VPC network for all Google APIs that Dataproc depends on. For instructions on setting up your subnet and configuring DNS, see About accessing Google APIs through endpoints.
Enable peering if needed
While Private Service Connect provides private access to many Google services, you may also need to enable VPC peering, particularly in the following scenarios:
Other Virtual Private Cloud networks: Private Service Connect connects to Google-managed services, not directly to other customer VPC networks. If your data sources, custom applications, or other services are located in a different VPC network than your Dataproc cluster, typically VPC peering is required to enable private communication between these networks.
On-premises networks: If your Dataproc cluster accesses data or services in your on-premises environment, you will need a Cloud VPN or Cloud Interconnect connection to your on-premises network, often combined with VPC peering.
Comprehensive internal communication to Google services: While Private Service Connect provides private access to configured Google services, such as Cloud Storage and BigQuery, internal control plane communications or specific Dataproc features might require VPC peering to a network with broad Google service accessibility to access underlying Google infrastructure or other Google APIs.
Access to data sources in other VPC networks: If your Dataproc jobs read from or write to data sources, such as Cloud SQL, self-managed databases, and custom applications, that are located in a different VPC network, you must establish VPC peering between your Dataproc cluster VPC network and the VPC network containing those data sources. Private Service Connect does not provide inter-VPC network communication between customer-owned networks.
Hybrid connectivity: For hybrid cloud deployments where Dataproc clusters need to interact with resources in an on-premises data center, connect your on-premises network to your Google Cloud VPC network using Cloud VPN or Cloud Interconnect. VPC peering is also needed when that hybrid connection terminates in a different VPC network than the one that contains the cluster.
Troubleshoot Private Service Connect
If your Dataproc cluster with Private Service Connect (without VPC peering) fails to create or has connectivity issues, use the following steps to help troubleshoot and resolve the problem:
Confirm required API access:
- Verify that all necessary Google APIs are enabled in your Google Cloud project.
Verify Private Service Connect endpoint configuration:
Verify that a Private Service Connect endpoint is correctly configured for all Google APIs that the cluster requires, such as dataproc.googleapis.com, storage.googleapis.com, logging.googleapis.com, bigquery.googleapis.com, and compute.googleapis.com.
Use tools such as dig or nslookup from a VM within the VPC subnet to confirm that the DNS records for required services correctly resolve to the private IP addresses within your VPC network using the Private Service Connect endpoint.
Check firewall rules:
Verify that firewall rules in your VPC network allow outbound connections from Dataproc cluster instances to Private Service Connect endpoints.
If using Shared VPC, verify that appropriate firewall rules are configured in the host project.
Examine Dataproc cluster logs:
- Review the cluster creation logs in Logging for any network-related errors, such as connection refused, timeout, or unreachable host. These errors can indicate a missing route or incorrect firewall rule.
- Examine the serial console logs of cluster instances.
Assess the need for VPC peering:
Based on workload dependencies, if your Dataproc cluster requires connectivity to resources that are not Google-managed, such as databases in a separate VPC network and on-premises servers, establish VPC peering.
Examine the network requirements of Google Cloud services that your Dataproc cluster interacts with. Some services may have specific peering requirements even when used with Private Service Connect.
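The DNS verification step above (normally done with dig or nslookup) can also be scripted. The following is a minimal Python sketch, assuming it runs on a VM inside the VPC subnet and that a single endpoint serves all listed APIs; the endpoint address 10.10.0.5 and the names resolve_ipv4 and check_endpoints are hypothetical. The hostname list repeats the examples from the troubleshooting steps and might not be complete for your workload.

```python
import socket

REQUIRED_APIS = [
    "dataproc.googleapis.com",
    "storage.googleapis.com",
    "logging.googleapis.com",
    "bigquery.googleapis.com",
    "compute.googleapis.com",
]


def resolve_ipv4(hostname):
    """Return the set of IPv4 addresses for hostname (empty on failure)."""
    try:
        infos = socket.getaddrinfo(
            hostname, 443, family=socket.AF_INET, type=socket.SOCK_STREAM
        )
    except socket.gaierror:
        return set()
    # Each entry is (family, type, proto, canonname, (address, port)).
    return {info[4][0] for info in infos}


def check_endpoints(endpoint_ip, hostnames=REQUIRED_APIS, resolver=resolve_ipv4):
    """Map each hostname to True if it resolves only to endpoint_ip."""
    return {name: resolver(name) == {endpoint_ip} for name in hostnames}


if __name__ == "__main__":
    # Hypothetical internal IP address of the Private Service Connect endpoint.
    for name, ok in check_endpoints("10.10.0.5").items():
        print(f"{name}: {'OK' if ok else 'does not resolve to the endpoint'}")
```

A hostname reported as not resolving to the endpoint usually points to a missing or incorrect DNS record rather than a firewall problem, so it is worth checking before the firewall rules. The resolver parameter exists only so that the comparison logic can be exercised without network access.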
Follow best practices
Comprehensive network architecture planning: Before deploying Dataproc with Private Service Connect, carefully design your network architecture, considering all implicit and explicit dependencies and data flow paths. This includes identifying all Google APIs that your Dataproc cluster interacts with during provisioning and operation.
Test connectivity: Thoroughly test network connectivity from your Dataproc cluster to all required services and data sources during the development and staging phases.
Use the Network Intelligence Center: Use Google Cloud Network Intelligence Center tools, such as Connectivity Tests, to diagnose and troubleshoot network connectivity issues.
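The connectivity testing practice above can be partly automated with a simple TCP probe. The following is a minimal Python sketch, not a replacement for Connectivity Tests: the function names can_connect and probe and the target list are hypothetical, and a successful TCP connection on port 443 shows only that a route and firewall path exist, not that the API call itself would succeed.

```python
import socket


def can_connect(host, port=443, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def probe(targets, timeout=3.0):
    """Probe each (host, port) pair and return a result per target."""
    return {
        f"{host}:{port}": can_connect(host, port, timeout)
        for host, port in targets
    }


if __name__ == "__main__":
    # Hypothetical targets: Google APIs plus a database in a peered network.
    targets = [
        ("storage.googleapis.com", 443),
        ("bigquery.googleapis.com", 443),
        ("10.20.0.15", 5432),
    ]
    for target, ok in probe(targets).items():
        print(f"{target}: {'reachable' if ok else 'unreachable'}")
```

Running the probe from a cluster node or from a test VM in the same subnet during staging gives a quick pass or fail list before jobs depend on those paths.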
What's next
- Learn more about Private Service Connect.
- Understand VPC Network Peering.
- Explore Dataproc cluster networking configuration.