Dataproc optional Pig component

You can install additional components like Apache Pig when you create a Dataproc cluster using the Optional components feature. This page describes how to install the Pig component. Apache Pig is an open source platform for analyzing large data sets.

Install the component

Install the component when you create a Dataproc cluster.

Apache Pig is an optional component in Dataproc 2.3 and later image versions.

See Supported Dataproc versions for component versions included in the latest Dataproc image releases.

gcloud

To create a Dataproc cluster that includes the Pig component, run the gcloud dataproc clusters create CLUSTER_NAME command with the --optional-components flag set to PIG and an image version of 2.3 or later.

gcloud dataproc clusters create CLUSTER_NAME \
    --region=REGION \
    --optional-components=PIG \
    --image-version=2.3 \
    ... other flags
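After the cluster is created, one way to confirm that the component was requested is to read the optional components back from the cluster's software configuration. The following is a minimal sketch, not an official verification procedure: the function name list_optional_components is hypothetical, and the format expression assumes the field path config.softwareConfig.optionalComponents in the describe output.

```shell
# Print the optional components recorded in a cluster's software configuration.
# Usage: list_optional_components CLUSTER_NAME REGION
# The function name is illustrative and not part of gcloud.
list_optional_components() {
  gcloud dataproc clusters describe "$1" \
    --region="$2" \
    --format="value(config.softwareConfig.optionalComponents)"
}
```

Call it as list_optional_components CLUSTER_NAME REGION. For a cluster created with the command above, the output should include PIG.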

REST API

To install the Pig component through the Dataproc API, specify it using SoftwareConfig.Component as part of a clusters.create request.
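As an illustration, a clusters.create request that includes the component might look like the following sketch. The project ID, region, and cluster name are placeholder assumptions, the function name create_pig_cluster is hypothetical, and the request body is reduced to the fields relevant here; a real request will likely need additional cluster configuration. The example assumes that curl and an authenticated gcloud CLI are available.

```shell
# Placeholder values (assumptions); replace them with your own.
PROJECT_ID="my-project"
REGION="us-central1"
CLUSTER_NAME="my-pig-cluster"

# Build a minimal clusters.create request body that lists PIG
# under softwareConfig.optionalComponents.
REQUEST_BODY=$(cat <<EOF
{
  "clusterName": "${CLUSTER_NAME}",
  "config": {
    "softwareConfig": {
      "imageVersion": "2.3",
      "optionalComponents": ["PIG"]
    }
  }
}
EOF
)

# Wrapped in a function so that nothing is sent until you call it.
create_pig_cluster() {
  curl -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    -d "${REQUEST_BODY}" \
    "https://dataproc.googleapis.com/v1/projects/${PROJECT_ID}/regions/${REGION}/clusters"
}
```

Run create_pig_cluster to send the request. Keeping the body in a variable makes it easy to inspect with echo "$REQUEST_BODY" before anything is submitted.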

Console

Enable the component:

  1. In the Google Cloud console, open the Dataproc Create a cluster page. The Set up cluster panel is selected.
  2. In the Components section, under Optional components, select Pig and other optional components to install on your cluster.