This page describes how to reuse Dataproc clusters for your pipeline runs in Cloud Data Fusion. For more information, see When to reuse clusters and Run a pipeline against an existing Dataproc cluster.
Before you begin
- You must have a Cloud Data Fusion instance in version 6.5.0 or later.
Enable cluster reuse
You can enable cluster reuse in a new compute profile, or in the profile used by a deployed pipeline.
Enable cluster reuse in a new profile
Go to your instance:
In the Google Cloud console, go to the Cloud Data Fusion page.
To open the instance in the Cloud Data Fusion Studio, click Instances, and then click View instance.
Click System admin > Configuration > System compute profiles.
Click Create new profile.
Choose the Dataproc provisioner.
In the Create a profile for Dataproc window, enter the details about your cluster:
- In the Profile label and Profile name fields, enter a name to identify the profile, for example, execution_compute-profile.
- In the Description field, describe the purpose of the profile, for example, Profile used for pipeline execution.
- In the Max idle time field, enter a value. For more information, see Set max idle time.
- Set the Skip cluster delete field to True. For more information, see When to reuse clusters.
- Optional: Configure other fields.
Click Create.
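The two settings that matter for reuse in the steps above can be written down as a small checklist. The following is a minimal sketch, not a Cloud Data Fusion API: the dictionary keys `max_idle_time_minutes` and `skip_cluster_delete` are hypothetical names that mirror the Max idle time and Skip cluster delete fields, the helper `reuse_problems` is made up for illustration, and the value 30 is an arbitrary example.

```python
def reuse_problems(profile: dict) -> list:
    """Return the reasons why a profile is not set up for cluster reuse.

    The keys checked here are hypothetical stand-ins for the fields in the
    Create a profile for Dataproc window, not real property names.
    """
    problems = []
    # The procedure asks for a value in the Max idle time field.
    if profile.get("max_idle_time_minutes") is None:
        problems.append("Max idle time has no value")
    # The procedure asks for Skip cluster delete to be True.
    if profile.get("skip_cluster_delete") is not True:
        problems.append("Skip cluster delete is not True")
    return problems


profile = {
    "label": "execution_compute-profile",
    "description": "Profile used for pipeline execution",
    "max_idle_time_minutes": 30,  # illustrative value only
    "skip_cluster_delete": True,
}

print(reuse_problems(profile))
```

An empty list means both reuse-related fields are filled in as the procedure describes. A profile that leaves either field unset is reported with one message per missing setting.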
Enable cluster reuse in a deployed pipeline
Go to your instance:
In the Google Cloud console, go to the Cloud Data Fusion page.
To open the instance in the Cloud Data Fusion Studio, click Instances, and then click View instance.
Click List.
Click the Deployed tab and click a pipeline name. The deployed pipeline opens on the Studio page in the Cloud Data Fusion web interface.
Click Configure.
In the Compute config window, go to the chosen profile and click Customize.
In the window that opens, enter the following values:
- In the Max Idle Time field, enter a value. For more information, see Set max idle time.
- Set Skip cluster delete to True. For more information, see When to reuse clusters.
Click Done.
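One way to picture the Customize step is as a set of values layered on top of the chosen profile for this pipeline. The sketch below illustrates that idea under stated assumptions: the key names, the default values, and the function `effective_settings` are all hypothetical, and the sketch does not describe how Cloud Data Fusion stores or applies customizations internally.

```python
def effective_settings(profile_defaults: dict, customizations: dict) -> dict:
    """Overlay per-pipeline customizations on a copy of the profile values."""
    merged = dict(profile_defaults)  # copy, so the profile values stay unchanged
    merged.update(customizations)    # customized fields take precedence
    return merged


# Hypothetical profile values before customization.
defaults = {"max_idle_time_minutes": None, "skip_cluster_delete": False}

# Values entered in the customization window (30 is an arbitrary example).
custom = {"max_idle_time_minutes": 30, "skip_cluster_delete": True}

settings = effective_settings(defaults, custom)
print(settings)
```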
What's next
- Learn more about configuring clusters.
- Troubleshoot deleting clusters.