Troubleshoot deleting clusters

This page shows you how to resolve issues with deleting ephemeral Dataproc clusters in Cloud Data Fusion.

When Cloud Data Fusion creates an ephemeral Dataproc cluster during pipeline run provisioning, the cluster is deleted after the pipeline run finishes. In rare cases, the deletion fails.

Strongly recommended: Upgrade to the most recent Cloud Data Fusion version to ensure proper cluster maintenance.

Set Max Idle Time

To resolve this issue, configure the Max Idle Time value. Dataproc then deletes idle clusters automatically, even if the explicit delete call at the end of the pipeline run fails.

Max Idle Time is available in Cloud Data Fusion versions 6.4 and later.

In Cloud Data Fusion 6.6 and later, Max Idle Time is set to 4 hours by default.

To override the default time in the default compute profile, follow these steps:

  1. Open the instance in the Cloud Data Fusion web interface.
  2. Click System Admin > Configuration > System Preferences.
  3. Click Edit System Preferences and add the key and the value, in IntegerUnit format, such as 30m.

Recommended: For versions before 6.6, set Max Idle Time manually to 30 minutes or greater.
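Before saving a Max Idle Time value, it can help to check it against the 30-minute recommendation. The following is a minimal sketch, not part of Cloud Data Fusion: the function names are hypothetical, and the set of unit suffixes (s, m, h, d) is an assumption, because this page only shows the example 30m.

```python
import re

# Assumed unit suffixes: seconds, minutes, hours, days.
# Only "30m" appears on this page; the other suffixes are an assumption.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_integer_unit(value: str) -> int:
    """Return the duration in seconds for a value such as '30m'."""
    match = re.fullmatch(r"(\d+)([smhd])", value.strip())
    if match is None:
        raise ValueError(f"not in IntegerUnit format: {value!r}")
    amount, unit = match.groups()
    return int(amount) * _UNIT_SECONDS[unit]


def meets_recommended_minimum(value: str, minimum: str = "30m") -> bool:
    """Check a Max Idle Time value against the 30-minute recommendation."""
    return parse_integer_unit(value) >= parse_integer_unit(minimum)
```

For example, meets_recommended_minimum("4h") is true and meets_recommended_minimum("10m") is false. A value without a unit, such as "30", raises a ValueError in this sketch.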

Delete clusters manually

If you cannot upgrade your version or configure the Max Idle Time option, delete stale clusters manually:

  1. Get each project ID where the clusters were created:

    1. In the pipeline's runtime arguments, check if the Dataproc project ID is customized for the run.

    2. If a Dataproc project ID is not specified explicitly, determine which provisioner is used, and then check for a project ID:

      1. In the pipeline runtime arguments, check the value.

      2. Open the provisioner settings and check if the Dataproc project ID is set. If the setting is not present or the field is empty, the project that the Cloud Data Fusion instance is running in is used.

  2. For each project:

    1. Open the project in the Google Cloud console and go to the Dataproc Clusters page.

    2. Sort the clusters by the date that they were created, from oldest to newest.

    3. If the info panel is hidden, click Show info panel and go to the Labels tab.

    4. For every cluster that is no longer in use (for example, more than a day has elapsed since it was created), check whether it has a Cloud Data Fusion version label. The label indicates that the cluster was created by Cloud Data Fusion.

    5. Select the checkbox by the cluster name and click Delete.
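The selection logic in the steps above can be sketched in code. This is a minimal illustration that works on plain dictionaries rather than on the Dataproc API: the function names, the dictionary fields (name, created, labels), and the label key passed as version_label are all hypothetical placeholders. The one-day threshold is only the example heuristic from step 4, so confirm that a cluster is really unused before you delete it.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def resolve_project_id(runtime_project: Optional[str],
                       profile_project: Optional[str],
                       instance_project: str) -> str:
    """Pick the project to inspect, following the precedence in step 1.

    A project ID in the runtime arguments wins, then the provisioner
    setting, then the project of the Cloud Data Fusion instance.
    """
    return runtime_project or profile_project or instance_project


def stale_candidates(clusters, version_label, now,
                     max_age=timedelta(days=1)):
    """Return names of clusters that look stale, oldest first.

    clusters: iterable of dicts with 'name', 'created' (timezone-aware
    datetime), and 'labels' (dict). This shape is hypothetical.
    version_label: the label key to look for (placeholder; check the
    Labels tab for the actual key).
    """
    old = [
        c for c in clusters
        if now - c["created"] > max_age
        and version_label in c.get("labels", {})
    ]
    return [c["name"] for c in sorted(old, key=lambda c: c["created"])]
```

The function returns only candidates. Deleting them stays a manual step, which matches the procedure above and avoids removing a cluster that is still in use.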

Skip cluster deletion

For debugging purposes, you can stop the automatic deletion of an ephemeral cluster.

To stop the deletion, set the Skip Cluster Deletion property to True. You must manually delete the cluster after you finish debugging.