Schedule pipelines

This page describes how to create a schedule for your pipeline runs. For example, you can schedule a pipeline to run daily at 1:00 AM UTC.

Before you begin

To create a schedule, you need a deployed pipeline in Cloud Data Fusion. If you don't have one, create a pipeline by following the Quickstart.

To create, edit, or suspend a schedule, open your pipeline in Cloud Data Fusion:

  1. Go to your instance:

    1. In the Google Cloud console, go to the Cloud Data Fusion page.

    2. To open the instance in the Cloud Data Fusion Studio, click Instances, and then click View instance.

  2. Go to the Cloud Data Fusion List page.

  3. In the Deployed tab, choose a pipeline.

    The Pipeline page opens, where you can create, edit, or suspend a schedule for your pipeline.

Create the schedule

From the Pipeline page in the Cloud Data Fusion Studio, click Schedule.

You can use either the Basic or Advanced tab to define your schedule.

Basic

  1. On the Basic tab, enter the following information about your schedule:

    • Frequency, such as Every 5 minutes or Repeats every 30 days.
    • Start time, in UTC.
    • Maximum concurrent runs: Choose up to ten runs. If that many runs are already in progress, the scheduled run doesn't start.
    • Compute Engine profile: Choose a compute profile. The default is the Dataproc compute profile.
  2. Click Save and start schedule (or Save schedule, if you want to start it later).
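
Because the start time is entered in UTC, it can help to convert a local wall-clock time before you fill in the field. The following is a minimal Python sketch that uses only the standard library. The function name to_utc_start and the sample UTC-07:00 offset are illustrative assumptions, and nothing here calls Cloud Data Fusion.

```python
from datetime import datetime, timedelta, timezone


def to_utc_start(local_time: datetime) -> str:
    """Convert a timezone-aware local time to an HH:MM string in UTC."""
    if local_time.tzinfo is None:
        raise ValueError("local_time must be timezone-aware")
    return local_time.astimezone(timezone.utc).strftime("%H:%M")


# Example: 6:00 PM at a fixed UTC-07:00 offset (an assumed sample zone).
sample_zone = timezone(timedelta(hours=-7))
local_start = datetime(2024, 6, 1, 18, 0, tzinfo=sample_zone)
print(to_utc_start(local_start))  # 01:00 (on the following day in UTC)
```

For zones that observe daylight saving time, zoneinfo.ZoneInfo from the Python standard library (3.9 and later) can replace the fixed offset.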

Advanced

  1. On the Advanced tab, define your schedule in cron syntax.
  2. In the Maximum concurrent runs field, choose up to ten runs. If that many runs are already in progress, the scheduled run doesn't start.
  3. Click Save and start schedule (or Save schedule, if you want to start it later).
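
To illustrate the cron syntax used on the Advanced tab, the following Python sketch assembles and checks a five-field expression (minute, hour, day of month, month, day of week). The five-field layout, the 0 to 6 day-of-week numbering, and the helper names cron_expression and is_valid_field are assumptions made for this example, not part of Cloud Data Fusion. The check covers only plain numbers and *, not ranges, lists, or step values.

```python
# Hypothetical helpers: build a five-field cron expression
# (minute, hour, day of month, month, day of week).
# Assumes the Advanced tab accepts standard five-field cron syntax and
# the common Unix day-of-week numbering (0-6, with 0 = Sunday).

FIELD_RANGES = [
    ("minute", 0, 59),
    ("hour", 0, 23),
    ("day of month", 1, 31),
    ("month", 1, 12),
    ("day of week", 0, 6),
]


def is_valid_field(value: str, low: int, high: int) -> bool:
    """Accept '*' or a single integer within [low, high]."""
    if value == "*":
        return True
    return value.isdecimal() and low <= int(value) <= high


def cron_expression(minute="*", hour="*", day="*", month="*", weekday="*") -> str:
    """Return the expression, or raise ValueError for an out-of-range field."""
    values = [str(v) for v in (minute, hour, day, month, weekday)]
    for (name, low, high), value in zip(FIELD_RANGES, values):
        if not is_valid_field(value, low, high):
            raise ValueError(f"invalid {name}: {value!r}")
    return " ".join(values)


# Daily at 1:00 AM UTC, the example from the top of this page.
daily_1am = cron_expression(minute=0, hour=1)
print(daily_1am)  # 0 1 * * *
```

Validating each field before you paste the expression catches out-of-range values, such as an hour of 24, early.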

Change or suspend the schedule

  • You can change, start, or suspend a pipeline schedule from the Pipeline page in the Cloud Data Fusion Studio.

  • To suspend the schedule, click Unschedule.