Schedule data preparations

This document describes how to schedule data preparation pipelines and perform manual runs.

Data preparations are powered by Dataform.

Data preparations run using Dataform service accounts, which you select when you configure schedules or test runs. For more information, see About service accounts in Dataform.

Changes you make to the data preparation steps aren't automatically saved. You must save and deploy the changes before they can be executed with a schedule. Schedules always run the latest deployed version of your data preparation and exclude any undeployed changes you might be developing.
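
The relationship between saved drafts, deployed versions, and schedules can be pictured with a small model. The following Python sketch is purely illustrative: the `DataPreparation` class and its `save`, `deploy`, and `scheduled_run` methods are hypothetical names, not part of any BigQuery or Dataform API. It only shows that a scheduled run uses the steps as of the last deployment.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DataPreparation:
    """Toy model of a data preparation (hypothetical, not a real API)."""

    draft_steps: List[str] = field(default_factory=list)
    deployed_steps: Optional[List[str]] = None  # None until the first deployment

    def save(self, steps: List[str]) -> None:
        # Saving updates only the draft that you are developing.
        self.draft_steps = list(steps)

    def deploy(self) -> None:
        # Deploying snapshots the saved draft as the version that schedules use.
        self.deployed_steps = list(self.draft_steps)

    def scheduled_run(self) -> List[str]:
        # A schedule runs the latest deployed version, never the draft.
        if self.deployed_steps is None:
            raise RuntimeError("deploy the data preparation before scheduling it")
        return list(self.deployed_steps)


prep = DataPreparation()
prep.save(["filter nulls"])
prep.deploy()
prep.save(["filter nulls", "deduplicate"])  # saved, but not deployed

ran = prep.scheduled_run()  # still the steps from the last deployment
```

In this model, the `deduplicate` step reaches scheduled runs only after another call to `deploy`.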

Before you begin

Before you begin, create a data preparation.

Required roles

To run data preparations, grant the required roles to the service account that you plan to use for the data preparation runs. For more information, see the required roles.
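
If you prefer to grant roles from the command line instead of the console, project-level role bindings are added with `gcloud projects add-iam-policy-binding`. The following Python sketch only builds and prints those commands. The project ID, the service account email, and the `roles/REQUIRED_ROLE` placeholder are assumptions for illustration; substitute the roles listed in the required roles documentation.

```python
import shlex
from typing import List


def grant_role_command(project_id: str, service_account_email: str, role: str) -> List[str]:
    """Build a gcloud command that binds one role to a service account."""
    return [
        "gcloud", "projects", "add-iam-policy-binding", project_id,
        f"--member=serviceAccount:{service_account_email}",
        f"--role={role}",
    ]


# Placeholder values (assumptions): replace them with your own.
PROJECT_ID = "example-project"
SERVICE_ACCOUNT = "dataprep-runner@example-project.iam.gserviceaccount.com"
ROLES = ["roles/REQUIRED_ROLE"]  # not a real role; use the documented required roles

commands = [grant_role_command(PROJECT_ID, SERVICE_ACCOUNT, role) for role in ROLES]
for command in commands:
    # Print the command for review instead of executing it.
    print(shlex.join(command))
```

Printing the commands rather than running them lets you review each binding before you apply it.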

Develop a data preparation

As you develop a data preparation, you can manually run the steps and inspect the output before you deploy the changes to production. You can test the version that you're developing on your data while BigQuery continues to run the latest deployed version on its schedule. Before you can perform the run, you must configure the destination and fix any validation errors.

Manually run a data preparation in development

To test your data preparation steps and validate the results in your destination table, run the data preparation manually from the data preparation editor:

  1. In the Google Cloud console, go to the BigQuery page.

  2. In the Explorer pane, expand your project and the Data preparations folder. Click the name of the data preparation that you want to run.

  3. Configure the permissions on the service account for the run:

    1. In the data preparation editor toolbar, hold the pointer over the disabled Run option.
    2. On the dialog that appears with information about configuring the service account, click Configure.
    3. In the Service account settings dialog, select a service account.
    4. If the service account needs additional permissions, grant it the required roles by clicking Grant all.
    5. Click Save.
  4. Optional: To update the service account for future runs, go to the data preparation editor toolbar and click More > Configure run now experience, and then update and save the service account settings.

  5. Fix any validation errors that appear.

  6. From the data preparation editor toolbar, click Run.

  7. In the Run now dialog, click Confirm to acknowledge that this manual run writes data to a destination table, which you might also be using for scheduled runs.

    The run then executes your steps and loads the output to the destination.

  8. Optional: After the run is complete, you can view the details about the execution in the Executions pane.
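
The conditions that must be met before a manual run can start, as described in the preceding steps, can be summarized as a pre-run check. The following Python sketch is only a mental model: `DraftState` and `blocking_issues` are hypothetical names, and the console performs these checks for you.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DraftState:
    """Hypothetical snapshot of a data preparation in development."""

    service_account: Optional[str] = None
    destination_table: Optional[str] = None
    validation_errors: List[str] = field(default_factory=list)


def blocking_issues(state: DraftState) -> List[str]:
    """Return the reasons a manual run can't start yet (empty if none)."""
    issues = []
    if not state.service_account:
        issues.append("select a service account for the run")
    if not state.destination_table:
        issues.append("configure the destination table")
    # Each unresolved validation error blocks the run on its own.
    issues.extend(f"fix validation error: {err}" for err in state.validation_errors)
    return issues


draft = DraftState(
    service_account="runner@example-project.iam.gserviceaccount.com",
    validation_errors=["column 'order_id' not found"],
)
issues = blocking_issues(draft)
ready = not issues  # True only when nothing blocks the run
```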

Deploy a data preparation

To schedule runs for a version of your data preparation, you must first deploy it. Schedules run the most recently deployed version.

To deploy a data preparation, follow these steps:

  1. In the Google Cloud console, go to the BigQuery page.

  2. In the Explorer pane, expand your project and the Data preparations folder. Click the name of the data preparation that you want to deploy.

    The data preparation editor opens.

  3. In the data preparation editor toolbar, click Deploy.

Create a schedule

To run the deployed data preparation steps on a recurring basis and load the prepared data into the destination table, create a schedule. Before you create a schedule, you must deploy the data preparation, configure the destination, and fix any validation errors.

To create a data preparation schedule, follow these steps:

Explorer pane

  1. In the Google Cloud console, go to the BigQuery page.

  2. In the Explorer pane, expand your project and the Data preparations folder. Click the name of the data preparation that you want to schedule.

  3. From the data preparation editor toolbar, click Schedule.

  4. Enter a schedule name.

  5. In the Authentication section, authorize the data preparation with your Google Account user credentials or a service account.

    • To use your Google Account user credentials (Preview), select Execute with my user credentials.
    • To use a service account, select Execute with selected service account, and then select a service account.
  6. Set the schedule frequency.

  7. Click Create schedule. If you selected Execute with my user credentials for your authentication method, you must authorize your Google Account (Preview).

Scheduling page

  1. In the Google Cloud console, go to the Scheduling page.

  2. Click Create, and then select Data Preparation schedule from the menu.

  3. In the Schedule data preparation pane, in the Data preparation field, select the data preparation that you want to schedule.

  4. In the Schedule name field, enter a name for the schedule.

  5. In the Authentication section, authorize the data preparation with your Google Account user credentials or a service account.

    • To use your Google Account user credentials (Preview), select Execute with my user credentials.
    • To use a service account, select Execute with selected service account, and then select a service account.
  6. In the Schedule frequency section, do the following:

    1. In the Repeats menu, select the frequency of data preparation runs.
    2. In the At time field, enter the time for scheduled data preparation runs.
    3. In the Timezone menu, select the timezone for the schedule.
  7. Click Create schedule. If you selected Execute with my user credentials for your authentication method, you must authorize your Google Account (Preview).
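
The Repeats, At time, and Timezone fields together describe a recurring schedule. As a rough illustration, the following Python sketch converts such settings into a standard five-field cron expression. The `to_cron` helper and the `daily` and `weekly` option names are hypothetical, and whether the console stores schedules in this form is an assumption.

```python
from datetime import time


def to_cron(repeats: str, at: time, weekday: int = 1) -> str:
    """Translate a simple frequency into a five-field cron expression.

    Fields are: minute, hour, day of month, month, day of week (0 = Sunday).
    """
    if repeats == "daily":
        return f"{at.minute} {at.hour} * * *"
    if repeats == "weekly":
        if not 0 <= weekday <= 6:
            raise ValueError("weekday must be between 0 (Sunday) and 6 (Saturday)")
        return f"{at.minute} {at.hour} * * {weekday}"
    raise ValueError(f"unsupported frequency: {repeats!r}")


# Example settings (assumptions): run every day at 02:30 in New York time.
schedule = {
    "cron": to_cron("daily", time(2, 30)),
    "time_zone": "America/New_York",  # IANA time zone name
}
```

Keeping the time zone next to the cron expression matters because the expression alone doesn't say which local time it refers to.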

Authorize your Google Account

To authenticate the resource with your Google Account user credentials, you must manually grant permission for BigQuery pipelines to get the access token for your Google Account and access the source data on your behalf. You can grant manual approval with the OAuth dialog interface.

You only need to give permission to BigQuery pipelines once.

To revoke the permission that you granted, follow these steps:

  1. Go to your Google Account page.
  2. Click BigQuery Pipelines.
  3. Click Remove access.

Changing the data preparation schedule owner by updating credentials also requires manual approval if the new Google Account owner has never created a schedule before.

Manually run a scheduled data preparation

When you manually run a data preparation in a selected schedule, BigQuery executes the data preparation once, independently from the schedule.

To manually run a scheduled data preparation, follow these steps:

  1. In the Google Cloud console, go to the Scheduling page.

  2. Click the name of the data preparation schedule that you want to run.

  3. On the Schedule details page, click Run.

View schedules

You can view data preparation schedules from the data preparation editor or the Scheduling page.

Data preparation editor

To view the schedule for a data preparation, follow these steps:

  1. In the data preparation editor toolbar, click View schedule.
  2. Optional: To view the schedule history, click View past executions.

Scheduling page

To view all data preparation schedules in your project, follow these steps:

  1. In the Google Cloud console, go to the Scheduling page.

  2. Optional: To view the run history and details of a schedule, click the name of the schedule. The history doesn't include manual runs.

Edit a schedule

You can edit a schedule from the data preparation editor or the Scheduling page.

Data preparation editor

To edit a schedule, follow these steps:

  1. In the data preparation editor toolbar, click View schedule.
  2. In the Schedule data preparation dialog, click Edit and then update the schedule.
  3. Click Update schedule.

Scheduling page

To edit a schedule, follow these steps:

  1. In the Google Cloud console, go to the Scheduling page.

  2. Click the name of the data preparation schedule that you want to edit.

  3. On the Schedule details page, click Edit.

  4. Click View schedule.

  5. In the Schedule data preparation dialog, click Edit and then update the schedule.

  6. Click Update schedule.

Delete a schedule

To permanently delete a schedule for a selected data preparation, follow these steps:

  1. In the Google Cloud console, go to the Scheduling page.

  2. In the row that contains the schedule, click Actions > Delete.

What's next