Manage data preparations

This document describes how to manage data preparations in BigQuery, including how to deploy and schedule data preparations. Data preparations are BigQuery resources powered by Dataform.

This document also describes how to view and manage data preparation metadata in Dataplex.

Before you begin

  1. Ensure that you have enabled the Gemini for Google Cloud API.
  2. To manage data preparation metadata in Dataplex, ensure that the Dataplex API is enabled in your Google Cloud project.

Required roles

To ensure that the Dataform service account has the necessary permissions to prepare data in BigQuery, see the required roles for Dataform service accounts.

To get the permissions that you need to prepare data in BigQuery, ask your administrator to grant you the following IAM roles on the project:

For more information about granting roles, see Manage access to projects, folders, and organizations.

You might also be able to get the required permissions through custom roles or other predefined roles.

To manage data preparation metadata in Dataplex, ensure that you have the required Dataplex roles and the dataform.repository.get permission.

View existing data preparations

To view a list of existing data preparations, follow these steps:

  1. On the BigQuery Studio page, go to the Explorer pane.
  2. Expand your project.
  3. Expand the Data preparations list.

Schedule data preparations

You can create schedules in the data preparation editor and manage schedules in the BigQuery Orchestration page.

Create a schedule

To create a schedule that executes the data preparation steps and loads the prepared data into the destination table, schedule a one-time or a recurring data preparation run:

  1. From the data preparation toolbar, click Schedule.
  2. Enter a schedule name.
  3. Enter the service account name associated with the execution.
  4. Select a schedule frequency.
  5. Click Create schedule.

View schedules

To view all data preparation schedules in your project, follow these steps:

  1. In the Google Cloud console, go to the Orchestration page.

    Go to Orchestration

  2. Optional: To view details of a selected schedule and its past runs, click the name of the schedule.

Delete a schedule

To permanently delete a schedule for a selected data preparation, follow these steps:

  1. In the Google Cloud console, go to the Orchestration page.

    Go to Orchestration

  2. In the row that contains the schedule, click Actions > Delete.

Optimize data preparation by incrementally processing data

To configure how your prepared data is written into a destination table, follow these steps:

  1. In the Google Cloud console, go to the BigQuery Studio page.

    Go to BigQuery

  2. In the Activity pane, select your data preparation.

  3. In the toolbar of your data preparation, select More > Write mode.

  4. Select one of the options. For more information, see Write mode.

  5. Click Save.

Help improve suggestions

You can help improve Gemini suggestions by sharing with Google the prompt data that you submit to features in Preview. To share your prompt data, follow these steps:

  1. Open the data preparation editor in BigQuery.
  2. In the data preparation toolbar, click More.
  3. Select Share data to improve Gemini in BigQuery.

Data sharing settings apply to the entire project and can only be set by a project administrator with the serviceusage.services.enable and serviceusage.services.list IAM permissions. For more information about data use in the Trusted Tester Program, see Gemini for Google Cloud Trusted Tester Program.
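As a rough illustration of the permission requirement above, the following sketch checks whether a set of granted permissions covers both required permissions. The function name and the input list are hypothetical; how you obtain the granted permissions (for example, from your own IAM tooling) is an assumption outside this document.

```python
# The two permissions named in the paragraph above.
REQUIRED_PERMISSIONS = frozenset({
    "serviceusage.services.enable",
    "serviceusage.services.list",
})


def missing_permissions(granted):
    """Return the required permissions absent from `granted`, sorted.

    `granted` is any iterable of permission strings (hypothetical input).
    """
    return sorted(REQUIRED_PERMISSIONS - set(granted))


if __name__ == "__main__":
    # Example: a principal that holds only the list permission.
    print(missing_permissions(["serviceusage.services.list"]))
```

An empty result means that both permissions are present.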

Data preparation versions

BigQuery data preparations don't support viewing, comparing, or restoring data preparation versions.

To view a list of data preparation versions in chronological order, follow these steps:

  1. On the BigQuery Studio page, go to the Explorer pane.
  2. Select your data preparation. Versions are listed on the Activity tab in the Explorer pane.

Download a data preparation

To download a data preparation in a YAML file, follow these steps:

  1. In the Google Cloud console, go to the BigQuery Studio page.

    Go to BigQuery

  2. In the Explorer pane, expand your project and the Data preparations folder. Click the name of the data preparation that you want to download.

  3. Click Download. The data preparation is saved in the YAML file format, for example, NAME data preparation.dp.yaml.
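If you keep downloaded data preparations alongside other files, a small helper can pick out the ones that follow the .dp.yaml naming shown in the example above. This is a minimal sketch: the function name is hypothetical, and it assumes the suffix alone is enough to identify a downloaded file. It does not validate the YAML content.

```python
from pathlib import Path

# Suffix taken from the example file name in the download step.
DATA_PREPARATION_SUFFIX = ".dp.yaml"


def is_data_preparation_file(path):
    """Return True if the file name ends with the .dp.yaml suffix."""
    return Path(path).name.lower().endswith(DATA_PREPARATION_SUFFIX)


if __name__ == "__main__":
    # Hypothetical file names for illustration.
    for name in ["Sales data preparation.dp.yaml", "notes.yaml"]:
        print(name, is_data_preparation_file(name))
```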

Upload a data preparation

To upload a data preparation from a YAML file, follow these steps:

  1. In the Google Cloud console, go to the BigQuery Studio page.

    Go to BigQuery

  2. In the Explorer pane, expand your project.

  3. Go to the Data preparations folder and click Menu > Upload to Data preparation.

  4. In the Upload data preparation dialog, select a file to upload, or enter the URL of the data preparation.

  5. Enter a name for the data preparation.

  6. Select a data preparation location where resources are managed and stored.

  7. Click Upload.

Manage metadata in Dataplex

Dataplex lets you store and manage metadata for data preparations. Data preparations are available in Dataplex by default, without additional configuration.

You can use Dataplex to manage data preparations in all BigQuery locations. Managing data preparations in Dataplex is subject to Dataplex quotas and limits and Dataplex pricing.

Dataplex automatically retrieves the following metadata from data preparations:

  • Data asset name
  • Data asset parent
  • Data asset location
  • Data asset type
  • Corresponding Google Cloud project

Dataplex logs data preparations as entries with the following entry values:

System entry group
The system entry group for data preparations is @dataform. To view details of data preparation entries in Dataplex, you need to view the @dataform system entry group. For instructions about how to view a list of all entries in an entry group, see View details of an entry group in the Dataplex documentation.
System entry type
The system entry type for data preparations is dataform-code-asset. To view details of data preparations, you need to view the dataform-code-asset system entry type, filter the results with an aspect-based filter, and set the type field inside the dataform-code-asset aspect to DATA_PREPARATION. Then, select the entry for the data preparation that you want to view. For instructions about how to view details of a selected entry type, see View details of an entry type in the Dataplex documentation. For instructions about how to view details of a selected entry, see View details of an entry in the Dataplex documentation.
System aspect type
The system aspect type for data preparations is dataform-code-asset. To provide additional context to data preparations in Dataplex by annotating data preparation entries with aspects, view the dataform-code-asset aspect type, filter the results with an aspect-based filter, and set the type field inside the dataform-code-asset aspect to DATA_PREPARATION. For instructions about how to annotate entries with aspects, see Manage aspects and enrich metadata in the Dataplex documentation.
Type
The type for data preparations is DATA_PREPARATION. This type lets you filter data preparations in the dataform-code-asset system entry type and the dataform-code-asset aspect type by using the aspect:dataplex-types.global.dataform-code-asset.type=DATA_PREPARATION query in an aspect-based filter.

For instructions about how to search for assets in Dataplex, see Search for data assets in Dataplex in the Dataplex documentation.
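The aspect-based filter quoted above is a plain string, so it can be assembled programmatically before you paste it into a Dataplex search or pass it to a client of your choice. The following is a minimal sketch. The function name is hypothetical, and only the DATA_PREPARATION type value and the aspect path come from this document.

```python
# Aspect path taken from the filter quoted in the Type entry above.
ASPECT_PATH = "dataplex-types.global.dataform-code-asset"


def build_aspect_filter(asset_type="DATA_PREPARATION"):
    """Build an aspect-based filter string for the given asset type.

    Hypothetical helper: it only formats the query string and does not
    call any Dataplex API.
    """
    if not asset_type or " " in asset_type:
        raise ValueError("asset_type must be a non-empty token without spaces")
    return f"aspect:{ASPECT_PATH}.type={asset_type}"


if __name__ == "__main__":
    print(build_aspect_filter())
```

Keeping the aspect path in one constant avoids typos when the same filter is reused across searches.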

What's next