Manage data preparations
This document describes how to manage data preparations in BigQuery, including how to deploy and schedule data preparations. Data preparations are BigQuery resources powered by Dataform.
This document also describes how to view and manage data preparation metadata in Dataplex.
Before you begin
- Ensure that you have enabled the Gemini for Google Cloud API.
- To manage data preparation metadata in Dataplex, ensure that the Dataplex API is enabled in your Google Cloud project.
Required roles
To ensure that the Dataform service account has the necessary permissions to prepare data in BigQuery, see the required roles for Dataform service accounts.
To get the permissions that you need to prepare data in BigQuery, ask your administrator to grant you the following IAM roles on the project:
- BigQuery Data Editor (roles/bigquery.dataEditor)
- Service Usage Consumer (roles/serviceusage.serviceUsageConsumer)
For more information about granting roles, see Manage access to projects, folders, and organizations.
You might also be able to get the required permissions through custom roles or other predefined roles.
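As an illustration only, the two roles listed above can be expressed as IAM policy bindings, which is the shape an administrator would add to the project's allow policy. The following is a minimal sketch: the helper name `build_role_bindings` and the principal `user:analyst@example.com` are hypothetical placeholders, and the code only builds the binding structure without calling any Google Cloud API.

```python
import json

# The two roles named in the list above.
REQUIRED_ROLES = [
    "roles/bigquery.dataEditor",
    "roles/serviceusage.serviceUsageConsumer",
]


def build_role_bindings(member: str) -> list[dict]:
    """Return one IAM policy binding per required role for a single principal.

    Hypothetical helper for illustration; it does not modify any policy.
    """
    return [{"role": role, "members": [member]} for role in REQUIRED_ROLES]


if __name__ == "__main__":
    # "user:analyst@example.com" is a placeholder principal, not a real account.
    print(json.dumps(build_role_bindings("user:analyst@example.com"), indent=2))
```

An administrator could merge bindings like these into the project's allow policy by whichever method they normally use to grant roles.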
To manage data preparation metadata in Dataplex, ensure that you have the required Dataplex roles and the dataform.repository.get permission.
View existing data preparations
To view a list of existing data preparations, follow these steps:
- On the BigQuery Studio page, go to the Explorer pane.
- Expand your project.
- Expand the Data preparations list.
Schedule data preparations
You can create schedules in the data preparation editor and manage them on the BigQuery Orchestration page.
Create a schedule
To create a schedule that executes the data preparation steps and loads the prepared data into the destination table, schedule a one-time or a recurring data preparation run:
- From the data preparation toolbar, click Schedule.
- Enter a schedule name.
- Enter the service account name associated with the execution.
- Schedule a frequency.
- Click Create schedule.
View schedules
To view all data preparation schedules in your project, follow these steps:
- In the Google Cloud console, go to the Orchestration page.
- Optional: To view details of a selected schedule and its past runs, click the name of the schedule.
Delete a schedule
To permanently delete a schedule for a selected data preparation, follow these steps:
- In the Google Cloud console, go to the Orchestration page.
- In the row that contains the schedule, click Actions > Delete.
Optimize data preparation by incrementally processing data
To configure the way your prepared data is written into a destination table, follow these steps:
- In the Google Cloud console, go to the BigQuery Studio page.
- In the Activity pane, select your data preparation.
- In the toolbar of your data preparation, select More > Write mode.
- Select one of the options. For more information, see Write mode.
- Click Save.
Help improve suggestions
You can help improve Gemini suggestions by sharing with Google the prompt data that you submit to features in Preview. To share your prompt data, follow these steps:
- Open the data preparation editor in BigQuery.
- In the data preparation toolbar, click More.
- Select Share data to improve Gemini in BigQuery.
Data sharing settings apply to the entire project and can only be set by a project administrator with the serviceusage.services.enable and serviceusage.services.list IAM permissions. For more information about data use in the Trusted Tester Program, see Gemini for Google Cloud Trusted Tester Program.
Data preparation versions
BigQuery data preparations don't support viewing, comparing, or restoring data preparation versions.
For a list of data preparation versions in chronological order, follow these steps:
- On the BigQuery Studio page, go to the Explorer pane.
- Select your data preparation. Versions are listed on the Activity tab in the Explorer pane.
Download a data preparation
To download a data preparation in a YAML file, follow these steps:
- In the Google Cloud console, go to the BigQuery Studio page.
- In the Explorer pane, expand your project and the Data preparations folder. Click the name of the data preparation that you want to download.
- Click Download. The data preparation is saved in the YAML file format, for example, NAME data preparation.dp.yaml.
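If you download several data preparations, it can help to collect the files before you upload them elsewhere. The following is a minimal sketch that lists files that end in the .dp.yaml suffix shown in the example file name. The function name `find_data_preparation_files` and the downloads folder path are hypothetical, and the sketch assumes that every downloaded file keeps that suffix.

```python
from pathlib import Path


def find_data_preparation_files(folder: Path) -> list[Path]:
    """Return files in `folder` whose names end in .dp.yaml, sorted by path.

    Hypothetical helper; assumes downloads keep the .dp.yaml suffix.
    """
    return sorted(p for p in folder.glob("*.dp.yaml") if p.is_file())


if __name__ == "__main__":
    # Placeholder location; replace with the folder your browser saves to.
    downloads = Path.home() / "Downloads"
    for path in find_data_preparation_files(downloads):
        print(path.name)
```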
Upload a data preparation
To upload a data preparation from a YAML file, follow these steps:
- In the Google Cloud console, go to the BigQuery Studio page.
- In the Explorer pane, expand your project.
- Go to the Data preparations folder and click Menu > Upload to Data preparation.
- In the Upload data preparation dialog, select a file to upload, or enter the URL of the data preparation.
- Enter a name for the data preparation.
- Select a data preparation location where resources are managed and stored.
- Click Upload.
Manage metadata in Dataplex
Dataplex lets you store and manage metadata for data preparations. Data preparations are available in Dataplex by default, without additional configuration.
You can use Dataplex to manage data preparations in all BigQuery locations. Managing data preparations in Dataplex is subject to Dataplex quotas and limits and Dataplex pricing.
Dataplex automatically retrieves the following metadata from data preparations:
- Data asset name
- Data asset parent
- Data asset location
- Data asset type
- Corresponding Google Cloud project
Dataplex logs data preparations as entries with the following entry values:
- System entry group: The system entry group for data preparations is @dataform. To view details of data preparation entries in Dataplex, you need to view the dataform system entry group. For instructions about how to view a list of all entries in an entry group, see View details of an entry group in the Dataplex documentation.
- System entry type: The system entry type for data preparations is dataform-code-asset. To view details of data preparations, you need to view the dataform-code-asset system entry type, filter the results with an aspect-based filter, and set the type field inside the dataform-code-asset aspect to DATA_PREPARATION. Then, select an entry of the selected data preparation. For instructions about how to view details of a selected entry type, see View details of an entry type in the Dataplex documentation. For instructions about how to view details of a selected entry, see View details of an entry in the Dataplex documentation.
- System aspect type: The system aspect type for data preparations is dataform-code-asset. To provide additional context to data preparations in Dataplex by annotating data preparation entries with aspects, view the dataform-code-asset aspect type, filter the results with an aspect-based filter, and set the type field inside the dataform-code-asset aspect to DATA_PREPARATION. For instructions about how to annotate entries with aspects, see Manage aspects and enrich metadata in the Dataplex documentation.
- Type: The type for data preparations is DATA_PREPARATION. This type lets you filter data preparations in the dataform-code-asset system entry type and the dataform-code-asset aspect type by using the aspect:dataplex-types.global.dataform-code-asset.type=DATA_PREPARATION query in an aspect-based filter.
For instructions about how to search for assets in Dataplex, see Search for data assets in Dataplex in the Dataplex documentation.
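The aspect-based filter query described above follows the pattern aspect:ASPECT_TYPE.FIELD=VALUE. The following is a minimal sketch that assembles the query string from its parts, so that the same pattern can be reused without typing errors. The helper name `aspect_filter` is hypothetical, and the code only builds the string and does not call Dataplex.

```python
def aspect_filter(aspect_type: str, field: str, value: str) -> str:
    """Build an aspect-based filter query of the form aspect:TYPE.FIELD=VALUE.

    Hypothetical helper for illustration; it performs no validation.
    """
    return f"aspect:{aspect_type}.{field}={value}"


# Values taken from the entry values listed above.
DATA_PREPARATION_QUERY = aspect_filter(
    "dataplex-types.global.dataform-code-asset",
    "type",
    "DATA_PREPARATION",
)

if __name__ == "__main__":
    print(DATA_PREPARATION_QUERY)
```

You can paste the resulting string into a Dataplex search as an aspect-based filter.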
What's next
- Learn more about preparing data in BigQuery.
- Learn how to create data preparations.