Create pipelines
This document describes how to create pipelines in BigQuery. Pipelines are powered by Dataform.
Before you begin
- Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
- In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
- Make sure that billing is enabled for your Google Cloud project.
- Enable the BigQuery, Dataform, and Vertex AI APIs.
- Follow the instructions at Enable code asset management to save, share, and manage versions of code assets such as pipelines.
- If this is the first time you are creating a code asset, set the default region for storing code assets. You can't change the region for a code asset after it is created.
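If you prefer to work from the command line, the required APIs can also be enabled with the Google Cloud CLI. This is a sketch; PROJECT_ID is a placeholder for your project ID:

```shell
# Enable the APIs that BigQuery pipelines depend on.
# Replace PROJECT_ID with your Google Cloud project ID.
gcloud services enable \
    bigquery.googleapis.com \
    dataform.googleapis.com \
    aiplatform.googleapis.com \
    --project=PROJECT_ID
```

Enabling an API that is already enabled is a no-op, so this command is safe to rerun.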
Required roles for pipelines
To get the permissions that you need to create pipelines, ask your administrator to grant you the following IAM roles on the project:
- To create pipelines: Code Creator (roles/dataform.codeCreator)
- To edit and run pipelines: Dataform Editor (roles/dataform.editor)
For more information about granting roles, see Manage access to projects, folders, and organizations.
You might also be able to get the required permissions through custom roles or other predefined roles.
For more information about Dataform IAM, see Control access with IAM.
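As a sketch, an administrator could grant both roles from the Google Cloud CLI. The user and project values here are placeholders:

```shell
# Grant the pipeline roles on the project (hypothetical user and project).
gcloud projects add-iam-policy-binding PROJECT_ID \
    --member="user:alex@example.com" \
    --role="roles/dataform.codeCreator"

gcloud projects add-iam-policy-binding PROJECT_ID \
    --member="user:alex@example.com" \
    --role="roles/dataform.editor"
```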
Required roles for notebook options
To get the permissions that you need to select a runtime template in notebook options, ask your administrator to grant you the Notebook Runtime User (roles/aiplatform.notebookRuntimeUser) IAM role on the project.
For more information about granting roles, see Manage access to projects, folders, and organizations.
You might also be able to get the required permissions through custom roles or other predefined roles.
If you don't have this role, you can select the default notebook runtime specification.
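This grant can likewise be sketched with the Google Cloud CLI; the user and project values are placeholders:

```shell
# Allow a user to select Vertex AI notebook runtime templates.
gcloud projects add-iam-policy-binding PROJECT_ID \
    --member="user:alex@example.com" \
    --role="roles/aiplatform.notebookRuntimeUser"
```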
Create a pipeline
To create a pipeline, follow these steps:
In the Google Cloud console, go to the BigQuery page.
In the tab bar of the editor pane, click the arrow next to the + sign, and then click Pipeline.
Optional: To rename the pipeline, click the pipeline name, and then type a new name.
Click Get started, and then go to the Settings tab.
In the Service account field, select a Dataform service account.
In the Location section, select a processing region for the pipeline.
- To select a specific region, select Region, and then select a region in the Region menu.
- To select a multi-region, select Multi-region, and then select a multi-region in the Multi-region menu.
The pipeline processing region doesn't need to match your default storage region for code assets.
Notebook options
If you plan to add a notebook to your pipeline, do the following in the Notebook options section:
In the Runtime template field, either accept the default notebook runtime, or search for and select an existing runtime.
- To view specifications for the default runtime, click the adjacent arrow.
- To create a new runtime, see Create a runtime template.
In the Cloud Storage bucket field, click Browse and select or create a Cloud Storage bucket for storing the output of notebooks in your pipeline.
Follow Add a principal to a bucket-level policy to add your custom Dataform service account as a principal to the Cloud Storage bucket that you plan to use for storing the output of scheduled pipeline runs, and grant the Storage Admin (roles/storage.admin) role to that principal. The selected custom Dataform service account must have the Storage Admin role on the selected bucket.
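The bucket-level grant can also be applied with the Google Cloud CLI. A sketch, with a hypothetical bucket name and service account:

```shell
# Grant Storage Admin on the output bucket to the Dataform service account.
# The bucket name and service account email are placeholders.
gcloud storage buckets add-iam-policy-binding gs://MY_PIPELINE_OUTPUT_BUCKET \
    --member="serviceAccount:dataform-sa@PROJECT_ID.iam.gserviceaccount.com" \
    --role="roles/storage.admin"
```

Scoping the grant to the single bucket, rather than the whole project, keeps the service account's access minimal.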
Add a pipeline task
To add a task to a pipeline, follow these steps:
In the Google Cloud console, go to the BigQuery page.
In the Explorer pane, expand your project and the Pipelines folder, and then select a pipeline.
To add a code asset, such as a SQL query, a notebook, or a data preparation, do the following:
SQL query
Click Add task, and then select Query. You can either create a new query or import an existing one.
Optional: In the Query task details pane, in the Run after menu, select a task that will precede your query.
Your query will depend on the preceding task.
Create a new query
Click the arrow menu next to Edit Query, and then select either In context or In new tab.
Search for an existing query.
Select a query name and then press Enter.
Click Save.
Optional: To rename the query, click the query name on the pipeline pane, click Edit Query, click the existing query name at the top of the screen, and then type a new name.
Import an existing query
Click the arrow menu next to Edit Query, and then click Import a copy.
Search for an existing query to import, or select an existing query from the search pane. When you import a query, the original remains unchanged because the query's source file is copied into the pipeline.
Click Edit to open the imported query.
Click Save.
Notebook
Click Add task, and then select Notebook. You can either create a new notebook or import an existing one. To change settings for notebook runtime templates, see Notebook options.
Optional: In the Notebook task details pane, in the Run after menu, select a task that will precede your notebook.
Your notebook will depend on the preceding task.
Create a new notebook
Click the arrow menu next to Edit Notebook, and then select either In context or In new tab.
Search for an existing notebook.
Select a notebook name and then press Enter.
Click Save.
Optional: To rename the notebook, click the notebook name on the pipeline pane, click Edit Notebook, click the existing notebook name at the top of the screen, and then type a new name.
Import an existing notebook
Click the arrow menu next to Edit Notebook, and then click Import a copy.
Search for an existing notebook to import, or select an existing notebook from the search pane. When you import a notebook, the original remains unchanged because the notebook's source file is copied into the pipeline.
To open the imported notebook, click Edit.
Click Save.
Data preparation
Click Add task, and then select Data preparation. You can either create a new data preparation or import an existing one.
Optional: In the Data preparation task details pane, in the Run after menu, select a task that will precede your data preparation.
Your data preparation will depend on the preceding task.
Create a new data preparation
Click the arrow menu next to Edit Data preparation, and then select either In context or In new tab.
Search for an existing data preparation.
Select a data preparation name, and then press Enter.
Click Save.
Optional: To rename the data preparation, click the data preparation name on the pipeline pane, click Edit Data preparation, click the name at the top of the screen, and enter a new name.
Import an existing data preparation
Click the arrow menu next to Edit Data preparation, and then click Import a copy.
Search for an existing data preparation to import, or select an existing data preparation from the search pane. When you import a data preparation, the original remains unchanged because the data preparation's source file is copied into the pipeline.
To open the imported data preparation, click Edit.
Click Save.
Edit a pipeline task
To edit a pipeline task, follow these steps:
In the Google Cloud console, go to the BigQuery page.
In the Explorer pane, expand your project and the Pipelines folder, and then select a pipeline.
Click the task that you want to edit.
To change the preceding task, in the Run after menu, select a task that will precede your query or notebook.
To edit the contents of the selected task, click Edit.
In the new tab that opens, edit task contents, and then save changes to the task.
Delete a pipeline task
To delete a task from a pipeline, follow these steps:
In the Google Cloud console, go to the BigQuery page.
In the Explorer pane, expand your project and the Pipelines folder, and then select a pipeline.
Click the task that you want to delete.
In the Task details pane, click the Delete icon.
Share a pipeline
To share a pipeline, follow these steps:
In the Google Cloud console, go to the BigQuery page.
In the Explorer pane, expand your project and the Pipelines folder, and then select a pipeline.
Click Share, and then select Manage permissions.
Click Add user/group.
In the New principals field, enter the name of at least one user or group.
For Assign Roles, select a role.
Click Save.
Share a link to a pipeline
In the Google Cloud console, go to the BigQuery page.
In the Explorer pane, expand your project and the Pipelines folder, and then select a pipeline.
Click Share, and then select Share link. The URL for your pipeline is copied to your computer's clipboard.
Run a pipeline
To manually run the current version of a pipeline, follow these steps:
In the Google Cloud console, go to the BigQuery page.
In the Explorer pane, expand your project and the Pipelines folder, and then select a pipeline.
Click Run.
Optional: To inspect the run, view past manual runs.
What's next
- Learn more about BigQuery pipelines.
- Learn how to manage pipelines.
- Learn how to schedule pipelines.