Prepare data with Gemini

This document describes how to generate and manage SQL code suggestions for your data preparations in BigQuery.

For more information, see Introduction to BigQuery data preparation.

Before you begin

Open the data preparation editor in BigQuery

You can open the data preparation editor in BigQuery by creating a new data preparation, creating one from an existing table, or opening an existing data preparation. For more information about what happens when you create a data preparation, see Data preparation entry points.

On the BigQuery Studio page, you can go to the data preparation editor in the following ways:

Create new

To create a new data preparation in BigQuery, follow these steps:

  1. In the Google Cloud console, go to the BigQuery Studio page.
    Go to BigQuery
  2. Go to the Create new list and click Data preparation. The data preparation editor is displayed in a new untitled data preparation tab.
  3. In the editor's search bar, enter your table name or keywords and select a table. The data preparation editor for the table opens, showing a preview of your data on the Data tab, and an initial set of data preparation suggestions from Gemini.

    A Data preparation resource appears in the Explorer pane, where you can access this preparation in the future.
  4. Optional: To simplify your view, turn on full screen mode by clicking fullscreen Full screen.

Create new from table

To create a new data preparation from an existing table, follow these steps:

  1. In the Google Cloud console, go to the BigQuery Studio page.
    Go to BigQuery
  2. In the Explorer pane, hold the pointer over a table.
  3. Click more_vert Menu > Query in > Data preparation. The data preparation editor for the table opens, showing a preview of your data on the Data tab, and an initial set of data preparation suggestions from Gemini.

    A Data preparation resource appears in the Explorer pane, where you can access this preparation in the future.
  4. Optional: To simplify your view, turn on full screen mode by clicking fullscreen Full screen.

Open existing

To open the editor for an existing data preparation, follow these steps:

  1. In the Google Cloud console, go to the BigQuery Studio page.
    Go to BigQuery
  2. In the Explorer pane, click your project name and data preparations.
  3. Select the existing data preparation. The graph view of the data preparation pipeline is displayed.
  4. Select one of the nodes in the graph. The data preparation editor for the table opens, showing a preview of your data on the Data tab, and an initial set of data preparation suggestions from Gemini.
  5. Optional: To simplify your view, turn on full screen mode by clicking fullscreen Full screen.

Create data preparation steps

You prepare data in steps. You can preview or apply steps suggested by Gemini. You can also fine tune the suggestions, or apply your own steps. For more information about supported data preparation steps, see Supported steps.

View, edit, and apply suggestions by Gemini

When you open the data preparation editor for your table in the previous steps, Gemini inspects the data and schema from the table you loaded and generates suggestions.

When you view the data in the data view, click a column name or a particular cell. Gemini generates filter and transformation suggestions that are specific to that column or cell.

Apply a suggestion by Gemini as a data preparation step:

  1. In the data view, go to the Steps list and review the suggestions.
  2. Optional: To preview the result of the suggestion card, click Preview.
  3. Optional: To modify the suggestion card using natural language, click Edit.

    For more information about modifying suggestions, see Improve suggestions with an example and Add steps with natural language or SQL expressions.

  4. Click Apply.

You can perform the following actions in the data view when you click Applied steps and inspect the list of steps:

  • Edit or remove step in the list by clicking more_vert Menu and making the change.
  • Select any step in the list to view the data sample and schema as of this particular step.
  • Select a step in the list to insert new steps after the selected step.

Improve suggestions with an example

To fine tune the suggestions from Gemini, give it an example of what the data should look like in the cell:

  1. In the data view, edit the values of one to three cells to demonstrate what the values in that column should look like. For example, enter a date the way you want to format all dates. Gemini generates new suggestions based on your changes.

  2. To modify a suggestion card, click Edit.

  3. Click Apply.

Add steps with natural language or SQL expressions

If no suggestions meet your needs, add your own by clicking Add step, selecting a step type, and entering a prompt in natural language describing your intent.

Add a transformation

  1. In the data view, click Add step and choose the Transformation option.
  2. In the Description field, enter a prompt, such as Convert the state column to a uppercase.
  3. Click send Send.

    Gemini generates a SQL expression and a new description based on your prompt.

  4. In the Target column list, select or enter a column name.

  5. Optional: To update the SQL expression, revise the prompt and click send Send, or manually enter a SQL expression.

  6. Optional: Click Preview and review the step.

  7. Click Apply.

Filter rows

To add a filter that removes rows, follow these steps:

  1. In the data view, click Add step > Filter.
  2. In the Description field, enter a prompt, such as Column ID should not be NULL.
  3. Click Generate. Gemini generates a SQL expression and a new description based on your prompt.
  4. Optional: To update the SQL expression, revise the prompt and click send Send, or enter a SQL expression manually.
  5. Optional: Click Preview and review the step.
  6. Click Apply.

Filter expression format

SQL expressions for filters retain rows that match the specified condition. This is equivalent to a SELECT … WHERE SQL_EXPRESSION statement.

For example, to retain records where the column, year is greater than or equal to 2000, the condition is year >= 2000.

Expressions must follow the BigQuery SQL syntax for the WHERE clause.

Configure the error table and add a validation rule

You can add a filter that creates a validation rule, which sends errors to an error table or fails the data preparation run.

Configure the error table

To configure your error table, follow these steps:

  1. In the data view, go to the toolbar and click More > Error table.
  2. Click Enable error table.
  3. Define the table location.
  4. Optional: Define a maximum duration for keeping errors.
  5. Click Save.

Add a validation rule

To add a validation rule, follow these steps:

  1. In the data view, click Add step > Filter.
  2. Enter a description for the step.
  3. Enter a SQL expression, in the form of a WHERE clause.
  4. Optional: If you want the SQL expression to act as a validation rule, select the Failed validation rows go to error table checkbox. You can also change a filter to a validation in the data preparation toolbar by clicking More > Error table.
  5. Optional: Click Preview and review the step.
  6. Click Apply.

Delete a column

To delete a column from a data preparation, follow these steps:

  1. In the schema view, click more_vert Menu next to the column to be deleted.
  2. Click Delete. A new applied step is added for the deleted column.

Add a join operation

To add a join operation step between two sources in your data preparation, follow these steps:

  1. In the data view for a node in your data preparation go to the Steps list, and click Add step > Join.
  2. In the Configure join dialog, select the other table involved in the join operation (referred to as the right side of the join).
  3. Optional: Select the join keys for each table. Join keys must have the same data type.
  4. Optional: Select the type of join operation to perform, such as Inner join.
  5. Optional: To prepare the right side table before specifying a join key, such as when the data types of the two columns in the join differ, follow these steps:

    1. Save a join without a join key selected.
    2. Go to the graph view and select the right side source table.
    3. Prepare the right side table, such as transforming a column's data type.
    4. Edit the join you saved in the previous step by setting a key.
  6. Click Create.

All table rows are retained, even if there's no corresponding value in one of the tables.

After you save the step, the source table you selected (the right side of the join) and the join operation are reflected in the list of applied steps and in the nodes in the graph view of your data preparation.

Add or change a destination table

To add or change a destination table for the output of your data preparation, follow these steps:

  1. In the data view, click Add > Destination.
  2. Select the project where the destination table is stored.
  3. Select one of the datasets, or load a new dataset.
  4. Enter a destination table. If the table doesn't exist, the data preparation creates a new table on the first run. For more information, see Write mode.
  5. Select your dataset as the destination dataset.
  6. Click Save.

Run the data preparation

After you've added your data preparation steps, configured the destination, and fixed any validation errors, you can perform test runs on a sample of the data, or deploy the steps and schedule data preparation runs. For more information, see Orchestrate data preparations.

Refresh data preparation samples

Data in the sample isn't automatically refreshed. If data in the source tables for the data preparation has changed, but the changes aren't reflected in the data sample of the preparation, click More > Refresh sample.

What's next