Generate data insights in Dataplex

Data insights offers an automated and intuitive way to explore and understand your data. It uses Gemini large language models to generate queries based on the metadata of a table, and lets you uncover patterns, assess data quality, and perform statistical analysis.

This document describes the key features of data insights and the process to automate query generation for insightful data exploration.

About data insights

Data analysts face a cold-start problem when they explore a new dataset with little or no prior knowledge: the data structure, key patterns, and relevant insights are all uncertain. Data insights addresses this problem by automatically generating queries based on the table's metadata. Rather than starting with a blank slate, you can begin exploring with meaningful queries that give you cues about the data and help you make informed decisions.

Queries generated by data insights are grounded in published profile scan data, so they are crafted to return results. This accelerates the start of data analysis and lets you delve into the data with a clearer direction and purpose.

Data insights serves as a guiding tool that resolves the common challenge of navigating unfamiliar datasets, empowering you to make informed decisions and uncover patterns more rapidly during data exploration.

Example of an insights run

Consider a table called telco_churn with the following technical metadata:

Field name       Type
Tenure           INT64
PhoneService     STRING
OnlineBackup     STRING
Dependents       BOOLEAN
Contract         STRING
TechSupport      STRING
PaymentMethod    STRING
MonthlyCharges   FLOAT

The following are some of the sample queries that data insights generates for this table:

  • Identify customers who have subscribed to all premium services and have been a customer for more than 50 months.

    SELECT *
    FROM telco_churn
    WHERE
      OnlineSecurity = 'Yes'
      AND OnlineBackup = 'Yes'
      AND DeviceProtection = 'Yes'
      AND TechSupport = 'Yes'
      AND StreamingTV = 'Yes'
      AND StreamingMovies = 'Yes'
      AND Tenure > 50;
  • Identify which internet service has the most churned customers.

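    SELECT
      InternetService,  -- assumed column name
      COUNT(DISTINCT customerID) AS total_customers
    FROM telco_churn
    WHERE Churn = TRUE
    GROUP BY InternetService
    ORDER BY total_customers DESC
    LIMIT 1;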
  • Identify customer segments with high churn rates among high-value customers.

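    SELECT
      Contract,  -- assumed segment column
      COUNT(DISTINCT customerID) AS total_customers,
      SUM(CASE WHEN Churn = TRUE THEN 1 ELSE 0 END) AS churned_customers,
      (SUM(CASE WHEN Churn = TRUE THEN 1 ELSE 0 END) / COUNT(DISTINCT customerID)) * 100 AS churn_rate
    FROM telco_churn
    WHERE MonthlyCharges > 100
    GROUP BY Contract;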
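
To make the churn-rate arithmetic in the last query concrete, the following is a minimal sketch that runs an equivalent query against a tiny in-memory SQLite table. SQLite stands in for BigQuery here, the sample rows are invented for illustration, and Contract is an assumed segment column. The function name churn_rate_by_contract is hypothetical.

```python
import sqlite3

# Hypothetical sample rows: (customerID, Contract, MonthlyCharges, Churn).
ROWS = [
    ("c1", "Month-to-month", 110.0, 1),
    ("c2", "Month-to-month", 105.5, 1),
    ("c3", "Month-to-month", 120.0, 0),
    ("c4", "Two year", 115.0, 0),
    ("c5", "Two year", 101.0, 0),
    ("c6", "Month-to-month", 45.0, 1),  # excluded: MonthlyCharges <= 100
]

# Churn rate per segment among customers paying more than 100 per month.
# Multiplying by 100.0 first forces floating-point division in SQLite.
CHURN_RATE_SQL = """
SELECT
  Contract,
  COUNT(DISTINCT customerID) AS total_customers,
  SUM(CASE WHEN Churn = 1 THEN 1 ELSE 0 END) AS churned_customers,
  100.0 * SUM(CASE WHEN Churn = 1 THEN 1 ELSE 0 END)
    / COUNT(DISTINCT customerID) AS churn_rate
FROM telco_churn
WHERE MonthlyCharges > 100
GROUP BY Contract
ORDER BY churn_rate DESC
"""


def churn_rate_by_contract(rows):
    """Load rows into an in-memory table and run the churn-rate query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE telco_churn ("
            "customerID TEXT, Contract TEXT, MonthlyCharges REAL, Churn INTEGER)"
        )
        conn.executemany("INSERT INTO telco_churn VALUES (?, ?, ?, ?)", rows)
        return conn.execute(CHURN_RATE_SQL).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for contract, total, churned, rate in churn_rate_by_contract(ROWS):
        print(f"{contract}: {churned}/{total} churned ({rate:.1f}%)")
```

Each result row is a segment with its customer count, churned count, and churn rate as a percentage. In BigQuery the `/` operator already returns a floating-point value, so the `100.0` trick is only needed for the SQLite stand-in.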

About grounding insights using profile scans

When you create a data profile scan for a table, you can choose to publish the scan results to BigQuery and Data Catalog. Data insights uses this published profile scan data to generate more accurate and relevant queries about the table.

The published profile scan data is viewable in the Google Cloud console. For more information about how to view published profile scan data, see Reporting and monitoring.

If a published profile scan for the table is accessible, it is used to generate comprehensive insights. Otherwise, insights are formulated based on the column names and their respective descriptions. This approach ensures that you receive insights regardless of the availability of a profile scan.

Profile scan data provides information about the data distribution, data types, and statistical summaries of the dataset. Because queries are grounded in profile scan data, data insights helps ensure that the generated queries are meaningful and return results that are useful for further analysis.

For more information about profile scans, see About data profiling.

For more information about creating and publishing profile scans, see Create and use data profile scans.

The following sections describe how data insights uses the published profile scan data to ground queries.

About profile scan data

Profile scan data is the metadata that describes the contents of a dataset. It includes the following information:

  • Data types of columns
  • Minimum and maximum values
  • Distribution of values
  • Null or missing values
  • Top values
  • Unique values and their frequencies

Data insights uses this information to generate queries that are tailored to the specific dataset and that provide meaningful insights.
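
As a rough illustration of the kind of statistics listed above, the following sketch computes them for a single column held in a Python list. The ColumnProfile class and profile_column function are hypothetical names invented for this example. They don't reflect the schema of Dataplex profile scan results, and the data type shown is the Python type name rather than a BigQuery type.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ColumnProfile:
    """Hypothetical summary of one column; not the Dataplex result schema."""
    name: str
    data_type: str
    null_ratio: float
    distinct_count: int
    min_value: Optional[Any]
    max_value: Optional[Any]
    top_values: list  # (value, count) pairs, most frequent first


def profile_column(name, values, top_n=3):
    """Compute simple profile statistics for one column of values."""
    non_null = [v for v in values if v is not None]
    counts = Counter(non_null)
    # Min and max are reported only for numeric columns in this sketch.
    numeric = bool(non_null) and all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in non_null
    )
    return ColumnProfile(
        name=name,
        data_type=type(non_null[0]).__name__ if non_null else "unknown",
        null_ratio=(len(values) - len(non_null)) / len(values) if values else 0.0,
        distinct_count=len(counts),
        min_value=min(non_null) if numeric else None,
        max_value=max(non_null) if numeric else None,
        top_values=counts.most_common(top_n),
    )


if __name__ == "__main__":
    print(profile_column("Contract", ["Month-to-month", "Two year", None, "Month-to-month"]))
    print(profile_column("Tenure", [1, 72, 34, None]))
```

Statistics like these tell a query generator which literal values and ranges actually occur in a column, which is what makes a grounded predicate possible.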

How data insights grounds queries using profile scan data

Data insights uses profile scan data to create queries that are based on the actual data distribution and patterns within the dataset. This process involves the following steps:

  • Analyzing the profile scan data to identify interesting patterns, trends, or outliers in the data.
  • Generating queries that focus on these patterns, trends, or outliers to uncover insights.
  • Validating the generated queries against the profile scan data to ensure that the queries return meaningful results.
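
The three steps above can be sketched as a toy pipeline. This is a simplified illustration under stated assumptions, not how Dataplex or Gemini implements grounding: the PROFILE dictionary, its keys, and every function name here are hypothetical, and "validation" is reduced to checking that a predicate can match values recorded in the profile.

```python
# Hypothetical published profile data for two columns of telco_churn.
PROFILE = {
    "Contract": {"type": "STRING", "top_values": ["Month-to-month", "Two year"]},
    "Tenure": {"type": "INT64", "min": 0, "max": 72},
}


def sql_literal(value):
    """Render a value as a SQL literal with backslash escaping for strings."""
    if isinstance(value, str):
        escaped = value.replace("\\", "\\\\").replace("'", "\\'")
        return f"'{escaped}'"
    return str(value)


def generate_candidates(profile):
    """Steps 1 and 2: pick notable values from the profile and build predicates."""
    candidates = []
    for column, stats in profile.items():
        if stats.get("top_values"):
            # Filter on the most frequent value observed in the column.
            candidates.append((column, "=", stats["top_values"][0]))
        if "min" in stats and "max" in stats:
            # Look at the upper half of the observed numeric range.
            candidates.append((column, ">", (stats["min"] + stats["max"]) // 2))
    return candidates


def is_grounded(candidate, profile):
    """Step 3: keep only predicates that can match values seen in the profile."""
    column, op, value = candidate
    stats = profile.get(column, {})
    if op == "=":
        return value in stats.get("top_values", [])
    if op == ">":
        return "max" in stats and value < stats["max"]
    return False


def to_sql(table, candidate):
    """Turn a (column, operator, value) predicate into a counting query."""
    column, op, value = candidate
    return f"SELECT COUNT(*) FROM {table} WHERE {column} {op} {sql_literal(value)}"


def grounded_queries(table, profile, extra_candidates=()):
    """Generate candidate predicates, validate them, and return SQL strings."""
    candidates = generate_candidates(profile) + list(extra_candidates)
    return [to_sql(table, c) for c in candidates if is_grounded(c, profile)]


if __name__ == "__main__":
    extras = [("Tenure", ">", 500), ("Contract", "=", "Weekly")]
    for query in grounded_queries("telco_churn", PROFILE, extras):
        print(query)
```

In this sketch, a predicate such as Tenure > 500 is dropped because the profile records a maximum of 72, so the query could never return rows. Production code that builds SQL from values should prefer query parameters over string interpolation.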

Tips for maximizing the benefits of data insights

Grounded queries help ensure that the insights you gain are accurate, relevant, and actionable, enabling you to make better data-driven decisions. To make the most out of grounded queries using profile scan data, follow these tips:

  • Ensure that your table has up-to-date published profile scan data. This helps data insights to generate more accurate and relevant queries.
  • Review the generated queries to understand how they are grounded in the profile scan data. This lets you interpret the results and gain deeper insights into your data.
  • Adjust the profile scan settings of your table or provide additional context to data insights if the generated queries aren't relevant or useful.


Pricing

The Dataplex data insights feature is offered free of charge during Preview.

Limitations

  • Data insights is available for native BigQuery tables, BigLake tables, external tables, and views.
  • For multicloud customers, data from other clouds isn't available.
  • Data insights is available in all Dataplex regions.
  • Data insights doesn't support Geo or JSON column types.
  • An insights run isn't guaranteed to produce queries every time. To increase the likelihood of getting useful queries, run the insights pipeline again.
  • For tables with column-level access control (ACLs) and restricted user permissions, you can generate insights if you have read access to all columns of the table. To run the generated queries, you must have the respective permissions.

Before you begin

Required roles and permissions

To create, manage, and retrieve data insights, ask your administrator to grant you the following IAM roles:

To get read-only access to the generated insights, ask your administrator to grant you the following IAM role:

  • Dataplex DataScan DataViewer (roles/dataplex.dataScanDataViewer) on the project containing the BigQuery tables for which you want to view insights

To ensure that the Dataplex service account has the necessary permissions to read the published data profile scans, ask your administrator to grant the Dataplex Service Agent in your project the following IAM role. This helps ground the generated queries in real values present in the BigQuery table.

For more information about granting roles, see Manage access to service accounts.

You might also be able to get the required permissions through custom roles or other predefined roles.

Enable APIs

To use data insights, enable the following APIs in your project:

For more information about enabling the Cloud AI Companion API, see Enable the Cloud AI Companion API in a Google Cloud project.

Generate insights for a BigQuery table

To generate insights for a BigQuery table, access the table entry in Dataplex by using Dataplex Search.

  1. In the Google Cloud console, go to the Dataplex Search page.

  2. Search for the table entry in Dataplex.

  3. Click the Insights tab. If the tab is empty, insights haven't been generated for this table yet.

  4. To trigger the insights pipeline, click Generate insights.

    It takes 5-10 minutes for the insights to be populated.

  5. In the Insights tab, explore the generated queries and their descriptions.

  6. To execute a query, click Open in BigQuery. The query opens in BigQuery.

  7. To generate a new set of queries, click Generate insights and trigger the pipeline again.

Generate insights for a BigQuery external table

Dataplex data insights supports BigQuery external tables whose data is stored in Cloud Storage in the same Google Cloud project as the table. If the table references data that is stored in Cloud Storage in another Google Cloud project, insights generation fails.

To generate insights for a BigQuery external table, follow the instructions described in the Generate insights for a BigQuery table section of this document.

Generate insights for a BigLake table

To generate insights for a BigLake table, follow these steps:

  1. Enable the BigQuery Connection API in your project.

  2. Create a BigQuery connection. For more information, see Manage connections.

  3. Grant the Storage Object Viewer (roles/storage.objectViewer) IAM role to the service account corresponding to the BigQuery connection that you created.

    You can retrieve the service account ID from the connection information page.

  4. To generate insights, follow the instructions described in the Generate insights for a BigQuery table section of this document.

What's next