Get data insights with the Data agent

The Data agent is a Made by Google agent that provides insights from your BigQuery data. You don't need prior knowledge of SQL to use it. This lets you make well-informed, data-driven business decisions and frees up data analysts to focus on more complex tasks.

This page describes how a Google Cloud project administrator can authorize, create, and deploy the Data agent using the Google Cloud console and the REST API. This page also shows how an end user can use the agent.

Overview

The Data agent is designed to do the following:

  • Understand the user's intent: It analyzes the context of connected data sources and the user's natural language query to understand the user's goal.
  • Generate SQL: Based on this understanding, it converts the user's question into a syntactically and semantically correct SQL query.
  • Retrieve data: It then executes the generated SQL to fetch the relevant data directly from the connected data source, a BigQuery dataset.
  • Provide insights: It presents the retrieved data as visualizations, such as charts and tables, or as text-based summaries to answer the user's query.

Example queries that you can ask the Data agent

Here are some example queries that you can ask the Data agent:

  • Data aggregation and visualization:
    • "How did Q2 sales in the LATAM region this year compare to Q2 last year?"
    • "Plot a bar chart showing the comparison for each of the top 5 countries in the region."
  • Trend analysis:
    • "How has outgoing call volume varied over the last 6 months, broken out by location?"
    • "Analyze the booking patterns for the hotels in Lisbon rated higher than 3 stars."
  • Data mining:
    • "Which factors are correlated with the total sales value when a customer buys something? Give me a heatmap showing the relationship."
  • Analysis and reporting:
    • "Summarize the opportunities and accounts table and create a short report highlighting any key trends."

Before you begin

To start using the Data agent in Agentspace, follow these steps:

Grant access to BigQuery data

To enable the Data agent to view and query the BigQuery data, grant the Identity and Access Management (IAM) roles to the agent's users:

Workflow

The overall workflow to set up and use the Data agent is as follows:

Obtain authorization details

Follow these steps to set up authorization. The details you obtain are needed to authorize the Data agent to connect to the BigQuery data.

  1. In the Google Cloud console, go to the Credentials page in APIs & Services.

    Go to Credentials

  2. Select the Google Cloud project that contains the BigQuery dataset that you want the Data agent to query.

  3. Click Create credentials and select OAuth client ID.

  4. In Application type, select Web application.

  5. In the Authorized redirect URIs section, add the following URIs:

    • https://vertexaisearch.cloud.google.com/oauth-redirect
    • https://vertexaisearch.cloud.google.com/static/oauth/oauth.html
  6. Click Create.

  7. In the OAuth client created panel, click Download JSON. The downloaded JSON includes the following details for the selected Google Cloud project. You need these details to create an authorization resource:

    • Client ID: CLIENT_ID
    • Authorization URI: AUTHORIZATION_URI
    • Token URI: TOKEN_URI
    • Client secret: CLIENT_SECRET
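
The following is a minimal sketch of how you might read these four values from the downloaded file. It assumes that the JSON for a Web application client nests its fields under a top-level web key and names them client_id, client_secret, auth_uri, and token_uri. Verify this against your own download. The function name read_oauth_details and the sample values are hypothetical.

```python
import json


# Assumption: a "Web application" OAuth client file nests its fields under a
# top-level "web" key and uses the names client_id, client_secret, auth_uri,
# and token_uri. Check your own download before relying on this.
def read_oauth_details(json_text: str) -> dict:
    """Map the downloaded OAuth client JSON to the placeholders used on this page."""
    client = json.loads(json_text)["web"]
    return {
        "CLIENT_ID": client["client_id"],
        "CLIENT_SECRET": client["client_secret"],
        "AUTHORIZATION_URI": client["auth_uri"],
        "TOKEN_URI": client["token_uri"],
    }


if __name__ == "__main__":
    # Hypothetical file contents, for illustration only.
    sample = json.dumps({
        "web": {
            "client_id": "example-id.apps.googleusercontent.com",
            "client_secret": "example-secret",
            "auth_uri": "https://accounts.google.com/o/oauth2/auth",
            "token_uri": "https://oauth2.googleapis.com/token",
        }
    })
    for placeholder, value in read_oauth_details(sample).items():
        print(f"{placeholder}: {value}")
```

In practice you would pass the contents of the downloaded file instead of the sample string, and avoid printing the client secret to shared logs.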

Set up the Data agent using Google Cloud console

This section shows how to authorize, create, and deploy a Data agent instance using the Google Cloud console. You can also add user permissions that determine who can access the created agent.

Authorize and create a Data agent instance

Authorize and create a Data agent instance using these steps:

  1. In the Google Cloud console, go to Agentspace.

    Go to Agentspace

  2. Select an app in which you want to create the Data agent.

  3. In the menu, click Agents.

    The Agents page displays the existing agents.

  4. Click Add agent.

  5. In the Create agent pane, click Create in the Data agent card.

  6. In Authorizations, click Add authorization and enter the authorization details. For more information, see Obtain authorization details.

  7. Click Done.

  8. Click Next.

  9. Configure your agent as follows:

    1. Enter your agent name and description.
    2. In BigQuery dataset, click Browse and do one of the following:
      • Select an available dataset and click Select.
      • Enter the path to the required BigQuery dataset, click Search, select the dataset, and then click Select.
    3. Optional: Click Show more for advanced options.

    4. Select the correct table access options. If you want to impose an allowlist or a blocklist, specify the paths to the restricted tables.

    5. Optional: Define the natural language query configuration to customize how natural language is translated to SQL or Python code. You can also provide a SQL example that consists of a natural language query, its expected SQL output, and its expected response. This improves the quality of the agent's outputs.

      • Schema description: a string in natural language that describes the schema of the BigQuery dataset.
      • Natural language query to SQL prompt: a query in natural language transformed into a SQL instruction.
      • Natural language query to Python prompt: a query in natural language transformed into a Python instruction.
    6. Optional: Add examples of natural language queries transformed to SQL queries:

      • Query: an example of a natural language query that must be converted to a SQL query. For example, "What are the names and email addresses of the customers based in California?"
      • Expected SQL: a string that illustrates an example SQL query corresponding to the natural language query. For example, suppose that you have a BigQuery table named customers. Then, your expected SQL query can be SELECT customer_name, email FROM customers WHERE state = 'California'.
      • Expected response: a string that supplies the expected answer to the query, as produced by executing the expected SQL query. For example:
      Here are the names and email addresses of your customers in California: \
      * Customer name: Lara B, Email address: 222larabrown@gmail.com \
      * Customer name: Alex A, Email address: baklavainthebalkans@gmail.com \
      * Customer name: Bola C, Email address: cloudysanfrancisco@gmail.com \
      
  10. Click Create.

    The Data agent instance appears in the Agents list.
    To start working with the agent, wait until the Agent state column shows Enabled for your instance.

Set up the Data agent using REST API

This section describes how to authorize, create, and deploy a Data agent instance using the REST API.

Authorize the Data agent

As an administrator, create an authorization resource in Agentspace. This lets the Data agent access the BigQuery data.

  1. Create the authorization resource.

    REST

    The following sample shows how to create an authorization resource using the authorizations.create method.

    curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json" \
     -H "X-Goog-User-Project: PROJECT_NUMBER" \
     "https://discoveryengine.googleapis.com/v1alpha/projects/PROJECT_NUMBER/locations/LOCATION/authorizations?authorizationId=AUTHORIZATION_ID" \
     -d '{
       "name": "projects/PROJECT_NUMBER/locations/LOCATION/authorizations/AUTHORIZATION_ID",
       "serverSideOauth2": {
         "clientId": "CLIENT_ID",
         "clientSecret": "CLIENT_SECRET",
         "authorizationUri": "AUTHORIZATION_URI",
         "tokenUri": "TOKEN_URI"
       }
     }'
    

    Replace the following:

    • PROJECT_NUMBER: the number of your Google Cloud project.
    • LOCATION: the location of your Google Cloud project.
    • AUTHORIZATION_ID: an ID that you must provide to identify the authorization resource.
    • CLIENT_ID: the client ID that you obtained in Obtain authorization details.
    • CLIENT_SECRET: the client secret that you obtained in Obtain authorization details.
    • AUTHORIZATION_URI: the authorization URI that you obtained in Obtain authorization details.
    • TOKEN_URI: the token URI that you obtained in Obtain authorization details.
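
If you prefer to assemble the request in a script, the following sketch builds the same URL and JSON body as the curl sample. It only constructs the request and doesn't send it. The function name build_authorization_request and the example argument values are hypothetical.

```python
import json

API_ROOT = "https://discoveryengine.googleapis.com/v1alpha"


def build_authorization_request(project_number, location, authorization_id,
                                client_id, client_secret,
                                authorization_uri, token_uri):
    """Return (url, body) mirroring the curl sample for authorizations.create."""
    parent = f"projects/{project_number}/locations/{location}"
    url = f"{API_ROOT}/{parent}/authorizations?authorizationId={authorization_id}"
    body = {
        # Full resource name of the authorization being created.
        "name": f"{parent}/authorizations/{authorization_id}",
        "serverSideOauth2": {
            "clientId": client_id,
            "clientSecret": client_secret,
            "authorizationUri": authorization_uri,
            "tokenUri": token_uri,
        },
    }
    return url, body


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    url, body = build_authorization_request(
        "123456789", "global", "my-auth",
        "example-id", "example-secret",
        "https://accounts.google.com/o/oauth2/auth",
        "https://oauth2.googleapis.com/token",
    )
    print(url)
    print(json.dumps(body, indent=2))
```

The name value in the body follows the pattern shown in the curl sample, so you can keep it for later requests that ask for the authorization resource name.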

Create a Data agent instance

As a Google Cloud project administrator, you can create a Data agent instance. This requires the project ID and the dataset ID of the BigQuery data that you want to query using your agent.

REST

The following sample shows how to create a Data agent instance using the agents.create method. To learn about the advanced fields that you can add to this sample, see Add advanced configurations for the agent.

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  -H "X-Goog-User-Project: PROJECT_NUMBER" \
  "https://discoveryengine.googleapis.com/v1alpha/projects/PROJECT_NUMBER/locations/LOCATION/collections/default_collection/engines/APP_ID/assistants/default_assistant/agents" \
  -d '{
    "displayName": "AGENT_DISPLAY_NAME",
    "description": "AGENT_DESCRIPTION",
    "icon": {
       "uri": "AGENT_ICON_URI"
     },
    "managed_agent_definition": {
      "tool_settings": {
        "tool_description": "AGENT_DESCRIPTION"
      },
      "data_science_agent_config": {
        "bq_project_id": "BIGQUERY_PROJECT_ID",
        "bq_dataset_id": "BIGQUERY_DATASET_ID"
      }
    },
    "authorizations": [
      "AUTHORIZATION_RESOURCE_NAME"
    ]
  }'

Replace the following:

  • PROJECT_NUMBER: the number of your Google Cloud project.
  • LOCATION: the location of your Agentspace app.
  • APP_ID: the ID of the app.
  • AGENT_DISPLAY_NAME: the name of your Data agent instance.
  • AGENT_ICON_URI: an optional field to provide a URI for the agent's icon.
  • AGENT_DESCRIPTION: a description of your Data agent instance that indicates the agent's purpose or its BigQuery data source details.
  • BIGQUERY_PROJECT_ID: the project ID of the Google Cloud project that contains the BigQuery dataset.
  • BIGQUERY_DATASET_ID: the BigQuery dataset ID that contains the data to be queried.
  • AUTHORIZATION_RESOURCE_NAME: the authorization resource name that you obtained in the previous section.
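
As a scripted alternative, the following sketch assembles the URL and body of the agents.create request with the same field names as the curl sample. It treats the icon as optional, as described in the list above, and doesn't send the request. The function name build_agent_request and the example values are hypothetical.

```python
import json

API_ROOT = "https://discoveryengine.googleapis.com/v1alpha"


def build_agent_request(project_number, location, app_id,
                        display_name, description,
                        bq_project_id, bq_dataset_id,
                        authorization_name, icon_uri=None):
    """Return (url, body) mirroring the curl sample for agents.create."""
    url = (f"{API_ROOT}/projects/{project_number}/locations/{location}"
           f"/collections/default_collection/engines/{app_id}"
           "/assistants/default_assistant/agents")
    body = {
        "displayName": display_name,
        "description": description,
        "managed_agent_definition": {
            # The curl sample reuses the agent description here.
            "tool_settings": {"tool_description": description},
            "data_science_agent_config": {
                "bq_project_id": bq_project_id,
                "bq_dataset_id": bq_dataset_id,
            },
        },
        "authorizations": [authorization_name],
    }
    if icon_uri:
        # The icon is optional, so it is added only when a URI is given.
        body["icon"] = {"uri": icon_uri}
    return url, body


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    url, body = build_agent_request(
        "123456789", "global", "my-app",
        "Sales data agent", "Answers questions about the sales dataset",
        "my-bq-project", "sales",
        "projects/123456789/locations/global/authorizations/my-auth",
    )
    print(url)
    print(json.dumps(body, indent=2))
```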

Add advanced configurations for the agent

You can optionally define the nlQueryConfig field to provide customizations specific to natural language translation to SQL or Python code. You can also provide a SQL example using a natural language query, its expected SQL output, and its expected response. This improves the quality of the agent's outputs. The following code snippet shows how you can configure these advanced fields:

"dataScienceAgentConfig": {
  "nlQueryConfig": {
    "nl2sqlPrompt": "NL_TO_SQL_INSTRUCTIONS",
    "nl2pyPrompt": "NL_TO_PYTHON_INSTRUCTIONS",
    "nl2sqlExample": {
      "query": "EXAMPLE_NL_QUERY",
      "expectedSql": "EXPECTED_SQL_QUERY",
      "expectedResponse": "EXPECTED_SQL_RESPONSE"
    },
    "schemaDescription": "NL_DESCRIPTION_OF_BQ_DATASET"
  }
}

Replace the following:

  • NL_TO_SQL_INSTRUCTIONS: a query in natural language transformed into a SQL instruction.
  • NL_TO_PYTHON_INSTRUCTIONS: a query in natural language transformed into a Python instruction.
  • EXAMPLE_NL_QUERY: an example of a natural language query that must be converted to a SQL query. For example, "What are the names and email addresses of the customers based in California?"
  • EXPECTED_SQL_QUERY: a string that illustrates an example SQL query corresponding to the natural language query. For example, suppose that you have a BigQuery table named customers. Then, your expected SQL query can be "SELECT customer_name, email FROM customers WHERE state = 'California'".
  • EXPECTED_SQL_RESPONSE: a string that supplies the expected answer for the query and the expected SQL query. For example:

    Here are the names and email addresses of your customers in California: \
    * Customer name: Lara B, Email address: 222larabrown@gmail.com \
    * Customer name: Alex A, Email address: baklavainthebalkans@gmail.com \
    * Customer name: Bola C, Email address: cloudysanfrancisco@gmail.com \
    
  • NL_DESCRIPTION_OF_BQ_DATASET: a string in natural language that describes the schema of the BigQuery dataset.
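
The following sketch shows one way to assemble the nlQueryConfig fragment in a script. It assumes that fields you don't set can simply be left out, which matches the optional wording on this page but isn't stated explicitly. The function name build_nl_query_config and the prompt text are hypothetical. The example query and SQL are taken from this page.

```python
import json


def build_nl_query_config(schema_description=None, nl2sql_prompt=None,
                          nl2py_prompt=None, example=None):
    """Return {"nlQueryConfig": {...}} containing only the fields that are set.

    example is an optional (query, expected_sql, expected_response) tuple.
    """
    config = {}
    if nl2sql_prompt:
        config["nl2sqlPrompt"] = nl2sql_prompt
    if nl2py_prompt:
        config["nl2pyPrompt"] = nl2py_prompt
    if example:
        query, expected_sql, expected_response = example
        config["nl2sqlExample"] = {
            "query": query,
            "expectedSql": expected_sql,
            "expectedResponse": expected_response,
        }
    if schema_description:
        config["schemaDescription"] = schema_description
    return {"nlQueryConfig": config}


if __name__ == "__main__":
    fragment = build_nl_query_config(
        schema_description="One table, customers, with one row per customer.",
        nl2sql_prompt="Always filter on the state column by full state name.",
        example=(
            "What are the names and email addresses of the customers based in California?",
            "SELECT customer_name, email FROM customers WHERE state = 'California'",
            "Here are the names and email addresses of your customers in California: ...",
        ),
    )
    print(json.dumps(fragment, indent=2))
```

The returned dictionary corresponds to the contents of the dataScienceAgentConfig object in the snippet above, so you can merge it into the agent configuration that already holds the BigQuery project and dataset IDs.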

Deploy the Data agent instance

After creating the Data agent instance, as an administrator, you can deploy it so that the end users can use it.

REST

  1. Deploy the agent. The following sample shows how to deploy the created agent using the agents.deploy method. Deploying the agent is a long-running operation (LRO).

    curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json" \
     -H "X-Goog-User-Project: PROJECT_NUMBER" \
     "https://discoveryengine.googleapis.com/v1alpha/AGENT_RESOURCE_NAME:deploy" \
     -d '{
       "name":"AGENT_RESOURCE_NAME"
     }'
    

    Replace the following:

    • PROJECT_NUMBER: the number of your Google Cloud project.
    • AGENT_RESOURCE_NAME: the agent resource name that you obtained in the previous section when you created the agent.
  2. Get the status of the deployment operation. The following sample shows how to get the status of the deployment operation using the operations.get method.

    curl -X GET \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    "https://discoveryengine.googleapis.com/v1alpha/DEPLOY_OPERATION_NAME"
    

    Replace DEPLOY_OPERATION_NAME with the LRO name you obtained in the previous step when you deployed the agent.

    In the response, if the value of the done field is true, the deployment is completed. If the value of the done field is false, the deployment is in progress.
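
Because deployment is a long-running operation, you might check the status repeatedly until done is true. The following sketch shows such a polling loop. The fetch argument is a hypothetical callable that performs the GET request above and returns the parsed JSON. The handling of an error field is an assumption based on the standard long-running operation shape and isn't described on this page.

```python
import time


def wait_for_operation(fetch, interval_seconds=5.0, max_attempts=60,
                       sleep=time.sleep):
    """Call fetch() until the operation reports done, then return it.

    fetch: hypothetical callable that returns the operation as a dict.
    Raises RuntimeError if the finished operation carries an "error" field
    (assumed shape), and TimeoutError if it never finishes.
    """
    for attempt in range(max_attempts):
        operation = fetch()
        # A missing "done" field is treated the same as done == false.
        if operation.get("done", False):
            if "error" in operation:
                raise RuntimeError(f"Deployment failed: {operation['error']}")
            return operation
        if attempt < max_attempts - 1:
            sleep(interval_seconds)
    raise TimeoutError("Deployment did not finish within the allowed attempts")


if __name__ == "__main__":
    # Simulated responses, for illustration only: two pending, then done.
    fake_responses = iter([
        {"name": "DEPLOY_OPERATION_NAME", "done": False},
        {"name": "DEPLOY_OPERATION_NAME"},
        {"name": "DEPLOY_OPERATION_NAME", "done": True},
    ])
    result = wait_for_operation(lambda: next(fake_responses), interval_seconds=0)
    print("Deployment completed:", result["done"])
```

Passing the fetch step in as a callable keeps the loop independent of how you authenticate and send the request.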

Add or modify users and their permissions

To add or modify principals for your Data agent instance and assign them specific Identity and Access Management (IAM) roles, follow these steps:

Console

  1. In the Google Cloud console, go to Agentspace.

    Go to Agentspace

  2. Select an app that contains your Data agent instance.

  3. In the menu, click Agents.

    The Agents page displays the existing agents.

  4. Click the agent for which you want to add or modify users.

    By default, a newly created agent has no users.

  5. In the Permissioned users table, click Add user.

  6. Select a Member type from the available list.

  7. Enter the member's identity depending on the type and assign one or more roles.

    • For a user, group, or service account, the member string is an email.
    • For a domain, the member string is a valid domain name.
    • For a principal, the member string is a valid principal. For example, principal://iam.googleapis.com/locations/global/workforcePools/pool-1/subject/subject-1.
    • For a principal set, the member string is a valid principal set. For example, principalSet://iam.googleapis.com/locations/global/workforcePools/pool-1/group/group-1.
    • For domains and principal sets, all the identities and groups of users within those domains and principal sets are assigned the same roles. For secure access to the agent, select individual groups and controlled domains or principal sets, and assign them least-privilege roles.
  8. Click Save.

    The IAM policy is updated, and the user is added to the permissioned users list.

  9. To modify the assigned permissions, click Actions, select Modify, and do one of the following:

    • Modify the assigned roles.
    • Add a different role.
    • Click Delete to remove a role. You must assign at least one role to a user.
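
If you collect member strings in a script before entering them in the console, a loose format check can catch obvious mix-ups between member types. The following sketch encodes only the formats listed in step 7. The member type labels, the function name check_member_string, and the deliberately simple patterns are assumptions for illustration, and the console performs its own validation.

```python
import re

# Deliberately loose patterns: they catch obvious mistakes, not every invalid value.
EMAIL = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
DOMAIN = re.compile(r"^(?!-)[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+$")


def check_member_string(member_type: str, member: str) -> bool:
    """Return True if member has the general shape expected for member_type."""
    if member_type in ("user", "group", "service account"):
        return bool(EMAIL.match(member))
    if member_type == "domain":
        return bool(DOMAIN.match(member))
    if member_type == "principal":
        return member.startswith("principal://")
    if member_type == "principal set":
        return member.startswith("principalSet://")
    raise ValueError(f"Unknown member type: {member_type}")


if __name__ == "__main__":
    checks = [
        ("user", "user@example.com"),
        ("domain", "example.com"),
        ("principal",
         "principal://iam.googleapis.com/locations/global/workforcePools/pool-1/subject/subject-1"),
        ("principal set",
         "principalSet://iam.googleapis.com/locations/global/workforcePools/pool-1/group/group-1"),
    ]
    for member_type, member in checks:
        print(member_type, check_member_string(member_type, member))
```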

Change the working state of the Data agent instance

After you create the Data agent instance, the agent is enabled by default. You can change its working state to Disabled, Suspended, Enabled, or Deleted using these steps:

Console

  1. In the Google Cloud console, go to Agentspace.

    Go to Agentspace

  2. Select an app that contains your Data agent instance.

  3. In the menu, click Agents.

    The Agents page displays the existing agents.

  4. Click Actions for your agent and select one of the following:

    • Suspended: to make the agent temporarily unavailable for use. However, users with any level of permission to access the agent can still see the agent.
    • Disabled: to make the agent unavailable to all users except the user who created it.
    • Enabled: to make the agent available to all users with any level of permission to access the agent.
    • Delete: to delete the agent instance.

Use the Data agent

Follow these steps to get data insights using your agent:

App

  1. In the app navigation menu, click Agents.

  2. Click View all agents.

  3. Select your Data agent instance.

  4. If your agent requires additional authorization, click Authorize, and provide authorization details.

  5. Click Add files to include files as additional data sources for the Data agent to work with.

  6. Click Sources to select the sources that the Data agent must include to provide the most relevant data insights.

  7. Enter your questions or prompts and press Enter or click Submit.