为 BigQuery 数据源定义数据代理上下文

编写有效的系统指令可以为 Conversational Analytics API 数据智能体提供有用的上下文,以便回答有关数据源的问题。系统指令是一种由数据智能体所有者提供的创作上下文,可用于引导数据智能体的行为并优化 API 的回答。

本页面介绍了一种建议的结构来为连接到 BigQuery 数据的 Conversational Analytics API 数据智能体编写有效的提示。您可以使用 system_instruction 参数编写字符串形式的上下文,来定义这些提示。

本页面介绍了如何为基于 BigQuery 数据库的 BigQuery 数据源编写系统指令。

在系统指令中定义上下文

系统指令包含一系列关键组件和对象,可为数据智能体提供有关数据源的详细信息,以及有关智能体在回答问题时的角色的指导。您可以使用 system_instruction 参数以 YAML 格式的字符串形式向数据代理提供系统指令。

以下模板展示了一种建议的 YAML 结构来为 system_instruction 参数提供字符串,包括可用的键和预期的数据类型。

以下 YAML 模板展示了如何为 BigQuery 数据源构建系统指令。

- system_instruction: str # A description of the expected behavior of the agent. For example: You are a sales agent.
- tables: # A list of tables to describe for the agent.
    - table: # Details about a single table that is relevant for the agent.
        - name: str # The name of the table.
        - description: str # A description of the table.
        - synonyms: list[str] # Alternative terms for referring to the table.
        - tags: list[str] # Keywords or tags that are associated with the table.
        - fields: # Details about columns (fields) within the table.
            - field: # Details about a single column within the current table.
                - name: str # The name of the column.
                - description: str # A description of the column.
                - synonyms: list[str] # Alternative terms for referring to the column.
                - tags: list[str] # Keywords or tags that are associated with the column.
                - sample_values: list[str] # Sample values that are present within the column.
                - aggregations: list[str] # Commonly used or default aggregations for the column.
        - measures: # A list of calculated metrics (measures) for the table.
            - measure: # Details about a single measure within the table.
                - name: str # The name of the measure.
                - description: str # A description of the measure.
                - exp: str # The expression that is used to construct the measure.
                - synonyms: list[str] # Alternative terms for referring to the measure.
        - golden_queries: # A list of important or popular ("golden") queries for the table.
            - golden_query: # Details about a single golden query.
                - natural_language_query: str # The natural language query.
                - sql_query: str # The SQL query that corresponds to the natural language query.
        - golden_action_plans: # A list of suggested multi-step plans for answering specific queries.
            - golden_action_plan: # Details about a single action plan.
                - natural_language_query: str # The natural language query.
                - action_plan: # A list of the steps for this action plan.
                    - step: str # A single step within the action plan.
    - relationships: # A list of join relationships between tables.
        - relationship: # Details about a single join relationship.
            - name: str # The name of this join relationship.
            - description: str # A description of the relationship.
            - relationship_type: str # The join relationship type: one-to-one, one-to-many, many-to-one, or many-to-many.
            - join_type: str # The join type: inner, outer, left, right, or full.
            - left_table: str # The name of the left table in the join.
            - right_table: str # The name of the right table in the join.
            - relationship_columns: # A list of columns that are used for the join.
                - left_column: str # The join column from the left table.
                - right_column: str # The join column from the right table.
- glossaries: # A list of definitions for glossary business terms, jargon, and abbreviations.
    - glossary: # The definition for a single glossary item.
        - term: str # The term, phrase, or abbreviation to define.
        - description: str # A description or definition of the term.
        - synonyms: list[str] # Alternative terms for the glossary entry.
- additional_descriptions: # A list of any other general instructions or content.
    - text: str # Any additional general instructions or context not covered elsewhere.

使用 BigQuery 数据源的系统指令的关键组成部分示例

以下部分包含 BigQuery 中系统指令的关键组成部分示例。这些键包括:

system_instruction

使用 system_instruction 键定义智能体的角色及角色设定。此初始指令可为 API 的回答设定基调和风格,并帮助智能体理解其核心目标。

例如,您可以将智能体定义成一个虚构网店的销售分析师,如下所示:

- system_instruction: >-
    You are an expert sales analyst for a fictitious ecommerce store. You will answer questions about sales, orders, and customer data. Your responses should be concise and data-driven.

tables

tables 键为智能体提供了表说明的列表,并提供有关智能体可用来回答问题的特定数据的详细信息。此列表中的每个 table 对象都包含一个特定表的元数据,包括相应表的名称、说明、同义词、标记、字段、度量、黄金查询和黄金操作方案。以下 YAML 代码块显示了表 bigquery-public-data.thelook_ecommerce.orderstables 键的基本结构:

- tables:
    - table:
        - name: bigquery-public-data.thelook_ecommerce.orders
        - description: Data for customer orders in The Look fictitious e-commerce store.
        - synonyms:
            - sales
            - orders_data
        - tags:
            - ecommerce
            - transaction

fields

嵌套在 table 对象中的 fields 键接受字段对象列表作为输入来描述各列。并非所有字段都需要提供额外的上下文信息;不过,对于常用字段,添加额外的详细信息有助于提升智能体的性能。

以下 YAML 代码示例描述了 orders 表的关键字段,例如 order_idstatuscreated_atnum_of_itemsearnings

- tables:
    - table:
        - name: bigquery-public-data.thelook_ecommerce.orders
        - fields:
            - field:
                - name: order_id
                - description: The unique identifier for each customer order.
            - field:
                - name: user_id
                - description: The unique identifier for each customer.
            - field:
                - name: status
                - description: The current status of the order.
                - sample_values:
                    - complete
                    - shipped
                    - returned
            - field:
                - name: created_at
                - description: The timestamp when the order was created.
            - field:
                - name: num_of_items
                - description: The total number of items in the order.
                - aggregations:
                    - sum
                    - avg
            - field:
                - name: earnings
                - description: The sales amount for the order.
                - aggregations:
                    - sum
                    - avg

measures

嵌套在 table 对象中的 measures 键用于定义未直接以列的形式呈现在表中的自定义业务指标或计算。提供衡量指标上下文信息有助于智能体回答有关关键绩效指标 (KPI) 或其他计算值的问题。

例如,您可以将 profit 测量定义为收入减去成本的计算,如下所示:

- tables:
    - table:
        - name: bigquery-public-data.thelook_ecommerce.orders
        - measures:
            - measure:
                - name: profit
                - description: Raw profit (earnings minus cost).
                - exp: earnings - cost
                - synonyms: gains

golden_queries

嵌套在 table 对象中的 golden_queries 键接受 golden_query 对象列表作为输入。黄金查询有助于智能体针对您可定义的常见问题或重要问题提供更准确、更相关的回答。

例如,您可以按如下方式为 orders 表中的数据定义常见分析的黄金查询:

- tables:
    - table:
        - golden_queries:
            - golden_query:
                - natural_language_query: How many orders are there?
                - sql_query: SELECT COUNT(*) FROM sqlgen-testing.thelook_ecommerce.orders
            - golden_query:
                - natural_language_query: How many orders were shipped?
                - sql_query: >-
                    SELECT COUNT(*) FROM sqlgen-testing.thelook_ecommerce.orders
                    WHERE status = 'shipped'

golden_action_plans

嵌套在 table 对象中的 golden_action_plans 键接受 golden_action_plan 对象列表作为输入。您可以使用黄金操作方案为智能体提供有关如何处理多步请求的示例,例如提取数据,然后创建可视化图表。

例如,您可以定义一个按年龄段显示订单细分的行动计划,并包含有关 SQL 查询和可视化相关步骤的详细信息:

- tables:
    - table:
        - golden_action_plans:
            - golden_action_plan:
                - natural_language_query: Show me the number of orders broken down by age group.
                - action_plan:
                    - step: >-
                        Run a SQL query that joins the table
                        sqlgen-testing.thelook_ecommerce.orders and
                        sqlgen-testing.thelook_ecommerce.users to get a
                        breakdown of order count by age group.
                    - step: >-
                        Create a vertical bar plot using the retrieved data,
                        with one bar per age group.

relationships

relationships 键包含表之间联接关系的列表。通过定义联接关系,有助于智能体理解在回答问题时应如何联接来自多个表的数据。

例如,您可以按如下方式定义 bigquery-public-data.thelook_ecommerce.orders 表与 bigquery-public-data.thelook_ecommerce.users 表之间的 orders_to_user 关系:

- relationships:
    - relationship:
        - name: orders_to_user
        - description: >-
            Connects customer order data to user information with the user_id and id fields to allow an aggregated view of sales by customer demographics.
        - relationship_type: many-to-one
        - join_type: left
        - left_table: bigquery-public-data.thelook_ecommerce.orders
        - right_table: bigquery-public-data.thelook_ecommerce.users
        - relationship_columns:
            - left_column: user_id
            - right_column: id

glossaries

glossaries 键列出了与您的数据及应用场景相关的业务术语、行话和缩写的定义。通过提供词汇表定义,有助于智能体准确解读并回答使用特定业务语言的问题。

例如,您可以根据特定业务情境定义常见业务状态和“OMPF”等字词,如下所示:

- glossaries:
    - glossary:
        - term: complete
        - description: Represents an order status where the order has been completed.
        - synonyms: 'finish, done, fulfilled'
    - glossary:
        - term: shipped
        - description: Represents an order status where the order has been shipped to the customer.
    - glossary:
        - term: returned
        - description: Represents an order status where the customer has returned the order.
    - glossary:
        - term: OMPF
        - description: Order Management and Product Fulfillment

additional_descriptions

additional_descriptions 键列出了系统指令中未涵盖的任何其他一般性说明或上下文信息。通过提供更多说明,有助于智能体更好地理解您的数据及应用场景的上下文信息。

例如,您可以使用 additional_descriptions 键提供有关组织的信息,如下所示:

- additional_descriptions:
    - text: All the sales data pertains to The Look, a fictitious ecommerce store.
    - text: 'Orders can be of three categories: food, clothes, and electronics.'

示例:BigQuery 中的系统指令

以下示例展示了虚构的销售分析师智能体的示例系统指令:

- system_instruction: >-
    You are an expert sales analyst for a fictitious ecommerce store. You will answer questions about sales, orders, and customer data. Your responses should be concise and data-driven.
- tables:
    - table:
        - name: bigquery-public-data.thelook_ecommerce.orders
        - description: Data for orders in The Look, a fictitious ecommerce store.
        - synonyms: sales
        - tags: 'sale, order, sales_order'
        - fields:
            - field:
                - name: order_id
                - description: The unique identifier for each customer order.
            - field:
                - name: user_id
                - description: The unique identifier for each customer.
            - field:
                - name: status
                - description: The current status of the order.
                - sample_values:
                    - complete
                    - shipped
                    - returned
            - field:
                - name: created_at
                - description: >-
                    The date and time at which the order was created in timestamp
                    format.
            - field:
                - name: returned_at
                - description: >-
                    The date and time at which the order was returned in timestamp
                    format.
            - field:
                - name: num_of_items
                - description: The total number of items in the order.
                - aggregations: 'sum, avg'
            - field:
                - name: earnings
                - description: The sales revenue for the order.
                - aggregations: 'sum, avg'
            - field:
                - name: cost
                - description: The cost for the items in the order.
                - aggregations: 'sum, avg'
        - measures:
            - measure:
                - name: profit
                - description: Raw profit (earnings minus cost).
                - exp: earnings - cost
                - synonyms: gains
        - golden_queries:
            - golden_query:
                - natural_language_query: How many orders are there?
                - sql_query: SELECT COUNT(*) FROM sqlgen-testing.thelook_ecommerce.orders
            - golden_query:
                - natural_language_query: How many orders were shipped?
                - sql_query: >-
                    SELECT COUNT(*) FROM sqlgen-testing.thelook_ecommerce.orders
                    WHERE status = 'shipped'
        - golden_action_plans:
            - golden_action_plan:
                - natural_language_query: Show me the number of orders broken down by age group.
                - action_plan:
                    - step: >-
                        Run a SQL query that joins the table
                        sqlgen-testing.thelook_ecommerce.orders and
                        sqlgen-testing.thelook_ecommerce.users to get a
                        breakdown of order count by age group.
                    - step: >-
                        Create a vertical bar plot using the retrieved data,
                        with one bar per age group.
    - table:
        - name: bigquery-public-data.thelook_ecommerce.users
        - description: Data for users in The Look, a fictitious ecommerce store.
        - synonyms: customers
        - tags: 'user, customer, buyer'
        - fields:
            - field:
                - name: id
                - description: The unique identifier for each user.
            - field:
                - name: first_name
                - description: The first name of the user.
                - tag: person
                - sample_values: 'alex, izumi, nur'
            - field:
                - name: last_name
                - description: The first name of the user.
                - tag: person
                - sample_values: 'warmer, stilles, smith'
            - field:
                - name: age_group
                - description: The age demographic group of the user.
                - sample_values:
                    - 18-24
                    - 25-34
                    - 35-49
                    - 50+
            - field:
                - name: email
                - description: The email address of the user.
                - tag: contact
                - sample_values: '222larabrown@gmail.com, cloudysanfrancisco@gmail.com'
        - golden_queries:
            - golden_query:
                - natural_language_query: How many unique customers are there?
                - sql_query: >-
                    SELECT COUNT(DISTINCT id) FROM
                    bigquery-public-data.thelook_ecommerce.users
            - golden_query:
                - natural_language_query: How many users in the 25-34 age group have a cymbalgroup email address?
                - sql_query: >-
                    SELECT COUNT(DISTINCT id) FROM
                    bigquery-public-data.thelook_ecommerce.users WHERE users.age_group =
                    '25-34' AND users.email LIKE '%@cymbalgroup.com';
    - relationships:
        - relationship:
            - name: orders_to_user
            - description: >-
                Connects customer order data to user information with the user_id and id fields to allow an aggregated view of sales by customer demographics.
            - relationship_type: many-to-one
            - join_type: left
            - left_table: bigquery-public-data.thelook_ecommerce.orders
            - right_table: bigquery-public-data.thelook_ecommerce.users
            - relationship_columns:
                - left_column: user_id
                - right_column: id
- glossaries:
    - glossary:
        - term: complete
        - description: Represents an order status where the order has been completed.
        - synonyms: 'finish, done, fulfilled'
    - glossary:
        - term: shipped
        - description: Represents an order status where the order has been shipped to the customer.
    - glossary:
        - term: returned
        - description: Represents an order status where the customer has returned the order.
    - glossary:
        - term: OMPF
        - description: Order Management and Product Fulfillment
- additional_descriptions:
    - text: All the sales data pertains to The Look, a fictitious ecommerce store.
    - text: 'Orders can be of three categories: food, clothes, and electronics.'