Use AlloyDB AI natural language to generate SQL


This tutorial describes how to set up and use the AlloyDB AI natural language API using the Google Cloud console. You learn how to configure the AlloyDB AI natural language API so that you can ask natural language questions and receive SQL queries and results.

Objectives

  • Create and populate tables, and use auto generation to create context.
  • Create a value index for the columns in the database.
  • Create and configure a natural language configuration (nl_config) object.
  • Create templates for a sample query in the application.
  • Use the get_sql() function to produce a SQL query that answers a question.
  • Use the execute_nl_query() function to answer a natural language question using the database.

Costs

In this document, you use the following billable components of Google Cloud:

To generate a cost estimate based on your projected usage, use the pricing calculator. New Google Cloud users might be eligible for a free trial.

When you finish the tasks that are described in this document, you can avoid continued billing by deleting the resources that you created. For more information, see Clean up.

Before you begin

Request access

Before you can use AlloyDB AI natural language to generate SQL, you must request access to AlloyDB AI natural language and wait until you receive the enablement confirmation before you follow the instructions in this tutorial.

Enable billing and required APIs

  1. In the Google Cloud console, select a project.

    Go to project selector

  2. Make sure that billing is enabled for your Google Cloud project.

  3. Enable the Cloud APIs necessary to create and connect to AlloyDB for PostgreSQL.

    Enable the API

    1. In the Confirm project step, click Next to confirm the name of the project you are going to make changes to.
    2. In the Enable APIs step, click Enable to enable the following:

      • AlloyDB API

Create and connect to a database

  1. Create a cluster and its primary instance.
  2. Connect to your instance and create a database.
  3. Enable Vertex AI integration. For more information, see Integrate with Vertex AI.

Install the required extension

To install the alloydb_ai_nl extension, which is the AlloyDB AI natural language support API, run the following query:

CREATE EXTENSION google_ml_integration;
CREATE EXTENSION alloydb_ai_nl cascade;

Create the nla_demo schema and tables

In the following steps, you create the nla_demo schema and tables in the schema. You populate the tables with synthetic data. The provided schema and data are designed to support the fundamental operations of an online retail business, with potential applications extending to customer management, analytics, marketing, and operational aspects.

The sample data shows how you can use AlloyDB AI natural language for development, testing, and demonstration purposes, particularly for features like natural language interfaces.

  1. Create the schema by running the following query:

    CREATE SCHEMA nla_demo;
    
  2. Create tables in the nla_demo schema. The addresses table stores the address information for customers and orders.

    CREATE TABLE nla_demo.addresses (
        address_id      SERIAL         PRIMARY KEY,
        street_address  VARCHAR(255)   NOT NULL,
        city            VARCHAR(255)   NOT NULL,
        country         VARCHAR(255)
    );
    
  3. Create the customers table by running the following query. This table stores customer information including the customer ID, name, contact details, address reference, birth date, and record creation time.

    CREATE TABLE nla_demo.customers (
        customer_id     SERIAL         PRIMARY KEY,
        first_name      VARCHAR(255)   NOT NULL,
        last_name       VARCHAR(255)   NOT NULL,
        email           VARCHAR(255)   UNIQUE NOT NULL,
        address_id      INTEGER        REFERENCES nla_demo.addresses(address_id),
        date_of_birth   DATE,
        created_at      TIMESTAMP      DEFAULT CURRENT_TIMESTAMP
    );
    
  4. Create the categories table, which stores product categories.

    CREATE TABLE nla_demo.categories (
        category_id     INTEGER        PRIMARY KEY,
        category_name   VARCHAR(255)   UNIQUE NOT NULL
    );
    
  5. Create the products table. This table stores product information like the product ID, name, description, brand, category linkage, and record creation time.

    CREATE TABLE nla_demo.products (
        product_id    INTEGER        PRIMARY KEY,
        name          VARCHAR(255)   NOT NULL,
        description   TEXT           DEFAULT 'Not available',
        brand_id      INTEGER,
        category_id   INTEGER        REFERENCES nla_demo.categories(category_id),
        created_at    TIMESTAMP      DEFAULT CURRENT_TIMESTAMP
    );
    
  6. Create the orders table. This table stores information about customer orders, including the customer, date, total amount, shipping and billing addresses, and order status.

    CREATE TABLE nla_demo.orders (
        order_id            INTEGER        PRIMARY KEY,
        customer_id         INTEGER        REFERENCES nla_demo.customers(customer_id),
        order_date          TIMESTAMP      DEFAULT CURRENT_TIMESTAMP,
        total_amount        DECIMAL(10, 2) NOT NULL,
        shipping_address_id INTEGER        REFERENCES nla_demo.addresses(address_id),
        billing_address_id  INTEGER        REFERENCES nla_demo.addresses(address_id),
        order_status        VARCHAR(50)
    );
    
  7. Create the order_items table. This table records individual items in an order, links to the order and product variant, and specifies quantity and price.

    CREATE TABLE nla_demo.order_items (
        order_item_id   SERIAL         PRIMARY KEY,
        order_id        INTEGER        REFERENCES nla_demo.orders(order_id),
        product_id      INTEGER        REFERENCES nla_demo.products(product_id),
        quantity        INTEGER        NOT NULL,
        price           DECIMAL(10, 2) NOT NULL
    );
    

Populate tables in the nla_demo schema

  1. Populate the addresses table by running the following query:

    INSERT INTO nla_demo.addresses (street_address, city, country)
    VALUES
        ('1800 Amphibious Blvd', 'Mountain View', 'USA'),
        ('Avenida da Pastelaria, 1903', 'Lisbon', 'Portugal'),
        ('8 Rue du Nom Fictif 341', 'Paris', 'France');
    
  2. Populate the customers table.

    INSERT INTO nla_demo.customers (first_name, last_name, email, address_id, date_of_birth)
    VALUES
        ('Alex', 'B.', 'alex.b@example.com', 1, '2003-02-20'),
        ('Amal', 'M.', 'amal.m@example.com', 2, '1998-11-08'),
        ('Dani', 'G.', 'dani.g@example.com', 3, '2002-07-25');
    
  3. Populate the categories table.

    INSERT INTO nla_demo.categories (category_id, category_name)
    VALUES
        (1, 'Accessories'),
        (2, 'Apparel'),
        (3, 'Footwear'),
        (4, 'Swimwear');
    
  4. Populate the products table.

    INSERT INTO nla_demo.products (product_id, brand_id, category_id, name)
    VALUES
        (1, 1, 2, 'Hoodie'),
        (2, 1, 3, 'Running Shoes'),
        (3, 2, 4, 'Swimsuit'),
        (4, 3, 1, 'Tote Bag');
    
  5. Populate the orders table.

    INSERT INTO nla_demo.orders (order_id, customer_id, total_amount, shipping_address_id, billing_address_id, order_status)
    VALUES
        (1, 1, 99.99, 1, 1, 'Shipped'),
        (2, 1, 69.99, 1, 1, 'Delivered'),
        (3, 2, 20.99, 2, 2, 'Processing'),
        (4, 3, 79.99, 3, 3, 'Shipped');
    
  6. Populate the order_items table.

    INSERT INTO nla_demo.order_items (order_id, product_id, quantity, price)
    VALUES
        (1, 1, 1, 79.99),
        (1, 3, 1, 20.00),
        (2, 4, 1, 69.99),
        (3, 3, 1, 20.00),
        (4, 2, 1, 79.99);
    

Create a natural language configuration

To use AlloyDB AI natural language, make sure that the Vertex AI endpoint is configured. Then you create a configuration and register a schema. g_alloydb_ai_nl.g_create_configuration creates the model.

  1. Create a natural language configuration.

    SELECT alloydb_ai_nl.g_create_configuration( 'nla_demo_cfg' );
    

Create and apply context for tables and columns

To provide accurate answers to natural language questions, you use the AlloyDB AI natural language API to provide context about tables, views, and columns. You can use the automated context generation feature of the AlloyDB AI natural language API to produce context from tables and columns, and apply the context as COMMENTS attached to tables, views, and columns.

  1. To generate schema contexts for the tables and their columns that are registered in the nla_demo_cfg configuration, run the following:

    SELECT alloydb_ai_nl.generate_schema_context(
      'nla_demo_cfg',
      TRUE
    );
    

    The preceding query populates the alloydb_ai_nl.generated_schema_context_view view with context. Passing TRUE overwrites the context in this view from previous runs.

  2. To verify the generated context for the nla_demo.products table, run the following query:

    SELECT object_context
    FROM alloydb_ai_nl.generated_schema_context_view
    WHERE schema_object = 'nla_demo.products';
    

    The resulting context is similar to the following:

    The products table stores information about products, including their name,
    a brief description, the brand they belong to (referenced by brand_id),
    and the category they fall under (referenced by category_id). Each product
    has a unique identifier (product_id) and a timestamp indicating its creation
    time (created_at).
    
  3. To verify the produced context for a column, such as nla_demo.products.name, run the following:

    SELECT object_context
    FROM alloydb_ai_nl.generated_schema_context_view
    WHERE schema_object = 'nla_demo.products.name';
    

    The query output is similar to the following:

    The name column in the nla_demo.products table contains the specific
    name or title of each product. This is a short, descriptive text string
    that clearly identifies the product, like "Hoodie," "Tote Bag,"
    "Running Shoes," or "Swimsuit." It helps distinguish individual products
    within the broader context of their brand and category. The name column
    specifies the exact product. This column is essential for users and
    systems to identify and refer to specific products within the database.
    
  4. Review the generated context in the alloydb_ai_nl.generated_schema_context_view view, and update the context that needs revision.

    SELECT alloydb_ai_nl.update_generated_relation_context(
      'nla_demo.products',
      'The "nla_demo.products" table stores product details such as ID, name, description, brand, category linkage, and record creation time.'
    );
    
    SELECT alloydb_ai_nl.update_generated_column_context(
      'nla_demo.products.name',
      'The "name" column in the "nla_demo.products" table contains the specific name or title of each product.'
    );
    
  5. Apply the generated context that will be attached to the corresponding objects:

    SELECT alloydb_ai_nl.apply_generated_relation_context(
      'nla_demo.products', true
    );
    
    SELECT alloydb_ai_nl.apply_generated_column_context(
      'nla_demo.products.name',
      true
    );
    

    The resulting context entries in the alloydb_ai_nl.generated_schema_context_view view are applied to the corresponding schema objects, and the comments are overwritten.

Construct the value index

The AlloyDB AI natural language API produces accurate SQL queries by using value linking. Value linking associates value phrases in natural language statements with pre-registered concept types and column names which can enrich the natural language question.

For example, the question "Give me the price of a Hoodie" can be answered more accurately if Hoodie is associated with a product_name concept, which is associated with the nla_demo.products.name. column.

  1. To define the product_name concept type and associate it with the nla_demo.products.name column, run the following queries:

    SELECT alloydb_ai_nl.add_concept_type(
        concept_type_in => 'product_name',
        match_function_in => 'alloydb_ai_nl.get_concept_and_value_generic_entity_name',
        additional_info_in => '{
          "description": "Concept type for product name.",
          "examples": "SELECT alloydb_ai_nl.get_concept_and_value_generic_entity_name(''Camera'')" }'::jsonb
    );
    SELECT alloydb_ai_nl.associate_concept_type(
        'nla_demo.products.name',
        'product_name'
    );
    
  2. To verify that the product_name concept type is added to the list of concept types, run the following query to ensure that product_name is included in the result of this query:

    SELECT alloydb_ai_nl.list_concept_types();
    
  3. To verify that the nla_demo.products.name column is associated with the product_name concept type, run the following query:

    SELECT *
    FROM alloydb_ai_nl.value_index_columns
    WHERE column_names = 'nla_demo.products.name';
    
  4. After you define the concept types and associate columns with them, create a value index.

    SELECT alloydb_ai_nl.create_value_index('nla_demo_cfg');
    SELECT alloydb_ai_nl.refresh_value_index('nla_demo_cfg');
    

Define a query template

You can define templates to improve the quality of the answers produced by the AlloyDB AI natural language API.

  1. To provide example templates for business critical questions, and to provide anticipated questions for which high accuracy is expected, run the following query to add a template:

    SELECT alloydb_ai_nl.add_template(
        nl_config_id => 'nla_demo_cfg',
        intent => 'List the first names and the last names of all customers who ordered Swimsuit.',
        sql => 'SELECT c.first_name, c.last_name FROM nla_demo.Customers c JOIN nla_demo.orders o ON c.customer_id = o.customer_id JOIN nla_demo.order_items oi ON o.order_id = oi.order_id JOIN nla_demo.products p ON oi.product_id = p.product_id  AND p.name = ''Swimsuit''',
        sql_explanation => 'To answer this question, JOIN `nla_demo.Customers` with `nla_demo.orders` on having the same `customer_id`, and JOIN the result with nla_demo.order_items on having the same `order_id`. Then JOIN the result with `nla_demo.products` on having the same `product_id`, and filter rwos that with p.name = ''Swimsuit''. Return the `first_name` and the `last_name` of the customers with matching records.',
        check_intent => TRUE
    );
    
  2. To view the list of added templates, query the alloydb_ai_nl.template_store_view:

    SELECT nl, sql, intent, psql, pintent
    FROM alloydb_ai_nl.template_store_view
    WHERE config = 'nla_demo_cfg';
    

    The following output is returned:

    nl      | List the first names and the last names of all customers who ordered Swimsuit.
    sql     | SELECT c.first_name, c.last_name
            | FROM nla_demo.Customers c
            | JOIN nla_demo.orders o ON c.customer_id = o.customer_id
            | JOIN nla_demo.order_items oi ON o.order_id = oi.order_id
            | JOIN nla_demo.products p ON oi.product_id = p.product_id
            | AND p.name = 'Swimsuit'
    intent  | List the first names and the last names of all customers who ordered
            | Swimsuit.
    psql    | SELECT c.first_name, c.last_name
            | FROM nla_demo.Customers c JOIN nla_demo.orders o
            | ON c.customer_id = o.customer_id 
            | JOIN nla_demo.order_items oi ON o.order_id = oi.order_id
            | JOIN nla_demo.products p ON oi.product_id = p.product_id
            | AND p.name = $1
    pintent | List the first names and the last names of all customers who ordered
            | $1.
    

    In this template, the value corresponding to the psql attribute is the parameterized SQL query, and the value for the pintent column is the parameterized intent statement. The id of a recently added template can be different, based on the templates that were added before. Templates are used to provide highly accurate answers to questions.

Generate SQL results from natural language questions

  1. To use the AlloyDB AI natural language API to produce SQL queries and result sets, run the following query:

    SELECT
        alloydb_ai_nl.get_sql(
            'nla_demo_cfg',
            'Find the customers who purchased Tote Bag.'
        ) ->> 'sql';
    

    The following output is returned:

    SELECT DISTINCT "c"."first_name", "c"."last_name"
    FROM "nla_demo"."customers" AS "c"
    JOIN "nla_demo"."orders" AS "o" ON "c"."customer_id" = "o"."customer_id"
    JOIN "nla_demo"."order_items" AS "oi" ON "o"."order_id" = "oi"."order_id"
    JOIN "nla_demo"."products" AS "p" ON "oi"."product_id" = "p"."product_id"
    WHERE "p"."name" = 'Tote Bag';
    

    The JSON output is a SQL query using the template that you added in Define a query template.

  2. To use the AlloyDB AI natural language API to produce the result of a question, run the following query:

    SELECT
    alloydb_ai_nl.execute_nl_query(
        'Find the last name of the customers who live in Lisbon.',
        'nla_demo_cfg'
    );
    

    The following output is returned:

    execute_nl_query     
    --------------------------
    {"last_name":"M."}
    

Clean up

To avoid incurring charges to your Google Cloud account for the resources used in this tutorial, either delete the project that contains the resources, or keep the project and delete the individual resources.

The following sections describe how to delete these resources and objects.

Delete the cluster

When you delete the cluster that you created in Before you begin, all of the objects that you created are also deleted.

  1. In the Google Cloud console, go to the Clusters page.

    Go to Clusters

  2. Click the name of your cluster, my-cluster, in the Resource name column.

  3. Click Delete cluster.

  4. In Delete cluster my-cluster, enter my-cluster to confirm you want to delete your cluster.

  5. Click Delete.

  6. If you created a private connection when you created a cluster, go to the Google Cloud console VPC networks page and click Delete VPC network.

Delete the objects

You can choose to keep the resources that you set up in Before you begin, and you can delete just the objects that you created in the Google Cloud project.

  1. To remove the template that you defined in Define a query template, run the following query:

    SELECT alloydb_ai_nl.drop_template(id)
    FROM alloydb_ai_nl.template_store_view
    WHERE config = 'nla_demo_cfg';
    
  2. To remove the product_name concept type that you defined in Construct the value index, run the following query:

    SELECT alloydb_ai_nl.drop_concept_type('product_name');
    
  3. To refresh the value index after you remove the product_name concept type, run the following query:

    SELECT alloydb_ai_nl.refresh_value_index();
    
  4. To remove the nla_demo_cfg configuration that you created in Create a natural language configuration, run the following query:

    SELECT
    alloydb_ai_nl.g_manage_configuration(
        'drop_configuration',
        'nla_demo_cfg'
    );
    
  5. To remove the nla_demo schema and tables that you created and populated in Create the nla_demo schema and tables and Populate tables in the nla_demo schema, run the following query:

    DROP SCHEMA nla_demo CASCADE;
    

What's next