Generate SQL queries using natural language questions

This page describes how to set up, configure, and generate SQL statements using AlloyDB AI natural language. Natural language lets you create user-facing generative AI applications using natural language to query databases.

To enable the alloydb_ai_nl extension, which is the AlloyDB for PostgreSQL natural language support API, you perform the following high-level steps:

  1. Install the alloydb_ai_nl extension.
  2. Define a natural language configuration for your application.
  3. Register a schema.
  4. Add context.
  5. Add query templates.
  6. Define concept types and create a value index.
  7. Generate SQL statements using a natural language interface.

Before you begin

  • Request access to AlloyDB AI natural language and wait until you receive the enablement confirmation before you follow the instructions on this page.
  • Understand how to connect to the AlloyDB database and run PostgreSQL commands. For more information, see Connection overview.
  • Populate the database with the data and schema that the end user wants to access.

Create a cluster and enable Vertex AI integration

  1. Create an AlloyDB cluster and instance. You use the AlloyDB instance to create the application database and the schema.
  2. Enable Vertex AI integration. For more information, see Integrate with Vertex AI.

Required roles

To install the alloydb_ai_nl extension and grant access to other users, you must have the following Identity and Access Management (IAM) role in the Google Cloud project that you're using:

For more information, see Manage PostgreSQL users with standard authentication.

Prepare your environment

To prepare to generate natural language queries, you must install the required extension, create a configuration, and register a schema.

Install the alloydb_nl_ai extension

The alloydb_ai_nl extension uses the google_ml_integration extension, which interacts with large language models (LLM), including Gemini models on Vertex AI.

To install the alloydb_ai_nl extension, connect to the database and run the following command, which enables the google_ml_integration and alloydb_ai_nl extensions.

CREATE EXTENSION google_ml_integration;
CREATE EXTENSION alloydb_ai_nl cascade;

Create a natural language configuration and register a schema

AlloyDB AI natural language uses nl_config to associate applications to certain schemas, query templates, and model endpoints. nl_config is a configuration that associates an application to schema, templates, and other contexts. A large application can also use different configurations for different parts of the application, as long as you specify the right configuration when a question is sent from that part of the application. You can register an entire schema, or you can register specific schema objects, like tables, views, and columns.

  1. To create a natural language configuration, use the following example:

    SELECT
      alloydb_ai_nl.g_create_configuration(
        'my_app_config'        -- configuration_id
      );
    

    'gemini-2.0-flash:generateContent' is the model endpoint.

  2. Register a schema for a specified configuration using the following example:

    SELECT
      alloydb_ai_nl.g_manage_configuration(
        operation => 'register_schema',
        configuration_id_in => 'my_app_config',
        schema_names_in => '{my_schema}'
      );
    

Add context

Context includes any kind of information that you can use to answer an end user question. Context includes schema structure and relationships, summaries and descriptions of columns, column values and their semantics, and rules or statements of business logic specific to the application or domain.

Add general context for application-specific rules

General context items include application-specific rules, business logic statements, or any application- and domain-specific terminology that isn't linked to a specific schema object.

To add general context for application-specific rules and application or domain-specific terminology, follow these steps:

  1. To add a general context item for the specified configuration, use the following example:

    SELECT
      alloydb_ai_nl.g_manage_configuration(
        'add_general_context',
        'my_app_config',
        general_context_in => '{"If the user asks for a good seat, assume that means a window or aisle seat."}'
      );
    

    The preceding statement helps AlloyDB AI natural language provide higher quality responses to users' natural language questions.

  2. To view the general contexts for the specified configuration, run the following statement:

    SELECT alloydb_ai_nl.list_general_context(nl_config TEXT);
    

Generate and review schema context

Schema context describes schema objects including tables, views, materialized views, and columns. This context is stored as the COMMENT of each schema object.

  1. To generate contexts for schema objects, call the following APIs. For best results, make sure that the database tables contain representative data.

    -- For all schema objects (tables, views, materialized views and columns)
    -- within the scope of a provided nl_config.
    SELECT
      alloydb_ai_nl.generate_schema_context(
        'my_app_config' -- nl_config
      );
    
  2. Review the generated schema contexts by running the following statement.

    SELECT schema_object, object_context
    FROM alloydb_ai_nl.generated_schema_context_view;
    

    The generated schema contexts are stored in the preceding view.

  3. Optional: Update the generated schema contexts.

    SELECT
      alloydb_ai_nl.update_generated_relation_context(
        'my_schema.my_table',
        'This table contains archival records, if you need latest records use records_new table.'
      );
    
    SELECT
      alloydb_ai_nl.update_generated_column_context(
        'my_schema.my_table.column1',
        'The seat_class column takes single letters like "E" for economy, "P" for premium economy, "B" for business and "F" for First.'
      );
    
  4. Apply the context. When you apply the context, it takes effect immediately and is deleted from the generated_schema_context_view.

    -- For all schema objects (tables, views, materialized views and columns) 
    -- within the scope of nl_config.
    SELECT 
      alloydb_ai_nl.apply_generated_schema_context(
        'my_app_config' --nl_config
      );
    
  5. Optional: Verify the generated context. The following API lets you check the schema contexts, which are used when you generate SQL statements.

    -- For table, view or materialized view.
    SELECT
      alloydb_ai_nl.get_relation_context(
        'my_schema.my_table'
      );
    
    -- For column.
    SELECT
      alloydb_ai_nl.get_column_context(
        'my_schema.my_table.column1'
      );
    
  6. Optional: Manually set the schema context.

    -- For table, view or materialized view.
    SELECT 
      alloydb_ai_nl.set_relation_context(
        'my_schema.my_table',
        'One-to-many mapping from product to categories'
      );
    
    -- For column.
    SELECT 
      alloydb_ai_nl.set_column_context(
        'my_schema.my_table.column1',
        'This column provides additional tagged info for the product in  Json format, e.g., additional color or size information of the product - tags: { "color": "red", "size": "XL"}'
      );
    

Create query templates

To enhance the quality of generative AI applications built with LLMs, you can add templates. A query template is a curated set of representative or common natural language questions, with corresponding SQL queries, as well as explanations to provide a declarative rationale for the natural language-to-SQL (NL2SQL) generation. Templates are primarily intended to be specified by the application, but templates can also be automatically generated by the alloydb_ai_nl extension based on frequently used SQL queries. Each template must be associated with an nl_config.

The alloydb_ai_nl extension uses a template_store to dynamically incorporate relevant SQL templates in the process of generating a SQL statement to answer the user's question. The template_store identifies templates with similar intentions to the natural language question asked, identifies the corresponding parameterized SQL statement, and synthesizes a SQL statement by instantiating parameters with values from the natural language question. However, if there isn't a template with the same intention as the question that the user asked, alloydb_ai_nl uses every relevant template and context to compose a SQL statement.

You add templates by specifying the question (using a parameter named intent) and the SQL query.

To add a template to the template store, run the following statement:

SELECT
  alloydb_ai_nl.add_template(
    nl_config => 'my_app_config',
    intent => 'How many accounts associated with loans are located in the Prague region?',
    sql => 'SELECT COUNT(T1.account_id)
            FROM bird_dev_financial.account AS T1
            INNER JOIN bird_dev_financial.loan AS T2
              ON T1.account_id = T2.account_id
            INNER JOIN bird_dev_financial.district AS T3
              ON T1.district_id = T3.district_id
            WHERE T3."A3" = ''Prague''',
    check_intent => TRUE
  );

When check_intent is TRUE, alloydb_ai_nl performs a semantic check to confirm that the provided intent matches the passed-in SQL statement. If the intent doesn't match the SQL statement, the template isn't added.

Autogenerate templates

After you have a representative dataset in your tables, we recommend that you run SQL queries that correspond to the common questions that your end users are likely to ask. It's important that you ensure that the queries have good query plans and that they perform well.

After you run the queries, AlloyDB AI natural language can automatically generate templates based on the query history. You can call the following APIs to generate templates. You need to review and apply the generated templates before they take effect.

Template auto generation is based on the most frequently used queries in the query log, google_db_advisor_workload_statements. The queries are filtered based on the following criteria:

  • SELECT statements
  • Executables: the query can be successfully processed by the EXPLAIN command.
  • No duplication: the query has not been previously used to generate templates.
  • All referred tables and views are within the scope of the nl_config.

To autogenerate, review, and apply templates, follow these steps:

  1. Request AlloyDB to generate templates based on your query history:

    SELECT 
      alloydb_ai_nl.generate_templates(
        'my_app_config', 
    );
    

    Use the provided view, alloydb_ai_nl.generated_templates_view, to review the generated_templates.

    The following output shows the number of generated templates:

    -[ RECORD 1 ]------+--
    generate_templates | 1
    
  2. Review the generated templates using the generated_templates_view view.

    SELECT * 
    FROM alloydb_ai_nl.generated_templates_view;
    

    The following is an example of the returned output:

    -[ RECORD 1 ]----------------------------------------------------------------
    id          | 1
    config      | my_app_config
    type        | Template
    manifest    | How many clients have a birth year of a given number?
    nl          | How many clients have a birth year of 1997?
    sql         | select count(*) from public.client as T where  
                 to_char(T.birth_date::timestamp, 'YYYY') = '1997';
    intent      | How many clients have a birth year of 1997?
    psql        | select count(*) from public.client as T where  
                 to_char(T.birth_date::timestamp, 'YYYY') = $1;
    pintent     | How many clients have a birth year of $1?
    comment     |
    explanation |
    weight      | 1
    
  3. To update a generated template, run the following example statement:

    SELECT alloydb_ai_nl.update_generated_template(
      id => 1,
      manifest => 'How many clients are born in a given year?',
      nl => 'How many clients are born in 1997?',
      intent => 'How many clients are born in 1997?',
      pintent => 'How many clients are born in $1?'
    
    );
    
  4. Apply the templates. The templates that you apply are immediately added to the template store, and they are deleted from the review view.

    -- For all templates generated under the nl config.
    SELECT
      alloydb_ai_nl.apply_generated_templates('my_app_config');
    

Configure security for natural language

To configure security for AlloyDB AI natural language, see Manage data application security using parameterized secure views.

Define concept types and value index

You define concept types and value indexes to provide a deeper understanding of questions being asked. A concept type is a category or class of entities that identifies the semantic meaning of words and phrases, rather than just their literal form.

For example, two country names might be the same even if one country name is in upper case, for example, USA, and the other country name is in lower case, for example, usa. In this case, country name is the concept type. Other examples of concept types include person name, city name, and date.

A value index is an index on top of values in the columns that are part of the natural language configuration nl_config, based on the concept types associated with each column. A value index enables efficient matching of value phrases for the question being asked and values in the database.

To define concept types and a value index, follow these steps using the provided examples. The examples associate a column to a concept type, create and refresh a value index, and use a synonym set to perform a value search.

  1. To associate a column with a concept type, run the following query:

    SELECT
      alloydb_ai_nl.associate_concept_type(
        column_names_in => 'my_schema.country.country_name',
        concept_type_in => 'country_name',
        nl_config_id_in => 'my_app_config'
      );
    
  2. To create a value index based on all the columns that are part of a natural language config and are associated with a concept type, run the following statement:

    SELECT
      alloydb_ai_nl.create_value_index(
        nl_config_id_in => 'my_app_config'
      );
    
  3. When you associate concept types to new columns, refresh the value index to reflect the changes.

    SELECT
      alloydb_ai_nl.refresh_value_index(
        nl_config_id_in => 'my_app_config'
      );
    
  4. To enable AlloyDB AI natural language to match synonyms of a value, run the following example statement:

    SELECT
      alloydb_ai_nl.insert_synonym_set(
        ARRAY [
          'USA',
          'US',
          'United States',
          'United States of America'
        ]
      );
    

    Although the data in your tables might use a specific value—for example, if United States is used to identify a country—you can define a synonym set that contains all the synonyms for United States. If any of the synonyms appear in the natural language question, AlloyDB AI natural language matches the synonyms with the values in your tables.

  5. Perform a value search to find the correct database values, given an array of value phrases.

    SELECT
      alloydb_ai_nl.get_concept_and_value(
        value_phrases_in => ARRAY['United States'],
        nl_config_id_in  => 'my_app_config'
      );
    

    For example, if a user asks a question like "What is the population of the United States?" that uses the following get_sql query, AlloyDB AI natural language uses the get_concept_and_value function with the value phrase United States to perform a fuzzy search against the value indexes. A fuzzy search is a search technique that finds matches even when the search query doesn't exactly match corresponding data.

    Natural language finds a result—the value USA— that is close to the search query, and it uses that result to generate the SQL query.

    SELECT
      alloydb_ai_nl.get_sql(
        nl_config_id    => 'my_app_config',
        nl_question     => 'What is the population of the United States?',
        additional_info => json_build_object('enrich_nl_question', TRUE)
      ) ->> 'sql';
    

    Built-in concept types defined by AlloyDB AI natural language are listed in the following table.

    Concept name Description
    generic_entity_name A single string type column can be used for a generic entity name. For example:
      SELECT alloydb_ai_nl.associate_concept_type('public.item.item_name', 'generic_entity_name')
      
    country_name, city_name, region_name Names of countries, cities, and regions. The usage is exactly the same as the generic_entity_name concept type.
    full_person_name Name of the person, consisting of the first, last, and middle names. Up to three string type columns can be used for a full person name. Any of the columns can be skipped when associating name columns to the full_person_name. For example:
      SELECT alloydb_ai_nl.associate_concept_type('public.person.last_name,public.person.first_name,public.person.middle_name','full_person_name')
      
    ssn A single string column containing a social security number. For example:
      SELECT alloydb_ai_nl.associate_concept_type('public.person.ssn','ssn')
     
    date A date or timestamp. For example:
     SELECT alloydb_ai_nl.associate_concept_type('public.person.date_col','date')
     

Generate SQL statements from natural language inputs

You can use AlloyDB AI natural language to generate SQL statements from natural language inputs. When you run the generated SQL statement, it provides the data from the database that you need to answer the natural language question.

  1. To use natural language to get results from your database using the alloydb_ai_nl.get_sql function, use the following example:

    SELECT
      alloydb_ai_nl.get_sql(
        'my_app_config', -- nl_config
        'What is the sum that client number 4''s account has following transaction 851?' -- nl question
      );
    

    The following JSON output is returned:

    {
      "sql": "SELECT T3.balance FROM public.client AS T1 INNER JOIN public.account AS T2 ON T1.district_id = T2.district_id INNER JOIN public.trans AS T3 ON T2.account_id = T3.account_id WHERE T1.client_id = 4 AND T3.trans_id = 851",
      "prompt": "",
      "retries": 0,
      "error_msg": "",
      "nl_question": "What is the sum that client number 4's account has following transaction 851?"
    }
    
  2. Optional: To extract the generated SQL query as a text string, add ->>'sql':

    SELECT
      alloydb_ai_nl.get_sql(
        'my_app_config', -- nl_config
        'What is the sum that client number 4''s account has following transaction 851?' -- nl question
      ) ->> 'sql';
    

    The ->> operator is used to extract a JSON value as text. The alloydb_ai_nl.get_sql function returns a JSON object, which is the part of the statement that retrieves the value associated with the key sql. This value is the generated SQL query.

Test and refine

To get improved auto-generated queries, add better context, query templates, and value indexes, then iterate until you achieve the results that you want.

What's next