The AI.CLASSIFY function

This document describes the AI.CLASSIFY function, which uses a Vertex AI Gemini model to classify inputs into categories that you provide. BigQuery automatically structures your input to improve the quality of the classification.

The following are common use cases:

  • Retail: Classify reviews by sentiment or classify products by category.
  • Text analysis: Classify support tickets or emails by topic.

Input

AI.CLASSIFY accepts the following types of input:

  • Text data from standard tables.

This function passes your input to a Gemini model and incurs charges in Vertex AI each time it's called.

Syntax

AI.CLASSIFY(
  [ input => ] 'INPUT',
  [ categories => ] 'CATEGORIES',
  connection_id => 'CONNECTION'
)
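
As a minimal sketch of the syntax above, the following query classifies a single string literal. The connection name us.example_connection is a placeholder; the query assumes that a Cloud resource connection with that name exists in the project that runs it.

```sql
-- Classify one literal value. The first argument is the input;
-- categories and connection_id are passed by name.
SELECT
  AI.CLASSIFY(
    'apple',
    categories => ['fruit', 'vegetable', 'other'],
    connection_id => 'us.example_connection') AS category;
```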

Arguments

AI.CLASSIFY takes the following arguments:

  • INPUT: a STRING or STRUCT value that specifies the input to classify. The input must be the first argument that you specify. You can provide the input value in the following ways:

    • Specify a STRING value. For example, 'apple'.
    • Specify a STRUCT value that contains one or more fields. You can use the following types of fields within the STRUCT value:

      Field type | Description | Examples
      STRING | A string literal, or the name of a STRING column. | String literal: 'apple'; string column name: my_string_column
      ARRAY<STRING> | You can only use string literals in the array. | Array of string literals: ['red ', 'apples']

      The function combines STRUCT fields similarly to a CONCAT operation, concatenating the fields in the order in which they are specified. The same is true for the elements of any arrays used within the struct. The following table shows some examples of STRUCT input values and how they are interpreted:

      Struct field types | Struct value | Semantic equivalent
      STRUCT<STRING> | ('apples') | 'apples'
      STRUCT<STRING, STRING> | ('red', ' apples') | 'red apples'
      STRUCT<STRING, ARRAY<STRING>> | ('crisp ', ['red', ' apples']) | 'crisp red apples'

  • CATEGORIES: the categories by which to classify the input. You can specify categories with or without descriptions:

    • With descriptions: Use an ARRAY<STRUCT<STRING, STRING>> value where each struct contains the category name, followed by a description of the category. The array can only contain string literals. For example, you could use colors to classify sentiment:

      [('green', 'positive'), ('yellow', 'neutral'), ('red', 'negative')]

      You can optionally name the fields of the struct for your own readability, but the field names aren't used by the function:

        [STRUCT('green' AS label, 'positive' AS description),
         STRUCT('yellow' AS label, 'neutral' AS description),
         STRUCT('red' AS label, 'negative' AS description)]
      
    • Without descriptions: Use an ARRAY<STRING> value. The array can only contain string literals. This works well when your categories are self-explanatory. For example, you could use the following categories to classify sentiment:

      ['positive', 'neutral', 'negative']

    To handle input that doesn't closely match a category, consider including an 'Other' category.

  • CONNECTION: a STRING value specifying the Cloud resource connection to use. The following forms are accepted:

    • Connection name: [PROJECT_ID].LOCATION.CONNECTION_ID

      For example, myproject.us.myconnection.

    • Fully qualified connection ID: projects/PROJECT_ID/locations/LOCATION/connections/CONNECTION_ID

      For example, projects/myproject/locations/us/connections/myconnection.

    Replace the following:

    • PROJECT_ID: the project ID of the project that contains the connection.
    • LOCATION: the location used by the connection.
    • CONNECTION_ID: the connection ID—for example, myconnection.

      You can get this value by viewing the connection details in the Google Cloud console and copying the last segment of the fully qualified connection ID that is shown in Connection ID. For example, in projects/myproject/locations/connection_location/connections/myconnection, the connection ID is myconnection.
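
The arguments described above can be combined as in the following sketch. The table myproject.mydataset.support_tickets and its ticket_id and ticket_text columns are hypothetical, and the connection is the placeholder from the example forms above. The query shows a STRUCT input that mixes a string literal with a STRING column, categories with descriptions, an 'Other' category for input that doesn't fit elsewhere, and the fully qualified connection ID form.

```sql
SELECT
  ticket_id,
  AI.CLASSIFY(
    -- STRUCT input: fields are combined in order, similarly to
    -- CONCAT('Support ticket: ', ticket_text).
    ('Support ticket: ', ticket_text),
    -- Each struct holds a category name followed by its description.
    categories => [
      ('billing', 'Questions about invoices, charges, or refunds.'),
      ('technical', 'Bugs, errors, or outages.'),
      ('Other', 'Anything that does not fit the categories above.')],
    -- Fully qualified connection ID form (placeholder values).
    connection_id => 'projects/myproject/locations/us/connections/myconnection'
  ) AS topic
FROM
  `myproject.mydataset.support_tickets`;
```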

Output

AI.CLASSIFY returns a STRING value containing the provided category that best fits the input.

If the call to Vertex AI is unsuccessful for any reason, such as exceeding quota or model unavailability, then the function returns NULL.
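
Because an unsuccessful call returns NULL, it can help to flag those rows explicitly so that they can be reviewed or retried. The following sketch assumes a hypothetical table myproject.mydataset.support_tickets with ticket_id and ticket_text columns, and a placeholder connection named us.example_connection. The 'unclassified' label and the needs_retry column are illustrative names, not part of the function.

```sql
WITH classified AS (
  SELECT
    ticket_id,
    AI.CLASSIFY(
      ticket_text,
      categories => ['billing', 'technical', 'other'],
      connection_id => 'us.example_connection') AS category
  FROM
    `myproject.mydataset.support_tickets`
)
SELECT
  ticket_id,
  -- Replace NULL with a sentinel label so that the row stays visible downstream.
  COALESCE(category, 'unclassified') AS category,
  -- TRUE when the function returned NULL for this row.
  category IS NULL AS needs_retry
FROM
  classified;
```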

Examples

The following examples show how to use the AI.CLASSIFY function to classify text into predefined categories.

Classify text by topic

The following query categorizes BBC news articles into high-level categories:

SELECT
  title,
  body,
  AI.CLASSIFY(
    body,
    categories => ['tech', 'sport', 'business', 'politics', 'entertainment', 'other'],
    connection_id => 'us.example_connection') AS category
FROM
  `bigquery-public-data.bbc_news.fulltext`
LIMIT 100;
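
Because each call incurs charges in Vertex AI, one option is to store the classification results once and run later analysis against the stored table. The following is a sketch of that approach, assuming a hypothetical destination dataset named mydataset in your default project, located in the same location as the source data, and the placeholder connection us.example_connection.

```sql
-- Run the classification once and keep the results.
CREATE OR REPLACE TABLE `mydataset.bbc_news_classified` AS
SELECT
  title,
  AI.CLASSIFY(
    body,
    categories => ['tech', 'sport', 'business', 'politics', 'entertainment', 'other'],
    connection_id => 'us.example_connection') AS category
FROM
  `bigquery-public-data.bbc_news.fulltext`
LIMIT 100;

-- Later queries read the stored categories and don't call the model again.
SELECT
  category,
  COUNT(*) AS article_count
FROM
  `mydataset.bbc_news_classified`
GROUP BY
  category
ORDER BY
  article_count DESC;
```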

Classify reviews by sentiment

The following query classifies movie reviews of The English Patient by sentiment according to a custom color scheme. For example, a review that is very positive is classified as 'green'.

SELECT
  AI.CLASSIFY(
    -- STRUCT input: an instruction followed by the review text.
    ('Classify the review by sentiment: ', review),
    -- Each struct holds a category name followed by its description.
    categories =>
      [('green', 'The review is positive.'),
       ('yellow', 'The review is neutral.'),
       ('red', 'The review is negative.')],
    connection_id => 'us.example_connection') AS ai_review_rating,
  reviewer_rating AS human_provided_rating,
  review
FROM
  `bigquery-public-data.imdb.reviews`
WHERE
  title = 'The English Patient';

Locations

You can run AI.CLASSIFY in all of the regions that support Gemini models, and also in the US and EU multi-regions.

Quotas

See Generative AI functions quotas and limits.
