Asynchronous secondary index queries

This document provides examples of common query patterns for building asynchronous secondary indexes in Bigtable. The examples use IDs and values that are similar to those in Data for examples and you can create a test table to test the queries.

Before you read this page, familiarize yourself with Create an asynchronous secondary index and GoogleSQL for Bigtable.

Example queries

The following examples show how to create asynchronous secondary indexes using different GoogleSQL queries.

Create an inverted index based on a column qualifier and values

The following query creates an inverted index by extracting all key-value pairs from a column family that has a flexible schema. It uses MAP_ENTRIES to get an array of all data that the stats_summary column family stores, and then UNNEST to transform each key-value pair into a separate row. The resulting index allows for fast lookups based on stat_description and stat_value. If you add new types of statistics to the source table, then Bigtable includes them in the asynchronous index automatically.

SELECT
 stats.key as stat_description,
 stats.value as stat_value,
 _key as original_key,
 cell_plan as cell_plan
FROM test_table
CROSS JOIN UNNEST(MAP_ENTRIES(stats_summary)) stats
ORDER BY stat_description, stat_value, original_key

As a result, each key-value pair from the original stats_summary column family has its own row in the continuous materialized view:

stat_description (key)	stat_value (key)	original_key (key)	cell_plan
connected_cell	"2"	phone#4c410523#20190501	{ "data_plan_01gb": "true", "data_plan_05gb": "false" }
os_build	PQ2A.190401.002	phone#5c10102#20190501	{ "data_plan_10gb": "false" }

To get fast lookups on data that Bigtable captured as values, use the new row keys to query the data, as shown in the following example:

SELECT cell_plan, original_key
FROM `mv_inverted_index`
WHERE stat_description = "connected_cell"
AND stat_value = "2"

Create a new row key based on existing data

The following example creates a new index key, total_plan_capacity, by categorizing data plans that the cell_plan column family stores. The query uses MAP_KEYS to get all keys in cell_plan and ARRAY_INCLUDES_ANY to filter for specific data plan keys. A CASE statement then defines the logic to assign a total_plan_capacity category that's based on the presence and values of these data plans. This allows for efficient lookups based on the combined data plan capacity.

SELECT
 CASE
    WHEN
     cell_plan['data_plan_01gb'] = "true"
     AND (cell_plan['data_plan_05gb'] = "false" OR cell_plan['data_plan_05gb'] IS NULL)
     AND (cell_plan['data_plan_10gb'] = "false" OR cell_plan['data_plan_10gb'] IS NULL)
   THEN 'x-small'

   WHEN
     cell_plan['data_plan_01gb'] = "true"
     AND (cell_plan['data_plan_05gb'] = "true")
     AND (cell_plan['data_plan_10gb'] = "false" OR cell_plan['data_plan_10gb'] IS NULL)
   THEN 'small'

   WHEN
     cell_plan['data_plan_01gb'] = "true"
     AND (cell_plan['data_plan_05gb'] = "false" OR cell_plan['data_plan_05gb'] IS NULL)
     AND (cell_plan['data_plan_10gb'] = "true")
   THEN 'medium'

   WHEN
     (cell_plan['data_plan_01gb'] = "false" OR cell_plan['data_plan_01gb'] IS NULL)
     AND (cell_plan['data_plan_05gb'] = "true")
     AND (cell_plan['data_plan_10gb'] = "true")
   THEN 'large'

   WHEN
     cell_plan['data_plan_01gb'] = "true"
     AND (cell_plan['data_plan_05gb'] = "true")
     AND (cell_plan['data_plan_10gb'] = "true")
   THEN 'x-large'
 END as total_plan_capacity,
 _key as original_key,
 stats_summary
FROM test_table
WHERE ARRAY_INCLUDES_ANY(MAP_KEYS(cell_plan), ["data_plan_01gb", "data_plan_05gb","data_plan_10gb"])
ORDER BY total_plan_capacity, original_key

Create a time-series-based index

Bigtable stores each cell value and its associated timestamp. By using the WITH_HISTORY=>TRUE flag in the FROM clause, you can retrieve all versions of data and their timestamps. The UNPACK table function then expands the query results so that each timestamped value appears on its own row with a separate _timestamp column. This lets you create an asynchronous secondary index where the new row key is based on these timestamp values, enabling quick lookups that are based on time ranges.

SELECT
 _timestamp as stats_timestamp,
 _key as original_key,
 stats_summary
FROM UNPACK((
 select
   _key,
   stats_summary
  FROM
 test_table (WITH_HISTORY=>TRUE)
))
ORDER BY stats_timestamp, original_key

Create an index entry for items and values in a JSON cell

Bigtable is often used as a key-value store for large-scale workloads, including data in formats such as JSON. The following example shows how to create an asynchronous secondary index for fast lookups of data that JSON stores. The query uses JSON_EXTRACT to extract the store_id from a JSON string within the transaction_information column family in a table named sales. This index allows for quick retrieval of sales transactions by specific store IDs.

SELECT
JSON_EXTRACT(transaction_information["json"], "$.store_id") as store_id,
_key as original_key,
transaction_information
from sales
ORDER BY store_id, original_key;