Asynchronous secondary index queries
This document provides examples of common query patterns for building asynchronous secondary indexes in Bigtable. The examples use IDs and values that are similar to those in Data for examples and you can create a test table to test the queries.
Before you read this page, familiarize yourself with Create an asynchronous secondary index and GoogleSQL for Bigtable.
Example queries
The following examples show how to create asynchronous secondary indexes using different GoogleSQL queries.
Create an inverted index based on a column qualifier and values
The following query creates an inverted index by extracting all key-value pairs
from a column family that has a flexible schema. It uses MAP_ENTRIES
to get an
array of all data that the stats_summary
column family stores, and then
UNNEST
to transform each key-value pair into a separate row. The resulting
index allows for fast lookups based on stat_description
and stat_value
. If
you add new types of statistics to the source table, then Bigtable
includes them in the asynchronous index automatically.
SELECT
stats.key as stat_description,
stats.value as stat_value,
_key as original_key,
cell_plan as cell_plan
FROM test_table
CROSS JOIN UNNEST(MAP_ENTRIES(stats_summary)) stats
ORDER BY stat_description, stat_value, original_key
Create a new row key based on existing data
The following example creates a new index key, total_plan_capacity
, by
categorizing data plans that the cell_plan
column family stores. The query
uses MAP_KEYS
to get all keys in cell_plan
and ARRAY_INCLUDES_ANY
to
filter for specific data plan keys. A CASE
statement then defines the logic to
assign a total_plan_capacity
category that's based on the presence and values
of these data plans. This allows for efficient lookups based on the combined
data plan capacity.
SELECT
CASE
WHEN
cell_plan['data_plan_01gb'] = "true"
AND (cell_plan['data_plan_05gb'] = "false" OR cell_plan['data_plan_05gb'] IS NULL)
AND (cell_plan['data_plan_10gb'] = "false" OR cell_plan['data_plan_10gb'] IS NULL)
THEN 'x-small'
WHEN
cell_plan['data_plan_01gb'] = "true"
AND (cell_plan['data_plan_05gb'] = "true")
AND (cell_plan['data_plan_10gb'] = "false" OR cell_plan['data_plan_10gb'] IS NULL)
THEN 'small'
WHEN
cell_plan['data_plan_01gb'] = "true"
AND (cell_plan['data_plan_05gb'] = "false" OR cell_plan['data_plan_05gb'] IS NULL)
AND (cell_plan['data_plan_10gb'] = "true")
THEN 'medium'
WHEN
(cell_plan['data_plan_01gb'] = "false" OR cell_plan['data_plan_01gb'] IS NULL)
AND (cell_plan['data_plan_05gb'] = "true")
AND (cell_plan['data_plan_10gb'] = "true")
THEN 'large'
WHEN
cell_plan['data_plan_01gb'] = "true"
AND (cell_plan['data_plan_05gb'] = "true")
AND (cell_plan['data_plan_10gb'] = "true")
THEN 'x-large'
END as total_plan_capacity,
_key as original_key,
stats_summary
FROM test_table
WHERE ARRAY_INCLUDES_ANY(MAP_KEYS(cell_plan), ["data_plan_01gb", "data_plan_05gb","data_plan_10gb"])
ORDER BY total_plan_capacity, original_key
Create a time-series-based index
Bigtable stores each cell value and its associated timestamp. By
using the WITH_HISTORY=>TRUE
flag in the FROM
clause, you can retrieve all
versions of data and their timestamps. The UNPACK
table function then expands
the query results so that each timestamped value appears on its own row with a
separate _timestamp
column. This lets you create an asynchronous secondary
index where the new row key is based on these timestamp values, enabling quick
lookups that are based on time ranges.
SELECT
_timestamp as stats_timestamp,
_key as original_key,
stats_summary
FROM UNPACK((
select
_key,
stats_summary
FROM
test_table (WITH_HISTORY=>TRUE)
))
ORDER BY stats_timestamp, original_key
Create an index entry for items and values in a JSON cell
Bigtable is often used as a key-value store for large-scale
workloads, including data in formats such as JSON. The following example shows
how to create an asynchronous secondary index for fast lookups of data that JSON
stores. The query uses JSON_EXTRACT
to extract the store_id
from a JSON
string within the transaction_information
column family in a table named
sales
. This index allows for quick retrieval of sales transactions by specific
store IDs.
SELECT
JSON_EXTRACT(transaction_information["json"], "$.store_id") as store_id,
_key as original_key,
transaction_information
from sales
ORDER BY store_id, original_key;