Use primary and foreign keys

Primary keys and foreign keys are table constraints that can help with query optimization. This document explains how to create, view, and manage constraints, and use them to optimize your queries.

BigQuery supports the following key constraints:

  • Primary key: A primary key for a table is a combination of one or more columns that is unique for each row and not NULL.
  • Foreign key: A foreign key for a table is a combination of one or more columns that is present in the primary key column of a referenced table, or is NULL.

Primary and foreign keys are typically used to ensure data integrity and enable query optimization. BigQuery doesn't enforce primary and foreign key constraints. When you declare constraints on your tables, you must ensure that your data conforms to them. BigQuery can use table constraints to optimize your queries.

Manage constraints

Primary and foreign key relationships can be created and managed through the following DDL statements:

You can also manage table constraints through the BigQuery API by updating the TableConstraints object.

View constraints

The following views give you information about your table constraints:

Optimize queries

When you create and enforce primary and foreign keys on your tables, BigQuery can use that information to eliminate or optimize certain query joins. While it's possible to mimic these optimizations by rewriting your queries, such rewrites aren't always practical.

In a production environment, you might create views that join many fact and dimension tables. Developers can query the views instead of querying the underlying tables and manually rewriting the joins each time. If you define the proper constraints, join optimizations happen automatically for any queries they apply to.

The examples in the following sections reference the store_sales and customer tables with constraints:

CREATE TABLE mydataset.customer (customer_name STRING PRIMARY KEY NOT ENFORCED);

CREATE TABLE mydataset.store_sales (
    item STRING PRIMARY KEY NOT ENFORCED,
    sales_customer STRING REFERENCES mydataset.customer(customer_name) NOT ENFORCED,
    category STRING);

Eliminate inner joins

Consider the following query that contains an INNER JOIN:

SELECT ss.*
FROM mydataset.store_sales AS ss
    INNER JOIN mydataset.customer AS c
    ON ss.sales_customer = c.customer_name;

The customer_name column is a primary key on the customer table, so each row from the store_sales table has either a single match, or no match if sales_customer is NULL. Since the query only selects columns from the store_sales table, the query optimizer can eliminate the join and rewrite the query as the following:

SELECT *
FROM mydataset.store_sales
WHERE sales_customer IS NOT NULL;

Eliminate outer joins

To remove a LEFT OUTER JOIN, the join keys on the right side must be unique and only columns from the left side are selected. Consider the following query:

SELECT ss.*
FROM mydataset.store_sales ss
    LEFT OUTER JOIN mydataset.customer c
    ON ss.category = c.customer_name;

In this example, there is no relationship between category and customer_name. The selected columns only come from the store_sales table and the join key customer_name is a primary key on the customer table, so each value is unique. This means that there is exactly one (possibly NULL) match in the customer table for each row in the store_sales table and the LEFT OUTER JOIN can be eliminated:

SELECT ss.*
FROM mydataset.store_sales;

Reorder joins

When BigQuery can't eliminate a join, it can use table constraints to get information about join cardinalities and optimize the order in which to perform joins.

Limitations

Primary keys and foreign keys are subject to the following limitations:

  • Key constraints are unenforced in BigQuery. You are responsible for maintaining the constraints at all times. Queries over tables with violated constraints might return incorrect results.
  • Primary keys can't exceed 16 columns.
  • Foreign keys must have values that are present in the referenced table column. These values can be NULL.
  • Primary keys and foreign keys must be of one of the following types: BIGNUMERIC, BOOLEAN, DATE, DATETIME, INT64, NUMERIC, STRING, or TIMESTAMP.
  • Primary keys and foreign keys can only be set on top-level columns.
  • Primary keys can't be named.
  • Tables with primary key constraints can't be renamed.
  • A table can have up to 64 foreign keys.
  • A foreign key can't refer to a column in the same table.
  • Fields that are part of primary key constraints or foreign key constraints can't be renamed, or have their type changed.
  • If you copy, clone, restore, or snapshot a table without the -a or --append_table option, the source table constraints are copied and overwritten to the destination table. If you use the -a or --append_table option, only the source table records are added to the destination table without the table constraints.

What's next