Introduction to federated queries
This page introduces how to use federated queries and provides guidance on querying Spanner, AlloyDB, and Cloud SQL data from BigQuery.
Federated queries let you send a query statement to AlloyDB, Spanner, or Cloud SQL databases
and get the result back as a temporary table. Federated queries use the
BigQuery Connection API to establish a connection with AlloyDB, Spanner, or Cloud SQL.
In your query, you use the EXTERNAL_QUERY
function to send a
query statement to the external database, using that database's SQL dialect.
The results are converted to GoogleSQL data types.
Supported data stores
You can use federated queries with the following data stores:
Workflow
- Identify the Google Cloud project that includes the data source that you want to query.
- A
bigquery.admin
user creates a connection resource in BigQuery. - The admin user grants permission to use the connection resource
to user B.
- If the admin and user B are the same person, there is no need to grant permission.
- User B writes a query in BigQuery with the new
EXTERNAL_QUERY
SQL function.
Alternatives to federated queries: external tables and datasets
Another option to query operational databases such as Bigtable, Spanner,
Cloud Storage, Google Drive, and Salesforce Data Cloud, is to use
external tables and datasets. External datasets and tables let you view tables
and their schemas and query them without using an
EXTERNAL_QUERY
SQL function. You don't have to bring data back into BigQuery and
you can use the BigQuery syntax instead of writing in the specific
SQL database dialect of SQL.
Supported regions
Federated queries are only supported in regions that support both the external data source and BigQuery. For a list of supported locations, see the following sections:
- Spanner regional and multi-regional configurations.
- Cloud SQL instance location.
- AlloyDB locations.
- BigQuery dataset locations.
You can create a connection and run a federated query across regions according to the following rules:
Single regions
A BigQuery single region can only query a resource in the same region.
For example, if your dataset is in us-east4
, you can query
Cloud SQLinstances, AlloyDB instances, or Spanner databases that are
located in us-east4
. The query processing location
is the BigQuery single region.
Multi-regions
A BigQuery multi-region can query any data source region in the same large geographic area (US, EU). Multi-regional locations aren't available for Cloud SQL instances, because these are only used for backups. A BigQuery multi-region can also query a Spanner instance in the same multi-region.
A query that runs in the BigQuery US multi-region can query any single region in the US geographic area, such as
us-central1
,us-east4
, orus-west2
.A query that runs in the BigQuery EU multi-region can query any single region in member states of the European Union, such as
europe-north1
oreurope-west3
.The location where the query runs must be the same as the location of the connection resource. For example, queries executed from the US multi-region must use a connection located in the US multi-region.
The query performance varies based on the proximity between the dataset and the
external data source. For example, a federated query between a dataset in
the US multi-region and a Cloud SQL instance in us-central1
is
fast. However, if you run the same query between the US multi-region
and a Cloud SQL instance in us-east4
, the performance might be slower.
The
query processing location
is the multi-region location, either US
or EU
.
Data type mappings
When you execute a federated query, the data from the external data source is converted to GoogleSQL types. For more information, see Cloud SQL federated queries.
Quotas and limits
- Cross-region federated querying. If the
BigQuery query processing location
and the external data source location are different, this is a cross-region
query. You can run up to 1 TB in cross-region queries per project per
day. The following is an example of a cross-region query.
- The Cloud SQL instance is in
us-west1
while the BigQuery connection is based in the US multi-region. The BigQuery query processing location isUS
.
- The Cloud SQL instance is in
- Quota. Users should control query quota in the external data source, such as Cloud SQL or AlloyDB. There is no extra quota setting for federated querying. To achieve workload isolation, it's recommended to only query a database read replica.
- Maximum bytes billed allowed. This field isn't supported for federated queries. Calculating the bytes billed before actually executing the federated queries isn't possible.
- Number of connections. A federated query can have at most 10 unique connections.
- Cloud SQL MySQL and PostgreSQL. Quotas and limitations apply.
Limitations
Federated queries are subject to the following limitations:
Performance. A federated query is likely to not be as fast as querying only BigQuery storage. BigQuery needs to wait for the source database to execute the external query and temporarily move data from the external data source to BigQuery. Also, the source database might not be optimized for complex analytical queries.
The query performance also varies based on the proximity between the dataset and the external data source. For more information, see Supported regions.
Federated queries are read-only. The external query that is executed in the source database must be read-only. Therefore, DML or DDL statements are not supported.
Unsupported data types. If your external query contains a data type that is unsupported in BigQuery, the query fails immediately. You can cast the unsupported data type to a different supported data type.
Project. You must create the connection resource in the same project as the Cloud SQL or AlloyDB instance.
Pricing
If you are using the on-demand pricing model, you are charged for the number of bytes returned from the external query when executing federated queries from BigQuery. For more information, see On-demand analysis pricing.
If you are using BigQuery editions, you are charged based on the number of slots you use. For more information, see Capacity compute pricing.
SQL pushdowns
Federated queries are subject to the optimization technique known as SQL pushdowns.
They improve the performance of a query by delegating operations like filtering down
to the external data source instead of performing them in BigQuery.
Reducing the amount of data transferred from the external data source can reduce
query execution time and lower costs. SQL pushdowns include column pruning
(SELECT
clauses) and filter pushdowns (WHERE
clauses).
When you use theEXTERNAL_QUERY
function, SQL pushdowns work by rewriting the original query.
In the following example, the EXTERNAL_QUERY
function is used to communicate with a Cloud SQL database:
SELECT COUNT(*)
FROM (
SELECT * FROM EXTERNAL_QUERY("<connection>", "select * from operations_table")
)
WHERE a = 'Y' AND b NOT IN ('COMPLETE','CANCELLED');
Without SQL pushdowns, the following query is sent to Cloud SQL:
SELECT *
FROM operations_table
When this query is executed, the entire table is sent back to BigQuery, even though only some rows and columns are needed.
With SQL pushdowns, the following query is sent to Cloud SQL:
SELECT `a`, `b`
FROM (
SELECT * FROM operations_table) t
WHERE ((`a` = 'Y') AND (NOT `b` IN ('COMPLETE', 'CANCELLED'))
When this query is executed, only two columns and the rows that match the filtering predicate are sent back to BigQuery.
SQL pushdowns are also applied when running federated queries with Spanner external datasets.
You can examine applied pushdowns (if any) in the query plan.
Limitations
SQL pushdowns have various limitations that vary depending on the external data source and on the way you query data.
Limitations for query federation when using EXTERNAL_QUERY
- SQL pushdowns are only applied to federated queries of the form
SELECT * FROM T
. - Only column pruning and filter pushdowns are supported. Specifically compute, join, limit, order by and aggregation pushdowns aren't supported.
- For filter pushdowns, literals must be
of one of the following types:
BOOL
,INT64
,FLOAT64
,STRING
,DATE
,DATETIME
,TIMESTAMP
. Literals that are structs aren't supported. - SQL function pushdowns are applied only for functions that are supported by both BigQuery and a destination database.
- SQL pushdowns are only supported for AlloyDB, Cloud SQL, and Spanner.
- SQL pushdowns aren't supported for SAP Datasphere.
Limitations for query federation when using Spanner external datasets
- Column pruning, filter, compute and partial aggregation pushdowns are supported. Specifically join, limit and order by aggregation aren`t not supported.
- For filter pushdowns, literals must be one of the following types:
BOOL
,INT64
,FLOAT64
,STRING
,DATE
,DATETIME
,TIMESTAMP
,BYTE
or Arrays. Literals that are structs aren't supported. - SQL function pushdowns are applied only for functions that are supported by both BigQuery and Spanner.
Supported functions by data source
The following are supported SQL functions by data source. No functions are supported for SAP Datasphere.
Cloud SQL MySQL
- Logical operators:
AND
,OR
,NOT
. - Comparison operators:
=
,>
,>=
,<
,<=
,<>
,IN
,BETWEEN
,IS NULL
. - Arithmetic operators:
+
,-
,*
(only forINT64
andFLOAT64
).
Cloud SQL PostgreSQL and AlloyDB
- Logical operators:
AND
,OR
,NOT
. - Comparison operators:
=
,>
,>=
,<
,<=
,<>
,IN
,BETWEEN
,IS NULL
. - Arithmetic operators:
+
,-
,*
,/
(only forINT64
,FLOAT64
, andDATE
types, except forDATE
subtraction).
Spanner - PostgreSQL dialect
- Logical operators:
AND
,OR
,NOT
. - Comparison operators:
=
,>
,>=
,<
,<=
,<>
,IN
,BETWEEN
,IS NULL
. - Arithmetic operators:
+
,-
,*
,/
(only forINT64
,FLOAT64
,NUMERIC
).
Spanner - GoogleSQL dialect
- Logical operators:
AND
,OR
,NOT
. - Comparison operators:
=
,>
,>=
,<
,<=
,<>
,IN
,BETWEEN
,IS NULL
. - Arithmetic operators:
+
,-
,*
,/
(only forINT64
,FLOAT64
,NUMERIC
). - Safe arithmetic operators:
SAFE_ADD
,SAFE_SUBTRACT
,SAFE_MULTIPLY
,SAFE_DIVIDE
(only forINT64
,FLOAT64
,NUMERIC
). - When using external datasets, additionally:
- Compute pushdown,
- Partial Aggregate pushdown,
- String functions,
- Math functions,
- Cast functions,
- Array functions.
What's next
- Learn how to query Spanner data.
- Learn how to create Spanner external datasets.
- Learn how to query Cloud SQL data.
- Learn how to query AlloyDB data.
- Learn how to query SAP Datasphere data