Work with raster data
This document explains how to combine raster and vector data by using the
ST_REGIONSTATS function to perform geospatial analysis in BigQuery.
Overview
A raster is a two-dimensional grid of pixels, each of which is assigned one or more values called bands. For example, each pixel could correspond to one particular square kilometer on the earth's surface and have bands for average temperature and average rainfall. Raster data includes satellite imagery and other continuous, grid-based data such as weather forecasts and land cover. Many common image formats, such as PNG or JPEG files, are formatted as raster data.
Raster data
is often contrasted with vector data, in which the data is described by
lines or curves rather than a fixed rectangular grid. For
example, you can use the GEOGRAPHY
data type in BigQuery to
describe the boundaries of countries, cities, or other regions.
Geospatial raster and vector data is often combined using a zonal statistics operation, which computes an aggregate of all raster values within a given vector region. For example, you might want to compute the following:
- Average air quality in each county in your state.
- Solar potential for a collection of building polygons.
- Fire risk summarized along power line corridors in forested areas.
BigQuery excels in processing vector data, and Google Earth Engine
excels in processing raster data. You can use the
ST_REGIONSTATS
geography function
to combine raster data using Earth Engine with your vector data stored
in BigQuery.
Before you begin
- To use the ST_REGIONSTATS function in your queries, enable the Earth Engine API.
- Optional: To subscribe to and use data published to Analytics Hub by using the ST_REGIONSTATS function, enable the Analytics Hub API.
Required permissions
To get the permissions that you need to call the ST_REGIONSTATS function,
ask your administrator to grant you the following IAM roles on your project:
- Earth Engine Resource Viewer (roles/earthengine.viewer)
- Service Usage Consumer (roles/serviceusage.serviceUsageConsumer)
- Subscribe to datasets in Analytics Hub: BigQuery Data Editor (roles/bigquery.dataEditor)
For more information about granting roles, see Manage access to projects, folders, and organizations.
These predefined roles contain the permissions required to call the
ST_REGIONSTATS function. The following permissions are required:
- earthengine.computations.create
- serviceusage.services.use
- bigquery.datasets.create
You might also be able to get these permissions with custom roles or other predefined roles.
Find raster data
The raster_id
parameter in the ST_REGIONSTATS
function is a string that
specifies the source of your raster data. The following sections explain how to
find and format the raster ID.
Analytics Hub image tables
You can use Analytics Hub to discover and access raster datasets in BigQuery. To use Analytics Hub, you need to enable the Analytics Hub API and ensure that you have required permissions to view and subscribe to listings.
Google Earth Engine publishes publicly available datasets in Analytics Hub that contain raster data. To subscribe to an Earth Engine dataset with raster data, follow these steps:
- Go to the Analytics Hub page.
- Click Search listings.
- In the Search for listings field, enter "Google Earth Engine".
- Click a dataset that you want to subscribe to.
- Click Subscribe.
- Optional: Update the Project or Linked dataset name fields.
- Click Save. The linked dataset is added to your project.
The dataset contains a table, sometimes called an image table, that
stores metadata for a collection of raster images following
the STAC items specification. An image table is analogous to an
Earth Engine image collection (ImageCollection).
Each row in the table corresponds to a single raster image.
The raster ID for each image is stored in the
assets.image.href
column. To find a specific image, you can filter by
image properties in the properties
column. You can find information
about the bands in the table description.
For example, the ERA5-Land dataset provides daily climate variable statistics
and is publicly available through Analytics Hub. The climate
table contains multiple raster IDs. The following query filters the image table
using the properties.start_datetime
column to get the raster ID for the
image corresponding to January 1, 2025 and computes the average temperature
for each country using the temperature_2m
band:
-- Simplify country boundaries with a 10,000-meter tolerance.
WITH SimplifiedCountries AS (
  SELECT
    ST_SIMPLIFY(geometry, 10000) AS simplified_geometry,
    names.primary AS name
  FROM
    `bigquery-public-data.overture_maps.division_area`
  WHERE
    subtype = 'country'
)
SELECT
  sc.simplified_geometry AS geometry,
  sc.name,
  ST_REGIONSTATS(
    sc.simplified_geometry,
    -- Scalar subquery that looks up the raster ID for January 1, 2025.
    (SELECT assets.image.href
     FROM `LINKED_DATASET_NAME.climate`
     WHERE properties.start_datetime = '2025-01-01 00:00:00'),
    'temperature_2m'
  -- Subtract 273.15 to convert kelvins to degrees Celsius.
  ).mean - 273.15 AS mean_temperature
FROM
  SimplifiedCountries AS sc
ORDER BY
  mean_temperature DESC;
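The scalar subquery in the preceding query has to resolve to a single raster ID. As a rough sketch for checking which images a filter matches before you pass it to ST_REGIONSTATS, the following query lists the raster IDs for the first week of January 2025. It reuses the LINKED_DATASET_NAME.climate placeholder and the column names from the example above; the date range is only an illustration.

```sql
-- List candidate raster IDs and their start times from the image table.
-- LINKED_DATASET_NAME is a placeholder for your linked dataset.
SELECT
  properties.start_datetime AS start_datetime,
  assets.image.href AS raster_id
FROM
  `LINKED_DATASET_NAME.climate`
WHERE
  properties.start_datetime >= '2025-01-01 00:00:00'
  AND properties.start_datetime < '2025-01-08 00:00:00'
ORDER BY
  start_datetime;
```

If the filter that you plan to use in the scalar subquery returns more than one row here, narrow it until it matches exactly one image.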
Cloud Storage GeoTIFF
GeoTIFF is a common file format for storing geospatial raster data. The
ST_REGIONSTATS function supports raster data stored in the GeoTIFF format in
Cloud Storage buckets that are located in the US multi-region or the
us-central1 region. Provide the Cloud Storage URI as the raster ID, such as
gs://bucket/folder/raster.tif.
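As a minimal sketch of passing a Cloud Storage URI as the raster ID, the following query computes a mean value for each polygon in a table. The table `PROJECT_ID.DATASET_NAME.parcels`, its parcel_id and geometry columns, and the BAND_NAME band are hypothetical placeholders, and the URI is the sample path shown above; replace them with your own values.

```sql
-- Hypothetical table and band names; the bucket must be in a supported location.
SELECT
  p.parcel_id,
  ST_REGIONSTATS(
    p.geometry,                       -- GEOGRAPHY column with one polygon per row
    'gs://bucket/folder/raster.tif',  -- Cloud Storage URI used as the raster ID
    'BAND_NAME'                       -- placeholder: band to aggregate
  ).mean AS mean_value
FROM
  `PROJECT_ID.DATASET_NAME.parcels` AS p;
```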
Earth Engine image assets
The ST_REGIONSTATS
function supports passing an Earth Engine
image asset path for the raster_id
argument. Earth Engine raster
data is available as individual images or collections of images. To find the
raster ID for an image, follow these steps:
- Search the Earth Engine data catalog for the dataset that you're interested in.
- To open the description page for that entry, click the dataset name. The Earth Engine snippet either describes a single image or a collection of images.
  - If the Earth Engine snippet is of the form ee.Image('IMAGE_PATH'), then the raster ID is 'ee://IMAGE_PATH'.
  - If the Earth Engine snippet is of the form ee.ImageCollection('IMAGE_COLLECTION_PATH'), you can use the Earth Engine Code Editor to filter the ImageCollection to a single image. Use the ee.Image.get('system:id') method to print the IMAGE_PATH value for that image to the console. The raster ID is 'ee://IMAGE_PATH'.
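To illustrate the 'ee://IMAGE_PATH' form, the following sketch assumes the catalog entry for the SRTM digital elevation dataset, whose snippet is expected to be ee.Image('USGS/SRTMGL1_003') with a band named elevation. Both the asset path and the band name are assumptions that you should confirm on the dataset's description page, and the polygon is an arbitrary example region.

```sql
-- Assumed asset path and band name; verify both in the Earth Engine data catalog.
SELECT
  ST_REGIONSTATS(
    -- Arbitrary example region.
    ST_GEOGFROMTEXT(
      'POLYGON((-122.5 37.7, -122.3 37.7, -122.3 37.9, -122.5 37.9, -122.5 37.7))'),
    'ee://USGS/SRTMGL1_003',  -- raster ID built from ee.Image('USGS/SRTMGL1_003')
    'elevation'               -- assumed band name
  ).mean AS mean_elevation;
```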
Pixel weights
You can specify a weight, sometimes referred to as a mask value,
for the include
parameter in the ST_REGIONSTATS
function that determines how much to
weight each pixel in calculations. Weight values must range from 0 to 1.
Weights outside this range are set to the nearest limit, either 0 or 1.
A pixel is considered valid if it has a weight greater than 0. A weight of 0 indicates an invalid pixel. Invalid pixels usually represent missing or unreliable data, such as areas obscured by clouds, sensor anomalies, processing errors, or locations outside of a defined boundary.
If you don't specify a weight, each pixel is automatically weighted by the proportion of the pixel that falls within the geometry, allowing for proportional inclusion in zonal statistics.
If a partially intersecting pixel has a weight from the include argument to
ST_REGIONSTATS, then BigQuery uses the minimum of that weight and the
fraction of the pixel that intersects the region.
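The weighting rules described above can be worked through with plain arithmetic. The following sketch doesn't call ST_REGIONSTATS or reproduce its internals; it only applies the two stated rules (clamp the weight to the range 0 to 1, then take the minimum of the weight and the intersecting fraction) to three made-up pixels. The column names are hypothetical.

```sql
-- Made-up pixels: an include weight and the fraction of the pixel inside the region.
WITH pixels AS (
  SELECT * FROM UNNEST([
    STRUCT(1.5 AS include_weight, 0.25 AS fraction_in_region),
    STRUCT(-0.2, 1.0),
    STRUCT(0.6, 0.8)
  ])
)
SELECT
  include_weight,
  fraction_in_region,
  -- Rule 1: weights outside [0, 1] are set to the nearest limit.
  LEAST(GREATEST(include_weight, 0.0), 1.0) AS clamped_weight,
  -- Rule 2: use the minimum of the weight and the intersecting fraction.
  LEAST(LEAST(GREATEST(include_weight, 0.0), 1.0), fraction_in_region)
    AS effective_weight
FROM
  pixels;
-- (1.5, 0.25) gives clamped_weight 1.0 and effective_weight 0.25
-- (-0.2, 1.0) gives clamped_weight 0.0 and effective_weight 0.0 (invalid pixel)
-- (0.6, 0.8)  gives clamped_weight 0.6 and effective_weight 0.6
```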
Weight values don't have the same precision as FLOAT64
values. In practice,
their true value might differ from the value used in computations by up to
1/256 (about 0.4%).
You can provide an expression using Earth Engine
image expression syntax
in your include
argument to dynamically weight
pixels based on specific criteria within raster bands. For example, the
following expression restricts calculations to pixels where the probability
band exceeds 70%:
include => 'probability > 0.7'
If the dataset includes a weight-factor band, you can use it with the following syntax:
include => 'weight_factor_band_name'
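In a complete call, the include argument follows the positional arguments. The following sketch shows one way that this might look; RASTER_ID and BAND_NAME are placeholders, the polygon is an arbitrary example region, and the expression assumes that the raster has a band named probability.

```sql
-- Placeholders: RASTER_ID and BAND_NAME. Assumes the raster has a probability band.
SELECT
  ST_REGIONSTATS(
    ST_GEOGFROMTEXT(
      'POLYGON((-122.5 37.7, -122.3 37.7, -122.3 37.9, -122.5 37.9, -122.5 37.7))'),
    'RASTER_ID',
    'BAND_NAME',
    include => 'probability > 0.7'  -- only pixels above 70% probability contribute
  ).mean AS mean_value;
```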
Pixel size and scale of analysis
A geospatial raster image is a grid of pixels that corresponds to some location on the surface of the Earth. The pixel size of a raster, sometimes called the scale, is the nominal size of one edge of a pixel in the grid's coordinate reference system. For example, a raster with 10-meter resolution has pixels of size 10 meters by 10 meters. Original reported pixel size can vary dramatically between datasets, from less than 1 meter to greater than 20 kilometers.
When using the ST_REGIONSTATS
function to compute zonal statistics, the pixel
size of the raster data is a crucial consideration. For example, aggregating
high-resolution raster data over the region of a country can be computationally
intensive and unnecessarily granular. Conversely, aggregating low-resolution
data over small regions, such as city parcels, might not provide sufficient detail.
To get meaningful and efficient results from your analysis, we recommend choosing a pixel size appropriate for the size of your polygons and the objective of your analysis. You can find the pixel size for each raster dataset in the description section of image tables in Analytics Hub.
Changing the pixel size changes the number of pixels that intersect a given geography, which affects the results and their interpretation. We don't recommend changing the pixel size for production analyses. However, if you're prototyping a query, increasing the pixel size can reduce query runtime and cost, especially for high-resolution data.
To change the pixel size, set the scale in the options argument to the
ST_REGIONSTATS function. For example, to compute statistics over
1,000-meter pixels, use the following:
options => JSON '{"scale":1000}'
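When prototyping, it can help to see how much a coarser scale changes a result. The following sketch computes the same mean twice, once at the raster's original pixel size and once with 1,000-meter pixels. RASTER_ID and BAND_NAME are placeholders and the polygon is an arbitrary example region; note that the query calls the function twice, so it incurs usage for both calls.

```sql
-- Placeholders: RASTER_ID and BAND_NAME.
WITH region AS (
  SELECT ST_GEOGFROMTEXT(
    'POLYGON((-122.5 37.7, -122.3 37.7, -122.3 37.9, -122.5 37.9, -122.5 37.7))')
    AS geometry
)
SELECT
  -- Original pixel size of the raster.
  ST_REGIONSTATS(geometry, 'RASTER_ID', 'BAND_NAME').mean AS mean_original_scale,
  -- Coarser 1,000-meter pixels, intended for prototyping only.
  ST_REGIONSTATS(
    geometry,
    'RASTER_ID',
    'BAND_NAME',
    options => JSON '{"scale":1000}'
  ).mean AS mean_1000m_scale
FROM
  region;
```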
Billing
When you run a query, usage of the ST_REGIONSTATS function is billed
separately from the rest of the query because Earth Engine computes
the results of the function call. You are billed for this usage in slot hours
under the BigQuery Services SKU, regardless of whether you use on-demand
billing or reservations. To see the amount billed for BigQuery calls to
Earth Engine, view your billing report and use labels to filter by the
label key goog-bq-feature-type with the value EARTH_ENGINE.
Cost factors
The following factors impact the compute usage when you run the ST_REGIONSTATS
function:
- The number of input rows.
- The raster image that you use. Some rasters are composites created from source image collections in the Earth Engine data catalog, and the computational resources needed to produce the composite result vary.
- The resolution of the image.
- The size and complexity of the input geography, number of pixels that intersect the geography, and the number of image tiles and bytes read by Earth Engine.
- The location of the input geography on Earth relative to the source images and the image's projection and resolution.
  - Image projections can warp pixels, especially pixels at high latitudes or far outside the image's intended coverage area.
  - For composite rasters, the number of source images intersecting the input geography can vary regionally and over time. For example, some satellites produce more images at low or high latitudes, depending on their orbit and data collection parameters, or may omit images depending on changing atmospheric conditions.
- The use of formulas in the include or band_name arguments, and the number of bands they involve.
- The caching of previous results.
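Because the complexity of the input geography is one of these factors, it can be useful to check how much a simplification step reduces the vertex count before you call ST_REGIONSTATS. The following sketch uses the public table and the 10,000-meter tolerance from the earlier example. Whether fewer vertices translate into lower usage for a given raster is an assumption to verify against your own billing data.

```sql
-- Compare vertex counts before and after simplification for the largest geometries.
SELECT
  names.primary AS name,
  ST_NUMPOINTS(geometry) AS original_vertices,
  ST_NUMPOINTS(ST_SIMPLIFY(geometry, 10000)) AS simplified_vertices
FROM
  `bigquery-public-data.overture_maps.division_area`
WHERE
  subtype = 'country'
ORDER BY
  original_vertices DESC
LIMIT 10;
```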
Control costs
To control costs associated with the ST_REGIONSTATS
function, you can adjust
the quota that controls the amount of slot time that the function is
allowed to consume. When you view your quotas,
filter the Metric list to
earthengine.googleapis.com/bigquery_slot_usage_time
to see the Earth Engine quota associated with calls from
BigQuery. For more information, read about
BigQuery
raster functions quotas
in the Google Earth Engine documentation.
Limitations
Queries that call the ST_REGIONSTATS
function must run in one of the following
regions:
- US
- us-central1
- us-central2
What's next
- Try the tutorial that shows you how to use raster data to analyze temperature.
- Learn more about geography functions in BigQuery.
- Learn more about working with geospatial data.