Work with raster data

This document explains how to combine raster and vector data by using the ST_REGIONSTATS function to perform geospatial analysis in BigQuery.

Overview

A raster is a two-dimensional grid of pixels, each of which is assigned one or more values called bands. For example, each pixel could correspond to one particular square kilometer on the earth's surface and have bands for average temperature and average rainfall. Raster data includes satellite imagery and other continuous, grid-based data such as weather forecasts and land cover. Many common image formats, such as PNG or JPEG files, are formatted as raster data.

Raster data is often contrasted with vector data, in which the data is described by lines or curves rather than a fixed rectangular grid. For example, you can use the GEOGRAPHY data type in BigQuery to describe the boundaries of countries, cities, or other regions.

Geospatial raster and vector data is often combined using a zonal statistics operation, which computes an aggregate of all raster values within a given vector region. For example, you might want to compute the following:

  • Average air quality in each county in your state.
  • Solar potential for a collection of building polygons.
  • Fire risk summarized along power line corridors in forested areas.
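As a rough illustration of the zonal statistics idea, the following Python sketch averages a small grid of pixel values within each zone. It is a toy model: the names zonal_mean, air_quality, and counties are hypothetical, and the zones are assumed to be already rasterized onto the same grid. It doesn't reflect how ST_REGIONSTATS or Earth Engine is implemented.

```python
from collections import defaultdict


def zonal_mean(values, zones):
    """Average the pixel values that fall inside each zone.

    values: 2D list of pixel values for a single band.
    zones:  2D list of the same shape. Each cell names the zone (for
            example, a county) covering that pixel, or None if no zone
            covers it. This pre-rasterized layout is an assumption made
            to keep the sketch short.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for value_row, zone_row in zip(values, zones):
        for value, zone in zip(value_row, zone_row):
            if zone is not None:
                totals[zone] += value
                counts[zone] += 1
    return {zone: totals[zone] / counts[zone] for zone in totals}


# Hypothetical air quality raster and county layout.
air_quality = [
    [10.0, 20.0, 30.0],
    [40.0, 50.0, 60.0],
]
counties = [
    ["A", "A", "B"],
    ["A", None, "B"],
]

result = zonal_mean(air_quality, counties)
print(result)
```

In a real analysis, the zones are vector geographies rather than a label grid, which is why pixels that only partially intersect a region need a weight.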

BigQuery excels at processing vector data, and Google Earth Engine excels at processing raster data. You can use the ST_REGIONSTATS geography function to combine raster data processed by Earth Engine with vector data stored in BigQuery.

Before you begin

  1. To use the ST_REGIONSTATS function in your queries, enable the Earth Engine API.

  2. Optional: To subscribe to data published to Analytics Hub and use it with the ST_REGIONSTATS function, enable the Analytics Hub API.

Required permissions

To get the permissions that you need to call the ST_REGIONSTATS function, ask your administrator to grant you the following IAM roles on your project:

For more information about granting roles, see Manage access to projects, folders, and organizations.

These predefined roles contain the permissions required to call the ST_REGIONSTATS function.

The following permissions are required to call the ST_REGIONSTATS function:

  • earthengine.computations.create
  • serviceusage.services.use
  • bigquery.datasets.create

You might also be able to get these permissions with custom roles or other predefined roles.

Find raster data

The raster_id parameter in the ST_REGIONSTATS function is a string that specifies the source of your raster data. The following sections explain how to find and format the raster ID.

Analytics Hub image tables

You can use Analytics Hub to discover and access raster datasets in BigQuery. To use Analytics Hub, you need to enable the Analytics Hub API and ensure that you have the required permissions to view and subscribe to listings.

Google Earth Engine publishes publicly available datasets in Analytics Hub that contain raster data. To subscribe to an Earth Engine dataset with raster data, follow these steps:

  1. Go to the Analytics Hub page.

  2. Click Search listings.

  3. In the Search for listings field, enter "Google Earth Engine".

  4. Click a dataset that you want to subscribe to.

  5. Click Subscribe.

  6. Optional: Update the Project or Linked dataset name fields.

  7. Click Save. The linked dataset is added to your project.

The dataset contains a table, sometimes called an image table, that stores metadata for a collection of raster images following the STAC items specification. An image table is analogous to an Earth Engine image collection (ImageCollection).

Each row in the table corresponds to a single raster image. The raster ID for each image is stored in the assets.image.href column. To find a specific image, you can filter by image properties in the properties column. You can find information about the bands in the table description.

For example, the ERA5-Land dataset provides daily climate variable statistics and is publicly available through Analytics Hub. The climate table contains multiple raster IDs. The following query filters the image table using the properties.start_datetime column to get the raster ID for the image corresponding to January 1, 2025 and computes the average temperature for each country using the temperature_2m band:

-- Simplify country boundaries (10,000-meter tolerance) to reduce geography complexity.
WITH SimplifiedCountries AS (
  SELECT
    ST_SIMPLIFY(geometry, 10000) AS simplified_geometry,
    names.primary AS name
  FROM
    `bigquery-public-data.overture_maps.division_area`
  WHERE
    subtype = 'country'
)
SELECT
  sc.simplified_geometry AS geometry,
  sc.name,
  ST_REGIONSTATS(
    sc.simplified_geometry,
    -- Look up the raster ID of the image for January 1, 2025.
    (SELECT assets.image.href
     FROM `LINKED_DATASET_NAME.climate`
     WHERE properties.start_datetime = '2025-01-01 00:00:00'),
    'temperature_2m'
  ).mean - 273.15 AS mean_temperature  -- Convert kelvin to degrees Celsius.
FROM
  SimplifiedCountries AS sc
ORDER BY
  mean_temperature DESC;

Cloud Storage GeoTIFF

GeoTIFF is a common file format for storing geospatial raster data. The ST_REGIONSTATS function supports raster data stored in the GeoTIFF format in Cloud Storage buckets that are located in the US or us-central1 regions. Provide the Cloud Storage URI as the raster ID, such as gs://bucket/folder/raster.tif.

Earth Engine image assets

The ST_REGIONSTATS function supports passing an Earth Engine image asset path for the raster_id argument. Earth Engine raster data is available as individual images or collections of images. To find the raster ID for an image, follow these steps:

  1. Search the Earth Engine data catalog for the dataset that you're interested in.
  2. To open the description page for that entry, click the dataset name. The Earth Engine Snippet either describes a single image or a collection of images.

    If the Earth Engine snippet is of the form ee.Image('IMAGE_PATH'), then the raster ID is 'ee://IMAGE_PATH'.

    If the Earth Engine snippet is of the form ee.ImageCollection('IMAGE_COLLECTION_PATH'), you can use the Earth Engine Code Editor to filter the ImageCollection to a single image. Use the ee.Image.get('system:id') method to print the IMAGE_PATH value for that image to the console. The raster ID is 'ee://IMAGE_PATH'.
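The mapping from a single-image snippet to a raster ID can be sketched in Python as follows. The helper name raster_id_from_snippet and the path PROJECT/DATASET/IMAGE are hypothetical placeholders; the sketch only handles the ee.Image('IMAGE_PATH') form and rejects collection snippets, which first need to be filtered to a single image.

```python
import re

# Matches ee.Image('IMAGE_PATH') or ee.Image("IMAGE_PATH") and nothing else.
_EE_IMAGE_SNIPPET = re.compile(r"""^ee\.Image\((['"])(?P<path>[^'"]+)\1\)$""")


def raster_id_from_snippet(snippet):
    """Return 'ee://IMAGE_PATH' for a snippet of the form ee.Image('IMAGE_PATH').

    Hypothetical helper for illustration. Raises ValueError for any other
    snippet, including ee.ImageCollection(...), because a collection doesn't
    identify a single image.
    """
    match = _EE_IMAGE_SNIPPET.match(snippet.strip())
    if match is None:
        raise ValueError("Expected a snippet of the form ee.Image('IMAGE_PATH')")
    return "ee://" + match.group("path")


# PROJECT/DATASET/IMAGE is a placeholder, not a real asset path.
print(raster_id_from_snippet("ee.Image('PROJECT/DATASET/IMAGE')"))
```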

Pixel weights

You can specify a weight, sometimes referred to as a mask value, for the include parameter in the ST_REGIONSTATS function to determine how much each pixel counts in calculations. Weights range from 0 to 1; values outside this range are clamped to the nearest limit, either 0 or 1.

A pixel is considered valid if it has a weight greater than 0. A weight of 0 indicates an invalid pixel. Invalid pixels usually represent missing or unreliable data, such as areas obscured by clouds, sensor anomalies, processing errors, or locations outside of a defined boundary.

If you don't specify a weight, each pixel is automatically weighted by the proportion of the pixel that falls within the geometry, allowing for proportional inclusion in zonal statistics.

If a partially intersecting pixel has a weight from the include argument to ST_REGIONSTATS, then BigQuery uses the minimum of that weight and the fraction of the pixel that intersects the region.
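The weighting rules described in this section can be sketched in Python as follows. The function names are hypothetical, and the sketch assumes that the mean is a standard weighted average (the sum of value times weight, divided by the sum of weights), which this document doesn't spell out.

```python
def clamp_weight(weight):
    """Clamp a weight to the range [0, 1]."""
    return max(0.0, min(1.0, weight))


def effective_weight(coverage_fraction, include_weight=None):
    """Combine the include weight with the pixel's coverage fraction.

    coverage_fraction: fraction of the pixel that intersects the region.
    include_weight:    weight from the include argument, or None if the
                       argument isn't specified.
    """
    if include_weight is None:
        # No include weight: use the covered fraction of the pixel.
        return coverage_fraction
    # Otherwise use the minimum of the two values.
    return min(clamp_weight(include_weight), coverage_fraction)


def weighted_mean(pixels):
    """pixels: iterable of (value, coverage_fraction, include_weight)."""
    weighted = [(value, effective_weight(cov, inc)) for value, cov, inc in pixels]
    valid = [(value, w) for value, w in weighted if w > 0]  # weight 0 is invalid
    total_weight = sum(w for _, w in valid)
    if total_weight == 0:
        return None
    return sum(value * w for value, w in valid) / total_weight


pixels = [
    (10.0, 1.0, None),  # fully covered, no include weight -> weight 1.0
    (20.0, 0.5, 1.0),   # half covered                     -> weight 0.5
    (30.0, 1.0, 0.0),   # include weight 0                 -> invalid, skipped
    (40.0, 0.5, 2.0),   # include weight clamped to 1.0    -> weight 0.5
]
mean = weighted_mean(pixels)
print(mean)
```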

Weight values don't have the same precision as FLOAT64 values. In practice, their true value might differ from the value used in computations by up to 1/256 (about 0.4%).

You can provide an expression using Earth Engine image expression syntax in your include argument to dynamically weight pixels based on specific criteria within raster bands. For example, the following expression restricts calculations to pixels where the probability band exceeds 70%:

include => 'probability > 0.7'

If the dataset includes a weight-factor band, you can use it with the following syntax:

include => 'weight_factor_band_name'
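To make the effect of a threshold expression such as 'probability > 0.7' concrete, the following Python sketch treats the comparison as producing a weight of 1 where it holds and 0 elsewhere, and then averages only the remaining pixels. The function names and sample grids are hypothetical, and the sketch ignores partial pixel coverage; it is an approximation of the behavior, not Earth Engine's implementation.

```python
def threshold_weights(probability_band, threshold=0.7):
    """Return weight 1.0 where probability > threshold, else 0.0."""
    return [
        [1.0 if p > threshold else 0.0 for p in row]
        for row in probability_band
    ]


def masked_mean(values, weights):
    """Weighted mean of values; returns None if every weight is 0."""
    total_weight = sum(w for row in weights for w in row)
    if total_weight == 0:
        return None
    weighted_sum = sum(
        v * w
        for value_row, weight_row in zip(values, weights)
        for v, w in zip(value_row, weight_row)
    )
    return weighted_sum / total_weight


# Hypothetical band values and per-pixel probabilities.
values = [[1.0, 2.0], [3.0, 4.0]]
probability = [[0.9, 0.7], [0.5, 0.8]]

weights = threshold_weights(probability)
filtered_mean = masked_mean(values, weights)
print(weights, filtered_mean)
```

Note that a pixel with a probability of exactly 0.7 is excluded, because the comparison is strict.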

Pixel size and scale of analysis

A geospatial raster image is a grid of pixels that corresponds to some location on the surface of the Earth. The pixel size of a raster, sometimes called the scale, is the nominal size of one edge of a pixel in the grid's coordinate reference system. For example, a raster with 10-meter resolution has pixels of size 10 meters by 10 meters. Original reported pixel size can vary dramatically between datasets, from less than 1 meter to greater than 20 kilometers.

When using the ST_REGIONSTATS function to compute zonal statistics, the pixel size of the raster data is a crucial consideration. For example, aggregating high-resolution raster data over the region of a country can be computationally intensive and unnecessarily granular. Conversely, aggregating low-resolution data over small regions, such as city parcels, might not provide sufficient detail.

To get meaningful and efficient results from your analysis, we recommend choosing a pixel size appropriate for the size of your polygons and the objective of your analysis. You can find the pixel size for each raster dataset in the description section of image tables in Analytics Hub.

Changing the pixel size changes the number of pixels that intersect a given geography, which affects the results and their interpretation. We don't recommend changing the pixel size for production analyses. However, if you're prototyping a query, increasing the pixel size can reduce query runtime and cost, especially for high-resolution data.

To change the pixel size, set the scale in the options argument to the ST_REGIONSTATS function. For example, to compute statistics over 1,000-meter pixels, use options => JSON '{"scale":1000}', which tells Earth Engine to resample the image at the requested scale. To learn more about how Earth Engine handles rescaling, see Scale in the Google Earth Engine documentation.
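The following Python sketch shows why the scale matters for cost and builds the JSON value for the options argument. The helper names are hypothetical, and the pixel count is a back-of-the-envelope estimate (region area divided by pixel area) that ignores partial pixels, projection distortion, and how Earth Engine actually resamples.

```python
import json


def regionstats_options(scale_meters):
    """Build the JSON text for the options argument, such as {"scale":1000}."""
    return json.dumps({"scale": scale_meters}, separators=(",", ":"))


def approx_pixel_count(region_area_m2, scale_meters):
    """Roughly estimate how many pixels cover a region at a given scale."""
    return region_area_m2 / (scale_meters ** 2)


region_area_m2 = 1_000_000_000  # a hypothetical 1,000 square kilometer region

options_json = regionstats_options(1000)
pixels_at_10m = approx_pixel_count(region_area_m2, 10)
pixels_at_1000m = approx_pixel_count(region_area_m2, 1000)

# The SQL fragment that would carry the options value.
print(f"options => JSON '{options_json}'")
print(pixels_at_10m, pixels_at_1000m)
```

Under these assumptions, increasing the scale from 10 meters to 1,000 meters reduces the estimated pixel count by a factor of 10,000, which is why a coarser scale can make prototype queries faster and cheaper.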

Billing

When you run a query, usage of the ST_REGIONSTATS function is billed separately from the rest of the query because Earth Engine computes the results of the function call. You are billed for this usage in slot hours under the BigQuery Services SKU, regardless of whether you use on-demand billing or reservations. To see the amount billed for BigQuery calls to Earth Engine, view your billing report and use labels to filter by the label key goog-bq-feature-type, with value EARTH_ENGINE.

Cost factors

The following factors impact the compute usage when you run the ST_REGIONSTATS function:

  • The number of input rows.
  • The raster image that you use. Some rasters are composites created from source image collections in the Earth Engine data catalog, and the computational resources required to produce the composite result vary.
  • The resolution of the image.
  • The size and complexity of the input geography, the number of pixels that intersect the geography, and the number of image tiles and bytes read by Earth Engine.
  • The location of the input geography on Earth relative to the source images and the image's projection and resolution.
    • Image projections can warp pixels, especially pixels at high latitudes or far outside the image's intended coverage area.
    • For composite rasters, the number of source images intersecting the input geography can vary regionally and over time. For example, some satellites produce more images at low or high latitudes, depending on their orbit and data collection parameters, or may omit images depending on changing atmospheric conditions.
  • The use of formulas in the include or band_name arguments, and the number of bands they involve.
  • The caching of previous results.

Control costs

To control costs associated with the ST_REGIONSTATS function, you can adjust the quota that controls the amount of slot time that the function is allowed to consume. When you view your quotas, filter the Metric list to earthengine.googleapis.com/bigquery_slot_usage_time to see the Earth Engine quota associated with calls from BigQuery. For more information, read about BigQuery raster functions quotas in the Google Earth Engine documentation.

Limitations

Queries that call the ST_REGIONSTATS function must run in one of the following regions:

  • US
  • us-central1
  • us-central2

What's next