You might need to partition your BigQuery tables into smaller segments to improve query performance and control costs. Because Datastream doesn't support partitioning tables in BigQuery, you need to manually add the partitions before starting your stream. For general information about partitioning in BigQuery, see Introduction to partitioned tables.
Partition tables in BigQuery
To partition your tables in BigQuery, use one of the options described in the sections that follow, depending on your use case.
Option 1: The table already exists in BigQuery and is included in a stream
- Exclude the table from the source configuration of your stream. For more information about including and excluding objects from your source configuration, see Configure source databases.
- Wait a few minutes to ensure that Datastream has completed processing all events for the table.
- Create your partitioned table in BigQuery. If you want to keep the data that was already in the original BigQuery table, give the table a different, temporary name.
- Copy the data from the original table to the new partitioned table.
- Drop or rename the original table.
- Change the temporary name of the new table to the name of the original table.
- Add the source table to the configuration of your stream.
Option 2: The table doesn't exist in BigQuery
Create the table in BigQuery using one of the following approaches:
- Use the BigQuery Migration Toolkit.
Manually create a Datastream-compatible BigQuery table. For example, if you want to create a table and partition the data based on the
TIMESTAMP
column, you can use a query similar to the following:CREATE TABLE dataset.partitioned_table ( 'id' INT64, 'name' STRING 'update_date' DATETIME, 'datastream_metadata' STRUCT<'uuid' STRING, 'source_timestamp' INT64>, PRIMARY KEY ('id') NOT ENFORCED ) PARTITION BY TIMESTAMP(update_date)
After you create the partitioned table, make sure that its
max_staleness
value is set according to your requirements. If you don't set the value, the default value of0
is set. For more information, see Use BigQuery tables with themax_staleness
option.Optionally, if you've set manual backfill for the stream, initiate backfill for the table.