Dataflow managed I/O for BigQuery

Managed I/O supports the following capabilities for BigQuery:

Requirements

The following SDKs support managed I/O for BigQuery:

  • Apache Beam SDK for Java version 2.61.0 or later
  • Apache Beam SDK for Python version 2.61.0 or later

Configuration

Managed I/O for BigQuery supports the following configuration parameters:

BIGQUERY Read

  • kms_key (str): Use this Cloud KMS key to encrypt your data.
  • query (str): The SQL query to execute to read from the BigQuery table.
  • row_restriction (str): Read only rows that match this filter, which must be compatible with Google standard SQL. This is not supported when reading via query.
  • fields (list[str]): Read only the specified fields (columns) from a BigQuery table. Fields may not be returned in the order specified. If no value is specified, then all fields are returned. Example: "col1, col2, col3"
  • table (str): The fully qualified name of the BigQuery table to read from. Format: [${PROJECT}:]${DATASET}.${TABLE}
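The following sketch shows how these read options might be passed to the Python SDK's managed transforms (apache_beam.transforms.managed). The project, dataset, table, column names, and filter are hypothetical placeholders, not values from this page.

```python
import apache_beam as beam
from apache_beam.transforms import managed

with beam.Pipeline() as pipeline:
    _ = (
        pipeline
        # Read Beam rows from a BigQuery table with the managed BigQuery source.
        | "ReadFromBigQuery" >> managed.Read(
            managed.BIGQUERY,
            config={
                "table": "my-project:my_dataset.my_table",  # hypothetical table
                "fields": ["col1", "col2"],                 # read only these columns
                "row_restriction": "col1 IS NOT NULL",      # server-side filter
            })
        # Print each row locally for inspection.
        | "LogRows" >> beam.Map(print))
```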

BIGQUERY Write

  • table (str): The BigQuery table to write to. Format: [${PROJECT}:]${DATASET}.${TABLE}
  • drop (list[str]): A list of field names to drop from the input record before writing. Mutually exclusive with 'keep' and 'only'.
  • keep (list[str]): A list of field names to keep in the input record. All other fields are dropped before writing. Mutually exclusive with 'drop' and 'only'.
  • kms_key (str): Use this Cloud KMS key to encrypt your data.
  • only (str): The name of a single record field that should be written. Mutually exclusive with 'keep' and 'drop'.
  • triggering_frequency_seconds (int64): Determines how often to 'commit' progress into BigQuery. The default is every 5 seconds.
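
The sketch below shows one way these write options might be used with the Python SDK's managed transforms. The element fields, project, dataset, and table names are hypothetical placeholders.

```python
import apache_beam as beam
from apache_beam.transforms import managed

with beam.Pipeline() as pipeline:
    _ = (
        pipeline
        # Create a small schema-aware PCollection of rows to write.
        | "CreateRows" >> beam.Create([
            beam.Row(col1="a", col2=1, debug_info="x"),
            beam.Row(col1="b", col2=2, debug_info="y"),
        ])
        # Write to BigQuery, dropping a field that should not land in the table.
        | "WriteToBigQuery" >> managed.Write(
            managed.BIGQUERY,
            config={
                "table": "my-project:my_dataset.my_table",  # hypothetical table
                "drop": ["debug_info"],                     # exclude this field
            }))
```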

What's next

For more information and code examples, see the following topics: