Dataflow managed I/O for Apache Iceberg

Managed I/O supports the following capabilities for Apache Iceberg:

| Catalog | Batch read | Batch write | Streaming write | Dynamic table creation | Dynamic destinations |
| --- | --- | --- | --- | --- | --- |
| Hadoop | Supported | Supported | Supported | Supported | Supported |
| Hive | Supported | Supported | Supported | Supported | Supported |
| REST-based catalogs | Supported | Supported | Supported | Supported | Supported |
| BigQuery metastore | Supported | Supported | Supported | Supported | Supported |

For BigQuery tables for Apache Iceberg, use the BigQueryIO connector with the BigQuery Storage API. The table must already exist; dynamic table creation is not supported.

Requirements

Managed I/O for Apache Iceberg requires the Apache Beam SDK for Java version 2.58.0 or later.

Configuration

Managed I/O uses the following configuration parameters for Apache Iceberg:

| Read and write configuration | Data type | Description |
| --- | --- | --- |
| table | string | The identifier of the Apache Iceberg table. Example: "db.table1". |
| catalog_name | string | The name of the catalog. Example: "local". |
| catalog_properties | map | A map of configuration properties for the Apache Iceberg catalog. The required properties depend on the catalog. For more information, see CatalogUtil in the Apache Iceberg documentation. |
| config_properties | map | An optional set of Hadoop configuration properties. For more information, see CatalogUtil in the Apache Iceberg documentation. |

| Write configuration | Data type | Description |
| --- | --- | --- |
| triggering_frequency_seconds | integer | For streaming write pipelines, the frequency at which the sink attempts to produce snapshots, in seconds. |
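
The parameters above are passed to Managed I/O as a map. The following is a minimal sketch of assembling that map in Java, using only the standard library so that it runs without the Beam SDK on the classpath. The class name IcebergManagedConfigSketch, its helper methods, the bucket path gs://example-bucket/warehouse, and the 60-second frequency are illustrative assumptions, not values from this page. The "type" and "warehouse" keys are common Apache Iceberg catalog properties; the properties your catalog actually requires may differ.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper class for illustration; not part of the Beam SDK.
class IcebergManagedConfigSketch {

    // Builds the parameters shared by read and write configurations.
    static Map<String, Object> baseConfig(
            String table, String catalogName, Map<String, String> catalogProperties) {
        Map<String, Object> config = new HashMap<>();
        config.put("table", table);                // for example "db.table1"
        config.put("catalog_name", catalogName);   // for example "local"
        config.put("catalog_properties", new HashMap<>(catalogProperties));
        return config;
    }

    // Adds the write-only parameter used by streaming write pipelines.
    static Map<String, Object> streamingWriteConfig(
            String table, String catalogName, Map<String, String> catalogProperties,
            int triggeringFrequencySeconds) {
        Map<String, Object> config = baseConfig(table, catalogName, catalogProperties);
        config.put("triggering_frequency_seconds", triggeringFrequencySeconds);
        return config;
    }

    public static void main(String[] args) {
        // Assumed catalog properties for a Hadoop catalog; the path is a placeholder.
        Map<String, String> catalogProperties = new HashMap<>();
        catalogProperties.put("type", "hadoop");
        catalogProperties.put("warehouse", "gs://example-bucket/warehouse");

        Map<String, Object> writeConfig =
                streamingWriteConfig("db.table1", "local", catalogProperties, 60);

        // Print each parameter so the resulting configuration can be inspected.
        for (Map.Entry<String, Object> entry : writeConfig.entrySet()) {
            System.out.println(entry.getKey() + " = " + entry.getValue());
        }
    }
}
```

In a pipeline that has the Beam managed I/O and Iceberg modules available, a map like this would typically be supplied through Managed.write(Managed.ICEBERG).withConfig(writeConfig), or through Managed.read(Managed.ICEBERG).withConfig(...) when the map omits the write-only parameter.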

For more information and code examples, see the following topics: