Bigtable Beam connector
The Bigtable Beam connector (BigtableIO) is an open source Apache Beam I/O connector that can help you perform batch and streaming operations on Bigtable data in a pipeline using Dataflow.
If you are migrating from HBase to Bigtable, or if you are running an application that uses the HBase API instead of the Bigtable APIs, use the Bigtable HBase Beam connector (CloudBigtableIO) instead of the connector described on this page.
Connector details
The Bigtable Beam connector is a component of the Apache Beam GitHub repository. The Javadoc is available at Class BigtableIO.
Before you create a Dataflow pipeline, check Apache Beam runtime support to make sure that the version of Java you are using is supported for Dataflow, and use the most recent supported release of Apache Beam.
The Bigtable Beam connector is used in conjunction with the Bigtable client for Java, a client library that calls the Bigtable APIs. You write code to deploy a pipeline that uses the connector to Dataflow, which handles the provisioning and management of resources and assists with the scalability and reliability of data processing.
For more information on the Apache Beam programming model, see the Beam documentation.
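To illustrate how the connector fits into a pipeline, the following sketch writes a single row to a Bigtable table with BigtableIO. The project, instance, table, and column family names are placeholders, and the explicit coder setup is one way (not the only way) to encode the mutation elements; running the program with the Dataflow runner is what deploys it to Dataflow.

```java
import java.util.Collections;

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.coders.IterableCoder;
import org.apache.beam.sdk.coders.KvCoder;
import org.apache.beam.sdk.extensions.protobuf.ByteStringCoder;
import org.apache.beam.sdk.extensions.protobuf.ProtoCoder;
import org.apache.beam.sdk.io.gcp.bigtable.BigtableIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.values.KV;

import com.google.bigtable.v2.Mutation;
import com.google.protobuf.ByteString;

public class BigtableWriteExample {
  public static void main(String[] args) {
    Pipeline pipeline = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    // A SetCell mutation that writes one value into an assumed "stats" column family.
    Mutation setCell = Mutation.newBuilder()
        .setSetCell(Mutation.SetCell.newBuilder()
            .setFamilyName("stats")                                   // placeholder family
            .setColumnQualifier(ByteString.copyFromUtf8("count"))
            .setValue(ByteString.copyFromUtf8("1"))
            .setTimestampMicros(System.currentTimeMillis() * 1000))
        .build();

    // BigtableIO.write() consumes KV<row key, iterable of mutations> elements.
    pipeline
        .apply(Create.of(KV.of(
                ByteString.copyFromUtf8("row-1"),
                (Iterable<Mutation>) Collections.singletonList(setCell)))
            .withCoder(KvCoder.of(
                ByteStringCoder.of(),
                IterableCoder.of(ProtoCoder.of(Mutation.class)))))
        .apply(BigtableIO.write()
            .withProjectId("my-project")     // placeholder
            .withInstanceId("my-instance")   // placeholder
            .withTableId("my-table"));       // placeholder

    pipeline.run().waitUntilFinish();
  }
}
```

Because the connector is a sink transform, the rest of the pipeline stays ordinary Beam code; only the final `apply` step is Bigtable-specific.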
Batch write flow control
When you send batch writes to a table using the Bigtable Beam connector, you can enable batch write flow control. When this feature is enabled, Bigtable automatically does the following:
- Rate-limits traffic to avoid overloading your Bigtable cluster
- Ensures the cluster is under enough load to trigger Bigtable autoscaling (if enabled), so that more nodes are automatically added to the cluster when needed
For more information, see Batch write flow control. For a code sample, see Enable batch write flow control.
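As a sketch of how the feature is enabled, batch write flow control is a setting on the connector's write transform; the IDs below are placeholders:

```java
import org.apache.beam.sdk.io.gcp.bigtable.BigtableIO;

// Enabling batch write flow control on a BigtableIO write transform.
// With this flag set, Bigtable rate-limits the batch traffic and keeps the
// cluster loaded enough to trigger autoscaling when it is enabled.
BigtableIO.Write write = BigtableIO.write()
    .withProjectId("my-project")     // placeholder
    .withInstanceId("my-instance")   // placeholder
    .withTableId("my-table")         // placeholder
    .withFlowControl(true);
```

The transform is then applied to a `PCollection` of row mutations exactly as it would be without flow control.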
What's next
- Read an overview of Bigtable write requests.
- Review a list of Dataflow templates that work with Bigtable.