Integrations with Bigtable

This page describes integrations between Bigtable and other products and services.

Google Cloud services

This section describes the Google Cloud services that Bigtable integrates with.

BigQuery

BigQuery is Google's fully managed, petabyte-scale, low-cost analytics data warehouse. You can use BigQuery with Bigtable for the following purposes:

Query Bigtable data from BigQuery. Create a BigQuery external table over your Bigtable table, then query it and join its data with other BigQuery tables. For more information, see Query Bigtable data.
Export BigQuery data to Bigtable. Use reverse ETL (RETL) from BigQuery to Bigtable to write your BigQuery data to a Bigtable table. For more information, see Export data to Bigtable.
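
As a rough illustration of the external-table approach, the following Python sketch renders a CREATE EXTERNAL TABLE statement for a Bigtable source. The URI layout and the bigtable_options keys follow the general shape described in the BigQuery documentation, but treat them as assumptions to check against Query Bigtable data. The project, instance, dataset, table, and column family names are hypothetical placeholders.

```python
import json
from typing import Sequence


def bigtable_uri(project_id: str, instance_id: str, table_id: str) -> str:
    """Build the URI that identifies a Bigtable table to BigQuery."""
    return (
        "https://googleapis.com/bigtable/"
        f"projects/{project_id}/instances/{instance_id}/tables/{table_id}"
    )


def external_table_ddl(
    dataset: str, table: str, uri: str, column_families: Sequence[str]
) -> str:
    """Render a CREATE EXTERNAL TABLE statement for a Bigtable source."""
    # Assumption: expose each listed column family as STRING values
    # with TEXT encoding, and read the row key as a string.
    options = {
        "columnFamilies": [
            {"familyId": family, "type": "STRING", "encoding": "TEXT"}
            for family in column_families
        ],
        "readRowkeyAsString": True,
    }
    return (
        f"CREATE EXTERNAL TABLE `{dataset}.{table}`\n"
        "OPTIONS (\n"
        "  format = 'CLOUD_BIGTABLE',\n"
        f"  uris = ['{uri}'],\n"
        f'  bigtable_options = """{json.dumps(options)}"""\n'
        ");"
    )


# Hypothetical identifiers, for illustration only.
uri = bigtable_uri("my-project", "my-instance", "orders")
print(external_table_ddl("my_dataset", "orders_external", uri, ["cf1"]))
```

The sketch only builds the statement text. You would still run the statement in BigQuery, and the column family settings depend on how your Bigtable data is encoded.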

Cloud Asset Inventory

Cloud Asset Inventory, which provides inventory services based on a time series database, supports and returns Bigtable resource types. For a complete list, see Supported resource types.

Dataplex Universal Catalog

Dataplex Universal Catalog and Data Catalog (deprecated) automatically catalog metadata about Bigtable resources. Cataloged information about your data can help facilitate analysis, data reuse, application development, and data management. For more information, see Manage data assets using Data Catalog.

Dataflow

Dataflow is a cloud service and programming model for big data processing. Dataflow supports both batch and streaming processing. You can use Dataflow to process data that is stored in Bigtable or to store the output of your Dataflow pipeline in Bigtable. You can also use Dataflow templates to export and import your data as Avro, Parquet, or SequenceFile files.

To get started, see Bigtable Beam connector.
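
To make the "store the output of your pipeline" case concrete, here is a minimal sketch that uses the Apache Beam Python SDK rather than the Java connector. It assumes the apache-beam[gcp] and google-cloud-bigtable packages are installed, and that the target table already has a column family named cf1. The input format ("row_key,value" lines), the family and column names, and the run function are hypothetical choices for illustration.

```python
import datetime


def parse_line(line):
    """Split a 'row_key,value' line into an encoded row key and cell value."""
    key, value = line.split(",", 1)
    return key.strip().encode("utf-8"), value.strip().encode("utf-8")


def to_direct_row(line, family="cf1", column=b"value"):
    """Convert one input line into a Bigtable DirectRow mutation."""
    # Imported here so the pure helper above works without the client library.
    from google.cloud.bigtable import row

    key, value = parse_line(line)
    direct_row = row.DirectRow(row_key=key)
    direct_row.set_cell(
        family,
        column,
        value,
        timestamp=datetime.datetime.now(datetime.timezone.utc),
    )
    return direct_row


def run(project_id, instance_id, table_id, lines):
    """Write each input line to Bigtable as a single-cell row."""
    import apache_beam as beam
    from apache_beam.io.gcp.bigtableio import WriteToBigTable

    with beam.Pipeline() as pipeline:
        (
            pipeline
            | "CreateLines" >> beam.Create(lines)
            | "ToDirectRow" >> beam.Map(to_direct_row)
            | "WriteToBigtable" >> WriteToBigTable(
                project_id=project_id,
                instance_id=instance_id,
                table_id=table_id,
            )
        )
```

Calling run("my-project", "my-instance", "my-table", ["user1,hello"]) would execute the pipeline with the default local runner. To run it on Dataflow you would pass the usual Dataflow pipeline options, which this sketch leaves out.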

You can also use Bigtable as a key-value lookup to enrich the data in a pipeline. For an overview, see Enrich streaming data. For a tutorial, see Use Apache Beam and Bigtable to enrich data.
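
The lookup pattern can be sketched with the Apache Beam Python SDK's enrichment transform, which reads the row whose key matches a field of each input element and joins the result onto that element. This assumes a recent apache-beam[gcp] release that includes the enrichment module. The ORDERS records, the customer_id field, and the run function are hypothetical.

```python
# Hypothetical input records; customer_id holds the Bigtable row key to look up.
ORDERS = [
    {"order_id": 1, "customer_id": "1", "amount": 25.0},
    {"order_id": 2, "customer_id": "2", "amount": 40.5},
]


def run(project_id, instance_id, table_id):
    """Enrich each order with the Bigtable row keyed by its customer_id."""
    import apache_beam as beam
    from apache_beam.transforms.enrichment import Enrichment
    from apache_beam.transforms.enrichment_handlers.bigtable import (
        BigTableEnrichmentHandler,
    )

    # row_key names the input field whose value is used as the row key.
    handler = BigTableEnrichmentHandler(
        project_id=project_id,
        instance_id=instance_id,
        table_id=table_id,
        row_key="customer_id",
    )
    with beam.Pipeline() as pipeline:
        (
            pipeline
            | "CreateOrders" >> beam.Create([beam.Row(**o) for o in ORDERS])
            | "EnrichFromBigtable" >> Enrichment(handler)
            | "Print" >> beam.Map(print)
        )
```

In a streaming pipeline the Create step would be replaced by an unbounded source. The bounded input here only keeps the sketch short.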

Dataproc

Dataproc provides Apache Hadoop and related products as a managed service in the cloud. With Dataproc, you can run Hadoop jobs that read from and write to Bigtable.

For an example of a Hadoop MapReduce job that uses Bigtable, see the /java/dataproc-wordcount directory in the GitHub repository GoogleCloudPlatform/cloud-bigtable-examples.

Vertex AI Vector Search

Vertex AI Vector Search is a technology that can search across billions of items to find those that are semantically similar or semantically related. It's useful for implementing recommendation engines, chatbots, and text classification.

You can use Bigtable to store vector embeddings, export them into a Vector Search index, and then query the index for similar items. For a tutorial that demonstrates a sample workflow, see Bigtable to Vertex AI Vector Search Export in the workflows-demos GitHub repository.
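
Bigtable cell values are raw bytes, so an embedding has to be serialized before it is written. The following sketch shows one possible encoding, packing each component as a 4-byte big-endian float. The byte order and width are assumptions made for illustration, not a format taken from the tutorial, so confirm what your export workflow expects before relying on them. The function names are hypothetical.

```python
import struct
from typing import List, Sequence

FLOAT_SIZE = 4  # bytes per component, assuming 32-bit floats


def encode_embedding(vector: Sequence[float]) -> bytes:
    """Pack an embedding as consecutive big-endian 32-bit floats."""
    return struct.pack(f">{len(vector)}f", *vector)


def decode_embedding(raw: bytes) -> List[float]:
    """Unpack bytes produced by encode_embedding back into floats."""
    if len(raw) % FLOAT_SIZE != 0:
        raise ValueError("byte length is not a multiple of the float size")
    return list(struct.unpack(f">{len(raw) // FLOAT_SIZE}f", raw))


# Values chosen to be exactly representable as 32-bit floats.
cell_value = encode_embedding([0.5, -1.25, 3.0])
print(len(cell_value), decode_embedding(cell_value))
```

Components that are not exactly representable in 32 bits lose precision in the round trip, which is usually acceptable for similarity search.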

You can also push streaming updates to keep the Vector Search index in sync with Bigtable in real time. For more information, see the Bigtable change streams to Vector Search template.

Big Data

This section describes Big Data products that Bigtable integrates with.

Apache Beam

Apache Beam is a unified model for defining batch and streaming data-parallel processing pipelines. The Bigtable Beam connector (BigtableIO) helps you perform batch and streaming operations on Bigtable data in a pipeline.

For a tutorial showing how to use the Bigtable Beam connector to deploy a data pipeline to Dataflow, see Process a Bigtable change stream.

Apache Hadoop

Apache Hadoop is a framework that enables distributed processing of large data sets across clusters of computers. You can use Dataproc to create a Hadoop cluster, then run MapReduce jobs that read from and write to Bigtable.

For an example of a Hadoop MapReduce job that uses Bigtable, see the /java/dataproc-wordcount directory in the GitHub repository GoogleCloudPlatform/cloud-bigtable-examples.

StreamSets Data Collector

StreamSets Data Collector is a data-streaming application that you can configure to write data to Bigtable. StreamSets provides a Bigtable library in its GitHub repository at streamsets/datacollector.

Graph databases

This section describes graph databases that Bigtable integrates with.

HGraphDB

HGraphDB is a client layer for using Apache HBase or Bigtable as a graph database. It implements the Apache TinkerPop 3 interfaces.

For more information about running HGraphDB with Bigtable support, see the HGraphDB documentation.

JanusGraph

JanusGraph is a scalable graph database. It is optimized for storing and querying graphs containing hundreds of billions of vertices and edges.

For more information about running JanusGraph with Bigtable support, see Running JanusGraph with Bigtable or the JanusGraph documentation.

Infrastructure management

This section describes infrastructure management tools that Bigtable integrates with.

Pivotal Cloud Foundry

Pivotal Cloud Foundry is an application development and deployment platform that lets you bind an application to Bigtable.

Terraform

Terraform is an open source tool that codifies APIs into declarative configuration files. These files can be shared among team members, treated as code, edited, reviewed, and versioned.

For more information about using Bigtable with Terraform, see Bigtable Instance and Bigtable Table in the Terraform documentation.

Time-series databases and monitoring

This section describes time-series databases and monitoring tools that Bigtable integrates with.

OpenTSDB

OpenTSDB is a time-series database that can use Bigtable for storage. The OpenTSDB documentation provides information to help you get started.