Integrations with Bigtable
This page describes integrations between Bigtable and other products and services.
Google Cloud services
This section describes the Google Cloud services that Bigtable integrates with.
BigQuery
BigQuery is Google's fully managed, petabyte-scale, low-cost analytics data warehouse. You can use BigQuery with Bigtable for the following purposes:
You can create a BigQuery external table and then use it to query your Bigtable table and join the data to other BigQuery tables, as sketched after this list. For more information, see Query Bigtable data.
You can export your BigQuery data to a Bigtable table by using reverse ETL (RETL) from BigQuery to Bigtable. For more information, see Export data to Bigtable.
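For example, once an external table over a Bigtable table exists, you can query it with the BigQuery client library for Java. The following is a minimal sketch, not the documented workflow: the dataset `my_dataset`, the external table `bigtable_external`, the native table `orders`, and the column names are all placeholders.

```java
import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.QueryJobConfiguration;
import com.google.cloud.bigquery.TableResult;

public class QueryBigtableExternalTable {
  public static void main(String[] args) throws Exception {
    BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

    // Join a BigQuery external table backed by Bigtable with a native BigQuery table.
    // All dataset, table, and column names below are placeholders.
    String sql =
        "SELECT e.rowkey, o.order_total\n"
            + "FROM `my_dataset.bigtable_external` AS e\n"
            + "JOIN `my_dataset.orders` AS o ON o.customer_id = e.rowkey\n"
            + "LIMIT 10";

    TableResult result =
        bigquery.query(QueryJobConfiguration.newBuilder(sql).setUseLegacySql(false).build());
    result.iterateAll().forEach(row -> System.out.println(row.get(0).getStringValue()));
  }
}
```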
Cloud Asset Inventory
Cloud Asset Inventory, which provides inventory services based on a time series database, supports and returns Bigtable resource types. For a complete list, see Supported resource types.
Data Catalog
Data Catalog is a feature of Dataplex that automatically catalogs metadata about Bigtable resources. Data Catalog information about your data can help facilitate analysis, data reuse, application development, and data management. For more information, see Manage data assets using Data Catalog.
Dataflow
Dataflow is a cloud service and programming model for big data processing. Dataflow supports both batch and streaming processing. You can use Dataflow to process data that is stored in Bigtable or to store the output of your Dataflow pipeline. You can also use Dataflow templates to export and import your data as Avro, Parquet, or SequenceFiles.
To get started, see Bigtable Beam connector.
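For example, a pipeline that stores its output in Bigtable can end with the Beam connector's `BigtableIO.write()` transform. The following is a minimal sketch, assuming placeholder project, instance, table, and column family names; a real pipeline would replace the in-memory `Create` step with its actual source and transforms.

```java
import com.google.bigtable.v2.Mutation;
import com.google.protobuf.ByteString;
import java.util.Collections;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.coders.IterableCoder;
import org.apache.beam.sdk.coders.KvCoder;
import org.apache.beam.sdk.extensions.protobuf.ByteStringCoder;
import org.apache.beam.sdk.extensions.protobuf.ProtoCoder;
import org.apache.beam.sdk.io.gcp.bigtable.BigtableIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.KV;

public class WriteToBigtablePipeline {

  // Turns a simple key/value pair into a Bigtable row key plus a SetCell mutation.
  static class ToBigtableMutation
      extends DoFn<KV<String, String>, KV<ByteString, Iterable<Mutation>>> {
    @ProcessElement
    public void processElement(ProcessContext c) {
      Mutation mutation =
          Mutation.newBuilder()
              .setSetCell(
                  Mutation.SetCell.newBuilder()
                      .setFamilyName("cf") // placeholder column family
                      .setColumnQualifier(ByteString.copyFromUtf8("value"))
                      .setValue(ByteString.copyFromUtf8(c.element().getValue())))
              .build();
      Iterable<Mutation> mutations = Collections.singletonList(mutation);
      c.output(KV.of(ByteString.copyFromUtf8(c.element().getKey()), mutations));
    }
  }

  public static void main(String[] args) {
    Pipeline pipeline = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    pipeline
        // In a real pipeline this data would come from an upstream source or transform.
        .apply(Create.of(KV.of("greeting#1", "hello"), KV.of("greeting#2", "world")))
        .apply(ParDo.of(new ToBigtableMutation()))
        .setCoder(KvCoder.of(ByteStringCoder.of(), IterableCoder.of(ProtoCoder.of(Mutation.class))))
        .apply(
            BigtableIO.write()
                .withProjectId("my-project")   // placeholder project
                .withInstanceId("my-instance") // placeholder instance
                .withTableId("my-table"));     // placeholder table

    pipeline.run().waitUntilFinish();
  }
}
```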
You can also use Bigtable as a key-value lookup to enrich the data in a pipeline. For an overview, see Enrich streaming data. For a tutorial, see Use Apache Beam and Bigtable to enrich data.
Dataproc
Dataproc provides Apache Hadoop and related products as a managed service in the cloud. With Dataproc, you can run Hadoop jobs that read from and write to Bigtable.
For an example of a Hadoop MapReduce job that uses Bigtable, see the /java/dataproc-wordcount directory in the GitHub repository GoogleCloudPlatform/cloud-bigtable-examples.
Vertex AI Vector Search
Vertex AI Vector Search is a technology that can search across billions of items to find those that are semantically similar or related. It's useful for implementing recommendation engines, chatbots, and text classification.
You can use Bigtable to store vector embeddings, export them into a Vector Search index, and then query the index for similar items. For a tutorial that demonstrates a sample workflow, see Bigtable to Vertex AI Vector Search Export in the workflows-demos GitHub repository.
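As a rough sketch of the first step, storing embeddings, the following example writes one vector as a binary cell value with the Bigtable client library for Java. The table name, column family, row key scheme, and float encoding are assumptions for illustration, not the format the tutorial prescribes.

```java
import com.google.cloud.bigtable.data.v2.BigtableDataClient;
import com.google.cloud.bigtable.data.v2.models.RowMutation;
import com.google.protobuf.ByteString;
import java.nio.ByteBuffer;

public class StoreEmbedding {
  public static void main(String[] args) throws Exception {
    // Placeholder project, instance, and table identifiers.
    try (BigtableDataClient client = BigtableDataClient.create("my-project", "my-instance")) {
      float[] embedding = {0.12f, -0.48f, 0.33f, 0.91f}; // toy embedding vector

      // Serialize the vector as big-endian floats. Any consistent encoding works,
      // as long as the job that builds the Vector Search index decodes it the same way.
      ByteBuffer buffer = ByteBuffer.allocate(embedding.length * Float.BYTES);
      for (float value : embedding) {
        buffer.putFloat(value);
      }

      RowMutation mutation =
          RowMutation.create("embeddings", "doc#0001") // placeholder table and row key
              .setCell("cf", ByteString.copyFromUtf8("embedding"),
                  ByteString.copyFrom(buffer.array()));
      client.mutateRow(mutation);
    }
  }
}
```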
You can also push streaming updates to keep the vector search index in sync with Bigtable in real time. For more information, see the Bigtable change streams to Vector Search template.
Big Data
This section describes Big Data products that Bigtable integrates with.
Apache Beam
Apache Beam is a unified model for defining batch and streaming data-parallel processing pipelines. The Bigtable Beam connector (BigtableIO) helps you perform batch and streaming operations on Bigtable data in a pipeline.
For a tutorial showing how to use the Bigtable Beam connector to deploy a data pipeline to Dataflow, see Process a Bigtable change stream.
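A minimal batch pipeline that uses the connector to read from Bigtable might look like the following sketch; the project, instance, and table IDs are placeholders, and the pipeline simply counts the rows it reads.

```java
import com.google.bigtable.v2.Row;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.bigtable.BigtableIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Count;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.TypeDescriptors;

public class CountBigtableRows {
  public static void main(String[] args) {
    Pipeline pipeline = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    pipeline
        // Read every row from a Bigtable table (placeholder IDs).
        .apply(
            "ReadFromBigtable",
            BigtableIO.read()
                .withProjectId("my-project")
                .withInstanceId("my-instance")
                .withTableId("my-table"))
        // Count the rows and log the total.
        .apply("CountRows", Count.<Row>globally())
        .apply(
            "LogCount",
            MapElements.into(TypeDescriptors.voids())
                .via(
                    count -> {
                      System.out.println("Row count: " + count);
                      return (Void) null;
                    }));

    pipeline.run().waitUntilFinish();
  }
}
```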
Apache Hadoop
Apache Hadoop is a framework that enables distributed processing of large data sets across clusters of computers. You can use Dataproc to create a Hadoop cluster, then run MapReduce jobs that read from and write to Bigtable.
For an example of a Hadoop MapReduce job that uses Bigtable, see the /java/dataproc-wordcount directory in the GitHub repository GoogleCloudPlatform/cloud-bigtable-examples.
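In outline, such a job uses the Bigtable HBase client, so the standard HBase MapReduce utilities work against Bigtable. The following sketch shows only the mapper and job setup; the project, instance, and table names are placeholders, and the exact connection class to configure depends on the connector version you use.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class BigtableRowCounter {

  // Emits one (rowKey, 1) pair per row read from Bigtable through the HBase API.
  static class RowMapper extends TableMapper<Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);

    @Override
    protected void map(ImmutableBytesWritable rowKey, Result row, Context context)
        throws IOException, InterruptedException {
      context.write(new Text(Bytes.toString(rowKey.copyBytes())), ONE);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    // Point the HBase client at Bigtable. These properties are read by the Bigtable
    // HBase connector; the project and instance IDs are placeholders, and the
    // connection class name depends on the connector version.
    conf.set("google.bigtable.project.id", "my-project");
    conf.set("google.bigtable.instance.id", "my-instance");
    conf.set("hbase.client.connection.impl",
        "com.google.cloud.bigtable.hbase2_x.BigtableConnection");

    Job job = Job.getInstance(conf, "bigtable-row-counter");
    job.setJarByClass(BigtableRowCounter.class);

    // Read every row of the (placeholder) source table with the mapper above.
    TableMapReduceUtil.initTableMapperJob(
        "my-table", new Scan(), RowMapper.class, Text.class, IntWritable.class, job);

    job.setNumReduceTasks(0); // map-only job for brevity
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    job.setOutputFormatClass(TextOutputFormat.class);
    FileOutputFormat.setOutputPath(job, new Path(args[0])); // output path passed as an argument

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```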
StreamSets Data Collector
StreamSets Data Collector is a data-streaming application that you can configure to write data to Bigtable. StreamSets provides a Bigtable library in its GitHub repository at streamsets/datacollector.
Graph databases
This section describes graph databases that Bigtable integrates with.
HGraphDB
HGraphDB is a client layer for using Apache HBase or Bigtable as a graph database. It implements the Apache TinkerPop 3 interfaces.
For more information about running HGraphDB with Bigtable support, see the HGraphDB documentation.
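Because HGraphDB implements the TinkerPop 3 interfaces, application code is ordinary Gremlin once the graph is opened. The sketch below elides the Bigtable-specific configuration (the `hgraphDbSettings` helper is hypothetical; see the HGraphDB documentation for the actual property names) and only illustrates the traversal API.

```java
import java.util.HashMap;
import java.util.Map;
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
import org.apache.tinkerpop.gremlin.structure.Graph;
import org.apache.tinkerpop.gremlin.structure.Vertex;
import org.apache.tinkerpop.gremlin.structure.util.GraphFactory;

public class HGraphDbExample {
  public static void main(String[] args) {
    // Open an HGraphDB graph backed by Bigtable (configuration details omitted).
    Graph graph = GraphFactory.open(hgraphDbSettings());
    GraphTraversalSource g = graph.traversal();

    // Standard TinkerPop 3 usage: add two vertices and an edge, then query.
    Vertex alice = g.addV("person").property("name", "Alice").next();
    Vertex bob = g.addV("person").property("name", "Bob").next();
    alice.addEdge("knows", bob);

    // Find everyone that Alice knows.
    g.V().has("person", "name", "Alice")
        .out("knows")
        .values("name")
        .forEachRemaining(System.out::println);
  }

  // Hypothetical helper: returns HGraphDB settings for Bigtable. See the HGraphDB
  // documentation for the actual property names and Bigtable-specific values.
  private static Map<String, Object> hgraphDbSettings() {
    Map<String, Object> conf = new HashMap<>();
    conf.put("gremlin.graph", "io.hgraphdb.HBaseGraph"); // HGraphDB's Graph implementation
    // ... Bigtable connection settings go here, per the HGraphDB docs.
    return conf;
  }
}
```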
JanusGraph
JanusGraph is a scalable graph database. It is optimized for storing and querying graphs containing hundreds of billions of vertices and edges.
For more information about running JanusGraph with Bigtable support, see Running JanusGraph with Bigtable or the JanusGraph documentation.
Infrastructure management
This section describes infrastructure management tools that Bigtable integrates with.
Pivotal Cloud Foundry
Pivotal Cloud Foundry is an application development and deployment platform that lets you bind an application to Bigtable.
Terraform
Terraform is an open source tool that codifies APIs into declarative configuration files. These files can be shared among team members, treated as code, edited, reviewed, and versioned.
For more information about using Bigtable with Terraform, see Bigtable Instance and Bigtable Table in the Terraform documentation.
Time-series databases and monitoring
This section describes time-series databases and monitoring tools that Bigtable integrates with.
Heroic
Heroic is a monitoring system and time-series database. Heroic can use Bigtable to store its data.
For more information about Heroic, see the GitHub repository spotify/heroic, as well as the documentation for configuring Bigtable and configuring metrics.
OpenTSDB
OpenTSDB is a time-series database that can use Bigtable for storage. The OpenTSDB documentation provides information to help you get started.