Connectors overview

This document provides an overview of Kafka Connect connectors in Google Cloud. Discover when to use each connector type to manage and integrate your data streams.

These connectors use the Kafka Connect framework to integrate Apache Kafka with other applications. They ingest and replicate data between your Kafka clusters and applications. The available connector types include:

  • MirrorMaker 2.0 connectors

    • Source connector

    • Checkpoint connector

    • Heartbeat connector

  • BigQuery Sink connector

  • Cloud Storage Sink connector

  • Pub/Sub Source connector

  • Pub/Sub Sink connector

MirrorMaker 2.0 connectors are specifically designed for data replication and disaster recovery between Kafka clusters. They facilitate the mirroring of data from one Kafka cluster to another, enabling high availability and fault tolerance.

MirrorMaker 2.0 connectors can connect a Managed Service for Apache Kafka cluster to another Managed Service for Apache Kafka cluster or to an external, self-managed Kafka cluster.

The other Sink and Source connectors serve to integrate Kafka with various Google Cloud services. These connectors enable data transfer between Managed Service for Apache Kafka clusters and Google Cloud services, such as BigQuery, Cloud Storage, or Pub/Sub.

Before you begin

Before you explore and create connectors, make sure that you understand when to use each connector type, as described in the following sections.

When to use MirrorMaker 2.0

Use MirrorMaker 2.0 connectors in the following scenarios:

  • Migrate data: Move your Kafka workload to a new Managed Service for Apache Kafka cluster.

  • Recover from disasters: Create a backup cluster to ensure business continuity in case of failures.

  • Aggregate data: Consolidate data from multiple Kafka clusters into a central Managed Service for Apache Kafka cluster for analytical purposes.

MirrorMaker 2.0 key features

  • Replicates all necessary components, including topics, data, configurations, consumer groups with offsets, and ACLs.
  • Maintains the same partitioning scheme in the target cluster, which simplifies the transition for applications.
  • Automatically detects and replicates new topics and partitions, minimizing manual configuration.
  • Provides essential metrics, like end-to-end replication latency, that let you track the health and performance of the replication process.
  • Ensures reliable operation, even with high data volumes, and can be scaled horizontally to handle increasing workloads.
  • Uses internal topics for offset synchronization, checkpoints, and heartbeats. These topics have configurable replication factors, such as offset-syncs.topic.replication.factor, to ensure high availability and fault tolerance.
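
For illustration, the following sketch shows how these internal topic replication factors might appear in a MirrorMaker 2.0 connector configuration, in Kafka Connect properties format. The values are illustrative, not recommendations:

    # Replication factors for the MirrorMaker 2.0 internal topics.
    # Match these to the availability requirements of your clusters.
    offset-syncs.topic.replication.factor=3
    checkpoints.topic.replication.factor=3
    heartbeats.topic.replication.factor=3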

Use the MirrorMaker 2.0 Source connector

The MirrorMaker 2.0 Source connector replicates topics and data from one Kafka cluster (the source) to another Kafka cluster (the target).

The connector supports the following source and target combinations:

  Source                                      Target
  Managed Service for Apache Kafka cluster    Managed Service for Apache Kafka cluster
  Managed Service for Apache Kafka cluster    External or self-managed Kafka cluster
  External or self-managed Kafka cluster      Managed Service for Apache Kafka cluster

The MirrorMaker 2.0 Source connector supports the following replication and migration scenarios:

  • Replicate or migrate data from an external or self-managed Kafka cluster into a Managed Service for Apache Kafka cluster.

  • Replicate or migrate data from a Managed Service for Apache Kafka cluster to an external or self-managed Kafka cluster.

  • Replicate Kafka data across regions to satisfy disaster recovery and high availability requirements.
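
The following is a minimal configuration sketch for the MirrorMaker 2.0 Source connector, in Kafka Connect properties format. The cluster aliases, bootstrap addresses, and topic filter are placeholders, and Managed Service for Apache Kafka might supply some of these settings for you:

    # MirrorMaker 2.0 Source connector: replicates topics and records
    # from the source cluster to the target cluster.
    connector.class=org.apache.kafka.connect.mirror.MirrorSourceConnector
    # Placeholder cluster aliases and bootstrap addresses.
    source.cluster.alias=source
    target.cluster.alias=target
    source.cluster.bootstrap.servers=source-broker:9092
    target.cluster.bootstrap.servers=target-broker:9092
    # Regular expression that selects the topics to replicate.
    topics=.*
    tasks.max=3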

Use the MirrorMaker 2.0 Checkpoint connector

The MirrorMaker 2.0 Checkpoint connector is optional. It copies consumer offsets, which indicate the last successfully consumed message for each consumer group. This ensures that consumers on the target cluster can resume processing from the same point as on the source cluster.

The Checkpoint connector is not required for the MirrorMaker 2.0 Source connector to function. It is necessary only if you need to synchronize consumer group state to minimize downtime when you switch from the source cluster to the target cluster. If you only need a copy of your source data, you can omit this connector.

Use the MirrorMaker 2.0 Checkpoint connector for the following use cases:

  • Disaster recovery, to maintain a consistent consumer state across clusters and enable seamless failover.

  • Migration or failover scenarios where preserving consumer progress is critical, so that consumers neither reprocess nor skip messages.
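
A minimal Checkpoint connector sketch in the same properties format, again with placeholder aliases and addresses:

    # MirrorMaker 2.0 Checkpoint connector: translates consumer group
    # offsets from the source cluster and copies them to the target.
    connector.class=org.apache.kafka.connect.mirror.MirrorCheckpointConnector
    source.cluster.alias=source
    target.cluster.alias=target
    source.cluster.bootstrap.servers=source-broker:9092
    target.cluster.bootstrap.servers=target-broker:9092
    # Regular expression that selects the consumer groups to checkpoint.
    groups=.*
    # Apply translated offsets directly to consumer groups on the target
    # cluster (supported in Apache Kafka 2.7 and later).
    sync.group.offsets.enabled=true
    emit.checkpoints.interval.seconds=60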

Use the MirrorMaker 2.0 Heartbeat connector

The MirrorMaker 2.0 Heartbeat connector is an optional component that generates periodic heartbeat messages on the source Kafka cluster. The connector writes these messages to a dedicated topic, typically named heartbeats.

You can configure a MirrorMaker 2.0 Source connector to replicate the heartbeats topic to the target cluster. By observing this replicated topic on the target cluster, you can monitor the status and performance of your topic replication flow. This provides a way to verify the connection and data flow between clusters, even when no other data is being produced or replicated.

Deploying the Heartbeat connector alone does not automatically monitor replication health. To use it for monitoring, you must replicate the heartbeats topic and then observe its presence and timeliness on the target cluster, or use monitoring tools that consume these heartbeats.

The Heartbeat connector is not required for the MirrorMaker 2.0 Source connector to function. Use the MirrorMaker 2.0 Heartbeat connector for the following use cases:

  • Monitor the health and status of MirrorMaker 2.0 replication.

  • Configure alerts in Cloud Monitoring using the generated heartbeats and available metrics to notify you when replication or the heartbeat stops.
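
A minimal Heartbeat connector sketch in the same properties format; the emission interval is illustrative:

    # MirrorMaker 2.0 Heartbeat connector: emits periodic heartbeat
    # records that a Source connector can replicate to the target.
    connector.class=org.apache.kafka.connect.mirror.MirrorHeartbeatConnector
    source.cluster.alias=source
    target.cluster.alias=target
    # Emit one heartbeat record every 5 seconds.
    emit.heartbeats.interval.seconds=5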

Use Sink connectors

Sink connectors export data from Kafka topics to other systems.

Use the BigQuery Sink connector

The BigQuery Sink connector streams data from Kafka topics to BigQuery tables.

Use the BigQuery Sink connector for the following use cases:

  • Data warehousing, to load streaming data into BigQuery for analytics and reporting.

  • Populating BigQuery tables that power real-time dashboards.
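
As an illustration, a BigQuery Sink connector configuration might look like the following sketch. It assumes the widely used open source WePay BigQuery Sink connector; the project, dataset, and topic names are placeholders:

    # Streams records from the listed Kafka topics into BigQuery tables.
    connector.class=com.wepay.kafka.connect.bigquery.BigQuerySinkConnector
    # Placeholder project ID, dataset, and source topics.
    project=my-gcp-project
    defaultDataset=my_dataset
    topics=orders
    # Create a destination table for each topic if one does not exist.
    autoCreateTables=true
    tasks.max=1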

Use the Cloud Storage Sink connector

The Cloud Storage Sink connector streams data from Kafka topics to Cloud Storage buckets.

Use the Cloud Storage Sink connector for the following use cases:

  • Data lake ingestion, to store Kafka data in a data lake for long-term archival and batch processing.

  • Archiving data to meet regulatory requirements.
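
A Cloud Storage Sink configuration sketch, assuming the open source Aiven GCS Sink connector; the bucket name and object prefix are placeholders:

    # Writes batches of Kafka records as objects in a Cloud Storage bucket.
    connector.class=io.aiven.kafka.connect.gcs.GcsSinkConnector
    # Placeholder bucket name, object prefix, and source topics.
    gcs.bucket.name=my-archive-bucket
    file.name.prefix=kafka-export/
    topics=orders
    # Output format for the stored objects, such as jsonl, csv, or parquet.
    format.output.type=jsonl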

Use the Pub/Sub Sink connector

The Pub/Sub Sink connector streams messages from Kafka topics to a Pub/Sub topic.

Use the Pub/Sub Sink connector for the following use cases:

  • Service integration, to send data from Kafka to other Google Cloud services or applications that consume from Pub/Sub.

  • Triggering real-time notifications or actions based on processed data.
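
A Pub/Sub Sink configuration sketch, assuming the open source CloudPubSubSinkConnector from the Pub/Sub Kafka connector; the project and topic names are placeholders:

    # Forwards records from a Kafka topic to a Pub/Sub topic.
    connector.class=com.google.pubsub.kafka.sink.CloudPubSubSinkConnector
    # Placeholder Kafka topic, project ID, and Pub/Sub topic.
    topics=orders
    cps.project=my-gcp-project
    cps.topic=orders-events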

Use Source connectors

Source connectors import data from other systems into Kafka topics.

Use the Pub/Sub Source connector

The Pub/Sub Source connector streams messages from a Pub/Sub subscription into a Kafka topic.

Use the Pub/Sub Source connector for the following use cases:

  • Real-time data ingestion, to bring data that cloud services or other applications publish to Pub/Sub into Kafka for stream processing.

  • Event-driven architectures, to trigger Kafka-based processing from events published to Pub/Sub.
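
A Pub/Sub Source configuration sketch, assuming the open source CloudPubSubSourceConnector; the subscription and topic names are placeholders:

    # Pulls messages from a Pub/Sub subscription into a Kafka topic.
    connector.class=com.google.pubsub.kafka.source.CloudPubSubSourceConnector
    # Placeholder project ID, subscription, and destination Kafka topic.
    cps.project=my-gcp-project
    cps.subscription=orders-sub
    kafka.topic=orders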

What's next