Replicate Kafka topics with MirrorMaker 2.0


This tutorial shows how to replicate data between Managed Service for Apache Kafka clusters by using the MirrorMaker 2.0 Source connector.

This tutorial requires that the source Kafka cluster contains data that you intend to replicate. If it doesn't, follow the steps in Produce and consume messages with the CLI. Make sure to create a topic in the source cluster for replication.

The clusters can be in the same project or in different projects, and they can be in the same region or different regions. For this tutorial, you set up replication across regions but within the same project. However, the steps are the same for other combinations.

Architecture

For this scenario, there are three clusters:

  • A Kafka cluster whose data is being replicated. This cluster is called the source cluster, because it's the source of the data.

  • A Kafka cluster where the replicated data is written. This cluster is called the target cluster.

  • A Connect cluster, which lets you create and manage the MirrorMaker 2.0 Source connector.

A Connect cluster has a primary Kafka cluster, which is the Kafka cluster associated with the Connect cluster. To minimize latency for write operations, we recommend that you designate the target cluster as the primary cluster and put the Connect cluster in the same region as the target cluster.

The following diagram shows these components:

Architecture of the Kafka data replication scenario

Data replication is performed by the MirrorMaker 2.0 Source connector. Two other MirrorMaker 2.0 connectors are optional for this scenario:

  • MirrorMaker 2.0 Checkpoint connector: Enables seamless failover by ensuring that consumers on the target cluster can resume processing from the same point as on the source cluster.

  • MirrorMaker 2.0 Heartbeat connector: Generates periodic heartbeat messages on the source Kafka cluster, which lets you monitor the health and status of the data replication.

This tutorial does not use these optional connectors. For more information, see When to use MirrorMaker 2.0.

Before you begin

  1. Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
  2. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

    Roles required to select or create a project

    • Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
    • Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.

    Go to project selector

  3. Verify that billing is enabled for your Google Cloud project.

  4. Enable the Managed Kafka API.

    Roles required to enable APIs

    To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.

    Enable the API

  5. Make sure that you have the following role or roles on the project: Managed Kafka Cluster Editor, Managed Kafka Connect Cluster Editor, and Managed Kafka Connector Editor.

    Check for the roles

    1. In the Google Cloud console, go to the IAM page.

      Go to IAM
    2. Select the project.
    3. In the Principal column, find all rows that identify you or a group that you're included in. To learn which groups you're included in, contact your administrator.

    4. For all rows that specify or include you, check the Role column to see whether the list of roles includes the required roles.

    Grant the roles

    1. In the Google Cloud console, go to the IAM page.

      Go to IAM
    2. Select the project.
    3. Click Grant access.
    4. In the New principals field, enter your user identifier. This is typically the email address for a Google Account.

    5. In the Select a role list, select a role.
    6. To grant additional roles, click Add another role and add each additional role.
    7. Click Save.
  6. If you don't already have a source Kafka cluster, create one by following the steps in Produce and consume messages with the CLI. Make sure to create a topic in the cluster.

Create the target Kafka cluster

In this step, you create a second Managed Service for Apache Kafka cluster. This cluster is the target cluster, to which MirrorMaker 2.0 replicates data.

Console

  1. Go to the Managed Service for Apache Kafka > Clusters page.

    Go to Clusters

  2. Click Create.

  3. In the Cluster name box, enter a name for the cluster.

  4. In the Region list, select a location for the cluster. Choose a different region from where the source cluster is located.

  5. For Network configuration, configure the subnet where the target cluster is accessible. The subnet must be in the same VPC network as the source cluster's subnet.

    1. For Project, select your project.
    2. For Network, select the same VPC network as the source cluster's subnet.
    3. For Subnet, select the subnet.
    4. Click Done.
  6. Click Create.

While the cluster is being created, the cluster state is Creating. When the cluster has finished being created, the state is Active.

gcloud

To create the target Kafka cluster, run the gcloud managed-kafka clusters create command.

gcloud managed-kafka clusters create TARGET_KAFKA_CLUSTER \
  --location=REGION \
  --cpu=3 \
  --memory=3GiB \
  --subnets=projects/PROJECT_ID/regions/REGION/subnetworks/SUBNET_NAME \
  --async

Replace the following:

  • TARGET_KAFKA_CLUSTER: a name for the target Kafka cluster
  • REGION: the location of the cluster; choose a different region from where the source cluster is located.

    For information about supported locations, see Managed Service for Apache Kafka locations.

  • PROJECT_ID: your project ID

  • SUBNET_NAME: the subnet where you want to deploy the cluster; for example, default

    The subnet must be in the same VPC as the source cluster.

The command runs asynchronously and returns an operation ID:

Check operation [projects/PROJECT_ID/locations/REGION/operations/OPERATION_ID] for status.

To track the progress of the create operation, use the gcloud managed-kafka operations describe command:

gcloud managed-kafka operations describe OPERATION_ID \
  --location=REGION

For more information, see Monitor the cluster creation operation.

Create a Connect cluster

In this step, you create a Connect cluster. Creating a Connect cluster usually takes 20-30 minutes.

Before you start this step, make sure the target Kafka cluster from the previous step is fully created.

Console

  1. Go to the Managed Service for Apache Kafka > Connect Clusters page.

    Go to Connect Clusters

  2. Click Create.

  3. For the Connect cluster name, enter a string. Example: my-connect-cluster.

  4. For Primary Kafka cluster, select the target Kafka cluster that you created in the previous step. (Do not select your source Kafka cluster.)

  5. To enable the Connect cluster to access the subnet of the source cluster, perform the following steps:

    1. Expand Accessible subnets.

    2. Click Add a connected subnet.

    3. For Subnet URI path, enter the subnet of the source Kafka cluster. Use the following format:

    projects/PROJECT_ID/regions/SOURCE_CLUSTER_REGION/subnetworks/SOURCE_CLUSTER_SUBNET
    

    Replace the following:

    • PROJECT_ID: your project ID
    • SOURCE_CLUSTER_REGION: the region where you created the source Kafka cluster
    • SOURCE_CLUSTER_SUBNET: the name of the subnet for the source Kafka cluster
  6. To enable the Connect cluster to resolve the DNS domain of the source cluster, perform the following steps:

    1. Expand Resolvable DNS domains.

    2. Click Add a DNS domain.

    3. In the Kafka cluster list, select the source Kafka cluster.

  7. Click Create.

While the cluster is being created, the cluster state is Creating. When the cluster has finished being created, the state is Active.

gcloud

To create a Connect cluster, run the gcloud alpha managed-kafka connect-clusters create command.

gcloud alpha managed-kafka connect-clusters create CONNECT_CLUSTER \
  --location=REGION \
  --cpu=12 \
  --memory=12GiB \
  --primary-subnet=projects/PROJECT_ID/regions/REGION/subnetworks/SUBNET_NAME \
  --kafka-cluster=TARGET_KAFKA_CLUSTER \
  --dns-name=SOURCE_KAFKA_CLUSTER.SOURCE_CLUSTER_REGION.managedkafka.PROJECT_ID.cloud.goog. \
  --additional-subnet=projects/PROJECT_ID/regions/SOURCE_CLUSTER_REGION/subnetworks/SOURCE_CLUSTER_SUBNET \
  --async

Replace the following:

  • CONNECT_CLUSTER: a name for the Connect cluster
  • REGION: the region where you created the target Kafka cluster
  • PROJECT_ID: your project ID
  • SUBNET_NAME: the subnet where you created the target Kafka cluster
  • TARGET_KAFKA_CLUSTER: the name of the target Kafka cluster
  • SOURCE_KAFKA_CLUSTER: the name of the source Kafka cluster
  • SOURCE_CLUSTER_REGION: the region where you created the source Kafka cluster
  • SOURCE_CLUSTER_SUBNET: the name of the subnet for the source Kafka cluster
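The subnet URI and DNS name arguments follow fixed formats. The following shell sketch builds both strings from their parts, using hypothetical project, region, and cluster names (replace them with your own values):

```shell
# Hypothetical values -- substitute your own project, regions, and cluster names.
PROJECT_ID="my-project"
SOURCE_CLUSTER_REGION="us-central1"
SOURCE_CLUSTER_SUBNET="default"
SOURCE_KAFKA_CLUSTER="source-cluster"

# Subnet URI path, as passed to --additional-subnet.
SUBNET_URI="projects/${PROJECT_ID}/regions/${SOURCE_CLUSTER_REGION}/subnetworks/${SOURCE_CLUSTER_SUBNET}"

# DNS name of the source cluster, as passed to --dns-name (note the trailing dot).
DNS_DOMAIN="${SOURCE_KAFKA_CLUSTER}.${SOURCE_CLUSTER_REGION}.managedkafka.${PROJECT_ID}.cloud.goog."

echo "$SUBNET_URI"
echo "$DNS_DOMAIN"
```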

The command runs asynchronously and returns an operation ID:

Check operation [projects/PROJECT_ID/locations/REGION/operations/OPERATION_ID] for status.

To track the progress of the create operation, use the gcloud managed-kafka operations describe command:

gcloud managed-kafka operations describe OPERATION_ID \
  --location=REGION

For more information, see Monitor the cluster creation operation.

Create a MirrorMaker 2.0 Source connector

In this step, you create a MirrorMaker 2.0 Source connector. This connector replicates messages from the source Kafka cluster to the target Kafka cluster.

Console

  1. Go to the Managed Service for Apache Kafka > Connect Clusters page.

    Go to Connect Clusters

  2. Click the name of the Connect cluster.

  3. Click Create connector.

  4. For the Connector name, enter a string. Example: mm2-connector.

  5. In the Connector plugin list, select MirrorMaker 2.0 Source.

  6. Select Use primary Kafka cluster as target cluster.

  7. For Source cluster, select Managed Service for Apache Kafka Cluster.

  8. In the Kafka cluster list, select your source cluster.

  9. In the Comma-separated topic names or topic regex box, enter the name of the Kafka topic or topics that you want to replicate.

  10. Click Create.

gcloud

To create a MirrorMaker 2.0 Source connector, run the gcloud alpha managed-kafka connectors create command.

gcloud alpha managed-kafka connectors create CONNECTOR_NAME \
  --location=REGION \
  --connect-cluster=CONNECT_CLUSTER \
  --configs=connector.class=org.apache.kafka.connect.mirror.MirrorSourceConnector,\
source.cluster.alias=source,\
source.cluster.bootstrap.servers=bootstrap.SOURCE_KAFKA_CLUSTER.SOURCE_CLUSTER_REGION.managedkafka.PROJECT_ID.cloud.goog:9092,\
target.cluster.alias=target,\
target.cluster.bootstrap.servers=bootstrap.TARGET_KAFKA_CLUSTER.REGION.managedkafka.PROJECT_ID.cloud.goog:9092,\
tasks.max=3,\
topics=KAFKA_TOPIC_NAME

Replace the following:

  • CONNECTOR_NAME: a name for the connector, such as mm2-connector
  • CONNECT_CLUSTER: the name of your Connect cluster
  • SOURCE_KAFKA_CLUSTER: the name of the source Kafka cluster
  • SOURCE_CLUSTER_REGION: the region where you created the source Kafka cluster
  • PROJECT_ID: your project ID
  • TARGET_KAFKA_CLUSTER: the name of the target Kafka cluster
  • REGION: the region where you created the Connect cluster, which is also the target Kafka cluster region
  • KAFKA_TOPIC_NAME: the name of the topic to replicate. This parameter can also specify a comma-separated list of topic
    names or a regular expression.
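For readability, the comma-separated --configs value in the preceding command corresponds to the following connector properties (the cluster names, regions, and topic shown here are hypothetical):

```properties
connector.class=org.apache.kafka.connect.mirror.MirrorSourceConnector
source.cluster.alias=source
source.cluster.bootstrap.servers=bootstrap.source-cluster.us-central1.managedkafka.my-project.cloud.goog:9092
target.cluster.alias=target
target.cluster.bootstrap.servers=bootstrap.target-cluster.us-east1.managedkafka.my-project.cloud.goog:9092
tasks.max=3
topics=my-topic
```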

The MirrorMaker 2.0 Source connector creates a new topic in the target cluster named "source.KAFKA_TOPIC_NAME", where KAFKA_TOPIC_NAME is the name of the topic in the source cluster.
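The replicated topic name follows from MirrorMaker 2.0's default replication policy, which prefixes the original topic name with the source cluster alias set by source.cluster.alias. A minimal sketch of the naming rule, assuming the alias source and a hypothetical topic named my-topic:

```shell
SOURCE_CLUSTER_ALIAS="source"   # from source.cluster.alias in the connector config
KAFKA_TOPIC_NAME="my-topic"     # hypothetical source topic name

# Default replication policy: replicated topic = <source alias>.<topic>
REPLICATED_TOPIC="${SOURCE_CLUSTER_ALIAS}.${KAFKA_TOPIC_NAME}"
echo "$REPLICATED_TOPIC"   # → source.my-topic
```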

View results

To verify that messages are being replicated, you can use the Kafka command line tools. For information about setting up the Kafka CLI, see Set up a client machine in the document Produce and consume messages with the CLI.

For example, to send a message to the source cluster, enter the following at the command line:

export BOOTSTRAP=bootstrap.SOURCE_KAFKA_CLUSTER.SOURCE_CLUSTER_REGION.managedkafka.PROJECT_ID.cloud.goog:9092

for msg in {1..10}; do
  echo "message $msg"
done | kafka-console-producer.sh --topic KAFKA_TOPIC_NAME \
       --bootstrap-server $BOOTSTRAP --producer.config client.properties

To read the replicated messages from the target cluster, enter the following at the command line:

export BOOTSTRAP=bootstrap.TARGET_KAFKA_CLUSTER.REGION.managedkafka.PROJECT_ID.cloud.goog:9092

kafka-console-consumer.sh --topic source.KAFKA_TOPIC_NAME --from-beginning \
 --bootstrap-server $BOOTSTRAP --consumer.config client.properties

The output looks like the following:

message 1
message 2
message 3
message 4
[...]

Clean up

To avoid incurring charges to your Google Cloud account for the resources used in this tutorial, either delete the project that contains the resources, or keep the project and delete the individual resources.

Console

  1. Delete the Connect cluster.

    1. Go to the Managed Service for Apache Kafka > Connect Clusters page.

      Go to Connect Clusters

    2. Select the Connect cluster and click Delete.

  2. Delete the target Kafka cluster.

    1. Go to the Managed Service for Apache Kafka > Clusters page.

      Go to Clusters

    2. Select the Kafka cluster and click Delete.

  3. Repeat the previous step to delete the source Kafka cluster.

gcloud

  1. To delete the Connect cluster, use the gcloud alpha managed-kafka connect-clusters delete command.

    gcloud alpha managed-kafka connect-clusters delete CONNECT_CLUSTER \
      --location=REGION --async
    
  2. To delete the target Kafka cluster, use the gcloud managed-kafka clusters delete command.

    gcloud managed-kafka clusters delete TARGET_KAFKA_CLUSTER \
      --location=REGION --async
    
  3. Repeat the previous step to delete the source Kafka cluster.

What's next