Develop a Python producer application

Learn how to create a Google Cloud Managed Service for Apache Kafka cluster and write a Python producer application that can use Application Default Credentials (ADC). ADC is a way for your applications running on Google Cloud to automatically find and use the right credentials for authenticating to Google Cloud services.

Before you begin

Follow these steps to set up the gcloud CLI and a Google Cloud project. These are required to complete this guide.

  1. Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
  2. Install the Google Cloud CLI.
  3. To initialize the gcloud CLI, run the following command:

    gcloud init
  4. Create or select a Google Cloud project.

    • Create a Google Cloud project:

      gcloud projects create PROJECT_ID

      Replace PROJECT_ID with a name for the Google Cloud project you are creating.

    • Select the Google Cloud project that you created:

      gcloud config set project PROJECT_ID

      Replace PROJECT_ID with your Google Cloud project ID.

  5. Make sure that billing is enabled for your Google Cloud project.

  6. Enable the Managed Kafka, Compute Engine, and Cloud DNS APIs:

    gcloud services enable managedkafka.googleapis.com compute.googleapis.com dns.googleapis.com

Create a cluster

A Managed Service for Apache Kafka cluster is defined by its project, location, size, and networking configuration. The size of the cluster is the total number of vCPUs and amount of RAM across all brokers in the cluster. Setup of individual brokers and storage is automatic. The networking configuration is a set of subnets in which broker and bootstrap IP addresses are provisioned. This guide uses the default Virtual Private Cloud (VPC) network and subnet in us-central1.

To create a cluster, run the gcloud managed-kafka clusters create command.

gcloud managed-kafka clusters create CLUSTER_ID \
  --location=us-central1 \
  --cpu=3 \
  --memory=3GiB \
  --subnets=projects/PROJECT_ID/regions/us-central1/subnetworks/default \
  --async

Replace the following:

  • CLUSTER_ID with a name for your Kafka cluster.
  • PROJECT_ID with your Google Cloud project ID.

    The response is similar to the following:

    Create request issued for: [CLUSTER_ID]
    

    This operation returns immediately, but cluster creation might take around half an hour. You can monitor the state of the cluster using the gcloud managed-kafka clusters describe command.

    gcloud managed-kafka clusters describe CLUSTER_ID \
      --location=us-central1
    

    Replace CLUSTER_ID with the name of your Kafka cluster.

    The output of the command is similar to the following:

    bootstrapAddress: bootstrap.CLUSTER_ID.us-central1.managedkafka.PROJECT_ID.cloud.goog:9092
    capacityConfig:
      memoryBytes: '3221225472'
      vcpuCount: '3'
    createTime: '2024-05-28T04:32:08.671168869Z'
    gcpConfig:
      accessConfig:
        networkConfigs:
        - subnet: projects/PROJECT_NUMBER/regions/us-central1/subnetworks/default
    name: projects/PROJECT_ID/locations/us-central1/clusters/CLUSTER_ID
    rebalanceConfig:
      mode: AUTO_REBALANCE_ON_SCALE_UP
    state: CREATING
    updateTime: '2024-05-28T04:32:08.671168869Z'
    

    The state field is useful for monitoring the creation operation. You can use the cluster after the state changes to ACTIVE. The bootstrapAddress is the address (host and port) that clients use to connect to the cluster.
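
The wait for the ACTIVE state can also be scripted. The following is a minimal sketch rather than official tooling: gcloud_cluster_state and wait_until_active are hypothetical helper names, and the sketch assumes that running the describe command with --format=json returns a JSON object with the same state field shown in the output above.

```python
import json
import subprocess
import time


def gcloud_cluster_state(cluster_id, location="us-central1"):
    """Return the cluster's state string by shelling out to the gcloud CLI.

    Assumption: the JSON output exposes the same "state" field as the
    YAML-style output shown in this guide.
    """
    result = subprocess.run(
        ["gcloud", "managed-kafka", "clusters", "describe", cluster_id,
         f"--location={location}", "--format=json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout).get("state")


def wait_until_active(get_state, timeout_s=3600, interval_s=60, sleep=time.sleep):
    """Poll get_state() until it returns 'ACTIVE' or timeout_s has elapsed.

    Returns True if the cluster became ACTIVE, False on timeout.
    The sleep function is injectable so the loop can be tested without waiting.
    """
    waited = 0
    while True:
        if get_state() == "ACTIVE":
            return True
        if waited >= timeout_s:
            return False
        sleep(interval_s)
        waited += interval_s
```

For example, wait_until_active(lambda: gcloud_cluster_state("CLUSTER_ID")) checks once a minute for up to an hour. The one-hour default is an arbitrary choice that leaves headroom over the roughly half-hour creation time mentioned above.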

    Set up a client VM

    A producer application must run on a machine with network access to the cluster. This guide uses a Compute Engine virtual machine (VM) instance. The VM must be in the same region as the Kafka cluster, and in the VPC network that contains the subnet you used in the cluster configuration. To create the client VM, run the following command:

    gcloud compute instances create test-instance \
      --scopes=https://www.googleapis.com/auth/cloud-platform \
      --subnet=projects/PROJECT_ID/regions/us-central1/subnetworks/default \
      --zone=us-central1-f
    

    Replace PROJECT_ID with your Google Cloud project ID.

    This VM has a Compute Engine default service account. Your application uses this service account to authenticate with the Managed Service for Apache Kafka API. This service account needs permission to connect to the cluster.

    Grant this permission to the service account:

    gcloud projects add-iam-policy-binding \
    PROJECT_ID \
    --member="serviceAccount:PROJECT_NUMBER-compute@developer.gserviceaccount.com" \
    --role=roles/managedkafka.client
    

    This command requires both the project ID and the project number. To look up the project number, run gcloud projects describe PROJECT_ID.

    Replace the following:

  • PROJECT_ID with your Google Cloud project ID.
  • PROJECT_NUMBER with your Google Cloud project number.
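
The member string in the command above follows a fixed pattern built from the project number. As a minimal sketch, the hypothetical helpers below (default_compute_member and grant_client_role_command are illustrative names, not part of any Google library) assemble the same command from a project ID and a project number. One way to obtain only the number is gcloud projects describe PROJECT_ID --format="value(projectNumber)".

```python
import shlex


def default_compute_member(project_number):
    """Build the IAM member string for the Compute Engine default service account."""
    return f"serviceAccount:{project_number}-compute@developer.gserviceaccount.com"


def grant_client_role_command(project_id, project_number):
    """Return the add-iam-policy-binding command from this guide as an argument list."""
    return [
        "gcloud", "projects", "add-iam-policy-binding", project_id,
        f"--member={default_compute_member(project_number)}",
        "--role=roles/managedkafka.client",
    ]


if __name__ == "__main__":
    # 123456789012 is a made-up project number used only for illustration.
    print(shlex.join(grant_client_role_command("PROJECT_ID", "123456789012")))
```

Returning an argument list instead of a single string lets the command be passed directly to subprocess.run without shell quoting concerns.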

    Create a Python producer application

    1. Connect to the client VM using SSH. One way to do this is to run the following command:

      gcloud compute ssh --project=PROJECT_ID \
          --zone=us-central1-f test-instance
      

      Replace PROJECT_ID with your Google Cloud project ID.

      For more information about connecting using SSH, see About SSH connections.

    2. Install pip, a Python package manager, and the virtual environment manager:

      sudo apt install python3-pip -y
      sudo apt install python3-venv -y
      
    3. Create a new virtual environment (venv) and activate it:

      python3 -m venv kafka
      source kafka/bin/activate
      
    4. Install Confluent's Python Client for Apache Kafka and other dependencies:

      pip install confluent-kafka google-auth urllib3 packaging
      
    5. Create a Python script called producer.py.

      import base64
      import datetime
      import json
      import random
      import time
      
      import confluent_kafka
      import google.auth
      import google.auth.transport.urllib3
      import urllib3
      
      # Token Provider class
      # This class handles the OAuth token retrieval and formatting
      class TokenProvider(object):
      
        def __init__(self, **config):
          self.credentials, _project = google.auth.default()
          self.http_client = urllib3.PoolManager()
          self.HEADER = json.dumps(dict(typ='JWT', alg='GOOG_OAUTH2_TOKEN'))
      
        def valid_credentials(self):
          if not self.credentials.valid:
            self.credentials.refresh(google.auth.transport.urllib3.Request(self.http_client))
          return self.credentials
      
        def get_jwt(self, creds):
          return json.dumps(
              dict(
                  exp=creds.expiry.timestamp(),
                  iss='Google',
                  iat=datetime.datetime.now(datetime.timezone.utc).timestamp(),
                  scope='kafka',
                  sub=creds.service_account_email,
              )
          )
      
        def b64_encode(self, source):
          return (
              base64.urlsafe_b64encode(source.encode('utf-8'))
              .decode('utf-8')
              .rstrip('=')
          )
      
        def get_kafka_access_token(self, creds):
          return '.'.join([
            self.b64_encode(self.HEADER),
            self.b64_encode(self.get_jwt(creds)),
            self.b64_encode(creds.token)
          ])
      
        def token(self):
          creds = self.valid_credentials()
          return self.get_kafka_access_token(creds)
      
        def confluent_token(self):
          creds = self.valid_credentials()
      
          utc_expiry = creds.expiry.replace(tzinfo=datetime.timezone.utc)
          expiry_seconds = (utc_expiry - datetime.datetime.now(datetime.timezone.utc)).total_seconds()
      
          return self.get_kafka_access_token(creds), time.time() + expiry_seconds
      
      # Confluent does not use a TokenProvider object
      # It calls a method
      def make_token(args):
        """Method to get the Token"""
        t = TokenProvider()
        token = t.confluent_token()
        return token
      
      kafka_cluster_name = 'CLUSTER_ID'
      region = 'us-central1'
      project_id = 'PROJECT_ID'
      port = '9092'
      kafka_topic_name = 'example-topic'
      
      # Kafka Producer configuration with OAUTHBEARER authentication
      config = {
          'bootstrap.servers': f'bootstrap.{kafka_cluster_name}.{region}.managedkafka.{project_id}.cloud.goog:{port}',
          'security.protocol': 'SASL_SSL',
          'sasl.mechanisms': 'OAUTHBEARER',
          'oauth_cb': make_token,
      }
      
      producer = confluent_kafka.Producer(config)
      
      # Produce and submit 10 messages
      for i in range(10):
        # Generate a random message
        now = datetime.datetime.now()
        datetime_string = now.strftime("%Y-%m-%d %H:%M:%S")
      
        message_data = {
            "random_id": random.randint(1, 100),
            "date_time": datetime_string
        }
      
        # Serialize data to bytes
        serialized_data = json.dumps(message_data).encode('utf-8')
      
        # Produce the message
        producer.produce(kafka_topic_name, serialized_data)
      
        # i is zero-based, so report i + 1 messages queued so far
        print(f"Produced {i + 1} messages")
      
      producer.flush()
      

      Replace the following:

      • CLUSTER_ID with the name of your Kafka cluster.
      • PROJECT_ID with your Google Cloud project ID.
    6. You are now ready to run the application:

      python producer.py
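
To see what the TokenProvider class in producer.py hands to the Kafka client, it can help to take one of its tokens apart. The following is a minimal sketch using only the standard library. It mirrors the three-part, dot-separated format built by get_kafka_access_token. The helper names b64_decode and split_kafka_token are hypothetical, and the sample values are made up rather than real credentials.

```python
import base64
import json


def b64_encode(source):
    """URL-safe base64 without '=' padding, as in TokenProvider.b64_encode."""
    return base64.urlsafe_b64encode(source.encode("utf-8")).decode("utf-8").rstrip("=")


def b64_decode(segment):
    """Reverse b64_encode by restoring the stripped '=' padding first."""
    padded = segment + "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(padded).decode("utf-8")


def split_kafka_token(token):
    """Split a three-part token into (header dict, claims dict, access token)."""
    header, claims, access_token = token.split(".")
    return (
        json.loads(b64_decode(header)),
        json.loads(b64_decode(claims)),
        b64_decode(access_token),
    )


# Build a sample token from made-up values (not real credentials).
sample_header = json.dumps(dict(typ="JWT", alg="GOOG_OAUTH2_TOKEN"))
sample_claims = json.dumps(dict(
    exp=1700003600, iss="Google", iat=1700000000,
    scope="kafka", sub="sa@example.iam.gserviceaccount.com",
))
sample_token = ".".join(
    b64_encode(part) for part in (sample_header, sample_claims, "fake-access-token")
)

header, claims, access_token = split_kafka_token(sample_token)
print(header["alg"], claims["sub"], access_token)
```

The third segment is the OAuth access token itself, so a real token produced by TokenProvider should be treated as a secret and kept out of logs.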
      

    Clean up

    To avoid incurring charges to your Google Cloud account for the resources used on this page, delete the cluster and the client VM that you created.

    To delete the cluster, run the gcloud managed-kafka clusters delete command:

    gcloud managed-kafka clusters delete CLUSTER_ID --location=us-central1
    

    To delete the VM, run the gcloud compute instances delete command:

    gcloud compute instances delete test-instance \
      --zone=us-central1-f
    

    Apache Kafka® is a registered trademark of The Apache Software Foundation or its affiliates in the United States and/or other countries.