Managed Service for Apache Kafka overview

Managed Service for Apache Kafka is a Google Cloud service that helps you run secure, scalable open source Apache Kafka clusters. This page is an overview of what the service automates and simplifies for you. For more information about Apache Kafka, see the Apache Kafka website.

Simple sizing and scaling

To size or scale a Managed Service for Apache Kafka cluster, you only need to set the total vCPU count and RAM size for the cluster. Management of brokers, including storage, is fully automated. To keep up with client demand, monitor vCPU and RAM utilization and adjust the cluster size up or down.

When you set the vCPU count and RAM size, the service automates broker provisioning, storage management, and rebalancing.

Broker provisioning

When you configure the total vCPU count and RAM size for the cluster, the service provisions new brokers and scales existing brokers. For a typical cluster configuration, the total vCPU count and RAM size are split evenly across all brokers. As a result, a broker can have a fractional vCPU count, although each broker requires at least one vCPU. Because all clusters are distributed across three zones, each cluster requires at least 3 vCPUs and 3 GiB of RAM.

As you increase the cluster size, brokers are scaled vertically up to 15 vCPUs per broker. After this limit is reached, the service creates new brokers. When you decrease the cluster size, existing brokers are scaled down, to a minimum of one vCPU each, but they are not deleted.
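The following sketch illustrates the resize behavior described above. It considers only vCPUs, splits them evenly, caps each broker at 15 vCPUs, and never removes brokers when the cluster shrinks. The function name `resize` is hypothetical. The rule that a cluster keeps at least one broker per zone is our reading of the three-zone minimum, not a documented formula.

```python
import math

# Figures taken from this page; the service says the maximum broker size may change.
MAX_VCPU_PER_BROKER = 15
MIN_VCPU_PER_BROKER = 1
ZONES = 3


def resize(current_brokers: int, total_vcpu: float) -> tuple[int, float]:
    """Return (broker_count, vcpu_per_broker) after setting a new total vCPU count.

    Simplified model: only vCPU is considered, vCPUs are split evenly, and
    brokers are never deleted when the cluster shrinks.
    """
    if total_vcpu < ZONES * MIN_VCPU_PER_BROKER:
        raise ValueError("a cluster needs at least 3 vCPUs")
    # Scale out only when the brokers would otherwise exceed the vertical limit.
    needed = math.ceil(total_vcpu / MAX_VCPU_PER_BROKER)
    # Assumption: at least one broker per zone; existing brokers are always kept.
    brokers = max(current_brokers, needed, ZONES)
    per_broker = total_vcpu / brokers
    if per_broker < MIN_VCPU_PER_BROKER:
        raise ValueError(f"{brokers} brokers need at least {brokers} vCPUs in total")
    return brokers, per_broker
```

For example, `resize(3, 46)` returns `(4, 11.5)`. Shrinking that cluster with `resize(4, 8)` keeps four brokers at 2.0 vCPUs each. Shrinking to 3 vCPUs would leave each of the four brokers below one vCPU, so the sketch rejects it.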

The maximum broker size might change at any time. This limit was chosen to maintain linear scaling of broker throughput with vCPU count. You can examine individual broker configurations using Apache Kafka command line tools as well as Cloud Monitoring.

Scaling algorithm

The number of brokers is determined by the cluster's total vCPU or memory capacity. The scaling ratio is 1 broker for every 15 vCPUs or 120 gibibytes (GiB) of memory, whichever results in a greater number of brokers. The vCPU to memory ratio (vCPU:GiB) must stay between 1:1 and 1:8. The brokers are distributed as evenly as possible among the 3 zones, so the broker counts of any two zones differ by at most one.

For example, if you configure a cluster with 70 vCPUs and 130 GiB of RAM, along with a replication factor of 3, the following calculation determines the number of brokers:

Calculate the number of brokers required to account for vCPUs: ceiling(70 vCPUs / 15 vCPUs) = 5 brokers
Calculate the number of brokers required to account for memory: ceiling(130 GiB / 120 GiB) = 2 brokers

In this scenario, the cluster has 5 brokers, because the vCPU calculation produces the larger number of brokers. Two of the 3 zones each have 2 brokers assigned to them, and the last zone has 1 broker.
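The calculation above can be written as a short Python sketch. It implements only the formula stated on this page: the 15 vCPU and 120 GiB divisors, the 1:1 to 1:8 ratio check, and an even spread across three zones. The function names are illustrative, and the real service may apply additional rules that are not modeled here.

```python
import math

VCPU_PER_BROKER = 15   # 1 broker per 15 vCPUs
GIB_PER_BROKER = 120   # 1 broker per 120 GiB of memory
ZONES = 3


def broker_count(vcpu: int, memory_gib: int) -> int:
    """Number of brokers: whichever resource needs more brokers wins."""
    ratio = memory_gib / vcpu
    if not 1 <= ratio <= 8:
        raise ValueError("vCPU:GiB ratio must be between 1:1 and 1:8")
    by_vcpu = math.ceil(vcpu / VCPU_PER_BROKER)
    by_memory = math.ceil(memory_gib / GIB_PER_BROKER)
    return max(by_vcpu, by_memory)


def brokers_per_zone(brokers: int, zones: int = ZONES) -> list[int]:
    """Spread brokers across zones so that no two zones differ by more than one."""
    base, extra = divmod(brokers, zones)
    return [base + 1 if i < extra else base for i in range(zones)]
```

With the example values, `broker_count(70, 130)` returns 5 (ceiling(70 / 15) = 5 versus ceiling(130 / 120) = 2), and `brokers_per_zone(5)` returns `[2, 2, 1]`, matching the zone assignment above.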

Storage management

Storage management is automated. In most situations, you are responsible for setting the retention time on individual topics to control cost or satisfy your data retention policies. You don't need to provision and manage persistent disks.

The service relies on tiered storage (KIP-405). Tiered storage combines pre-provisioned persistent disk volumes attached to brokers with virtually unlimited object storage. At the time of writing, the service uses 100 GiB of SSD persistent disk for each vCPU to balance performance, availability, and cost. Each partition leader buffers messages in segment files on these persistent disks. After a segment is rolled, it is moved to persistent object storage backed by regional Cloud Storage. The log.roll.ms and log.segment.bytes settings control when a segment is rolled, and therefore the size of these segment files.
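The following sketch shows two of the mechanics in the previous paragraph. The first is the local SSD capacity implied by the "100 GiB per vCPU" figure. The second is a simplified form of Apache Kafka's segment roll rule: a new segment starts when the next batch would not fit under log.segment.bytes, or when the active segment is older than log.roll.ms. The helper names are hypothetical. Real brokers also apply other conditions, such as index fullness and roll jitter, that are omitted here.

```python
SSD_GIB_PER_VCPU = 100  # "at the time of writing" figure; an implementation detail that may change


def local_ssd_gib(total_vcpu: float) -> float:
    """Approximate SSD persistent disk provisioned for a cluster of this size."""
    return total_vcpu * SSD_GIB_PER_VCPU


def should_roll(segment_bytes: int, segment_age_ms: int, next_batch_bytes: int,
                log_segment_bytes: int, log_roll_ms: int) -> bool:
    """Simplified roll rule: size limit or age limit, whichever is hit first."""
    too_big = segment_bytes + next_batch_bytes > log_segment_bytes
    too_old = segment_age_ms >= log_roll_ms
    return too_big or too_old
```

Once a segment rolls, it becomes eligible to move to Cloud Storage. Smaller `log_segment_bytes` or `log_roll_ms` values therefore move data off the local disks sooner.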

While these details are useful to understand, storage is managed by the service. The specific configurations, such as the amount of persistent disk capacity per vCPU, are implementation details that might change. You don't have direct access to the Cloud Storage buckets used for persistent storage.

Rebalancing

For newly provisioned brokers to be useful in maintaining performance, some traffic from existing brokers must be moved to these new machines. To make this easier, you can turn on automatic rebalancing.

With automatic rebalancing turned on, when a new broker is provisioned, the service automatically rebalances partitions from the existing brokers. Because rolled segments already live in object storage, only a relatively small amount of data must be copied to new brokers, which speeds up rebalancing.

The rebalancing algorithm is based on the count of partitions. It does not account for the actual traffic served by each partition.
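To illustrate what "based on the count of partitions" means, here is a toy count-based rebalancer. It is not the service's algorithm. It assumes, for simplicity, one replica per partition. It repeatedly moves a partition from the broker with the most partitions to the broker with the fewest, until the counts differ by at most one. Partition traffic never enters the decision. The names `rebalance_by_count` and the broker and partition labels are hypothetical.

```python
def rebalance_by_count(assignment: dict[str, str], brokers: list[str]) -> dict[str, str]:
    """Return a new partition->broker map whose per-broker counts differ by at most one.

    Only partition counts are compared; per-partition traffic is ignored.
    """
    result = dict(assignment)
    while True:
        counts = {b: 0 for b in brokers}
        for b in result.values():
            counts[b] += 1
        busiest = max(brokers, key=counts.__getitem__)
        idlest = min(brokers, key=counts.__getitem__)
        if counts[busiest] - counts[idlest] <= 1:
            return result
        # Move one partition off the busiest broker, regardless of how much traffic it serves.
        partition = next(p for p, b in sorted(result.items()) if b == busiest)
        result[partition] = idlest
```

For example, with six partitions spread over `b0`, `b1`, and `b2`, adding an empty broker `b3` causes exactly one partition to move. The sketch picks that partition by name, not by load, which is why a count-based approach can leave a heavily used partition on an already busy broker.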

Flexible networking

The service makes a cluster securely accessible from any VPC. This includes access from multiple VPCs, projects, and regions.

To configure networking for a cluster, you provide the set of subnets where the cluster is accessible. The service provisions private IP addresses for the bootstrap servers and brokers in each subnet. It also sets up private Cloud DNS with URLs for each IP address. The bootstrap servers sit behind a load balancer, so there is a single bootstrap URL per cluster. The URLs are the same across all VPCs, so client configurations can be consistent across environments.

This flexibility is provided by Private Service Connect (PSC). Each IP address allocated for a cluster requires a PSC endpoint. The endpoints are provisioned automatically.

Secure clusters

The service does much of the work to keep your cluster secure. This includes authentication, authorization, encryption, patching, and resource isolation. It also disallows unauthenticated connections, unencrypted connections, and unencrypted storage.

All connections to managed clusters and the administrative APIs are authenticated with an IAM identity. Human, service, and federated accounts are supported. You don't have to manage these identities in the clusters.

The service employs a layered approach towards authorization.

IAM controls cluster management actions such as creating, updating, and deleting resources, and basic connection permissions through Google Cloud IAM roles. In addition, Kafka ACLs provide fine-grained control within the cluster for data plane and administrator operations through the open-source Kafka wire protocol. You can manage these ACLs with the Google Cloud tools or third-party Kafka tools.
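As a conceptual illustration only, the two authorization layers can be modeled as two checks that must both pass. The first is an IAM check that the principal may connect. The second is a Kafka ACL check for the specific operation and topic. This is not the service's implementation. The `Cluster` and `Acl` types and the principal names are hypothetical. Real Kafka ACLs also support DENY rules and prefixed resource patterns. The sketch also assumes that a request with no matching ACL is denied, which may differ from the service's actual default behavior.

```python
from dataclasses import dataclass, field


@dataclass
class Acl:
    principal: str
    operation: str  # for example "READ" or "WRITE"
    topic: str      # an exact topic name, or "*" for all topics


@dataclass
class Cluster:
    iam_connect_members: set[str] = field(default_factory=set)  # layer 1: IAM
    acls: list[Acl] = field(default_factory=list)               # layer 2: Kafka ACLs


def is_authorized(cluster: Cluster, principal: str, operation: str, topic: str) -> bool:
    # Layer 1: IAM must allow the principal to connect to the cluster at all.
    if principal not in cluster.iam_connect_members:
        return False
    # Layer 2: some ACL must allow this operation on this topic (assumed deny-by-default).
    return any(
        acl.principal == principal
        and acl.operation == operation
        and acl.topic in (topic, "*")
        for acl in cluster.acls
    )
```

In this model, a principal with IAM access but no matching ACL can connect but cannot read or write. A principal with a matching ACL but no IAM access is rejected before the ACLs are consulted.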

For information on configuring IAM and Kafka ACLs, see the IAM documentation.

Encryption is required. All connections to clusters must be made with TLS. The TLS certificates presented by the brokers are signed by the Public Certificate Authority. Stored data is always encrypted. For encryption at rest, you can choose between Google-managed encryption keys and customer-managed encryption keys (CMEK).
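The TLS and authentication requirements, combined with the single bootstrap URL per cluster, translate into a small set of client settings. The checker below is a hypothetical helper, not part of any Google Cloud library. The configuration keys are standard Apache Kafka client names, also used by librdkafka. The bootstrap host name and the SASL mechanism in the usage example are placeholders and assumptions, so check the authentication documentation for the values the service accepts.

```python
REQUIRED_PROTOCOL = "SASL_SSL"  # TLS encryption plus SASL authentication


def check_client_config(cfg: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the basic requirements are met."""
    problems = []
    if not cfg.get("bootstrap.servers"):
        problems.append("bootstrap.servers is missing")
    # Kafka clients default to PLAINTEXT, which is neither encrypted nor authenticated.
    protocol = cfg.get("security.protocol", "PLAINTEXT")
    if protocol != REQUIRED_PROTOCOL:
        problems.append(
            f"security.protocol is {protocol!r}; connections must be "
            f"encrypted and authenticated ({REQUIRED_PROTOCOL})"
        )
    return problems


# Hypothetical settings; because the DNS names are the same in every VPC,
# the same dictionary can be reused by clients in different networks.
example = {
    "bootstrap.servers": "bootstrap.my-cluster.example:9092",
    "security.protocol": "SASL_SSL",
    "sasl.mechanism": "OAUTHBEARER",  # assumption: IAM identity presented as an OAuth token
}
```

`check_client_config(example)` returns an empty list. A configuration that omits `security.protocol` is flagged because the client would fall back to PLAINTEXT.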

The service team keeps track of security vulnerabilities discovered in the open source code. When vulnerabilities are discovered, your clusters are patched automatically.

Another security feature of the service is resource isolation. The managed service deploys clusters in tenant projects in a private VPC inaccessible through public IP addresses. Each of your projects has a dedicated tenant project, with a dedicated service agent account. This helps limit the scope of access granted to the service.

Schema registry

To simplify coordination between producers and consumers, Managed Service for Apache Kafka includes a schema registry API. A registry provided by the service acts as a repository of schemas that are shared among applications.

The service implements the Confluent Schema Registry REST API, which simplifies integration with existing Kafka applications. Apache Avro and Protocol Buffer (Protobuf) schema formats are supported. JSON is not supported.
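Because the registry follows the Confluent Schema Registry REST API, registering a schema is a `POST` to `/subjects/{subject}/versions`. The sketch below builds, but does not send, such a request with the Python standard library. It rejects schema types other than Avro and Protobuf, following the support statement above. The registry base URL is a made-up placeholder, and the helper name is hypothetical. Authentication headers, which a real call needs, are omitted.

```python
import json
import urllib.parse
import urllib.request

# This page says Avro and Protobuf are supported and JSON is not.
SUPPORTED_TYPES = {"AVRO", "PROTOBUF"}


def register_schema_request(registry_url: str, subject: str, schema: str,
                            schema_type: str = "AVRO") -> urllib.request.Request:
    """Build (but do not send) a Confluent-style 'register schema' request."""
    if schema_type not in SUPPORTED_TYPES:
        raise ValueError(f"unsupported schema type: {schema_type}")
    body = {"schema": schema}
    if schema_type != "AVRO":  # the Confluent API treats a missing schemaType as AVRO
        body["schemaType"] = schema_type
    url = f"{registry_url}/subjects/{urllib.parse.quote(subject, safe='')}/versions"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/vnd.schemaregistry.v1+json"},
        method="POST",
    )
```

Sending the returned request (for example with `urllib.request.urlopen`) would require the real registry URL and credentials. The sketch stops short of that so it stays self-contained.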

Managed Service for Apache Kafka also offers an administrative API and toolset for managing schema registries and schemas. The toolset includes the Google Cloud console, gcloud CLI, and client libraries.

For more information about schema registry, see the Schema registry overview.

Data integration with Kafka Connect

Managed Service for Apache Kafka simplifies data integration through Kafka Connect. Kafka Connect offers several built-in connector plugins hosted in Connect clusters. These connectors are used for migration, backup, disaster recovery, high availability, and data integration. These connectors let you connect your Managed Service for Apache Kafka clusters to various systems, including other Kafka deployments and Google Cloud services like BigQuery, Cloud Storage, and Pub/Sub. Kafka Connect provides scalable, reliable data integration with lower operational overhead and integrated monitoring and logging.

To learn more about Kafka Connect, see the Kafka Connect overview.

High availability clusters

The goal of the service is to provide regional clusters for mission-critical applications. Specifically, the service protects you from failures of individual zones or brokers.

To achieve this, all clusters are provisioned in a rack-aware three-zone configuration. The default topic configuration requires at least three replicas, and rack awareness makes sure that these replicas are created in different zones. The default minimum number of in-sync replicas is two, so your cluster can tolerate the complete loss of a zone or a broker.
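The following sketch shows why three rack-aware replicas with a minimum of two in-sync replicas survive a zone loss. It places each partition's replicas in distinct zones. It then checks whether enough replicas remain for writes that wait for all in-sync replicas (`acks=all`). The round-robin placement and the zone and broker names are illustrative assumptions, not the service's placement logic.

```python
def place_replicas(partition: int, brokers_by_zone: dict[str, list[str]],
                   replication_factor: int = 3) -> list[str]:
    """Rack-aware placement: each replica of a partition lands in a different zone."""
    zones = sorted(brokers_by_zone)
    if replication_factor > len(zones):
        raise ValueError("rack-aware placement needs one zone per replica")
    replicas = []
    for i in range(replication_factor):
        zone = zones[(partition + i) % len(zones)]  # rotate the starting zone per partition
        brokers = brokers_by_zone[zone]
        replicas.append(brokers[partition % len(brokers)])
    return replicas


def writable_after_zone_loss(replicas: list[str], zone_of: dict[str, str],
                             lost_zones: set[str], min_insync_replicas: int = 2) -> bool:
    """With acks=all, writes need at least min_insync_replicas surviving replicas."""
    surviving = [b for b in replicas if zone_of[b] not in lost_zones]
    return len(surviving) >= min_insync_replicas


brokers_by_zone = {"zone-a": ["b0", "b3"], "zone-b": ["b1", "b4"], "zone-c": ["b2"]}
zone_of = {b: z for z, bs in brokers_by_zone.items() for b in bs}
```

Losing any one zone leaves two replicas, which still meets the minimum of two in-sync replicas. Losing two zones leaves one replica, which does not, consistent with the statement later on this page that dual-zone failures are not covered.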

When a broker fails because of a software, hardware, or networking failure, it is replaced automatically. When the service detects the failure, it restarts the broker, on a different machine if necessary. After the broker is available, Apache Kafka integrates it back into the cluster. A complete zone failure might make it impossible to create a new broker in that zone. However, the cluster continues operating as long as the other two zones remain available.

In addition to these specific features, a growing list of internal tools and processes proactively maintain the health of the service, Apache Kafka code, and updates. Data and metadata backups are maintained at multiple levels, allowing the service to recover from many human errors and software failures.

The service does not provide protection from regional or dual-zone failures. For applications that require this level of protection, we recommend running two separate regional clusters. You can synchronize the data between the two clusters by using tools such as MirrorMaker 2.0, which runs on Kafka Connect.
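As a starting point, a MirrorMaker 2.0 source connector running on Kafka Connect needs the aliases and bootstrap addresses of both clusters, plus the topics to mirror. The sketch builds such a configuration as a Python dictionary and shows the target-side topic name produced by MirrorMaker 2.0's default replication policy. The aliases and addresses are placeholders. Security settings, which both managed clusters require, are deliberately omitted. The helper names are hypothetical.

```python
def mirror_source_config(source_alias: str, source_bootstrap: str,
                         target_alias: str, target_bootstrap: str,
                         topics: str = ".*") -> dict[str, str]:
    """Minimal MirrorSourceConnector settings; security settings are omitted."""
    if source_alias == target_alias:
        raise ValueError("source and target clusters need distinct aliases")
    return {
        "connector.class": "org.apache.kafka.connect.mirror.MirrorSourceConnector",
        "source.cluster.alias": source_alias,
        "target.cluster.alias": target_alias,
        "source.cluster.bootstrap.servers": source_bootstrap,
        "target.cluster.bootstrap.servers": target_bootstrap,
        "topics": topics,  # regular expression (or comma-separated list) of topics to mirror
    }


def remote_topic_name(source_alias: str, topic: str, separator: str = ".") -> str:
    """Topic name on the target cluster under the default replication policy."""
    return f"{source_alias}{separator}{topic}"
```

For example, with aliases `primary` and `backup`, the topic `orders` from the primary cluster appears as `primary.orders` on the backup cluster. This prefix keeps mirrored data distinguishable from topics written locally.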

Tools for your style of administration

The service aims to offer a complete set of tools for your style of cluster management and troubleshooting. This includes tools for administering, monitoring, and logging.

Managed Service for Apache Kafka is exposed as a Google Cloud API. This means that you can manage clusters and cluster resources using REST and gRPC APIs. Several clients and interfaces are provided for these APIs, including the following:

Terraform providers if you prefer the infrastructure as code approach.
The Google Cloud console UI for interactive work in a browser.
The gcloud CLI for interactive work in a shell.
Client libraries in Java, Python, Go, and other languages for custom development and scripting.

For monitoring and troubleshooting, the service exports metrics to Cloud Monitoring. Some of the metrics are available in the service UI. A complete set is available in Cloud Monitoring for interactive work, configuring alerts, and export to other systems.

The service also exports broker logs to Cloud Logging. These are searchable and can be used to create log-based metrics and alerts.

Automatic upgrades and patches

Managed Service for Apache Kafka clusters now run on Apache Kafka version 3.7.1.

The service aims to keep all clusters updated to a recent stable version of Apache Kafka and the underlying software, staying one minor version behind the latest Apache Kafka release.

Updates to the underlying infrastructure, including the operating system and orchestration layers, are also continuous and automatic. Brokers are updated with a rolling restart, with no downtime to the overall cluster. All updates are tested before they are made available and are monitored for stability. Upgrades require no manual intervention.

Transparent cost

The pricing model for Managed Service for Apache Kafka is similar to the charges you see when you run Apache Kafka yourself on Compute Engine. You pay for the resources you provision (vCPU, RAM, and local storage) and for the resources you consume (persistent storage and data transfer). Persistent storage and vCPU cost more with Managed Service for Apache Kafka than with a similar self-managed system. In contrast, data transfer and local storage prices are similar between Managed Service for Apache Kafka and self-managed Kafka. For more information about pricing, see the pricing guide.
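The split between provisioned and consumed resources can be expressed as a simple monthly estimate. This sketch contains no real prices: the `UnitPrices` fields are placeholders for the rates in the pricing guide. The 730-hour month is an assumed approximation, and the billing units are illustrative.

```python
from dataclasses import dataclass

HOURS_PER_MONTH = 730  # assumed approximation: 365 days * 24 hours / 12 months


@dataclass
class UnitPrices:
    """Placeholder rates; look up real values in the pricing guide."""
    vcpu_hour: float
    ram_gib_hour: float
    local_storage_gib_hour: float
    persistent_storage_gib_month: float
    transfer_gib: float


def monthly_cost(p: UnitPrices, vcpu: float, ram_gib: float, local_gib: float,
                 stored_gib: float, transferred_gib: float,
                 hours: float = HOURS_PER_MONTH) -> float:
    # In this model, provisioned resources are charged for every hour they exist.
    provisioned = hours * (vcpu * p.vcpu_hour
                           + ram_gib * p.ram_gib_hour
                           + local_gib * p.local_storage_gib_hour)
    # Consumed resources are charged by how much you actually store and transfer.
    consumed = (stored_gib * p.persistent_storage_gib_month
                + transferred_gib * p.transfer_gib)
    return provisioned + consumed
```

Separating the two terms shows which costs scale with cluster size and which scale with retention and traffic. For example, lowering topic retention reduces only the consumed term.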

Compatible because we run Apache Kafka

Finally, Managed Service for Apache Kafka runs the same open source software you may already run in your environment. You don't have to change your application code to migrate it to the service.

What's next?

Create a cluster in Google Cloud console
Create a producer application following the quickstart guide
Review the Managed Service for Apache Kafka limitations
Read the pricing guide

Apache Kafka® is a registered trademark of The Apache Software Foundation or its affiliates in the United States and/or other countries.
