Dataflow managed I/O for Apache Kafka

Managed I/O supports reading from and writing to Apache Kafka.

Requirements

Requires Apache Beam SDK for Java version 2.58.0 or later.

Configuration

Managed I/O uses the following configuration parameters for Apache Kafka.

Read and write configuration

bootstrap_servers (string): Required. A comma-separated list of Kafka bootstrap servers. Example: localhost:9092.

topic (string): Required. The Kafka topic to read from or write to.

file_descriptor_path (string): The path to a protocol buffer file descriptor set. Applies only if data_format is "PROTO".

data_format (string): The format of the messages. Supported values: "AVRO", "JSON", "PROTO", "RAW". The default value is "RAW", which reads or writes the raw bytes of the message payload.

message_name (string): The name of the protocol buffer message. Required if data_format is "PROTO".

schema (string): The Kafka message schema. The expected schema type depends on the data format: an Avro schema for "AVRO" or a JSON schema for "JSON". For read pipelines, this parameter is ignored if confluent_schema_registry_url is set.
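
In the Java SDK, these parameters are passed to Managed I/O as a configuration map. The following sketch reads JSON messages from Kafka; the broker address, topic name, and schema string are placeholders, and the pipeline assumes a runner and a reachable Kafka cluster:

```java
import java.util.Map;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.managed.Managed;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.Row;

public class KafkaManagedReadExample {
  public static void main(String[] args) {
    Pipeline pipeline = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    // Required parameters plus a JSON schema. All values are illustrative placeholders.
    Map<String, Object> config = Map.of(
        "bootstrap_servers", "localhost:9092",
        "topic", "my-topic",
        "data_format", "JSON",
        "schema", "{\"type\":\"object\",\"properties\":{\"id\":{\"type\":\"integer\"}}}");

    // Managed I/O yields rows with the schema derived from the configuration.
    PCollection<Row> rows = pipeline
        .apply(Managed.read(Managed.KAFKA).withConfig(config))
        .getSinglePCollection();

    pipeline.run();
  }
}
```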

Read configuration

auto_offset_reset_config (string): Specifies the behavior when there is no initial offset or the current offset no longer exists on the Kafka server. The following values are supported:

  • "earliest": Reset the offset to the earliest offset.
  • "latest": Reset the offset to the latest offset.

The default value is "latest".

confluent_schema_registry_subject (string): The subject of a Confluent schema registry. Required if confluent_schema_registry_url is specified.

confluent_schema_registry_url (string): The URL of a Confluent schema registry. If specified, the schema parameter is ignored.

consumer_config_updates (map): Sets configuration parameters for the Kafka consumer. For more information, see Consumer configs in the Kafka documentation. You can use this parameter to customize the Kafka consumer.

max_read_time_seconds (int): The maximum read time, in seconds. This option produces a bounded PCollection and is mainly intended for testing or other non-production scenarios.
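
A read configuration that combines these options might look like the following fragment. The broker, topic, and consumer settings are illustrative, and max_read_time_seconds is included only because this sketch targets a bounded test run:

```java
import java.util.Map;

// Illustrative read configuration for Managed I/O (placeholder values).
Map<String, Object> readConfig = Map.of(
    "bootstrap_servers", "localhost:9092",
    "topic", "my-topic",
    "data_format", "RAW",
    // Start from the earliest offset when no committed offset exists.
    "auto_offset_reset_config", "earliest",
    // Passed through to the underlying Kafka consumer.
    "consumer_config_updates", Map.of("max.poll.records", "250"),
    // Produces a bounded PCollection; for testing only.
    "max_read_time_seconds", 30);
```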
Write configuration

producer_config_updates (map): Sets configuration parameters for the Kafka producer. For more information, see Producer configs in the Kafka documentation. You can use this parameter to customize the Kafka producer.
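
A write sketch, assuming an existing PCollection<Row> named rows whose schema matches the Avro schema below; the broker, topic, schema, and producer settings are placeholders:

```java
import java.util.Map;
import org.apache.beam.sdk.managed.Managed;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.Row;

// Placeholder Avro schema describing the outgoing messages.
String avroSchema =
    "{\"type\":\"record\",\"name\":\"Message\","
        + "\"fields\":[{\"name\":\"id\",\"type\":\"long\"}]}";

Map<String, Object> writeConfig = Map.of(
    "bootstrap_servers", "localhost:9092",
    "topic", "output-topic",
    "data_format", "AVRO",
    "schema", avroSchema,
    // Passed through to the underlying Kafka producer.
    "producer_config_updates", Map.of("compression.type", "gzip"));

// rows is assumed to be a PCollection<Row> produced earlier in the pipeline.
rows.apply(Managed.write(Managed.KAFKA).withConfig(writeConfig));
```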

To read Avro or JSON messages, you must specify a message schema. To set a schema directly, use the schema parameter. To provide the schema through a Confluent schema registry, set the confluent_schema_registry_url and confluent_schema_registry_subject parameters.
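
With a Confluent schema registry, the configuration carries the registry URL and subject instead of an inline schema. The URL and subject name below are placeholders:

```java
import java.util.Map;

// Illustrative read configuration using a Confluent schema registry.
Map<String, Object> registryConfig = Map.of(
    "bootstrap_servers", "localhost:9092",
    "topic", "my-topic",
    "data_format", "AVRO",
    // The schema is fetched from the registry, so the schema parameter is not set.
    "confluent_schema_registry_url", "http://localhost:8081",
    "confluent_schema_registry_subject", "my-topic-value");
```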

To read or write Protocol Buffer messages, either specify a message schema or set the file_descriptor_path parameter.
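
For Protocol Buffers, the descriptor-set variant might look like the following fragment. The message name and the Cloud Storage path to the descriptor set are hypothetical:

```java
import java.util.Map;

// Illustrative configuration for protobuf messages (placeholder values).
Map<String, Object> protoConfig = Map.of(
    "bootstrap_servers", "localhost:9092",
    "topic", "proto-topic",
    "data_format", "PROTO",
    // Fully qualified name of the protobuf message type.
    "message_name", "com.example.MyMessage",
    // File descriptor set generated with protoc --descriptor_set_out.
    "file_descriptor_path", "gs://my-bucket/descriptors.pb");
```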

For more information and code examples, see the related topics in the Dataflow documentation.