Kafka Basics

Apache Kafka is an open-source, distributed publish-subscribe messaging system designed for high-throughput, fault-tolerant handling of large data streams.

Important terms

Topic: A named stream of records. Producers publish messages to a topic, and consumers read messages from it.

Partition: Topics can be split into multiple partitions, allowing for parallel processing of data streams. Each partition is an ordered, immutable sequence of records. Partitions provide a way to horizontally scale data processing within a Kafka cluster.
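One way to picture how records end up spread across partitions is key-based assignment: hash the record key and take it modulo the partition count. The sketch below is a minimal in-memory illustration, not Kafka's client code. The function names choose_partition and distribute are hypothetical, and zlib.crc32 is only a stand-in hash (Kafka's Java client uses a murmur2 hash for keyed records).

```python
import zlib


def choose_partition(key: bytes, num_partitions: int) -> int:
    """Map a record key to a partition index in [0, num_partitions)."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    # crc32 is a stand-in hash chosen for illustration only.
    return zlib.crc32(key) % num_partitions


def distribute(records, num_partitions):
    """Append each (key, value) record to the partition chosen for its key."""
    partitions = [[] for _ in range(num_partitions)]
    for key, value in records:
        partitions[choose_partition(key, num_partitions)].append(value)
    return partitions


records = [
    (b"user-1", "u1-login"),
    (b"user-2", "u2-login"),
    (b"user-1", "u1-click"),
    (b"user-1", "u1-logout"),
]
partitions = distribute(records, num_partitions=3)
for index, contents in enumerate(partitions):
    print(index, contents)
```

Because the same key always hashes to the same partition, all records for user-1 land in one partition in the order they were sent, while different keys can be processed in parallel on other partitions.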

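Broker: A single Kafka server that stores partition data and serves producer and consumer requests. A Kafka cluster consists of one or more brokers.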

Producer: A client application that publishes records to one or more topics.

Consumer: A client application that subscribes to one or more topics and reads the records published to them.

Offset: A record's position within a partition. Each record in a partition is assigned a unique, sequential offset, and the order of the records within a partition is maintained. This means that records are read in the order they were written to that partition. Offsets are unique only within a partition, and Kafka does not guarantee ordering across the partitions of a topic.
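The relationship between a partition and its offsets can be sketched as an append-only list in which a record's offset is simply its index. This is a toy in-memory model under that assumption, not how a broker stores data on disk. The class PartitionLog and its methods are hypothetical names introduced for illustration.

```python
class PartitionLog:
    """Toy model of one partition: an append-only sequence of records."""

    def __init__(self):
        self._records = []

    def append(self, value):
        """Store a record and return the sequential offset assigned to it."""
        offset = len(self._records)
        self._records.append(value)
        return offset

    def read(self, start_offset, max_records=10):
        """Return up to max_records (offset, value) pairs from start_offset on."""
        if start_offset < 0:
            raise ValueError("start_offset must be non-negative")
        end = start_offset + max_records
        return list(enumerate(self._records[start_offset:end], start=start_offset))


log = PartitionLog()
offsets = [log.append(value) for value in ("a", "b", "c")]

# A reader remembers the next offset it wants and resumes from there.
position = 1
batch = log.read(position)
position += len(batch)
print(offsets, batch, position)
```

Since records are never modified or reordered, a reader only needs to remember a single number, the next offset to read, in order to resume where it left off.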

ZooKeeper: Apache ZooKeeper is a distributed coordination service for managing distributed systems. It runs as an ensemble of servers, so if one server fails, the remaining servers continue to serve requests, ensuring high availability. ZooKeeper uses a consensus protocol to ensure that all servers in the ensemble have a consistent view of the data. Kafka uses it to coordinate brokers and to maintain cluster metadata and configuration information. Newer Kafka versions can instead run in KRaft mode, which removes the ZooKeeper dependency.