The first section of Apache Kafka for Beginners defines Kafka as a publish-subscribe based durable messaging system that exchanges data across processes, applications, and servers. The article introduces Kafka Topics and provides a quick overview of messaging and distributed logs. Finally, it describes how to create a Kafka Topic.
Table Of Contents
- What is Kafka?
- Components of Kafka
- Publish-subscribe durable messaging system
- What is Kafka Topic?
- Kafka Partition
- Replication Factor (RF)
- How to Create a Kafka Topic?
What is Kafka?
Kafka is an open-source, real-time event streaming platform based on the publish-subscribe model. In this system, producers publish data to feeds (topics) that consumers subscribe to.
Clients in a system can share information faster and with less risk of cascading failure when using Kafka. Rather than establishing direct connections between subsystems, clients communicate through a broker, which acts as a middleman between producers and consumers. The data is also partitioned and distributed across multiple servers, and Kafka replicates these partitions across the cluster.
What are the Components of Kafka?
A Kafka system is made up of four major components:
- Broker: Handles all client requests (produce, consume, and metadata) and ensures that data is replicated within the cluster. A cluster may contain one or many brokers.
- ZooKeeper: Maintains the cluster's state (brokers, topics, users).
- Producer: A client application that sends records to a broker.
- Consumer: A client application that reads batches of records from a broker.
Publish-subscribe durable messaging system
Apache Kafka is a publish-subscribe based durable messaging system. Messages are sent between processes, applications, and servers via a messaging system. Apache Kafka is a platform that allows users to construct Kafka Topics and applications to add, process, and reprocess records.
Applications connect to this system and publish data to a Kafka Topic. A record can contain any type of information, such as details of an event that occurred on a website, or an event that is designed to trigger another event. A different application may connect to the system and process or reprocess records from a topic. The published data is kept until the configured retention period expires.
What is Kafka Topic?
Kafka Topics are used to organize all Kafka records. Producer applications write to Kafka Topics, while consumer applications read from them. Records published to the Kafka cluster remain in the cluster until a configured retention period expires.
The Kafka Topic is Kafka's most fundamental unit of organization. You create separate topics to hold different types of events, and additional topics to hold filtered and transformed versions of the same type of event.
A Kafka Topic is a chronological record of events: Kafka stores records in a log. Logs are simple data structures with well-known semantics, so they are easy to reason about:
- First, they are append-only: when you add a new message to a log, it is always added at the end.
- Second, they are read sequentially: a reader seeks to an arbitrary offset in the log and then reads the entries in order.
- Third, events in the log are immutable: once something has occurred, it cannot be changed.
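The three properties above can be sketched in a few lines. The `PartitionLog` class below is an illustrative Python toy, not Kafka's actual implementation: records are only ever appended, each record's offset is its position in the log, and reads start at an offset and proceed sequentially.

```python
# A minimal sketch of the append-only log behind a Kafka partition.
# Illustrative only -- Kafka's real log is a set of on-disk segment files.

class PartitionLog:
    def __init__(self):
        self._entries = []  # records are only ever appended at the end

    def append(self, record):
        """Append a record; its offset is its position in the log."""
        offset = len(self._entries)
        self._entries.append(record)
        return offset

    def read(self, start_offset):
        """Read sequentially from an arbitrary offset to the end."""
        return self._entries[start_offset:]

log = PartitionLog()
log.append("page_view")   # offset 0
log.append("checkout")    # offset 1
log.append("payment")     # offset 2
print(log.read(1))        # records from offset 1 onward
```

Note that there is no `update` or `delete` method: immutability falls out of the structure itself.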
Traditional enterprise messaging systems use topics and queues to temporarily store messages between source and destination. Because Kafka Topics are logs, the data in them is not inherently temporary. Every topic can be configured to expire data after a certain age, from seconds to years, or even to retain messages forever.
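Time-based retention can be pictured as a periodic sweep that drops entries older than the retention period. The sketch below is a simplification under a stated assumption: real Kafka deletes whole log segments (per the `retention.ms` topic setting), not individual records.

```python
# Sketch of time-based retention: records older than the retention
# period become eligible for deletion. Illustrative only -- Kafka
# actually deletes entire log segments, not single records.

RETENTION_SECONDS = 7 * 24 * 60 * 60  # one week, Kafka's default retention

def expire_old_records(entries, now, retention=RETENTION_SECONDS):
    """entries is a list of (timestamp, record); keep those within retention."""
    return [(ts, rec) for ts, rec in entries if now - ts <= retention]

entries = [(0, "old event"), (1_000_000, "recent event")]
print(expire_old_records(entries, now=1_000_100))  # only the recent event survives
```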
Kafka Topic Partition
Replication in Kafka is done at the partition level. A replica is a redundant copy of a Kafka partition. Each partition usually has one or more replicas, which means its messages are duplicated across several Kafka brokers in the cluster. To scale and achieve fault tolerance, Kafka partitions topics and replicates the partitions across different servers. Partitioning lets us distribute a topic's data over numerous broker servers for writes by multiple producers and reads by multiple consumers. Within a broker cluster, each partition is assigned to a broker server either automatically or manually.
Kafka topics are divided into partitions that hold messages in an immutable order. Every message that enters a partition receives a unique sequential ID known as an offset.
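Which partition a record lands in is typically decided by hashing the record's key, so that all records with the same key end up in the same partition and keep their relative order. The sketch below illustrates that idea; note that Kafka's default partitioner uses a murmur2 hash, and `zlib.crc32` merely stands in here to keep the example deterministic and self-contained.

```python
# Sketch of key-based partitioning: same key -> same partition, which
# preserves per-key ordering. zlib.crc32 stands in for Kafka's murmur2.
import zlib

def partition_for(key: str, num_partitions: int) -> int:
    return zlib.crc32(key.encode("utf-8")) % num_partitions

# Records for the same key always map to the same partition.
print(partition_for("user-42", 6) == partition_for("user-42", 6))  # True
```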
Kafka can replicate partitions so that if the leader replica fails, a follower replica can take over and become the leader. When designing a topic, remember that partitioning is meant for high read and write throughput, scalability, and spreading large volumes of data.
For every Kafka partition, one replica acts as the leader and the others act as followers. The leader replica handles all read and write requests for the partition, while the followers replicate the leader. If the leader's broker fails, one of the followers becomes the new leader. To distribute the load, aim for a healthy balance of leaders, so that each broker leads roughly the same number of partitions.
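The failover behavior can be sketched as: walk the partition's replica list and promote the first replica whose broker is still alive. This is a simplification; in real Kafka the cluster controller coordinates elections and only in-sync replicas are eligible by default.

```python
# Sketch of leader failover among a partition's replicas. Illustrative
# only -- real elections are coordinated by the Kafka controller and
# restricted to in-sync replicas (the ISR).

def elect_leader(replicas, failed_brokers):
    """replicas is an ordered list of broker IDs; the first live one leads."""
    for broker in replicas:
        if broker not in failed_brokers:
            return broker
    raise RuntimeError("no live replica available for this partition")

replicas = [1, 2, 3]                  # broker 1 is the preferred leader
print(elect_leader(replicas, set()))  # broker 1 leads
print(elect_leader(replicas, {1}))    # broker 1 failed, broker 2 takes over
```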
Replication Factor (RF)
The replication factor (RF) determines the level of fault tolerance: a topic with a replication factor of n can survive up to n - 1 broker failures without losing data. Replicas have little direct impact on performance, because only the leader replica answers producer and consumer requests at any given time. Another thing to consider when sizing a topic is the number of consumers your service needs to keep up with the production volume, which influences how many partitions you create.
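The fault-tolerance arithmetic is simple enough to state as a one-liner: a partition stays available as long as at least one replica's broker is alive.

```python
# With replication factor n, up to n - 1 brokers can fail before a
# partition becomes unavailable (assuming replicas sit on distinct brokers).

def tolerable_failures(replication_factor: int) -> int:
    return replication_factor - 1

print(tolerable_failures(3))  # a typical RF of 3 survives 2 broker failures
```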
How to Create a Kafka Topic?
Creating a topic in production, using the Command Line Interface (CLI) on Windows, is an operational task that requires awareness and planning. When creating a new topic you must choose the partition count and replication factor, and these choices affect the performance and reliability of your system.
Step 1: Make sure that both the ZooKeeper and Kafka servers are started.
Step 2: The basic form of the command is 'kafka-topics --create --zookeeper localhost:2181 --topic <topic_name>' (on Windows, the script is kafka-topics.bat).
Step 3: Now fill in the topic name, partition count, and replication factor, and press Enter:
'kafka-topics.bat --create --zookeeper localhost:2181 --topic <topic_name> --partitions <value> --replication-factor <value>'
Note that on newer Kafka versions (2.2 and later), the '--zookeeper' flag is replaced by '--bootstrap-server <broker_host:9092>'.
Thus, assuming all of the preceding steps are completed properly, the Kafka Topic will be created successfully.
The preceding article covered the fundamentals of Apache Kafka and its components, followed by an introduction to Kafka Topics and a step-by-step guide to creating one.