What Is Apache Kafka?
Apache Kafka is a distributed event-streaming platform that collects, stores, and processes large amounts of data in real time. It works well as an integration layer for applications running in a Kubernetes cluster, because services can exchange data through Kafka instead of calling each other directly.
This in-depth tutorial shows you how to configure a Kafka server on a Kubernetes cluster.
How Does Apache Kafka Work?
Apache Kafka is based on a publish-subscribe model:
- Producers create messages and publish them to topics.
- Kafka stores the messages of each topic as an append-only log; once written, a message is not modified.
- Consumers subscribe to one or more topics and read the messages that producers published.
In this context, producers and consumers are applications: producers emit event-driven messages, and consumers process them. The messages are stored on Kafka brokers, organized into user-defined topics.
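The publish-subscribe model can be sketched in a few lines of Python. This is a toy, in-memory illustration under simplifying assumptions (one broker, no partitions, no persistence); ToyBroker, ToyConsumer, and the orders topic are hypothetical names, not part of Kafka's API. It shows two properties from the list above: a topic is an append-only log, and each consumer tracks its own read position (offset), so one consumer reading a message does not remove it for others.

```python
from collections import defaultdict


class ToyBroker:
    """In-memory stand-in for a broker: each topic is an append-only list of messages."""

    def __init__(self):
        self._topics = defaultdict(list)

    def publish(self, topic, message):
        """Append a message to the end of a topic's log and return its offset."""
        log = self._topics[topic]
        log.append(message)
        return len(log) - 1

    def read(self, topic, offset):
        """Return messages from `offset` onward; slicing copies, so the log is never altered."""
        return self._topics[topic][offset:]


class ToyConsumer:
    """Subscribes to one topic and remembers its own read position (offset)."""

    def __init__(self, broker, topic):
        self.broker = broker
        self.topic = topic
        self.offset = 0

    def poll(self):
        """Fetch everything published since the last poll and advance the offset."""
        messages = self.broker.read(self.topic, self.offset)
        self.offset += len(messages)
        return messages


broker = ToyBroker()
consumer_a = ToyConsumer(broker, "orders")  # "orders" is a hypothetical topic name
consumer_b = ToyConsumer(broker, "orders")

broker.publish("orders", "order-1")
broker.publish("orders", "order-2")

print(consumer_a.poll())  # ['order-1', 'order-2']
print(consumer_a.poll())  # [] (nothing new since the last poll)
print(consumer_b.poll())  # ['order-1', 'order-2'] (consumer_a did not "use up" the messages)
```

A real broker persists the log to disk and splits topics into partitions, but the offset-based reading shown here is the same idea.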
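ZooKeeper is an essential component of the Kafka setup used in this tutorial. It stores the cluster's metadata, tracks which brokers are members of the cluster, and elects the controller broker.
In this ZooKeeper-based mode, Kafka cannot function without ZooKeeper: the Kafka service keeps restarting until it detects a working ZooKeeper deployment. (Newer Kafka releases can run in KRaft mode without ZooKeeper, and Kafka 4.0 removed ZooKeeper support altogether.)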
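To check for "a working ZooKeeper deployment" yourself, you can probe ZooKeeper directly. The sketch below is a hypothetical helper, not something Kafka runs internally: it sends ZooKeeper's ruok four-letter-word command and expects the reply imok, retrying a fixed number of times. It assumes ruok is enabled on the server; since ZooKeeper 3.5.3, the server only answers four-letter words listed in 4lw.commands.whitelist.

```python
import socket
import time


def zookeeper_is_ok(host, port, timeout=2.0):
    """Send ZooKeeper's 'ruok' command; a healthy server answers 'imok'.

    Assumes 'ruok' is whitelisted on the server. Otherwise ZooKeeper replies
    with an error message instead, and this function returns False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"ruok")
            reply = sock.recv(16)
    except OSError:  # connection refused, DNS failure, timeout, ...
        return False
    return reply == b"imok"


def wait_for_zookeeper(host, port, attempts=10, delay=3.0):
    """Retry the health check until it succeeds or the attempts run out."""
    for _ in range(attempts):
        if zookeeper_is_ok(host, port):
            return True
        time.sleep(delay)
    return False
```

For example, wait_for_zookeeper("zk-s", 2181), run from a pod inside the cluster, would probe the zk-s Service that zookeeper.yml defines. A check like this could run before starting a broker, so a missing ZooKeeper shows up as a clear failure rather than a restart loop.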
Deploy ZooKeeper first by creating a YAML file named zookeeper.yml. It defines a Service and a Deployment: the Deployment schedules a ZooKeeper pod on the Kubernetes cluster, and the Service exposes it to other pods.
Note: YAML (YAML Ain't Markup Language) is a data-serialization format that is easy for both people and software tools to read. Files like the ones in this tutorial are freely available in online repositories such as GitHub. In their current form, they are not meant for a production environment; adapt them to your system's requirements.
Use your preferred text editor to add the following fields to zookeeper.yml:
apiVersion: v1
kind: Service
metadata:
  name: zk-s
  labels:
    app: zk-1
spec:
  ports:
  - name: client
    port: 2181
    protocol: TCP
  - name: follower
    port: 2888
    protocol: TCP
  - name: leader
    port: 3888
    protocol: TCP
  selector:
    app: zk-1
---
kind: Deployment
apiVersion: apps/v1
metadata:
  name: zk-deployment-1
spec:
  selector:
    matchLabels:
      app: zk-1
  template:
    metadata:
      labels:
        app: zk-1
    spec:
      containers:
      - name: zk1
        image: bitnami/zookeeper
        ports:
        - containerPort: 2181
        env:
        - name: ZOOKEEPER_ID
          value: "1"
        - name: ZOOKEEPER_SERVER_1
          value: zk1
        # required by the bitnami/zookeeper image unless authentication is configured
        - name: ALLOW_ANONYMOUS_LOGIN
          value: "yes"
Run the following command on your Kubernetes cluster to create the Service and Deployment defined in the file:
kubectl create -f zookeeper.yml
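The Service does not reference the Deployment by name; it finds pods through its label selector. The sketch below models equality-based selector matching, the kind used by spec.selector in both Services in this tutorial. selector_matches, endpoints, and the pod names are hypothetical; set-based selectors (matchExpressions) and pod readiness, which real Services also take into account, are left out.

```python
def selector_matches(selector, pod_labels):
    """Equality-based label selector: every key/value pair in the selector must be on the pod."""
    return all(pod_labels.get(key) == value for key, value in selector.items())


# Hypothetical pods and their labels, as they might exist after the steps in this tutorial.
pods = {
    "zk-deployment-1-abc12": {"app": "zk-1"},
    "kafka-repcon-xyz34": {"app": "kafkaApp"},
}

zk_service_selector = {"app": "zk-1"}         # spec.selector of the zk-s Service
kafka_service_selector = {"app": "kafkaApp"}  # spec.selector of the kafka Service


def endpoints(selector):
    """Names of the pods a Service with this selector would route traffic to (ignores readiness)."""
    return [name for name, labels in pods.items() if selector_matches(selector, labels)]


print(endpoints(zk_service_selector))     # ['zk-deployment-1-abc12']
print(endpoints(kafka_service_selector))  # ['kafka-repcon-xyz34']
```

This is why the app: zk-1 label must be identical in the Service selector and the pod template: a typo in either place leaves the Service with no pods to route to.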
Create Kafka Service
Next, create a Kafka Service definition file. The Service exposes the Kafka pods under a single stable address and load-balances traffic across them. A basic kafka-service.yml file contains the following elements:
apiVersion: v1
kind: Service
metadata:
  labels:
    app: kafkaApp
  name: kafka
spec:
  ports:
  - name: kafka
    port: 9092
    targetPort: 9092
    protocol: TCP
  - name: zookeeper
    port: 2181
    targetPort: 2181
  selector:
    app: kafkaApp
  type: LoadBalancer
Once you have saved the file, create the service by entering the following command:
kubectl create -f kafka-service.yml
Note: In the Kafka Service definition file above, type is set to LoadBalancer. If you run Kubernetes on bare metal, where no cloud load balancer is available, use MetalLB, a load-balancer implementation for bare-metal Kubernetes clusters.
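Kubernetes validates Service definitions when you create them. One rule matters for the Kafka Service, which exposes two ports (9092 and 2181): when a Service has more than one port, every port must have a name, and the names must be unique. The sketch below checks that rule and the valid port range; check_service_ports is a hypothetical helper covering only a small subset of the API server's validation, and the port names kafka and zookeeper are illustrative.

```python
def check_service_ports(ports):
    """Return human-readable problems found in a Service's spec.ports list.

    Covers only two of the API server's rules: port names in multi-port
    Services, and the valid port range.
    """
    problems = []
    names = [port.get("name") for port in ports]
    if len(ports) > 1:
        if not all(names):
            problems.append("each port needs a name when a Service exposes more than one port")
        given = [name for name in names if name]
        if len(given) != len(set(given)):
            problems.append("port names must be unique")
    for port in ports:
        number = port.get("port")
        if not isinstance(number, int) or not 1 <= number <= 65535:
            problems.append(f"invalid port number: {number!r}")
    return problems


unnamed = [{"port": 9092, "targetPort": 9092}, {"port": 2181, "targetPort": 2181}]
named = [
    {"name": "kafka", "port": 9092, "targetPort": 9092},
    {"name": "zookeeper", "port": 2181, "targetPort": 2181},
]
print(check_service_ports(unnamed))  # ['each port needs a name when a Service exposes more than one port']
print(check_service_ports(named))    # []
```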
Define Kafka Replication Controller
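Create an additional YAML file that defines a ReplicationController for Kafka, in our example kafka-repcon.yml. Its container uses the wurstmeister/kafka image and runs zookeeper-server-start.sh, so the pod provides ZooKeeper on port 2181; the Kafka broker itself is started in a later step. The file contains the following fields: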
---
apiVersion: v1
kind: ReplicationController
metadata:
  labels:
    app: kafkaApp
  name: kafka-repcon
spec:
  replicas: 1
  selector:
    app: kafkaApp
  template:
    metadata:
      labels:
        app: kafkaApp
    spec:
      containers:
      - command:
        - zookeeper-server-start.sh
        - /config/zookeeper.properties
        image: "wurstmeister/kafka"
        name: zk1
        ports:
        - containerPort: 2181
Save the replication controller definition file and create it by using the following command:
kubectl create -f kafka-repcon.yml
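A ReplicationController repeatedly compares the number of running pods that match its selector with spec.replicas, then creates or deletes pods to close the gap. The sketch below shows one such reconciliation pass in simplified form. reconcile and the pod names are hypothetical, and the real controller ranks pods before deleting them (for example, it prefers deleting pods that are not yet running or ready) instead of taking the last ones in sorted order.

```python
def reconcile(desired_replicas, running_pods):
    """Return (number_of_pods_to_create, pods_to_delete) for one reconciliation pass.

    running_pods: names of the pods that currently match the controller's selector.
    Simplified: deletes the last pods in sorted order instead of ranking them.
    """
    surplus = len(running_pods) - desired_replicas
    if surplus > 0:
        return 0, sorted(running_pods)[-surplus:]
    return -surplus, []


print(reconcile(1, []))           # (1, []): right after creation, one pod is missing
print(reconcile(1, ["kafka-a"]))  # (0, []): desired state reached
print(reconcile(6, ["kafka-a"]))  # (5, []): scaling up to six replicas
```

Because the controller only acts on the difference between desired and actual state, a pod that crashes or is deleted is replaced automatically, and changing the replica count later is enough to scale.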
Start Kafka Server
The configuration properties for a Kafka server are defined in the config/server.properties file. Because ZooKeeper is already running, start the Kafka server with:
kafka-server-start.sh config/server.properties
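config/server.properties is a plain key=value file. The sketch below is a simplified parser written for illustration: parse_properties is a hypothetical helper, and it ignores line continuations, escape sequences, and the ':' separator that full Java .properties files allow. The sample text is an illustrative excerpt, not necessarily the file shipped in your image.

```python
def parse_properties(text):
    """Parse simple Java-style .properties content: key=value lines, '#' or '!' comments.

    Simplified: no line continuations, no escapes, '=' is the only separator.
    """
    props = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith(("#", "!")):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")  # split at the first '=' only
        if sep:
            props[key.strip()] = value.strip()
    return props


# Illustrative excerpt; your image's config/server.properties may differ.
SAMPLE = """
# The id of the broker. This must be set to a unique integer for each broker.
broker.id=0
listeners=PLAINTEXT://:9092
log.dirs=/tmp/kafka-logs
zookeeper.connect=localhost:2181
"""

props = parse_properties(SAMPLE)
print(props["broker.id"], props["listeners"], props["zookeeper.connect"])
# 0 PLAINTEXT://:9092 localhost:2181
```

broker.id, listeners, and zookeeper.connect are the settings the rest of this tutorial depends on: the broker's identity, the port clients connect to, and where to find ZooKeeper.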
How to Create Kafka Topic
Kafka has a command-line utility called kafka-topics.sh. Use this utility to create topics on the server. Open a new terminal window and type:
kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1 --topic Topic-Name
We created a topic named Topic-Name with a single partition and one replica instance.
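The number of partitions matters once producers send keyed messages: the producer hashes the key to pick a partition, so all messages with the same key stay in order within one partition. The sketch below illustrates the idea with zlib.crc32 as a stand-in hash. Kafka's Java producer actually uses murmur2, so the partition numbers produced here will not match a real cluster; choose_partition is a hypothetical name.

```python
import zlib


def choose_partition(key, num_partitions):
    """Map a message key to a partition number with a stable hash.

    Stand-in only: Kafka's Java producer uses murmur2, not CRC32, so real
    partition numbers differ. What carries over is the property that equal
    keys always map to the same partition.
    """
    if num_partitions < 1:
        raise ValueError("a topic needs at least one partition")
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# Topic-Name was created with --partitions 1, so every key lands in partition 0.
print(choose_partition("user-42", 1))  # 0

# With more partitions, the same key still maps to the same partition every time.
print(choose_partition("user-42", 6) == choose_partition("user-42", 6))  # True
```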
How to Start a Kafka Producer
Default producer settings are stored in config/producer.properties, and config/server.properties defines the broker ID and the port the broker listens on. The broker in this example listens on port 9092, and you can pass that address directly on the command line:
kafka-console-producer.sh --bootstrap-server localhost:9092 --topic Topic-Name
Now type a few messages in the terminal, pressing Enter after each one. Every line is sent as a separate message.
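By default, each line typed into the console producer becomes a message without a key. With --property parse.key=true and --property key.separator=<separator>, the tool splits each line into a key and a value instead. The sketch below imitates that splitting to show which records result; lines_to_records is a hypothetical helper, not the tool's source code, and the tab default for the separator is an assumption worth checking against your Kafka version's documentation.

```python
def lines_to_records(lines, parse_key=False, key_separator="\t"):
    """Turn console input lines into (key, value) records, one record per line.

    Imitates the idea behind the console producer's parse.key and
    key.separator properties; it is not the tool's actual code. The tab
    default for key_separator is an assumption to verify for your version.
    """
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if not parse_key:
            records.append((None, line))  # no key: the whole line is the value
            continue
        key, sep, value = line.partition(key_separator)  # split at the first separator only
        if not sep:
            raise ValueError(f"no key separator found in line: {line!r}")
        records.append((key, value))
    return records


print(lines_to_records(["hello\n", "world\n"]))
# [(None, 'hello'), (None, 'world')]
print(lines_to_records(["user-42:logged in\n"], parse_key=True, key_separator=":"))
# [('user-42', 'logged in')]
```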
How to Start a Kafka Consumer
As with the producer settings, the default consumer settings are stored in the config/consumer.properties file. Open a new terminal window and enter the following command to consume messages:
kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic Topic-Name --from-beginning
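The --from-beginning option makes the consumer start with the earliest message stored in the topic, so it also shows messages produced before the consumer started. With a single partition, messages appear in the order they were written. You can now type messages in the producer's terminal and see them appear in the consumer's terminal.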
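Where a console consumer starts reading depends on whether its consumer group already has a committed offset. The tool's help text describes --from-beginning as applying only when there is no established offset: the consumer then starts at the earliest message in the log rather than the latest. The simplified sketch below encodes that decision for one partition; starting_offset and its parameters are hypothetical names.

```python
def starting_offset(log_start, log_end, committed=None, from_beginning=False):
    """Choose where a consumer starts reading one partition (simplified).

    log_start: earliest offset still stored in the partition
    log_end:   offset that the next produced message will receive
    committed: offset previously committed by the consumer's group, or None
    """
    if committed is not None and log_start <= committed <= log_end:
        return committed  # an established offset wins: resume where the group left off
    # No usable offset: --from-beginning means "earliest", the default is "latest".
    return log_start if from_beginning else log_end


print(starting_offset(0, 5, from_beginning=True))                # 0: replays all five messages
print(starting_offset(0, 5))                                     # 5: only messages produced from now on
print(starting_offset(0, 5, committed=3, from_beginning=True))  # 3: committed offset takes precedence
```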
How to Scale a Kafka Cluster
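You can administer the Kafka cluster directly with kubectl. For example, the following command scales the kafka-repcon replication controller from one (1) pod to six (6):
kubectl scale rc kafka-repcon --replicas=6
New pods only join the same Kafka cluster as additional brokers if each broker has a unique broker.id and all brokers connect to the same ZooKeeper ensemble.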
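Adding broker pods does not move existing partitions: Kafka places replicas when a topic is created, and existing partitions stay on their brokers until you reassign them (for example, with kafka-reassign-partitions.sh). The sketch below shows a simplified round-robin placement of a new topic's replicas across brokers. assign_replicas is hypothetical, and Kafka's real algorithm additionally randomizes the starting broker and staggers follower placement.

```python
def assign_replicas(num_partitions, replication_factor, broker_ids):
    """Map each partition to the broker IDs holding its replicas (simplified round-robin).

    The first broker in each list plays the role of the preferred leader.
    """
    brokers = sorted(broker_ids)
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed the number of brokers")
    return {
        partition: [brokers[(partition + r) % len(brokers)] for r in range(replication_factor)]
        for partition in range(num_partitions)
    }


# Hypothetical broker IDs 0-5 for six broker pods, each with a unique broker.id.
print(assign_replicas(6, 2, range(6)))
# {0: [0, 1], 1: [1, 2], 2: [2, 3], 3: [3, 4], 4: [4, 5], 5: [5, 0]}
```

The replication-factor check mirrors a real constraint: Kafka refuses to create a topic whose replication factor is larger than the number of available brokers.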
By following the instructions in this tutorial, you have installed Kafka on Kubernetes. A single Kafka broker can serve reads and writes from many clients at the same time.
If you are deploying applications within a Kubernetes cluster, use Kafka to improve your apps' ability to exchange information in real time.
For an alternative message broker, check out our article on deploying RabbitMQ on Kubernetes.