By Marvin Hoxha

Apache Kafka and ML use cases


GitHub Repo: https://github.com/data-max-hq/kafka-dogbreed-classification


In this post, we will focus on Apache Kafka. If you want to learn more about model serving with Seldon-Core and TensorFlow Serving, check out our Dog Breed Classification blog post.

Topics:

  1. How does Apache Kafka work?

  2. What is the Confluent Platform and how do you set it up?

  3. How to publish a message to a topic with a Kafka producer.

  4. How to fetch a message from a topic with a Kafka consumer and send it to a Slack channel.

Why would you use Apache Kafka?

Apache Kafka is a data store optimized for ingesting and processing streaming data in real time. Streaming data is the continuous transfer of data from different data sources at a steady, high-speed rate. A streaming platform needs to handle this huge volume of data incrementally and sequentially.


The benefits of using Kafka:

  1. Kafka is scalable

  2. Kafka is highly reliable

  3. Kafka offers high performance

So how does Apache Kafka work?

Broadly, Kafka accepts streams of events written by data producers and stores records chronologically in partitions across brokers (servers). Each record holds information about an event, and records are grouped into topics. Finally, data consumers get their data by subscribing to the topics they want.


Project Development:

Apache Kafka is written in Java and Scala by the Apache Software Foundation. In this project, we use kafka-python, a Python client for Apache Kafka designed to function much like the official Java client. We are also using the Confluent Platform, which includes the latest release of Kafka along with additional tools and services that make it easier to build and manage an event streaming platform. To set up the Confluent Platform, we use the official Docker Compose file.


Kafka producer:

  1. We encode the image to string using base64.

  2. We use producer.send() to publish the image to the broker endpoint; the topic is created automatically if it doesn't already exist (provided topic auto-creation is enabled on the broker).
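The two producer steps above can be sketched with kafka-python. The image file name, topic name, and broker address here are assumptions based on this project's defaults, not exact code from the repo:

```python
import base64


def encode_image(path: str) -> bytes:
    """Read an image file and return its base64-encoded bytes."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read())


def send_image(path: str, topic: str = "dogtopic",
               bootstrap: str = "localhost:9092") -> None:
    """Publish the encoded image to the topic; the broker creates the
    topic on first use if auto-creation is enabled."""
    # kafka-python import kept local so encode_image() works on its own
    from kafka import KafkaProducer
    producer = KafkaProducer(bootstrap_servers=bootstrap)
    producer.send(topic, encode_image(path))
    producer.flush()  # block until the message is actually delivered


if __name__ == "__main__":
    send_image("dog.jpg")
```

Encoding to base64 keeps the payload safe to treat as text on the consumer side; flush() ensures the message is delivered before the script exits.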

Kafka consumer:

  1. The consumer receives the image as a string from the topic in the broker, decodes it, and saves it locally.

  2. We use TensorFlow Keras to process the image for the prediction.

  3. We use either TensorFlow Serving or Seldon-Core to send the image to the model and get back the prediction.

  4. We then send the photo and prediction to Slack.
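The consumer side can be sketched as follows. The topic, file names, model name, and the TensorFlow Serving REST URL are illustrative assumptions; the preprocessing and Slack steps are left as comments since they are covered elsewhere:

```python
import base64
import json
import urllib.request


def decode_and_save(payload: bytes, out_path: str = "received_dog.jpg") -> str:
    """Decode the base64 payload consumed from the topic and write it to disk."""
    with open(out_path, "wb") as f:
        f.write(base64.b64decode(payload))
    return out_path


def predict(instances: list,
            url: str = "http://localhost:8501/v1/models/dogbreed:predict") -> dict:
    """POST preprocessed image data to a TensorFlow Serving REST endpoint."""
    body = json.dumps({"instances": instances}).encode()
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


def consume(topic: str = "dogtopic", bootstrap: str = "localhost:9092") -> None:
    # kafka-python import kept local so the helpers above work on their own
    from kafka import KafkaConsumer
    consumer = KafkaConsumer(topic, bootstrap_servers=bootstrap,
                             auto_offset_reset="earliest")
    for msg in consumer:
        path = decode_and_save(msg.value)
        # ...preprocess `path` with TensorFlow Keras, call predict(),
        # then post the image and result to Slack...
```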

To send the message and photo to Slack, we need to build a Slack app and connect it to the channel we want to post to. Then copy the Bot User OAuth Token from the OAuth & Permissions section in the features menu and add it to the code. You will also need the Slack channel ID.
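One way to post the image and prediction is with the slack_sdk WebClient. Reading the token from a SLACK_BOT_TOKEN environment variable is our assumption here, as is the channel ID placeholder; only the OAuth token and channel ID themselves come from the setup above:

```python
import os


def post_prediction(image_path: str, prediction: str, channel_id: str) -> None:
    """Upload the image and its predicted breed to a Slack channel."""
    # slack_sdk import kept local; install with `pip install slack_sdk`
    from slack_sdk import WebClient

    # Bot User OAuth Token from the app's OAuth & Permissions page
    client = WebClient(token=os.environ["SLACK_BOT_TOKEN"])
    client.files_upload_v2(
        channel=channel_id,  # e.g. "C0123456789" (hypothetical ID)
        file=image_path,
        initial_comment=f"Predicted breed: {prediction}",
    )
```

files_upload_v2 attaches the image and posts the prediction text as its comment in a single message.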

Project structure:


Setup

The Docker Compose file from Confluent will start the required components. The Control Center dashboard can be accessed at http://localhost:9021.
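Assuming Confluent's docker-compose.yml sits in the project root, a typical bring-up is just the standard Compose commands:

```shell
# start all Confluent Platform services defined in the compose file
docker-compose up -d

# verify that the broker, ZooKeeper, Control Center, etc. are running
docker-compose ps
```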


Trigger producer:

python3 kafka_producer.py
  • If the topic "dogtopic" doesn't exist it will automatically be created.

  • A dog image will be sent as a message to the topic.

Consume data:

When the consumer receives a dog image:

  • It will be sent to Seldon-Core to get the predicted dog breed.

  • The result will be posted to a Slack channel.

Conclusion:

In this blog post, you learned how to set up Confluent Kafka and create a simple Kafka producer that creates a topic and sends an image. The image is received by the Kafka consumer and used to get a prediction from a TensorFlow model, and lastly the consumer sends the image and prediction to Slack.

This basic example should give you an idea of the kafka-python structure that you can use for your own projects.