Applications of every kind continuously generate real-time information, and that information needs an easy, reliable, and fast way to reach many different types of receivers. Most of the time, the applications producing the data and the applications consuming it are separate and cannot access each other directly.
In today's data-centric era, the main concern is to collect data and then analyze it. The data to be analyzed typically includes:
→ User behaviour data.
→ Application performance tracing.
→ Activity data which are logged by the application.
→ Event driven messages.
So what is the solution for this?
The answer is a message broker. A message broker is an architectural pattern for message validation, transformation, and routing. There are multiple options, which can be explored in detail in another section.
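To make the three responsibilities of a broker concrete, here is a minimal in-memory sketch. It is not how any real broker is implemented; the class name `MiniBroker`, the message shape (a dict with `topic` and `payload`), and the choice of topic normalization as the "transformation" step are all assumptions made for illustration.

```python
from collections import defaultdict


class MiniBroker:
    """Toy broker (hypothetical): validate, transform, then route a message."""

    def __init__(self):
        # topic name -> list of subscriber callbacks
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, message):
        # Validation: reject messages that lack the required fields.
        if "topic" not in message or "payload" not in message:
            raise ValueError("message needs 'topic' and 'payload'")
        # Transformation: normalize the topic name (an illustrative choice).
        routed = {**message, "topic": message["topic"].strip().lower()}
        # Routing: hand the message to every subscriber of that topic.
        handlers = self._subscribers.get(routed["topic"], [])
        for handler in handlers:
            handler(routed)
        return len(handlers)  # number of subscribers that received it


broker = MiniBroker()
received = []
broker.subscribe("page-views", received.append)
delivered = broker.publish({"topic": " Page-Views ", "payload": {"url": "/home"}})
```

The publisher only talks to the broker and never holds a reference to the subscriber, which is the decoupling the pattern is meant to provide.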
Kafka addresses this real-time problem: it handles large volumes of real-time information and routes it to multiple consumers quickly. Kafka provides smooth integration between information producers and consumers without blocking the producers, and without the producers needing to know who the final consumers are.
Apache Kafka is an open source, distributed publish-subscribe messaging system with the following characteristics:
• Persistent messaging: To derive real value from big data, no information loss can be afforded. Apache Kafka is designed with O(1) disk structures that provide constant-time performance even with very large volumes of stored messages, on the order of terabytes.
• High throughput: Keeping big data in mind, Kafka is designed to work on commodity hardware and to support millions of messages per second.
• Distributed: Apache Kafka explicitly supports partitioning messages over Kafka servers and distributing consumption over a cluster of consumer machines while maintaining per-partition ordering semantics.
• Multiple client support: The Apache Kafka system supports easy integration of clients from different platforms such as Java, .NET, PHP, Ruby, and Python.
• Real time: Messages produced by the producer threads should be immediately visible to consumer threads; this feature is critical to event-based systems such as Complex Event Processing (CEP) systems.
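The partitioning and per-partition ordering described above can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: `PartitionedTopic` is a hypothetical class, plain Python lists stand in for on-disk logs, and CRC32 is used as the key hash simply because it is stable across runs (it is not claimed to be the hash a real Kafka client uses).

```python
import zlib


class PartitionedTopic:
    """Toy in-memory topic (hypothetical): one append-only list per partition."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, key, value):
        # Same key -> same partition, so records for one key stay in order.
        # CRC32 is an assumption of this sketch, not Kafka's partitioner.
        partition = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        log = self.partitions[partition]
        log.append(value)
        return partition, len(log) - 1  # (partition, offset of the new record)

    def read(self, partition, offset):
        # Return every record in the partition from `offset` onward.
        return self.partitions[partition][offset:]


topic = PartitionedTopic(num_partitions=3)
positions = [topic.append("user-42", f"event-{i}") for i in range(4)]
partition = positions[0][0]
print(topic.read(partition, offset=2))  # → ['event-2', 'event-3']
```

Ordering is only guaranteed within a partition: all four records for `user-42` land in one partition with offsets 0 to 3, while records with other keys may land elsewhere and carry no ordering relationship to them.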
A typical data aggregation scenario in big data, as supported by the Apache Kafka messaging system:
Here, the producers can correspond to:
→ Producer 1 (Page views logging system)
→ Producer 2 (User lifecycle management services, typically used in subscription-based apps or stores)
→ Producer 3 (Application logs) etc.
The consumers can be:
- Consumer 1 (Real time Event Processor)
- Consumer 2 (MongoDB)
- Consumer 3 (Hadoop)
- Consumer 4 (Large data warehouses)
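The many-producers, many-consumers picture above can be sketched as a shared log that each consumer reads at its own pace. Everything here is illustrative: `SharedLog`, the producer and consumer names, and the sample records are hypothetical, and a single Python list replaces a real cluster. Real Kafka tracks read positions per consumer group and per partition; this sketch keeps one position per consumer name.

```python
class SharedLog:
    """Toy shared log (hypothetical): producers append, consumers poll independently."""

    def __init__(self):
        self._records = []      # append-only list of (source, value) pairs
        self._next_offset = {}  # consumer name -> next offset to read

    def produce(self, source, value):
        # Producers only append; they never learn who will read the record.
        self._records.append((source, value))

    def poll(self, consumer):
        # Each consumer has its own position, so reading does not remove
        # records or affect what any other consumer sees.
        start = self._next_offset.get(consumer, 0)
        batch = self._records[start:]
        self._next_offset[consumer] = len(self._records)
        return batch


log = SharedLog()
log.produce("page-views", "/home")
log.produce("app-logs", "WARN disk 91%")
realtime_first = log.poll("event-processor")    # sees the first two records
log.produce("user-lifecycle", "subscription renewed")
realtime_second = log.poll("event-processor")   # sees only the new record
hadoop_all = log.poll("hadoop-loader")          # a late consumer still sees all three
```

Because records are retained rather than deleted on read, a slow batch consumer such as a Hadoop loader can catch up later without holding back the real-time processor.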