Companies with a web presence generate huge amounts of data from the activities their users perform. Such sites typically record common user activities such as logins, page visits, and social actions including commenting on posts and sharing pictures or posts. This data is needed to monitor user behaviour, and because it can arrive at millions of messages per second, it is typically handled by a logging and aggregation solution. Traditional methods log the data offline to files and then asynchronously import it into an analysis system such as Hadoop. However, these solutions are very limiting for building real-time processing systems.

With recent trends in internet applications, user activity data has become part of production data so that analytics can run in real time. Examples of real-time analytics include showing content based on a user's previous search history, making recommendations, and performing sentiment analysis on users' posts. The main aim of Kafka is to bring offline and online processing together by providing a way to load data in parallel into Hadoop systems.
Some of the companies that use Kafka are as follows:
• LinkedIn (www.linkedin.com)
• DataSift (www.datasift.com/)
• Twitter (www.twitter.com/)
• Foursquare (www.foursquare.com/)
• Square (www.squareup.com/)