Kai Wähner works as a Technology Evangelist at Confluent, a Silicon Valley startup that works closely with the Apache community on Apache Kafka, a streaming platform for building highly scalable, mission-critical infrastructures.
His main areas of expertise are Big Data Analytics, Machine Learning, Integration, Microservices, the Internet of Things, Stream Processing, and Blockchain. He is a regular speaker at international conferences such as JavaOne, O’Reilly Software Architecture, and ApacheCon, and he also writes articles for professional journals.
Kai will deliver a talk, “Deep Learning at Extreme Scale in the Cloud with Apache Kafka and TensorFlow”, at Codemotion Berlin 2018.
Discover more about Codemotion Berlin!
Kai, as a Tech Evangelist, how would you describe the work Confluent is doing on Kafka?
Confluent builds Kafka itself (including Kafka Connect for integration and Kafka Streams for stream processing) and adds a powerful ecosystem of open source components such as REST Proxy, Schema Registry and KSQL (the streaming SQL engine for Kafka). If you want a high-level introduction and overview, there is a great 40-minute conference talk: “Introduction to Apache Kafka as Event-Driven Open Source Streaming Platform”.
What does your working routine look like? What is the most exciting part of being a Tech Evangelist?
As a Technology Evangelist, I have two main tasks in my daily job:
1) Work with customers to discuss architectures, projects, and combinations of different (cutting-edge and legacy) technologies, and
2) Give public talks and webinars and write articles. I focus on cutting-edge technologies such as Apache Kafka and its open source ecosystem, machine learning frameworks such as TensorFlow, Internet of Things technologies such as MQTT, container technologies such as Docker and Kubernetes, and modern architectures leveraging microservices or serverless.
As part of preparing talks and demos, I also build small side projects on GitHub, e.g. for running deep learning models built with TensorFlow, DeepLearning4J or H2O within a Kafka Streams application (https://github.com/kaiwaehner/kafka-streams-machine-learning-examples), or for end-to-end integration from MQTT devices to Kafka clusters in hybrid scenarios (on premise and public cloud) using KSQL and Confluent Replicator (https://github.com/kaiwaehner/ksql-udf-deep-learning-mqtt-iot).
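To make that first pattern concrete, here is a minimal Python sketch of the same consume-predict-produce idea (the linked repositories implement it with Kafka Streams in Java); the topic names, payload format, and model file are illustrative assumptions, not taken from the projects themselves:

```python
# Minimal sketch: score events from a Kafka topic with a pre-trained
# TensorFlow (Keras) model and publish predictions to another topic.
import json

import numpy as np
import tensorflow as tf
from confluent_kafka import Consumer, Producer

model = tf.keras.models.load_model("model.h5")  # pre-trained model (assumption)

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "dl-scoring",
    "auto.offset.reset": "earliest",
})
producer = Producer({"bootstrap.servers": "localhost:9092"})
consumer.subscribe(["sensor-events"])  # input topic (assumption)

while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    # Assumed payload shape: {"features": [f1, f2, ...]}
    features = np.array([json.loads(msg.value())["features"]])
    score = float(model.predict(features, verbose=0)[0][0])
    producer.produce("predictions", json.dumps({"score": score}))  # output topic (assumption)
    producer.poll(0)  # serve delivery callbacks
```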
What are the pros and cons of this technology?
Apache Kafka and its open source ecosystem are present in almost every big company. Kafka evolved from a scalable, high-throughput messaging layer into a much more powerful streaming platform. Use cases started with big data log analytics feeding Hadoop for batch processing, but now include mission-critical deployments for payments, real-time fraud detection, logistics, and predictive maintenance. Kafka is everywhere, and its ecosystem gets stronger every month.
Who is it that could use this technology?
Kafka is used by companies such as LinkedIn (processing over 4.5 trillion messages per day), Netflix (processing 6 petabytes of data per day at peak times), and almost every other tech giant. But traditional companies such as banks, telcos, retailers, and automotive manufacturers also increasingly use Kafka as the central nervous system for their most critical and innovative projects.
Kafka is not just used for high throughput and scalability; it also decouples systems and applications well. This allows you to build microservice infrastructures without tight coupling, something that was not possible before, even with tools like an Enterprise Service Bus (ESB) or other integration frameworks that promised similar capabilities. My blog post “Apache Kafka vs. Enterprise Service Bus (ESB)—Friends, Enemies, or Frenemies?” (https://www.confluent.io/blog/apache-kafka-vs-enterprise-service-bus-esb-friends-enemies-or-frenemies/) goes into much more detail here.
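As a minimal illustration of that decoupling (the topic name and payload are hypothetical), the producing service below only knows the topic, never its consumers:

```python
# Minimal sketch of the decoupling Kafka provides: the producer publishes
# an event to a topic; payment, shipping, and analytics services can each
# consume it independently, at their own pace, without the producer
# knowing they exist.
import json

from confluent_kafka import Producer

producer = Producer({"bootstrap.servers": "localhost:9092"})
producer.produce("orders", json.dumps({"order_id": 42, "amount": 99.90}))
producer.flush()  # block until the broker confirms delivery
```

Adding a new downstream service is then just adding another consumer group on the same topic; no existing service has to change.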
What’s your and your company’s day-to-day commitment to Kafka?
At Confluent, we fix critical bugs, add new features (such as exactly-once semantics) and security standards (such as OAuth recently), and build a whole ecosystem with many new components (such as KSQL for scalable stream processing without writing source code). We also work closely with the Apache Kafka open source community on the Kafka mailing list, via our community Slack channel (https://launchpass.com/confluentcommunity), in meetups all over the world, and at conferences such as Kafka Summit, where you can listen to Kafka committers from Confluent as well as speakers from companies like LinkedIn, Apple, Uber, Zalando, Google, and many more.
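On the client side, exactly-once semantics is exposed through transactional producers. Here is a minimal Python sketch using the confluent-kafka client; the broker address, transactional.id, and topic are assumptions for illustration:

```python
# Minimal sketch of exactly-once semantics from the producer side:
# records in a transaction become visible to consumers atomically.
from confluent_kafka import Producer

producer = Producer({
    "bootstrap.servers": "localhost:9092",
    "transactional.id": "payments-producer-1",  # enables idempotence + transactions
})

producer.init_transactions()
producer.begin_transaction()
try:
    producer.produce("payments", b"debit:100")
    producer.produce("payments", b"credit:100")
    producer.commit_transaction()  # both records committed together
except Exception:
    producer.abort_transaction()  # neither record is exposed to consumers
```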
Why should people be interested in Kafka?
As mentioned above, Kafka is cutting-edge technology, but it is also used in many critical projects today. KSQL is a game changer: it allows people with less programming experience to build powerful stream processing applications on top of Apache Kafka with just SQL-like code; no Java source code is needed. KSQL also offers a REST interface, so data engineers and developers can use it from non-JVM languages such as Python, Go, or any other REST-based tooling. While KSQL is easy to use, you can build powerful streaming use cases with it, including streaming ETL, real-time dashboards, and anomaly detection. Best of all, it is built natively on Apache Kafka, with all of Kafka's benefits such as high scalability, high-volume throughput, and failover. You can deploy KSQL queries for continuous processing and scale them to millions of messages per second, with high availability and zero data loss.
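As a short sketch of that REST interface from Python, a KSQL statement can be submitted to the server's /ksql endpoint; the server URL, stream names, and query here are hypothetical:

```python
# Minimal sketch: submit a KSQL statement over the REST API. It creates a
# continuously running stream that filters high sensor readings.
import requests

statement = """
CREATE STREAM anomalies AS
  SELECT sensor_id, reading
  FROM sensor_events
  WHERE reading > 100;
"""

resp = requests.post(
    "http://localhost:8088/ksql",  # default KSQL server port (assumption)
    headers={"Content-Type": "application/vnd.ksql.v1+json; charset=utf-8"},
    json={"ksql": statement, "streamsProperties": {}},
)
print(resp.status_code, resp.json())
```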
What about Kafka and the hot topic “Machine Learning”?
At Codemotion Berlin, my talk will be about combining Apache Kafka and machine learning to build a scalable infrastructure for analytic models. This includes ingestion, preprocessing, training, deployment, and monitoring of analytic models. This is a huge challenge for most companies: you cannot simply deploy some Python code into production and expect 24/7 availability and good performance. You need the right infrastructure for the whole ML process, and this is where the Kafka ecosystem shines, which makes it a perfect combination. See my blog post “How to Build and Deploy Scalable Machine Learning in Production with Apache Kafka” (https://www.confluent.io/blog/build-deploy-scalable-machine-learning-production-apache-kafka/).
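To sketch just the ingestion-and-training step of that process in Python (the topic, payload shape, and model are illustrative assumptions), a training job can consume labeled events straight from Kafka and hand the resulting model off to a scoring application like the one shown earlier:

```python
# Minimal sketch: collect a batch of labeled events from a Kafka topic
# and fit a small TensorFlow model on them.
import json

import numpy as np
import tensorflow as tf
from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "training-ingest",
    "auto.offset.reset": "earliest",
})
# Assumed topic of records like {"features": [...], "label": 0 or 1}
consumer.subscribe(["labeled-events"])

features, labels = [], []
while len(features) < 10_000:  # collect a training batch
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    record = json.loads(msg.value())
    features.append(record["features"])
    labels.append(record["label"])

model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.fit(np.array(features), np.array(labels), epochs=5)
model.save("model.h5")  # hand off to the scoring application
```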
Interested in Kafka, Big Data, and Machine Learning? Join us at Codemotion Berlin and don’t miss the opportunity to deepen your knowledge of these topics with Kai Wähner on November 20-21!