-#+date: 2018-09-11 21:09:06 +0800
+#+date: <2018-09-11 21:09:06 +0800>
#+filetags: Apache Nifi Kafka bigdata streaming
#+title: Using Apache Nifi and Kafka - big data tools
Working in analytics these days, the concept of big data has been firmly
established. Smart engineers have been developing cool technology to
-work with it for a while now. The [[Apache Software
-Foundation|https://apache.org]] has emerged as a hub for many of these -
+work with it for a while now. The [[https://apache.org][Apache Software Foundation]] has emerged as a hub for many of these -
Ambari, Hadoop, Hive, Kafka, Nifi, Pig, Zookeeper - the list goes on.
While I'm mostly interested in improving business outcomes applying
Over the past few weeks, I have been exploring some tools, installing
them on my laptop or a server and giving them a spin. Thanks to
-[[Confluent, the founders of Kafka|https://www.confluent.io]] it is
+[[https://www.confluent.io][Confluent, the founders of Kafka]] it is
super easy to try out Kafka, Zookeeper, KSQL and their REST API. They
all come in a pre-compiled tarball which just works on Arch Linux.
(After trying to compile some of these, this is no luxury - these apps
#+END_SRC
I also spun up an instance of
-[[nifi|https://nifi.apache.org/download.html]], which I used to monitor
+[[https://nifi.apache.org/download.html][nifi]], which I used to monitor
a (json-ised) apache2 webserver log. Every new line added to that log
goes as a message to Kafka.
-[[Apache Nifi configuration|/pics/ApacheNifi.png]]
+#+CAPTION: Apache Nifi configuration
+#+ATTR_HTML: :class img-fluid :alt Apache Nifi configuration
+[[file:../assets/ApacheNifi.png]]
A processor monitoring a file (tailing) copies every new line over to
another processor publishing it to a Kafka topic. The Tailfile monitor