Kafka Vs NiFi


singh

Mar 30, 2016, 3:00:13 PM
to Confluent Platform
In our company we have a simple requirement: collecting server logs into Hadoop in real time.
I am having a tough time justifying Kafka over NiFi. From what little I know and have seen, both are equally good options for my simple case.
So I would like to hear differing opinions, if anybody can justify one over the other.

Stuart Wong

Apr 4, 2016, 10:58:54 AM
to Confluent Platform
From the little I know about NiFi:

- push vs. pull: with NiFi, you tell it where to pull data from each source and where to push data to each destination. With Kafka, you provide a pipeline or hub, so on the source side each client (producer) must push its data, while on the output side each client (consumer) pulls its data. That means a difference in responsibilities and management: Kafka gives you a hub, a central place to get data, whereas with NiFi you're still getting data from multiple places and using NiFi to manage it (you'd likely run into scale issues with the management, end up with multiple NiFi deployments, and lose some visibility into data flows). Bear in mind that with Kafka you can do Spark Streaming today and Kafka Streams soon, and you also have Kafka Connect, so Kafka can push or pull out of the box while looking like less management than NiFi.

- performance: on the face of it, Kafka appears to be more performant and should scale better via multiple brokers and partitions than NiFi's clustering, though I have no first-hand data since I've only read about NiFi.
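
To illustrate the "scale via partitions" point, here is a minimal sketch of how keyed records can be spread over partitions with a stable hash. It is a toy model under stated assumptions, not Kafka's client API: `NUM_PARTITIONS`, `partition_for`, and `spread` are hypothetical names, and `crc32` is only a stand-in for the hash that Kafka's own partitioner uses.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical partition count for one topic


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a record key to a partition with a stable hash.

    crc32 is a stand-in here; Kafka's own default partitioner uses a
    different hash (murmur2) for keyed records.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def spread(records):
    """Group (key, value) records by the partition they would land on."""
    partitions = defaultdict(list)
    for key, value in records:
        partitions[partition_for(key)].append(value)
    return dict(partitions)


if __name__ == "__main__":
    # Hypothetical server names used as keys.
    logs = [("web-01", "GET /"), ("web-02", "GET /a"), ("web-01", "POST /b")]
    for partition, values in sorted(spread(logs).items()):
        print(partition, values)
```

Records with the same key always land on the same partition and keep their relative order there, which is why adding partitions (and brokers to host them) spreads load without reordering any single server's logs.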

NiFi does have the edge in terms of out-of-the-box security and in providing a GUI to handle the data flows. There are GUIs for Kafka, just none for data flows that I've seen publicly available yet.

I think it comes down to your requirements. If you're looking for more of a streaming pipeline or data hub, then for me Kafka wins, especially at scale. If your needs are smaller, then NiFi would seem to be a better fit. Neither choice is a knock against the other, though; again, it's about requirements and ticking those boxes ;-)

Gwen Shapira

Apr 4, 2016, 11:46:21 AM
to confluent...@googlegroups.com
Thanks Stuart.

One thing I'm wondering about is NiFi's HA capabilities. It used to store data between stages on the local node, so if a node went down you'd lose (or at least delay) a bunch of data. Do you know if they fixed this issue?

Gwen

--
You received this message because you are subscribed to the Google Groups "Confluent Platform" group.
To unsubscribe from this group and stop receiving emails from it, send an email to confluent-platf...@googlegroups.com.
To post to this group, send email to confluent...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/confluent-platform/7a8c076e-291a-4ce6-bac7-89f27507d7b7%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

Lars Francke

Apr 4, 2016, 6:08:01 PM
to confluent...@googlegroups.com
Hi Gwen,

The NiFi folks are still working on a better HA story: <https://cwiki.apache.org/confluence/display/NIFI/High+Availability+Processing>

Currently it's still the way you remember it.

Cheers,
Lars

Joe Witt

Apr 5, 2016, 1:41:47 PM
to Confluent Platform
Happy to reply on behalf of NiFi, but I want to be respectful of the fact that this is Confluent's forum.

Gwen, do you mind if I reply?



Ewen Cheslack-Postava

Apr 5, 2016, 5:06:19 PM
to Confluent Platform
Joe,

Please do!

-Ewen

--
Thanks,
Ewen

Joe Witt

Apr 6, 2016, 2:16:05 AM
to Confluent Platform
Both Apache NiFi and Apache Kafka provide a broker to connect producers and consumers but they do so in a way that is quite different from one another and complementary when looking holistically at what it takes to connect the enterprise.

In thinking about the 'data plane of connecting systems', the approach with Kafka is to have collaborating producers and consumers agree to exchange information on a specified topic using Kafka's protocol, exchanging data in a format and schema which both parties agree to and understand. With NiFi, by comparison, we take the view that many enterprises have tremendous diversity of systems, protocols, formats, schemas, and priorities, and yet we still have to connect them. In my view, and in very simplistic terms, Kafka's model is optimal when you can fix the majority of those terms of agreement between systems, whereas NiFi's model is optimal when you cannot.

We should also think about the 'control plane of connecting systems', and by this I mean how we actually manage and control the flow of data. With Kafka, the logic of the dataflow lives in the systems that produce data and the systems that consume data. With NiFi, we wanted to decouple the producers and consumers further and allow as much of the dataflow logic as possible or desired to live in the broker itself. This is why NiFi has interactive command and control to effect immediate change, and why NiFi offers the processor API to operate on, alter, and route the data streams as they flow. It is also why NiFi provides powerful back-pressure and congestion control features. The model NiFi offers means you have a point of central control with distributed execution, where you can address cross-cutting concerns and tackle things like compliance checks and tracking, which you would not want on the producers and consumers.
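
As a rough illustration of the back-pressure idea mentioned above, here is a minimal sketch of a bounded queue between two processing steps. It is a toy model and not NiFi's API: `Connection`, `threshold`, `run_upstream`, and `run_downstream` are hypothetical names chosen for this example.

```python
from collections import deque


class Connection:
    """A bounded queue between two processing steps (toy model)."""

    def __init__(self, threshold: int):
        self.threshold = threshold  # hypothetical back-pressure limit
        self.items = deque()

    def is_full(self) -> bool:
        return len(self.items) >= self.threshold


def run_upstream(source, conn: Connection) -> int:
    """Move items from source into conn until the threshold is reached.

    Returns how many items were moved; the rest stay at the source.
    """
    moved = 0
    while source and not conn.is_full():
        conn.items.append(source.popleft())
        moved += 1
    return moved


def run_downstream(conn: Connection, batch: int) -> list:
    """Drain up to `batch` items, which frees room for upstream again."""
    drained = []
    while conn.items and len(drained) < batch:
        drained.append(conn.items.popleft())
    return drained


if __name__ == "__main__":
    source = deque(f"event-{i}" for i in range(10))
    conn = Connection(threshold=4)
    print(run_upstream(source, conn))      # upstream stops once conn is full
    print(run_downstream(conn, batch=2))   # downstream frees two slots
    print(run_upstream(source, conn))      # upstream resumes after the drain
```

The point of the sketch is that the slow consumer sets the pace: the upstream step stops being fed work once the queue reaches its threshold, so congestion is held at the source instead of piling up inside the flow.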

There are of course many other aspects to discuss but sticking to the ideas raised in the thread so far here is a response for a few of them.

'Push vs Pull'

  In Kafka, producers push to Kafka and consumers pull from Kafka. This is a clean and scalable model, but again it requires systems to accept and adopt that protocol. In NiFi we do not require a specific protocol. We support both push and pull patterns for getting data into NiFi, just as we do for getting data out. There are great architectural reasons to strive for the convergence that Kafka promotes, and very practical realities of connecting systems across the enterprise that NiFi is designed to accommodate.
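
To make the push/pull split concrete, here is a minimal in-memory sketch of the Kafka-style model: producers append to a log, and each consumer pulls from its own offset. This is a toy model under stated assumptions, not the Kafka client API; `TopicLog`, `Consumer`, `push`, `pull`, and `poll` are hypothetical names for this example.

```python
class TopicLog:
    """Append-only in-memory log standing in for one topic partition."""

    def __init__(self):
        self._records = []

    def push(self, record: str) -> int:
        """Producer side: append a record and return its offset."""
        self._records.append(record)
        return len(self._records) - 1

    def pull(self, offset: int, max_records: int = 10) -> list:
        """Consumer side: read records starting at the given offset."""
        return self._records[offset:offset + max_records]


class Consumer:
    """Tracks its own read position, independent of other consumers."""

    def __init__(self, log: TopicLog):
        self.log = log
        self.offset = 0

    def poll(self, max_records: int = 10) -> list:
        batch = self.log.pull(self.offset, max_records)
        self.offset += len(batch)
        return batch


if __name__ == "__main__":
    log = TopicLog()
    for line in ("line-1", "line-2", "line-3"):
        log.push(line)
    fast, slow = Consumer(log), Consumer(log)
    print(fast.poll())                # reads everything available
    print(slow.poll(max_records=1))   # reads at its own pace
    print(slow.poll())                # picks up where it left off
```

Because each consumer only advances its own offset, the log does not need to know who is reading or how fast, which is part of what makes the model scale; the cost is that every reader and writer must speak the log's protocol.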

'HA'

  On the data plane, NiFi does not offer distributed data durability today as Kafka does. As Lars pointed out, the NiFi community is adding distributed durability, but it will be less vital for NiFi's use cases than it is for Kafka, as NiFi isn't holding the data for the arbitrary consumer pattern that Kafka supports. If a NiFi node goes down, the data on it is delayed while the node is down. Avoiding data loss, though, is easily solved thanks to tried and true RAID or distributed block storage. NiFi's control plane does already provide high availability: the cluster manager and even multiple nodes in a cluster can be lost while the live flow continues operating normally.

'Performance'

  Kafka offers an impressive balance of both high throughput and low latency.  But comparing performance of Kafka and NiFi is not very meaningful given that they do very different things.  It would be best to discuss performance tradeoffs in the context of a particular use case.

Thanks
Joe

James Cheng

Apr 6, 2016, 4:23:10 PM
to confluent...@googlegroups.com
Joe,

This is a fantastic writeup. As someone who knows Kafka but very little about NiFi, I found this very informative.

Thanks!
-James






pamnani learning

Aug 11, 2020, 11:24:40 AM
to Confluent Platform

Hi Joe,

I found the Kafka vs. NiFi information very helpful. How should we see NiFi as different from Kafka Connect? Kafka Connect also provides connectors, transformations, and converters.

Thanks