Both Apache NiFi and Apache Kafka provide a broker to connect producers and consumers, but they do so in quite different ways, and the two are complementary when you look holistically at what it takes to connect the enterprise.
In thinking about the 'data plane of connecting systems', Kafka's approach is to have collaborating producers and consumers agree to exchange information on a specified topic, using Kafka's protocol and a data format and schema that both parties agree on and understand. With NiFi, by comparison, we take the view that many enterprises have tremendous diversity of systems, protocols, formats, schemas, and priorities, and yet we still have to connect them all. In my view, and in very simplistic terms, Kafka's model is optimal when you can fix most of those terms of agreement between systems, whereas NiFi's model is optimal when you cannot.
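To make those 'terms of agreement' concrete, here is a minimal sketch of the producer side using Kafka's standard Java client. The broker address, topic name, and payload are placeholders for illustration; the point is simply that every consumer must agree on the topic and the serialization up front.

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class PushSideExample {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address
            // Both sides must agree on serialization; here a plain String payload.
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // The topic name and record format are a shared contract with every consumer.
                producer.send(new ProducerRecord<>("sensor-readings", "sensor-42", "{\"temp\": 21.5}"));
            }
        }
    }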
We should also think about the 'control plane of connecting systems', by which I mean how we actually manage and control the flow of data. With Kafka, the logic of the dataflow lives in the systems that produce data and the systems that consume it. With NiFi, we wanted to decouple producers and consumers further and allow as much of the dataflow logic as possible, or as desired, to live in the broker itself. This is why NiFi has interactive command and control to effect immediate change, and why NiFi offers the processor API to operate on, alter, and route the data streams as they flow. It is also why NiFi provides powerful back-pressure and congestion control features. NiFi's model gives you a point of central control with distributed execution, where you can address cross-cutting concerns and tackle things like compliance checks and tracking that you would not want to push onto the producers and consumers.
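As a rough sketch of what it looks like for dataflow logic to live in the broker, here is a trivial processor written against NiFi's processor API. The 'priority' attribute and the relationship names are hypothetical; a real processor would also declare property descriptors, documentation, and so on.

    import java.util.Set;
    import org.apache.nifi.flowfile.FlowFile;
    import org.apache.nifi.processor.AbstractProcessor;
    import org.apache.nifi.processor.ProcessContext;
    import org.apache.nifi.processor.ProcessSession;
    import org.apache.nifi.processor.Relationship;
    import org.apache.nifi.processor.exception.ProcessException;

    public class RouteOnPriority extends AbstractProcessor {

        static final Relationship REL_HIGH = new Relationship.Builder()
                .name("high").description("High priority flow files").build();
        static final Relationship REL_NORMAL = new Relationship.Builder()
                .name("normal").description("Everything else").build();

        @Override
        public Set<Relationship> getRelationships() {
            return Set.of(REL_HIGH, REL_NORMAL);
        }

        @Override
        public void onTrigger(ProcessContext context, ProcessSession session) throws ProcessException {
            FlowFile flowFile = session.get();
            if (flowFile == null) {
                return;
            }
            // Route based on an attribute an upstream processor is assumed to have set.
            String priority = flowFile.getAttribute("priority");
            session.transfer(flowFile, "high".equals(priority) ? REL_HIGH : REL_NORMAL);
        }
    }

Because that logic lives in the flow rather than in the endpoints, operators can adjust it interactively without touching the producing or consuming systems.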
There are of course many other aspects to discuss, but sticking to the ideas raised in the thread so far, here are responses to a few of them.
'Push vs Pull'
In Kafka, producers push to Kafka and consumers pull from Kafka. This is a clean and scalable model, but again it requires systems to accept and adopt that protocol. In NiFi we do not require a specific protocol; we support both push and pull patterns for getting data into NiFi, just as we do for getting data out. There are great architectural reasons to strive for the convergence that Kafka promotes, and there are very practical realities of connecting systems across the enterprise that NiFi is designed to accommodate.
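To illustrate the pull side, here is a sketch using Kafka's standard Java consumer client; the broker address, group id, and topic are again placeholders. The consumer drives the pace by polling, and that is the pattern a system must adopt in order to participate.

    import java.time.Duration;
    import java.util.List;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class PullSideExample {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address
            props.put("group.id", "example-group");           // placeholder consumer group
            props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(List.of("sensor-readings")); // placeholder topic
                while (true) {
                    // The consumer pulls at its own pace; the broker never pushes.
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                    for (ConsumerRecord<String, String> record : records) {
                        System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
                    }
                }
            }
        }
    }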
'HA'
On the data plane, NiFi does not offer distributed data durability today as Kafka does. As Lars pointed out, the NiFi community is adding distributed durability, but it will be less vital for NiFi's use cases than it is for Kafka, because NiFi isn't holding data for the arbitrary-consumer pattern that Kafka supports. If a NiFi node goes down, the data on that node is delayed until the node comes back. Avoiding data loss, though, is readily solved with tried-and-true RAID or distributed block storage. NiFi's control plane already provides high availability: the cluster manager, and even multiple nodes in a cluster, can be lost while the live flow continues operating normally.
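As a concrete sketch of that last point, the standard nifi.properties repository settings can simply be pointed at RAID-backed or replicated block storage volumes; the mount paths here are purely illustrative.

    # Point NiFi's repositories at durable storage (paths are illustrative)
    nifi.flowfile.repository.directory=/mnt/raid1/flowfile_repository
    nifi.content.repository.directory.default=/mnt/raid1/content_repository
    nifi.provenance.repository.directory.default=/mnt/raid1/provenance_repository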
'Performance'
Kafka offers an impressive balance of high throughput and low latency. But comparing the performance of Kafka and NiFi is not very meaningful given that they do very different things. It would be best to discuss performance tradeoffs in the context of a particular use case.
Thanks
Joe