FBP capabilities (NiFi) in real-time stream processing

Bobby Harsono

Mar 6, 2018, 11:27:16 PM
to Flow Based Programming
Good day,

I would like to know about the "safety" and "reliability" of FBP frameworks in real-time processing. I will ask about Apache NiFi here, since that is the framework I am mainly learning.

I'm thinking of a simple messaging application:
  • One processor accepts messages and does a simple mapping of their attributes against the database, e.g. if message_body.startsWith("123"), then get the data from a table with ID = 123 and add the value as a new attribute
  • One processor acts as a router
  • Two processors act as dispatchers
As a simple illustration:

|RECEIVER| -- |ROUTER| -- |DISPATCHER A|
                       -- |DISPATCHER B|
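
For illustration, the flow above could be sketched in plain Java roughly like this. It is only a sketch and uses no NiFi API: the Message class, the "lookup.value" attribute name, and the rule that enriched messages go to dispatcher A while everything else goes to B are all assumptions made up for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java sketch of RECEIVER -> ROUTER -> DISPATCHER A/B.
// No NiFi API is used; every name here is hypothetical.
class MessagePipelineSketch {

    // A message with a body and free-form attributes, loosely like a FlowFile.
    static final class Message {
        final String body;
        final Map<String, String> attributes = new HashMap<>();

        Message(String body) {
            this.body = body;
        }
    }

    // RECEIVER step: if the body starts with a known ID, attach the looked-up value.
    // If several IDs are prefixes of the body, whichever is iterated first wins.
    static Message enrich(Message msg, Map<String, String> lookupById) {
        for (Map.Entry<String, String> entry : lookupById.entrySet()) {
            if (msg.body.startsWith(entry.getKey())) {
                msg.attributes.put("lookup.value", entry.getValue());
                break;
            }
        }
        return msg;
    }

    // ROUTER step (assumed rule): enriched messages go to "A", the rest to "B".
    static String route(Message msg) {
        return msg.attributes.containsKey("lookup.value") ? "A" : "B";
    }

    public static void main(String[] args) {
        Map<String, String> tableA = new HashMap<>();
        tableA.put("123", "customer-123"); // stand-in for a database row

        Message msg = enrich(new Message("123 hello"), tableA);
        System.out.println("dispatcher " + route(msg) + ", attributes " + msg.attributes);
    }
}
```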


Now my questions are:
  1. What is the best practice / proper way to do database mapping with complex criteria? In my experience, the best way to do this in Java is to load the data from a specific table (let's say Table A) into a collection (a Map) on a regular schedule (every 5 minutes or so). The reason I do this is to cut the connection overhead when querying: imagine having to open a connection, run the query, and close the connection for every single message. I've tested this approach and it was much faster, even with connection pooling. The downside is that it needs memory, and the amount grows along with my database size. I use MySQL and haven't tested an in-memory database; I just want to know the common, fast approach for this. If there is no other way, I might create a standalone module using Spring Boot to handle this and have it communicate with the NiFi processor over a REST API.
  2. Is it OK to use one flow file for one message?
  3. Is it possible and recommended to use a dependency injection library like Spring?
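
To make the approach in question 1 concrete, here is a minimal sketch of a Map that is reloaded on a fixed schedule. It is plain JDK code, not a NiFi or Spring API: the class name RefreshingLookupCache and the Supplier standing in for the real "load Table A" query are assumptions for the example.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical helper: keeps an in-memory snapshot of a lookup table and
// replaces it on a fixed schedule, so per-message lookups never hit the database.
class RefreshingLookupCache implements AutoCloseable {
    private final Supplier<Map<String, String>> loader;
    private volatile Map<String, String> snapshot = Collections.emptyMap();
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "lookup-refresh");
                t.setDaemon(true); // do not keep the JVM alive just for refreshing
                return t;
            });

    RefreshingLookupCache(Supplier<Map<String, String>> loader, long period, TimeUnit unit) {
        this.loader = loader;
        refresh(); // first load is synchronous so lookups work immediately
        scheduler.scheduleAtFixedRate(this::refreshQuietly, period, period, unit);
    }

    // Loads a fresh copy and swaps it in atomically; readers never see a half-built map.
    void refresh() {
        snapshot = Collections.unmodifiableMap(new HashMap<>(loader.get()));
    }

    // An uncaught exception would cancel all future scheduled runs,
    // so on failure the previous snapshot is simply kept.
    private void refreshQuietly() {
        try {
            refresh();
        } catch (RuntimeException e) {
            // keep serving the old snapshot
        }
    }

    String get(String id) {
        return snapshot.get(id);
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }

    public static void main(String[] args) {
        // The lambda stands in for something like "SELECT id, value FROM table_a".
        try (RefreshingLookupCache cache = new RefreshingLookupCache(
                () -> Collections.singletonMap("123", "customer-123"), 5, TimeUnit.MINUTES)) {
            System.out.println(cache.get("123"));
        }
    }
}
```

The memory cost mentioned above is visible here: each refresh briefly holds both the old and the new copy of the table.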

Thank you,


Bobby

toivo...@gmail.com

Mar 8, 2018, 6:28:12 AM
to Flow Based Programming
Hi Bobby,

You probably already know this, but you could use NiFi's QueryDatabaseTable processor for incremental reading.
NiFi database processors use connection pooling, so getting a new connection is pretty fast.
Database Extract with NiFi
https://www.batchiq.com/database-extract-with-nifi.html

2. Is it OK to use one flow file for one message?
It depends. Often one FlowFile per message is OK.
But handling a FlowFile has some overhead, so if you need blazing speed, combining several messages into one FlowFile may be a better alternative.
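
As a rough sketch of what "combining several messages" means, the snippet below packs messages into newline-delimited payloads and unpacks them again. It is plain Java rather than NiFi code (in NiFi itself a processor such as MergeContent would normally do the merging); the class name MessageBatcher and the assumption that no message contains a newline are only for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: packs several messages into one newline-delimited payload,
// the way several messages could share one FlowFile. Assumes no message contains '\n'.
class MessageBatcher {

    // Groups messages into payloads holding at most batchSize messages each.
    static List<String> toBatches(List<String> messages, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<String> batches = new ArrayList<>();
        for (int i = 0; i < messages.size(); i += batchSize) {
            int end = Math.min(i + batchSize, messages.size());
            batches.add(String.join("\n", messages.subList(i, end)));
        }
        return batches;
    }

    // Downstream steps must split the payload again, which is the extra
    // complexity paid for the lower per-FlowFile overhead.
    static List<String> fromBatch(String batch) {
        return Arrays.asList(batch.split("\n"));
    }

    public static void main(String[] args) {
        List<String> batches = toBatches(Arrays.asList("123 a", "456 b", "789 c"), 2);
        for (String batch : batches) {
            System.out.println(fromBatch(batch));
        }
    }
}
```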

3. Is it possible and recommended to use dependency injection library like Spring?
I'm not sure I understand your question: how and why do you want to use dependency injection?

You can always write your own custom processors that handle your data the way you need.

Thank you
Toivo

Paul Morrison

Jun 5, 2018, 11:17:21 AM
to flow-based-...@googlegroups.com, toivo...@gmail.com
Re Toivo's #2, given that NiFi FlowFiles correspond to classical FBP's Information packets, you might find the discussion about FBP "tree" structures relevant - http://www.jpaulmorrison.com/fbp/tree.shtml .

I agree with what Toivo seems to be saying: in general, I feel the added complexity, and shift of viewpoint, caused by stepping through multiple messages within a single FlowFile, is not warranted unless processing speed outweighs other considerations, such as ease of maintenance.
