Apache NiFi - a newly proposed ASF incubator project - based on FBP


Joe Witt

Nov 22, 2014, 9:27:08 AM
to flow-based-...@googlegroups.com
Flow-based Programming Community,

I wanted to make you aware of a newly proposed Apache Software Foundation (ASF) project we're calling Apache NiFi.

The project from its very foundation was based on the core concepts of flow-based programming which we've found to be an extremely powerful set of simple abstractions for building a general purpose processing platform which we've heavily used in the dataflow/system integration problem space.

If the ASF community approves we'll have the code out there for folks to review, critique, and contribute to very soon.  We hope to build a strong open source community around this FBP technology.  We'll be working hard to ramp up the documentation and interaction with the community in the coming weeks and months.

I would really like to talk with other FBP enthusiasts to ensure we're doing right by the FBP ideals.  I think there are some elements we're missing at this point which we should consider adding, and I look forward to hearing your thoughts on this.

Also, as someone who has long been following the FBP community but has been unable to collaborate, I wanted to say thanks.  I have seen a notable increase in FBP activity over the last couple of years, including the very awesome work the NoFlo team and others are doing.  I think FBP is really starting to hit its stride.

Thanks
Joe

Joe Witt

Nov 22, 2014, 9:28:06 AM
to flow-based-...@googlegroups.com
And the link to the proposal: http://wiki.apache.org/incubator/NiFiProposal

John Cowan

Nov 22, 2014, 10:12:45 AM
to flow-based-...@googlegroups.com
Joe Witt scripsit:

> The project from its very foundation was based on the core concepts of
> flow-based programming which we've found to be an extremely powerful set of
> simple abstractions for building a general purpose processing platform
> which we've heavily used in the dataflow/system integration problem space.

Is this a classical FBP system, or a reactive-FBP system like NoFlo?

--
John Cowan http://www.ccil.org/~cowan co...@ccil.org
Almost all theorems are true, but almost all proofs have bugs.
--Paul Pedersen

Ged Byrne

Nov 22, 2014, 10:40:59 AM
to flow-based-...@googlegroups.com
Hi Joe,

The proposal says it is based on Niagara Files. Is that anything to do with Niagara AX?

Regards,



Ged
--
You received this message because you are subscribed to the Google Groups "Flow Based Programming" group.
To unsubscribe from this group and stop receiving emails from it, send an email to flow-based-programming+unsub...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Joe Witt

Nov 22, 2014, 10:47:03 AM
to flow-based-...@googlegroups.com, co...@mercury.ccil.org
John

I can't honestly say I appreciate the distinction between classical and reactive FBP.  Can you describe that more or point to a previous discussion which helps distinguish?  I know of http://www.reactivemanifesto.org/ and if that is what is meant by reactive here then yes I think we're on that path.  Having said that I'm not sure the original writings or descriptions or ideals for FBP would have excluded these ideas.

We are strong on the foundations of:
- Information Packets (we call them 'Flow Files')
- Broker (we call this a 'Flow Controller')
- Black boxes (we call this a 'Flow File Processor')
- Resource constrained relationship (we call this a relationship but it supports prioritization and bounding)

We also have support for subnets which we call 'Process Groups' and those have input ports and output ports which establish the context for use.

Thanks
Joe

Joe Witt

Nov 22, 2014, 10:51:34 AM
to flow-based-...@googlegroups.com
Ged,

Hello.  No it does not.  We've done a complete name change to NiFi at this point.  The last reference we'll have made to Niagarafiles was just in that proposal and that is essentially to provide closure to that reference.

Thanks
Joe

John Cowan

Nov 22, 2014, 11:01:15 AM
to Joe Witt, flow-based-...@googlegroups.com
Joe Witt scripsit:

> I can't honestly say I appreciate the distinction between classical
> and reactive FBP. Can you describe that more or point to a previous
> discussion which helps distinguish?

In classical FBP, each component is free to read from an input port
whenever its logic says to, whereas in reactive FBP like NoFlo, a
component is involuntarily entered whenever input is available for it.
Consequently, reactive FBP can be implemented without multiple threads,
whereas classical FBP is inherently multi-threaded (or multi-process,
as in Unix pipelines).
In computer science, we stand on each other's feet. --Brian K. Reid
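To make John's distinction concrete, here is a minimal Python sketch (Python is used only as a neutral illustration; NiFi and JavaFBP are Java). All names are hypothetical. The classical component runs on its own thread and decides which input to read next; the reactive one is simply called back, on a single thread, in whatever order packets arrive.

```python
import queue
import threading

# Hypothetical names throughout; a minimal sketch of the two styles, not any
# framework's API. None is used as an end-of-stream marker.


def classical_concat(in_a: queue.Queue, in_b: queue.Queue, out: list) -> None:
    """Classical style: runs on its own thread and chooses which port to read.

    It drains port A completely before reading port B, even if B's data
    arrived first.
    """
    for port in (in_a, in_b):
        while (ip := port.get()) is not None:   # blocking receive
            out.append(ip)


class ReactiveRecorder:
    """Reactive style: the runtime calls on_packet whenever input arrives."""

    def __init__(self) -> None:
        self.seen: list = []

    def on_packet(self, port: str, ip: str) -> None:
        self.seen.append((port, ip))


def run_classical() -> list:
    in_a, in_b, out = queue.Queue(), queue.Queue(), []
    worker = threading.Thread(target=classical_concat, args=(in_a, in_b, out))
    worker.start()
    in_b.put("b1")                      # B's data arrives first...
    for ip in ("a1", "a2", None):
        in_a.put(ip)
    in_b.put(None)
    worker.join()
    return out                          # ...but the component reads it last


def run_reactive() -> list:
    comp = ReactiveRecorder()
    for port, ip in [("B", "b1"), ("A", "a1"), ("A", "a2")]:
        comp.on_packet(port, ip)        # one thread, strict arrival order
    return comp.seen
```

`run_classical()` returns `["a1", "a2", "b1"]` even though `b1` was sent first, while `run_reactive()` records packets strictly in arrival order.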

Joe Witt

Nov 22, 2014, 11:11:12 AM
to flow-based-...@googlegroups.com, joe....@gmail.com, co...@mercury.ccil.org
In that sense, then, we're definitely classical.  Each component chooses when to read from its input queue(s).  What we don't support is named input queues from the recipient process's point of view: while multiple incoming relationships can exist to a given process X, process X perceives itself as reading from a single queue.  If a given information packet needs a specific context, we'd expect that context to be included in the attributes/context of that IP.
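A minimal Python sketch of the single-queue approach described above, assuming each IP carries a dictionary of attributes. `FlowFile` here is an illustrative dataclass, not NiFi's actual API, and the `source` attribute is a hypothetical convention: several upstream relationships feed one queue, and the component recovers per-source context from the attributes rather than from port names.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class FlowFile:
    """Illustrative IP: opaque content plus key/value attributes (context)."""
    content: bytes
    attributes: dict = field(default_factory=dict)


def sum_bytes_by_source(work_queue: deque) -> dict:
    """Drain a single input queue, using a hypothetical 'source' attribute
    to recover which upstream relationship each IP came from."""
    totals: dict = {}
    while work_queue:
        ff = work_queue.popleft()
        source = ff.attributes.get("source", "unknown")
        totals[source] = totals.get(source, 0) + len(ff.content)
    return totals
```

For example, a queue holding `b"abc"` and `b"de"` tagged `sensor` plus `b"hello"` tagged `log` yields `{"sensor": 5, "log": 5}`.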

For the scheduling of processes we support a few different styles of scheduling:
- timer based
- event-driven
- cron-based

Timer-based, as you can imagine, just means the process will execute after a fixed delay from its last execution, subject to the availability of a thread in the controller-managed thread pool.

Event-driven means it will execute as soon as a new data item shows up in its work queue and a thread is available.  But it doesn't technically have to take data from the input queue; the process is simply given a thread with which to do something useful.

Cron-based is just a fancier version of timer-based, letting you be more precise about the time ranges in which you want to execute.

All of these modes support concurrency up to a configurable maximum per process, and all processes together are limited by the overall thread pool size of the controller.
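The scheduling model above can be sketched in Python, assuming a shared `ThreadPoolExecutor` stands in for the controller's thread pool and a semaphore enforces each process's own concurrency maximum. `Process`, `run_timer_driven`, and `run_event_driven` are hypothetical names; cron-based scheduling is omitted, and none of this reflects NiFi's actual implementation.

```python
import queue
import threading
import time
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Optional


class Process:
    """Hypothetical process wrapper: a work function plus a per-process cap."""

    def __init__(self, work: Callable, max_concurrent: int = 1):
        self.work = work
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def submit(self, pool: ThreadPoolExecutor, arg=None,
               wait_for_slot: bool = False) -> Optional[Future]:
        """Run work(arg) on the shared pool if this process has a free slot.

        Returns None when the process is already at its own maximum and
        wait_for_slot is False. The pool size bounds all processes together.
        """
        if not self._slots.acquire(blocking=wait_for_slot):
            return None
        return pool.submit(self._run, arg)

    def _run(self, arg):
        try:
            return self.work(arg)
        finally:
            self._slots.release()


def run_timer_driven(proc: Process, pool: ThreadPoolExecutor,
                     delay_s: float, cycles: int) -> list:
    """Fixed delay: wait delay_s after each run completes before the next."""
    results = []
    for _ in range(cycles):
        fut = proc.submit(pool)
        if fut is not None:
            results.append(fut.result())
        time.sleep(delay_s)
    return results


def run_event_driven(proc: Process, pool: ThreadPoolExecutor,
                     work_queue: queue.Queue) -> list:
    """Trigger one execution per queued item as soon as a slot frees up."""
    futures = []
    while True:
        try:
            item = work_queue.get_nowait()
        except queue.Empty:
            break
        futures.append(proc.submit(pool, item, wait_for_slot=True))
    return [f.result() for f in futures]
```

Submitting five slow runs at once to a process capped at two gets two futures and three `None`s, even though the shared pool itself has spare threads.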

All relationships between processes are queues, and those queues are dynamically prioritized and bounded by the number of items in the queue, by the total data size those items represent, or both.  A full queue causes back pressure: processes that feed it will not execute while the pressure exists, which in turn propagates the pressure upstream.  In addition, the queues support automatic expiration: if items in a queue become older than the configured expiration time, the system automatically purges them.
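A rough Python sketch of a bounded connection with count- and size-based back pressure and age-based expiration. The `Connection` class, `may_schedule`, and their parameters are hypothetical; prioritization is left out (the queue is plain FIFO), and the threshold is treated as a soft limit: a full connection stops upstream processes from being scheduled rather than rejecting writes.

```python
import time
from collections import deque
from typing import Callable, Optional


class Connection:
    """Hypothetical bounded FIFO connection between two processes."""

    def __init__(self, max_count: int = 10_000, max_bytes: int = 1 << 30,
                 expiration_s: Optional[float] = None,
                 clock: Callable[[], float] = time.monotonic):
        self.max_count = max_count
        self.max_bytes = max_bytes
        self.expiration_s = expiration_s
        self.clock = clock
        self._items: deque = deque()   # (enqueue_time, payload), oldest first
        self._bytes = 0

    def is_full(self) -> bool:
        """Back pressure: either the count or the byte threshold is reached."""
        self._expire()
        return len(self._items) >= self.max_count or self._bytes >= self.max_bytes

    def offer(self, payload: bytes) -> None:
        # Soft limit: writes are accepted; fullness only blocks scheduling.
        self._items.append((self.clock(), payload))
        self._bytes += len(payload)

    def poll(self) -> Optional[bytes]:
        self._expire()
        if not self._items:
            return None
        _, payload = self._items.popleft()
        self._bytes -= len(payload)
        return payload

    def _expire(self) -> None:
        """Purge items older than expiration_s (oldest are at the front)."""
        if self.expiration_s is None:
            return
        now = self.clock()
        while self._items and now - self._items[0][0] > self.expiration_s:
            _, payload = self._items.popleft()
            self._bytes -= len(payload)


def may_schedule(outgoing: list) -> bool:
    """An upstream process runs only if none of its outgoing connections is full."""
    return not any(conn.is_full() for conn in outgoing)
```

Because pressure is checked before scheduling, a full connection stalls its producer, whose own input then fills, and so on upstream.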

Finally, each queue is backed by a pluggable repository construct.  The default, out-of-the-box mode uses a file system and a write-ahead log to ensure durability of the data across process faults, system failures, etc.
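To illustrate the write-ahead idea, here is a minimal Python sketch of a durable queue that logs each change and forces it to disk before applying it, then rebuilds its state by replaying the log on restart. `WalQueueRepository` and its JSON-lines format are invented for illustration; a real repository would also checkpoint the log, handle a torn final record after a crash, and store content separately.

```python
import json
import os


class WalQueueRepository:
    """Hypothetical durable FIFO of item ids backed by a write-ahead log."""

    def __init__(self, path: str):
        self.path = path
        self.items: list = []
        self._replay()
        self._log = open(path, "a", encoding="utf-8")

    def _replay(self) -> None:
        """Rebuild in-memory state from the log after a restart or crash."""
        if not os.path.exists(self.path):
            return
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                self._apply(json.loads(line))

    def _apply(self, rec: dict) -> None:
        if rec["op"] == "enqueue":
            self.items.append(rec["id"])
        elif rec["op"] == "dequeue":
            self.items.remove(rec["id"])

    def _write(self, rec: dict) -> None:
        # Log first and force it to disk, then apply the change in memory.
        self._log.write(json.dumps(rec) + "\n")
        self._log.flush()
        os.fsync(self._log.fileno())
        self._apply(rec)

    def enqueue(self, item_id: str) -> None:
        self._write({"op": "enqueue", "id": item_id})

    def dequeue(self):
        if not self.items:
            return None
        item_id = self.items[0]
        self._write({"op": "dequeue", "id": item_id})
        return item_id

    def close(self) -> None:
        self._log.close()
```

Reopening the same path after enqueuing two ids and dequeuing one restores a queue holding only the second id.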

Thanks
Joe  


Joe Witt

Nov 22, 2014, 11:21:52 AM
to flow-based-...@googlegroups.com
Hello Ged,

Posting again because the last one seemed not to go through...

There is no relationship to the 'Niagara AX' product.  The Niagarafiles name we've been using for many years is being retired with the initiation of this proposal.  We've switched to just NiFi and specifically if accepted it will be 'Apache NiFi'.

Thanks
Joe


Paul Morrison

Nov 22, 2014, 12:17:01 PM
to flow-based-...@googlegroups.com, Joe Witt, Oleksandr Lobunets, John Cowan
Hi Joe,

It's great hearing from you, and seeing you join in the FBP conversation!  Also your timing is excellent, as Alex Lobunets and I have recently been revising the web page comparing NoFlo with "classical" FBP.  John sums it up well, but so many people have come into FBP via NoFlo that we thought it important to try to clarify the differences in more detail.  In fact your comment about "back pressure" reminded me that that concept should also be included!

The page in question is http://www.jpaulmorrison.com/fbp/noflo.html .

It occurs to me that this parallels in a way the history of OO, where they started off using the metaphor of message passing, but didn't implement it (for performance reasons?), and simply implemented indirect method calls via the class prototype, which of course is very different. I have actually appended a comparison of FBP and OO to the above-mentioned web page, for those interested in the history of computing!

I'm sure there will be objections, brickbats - but hopefully this will provoke some discussion, and we can refine it further!

Joe, thanks for the kind words, and I look forward to reading your web site in detail.

Best regards,

Paul

John Cowan

Nov 22, 2014, 12:58:02 PM
to flow-based-...@googlegroups.com, joe....@gmail.com
Joe Witt scripsit:

> In that sense then we're definitely classical. Each component chooses when
> to read from its input queue(s). What we don't support is named input
> queues from a recipient process sense. While multiple incoming
> relationships can exist to a given process X the perception process X has
> is of reading from a single queue. If there was a specific necessary
> context for a given information packet then we'd expect that to be included
> in the attributes/context of that IP.

I see; we could perhaps call this a semi-classical design. It makes
something like an alternating merge (one packet from here, then one from
there) impossible unless the component maintains its own internal queues.
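A small Python sketch of this point, assuming IPs arrive on one queue as `(source, payload)` pairs: to emit strictly alternating output, the component has to keep its own per-source buffers, and packets from a fast source sit there until the slow source catches up. The function name and tuple shape are hypothetical.

```python
from collections import deque


def alternating_merge(arrivals, sources=("A", "B")):
    """Emit one packet from each source in turn, given a single mixed queue.

    `arrivals` is the single input queue: (source, payload) pairs in arrival
    order. Because the component cannot choose which source to read next, it
    must buffer early packets internally until their turn comes.
    Returns (output, leftover buffers).
    """
    buffers = {s: deque() for s in sources}
    turn = 0
    out = []
    for source, payload in arrivals:
        buffers[source].append(payload)
        # Drain as far as strict alternation allows.
        while buffers[sources[turn]]:
            out.append(buffers[sources[turn]].popleft())
            turn = (turn + 1) % len(sources)
    return out, {s: list(b) for s, b in buffers.items()}
```

If two `B` packets arrive before any `A` packet, both are held internally; with two separate input ports the component could instead just receive from `A`, then `B`, and leave the rest in the connections.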

If you think about a program that opens multiple input files for various
purposes, but then gets the input from the different files intermingled,
so that you can't predict when you call read() just which file you'll
get data from, you can see the difficulty of this style of programming.
Yakka foob mog. Grug pubbawup zink wattoom gazork. Chumble spuzz.
--Calvin, giving Newton's First Law "in his own words"

Paul Morrison

Nov 22, 2014, 1:00:39 PM
to flow-based-...@googlegroups.com
I agree - it's a variant of the Collate problem that NoFlo has trouble with!

Cheers,

Paul


Joe Witt

Nov 22, 2014, 1:18:42 PM
to flow-based-...@googlegroups.com
John, Paul,

While I understand the theory behind why named input ports are useful, we have not as yet found them to be necessary on a given discrete black box.  We have supported a broad range of fan-in, multiple-input sorts of use cases without them.  As I mentioned above, the way we do this is by using the context available to us on the information within a given IP.  It is true that our approach requires the process to use an internal data structure to sort items as appropriate, but we've found this necessary and desirable for a variety of cases.  These internal queues are still bound to their given input sources: if the black box were to roll back its execution cycle, for instance, the items return to their input queues.  One of our more commonly used and powerfully simple black boxes is called MergeContent.  It uses a bin-packing algorithm to efficiently bin items, and as criteria are met (size, number of items, time) it kicks out completed bins.  It can form bins based on matching a set of criteria as found specifically within the IPs.
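The binning behaviour described for MergeContent can be approximated, very loosely, in Python. This sketch assumes a hypothetical grouping attribute and simply keeps one open bin per attribute value, closing a bin when it reaches an item count or byte size, or when it gets too old. It is not MergeContent's actual algorithm or configuration; a real bin packer would also decide which of several candidate bins an item fits into.

```python
import time
from typing import Callable, Dict, List, Tuple


class AttributeBinner:
    """Hypothetical binner: one open bin per value of a grouping attribute."""

    def __init__(self, group_attr: str, max_items: int, max_bytes: int,
                 max_age_s: float, clock: Callable[[], float] = time.monotonic):
        self.group_attr = group_attr
        self.max_items = max_items
        self.max_bytes = max_bytes
        self.max_age_s = max_age_s
        self.clock = clock
        self.bins: Dict[object, dict] = {}

    def add(self, content: bytes, attributes: dict) -> List[Tuple[object, list]]:
        """Add one IP and return every bin that is now complete."""
        key = attributes.get(self.group_attr)
        b = self.bins.setdefault(
            key, {"created": self.clock(), "items": [], "bytes": 0})
        b["items"].append(content)
        b["bytes"] += len(content)
        done = []
        if len(b["items"]) >= self.max_items or b["bytes"] >= self.max_bytes:
            done.append((key, self.bins.pop(key)["items"]))
        return done + self.flush_expired()

    def flush_expired(self) -> List[Tuple[object, list]]:
        """Close any bin that has been open for at least max_age_s."""
        now = self.clock()
        expired = [k for k, b in self.bins.items()
                   if now - b["created"] >= self.max_age_s]
        return [(k, self.bins.pop(k)["items"]) for k in expired]
```

The grouping criterion lives entirely in the IP attributes, which is how a single-input component can fan in related items without named ports.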

Remember too I mentioned that we support 'subnets' which we call 'Process Groups'.   Those do have named input ports because those ports establish the entry context to a given Process Group.  That group is in effect a 'black box composed of black boxes'.  It is just that on a single discrete black box we don't have named input ports - as of yet.

Are there specific use cases/examples you can point to which you believe are best addressed by having named input ports?  Perhaps it is a case where a black-box requires the presence of multiple distinct items to happen before that black-box can meaningfully do its work?  We've dealt with that again using this very simple context mechanism I describe above.

It will obviously be easier to discuss once the software is available.  Quite honestly, it is funny that we've zeroed in here so quickly, as this is the part of FBP whose necessity I've most often debated (at least as described: 'named input ports for a black box').

Thanks
Joe

Oleksandr Lobunets

Nov 22, 2014, 3:10:06 PM
to flow-based-...@googlegroups.com
Hi Joe,

Your proposal looks interesting, although it's hard to understand what NiFi is about completely without looking into the sources.

How do you execute your FBP processes: fibers, green threads, native threads, system processes? 
Is your platform going to be language-agnostic?
Which flow description languages (DSL) do you plan to support or probably develop (besides the visual diagramming)?

Best regards,
Alex

Joe Witt

Nov 22, 2014, 7:28:39 PM
to flow-based-...@googlegroups.com
Ged,

Each time I respond to the question about Niagara AX the post seems to get dropped.  Posting independent of your question in hopes that it helps.

Niagarafiles is what we used to call this project and it has no relationship to the commercial 'Niagara AX' product suite.  From here on though our project will be referred to as 'Apache NiFi' should the ASF community accept it.

thanks
Joe

Joe Witt

Nov 22, 2014, 8:10:07 PM
to flow-based-...@googlegroups.com
Alex

The platform is designed for languages which operate on the JVM.  The black box processes operate on threads within the JVM.  The command and control construct to build flows works through an HTTP-based API, and the primary actor on that interface is our web-based user interface.  We don't have plans beyond that at this particular time for the actual construction of flows; certainly we expect processes to interact with the system through this API, but not to change the structure of a data flow.  FBP provides a very nice model which translates well to an intuitive user interface for the real-time construction and modification of flows.

We hope to have the source available soon through the ASF.

Thanks
Joe


Ged Byrne

Nov 23, 2014, 2:54:30 AM
to flow-based-...@googlegroups.com
Thanks Joe,

I'm looking forward to seeing the sources. Having a flow based project in the Apache stack would be great.

Do you have plans to integrate with other Apache projects, especially Camel?

Regards,



Ged

Joe Witt

Nov 23, 2014, 2:23:57 PM
to flow-based-...@googlegroups.com
Ged,

We certainly have plans to integrate with quite a few more Apache projects.  More analysis will need to occur to best understand if integration with Camel makes sense.  I believe Camel and NiFi tackle a similar problem space but in vastly different ways.  The way I currently look at it, and it is perhaps incorrect and a major simplification:
- With Camel if you're building an application that does integration you can use Camel in your application.
- With NiFi if you're building an application that does integration you can build your application in NiFi.

Thanks
Joe


Ged Byrne

Nov 23, 2014, 3:28:34 PM
to flow-based-...@googlegroups.com
Hi Joe,

I've done some work on the relationship between Camel and FBP. Hopefully I can contribute something.


Regards,



ged

Joe Witt

Nov 26, 2014, 2:25:01 PM
to flow-based-...@googlegroups.com
Hello

The platform is designed for languages which operate on the JVM.  The black box processes operate on threads within the JVM.  The command and control construct to build flows works through an HTTP-based API, and the primary actor on that interface is our web-based user interface.  We don't have plans beyond that at this particular time for the actual construction of flows; certainly we expect processes to interact with the system through this API, but not to change the structure of a data flow.  FBP provides a very nice model which translates well to an intuitive user interface for the real-time construction and modification of flows.

We hope to have the source available soon through the ASF.

Thanks
Joe


Paul Morrison

Nov 27, 2014, 10:55:19 AM
to flow-based-...@googlegroups.com


On Saturday, November 22, 2014 1:18:42 PM UTC-5, Joe Witt wrote:
John, Paul,

Are there specific use cases/examples you can point to which you believe are best addressed by having named input ports?  Perhaps it is a case where a black-box requires the presence of multiple distinct items to happen before that black-box can meaningfully do its work?  We've dealt with that again using this very simple context mechanism I describe above.


I sent an answer to this question in a note, but realized that I didn't publish it to the group as a whole, so I will repeat it here, slightly modified:

You say that processes only have one input port, similarly to NodeRed, which rules out basic FBP business functions like the batch Update program, based on "Collate" - described on p.90 (and following pages) in the 2nd edition of my book.  If you are able to suspend a thread, I don't see why you would not support multiple input ports (and/or port arrays). Alternatively, something I read suggests you don't suspend threads, in which case your system would be FBP-like, but not "classical" FBP!

Consider a thought experiment where you have billions of detail records running against thousands of master records.  I don't really want to store huge numbers of IPs, while deciding which IP to put out first.  Collate lets you merge separately sorted streams of records based on key values, whereas I believe your merged stream resembles the output of the FBP Collate, so you still need some kind of merge or sort function.  It's hard to see how you would merge your masters and details without a Collate, unless you sort them all together in a separate step (the master records are usually already sorted, so it's wasteful to sort them again), or use a non-FBP merge mechanism, which will actually be doing what Collate does, except with files.  In fact, given that it is so natural to read selectively from files when you need the data, why wouldn't you extend that metaphor to data streams?
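A minimal Python sketch of the master/detail case Paul describes, assuming both inputs are already sorted by key and are modelled as Python iterators standing in for two input ports. Reading lazily from whichever input is needed next keeps memory use constant no matter how many detail records there are. The function name, tuple layout, and the `D?` tag for unmatched details are illustrative choices, not the Collate component from the book.

```python
from typing import Iterable, Iterator, Tuple


def collate(masters: Iterable[Tuple[str, str]],
            details: Iterable[Tuple[str, str]]) -> Iterator[Tuple[str, str, str]]:
    """Merge two streams that are each already sorted by key.

    Each input is read only when the component needs its next record: the
    'selective read' that a second input port makes possible. Output is
    ("M", key, data) for a master followed by ("D", key, data) for each of
    its details; details with no matching master are tagged "D?".
    """
    m_iter, d_iter = iter(masters), iter(details)
    m = next(m_iter, None)
    d = next(d_iter, None)
    while m is not None or d is not None:
        if d is None or (m is not None and m[0] <= d[0]):
            yield ("M", m[0], m[1])
            current = m[0]
            m = next(m_iter, None)
            # Read details only while they belong to the current master.
            while d is not None and d[0] == current:
                yield ("D", d[0], d[1])
                d = next(d_iter, None)
        else:
            yield ("D?", d[0], d[1])     # detail key with no master
            d = next(d_iter, None)
```

Neither stream is ever re-sorted or buffered: at any moment the component holds one master and one detail record.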

Interestingly, when you said you didn't have named input ports, I assumed you have numbered input ports instead - which is what we used very successfully for years with AMPS!  Once we introduced named ports, then of course we had to have array ports - and this is in fact what Collate uses (one port, but any number of array elements - 1 to n).

Last point: the NiFi technology sounds perhaps a bit over-complex.  My approach has always been to build the minimum and then only add what people absolutely need (the principle of YAGNI).  We even backed out facilities a few times, e.g. the NIP function, described in file:///C:/Users/Paul/Documents/Business/FBP/perform.shtml (do a find on NIP - it's near the end of the chapter). The "DropOldest" function was only added a few months ago - basically in response to a user's need for something that would handle real-time measurements (not something that crops up in most business apps). For instance, you say one of your options is timer-driven scheduling, but why build this into the infrastructure, when you could simply have a clock process sending out signals, as described in Chap. 19 of the 2nd edition?  Why do you need to back up all connections?  It's much easier to debug deadlocks if you only add files at specific points in the network - also see Chap. 16 of the 2nd edition (Deadlocks).   And some of the features you describe sound downright scary, from the point of view of guaranteeing completeness of processing!
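Paul's alternative to built-in timer scheduling, a clock process sending out signals, can be sketched in Python with ordinary threads and a queue as the connection. The names are hypothetical; the point is that the timing lives in an ordinary component, and a bounded connection lets a slow downstream process push back on the clock.

```python
import queue
import threading
import time


def clock_process(out: queue.Queue, interval_s: float, ticks: int) -> None:
    """Hypothetical clock component: sends a tick IP every interval_s seconds."""
    for n in range(ticks):
        time.sleep(interval_s)
        out.put(("tick", n))                 # blocks if the connection is full
    out.put(None)                            # end of stream


def poller(inp: queue.Queue, results: list) -> None:
    """Downstream component: does its periodic work once per tick."""
    while (ip := inp.get()) is not None:
        results.append(f"polled on tick {ip[1]}")


def run_network(interval_s: float = 0.01, ticks: int = 3) -> list:
    conn = queue.Queue(maxsize=1)            # small capacity gives back pressure
    results: list = []
    procs = [threading.Thread(target=clock_process, args=(conn, interval_s, ticks)),
             threading.Thread(target=poller, args=(conn, results))]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return results
```

Changing the schedule then means replacing or reconfiguring the clock component, with no change to the scheduler itself.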

Since you are working in Java, you might want to give JavaFBP a try - and then let us know what features it is missing!

Regards,

Paul M.
 

Joe Witt

Nov 27, 2014, 11:45:53 AM
to flow-based-...@googlegroups.com
Paul,

Thank you for your response.

My primary question at this point, and frankly in the spirit of YAGNI, is about understanding the necessity of named input ports, and I suggested an alternative mechanism based on the context afforded within the information packets themselves.  As we are able to make more of the software and documentation available, I think this discussion will be more effective, as we'll have more of a shared understanding on which to base it.

Regarding the complexity you suggest, it is likely premature to make that assessment given the limited availability of documentation, source, and discussion at this point.  We hope to resolve that soon and look forward to learning whether your concerns about correctness still apply.  Fortunately, the open source community will have an opportunity to assess and advise whether we've found the right fit or are too complex.  Their feedback will help us understand when and how to course-correct as needed.

I look forward to discussing these items further with you as more information is available to inform the discussion.

Thanks
Joe



Paul Morrison

Nov 27, 2014, 12:41:29 PM
to flow-based-...@googlegroups.com

On Thu, Nov 27, 2014 at 11:45 AM, Joe Witt <joe....@gmail.com> wrote:
Paul,

Thank you for your response.

My primary question at this point, and frankly in the spirit of YAGNI, is about understanding the necessity of named input ports and I suggested an alternative mechanism based on the context afforded within the information packets themselves.

Hi Joe,

This may be more a question of clashing metaphors!  I have always thought of FBP processes as little main lines (stand-alone programs), so restricting a process to just one input port is (to me) like saying that a program can only have one input file!  So my question, which may not require me to see the code, would be: what advantage do you gain from this restriction?

TIA

Paul

Joe Witt

Nov 27, 2014, 12:56:44 PM
to flow-based-...@googlegroups.com
Paul,

Our aim is to allow developers and non-developers alike to build powerful data flows visually and in real time, so that the feedback and consequences of a modification are immediate.  Named input ports on a given black box (and we've built or seen hundreds of black boxes to date) have never been necessary to elegantly achieve a given goal.  Thus, we've kept the additional concept and construct of named input ports on black boxes out of the system.  We have not by any means closed the book on them - we are just seeking to understand their necessity before we commit to them.

I recognize this is an important aspect of what you refer to as classical FBP, and so I will keep trying to find common ground from which to better articulate and discuss it.

Thanks
Joe


Oleksandr Lobunets

Nov 27, 2014, 8:02:07 PM
to flow-based-...@googlegroups.com
Joe,

Interesting discussion topic, by the way (multiple input ports).

I'm implementing some flows for home/office automation using IBM's NodeRed, where the nodes have only a single input port. It probably depends on the specific task, but in my case I'm experiencing great discomfort being limited to a single input. If the node cannot distinguish data by port, then the only way to do so is to introduce a specific IP structure (header with type and other fields) and to look into each incoming IP to decide which logic to trigger.
In NodeRed in particular, that leads to heavy usage of "function" nodes, where a developer writes arbitrary JavaScript as the node's body.
A typical example is using a Kicker (Inject node) to issue an HTTP request with specific headers. With classical FBP I would send an IIP to the configuration port of the HTTP request node, but in NodeRed I have to place a "function" node between the Kicker and the Request, where I modify the IP by adding a headers property to it. Writing a more or less complex flow for NodeRed implies too much conventional programming with JavaScript and therefore less visual programming. Though I would agree this decision simplifies the life of a component developer :-)

Kind regards,
Alex

Paul Morrison

Nov 27, 2014, 8:06:22 PM
to flow-based-...@googlegroups.com


On Nov 27, 2014 12:56 PM, "Joe Witt" <joe....@gmail.com> wrote:
>
> Paul,
>

> Our aim is to allow developers and non-developers alike to build powerful data flows by doing so visually and in real-time so feedback and consequence of modification is immediate.  Input ports for a given black box, and we've built or seen hundreds to date, have never been necessary to elegantly achieve a given goal.  Thus, we've kept the additional concept and construct of named input ports on black boxes out of the system.  We have not by any means closed the book on them - we are just seeking to understand their necessity before we commit to them.
>
> I recognize this is an important aspect of what you refer to as classical FBP and so will continue to find a common ground to better articulate and discuss.
>
> Thanks
> Joe
>

I guess we share a lot of the same goals, as well as the desire for elegance, but where you see multiple input ports as an unnecessary frill, I see allowing only one input port as an unnecessary restriction!  As I asked before, what does it buy you?  Perhaps if you could explain your term "context", things might become a bit clearer!  How for instance do you implement components that do "gating"?  How do you delay a process until data arrives at a secondary input port? Perhaps you have developed techniques for all of these, but based on a different mental model...
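One way to answer the gating question with two input ports, sketched in Python under the assumption of one thread per process and queues as connections (all names hypothetical): the component blocks on its signal port first, so data IPs wait in their own bounded connection until the gate opens.

```python
import queue
import threading


def gate(data_in: queue.Queue, signal_in: queue.Queue, out: list) -> None:
    """Hypothetical two-port gating component (None marks end of stream)."""
    signal_in.get()                          # block until the gate opens
    while (ip := data_in.get()) is not None:
        out.append(ip)


def demo() -> tuple:
    data_in = queue.Queue(maxsize=10)        # bounded: a closed gate pushes back
    signal_in = queue.Queue()
    out: list = []
    worker = threading.Thread(target=gate, args=(data_in, signal_in, out))
    worker.start()
    for ip in ("r1", "r2"):
        data_in.put(ip)                      # data arrives before the signal
    before = list(out)                       # still empty: gate is closed
    signal_in.put("open")
    data_in.put(None)
    worker.join()
    return before, out
```

`demo()` returns `([], ["r1", "r2"])`: nothing passes the gate before the signal arrives, and no component-side buffer is needed because the waiting data stays in its connection.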

Regards,

Paul

Joe Witt

Nov 27, 2014, 8:59:45 PM
to flow-based-...@googlegroups.com
Paul, Alex,

Thank you both for the comments.  Alex described how we do this accurately, I think:

"...introduce a specific IP structure (header with type and other fields) and to look into each incoming IP..."

As I've mentioned, we've dealt with several fan-in cases like bin-packing/merge/etc. using this technique.  However, I concede there are potentially use cases where multiple input ports offer a more elegant approach.  I am simply trying to come to understand what those cases are.  You hinted at a case where a process needs multiple different inputs at the same time before it can reasonably execute.  I will do some more digging here.

Thanks
Joe


Paul Morrison

Nov 27, 2014, 10:15:27 PM
to flow-based-...@googlegroups.com
Hi Joe,

On reading Alex's comment which you quote, I get the impression that you are thinking in terms of using the different ports for different types of data.  For me multiple input ports have more to do with timing relationships, for want of a better word.  If you don't know what sequence the data is going to arrive in, and you are willing to process them in arrival sequence, it certainly makes sense to bring all the data into a single input port, as e.g. for a component or subnet which processes the output of Collate.  As described in my book, you can then look at a code in the incoming IPs, or, in class languages like Java and C#, you can often just test the class of the IP contents, to decide how to process the data.

On the other hand, if you wanted to merge 6 streams of data on a round-robin basis, independent of contents, it would make most sense to have 6 input ports (actually array port elements), and then just do receives from input port IN[i], where 'i' runs from 0 to 5, repeatedly, until all input streams are exhausted.  If you want to gate one input stream, holding up processing until another piece of data arrives from a different source, it is hard to see how you can do this with only one input port.  I am sure we can come up with more examples!
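The round-robin merge is easy to express once the component can pick which array-port element to receive from. In this Python sketch each iterator stands in for one element `IN[i]` of an array port, and `next()` plays the part of a receive; the function name is hypothetical.

```python
from typing import Iterable, List

_EOS = object()  # end-of-stream sentinel, so payloads may be any value


def round_robin_merge(in_ports: List[Iterable]) -> list:
    """Receive from IN[0], IN[1], ..., IN[n-1] in turn until all are exhausted.

    Each element of in_ports stands in for one element of an array input
    port; an element whose stream has ended is skipped in later rounds.
    """
    iters = [iter(p) for p in in_ports]
    out = []
    live = list(range(len(iters)))
    while live:
        still_live = []
        for i in live:
            ip = next(iters[i], _EOS)
            if ip is _EOS:           # end of stream on IN[i]
                continue
            out.append(ip)
            still_live.append(i)
        live = still_live
    return out
```

The order of output depends only on port position, never on payload contents, which is what makes it awkward to reproduce from a single mixed input queue.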

I think there is also the hidden assumption that you can hold data somewhere until it is time to process it, invisibly overflowing to disk if necessary.  This will probably work for reasonable data volumes, but when you are dealing with very large amounts of data, it may get unwieldy.  See Mike Beckerle's comments at various places in my book - they were dealing with humungous amounts of data!  It can also make it difficult to troubleshoot deadlock conditions!

As I said before, you might want to try some simple tests with JavaFBP - I can help if you run into any problems!

Regards,

Paul

Oleksandr Lobunets

Nov 28, 2014, 2:47:38 AM
to flow-based-...@googlegroups.com
Hello Joe, Paul,

To me the multiple input ports feature is also a usability aspect (UX/DX) at the network design level when it comes to the visual programming part.
If I discover a component in the library and drag it to the canvas, I look at its inputs/outputs the same way I look at a function signature (arguments, return) in the API documentation.
Let's imagine a scenario where you have loaded a network of 20 nodes, you see one node with 9 fan-ins, and your task is to change the configuration of this node. With a single input port you need to follow the connection back to each of the 9 upstream nodes to find which one is responsible for configuration. So that would be something like O(9) effort for a designer, whereas it could be O(1). This problem scales, of course.

Regarding the Collate example, I can't imagine implementing selective port reads with a single port unless the implementation technology allows you to look at an IP in the queue without actually taking it from there. This is usually possible, and is implemented by locking an item in the queue (with or without a timeout) and then marking it as not processed, but in most queue systems I have worked with such elements go to the back of the queue, which breaks the natural order of arrival. I don't consider implementing buffers on the component side, as it creates too much work for a component developer.
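Alex's concern about re-queued items can be demonstrated with a short Python sketch of a selective receive on a single queue, assuming the queue behaviour he describes, where an inspected but untaken item goes to the back. The function name and `(source, payload)` shape are hypothetical.

```python
from collections import deque


def selective_receive_by_requeue(q: deque, wanted_source: str):
    """Take the first IP from wanted_source, re-queuing others at the back.

    Mimics a queue where items that are inspected but not taken go to the
    back. Returns None if no matching IP is present.
    """
    for _ in range(len(q)):
        source, payload = q.popleft()
        if source == wanted_source:
            return payload
        q.append((source, payload))   # goes to the back: arrival order is lost
    return None
```

Pulling the master record out of `[d1, d2, m1, d3]` leaves the details as `[d3, d1, d2]`, so `d3` now overtakes `d1` and `d2`.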

Regards,
Alex

Ged Byrne

Nov 28, 2014, 3:20:45 AM
to flow-based-...@googlegroups.com
Hi Joe,

I agree with Alex.  Named ports are a real bonus when it comes to reading the flow graph.  The great thing about open sourcing NiFi is that we can add Named Ports if we really want them :)

Could you share an image of what a NiFi graph looks like in the editor?

Regards, 


Ged



Joe Witt
Nov 28, 2014, 8:41:58 PM11/28/14
to flow-based-...@googlegroups.com
Alex, Ged,

Thanks.  I like the phrase 'selective port read'.  I will re-examine our existing processors to see if any of them would be better served by it.  Once the software is open sourced and you've had some time to look at it, I'd like to come back and ask your thoughts again.  I think we've handled some of the UI elements well enough to address your concern, but for the developer experience I am not sure.  As soon as we can put our code/docs up, we will be able to get images/screenshots/etc. in place.

Thanks
Joe

Ged Byrne
Nov 29, 2014, 2:32:44 AM11/29/14
to flow-based-...@googlegroups.com
Thanks Joe, looking forward to seeing more.

Paul Morrison
Nov 29, 2014, 3:10:13 PM11/29/14
to flow-based-...@googlegroups.com
I agree with Alex too.  Trying to do a Collate using selective receives from a single connection seems to me complicated and potentially dangerous.  As Alex says, you would lose the ordering, and remember also that connections have a finite capacity, which is presumably why your proposed system overflows connections to disk.  Pushback, which you say you have, helps ensure the smooth flow of data through your network; with out-of-sequence processing, and data "ballooning" periodically onto disk, it seems that debugging will be a lot more difficult.  It will also be hard to verify that all data has been processed.
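For comparison, here is a minimal Python sketch of a Collate-style merge over named input ports, using selective port reads: at each step it receives from whichever port currently holds the smallest key. It assumes each port's stream is already sorted by key, breaks ties by port order, and consumes the input deques in place. It is a simplified merge only, not the full Collate from Paul's book, and it does not attempt any grouping behaviour a complete Collate may provide. The function name and the `MASTER`/`DETAIL` port names are hypothetical.

```python
from collections import deque

def collate(ports, key):
    """Merge IPs from several named input ports, each already sorted by key.
    Ties go to the port listed first. Consumes the port deques in place."""
    order = list(ports)  # port priority used to break key ties
    heads = {name: q.popleft() for name, q in ports.items() if q}
    out = []
    while heads:
        # Choose the port whose current head IP has the smallest key.
        name = min(heads, key=lambda n: (key(heads[n]), order.index(n)))
        out.append(heads[name])
        q = ports[name]
        if q:
            heads[name] = q.popleft()  # selective read from that port only
        else:
            del heads[name]            # that port is exhausted
    return out

ports = {
    "MASTER": deque([(1, "M"), (3, "M")]),
    "DETAIL": deque([(1, "D"), (2, "D"), (3, "D")]),
}
print(collate(ports, key=lambda ip: ip[0]))
# [(1, 'M'), (1, 'D'), (2, 'D'), (3, 'M'), (3, 'D')]
```

Each port keeps its own arrival order, so there is no need to pull IPs off a shared connection and put them back.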

Regards,

Paul

Suminda Dharmasena
Dec 11, 2014, 12:46:11 PM12/11/14
to flow-based-...@googlegroups.com
Few questions / requests:
  • How does it cope with GC jitter, and does it use off-heap memory to prevent GC jitter?
  • Will you have a web editor which can be used in web-based projects?
  • Will you be able to use the NoFlo UI with this as an alternative editor? Will you support this in the future?
  • Will the editor be a projectional editor / structure editor?
  • Is it possible to extend this to more general dataflow programming using both visual and text-based languages? Like: Microsoft Dryad, Netflix Mantis, Apache Flink, Verilog, VHDL, CAL Actor Language, etc.
  • Will you support data connections and feed handlers for important domains out of the box? E.g. FIX protocol handlers for trading, OpenMAMA support, etc.
  • Any plans to use a language workbench (https://www.jetbrains.com/mps/, http://mbeddr.com/, https://github.com/JetBrains/Nitra) so this can be ported outside the JVM while maintaining the same code base?
  • Will this support the dataflow spreadsheet paradigm? E.g. http://www.ankhor.com/en/
  • Will this support real-time / streaming DataFrames? E.g. http://ddf.io/, borrowing concepts from pandas, R DataFrame, https://www.quantrix.com/en/

Joe Witt
Dec 12, 2014, 10:36:13 AM12/12/14
to flow-based-...@googlegroups.com
Hello

I don't want to interfere with the intent of the FBP group, so I will redirect these questions to the Apache NiFi (incubating) mailing lists and respond as best I can there.

Thank you
Joe

Suminda Dharmasena
Dec 12, 2014, 1:01:17 PM12/12/14
to flow-based-...@googlegroups.com
Sure.

Can you get a Nabble interface set up for the forum?

S

Oleksandr Lobunets
Dec 15, 2014, 7:08:03 PM12/15/14
to flow-based-...@googlegroups.com


On Saturday, November 22, 2014 3:28:06 PM UTC+1, Joe Witt wrote:
And the link to the proposal: http://wiki.apache.org/incubator/NiFiProposal

Joe Witt
Dec 20, 2014, 9:08:15 PM12/20/14
to flow-based-...@googlegroups.com
I am contributing on behalf of the Apache NiFi project, you can learn more about our incubating project at http://nifi.incubator.apache.org. One of the first community contributions that we received could potentially be well served with named input ports, a previous recommendation in this thread! https://issues.apache.org/jira/browse/NIFI-190

Samuel Lampa
Jul 21, 2015, 5:21:23 PM7/21/15
to Flow Based Programming

Paul Morrison
Jul 21, 2015, 10:15:16 PM7/21/15
to flow-based-...@googlegroups.com
Yay!  Apache!  Adrian Bridgwater even has a nice tweet pointing at the FBP main web site!  Thanks, Adrian!

Paul Morrison
Jul 21, 2015, 10:26:13 PM7/21/15
to flow-based-...@googlegroups.com
And thanks, also, Samuel, for pointing this out!

Best regards,

Paul M.

On Tue, Jul 21, 2015 at 5:21 PM, Samuel Lampa <samuel...@gmail.com> wrote: