FBP Pattern for parallel processing data where number of paths is unknown at design time

96 views
Skip to first unread message

Dan Rumney

unread,
May 26, 2016, 11:10:52 AM5/26/16
to Flow Based Programming
Hi folks,

I'm new to FBP (but old to programming) and I'm beginning to get my feet wet.

I'm hoping this group is going to be a good source of feedback around questions of good design and useful patterns as there's nobody in my office or circle who've done this before.

In particular, I'm trying to figure out the best way to approach a split/merge problem.

I have a network that will take in N data entities. I need to process each of those entities independently and then merge and compare the processed versions.

I don't know the value of N at design time.

So... is there a standard FBP approach to this? Is there a standard FBP diagramming approach to portraying this?

Are we basically talking a Composite Component that instantiates multiple instances of a SubNet for each of the entities?

Thanks, everyone! Looking forward to some interesting conversations!

Dan

Paul Morrison

unread,
May 26, 2016, 11:34:29 AM5/26/16
to flow-based-...@googlegroups.com
Hi Dan, welcome to the group!

I am not sure how subnets help you - what did you have in mind?  Can you draw the picture?

I assume you are thinking of generating some number of processes dependent on N.  If you look at https://github.com/jpaulm/javafbp/blob/master/src/main/java/com/jpmorrsn/fbp/examples/networks/TestLoadBalancer.java , you will see a variable number of processes, based on the number of processors, but it could use some other algorithm.  Unfortunately this number is set before you have had a chance to read your data IPs, so I might suggest breaking your app into two networks, where the first one just counts the IPs, and writes the count to a file, while the second one generates a network based on that count. 

Rather sledge-hammer approach - and I may have misunderstood your problem - almost certainly, in fact!

Of course the general answer is to provide support for dynamic modification of the network at run time - this has been left as an exercise for the reader!

Cheers

--
You received this message because you are subscribed to the Google Groups "Flow Based Programming" group.
To unsubscribe from this group and stop receiving emails from it, send an email to flow-based-progra...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Alfredo Sistema

unread,
May 26, 2016, 3:37:59 PM5/26/16
to flow-based-...@googlegroups.com
Welcome Dan, hope you enjoy the topic.

There are two different problems, the first one is to tag data to be processed in a graph, and then reorder it at the end. The second problem is to make a graph that changes in shape.
For the first problem, you can either use groups, that "name" streams of packets, think of them as tags, for example <5> data data data </5> <6> data data data </6>. The idea is that you can process the data, and forward the group information, that way you have a tagged stream of data along the graph.
Another solution that requires special process support is to add ordering metadata to each packet, like how tcp works and many other communication protocols.

For the second problem, you should optimize the graph for the computer it will run on, so adjust the graph accordingly with enough processing processes ( forgive the redundancy ). You are not going to get more performance by having more os processes or threads than cores, so it's not going to be a giant graph unless you have a giant computer, in that case you can always turn parts into subgraphs for maintenance.

Another option is to use the concept of multiplexing, which I think that only cppFBP supports, Right Paul ? The concept is about defining how many copies of a process are going to be run concurrently, reading from the same queues and writing to the same output queues. This should be done with processes with only one input and one output for deadlock prevention imo.

If you have any questions just ask.

Paul Morrison

unread,
May 26, 2016, 7:01:31 PM5/26/16
to flow-based-...@googlegroups.com

Re multiplexing, sorry if I caused confusion - classical  FBP doesn't support one output feeding multiple inputs.   When I refer to multiplexing, I am talking about something like the pattern in the Java network that I mentioned in my previous post on this topic, where multiple instances of the same component    or subnet are fed by a Load Balance component or something similar.

Given this definition, multiplexing is supported by all 4 of the current classical FBP implementations: JavaFBP, C#FBP, CppFBP, JSFBP.

HTH

Samuel Lampa

unread,
May 9, 2017, 5:24:12 AM5/9/17
to Flow Based Programming
On Thursday, May 26, 2016 at 9:37:59 PM UTC+2, Alfredo wrote:
Another solution that requires special process support is to add ordering metadata to each packet, like how tcp works and many other communication protocols.

 I think I'm having a pretty similar problem as Dan, in our work on scientific workflows.

Scatter/gather actually turns out to be one of the most common workflow patterns at all.

My thinking so far is along the lines of exactly what you mention above: To allow custom tags in each IP. The reason for going with this rather than tagged sub-streams is exactly that one often have multiple grouping information for each IP. This could be hierarchical in the most complex ways. Also, by tagging richly, one has the option to achieve a dataset at the end that is very easy to slice and dice to your liking in an interactive fashion in the end, which is often wanted in scientific use cases.

Apache Spark [1] , which has found a lot of use in data science at least, has this as a very very central feature (working with tuples of values that are used for sorting etc).

Best

Paul Morrison

unread,
May 10, 2017, 9:56:54 AM5/10/17
to Flow Based Programming
Hi Dan,

On rereading your post, I realized you said "at design time".  The FBP implementations I have worked with allow you to specify the number of paths up to run initialization time.  To change the number of paths once the application is actually running, you would need to change the network control blocks dynamically.  I actually don't see a big problem here (source for JavaFBP, etc., is all available on GitHub), and could be fun exercise!  It is probably easier to add more nodes than to subtract them, as, in the latter case, you would have to worry about IPs in transit.  The concept of substream-sensitive composites might be of assistance here, as this provides a way of draining a connection without killing the application as a whole - see  http://www.jpaulmorrison.com/fbp/checkpt.shtml - around diagram 20.3.

Regards,

JPaul

On Thursday, May 26, 2016 at 11:10:52 AM UTC-4, Dan Rumney wrote:

...

I have a network that will take in N data entities. I need to process each of those entities independently and then merge and compare the processed versions.

I don't know the value of N at design time.

So... is there a standard FBP approach to this? Is there a standard FBP diagramming approach to portraying this?

...

Tom Young

unread,
May 10, 2017, 2:04:02 PM5/10/17
to Flow Based Programming
Hi Dan,  

It would seem that the answer is that there is no standard FBP approach to this.     Elsewhere in this group,  I posted my approach and the NDL language definition.   I have developed two languages, NDL2, which is declarative; and NDL, a superset of NDL2, which adds procedural like commands, IF, SWITCH/CASE, WHILE, etc..  This allows the designer to describe variations of a network which 
depend on conditions at run time, just as you describe.   This ability 
does not depend on the ability to dynamically alter a running network, although that would certainly be a nice feature.  Subnets
are not a requirement for this, but can be useful to reduce clutter in the definition. 

While any valid NDL2 definition can be graphed, unfortunately, I have not addressed the problem of graphing NDL definitions or editing them visually.     The run time, expanded NDL definition could be graphed, I suppose.  

Attached is a benchmark example, scather2.ndl
​ and here are two runs, a) with one dummy input, and b) with three dummy inputs:​
      

           ---------   One Input  -------------------    
  $   dfd -d1  -i -- 4 1 11 999 6000   <./sca*2.ndl  2>&1   
(dfd   -d1 -i -- 4 1 11 999 6000 );
(G1 Gen 6000 999)out => in1(M Mux 1 PFX 1);
(J Join 1)out => in(T Time);
(D DeMux 1 1)out1 => in1(J Join 1);
(M Mux 1 PFX 1)mux => mux(D DeMux 1 1);
(E echo Case 4  of 4: )1 => in(S Stdout);
# 2 Partitions, 5 DataFlows, 7 Processes, 7 Components, 10 Gates. 0 Cycles
Case 4 of 4:

T-1.0:  0:00:00.985341sec. 6000000 bytes.   6000 flows.
6089 FPS   6089262 BPS   1000 bytes/flow.

           ---------  
​ Three​
Input
​s​
 -------------------    

$ dfd -d1  -i -- 4 3 11 999 6000   <./sca*2.ndl  2>&1   
(dfd   -d1 -i -- 4 3 11 999 6000 );
(G3 Gen 2000 999)out => in3(M Mux 3 PFX 1);
(G2 Gen 2000 999)out => in2(M Mux 3 PFX 1);
(G1 Gen 2000 999)out => in1(M Mux 3 PFX 1);
(J Join 3)out => in(T Time);
(D DeMux 3 1)out3 => in3(J Join 3);
(D DeMux 3 1)out2 => in2(J Join 3);
(D DeMux 3 1)out1 => in1(J Join 3);
(M Mux 3 PFX 1)mux => mux(D DeMux 3 1);
(E echo Case 4  of 4: )1 => in(S Stdout);
# 2 Partitions, 9 DataFlows, 9 Processes, 7 Components, 18 Gates. 2 Cycles
Case 4 of 4:

T-1.0:  0:00:01.006527sec. 6000000 bytes.   6000 flows.
5961 FPS   5961092 BPS   1000 bytes/flow.

​                             ---------   End of Runs    --------------​

​I would not conclude  much from the timings. ​

​Cheers, 

  -twy​



Tom Young
47 MITCHELL ST.
STAMFORD, CT  06902


When bad men combine, the good must associate; ...
  -Edmund Burke 'Thoughts on the cause of the present discontents' , 1770



On Thu, May 26, 2016 at 11:10 AM, Dan Rumney <danc...@gmail.com> wrote:

--
You received this message because you are subscribed to the Google Groups "Flow Based Programming" group.
To unsubscribe from this group and stop receiving emails from it, send an email to flow-based-programming+unsub...@googlegroups.com.
scather2.ndl

Tom Young

unread,
May 10, 2017, 4:09:51 PM5/10/17
to Flow Based Programming
Hi Paul, 

I don't think one need worry so much about IPs in transit as state 
information that might be contained in nodes being dynamically removed or replaced.

I do think that solving the general dynamic network problem in toto would lead to burgeoning demand for wider FBP adoption. 

Cheers, 

     -twy




Tom Young
47 MITCHELL ST.
STAMFORD, CT  06902


When bad men combine, the good must associate; ...
  -Edmund Burke 'Thoughts on the cause of the present discontents' , 1770



--

Paul Morrison

unread,
May 11, 2017, 4:03:20 PM5/11/17
to flow-based-...@googlegroups.com
That's why I suggested looking at   http://www.jpaulmorrison.com/fbp/checkpt.shtml - this does address the need to quiesce nodes, as well as draining connections.  If a process/node is long-running, some provision must be made for storing its state data while it is quiesced -  in this chapter, I allude to something I call a State Data Repository - I have left that as an exercise for the reader!

To unsubscribe from this group and stop receiving emails from it, send an email to flow-based-programming+unsubscri...@googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

Ged Byrne

unread,
May 12, 2017, 4:50:43 PM5/12/17
to flow-based-...@googlegroups.com
Hi Paul,

Going all the way to be beginning of the thread, do you see any advantage to using a sub component for this.

Lets say that I have a common pipeline of processes that I want the entities to pass through, and I want to instantiate the pipeline 1 for each entity.

It seems to make sense that I first create the pipeline as a subcomponent.  As you mention in your first edition: "We could now have "black box" composite components. This facility would allow subnets to be packaged as separate load modules for distribution, which could later be linked with other components by a developer to form the full application network."  http://www.jpaulmorrison.com/fbp/compos.shtml

So this is using a subcomponent as a template.  Can you see any problem with that?

Regards, 


Ged

To unsubscribe from this group and stop receiving emails from it, send an email to flow-based-progra...@googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups "Flow Based Programming" group.
To unsubscribe from this group and stop receiving emails from it, send an email to flow-based-progra...@googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups "Flow Based Programming" group.
To unsubscribe from this group and stop receiving emails from it, send an email to flow-based-progra...@googlegroups.com.

Paul Morrison

unread,
May 13, 2017, 10:43:52 AM5/13/17
to flow-based-...@googlegroups.com
Hi Ged,

I totally agree that a flow of several process nodes and their connections could be packaged and replicated, but, as we said before, the trick would be to add these "packages" dynamically into a running network.  I still think some of the ideas described in http://www.jpaulmorrison.com/fbp/compos.shtml would be helpful here - it needs someone to try it out.  The Subnet Manager component has some interesting capabilities - I believe it could serve as a basis for trying things out... 

The reference you quote talks about "load modules", so I assume (given the date) I was talking about the IBM MVS architecture (now z/OS).  In the more recent FBP implementations, maybe the C++ implementation would be a useful test bed, as there is no VM - https://github.com/jpaulm/cppfbp - where I assume the corresponding objects would be dll's.  In fact, CppFBP already has some dynamic capabilities, which could be extended...

As I said before, the source code is all on GitHub - I'd love to see someone try out these ideas... :-)

Regards,

JPaul

To unsubscribe from this group and stop receiving emails from it, send an email to flow-based-programming+unsub...@googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups "Flow Based Programming" group.
To unsubscribe from this group and stop receiving emails from it, send an email to flow-based-programming+unsub...@googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups "Flow Based Programming" group.
To unsubscribe from this group and stop receiving emails from it, send an email to flow-based-programming+unsub...@googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups "Flow Based Programming" group.
To unsubscribe from this group and stop receiving emails from it, send an email to flow-based-programming+unsub...@googlegroups.com.
Reply all
Reply to author
Forward
0 new messages