JavaFBP LoadBalance update

142 просмотра
Перейти к первому непрочитанному сообщению

Paul Morrison

не прочитано,
1 авг. 2015 г., 10:45:5401.08.2015
– Flow Based Programming
I recently realized there was a problem when substreams were being sent through a LoadBalance process: the components of the substream (open brackets, data IPs, and close bracket) could theoretically be sent to different output port array elements, resulting in the substream getting disassembled.

I have therefore modified LoadBalance to route all components of a substream to the same output port array element.  This code is now on GitHub - I will be doing some testing, but I would appreciate it if anyone using substreams and LoadBalance, preferably with large data volumes could also check it out.

Thanks in advance,

Paul M.

Paul Morrison

не прочитано,
13 авг. 2015 г., 10:21:4413.08.2015
– Flow Based Programming
This in turn raises another interesting problem!

BTW before I go on, it should be stressed that this is in the context of "classical" FBP, as implemented by JavaFBP, C#FBP, CppFBP (and perhaps JSFBP).  The mod to LoadBalance has only been actually implemented in JavaFBP so far, and in view of the problem I am going to describe, it seems I only did half the job anyway!  I have no idea if NoFlo and similar FBP-like systems have an analogous problem.

Let us say that we have used the (modified) LoadBalance component to route whole substreams to the various output port elements of LoadBalance - the split streams will most likely have to be recombined at some point.  Now we can't use something like a round robin merge to bring them back into one stream, as that would introduce the possibility of deadlocks.  So we need someway of bringing them into a single input port - but the default first-come, first served merge into one port would mix up the substreams, so I am proposing a new API call which allows a component to wait on any element of an input array port, and return the element number at which there is a data IP waiting.  This function will suspend until an IP arrives at one of the array port elements.

a) I will try to build this function over the next week or so, unless

b) someone else wants to try their hand, or

c) someone has a better idea!

TIA - and apologies for not spotting this years ago!

Ged Byrne

не прочитано,
13 авг. 2015 г., 12:24:3113.08.2015
– flow-based-...@googlegroups.com
Hi Paul,

In the Splitter and Aggregator integration patterns this problem is usually addressed through the use of a Correlation Identifier.


To quote from the book: "In some cases it is useful to equip child messages with sequence numbers to improve message traceability and simplify the task of an Aggregator (268). Also, it is a good idea to equip each message with a reference to the original (combined) message so that processing results from the individual messages can be correlated back to the original message. This reference acts as a Correlation Identifier." https://books.google.co.uk/books?id=qqB7nrrna_sC&lpg=PP1&pg=PA262#v=onepage&q&f=false

With this approach their would be the option to add a correlation identifier to the IP.  This identifier would allow the aggregator/merge to reassemble the messages based on the identifiers rather than having to bind them to individual queues. 

A simple scheme would be enough to solve the immediate problem, and developers are three to implement more sophisticated implementations if they want.

Regards, 


Ged


--
You received this message because you are subscribed to the Google Groups "Flow Based Programming" group.
To unsubscribe from this group and stop receiving emails from it, send an email to flow-based-progra...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Alfredo Sistema

не прочитано,
13 авг. 2015 г., 14:18:3113.08.2015
– flow-based-...@googlegroups.com
I think that I've proposed this kind of functionality in the past without much success but what I currently have is a function that returns a list of indexes an array port has that contain packets so that you can iterate over those and read packets. Receives would be exactly the same.
The correlation identifier is a good idea, one could achieve the same with substreams, but it adds the substream handling overhead. I experimented with a "tag" attribute in packets for situations like this, so that packets can be sorted by tag after merging streams, so even if packets got scrambled, the stream can be reassembled. A problem with this approach is that if a process adds packets to a substream, and those packets don't have the correct tag, then they will be lost. A functionality to "splice" a tagged packet in place would be needed.

Paul Morrison

не прочитано,
18 авг. 2015 г., 14:29:1118.08.2015
– Flow Based Programming, Alfredo, Ged Byrne
I have put up a new release of JavaFBP, containing enhancements along the lines that Alfredo is proposing - see https://github.com/jpaulm/javafbp/releases/tag/v3.0.2 .  It addresses the problem I brought up in this topic, where in this case you cannot use a simple FBP "multiple output ports - one input port" merge, as the output of the merge will be totally scrambled! 

(I believe) I have solved this using a new component - SubstreamSensitiveMerge - which ensures that the IPs within a given substream are preserved and maintain their sequence after having been split by the LoadBalance component, although the sequence of substreams may change. 

I have added documentation to https://github.com/jpaulm/javafbp/issues/8 .

The test network is https://github.com/jpaulm/javafbp/blob/master/src/main/java/com/jpmorrsn/fbp/examples/networks/TestLoadBalanceWithSubstreams.java .  This network also contains a boolean variable to allow the network to be tested with or without using the SubstreamSensitiveMerge component.

SubstreamSensitiveMerge also uses a new API call - findPortWithData() - which suspends if none of the elements of an array input port have any data queued up.  So no polling is required.

Feel free to play with this network and/or component - feedback would be welcome!

Regards,

Paul

PS I will also be adding a checker component to check the output from SubstreamSensitiveMerge  but so far it looks fine using an eyeball check...

PPS This does not preclude the Aggregator type of solution described by Ged - it just provides a more flow-like solution in some situations where the Aggregator is not needed.


On Thursday, August 13, 2015 at 2:18:31 PM UTC-4, Alfredo wrote:
I think that I've proposed this kind of functionality in the past without much success but what I currently have is a function that returns a list of indexes an array port has that contain packets so that you can iterate over those and read packets. Receives would be exactly the same.
The correlation identifier is a good idea, one could achieve the same with substreams, but it adds the substream handling overhead. I experimented with a "tag" attribute in packets for situations like this, so that packets can be sorted by tag after merging streams, so even if packets got scrambled, the stream can be reassembled. A problem with this approach is that if a process adds packets to a substream, and those packets don't have the correct tag, then they will be lost. A functionality to "splice" a tagged packet in place would be needed.


El jue., 13 ago. 2015 a las 13:24, Ged Byrne (<ged....@gmail.com>) escribió:
Hi Paul,

In the Splitter and Aggregator integration patterns this problem is usually addressed through the use of a Correlation Identifier.


To quote from the book: "In some cases it is useful to equip child messages with sequence numbers to improve message traceability and simplify the task of an Aggregator (268). Also, it is a good idea to equip each message with a reference to the original (combined) message so that processing results from the individual messages can be correlated back to the original message. This reference acts as a Correlation Identifier." https://books.google.co.uk/books?id=qqB7nrrna_sC&lpg=PP1&pg=PA262#v=onepage&q&f=false

With this approach their would be the option to add a correlation identifier to the IP.  This identifier would allow the aggregator/merge to reassemble the messages based on the identifiers rather than having to bind them to individual queues. 

A simple scheme would be enough to solve the immediate problem, and developers are three to implement more sophisticated implementations if they want.

Regards, 


Ged
On Thu, 13 Aug 2015 at 15:21 Paul Morrison <paul.m...@rogers.com> wrote:
This in turn raises another interesting problem!

BTW before I go on, it should be stressed that this is in the context of "classical" FBP, as implemented by JavaFBP, C#FBP, CppFBP (and perhaps JSFBP).  The mod to LoadBalance has only been actually implemented in JavaFBP so far, and in view of the problem I am going to describe, it seems I only did half the job anyway!  I have no idea if NoFlo and similar FBP-like systems have an analogous problem.

Let us say that we have used the (modified) LoadBalance component to route whole substreams to the various output port elements of LoadBalance - the split streams will most likely have to be recombined at some point.  Now we can't use something like a round robin merge to bring them back into one stream, as that would introduce the possibility of deadlocks.  So we need some way of bringing them into a single input port - but the default first-come, first served merge into one port would mix up the substreams, so I am proposing a new API call which allows a component to wait on any element of an input array port, and return the element number at which there is a data IP waiting.  This function will suspend until an IP arrives at one of the array port elements.


a) I will try to build this function over the next week or so, unless

b) someone else wants to try their hand, or

c) someone has a better idea!

TIA - and apologies for not spotting this years ago!

Paul Morrison

не прочитано,
23 авг. 2015 г., 13:30:1223.08.2015
– Flow Based Programming, sistemas...@gmail.com, ged....@gmail.com
Update: I added a checker process, and it detected an empty substream (open and close bracket, but no data IPs).  It turned out that this was being created at end of stream by the test component GenSS (when the number of IPs being created was an exact multiple of the specified substream size - obvious, right?!).   This has now been fixed.

Paul Morrison

не прочитано,
23 авг. 2015 г., 13:39:4623.08.2015
– Flow Based Programming, sistemas...@gmail.com, ged....@gmail.com
I have changed the API call "findPortWithData" to "findInputPortElementWith Data" - I apologize for the longer name, but I believe the old name may have been raising false expectations!

Regards,

Paul M.

Paul Morrison

не прочитано,
8 сент. 2015 г., 19:58:2808.09.2015
– Flow Based Programming, sistemas...@gmail.com, ged....@gmail.com
Substream-sensitive Load Balancer and Merge have been added to JSFBP - https://github.com/jpaulm/jsfbp - as well.

C#FBP next - once I figure out how to run the IDE on my new machine!

Regards,

Paul


On Sunday, August 23, 2015 at 1:39:46 PM UTC-4, Paul Morrison wrote:
I have changed the API call "findPortWithData" to "findInputPortElementWith Data" - I apologize for the longer name, but I believe the old name may have been raising false expectations!

Regards,

Paul M.

On Sunday, August 23, 2015 at 1:30:12 PM UTC-4, Paul Morrison wrote:
Update: I added a checker process, and it detected an empty substream (open and close bracket, but no data IPs).  It turned out that this was being created at end of stream by the test component GenSS (when the number of IPs being created was an exact multiple of the specified substream size - obvious, right?!).   This has now been fixed.

On Tuesday, August 18, 2015 at 2:29:11 PM UTC-4, Paul Morrison wrote:
I have put up a new release of JavaFBP, containing enhancements along the lines that Alfredo is proposing - see https://github.com/jpaulm/javafbp/releases/tag/v3.0.2 .  It addresses the problem I brought up in this topic, where in this case you cannot use a simple FBP "multiple output ports - one input port" merge, as the output of the merge will be totally scrambled! 

(I believe) I have solved this using a new component - SubstreamSensitiveMerge - which ensures that the IPs within a given substream are preserved and maintain their sequence after having been split by the LoadBalance component, although the sequence of substreams may change. 

I have added documentation to https://github.com/jpaulm/javafbp/issues/8 .

The test network is https://github.com/jpaulm/javafbp/blob/master/src/main/java/com/jpmorrsn/fbp/examples/networks/TestLoadBalanceWithSubstreams.java .  This network also contains a boolean variable to allow the network to be tested with or without using the SubstreamSensitiveMerge component.

SubstreamSensitiveMerge also uses a new API call - findPortWithData() - now called "findInputPortElementWith Data()" - which suspends if none of the elements of an array input port have any data queued up.  So no polling is required.

Feel free to play with this network and/or component - feedback would be welcome!

Regards,

Paul


Paul Morrison

не прочитано,
11 окт. 2015 г., 11:09:0511.10.2015
– Flow Based Programming, sistemas...@gmail.com, ged....@gmail.com
In an earlier post on this topic, I pointed at  https://github.com/jpaulm/javafbp/blob/master/src/main/java/com/jpmorrsn/fbp/examples/networks/TestLoadBalanceWithSubstreams.java .

You will see that there is a LoadBalance process (recently modified so it doesn't split up substreams) feeding some processes, which in turn feed into a SubstreamSensitiveMerge process.  This of course is the classical divergent-convergent topology, which is one of the danger signs for deadlocks. 

I believe that this example will run safely if the longest substream will fit within the connections on the shortest path between the LoadBalance and the SubstreamSensitiveMerge, but, if this rule is violated, you may get deadlocks.  I have replaced the Passthru processes with SlowPass components (Passthru with a random delay), and this rule appears to hold.

I am therefore wondering if it would be better to simply make a blanket statement telling people they can use either of these two components, but not to use both in the same area of the network.

I have done some tests, and the above rule seems to hold, but I am wondering if any interested Group members could think about this, and give me some recommendations. There may be a role for both of these components, but perhaps not in the same network!

Thanks,

Paul

Paul Morrison

не прочитано,
27 окт. 2015 г., 10:38:4327.10.2015
– Flow Based Programming, sistemas...@gmail.com, ged....@gmail.com

This is getting strange!  I discovered a few small bugs in SubstreamSensitiveMerge  and the (new) API call it uses - findInputPortElementWithData.  These apply to JavaFBP, C#FBP, and probably JSFBP as well.  These problems have now been fixed for the first two.  Running various tests with these, using different connection capacities, now very seldom crashes with a deadlock! 

At this time I can't prove that this network will never give a deadlock, which makes me rather uncomfortable!  This network is, after all, a divergent-convergent topology, which is known to cause problems, but it may be that the findInputPortElementWithData function somehow compensates for this.

The above-mentioned API call has shown up rather late in FBP's evolution, in response to the problem caused by combining substreams with LoadBalance, and I am really not sure if I would build a production system using it.  It should be noted that it should very seldom be necessary to recombine streams that have been split by LoadBalance - much better to leave them to go their separate ways in the network!

I am attaching a diagram and would very much appreciate some Group member(s) of a mathematical bent doing some analysis on this...   Remember that substream size can also be manipulated by modifying GenSS in both of these environments.  GenSS can be found in folder examples\components in JavaFBP, and in FBPVerbs in C#FBP.

Thanks in advance, and best regards,

Paul M.
Ответить всем
Отправить сообщение автору
Переслать
0 новых сообщений