The Disruptor High Performance Inter-Thread Messaging Library


Ged Byrne

Apr 13, 2012, 5:25:48 AM
to flow-based-...@googlegroups.com
Here in the London Java Community there's a lot of interest in the LMAX Disruptor framework.

When I finally get time, I'm hoping to investigate building an FBP framework based on Disruptors.

The Business Logic Processor is surrounded by Disruptors - a concurrency component that implements a network of queues that operate without needing locks. During the design process the team concluded that recent directions in high-performance concurrency models using queues are fundamentally at odds with modern CPU design.

The team that developed the Disruptor looked at the latency of a queue-based architecture and found that the biggest bottleneck was the queues themselves, due to contention on the head.  Instead of queues, a ring buffer is used.

The key thing about the Disruptor is that it was designed to have zero contention within the framework. This is achieved by following the single-writer principle—only one thing can ever write to a single piece of data.
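In my own words, the principle might be sketched like this (an illustration only, not LMAX's actual code): since exactly one thread ever writes the field, no lock or compare-and-swap is needed; volatile alone gives reader threads visibility of the latest value.

```java
// A sketch of the single-writer principle (illustrative, not LMAX's code):
// exactly one thread ever writes the field, so no lock or CAS is needed.
class SingleWriterCounter {
    // Written by the owning thread only; volatile so readers see each update.
    private volatile long sequence = 0;

    // Must be called only from the single writer thread.
    void publish() {
        sequence = sequence + 1; // plain read-modify-write is safe: one writer
    }

    // Safe to call from any thread.
    long lastPublished() {
        return sequence;
    }
}
```

With two writers the read-modify-write would be a race; the Disruptor avoids that by giving every piece of data exactly one owning writer.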

The full details are here: http://code.google.com/p/disruptor/

Here's an explanation about how the diamond pattern of consumers is wired up using a DSL: http://mechanitis.blogspot.co.uk/2011/07/dissecting-disruptor-wiring-up.html


Here is the traditional queue-based approach: [diagram not included]
Here is the Disruptor: [diagram not included]

The single ring buffer is passed between the processors rather than the items being passed from queue to queue, removing the need to read and write from the intermediate queues.

In the factory analogy, this is the equivalent of moving from series production to one-piece flow: http://www.thetoyotasystem.com/lean_concepts/one_piece_flow.php

[images: traditional production vs. the Toyota System]

Instead of having buffers of inventory between each work station there is a continuous flow.

I'd love to hear people's thoughts.

Regards, 


Ged



Brad Cox

Apr 13, 2012, 6:45:59 AM
to flow-based-...@googlegroups.com
Wow. This will take some time to digest. 

William la Forge

Apr 13, 2012, 6:46:48 AM
to flow-based-...@googlegroups.com
I've achieved comparable performance but with more flexibility in JActors. On an i5 we can pass 1.2 billion messages per second or create a billion actors per second. Of course, I was inspired by the disruptor pattern--it gave me an idea of what is possible! Turns out, short methods are more important than ring buffers. And I came up with an idea that I named commandeering which allows almost everything to run synchronously--except of course where you need to run things in parallel.


Bill

Paul Morrison

Apr 13, 2012, 10:57:15 AM
to flow-based-...@googlegroups.com
Hi Ged,

The diagram in "Dissecting Disruptor Wiring" looks incredibly complicated!  I hope I don't come across as someone who is wedded to an old technology - but maybe you can explain what the appeal of this approach is!

A couple of specific points - maybe you can clarify...

- I am wondering if they assume queues have to be filled up and then emptied completely...?  Just in case they think that, I should point out that FBP connections _are_ ring structures, where a get pointer chases a put pointer around the ring.  I am not sure if that is made clear in my book - I didn't realize it might be important :-)  A process is only suspended when it tries to send to a full connection, or receive from an empty one.  FBP implementations have always used ring structures - as you know, this goes back to the late '60s.

- The kanbans described in the "one piece flow" article sound to me like FBP connections with capacity of 1, although the Wikipedia article seems to suggest a more flexible set up - again I may be wrong.

- Again, as I keep saying, the "diamond" pattern IMO is deadlock-prone, and the article seems to jump through hoops to support this - I much prefer the simple solution I described in Wikipedia Merge Proposal or, when you have huge amounts of data, Mike Beckerle's approach - see http://groups.google.com/group/flow-based-programming/browse_thread/thread/1904d6a109f69fcc

It seems as if they satisfy the diamond by having really complicated scheduling rules - I am not sure if it scales up...

- How can you have a network of queues without needing locks?  As soon as you have more than one thread accessing anything, I was under the impression you needed locks.  In JavaFBP and C#FBP we basically only need locks on connections (and one other more complex situation); in AMPS we didn't need locks, but then we only had one processor, therefore one thread.

- I guess I don't understand the "only one thing can ever write to a single piece of data" principle - I assume this is different from FBP's ownership rules...?
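For concreteness, the ring connection described in the first point above - a get pointer chasing a put pointer around a fixed array - might be sketched like this (the names are illustrative, not JavaFBP's actual API; a real FBP runtime would suspend the process where this sketch simply returns a failure):

```java
// Sketch of an FBP-style connection as a ring buffer. Illustrative only:
// a get pointer chases a put pointer around a fixed-size array of IPs.
class RingConnection<T> {
    private final Object[] slots;
    private int put = 0;   // next slot to write
    private int get = 0;   // next slot to read
    private int count = 0; // IPs currently in the connection

    RingConnection(int capacity) { slots = new Object[capacity]; }

    // Returns false when full -- a real runtime would suspend the sender.
    synchronized boolean send(T ip) {
        if (count == slots.length) return false;
        slots[put] = ip;
        put = (put + 1) % slots.length;
        count++;
        return true;
    }

    // Returns null when empty -- a real runtime would suspend the receiver.
    @SuppressWarnings("unchecked")
    synchronized T receive() {
        if (count == 0) return null;
        T ip = (T) slots[get];
        slots[get] = null; // let the IP be garbage-collected
        get = (get + 1) % slots.length;
        count--;
        return ip;
    }
}
```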

Regards, and apologies if these are dumb questions!

Paul

PS I assume this Martin Fowler is the UML guy...?  Sounds like he is interested in a lot of the same things that we are!  What does he think of FBP?!

PPS Kilim is very fast, and that's quite FBP-like...







Ged Byrne

Apr 13, 2012, 11:08:26 AM
to flow-based-...@googlegroups.com
Hi Paul,

Good questions.  I believe that the "only one thing can ever write to a single piece of data" principle is actually closely related to FBP's ownership rules.  That's what I'm hoping to explore.

Don't get too tied up to the kanban analogy, I was just using that to tie it back to the running factory metaphor.

Regards, 


Ged



Paul Morrison

Apr 13, 2012, 3:05:38 PM
to flow-based-...@googlegroups.com
Things become clearer!  This article - http://disruptor.googlecode.com/files/Disruptor-1.0.pdf - seems to me to be focusing on the advantages of the ring over, say, a linked list of IPs.  They say they use Doug Lea's ArrayBlockingQueue - IIRC I tried using it, but it didn't give me all the function I needed.  I just use an IP array, which they seem to like!

There are certainly parts of this article that I didn't get my head around, but what (I think) I did glean was that maybe we could be more selective about how we lock connections, e.g. if the consumer is accessing one part of the ring array, and a producer is accessing a different part, maybe we don't have to issue a lock :-)  This seems to be what the "principle" that I asked about is talking about...

Now producers access the put index, and consumers access the get index, so they don't have to lock on either... but IMO you have to know when the connection is empty or full, so maybe the IP count has to be "volatile" - how does this affect performance?
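A sketch of that idea for the single-producer, single-consumer case (illustrative only, not JavaFBP code): the producer owns the put index, the consumer owns the get index, and empty/full are detected by comparing the two indices, so neither side takes a lock. AtomicLong stands in for "volatile" here, which is the safer choice for this kind of cross-thread counter:

```java
import java.util.concurrent.atomic.AtomicLong;

// Single-producer / single-consumer ring with no locks (a sketch).
// Each index has exactly one writer; the other side only reads it.
class SpscRing<T> {
    private final Object[] slots;
    private final AtomicLong putSeq = new AtomicLong(0); // written by producer only
    private final AtomicLong getSeq = new AtomicLong(0); // written by consumer only

    SpscRing(int capacity) { slots = new Object[capacity]; }

    // Producer thread only.
    boolean offer(T item) {
        long p = putSeq.get();
        if (p - getSeq.get() == slots.length) return false; // full
        slots[(int) (p % slots.length)] = item;
        putSeq.lazySet(p + 1); // publish the slot after writing it
        return true;
    }

    // Consumer thread only.
    @SuppressWarnings("unchecked")
    T poll() {
        long g = getSeq.get();
        if (g == putSeq.get()) return null; // empty
        T item = (T) slots[(int) (g % slots.length)];
        getSeq.lazySet(g + 1);
        return item;
    }
}
```

Note this only works because there is exactly one writer per index; with multiple producers or consumers you are back to locks or compare-and-swap loops.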

If I'm right about this, it sounds like we could make the lock logic in the JavaFBP Connection class more sophisticated - and they seem to be saying that it could speed up performance enormously...  Well, we won't know until we try...  The code is on SourceForge - anyone want to give it a shot?  Of course we would have to do very careful testing - Bob, that's your forte...?!

John (Cowan), as you have helped me in the past with locking (for which thanks), does the above make sense?


Paul Morrison

Apr 13, 2012, 3:08:57 PM
to flow-based-...@googlegroups.com
On 13/04/2012 3:05 PM, Paul Morrison wrote:
> maybe the IP count has to be "volatile" -
Sorry, make that AtomicInteger...

Raoul Duke

Apr 13, 2012, 5:33:15 PM
to flow-based-...@googlegroups.com
keen. remind me -- as ever, of course -- to find time to install and use it. :-)

John Cowan

Apr 13, 2012, 10:33:21 PM
to flow-based-...@googlegroups.com
Paul Morrison scripsit:

> John (Cowan), as you have helped me in the past with locking (for
> which thanks), does the above make sense?

As you described it, yes. There is no need to lock the whole buffer, only
when there is contention in part of it.

--
John Cowan co...@ccil.org
At times of peril or dubitation, http://www.ccil.org/~cowan
Perform swift circular ambulation,
With loud and high-pitched ululation.

William la Forge

Apr 14, 2012, 12:33:00 AM
to flow-based-...@googlegroups.com
I'll note that locks, and even the atomic classes, are quite slow. --Bill

William la Forge

Apr 14, 2012, 12:39:21 AM
to flow-based-...@googlegroups.com
Though not as slow as passing data to something that is idle, of course.

Instead of ring buffers, each message source has an array list to buffer output to each destination. The array lists are then sent when the message source has no additional messages to process. I use other techniques for performance as well, but this one is important. In JActor v3 (being released today/Saturday), I can send up to 1.2 billion messages per second.
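The buffering Bill describes might look roughly like this (a generic sketch, not JActor's actual code - all class and method names here are invented): the source just appends to a per-destination list, and hands the whole batch over only when it has nothing left to process, amortizing the cost of the thread hand-off across many messages.

```java
import java.util.ArrayList;
import java.util.List;

// Generic sketch of per-destination output batching (not JActor's API).
class Destination {
    final List<String> inbox = new ArrayList<>();
    void deliver(List<String> batch) { inbox.addAll(batch); }
}

class Source {
    private final Destination dest;
    private final List<String> pending = new ArrayList<>();

    Source(Destination dest) { this.dest = dest; }

    // Cheap: just buffer locally, no cross-thread traffic yet.
    void send(String msg) { pending.add(msg); }

    // Called when the source has no more messages to process:
    // one hand-off delivers the whole batch.
    void flush() {
        if (!pending.isEmpty()) {
            dest.deliver(new ArrayList<>(pending));
            pending.clear();
        }
    }
}
```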

JActor has one semaphore in its threadpool, one atomic reference in each mailbox (mailboxes are shared for added performance), and NO locks.


OK, I'll go away now. My intent is not to troll.

Bill

Paul Morrison

Apr 18, 2012, 4:36:18 PM
to Flow Based Programming


On Apr 13, 10:33 pm, John Cowan <co...@mercury.ccil.org> wrote:

> There is no need to lock the whole buffer, only
> when there is contention in part of it.

It seems my understanding of locks is not adequate to modify JavaFBP
to take advantage of this optimization :-) Anyone want to give it a
try? Alternatively, send me code or pseudocode, with the locks marked
in, showing how the two functions should work...? It seems to me that
things would be simpler with suspend() and resume()... but these
functions have been deprecated!

Paul Morrison

Apr 22, 2012, 12:26:48 PM
to flow-based-...@googlegroups.com
Hi Ged,

I have been playing with my JavaFBP code, and went off after what I think is a red herring - see a couple of incoherent notes I sent off earlier on this thread!  Apologies! 

The LMAX approach is very different from FBP's and IMHO much more complicated!  And they are trying to build a high-accuracy financial system (if I understand this correctly) on top of this logic - I don't think I would dare :-)    As one of them says somewhere, lock-free concurrency is not for the faint of heart! 

Granted their approach is very fast - but we can still probably milk quite a bit of performance out of various FBP approaches - I mentioned Kilim in an earlier post -  see http://www.malhar.net/sriram/kilim/ .  Just picking up on one of their comments about the modulo function (in send and receive) being slow gave me what looks like a significant speed improvement in JavaFBP, and I want to try playing with locks instead of synchronized, to see if that makes a difference also... 
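For reference, the modulo trick is: make the ring capacity a power of two, and replace `index % capacity` with `index & (capacity - 1)`, which avoids a (historically slow) integer division on every send and receive. A tiny check that the two forms agree:

```java
// The power-of-two masking trick: index % capacity == index & (capacity - 1)
// whenever capacity is a power of two and index is non-negative.
class Mask {
    static final int CAPACITY = 8;          // must be a power of two
    static final int MASK = CAPACITY - 1;   // 0b0111

    static int slowIndex(long sequence) { return (int) (sequence % CAPACITY); }
    static int fastIndex(long sequence) { return (int) (sequence & MASK); }
}
```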

I will probably continue trying to understand their approach, but for now I don't see any easy way to enhance JavaFBP or C#FBP to take advantage of their approach... We could throw this out as a challenge:  can you - or any of the readers of this Group - visualize some kind of hybrid approach?

Regards,

Paul

William la Forge

Apr 22, 2012, 10:30:24 PM
to flow-based-...@googlegroups.com
Paul,

I've developed some techniques in JActor which I believe would benefit you. (Current version of JActor moves 1.2 billion messages per second on an i5.)

JActor is fully multi-threaded, but except as mandated by the developer it runs synchronously whenever it can. In particular, when a mailbox sends a message to another mailbox which is empty/idle, the sending mailbox commandeers the receiving mailbox and does the processing on its own thread--providing the receiving mailbox allows it (it is not always a good idea). See  https://github.com/laforge49/JActor/blob/master/src/main/java/org/agilewiki/jactor/events/JAEventQueue.java 

Of course, JActor uses a number of speed enhancing techniques. Another is the use of a very fast alternative to threadpools: https://github.com/laforge49/JActor/blob/master/src/main/java/org/agilewiki/jactor/concurrent/JAThreadManager.java

JActor is entirely lock free. The JAThreadManager does use a semaphore, and JAEventQueue does use an AtomicReference. (Without polling!) But that's about it. And it runs at a speed comparable to the disruptor pattern. (I'm not sure which is faster--but that probably depends on the architecture of the application and the quality of the implementation.) My own biased opinion here is that JActor is easier to use and would as a result generally give you a faster implementation.

Bill

Ged Byrne

Apr 23, 2012, 7:12:24 AM
to flow-based-...@googlegroups.com
Hi Paul,

Sorry for not responding earlier.  I've just started a new job so free time has been scarce.

These are exactly my thoughts: it is more complex than it needs to be.  The framework itself is derived by optimisation based on the Java memory model, and this leads to complexity.

I wonder if the lessons learnt in the framework can be applied to JavaFBP in order to achieve similar speeds with a much simpler developer model.

Thanks to everybody for your thoughts, it has been very interesting.

I hope to be able to blog something about it all in the summer.

Regards, 



Ged

Dan

Apr 24, 2012, 6:12:13 AM
to flow-based-...@googlegroups.com, ged....@gmail.com
The article "One Piece Flow" (Toyota Production System) is misleading the readers. It induces the idea that a person could be released from its job just because there is a difference between the two systems (mass production vs. Toyota system).
Just let's recapitulate what everybody is doing in each system.
Mass Production System
Frank:   wire -> spring;
Dorothy: plastic tube + ink cartridge;
Stephen: add the spring;
Martha:  close the pen with another plastic tube;
Toyota System
Frank:   wire -> spring;
Dorothy: plastic tube + ink cartridge and add the spring;
Martha:  close the pen with another plastic tube;

In the Toyota system Dorothy does Stephen's job entirely. It's just a matter of work (time) balance here, which is not related to either system. One person is simply doing more work so that another can be released.
The question: "What happened to Stephen?" does not make much sense. Nothing is miraculous here.
The kanban contains more items in the first case (mass production system) and only one in the second case (Toyota system).
In programming this translates to: more memory used (buffers) vs. less memory used (no buffers but only one item / variable in memory).
The problem resides in the difference in processing speed of each processing unit (job). In order to keep everybody busy (at full speed) we need to balance the processing speed of each unit. If this speed is variable, for whatever reason, we need buffering (e.g. video streaming). Many times this is just unavoidable.

Even the conclusion is really strange:
"I wish I could demonstrate in this article how it is that the second company with the same number of work force can produce more pens and each costs them less than the costs of the first, traditional example."
"I wish I could"?!! So what stopped him to do so?

The added value is the difference in muda (waste). This is the real advantage indeed.
Disadvantages of the first system (mass production):
- tracking boxes (time, personnel, losing items etc.)
- more space needed (higher rent, more space to clean etc.)
- a mistake replicates to the subcomponents well before an item is completely produced (no error corrections can be taken right away)
Disadvantages of the second system (Toyota system):
- no buffering can induce dead time (something breaks in the production line, then everybody waits)

So the advantage of one system over the other is that it eliminates the other's disadvantages.
Nowadays "just in time" system is really useful because of its advantages. The client can see it in the car commercial centers where the client orders a particular car (specific color, specific features etc.). The car is not in stock but it will be manufactured immediately after the order is made so no cars are in stock, no deposit necessary, no risk to not sell the cars in stock etc. But this is not always an advantage.

My final question is: wasn't the Ford assembly line working with "kanbans" that contained more than one item? Car assembly lines have been working with "kanbans" of only one item for a long time. I don't see the big difference. Is the name "Mass Production System" contrary to the "Toyota System"? I think the Toyota System is a set of improvements complementary to the "Mass Production System", and nothing more than that, whatever the article tries to suggest. Is the "Toyota System" not itself a "Mass Production System"?

Regards,
Dan

Ged Byrne

Apr 24, 2012, 8:53:24 AM
to flow-based-...@googlegroups.com
Hi Dan,

I was just overextending a metaphor in my post.  The link was intended as a generic description of mass vs. flow production. I don't think it's very relevant to FBP, and I regret mentioning it in the original post.  None of the Disruptor developers have made the comparison; it was just me.

I don't think it's a good idea to discuss it any further on this group.

However, I'm always happy to discuss these things personally, so I'll send you an email directly.

Regards, 


Ged

Ged Byrne

Apr 24, 2012, 9:32:40 AM
to flow-based-...@googlegroups.com
Hi again,

You're right that it isn't a good article.  I mainly use it because it has a couple of nice pictures.

It does suffer from the usual problem around Lean: marketing hype.  It tries to sell Toyota as this amazing, unique company that changed the world.  Theory of Constraints has a similar problem.  It becomes more about selling a brand than good engineering.

From an engineering perspective it all comes down to one thing: managing work in progress.  This is often referred to as "Limiting WIP", but I think that's oversimplifying it.  As you say, sometimes there is good reason to introduce a buffer in order to keep the flow level.

I'd just like to specifically address a couple of the points you raise:
In the Toyota system Dorothy makes the Stephen's job entirely. It's just a matter of working (time) balance here that is not related to either system. A person is just doing more work so another one could be released.

The key point, completely missed in the article, is Takt time.  Within the one-piece flow, every person completes their task within the Takt time, a single cycle within the factory.  If Dorothy can add the spring in addition to her other tasks within a single cycle, then it is effectively being done without taking up any extra time.

There is a danger in taking this out of context.  What if the extra work means that Dorothy is now overburdened?  What if the change in working conditions means that she is now so stretched that every time she sneezes or scratches her nose she fails to complete her task within a cycle?  As soon as this happens she will break the flow of the line and the problem will immediately become apparent.  

If there were a buffer, then there would be enough slack for Dorothy to fall behind every now and then and catch up a little later.  Such variation occurs everywhere in the line and is absorbed by the buffers.  This can be seen as an advantage of buffers, but it can also be a disadvantage, because it hides information.  When a manager removes Stephen and forces Dorothy to do more, he appears to have been successful and suffered no consequences.  Then he does the same thing again, and again, and again.  By the time all of the spare capacity of the buffers is finally exhausted, it is too late to fix things.

 Is the name: "Mass Production System"  contrary to the "Toyota System"? I think that Toyota System is a set of improvements complementary to the  "Mass Production System" and nothing more than the article tries to induce. Is not "Toyota System" a "Mass Production System"?

I completely agree with you.  There's a remarkable book from the 50s by a little-known Englishman, Frank Woollard, called "The Principles of Mass and Flow Production."  I blog about it here: http://softwareflow.wordpress.com/2011/07/16/woollardandflowproductionorigins/

In his principles of Flow Production he defines its application as an alternative to mass production:

1. a) Mass production demands mass consumption. b) Flow production requires continuity of demand.

Where there is mass consumption, over-production is not a problem.  Demand can be treated as infinite.  You can sell as many widgets as you can make.  Mass production is fine here because you don't worry about suddenly having unwanted inventory on your hands.

When demand is finite, such as in a niche market or a market that has reached saturation, over-production is a problem.  If demand is ongoing, then flow production allows for the measured release of products to market while minimising the risk of unwanted inventory.

I'd like to hear more about what you think.

Dan

Apr 24, 2012, 4:13:30 PM
to flow-based-...@googlegroups.com, ged....@gmail.com
Thank you, Ged, for your comments. Despite the quality of that article, I think the discussion here was very useful. For me it is very interesting to notice how people think and how they can draw conclusions in a religious manner: if A is true then B should be true, just because the whole process of reasoning contained "scientific" and "precise mathematical equations". It is also interesting to notice how many programmers are victims of such illusions.

Regards,
Dan

Paul Morrison

unread,
Apr 25, 2012, 1:31:05 PM4/25/12
to Flow Based Programming
Hi Bill,

Re Threadpools, I found the following:

"Thread pooling works only when the tasks are relatively short-lived. ...
For tasks that run indefinitely, a normal thread is usually a better choice."

What do you think?

I am still interested in techniques for speeding up the FBP
implementations (JavaFBP on my desktop (4 AMD II 925 processors) only
runs at 770,000 sends or receives / sec), but I suspect that there
may be hidden differences that might make it hard for me to pick up
your techniques, e.g. multiple-producer, single-consumer... How can
we get a handle on this?

What do you actually mean by 1.2 billion messages? 1.2 billion send/receive pairs?

TIA

Paul

On Apr 22, 10:30 pm, William la Forge <laforg...@gmail.com> wrote:
> Paul,
>
> I've developed some techniques in JActor which I believe would benefit you.
> (Current version of JActor moves 1.2 billion messages per second on an i5.)
>
> JActor is fully multi-threaded, but except as mandated by the developer it
> runs synchronously whenever it can. In particular, when a mailbox sends a
> message to another mailbox which is empty/idle, the sending mailbox
> commandeers the receiving mailbox and does the processing on its own
> thread--providing the receiving mailbox allows it (it is not always a good
> idea). See https://github.com/laforge49/JActor/blob/master/src/main/java/org/agi...
>
> Of course, JActor uses a number of speed enhancing techniques. Another is
> the use of a very fast alternative to threadpools: https://github.com/laforge49/JActor/blob/master/src/main/java/org/agi...
>
> JActor is entirely lock free. The JAThreadManager does use a semaphore, and
> JAEventQueue does use an AtomicReference. (Without polling!) But that's
> about it. And it runs at a speed comparable to the disruptor pattern. (I'm
> not sure which is faster--but that probably depends on the architecture of
> the application and the quality of the implementation.) My own
> biased opinion here is that JActor is easier to use and would as a result
> generally give you a faster implementation.
>
> Bill
>
> On Sun, Apr 22, 2012 at 9:56 PM, Paul Morrison <paul.morri...@rogers.com> wrote:
>
> >  Hi Ged,
>
> > I have been playing with my JavaFBP code, and went off after what I think
> > is a red herring - see a couple of incoherent notes I sent off earlier on
> > this thread!  Apologies!
>
> > The LMAX approach is very different from FBP's and IMHO *much* more
> > complicated!  And they are trying to build a high-accuracy financial system
> > (if I understand this correctly) on top of this logic - I don't think I
> > would dare :-)  As one of them says somewhere, lock-free concurrency is
> > not for the faint of heart!
>
> > Granted their approach is *very* fast - but we can still probably milk

John Cowan

unread,
Apr 25, 2012, 1:37:46 PM4/25/12
to flow-based-...@googlegroups.com
Paul Morrison scripsit:

> "Thread pooling works only when the tasks are relatively short-lived. ...
> For tasks that run indefinitely, a normal thread is usually a better choice."

Thread-pooling is just good old CICS, so it works fine for components without
state (your "non-loopers"). When the component needs to keep state, it's not
a win, because the component will typically need to run until the program
is over, and its thread will not get released back to the pool.
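The looper/non-looper distinction can be sketched in plain Java: short stateless tasks fit a pool, while a stateful component that must see a whole stream holds one dedicated thread until end-of-stream. Everything here (the queue, the -1 end-of-stream marker, the names) is illustrative, not JavaFBP or CICS code:

```java
import java.util.concurrent.*;

public class LooperDemo {
    public static void main(String[] args) throws Exception {
        BlockingQueue<Integer> in = new LinkedBlockingQueue<>();
        CompletableFuture<Integer> total = new CompletableFuture<>();

        // Stateful "looper": keeps a running sum across packets, so it
        // occupies its thread for the life of the stream.
        Thread looper = new Thread(() -> {
            int sum = 0;
            try {
                while (true) {
                    int v = in.take();
                    if (v < 0) break;       // -1 signals end of stream
                    sum += v;
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            total.complete(sum);
        });
        looper.start();

        // Stateless, short-lived tasks are a fit for a pool: each one
        // finishes quickly and its thread goes back for reuse.
        ExecutorService pool = Executors.newFixedThreadPool(2);
        for (int i = 1; i <= 10; i++) {
            final int v = i;
            pool.submit(() -> in.add(v));
        }
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        in.add(-1);                          // close the stream

        int sum = total.get(5, TimeUnit.SECONDS);
        System.out.println("sum = " + sum);  // 1+2+...+10 = 55
        looper.join();
    }
}
```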

--
As we all know, civil libertarians are not the friskiest group
around--comes from forever being on the qui vive for the sound of
jack-booted fascism coming down the pike.        --Molly Ivins

John Cowan   co...@ccil.org   http://www.ccil.org/~cowan

Raoul Duke

unread,
Apr 25, 2012, 1:56:37 PM4/25/12
to flow-based-...@googlegroups.com
On Wed, Apr 25, 2012 at 10:37 AM, John Cowan <co...@mercury.ccil.org> wrote:
> Thread-pooling is just good old CICS

thanks! i learn something new every day!
http://preview.tinyurl.com/d8vsfof

Paul Morrison

unread,
Apr 25, 2012, 2:30:55 PM4/25/12
to flow-based-...@googlegroups.com
I wonder if they have finally gotten around to adding FBP to CICS :-)
When we were looking at supporting FBP on OLTP systems, IMS was no
problem, but CICS was a problem just because its internals were so
similar to those of AMPS (the first FBP implementation - early 1970s) .
And yes, CICS is that old! It would have been easy for IBM to add
FBP-style connections to CICS, but of course there was the usual NIH
(Not Invented Here) problem.

William la Forge

unread,
Apr 25, 2012, 9:59:06 PM4/25/12
to flow-based-...@googlegroups.com
Paul,

I'm only doing 600 million send/receive pairs. And that particular timing test uses several threads to fully load the i5, with all message passing synchronous (the default).

JActor also handles multiple sources and/or multiple consumers (of requests), so that is not an issue.

Now when a request is sent, it can be sent either synchronously or asynchronously, transparently. Requests can also be applied to a consumer with a call method, in which case it is always sent synchronously. Call is a convenience to the app dev, but call can only be used when both actors use the same mailbox--which means that both always use the same thread.

Now what I love about 2-way messages is that it is easy to do synchronous message exchanges without having to worry about stack overflow.
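The stack-overflow point can be illustrated with a trampoline: if each "response" enqueues the next request instead of calling it directly, a chain of a million synchronous exchanges runs on a flat stack. This is a generic sketch, not JActor code:

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class TrampolineDemo {
    // Runs a chain of `depth` request/response exchanges iteratively.
    static long chain(int depth) {
        Deque<Runnable> pending = new ArrayDeque<>();
        long[] count = {0};
        Runnable[] step = new Runnable[1];
        // Each handler "responds" by enqueueing the next request
        // rather than calling it, so no stack frames accumulate.
        step[0] = () -> { if (++count[0] < depth) pending.add(step[0]); };
        pending.add(step[0]);
        Runnable r;
        while ((r = pending.poll()) != null) r.run();  // the trampoline
        return count[0];
    }

    public static void main(String[] args) {
        // A directly recursive version of this would blow the stack.
        long n = chain(1_000_000);
        System.out.println("exchanges = " + n);
    }
}
```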

I also use commandeering--it is slower than simple synchronous messaging (when both actors use the same mailbox), but still much faster than asynchronous. Commandeering (of the consumer actor's mailbox by the sending actor's mailbox) can only occur when the consumer mailbox is idle (it is not commandeered and has no active thread). And while a mailbox is commandeered it cannot get a thread of its own. Commandeering is implemented using an atomic reference.
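A hedged sketch of that commandeering idea: the sender atomically grabs an idle mailbox and drains it on its own thread, skipping the asynchronous handoff. The names (Mailbox, send) and the structure are illustrative, not JActor's actual API:

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

public class CommandeerDemo {
    static class Mailbox {
        // null means the mailbox is idle (no owning thread).
        private final AtomicReference<Thread> owner = new AtomicReference<>();
        private final ConcurrentLinkedQueue<Runnable> queue = new ConcurrentLinkedQueue<>();

        void send(Runnable msg) {
            queue.add(msg);
            // CAS succeeds only when the mailbox is idle: commandeer it
            // and process the messages synchronously, on this thread.
            if (owner.compareAndSet(null, Thread.currentThread())) {
                try {
                    Runnable m;
                    while ((m = queue.poll()) != null) m.run();
                } finally {
                    owner.set(null);  // release; the mailbox is idle again
                }
            }
            // else: whichever thread owns the mailbox drains the queue.
        }
    }

    public static void main(String[] args) {
        AtomicInteger counter = new AtomicInteger();
        Mailbox mb = new Mailbox();
        // Single-threaded here, so every send finds the mailbox idle
        // and processes its message synchronously.
        for (int i = 0; i < 100; i++) mb.send(counter::incrementAndGet);
        System.out.println("processed = " + counter.get());  // 100
    }
}
```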

As for long lived (dedicated) threads vs threadpool, with commandeering it is important that most mailboxes not have a dedicated thread. But note that commandeering only works with mailboxes that are mostly idle.

Also, for good performance it is generally accepted that the number of threads shouldn't be much larger than the number of hardware threads--4 on an i5, 8 for an i7, 12 for an i7 Extreme or i9. Of course, threads that are blocked on I/O don't figure into this.

Hope this was helpful.

Bill

Paul Morrison

unread,
Apr 29, 2012, 1:21:57 PM4/29/12
to Flow Based Programming
Thanks, William, I'm still having trouble getting my head around this
stuff :-) The terminology is different too...

I discovered that my test case wasn't using all the processors to
100%, and I wasn't distinguishing between create/drop pairs and
send/receive pairs. My latest results give roughly 2.3 microsecs per
send/receive pair - i.e. a bit over 400,000 send/receive pairs per
second. My test runs on an AMD Phenom II X4 925 (4 processors), and
keeps all 4 processors busy, while yours runs on an Intel i5 (is it 2
or 4 processors?).

You might be interested in looking at Sriram Srinivasan's stuff,
called Kilim - there is quite a good video on
http://www.youtube.com/watch?v=37NaHRE0Sqw. He describes a benchmark
with 9,000,000 messages, running at .45 microsecs per message
(send/receive pair), and he is using a lot of very interesting
techniques, involving IIRC injecting bytecode into Java methods. This
still only gives 2,000,000 messages per sec, although, for a large
number of processes, it is faster than Erlang, which is usually
considered a benchmark for this kind of thing. Now I am very willing
to believe that your stuff is way faster than my implementation, but
I have trouble believing it's 1500 times faster (or 300 times faster
than Kilim)! As I said before, we need to figure out if we are
comparing chalk with chalk, or chalk with cheese, so one possibility
is for you to actually run one of the Appkatas, and then we can tell
if we are talking about the same (or comparable) systems...

Look forward to continuing the discussion,

Paul

BTW does anyone know how Java assigns threads to processors? Another
test case (with way more processes) was only loading my processors at
35%!


On Apr 25, 9:59 pm, William la Forge <laforg...@gmail.com> wrote:
> Paul,
>

Raoul Duke

unread,
Apr 29, 2012, 4:07:51 PM4/29/12
to flow-based-...@googlegroups.com
On Sun, Apr 29, 2012 at 10:21 AM, Paul Morrison
<paul.m...@rogers.com> wrote:
> You might be interested in looking at Sriram Srinivasan's stuff,
> called Kilim - there is quite a good video on
> http://www.youtube.com/watch?v=37NaHRE0Sqw. He describes a benchmark
> with 9,000,000 messages, running at .45 microsecs per message
> (send/receive pair), and he is using a lot of very interesting
> techniques, involving IIRC injecting bytecode into Java methods.
> This still only gives 2,000,000 messages per sec,

Sriram appears to be a smart guy. :-)

But, based on my experience, the use of bytecode injection in most
external Java projects is fundamentally flawed: it seems inevitably
to clash with other libraries and/or systems. So, speaking for
myself, anything that requires bytecode injection is automatically
right out :-)

Paul Morrison

unread,
Apr 29, 2012, 9:00:59 PM4/29/12
to flow-based-...@googlegroups.com
On 29/04/2012 4:07 PM, Raoul Duke wrote:
> Sriram appears to be a smart guy. :-) But, based on my experience, use
> of bytecode injection in most java external projects is fundamentally
> flawed. It seems to inevitably not play well with other libraries
> and/or systems. So speaking for myself, anything that requires
> bytecode injection is automatically right out :-)

Yes, I had that concern too! Maybe he is hoping that Java will
eventually add the tools he needs to the JVM...
--
http://jpaulmorrison.blogspot.com/

William la Forge

unread,
Apr 29, 2012, 11:10:52 PM4/29/12
to flow-based-...@googlegroups.com
Paul,

An i5 is a dual core with 4 hardware threads (total).

Some details on my timings....

First, I count each request/response as two. Depending on the mode of operation, this can be 2 distinct messages or one method call.


There are 3 tests covering the 3 modes of operation (which depends on mailbox configuration, NOT the application code or the API):

1. SharedMailboxTest - a common mailbox is used by the actors, which means everything is synchronous.

2. MailboxTest - different mailboxes are used, but commandeering is allowed. So when the target of the request has an idle mailbox, the target mailbox is acquired and messages are passed synchronously.

3. AsyncMailboxTest - Purely asynchronous message passing, as commandeering is not allowed by the mailbox of the target actor.

Shared mailbox timings:

        //burst size of 1000
        //16 parallel runs of 200000000 messages each.
        //3200000000 messages sent with 4 threads.
        //msgs per sec = 1,005,340,873 (now 185,485,740)
        //.99 nanoseconds per message
        //5 clock cycles per message

Mailbox timings (with commandeering):

        //burst size of 1000
        //16 parallel runs of 20000000 messages each.
        //320000000 messages sent with 4 threads.
        //msgs per sec = 108511359
        //9.2 nanoseconds per message
        //46 clock cycles per message

Fully asynchronous timings:

        //burst size of 1000
        //4 parallel runs of 100000000 messages each.
        //400000000 messages sent with 4 threads.
        //msgs per sec = 43080236
        //23 nanoseconds per message
        //116 clock cycles per message

In practice, I'm finding that I mostly create groups of actors with the same mailbox where most messaging is exchanged. So the billion mps is actually valid. But I'm doing actors in the small here, with actors also serving as data. (I can create a billion actors per second.) I realize that FBP is not actors in the small.

On the other hand the biggest problem I see with async message passing occurs when the target is idle--which can occur when an application gains complexity and specialized components are used to handle special (rare) cases. In this case you aren't grouping your messages and passing them as a block but are sending them 1 at a time so you pay the full price for asynchronicity. This is where commandeering really shines.




We can also skype if that will help, though remember that I'm in India. Send me your skype id if you'd like (lafo...@gmail.com) and we can work out a time. (My skype is normally not running.)

Bill

William la Forge

unread,
Apr 29, 2012, 11:13:20 PM4/29/12
to flow-based-...@googlegroups.com
Oh yes, I just wanted to say that I'm not the only one getting such numbers. The guys who developed the E language tell me that their actors are passing messages at about the same rate. --Bill

Paul Morrison

unread,
May 1, 2012, 12:03:18 PM5/1/12
to flow-based-...@googlegroups.com
Thanks, William, the pdf helps quite a bit! 

I would still like to see how you tackle one of the Appkatas, though.  Definitions can be found in http://geekswithblogs.net/theArchitectsNapkin/archive/2011/06/25/appkata---enter-the-next-level-of-programming-exercises.aspx .

Regards,

Paul

William la Forge

unread,
May 1, 2012, 10:55:55 PM5/1/12
to flow-based-...@googlegroups.com
Paul,

I do not see how these exercises would be helpful.

First, there is currently no GUI support. And that will take a while to come. I'm having a meeting with a dev this weekend to discuss integration with zeromq, but that doesn't apply either.

Second, actors with the same mailbox can simply call methods on each other. The catch is that without an RP instance you can only send events to actors with a different mailbox. But wrapping something like a solution to a CSV problem in an actor would not illustrate anything.

The next big project I plan to work on will be file persistence. The highlight will be a reliable transaction logger that can handle (I hope) a million transactions per second. (Reliable means that a transaction isn't considered logged until it has been flushed.) 

This will be a stepping stone to a high-performance RELIABLE in-memory database, something like Redis but much faster. (The problem with Redis, I've heard, is that the transaction rate drops to almost nothing when you turn on flushing.)
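A minimal sketch of the "not logged until flushed" rule described above: append, force() to disk, and only then return to the caller. Real loggers batch many transactions per force() (group commit) to push the rate up; the class and method names here are invented for illustration:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class TxnLogger implements AutoCloseable {
    private final FileChannel ch;

    TxnLogger(Path path) throws IOException {
        ch = FileChannel.open(path, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.APPEND);
    }

    // Returns only after the whole batch is durable on disk.
    void logBatch(byte[]... txns) throws IOException {
        for (byte[] t : txns) ch.write(ByteBuffer.wrap(t));
        ch.force(false);  // flush file data: the durability point
    }

    @Override
    public void close() throws IOException { ch.close(); }

    public static void main(String[] args) throws IOException {
        Path p = Files.createTempFile("txn", ".log");
        try (TxnLogger log = new TxnLogger(p)) {
            // Two 3-byte records, forced to disk in one batch.
            log.logBatch("t1\n".getBytes(), "t2\n".getBytes());
        }
        System.out.println("logged bytes: " + Files.size(p));  // 6
    }
}
```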

Bill

Paul Morrison

unread,
May 2, 2012, 9:46:17 AM5/2/12
to flow-based-...@googlegroups.com
On 01/05/2012 10:55 PM, William la Forge wrote:
> I do not see how these exercises would be helpful.
The Appkatas are just applications. Can you build an application using
JActor?!
>
> First, there is currently no GUI support.

Just code up the app using text - that's what we did for the first
40-odd years, and it worked fine. As I said in my book, some people
were able to build apps using FBP without ever drawing a picture - even
on paper! I can't do that, but some people do - and very successfully!
> Second, actors with the same mailbox can simply call methods on each
> other. The catch is that without an RP instance you can only send
> events to actors with a different mailbox. But wrapping like a
> solution to a CSV problem in an actor would not illustrate anything.

Sorry, what's RP?

HTH,

Paul


--
http://jpaulmorrison.blogspot.com/

William la Forge

unread,
May 2, 2012, 10:46:56 AM5/2/12
to flow-based-...@googlegroups.com
RP is the callback class used when sending a request or event:  https://github.com/laforge49/JActor/blob/master/src/main/java/org/agilewiki/jactor/RP.java 
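For readers who don't follow the link: an RP is a response-processor callback handed along with a request, so the reply can be delivered when the request completes rather than as a return value. The stand-in below only mimics the general shape; see the linked file for JActor's actual class:

```java
public class RPDemo {
    // Illustrative stand-in for a response-processor callback.
    interface RP<R> {
        void processResponse(R response) throws Exception;
    }

    // A toy "actor" request: the answer goes through the callback,
    // not a return value, so it could arrive later or on another thread.
    static void sendLength(String request, RP<Integer> rp) throws Exception {
        rp.processResponse(request.length());
    }

    public static void main(String[] args) throws Exception {
        int[] out = new int[1];
        sendLength("hello", r -> out[0] = r);
        System.out.println("length = " + out[0]);  // 5
    }
}
```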

An application that makes effective use of JActor would be difficult without additional libraries. The old AgileWiki project was such an application, but it was huge and none of the subsystems were ever really finished.

My current approach is to implement and perfect those subsystems one at a time. I can make a good stab at this because I know the use case very well--I've spent 10 years on this. I also plan to do some interesting things along the way, because AgileWiki is really too big for one person to do, and my hope is that some of those things along the way will attract other participants.

I had planned on working on persistence next, but this weekend I may be meeting with some developers interested in integrating zeromq with JActor/JID, so we'll see what happens. Meanwhile this week I am taking holiday to work on other things like taxes. Joy!

Bill

Ged Byrne

unread,
May 3, 2012, 3:19:11 AM5/3/12
to flow-based-...@googlegroups.com
Hi William,

Can you think of a good simple demonstration that could be implemented in both JActor and FBP that would make sense?

How about a Sudoku solver?  Could this Stage implementation easily be implemented in JActor: http://people.ucalgary.ca/~sillito/stage/sudoku.html ?

I'd love to do some side by side comparisons.

Regards, 


Ged

On 2 May 2012 03:55, William la Forge <lafo...@gmail.com> wrote:
Paul,

I do not see how these exercises would be helpful.

[...]

Paul Morrison

unread,
May 3, 2012, 12:18:39 PM5/3/12
to flow-based-...@googlegroups.com
Thanks Ged,

The Stage implementation certainly looks like it would map on to JActor.  However, I should confess that Peter Norvig's solution matches more closely the way I think about such problems - plus, I wouldn't know how to build a network in (classical) FBP to implement the Sillito solution.   I tend to feel that FBP buys you less as your processes become more strongly connected... and the Sillito solution would presumably involve connecting each of the 81 cells to 24 (?) neighbours...  Plus you need tables describing these relationships - so I'd be inclined to do the whole job using tables...
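As an aside on the neighbour count: each of the 81 cells shares a unit with 8 row cells, 8 column cells, and 4 further box cells, i.e. 20 distinct peers (the box partly overlaps the row and column). A small sketch of the kind of table-driven approach mentioned above, building the peer set per cell:

```java
import java.util.Set;
import java.util.TreeSet;

public class SudokuPeers {
    // All cells sharing a row, column, or 3x3 box with cell c (0..80).
    static Set<Integer> peersOf(int c) {
        int r = c / 9, col = c % 9;
        Set<Integer> p = new TreeSet<>();
        for (int i = 0; i < 9; i++) {
            p.add(r * 9 + i);                                        // same row
            p.add(i * 9 + col);                                      // same column
            p.add(((r / 3) * 3 + i / 3) * 9 + (col / 3) * 3 + i % 3); // same box
        }
        p.remove(c);  // a cell is not its own peer
        return p;
    }

    public static void main(String[] args) {
        // Every cell ends up with exactly 20 peers.
        for (int c = 0; c < 81; c++)
            if (peersOf(c).size() != 20)
                throw new AssertionError("cell " + c);
        System.out.println("peers per cell: " + peersOf(0).size());  // 20
    }
}
```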

Still, I'd be interested to see what William comes up with, and it would be interesting to compare its performance with some of the other implementations described at the bottom of Peter's article....

Regards,

Paul

William la Forge

unread,
May 3, 2012, 9:16:26 PM5/3/12
to flow-based-...@googlegroups.com
I would simply implement everything as ordinary objects with the cells calling each other directly and the master object waking each cell in turn repeatedly until there are no changes. And then change all these classes to be subclasses of JLPCActor and instantiate them all with the same mailbox. --Not very interesting.

Alternately, I could implement them all as subclasses of JLPCActor, instantiate them all with asynchronous mailboxes and send asynchronous requests instead of just using method calls. --Not very fast.

Again, this doesn't look like a good choice for comparison, except to note that there are two possible implementations.

Sorry.

Bill

William la Forge

unread,
May 3, 2012, 10:07:33 PM5/3/12
to flow-based-...@googlegroups.com
Oh yes, note that the second solution becomes effectively the same as the first (only a tad slower) if all the actors are instantiated with the same mailbox. --b

Tom Young

unread,
May 3, 2012, 10:11:23 PM5/3/12
to flow-based-...@googlegroups.com
A DFD network definition  might look something like this:

%NP = 81;

PROCESS S supervisor $(%NP);
PROCESS D disp ;
PROCESS DRAW draw;
PROCESS J Join $(%NP);
PROCESS SPLIT Split "2" ;

%I = %NP;
WHILE %I > 0 
    { $I = $(%I);
      &CELL = &(CELL * $I);
      PROCESS  &CELL cell;
      &JIN =  &(in * $I);
      &DOUT = &(out * $I);
      (D)&DOUT => IN(&CELL);
      (&CELL)OUT => &JIN(J); 
      %I = %I - 1;
    };

(SPLIT)out2 => IN(DRAW);
(SPLIT)out1 ==> IN(S);
(J)out => IN(S)OUT => in(D);

Which expands to this:

        (CELL1 localhost:cell )OUT => in1(J localhost:Join 81);
        (CELL2 localhost:cell )OUT => in2(J localhost:Join 81);
        (CELL3 localhost:cell )OUT => in3(J localhost:Join 81);
        (CELL4 localhost:cell )OUT => in4(J localhost:Join 81);
        (CELL5 localhost:cell )OUT => in5(J localhost:Join 81);
        (CELL6 localhost:cell )OUT => in6(J localhost:Join 81);
        (CELL7 localhost:cell )OUT => in7(J localhost:Join 81);
        (CELL8 localhost:cell )OUT => in8(J localhost:Join 81);
        (CELL9 localhost:cell )OUT => in9(J localhost:Join 81);
        (CELL10 localhost:cell )OUT => in10(J localhost:Join 81);
        (CELL11 localhost:cell )OUT => in11(J localhost:Join 81);
        (CELL12 localhost:cell )OUT => in12(J localhost:Join 81);
        (CELL13 localhost:cell )OUT => in13(J localhost:Join 81);
        (CELL14 localhost:cell )OUT => in14(J localhost:Join 81);
        (CELL15 localhost:cell )OUT => in15(J localhost:Join 81);
        (CELL16 localhost:cell )OUT => in16(J localhost:Join 81);
        (CELL17 localhost:cell )OUT => in17(J localhost:Join 81);
        (CELL18 localhost:cell )OUT => in18(J localhost:Join 81);
        (CELL19 localhost:cell )OUT => in19(J localhost:Join 81);
        (CELL20 localhost:cell )OUT => in20(J localhost:Join 81);
        (CELL21 localhost:cell )OUT => in21(J localhost:Join 81);
        (CELL22 localhost:cell )OUT => in22(J localhost:Join 81);
        (CELL23 localhost:cell )OUT => in23(J localhost:Join 81);
        (CELL24 localhost:cell )OUT => in24(J localhost:Join 81);
        (CELL25 localhost:cell )OUT => in25(J localhost:Join 81);
        (CELL26 localhost:cell )OUT => in26(J localhost:Join 81);
        (CELL27 localhost:cell )OUT => in27(J localhost:Join 81);
        (CELL28 localhost:cell )OUT => in28(J localhost:Join 81);
        (CELL29 localhost:cell )OUT => in29(J localhost:Join 81);
        (CELL30 localhost:cell )OUT => in30(J localhost:Join 81);
        (CELL31 localhost:cell )OUT => in31(J localhost:Join 81);
        (CELL32 localhost:cell )OUT => in32(J localhost:Join 81);
        (CELL33 localhost:cell )OUT => in33(J localhost:Join 81);
        (CELL34 localhost:cell )OUT => in34(J localhost:Join 81);
        (CELL35 localhost:cell )OUT => in35(J localhost:Join 81);
        (CELL36 localhost:cell )OUT => in36(J localhost:Join 81);
        (CELL37 localhost:cell )OUT => in37(J localhost:Join 81);
        (CELL38 localhost:cell )OUT => in38(J localhost:Join 81);
        (CELL39 localhost:cell )OUT => in39(J localhost:Join 81);
        (CELL40 localhost:cell )OUT => in40(J localhost:Join 81);
        (CELL41 localhost:cell )OUT => in41(J localhost:Join 81);
        (CELL42 localhost:cell )OUT => in42(J localhost:Join 81);
        (CELL43 localhost:cell )OUT => in43(J localhost:Join 81);
        (CELL44 localhost:cell )OUT => in44(J localhost:Join 81);
        (CELL45 localhost:cell )OUT => in45(J localhost:Join 81);
        (CELL46 localhost:cell )OUT => in46(J localhost:Join 81);
        (CELL47 localhost:cell )OUT => in47(J localhost:Join 81);
        (CELL48 localhost:cell )OUT => in48(J localhost:Join 81);
        (CELL49 localhost:cell )OUT => in49(J localhost:Join 81);
        (CELL50 localhost:cell )OUT => in50(J localhost:Join 81);
        (CELL51 localhost:cell )OUT => in51(J localhost:Join 81);
        (CELL52 localhost:cell )OUT => in52(J localhost:Join 81);
        (CELL53 localhost:cell )OUT => in53(J localhost:Join 81);
        (CELL54 localhost:cell )OUT => in54(J localhost:Join 81);
        (CELL55 localhost:cell )OUT => in55(J localhost:Join 81);
        (CELL56 localhost:cell )OUT => in56(J localhost:Join 81);
        (CELL57 localhost:cell )OUT => in57(J localhost:Join 81);
        (CELL58 localhost:cell )OUT => in58(J localhost:Join 81);
        (CELL59 localhost:cell )OUT => in59(J localhost:Join 81);
        (CELL60 localhost:cell )OUT => in60(J localhost:Join 81);
        (CELL61 localhost:cell )OUT => in61(J localhost:Join 81);
        (CELL62 localhost:cell )OUT => in62(J localhost:Join 81);
        (CELL63 localhost:cell )OUT => in63(J localhost:Join 81);
        (CELL64 localhost:cell )OUT => in64(J localhost:Join 81);
        (CELL65 localhost:cell )OUT => in65(J localhost:Join 81);
        (CELL66 localhost:cell )OUT => in66(J localhost:Join 81);
        (CELL67 localhost:cell )OUT => in67(J localhost:Join 81);
        (CELL68 localhost:cell )OUT => in68(J localhost:Join 81);
        (CELL69 localhost:cell )OUT => in69(J localhost:Join 81);
        (CELL70 localhost:cell )OUT => in70(J localhost:Join 81);
        (CELL71 localhost:cell )OUT => in71(J localhost:Join 81);
        (CELL72 localhost:cell )OUT => in72(J localhost:Join 81);
        (CELL73 localhost:cell )OUT => in73(J localhost:Join 81);
        (CELL74 localhost:cell )OUT => in74(J localhost:Join 81);
        (CELL75 localhost:cell )OUT => in75(J localhost:Join 81);
        (CELL76 localhost:cell )OUT => in76(J localhost:Join 81);
        (CELL77 localhost:cell )OUT => in77(J localhost:Join 81);
        (CELL78 localhost:cell )OUT => in78(J localhost:Join 81);
        (CELL79 localhost:cell )OUT => in79(J localhost:Join 81);
        (CELL80 localhost:cell )OUT => in80(J localhost:Join 81);
        (CELL81 localhost:cell )OUT => in81(J localhost:Join 81);
        (SPLIT localhost:Split 2)out1 => IN(S localhost:supervisor 81);
        (SPLIT localhost:Split 2)out2 => IN(DRAW localhost:draw );
        (J localhost:Join 81)out => IN(S localhost:supervisor 81);
        (D localhost:disp )out1 => IN(CELL1 localhost:cell );
        (D localhost:disp )out2 => IN(CELL2 localhost:cell );
        (D localhost:disp )out3 => IN(CELL3 localhost:cell );
        (D localhost:disp )out4 => IN(CELL4 localhost:cell );
        (D localhost:disp )out5 => IN(CELL5 localhost:cell );
        (D localhost:disp )out6 => IN(CELL6 localhost:cell );
        (D localhost:disp )out7 => IN(CELL7 localhost:cell );
        (D localhost:disp )out8 => IN(CELL8 localhost:cell );
        (D localhost:disp )out9 => IN(CELL9 localhost:cell );
        (D localhost:disp )out10 => IN(CELL10 localhost:cell );
        (D localhost:disp )out11 => IN(CELL11 localhost:cell );
        (D localhost:disp )out12 => IN(CELL12 localhost:cell );
        (D localhost:disp )out13 => IN(CELL13 localhost:cell );
        (D localhost:disp )out14 => IN(CELL14 localhost:cell );
        (D localhost:disp )out15 => IN(CELL15 localhost:cell );
        (D localhost:disp )out16 => IN(CELL16 localhost:cell );
        (D localhost:disp )out17 => IN(CELL17 localhost:cell );
        (D localhost:disp )out18 => IN(CELL18 localhost:cell );
        (D localhost:disp )out19 => IN(CELL19 localhost:cell );
        (D localhost:disp )out20 => IN(CELL20 localhost:cell );
        (D localhost:disp )out21 => IN(CELL21 localhost:cell );
        (D localhost:disp )out22 => IN(CELL22 localhost:cell );
        (D localhost:disp )out23 => IN(CELL23 localhost:cell );
        (D localhost:disp )out24 => IN(CELL24 localhost:cell );
        (D localhost:disp )out25 => IN(CELL25 localhost:cell );
        (D localhost:disp )out26 => IN(CELL26 localhost:cell );
        (D localhost:disp )out27 => IN(CELL27 localhost:cell );
        (D localhost:disp )out28 => IN(CELL28 localhost:cell );
        (D localhost:disp )out29 => IN(CELL29 localhost:cell );
        (D localhost:disp )out30 => IN(CELL30 localhost:cell );
        (D localhost:disp )out31 => IN(CELL31 localhost:cell );
        (D localhost:disp )out32 => IN(CELL32 localhost:cell );
        (D localhost:disp )out33 => IN(CELL33 localhost:cell );
        (D localhost:disp )out34 => IN(CELL34 localhost:cell );
        (D localhost:disp )out35 => IN(CELL35 localhost:cell );
        (D localhost:disp )out36 => IN(CELL36 localhost:cell );
        (D localhost:disp )out37 => IN(CELL37 localhost:cell );
        (D localhost:disp )out38 => IN(CELL38 localhost:cell );
        (D localhost:disp )out39 => IN(CELL39 localhost:cell );
        (D localhost:disp )out40 => IN(CELL40 localhost:cell );
        (D localhost:disp )out41 => IN(CELL41 localhost:cell );
        (D localhost:disp )out42 => IN(CELL42 localhost:cell );
        (D localhost:disp )out43 => IN(CELL43 localhost:cell );
        (D localhost:disp )out44 => IN(CELL44 localhost:cell );
        (D localhost:disp )out45 => IN(CELL45 localhost:cell );
        (D localhost:disp )out46 => IN(CELL46 localhost:cell );
        (D localhost:disp )out47 => IN(CELL47 localhost:cell );
        (D localhost:disp )out48 => IN(CELL48 localhost:cell );
        (D localhost:disp )out49 => IN(CELL49 localhost:cell );
        (D localhost:disp )out50 => IN(CELL50 localhost:cell );
        (D localhost:disp )out51 => IN(CELL51 localhost:cell );
        (D localhost:disp )out52 => IN(CELL52 localhost:cell );
        (D localhost:disp )out53 => IN(CELL53 localhost:cell );
        (D localhost:disp )out54 => IN(CELL54 localhost:cell );
        (D localhost:disp )out55 => IN(CELL55 localhost:cell );
        (D localhost:disp )out56 => IN(CELL56 localhost:cell );
        (D localhost:disp )out57 => IN(CELL57 localhost:cell );
        (D localhost:disp )out58 => IN(CELL58 localhost:cell );
        (D localhost:disp )out59 => IN(CELL59 localhost:cell );
        (D localhost:disp )out60 => IN(CELL60 localhost:cell );
        (D localhost:disp )out61 => IN(CELL61 localhost:cell );
        (D localhost:disp )out62 => IN(CELL62 localhost:cell );
        (D localhost:disp )out63 => IN(CELL63 localhost:cell );
        (D localhost:disp )out64 => IN(CELL64 localhost:cell );
        (D localhost:disp )out65 => IN(CELL65 localhost:cell );
        (D localhost:disp )out66 => IN(CELL66 localhost:cell );
        (D localhost:disp )out67 => IN(CELL67 localhost:cell );
        (D localhost:disp )out68 => IN(CELL68 localhost:cell );
        (D localhost:disp )out69 => IN(CELL69 localhost:cell );
        (D localhost:disp )out70 => IN(CELL70 localhost:cell );
        (D localhost:disp )out71 => IN(CELL71 localhost:cell );
        (D localhost:disp )out72 => IN(CELL72 localhost:cell );
        (D localhost:disp )out73 => IN(CELL73 localhost:cell );
        (D localhost:disp )out74 => IN(CELL74 localhost:cell );
        (D localhost:disp )out75 => IN(CELL75 localhost:cell );
        (D localhost:disp )out76 => IN(CELL76 localhost:cell );
        (D localhost:disp )out77 => IN(CELL77 localhost:cell );
        (D localhost:disp )out78 => IN(CELL78 localhost:cell );
        (D localhost:disp )out79 => IN(CELL79 localhost:cell );
        (D localhost:disp )out80 => IN(CELL80 localhost:cell );
        (D localhost:disp )out81 => IN(CELL81 localhost:cell );
        (S localhost:supervisor 81)OUT => in(D localhost:disp );
# 1 Partition, 166 DataFlows, 86 Processes, 6 Components, 331 Gates. 81 Cycles

This avoids each CELL having to compute its neighbours; instead the supervisor determines which CELL process
gets which message, attaches its number to the message, and sends it to the dispatcher (D).  The Join and Split
components already exist, leaving only 4 components to be written.  The cell component could be written in Ruby (and that code is available online).

        Cheers,

                    -Tom
--
Thomas W. Young, Founding Member
Stamford Data, LLC
47 Mitchell Street, Stamford,  CT  06902

Phone: (203)539-1278
Email: TomY...@stamforddata.com


Tom Young

unread,
May 3, 2012, 10:21:53 PM5/3/12
to flow-based-...@googlegroups.com
Correction:  the dispatcher component also exists, so only three components need to be created.


On Thu, May 3, 2012 at 10:11 PM, Tom Young <stamfo...@twyoung.com> wrote:
A DFD network definition  might look something like this:

%NP = 81;

PROCESS S supervisor $(%NP);
           . . .
 
--
"...the shell syntax required to initiate a coprocess and connect its input and output to other processes is quite contorted..."
W. Richard Stevens [Advanced Programming in the Unix Environment, p441]

Paul Morrison

unread,
May 5, 2012, 10:32:51 AM5/5/12
to flow-based-...@googlegroups.com
Hi Bill,

I have to agree with you - this problem doesn't seem like a very good
choice for comparison - what about one or more of the Appkatas, as I
suggested - or perhaps something along the lines of Henri Bergius'
NoFlo examples...?

Regards,

Paul

William la Forge

unread,
May 5, 2012, 12:12:43 PM5/5/12
to flow-based-...@googlegroups.com
Paul,

The problem with a one-person team is that you need to keep a tight focus. Of course that means opportunities lost, but tough judgement calls are a necessity.

I've been working on the same project for about 10 years now. A very substantial project involving countless rewrites, where no part of it was ever really finished. Needless to say, there was very little interest. But I did develop a good perspective of the problem domain that I want to work in, and a lot of experience in a number of related technologies. 

A few months back I decided to take a different approach. Pick one thing, polish and finish it and then pick another. I started with JActor and finished it. And was happy that the performance was more than respectable, because unless you learn JActor you can only make limited use of the rest of what I am working on. I'm very much aware of acceptance barriers. But there is some interest. And I suspect as I add to this first thing, the interest may increase.

I recently finished the next thing, JID--incremental deserialization/reserialization. Its performance is also more than reasonable and I'm happy for that. Still it is a long way from a set of capabilities that one can easily build an application on.

Today I made a preliminary release of the next thing, JFile. My next step now is to add to JFile a high-throughput transaction logger. I am hopeful that the numbers will be good, though I'm pretty sure that I will achieve something less than a logging rate (with force/flush) of 1,000,000 transactions per second. Still I am hopeful that the final numbers will be good, as this will place an upper limit on subsequent work. But I am very focused on writing this transaction logger. I need to know what kind of performance I can deliver.
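For context, the usual trick for pushing an fsync-bound logging rate up is group commit: write a whole batch of transactions, then force once. A minimal Java sketch of the idea (illustrative only, not JFile's actual API):

```java
// Group-commit sketch: transactions accumulated in a batch are written
// and forced to disk with a single fsync, amortising its cost across
// the batch. All names here are illustrative.
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

class GroupCommitLog implements AutoCloseable {
    private final FileChannel ch;

    GroupCommitLog(Path file) throws IOException {
        ch = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.APPEND);
    }

    // One write + one force for the whole batch, not one per transaction.
    void logBatch(List<String> txns) throws IOException {
        StringBuilder sb = new StringBuilder();
        for (String t : txns) sb.append(t).append('\n');
        ch.write(ByteBuffer.wrap(sb.toString().getBytes(StandardCharsets.UTF_8)));
        ch.force(false); // the batch is durable only once this returns
    }

    @Override public void close() throws IOException { ch.close(); }

    // Demo: log three transactions with a single fsync, return lines written.
    public static int demo() {
        try {
            Path log = Files.createTempFile("txn", ".log");
            try (GroupCommitLog l = new GroupCommitLog(log)) {
                l.logBatch(List.of("tx1", "tx2", "tx3"));
            }
            return Files.readAllLines(log).size();
        } catch (IOException e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // 3
    }
}
```

The bigger the batch per force, the closer the per-transaction cost gets to a plain buffered write.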

If I can deliver a good number on this logger, perhaps I can attract other developers and build a team. A one-person technology isn't going to go very far. And that is also a reason why I've worked so hard on the wiki pages, and why I've started going to a number of meetups around Bangalore. I gave one talk (twice) at the Bangalore Open Java User's group. In 4-6 weeks I will be giving another talk for a meetup called "revenge of the nerds" and I will publish the slides for those as well. I've also given a 5-min lightning talk at the semi-annual stackoverflow meetup here in Bangalore.

After JFile, the next piece will be an in-memory database. I've talked with a number of developers about Redis and have learned that when force/flush is turned on--necessary for transaction durability--performance is less than reasonable. My hope is that I can deliver something with comparable capability but which can deliver both good throughput and transaction durability. Of course, to deliver a reasonable replacement for Redis, I'm going to need help.

So that's my short term focus. To deliver a usable in-memory database. And to do so by the end of the year. And to start building a team around JActor and these other technologies that can help deliver and support such a product. I'd say that my chances of success in this are going to be dependent on a lot of other people's good will. Which is why it is so tough to decide what to work on and what to let fall by the wayside.

Now to take my participation on this thread back to my initial comments, I was saying that I've developed a number of optimizations in JActor that allow me to deliver some interesting performance. And I am happy to discuss and explain them in the hopes that some of them might be used in other systems as well. Part of my motivation here is that the more I explain and discuss these ideas, the better I get at doing it. And if I get good enough, perhaps I can lower the barriers of acceptance for the things I have done.

I will also note that the Disruptor pattern was an inspiration to me, that there was a lot of room for improved performance. Naturally I read a number of articles on this technology and thought long and hard about it while working on JActor. Which is why I chimed in on this thread in the first place. :-)

Bill

Paul Morrison

unread,
May 7, 2012, 10:04:21 AM5/7/12
to flow-based-...@googlegroups.com
Hi Bill,

On 05/05/2012 12:12 PM, William la Forge wrote:
> The problem with a one-person team is that you need to keep a tight
> focus. Of course that means opportunities lost, but tough judgement
> calls are a necessity.

This seems counter-intuitive to me! With a one-person team, you have
much more freedom to experiment, to go down dead ends, and even
rearchitect when necessary. The moment you bring in another person,
even more so when you are the chief architect for a group of 60 people,
as I was, then you really have to keep focus :-)
>
> I've been working on the same project for about 10 years now. A very
> substantial project involving countless rewrites...

I may have missed this in one of your earlier posts, but what is the
area of application of this project - or aren't you allowed to say? I
am unclear what the motivation for JActor is: is it to take advantage of
multiple processors, to reduce development times, to improve
maintainability, or to ... ? In the case of FBP, when I first got a
feeling for its potential, I saw it as a way to vastly improve
programmer productivity. Later on, I realized that, even more, it
improved maintainability (or as Ralf calls it, evolvability); then still
later, as a way of taking advantage of multiple processors, and in fact
as a single application view that spans the range from mini to maxi :-)
And remember that the very first implementation was on single-processor,
IBM hardware (the IBM 360), when we could hardly imagine having multiple
processors in the same box...

I'd be interested in your views on this - or anyone's!

Regards,

Paul

William la Forge

unread,
May 7, 2012, 11:58:43 AM5/7/12
to flow-based-...@googlegroups.com
It's a one-man team for the moment, but there is some local interest here in Bangalore, so that might change (again).

The need for a tight focus is because I've set goals and a timeline. I have a body of work that I want to deliver while I am still not too old to do it and enjoy some of the fruits of my work. Also, I'm married. :-)

I have a mentor in Boston, Norman Kashdan. He introduced me to a theory of knowledge systems (Rolonics Theory) that he developed (unpublished). I've been trying to implement that for the last 10 years. I call the project AgileWiki. That's the big project. Nothing secret. All open source.

I used to talk about Rolonics Theory a lot. But it is really difficult to grasp, because it is so damn intuitively obvious--once you can hear the sound of one hand clapping as it were. And besides being difficult to implement with current technology, it always sounds like smoke & mirrors when I talk about it.

I've had a lot of partial successes over the years, but as I said before too much of the code was never finished. And now I'm implementing AND FINISHING the parts as I go. But it is going to take years.

The motivation for JActor, JID and the new JFile projects is that they are very low-level parts of AgileWiki. I'm implementing them because I need them, though they may have a further impact beyond the AgileWiki project. My hope, anyway.

My short term goal right now is an in-memory database. Then to expand it to a COW (copy-on-write) based hierarchical database. And then integrate it with a web server. And then slowly to reintroduce Rolonics in the form of a Wiki, something like http://wagn.org/

Bill

ifknot

unread,
May 19, 2012, 7:19:16 PM5/19/12
to flow-based-...@googlegroups.com, ged....@gmail.com
IMVHO, and from a C++ context with respect to performance...

A ring buffer is the intuitive choice for inter-thread communication, avoiding all the overheads of list node creation/deletion (okay, you pay a fixed-size price).

Further that it should be cache-aligned and preferably from a scalable allocator.

Then it seems that filling the buffer up before you hand it over to the consumer thread prevents cache corruption.

Go further and double-buffer your pipe so that the consumer is busy draining one ring while your producer is filling up the other, waiting to swap!

If your pipe only ever has one producer and one consumer, it simplifies your concurrency issues, and then the only thing you have to 'lock' is the exchange of ring buffer pointers, which one can do with atomic variables.

If your consumer/producer rates aren't well matched and you want to keep the benefits of 1-to-1 producer-consumer pipes, you can inverse multiplex into 1+n consumers, for example, down your double-buffered pipes.

Don't know how well that can translate to Java?
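One rough translation of the handoff: in Java the buffer-pointer exchange can be done with a single AtomicReference (a sketch only; names, sizes and the batching policy are illustrative):

```java
// Sketch of a double-buffered single-producer/single-consumer handoff:
// the producer fills one buffer while the consumer drains the other, and
// the only synchronised operation is an atomic exchange of buffer
// references. Illustrative names throughout.
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

class DoubleBufferPipe {
    // Holds a filled buffer awaiting the consumer; null means "taken".
    private final AtomicReference<List<Integer>> full = new AtomicReference<>();
    private List<Integer> producing = new ArrayList<>();

    // Hand over only when the consumer has taken the previous batch,
    // so the two threads never touch the same buffer at the same time.
    public boolean publish(List<Integer> batch) {
        return full.compareAndSet(null, batch);
    }

    // Producer side: fill a whole batch before handing it over.
    public void produce(int value, int batchSize) {
        producing.add(value);
        if (producing.size() >= batchSize && publish(producing)) {
            producing = new ArrayList<>(); // switch to filling the other buffer
        }
    }

    // Consumer side: take the entire filled buffer in one atomic swap.
    public List<Integer> drain() {
        return full.getAndSet(null);
    }

    public static void main(String[] args) {
        DoubleBufferPipe pipe = new DoubleBufferPipe();
        for (int i = 0; i < 4; i++) pipe.produce(i, 4);
        System.out.println(pipe.drain()); // [0, 1, 2, 3]
    }
}
```

A real implementation would add a wait/backoff strategy for when `publish` fails, and would preallocate both buffers rather than allocating fresh ones.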

ATB

ifknot

On Friday, 13 April 2012 10:25:48 UTC+1, Ged Byrne wrote:
Here in the London Java Community there's a lot of interest in the LMAX Disruptor framework.

When I finally get time, I'm hoping to investigate building an FBP framework based on Disruptors.

 The Business Logic Processor is surrounded by Disruptors - a concurrency component that implements a network of queues that operate without needing locks. During the design process the team concluded that recent directions in high-performance concurrency models using queues are fundamentally at odds with modern CPU design.

The team that developed the Disruptor looked at the latency of a queue-based architecture and found that the biggest bottleneck was the queues themselves, due to contention on the head.  Instead of a queue, a ring buffer is used.

The key thing about the Disruptor is that it was designed to have zero contention within the framework. This is achieved by following the single-writer principle—only one thing can ever write to a single piece of  data.

The full details are here: http://code.google.com/p/disruptor/

Here's an explanation about how the diamond pattern of consumers is wired up using a DSL: http://mechanitis.blogspot.co.uk/2011/07/dissecting-disruptor-wiring-up.html


Here is the traditional queue-based approach:
Here is the Disruptor:

The single ring buffer is passed between the processors rather than the items being passed from queue to queue, removing the need to read and write from the intermediate queues.

In the factory analogy, this is the equivalent of moving from series production to one-piece flow: http://www.thetoyotasystem.com/lean_concepts/one_piece_flow.php

[Diagrams: traditional production vs. the Toyota System]

Instead of having buffers of inventory between each work station there is a continuous flow.

I'd love to hear people's thoughts.

Regards, 


Ged




William la Forge

unread,
May 19, 2012, 10:42:44 PM5/19/12
to flow-based-...@googlegroups.com, ged....@gmail.com
In JFile I built a flow-constrained transaction pipeline, where backpressure is used to determine how many transactions to put into each block being passed down the pipeline.

The pipeline consists of an aggregator, a serializer, a logger and a transaction processor, where the logger does an fsync before passing the block on to the transaction processor.

Built the whole thing using JActor actors and JID serialization. It is all Java and runs at 1.3 million transactions per second on my Vaio laptop, which has an i5.

I was very much inspired by the Disruptor pattern, but I don't use ring buffers. Instead I use ArrayList for transaction blocks and set a generous initial capacity--which gives me the same effect.
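A minimal sketch of this kind of backpressure-driven aggregation (illustrative only, not the actual JFile code): the aggregator blocks for one transaction, then sweeps up everything else already queued into the same block, so block size adapts to how busy the downstream stages are.

```java
// "Backpressure decides the batch size": take everything that queued up
// while the downstream stage was busy and make that one block. Under
// light load blocks stay small (low latency); under heavy load they
// grow (high throughput). Names are illustrative.
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class BackpressureAggregator {
    private final BlockingQueue<String> in;

    BackpressureAggregator(BlockingQueue<String> in) { this.in = in; }

    // Block until at least one transaction arrives, then drain whatever
    // else is already waiting into the same block without blocking again.
    public List<String> nextBlock() {
        List<String> block = new ArrayList<>();
        try {
            block.add(in.take());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return block;
        }
        in.drainTo(block);
        return block;
    }

    public static void main(String[] args) {
        BlockingQueue<String> q = new ArrayBlockingQueue<>(64);
        // Three transactions arrived while the downstream stage was busy...
        q.add("tx1"); q.add("tx2"); q.add("tx3");
        // ...so they all go into one block (and later share one fsync).
        System.out.println(new BackpressureAggregator(q).nextBlock().size()); // 3
    }
}
```

The bounded queue is what provides the backpressure itself: when it fills, producers block until the pipeline catches up.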


Bill

Samuel Lampa

unread,
Jul 6, 2016, 8:30:02 AM7/6/16
to Flow Based Programming, ged....@gmail.com
On Friday, April 13, 2012 at 11:25:48 AM UTC+2, Ged Byrne wrote:
Here in the London Java Community there's a lot of interest in the LMAX Disruptor framework.

Just starting to read up on this now .... 

One thought is, I wonder how it compares with FastFlow [1] [2], which is written in C++ and, according to [1], is:
"[...] a parallel programming framework for multi-core platforms based upon non-blocking lockfree/fencefree synchronization mechanisms"

Maybe a topic for a separate thread though...



(Continuing to read ...)

// Samuel