FBP, FBP-inspired, FBP-like: multithreading options

280 views
Skip to first unread message

Paul Morrison

unread,
Aug 13, 2017, 10:02:31 AM8/13/17
to Flow Based Programming
There have been several attempts at classifying FBP and FBP-related systems, and IMO most fall down because there are so many variables. 

So I thought I would just try to list the options in just one dimension: multithreading.  This list is probably not definitive - it is more intended to spark discussion...

As I see it, there are three (or possibly four) possibilities:

  - multiple "threads" - can support multiple cores - called "processes" in Erlang...?

  - coroutines or green threads - requires multiple stacks - supports one core only

      - green threads, with sends/receives only at top level - can be done with one stack?  I believe some JavaScript services (yield?) have a "top frame only" restriction...?

  - FBP-inspired: send/receive implemented as call - I believe this is what NoFlo does...?

  - other flavours of FBP-inspired???


There may be other options - and the situation is complicated by the fact that there does not seem to be agreement on terminology across systems.  When I looked up "unit of concurrency" for my book, I found the following:

process (Erlang)
actor (actor theory)
agent
transaction
thread
operator (MIT, Expressor, et al.)
vat (E language)
island (Tweak)
task (Ada)
resource (Minuet)
stage (John Hartmann’s CMS Pipelines)

Plus:
coroutine (Conway)

Comments, anyone?






Paul Tarvydas

unread,
Aug 14, 2017, 4:25:37 PM8/14/17
to flow-based-...@googlegroups.com
On 2017-08-13 10:02 AM, Paul Morrison wrote:
...
actor (actor theory)
In https://www.youtube.com/watch?v=7erJ1DV_Tlo Hewitt says that Actor theory was invented as a way to reason about programs.

I haven't tried, but from how he describes it, it looks like Actors could be used to reason about (explain the semantics of) FBP.

pt

Sam Watkins

unread,
Aug 16, 2017, 1:25:29 AM8/16/17
to flow-based-...@googlegroups.com
I think there are many different aspects / important features
that could be used to classify or measure FBP systems.

- concurrency model
- single language / multiple langauges / any language
- portable to different platforms
- textual and graphical editors for programs
- packet type: streams, data packets, structures, objects; bracketing sub-streams
- extent of component toolset; can use existing tools?
- performance - overhead, throughput, latency, multiprocessing; support for GPU; support for hardware / FPGA
- heavy or light-weight system; support for embedded devices, server processes, browser processes
- granularity - can it support small processes that do one simple thing, without
- hierarchical composition
- process model - what is a process, support for hierarchy (sub-graphs), dynamic / adaptive networks?
- processes can be normal programs, or a limited type of subroutines, or state-machines, or ...?
- semantic model: stateful, functional, relational
- networking - can programs span networks
- realtime - suitable for realtime / performace-critical applications
- flexibility, to implement any type of program at least as well as conventional langauges; or limited applicable scope

... probably there are lots more important factors also!

Sam
> --
> You received this message because you are subscribed to the Google Groups "Flow Based Programming" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to flow-based-progra...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

ern0

unread,
Aug 16, 2017, 2:31:09 AM8/16/17
to Flow Based Programming, Sam Watkins
- run characteristics: push/pull/mixed/once-auto/once-manual
- does it support request-reply messages
- sync/async
- message graph pre-evaluation [1]
- multi-round processing [2]
- runtime editable graph
- multiplatform
- self-modification

[1]: Depending on the *data*, some processing skipped. See Houdini: it
disables calculating of non-visible objects.
[2]: Universal form of [1]
--
ern0
dataflow evangelist

Paul Morrison

unread,
Aug 16, 2017, 10:23:32 AM8/16/17
to flow-based-...@googlegroups.com, Sam Watkins
I totally agree, Sam!  I just thought I would get the ball rolling by singling out one dimension: one that I feel is a major consideration for implementation approaches...  I hope we can eventually expand on every item in the list you provided.  I look forward to the discussions!

Best regards,

Paul M.

> To unsubscribe from this group and stop receiving emails from it, send an email to flow-based-programming+unsub...@googlegroups.com.

> For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups "Flow Based Programming" group.
To unsubscribe from this group and stop receiving emails from it, send an email to flow-based-programming+unsub...@googlegroups.com.

Paul Morrison

unread,
Aug 16, 2017, 11:04:16 AM8/16/17
to Flow Based Programming, s...@nipl.net
Sorry, Ernő, didn't see your post!  See my response to Sam!

Best regards,

Paul;
--
ern0
dataflow evangelist

Paul Morrison

unread,
Aug 17, 2017, 3:37:27 PM8/17/17
to Flow Based Programming
So far, nobody has commented on this classification "axis".  Sam and Ernő have correctly pointed out that there are a number of other possible axes, but IMO this one is the major determinant of how to build an FBP scheduler.

It is not just a matter of building an FBP-like scheduler in your favorite language, just because you can - my top question is about how many ways you can build a "classical" FBP scheduler which supports multiple cores!   You may think JavaScript is the greatest thing since sliced bread, but, if it can't do this, I am not interested!  In the TIOBE list of 5 most popular languages, Python has threading, and does not seem to have a "classical" FBP scheduler - unless it's pflow or rill - so this gap needs to be filled.

Paul Morrison

unread,
Aug 17, 2017, 4:08:02 PM8/17/17
to flow-based-...@googlegroups.com
Thanks for the video... I couldn't stand watching it!  They sound like they are talking to high-school kids! 😣

If I ever have a month or two to spare, I could try turning Collate into a network of actors - but frankly, I don't see the point!  Collate is already very simple - why would I want to make it more complex?!  And would it help people understand it any better? 

I also don't like the fact that Hewitt says storage is essential to his actors - I feel he is conflating two rather different concepts.  Yes, FBP processes have working storage, but most of the data they work with is held in Information Packets - so would Hewitt turn IPs into a different kind of actor?  And, at least in the part I listened to, he didn't mention indirect communication between processes via ports... so I imagine FBP connections, and maybe even ports, have to be yet other kinds of actor.  I'm afraid the model is getting more and more complex - with no advantage that I can see to the developer or maintainer! 

Thanks anyway, and, by the way, thanks also for your sterling work over the years as chief Organizer for the FBP Toronto Meetup!

All the best,

Paul M.





ern0

unread,
Aug 18, 2017, 3:05:38 AM8/18/17
to Flow Based Programming
I have identified three approaches in the topic of scheduler.

The first is default: the framework/engine should take care about
thread-component assignment, assigning free threads from a pool to
waiting components. It's like how Make (the build tool) works.

The second is quite primitive: every component runs in separate
thread. If I say: separate process, you have already figured out which
one I'm talking about: Unix pipes.

The third is not optimal, but gives control to the application
creator, and also it's very easy to implement (even I have implemented
it for our automation-purpose prototype system): the Thread component.
This component starts a thread upon startup for producer side, and
passes packet received in consumer port to it, so they will be
processed in the separate thread. Just comes to my mind, it should
launch more threads and play some kind of scheduler role.

I think, there's another area, which is somehow regarding to the
scheduler: persistency. Scheduler may implement pause/resume function,
even across sessions, I mean the system may be able to resume after it
has been exited, so the scheduler should save state: packets' value
and pending messages (in my Prototype system I've implemented it
halfway: values are stored in a file, but there are no strategy how to
resume the system, only values are loaded back).
> --
> You received this message because you are subscribed to the Google Groups
> "Flow Based Programming" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to flow-based-progra...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.



--
ern0
dataflow evangelist

John Cowan

unread,
Aug 18, 2017, 8:00:00 AM8/18/17
to flow-based-...@googlegroups.com

On Thu, Aug 17, 2017 at 4:07 PM, Paul Morrison <jpau...@gmail.com> wrote:

I also don't like the fact that Hewitt says storage is essential to his actors - I feel he is conflating two rather different concepts.  Yes, FBP processes have working storage, but most of the data they work with is held in Information Packets - so would Hewitt turn IPs into a different kind of actor? 

Sort of.  The model divides actors into two kinds: active, which are like FBP processes, and passive, which are like objects (including packets).  (Active) actors certainly can be stateless: the main point is that all mutable state is in actors: none of it is global in the system as a whole.

The main difference between the actor model and the classical FBP model is that an actor has only a single input port, and must demultiplex inputs of different types itself.  This input port is backed by a queue whose length is determined when the actor is set up.  Another difference is that actors are typically created by other actors dynamically as needed rather than statically when the program begins, although nothing about FBP prevents the dynamic style from being used.  Typically the creation operation is passed the identities of other actors that the new actor will write to.  When an actor stops either voluntarily or because of an error, its creator is notified by a system-generated message; this allows clean failure and recovery.

I think the most important thing about classical FBP (though presaged in the way operating systems interact with programs) is the autonomous style: FBP processes pull from their inputs and push to their outputs, with the system handling the necessary buffering.  In most other systems, a program gets its input pushed to it and pushes its output.  However, it is also possible to create systems around pulling, where programs pull their inputs and are pulled from by their outputs.

-- 
John Cowan          http://vrici.lojban.org/~cowan        co...@ccil.org
Most languages are dramatically underdescribed, and at least one is
dramatically overdescribed.  Still other languages are simultaneously
overdescribed and underdescribed.  Welsh pertains to the third category.
        --Alan King

Paul Morrison

unread,
Aug 18, 2017, 9:25:23 AM8/18/17
to flow-based-...@googlegroups.com
Hi Ernő,

This is the kind of thing I was looking for! I think my preferred approach is your second option - using Java, C# or Boost threading.  I have never tried adding thread pools to "classical" FBP - not sure exactly how this would work...  Conversely, I am not sure how to implement multiple stacks in Java or C# (for coroutines) - I can do it in C...

However, re your other approaches, I am curious what languages you are referring to, and what threading tools...? If you are using Unix pipes, what do you use for your thread pools?    Would they work on Windows, or with Java and C#?  Are you restricting yourself to C[++]?  I have to confess that I have never worked with Unix... 🙄

Thanks in advance,

Paul M.




> For more options, visit https://groups.google.com/d/optout.



--
ern0
dataflow evangelist

--
You received this message because you are subscribed to the Google Groups "Flow Based Programming" group.
To unsubscribe from this group and stop receiving emails from it, send an email to flow-based-programming+unsub...@googlegroups.com.

Paul Morrison

unread,
Aug 18, 2017, 10:09:12 AM8/18/17
to Flow Based Programming
On second thoughts, I am not sure how the idea of thread pools would apply to FBP, given that most processes in an FBP application are long running...    If you want to have fewer threads than processes, in Java or C#, then presumably some processes would have to be green threads (perhaps running under red threads) - and then you run into the problem of how to create multiple stacks in those languages.

So this would seem to restrict one to using C[++], where you could use Boost for the red threads, and coroutines for the green ones!  I can construct multiple stacks in C, but this requires assembly language, so it will be machine- and OS-dependent... 

I have put up my original (green thread) C implementation of "classical" FBP on GitHub, to show how you can create multiple stacks - see https://github.com/jpaulm/threadn .  This logic can be found in https://github.com/jpaulm/threadn/blob/master/src/THREADS.CPP lines 233-262.  This logic was written quite a few years ago, so it may need to be changed to reflect newer architectures or instruction sets.  Feedback would be welcome!

Regards,

Paul Morrison

unread,
Aug 18, 2017, 10:25:13 AM8/18/17
to flow-based-...@googlegroups.com
Thanks, John!  Your description makes it sound like actors would have to be enhanced quite a bit to implement classical FBP - I don't know why Hewitt says that Actor theory was invented as a way to reason about programs (Paul T's quote above).  This reminds me a bit of the Vienna Definition Language, which turned out to e bit of a dead end, although it was used quite a bit in the early design work on PL/I, if I remember correctly... 

Regards,

Paul M.

Brad Christensen

unread,
Aug 18, 2017, 4:42:08 PM8/18/17
to Flow Based Programming
Regarding the discussion around actors, might I suggest taking a look at the Pony programming language.

Pony is an open-source, object-oriented, actor-model, capabilities-secure, high-performance programming language.

Pony may have some features that may fit a FBP implementation better than other actor libraries/frameworks/languages. Specifically, its version of the actor model (including its garbage collection), its work stealing scheduler, and its unique reference capabilities seem to already be a step in the classical FBP direction as far as I can tell.

The Pony runtime has its own scheduler, which by default has a number of threads equal to the number of CPU cores on your machine. Each scheduler thread can be executing an actor behaviour at any given time, so Pony programs are naturally concurrent.

Of course, there would be work to do to model ports, their pull and push, and perhaps the pipes themselves with actors. At the moment, I don't think there is any built-in back presure for the messaging between actors, though it may be planned. The flow of IPs, and ownership of an IP could be enforced easily through 'iso' reference capability assignment to every IP.

It may be worth a look.

Cheers, Brad

Ged Byrne

unread,
Aug 18, 2017, 5:25:57 PM8/18/17
to flow-based-...@googlegroups.com
Hi Paul,

There's an interesting Turing Award Lecture on the Actor Model that Robin Milner delivered in 1993 called "Elements of Interaction." It's available here: http://delivery.acm.org/10.1145/160000/151240/a1991-milner.pdf?ip=213.205.198.148&id=151240&acc=OPEN&key=4D4702B0C3E38B35%2E4D4702B0C3E38B35%2E4D4702B0C3E38B35%2E6D218144511F3437&CFID=973929253&CFTOKEN=25845288&__acm__=1503088355_24a0beefc400d676a44f232b0b89d465

I think we can find the common ground between Actors and FBP in the rejection of the shared memory model. Milner sees the problems of concurrency being caused by the memory being accessible by multiple processes:

"Once the memory is no longer at the behest of a single master, then the master-to-slave (or: function-to- value) view of the program-to- memory relationship becomes a bit of a fiction. An old proverb states: He who serves two masters serves none. It is better to develop a general model of interactive systems in which the program-to-memory interaction is just a special case of interaction among peers."

This problem looks to be at the root of FBP, although you approached it from a different angle:

"Part of the reason for this is that most of today's computers have a uniform array of pigeon-holes for storage, and this storage behaves very differently from the way storage systems behave in real life. In real life, paper put into a drawer remains there until deliberately removed. It also takes up space, so that the drawer will eventually fill up, preventing more paper from being added. Compare this with the computer concept of storage - you can reach into a storage slot any number of times and get the same data each time (without being told that you have done it already), or you can put a piece of data in on top of a previous one, and the earlier one just disappears.... Although destructive storage is not integral to the the von Neumann machine, it is assumed in many functions of the machine, and this is the kind of storage which is provided on most modern computers. "
http://www.jpaulmorrison.com/fbp/concepts_book.shtml

Where Milner uses domain theory, you use a physical analogy.

So It seems that both approaches are based on the same insight but they differ in terms of implementation.

I think this is because of the approach taken. Milner developed the Actor model by starting with the conceptual model and then building an implementation of that model.

Correct me if I am wrong, but I believe that FBP was engineered through practical implementation. You started with the built product and refined it through practical experience.

The fact that both paths converge so closely is fascinating.

What I am wondering is this: the Actor model we now have is one implementation of the theoretical model. Is it the only possible implementation? Do other valid implementations exist?

Would FBP be one of valid those implementations?

Does anybody know of a formal method for this type of thing? How do you go about proving wether an approach is a valid implementation of a conceptual model?

This could solve one of the big problems that FBP faces: the lack of a formal model.

If FBP is a valid implementation of the Actor model, then we have a formal model off the shelf.

If it is not, we can make the necessary changes to the Actor model to create the formal model we need.

Attempting to be formal myself, but lacking the training, I say:

Let VI be a set of valid implementations for Milner's Actor model.

Let VI(known) be the subset that includes all known valid implementations, such as Akka.

Let VI(Unknown) be the subset of implementations not yet known.

Can we prove that VI(unknown) is now empty? Can we prove wether FBP belongs in this set?

Let CI be a candidate implementation.

Let MF(VI, CI) be a membership function that determines if CI is a member of CI.

What would MF look like? Would it return true of false for FBP?

Surely MF must be a list of assertions. If all assertions are true, then CI is a member of VI.

If it is false, then we can examine the failed assertions and consider their meaning.

What changes would be needed to make those assertions true?

Or am I being naive. Do formal models not work this way?

Regards,


Ged
To unsubscribe from this group and stop receiving emails from it, send an email to flow-based-progra...@googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups "Flow Based Programming" group.
To unsubscribe from this group and stop receiving emails from it, send an email to flow-based-progra...@googlegroups.com.

Sam Watkins

unread,
Aug 19, 2017, 9:07:10 AM8/19/17
to flow-based-...@googlegroups.com
In my opinion, we need three types of multi-threading: ultra-light-wight
coroutines, for writing efficient FBP programs with small parts; threads,
for making use of all CPU cores in a network of coroutines; and OS processes,
for reusing existing tools that might be written in any programming language.
If I had to choose only one, I would probably choose OS processes.

Sam

On Thu, Aug 17, 2017 at 12:37:26PM -0700, Paul Morrison wrote:
> So far, nobody has commented on this classification "axis". Sam and Ernő
> have correctly pointed out that there are a number of other possible axes,
> but IMO this one is the major determinant of how to build an FBP scheduler.
>
> It is not just a matter of building an FBP-like scheduler in your favorite
> language, just because you can - my top question is about how many ways you
> can build a "classical" FBP scheduler *which supports multiple cores*!
> > *Comments, anyone?*

Raoul Duke

unread,
Aug 19, 2017, 9:22:30 AM8/19/17
to flow-based-...@googlegroups.com, Sam Watkins
right:

One can simulate green thread coroutines in/on a heavyweight OS thread.

It is harder to simulate OS threads in/on a green thread coroutine.

iiuc.

Paul Morrison

unread,
Aug 19, 2017, 11:48:48 AM8/19/17
to flow-based-...@googlegroups.com, Sam Watkins, rao...@gmail.com
You guys seem to have different mental models!  Could we get a little more concrete? 

Suppose we take "classical" FBP and Java - but not necessarily JavaFBP (I am sure other implementations are possible):  let us start with the FBP nodes all multithreading with each other.  As I am sure you both know, I have implemented this with Java threads (the same approach is also used for C#).  Now, apparently early implementations of Java used green threads, but they have since moved to more heavy weight threads - they must have had a good reason for this... 

Are you saying some FBP nodes should use green threads?  If so, I would have to have multiple stacks... I don't know how to do this in Java!  But of course, Java threads provide this!

Are you saying that FBP nodes should use OS threads?  a) aren't they doing so already... or are you referring to something else,  e.g. *nix processes?  If so, how would we implement an FBP network, and manage it?

Would one of you care to sketch out an alternative approach to building a Java FBP implementation?

TIA

Paul M.

--


ern0

unread,
Aug 19, 2017, 5:59:29 PM8/19/17
to Flow Based Programming
> However, re your other approaches, I am curious what languages you
> are referring to, and what threading tools...? If you are using Unix
> pipes, what do you use for your thread pools?

I just said, that Unix piping (when you say: cat myfile | sort | uniq
| grep foo) use separate process for components (just because
components are separate programs, they can be started only as
processes). This is not too optimal, but far enough for tasks comes up
when one works with CLI.

As Unix piping comes with the system and it's trivial and primitive,
the only thing to do with it is, if you ever write a CLI command, you
may count on that your program will be used as a component in a
pipeline (e.g. that's why error messages going to stderr, and not
stdout, stdout is the input of the next chain, it's not a good idea to
feed it with error messages).

> Would they work on Windows, or with Java and C#?

Life is too short for playing with these cra... platforms! I've never
written programs (only) for Windows, and it was my best technical and
life decision ever. I am also stopped using Windows years ago, XP was
my last Windows operating system.

> Are you restricting yourself to C[++]? I have to confess that I
> have never worked with Unix...

I was just talking theoretically. I am not familiar with dataflow
system implementations (except mine), but I'm pretty sure that each
system have some kind of scheduler, and it has a point, when it
decides, which message is to process next. This is the point where
thread pool comes into the picture.

Web servers (or other kind of services) are processing incoming
request similar way: the request receiver passes the request to a
chosen thread (or process), which is picked from a pool, if there's
any available.

Some months ago, I had to write a service *without* using multiple
threads, and have to say, it was interesting. It is a key-value
storage service, using hash table. In this case, single thread
architecture is probably a good choice:
- the processing time of a request is minimal,
inserting-updating-deleting a small item into a hash table takes very
short time,
- using single thread, no parallel requests run, there's no race
condition to handle, no semaphores and locks.

You may check it: https://github.com/ern0/hashr
This is the main loop:
https://github.com/ern0/hashr/blob/develop/server/src/Server.c#L101

What I want to point is that threads are great, but some cases there's
better not to use them. That's why I like the third method, when
there's a Thread component: this is the method, when the application
creator can make decision about thread usage.
--
ern0
dataflow evangelist

Raoul Duke

unread,
Aug 20, 2017, 12:50:10 PM8/20/17
to flow-based-...@googlegroups.com

Paul Morrison

unread,
Aug 21, 2017, 11:44:19 AM8/21/17
to flow-based-...@googlegroups.com, John Cowan, Joe Witt
Thanks, Ged - very thought-provoking!  Some comments interspersed...

On Fri, Aug 18, 2017 at 5:25 PM, Ged Byrne <ged....@gmail.com> wrote:
Hi Paul,

There's an interesting Turing Award Lecture on the Actor Model that Robin Milner delivered in 1993 called "Elements of Interaction." It's available here: http://delivery.acm.org/10.1145/160000/151240/a1991-milner.pdf?ip=213.205.198.148&id=151240&acc=OPEN&key=4D4702B0C3E38B35%2E4D4702B0C3E38B35%2E4D4702B0C3E38B35%2E6D218144511F3437&CFID=973929253&CFTOKEN=25845288&__acm__=1503088355_24a0beefc400d676a44f232b0b89d465

I think we can find the common ground between Actors and FBP in the rejection of the shared memory model. Milner sees the problems of concurrency being caused by the memory being accessible by multiple processes:

"Once the memory is no longer at the behest of a single master, then the master-to-slave (or: function-to- value) view of the program-to- memory relationship becomes a bit of a fiction. An old proverb states: He who serves two masters serves none. It is better to develop a general model of interactive systems in which the program-to-memory interaction is just a special case of interaction among peers."

This problem looks to be at the root of FBP, although you approached it from a different angle:

"Part of the reason for this is that most of today's computers have a uniform array of pigeon-holes for storage, and this storage behaves very differently from the way storage systems behave in real life. In real life, paper put into a drawer remains there until deliberately removed. It also takes up space, so that the drawer will eventually fill up, preventing more paper from being added. Compare this with the computer concept of storage - you can reach into a storage slot any number of times and get the same data each time (without being told that you have done it already), or you can put a piece of data in on top of a previous one, and the earlier one just disappears.... Although destructive storage is not integral to the the von Neumann machine, it is assumed in many functions of the machine, and this is the kind of storage which is provided on most modern computers. "
http://www.jpaulmorrison.com/fbp/concepts_book.shtml


Where Milner uses domain theory, you use a physical analogy.

So It seems that both approaches are based on the same insight but they differ in terms of implementation.

Historically, variables started out as "words" on machines like the IBM 650...  The IBM 650 words were 10 digits wide and could hold 10 digits (0-9) and a sign, giving them a capacity of almost ±10 billion (10^10) - so this must have seemed very generous!  These same words could also be used for floating point numbers - and instructions.  We called them "variables" precisely because they were variable - which seemed like a good idea at the time!   And it turned out that this worked great for mathematical calculations...  not so great for non-mathematical apps!  The IBM 650 could also support alpha data using 2 digits per character, but in the early days most computing was mostly just that: computations.  Because the words allowed modification and storage of instructions, you could write compilers! 

So obviously these words behaved as I describe in my paragraph that you quote above, and it wasn't too bad as long as you stayed with fairly simple looping structures.

However, add in complex loops and conditions, and it becomes harder and harder to control what is happening!  I think this is why programmers get very nervous when they can't control the exact sequence of events in a program...  I should stress that multithreading only made this problem worse - it was already bad, even on old computers which only had a single thread!

This was borne in on me in the late '60s when I was trying to come up with a generalized Update - this seemed useful as Update was in those days one of the most common application types... and one of the hardest to get right!  So I figured that if I could put a generalize Update up on the shelf, programmers would just have to parametrize it a bit, and they would be able to build Updates with far less pain and effort!

Here is the schematic for the traditional Update that I used in one of my PPT presentations: http://jpaulmorrison.com/fbp/Trad_update.png  (the green circles are connectors - I ran out of room!  The blue ellipses are starts and stops).  This is actually a paper flow-chart that programmers were supposed to be able to use as a basis - there was no componentry... just a paper chart...   So FBP with Collate - http://jpaulmorrison.com/fbp/update.jpg - was totally different - it was like waking up from a bad dream!  In fact, my then manager, when I showed it to him, said, "That's a nice way of documenting updates", and I said, "No, we can actually build it that way!".

To me, based on this experience, actors (things which act) come at the problem from the wrong end: the important concept is data as "things".  The actor people seem to treat data as just something held by an actor, whereas, in FBP, data is central, and processes are just the way one stream of data is transformed into another.

 
I think this is because of the approach taken. Milner developed the Actor model by starting with the conceptual model and then building an implementation of that model.

 FBP is a true paradigm change - and IMO you can't get there by trying to generalize the old approach to programming - you have to make a total jump...  Kuhn and Joel Barker are well worth reading on paradigms.

On the other hand, it is strange that data flow has been a design methodology for years - Ged, you and I have talked about the Jackson methodology many times.


Correct me if I am wrong, but I believe that FBP was engineered through practical implementation. You started with the built product and refined it through practical experience.

I had this Update problem rattling around in my head plus wanting to have a more visual way of showing data processing plus wanting to have more code reuse: a few years earlier I had worked with Geoffrey Gordon's brilliant GPSS simulation system - see https://en.wikipedia.org/wiki/GPSS ... and somehow all these ideas suddenly came together in my head!   BTW The Wikipedia article on GPSS does not show the graphical notation, but they had one - which had to be converted to the text format for execution. 

In 1969 I was already using these concepts to build reliable, maintainable, applications in a Data Centre for paying customers! 


The fact that both paths converge so closely is fascinating.

So I think they just appear to - I think the two paradigms are at base very different.  However, as you say, they both reject the shared memory model - IMO this is really a no-brainer, so I would expect there to be many approaches that reject this model. 
 

What I am wondering is this: the Actor model we now have is one implementation of the theoretical model. Is it the only possible implementation? Do other valid implementations exist?

So... do we have one theoretical model or two?!  I think we have two, philosophically and paradigmatically (?) different.

There may be a lowest common denominator to these two approaches - which would simply be a cloud of independent processes, communicating somehow.  This model seems to me to be missing most of the important details which would allow the model to get any real work done.  Since it needs so much to get real work done, I am not really sure how useful it is.  Sorry!  🤔
 

Would FBP be one of valid those implementations?



I think John Cowans's description below sums up the differences very well - thanks John!

BTW With respect to actors' single input queue, mentioned by John, Joe Witt's NiFi currently has that limitation, which, as he says, puts limitations on having a generalized Collate.  He has said that he may add multiple input ports at some time in the future...  Correct, Joe?

Does anybody know of a formal method for this type of thing? How do you go about proving whether an approach is a valid implementation of a conceptual model?


This could solve one of the big problems that FBP faces: the lack of a formal model.

Hunh?!  😕  I have never felt the lack!  Can you expand on this?  What am I missing?  Do any languages or operating systems have formal models?
 
I think I understand what follows - but I am afraid I don't understand how it really advances the discussion!  See the previous line...

Best regards,

Paul M.

Ged Byrne

unread,
Aug 22, 2017, 3:08:14 AM8/22/17
to flow-based-...@googlegroups.com, Joe Witt, John Cowan
Hi Paul,

I'm glad you find it thought provoking, and I appreciate the experiences you've shared.  

I don't believe that we are not dealing with two separate paradigms. I believe there is a single unified theory that is more than a lowest common denominator.

If they are separate paradigms then we are dealing with the blind men and the elephant.  Different people groping different parts of the same creature, each with an accurate but incomplete view of the whole.  

I'm certain that we are close to stepping back and seeing the whole creature and gaining a complete understanding so that we can say to the four blind men: "You two have been describing the trunk and you the tail.  They sounded then same because both shared commonalities but they are actually very different.  You two were both feeling a front leg each and they are both the same, but your choice of words made them sound quite different."

If we are to find such a complete and accurate understanding then I think we need formal mathematics to do the job. 

Is it necessary to get the work done?   Not at all, people are managing to develop more than enough software without such a theory. 

The truth is, they are also managing to write that software without FBP. 

You say that the flow based systems proved to be maintainable.  They could be easily understood and adapted.  

The majority of software is not understood and it is difficult to adapt.

We do not need the unifying theory to get the work done, but we do need it to understand and fix the work of others.  

I envisage an approach that allows us to take existing code and extract from it the underlying sequential processes and reassemble them into a flow graph. 

I see it, but without the math I lack the language to accurately describe it.  If I cannot describe it then I cannot build it.    

Regards,


Ged


--

Raoul Duke

unread,
Aug 22, 2017, 3:47:30 AM8/22/17
to flow-based-...@googlegroups.com
Ged, FBP is a particular form of the dataflow concept, no? Which does have real programming language theory behind / applied to it. So I'd guess the elephant is dataflow, no?

Ged Byrne

unread,
Aug 22, 2017, 5:11:48 AM8/22/17
to flow-based-...@googlegroups.com
My discussion on the differences between data flow and FBP can be found here: https://groups.google.com/forum/m/#!msg/flow-based-programming/iHCpYZp9-YE/jP71CFpukLEJ
--

Raoul Duke

unread,
Aug 22, 2017, 10:12:14 AM8/22/17
to flow-based-...@googlegroups.com
I'm confused why you say you are seeking the elephant then, it seems like the animals are pretty well understood in terms of a classification/taxonomy? I guess you mean we'd like a more maths of some ilk based description?

Ged Byrne

unread,
Aug 22, 2017, 11:20:21 AM8/22/17
to flow-based-...@googlegroups.com
Because we are not dealing with animals, just one creature; the elephant.

I believe the key is flow, the push and pull that John uses to describe the differences between data flow and FBP. The problem is that we don't have a clear description of push and pull and the laws that govern then flow they produce.

Push and pull are like the gravity of the moon that moves the tides around the earth. Newton reduced that push and pull to just three laws of motion. We need the same for software.
On Tue, 22 Aug 2017 at 15:12, Raoul Duke <rao...@gmail.com> wrote:
I'm confused why you say you are seeking the elephant then, it seems like the animals are pretty well understood in terms of a classification/taxonomy? I guess you mean we'd like a more maths of some ilk based description?

--

Joe Witt

unread,
Aug 22, 2017, 11:24:30 AM8/22/17
to Flow Based Programming, co...@ccil.org, jw...@hortonworks.com
BTW With respect to actors' single input queue, mentioned by John, Joe Witt's NiFi currently has that limitation, which, as he says, puts limitations on having a generalized Collate.  He has said that he may add multiple input ports at some time in the future...  Correct, Joe?


It is true we are likely to add support for named input ports at some point.  We've not needed them yet because we've been able to implement key use cases we've needed by leveraging the attributes of the information packets to act as the multiplexer rather than relying on the port they came in on.  However, to generalize that in our current approach requires some pretty awkward logic for the API and is inefficient at best.  Named input ports let you do things like avoid scheduling the black box to run until all input ports have some valid input ready/available.  Named input ports let you efficiently pull from a single port the item you want rather than our current limitation which is to grab a bunch of IPs from a single port then sort out whether we got the items we need.

So, clearly a useful pattern that offers real power. We'll get there i'm sure. 

ern0

unread,
Aug 23, 2017, 11:25:25 AM8/23/17
to Flow Based Programming
> The problem is that we don't have a
> clear description of push and pull

...and there are several other types of how message passing and
processing related to each other (once, buffered push-pull mix,
sync-fix-one etc.).
--
ern0
dataflow evangelist

Paul Morrison

unread,
Aug 23, 2017, 1:16:19 PM8/23/17
to Flow Based Programming, jw...@hortonworks.com, co...@ccil.org
Hi Ged, I guess I missed the "elephant" discussion! 

Please forgive me for being very skeptical about your remark, "...take existing code and extract from it the underlying sequential processes and reassemble them into a flow graph. "   We have talked before about reducing an iceberg into ice-cubes, and I can visualize this, but this is a gradual process - and a very different image.  

I'd like to repeat a para from earlier in this conversation:

"...Here is the schematic for the traditional Update that I used in one of my PPT presentations: http://jpaulmorrison.com/fbp/Trad_update.png  (the green circles are connectors - I ran out of room!  The blue ellipses are starts and stops).  This is actually a paper flow-chart that programmers were supposed to be able to use as a basis - there was no componentry... just a paper chart...   So FBP with Collate - http://jpaulmorrison.com/fbp/update.jpg - was totally different - it was like waking up from a bad dream!  In fact, my then manager, when I showed it to him, said, "That's a nice way of documenting updates", and I said, "No, we can actually build it that way!". "

First question: can you get from either diagram to the other?!

Given the FBP Update diagram, how could you use it to produce a conventional program, without using multiple threads or stacks?  Didn't Jackson's later work require coroutines or threads...?

I believe the two paradigms are fundamentally different!  We need to teach data flow to kids, using precoded components, and do not allow them to actually write any code for the first 5 years...  That might work!

Regards,

Paul

Ged Byrne

unread,
Aug 23, 2017, 1:22:29 PM8/23/17
to Flow Based Programming, jw...@hortonworks.com, co...@ccil.org
Hi Paul,

I'm talking about the other direction. Taking existing code and extracting the flow graph.

That is a long way down the road. First I want to focus on Plip Plop, which I hope will help challenge what is and isn't FBP.

Regards,


Ged


Paul Morrison

unread,
Aug 23, 2017, 4:31:18 PM8/23/17
to flow-based-...@googlegroups.com
> The problem is that we don't have a
> clear description of push and pull

Seems pretty clear to me:

-  "push" ---------------------  send an Information Packet handle to a connection unless it is full; in which case suspend that process until there is space in the connection again, and try again (if the connection is closed, report this fact)

- "pull"    ---------------------  receive an Information Packet handle from a connection unless it is empty; in which case suspend that process until more data arrives, and try again (if the connection is closed, report this fact)

Whether a process pushes or pulls is entirely under its own control...

Not very formal, I'm afraid!  But, again, we ran a bank on this logic!

Regards,

Paul M.

Ged Byrne

unread,
Aug 23, 2017, 5:38:57 PM8/23/17
to flow-based-...@googlegroups.com
Hi Paul,

What if the information you need to pull is not sitting in a queue waiting to be consumed? What if it needs to be pulled from a large data store or some human's head?

Regards,


Ged
On Wed, 23 Aug 2017 at 21:31, Paul Morrison <jpau...@gmail.com> wrote:
> The problem is that we don't have a
> clear description of push and pull

Seems pretty clear to me:

-  "push" ---------------------  send an Information Packet handle to a connection unless it is full; in which case suspend that process until there is space in the connection again, and try again (if the connection is closed, report this fact)

- "pull"    ---------------------  receive an Information Packet handle from a connection unless it is empty; in which case suspend that process until more data arrives, and try again (if the connection is closed, report this fact)

Whether a process pushes or pulls is entirely under its own control...

Not very formal, I'm afraid!  But, again, we ran a bank on this logic!

Regards,

Paul M.


On Wed, Aug 23, 2017 at 11:25 AM, ern0 <er...@linkbroker.hu> wrote:
> The problem is that we don't have a
> clear description of push and pull

...and there are several other types of how message passing and
processing related to each other (once, buffered push-pull mix,
sync-fix-one etc.).
--
ern0
dataflow evangelist



John Cowan

unread,
Aug 23, 2017, 7:16:48 PM8/23/17
to flow-based-...@googlegroups.com

On Wed, Aug 23, 2017 at 5:38 PM, Ged Byrne <ged....@gmail.com> wrote:

What if the information you need to pull is not sitting in a queue waiting to be consumed? What if it needs to be pulled from a large data store or some human's head? 

No problem.  Each such device is equated to a component in the flow network.  The large data store is fronted by a component that accepts query packets on its input port and writes results to its output port.  There can be multiple copies of this component for improved concurrency.  The human being, by the same token, is fronted by a web or desktop-GUI component that accepts a representation of the screen/page to display on its input port and writes the submitted data to its output port.

-- 
John Cowan          http://vrici.lojban.org/~cowan        co...@ccil.org
Mark Twain on Cecil Rhodes: I admire him, I freely admit it,
and when his time comes I shall buy a piece of the rope for a keepsake.

Paul Morrison

unread,
Aug 23, 2017, 10:09:15 PM8/23/17
to flow-based-...@googlegroups.com, Joe Witt, John Cowan
Hi Ged,

Not clear what the "directions" are!  You are presumably not talking about a control flow diagram - we've been doing those since the days of Ada Lovelace. 

What you presumably need to do is take all the variables and connect the events involving each variable, showing whether they are being read or written (or both).  This sounds like a huge job, for any real-life program...  Maybe you will come up with a different approach...

Anyway, we are all waiting with bated breath to see what Plip Plop is going to look like - I assume the analogy is plumbing... right?

Regards,

Paul



On Wed, Aug 23, 2017 at 1:22 PM, Ged Byrne <ged....@gmail.com> wrote:
Hi Paul,

Paul Morrison

unread,
Aug 23, 2017, 10:10:46 PM8/23/17
to flow-based-...@googlegroups.com, Ged Byrne, John Cowan
Thanks, John, that's exactly how I would have responded - except you said it better!

Regards to you both,

Paul

--
You received this message because you are subscribed to the Google Groups "Flow Based Programming" group.
To unsubscribe from this group and stop receiving emails from it, send an email to flow-based-programming+unsub...@googlegroups.com.

Ged Byrne

unread,
Aug 24, 2017, 4:09:43 AM8/24/17
to flow-based-...@googlegroups.com
Hi John, 

That is exactly how I see it.  The basic principle is cause and effect and the directions depend on scale.  It isn't just about softaware but all components of the system.  

At the highest scale I pull a new mouse from Amazon.  I order it, and they send it to me.  

But when we zoom in we see many interactions.  I search, pulling a list of available products.  I choose one and push it into my shopping cart. At the checkout information is pulled from me using forms. 

At the lowest scale there is only push, with cause leading to effect. Request leading to response. 

Jackson has a conceptual model for all this with his problem frames which I've spent years trying a apply practically  

Regards,


Ged

--

ern0

unread,
Aug 24, 2017, 4:17:25 PM8/24/17
to Flow Based Programming
Correct. Now pls tell me that Unix pipe connections are pull or push?

A process can write on STDOUT any time, but it does not trigger the
consumer process, anyway, I should call it "push", because the process
itself initiates the data transfer without any external help. When the
buffer is full, the write operation will be blocked, but the program
may ignore it.

A process can read from STDIN any time, but it does not trigger the
producer process, anyway, I should call it "pull", because the process
itself initiates the data transfer without any external help. When the
buffer is full, the read operation will be blocked, but the program
may ignore it.

Okay, let's see, spreadsheets are pull or push?

When a formula changes, all the dependencies gets updated in a strange
order: first, all dependencies will be marked as dirty, then they gets
updated, if they have only clean predecessors. Update is somewhat a
pull, because the actual formula uses the referred ones, but finding
next formula to update is a kind of push, as freshly updated formulas
points to them.

I did not said that you don't know what push and pull means (people
usually understand what they've discovered/invented), I just said that
they are fuzzy categories. Also, other characteristics of DF systems
can be fuzzy, similar way.
> --
> You received this message because you are subscribed to the Google Groups
> "Flow Based Programming" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to flow-based-progra...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.



--
ern0
dataflow evangelist

John Cowan

unread,
Aug 24, 2017, 4:56:05 PM8/24/17
to flow-based-...@googlegroups.com

On Thu, Aug 24, 2017 at 4:17 PM, ern0 <er...@linkbroker.hu> wrote:

Correct. Now pls tell me that Unix pipe connections are pull or push?

It's a matter of network flow:

In a push-based network, a component gets input pushed to it by its
predecessor (and generally there is only one) and pushes output to
its successors.

In a pull-based network, a component pulls input from its predecessors,
and its successor (and generally there is only one) pulls output from it.

In a classical FBP system (of which pipes and file systems are particular
cases), a component pulls input from its predecessors (or input files) and
pushes output to its successors (or output files).

In addition to the greater flexibility of classical FBP, it is also simpler to
write components, because control flow need not be deformed into an
event loop or (whatever the push dual of an event loop is called, I don't
know).  They are written using a simple read-process-write pattern, like
any old BASIC program.

 -- 
John Cowan          http://vrici.lojban.org/~cowan        co...@ccil.org
My confusion is rapidly waxing
For XML Schema's too taxing:
I'd use DTDs / If they had local trees --
I think I best switch to RELAX NG.


Sam Watkins

unread,
Aug 26, 2017, 6:16:18 AM8/26/17
to flow-based-...@googlegroups.com
On Thu, Aug 24, 2017 at 10:17:01PM +0200, ern0 wrote:
> Now pls tell me that Unix pipe connections are pull or push?
>
> A process can write on STDOUT any time, but it does not trigger the
> consumer process, anyway, I should call it "push", because the process
> itself initiates the data transfer without any external help. When the
> buffer is full, the write operation will be blocked, but the program
> may ignore it.

The Unix pipe connections are suitable for FBP. As soon as data is pushed into
the pipe it will be available to pull at the other end, and waiting processes
are activated immediately when data or pipe space is available.

You can use Unix pipes directly without stdio.

Stdin and stdout are libc IO buffers. Stdout can be "flushed" at any time
using fflush(). You need to do that if you want the data to be written to the
file or pipe immediately. Stdin can store up extra data from the pipe that the
process doesn't need yet. This has no harmful effects with FBP, just makes
reading more efficient.

There is no real logical distinction between "push" and "pull" flow as I
understand. Data only moves when it's pushed on the one end and pulled at the
other end. In FBP with threaded components all the components are always
active (or blocked), so there's no need to worry about activation. If the
system does not have proper threaded or independent components you might have
to think about whether to push or pull data through a series of functions, but
then it's not really proper FBP.

Tom Young

unread,
Aug 26, 2017, 12:58:45 PM8/26/17
to Flow Based Programming, Sam Watkins
H
​i Sam, 


Just a note about *nix buffers. 

On Sat, Aug 26, 2017 at 6:16 AM, Sam Watkins <s...@nipl.net> wrote:
Stdin and stdout are libc IO buffers.  Stdout can be "flushed" at any time
​ ​
using fflush().  You need to do that if you want the data to be written to th
​e f​
ile or pipe immediately. 

​describes a number of ways to cause *nix processes to avoid buffering and write immediately (immediate line buffering mode).  fflush() may  not be available in all languages, but immediate mode should be available for just about any *nix program which uses stdio,  either by one of the solutions  described in the link above or otherwise 'fooling' the process into 'thinking' it is writing to a terminal.  

Buffering can lead to deadlock(or excessive latency) in almost any cyclic network and also in some acyclic networks and so is an important design consideration, which probably should be addressed in FBP network definitions.

 "...the shell syntax required to initiate a coprocess and connect its input and output to other processes is quite contorted..."
W. Richard Stevens [Advanced Programming in the Unix Environment, p441]

Ged Byrne

unread,
Aug 28, 2017, 4:02:29 AM8/28/17
to flow-based-...@googlegroups.com
Hi John,

I think you've hit the nail on the head with you're comparison with old basic programming.  

I'm one of a generation of self taught programmers who started with Basic. 

My earliest program was not the simple statement "Hello World".  It was an interaction that looked like this:

10 Input "What is your name?", a$
20 Print "Hello "; a$

The next step was to add a conditional to welcome my friends and tell everybody else they smelled. 

Back then interaction was easy.  With Plip Plop that is my goal.  I'm not trying to do multi threading or visual programming, I just want an interactive flow to be so simple that even 10 year old me could do it. 

Regards,


Ged


On Thu, 24 Aug 2017 at 21:56, John Cowan <co...@ccil.org> wrote:

In addition to the greater flexibility of classical FBP, it is also simpler to
write components, because control flow need not be deformed into an
event loop or (whatever the push dual of an event loop is called, I don't
know).  They are written using a simple read-process-write pattern, like
any old BASIC program.

 -- 
John Cowan          http://vrici.lojban.org/~cowan        co...@ccil.org
My confusion is rapidly waxing
For XML Schema's too taxing:
I'd use DTDs / If they had local trees --
I think I best switch to RELAX NG.


Reply all
Reply to author
Forward
0 new messages