Notes on dataflow engine design

Aristophanes

unread,
Mar 20, 2011, 5:17:53 PM3/20/11
to Flow Based Programming
Over time I have been working on the design and implementation of a
data flow engine. This article presents some of my observations and
ideas.


This article is about data flow engine design. A dataflow
engine is the software that handles scheduling the transmission
of messages (information packets) between autons (components),
maintenance of the data structures holding messages, and the API
between the component software and the data flow engine.

I take the view that dataflow engine design naturally separates
into two distinct components, inter-thread and intra-thread
communication, each with their own implementation issues. The
focus in this article is on scheduling issues for engines that
run within a single thread of control.

From the perspective of the scheduler, autons (components) are a
set of inports (input ports) and outports (output ports).
Outports emit packages. A package has a wrapping and a
packet. The wrapping contains the package address, an
(inport,auton) pair, and sundry metadata. The packet contains
metadata and either a document (FBP - an information packet) or
a data value.

Packet contents are passed to the input response code associated
with the inport. The input response code processes the packet.
In turn it can emit new packages to be delivered later by the
scheduler. Once packet processing is complete, control returns
to the scheduler, which then selects the next package to be
delivered.

DEFINITIONS

Terminology is a problem in discussing dataflow/flow based
programming. Here is a little section on the terminology that I
use; a sketch of the corresponding data structures follows the list.

Auton: Short for autonomous computing element.
Inport: An input port associated with an auton.
Outport: An output port associated with an auton.
Pipe: A connector between an outport and an
inport. Data flows from an outport thru
the pipe and into an inport.
Network: A collection of autons connected by pipes.
Nest: Auton network managed as a single entity
by a scheduler.
Packet: Data element that an auton can receive.
There are two types of packets - documents
and data.
Document: A data object that persists as a singleton
object throughout its duration in the
network.
Package: An <address,packet> pair. Autons emit
packets; the outport wraps the packet in a
package; pipes transport packets; the packet
is extracted at the inport.
Address: An <auton id, inport id> pair specifying
the address of a destination inport.
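
For concreteness, the definitions above might be rendered as data
structures roughly like these (a Python sketch; the field names are
illustrative only and not part of the terminology):

    from dataclasses import dataclass, field
    from typing import Any

    @dataclass(frozen=True)
    class Address:
        auton_id: str        # destination auton
        inport_id: str       # destination inport on that auton

    @dataclass
    class Packet:
        metadata: dict = field(default_factory=dict)
        content: Any = None  # a document (an IP) or a plain data value

    @dataclass
    class Package:
        address: Address     # where the packet is to be delivered
        packet: Packet       # the wrapped packet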


SCHEDULING CONSTRAINTS

"Flow Based Programming" mentions three general constraints. To
these I have added two more. They are:

(FBP) The flow constraint: Basically this says a package cannot
be delivered until it has been sent. This may seem obvious but
it implies that data flows follow a flow diagram.

(FBP) The one place at a time constraint: In FBP an information
packet can only be in one place at a time - either in a queue,
within another IP, or within an auton. I distinguish between
documents and data; documents are objects that can only be in
one place at a time. Data are not objects; they are only values.

(FBP) The consistent ordering constraint: The general statement
is that objects must be processed in the same order that they are
sent. The ordering constraint implies that packages must be
stored in queues before they are delivered.

There are three variants of this constraint, inport order, auton
order, and nest order.

Inport order: The inport queue (or its image in a larger queue)
preserves the order in which the packages were sent.

Auton order: Packages sent to an auton are processed in the order
they were sent. Auton order is usually associated with an auton
queue.

Nest order: A master queue holds all undelivered packages in the
order that they were sent.

(RH) The delivery must be attempted constraint: An attempt must
be made to deliver every package; either it must ultimately be
delivered or rejected as undeliverable.

(RH) The epoch constraint: Execution is divided into epochs.
Packages emitted within one epoch are not delivered until the
next epoch. This is the default; it can be overridden.

INPUT PROCESSING STYLES

When a delivery is made there are three major possible responses.
The first is that one package is delivered. The second is that
all packages in the inport's queue of packages are processed.
The third is that all packages in the auton's inports are
processed. Which of these strategies is used depends on the
auton code structure.

(1) Single package processing: Morrison calls this type of
processing no-loopers. The auton internal code is divided into
inport response functions, one for each inport. The inport
response function receives a packet as an argument. The internal
response code has an equivalent of this form (pick your own
keywords)

From <inport id> Receive <packet>
<body of response>
End

(2) Inport queue processing: In this style the inport has an
associated queue. As above each inport has its own response
function. However in this style the response function has a
master loop which extracts and processes packets from the queue.
Morrison calls this type of processing loopers. The inport
response code has an equivalent of this form:

While (More <inport id>)
Receive <packet>
<process packet>
End while

(3) Process all queued input: In this style the auton has a
single inport. There is a single input queue that holds all
packets received by the auton. When the auton is activated the
auton response function has a master loop that extracts packets
from the input queue. Within the loop body there is a switch on
the packet type. Functional languages such as Erlang tend to use
this style. The auton response code has an equivalent of this form:

While (More input)
Receive <packet>
Switch <packet type>
Case <type 1>
<code to process type 1>
...
End switch
End while

The processing style does not have to be implemented in the input
response code. Instead the scheduler can implement a chosen
style by controlling the order of packages being delivered.
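
For example, a minimal single-thread scheduler that implements
single-package processing from one master queue might look roughly
like this (a Python sketch building on the Package/Address sketch
above; the registry of inport response functions is my own
illustrative device, not a fixed API):

    from collections import deque

    class Scheduler:
        def __init__(self):
            self.queue = deque()     # master queue of undelivered packages
            self.responders = {}     # (auton id, inport id) -> response function

        def register(self, auton_id, inport_id, fn):
            self.responders[(auton_id, inport_id)] = fn

        def send(self, package):
            # called from an outport; delivery happens later, in send order
            self.queue.append(package)

        def run(self):
            while self.queue:
                package = self.queue.popleft()
                key = (package.address.auton_id, package.address.inport_id)
                responder = self.responders.get(key)
                if responder is None:
                    continue         # undeliverable: rejected, so the
                                     # "delivery must be attempted" rule holds
                responder(package.packet)  # may call send() and emit new packages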


RELATIONSHIP OF STYLES AND CONSTRAINTS

We cannot arbitrarily mix processing style and order constraint.
The nest-order constraint requires that single-package processing
be used. The auton-order constraint requires that either
single-package processing or process-all-queued-input processing
be used. Finally, the inport-order constraint is consistent with
all three processing styles. In other words, the inport-order
constraint is the weakest constraint, auton-order is intermediate,
and nest-order is the strongest constraint.

ESTABLISHING A TRANSITION BOUNDARY

In simulations it is useful to establish a "clock". The
simulation has a sequence of state and input pairs, <S,I> that
represent the system state and its inputs at the beginning of a
clock tick. In the interval between one tick and the next the
system generates a set of outputs in response to its inputs. In
turn the outputs become the inputs for the next tick.
Symbolically

<S0,I> => M => <S1,M>

Regardless of the constraint choice or the processing style we
can always maintain a transition boundary by having a separate
output queue. All packages generated during an interval are
attached to the output queue. When the transition boundary is
reached the packages from the output queue become the new input
queue.

An output queue is necessary if inport or auton queue processing
is used. It is not necessary if nest-order is used; however
there is no advantage in not having an output queue.

Having a transition boundary is a constraint. It says that
packages emitted during one clock tick are not delivered until
the next clock tick.
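
A rough sketch of the transition boundary, extending the Scheduler
sketch above (the method names are mine):

    class EpochScheduler(Scheduler):
        def __init__(self):
            super().__init__()
            self.out_queue = deque()  # packages emitted during the current tick

        def send(self, package):
            # emitted packages go to the output queue, not the input queue
            self.out_queue.append(package)

        def tick(self):
            # deliver everything that was pending at the start of this tick
            while self.queue:
                package = self.queue.popleft()
                key = (package.address.auton_id, package.address.inport_id)
                responder = self.responders.get(key)
                if responder is not None:
                    responder(package.packet)
            # the boundary: this tick's output queue becomes the next
            # tick's input queue
            self.queue, self.out_queue = self.out_queue, deque()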

SPECIFYING THE ORDER OF EXECUTION

Order constraints dictate using queues; however inport-order and
auton-order constraints do not fully specify the order of
execution. Here are two alternatives.

(1) Random selection scheduling: The idea here is to maintain a
list of all autons/inports that have queues with content. Pick
one at random and process the queue. This potentially violates
the "delivery must be attempted" constraint; however there are
simple ways to ensure that every entry in the list is eventually
selected (one such is sketched below).

(2) Original order scheduling: The idea here is to use a master
list of packages ordered by when they were sent. Processed
packages are deleted from the list.
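
A sketch of alternative (1) with a simple fairness guarantee: each
pass visits every non-empty queue once, in a shuffled order, so every
entry is eventually selected (Python; the names are illustrative):

    import random

    def scheduling_pass(queues, process_queue):
        # queues: dict mapping an inport (or auton) key to its queue
        # process_queue: drains one queue through its response code
        ready = [key for key, q in queues.items() if q]
        random.shuffle(ready)        # random order within the pass
        for key in ready:
            process_queue(key, queues[key])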

USING A MASTER QUEUE, ADVANTAGES AND DISADVANTAGES

Many implementations maintain a separate queue for each
auton/inport. Using a master queue is arguably both more
efficient in its use of storage and better for system
integrity.

The reason is quite simple. When separate queues are used space
must be allocated for the maximum number of packages expected to
be in each queue. When a master queue is used we need only
allocate space for the maximum number of packages in the system.

Obviously using a master queue will require less storage, though
whether the savings are significant depends both on the
implementation and the application. Configuring applications is
simpler when just the master queue must be sized than when each
separate queue has to be sized.

Using a single master queue simplifies design because the need
for "suspend on send" blocking is obviated. The root cause for
flooded queues is processes that produce packets at a rate
greater than the rate that the rest of the system can handle. The
cure is to bound the output rate of said processes.

STACK BASED SCHEDULING

In stack based scheduling the scheduler pushes emitted packages
onto a stack. When it selects the next package to be delivered
it pops one off the stack. Stack based scheduling is efficient
in its use of space; however it can violate order constraints.

For example, suppose auton X sends packages to Y and Z in that
order. The scheduler delivers a package to Z which in turn sends
packages to W and Y. The stack now contains a package for Y, one
for W, and a second, later one for Y. The scheduler will now
deliver the later one before the earlier one.
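
The inversion is easy to see with a list used as a stack (a toy
trace, not an engine):

    stack = []
    stack.append("Y1")    # X sends to Y first...
    stack.append("Z")     # ...then to Z
    stack.pop()           # scheduler delivers to Z
    stack.append("W")     # Z sends to W...
    stack.append("Y2")    # ...and then to Y
    print(stack.pop())    # "Y2" - the later package for Y goes out first
    print(stack.pop())    # "W"
    print(stack.pop())    # "Y1" - the earlier package for Y goes out last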

Another problem with stack based scheduling is that the
application can get trapped in an infinite loop cycling through
part of the network.

None-the-less there are situations in which restricted stack
based scheduling can be useful.

One such is "follow the packet scheduling". This can be used
when all packets are documents (FBP: Information packets). The
idea is that when an IP enters an auton it only has one path out;
therefore it can be followed until it is discarded or otherwise
exits the system. This strategy requires keeping a list of all
documents in the system.

A second situation is recursion. Stack based scheduling is
appropriate when recursion is effected by spawning subsidiary
autons.

DEFERRED DELIVERY

In simulations we may want to defer delivery of a
package until some later time, either in simulation pseudo-time
or in real time. A common way to handle deferred delivery is to
build a deferred delivery heap into the scheduler.
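
A sketch of such a heap, using Python's heapq (the tick counter and
the field layout are mine):

    import heapq
    import itertools

    class DeferredDelivery:
        def __init__(self):
            self.heap = []                # entries: (due_tick, seq, package)
            self.seq = itertools.count()  # tie-breaker preserves send order

        def defer(self, package, due_tick):
            heapq.heappush(self.heap, (due_tick, next(self.seq), package))

        def due(self, now):
            # release every package whose delivery time has been reached
            while self.heap and self.heap[0][0] <= now:
                yield heapq.heappop(self.heap)[2]

On each clock tick the scheduler would call due(current_tick) and
feed the released packages back into the normal delivery queue.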

An alternative is to use an auton that receives a package that
contains a message and a send time. The auton responds by doing
a time check on each clock tick. When the send time is reached
it packages the message and sends it on.

BLOCKED PORTS

When each auton or each inport has its own separate queue it is
fairly easy to create blockages by filling queues. Packages
cannot be sent to the filled queues. If the sending auton does
not take action to handle the failure to send it becomes blocked
with a "suspend on send". The auton remains suspended until the
blocked queue is opened up. This condition can propagate
upstream. There is a fair amount of discussion in "Flow Based
Programming" about dealing with this condition.

There also is "suspend on receive". There are two cases to
consider. One is normal; it occurs when the input queue is empty.
The other is algorithmic; the code within the auton places a
block on an inport that is not lifted until some condition is
met.

The situation is quite different when there is a single master
queue. There are no "suspend on send" blockages. It is possible
for processes to emit excessive packages and thereby flood the
master queue. This is easily detected and it is quite
straightforward to fix.

Normally the only blockages that are seen in the single master
queue scenario are blocks placed on an inport by the auton for
algorithmic reasons. For example, suppose an auton is
interleaving inputs from ports A and B. When it reads data from
port A we block it so that the next input must come from port B.

When an inport is blocked it is convenient for its queue to be
separate from the master queue.
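
One possible shape for this, extending the Scheduler sketch above
(the method names are illustrative): a blocked inport diverts its
packages into a side queue, and draining the side queue first when
the block is lifted preserves the order constraint.

    from collections import deque, defaultdict

    class BlockingScheduler(Scheduler):
        def __init__(self):
            super().__init__()
            self.blocked = set()            # inport keys currently blocked
            self.side = defaultdict(deque)  # blocked inport -> held packages

        def block(self, key):
            self.blocked.add(key)

        def unblock(self, key):
            self.blocked.discard(key)
            held = self.side.pop(key, deque())
            # put the held packages at the front, in their original order
            self.queue.extendleft(reversed(held))

        def send(self, package):
            key = (package.address.auton_id, package.address.inport_id)
            if key in self.blocked:
                self.side[key].append(package)  # hold instead of queueing
            else:
                self.queue.append(package)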

ern0

unread,
Mar 22, 2011, 4:06:14 AM3/22/11
to flow-based-...@googlegroups.com
> Over time I have been working on the design and implementation of a
> data flow engine. This article presents some of my observations and
> idea.

TLDR, but I will read it.

Anyway, I've just finished implementing my 2nd dataflow system. Unlike
the previous one, the whole system is not built around the dataflow
architecture; only a part of it is DF. It's a collaborative
spreadsheet using a PHP/MySQL backend and an AJAX/JS frontend.

Now I'm working on the documentation, which I will push here when it
is at least in RC state, if anybody is interested, or even if not.
--
ern0
dataflow programmer & evangelist

Vladimir Sibirov

unread,
Mar 22, 2011, 4:47:27 AM3/22/11
to Flow Based Programming
Good points on single-thread scheduling. The most important idea in
this article in my opinion is the master queue. Or, in other words,
a single packet space managed by the scheduler. Though there is a
possibility of inefficient queue management if there are many
algorithmic blocks and the scheduler has to skip many packets for
blocked inports on every epoch.

Single thread message passing is quite effective in existing
implementations such as Erlang and CSP. Some recent benchmarks show
that crossing the barrier of multiple threads costs nearly a 4-5
times slowdown in auton communication. So, scheduling in
multithreaded/multiprocess/multinode environments is a more
interesting domain nowadays.

Paul Morrison

unread,
Mar 25, 2011, 2:12:49 PM3/25/11
to Flow Based Programming


On Mar 22, 4:47 am, Vladimir Sibirov <trustmas...@kodigy.com> wrote:
>
> Single thread message passing is quite effective in existing
> implementation such as Erlang and CSP. Some recent benchmarks show
> that crossing the barrier of multiple threads costs a nearly 4-5 times
> slowdown in auton communication. So, scheduling in multithreaded/
> multiprocess/multinode environments is a more interesting domain
> nowadays.

a) are you sure this is how Erlang works - they refer to something
called "green processes" (not "green threads").

b) how would single thread take advantage of Java or C# threads? Not
sure what you mean by "crossing the barrier of multiple threads" -
don't multiple threads give improved performance on multicore
machines? JavaFBP certainly keeps all the cores busy - although I
suppose that could be Java overhead :-)

John Cowan

unread,
Mar 25, 2011, 2:32:50 PM3/25/11
to flow-based-...@googlegroups.com
Paul Morrison scripsit:

> a) are you sure this is how Erlang works - they refer to something
> called "green processes" (not "green threads").

Erlang calls its loci of control "processes", but they are normally
implemented using green threads. Like OS processes, they share nothing,
communicating only by message passing.

> b) how would single thread take advantage of Java or C# threads? Not
> sure what you mean by "crossing the barrier of multiple threads" -
> don't multiple threads give improved performance on multicore
> machines?

Yes, if you are I/O-bound or hold down the number of threads to equal
the number of processors. There are systems that multiplex N green
threads on M OS threads, though.

--
John Cowan co...@ccil.org http://ccil.org/~cowan
Half the lies they tell about me are true.
--Tallulah Bankhead, American actress

Vladimir Sibirov

unread,
Mar 25, 2011, 2:34:59 PM3/25/11
to flow-based-...@googlegroups.com, Paul Morrison

a) are you sure this is how Erlang works - they refer to something
called "green processes" (not "green threads").

Sorry if I mix up threads and processes sometimes when it does not make much difference. "Green processes" (as well as "green threads"; coroutine is probably a better term to use instead) are not OS processes/threads; many of those coroutines exist within actual OS processes/threads.

Have a look here in the "CPU Load" column: http://shootout.alioth.debian.org/u64q/performance.php?test=threadring
It is pretty clear that Erlang runs all those "green processes" within only one of the four processor's cores and it is still faster than a Java program using all four of them with "complete threads".
 

b) how would single thread take advantage of Java or C# threads?  Not
sure what you mean by "crossing the barrier of multiple threads" -
don't multiple threads give improved performance on multicore
machines?  JavaFBP certainly keeps all the cores busy - although I
suppose that could be Java overhead  :-)

Now that you are familiar with the difference between "threads" and "green threads" we can get back to Aristophanes' notes. And as he states himself: "The
focus in this article is on scheduling issues for engines that
run within a single thread of control." Which reminds me of "green"
process/thread scheduling.

For such a system, a problem of scaling from intra-thread to inter-thread scheduling/messaging exists. For example, I investigated it when I implemented Joe Armstrong's classical Ring benchmark for message passing in Go programming language: http://groups.google.com/group/golang-nuts/browse_thread/thread/dea3b9a501369c4

Paul Morrison

unread,
Mar 26, 2011, 10:48:40 AM3/26/11
to Flow Based Programming


On Mar 25, 2:34 pm, Vladimir Sibirov <trustmas...@kodigy.com> wrote:
>
> "green processes" (as well as "green threads"; coroutine is
> probably a good term to use instead) are not OS processes/threads, many of
> those coroutines exist within actual OS processes/threads.

Joe Armstrong told me the following a few years ago: "Erlang emulates
processes and does not use native OS processes (well it does, it uses
one OS process and three - four native threads - but this is deep in
the implementation of the virtual machine)." Of course I don't know
if the implementation has changed in the interim.

So it sounds like this is your understanding also. Still Joe states
that Erlang's thrust is rather different from FBP's - Erlang is
oriented around a very large number of short-lived tasks - he says
typically "we use (say) 6 processes/connection [telephone connection]
so you're easily in hundreds of thousands of connections - the
software is dimensioned for millions of processes". Whereas FBP
applications tend to have a few tens or hundreds of long-lived tasks,
many of which may run for the whole duration of the application.

You might want to look at Kilim - this is an FBP-like implementation
that apparently performs comparably to Erlang.

Aristophanes

unread,
Mar 26, 2011, 11:33:46 AM3/26/11
to Flow Based Programming


On Mar 22, 3:47 am, Vladimir Sibirov <trustmas...@kodigy.com> wrote:
> Good points on single-thread scheduling. The most important idea in
> this article in my opinion is the master queue. Or, in other words,
> single packet space managed by the scheduler. Though there is a
> possibility of ineffective queue management if there are many
> algorithmic blocks and the scheduler has to skip many packets for
> blocked inports on every epoch.

That's one reason why I used separate queues for blocked inports.
The technique I used was to have a separate scheduler state for draining
blocked inports. The nest state vector keeps a list of blocked and
formerly blocked inports. Whenever a blocked inport becomes unblocked
it is drained. This avoids the skipping of packets. The nub is that
messages in blocked inports have priority because of the order
constraint. The procedure seems to work fairly well; however I
haven't heavily tested it.

>
> Single thread message passing is quite effective in existing
> implementation such as Erlang and CSP. Some recent benchmarks show
> that crossing the barrier of multiple threads costs a nearly 4-5 times
> slowdown in auton communication. So, scheduling in multithreaded/
> multiprocess/multinode environments is a more interesting domain
> nowadays.

Interesting. Offhand it seems to me the issue is not so much
scheduling as it is work balancing, with the caveat that it
depends on what kind of threads we are talking about. In other
words the work is divided into work areas, with each work area
being a single thread. Most of the message passing happens
within the single threads. This implies that message passing
rates should be approximately balanced.

[snip article]

Vladimir Sibirov

unread,
Mar 26, 2011, 12:27:38 PM3/26/11
to flow-based-...@googlegroups.com

Interesting.  Offhand it seems to me the issue is not so scheduling
as
it is work balancing, with the caveat that it depends on what kind of
threads we are talking about.  In other words the work is divided
into
works areas, with each work area being a single thread.  Most of the
message passing happens within the single threads.  This
implies message passing rates be approximately balanced.

This is what we have come to so far in the non-FBP world: if the goal is performance then work should be balanced manually between OS threads (CPU cores) first, and then coroutines may communicate within the same thread quite effectively while different OS threads talk to each other relatively infrequently. This surely makes the lives of concurrent application developers no easier. In the FBP world we can take advantage of its visual nature and derive some "work areas" from the FBP graph topology. Given that the number of cores on a machine which runs the application may nowadays vary from 1 to e.g. 24, and the second figure is growing year by year, the requirement of effective automatic job distribution is emerging.

Aristophanes

unread,
Mar 26, 2011, 6:47:35 PM3/26/11
to Flow Based Programming


On Mar 25, 1:34 pm, Vladimir Sibirov <trustmas...@kodigy.com> wrote:

[snip]
> > b) how would single thread take advantage of Java or C# threads?  Not
> > sure what you mean by "crossing the barrier of multiple threads" -
> > don't multiple threads give improved performance on multicore
> > machines?  JavaFBP certainly keeps all the cores busy - although I
> > suppose that could be Java overhead  :-)
>
> Now that you are familiar with difference between "threads" and "green
> threads" we can get back to Aristophanes' notes. And as he states himself: "
> The
> focus in this article is on scheduling issues for engines that
> run within a single thread of control." Which reminds me of "green"
> process/thread scheduling.
>
> For such a system, a problem of scaling from intra-thread to inter-thread
> scheduling/messaging exists. For example, I investigated it when I
> implemented Joe Armstrong's classical Ring benchmark for message passing in
> Go programming language:http://groups.google.com/group/golang-nuts/browse_thread/thread/dea3b...

My thought is that you don't scale up; rather you move up in the
hierarchy. The way I picture the hierarchy is that at the bottom
there is a nest of autons talking to each other. An auton isn't aware
of other autons; the only thing an auton knows about the universe is
that it receives messages on its inports and sends messages out its
outports. Now the nest knows what autons there are and how they are
connected. It takes care of the administration and the scheduling.
Like autons it doesn't know anything about the outside universe,
except for portals that connect to the outside world. I call them
importals and exportals. These are special ports that connect to the
outside.

Now the way I picture it there is another administrative layer above
the nests that manages the message passing between nests. It knows
about threads, cores, sockets, and network connections. It isn't much
like the single thread scheduler because it has little control over
how well the networking behaves. Messages can go out and never be received.
Since the upper level can reside on multiple processors it is
essentially peer-to-peer.

Anyway, that's the architecture I envision. As I say, though, my
focus is currently on the single thread engine.

Paul Morrison

unread,
Mar 26, 2011, 8:13:12 PM3/26/11
to Flow Based Programming


On Mar 26, 6:47 pm, Aristophanes <c...@tiac.net> wrote:
>
> My thought is that you don't scale up; rather you move in the
> hierarchy.  
>

This sounds much like FBP subnets. FBP fits well with design
technologies like Structured Analysis where you start with a high-
level design, and "explode" it level by level. We had an application
where the designer never drew a single picture - he just built up the
network hierarchically, eventually resulting in a network of around
200 processes. Of course you also have to use your knowledge of
available components at some point - just as you can't invent drywall
by working down from a high level design of a house! Sockets and
network connections are also talked about in one of the chapters in my
book that describes techniques of "cleaving" networks.

Aristophanes

unread,
Mar 27, 2011, 1:04:00 PM3/27/11
to Flow Based Programming
I opine that the similarity is superficial. Subnets encapsulate
functionality within an application. Subnets are nice for
constructing applications that run under a single thread of control.
What I was talking about is splitting applications into work areas,
where a work area is a separate application running under a separate
thread of control. This is more like your cleaving of networks.

When we talk about splitting applications into work areas we are
talking about dividing the task into subtasks that are relatively
independent with respect to function and inter-task data flow. In
other words they can work pretty much independently.

Ralf Westphal

unread,
Mar 27, 2011, 1:31:52 PM3/27/11
to Flow Based Programming
On 27 Mrz., 00:47, Aristophanes <c...@tiac.net> wrote:
> Now the way I picture it there is another administrative layer above
> the nests that manages the message passing between nests.   It knows
> about threads, cores, sockets, and network connections.  

I wonder why you change the concept when "moving up" from nests? Why
is there an administrative layer above nests?
Why can't nests be nested indefinitely?

I'm not really familiar with the FBP terminology, but to me it sounds
like:

-There are autons, i.e. functional units responsible for actually doing
stuff :-) They represent domain logic. They are atomic with regard to
an FBP software system.
Correct?

-There are nests, i.e. functional units aggregating autons to form some
higher level functionality.

Example: Converting all lines in a textfile to upper case could
be done like this:

autons: read lines from file, convert lines, store lines in file
nest: convert file(read -> convert -> store)

(Please forgive me this ad hoc notation; I hope it conveys what I
mean.)

Now my question: Why should nests look different on the outside from
autons? I don't think they should.
From that follows:

-Nests don't just connect autons but also nests.
-Nests can be nested.

That means a whole software system can be represented by just 1 nest
at the top - and then be refined into any number of nested nests, i.e.
any number of levels of functional abstraction. And all those levels
look the same. It's a self-similar hierarchical structure.


>It isn't much
> like the single thread scheduler because it has little control over
> how well the networking.  Messages can go out and never be received.
> Since the upper level can reside on multiple processors it is
> essentially peer-to-peer.

How messages are passed from auton/nest to auton/nest should be
orthogonal to the functional decomposition. It's an implementation
detail.

Aristophanes

unread,
Mar 27, 2011, 5:48:52 PM3/27/11
to Flow Based Programming


On Mar 27, 12:31 pm, Ralf Westphal <ralf...@googlemail.com> wrote:
> On 27 Mrz., 00:47, Aristophanes <c...@tiac.net> wrote:
>
> > Now the way I picture it there is another administrative layer above
> > the nests that manages the message passing between nests.   It knows
> > about threads, cores, sockets, and network connections.  
>
> I wonder why you change the concept when "moving up" from nests? Why
> is there an administrative layer above nests?
> Why can´t nests be nested indefinitely?

Because there is a technology phase change when you scale up. A nest
has a single thread of control and resides within a single unshared
memory address space. All of the autons under its supervision are
visible to it. When we go up to the distributed level there are
multiple threads of control that reside in different memory address
spaces. What is more in a networked environment the totality of the
network may not be visible at any one processor. In other words,
nests can have top down control in a single supervising program. In a
network the supervising program(s) may be implementing peer-to-peer
or ring connection architecture.

That said, there is room for a single OS supervisory level supervising
nests on a multi-threaded multi-core processor. It might be good to
have a term for that, but nest wouldn't be it.
>
> I´m not really familiar with the FBP terminology, but to me it sounds
> like:

Terminology is a problem. In his book, "Flow Based Programming", J.
Paul Morrison uses the term component where I use the word auton.
FBP terminology depends on who you are talking to. :-)
>
> -There are autons, ie. functional units responsible for actually doing
> stuff :-) They represent domain logic. They are atomic with regard to
> a FBP software system.
> Correct?

Partly correct. Autons are the functional units responsible for doing
stuff. They are atomic in the system. However it's misleading to say
that they represent the domain logic; rather they implement the
operations available in the system. The domain logic resides in the
connectivity network.
>
> -There are nests, ie. functional units aggregating autons to form some
> higher level functionality. Likewise they are atomic. They
>
> Example: Converting all lines in a textfile from to upper case could
> be done like this:
>
> autons: read lines from file, convert lines, store lines in file
> nest: convert file(read -> convert -> store)
>
> (Please forgive me this ad hoc notation; I hope it conveys what I
> mean.)

Well that wasn't quite what I meant. I'm thinking of a nest as
(almost) a whole program. You can connect autons together to get
higher level functionality. FBP, the book, calls them subnets. But
it wouldn't make sense to make them into nests. A nest not only holds
autons, it also holds the connection network and all of the data
structures that the data-flow engine needs to run the system.

I think that one of the things that is an obstacle for people trying
to get a handle on flow based programming is that the autons and the
connection network are not a program.
>
> Now my question: Why should nests look different on the outside from
> autons? I don´t thing they should.
> From that follows:
>
> -Nests don´t just connect autons but also nests.
> -Nests can be nested.

Yes, one could do that and maybe it might make sense to do so.

>
> That means a whole software system can be represented by just 1 nest
> at the top - and then be refined into any number of nested nests, i.e.
> any number of levels of functional abstraction. And all those levels
> look the same. It´s a self-similar hierarchical structure.

The trouble is that they aren't self-similar. The world wide web is
not a web site writ large.
>
> >It isn't much
> > like the single thread scheduler because it has little control over
> > how well the networking.  Messages can go out and never be received.
> > Since the upper level can reside on multiple processors it is
> > essentially peer-to-peer.
>
> How messages are passed from auton/nest to auton/nest should be
> orthogonal to the functional decomposition. It´s an implementation
> detail.

You underestimate the importance of implementation details. They can
dictate logical structure.

Paul Morrison

unread,
Mar 27, 2011, 9:48:15 PM3/27/11
to Flow Based Programming
@Aristophanes, correct me if I am wrong, but the discussion that
started this thread seems to be about an FBP-like notation *within* a
single thread of control. So could you say you are describing a
design for a hybrid green thread/red thread implementation? What is
the motivation for wanting to go down below FBP components? Do
your users have to work with two different mental models? It seems to
me that it would be nice to have one model, and be able to defer the
question of how to split up the network, similarly to the chapter on
network cleaving in my book. It might be nice to be able to
automatically assign auta to threads. My understanding is that Erlang
does something similar, but only among a small number of actual OS
threads.

@Ralf, in FBP terms, what you said makes perfect sense. FBP subnets
are indeed a self-similar hierarchical structure. FBP components are
simply functions constructed using another language, rather than
processes and their connections. Where you said

>autons: read lines from file, convert lines, store lines in file
>nest: convert file(read -> convert -> store)

in JavaFBP, the first line is a list of components, while the second
line is a specification of a subnet or network, specifying how three
processes communicate, and in fact, several early implementations used
exactly that notation!

@Aristophanes, have you thought about building a nest scheduler as a
JavaFBP component? It could be interesting! I tried doing something
similar with the Actors described in my latest book, but your nests
sound more powerful!

PS Kilim that I mentioned above can be found at http://www.malhar.net/sriram/kilim/
.

Aristophanes

unread,
Mar 28, 2011, 12:52:46 AM3/28/11
to Flow Based Programming


On Mar 27, 8:48 pm, Paul Morrison <paul.morri...@rogers.com> wrote:
> @Aristophanes, correct me if I am wrong, but the discussion that
> started this thread seems to be about an FBP-like notation *within* a
> single thread of control.  So could you say you are describing a
> design for a hybrid green thread/red thread implementation?   What is
> the motivation for wanting to going down below FBP components?  Do
> your users have to work with two different mental models?  It seems to
> me that it would be nice to have one model, and be able to defer the
> question of how to split up the network, similarly to the chapter on
> network cleaving in my book.  It might be nice to be able to
> automatically assign auta to threads.  My understanding is that Erlang
> does something similar, but only among a small number of actual OS
> threads.

I see that I have been insufficiently clear. No, I'm not talking
about going below FBP components; instead I was talking about what one
might do to go above nests. In ordinary applications in a single
thread processor each application has its own nest. What Ralf
and I were talking about (or at least what I thought we were talking
about) was the situation where you have multiple nests, presumably
each in a different thread or processor.

Put it this way: Auta (thank you) are all green. Each single nest is
green. However a collection of nests is red. A green thread (auton/
component) is cheap to run and manage. You can run a whole collection
of green threads (i.e., a nest) cheaply because the whole thing is
green. However managing a collection of nests is not so cheap because
you have all of that inter-thread logic to deal with. My thought is
that it makes sense to have a two tier engine, a green one for the
simpler green world, and a red one for the multi-thread and
multi-processor world.

Ralf was arguing for formal self-similarity of models as you go up the
hierarchy auton - nest - red application. That may make sense.
>
> @Ralf, in FBP terms, what you said makes perfect sense.  FBP subnets
> are indeed a self-similar hierarchical structure.  FBP components are
> simply functions constructed using another language, rather than
> processes and their connections.  Where you said
>
> >autons: read lines from file, convert lines, store lines in file
> >nest: convert file(read -> convert -> store)
>
> in JavaFBP, the first line is a list of components, while the second
> line is a specification of a subnet or network, specifying how three
> processes communicate, and in fact, several early implementations used
> exactly that notation!
>
> @Aristophanes, have you thought about building a nest scheduler as a
> JavaFBP component?  It could be interesting!   I tried doing something
> similar with the Actors described in my latest book, but your nests
> sound more powerful!

That might be possible, though I don't have time for it currently. I'd
have to look at JavaFBP closely to see if something could be done.

ern0

unread,
Mar 28, 2011, 4:25:22 AM3/28/11
to flow-based-...@googlegroups.com
> My thought is that it makes sense to have a two tier engine, a green
> one for the simpler green world, and a red one for the multi-thread
> and multi- processor world..

Can you tell us a not-too-abstract situation when a single multi-thread
dispatcher fails? I wanna understand all the multi-threading dispatching
problems before I write my own. Until now, I voted for a single
dispatcher (queue), with some priority handling, but I'm not sure now.

(My dispatcher, written in C++ for our automation/homeaut system is
single-threaded, but there's a component, called Thread, which works
like a simple Thru, it just passes the packet unchanged, but the further
processing will be run in a separate thread - it's some kinda quick and
dirty solution compared to a real multithreading dispatcher).

Also, you made me think about compound components (subnets): what are
the situations where atomic components of a compound one should be kept
together somehow (e.g. one thread, queue, etc. for one compound
component)? In my view (and implementation), compound components are
just macros; they are "exploded" into atomic components, and in a running
network there are no differences between inside-compound-component
connections and any other defined connections.

(Linguistix: I think we should celebrate the 100th synonym of the
component/auton/process/etc.)

Ralf Westphal

unread,
Mar 28, 2011, 2:12:33 PM3/28/11
to Flow Based Programming
On 27 Mrz., 23:48, Aristophanes <c...@tiac.net> wrote:
> Because there is a technology phase change when you scale up.  A nest
> has a single thread of control and resides within a single unshared
> memory address space.  

Ah, I see what you mean.
Sure, there is a difference between connecting functional units in the
same address space and across the world.


> operations available in the system.  The domain logic resides in the
> connectivity network.

Hm... this is interesting. The logic resides in how functional units
are connected.
I would have called that a process or flow. But you might be right,
whatever "makes sense" within a domain surely is a collaboration of
several functional units.

And autons/atoms (I call them parts) are the very basic operations of
the domain. From them higher level operations can be built (or call
them domain logic flows).


> I think that one of the things that is an obstacle for people trying
> to get a handle on flow based programming is that the autons and the
> connection network are not a program.

Why don't they form a program/application? Isn't it the whole, which
contains nests etc., that implements the total functionality?


> > How messages are passed from auton/nest to auton/nest should be
> > orthogonal to the functional decomposition. It´s an implementation
> > detail.
>
> You underestimate the importance of implementation details.  They can
> dictate logical structure.

Hm... I guess this is where we differ or even where this discussion
runs into misunderstandings.
My feeling is you're talking about implementation - and I'm talking
(or maybe Paul Morrison is too) about design (or architecture).
I'm trying to separate the logical view from the code view.

This is why there are two articles on flow-orientation in my blog: one
is about the notation, i.e. how to design flows:
http://geekswithblogs.net/theArchitectsNapkin/archive/2011/03/19/flow-design-cheat-sheet-ndash-part-i-notation.aspx

the other one is about the translation of flows into code:
http://geekswithblogs.net/theArchitectsNapkin/archive/2011/03/20/flow-design-cheat-sheet-ndash-part-ii-translation.aspx

This distinction has greatly helped in educating developers in flow-
oriented thinking. They are able to start thinking along these lines
easily by telling themselves, "This is all very easy to translate into
methods." And when they are ready, they can take the next step.
Finally moving to async flows (which I haven´t documented yet).

Also sync and async can be mixed in flow designs. Parts of a hierarchy
of flows can be async with regard to other parts. It's just a matter
of how you translate the connections/wires between functional units.

But I digress ;-)

Bottom line: I think strictly separating modelling/design from
implementation would help the discussion.

Aristophanes

unread,
Mar 29, 2011, 1:32:30 PM3/29/11
to Flow Based Programming


On Mar 28, 3:25 am, ern0 <e...@linkbroker.hu> wrote:
> > My thought is that it makes sense to have a two tier engine, a green
> > one for the simpler green world, and a red one for the multi-thread
> > and multi- processor world..
>
> Can you tell us a not-too-abstract situation when a single multi-thread
> dispatcher fails? I wanna understand all the multi-threading dispatcing
> problems before I write my own. Until now, I voted for a single
> dispatcher (queue), with some priority handling, but I'm not sure now.
>
> (My dispatcher, written in C++ for our automation/homeaut system is
> single-threaded, but there's a component, called Thread, which works
> like a simple Thru, it just passes the packet unchanged, but the further
> processing will be run in a separate thread - it's some kinda quick and
> dirty solution compared to a real multithreading dispatcher).

I'm probably not the chap you want for this. However here are a few
comments. First of all, you had better have a good handle on inter-
thread communication, i.e., locking, semaphores, mutexes etc. If you
don't, you can get into real trouble with race conditions, mystery
bugs, that sort of thing. I expect that you already know this. The
answer, IMNSHO, is to keep things very simple.

Let's start with two threads, call them A and B. Now, where is the
dispatcher? Say it is in A. The problem is that the dispatcher is
going to service B slowly so that B ends up running slowly. There are
things you can do about that, but then we are straying away from
simple.

An approach I like is to give each thread its own dispatcher and
connect them with pipes to carry the messages passed between threads.
The catch here is the dispatchers will need some logic to control how
much they read from the pipes. You could get a situation where A
tries to empty the pipe. Each time it reads a message it sends it off
to be processed. The result could be that B floods the pipe and A
never processes its own messages.
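
As a rough sketch of what I mean (Python, with queue.Queue standing
in for the inter-thread pipe; the names and the cap are purely
illustrative), the key point is the bound on how much is read from
the pipe in one cycle:

    import queue
    from collections import deque

    def dispatcher_cycle(local, pipe_in, deliver, max_from_pipe=8):
        # pull at most max_from_pipe messages from the other thread,
        # so a flood from that side cannot starve local messages
        for _ in range(max_from_pipe):
            try:
                local.append(pipe_in.get_nowait())
            except queue.Empty:
                break
        # then process what is queued locally right now
        for _ in range(len(local)):
            deliver(local.popleft())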

I'm not sure that this is of any help, but it is something to think
about.

Vladimir Sibirov

unread,
Mar 29, 2011, 3:35:28 PM3/29/11
to flow-based-...@googlegroups.com
Such caveats are described pretty well in Morrison's book. And I don't have a clue why we would need mutexes, semaphores and critical sections in an FBP system, and vice versa why we would need FBP if we still have to do all the classic thread-safety tricks in it. Yes, it would be better to keep things simple and flow-based indeed.

The variety of terms meaning the same things is starting to grow scarily. One day I would like to see an online specification for Flow-Based Programming, including a definition of what flow-based systems are, what they aren't, what systems are flow-based, what classes of flow-based systems exist or can possibly exist, basic principles and of course solid terminology. Morrison's book contains answers to some of these questions, but it is not available to everyone, it is too long for a quick reference, and some of its terminology may be considered dated by other FBP researchers. Once a draft is prepared by someone, a decision on actual terms can be made by voting and corrections can be made later in wiki fashion.

Even though there can't be only one FBP framework or even one FBP theory, solid terminology for common principles would help FBP people understand each other and save a lot of effort reinventing the wheel (disk, round, circle, ...).

2011/3/29 Aristophanes <c...@tiac.net>

Paul Morrison

unread,
Mar 29, 2011, 3:59:35 PM3/29/11
to Flow Based Programming


On Mar 29, 3:35 pm, Vladimir Sibirov <trustmas...@kodigy.com> wrote:
> Such caveats are described pretty well in Morrison's book. And I don't have
> a clue why we would need mutexes, semaphores and critical sections in FBP
> system, and vice versa why we would need FBP if we still have to do all the
> classic thread-safety tricks in it. Yes, it would rather be better to keep
> things simple and flow-based indeed.
>
No, FBP programmers very seldom have to worry about mutexes,
semaphores, etc. I suspect you could build very complex systems in
FBP without even knowing about them - certainly none of the test cases
in the JavaFBP jar file use them.
>
> The variety of terms meaning the same things in other words is starting to
> grow scarily. One day I would like to see an online specification for
> Flow-Based Programming, including definition of what flow-based systems are,
> what they aren't, what systems are flow-based, what classes of flow-based
> systems exist or can possibly exist, basic principles and of course solid
> terminology. Morrison's book contains answers to some of these questions,
> but it is not available to everyone, it is too long for a quick reference
> and some of its terminology may be considered as dated by other FBP
> researchers. Once a draft is prepared by someone, then a decision on actual
> terms can be made by voting and corrections can be made later in wiki
> fashion.
>
I would just like to point out that the book is now available on
Kindle and Lulu, so it is available to anyone with access to the
Internet. :-)
>
> Despite there can't be the only one FBP framework or even FBP theory, solid
> terminology for common principles would help FBP people to understand each
> other and save a lot of effort reinventing the wheel (disk, round,
> circle,..).
>
I would love to see that too! However the implementation keeps
intruding! Just for starters, could someone try to describe the
difference between auta, green threads, green processes and red
processes. Preferably in tabular format...

Vladimir Sibirov

unread,
Mar 29, 2011, 4:19:11 PM3/29/11
to flow-based-...@googlegroups.com
2011/3/29 Paul Morrison <paul.m...@rogers.com>
>
> The variety of terms meaning the same things in other words is starting to
> grow scarily. One day I would like to see an online specification for
> Flow-Based Programming, including definition of what flow-based systems are,
> what they aren't, what systems are flow-based, what classes of flow-based
> systems exist or can possibly exist, basic principles and of course solid
> terminology. Morrison's book contains answers to some of these questions,
> but it is not available to everyone, it is too long for a quick reference
> and some of its terminology may be considered as dated by other FBP
> researchers. Once a draft is prepared by someone, then a decision on actual
> terms can be made by voting and corrections can be made later in wiki
> fashion.
>
I would just like to point out that the book is now available on
Kindle and Lulu, so it is available to anyone with access to the
Internet. :-)
 
Sorry, but when I say a specification available to everyone, I mean a web page that can be opened in a web browser on any device, pretty much like you can open any of the W3C's specifications. It took me a few weeks to get your book delivered, and such things as Kindle and Lulu make little sense in my area even though it is not located in a galaxy far far away and there are thousands of programmers here.

And on the other hand a specification should be brief, targeting professionals rather than people taking their first steps in application development. When you have just a week or two to start using a new technology, accessing the essential information in one place is vital.

Aristophanes

unread,
Mar 29, 2011, 5:18:05 PM3/29/11
to Flow Based Programming


On Mar 29, 2:35 pm, Vladimir Sibirov <trustmas...@kodigy.com> wrote:
> Such caveats are described pretty well in Morrison's book. And I don't have
> a clue why we would need mutexes, semaphores and critical sections in FBP
> system, and vice versa why we would need FBP if we still have to do all the
> classic thread-safety tricks in it. Yes, it would rather be better to keep
> things simple and flow-based indeed.

I believe you've missed the context. Ern0 and I were talking about
issues of implementing an FBP style engine in a multi-core
environment. Threading esoterica is irrelevant to FBP applications
and FBP programmers precisely because the FBP engine handles all
of that in a multi-threaded environment. Things are simple and
flow-based for you, the FBP programmer, because the underlying
FBP engine makes it possible.

That said, handling threading in an engine can be fairly
straightforward.

>
> The variety of terms meaning the same things in other words is starting to
> grow scarily. One day I would like to see an online specification for
> Flow-Based Programming, including definition of what flow-based systems are,
> what they aren't, what systems are flow-based, what classes of flow-based
> systems exist or can possibly exist, basic principles and of course solid
> terminology. Morrison's book contains answers to some of these questions,
> but it is not available to everyone, it is too long for a quick reference
> and some of its terminology may be considered as dated by other FBP
> researchers. Once a draft is prepared by someone, then a decision on actual
> terms can be made by voting and corrections can be made later in wiki
> fashion.
>
> Despite there can't be the only one FBP framework or even FBP theory, solid
> terminology for common principles would help FBP people to understand each
> other and save a lot of effort reinventing the wheel (disk, round,
> circle,..).

That sounds like a good idea.

John Cowan

unread,
Mar 30, 2011, 12:38:15 AM3/30/11
to flow-based-...@googlegroups.com
Paul Morrison scripsit:

> I would love to see that too! However the implementation keeps
> intruding! Just for starters, could someone try to describe the
> difference between auta, green threads, green processes and red
> processes. Preferably in tabular format...

Auta are components. "Green" is an implementation technique for threads
that doesn't pass through the OS kernel when thread switching, which
makes them lightweight. So "green threads" and "green processes" are
the same thing, except that "green processes" implies a shared-nothing
situation in which the threads communicate only by message passing.
"Red processes" is not AFAIK an FBP term.

--
Barry thirteen gules and argent on a canton azure John Cowan
fifty mullets of five points of the second, co...@ccil.org
six, five, six, five, six, five, six, five, and six.
--blazoning the U.S. flag http://www.ccil.org/~cowan

John Cowan

unread,
Mar 30, 2011, 12:40:11 AM3/30/11
to flow-based-...@googlegroups.com
Vladimir Sibirov scripsit:

> Sorry, but when I say a specification available to everyone, I mean a web
> page that can be opened in a web browser on any device pretty much like you
> can open any of W3C's specifications. It took me a few weeks to get your
> book delivered and such things as Kindle and Lulu make a little sense in my
> area even though it is not located in a galaxy far far away and there are
> thousands of programmers here.

I have to agree with this. The book is only available in encumbered
formats, so you not only have to pay (which is reasonable), but you are
seriously restricted in what devices you can use to read it (which is not).
I'm not blaming JPM for this; it's the current state of most e-books today.

--
I now introduce Professor Smullyan, John Cowan
who will prove to you that either co...@ccil.org
he doesn't exist or you don't exist, http://www.ccil.org/~cowan
but you won't know which. --Melvin Fitting

Paul Morrison

unread,
Mar 30, 2011, 9:20:36 AM3/30/11
to Flow Based Programming

On Mar 30, 12:40 am, John Cowan <co...@mercury.ccil.org> wrote:
>
> I have to agree with this.  The book is only available in encumbered
> formats, so you not only have to pay (which is reasonable), but you are
> seriously restricted in what devices you can use to read it (which is not).
> I'm not blaming JPM for this; it's the current state of most e-books today.

I have downloaded the free "Kindle for PC" application, which lets me
read the Kindle version on my PC. There is also "Kindle for MAC".
Kindle for PC is quite friendly. I can also read both versions on
Calibre, which is also free. I also have downloaded it to my Samsung
Smartphone running Android - the free software is called Aldiko. Hope
one of these solutions will help.

ern0

unread,
Mar 30, 2011, 10:02:56 AM3/30/11
to flow-based-...@googlegroups.com
> read the Kindle version on my PC.

Perhaps, but it would be nice to have some urls like diz:
http://dataflowprogramming.net/terms/message

John Cowan

unread,
Mar 30, 2011, 10:54:06 AM3/30/11
to flow-based-...@googlegroups.com
Paul Morrison scripsit:

> I have downloaded the free "Kindle for PC" application, which lets me
> read the Kindle version on my PC. There is also "Kindle for MAC".
> Kindle for PC is quite friendly. I can also read both versions on
> Calibre, which is also free. I also have downloaded it to my Samsung
> Smartphone running Android - the free software is called Aldiko. Hope
> one of these solutions will help.

I'll add that Calibre can run on Linux, and that
http://lifehacker.com/#!5406505/run-kindle-for-pc-in-linux-with-wine
explains how to use Kindle for PC on Linux. But I'm surprised: Calibre
isn't supposed to be able to handle DRM-encumbered formats.

The penguin geeks is happy / As under the waves they lark
The closed-source geeks ain't happy / They sad cause they in the dark
But geeks in the dark is lucky / They in for a worser treat
One day when the Borg go belly-up / Guess who wind up on the street.

Paul Morrison

unread,
Mar 30, 2011, 5:17:54 PM3/30/11
to Flow Based Programming


On Mar 30, 10:54 am, John Cowan <co...@mercury.ccil.org> wrote:

> I'll add that Calibre can run on Linux, and that http://lifehacker.com/#!5406505/run-kindle-for-pc-in-linux-with-wine
> explains how to use Kindle for PC on Linux.  But I'm surprised: Calibre
> isn't supposed to be able to handle DRM-encumbered formats.
>
Sorry, I was running my own files on Calibre, before they were
uploaded to Kindle or Lulu, and forgot the latter attach DRMs to the
files. Still Kindle for PC works well, and I'm sure there are other
readers around (I think Adobe has one).

Isaac Gouy

unread,
Apr 5, 2011, 1:30:42 PM4/5/11
to Flow Based Programming


On Mar 25, 11:34 am, Vladimir Sibirov <trustmas...@kodigy.com> wrote:
> > a) are you sure this is how Erlang works - they refer to something
> > called "green processes" (not "green threads").
>
> Sorry, if I mix threads and processes sometimes when it does not make much
> difference. "green processes" (as well as "green threads"; coroutine is
> probably a good term to use instead) are not OS processes/threads, many of
> those coroutines exist within actual OS processes/threads.
>
> Have a look here in the "CPU Load" column: http://shootout.alioth.debian.org/u64q/performance.php?test=threadring
> It is pretty clear that Erlang runs all those "green processes" within only
> one of the four processor's cores and it is still faster than Java program
> using all four of them with "complete threads".


Note - 'Benchmarks that may seem to be concurrent are often
sequential. The estone benchmark, for instance, is entirely
sequential. So is also the most common implementation of the "ring
benchmark"; usually one process is active, while the others wait in a
receive statement.'

8.3 The SMP emulator

http://www.erlang.org/doc/efficiency_guide/processes.html#id66266



Have a look here, it's pretty clear that Erlang runs all these "green
processes" on all four processor cores :-)

http://shootout.alioth.debian.org/u64q/program.php?test=spectralnorm&lang=hipe&id=2




> > b) how would single thread take advantage of Java or C# threads?  Not
> > sure what you mean by "crossing the barrier of multiple threads" -
> > don't multiple threads give improved performance on multicore
> > machines?  JavaFBP certainly keeps all the cores busy - although I
> > suppose that could be Java overhead  :-)
>
> Now that you are familiar with difference between "threads" and "green
> threads" we can get back to Aristophanes' notes. And as he states himself: "
> The
> focus in this article is on scheduling issues for engines that
> run within a single thread of control." Which reminds me of "green"
> process/thread scheduling.
>
> For such a system, a problem of scaling from intra-thread to inter-thread
> scheduling/messaging exists. For example, I investigated it when I
> implemented Joe Armstrong's classical Ring benchmark for message passing in
> Go programming language:http://groups.google.com/group/golang-nuts/browse_thread/thread/dea3b...