Introduction

80 views
Skip to first unread message

Auke van Slooten

unread,
May 14, 2012, 12:06:27 PM5/14/12
to flow-based-...@googlegroups.com
Hi,

I found this group - and the concept of flow based programming - just last week. I had been looking at a talk by Alan Kay, which has been posted here before ( http://tele-task.de/archive/video/flash/14029/ ) and it influenced my thinking profoundly. Over the last two weeks I tinkered a bit with one idea to reduce software complexity by - among other things - preventing shared state. The solution I came up with was a software bus which allows only immutable messages. I've since searched for other examples and found this group ( and zeromq - which is also very interesting ).

Seeing how there are some very intelligent people gathered here who at least have some of the same concerns I had, I was wondering what you think of the bus approach I've taken. You can read more at http://poefke.nl/en/code/simplebus/ and see a simple php implementation at https://github.com/poef/simpleBus/blob/master/php/simplebus.php and a very simple example application at https://github.com/poef/simpleBus/blob/master/php/examples/rss.1.php

I'm not sure if this approach has advantages or disadvantages over the flow based approach - where you must explicitly connect components. It looks like the bus approach is more difficult to debug, though I'm working on some tools to see if that can be made simpler.

Auke van Slooten
Muze

Henri Bergius

unread,
May 14, 2012, 12:41:53 PM5/14/12
to flow-based-...@googlegroups.com
Hi,

On Mon, May 14, 2012 at 6:06 PM, Auke van Slooten
<a.c.van...@gmail.com> wrote:
> Seeing how there are some very intelligent people gathered here who at least
> have some of the same concerns I had, I was wondering what you think of the
> bus approach I've taken. You can read more at
> http://poefke.nl/en/code/simplebus/ and see a simple php implementation
> at https://github.com/poef/simpleBus/blob/master/php/simplebus.php

Interesting, I like the chaining-style API you've done there!

I've written a Flow-Based Programming engine for Node.js
(http://noflojs.org/), plus a PHP port of the same:
https://github.com/bergie/phpflo

Your API would be an interesting alternative way of writing
components. My current one is quite traditional OOP style:
https://github.com/bergie/phpflo/blob/master/src/PhpFlo/Component/ReadFile.php
https://github.com/bergie/noflo/blob/master/src/components/ReadFile.coffee

if you're interested in how message queues and FBP can play together,
feel free to try the MQ capabilities in NoFlo:
https://github.com/bergie/noflo/tree/master/src/components/MQ

Their unit tests probably show how they work:
https://github.com/bergie/noflo/blob/master/test/MQ_SendMessage.coffee
https://github.com/bergie/noflo/blob/master/test/MQ_Queue.coffee

> Auke van Slooten
> Muze

/Henri

--
Henri Bergius
Motorcycle Adventures and Free Software
http://bergie.iki.fi/

Jabber: henri....@gmail.com
Microblogs: @bergie

Dan

unread,
May 14, 2012, 1:10:09 PM5/14/12
to flow-based-...@googlegroups.com
Welcome on board Auke!
I strongly recommend the following sources to understand FBP better:
J. Paul Morrison's book:
http://www.amazon.com/Flow-Based-Programming-2nd-Edition-ebook/dp/B004PLO66O/ref=sr_1_2?ie=UTF8&qid=1337014272&sr=8-2 (latest version)
http://www.jpaulmorrison.com/fbp/book.pdf (free but older version of the same book)
Blog:
http://my.opera.com/Vorlath/blog/ (A blog with very interesting articles about FBP but not only. Here you will find some of my comments too)

As far as I have read about SimpleBus I can notice that the closest resemblance is with a classical message system. Flow Based Programming (FBP) is not just a message system, it is a way to model the world in computer field. The liberty to send messages to many unconnected targets does not necessarily mean advantages in terms of modeling, speed or loose coupling. FBP could generate  very efficient code if you have a good compiler for it because FBP is the only one that exposes the data flow (explicit), in comparison to imperative and even functional languages (hidden data flow). Also you can use FBP concepts with other (classical) languages at various levels of granularity and with different approaches (top-down, bottom-up).
Have a glance to the info referred above and you can do a better comparison with the system you are working on.
Regards,
Dan

Ron Lewis

unread,
May 14, 2012, 5:03:13 PM5/14/12
to flow-based-...@googlegroups.com
Not that I have seen so many languages, but almost all programming languages I have seen have functions that can return at most one thing. So whether C, C++, C#, Basic, Visual Basic, and maybe some others, we have functions (or methods, whatever we decide to call them). The functions are something like this:

int s = myfunction( a, b, c, d);

where a, b, c, d are inputs, and s is a single output.

In Visual Basic and other languages, the function may look a little different, but is the same idea of multiple inputs and a single output.

So, we can deal with multiple outputs in C using structs. But having to define a struct every time we would like multiple outputs seems to be a move towards complexity. A single struct can be returned, but we have to define the struct every time. The inputs are still simple, just list a, b, c, d, etc.

In C++ and C#, we can return a single object (but not multiple objects). I think we are still talking about the same category as a C struct.

We can use arguments as outputs (c and d could be outputs). But that seems to me also a move toward complexity because now we have the inputs and outputs mixed together. (a and b are in the same list as c and d). And having mutable values (in this case the same argument that is an input is also an output) also moves towards complexity.

Some Windows programmers may recognize this as a famous example of an input that becomes the output:

MSG msg;
while (GetMessage(&msg, NULL, 0, 0))
{
TranslateMessage(&msg);
DispatchMessage(&msg);

msg is an output of the GetMessage function and is both an input and an output of the TranslateMessage and DispatchMessage functions. Additionally, msg is an instance of the MSG struct. This is the famous or infamous Windows Message Loop.

I think functional languages have tuples. I know F# has tuples. Tuples allow us to return multiple values from a function. 

I have never written a program in FORTH. But I learned that FORTH uses a data stack in addition to a return stack. The other languages (C, C++, etc.) have only one stack for both data and returns to calling functions. Seems to me that the FORTH data stack makes returning multiple values easy. But of course we have to keep track of what data is on our data stack.

The FORTH data stack avoids the problem of memory fragmentation. To return multiple values in C, C++, Visual Basic, etc. we can use structs or objects. But we need to allocate memory somewhere. We can declare a variable in stack memory (such as in the Windows Message Loop, GetMessage(&msg, NULL, 0, 0)). Then we pass the address or reference to the function. The function returns the value to the address or reference. 

Without declaring the variable in stack memory, we can allocate the memory inside the function using malloc() or new. new uses malloc() -- a conclusion I reached from stepping through code with a debugger.

When we use malloc() and free(), we fragment memory. The same would be true for new and delete. To reduce memory fragmentation, free() and delete use some kind of algorithm that works towards keeping unallocated heap memory in large chunks. The FORTH data stack never has a problem with memory fragmentation. Unallocated data stack memory remains in one chunk. Whatever free() algorithm we use consumes CPU time. I think Windows hangs from time to time while it consolidates fragmented heap memory, as well as for other reasons. In C# and VB, the consolidation of heap memory is garbage collection.

I think there are many instances in which returning multiple values is desirable. We might implement a high precision division using integers. To do so, we call a function to return the Quotient, and another function to return the Remainder. Each function does exactly the same division. Why do we need to do the same division twice?

Seems to me that part of reducing complexity would be to have functions that take a tuple input and return a tuple output, without need to define a struct. So, instead of a function looking like:

o = f(i1, i2)

a function looks like:

(o1, o2) = f(i1, i2)

with o1 and o2 being outputs and i1 and i2 being inputs.

Maybe the definition of the function might be:
(int o1, bool o2) f (float i1, float i2)
{
// whatever

What I am saying is that I think passing multiple values into a function is simpler than returning multiple values from a function. And having an equally simple way to return multiple values from a function would also be a step towards simplifying programming.

I hope to learn from what others say about this post of mine. 


Brad Cox

unread,
May 14, 2012, 5:09:24 PM5/14/12
to flow-based-...@googlegroups.com
FBP streams convey Objects, which are typically JSON or XML instances, any of which can contain any number of values. Same true of any Java app. C's the impoverished exception.

Tom Young

unread,
May 14, 2012, 5:56:21 PM5/14/12
to flow-based-...@googlegroups.com
http://ubuntuforums.org/showthread.php?t=1460449
shows how to return multiple values from a C/C++ function using pointers.
Ex. tester( &x, &);  // will return x and y to the caller.

        -twy
 
In C++ and C#, we can return a single object (but not multiple objects). I think we are still talking about the same category as a C struct.

                      ... 



--
"...the shell syntax required to initiate a coprocess and connect its input and output to other processes is quite contorted..."
W. Richard Stevens [Advanced Programming in the Unix Environment, p441]

Ron Lewis

unread,
May 14, 2012, 10:58:18 PM5/14/12
to flow-based-...@googlegroups.com
 Yes, the tester example works and has some advantages. The returned values are on the stack, therefore, no CPU time consumed on algorithms to prevent fragmentation of heap memory. The problem of which I am aware with fragmentation of heap memory is you could have an allocation failure even when you have enough unallocated memory. This failure occurs when malloc() and free() are used in a way that leaves heap memory with enough bytes of unallocated memory to fulfill an allocation request, but no block of bytes large enough to fulfill the allocation request. So allocating on the stack spares us of memory fragmentation problems.
 
Another feature of the pointer approach is that structs and classes are not needed to return the multiple values.
 
A complexity disadvantage is that the outputs (&x and &y) are in the same place as where the inputs go. Does this add to the complexity? or maybe to possible confusion? I observed that Microsoft has two #defines. IN and OUT to help. The defines would be:
 
#define IN
#define OUT
 
That is, IN and OUT are defined as nothing. So you could write  
 
tester(OUT &x, OUT &y ); // will return x and y to the caller.
 
[I think Tom Young already knows this. But suppose that not everyone is as familiar with C as Tom. So, I add some more explanation.]
 
Can anyone say how tuples are implemented in a functional language interpreter? Would tuples be implemented with malloc() or new? If C would have tuples, then you could write:
 
(x2, y2, z2) = f (x1, y1, z1, 5);    // f is a function
 
and
 
(s, t) = (25, t + 42);    // sets s = 25 and t = t + 42
 
 and
 
F1 ( F2 (5 , 7.2), "some string");     // F1 and F2 are functions. F2 returns one or more values which F1 uses for inputs
 
However, C/C++ does not have statements like these for returning and manipulating multiple return values. I think tuples might have been added to C#.
 
 
 

Dan

unread,
May 15, 2012, 6:31:58 AM5/15/12
to flow-based-...@googlegroups.com
Hi,

In the classical imperative languages there is a good reason why the function returns only one value (struct) and even if you have a construct like: (o1, o2) = f(i1, i2) does not solve the problem, it just simplifies the program coding.
All the classical imperative languages and functional languages too, have this problem. The reason is the function substitution model where all the processes are convergent.
Of course you have functions (components, modules) that have multiple outputs that should go (split) to various different targets (receptor components). In FBP you define a connection between an output port (it represents one of many returned values) and an input port (one from many input parameters of another components/function). The FBP system is going to do the work of message transfer whenever some data is available to an output port. J.Paul Morrison calls this messages / parameters as IPs (Information Package)
The problem of divergent paths that emerge exactly at the components with multiple return values creates problems in terms of execution mapping. Because all the classical imperative languages are natively single threaded they force you (as a programmer) to implement by yourself the execution of multiple divergent (and parallel) paths in sequence. Even if later you chose to use multi-threading features the logic is not described in the program as such (explicitly parallel where ever it is the case). You are forced to explicitly implement the multi-threading by yourself and this is a nightmare. In FBP multi-threading is implicit because the data flow is explicit where in imperative languages is not.
That's why FBP is so good because it is implicitly parallel and the order of lines of code does not matter. It is declarative. The order is strictly given by connections among components.
You can find more info about this issue in the following article:
http://my.opera.com/Vorlath/blog/show.dml/177363http://my.opera.com/Vorlath/blog/show.dml/177363
You can look for my comments (show comments at the end and search for "Dan Pologea" and the author answers).
Regards,
Dan

Brad Cox

unread,
May 15, 2012, 7:05:04 AM5/15/12
to flow-based-...@googlegroups.com
In C's case, the reason is more that it pushes incoming arguments onto a stack (i1, i2) but returns outgoing stuff via explicit code, typically in a register. Single value lives in that register when it unwinds the call stack. Multiple value returns are hard in C because there's no place to reliably store return values. Thus the requirement to pass &struct as an argument to return struct as a value.
Message has been deleted

Paul Morrison

unread,
May 15, 2012, 10:43:08 AM5/15/12
to flow-based-...@googlegroups.com
Ron, it seems to me that you are conflating several different ideas.  Returning multiple values from a function may just be a matter of syntax, but IMO it weakens the power of the function - and you still need a "heap" mechanism for allocate and free, because allocates and frees are not done in a strictly nested way.    You can allocate A, allocate B, free A, then free B, as dictated by the program logic - and then when you add in the flow ideas of FBP, which, as Dan says, significantly simplify the logic of your application, this becomes even more necessary.  If a process is copying IPs,  the two copies will typically take different routes through the network, and have different lifetimes. 

IIRC the FORTH stack is a great way of managing arithmetic expressions, and even more so functional structures, e.g. "a + b * c" is implemented very nicely by:

  push a
  push b
  pop top 2 items off stack; add them; push result on stack
  push c
  pop top 2 items off stack; multiply them; push result on stack

Very nice and clean, and generalizes nicely to functions!  Yes, you could push 2 results onto the stack but I think at some loss in clarity.   However, this is still purely synchronous, and does not address the multiprocessing model at all - although of course a FORTH stack for each process would certainly work well.

I would just like to add, re Ron's comments about the overhead of allocate and free, that in AMPS (the first FBP implementation) we paid a lot of attention to storage allocation and deallocation - and there was a lot of academic work going on in that area at that time (early 1970s).  We were able to take advantage of the fact that IPs of a given type typically were of fixed size to get a very fast allocation/deallocation mechanism - of course today where strings can vary from 0 bytes to a gazillion, this might not buy you as much...  Still there may be tuning opportunities here :-)

Regards

PS I had forgotten about Vorlath's work - he's been rather quiet recently...?

Ron Lewis

unread,
May 15, 2012, 10:45:51 AM5/15/12
to flow-based-...@googlegroups.com

Wow! Right on explanations from Dan and Brad. 

If internally C would use a separate data stack, then I would think passing multiple values to and from functions would be natural. But since C uses a register for returning values. Using a register would take less CPU than a stack. And I think optimizing C compilers would use a register when possible to pass an argument to the function.

These days, we are concerned with multiple CPUs. Maybe we block simultaneous execution of a function by multiple CPUs, or we write re-entrant functions that can be executed by multiple CPUs simultaneously. Seems to me that this can effect how we implement multiple return values.

When you call a function for which you need to check for success and error, multiple return values would be convenient. For example, open_file(). You get a file handle returned. If the file handle is invalid (maybe use zero for an invalid handle), then you need to call some other function to find out why. With multiple CPUs, another CPU could execute  open_file() before you get a chance to call the error function to find out why the file did not open. Maybe we use try-catch. But is try-catch simpler than returning multiple values? A moot point.

IPs are multi-value. What would be the most FBP-way for open_file()? Or maybe FBP would relegate to the component. If error, the open_file component sends one message to component A; if success, send a different message to component B? 


On Tuesday, May 15, 2012 7:05:04 AM UTC-4, Brad Cox wrote:
In C's case, the reason is more that it pushes incoming arguments onto a stack (i1, i2) but returns outgoing stuff via explicit code, typically in a register. Single value lives in that register when it unwinds the call stack. Multiple value returns are hard in C because there's no place to reliably store return values. Thus the requirement to pass &struct as an argument to return struct as a value.

Ged Byrne

unread,
May 15, 2012, 10:47:18 AM5/15/12
to flow-based-...@googlegroups.com
Hi Ron,

It strikes me that if you have a language that accepts and returns tuples then you essentially have is a relational programming language.  Like the one described in the Third Manifesto: http://en.wikipedia.org/wiki/The_Third_Manifesto

You can play with the relational language defined by Date and Darwen, Tutorial D, using the open source Rel: http://dbappbuilder.sourceforge.net/Rel.php

What I find interesting about this is that usually a relational language will result in a flow diagram very similar to FBP rather than the stack based implementation.

Regards, 


Ged

On 14 May 2012 22:03, Ron Lewis <rle...@indinfer.com> wrote:
[...]

Ron Lewis

unread,
May 15, 2012, 11:34:52 AM5/15/12
to flow-based-...@googlegroups.com
I received a lot of good thinking and information from Paul. The fixed-size allocations does obviate searching for large-enough blocks and concerns for fragmentation.

I think of our hard disk file systems. Our files are variable lengths. But underneath it is a heap fragmentation problem solved by having each hard disk sector an equal size. I would reckon that on average the last sector of a file is about half unused (wasted). On the other hand, we do NOT have a complicated algorithm to find a block of sectors big enough for a file space request. I would suppose that a solid state hard drive (SDD) would not experience any speedup through using Windows (DOS) defrag utility.

Maybe the tradeoffs today are different from yesteryear. Maybe today we feel we have so much memory that we are up to our ears in gigabytes of memory. The speed of SDDs may approach the speed of memory. So, instead of hard disk files versus memory we have persistent storage versus volatile memory. 

Not so simple dealing with fixed allocation units which are sometimes linked to other allocation units when more space is needed. Our file systems are made to look like variable length files.



On Tuesday, May 15, 2012 10:43:08 AM UTC-4, Paul Morrison wrote:
Ron, it seems to me that you are conflating several different ideas.  Returning multiple values from a function may just be a matter of syntax, ...
 
...

John Cowan

unread,
May 15, 2012, 11:51:36 AM5/15/12
to flow-based-...@googlegroups.com
Ron Lewis scripsit:

> Not that I have seen so many languages, but almost all programming
> languages I have seen have functions that can return at most one thing.

Scheme, Common Lisp, and Lua are the languages I've used actively that
allow multiple values (including zero values) to be returned from
a function. There are others. Lua's syntax for capturing multiple
values is probably the most familiar:

a, b, c = f()

where f returns three values, will put them into a, b, and c respectively.

In Common Lisp and Lua, attempts to return more values than the context
expects will cause the remaining values to be ignored. For example,
the Common Lisp "round" function will round a floating-point value to
the nearest integer, returning the rounded value and the remainder as
multiple values. If you care only about the rounded value, you can assign
it to a variable or pass it to another function (including arithmetic
operators) without further fuss -- the second value is simply disregarded.

Multiple values are a generalization of the ability to return either zero
or one value in most languages (in C, you always return a value, but it's
garbage if the function exits with "return;"), and of the ability for
functions to accept multiple arguments. Indeed, in Scheme and Common Lisp
one way to a function that produces multiple values is to use syntax that
passes the values to a consumer function which accepts multiple arguments.

> We can use arguments as outputs (c and d could be outputs). But that seems
> to me also a move toward complexity because now we have the inputs and
> outputs mixed together.

That's what is done in Fortran, where any argument can be an input, an
output, or both. In Ada this is also true, but you explicitly mark the
arguments at the point of definition as being "in", "out", or "in out".

> Seems to me that the FORTH data stack makes returning multiple values easy.

Correct.

> The FORTH data stack avoids the problem of memory fragmentation. To return
> multiple values in C, C++, Visual Basic, etc. we can use structs or
> objects. But we need to allocate memory somewhere.

In C, the compiler typically arranges for the caller of a struct-valued
procedure to allocate space for the returned value on the stack along
with the arguments. The called function gets a pointer to this struct,
fills it in, and returns.

--
John Cowan http://www.ccil.org/~cowan <co...@ccil.org>
"Any legal document draws most of its meaning from context. A telegram
that says 'SELL HUNDRED THOUSAND SHARES IBM SHORT' (only 190 bits in
5-bit Baudot code plus appropriate headers) is as good a legal document
as any, even sans digital signature." --me

John Cowan

unread,
May 15, 2012, 11:56:42 AM5/15/12
to flow-based-...@googlegroups.com
Brad Cox scripsit:

> In C's case, the reason is more that it pushes incoming arguments onto a
> stack (i1, i2) but returns outgoing stuff via explicit code, typically in a
> register.

That's wrong-end-to: C only allows single return, so it can return the
value in a register. In machine architectures with a reasonable number
of general-purpose registers (arguably there is only one on the x86),
it is common to pass the first N arguments in N registers and the rest,
if any, on the stack.

> Multiple value returns are hard in C because there's no place to
> reliably store return values. Thus the requirement to pass &struct as an
> argument to return struct as a value.

That's not the case. Struct-valued functions were added to C in 1977, just
before the publication of K & R first edition. They remain valid today.
Unfortunately, most of the C standard library was already set in stone
by then, and writers of more modern APIs have copied its archaic style.

--
"But I am the real Strider, fortunately," John Cowan
he said, looking down at them with his face co...@ccil.org
softened by a sudden smile. "I am Aragorn son http://www.ccil.org/~cowan
of Arathorn, and if by life or death I can
save you, I will." --LotR Book I Chapter 10

John Cowan

unread,
May 15, 2012, 12:14:17 PM5/15/12
to flow-based-...@googlegroups.com
Paul Morrison scripsit:

> Very nice and clean, and generalizes nicely to functions! Yes, you
> could push 2 results onto the stack but I think at some loss in
> clarity.

Forth words often do push multiple values on the stack. For example,
the DUP word has a signature of (x y -- y x), meaning that it pops
two values off the stack and pushes them back in reverse order.

--
"The serene chaos that is Courage, and the phenomenon co...@ccil.org
of Unopened Consciousness have been known to the John Cowan
Great World eons longer than Extaboulism."
"Why is that?" the woman inquired.
"Because I just made that word up", the Master said wisely.
--Kehlog Albran, The Profit http://www.ccil.org/~cowan

John Cowan

unread,
May 15, 2012, 12:16:41 PM5/15/12
to flow-based-...@googlegroups.com
Ron Lewis scripsit:

> I think of our hard disk file systems. Our files are variable lengths. But
> underneath it is a heap fragmentation problem solved by having each hard
> disk sector an equal size. I would reckon that on average the last sector
> of a file is about half unused (wasted). On the other hand, we do NOT have
> a complicated algorithm to find a block of sectors big enough for a file
> space request.

Some filesystems do have a concept of block fragments, where a single
block may have fragments from more than one file. But blocks are always
allocated first and then fragments within the blocks as needed.

--
That you can cover for the plentiful John Cowan
and often gaping errors, misconstruals, http://www.ccil.org/~cowan
and disinformation in your posts co...@ccil.org
through sheer volume -- that is another misconception. --Mike to Peter

Brad Cox

unread,
May 15, 2012, 12:37:26 PM5/15/12
to flow-based-...@googlegroups.com
One of us must be wrong. I wonder which one?

John Cowan

unread,
May 15, 2012, 1:02:19 PM5/15/12
to flow-based-...@googlegroups.com
Brad Cox scripsit:

> One of us must be wrong. I wonder which one?

Well, it's easy to see from any copy of K&R 1 that I'm right about
structure-valued functions in C. I suspect that multiple value returns
never even entered the mind of anyone involved with C, B, BCPL, or CPL,
so the return-in-a-register convention was completely obvious.

--
La mayyitan ma qadirun yatabaqqa sarmadi John Cowan
Fa idha yaji' al-shudhdhadh fa-l-maut qad yantahi. co...@ccil.org
--Abdullah al-Hazred, Al-`Azif http://www.ccil.org/~cowan

Paul Tarvydas

unread,
May 15, 2012, 2:06:34 PM5/15/12
to flow-based-...@googlegroups.com
> One of us must be wrong. I wonder which one?

The answer may lie in these pages:


Given that V7 C required strict declaration before use, there was no *need* to use registers for return values, e.g. caller pre-allocates return variable on the stack knowing the return signature of the called function, before pushing the args.

pt


Reply all
Reply to author
Forward
0 new messages