considering dataflow for streams, by looking at the syntax

Ben Clifford

Mar 5, 2010, 8:44:22 AM
to stupi...@googlegroups.com

I have (and I think other Ben has) a certain uneasiness with syntax for
output streams.

On the one hand, they're "output", so they should go on the LHS of a
function call. But you aren't really assigning anything to the variable
when you do that. So maybe the C style of passing them in on the
right-hand side as input values is the way to go?

1. functions and variables, for "values"

x = f(y)

y is something that you can read a single value from (and you *must* be
able to read a value - there's no option that y doesn't have a value)

x is something that you can write a value to (in the above).

But ultimately you read from it, by using it in another expression, or by
returning it.

y = f(x)
z = g(y)

So what's the y for? It's a label for getting the output of f and feeding
it into g.

And what are values? Pretty much they're things you could sensibly(!) copy
down onto a (very large) piece of paper - e.g. integers, or arrays of a
million bits.
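
For concreteness, here's that section 1 shape in plain C (f and g are
made-up stand-ins):

#include <stdio.h>

/* Hypothetical value-world functions: each takes a value and
   returns a value. */
static int f(int x) { return x + 1; }
static int g(int y) { return y * 2; }

int main(void) {
    int x = 20;
    int y = f(x);   /* y is just a label carrying f's output... */
    int z = g(y);   /* ...into g */
    printf("%d\n", z);
    return 0;
}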

2. unix pipes and redirection, for "streams"

In a shell you can write:

sed 's/a/X/' < /dev/ttyS1 > /tmp/y

/tmp/y might be a file, or it might be something more like a fifo.

Maybe you want to process data from /dev/ttyS1 using several tools:

sed 's/a/X/' < /dev/ttyS1 > /tmp/y
sort < /tmp/y > /tmp/z

Now what if we change this sh syntax so data always flows leftwards, like
in the function example?

/tmp/y < sed 's/a/X/' < /dev/ttyS1
/tmp/z < sort < /tmp/y

That starts to take on the same shape as in section 1 - with sed 's/a/X/'
instead of f, and sort instead of g. So what is the point of /tmp/y? It's
some way to connect the output of sed to the input of sort.

So do the 'variables' in this shell notation have 'values'?

Certainly if /tmp/y is a file, I can type:
cat /tmp/y
and get something that I can copy down onto a piece of paper. If I type
that cat command over and over, I'll get the same thing to copy down onto
a piece of paper, forever (at least in single-thread land, where no other
process can modify my filesystem). So yes, /tmp/y has a value here.

But under that definition, what is the 'value' of /dev/ttyS1, the serial
port round the back of my PC? Or of /dev/urandom? I can start copying down
bytes coming in off /dev/urandom onto a piece of paper as they come out of
cat. But what if I restart that cat and start copying again? I'll get
(probably) something different to write down. So there's no well-defined
notion of 'value' there.

So forget the idea of things having a 'value' in this bash shell model.

In the value assignment section, = and f() notation is used. In the second
section, little arrows < and > are used.

Both forms direct 'data' into 'things that process data', and 'variables'
are used to connect those things that process data together. In section 1,
there were values, and functions of those values. In section 2, there were
streams of bytes, and stream transformers that, given one stream as input,
produce another stream as output.

So let's abstract the syntax a bit. Forget = and f(), and < and >, and
have a more general syntax:

y <- f <- x

When x and y are 'values' (in the section 1 sense), then this is the
function invocation:

y = f(x)

When x and y are 'streams' (in the section 2 sense), then this is the shell command:

f < x > y


Do we always have to live in "the world of values and functions" or "the
world of streams", or can they be merged into a single world?

They can!

That already happened with sed in the example above:

sed 's/a/X/' > /tmp/y < /dev/ttyS1

This looks like sed is a stream processor living in the "world of streams"
- but we actually gave it a value too: the regexp to apply.

So maybe we could write this:

y <- sed <- (/dev/ttyS1, 's/a/X/')

We feed in a stream and a value, and we get out a stream.
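As a sketch of what that might compile down to, here's a C stand-in for
sed, assuming streams become FILE* and with single-character replacement
standing in for the full regexp:

#include <stdio.h>

/* A stream transformer that also takes values: it consumes a
   stream (in) and two value parameters (from, to), and produces
   a stream (out). */
static void replace_stream(FILE *in, FILE *out, char from, char to) {
    int c;
    while ((c = fgetc(in)) != EOF)
        fputc(c == from ? to : c, out);
}

int main(void) {
    /* out <- replace_stream <- (stdin, 'a', 'X') */
    replace_stream(stdin, stdout, 'a', 'X');
    return 0;
}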

This extends to returns as well. Imagine a procedure which capitalises a
stream of text, and also returns the word count after reaching the end of
the stream. Invoking it might look like this:

(capitalised, count) <- capitaliseAndCount <- txt

Capitalised is a stream. Count is an int.

The same goes for multiple return values, multiple return streams, and
multiple input parameters and streams.
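
A rough C sketch of such a procedure, assuming the stream slots become
FILE* parameters and the value slot stays a return value (the names are
hypothetical):

#include <ctype.h>
#include <stdio.h>

/* capitaliseAndCount, sketched in C: the capitalised stream is
   written to out as input arrives; the word count only becomes a
   usable value once the input stream ends. */
static long capitalise_and_count(FILE *in, FILE *out) {
    long words = 0;
    int c, in_word = 0;
    while ((c = fgetc(in)) != EOF) {
        if (isspace(c))
            in_word = 0;
        else if (!in_word) {
            in_word = 1;
            words++;
        }
        fputc(toupper(c), out);
    }
    return words;
}

int main(void) {
    /* (capitalised, count) <- capitaliseAndCount <- txt */
    long count = capitalise_and_count(stdin, stdout);
    fprintf(stderr, "%ld words\n", count);
    return 0;
}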

The above, then, is what is/should be going on in Stupid. Slightly
changing the syntax, at least for discussion, if not in the
implementation, turns 'assignment' and "how do you assign to a
stream!?" into a more general flow-of-data syntax.

--
http://www.hawaga.org.uk/ben/

Ben Laurie

Mar 20, 2010, 11:03:25 AM
to stupi...@googlegroups.com

I like the argument, though I'd note that Unix shells already have a
notation for streams that goes the other way, i.e. pipes.

In any case, I've decided I'm pretty happy with output streams being
treated as output variables :-)

What I'm wondering about now is multi-valued functions in the middle of
statements. Using your example above, could I write...

x = capitaliseAndCount(txt)[1];

? There'd have to be an implicit null stream as output here, which seems
a little ... odd. But suppose it instead returns two values; then that
makes more sense. Or suppose it returns an array.

I'm somewhat inclined to ban such elaborate constructions and insist
that you write it out in full, e.g.

(null, x) = capitaliseAndCount(txt);

or...

array(uint8, 25) x = someArrayProducingFunc();
y = x[1];
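
For the two-values case, one plausible C lowering is a small struct
return, with each slot picked off by field (divmod here is just a made-up
example, not anything in Stupid):

#include <stdio.h>

struct pair {
    int quot;
    int rem;
};

/* A two-valued function: returns both results in one struct. */
static struct pair divmod(int a, int b) {
    struct pair p;
    p.quot = a / b;
    p.rem = a % b;
    return p;
}

int main(void) {
    struct pair p = divmod(17, 5);  /* (quot, rem) = divmod(17, 5) */
    int x = p.rem;                  /* keep one slot, ignore the other */
    printf("%d\n", x);
    return 0;
}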

--
http://www.apache-ssl.org/ben.html http://www.links.org/

"There is no limit to what a man can do or how far he can go if he
doesn't mind who gets the credit." - Robert Woodruff

Ben Clifford

Mar 20, 2010, 1:49:40 PM
to stupi...@googlegroups.com

On Sat, 20 Mar 2010, Ben Laurie wrote:

> I like the argument, though I'd note that Unix shells already have a
> notation for streams that goes the other way, i.e. pipes.

Right. That notation also allows the equivalent of something like g(f(x))
in function notation: cat x | f | g

> What I'm wondering about now is multi-valued functions in the middle of
> statements. Using your example above, could I write...

Before, I think(?), we talked about not allowing function calls to be
nested - so that you couldn't say g(f(x)) - instead you could only make a
function call in an assignment. y=f(x); z=g(y)

If that's the case, it extends fairly naturally to disallowing function
calls inside any kind of nested expression, so that a function call cannot
be the left operand of [].

> x = capitaliseAndCount(txt)[1];

> I'm somewhat inclined to ban such elaborate constructions

I think you did before already ;)

> and insist that you write it out in full, e.g.
>
> (null, x) = capitaliseAndCount(txt);
>
> or...
>
> array(uint8, 25) x = someArrayProducingFunc();
> y = x[1];

Yeah. Some languages use _ for that null, so you write:

(_,x) = capitaliseAndCount(txt);

That null concept works for both values and streams equally well.

--
http://www.hawaga.org.uk/ben/

Ben Laurie

Mar 21, 2010, 8:47:34 AM
to stupi...@googlegroups.com
On 20/03/2010 17:49, Ben Clifford wrote:
>
> On Sat, 20 Mar 2010, Ben Laurie wrote:
>
>> I like the argument, though I'd note that Unix shells already have a
>> notation for streams that goes the other way, i.e. pipes.
>
> Right. That notation also allows the equivalent of something like g(f(x))
> in function notation, cat x | f | g
>
>> What I'm wondering about now is multi-valued functions in the middle of
>> statements. Using your example above, could I write...
>
> Before, I think(?), we talked about not allowing function calls to be
> nested - so that you couldn't say g(f(x)) - instead you could only make a
> function call in an assignment. y=f(x); z=g(y)

Hmm, that's true.

A problem I have with even that, though, is that it seems natural to use a
function as an initialiser. But it doesn't translate very readily into
legal C. For example...

struct foo x = foo_init();

has to become:

struct foo x;

foo_init(&x);

which leaves x initially uninitialised. Even worse, if there are
variables declared after x, then they get initialised out of order
(because in C all variables have to be declared up front). Which is even
worse still if they depend on x in some way. Which then means you end up
declaring all the variables, and then initialising them. Which is horrible,
because arrays, structs, etc. then end up being initialised with a memcpy.

If, instead, we ban complex initialisers (i.e. limit to only things
which can be calculated at compile time), then things that use functions
to initialise end up being initialised twice. Which seems sad.

Of course, the example above _is_ effectively calculable at compile
time, so perhaps I should just say that one day someone will write a
good optimising compiler for Stupid and put up with the inefficiencies!
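
To make the ordering problem concrete, a minimal C90 example (foo_init
and bar_init_from are hypothetical initialisers):

struct foo { int n; };
struct bar { int m; };

static void foo_init(struct foo *f) { f->n = 42; }
static void bar_init_from(struct bar *b, const struct foo *f) { b->m = f->n * 2; }

int main(void) {
    /* C90 forces both declarations up front... */
    struct foo x;
    struct bar y;
    /* ...so the initialisation order has to be kept right by hand,
       because y's initialiser reads x. */
    foo_init(&x);
    bar_init_from(&y, &x);
    return 0;
}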


Ben Clifford

Mar 21, 2010, 9:00:44 AM
to stupi...@googlegroups.com

> which leaves x initially uninitialised. Even worse, if there are
> variables declared after x, then they get initialised out of order
> (because in C all variables have to be declared up front).

> Which is even worse still if they depend on x in some way.

C99 (I think) permits interleaved declarations and code. But maybe that's
too futuristic for now.

In earlier C, I think the equivalent can be achieved by using nested
blocks, by converting this C99 code:

int x;
init(&x);
int y;
init(&y);
restofprogram

into this C90 code:

int x;
init(&x);
{
    int y;
    init(&y);
    restofprogram
}

(adding one nesting level)


Assuming that the only thing between the declaration of x and the start of
the next nested scope is the initialisation of x, then nothing except the
initialiser function sees an uninitialised x.

Each variable declaration statement would cause a new nested block.
(Potentially multiple variable declarations can be made in a single
statement; in that case, they would all appear in the same block with
a single initialiser at the end.)

Makes for even uglier destination C code. I'm unsure what the end result
of compiling that is, but I think it shouldn't be expensive.
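
A compilable version of the transform, with a second initialiser that
depends on the first variable (init_x and init_y_from are hypothetical):

#include <stdio.h>

static void init_x(int *x) { *x = 10; }
static void init_y_from(int *y, int x) { *y = x + 1; }

int main(void) {
    int x;
    init_x(&x);
    {
        int y;
        init_y_from(&y, x);  /* y's initialiser sees an already-initialised x */
        printf("%d %d\n", x, y);
    }
    return 0;
}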

--

Ben Laurie

Mar 22, 2010, 10:59:48 AM
to stupi...@googlegroups.com

Since gcc is OK with declarations in random places, I am inclined to
accept your theory that I could work around the problem like this :-)

Ben Clifford

Mar 22, 2010, 11:02:48 AM
to stupi...@googlegroups.com

> Since gcc is OK with declarations in random places, I am inclined to
> accept your theory that I could work around the problem like this :-)

> > int x;
> > init(&x);
> > {
> > int y;

That should work in any C, not just gcc, I think ... with gcc running in
its normal mode, you can lose the nesting of {}-blocks entirely.

--

Ben Laurie

Mar 23, 2010, 7:59:06 AM
to stupi...@googlegroups.com

You misunderstand - I am not doing the nested blocks thing for now.

Ben Clifford

Mar 23, 2010, 8:53:20 AM
to stupi...@googlegroups.com

> You misunderstand - I am not doing the nested blocks thing for now.

ahh, the easy way ;)

--
