Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Data Flow languages and programming - Part I

28 views
Skip to first unread message

Richard Harter

unread,
Oct 26, 2009, 6:22:14 PM10/26/09
to

SHOULD IT BE TURTLES ALL THE WAY UP?

In the famous anecdote, the little old lady replies to the noted
philosopher, "It's turtles all the way down." When it comes to
writing software many writers on software design and many
programming language creators seem to believe that it is turtles
all the way up.

What do I mean by "turtles all the way up"? By this I mean the
thesis that the techniques and programming language concepts that

are used in the small can be extended indefinitely to programming

in the large. In other words if we use language X for 100 line
programs and 1000 line programs we can also use it for 1000000
line programs. We may have to add some extensions and new
features along the way, but we can increase the size of the
programs we write indefinitely using the same ideas.
<p>
The antithesis is that it isn't turtles all the way up, or at
least it shouldn't be turtles all the way up. That is, the kinds

of languages and programming technologies that we use in
large scale programming should be quite different from the kind
of languages used for programming in the small.
<p>
There are many propositions as to what those large scale
technologies should be, and many such technologies in use. Here
I am going to look at data flow languages and data flow
structuring of programs.

WHAT ARE DATA FLOW LANGUAGES?

There are two wikipedia articles that give useful answers:
http://en.wikipedia.org/wiki/Dataflow_programming
http://en.wikipedia.org/wiki/Flow-based_programming

The distinction wikipedia makes between data flow programming and

flow based programming is obscure. The following definition is
an edited version of definitions used in the two articles.

Data flow languages structure applications as networks of
"black box" elements that exchange data across predefined
connections by message passing. Elements execute when they
get messages; they send messages asynchronously. Data flow
based applications are inherently parallel.

There are a wide variety of data flow languages, varying from
spread sheets, to Labview, to Erlang. Many are graphical;
programming is done by altering flow diagrams. One thing they
all have in common is that they have a run time system.

Traditional imperative programs are composed of routines that
call each other; i.e., when a call is made the caller constructs
a data packet (calling sequence) and transfers control and the
data packer to the called routine. When the called routine is
done it constructs a data packet to pass back to the caller and
transfers control back to the caller.

In data flow programs the "routines" do not call each other.
Instead they are activated by the run time system when there is
input for them; when they create outputs the run time system
takes care of moving the output to the destination input areas.
When the "routines" are done they transfer control back to the
run time system.

One difference between traditional programs and data flow
programs is that traditional programs use LIFO semantics whereas
data flow programs use FIFO semantics. That is, a traditional
program puts data on a stack and gets data back on the same
stack. In data flow programs each element gets data from a queue

and puts data to other queues.

Another difference is that the connectivity of traditional
programs is deeply embedded in the code. To pass data from A to
B, A calls B. That is, the caller has to specify where the data
goes. In data flow programs the connectivity can be separate
from the code. A doesn't pass data directly to B; instead it
passes data to the run time system which in turn passes the data
to B. The caller does not have to specify where the data goes.

As a result data flow programs can use different languages for
the internal implementation of the computational elements and for

the connectivity. In fact, it is common for data flow languages
to be graphical.

ADVANTAGES AND DISADVANTAGES

What are the advantages and disadvantages of data flow
programming?

Some significant advantages:

* Concurrency and parallelism are natural. Code can be
distributed between cores and even across networks. Many of the
problems associated with threads disappear.

* Data flow networks are natural for representing process. This
is particularly true for transaction processing.

* Message passing gets rid of the problems associated with shared
memory and locks.

* Data flow programs are more extensible than traditional
programs. Elements can be readily composed into composite
elements from the bottom up rather than top down.

Some significant disadvantages:

* The mindset of data flow programming is unfamiliar to most
professional programmers. Most dataflow programming languages
are niche languages used by non-professional programer users.

* The intervention of the run time system can be expensive. The
great advantage of LIFO semantics is that the implementing code
is immediate and cheap.

* Not using shared memory has costs. Either messages must be
copied or they must be immutable.

* Using data flow programming in the large pretty much requires
that it be used from the start. That is, converting traditional
programs into data flow programs is difficult because the
structuring is so different.


Richard Harter, c...@tiac.net
http://home.tiac.net/~cri, http://www.varinoma.com
Kafka wasn't an author;
Kafka was a prophet!

user923005

unread,
Oct 26, 2009, 8:04:36 PM10/26/09
to
On Oct 26, 3:22 pm, c...@tiac.net (Richard Harter) wrote:
> SHOULD IT BE TURTLES ALL THE WAY UP?
>
> In the famous anecdote, the little old lady replies to the noted
> philosopher, "It's turtles all the way down."  When it comes to
> writing software many writers on software design and many
> programming language creators seem to believe that it is turtles
> all the way up.
>
> What do I mean by "turtles all the way up"?  By this I mean the
> thesis that the techniques and programming language concepts that
>
> are used in the small can be extended indefinitely to programming
>
> in the large.  In other words if we use language X for 100 line
> programs and 1000 line programs we can also use it for 1000000
> line programs.  We may have to add some extensions and new
> features along the way, but we can increase the size of the
> programs we write indefinitely using the same ideas.

I've never seen a convincing argument to show that this is wrong.

We can use a 26 letter alphabet to make little words.
We can use a 26 letter alphabet to make bigger words.
We can use a 26 letter alphabet to make little paragraphs.
We can use a 26 letter alphabet to make bigger paragraphs.
We can use a 26 letter alphabet to make little books.
We can use a 26 letter alphabet to make bigger books.
We can use a 26 letter alphabet to make entire libraries.

Why isn't the same thing true of programming languages?

Now, I admit that if our tiny building blocks are poorly made, the
bigger building blocks become more and more fragile.
But that is true no matter which programming language we use to build
the smallest blocks. And it is always a tragic mistake to try to make
one big giant block that does it all (Forth metaphor is not an
exception because the mega word comes from its babies).

I also don't think it matters which direction we build the turtles, as
long as they make it from top to bottom.

> Richard Harter, c...@tiac.nethttp://home.tiac.net/~cri,http://www.varinoma.com

Mok-Kong Shen

unread,
Oct 27, 2009, 4:05:57 AM10/27/09
to
Richard Harter wrote:
[snip]

> Some significant advantages:
>
> * Concurrency and parallelism are natural. Code can be
> distributed between cores and even across networks. Many of the
> problems associated with threads disappear.
[snip]

To be able to naturally deal with concurrency and parallelism
is excellent. But wouldn't it be, on the other hand, somewhat
inconvenient to program processes that requires sequentiality?
Should or could there be an adequate interface between data flow
and conventional programming languages?

M. K. Shen

Dmitry A. Kazakov

unread,
Oct 27, 2009, 4:55:22 AM10/27/09
to
On Mon, 26 Oct 2009 22:22:14 GMT, Richard Harter wrote:

> Some significant disadvantages:
>
> * The mindset of data flow programming is unfamiliar to most
> professional programmers. Most dataflow programming languages
> are niche languages used by non-professional programer users.
>
> * The intervention of the run time system can be expensive. The
> great advantage of LIFO semantics is that the implementing code
> is immediate and cheap.
>
> * Not using shared memory has costs. Either messages must be
> copied or they must be immutable.
>
> * Using data flow programming in the large pretty much requires
> that it be used from the start. That is, converting traditional
> programs into data flow programs is difficult because the
> structuring is so different.

* Low abstraction level, basically lacking any abstraction. Data flow works
with primitive built-in atomic types forced into value semantics.

* Already mentioned inefficiency because of value semantics (i.e. either
permanent marshaling inputs, or else blocking the producers or the
consumers when data are shared) Normally it is the programmer to choose how
to protect states. With the dataflow it must be the engine to decide.
Basically it is lack of abstraction again. Working with *data* instead of
*states*, that summaries it.

* The networks of wired blocks cannot be encapsulated into reusable opaque
primitives. You can group blocks, that's it. Usual methods of decomposition
do not work, because there is a firewall between "block" and "data". You
cannot merge them (like in OO) into a reusable entity (type = values +
behavior).

* Unmaintainable code. Look at large data flow programs (e.g. in DiaDem,
Simulink, LabView). It is impossible to use reasonable software processes
on them. How to compare two diagrams? When are they semantically
equivalent? How do I validate a diagram? How to search for anything in a
diagram? Another example of this problem is represented by GUI libraries,
which are mostly event controlled. Practically none of them can be easily
used in multitasking environment. It is a horror to debug hangups or
generators in messages routed from one messages loop to another. And the
proposal is to build *everything* this way. Shudder.

* Semantic problems, the barriers of a block that fire the computation of
the block's output is a very primitive model that does not scale in
reality. It almost instantly goes into a mess of deadlocking vs. race
condition choices. And see above, it practicably cannot be debugged except
for trivial cases (without feedbacks, with all inputs synchronous etc) it
cannot be decomposed into simpler tasks...

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de

Mok-Kong Shen

unread,
Oct 27, 2009, 6:47:35 AM10/27/09
to
Richard Harter wrote:

> Some significant advantages:
>
> * Concurrency and parallelism are natural.

I guess that such a language could profitably be used to simulate
the neuronal network of the brain. Could that be right?

M. K. Shen

Joshua Cranmer

unread,
Oct 27, 2009, 10:09:25 AM10/27/09
to
On 10/26/2009 06:22 PM, Richard Harter wrote:
> The antithesis is that it isn't turtles all the way up, or at
> least it shouldn't be turtles all the way up. That is, the kinds
> of languages and programming technologies that we use in
> large scale programming should be quite different from the kind
> of languages used for programming in the small.

I think most people would agree that methodologies for small-scale
programs are different from those for large-scale programs, although
there might be considerable disagreement over the dividing line. I
certainly am not going to be used a full-blown OOP paradigm for writing
even something like an automatic code generator; on the other hand, I
shudder at the thought of writing something so complex as an operating
system in a functional paradigm.

An interesting project would be to take a large corpus of various mature
open-source programs of varying size and see if there is a correlation
between size and the use of certain language features or patterns.

> One difference between traditional programs and data flow
> programs is that traditional programs use LIFO semantics whereas
> data flow programs use FIFO semantics. That is, a traditional
> program puts data on a stack and gets data back on the same
> stack. In data flow programs each element gets data from a queue
> and puts data to other queues.

One nit to point out: at this stage, many programs don't follow a
procedural model of programming. OOP is the dominant paradigm, and I
don't see the sequence of data flow as being LIFO. Yet you continually
refer to procedural programming as those that make up `traditional
programs.'

> Another difference is that the connectivity of traditional
> programs is deeply embedded in the code. To pass data from A to
> B, A calls B. That is, the caller has to specify where the data
> goes. In data flow programs the connectivity can be separate
> from the code. A doesn't pass data directly to B; instead it
> passes data to the run time system which in turn passes the data
> to B. The caller does not have to specify where the data goes.

Two words: function pointers. You can also go with virtual or dynamic
method dispatch, but function pointers is shorter and sounds better.

> * Concurrency and parallelism are natural. Code can be
> distributed between cores and even across networks. Many of the
> problems associated with threads disappear.

They don't disappear, they're pushed into the runtime system. From
practical experience, that means they probably bubble up and annoy you
in the programming language.

> * Data flow networks are natural for representing process. This
> is particularly true for transaction processing.

Maybe I'm just being a stupid idiot here, but I don't see how data flow
is natural for representing some common processes. For example, the
HTML5 tokenization process. I suppose you can flow current-state output
back around and into itself as next-state input, but that's not exactly
natural.

This also brings up a question of how the language deals with loops in
the dependency graph. The solutions I see either bring back the problems
with threading or could create unreliability.

> * Message passing gets rid of the problems associated with shared
> memory and locks.

Just as credit default swaps and other financial machinations got rid of
risk :-). From what I can tell, similar problems arise, but they're in a
different form (data flow dependency graphs, particularly the fact that
they're typically not acyclic).

> * Data flow programs are more extensible than traditional
> programs. Elements can be readily composed into composite
> elements from the bottom up rather than top down.

Even if you consider good old procedural programming, I would say that
the exact same thing is easily doable in traditional programs. I can
take code that does asynchronous socket reads and turn it into a MIME
decoder by creating a module that calls the asynchronous socket read
module appropriately. It's also an example of your "bottom-up composition."

> Some significant disadvantages:


>
> * Using data flow programming in the large pretty much requires
> that it be used from the start. That is, converting traditional
> programs into data flow programs is difficult because the
> structuring is so different.

I don't see it that way. Ultimately, most libraries are merely "pass X
in as input, get Y out as output"--it shouldn't be too hard to make that
an atomic data-flow program block.

--
Beware of bugs in the above code; I have only proved it correct, not
tried it. -- Donald E. Knuth

gremnebulin

unread,
Oct 27, 2009, 1:46:46 PM10/27/09
to

Hmm. Trad languages aren't stack-based in the sense
FORTH is, with explicit PUSH and POP instructions.
Parameters and return values sit on the stack for
implementational convenience. You could re-implement
a trad language with mini-queues that are only one message long.
(Thus making the FIFO/LIFO distinction irrelevant). Indeed,
OO languages oftens see method calls as message passing.
(http://c2.com/cgi/wiki?AlanKaysDefinitionOfObjectOriented)

The stacking seems to come in with a rule that
replies/returns are always to the sender/caller. Such a rule would,
I think, lead to emergent stack-like behaviour even without
a low-level stack in the implementation. It is interesting to consider
how recursion could be implemented in a dataflow language.


> * Using data flow programming in the large pretty much requires
> that it be used from the start. That is, converting traditional
> programs into data flow programs is difficult because the
> structuring is so different.


Counterexample:

ls | sort

gremnebulin

unread,
Oct 27, 2009, 1:54:34 PM10/27/09
to

what do you make libraries into?

At each stage in your
list some extra organising structure is introduced. A library
has shelves and indexes, it is not a single gigantic book or trillion-
word sentence.

So the answer to "why not" is "we have reached the highest organising-
pricniple
allowed by the language, and we still have more to organise". (what do
you make libraries into? )

Currently we are struggling to cope with applications spread accross
multiple nodes and processores, a situation which was not forseen
in most traditonal HLLs.

Ray

unread,
Oct 27, 2009, 1:50:57 PM10/27/09
to
Richard Harter wrote:

> There are many propositions as to what those large scale
> technologies should be, and many such technologies in use. Here
> I am going to look at data flow languages and data flow
> structuring of programs.

I think dataflow programming deserves some more exploration than
it's had. Right now, piping things between shell commands is all
the dataflow programming most of us do, and that's pretty trivial.
Also, the languages involved are pretty heavily constrained by
what they do and the very limited uses to which they are put.
The dataflow parts of shell programming, for example, are
effectively monotyped, with the datatype they work on being the
"line of text."

We haven't really seen examples of "modern" dataflow
languages (with, eg, user-defined types, good tools, proper
debugging interfaces, an asynchronous execution model,
queue-length and queued-data age monitoring, function types,
encapsulation, etc).

It's worth a research project or two, certainly. The semantic
primitives we use to describe type systems, for example, take on
different meanings, whose theorems and corollaries have not yet
been fully examined, in the absence of function calls as we
understand them in OO, functional, and imperative languages.

But I think I need to take exception to some of your claims.

> * Data flow networks are natural for representing process.  This
> is particularly true for transaction processing.

And particularly untrue for naturally-recursive processes such as
formal language parsing, etc. Also? The reason they're very
natural for transaction processing is because they are BUILT ON
TOP OF transaction processing systems. All those operations on
queues are data commits that the runtime has to manage. A typical
dataflow language is mostly a way of customizing the transaction
processing system on which it's built to the needs of a particular
application.

> * Message passing gets rid of the problems associated with shared
> memory and locks.

Sort of. In practice the runtime has to manage the queues, and it
has to mediate contention between processes that want to write to
them and between processes that want to read them. This may allow
you to not notice the locks etc, but it doesn't remove them. On
the other hand, good dataflow design can often limit queues to a
single producer and a single consumer.

> * Data flow programs are more extensible than traditional
> programs.  Elements can be readily composed into composite
> elements from the bottom up rather than top down.

Got any experimental results? Frankly I don't see one as
being superior to the other, and I don't see how "bottom-up"
or "top-down" designs are more or less favored by either
paradigm. If you're going to make a claim like this, you
should do the study to try and prove it.

"Top-down" vs. "Bottom-up" IME are usually more influenced
by programming tools (like, eg, a REPL or otherwise
interactive environment as opposed to a static compiler).

Bear

gremnebulin

unread,
Oct 27, 2009, 1:57:19 PM10/27/09
to
On 27 Oct, 08:05, Mok-Kong Shen <mok-kong.s...@t-online.de> wrote:

> To be able to naturally deal with concurrency and parallelism
> is excellent. But wouldn't it be, on the other hand, somewhat
> inconvenient to program processes that requires sequentiality?
> Should or could there be an adequate interface between data flow
> and conventional programming languages?


The transforms that operate on flows are black boxes
and could be written in PP, OO, or functional languages.
The DF system just has to know what they can handle.

robert...@yahoo.com

unread,
Oct 27, 2009, 7:52:25 PM10/27/09
to
On Oct 26, 5:22 pm, c...@tiac.net (Richard Harter) wrote:
> What do I mean by "turtles all the way up"?  By this I mean the
> thesis that the techniques and programming language concepts that
>
> are used in the small can be extended indefinitely to programming
>
> in the large.  In other words if we use language X for 100 line
> programs and 1000 line programs we can also use it for 1000000
> line programs.  We may have to add some extensions and new
> features along the way, but we can increase the size of the
> programs we write indefinitely using the same ideas.
> <p>
> The antithesis is that it isn't turtles all the way up, or at
> least it shouldn't be turtles all the way up.  That is, the kinds
>
> of languages and programming technologies that we use in
> large scale programming should be quite different from the kind
> of languages used for programming in the small.


While this is perhaps true, it's completely unclear what we need to
successfully manage large projects. Sure a few things do help (like
strong type, memory safety, good support for interfaces), but those
also help on all but the smallest projects as well.

At the level of several tens of thousands of lines of code, it's all
irrelevant. Basically any language and methodology, including winging
it, will work. At the million line level, there simply isn't any
demonstrated reliable* methodology, unless you can break up the system
into many small and highly independent pieces (consider the library of
tens of thousands of device drivers on some modern OSs).

No magic bullets, and all that…


*where reliable is defined as producing something approximating a
working version of the desired system in sometime approximating the
planned time and budget.

Pascal J. Bourguignon

unread,
Oct 27, 2009, 8:25:35 PM10/27/09
to
"robert...@yahoo.com" <robert...@yahoo.com> writes:

> [...] At the million line level, there simply isn't any


> demonstrated reliable* methodology, unless you can break up the system
> into many small and highly independent pieces (consider the library of
> tens of thousands of device drivers on some modern OSs).

Well, there's one proven methodology: the compiler. That is,
metaprogramming. I can write working programs of one million lines of
assembler any day. Only it's not me who will write this million of
assembler lines each day, it's the compiler. I will only write 10,000
lines of source code. Now if you need to write a million of source
line, then just don't do it. Use metaprogramming to generate this
million of source lines from a smaller source. And so on, you can add
layers of metaprogramming all you need to compact your sources and
always have something of manageable size.

--
__Pascal Bourguignon__

Tim Little

unread,
Oct 27, 2009, 8:57:43 PM10/27/09
to
On 2009-10-28, Pascal J. Bourguignon <p...@informatimago.com> wrote:
> Now if you need to write a million of source line, then just don't
> do it. Use metaprogramming to generate this million of source lines
> from a smaller source. And so on, you can add layers of
> metaprogramming all you need to compact your sources and always have
> something of manageable size.

So, you're saying that *every* programming problem can be solved in at
most a few tens of thousands of lines of code?

Certainly some problems can, but most can't. Metaprogramming is just
a form of compression, and there is no compression system that can
reduce every source below a given size. Some problems really are
irreducibly complex, and demand complex solutions.


- Tim

jpwoodruff

unread,
Oct 27, 2009, 9:32:08 PM10/27/09
to

In one domain - evaluating large mathematical expressions - dataflow
languages express their programs well. One complete step in program
design - memory map design - does not occur at all.


That's where the early work on Sisal was directed if I'm not mistaken.
Research on these mathematical programs - and the search for execution
speed - still continues.


Part of the charm ascribed to "dataflow machines" was their
parallelism. Parallelism was to be as inherent in the execution as in
the programming. That was to be a fundamental goal of the computer
hardware for these machines.

I think that an associative processor to be used for matching up
tokens in the (executable) dataflow graph does not exist in
large-scale hardware. Instead it is simulated by the "run time" that
was mentioned earlier. That's just not the same thing at all.

John

Pascal J. Bourguignon

unread,
Oct 28, 2009, 12:32:08 AM10/28/09
to
Tim Little <t...@little-possums.net> writes:

> On 2009-10-28, Pascal J. Bourguignon <p...@informatimago.com> wrote:
>> Now if you need to write a million of source line, then just don't
>> do it. Use metaprogramming to generate this million of source lines
>> from a smaller source. And so on, you can add layers of
>> metaprogramming all you need to compact your sources and always have
>> something of manageable size.
>
> So, you're saying that *every* programming problem can be solved in at
> most a few tens of thousands of lines of code?

Can you not specify all programming problem in less that a few
thousands of lines of specification?

Well, you can always write more detailed specifications, but I can
assure you that sales peoples will always be able to put the whole
specifications of your software on a 2-page booklet.


> Certainly some problems can, but most can't. Metaprogramming is just
> a form of compression, and there is no compression system that can
> reduce every source below a given size. Some problems really are
> irreducibly complex, and demand complex solutions.

Yes indeed. However, assuming a big ontology (eg. take wikipedia, or
even the whole web), wouldn't it be possible to express the needs for
any software in less than ten thousands lines, and let the
sufficiently smart system develop it, filling in the blanks in the
specifications with all the knowledge it can extract from the web?


Or take the problem actually in the other dirrection. Would you trust
any implementation of a system that has orders of magnitude more
than ten thousand lines of specifications? How can you ensure these
specifications are consistent? How can you ensure that they're
effectively implemented?

Wouldn't you be more able to understand and check the specifications
if they were shorter, that is indeed, given the ultimate limits to
compression, if what they specified was less complex or of a more
limited scope?


If you accept that big systems must be decomposed into small programs,
you should probably also accept that big specifications are as bad as
big program(*), and that they should be short too, to be
understandable and effectively implementable. Then the degree of
automatization in the process of translating the specifications into
executable code is only a matter of advancement of the techniques,
while the size of the executable code is only (roughly) a function of
the number of metaprogramming levels used.


(*) Specifications = programs, as in: high level, declarative programs.
--
__Pascal Bourguignon__

robert...@yahoo.com

unread,
Oct 28, 2009, 2:22:45 AM10/28/09
to
On Oct 27, 11:32 pm, p...@informatimago.com (Pascal J. Bourguignon)
wrote:

> Tim Little <t...@little-possums.net> writes:
> > On 2009-10-28, Pascal J. Bourguignon <p...@informatimago.com> wrote:
> >> Now if you need to write a million of source line, then just don't
> >> do it.  Use metaprogramming to generate this million of source lines
> >> from a smaller source.  And so on, you can add layers of
> >> metaprogramming all you need to compact your sources and always have
> >> something of manageable size.
>
> > So, you're saying that *every* programming problem can be solved in at
> > most a few tens of thousands of lines of code?
>
> Can you not specify all programming problem in less that a few
> thousands of lines of specification?


Well sure. I can always write the spec as "Implement the tax code of
the United States." Doesn't actually help that the actual tax code is
somewhere on the order of half a million lines of text.

Plenty of real world problems, and probably including a large fraction
of the ones you can actually get paid to write programs to tackle, are
fundamentally inexpressible in a few thousand lines of specification.


> Well, you can always write more detailed specifications, but I can
> assure you that sales peoples will always be able to put the whole
> specifications of your software on a 2-page booklet.
>
> > Certainly some problems can, but most can't.  Metaprogramming is just
> > a form of compression, and there is no compression system that can
> > reduce every source below a given size.  Some problems really are
> > irreducibly complex, and demand complex solutions.
>
> Yes indeed.  However, assuming a big ontology (eg. take wikipedia, or
> even the whole web), wouldn't it be possible to express the needs for
> any software in less than ten thousands lines, and let the
> sufficiently smart system develop it, filling in the blanks in the
> specifications with all the knowledge it can extract from the web?


Well, that's always been the fantasy. More to the point, something
like Cyc might be a more appropriate base for something like that.
Still, we have no idea how to accomplish any of that.


> Or take the problem actually in the other dirrection.  Would you trust
> any implementation of a  system  that has orders of magnitude more
> than ten thousand lines of specifications?  How can you ensure these
> specifications are consistent?  How can you ensure that they're
> effectively implemented?


It's hard, hard, hard. And folks who attempt systems of that level of
complexity fail with a distressingly high frequency. Even the
specification of a simple language like C is on the order of 10,000
lines. The C99 standard for the language - not counting the library,
is about 200 pages of not particularly easy text (and the library is a
similar amount), add to that the specification of the target processor
(just volume 2 of Intel's x86 reference, which is just the instruction
descriptions, and not any of the architectural stuff, is some 1600
pages).


> Wouldn't you be more able to understand and check the specifications
> if they were shorter, that is indeed, given the ultimate limits to
> compression, if what they specified was less complex or of a more
> limited scope?


Without doubt our ability to build big software systems has grown.
It's been pretty slow, but many systems can now be implemented with
many fewer LOCs than say 20 years ago. Much of this is from various
forms of reuse, much is from tools (SCM, automated testing, etc.), and
a lot is just plain hard work and money.

And reuse is really hard too. The SQL statement (select xxx from yyy
where zzz...) is clearly rather more expressive than the single ISAM
statement (read record from file where key is...) in a Cobol program,
but not by nearly the difference in the difficulty of implementing DB2
vs. implementing the ISAM package. And it's hard to count more than a
tiny fraction of most components in a system as effective “reuse” -
most reusable components are massively overengineered for any
particular task (for example, consider a 100,000 line program that
uses the a million line database engine to store the data, vs. a
200,000 line program that uses a 50,000 line ISAM engine to do the
same thing - the database engine is certainly not providing a million
lines of value to the first program). I'm not saying that's wrong -
just that almost any particular use of them uses only a small part of
their functionality.

In a lot of ways the number of LOCs a person or team of a particular
size can write has increased only modestly over the decades. It *has*
increased a bit (mainly because of improvements in the tools), but far
less than we'd like. This was first noticed when the first compilers
came out. It was pretty obvious that the typical programmer could
write x lines of code per day, and it didn't matter if it was
assembler or Cobol or Fortran. Obviously the situations where in
increased expressiveness of the HLL could be used made them a huge win
in programmer productivity. Unfortunately, excepting the big bang
from the first compilers, that increase in expressiveness has been
slow. Although over five decades, the compound increase has been
substantial. It hard to put a number on it, but it's probably been in
the ball park of a 100-fold increase compared to the guys writing
everything in assembler in 1959.

Yet we keep building bigger systems than we can do comfortably,
because we want them to do more. The new Boeing 787 is reported to
have 6.5 million lines of code in the various aircraft systems (not
counting things like the passenger entertainment systems). And I can
assure you that Boeing doesn't have any desire to take on a software
project that big, but rather it's what they needed to do to accomplish
their goals of building a next generation airliner ('course they're
still working on that).

Assuming the trend continues (and I think the evidence is to the
contrary) 50 years from now we'll have teams of half a dozen people
able to reliably tackle today’s million line projects (and
implementing them in a few tens of thousands of lines of code). Of
course then we'll be struggling with what today would be (impossible)
billion line projects.


> If you accept that big systems must be decomposed into small programs,
> you should probably also accept that big specifications are as bad as
> big program(*), and that they should be short too, to be
> understandable and effectively implementable.  Then the degree of
> automatization in the process of translating the specifications into
> executable code is only a matter of advancement of the techniques,
> while the size of the executable code is only (roughly) a function of
> the number of metaprogramming levels used.


Good specifications are basically of similar difficulty to write as
the actual code. This is the downfall of many formal methods - a
specification complete enough that allows an automated checker of some
sort to verify that the software implements the specification is
basically as hard to write as the software. And as hard to verify
that it's correct.

Decomposition has been understood to be the basis for system
development since, oh, I don't know, the late fifties? The problem is
that decomposing a large and complex system is difficult to do without
fully understanding the whole system. And small errors at that stage
can turn into huge millstones that are extremely difficult to
correct. And then reality keeps rearing its ugly head and tends to
create unpleasant coupling between the neatly decomposed modules.

There have been new fads/methodologies/techniques/technologies/
whatever every few years for decades which have promised order-of-
magnitude gains in productivity. Other than compilers for HLLS, it
simply hasn’t happened. Many of the techniques have proven useful,
and have been added to everyone’s toolkit, but like Brooks said… No
silver bullets.

Mok-Kong Shen

unread,
Oct 28, 2009, 4:14:04 AM10/28/09
to
Pascal J. Bourguignon wrote:
> Tim Little wrote:
>
[snip]

> If you accept that big systems must be decomposed into small programs,
> you should probably also accept that big specifications are as bad as
> big program(*), and that they should be short too, to be
> understandable and effectively implementable. Then the degree of
> automatization in the process of translating the specifications into
> executable code is only a matter of advancement of the techniques,
> while the size of the executable code is only (roughly) a function of
> the number of metaprogramming levels used.

I barely know anything in this direction. But isn't it that UML
is supposed to be able to do stepwise refinement of design up
to the point of transition to executable codes? I saw long time
ago also a leaflet advertising a different commercial system
that claims to include even the code generation step.

M. K. Shen

Mok-Kong Shen

unread,
Oct 28, 2009, 4:59:39 AM10/28/09
to
robert...@yahoo.com wrote:
[snip]

> Yet we keep building bigger systems than we can do comfortably,
> because we want them to do more. The new Boeing 787 is reported to
> have 6.5 million lines of code in the various aircraft systems (not
> counting things like the passenger entertainment systems). And I can
> assure you that Boeing doesn't have any desire to take on a software
> project that big, but rather it's what they needed to do to accomplish
> their goals of building a next generation airliner ('course they're
> still working on that).
[snip]

> Good specifications are basically of similar difficulty to write as
> the actual code. This is the downfall of many formal methods - a
> specification complete enough that allows an automated checker of some
> sort to verify that the software implements the specification is
> basically as hard to write as the software. And as hard to verify
> that it's correct.

Verification seems to be an inherently unachievable goal. If one
uses a software to do verification, that software itself evidently
has to be verified. So I personally wonder how absolute correctness
could be achieved in practice at all. Anyway, a recent news said that
some 7000 lines of code of the kernel of an operating system has been
verified with a sophisticated verification system plus some 6 man years.

M. K. Shen

Joshua Cranmer

unread,
Oct 28, 2009, 8:05:09 AM10/28/09
to
On 10/28/2009 12:32 AM, Pascal J. Bourguignon wrote:
> Can you not specify all programming problem in less that a few
> thousands of lines of specification?
>
> Well, you can always write more detailed specifications, but I can
> assure you that sales peoples will always be able to put the whole
> specifications of your software on a 2-page booklet.

The PDF 1.6 specification was, if I recall correctly, approximately 1400
pages of text. My draft of C++0 weighs in at a whopping 1314 pages. The
JVM spec is smaller, at only 542 pages. The full x86_64 ISA comes in
volumes; even the ancient MC6800 ISA took 40 pages or so to explain.

Many specifications, to approach the degree of completeness required for
independent implementation, need to drag on for hundreds or thousands of
pages.

Pascal J. Bourguignon

unread,
Oct 28, 2009, 11:38:32 AM10/28/09
to
Mok-Kong Shen <mok-ko...@t-online.de> writes:

> robert...@yahoo.com wrote:
> [snip]
>> Yet we keep building bigger systems than we can do comfortably,
>> because we want them to do more. The new Boeing 787 is reported to
>> have 6.5 million lines of code in the various aircraft systems (not
>> counting things like the passenger entertainment systems). And I can
>> assure you that Boeing doesn't have any desire to take on a software
>> project that big, but rather it's what they needed to do to accomplish
>> their goals of building a next generation airliner ('course they're
>> still working on that).
> [snip]
>> Good specifications are basically of similar difficulty to write as
>> the actual code. This is the downfall of many formal methods - a
>> specification complete enough that allows an automated checker of some
>> sort to verify that the software implements the specification is
>> basically as hard to write as the software. And as hard to verify
>> that it's correct.
>
> Verification seems to be an inherently unachievable goal. If one
> uses a software to do verification, that software itself evidently
> has to be verified.

Happily, the verification software may be simplier than the verified
software. Therefore it should be easier to verify. Possibly by a
second order verification software that is itself even simplier to
verify. And so on, until you can have it verified it by hand by as
many human you need to reach the required degree of confidence.


> So I personally wonder how absolute correctness
> could be achieved in practice at all. Anyway, a recent news said that
> some 7000 lines of code of the kernel of an operating system has been
> verified with a sophisticated verification system plus some 6 man years.
>
> M. K. Shen

--
__Pascal Bourguignon__

Pascal J. Bourguignon

unread,
Oct 28, 2009, 11:43:00 AM10/28/09
to
Mok-Kong Shen <mok-ko...@t-online.de> writes:

All the techniques developed so far are only one step techniques.
Useless.

What we really need is metaprogramming. The only usable
metaprogramming solution I know is Lisp and its macros. With
metaprogramming, you can implement as many steps you need to keep the
complexity in check.


--
__Pascal Bourguignon__

Pascal J. Bourguignon

unread,
Oct 28, 2009, 11:44:55 AM10/28/09
to
Joshua Cranmer <Pidg...@verizon.invalid> writes:

> On 10/28/2009 12:32 AM, Pascal J. Bourguignon wrote:
>> Can you not specify all programming problem in less that a few
>> thousands of lines of specification?
>>
>> Well, you can always write more detailed specifications, but I can
>> assure you that sales peoples will always be able to put the whole
>> specifications of your software on a 2-page booklet.
>
> The PDF 1.6 specification was, if I recall correctly, approximately
> 1400 pages of text. My draft of C++0 weighs in at a whopping 1314
> pages. The JVM spec is smaller, at only 542 pages. The full x86_64 ISA
> comes in volumes; even the ancient MC6800 ISA took 40 pages or so to
> explain.
>
> Many specifications, to approach the degree of completeness required
> for independent implementation, need to drag on for hundreds or
> thousands of pages.

Why should you care about the detail of the micro-instructions of your
MC6800? Cannot you write a specification for a microprocessor in two
lines? Let the system care about the details itself.

--
__Pascal Bourguignon__

Richard Harter

unread,
Oct 28, 2009, 12:08:06 PM10/28/09
to
On Mon, 26 Oct 2009 17:04:36 -0700 (PDT), user923005
<dco...@connx.com> wrote:

>On Oct 26, 3:22=A0pm, c...@tiac.net (Richard Harter) wrote:
>> SHOULD IT BE TURTLES ALL THE WAY UP?
>>
>> In the famous anecdote, the little old lady replies to the noted

>> philosopher, "It's turtles all the way down." =A0When it comes to


>> writing software many writers on software design and many
>> programming language creators seem to believe that it is turtles
>> all the way up.
>>

>> What do I mean by "turtles all the way up"? =A0By this I mean the


>> thesis that the techniques and programming language concepts that
>>
>> are used in the small can be extended indefinitely to programming
>>

>> in the large. =A0In other words if we use language X for 100 line


>> programs and 1000 line programs we can also use it for 1000000

>> line programs. =A0We may have to add some extensions and new


>> features along the way, but we can increase the size of the
>> programs we write indefinitely using the same ideas.
>
>I've never seen a convincing argument to show that this is wrong.
>
>We can use a 26 letter alphabet to make little words.
>We can use a 26 letter alphabet to make bigger words.
>We can use a 26 letter alphabet to make little paragraphs.
>We can use a 26 letter alphabet to make bigger paragraphs.
>We can use a 26 letter alphabet to make little books.
>We can use a 26 letter alphabet to make bigger books.
>We can use a 26 letter alphabet to make entire libraries.

>
>Why isn't the same thing true of programming languages?

It is. We can use 1's and 0's to build software.

Similarly human brains are made out of atoms.

Nothing wrong with the argument except that it is not relevant.
As the Jedi Master said, These are not the turtles you want.

Richard Harter, c...@tiac.net
http://home.tiac.net/~cri, http://www.varinoma.com

Dmitry A. Kazakov

unread,
Oct 28, 2009, 12:29:04 PM10/28/09
to

Yes, but that is the machine language. When communicating ideas about
software to other people we are using natural languages.

> Similarly human brains are made out of atoms.

So is the ink used to print books. This is an invalid comparison. Atoms is
the "language" of the Universe, considering the latter as a machine. Human
brain does not operate atoms, its language is different.

> Nothing wrong with the argument except that it is not relevant.
> As the Jedi Master said, These are not the turtles you want.

The argument is that it is not clear what a hierarchy of programming
languages adds. As we know it adds nothing to the power, rather subtracts,
since many "higher level" language are not Turing-complete. One can argue
that they provide a better mapping to the domain, but the domain is
actually always in the brain of a human programmer. So it looks rather so
that we cannot adequately communicate to humans. Even considering
multi-paradigm/language approach as a kind of patchwork to cover this
nakedness. It is still not convincing why the patchwork should lay patches
over patches and in so many layers. It won't hold better, it won't make you
warmer...

Mok-Kong Shen

unread,
Oct 28, 2009, 12:45:19 PM10/28/09
to
Pascal J. Bourguignon wrote:
> Mok-Kong Shen writes:

>> Verification seems to be an inherently unachievable goal. If one
>> uses a software to do verification, that software itself evidently
>> has to be verified.
>
> Happily, the verification software may be simplier than the verified
> software. Therefore it should be easier to verify. Possibly by a
> second order verification software that is itself even simplier to
> verify. And so on, until you can have it verified it by hand by as
> many human you need to reach the required degree of confidence.

Sure, this is the very philosophy underlying program verifications.
But how "sure" can a human ensure that everything he designs/constructs
is correct? Certain industries, e.g. aerospace manufacturers, chip
manufacturers etc., that critically depend on the correctness of
their design software, have since long invested much resources in
verification. Yet I believe Murphy's Law remains true and human
errors can never be entirely eradicated but only (hopefully) reduced
to tolerable levels. (Or else why disasters did occur e.g. in
a few very carefully designed space missions?)

M. K. Shen

Mok-Kong Shen

unread,
Oct 28, 2009, 12:52:17 PM10/28/09
to
Pascal J. Bourguignon wrote:
[snip]

> What we really need is metaprogramming. The only usable
> metaprogramming solution I know is Lisp and its macros. With
> metaprogramming, you can implement as many steps you need to keep the
> complexity in check.

I don't like to enter into debates about programming languages,
which tend normally to ahve a flavour akin to religious debates.
For fans of C++ (I am not one of these) would very likely disagree
with you.

M. K. Shen

Richard Harter

unread,
Oct 28, 2009, 1:54:01 PM10/28/09
to

True enough, which was the point.


>
>> Nothing wrong with the argument except that it is not relevant.
>> As the Jedi Master said, These are not the turtles you want.
>
>The argument is that it is not clear what a hierarchy of programming
>languages adds.

You are missing the point. He made an observation and called it
an argument. It wasn't. He did go on to question whether a
hierarchy of programming languages had value. However his
subsequent paragraph had no significant connection to his
observation.

[snip rhetoric]

Richard Harter

unread,
Oct 28, 2009, 2:01:44 PM10/28/09
to

Just so. I should have pointed that out in the original. We get
more speed and power by distributing processing. The price is
that we have to deal with distributed programming.

A useful paper in context is "The Problem with Threads" by Edward
A. Lee. The URL is
http://www.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-1.pdf

Ray

unread,
Oct 28, 2009, 2:16:13 PM10/28/09
to
Pascal J. Bourguignon wrote:

> Why should you care about the detail of the micro-instructions of your
> MC6800? Cannot you write a specification for a microprocessor in two
> lines? Let the system care about the details itself.

Problem: you have a few thousand binaries written for this
CPU that nobody produces anymore; you need to write software
that runs them in emulation.

No "two-line specification" is going to contain enough
information to allow you to write software that will emulate
that CPU sufficiently well that all these programs work.

Likewise, if your aim is to produce a C compiler, you will
need *ALL* of the specification of the C language as part
of your specification.

Likewise, if you need a stress analysis of the Eiffel Tower,
you will need a complete schematic of the tower - its parts,
their exact dimensions, and the way they fit together.

Specifications simpler than the problems they intend to
solve exist, certainly; but they incorporate, implicitly
or explicitly, additional information whose retrieval and
interpretation I know of no way to automate. If your
boss (or the salesguy) hands you a specification that
you must provide a program that does foo, he is relying
on you the human to know (or be able to find out) what
foo is and how foo is done. And that information, even
though he didn't provide it, is part of the specification
of the problem.

Bear

Joshua Cranmer

unread,
Oct 28, 2009, 2:47:07 PM10/28/09
to
On 10/28/2009 11:44 AM, Pascal J. Bourguignon wrote:
> Why should you care about the detail of the micro-instructions of your
> MC6800? Cannot you write a specification for a microprocessor in two
> lines? Let the system care about the details itself.

It's been two years since I last saw a MC6800 processor, so I don't
recall the exact semantics. But there are effectively around 100 or so
instructions (if you count different addressing modes as different
instructions). If you're trying to write an MC6800 emulator, you'll have
to have a pretty liberal definition of `line' to fit all of that in two
lines.

And the MC6800 is an 8-bit microprocessor designed in 1974. Any emulator
for a modern processor is going to be incredibly more complex. It sounds
like you're saying that I'd have a line somewhere that says "emulate an
XXX processor", but that doesn't cut it for an emulator's specification:
it's a circular definition. At some point, you have to enumerate the
specification of a processor's ISA, which easily takes pages. If you
don't wish to include that in the highest level count, include it in the
specification for the metacompiler that will have to compile the
"emulate" instruction.

user923005

unread,
Oct 28, 2009, 3:34:25 PM10/28/09
to

I made an analogy and never called it an argument.

> It wasn't.  He did go on to question whether a
> hierarchy of programming languages had value.  

I went on to say that we do the same thing with programming
langauges. For instance, in C, I will write simple functions to build
libraries. I will use libraries to build filter programs. I will
pipe filter programs in a chain to accomplish complicated tasks. This
is a metaphor that works pretty well. A lot of computer science seems
to me to be largely mythology. What I mean by that is (for instance)
the notion that OO programming will allow less expensive complicated
systems. It is possible to write complicated systems in OO languages
for a reasonable cost. But empirical studies have shown that the cost
is not lower than writing the same systems in a simple language like C
[1]. So now people are trying other paradigms like Aspect Oriented
programming. I am not saying that alternative paradigms are bad. In
fact, I program almost exclusively in C++ and love the OO metaphor. I
am just saying that a change in paradigm doesn't usually change much.
For instance, in C++, by creating objects we do NOT reduce
complexity. We only hide it.

> However his
> subsequent paragraph had no significant connection to his
> observation.

Perhaps the connection was only clear to me.

> [snip rhetoric]

[1] "Productivity Analysis of Object-Oriented Software Developed in a
Commercial Environment" Thomas E. Potok, see: http://portal.acm.org/citation.cfm?id=326103

robert...@yahoo.com

unread,
Oct 28, 2009, 4:52:05 PM10/28/09
to
On Oct 28, 10:38 am, p...@informatimago.com (Pascal J. Bourguignon)
wrote:

> Happily, the verification software may be simplier than the verified
> software.  Therefore it should be easier to verify.  Possibly by a
> second order verification software that is itself even simplier to
> verify.  And so on, until you can have it verified it by hand by as
> many human you need to reach the required degree of confidence.


The main problem isn't the verifier (although verifying it *is* an
issue), but the specification that you're going to verify against.
That won't be all that much simpler that the program to be verified,
and somehow needs to be verified itself.

Not to say there aren't occasional ways around that. An interesting
effort I read a paper on about a dozen years ago (and then never saw a
follow up), was with someone trying an alternative approach to
verifying the correctness of a compiler. As opposed to attempting to
prove that the compiler was correct, and would always produce correct
code, they attempted (and appeared to be making progress) to prove
that any particular output from the compiler met the "specification"
of the input (the source code being a sort of formal specification, of
course). IOW, their scheme was to run the verifier every time the
compiler ran, and complain if the output was flawed.

mike

unread,
Oct 28, 2009, 7:02:13 PM10/28/09
to
In article <d44aa7fe-fab3-4f04-a379-bba9d37c5ca9
@r5g2000yqb.googlegroups.com>, robert...@yahoo.com says...

> On Oct 26, 5:22 pm, c...@tiac.net (Richard Harter) wrote:
> > What do I mean by "turtles all the way up"?  By this I mean the
> > thesis that the techniques and programming language concepts that
> >
> > are used in the small can be extended indefinitely to programming
> >
> > in the large.  In other words if we use language X for 100 line
> > programs and 1000 line programs we can also use it for 1000000
> > line programs.  We may have to add some extensions and new
> > features along the way, but we can increase the size of the
> > programs we write indefinitely using the same ideas.
> > <p>
> > The antithesis is that it isn't turtles all the way up, or at
> > least it shouldn't be turtles all the way up.  That is, the kinds
> >
> > of languages and programming technologies that we use in
> > large scale programming should be quite different from the kind
> > of languages used for programming in the small.
>
>
> While this is perhaps true, it's completely unclear what we need to
> successfully manage large projects. Sure a few things do help (like
> strong type, memory safety, good support for interfaces), but those
> also help on all but the smallest projects as well.
>
I think that most of the problems inherent in any large-scale
programming project result from the inherent 'fragility' of all
programming languages.

If you compare a large computing project with a large engineering
project there are clear similarities, but one very significant
difference is that almost any undetected error in code has the potential
to result, somewhere down the line, in catastrophic outcomes; whereas if
a nail is not quite hammered in as far as specified or if a light
fitting (or even a window) is put in the wrong place then the building
usually remains safe, functional, and fit for purpose.

If someone has some ideas about a programming language/paradigm that is
fault-resistant, not only in the sense that it reduces the number of
actual bugs and/or errors produced, but also in the sense that many bugs
have no significant effect on function and behaviour then large-scale
projects may be a lot easier to manage.

Whether this will ever be a possibility remains, in my opinion, unknown
to date.

Mike

Richard Heathfield

unread,
Oct 28, 2009, 11:36:56 PM10/28/09
to
In <08799c89-fa49-4efb...@13g2000prl.googlegroups.com>,
user923005 wrote:

> On Oct 28, 10:54 am, c...@tiac.net (Richard Harter) wrote:

<snip>

>> However his
>> subsequent paragraph had no significant connection to his
>> observation.
>
> Perhaps the connection was only clear to me.

Not so. It is perfectly obvious what your point was, and it could be
missed only by someone determined to miss it.

The whole "turtles" analogy is flawed. The alphabet analogy is much
better. It's /letters/ all the way down.

--
Richard Heathfield <http://www.cpax.org.uk>
Email: -http://www. +rjh@
"Usenet is a strange place" - dmr 29 July 1999
Sig line vacant - apply within

Dmitry A. Kazakov

unread,
Oct 29, 2009, 4:52:30 AM10/29/09
to
On Thu, 29 Oct 2009 12:02:13 +1300, mike wrote:

> If someone has some ideas about a programming language/paradigm that is
> fault-resistant, not only in the sense that it reduces the number of
> actual bugs and/or errors produced, but also in the sense that many bugs
> have no significant effect on function and behaviour then large-scale
> projects may be a lot easier to manage.

That is because the "physics" of the code is different. Small distances in
the normal space mean small powers, and also fewer things you can do (right
or wrong). A tiny piece of code can do a lot, this power is translated into
fragility.

> Whether this will ever be a possibility remains, in my opinion, unknown
> to date.

The evolution of programming languages slowly goes in the direction of
weakening the constructs deployed to accomplish given task (e.g. getting
rid of gotos, constraining operations per datatypes etc). "Least-energy
principle"...

Mok-Kong Shen

unread,
Oct 29, 2009, 6:02:51 AM10/29/09
to
mike wrote:

> I think that most of the problems inherent in any large-scale
> programming project result from the inherent 'fragility' of all
> programming languages.
>
> If you compare a large computing project with a large engineering
> project there are clear similarities, but one very significant
> difference is that almost any undetected error in code has the potential
> to result, somewhere down the line, in catastrophic outcomes; whereas if
> a nail is not quite hammered in as far as specified or if a light
> fitting (or even a window) is put in the wrong place then the building
> usually remains safe, functional, and fit for purpose.
>
> If someone has some ideas about a programming language/paradigm that is
> fault-resistant, not only in the sense that it reduces the number of
> actual bugs and/or errors produced, but also in the sense that many bugs
> have no significant effect on function and behaviour then large-scale
> projects may be a lot easier to manage.
>
> Whether this will ever be a possibility remains, in my opinion, unknown
> to date.

In engineering works there are "factors of safety" to take account
of variability of materials, unpredictability of actual loadings
and inaccuracies in construction etc. etc. In computing, analogous
has been done. For results for control of critical (including
in particular potentially dangerous) processes, one employs
double or triple hardware to take care of the possibility of
malfunction of hardware. As far as I know, in order to better
detect programmer errors, one similarly employs different and
independent teams of programmers to do the same job and then with
test cases to compare the results of the resulting programs.
However, if I don't err, this comparision is only done in the design
phase of the software and later only the work of one of the teams
is selected for practical application. A safer way, I think, would
have these presumably equivalent software (resulting from different
teams, employing preferably different programming languages and
environments) in actual production runs to always work parallelly
on multiple hardware (of possibly different types), so as to futher
reduce the risk of errors, since testing of software in the design
phase might not be thorough enough to uncover all errors that may be
present. Of course, errors could not be "absolutely" eradicated,
in accordance with Murphy's Law.

M. K. Shen

Nick Keighley

unread,
Oct 29, 2009, 7:38:45 AM10/29/09
to
On 26 Oct, 22:22, c...@tiac.net (Richard Harter) wrote:

> WHAT ARE DATA FLOW LANGUAGES?

> Some significant advantages:
>
> * Concurrency and parallelism are natural.  Code can be
> distributed between cores and even across networks.  Many of the
> problems associated with threads disappear.

hurrah!

> Some significant disadvantages:
>
> * The mindset of data flow programming is unfamiliar to most
> professional programmers.  Most dataflow programming languages
> are niche languages used by non-professional programer users.

didn't we use to design programs with data flow diagrams then convert
them into procedural programs...

--
Nick keighley


Nick Keighley

unread,
Oct 29, 2009, 8:00:03 AM10/29/09
to
On 28 Oct, 19:34, user923005 <dcor...@connx.com> wrote:
> On Oct 28, 10:54 am, c...@tiac.net (Richard Harter) wrote:
> > On Wed, 28 Oct 2009 17:29:04 +0100, "Dmitry A. Kazakov"
> > <mail...@dmitry-kazakov.de> wrote:
> > >On Wed, 28 Oct 2009 16:08:06 GMT, Richard Harter wrote:
> > >> On Mon, 26 Oct 2009 17:04:36 -0700 (PDT), user923005
> > >> <dcor...@connx.com> wrote:
> > >>>On Oct 26, 3:22=A0pm, c...@tiac.net (Richard Harter) wrote:


> > >>>> SHOULD IT BE TURTLES ALL THE WAY UP?
>
> > >>>> In the famous anecdote, the little old lady replies to the noted
> > >>>> philosopher, "It's turtles all the way down." =A0When it comes to
> > >>>> writing software many writers on software design and many
> > >>>> programming language creators seem to believe that it is turtles
> > >>>> all the way up.
>
> > >>>> What do I mean by "turtles all the way up"? =A0By this I mean the
> > >>>> thesis that the techniques and programming language concepts that
>
> > >>>> are used in the small can be extended indefinitely to programming
>
> > >>>> in the large. =A0In other words if we use language X for 100 line
> > >>>> programs and 1000 line programs we can also use it for 1000000
> > >>>> line programs. =A0We may have to add some extensions and new
> > >>>> features along the way, but we can increase the size of the
> > >>>> programs we write indefinitely using the same ideas.
>
> > >>>I've never seen a convincing argument to show that this is wrong.
>
> > >>>We can use a 26 letter alphabet to make little words.

<snip>


> > >>>We can use a 26 letter alphabet to make entire libraries.
>
> > >>>Why isn't the same thing true of programming languages?

you don't make a library out of letters. Your own analogy demonstrates
that lack of scalability of the letter.

> > >> It is.  We can use 1's and 0's to build software.
>
> > >Yes, but that is the machine language.

right so we change the unit as we scale


> > > When communicating ideas about
> > >software to other people we are using natural languages.

I use programming languages.

There was the guy that explained to me that a telecommunications
switch was "just shift registers!". Whilst probably true was
unhelpful. This lack of appreciation for scale and distrust of
abstraction seems very odd in field that is all about abstraction.

Boolean algebra is much easier if you just write out the wave equation
for the electron and the hole.


> > >> Similarly human brains are made out of atoms.

quarks! It's quarks all the way down!


<snip>

> > >The argument is that it is not clear what a hierarchy of programming
> > >languages adds.
>
> > You are missing the point.  He made an observation and called it
> > an argument.  
>
> I made an analogy and never called it an argument.
>
> > It wasn't.  He did go on to question whether a
> > hierarchy of programming languages had value.  
>
> I went on to say

you did not. I mean really, where did you say that?

> that we do the same thing with programming
> langauges.  

and then you go on to demonstrate that we don't...


> For instance, in C, I will write simple functions to build
> libraries.

expession -> function -> library

that's three layers of abstraction right there

> I will use libraries to build filter programs.  I will
> pipe filter programs in a chain to accomplish complicated tasks.

ooo look! data flow programming!


> This
> is a metaphor that works pretty well.  

it does but the opposite way from what you wanted.

Don't get me wrong I've spent too much time debugging designs encoded
in arcane graphical formats that then have to be debugged all over
again when converted to code (generate the code you say? why not just
write the code?) to be a fan of graphical representations; but the
fact is we do need higher level things to reasons with than raw C. And
we have them.


> A lot of computer science seems to me to be largely mythology.  
> What I mean by that is (for instance)
> the notion that OO programming will allow less expensive complicated
> systems.  It is possible to write complicated systems in OO languages
> for a reasonable cost.  But empirical studies have shown that the cost
> is not lower than writing the same systems in a simple language like C
> [1].  So now people are trying other paradigms like Aspect Oriented
> programming.  I am not saying that alternative paradigms are bad.  In
> fact, I program almost exclusively in C++ and love the OO metaphor.  I
> am just saying that a change in paradigm doesn't usually change much.
> For instance, in C++, by creating objects we do NOT reduce
> complexity.  We only hide it.

<snip>

--
Nick Keighley

"Object-oriented programming is an exceptionally bad idea
that could only have originated in California."
Dijkstra quote:

Nick Keighley

unread,
Oct 29, 2009, 8:10:37 AM10/29/09
to
On 27 Oct, 08:55, "Dmitry A. Kazakov" <mail...@dmitry-kazakov.de>
wrote:

> * Unmaintainable code. Look at large data flow programs (e.g. in DiaDem,
> Simulink, LabView). It is impossible to use reasonable software processes
> on them. How to compare two diagrams?

convert it to a textual represention the run diff on it. I'm not
saying it's trivial but I don't think its intractable either.


> When are they semantically equivalent?

when they are the same. Code gives you exactly the same problem. The
pretty picture is only a representaion! You're programmer's for
Turing's sake! What else do you do apart from manipulate
representations?

> How do I validate a diagram?

there were tools that could do this in the 80s


> How to search for anything in a diagram?

solved in the 80s

what's the quote about being condemed to repeat it?

I don't *like* graphical design method-ologies but I don't pretend
they can't do things that they can do!

Dammit I'm running out exclamation marks??

> Another example of this problem is represented by GUI libraries,
> which are mostly event controlled. Practically none of them can be easily
> used in multitasking environment. It is a horror to debug hangups or
> generators in messages routed from one messages loop to another. And the
> proposal is to build *everything* this way. Shudder.

though you may have a point here


--
Nick Keighley


As our circle of knowledge expands,
so does the circumference of darkness surrounding it.
(Albert Einstein)


Nick Keighley

unread,
Oct 29, 2009, 8:49:45 AM10/29/09
to
On 28 Oct, 15:44, p...@informatimago.com (Pascal J. Bourguignon)
wrote:

talk to the Sparc and ARM people. They license their designs. I bet
they come to more than two lines. Oh, I forgot if they'd used Lisp
they be able to hide it all in one macro!

Nick Keighley

unread,
Oct 29, 2009, 8:56:07 AM10/29/09
to
On 28 Oct, 23:02, mike <m....@irl.cri.replacethiswithnz> wrote:

> I think that most of the problems inherent in any large-scale
> programming project result from the inherent 'fragility' of all
> programming languages.
>
> If you compare a large computing project with a large engineering
> project

large software projects aren't large engineering projects?


> there are clear similarities, but one very significant
> difference is that almost any undetected error in code has the potential
> to result, somewhere down the line, in catastrophic outcomes; whereas if
> a nail is not quite hammered in as far as specified or if a light
> fitting (or even a window) is put in the wrong place then the building
> usually remains safe, functional, and fit for purpose.

even civil engineers have their bad days
http://en.wikipedia.org/wiki/Hyatt_Regency_walkway_collapse

O-rings

<snip>

robert...@yahoo.com

unread,
Oct 29, 2009, 11:19:46 AM10/29/09
to
On Oct 29, 5:02 am, Mok-Kong Shen <mok-kong.s...@t-online.de> wrote:
> In engineering works there are "factors of safety" to take account
> of variability of materials, unpredictability of actual loadings
> and inaccuracies in construction etc. etc. In computing, analogous
> has been done. For results for control of critical (including
> in particular potentially dangerous) processes, one employs
> double or triple hardware to take care of the possibility of
> malfunction of hardware. As far as I know, in order to better
> detect programmer errors, one similarly employs different and
> independent teams of programmers to do the same job and then with
> test cases to compare the results of the resulting programs.
> However, if I don't err, this comparision is only done in the design
> phase of the software and later only the work of one of the teams
> is selected for practical application. A safer way, I think, would
> have these presumably equivalent software (resulting from different
> teams, employing preferably different programming languages and
> environments) in actual production runs to always work parallelly
> on multiple hardware (of possibly different types), so as to futher
> reduce the risk of errors, since testing of software in the design
> phase might not be thorough enough to uncover all errors that may be
> present. Of course, errors could not be "absolutely" eradicated,
> in accordance with Murphy's Law.


This is sometime done. For example, the Space Shuttle has five
(identical) computers in its flight control system. Four run the full
mission software in a parallel/lockstepped and redundant
configuration, while the fifth runs a basic version of the flight
control software developed independently from the main software
group. The fifth system has enough function to fly the Shuttle, but
nothing else.

Note that they did not choose to implement different hardware for the
fifth system (it's very difficult to use heterogeneous hardware within
a lockstepped group, so that's not really an option for the first
four). While in theory different hardware for the fifth system might
improve reliability some, the systems used are extremely well tested
(and actually quite old - part of why they're so well tested), and
since they're running different software, a latent hardware bug (as
opposed to a fault, which would only take out a single computer) is
unlikely to hit both the primary group of four and the fifth system
simultaneously. It is, as always, a tradeoff.

Obviously the cost for this sort of approach is very high.

Also, this sort of parallel/independent development has been
demonstrated to *not* be as independent as one would like. While many
of the low level details of the independent implementations do show
fairly little correlation, several studies (including one by NASA,
IIRC) have found that there is a significant tendency for independent
development groups to make similar errors interpreting specifications
and making similar higher level implementation decisions. This has
been noticed by the two "independent" systems having a surprising
number of parallel bugs.

Ray

unread,
Oct 29, 2009, 1:16:54 PM10/29/09
to
robert...@yahoo.com wrote:

> Also, this sort of parallel/independent development has been
> demonstrated to *not* be as independent as one would like. While many
> of the low level details of the independent implementations do show
> fairly little correlation, several studies (including one by NASA,
> IIRC) have found that there is a significant tendency for independent
> development groups to make similar errors interpreting specifications
> and making similar higher level implementation decisions.

That seems to me to be likely the fault of those who wrote the
specification rather than those who were interpreting it.
Seriously, if people are reaching the *same* misinterpretation,
the flaw is in the document they're reading.

It's kind of like Fox News: Polls show that large percentages
of the people who watch it believe things that just aren't
true about, eg, the Iraq conflict. Even if you can't point
to a particular day, hour, and minute when they came out and
gave a straight-up provably-false lie as the truth, the fact
that their viewers believe the same set of false things at a
rate several times higher than the viewers of other news
programs tells you something important.

Bear

Dmitry A. Kazakov

unread,
Oct 29, 2009, 2:15:20 PM10/29/09
to
On Thu, 29 Oct 2009 05:10:37 -0700 (PDT), Nick Keighley wrote:

> On 27 Oct, 08:55, "Dmitry A. Kazakov" <mail...@dmitry-kazakov.de>
> wrote:
>
>> * Unmaintainable code. Look at large data flow programs (e.g. in DiaDem,
>> Simulink, LabView). It is impossible to use reasonable software processes
>> on them. How to compare two diagrams?
>
> convert it to a textual represention the run diff on it. I'm not
> saying it's trivial but I don't think its intractable either.

Are pixel positions and sizes of the blocks relevant to the comparison?
(:-))

The very argument that a text form is somewhat better raises a suspicion
why not to use it from the start?

>> When are they semantically equivalent?
>
> when they are the same.

Yes two programs are equivalent when they are...

> Code gives you exactly the same problem.

Sure, but for a programmer it is easier to decide when he deals with text.
"Find 10 differences" is a game known for pictures, no for texts.

(I am not an expert in psychology, but it seems that abstract visual
information cannot be digested at a finely detailed level. Maybe it is a
fundamental problem. Maybe not. But so far text works at best. Look at the
younger generation, who read virtually nothing. They only watch. They have
got completely new gadgets (like mobile phones), my generation didn't have.
I.e. it was a fresh start. And what we see? SMS - naked texts reborn!)

>> How do I validate a diagram?
>
> there were tools that could do this in the 80s

There was no LabView that time, and there is no such tools now... (:-))
AFAIK, engineers designing models in Simulink, LabView etc, validate the
generated code. They do not the diagrams.

>> How to search for anything in a diagram?
>
> solved in the 80s

Ah, that is maybe because there were only alphanumeric display device back
then? (:-))

Richard Harter

unread,
Oct 29, 2009, 3:52:38 PM10/29/09
to
On Mon, 26 Oct 2009 17:04:36 -0700 (PDT), user923005
<dco...@connx.com> wrote:

This really is tangential, but perhaps we can get it out of the
way.

>I've never seen a convincing argument to show that this is wrong.
>
>We can use a 26 letter alphabet to make little words.

>We can use a 26 letter alphabet to make bigger words.
>We can use a 26 letter alphabet to make little paragraphs.
>We can use a 26 letter alphabet to make bigger paragraphs.
>We can use a 26 letter alphabet to make little books.
>We can use a 26 letter alphabet to make bigger books.


>We can use a 26 letter alphabet to make entire libraries.

I shall begin with what appears to be a quibble but really isn't.
Of these seven statements all but the first two are false.
Paragraphs are made of sentences. In English text sentences
include a period at the end and initialization at the end. To
get the full range of normal English text you need upper and
lower case letters, digits, and a suite of punctuation
characters. To use the jargon of programming, you need the full
English language character set.

That is not all. You also need the rules for putting together
valid English prose, a dictionary of English words, and means to
add new words to that dictionary.

Is that all? No. A proper character set, composition rules, and
a dictionary will enable you to produce paragraphs. It does not
suffice for creating books. The text in books is on a two
dimensional surface that is partitioned into pages. The material
in the books does not go into that surface randomly. There are
rules for layout. What is more, these rules vary from book to
book. Finally books can contain graphical elements of various
sorts and there are rules for doing that.

What it comes down to is that you need a lot more than your 26
letters to make your paragraphs, books, and libraries. What is
more, further you go up the scale the more extra stuff you need.
I am confident you agree with the need, though you might feel
that it's just unimportant details. It's not.

So much for that. Let's get on to the juxtaposiion fallacy. I
don't recall the correct name of the fallacy offhand, but the
idea is to suggest a relationship by putting elements next to
each without ever establishing the relationship. It is a
rhetorical device for appearing to make an argument without
actually making it.

In your seven sentences there is an implicit suggestion that
there is a ladder of building blocks, i.e., we build words out of
letters, paragraphs out of words, books out of paragraphs, and
libraries out of books. There is a real problem with this idea.
Books can have a lot of different things in them that are not
paragraphs, indeed aren't even necessarily words. Examples
include indices, bibliographies, tables, footnotes, chapter
titles, illustrations, shaded text boxes, and graphs. Oh yes,
don't forget recipes and poems.

>

>
>Why isn't the same thing true of programming languages?

>

>Now, I admit that if our tiny building blocks are poorly made, the
>bigger building blocks become more and more fragile.
>But that is true no matter which programming language we use to build
>the smallest blocks. And it is always a tragic mistake to try to make
>one big giant block that does it all (Forth metaphor is not an
>exception because the mega word comes from its babies).
>
>I also don't think it matters which direction we build the turtles, as
>long as they make it from top to bottom.

And here is where the fallacy has been used. The argument uses
the implied analogy to a fallacious model of natural language
texts. There is a double fault here. The first is the smuggling
in of a faulty model of natural language texts. The second is
the drawing of an analogy without establishing whether it is
appropriate.

Don't think I didn' see the connection you were suggesting. What
I am saying is that your argument wasn't even close to being
legitimate.

BTW, All of that said, none of this has anything to do with
whether data flow languages are a good idea.

Richard Harter

unread,
Oct 30, 2009, 1:14:51 AM10/30/09
to
On Tue, 27 Oct 2009 09:55:22 +0100, "Dmitry A. Kazakov"
<mai...@dmitry-kazakov.de> wrote:

>On Mon, 26 Oct 2009 22:22:14 GMT, Richard Harter wrote:
>
>> Some significant disadvantages:

[snip]

>* Low abstraction level, basically lacking any abstraction. Data flow works
>with primitive built-in atomic types forced into value semantics.
>
>* Already mentioned inefficiency because of value semantics (i.e. either
>permanent marshaling inputs, or else blocking the producers or the
>consumers when data are shared) Normally it is the programmer to choose how
>to protect states. With the dataflow it must be the engine to decide.
>Basically it is lack of abstraction again. Working with *data* instead of
>*states*, that summaries it.
>
>* The networks of wired blocks cannot be encapsulated into reusable opaque
>primitives. You can group blocks, that's it. Usual methods of decomposition
>do not work, because there is a firewall between "block" and "data". You
>cannot merge them (like in OO) into a reusable entity (type = values +
>behavior).


>
>* Unmaintainable code. Look at large data flow programs (e.g. in DiaDem,
>Simulink, LabView). It is impossible to use reasonable software processes

>on them. How to compare two diagrams? When are they semantically
>equivalent? How do I validate a diagram? How to search for anything in a
>diagram? Another example of this problem is represented by GUI libraries,


>which are mostly event controlled. Practically none of them can be easily
>used in multitasking environment. It is a horror to debug hangups or
>generators in messages routed from one messages loop to another. And the
>proposal is to build *everything* this way. Shudder.
>

>* Semantic problems, the barriers of a block that fire the computation of
>the block's output is a very primitive model that does not scale in
>reality. It almost instantly goes into a mess of deadlocking vs. race
>condition choices. And see above, it practicably cannot be debugged except
>for trivial cases (without feedbacks, with all inputs synchronous etc) it
>cannot be decomposed into simpler tasks...


I will accept some blame for the above farrago. I did point out
there is a wide variety of data flow languages and gave examples.
It didn't occur to me that anyone would actually confuse data
flow programming and graphical programming. No doubt I should
have.

That said, it seems clear to me that if one is thinking in terms
of using data flow languages for high level concurrency one is
talking about languages like Erlang, Axum, and Hermes.

Mok-Kong Shen

unread,
Oct 30, 2009, 5:24:12 AM10/30/09
to
Ray wrote:
> robert...@yahoo.com wrote:
>
>> Also, this sort of parallel/independent development has been
>> demonstrated to *not* be as independent as one would like. While many
>> of the low level details of the independent implementations do show
>> fairly little correlation, several studies (including one by NASA,
>> IIRC) have found that there is a significant tendency for independent
>> development groups to make similar errors interpreting specifications
>> and making similar higher level implementation decisions.
>
> That seems to me to be likely the fault of those who wrote the
> specification rather than those who were interpreting it.
> Seriously, if people are reaching the *same* misinterpretation,
> the flaw is in the document they're reading.

To have correct and good specifications is indeed of paramount
importance for the success of software projects. Long time ago,
I was told by someone that there are systems that somehow tightly
control and assist the designer in software development throughout
the process of specifiction, code generation and documentation in
the field of real-time embedded software. I never know how general
it was, i.e. whether it was well applicable outside of the said
special domain. (I just searched for it on the internet and found
that the producer had been acquired by IBM. See an Wiki article
http://en.wikipedia.org/wiki/I-Logix.) Could someone who have
substantially worked with systems of such kind let us share his
good or bad experiences with them?

M. K. Shen

PS. Newsgroup set to comp.programming

Nick Keighley

unread,
Oct 30, 2009, 6:32:30 AM10/30/09
to
On 29 Oct, 18:15, "Dmitry A. Kazakov" <mail...@dmitry-kazakov.de>

wrote:
> On Thu, 29 Oct 2009 05:10:37 -0700 (PDT), Nick Keighley wrote:
> > On 27 Oct, 08:55, "Dmitry A. Kazakov" <mail...@dmitry-kazakov.de>
> > wrote:

> >> * Unmaintainable code. Look at large data flow programs (e.g. in DiaDem,
> >> Simulink, LabView). It is impossible to use reasonable software processes
> >> on them. How to compare two diagrams?
>

> > convert it to a textual represention then run diff on it. I'm not


> > saying it's trivial but I don't think its intractable either.
>
> Are pixel positions and sizes of the blocks relevant to the comparison?
> (:-))

I presume the smiley means you can answer that yourself...


> The very argument that a text form is somewhat better raises a suspicion
> why not to use it from the start?

where did I say better? They have different strengths and weaknesses.
Programmers I know spend a fair amount of time drawing strange
diagrams on whiteboards. Personnally, I think they should then throw
the diagrams away and code it up but a lot of people like diagrams.
You can have both. Rational Rose has the pretty pictures but
underneath there's a textual language. The tools of the 80s did
similar tricks. Rational Rose even separates the logical [(link box-a
box-b arrow-c)] from the representiational [(box box-a 200 300 "snow-
white" "cosmic-black")]

> >> When are they semantically equivalent?
>
> > when they are the same.
>
> Yes two programs are equivalent when they are...

yes, but I fail to see why the graphical form presents any different
problems from the textual form.


> > Code gives you exactly the same problem.
>
> Sure, but for a programmer it is easier to decide when he deals with text.
> "Find 10 differences" is a game known for pictures, no for texts.

that is true. I don't play find the differences with text either I use
diff (or, better a graphical version). It would be nice if there were
a graphical differences tool- that is one that would didplay the
picture and highlight the differnces. Not undoable I'm sure.

> (I am not an expert in psychology, but it seems that abstract visual
> information cannot be digested at a finely detailed level. Maybe it is a
> fundamental problem. Maybe not. But so far text works at best.

not for roughly half the population of programmers. I can get more of
a program on a diagram.


> Look at the
> younger generation, who read virtually nothing. They only watch. They have
> got completely new gadgets (like mobile phones), my generation didn't have.
> I.e. it was a fresh start. And what we see? SMS - naked texts reborn!)

I played arounf with Macs when they didn't have any CLI. It
was ...interesting...

> >> How do I validate a diagram?
>
> > there were tools that could do this in the 80s
>
> There was no LabView that time, and there is no such tools now... (:-))
> AFAIK, engineers designing models in Simulink, LabView etc, validate the
> generated code. They do not the diagrams.

Cadre Teamwork could check that flows into and out of a data transform
(is that the right word for a bubble) matched. It could validate the
"syntax" of c-specs (state machines diagrams). Admittedly the p-specs
(code) couldn't be validated.


> >> How to search for anything in a diagram?
>
> > solved in the 80s

it's because the diagram wasn't a bunch of pixels it was a logical
structure. I started to write a DFD editor for fun (fun?!) perhaps I
should dig it out again.

> Ah, that is maybe because there were only alphanumeric display device back
> then? (:-))

blinking lights and switches. If you can't read the machine code in
binary you aren't a Real programmer. Octal is for quiche eaters.


--
"The Dinosaurs have come and gone,
we Theriodonts remain"

Pascal J. Bourguignon

unread,
Oct 30, 2009, 7:19:46 AM10/30/09
to
Nick Keighley <nick_keigh...@hotmail.com> writes:

> It would be nice if there were
> a graphical differences tool- that is one that would didplay the
> picture and highlight the differnces. Not undoable I'm sure.

Indeed. Have a look at:
http://www.informatimago.com/develop/pic-merge-diff3/index.html
A trivial patch to this script would highlight the differences.

(Unfortunately, it's only a pixel diff/merge operation, we would have
to use more sophisticated algorithms to detect in a concise form block
transformations; but even in this case, this gimp script is rather
effective, visually).

--
__Pascal Bourguignon__

Dmitry A. Kazakov

unread,
Oct 30, 2009, 11:21:08 AM10/30/09
to
On Fri, 30 Oct 2009 03:32:30 -0700 (PDT), Nick Keighley wrote:

> On 29 Oct, 18:15, "Dmitry A. Kazakov" <mail...@dmitry-kazakov.de>
> wrote:
>> On Thu, 29 Oct 2009 05:10:37 -0700 (PDT), Nick Keighley wrote:
>>> On 27 Oct, 08:55, "Dmitry A. Kazakov" <mail...@dmitry-kazakov.de>
>>> wrote:
>
>>>> * Unmaintainable code. Look at large data flow programs (e.g. in DiaDem,
>>>> Simulink, LabView). It is impossible to use reasonable software processes
>>>> on them. How to compare two diagrams?
>>
>>> convert it to a textual represention then run diff on it. I'm not
>>> saying it's trivial but I don't think its intractable either.
>>
>> Are pixel positions and sizes of the blocks relevant to the comparison?
>> (:-))
>
> I presume the smiley means you can answer that yourself...

No, equivalence of two directed graphs is not a simple task.

>> The very argument that a text form is somewhat better raises a suspicion
>> why not to use it from the start?
>
> where did I say better? They have different strengths and weaknesses.
> Programmers I know spend a fair amount of time drawing strange
> diagrams on whiteboards. Personnally, I think they should then throw
> the diagrams away and code it up but a lot of people like diagrams.
> You can have both.

Yes, physicists draw lots of diagrams too. Nevertheless the language of
physics is differential equations. The role of diagrams is always
supplementary, they cannot serve as a language.

> yes, but I fail to see why the graphical form presents any different
> problems from the textual form.

It is easier to analyse textual form for both humans and computers.

>> Ah, that is maybe because there were only alphanumeric display device back
>> then? (:-))
>
> blinking lights and switches. If you can't read the machine code in
> binary you aren't a Real programmer. Octal is for quiche eaters.

Yep, that was fun. You could immediately see if the program ran in a cycle,
the pattern repeated itself. Blue screen, that's not suckers! Real men read
the program counter of the crash location looking at the front panel LEDs!

Dmitry A. Kazakov

unread,
Oct 30, 2009, 11:23:39 AM10/30/09
to

Yes, as someone with image processing, pattern recognition and AI
background I can tell you, there is no such algorithms. You will need a
complex scene analysis in order to get at the level of a diagram. Even this
does not work. Because image segmenting and line detection do not really
do. Such a trivial task, but alas, no algorithm can compete human in image
segmenting. But even if you passed through, got geometric regions, lines
etc. To analyse their connections, shapes (e.g. the scene), that does not
work. Diagrams generated by GUI tools skip these steps, because the tool
places this information into the intermediate file.

The point is that the task is immense and though we have solved it during
our evolution, my guess is that it leaves no free "computational" resources
to analyse the picture at a much detailed level. It wasn't evolutionary
needed too.

Richard Harter

unread,
Oct 30, 2009, 11:49:36 PM10/30/09
to
On Tue, 27 Oct 2009 10:09:25 -0400, Joshua Cranmer
<Pidg...@verizon.invalid> wrote:

>On 10/26/2009 06:22 PM, Richard Harter wrote:
>> The antithesis is that it isn't turtles all the way up, or at
>> least it shouldn't be turtles all the way up. That is, the kinds
>> of languages and programming technologies that we use in
>> large scale programming should be quite different from the kind
>> of languages used for programming in the small.
>

>I think most people would agree that methodologies for small-scale
>programs are different from those for large-scale programs, although
>there might be considerable disagreement over the dividing line. I
>certainly am not going to be used a full-blown OOP paradigm for writing
>even something like an automatic code generator; on the other hand, I
>shudder at the thought of writing something so complex as an operating
>system in a functional paradigm.

IIANM people have written operating systems in functional
languages without difficulty.
>
>An interesting project would be to take a large corpus of various mature
>open-source programs of varying size and see if there is a correlation
>between size and the use of certain language features or patterns.

Perhaps this would be an appropriate project for CS PhD
candidates.

>
>> One difference between traditional programs and data flow
>> programs is that traditional programs use LIFO semantics whereas
>> data flow programs use FIFO semantics. That is, a traditional
>> program puts data on a stack and gets data back on the same
>> stack. In data flow programs each element gets data from a queue
>> and puts data to other queues.
>
>One nit to point out: at this stage, many programs don't follow a
>procedural model of programming. OOP is the dominant paradigm, and I
>don't see the sequence of data flow as being LIFO. Yet you continually
>refer to procedural programming as those that make up `traditional
>programs.

To nit a nit - the word "procedural" doesn't appear at all in the
article. I used the term "traditional imperative". More to the
point, data flow is LIFO. By this I don't mean all data flow;
rather I am referring to the data passed to functions/methods via
calling sequences and the data returned from them via return
statements. Likewise the flow of control is LIFO. That is, when
a function/method exits it returns control to the place it was
called from.


>> Another difference is that the connectivity of traditional
>> programs is deeply embedded in the code. To pass data from A to
>> B, A calls B. That is, the caller has to specify where the data
>> goes. In data flow programs the connectivity can be separate
>> from the code. A doesn't pass data directly to B; instead it
>> passes data to the run time system which in turn passes the data
>> to B. The caller does not have to specify where the data goes.
>
>Two words: function pointers. You can also go with virtual or dynamic
>method dispatch, but function pointers is shorter and sounds better.

Well, no. Function pointers et al are still specifications of
destination. More than that, A actually sends the data
immediately to B via a call. Consider the following two lines of
code which look very similar:

func(x); /* func is a function pointer */
send(x); /* send is a messaging primitive */

In the first line "func is bound (directly or indirectly) to the
actual function that will act on x. In the second "send" is not
bound to the function that will act on x.

>
>> * Concurrency and parallelism are natural. Code can be
>> distributed between cores and even across networks. Many of the
>> problems associated with threads disappear.
>

>They don't disappear, they're pushed into the runtime system. From
>practical experience, that means they probably bubble up and annoy you
>in the programming language.

What kind of practical experience have you had with data flow
languages?

>
>> * Data flow networks are natural for representing process. This
>> is particularly true for transaction processing.
>
>Maybe I'm just being a stupid idiot here, but I don't see how data flow
>is natural for representing some common processes. For example, the
>HTML5 tokenization process. I suppose you can flow current-state output
>back around and into itself as next-state input, but that's not exactly
>natural.

I sounds as though you're asking how one would implement a state
machine in a dataflow language. It would be straightforward
enough, but there's not much point in doing it unless it gives
you cheap thrills.
>
>This also brings up a question of how the language deals with loops in
>the dependency graph. The solutions I see either bring back the problems
>with threading or could create unreliability.
>
>> * Message passing gets rid of the problems associated with shared
>> memory and locks.
>
>Just as credit default swaps and other financial machinations got rid of
>risk :-).

It's the other way around. Threading, shared memory, mutexes,
and locks are the equivalent of default swaps. :-)

The argument really is straightforward. Data flow languages
typically require that shared memory be immutable. If it's not
immutable then it's not shared. This works. There are prices.
Mostly they are performance prices.

>From what I can tell, similar problems arise, but they're in a
>different form (data flow dependency graphs, particularly the fact that
>they're typically not acyclic).

Perhaps you could give an example of what you are thinking of.
It is not quite clear to me why you think this is a problem.

>
>> * Data flow programs are more extensible than traditional
>> programs. Elements can be readily composed into composite
>> elements from the bottom up rather than top down.
>
>Even if you consider good old procedural programming, I would say that
>the exact same thing is easily doable in traditional programs. I can
>take code that does asynchronous socket reads and turn it into a MIME
>decoder by creating a module that calls the asynchronous socket read
>module appropriately. It's also an example of your "bottom-up composition."
>
>> Some significant disadvantages:
>>
>> * Using data flow programming in the large pretty much requires
>> that it be used from the start. That is, converting traditional
>> programs into data flow programs is difficult because the
>> structuring is so different.
>
>I don't see it that way. Ultimately, most libraries are merely "pass X
>in as input, get Y out as output"--it shouldn't be too hard to make that
>an atomic data-flow program block.

The problem doesn't lie in the libraries; it lies in the
superstructure and program organization.

Good comments, thank you.

Nick Keighley

unread,
Oct 31, 2009, 5:38:53 PM10/31/09
to
On 30 Oct, 15:23, "Dmitry A. Kazakov" <mail...@dmitry-kazakov.de>
wrote:

> On Fri, 30 Oct 2009 12:19:46 +0100, Pascal J. Bourguignon wrote:
> > Nick Keighley <nick_keighley_nos...@hotmail.com> writes:

> >> It would be nice if there were

> >> a graphical differences tool- that is one that would display the


> >> picture and highlight the differnces. Not undoable I'm sure.
>
> > Indeed. Have a look at:
> >http://www.informatimago.com/develop/pic-merge-diff3/index.html
> > A trivial patch to this script would highlight the differences.
>
> > (Unfortunately, it's only a pixel diff/merge operation, we would have
> > to use more sophisticated algorithms to detect in a concise form block
> > transformations; but even in this case, this gimp script is rather
> > effective, visually).
>
> Yes, as someone with image processing, pattern recognition and AI
> background I can tell you, there is no such algorithms. You will need a
> complex scene analysis in order to get at the level of a diagram. Even this
> does not work. Because image segmenting and line detection do not really
> do. Such a trivial task, but alas, no algorithm can compete human in image
> segmenting. But even if you passed through, got geometric regions, lines
> etc. To analyse their connections, shapes (e.g. the scene), that does not
> work. Diagrams generated by GUI tools skip these steps, because the tool
> places this information into the intermediate file.

I had in mind an easier problem than comparing two pixel collections.
I was assuming the diagram was a collection of shapes (typical sort of
thing graphical design tools mess around with). I was hoping such
diagrams could be compared fairly easily, though in another post you
dash this hope! :-(

> The point is that the task is immense and though we have solved it during
> our evolution, my guess is that it leaves no free "computational" resources
> to analyse the picture at a much detailed level. It wasn't evolutionary
> needed too.


--
Nick Keighley

Nick Keighley

unread,
Oct 31, 2009, 5:51:23 PM10/31/09
to
On 30 Oct, 15:21, "Dmitry A. Kazakov" <mail...@dmitry-kazakov.de>

wrote:
> On Fri, 30 Oct 2009 03:32:30 -0700 (PDT), Nick Keighley wrote:
> > On 29 Oct, 18:15, "Dmitry A. Kazakov" <mail...@dmitry-kazakov.de>
> >> On Thu, 29 Oct 2009 05:10:37 -0700 (PDT), Nick Keighley wrote:
> >>> On 27 Oct, 08:55, "Dmitry A. Kazakov" <mail...@dmitry-kazakov.de>

> >>>> * Unmaintainable code. Look at large data flow programs (e.g. in DiaDem,
> >>>> Simulink, LabView). It is impossible to use reasonable software processes
> >>>> on them. How to compare two diagrams?
>
> >>> convert it to a textual represention then run diff on it. I'm not
> >>> saying it's trivial but I don't think its intractable either.
>
> >> Are pixel positions and sizes of the blocks relevant to the comparison?
> >> (:-))
>
> > I presume the smiley means you can answer that yourself...
>
> No, equivalence of two directed graphs is not a simple task.

oh, shame. I was perhaps hoping it was doable if the two diagrams were
"similar"


> >> The very argument that a text form is somewhat better raises a suspicion
> >> why not to use it from the start?
>
> > where did I say better? They have different strengths and weaknesses.
> > Programmers I know spend a fair amount of time drawing strange
> > diagrams on whiteboards. Personnally, I think they should then throw
> > the diagrams away and code it up but a lot of people like diagrams.
> > You can have both.
>
> Yes, physicists draw lots of diagrams too. Nevertheless the language of
> physics is differential equations. The role of diagrams is always
> supplementary, they cannot serve as a language.

someof the UML people seem to think it can. (Perhaps UML + something
else as I'm not sure UML itself has any semantics).

I know I'm posting from ignorance here, but I'd be interested in
learning something


> > yes, but I fail to see why the graphical form presents any different
> > problems from the textual form.
>
> It is easier to analyse textual form for both humans and computers.
>
> >> Ah, that is maybe because there were only alphanumeric display device back
> >> then? (:-))
>
> > blinking lights and switches. If you can't read the machine code in
> > binary you aren't a Real programmer. Octal is for quiche eaters.
>
> Yep, that was fun. You could immediately see if the program ran in a cycle,
> the pattern repeated itself. Blue screen, that's not suckers! Real men read
> the program counter of the crash location looking at the front panel LEDs!
> (:-))

LEDs?! foo use the real thing, Nixie tubes

mike

unread,
Nov 1, 2009, 10:48:43 PM11/1/09
to
In article <hcbp8d$7pv$02$1...@news.t-online.com>, mok-kong.shen@t-
online.de says...
Yes - it is true that paralelling the hardware and software with result
comparison migh help reduce software errors, but I think it may be a
while before I can load up a [Windows/Linux/Chrome] operating system on
my new [AMD/Intel/Via(?)] chipset...

Mok-Kong Shen

unread,
Nov 2, 2009, 4:03:30 AM11/2/09
to
mike wrote:
> mok-kong.shen wrote:
[snip]
>> .......... A safer way, I think, would

>> have these presumably equivalent software (resulting from different
>> teams, employing preferably different programming languages and
>> environments) in actual production runs to always work parallelly
>> on multiple hardware (of possibly different types), so as to futher
>> reduce the risk of errors, since testing of software in the design
>> phase might not be thorough enough to uncover all errors that may be
>> present. Of course, errors could not be "absolutely" eradicated,
>> in accordance with Murphy's Law.
>>
> Yes - it is true that paralelling the hardware and software with result
> comparison migh help reduce software errors, but I think it may be a
> while before I can load up a [Windows/Linux/Chrome] operating system on
> my new [AMD/Intel/Via(?)] chipset...

It depends of course on how critical the consequence of an eventual
error would be in one's application. In allday works, the tolerance is
relatively high. That's why almost all commonly used software repeatedly
have updates, which not only introduce new features but also often
correct errors, and yet people use them. One takes that risk
consciously, just like anyone taking a flight knows that there is
a very small but certainly non-zero probability that his plane would
have troubles en route and he might not arrive at his destination.
He must judge whether it is wise for him to take the risk. That is
unfortunately life.

M. K. Shen

Marco van de Voort

unread,
Nov 4, 2009, 10:09:14 AM11/4/09
to
On 2009-10-28, Pascal J. Bourguignon <p...@informatimago.com> wrote:
>> On 2009-10-28, Pascal J. Bourguignon <p...@informatimago.com> wrote:
>>> Now if you need to write a million of source line, then just don't
>>> do it. Use metaprogramming to generate this million of source lines
>>> from a smaller source. And so on, you can add layers of
>>> metaprogramming all you need to compact your sources and always have
>>> something of manageable size.
>>
>> So, you're saying that *every* programming problem can be solved in at
>> most a few tens of thousands of lines of code?

>
> Can you not specify all programming problem in less that a few
> thousands of lines of specification?

Yes but that is usually a top-down specification, and while implementing
lots of little decisions still have to be made.

So usually, such specifications are not complete.

> Well, you can always write more detailed specifications, but I can
> assure you that sales peoples will always be able to put the whole
> specifications of your software on a 2-page booklet.

Yes, but strictly speaking that is a summary. Not a full specification.

IOW you reduce complexity by removing details, and describe first order
operation only. Not by introducing an abstraction that reduces data.

>> Certainly some problems can, but most can't. Metaprogramming is just
>> a form of compression, and there is no compression system that can
>> reduce every source below a given size.

I like that analogy.

>> Some problems really are irreducibly complex, and demand complex
>> solutions.
>
> Yes indeed. However, assuming a big ontology (eg. take wikipedia, or
> even the whole web), wouldn't it be possible to express the needs for
> any software in less than ten thousands lines, and let the
> sufficiently smart system develop it, filling in the blanks in the
> specifications with all the knowledge it can extract from the web?

I don't think so. Trying to do that you just move a lot of assumptions and
decisions in the architecture of the interpreter of those thousand lines,
which will make that interpreter harder to reuse.

Tim Little

unread,
Nov 4, 2009, 9:15:00 PM11/4/09
to

You can do that: but now your program's correctness depends upon the
correctness of every detail in Wikipedia, or even the whole web, as
well as that of both the "sufficiently smart" system that extracts
information from it, and the ten thousand lines of specification.

And yet, there will still be programs that cannot be specified in less
than ten thousand lines (unless the lines are of unbounded length).


> Or take the problem actually in the other dirrection. Would you
> trust any implementation of a system that has orders of magnitude
> more than ten thousand lines of specifications?

Not with the fate of the human species, no. As an acceptable element
of risk to my own life, yes. You probably have done so too: air
traffic control systems do have much more than ten thousand lines of
specification. So do the systems that actually allow the pilots to
fly the jets. So do hospital patient record systems, including
records of medications to be administered.


> How can you ensure these specifications are consistent? How can you
> ensure that they're effectively implemented?

I cannot. I could not even read most of the specifications if I
wanted to, nor the source code of virtually any system upon which my
life may depend. Even if I could compare them, most of them would
require specific knowledge in domains that could take a decade to
achieve the required level of background knowledge to adequately check
their correctness.


> Wouldn't you be more able to understand and check the specifications
> if they were shorter, that is indeed, given the ultimate limits to
> compression, if what they specified was less complex or of a more
> limited scope?

If that were possible. At some point reduction in length can only
come with less comprehensibility, and beyond that there is a point
where no reduction in length is possible at all.


> If you accept that big systems must be decomposed into small
> programs,

I do not. Sometimes it is appropriate to decompose a big system into
small programs, sometimes it does not help much. Sometimes it makes
the problem worse by greatly increasing the complexity of interactions
between programs.


> Then the degree of automatization in the process of translating the
> specifications into executable code is only a matter of advancement
> of the techniques, while the size of the executable code is only
> (roughly) a function of the number of metaprogramming levels used.

Resource requirements for a given task can scale exponentially with
the number of metaprogramming levels used, and frequently do. Also,
the concept of "leaky abstraction" usually applies, becoming worse
with every level added.

Note that most large real-world systems have enormous numbers of
details that are both highly specific and necessary to correct
function. They will not be present in the pre-existing programming
environment because they are specific to the particular problem. They
cannot be ignored because they are necessary. Hence they must all be
represented in both the specification and resulting system source
code, which irreducibly blows both their lengths well past ten
thousand lines.


- Tim

Paul E. Black

unread,
Nov 5, 2009, 1:01:47 PM11/5/09
to
On Wednesday 04 November 2009 21:15, Tim Little wrote:
> On 2009-10-28, Pascal J. Bourguignon <p...@informatimago.com> wrote:
>> ... However, assuming a big ontology (eg. take wikipedia,

>> or even the whole web), wouldn't it be possible to express the needs
>> for any software in less than ten thousands lines, and let the
>> sufficiently smart system develop it, filling in the blanks in the
>> specifications with all the knowledge it can extract from the web?
>
> You can do that: but now your program's correctness depends upon the
> correctness of every detail in Wikipedia, or even the whole web, as
> well as that of both the "sufficiently smart" system that extracts
> information from it, and the ten thousand lines of specification.
>
> And yet, there will still be programs that cannot be specified in less
> than ten thousand lines (unless the lines are of unbounded length).

Let me take another tack. Suppose I want to create a specification
language that is so powerful that ANY program can be specified in less
than 10 000 lines.

But wait, there are an infinite number of computer programs! If I
limit my specification language to about 75 characters (upper and
lower case letters plus numbers and punctuation), I can only write
about 75^(10 000 * 80) = 10^1500049. That's a lot of program, but not
infinite.

Ok, but I really don't care about specifying bizarre "programs" in my
language. I'll settle for only specifying "reasonable" ones.

Hold on, how can I decide beforehand which programs are reasonable
(that is, those I should be able to specify) and those that aren't??

I hope it is clear that specifications are no magic bullet.


Yes, I much prefer a short specific (in an "easily understood"
language) to a long, possibly complex-for-decent-performance program.
But not experience has shown that many perfectly reasonable tasks
don't have succinct specifications nor programs.

-paul-
--
Paul E. Black (p.b...@acm.org)

0 new messages