Writing A Plain English Compiler

Gerry Rzeppa

Nov 4, 2014, 11:27:29 PM
[And with that, this thread is over unless someone has something
to say that's related to compilers. Thanks, all. -John]

Good idea. Let's talk nitty-gritty compiler stuff and see if anyone is
interested.

Exactly how does one go about writing a Plain English compiler? I imagine
there are lots of ways, but here's how my elder son and I did it.

Having worked with a wide variety of traditional languages, and having
written a number of compilers together, we knew that a practical procedural
programming language could be concocted with just five types of statements:

1. Type definitions;
2. Global Variable definitions;
3. Routine Headers;
4. Conditional Commands; and
5. Unconditional Commands.

We then noticed that when talking with each other about Types, in English,
our sentences typically started with indefinite articles like "A" and "An"
and "Some", as in "A color has a hue..." or "An ellipse is a..." or "Some
pages are...".

We also noticed that when talking about Global Variables, we always used the
definite article, "The", as in "The screen's width..." or "The grand
total...".

We noticed, as well, that when describing a real-world Routine or procedure
to someone, we usually started our explanation with the preposition "To", as
in "To get to the store from here..." or "To bake a cake...".

Fourthly, we noticed that Conditional Commands, in our everyday speech, most
often began with the word "If", as in "If I'm not home in an hour..." or "If
the oven isn't hot enough..."

Lastly, we noticed that Unconditional Commands could start with almost any
verb, as in "Bake the cake" or "Fill the car up with gas". Since we wanted
our Plain English language to be user-extensible -- and since we certainly
didn't want to have to make a list of all the verbs any programmer might
want to use -- we decided to describe this category simply as "everything
else".

The result of these ruminations was the following tentative definitions for
our language:

1. Type definitions, which always start with A, AN, or SOME;
2. Global Variable definitions which always start with THE;
3. Routine headers, which always start with TO;
4. Conditional Commands, which begin with "IF"; and
5. Unconditional Commands, which start with anything else.
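That first-word dispatch is easy to prototype. Here is a minimal sketch in
Python -- purely illustrative, not the actual compiler's code:

```python
# Classify a Plain English statement by its leading word, following the
# five tentative definitions above. The real compiler's rules are richer.

INDEFINITE_ARTICLES = {"a", "an", "some"}

def classify(statement):
    first = statement.split()[0].lower()
    if first in INDEFINITE_ARTICLES:
        return "type definition"
    if first == "the":
        return "global variable definition"
    if first == "to":
        return "routine header"
    if first == "if":
        return "conditional command"
    return "unconditional command"
```

So classify("Bake the cake.") falls through to "unconditional command",
exactly the "everything else" category.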

At this point we needed just one more thing to get going on a simple
prototype: rules for the formation of names. We noticed that the names we
gave things in ordinary speech were typically nouns, often preceded with one
or more descriptive adjectives, such as "table" or "orange table" or "big
orange tables", often containing spaces, and almost always preceded by an
article in everyday speech, as in "the table" or "an orange table" or "some
big orange tables". So, taking an approach similar to our handling of
Routines above, we decided to try this definition:

6. A name is anything after an article, up to (but not including) any other
article or preposition.
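Rule 6 can likewise be sketched in a few lines. The word lists below are
stand-ins for illustration, not the compiler's actual tables:

```python
# Sketch of rule 6: a name is everything after an article, up to (but not
# including) the next article or preposition. Illustrative word lists only.

ARTICLES = {"a", "an", "some", "the"}
PREPOSITIONS = {"of", "in", "into", "from", "with", "at", "on", "to"}

def extract_name(words, start):
    """Collect name words beginning at index `start` (just past an article)."""
    name = []
    for word in words[start:]:
        if word.lower() in ARTICLES or word.lower() in PREPOSITIONS:
            break
        name.append(word)
    return " ".join(name)
```

For example, extract_name("the big orange table in the corner".split(), 1)
yields "big orange table", stopping at the preposition "in".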

We then set about formalizing these ideas in a somewhat sloppy EBNF, where,
for our own convenience, we used "A" to represent any indefinite article,
like so:

type = A name IS A type-name .

Where a "type-name" is the name of any built-in or user-defined Type. (To
allow user-defined Types to appear in any order in the source code of a
Plain English program, and to support recursive data structures without the
need for additional "keywords", we decided to have our compiler make more
than one pass of the source: collecting names the first time around,
resolving them later.) Continuing:


global = THE name IS A type-name .

No surprises there. Moving on:

routine = TO routine-name : commands

Where routine-name turns out to be not quite so simple: at this stage, a
routine-name is a combination of those "anything else" words, with optional
parameters sprinkled in between. The parameters themselves were easy to
define: they're essentially local variables, defined like global variables
except that indefinite articles are used. So we have:

parameter = A name

Where "A" represents any indefinite article, as above.

Calling the groups of miscellaneous words preceding, in between, and
following parameters, "monikettes", we could then define a routine-name like
so:

routine-name = { monikette | parameter }
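To make that concrete, here is one hypothetical way such a routine-name
pattern could be matched against a call, with monikettes as literal words and
parameters as slots that bind "the ..." phrases -- a sketch, not the
compiler's actual matcher:

```python
# Match a command against a routine-name pattern: literal monikettes plus
# None for parameter slots. A slot consumes an article and the following
# name words, stopping at the next monikette. Illustrative only.

ARTICLES = {"a", "an", "some", "the"}

def match(pattern, words):
    bindings, i = [], 0
    for j, slot in enumerate(pattern):
        if slot is None:                      # a parameter slot
            if i >= len(words) or words[i].lower() not in ARTICLES:
                return None
            i += 1                            # skip the article
            stop = pattern[j + 1] if j + 1 < len(pattern) else None
            name = []
            while i < len(words) and words[i].lower() != stop:
                name.append(words[i])
                i += 1
            bindings.append(" ".join(name))
        else:                                 # a monikette: match literally
            if i >= len(words) or words[i].lower() != slot:
                return None
            i += 1
    return bindings if i == len(words) else None
```

A header like "To put a number into a total" becomes the pattern
["put", None, "into", None]; the call "put the score into the total" then
binds "score" and "total".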

Which left us, at this point, with just the two kinds of Commands to define
(which I'll take out of the above order):

command = unconditional | conditional .

Unconditional Commands take the form:

unconditional = { monikette | new-local | variable-reference}

Where a new local variable is defined like so:

new-local = A name

And a variable-reference to a previously defined global, local, or
parameter, looks like this:

variable-reference = THE name

And a Conditional Command takes the form:


conditional = IF decider-call , unconditional { ; unconditional } .

Where the decider-call is a reference to a special kind of Routine defined
as follows:

decider = TO DECIDE IF routine-name : commands
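The multi-pass scheme mentioned earlier (collect names first, resolve them
later) is also simple to demonstrate. A toy version, with one-word names and
a single built-in type -- a sketch rather than the real compiler -- might
look like this:

```python
# Two passes over type definitions of the form "A <name> is a <type>.",
# so definitions may appear in any order. One-word names and a single
# built-in type ("thing") keep the toy small.

def compile_types(source_lines):
    defs = {}
    # Pass 1: collect every defined name; order doesn't matter yet.
    for line in source_lines:
        words = line.rstrip(".").split()
        defs[words[1]] = words[-1]
    # Pass 2: resolve each target, now that all the names are known.
    for name, target in defs.items():
        if target != "thing" and target not in defs:
            raise NameError("unknown type: " + target)
    return defs
```

Forward references work because pass 1 sees everything:
compile_types(["A widget is a gadget.", "A gadget is a thing."]) succeeds
even though "gadget" is defined after it is first used.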

And that, I think, is enough for now. Questions? Comments? Anyone? Anyone?
Bueller?

Joshua Cranmer 🐧

Nov 6, 2014, 4:12:43 PM
On 11/4/2014 3:50 PM, Gerry Rzeppa wrote:
> Having worked with a wide variety of traditional languages, and having
> written a number of compilers together, we knew that a practical procedural
> programming language could be concocted with just five types of statements:
>
> 1. Type definitions;
> 2. Global Variable definitions;
> 3. Routine Headers;
> 4. Conditional Commands; and
> 5. Unconditional Commands.

Honestly (and I mean no disrespect), this doesn't feel like "practical
procedural programming language" but "C or maybe Java." Personally,
I've never felt that distinguishing between procedural, imperative,
object-oriented, functional, etc. is terribly useful because the major
modern languages end up being a complex mismatch of everything. I mean
C++ and Java both now have lambdas, and C++ is even talking about
coroutines the other day.

Essentially, I find your categorization troubling because you're
assuming a particular, very specific way of looking at
programming--and a model that's not necessarily reflective of most
large-scale programs (where object-oriented code rules the day).

A good example of where things really break down is in the
introduction of parallelism. Many (all?) languages these days have put
some emphasis on trying to decide good parallelism and concurrency
models at increasingly fundamental positions in the language. I
believe there's a widespread consensus that shared multithreading
isn't good enough, but while there are models of more structured
constructs (e.g., OpenMP, which does explicit task parallelism), I
can't speak to how often those get used in practice in major systems.
Certainly, I think the academic community heavily desires the use of
"structured" parallelism instead of "unstructured" (much as we prefer
structured control flow over the goto statement), but industry best
practices can lag a bit or a lot.

> The result of these ruminations was the following tentative definitions for
> our language:

What you have done is not use English as a programming language. What
you did was start with a semantic definition of your language and work
out a syntax for it that happens to be a subset of the English
language. Clearly, anything that's not expressible in your semantic
language definition isn't expressible in the English-language-version
of your syntax--even if that thought is expressible as a reasonable,
simple English sentence. Since there's a bit of a cooking theme in the
examples, here's an example of such a sentence from a cooking article
in a recent newspaper:

Once it has slowed to one or two seconds between pops, take it out of
the microwave.

In other words, now you have an example of event-driven programming!
Well, kind of (there are better example sentences). The way you've
defined your language, though, you can't express this. And it could be
that you don't want to--that's a design decision, and design decisions
are basically subjective opinions.

The reason I bring this all up, though, is that you make a conceit in
your Indiegogo page that the importance of what you're doing is that
you want "scientists, engineers, programmers and many others [ ... ]
to simply write down what they're thinking, in the way that's most
natural." And the way that's most natural is not necessarily
procedural programming--sometimes you can express stuff most naturally
in an event-driven model, perhaps a functional model, or maybe an
actor model. And that's why the major programming languages of today
are all multi-paradigm languages (and even continually expanding their
paradigms)--because, in the end, sometimes it's better to structure
portions of a program in different paradigms.

Now, to some degree, a Turing-complete language can emulate any other
Turing-complete language, even if the two are completely different,
incompatible paradigms. That doesn't mean it's a wise idea though, and
you can lose a lot of expressiveness and simplicity in such a
conversion. Using a parser generator like bison allows for you to much
more clearly use the code as documentation for the syntax of the
language--writing the code in a recursive-descent parser loses that
clarity, and it requires substantial commenting to recover what is
going on. You can also use the more abstract description to gain more
features: e.g., a LALR parser basically boils down to

for (each token):
    switch (token):
        ... all code inline here ...

-- and that form is easier to convert into one that gets its tokens
streaming than a recursive descent parser. (On the other hand, parser
generators tend to suffer in their ability to print good diagnostics.)
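The streaming advantage can be seen in a toy "push" parser, where the parse
state lives in data rather than in the call stack (a sketch; the grammar is
a trivial stand-in):

```python
# A toy "push" parser: parse state is explicit data, so tokens can be fed
# one at a time as they stream in -- the property a table-driven (e.g.
# LALR) parser gets for free, and recursive descent does not, since the
# latter keeps its state in the call stack. The grammar is just "( items )".

class PushParser:
    def __init__(self):
        self.state = "start"
        self.items = []

    def feed(self, token):
        if self.state == "start":
            if token != "(":
                raise SyntaxError("expected '('")
            self.state = "body"
        elif self.state == "body":
            if token == ")":
                self.state = "done"
            else:
                self.items.append(token)
```

Each call to feed returns before the next token need exist, which is what
makes the conversion to streaming input easy.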

TO sum it up, from my perspective, you have a rather banal and
underwhelming language, semantically speaking (actually some of the
semantic decisions you make some of us mightily disagree with, but
those are quibbles compared to what I discussed in this post). The
syntax seems to be what you are most proud of, but I rather suspect
that most of the people in this group don't care too much about
syntax.
--
Beware of bugs in the above code; I have only proved it correct, not
tried it. -- Donald E. Knuth

Gerry Rzeppa

Nov 7, 2014, 1:17:56 PM
Joshua Cranmer says, Honestly (and I mean no disrespect), this doesn't feel
like "practical procedural programming language" but "C or maybe Java."

Gerry replies, All I meant by "practical" was "sufficient to write,
conveniently and efficiently, the kind of programs we typically write".
Those include advanced user interfaces, simplified file managers, elegant
text editors, hex dumpers, native-code-generating compilers, wysiwyg
page-layout facilities, and many more.

Joshua Cranmer says, Personally,
I've never felt that distinguishing between procedural, imperative,
object-oriented, functional, etc. is terribly useful because the major
modern languages end up being a complex mismatch of everything. I mean
C++ and Java both now have lambdas, and C++ is even talking about
coroutines the other day.

Gerry replies, I have to disagree; the mental models one forms when using
different kinds of languages are significantly different and thus affect
both how one thinks and how one codes. With a procedural language, for
example, one pictures oneself commanding the machine to do things; with an
object-oriented approach, one pictures semi-autonomous objects interacting
with one another. The resulting systems are typically significantly
different.

Joshua Cranmer says, Essentially, I find your categorization troubling because you're
assuming a particular, very specific way of looking at
programming--and a model that's not necessarily reflective of most
large-scale programs (where object-oriented code rules the day).

Gerry replies, Yes, I am assuming a particular, very specific way of looking
at programming, because that's the model that we find works best for us. If
others prefer different ways of thinking, that's fine for them. And yes, our
model is "not necessarily reflective of most large-scale programs (where
object-oriented code rules the day)" -- intentionally so. I'm an old man and
I come from an era where we could send men to the moon and back with a slide
rule. Judging by more recent attempts at space travel, I suspect something
important in the way of simplicity and reliability got lost along the way.

Joshua Cranmer says, A good example of where things really break down is in the
introduction of parallelism. Many (all?) languages these days have put
some emphasis on trying to decide good parallelism and concurrency
models at increasingly fundamental positions in the language. I
believe there's a widespread consensus that shared multithreading
isn't good enough, but while there are models of more structured
constructs (e.g., OpenMP, which does explicit task parallelism), I
can't speak to how often those get used in practice in major systems.
Certainly, I think the academic community heavily desires the use of
"structured" parallelism instead of "unstructured" (much as we prefer
structured control flow over the goto statement), but industry best
practices can lag a bit or a lot.

Gerry replies, It is our belief that in many cases, parallelism of any kind
is simply overkill.

Joshua Cranmer says, What you have done is not use English as a programming
language. What
you did was start with a semantic definition of your language and work
out a syntax for it that happens to be a subset of the English
language.

Gerry replies, Yes, we've coded for a subset of English. But a subset that
can be grown both by improving the compiler, and -- more importantly -- by
programmers simply adding to the collection of available libraries.

Joshua Cranmer says, Clearly, anything that's not expressible in your semantic
language definition isn't expressible in the English-language-version
of your syntax--even if that thought is expressible as a reasonable,
simple English sentence. Since there's a bit of a cooking theme in the
examples, here's an example of such a sentence from a cooking article
in a recent newspaper:

Once it has slowed to one or two seconds between pops, take it out of
the microwave.

Gerry replies, In the current version of our compiler, you'd have to express
that thought more like this:

If it has slowed to one or two seconds between pops, take the popcorn out of
the microwave.

But I hope you can see that it's not a big deal to extend our compiler to
recognize "once" as a synonym for "if" in such cases. Our program is, after
all, a mere prototype, a "proof of concept".

Joshua Cranmer says, In other words, now you have an example of event-driven
programming! Well, kind of (there are better example sentences).

Gerry replies: Yeah, I'm not quite sure why you call that "event driven"
programming; placing that statement in a strictly procedural loop would
work. But we do support the "event driven" paradigm in Plain English; in
fact, our sample program in our instruction manual is event driven. See page
20 here:

www.osmosian.com/instructions.pdf

Where the main event loop is defined as follows:

To handle any events:
Deque an event.
If the event is nil, exit.
Handle the event.
Repeat.
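For comparison, that same loop in a conventional language might read like
this (a sketch, with None standing in for nil):

```python
# The Plain English event loop above, mirrored in Python: pull events
# from a queue and dispatch until it runs dry (None stands in for nil).
from collections import deque

def handle_any_events(queue, handle):
    while True:
        event = queue.popleft() if queue else None   # "Deque an event."
        if event is None:                            # "If the event is nil, exit."
            return
        handle(event)                                # "Handle the event."
```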

Joshua Cranmer says, The way you've
defined your language, though, you can't express this. And it could be
that you don't want to--that's a design decision, and design decisions
are basically subjective opinions.

Gerry replies: But we can express that thought, as above. Or like this:


To make some popcorn:
Put the popcorn into the microwave oven.
Turn on the oven.
Loop. If it has not slowed to one or two seconds between pops, repeat.
Take the popcorn out of the microwave.

And obviously, it won't require much of an extension to our prototype to
support a variety of "loop" expressions, including, for example:

Repeat until it slows to one or two seconds between pops.

Or

When it slows to one or two seconds between pops, stop.

Etc. Like any natural language, we expect Plain English to grow. At this
stage we've only answered our initial questions (enumerated and discussed,
quite fully, in a related thread here):

http://forums.anandtech.com/showthread.php?t=2358744

Joshua Cranmer says, The reason I bring this all up, though, is that you
make a conceit in
your Indiegogo page that the importance of what you're doing is that
you want "scientists, engineers, programmers and many others [ ... ]
to simply write down what they're thinking, in the way that's most
natural." And the way that's most natural is not necessarily
procedural programming--sometimes you can express stuff most naturally
in an event-driven model, perhaps a functional model, or maybe an
actor model. And that's why the major programming languages of today
are all multi-paradigm languages (and even continually expanding their
paradigms)--because, in the end, sometimes it's better to structure
portions of a program in different paradigms.

Gerry: We fully agree: that's the whole point of Hybrid programming; you use
the mode of expression that's most natural for each part of a program. But
the whole is contained in the most natural framework, specifically, a
natural language (in our case, English) -- like a math book, or a physics
book, or a technical (or non-technical) paper. Think Plain English with
sub-compilers for whatever other syntaxes seem generally (or specifically!)
useful.


Joshua Cranmer says, Now, to some degree, a Turing-complete language can
emulate any other
Turing-complete language, even if the two are completely different,
incompatible paradigms. That doesn't mean it's a wise idea though, and
you can lose a lot of expressiveness and simplicity in such a
conversion.

Gerry: We're not thinking "conversion" -- we're thinking "sub-compilers", as
above.

Joshua Cranmer says, Using a parser generator like bison allows for you to much
more clearly use the code as documentation for the syntax of the
language--writing the code in a recursive-descent parser loses that
clarity, and it requires substantial commenting to recover what is
going on. You can also use the more abstract description to gain more
features: e.g., a LALR parser basically boils down to


for (each token):
    switch (token):
        ... all code inline here ...


-- and that form is easier to convert into one that gets its tokens
streaming than a recursive descent parser. (On the other hand, parser
generators tend to suffer in their ability to print good diagnostics.)

Gerry: We use recursive descent only where convenient; in our prototype, for
example, we use it to (1) locate and (2) parse certain different statement
types, typically at a later time. We also use a variety of ad hoc methods. And
we're convinced that in future versions, a more general "statement locator
and classifier" algorithm would serve us better. Specifically, our plan is
to scan the source, dividing it into blocks where each block contains some
recognizable syntax -- and then sub-compile each block with an appropriate
sub-compiler.
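In outline, that plan might look like this -- a hypothetical sketch; the
block separator, classifiers, and sub-compilers are invented for
illustration:

```python
# Split source into blocks (here: separated by blank lines), classify each
# block by a recognizable opening, and hand it to the matching sub-compiler.
# The predicates and sub-compilers below are stand-ins, not the real ones.

def split_blocks(lines):
    block = []
    for line in lines + [""]:        # trailing "" flushes the last block
        if line.strip():
            block.append(line)
        elif block:
            yield block
            block = []

def sub_compile(blocks, compilers):
    """compilers: (recognizes, compile_fn) pairs, tried in order."""
    results = []
    for block in blocks:
        for recognizes, compile_fn in compilers:
            if recognizes(block):
                results.append(compile_fn(block))
                break
    return results
```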

Joshua Cranmer says, TO sum it up, from my perspective, you have a rather banal and
underwhelming language, semantically speaking (actually some of the
semantic decisions you make some of us mightily disagree with, but
those are quibbles compared to what I discussed in this post). The
syntax seems to be what you are most proud of, but I rather suspect
that most of the people in this group don't care too much about
syntax.

Gerry: Actually, what we're most proud of is not the syntax per se, but (1)
the fact that we were able to do so much with so little: that we were able
to conveniently and efficiently write a complete, non-trivial and
iconoclastic development environment with a unique native-code-generating
compiler/linker for a subset of English (with a wysiwyg page-layout
documentation facility, to boot) entirely in that very subset of English;
and (2) that our system can be used, educationally, with everyone from
primary school children to college students writing compilers -- the very
same development environment and language being suitable for both. But our
work does indeed strike different people in different ways. Unsolicited
reviews have ranged from "utter rubbish" to "a minor work of genius"; from
"banal and underwhelming" to "nothing short of impressive". Different
strokes for different folks.

Thanks for taking the time to comment.

[I think you're missing the point he was making about "once ...".
Event driven languages have a different model, you tell it what
the triggers are, and it figures out how and when to call them,
typically from an implicit event loop. But you don't have
to write the loop. -John]

BartC

Nov 7, 2014, 1:20:58 PM
"Gerry Rzeppa" <gerry....@pobox.com> wrote in message
> Having worked with a wide variety of traditional languages, and having
> written a number of compilers together, we knew that a practical
> procedural
> programming language could be concocted with just five types of
> statements:
>
> 1. Type definitions;
> 2. Global Variable definitions;
> 3. Routine Headers;
> 4. Conditional Commands; and
> 5. Unconditional Commands.

> And that, I think, is enough for now. Questions? Comments? Anyone? Anyone?
> Bueller?

I have problems with the whole approach.

For a start, anyone considering using such a language probably wouldn't be
concerned with type definitions. If it is meant to make programming simple
and accessible, then dynamic typing is expected (or more likely, someone
won't even think about that aspect; it should just work).

But then, even with a more informal language, there's only so much that can
be done, because to create sizeable programs that work reliably, I think you
need to use a more stylised and disciplined way of writing instructions. Any
free-format English would get in the way of that.

(I assume this syntax would just be a front-end to a conventional language
and conventional compiler, as your list suggests, and is not some
super-advanced AI project that can understand anything.)

Using syntax that looks like written English sounds attractive, and might be
suitable for some kinds of command-line interaction ('kill dwarf with axe'),
but I'm not sure it would be taken seriously for real programming.

Maybe, a tool can be used to translate conventional syntax into the English
style you propose, which would be fun, but I don't know if it warrants
creating a whole new language. Not if it doesn't otherwise do anything new.

--
Bartc
[Back in the 1970s there was a vogue for extensible languages, in
which you could add new syntax on the fly, by adding new BNF to the
underlying grammar. They all disappeared, because what happened was
that no two programs were written in the same language and nobody
could read them. Now we have OOP, where the syntax doesn't change,
but you can add lots of new types and semantics to go with them.
-John]

BartC

Nov 7, 2014, 1:22:59 PM
"Joshua Cranmer 🐧" <Pidg...@verizon.net> wrote in message
Actually, some of us are fussy about syntax, but while we might prefer
something less terse than C++, it's not necessary to go as far as something
that can be mistaken for English.

About a year ago, I had an experimental project where I could write code in
my own preferred syntax, and it would translate it to one of a number of
well-known languages.

It worked up to a point (I wrote some small benchmarks that were translated
to C, Python, Lua, Lisp and one or two others. In some simple cases, exactly
the same code would work unchanged on a number of different targets). But
you still had to be aware of the different capabilities of each target. In
the end, I preferred to create a proper language of my own (two in fact).

It shows however that syntax is not that important. Well, provided it's not
totally crazy, and that some agreement is reached as to what is allowed
(otherwise no one can understand anyone else's code).

Technically, it wouldn't be a problem to allow a small number of different
styles (eg. C, Algol, Python, Lisp, maybe even Plain English) and use that
style across all languages instead of needing to switch.

--
Bartc

George Neuner

Nov 8, 2014, 12:08:38 AM
On Wed, 05 Nov 2014 00:54:18 -0600, Joshua Cranmer 🐧
<Pidg...@verizon.net> wrote:

>On 11/4/2014 3:50 PM, Gerry Rzeppa wrote:
>> Having worked with a wide variety of traditional languages, and having
>> written a number of compilers together, we knew that a practical procedural
>> programming language could be concocted with just five types of statements:
>>
>> 1. Type definitions;
>> 2. Global Variable definitions;
>> 3. Routine Headers;
>> 4. Conditional Commands; and
>> 5. Unconditional Commands.
>
>Honestly (and I mean no disrespect), this doesn't feel like "practical
>procedural programming language" but "C or maybe Java." Personally,
>I've never felt that distinguishing between procedural, imperative,
>object-oriented, functional, etc. is terribly useful because the major
>modern languages end up being a complex mismatch of everything. I mean
>C++ and Java both now have lambdas, and C++ is even talking about
>coroutines the other day.

Lambdas in C++ and Java are merely ad-hoc objects. It's true that
objects and closures are isomorphic, but the manner in which they are
provided creates differences in idiomatic use and utility.


>Essentially, I find your categorization troubling because you're
>assuming a particular, very specific way of looking at
>programming--and a model that's not necessarily reflective of most
>large-scale programs (where object-oriented code rules the day).

OO has its place, but it isn't any kind of universal solution. There
is a large class of problems that are not a good fit.

I frequently am amazed at the contortions Java programmers go through
to solve problems within the restrictive OO model. IMO, "Design
Patterns" was an admission that too many programmers couldn't use Java
to solve routine programming problems.
[That's as much an indictment of the lack of CS education among
programmers. Java - somewhat successfully - tried to pander to less
educated programmers by making simple things simpler, but in doing so
it made solving many real problems more complicated.]

IMO, the utility of a language is measured by how easily it bends to
fit the problem. I try to avoid using languages that unnecessarily
force the problem to bend.


>A good example of where things really break down is in the
>introduction of parallelism. Many (all?) languages these days have put
>some emphasis on trying to decide good parallelism and concurrency
>models at increasingly fundamental positions in the language.

Most of which are ineffective.

CSP (a la Hoare) works well for well structured problems: i.e. chained
"bucket brigade" solutions and solutions that can be cast as
map/reduce. However, more general, less well structured problems that
require frequent and/or complex coordination patterns among tasks
still are difficult to code.

Shared data structures generally are a disaster - it's well known that
the majority of programmers are unable to write a correct solution
that involves shared data. However, programmers are enamored with
shared memory threads and so languages/libraries "help" by including
synchronization "primitives". Unfortunately, those "primitives" tend
to be anything but - often they are extremely slow due to their
generality. Where light weight special purpose primitives are
available, they often are eschewed because programmers don't
understand where, when and how to use them correctly.

Functional languages operating on immutable data are inherently
parallel at various granularities. But effective use of fine grain
parallelism has proved difficult because available hardware is not
optimized for it. Moreover immutable data structures often are far
more costly to manipulate than are their mutable versions.


>I believe there's a widespread consensus that shared multithreading
>isn't good enough, but while there are models of more structured
>constructs (e.g., OpenMP, which does explicit task parallelism), I
>can't speak to how often those get used in practice in major systems.

OpenMP is an example of the CSP model. It is used extensively in big
science coding (with well structured problems). Erlang is another
example that sees a lot of use in telecommunications where the tasks
are numerous but are largely independent.
[Scala is a JVM language whose actor support is largely modeled on Erlang.]


Dataflow languages (Lucid, Oz, etc.), constraint logic (Prolog, etc.),
and explicit coordination languages (Linda, Occam, etc.) solve the
coordination issues of unstructured problems, but they are not
popular. [Note that some of the examples above fall under multiple
categories.]

The dataflow and logic languages tend to be limited to shared memory
systems, declarative and functional (which puts off many people), and
rely on unfamiliar computation models and/or extra-logical constructs
(which further puts off people).

Coordination languages tend to be more familiar looking ... but some
like Occam are too closely associated with hardware that no longer
exists, and others like Linda rely on associative memory that (even
distributed) can become a bottleneck for a large system.


>Certainly, I think the academic community heavily desires the use of
>"structured" parallelism instead of "unstructured" (much as we prefer
>structured control flow over the goto statement), but industry best
>practices can lag a bit or a lot.

CSP handles structured problems quite well. The real issue is that
there are too many problems that are not well structured. The
academic community recognizes this, but has produced few viable
solutions and none that scale well.


>What [Gerry Rzeppa et al.] have done is not use English as a
>programming language. What you did was start with a semantic
>definition of your language and work out a syntax for it that
>happens to be a subset of the English language.

+1


>In other words, now you have an example of event-driven programming!

There are languages specifically designed for event programming. They
also tend to be niche and unpopular.


>The reason I bring this all up, though, is that you make a conceit in
>your Indiegogo page that the importance of what you're doing is that
>you want "scientists, engineers, programmers and many others [ ... ]
>to simply write down what they're thinking, in the way that's most
>natural." And the way that's most natural is not necessarily
>procedural programming--sometimes you can express stuff most naturally
>in an event-driven model, perhaps a functional model, or maybe an
>actor model. And that's why the major programming languages of today
>are all multi-paradigm languages (and even continually expanding their
>paradigms)--because, in the end, sometimes it's better to structure
>portions of a program in different paradigms.

What's important is problem solving logic - the language used to
express the logic of the solution may either help or hinder the
implementation.

Most natural languages are a hindrance to logic because they are
inherently ambiguous. There is some anecdotal evidence that Sumerian
might have been sufficiently unambiguous to program with, but the
issue is moot because no one today speaks it.


>Now, to some degree, a Turing-complete language can emulate any other
>Turing-complete language, even if the two are completely different,
>incompatible paradigms. That doesn't mean it's a wise idea though, and
>you can lose a lot of expressiveness and simplicity in such a
>conversion.

Any Turing-complete language *can* emulate any other (and also any
sub-Turing language). Expressiveness of the language is important to
programmer productivity but does not affect capability.

E.g., anything you can write in Lisp you also can write in assembler
... assuming you have unlimited time on your hands.

George

BartC

unread,
Nov 8, 2014, 12:09:31 AM11/8/14
to
"BartC" <bc...@freeuk.com> wrote in message news:14-1...@comp.compilers...

> (I assume this syntax would just be a front-end to a conventional language
> and conventional compiler, as your list suggests, and is not some
> super-advanced AI project that can understand anything.)
>
> Using syntax that looks like written English sounds attractive, and might be
> suitable for some kinds of command-line interaction ('kill dwarf with axe'),
> but I'm not sure it would be taken seriously for real programming.
>
> Maybe, a tool can be used to translate conventional syntax into the English
> style you propose, which would be fun, but I don't know if it warrants
> creating a whole new language. Not if it doesn't otherwise do anything new.
>

> [Back in the 1970s there was a vogue for extensible languages, in
> which you could add new syntax on the fly, by adding new BNF to the
> underlying grammar. They all disappeared, because what happened was
> that no two programs were written in the same language and nobody
> could read them. Now we have OOP, where the syntax doesn't change,
> but you can add lots of new types and semantics to go with them.
> -John]

They haven't all disappeared; there is still Seed7 (sometimes appearing in
this group).

Although this language supposedly has a user-extensible syntax, I've
never seen source examples that look like anything other than Seed7,
so perhaps my worry that everyone will invent a different look isn't
justified. (Or maybe I just haven't seen fully customised versions of
it; or haven't recognised them!)

With OOP, I don't really agree with 'language building' features being
used for everyday coding, at least not to the extent they are
available in C++ for example. It makes everything a lot more
complicated (language and compiler), and I think encourages people to
use them unnecessarily.

My point of view is a little unusual however, because I use my own
compilers, so adding a new, fully overloaded type for example, is
easily done by changing something in the compiler; that means the
ability to do that directly in the language is not a priority.

Maybe there is a need to do these things in an easier manner, but
without going so far as allowing a free-for-all as some languages do.

--
Bartc

Joshua Cranmer 🐧

unread,
Nov 8, 2014, 12:12:33 AM11/8/14
to
On 11/6/2014 6:52 PM, Gerry Rzeppa wrote:
> Gerry replies, Yes, I am assuming a particular, very specific way of
> looking at programming, because that's the model that we find works
> best for us.
>
> Gerry replies, It is our belief that in many cases, parallelism of
> any kind is simply overkill.
>
> Gerry: We fully agree: that's the whole point of Hybrid programming;
> you use the mode of expression that's most natural for each part of a
> program. But the whole is contained in the most natural framework,
> specifically, a natural language (in our case, English) -- like a
> math book, or a physics book, or a technical (or non-technical)
> paper. Think Plain English with sub-compilers for whatever other
> syntaxes seem generally (or specifically!) useful.

I'm putting these three replies in juxtaposition with each other
because it's the best way I can think of to explain the inherent
contradiction I see.

You want to design a language model that allows people to express
their thoughts in the most natural way possible. Yet at the same time,
you limit the expressibility of the programming model because, well,
you personally don't think naturally in alternative models and thus
you don't see why other people might.

When I tried to point out examples of alternative paradigms, your
reply was fixated on trying to prove that you could express them in
your procedural paradigm. That's not the point. The point was that
having to do these paradigm translations is as much a programmer
burden as, say, having to translate from "I need to solve this
equation" to writing the code to solve that equation.



> Once it has slowed to one or two seconds between pops, take it out
> of the microwave.
>
[ ... ]
> Gerry replies: But we can express that thought, as above. Or like
> this:
> To make some popcorn: Put the popcorn into the microwave oven. Turn
> on the oven. Loop. If it not has slowed to one or two seconds between
> pops, repeat. Take the popcorn out of the microwave.

But those two statements are not semantically equivalent. The example
you give requires the program to continually check in a busy-wait
loop; the example I gave does not.
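The semantic difference at issue here can be sketched in a few lines of Python (the names and the tiny event loop below are illustrative, not from either project): the busy-wait version repeatedly checks the condition itself, while the event-driven version merely registers a handler and lets a runtime decide when to call it.

```python
import time

# Busy-wait version: the program itself repeatedly polls the condition,
# consuming CPU (or at best sleeping) between checks.
def make_popcorn_polling(seconds_since_last_pop):
    while seconds_since_last_pop() < 2:   # "has it slowed to 2s between pops?"
        time.sleep(0.1)                   # poll again shortly
    return "take popcorn out"

# Event-driven version: register a handler and return immediately; an
# external event source (e.g. a timer in the runtime) fires the event.
handlers = {}

def on(event, handler):
    handlers[event] = handler

def fire(event):
    return handlers[event]()

on("popping_slowed", lambda: "take popcorn out")
print(fire("popping_slowed"))  # -> take popcorn out
```

The user-visible difference: in the second version no user-written loop ever checks the condition, which is exactly why the two popcorn descriptions are not equivalent.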

> Gerry: Actually, what we're most proud of is not the syntax per se,
> but (1) the fact that we were able to do so much with so little: that
> we were able to conveniently and efficiently write a complete,
> non-trivial and iconoclastic development environment with a unique
> native-code-generating compiler/linker for a subset of English (with
> a wysiwyg page-layout documentation facility, to boot) entirely in
> that very subset of English

After reading the thread you posted, it became clear to me that you
are (were?) under the impression that having a self-hosting compiler
was rare for a new proposed language. On the contrary, it is (to my
knowledge) standard rigor to present a new language by having its
self-hosted compiler be the largest application written in said
language. I've even heard it opined that this tends to cause people to
optimize their language design for writing compilers to the detriment
of other kinds of applications.

Your comments about "subset of English" are misleading, when taken in
context of your stated goal of making a "natural language" compiler.
That term, in the eyes of most, would imply that you could, say, build
a compiler that inputs the ISO C++11 specification and outputs a
working C++11 compiler.

Regardless of whether or not that is what you desire to build, your
prototype does not clearly make any advancements to that goal. What
you have is not a compiler that parses English but one that parses a
language which happens to be a subset of English--it's a very big
difference, but several others have attempted to explain that
difference to you and failed, and I have no expectation of being any
different.

> and (2) that our system can be used,
> educationally, with everyone from primary school children to college
> students writing compilers -- the very same development environment
> and language being suitable for both.

Programming came so naturally to me that I disclaim any ability to
properly evaluate programming 101. But the responses you have given,
both in this newsgroup and the AnandTech forum thread, suggest to me
that you are aware of the issues in writing large, complex software:
you appear to value minimalism in a language (rather than in client
code) very highly, and you appear to be very dismissive of "new"
paradigms.

A mild example of that tendency is present in the statement I quoted:
you list compilers as the "more complex" endpoint, yet compilers are
relatively simple affairs--which is why compiler courses tend to
assign "build a compiler" as an ongoing project in the course. The
sorts of projects I associate with production languages tend to start
considering themselves large only when they get a few million lines of
code.

Gerry Rzeppa

unread,
Nov 8, 2014, 9:06:56 AM11/8/14
to
I'm replying to several posts together because the answers are closely
related. I'll answer specific questions and comments in a moment; but first,
allow me a couple of introductory remarks and an example that will help
throughout.

One of the main purposes of our research is to determine how "ordinary"
people -- ie, people untrained in formal methods of problem solving and
artificial modes of expression -- (a) naturally think about, and (b)
naturally describe, procedures for solving problems. Since the making of
popcorn was brought up earlier, I'll use that as an example. Here's how my
wife described the process -- with just a little prompting for clarification
of certain points -- when I asked her:

To make some popcorn in a microwave oven:
Put the popcorn in the microwave.
Turn on the microwave.
Wait until the popcorn is done popping.
Turn off the microwave.
Take the popcorn out of the microwave.

Note that my nine-year-old son independently gave a virtually identical
description. Further note the following:

1. The process, in their minds, is naturally sequential and procedural.

2. They don't give formal names to any of the objects in the situation, but
use generic nouns (popcorn, oven) preceded by articles (some, a, the) with
an optional adjective sometimes appearing in between (microwave). They use
indefinite articles (some, a) when referring to entities in general, as in
their opening statements, but switch to the definite article to refer to
those same entities thereafter. And they often use the adjective preceding a
noun by itself (as if it were a noun) for convenience -- just "microwave"
instead of "microwave oven".

3. Admitting that there may be more concise (or even precise) ways of
expressing these thoughts, and allowing for minor variations in expression,
I think it's fair to say that this way of describing the process is probably
the clearest and most natural for the kind of people described above -- any
English-speaking person can understand it, without training of any sort. And
it works for the computer, too, since it is also Plain English source code
that will compile directly to machine instructions.
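The five-statement classification described at the top of this thread can be sketched in a few lines; this is a toy classifier keyed on the leading word, not the actual Plain English compiler, and the example sentences are drawn from the posts above.

```python
# Classify a sentence into one of the five Plain English statement kinds
# by its leading word, as the opening post describes.
def classify(sentence):
    first = sentence.strip().split()[0].lower()
    if first in ("a", "an", "some"):
        return "type definition"            # "A color has a hue..."
    if first == "the":
        return "global variable definition" # "The grand total..."
    if first == "to":
        return "routine header"             # "To bake a cake..."
    if first == "if":
        return "conditional command"        # "If the oven isn't hot..."
    return "unconditional command"          # any other verb: "Bake the cake"

print(classify("A color has a hue."))
print(classify("The grand total is a number."))
print(classify("To make some popcorn in a microwave oven:"))
print(classify("If the oven isn't hot enough, wait."))
print(classify("Put the popcorn in the microwave."))
```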

And now to your specific remarks.

John says,
I think you're missing the point he was making about "once ...".
Event driven languages have a different model, you tell it what
the triggers are, it figures how how and when to call them,
typically from an implicit event loop. But you don't have
to write the loop.

Gerry replies:
Yes, I can see how you might think that. The bottom line is that I don't see
any significant difference between a system with a compiler-generated loop
that dispatches events, and a system with a similar loop hand coded: in both
cases the programmer has to tell the thing where to go and when to go there.

Bartc says,

For a start, anyone considering using such a language probably wouldn't be
concerned with type definitions. If it is meant to make programming simple
and accessible, then dynamic typing is expected (or more likely, someone
won't even think about that aspect; it should just work).

Gerry replies:
I strongly disagree. Ordinary people are extremely "type oriented" -- they
refer to almost everything in their lives (other than humans and pets) by
type names. Like "popcorn" and "microwave oven" above. "Put the book on the
table by the chair" is a natural mode of expression, and it is thoroughly
based on types (books and tables and chairs). Further, such people expect
books to be books and chairs to be chairs, and they expect them to stay that
way; "dynamic typing" is a very foreign concept in the real world.

Bartc says,
But then, even with a more informal language, there's only so much that can
be done, because to create sizeable programs that work reliably, I think you
need to use a more stylised and disciplined way of writing instructions. Any
free-format English would get in the way of that.

Gerry replies:
We offer our prototype system as proof that is not so. It is a sizeable,
reliable program that includes a unique interface, a simplified file
manager, an elegant text editor, a hexadecimal dumper, a
native-code-generating compiler/linker, and a wysiwyg page layout facility
for documentation; and it was coded entirely in Plain English --
conveniently and efficiently so.

Bartc says,
(I assume this syntax would just be a front-end to a conventional language
and conventional compiler, as your list suggests, and is not some
super-advanced AI project that can understand anything.)

Gerry replies:
On the contrary, we compile Plain English source directly to machine code.

Bartc says,
Using syntax that looks like written English sounds attractive, and might be
suitable for some kinds of command-line interaction ('kill dwarf with axe'),
but I'm not sure it would be taken seriously for real programming.

Gerry replies:
Our prototype IS a real and non-trivial program, and it was coded entirely
in Plain English. It is capable of reproducing itself, and has been used to
create many new and better versions of itself. It has also been used to
create other programs that have been used productively by a variety of users
in a variety of ways. It is also excellent for education because it is not
only suitable for beginners, but can be used to teach advanced subjects
(such as data structures, compiler design and implementation, etc) as well.

Bartc says,
Maybe, a tool can be used to translate conventional syntax into the English
style you propose, which would be fun, but I don't know if it warrants
creating a whole new language. Not if it doesn't otherwise do anything new.

Gerry replies:
The language has already been created. But we're not striving for something
new; rather, we're striving for something old, as I described at the top of
this post: a programming environment that will allow ordinary people to
explain things to a computer in their ordinary way.

John says:
Back in the 1970s there was a vogue for extensible languages, in
which you could add new syntax on the fly, by adding new BNF to the
underlying grammar. They all disappeared, because what happened was
that no two programs were written in the same language and nobody
could read them.

Gerry replies:
An obviously bad idea. And not at all what we've done and are proposing for
the future. We're aiming at a compiler that will properly translate natural,
popular, and well-understood modes of expression into something a computer can
run.

John says:
Now we have OOP, where the syntax doesn't change,
but you can add lots of new types and semantics to go with them.

Gerry replies:
OOP is fundamentally flawed. See
http://harmful.cat-v.org/software/OO_programming/why_oo_sucks . And Plain
English is user-extensible in those ways without the syntax problem (since
any extensions to the syntax are also English, which is already understood
by every English-speaking person).

Bartc says,
Actually, some of us are fussy about syntax, but while we might prefer
something less terse than C++, it's not necessary to go as far as something
that can be mistaken for English.

Gerry replies:
It IS necessary if we want to program like we naturally think and speak to
others; if we want our programs to be self-documenting; if we want to be
able to teach both beginners and experts using the same language and the
same paradigm; etc.

Bartc says,
About a year ago, I had an experimental project where I could write code in
my own preferred syntax, and it would translate it to one of a number of
well-known languages. It worked up to a point
(I wrote some small benchmarks that were translated
to C, Python, Lua, Lisp and one or two others. In some simple cases, exactly
the same code would work unchanged on a number of different targets). But
you still had to be aware of the different capabilities of each target. In
the end, I preferred to create a proper language of my own (two in fact).

Gerry replies:
We've written lots of different languages of our own, too. The problem with
all such things is they are essentially arbitrary -- and there is no end to
them. We were working on one when we switched to natural language
programming, in fact. Why? Because we wanted to write the "last" programming
language, not just "another". And because we saw that natural languages
allow ordinary people, and very young students, and even experts to think
and write in ways that are natural to them and their peers. We saw that such
programs are self-documenting, as well. And long-lasting -- think about it:
grab a routine written in any popular programming language and ask yourself
how intelligible it will be to people who live 100 years from now; then ask
yourself the same thing regarding the Plain English routine at the top of
this post.

Gerry Rzeppa

unread,
Nov 8, 2014, 9:08:23 AM11/8/14
to
Joshua_Cranmer:
You want to design a language model that allows people to express
their thoughts in the most natural way possible.

Gerry:
Yes. And it seems to us that the method of expression that is both most
frequently used and that has survived the test of time is that which is
pictured and described in our Hybrid Programming project:

https://www.indiegogo.com/projects/the-hybrid-programming-language/x/8950932

A natural language framework, with other modes of expression (formulas,
graphics, etc) inserted where appropriate. You can see examples of it in
human-to-human communication everywhere: web pages, encyclopedias, white
papers, and this very forum. It's the natural way humans communicate ideas.
All we want to do is extend that paradigm to include human-to-computer
communications.

Joshua_Cranmer:
Yet at the same time,
you limit the expressibility of the programming model because, well,
you personally don't think naturally in alternative models and thus
you don't see why other people might.

Gerry:
It's true, I prefer a procedural approach for solution descriptions; and I
think most of the
people on the planet, by far, do as well. But that doesn't mean that I would
be against someone inserting a snippet of event-driven code, or an object or
two, or whatever, in the middle of a Hybrid program.


Joshua_Cranmer:
When I tried to point out examples of alternative paradigms, your
reply was fixated on trying to prove that you could express them in
your procedural paradigm. That's not the point. The point was that
having to do these paradigm translations is as much a programmer
burden as, say, having to translate from "I need to solve this
equation" to writing the code to solve that equation.

Gerry:
Yes, but I believe that most systems are procedural at bottom; and that most
people naturally think procedurally. (Think of the billions of instruction
manuals, repair manuals, how-to books and even recipe cards that have been
written on any and every imaginable topic.) I further believe that most
problems, when fully understood, can be most easily described and understood
and solved procedurally. But Hybrid programming allows for the specialists
among us, as above; and rightly so.

Joshua_Cranmer:
But those two statements are not semantically equivalent. The example
you give requires the program to continually check in a busy-wait
loop; the example I gave does not.

Gerry:
And in a Hybrid program you would be able to specify either solution.

Joshua_Cranmer:
After reading the thread you posted, it became clear to me that you
are (were?) under the impression that having a self-hosting compiler
was rare for a new proposed language. On the contrary, it is (to my
knowledge) standard rigor to present a new language by having its
self-hosted compiler be the largest application written in said
language. I've even heard it opined that this tends to cause people to
optimize their language design for writing compilers to the detriment
of other kinds of applications.

Gerry:
I don't think it's at all rare for a new language to be "written in itself".
(Though many of today's most popular languages are not: Javascript, for
example.) It is somewhat more rare to have a complete development
environment
written in the new language; and quite rare for a compiler-writer to "prove"
his language by writing a wysiwyg page-layout facility in his new language
and then use that facility to produce the documentation for the language.
And extremely rare for the whole thing to be written in English-language
sentences. Point me to some English-sentence compilers (or even
interpreters) written in English: SIRI? No. Alpha? No. Inform? No.

Joshua_Cranmer:
Your comments about "subset of English" are misleading, when taken in
context of your stated goal of making a "natural language" compiler.
That term, in the eyes of most, would imply that you could, say, build
a compiler that inputs the ISO C++11 specification and outputs a
working C++11 compiler.

Gerry:
A program that could "input the ISO C++ specification and output a
working C++ compiler" would not be a compiler, to my way of thinking, but
a (non-human) analyst and programmer; a thing capable of analyzing a
specification, devising an appropriate design, and then implementing that
design. A compiler is much more like a translator in my mind. Now if the
"specification" in question was a step-by-step procedural description of
exactly how to convert C++ code into machine language, then I would call a
program that actually accomplished the feat a compiler.

Joshua_Cranmer:
Programming came so naturally to me that I disclaim any ability to
properly evaluate programming 101. But the responses you have given,
both in this newsgroup and the AnandTech forum thread, suggest to me
that you are aware of the issues in writing large, complex software:
you appear to value minimalism in a language (rather than in client
code) very highly, and you appear to be very dismissive of "new"
paradigms.

Gerry:
Yes, I value minimalism in everything very highly, and I am dismissive of
"new" paradigms that have obvious flaws or limitations that are not
recognized by their proponents.

Joshua_Cranmer:
A mild example of that tendency is present in the statement I quoted:
you list compilers as the "more complex" endpoint, yet compilers are
relatively simple affairs--which is why compiler courses tend to
assign "build a compiler" as an ongoing project in the course.

Gerry:
I see compilers as unique. A program that cannot be used to produce better
versions of itself is derivative at best, sterile at worst. I also believe
that there are no such things as easy or hard problems: that any
well-understood solution to anything, taken one proper step at a time, can
be understood by nearly anyone. Some such problems are certainly larger than
others, and some more tedious; but that doesn't make them hard; just big or
boring.

Joshua_Cranmer:
The sorts of projects I associate with production languages tend to start
considering themselves large only when they get a few million lines of
code.

Gerry:
Yeah, I don't think any problem deserves a million lines of code. Compare,
say, Windows (50 million lines) to Oberon (25,000); somebody needs to start
deleting. But let's say Plain English or the Hybrid Programming Language
that we're proposing never got past 25,000-line systems; surely they would
still be useful things.

Richard Hable

unread,
Nov 8, 2014, 1:08:34 PM11/8/14
to
On 11/07/14 22:23, George Neuner wrote:

> Most natural languages are a hindrance to logic because they are
> inherently ambiguous. There is some anecdotal evidence that Sumerian
> might have been sufficiently unambiguous to program with, but the
> issue is moot because no one today speaks it.

The most appropriate natural language to use for programming would
probably be Esperanto, because of its regular grammar.

Nevertheless, it would still be like using a hammer to tighten a
screw: it might somehow work if a hammer is all you have at your
disposal, but it would always be better to get a screwdriver.

Richard

Joshua Cranmer 🐧

unread,
Nov 8, 2014, 2:22:31 PM11/8/14
to
On 11/7/2014 3:23 PM, George Neuner wrote:
> On Wed, 05 Nov 2014 00:54:18 -0600, Joshua Cranmer 🐧
> <Pidg...@verizon.net> wrote:
>> Essentially, I find your categorization troubling because you're
>> assuming a particular, very specific way of looking at
>> programming--and a model that's not necessarily reflective of most
>> large-scale programs (where object-oriented code rules the day).
>
> OO has its place, but it isn't any kind of universal solution. There
> is a large class of problems that are not a good fit.

I apparently didn't state this very well in my original post, but my
personal opinion is that there is no "one true paradigm" but rather a
set of paradigms best suited to different tasks. IMHO, programming
languages ultimately need to support a collection of paradigms well, and
for languages targeting large applications, OO is one (but not the only
one) of the paradigms that absolutely needs to be supported.

> IMO, the utility of a language is measured by how easily it bends to
> fit the problem. I try to avoid using languages that unnecessarily
> force the problem to bend.
>
>> A good example of where things really break down is in the
>> introduction of parallelism. Many (all?) languages these days have put
>> some emphasis on trying to decide good parallelism and concurrency
>> models at increasingly fundamental positions in the language.
>
> Most of which are ineffective.

I won't deny that parallelism is a problem for which a good solution
does not yet exist. But we've reached an age where it's a problem that
can't be solved by sticking your head in the sand--and a solution that
allows you to solve some problems well is better than one that allows
you to solve nothing.

I think it's clear that the mutable shared model with
mutex/condvar/etc.-based building blocks as the only level of support is
a, well, disaster, as you put it. Immutable data, I think, has also been
shown to be a complete non-starter. Sometimes your parallelism is
centered on very large mutable, shared data structures, and a solution
that completely prevents their use is untenable. At the same time,
shared mutable memory is a definite lightning rod for disaster--maybe
having "unsafe" blocks isn't such a bad idea. :-)

I think actor models (which is basically CSP?) appear to be winning out,
but, as you say, they're not completely sufficient. If I had to make a
bet, it would be on an actor-based model with some support for
cross-thread shared data collections. But I would be hesitant to wager
very much even on that.
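The actor model under discussion can be illustrated with a minimal Python sketch (the `Counter` actor and its message names are invented for illustration): each actor owns its state, and the only way to touch that state is to put a message in the actor's mailbox.

```python
import queue
import threading

class Counter:
    """A toy actor: one thread, one mailbox, privately owned state."""
    def __init__(self):
        self.mailbox = queue.Queue()
        self.value = 0
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        while True:
            msg, arg, reply = self.mailbox.get()
            if msg == "add":
                self.value += arg        # mutated only by the actor's own thread
            elif msg == "get":
                reply.put(self.value)    # answer via a reply queue
            elif msg == "stop":
                return

    def send(self, msg, arg=None):
        reply = queue.Queue()
        self.mailbox.put((msg, arg, reply))
        return reply

c = Counter()
for _ in range(100):
    c.send("add", 1)
print(c.send("get").get())  # -> 100 (mailbox is FIFO, so all adds run first)
c.send("stop")
```

No locks appear in user code; serializing all access through the mailbox is what makes the shared-counter update safe.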

You've failed to mention transactional memory. There are also things
like Concurrent Collections which I'm hard-pressed to place,
particularly since I'm vague on the details (maybe dataflow?).

> What's important is problem solving logic - the language used to
> express the logic of the solution may either help or hinder the
> implementation.
>
> Most natural languages are a hindrance to logic because they are
> inherently ambiguous. There is some anecdotal evidence that Sumerian
> might have been sufficiently unambiguous to program with, but the
> issue is moot because no one today speaks it.

There's an issue not strictly related to ambiguity that I realized
when reading Gerry's examples. English (and presumably other natural
languages, but I'm not enough of a linguist to say with any
confidence) allows you to reliably refer to one, sometimes two, things
without naming them. At the same time, it makes it cumbersome to name
a temporary object.

> E.g., anything you can write in Lisp you also can write in assembler
> ... assuming you have unlimited time on your hands.

I prefer Brainfuck as the go-to "unprogrammable" language. :-)

BartC

unread,
Nov 10, 2014, 3:14:28 AM11/10/14
to
> Gerry:
> I don't think it's at all rare for a new language to be "written in
> itself".
> (Though many of today's most popular languages are not: Javascript, for
> example.) It is somewhat more rare to have a complete development
> environment
> written in the new language; and quite rare for a compiler-writer to
> "prove"
> his language by writing a wysiwyg page-layout facility in his new language
> and then use that facility to produce the documentation for the language.

(Probably not so rare; it's what I used to do and still do. Because once a
language is working enough to be useful, then why shouldn't it be used for
whatever tasks happen to come up? It's a good test of language and
implementation.

Self-hosting (implementing a language in itself) is a nice touch, but
I don't think it's that important, perhaps because people are
realising that the best language to implement a compiler in, is not
necessarily the language that is being compiled. Unless of course you
are trying to use a single language for everything.)

> And extremely rare for the whole thing to be written in English-language
> sentences. Point me to some English-sentence compilers (or even
> interpreters) written in English: SIRI? No. Alpha? No. Inform? No.

I've looked at your project in a little more detail. It seems quite
interesting, but I think the Plain English aspect is obscuring the
rest of the project.

It's an odd mix of high and low level features, for example there is
even some inline machine code in there! (In 'the noodle', so perhaps
it isn't implemented quite 100% in itself, unless this is some
temporary bootstrapping or other code.) Then there are direct calls to
Win32 functions.

Plus there is lots of code that would be written in pretty much the
same way in nearly every other language, but the need to pretend it is
plain English just adds clutter. One example:

a uuid is a record with
a number called d1,
a wyrd called d2,
a wyrd called d3,
8 bytes called d4.

(I would guess someone savvy enough to know what to do with a UUID can
probably be trusted with some conventional syntax.)

What's not clear is whether an expression such as A+B*C needs to be
written as 'Multiply B by C then Add A', or whatever it might be. See,
people who had known perfectly well how to write such an expression,
are now reduced to guessing! I think anyone who's been to school would
be able to cope with writing algebraic expressions. (I take it that
numeric constants are allowed to use digits and don't have to be
written out in words.)
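The flattening being guessed at here is mechanical either way; a few lines of Python show one plausible translation of A+B*C into sequential English steps, walking the expression tree bottom-up so precedence is respected. The exact phrasing Plain English would use is unknown; the "giving tN" wording below is invented (COBOL-style), purely for illustration.

```python
import ast

def to_steps(expr):
    """Flatten a conventional arithmetic expression into English steps."""
    verbs = {ast.Add: "Add", ast.Sub: "Subtract",
             ast.Mult: "Multiply", ast.Div: "Divide"}
    steps = []
    counter = [0]

    def walk(node):
        if isinstance(node, ast.Name):
            return node.id
        if isinstance(node, ast.Constant):
            return str(node.value)
        if isinstance(node, ast.BinOp):
            left, right = walk(node.left), walk(node.right)
            counter[0] += 1
            temp = f"t{counter[0]}"  # each subexpression needs a named temporary
            steps.append(f"{verbs[type(node.op)]} {left} and {right} giving {temp}.")
            return temp
        raise ValueError("unsupported expression")

    result = walk(ast.parse(expr, mode="eval").body)
    return steps, result

steps, result = to_steps("a + b * c")
for s in steps:
    print(s)
# Multiply b and c giving t1.
# Add a and t1 giving t2.
```

Note how every intermediate value needs an explicit name, which is precisely the long-windedness being objected to.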

So in many ways the language is conventional (apart from its many
restrictions: no nested statements, for example); so what does plain
English add except long-windedness and clutter (and what couldn't be
done, for viewing purposes, by a translator or a smart editor)?

You're right that it is rare to use natural English as a basis for syntax,
but perhaps there are good reasons for that!

--
Bartc
email: change @ to as@ in b...@freeuk.com
[Even COBOL has COMPUTE A=B+C+D as an alternative to
ADD B TO C TO D GIVING D (or something like that.)
English isn't a very good way to describe algorithms
once they get at all complicated. -John]

George Neuner

unread,
Nov 10, 2014, 3:15:34 AM11/10/14
to
On Sat, 08 Nov 2014 10:48:09 -0600, Joshua Cranmer 🐧
<Pidg...@verizon.invalid> wrote:
>On 11/7/2014 3:23 PM, George Neuner wrote:


>I think actor models (which is basically CSP?) appear to be winning out,
>but, as you say, they're not completely sufficient. If I had to make a
>bet, it would be on an actor-based model with some support for
>cross-thread shared data collections. But I would be hesitant to wager
>very much even on that.

Yes, actors are a form of CSP. And I agree that CSP in whatever form
currently is the best option available. The major difficulty with it
is handling complex coordination patterns.

Special purpose coordination languages are better able to handle
complex communication patterns, but they rely on temporal decoupling -
i.e., they have no sense of real-world time - and, so far, none has
proven scalable beyond a few thousand actors [which is not a lot for a
complex system].


>You've failed to mentioned transactional memory.

My MS degree is in database technology.

I can see potential for transactional memory, but currently HTM is
non-scalable, confined to shared-memory systems, and hard to use
effectively due to limitations on the size of managed memory, the
number of concurrent transactions supported and the less-than-friendly
isolation model that is implemented.

On chip HTM essentially works through an extension of the cache
coherence protocol to fail on X-access to a line owned by another core
rather than to arbitrate access to it. TM needs to track a great deal
of meta information - minimally the entire set of written lines on a
per transaction basis - which severely limits the size of memory that
can be managed. Existing HTM implementations provide only a handful
of kilobytes.

Implementations vary as to whether they also fail on read of a dirty
line owned by a concurrent transaction. Most don't and thus (in
database terms) provide only uncommitted-read isolation.
Uncommitted-read is the most basic isolation level and the most
difficult to work with ... essentially every TM hosted value has to be
considered (C) volatile.

STM layered on top of HTM can provide more programmer friendly
isolation and also can address the HTM size issues. But the STM layer
creates additional register and/or cache pressure on transactions that
can't tolerate the limitations of the underlying HTM [which in most
uses is likely to be most transactions.]

Pure STM can (at least theoretically) support clusters as well as
shared memory, but STM is slow even in shared memory: performance
similar to that of an in-memory DBMS, a bit faster because STM does
not also have to address data persistence.


Again, TM has value but is not a scalable solution.



>There are also things like Concurrent Collections which I'm
>hard-pressed to place, particular since I'm vague on the details
>(maybe dataflow?).

CC is just a thread safe collection.



>> Most natural languages are a hindrance to logic because they are
>> inherently ambiguous. There is some anecdotal evidence that Sumerian
>> might have been sufficiently unambiguous to program with, but the
>> issue is moot because no one today speaks it.
>
>There's an issue not strictly related to ambiguity that I realized
>when reading Gerry's examples. English (and presumably other natural
>languages, but I'm not enough of a linguist to say with any
>confidence) allows you to reliably refer to one, sometimes two, things
>without naming them. At the same time, it makes it cumbersome to name
>a temporary object.

Exactly. English in particular has slippery pronouns: the "royal"
"we" is singular; "you", "they" and "it" all may be singular or plural
depending on context.
[The plural use of "it" is rare in polite conversation, but is
relatively common in exasperation and vulgarity: e.g., "Screw it!" may
be a multiple reference.]


>I prefer Brainfuck as the go-to "unprogrammable" language. :-)

A neat simulation of a Turing tape machine. IIRC, there are several
compilers that target it as a bytecoded interpreter. Very small, but
still very powerful.

George

Gerry Rzeppa

unread,
Nov 10, 2014, 3:58:54 PM11/10/14
to
George says,
English in particular has slippery pronouns: the "royal"
"we" is singular; "you", "they" and "it" all may be singular or plural
depending on context.
[The plural use of "it" is rare in polite conversation, but is
relatively common in exasperation and vulgarity: e.g., "Screw it!" may
be a multiple reference.]

Gerry replies:
Which is one of the reasons we adopted the "language design model" that we
did. We knew that experts in linguistics understood such subtleties; but we
also knew that most of the people who speak English don't -- and yet they
somehow get by.

So, instead of trying to "understand" imperative sentences, we simply
attempt to MATCH them with routines defined elsewhere -- because this
appears to be what happens in many cases when we ask questions of, or give
commands to, our wives, our children, and even our dogs.

Say, "Want a treat little buddy?" and a baby (or a dog) may hear only "blah,
blah, TREAT, blah, blah" -- but will nevertheless respond appropriately.
Why? Because the word "treat" is associated with some kind of
previously-recorded mental picture or concept. Say, "Where has that mother
of yours gone?" and the subject may only hear, "WHERE blah blah MOTHER blah
blah blah" -- but will again respond appropriately, because a "where"
routine with a "mother"-compatible parameter has been previously defined and
can be called and executed.

Our compiler takes the same approach: we let the programmer define his own
routine headers -- using his own vocabulary, spelling, good or bad grammar,
etc -- and then we select the "best match" for his imperative sentences from
those.

Now one might think that such a sloppy approach would result in a difficult
and unpredictable system: but the fact of the matter (and we speak from
actual experience here) is that it results in a system that is both
convenient and reliable.

Tomasz Kowaltowski

unread,
Nov 10, 2014, 3:59:36 PM11/10/14
to
This discussion reminds me of the history of mathematical notation.
The notation we use today was adopted mainly in the 17th century and
contributed to rapid development of mathematics. Before that equations
were usually written in "Plain Latin" or "Plain Arabic", or "Plain
Whatever". I don't think anybody would recommend going back ;-).

Tomasz

Gerry Rzeppa

unread,
Nov 10, 2014, 4:02:47 PM11/10/14
to
Bartc says,
I've looked at your project in a little more detail. It seems quite
interesting, but I think the Plain English aspect is obscuring the
rest of the project. It's an odd mix of high and low level features, for
example there is
even some inline machine code in there! (In 'the noodle', so perhaps
it isn't implemented quite 100% in itself, unless this is some
temporary bootstrapping or other code.) Then there are direct calls to
Win32 functions.

Gerry replies,
We realize that English isn't the most convenient or effective way of saying
everything, but we didn't have time to implement the "hybrid" programming
language that we ultimately envision. So we settled for a mix of the "very
highest" (English) and "very lowest" (machine code) as a "proof of
concept" -- and because it was necessary (a compiler has to put out machine
code somewhere). Thus, in the noodle and the compiler we end up with such
things as:

To add a number to another number:
Intel $8B85080000008B008B9D0C0000000103.

Which is closer to our envisioned Hybrid language than strict Plain English.
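For the curious, decoding those sixteen bytes by hand (a reader's reconstruction, not from the original post, assuming the usual stack frame with the two by-reference arguments at ebp+8 and ebp+12) gives something like:

```asm
8B 85 08 00 00 00    mov  eax, [ebp+8]    ; address of the first number
8B 00                mov  eax, [eax]      ; fetch its value
8B 9D 0C 00 00 00    mov  ebx, [ebp+12]   ; address of the other number
01 03                add  [ebx], eax      ; add the first into the second
```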

Bartc says,
Plus there is lots of code that would be written in pretty much the
same way in nearly every other language, but the need to pretend it is
plain English just adds clutter. One example:

a uuid is a record with
a number called d1,
a wyrd called d2,
a wyrd called d3,
8 bytes called d4.

(I would guess someone savvy enough to know what to do with a UUID can
probably be trusted with some conventional syntax.)

Gerry replies,
Most of the "ugly stuff" in the noodle is the result of the need to
interface with the truly ugly stuff we were faced with in Windows;
backward-compatibility stuff, if you will. It is not representative of what
we envision as pure Plain English code, or even Hybrid code. When we get
around to writing an operating system in Plain English, most of this ugly
middle layer will simply disappear.

Bartc says,
What's not clear is whether an expression such as A+B*C needs to be
written as 'Multiply B by C then Add A', or whatever it might be. See,
people who had known perfectly well how to write such an expression,
are now reduced to guessing! I think anyone who's been to school would
be able to cope with writing algebraic expressions. (I take it that
numeric constants are allowed to use digits and don't have to be
written out in words.)

Gerry replies,
We have two classes of people to consider here: the "learned" and the
"unlearned". The unlearned -- almost everyone, relatively speaking -- will
say that "Five plus five times three" equals thirty, not twenty. If you
doubt that statement, go down to Walmart and ask around. The learned class,
on the other hand, will often convert that statement to something like
"5+5*3" or even, "5+(5*3)" in their heads and arrive at the alternate value.
Both groups, we believe, should be allowed to code what they're thinking, in
the syntax they're thinking it in. Future versions of our compiler --
particularly the "hybrid" version -- will support that. No guessing
necessary, since the compiler will select the correct precedence rules based
on the syntactical form (eg, if the variables are named in an English-like
way with leading articles and there are no special symbols present, we'll
take it the Walmart way; if the statement has "mathematically inspired"
names without leading articles and special symbols are present, we'll take
it the Oxford & Cambridge way; when in doubt, we'll ask for clarification).

Bartc says,
So in many ways the language is conventional...

Gerry replies:
Sure; we're not interested in reinventing any wheels here. We're more
interested in "getting back to basics".

Bartc says,
...(apart from its many
restrictions: no nested statements for example);

Gerry replies:
The restrictions we implemented were intended (1) to "force" the code into
more natural modes of expression -- people don't typically nest their IFs in
everyday speech, and especially not when they're trying to be clear; and (2)
to mimic the level of understanding that we might find in, say, a five-year
old child -- thinking that if we could get the computer to understand like
that, we'd really have something! Five-year-olds can properly execute all
kinds of commands, for example, yet they know nothing of real numbers, and
traditional mathematical precedence rules, etc. So how do they ever manage
to be so capable?

Bartc says,
...so what does plain
English add except long-windedness and clutter (and which can't be
done for viewing purposes by a translator or a smart editor)?

Gerry replies,
What Plain English adds is a "naturalness" to the coding process (plus
self-documenting code, a great syntax for beginners that can also be used to
later teach all sorts of other programming concepts and techniques, etc).
But the important thing is the "naturalness" of the process. This is most
evident with beginners, of course: a primary school child is obviously going
to have less trouble with "Clear the screen" or "Paint the screen black"
than something like "graphicslib.clrscr(0x000000)". But we found -- even
though we are experienced programmers in a wide variety of traditional
languages -- that after coding in Plain English for a while, it became
more natural to think about and more convenient to type (in spite of
its "long-windedness"). I don't, however, think any experienced programmer
will be convinced of that until he actually tries it for a significant
period of time, with an open mind. "User-friendly is what the user is used
to". That's why we recommend that people actually print off the first 50
pages of our instruction manual and actually type in the sample program, a
little at a time, before jumping to any conclusions. Reading the code isn't
enough; you actually have to type it to get the "feel" of programming in
English.

Bartc says,
You're right that it is rare to use natural English as a basis for syntax,
but perhaps there are good reasons for that!

Gerry replies,
Perhaps. Perhaps not -- after all, it's what MOST people use MOST of the
time for MOST every kind of communication. Here we are, in fact, you and I,
doing it again. A natural language framework with snippets of alternative
syntax where appropriate.

John says,
Even COBOL has COMPUTE A=B+C+D as an alternative to
ADD B TO C TO D GIVING D (or something like that.)

Gerry replies:
Sure. See my answer to Bartc above.

John says,
English isn't a very good way to describe algorithms
once they get at all complicated.

Gerry replies:
Which is one of the greatest advantages of Plain English programming: it
encourages simplicity; it discourages algorithms that "get at all
complicated". As Einstein said, "If you can't explain it simply, you don't
understand it well enough." Plain English quickly "tips you off" to those
instances when your thinking is convoluted or unclear.

Martin Ward

unread,
Nov 11, 2014, 2:35:48 PM11/11/14
to
"By relieving the mind of all unnecessary work, a good notation sets
it free to concentrate on more advanced problems, and in effect,
increases the mental power of the race." -- Alfred North Whitehead

Mathematicians have been slowly and painstakingly developing new
and better notations over the centuries. Each new development
has increased the mental power of the race. In effect,
each advance in notation allows human beings to think new thoughts.

Before the invention of algebraic notation, mathematical problems
had to be expressed in long, complex sentences in natural language.
My father was a carpenter and was taught how to compute the radius
of a circle, given the length of a chord and the "rise"
(the perpendicular distance from the middle of the chord
to the edge of the circle) as follows:

"To the square of half the chord, add the square of the rise.
Divide this sum by twice the rise and the quotient is the radius".

This is a reasonable plain English description of the computation:
but I somehow doubt that it would be accepted by the Plain English
compiler! But if you want to take in the whole computation at a glance,
in order to analyse or manipulate it, then the mathematical notation
is far superior:

        (C/2)^2 + r^2
    R = -------------
             2r

When the "Revised Report on the Algorithmic Language Algol 60"
was published, researchers immediately recognised the superiority
of the notation and started using it (in preference to natural language)
as a way to describe algorithms to each other: even before
the first Algol 60 compilers appeared.

Each new programming paradigm: functional, parallel, dataflow,
declarative, object-oriented and so on, has added to the mental
power of the race. *Every* programmer should learn at least
one example of each of the above types of language.
For example: even if you are not writing in a functional programming
language, if your problem has a natural functional programming
solution, then you will write a much better implementation
in your target language if you already know functional programming.
Your mental power has been increased by your knowledge
of functional programming.

On the other hand, if you only program in "mutilated English",
your mental power will hardly increase at all!
As the saying goes: if the only tool you have is a hammer,
you treat everything as if it were a nail.

On 05/11/14 06:54, Joshua Cranmer wrote:
> Once it has slowed to one or two seconds between pops, take it out of
> the microwave.

Joshua claims that this is an example of event-drive programming,
but Gerry disagrees and claims that the program can be expressed
procedurally, in Plain English, as:

Loop. If it not has slowed to one or two seconds between pops, repeat.

Apart from the rather clumsy construction ("If it not has ..."),
this is an excellent example of the problems caused by ambiguity
in natural language. What does "has slowed to one or two seconds
between pops" mean? Presumably, we wait for a pop, note the time,
then wait for the next pop, compare the times and if the time
is less than one second, then we can assert that it has not yet
"slowed to one or two seconds".

Now suppose that pops come at a rate of less than one per second
and then stop altogether. We will be stuck in the loop,
waiting for the next pop, until our popcorn sets fire to the microwave!
Clearly we need to wait until *either* the next pop *or*
more than one second (or is it two seconds?) have passed.
This is an event-driven process, where one event is a timeout,
and another is a "pop".

But a slightly more complex cookery example will better illustrate
that a procedural description is not always the most natural.
Suppose you want bacon and scrambled eggs for breakfast, and want them
both nice and hot at the same time. The most natural description
is to describe how to scramble eggs, then separately describe how
to grill bacon, then state that the two operations should be carried
out in parallel. The procedural description involves an inner loop
which repeatedly checks the eggs, checks the bacon, terminates when both
have cooked. But it has to check only the eggs if the bacon
has already cooked, and check only the bacon if the eggs have
already cooked. The two simple processes become intermingled into
a single, much more complex, process.

Bartc says,
> to create sizeable programs that work reliably, I think you
> need to use a more stylised and disciplined way of writing
> instructions.

To which Gerry replies:
> We offer our prototype system as proof that is not so.

I think by a "sizable program", Bartc means something a *little*
larger than six source files containing a total of 19,731 lines
of code. To be precise: something too large to be implemented
and maintained by a single programmer. At that point,
you do not want every programmer writing in their own private language:
even if each language is a small, highly stylised subset of English.
Making each private language a subset of English will only
increase the confusion, due to the inherent ambiguity of English.
For example, one programmer decides that "foo is between bar and baz"
means that bar <= foo <= baz. Another decides
that "between" should exclude the end points of the range.
If they need to work on each other's code, they will
get very confused! Similarly in English "or" can mean
"inclusive or" or "exclusive or" depending on the context.

I would argue that each different class of "sizable programs"
(those which require a team of programmers and millions of lines
of code) should actually be implemented in a different language.
A language which is specifically designed for implementing
"that type of program". For example: a program transformation system
should be implemented in a language designed for writing
program transformations. So when I started re-implementing
the FermaT program transformation system from scratch,
I first designed the MetaWSL programming language.
The system currently consists of around 3MB of MetaWSL code,
which compiles into 15MB of highly complex C code.

I wrote what I believe to be the first paper on this approach
to programming large systems back in 1994:

"Language Oriented Programming", Martin Ward,
Software--Concepts and Tools, 15, pp 147-161, 1994, ISSN 0945-8115
http://www.gkc.org.uk/martin/papers/middle-out-t.pdf

Since then, the approach has been taken up by quite a few people
under the terms "Language workbenches" (Fowler), "Intentional
Programming" (Microsoft), "Concept Programming" (XL),
"Dialecting" (REBOL), "Model Driven (Software) Development",
and Domain Specific Languages. For example, Google's map/reduce
language is specifically designed for writing massively parallel
distributed systems.

Designing a new language which is a genuine improvement
on all existing languages (for implementing a particular
class of programs) is hard work: and the task should
not be underestimated. However, the results of a really
good design can be applied to many programs
and the benefits outweigh the effort many times over:
a really good design for a new Domain Specific Language
actually can increase the mental power of the race!

--
Martin

Dr Martin Ward STRL Principal Lecturer & Reader in Software Engineering
mar...@gkc.org.uk http://www.cse.dmu.ac.uk/~mward/ Erdos number: 4
G.K.Chesterton web site: http://www.cse.dmu.ac.uk/~mward/gkc/
Mirrors: http://www.gkc.org.uk and http://www.gkc.org.uk/gkc

Gerry Rzeppa

unread,
Nov 11, 2014, 2:38:20 PM11/11/14
to
Tomasz says:
This discussion reminds me of the history of mathematical notation.
The notation we use today was adopted mainly in the 17th century and
contributed to rapid development of mathematics. Before that equations
were usually written in "Plain Latin" or "Plain Arabic", or "Plain
Whatever". I don't think anybody would recommend going back ;-).

Gerry replies:
We agree that mathematical notation is useful in its place; that's why we
believe the "ultimate" programming language will be a hybrid of different
syntaxes (and graphics, as well) as described here:

https://www.indiegogo.com/projects/the-hybrid-programming-language/x/8950932

But we also think that starting with mathematical notation and "bubbling it
up" to the rest of a programming language is and was a mistake.
"Graphics_Lib.ClrScrn(0x000000)" simply can't compete with "Clear the
screen." And most of most programs are not mathematical in nature at
all. Most of most programs are things like "move this" or "draw that" or
"put this on the disk". It is our belief that if programming was invented by
poets (with a little help from mathematicians) -- rather than being invented
by mathematicians with an utter disregard for poets -- we'd be further along
the road to computers like the HAL 9000 (which the poets thought would
arrive in 2001).

But things are looking up. We've got Apple's SIRI, Wolfram's ALPHA, Nelson's
INFORM, Microsoft's CORTANA, and now Amazon's ECHO. And, of course, Plain
English!
[This argument has gone about as far as it's going to. Back to compiler design,
please. -John]

Gene Wirchenko

unread,
Nov 11, 2014, 2:40:53 PM11/11/14
to
On Fri, 07 Nov 2014 16:23:06 -0500, George Neuner
<gneu...@comcast.net> wrote:

[snip]

>I frequently am amazed at the contortions Java programmers go through
>to solve problems within the restrictive OO model. IMO, "Design
>Patterns" was an admission that too many programmers couldn't use Java
>to solve routine programming problems.

I bounced off "Design Patterns" myself. It seemed that the
authors would rather do anything but simply instantiate an object.

I had a uni assignment that I tried to solve using OO. As I
worked my way around it, I realised that OO was not going to happen.
Oh, I could have done it, but the code would have been much longer and
quite difficult to validate/debug.

>[That's as much an indictment of the lack of CS education among
>programmers. Java - somewhat successfully - tried to pander to less
>educated programmers by making simple things simpler, but in doing so
>it made solving many real problems more complicated.]

I lost a few hours to this on another assignment, because ints
could not be unsigned.

[snip]

Sincerely,

Gene Wirchenko

Richard Hable

unread,
Nov 11, 2014, 2:44:53 PM11/11/14
to
On 11/09/14 02:16, BartC wrote:

> Self-hosting (implementing a language in itself) is a nice touch, but
> I don't think it's that important, perhaps because people are
> realising that the best language to implement a compiler in, is not
> necessarily the language that is being compiled. Unless of course you
> are trying to use a single language for everything.)

I think, the main advantage of self-hosting is that it can be used as
one big unit test: you can compile the compiler with itself, use the
resulting compiler to compile itself again, and check if the result is
exactly the same.

Richard

BartC

unread,
Nov 11, 2014, 11:17:50 PM11/11/14
to
"Richard Hable" <port...@aon.at> wrote in message
Yes, that's partly what I do, because I don't have any substantial
non-compiler, non-language and non-development related applications to try
different versions on.

It does, however, run into problems, such as frequently ending up with
non-working compilers and/or support programs, and with most of the bridges
burnt! Then recovering can be tricky.

It is possible to compile a compiler with itself, and perhaps even repeat
that again, but still end up with something that apparently still works, but
has now developed a bug. Perhaps because it affects some code that is not
executed in the compiler, because the particular language construct to
invoke the bug doesn't occur in the compiler source.

(My setup is a bit awkward to describe, but:

I have an interpreted language M that compiles to bytecode. Its compiler is
written in M.

I have a statically typed language B that compiles to native code. Its
compiler is also written in M.

To run any M bytecode programs, I use an interpreter written in B. So there
are a few cyclic dependencies and you can imagine the fun and games when
some subtle bug gets past or I skimp on testing when I make changes. The bug
can be in either compiler (or in the generated bytecode or native code), or
in the interpreter.

To help out however, I also maintain a version of the interpreter written in
C. (Or near enough C, as I use a slightly different syntax, put through a
simple translator, to end up as proper C. That translator is written in
M...)

BTW if anyone is wondering how well an interpreted compiler performs, this
one can compile itself in about 1.3 seconds on a low-end PC. But it's
divided into modules and the biggest one takes about 1/6th second to
compile. The whole thing isn't very big, about 20Kloc, so the compiled
lines-per-second isn't great, but it's not a problem at the minute.)

--
Bartc
[It's time to reread Ken Thompson's Turing award lecture,
available here: http://cm.bell-labs.com/who/ken/trust.html
-John]

Anton Ertl

unread,
Nov 12, 2014, 4:06:25 AM11/12/14
to
Yes, having additional testing for your compiler is one advantage
(although I would not call it a unit test).

Another reason to self-host a compiler is the signal it sends: If you
claim that your language is general-purpose, and you write the
compiler for it in a different language, you are obviously not very
convinced of the capabilities of your language, so why should anybody
else use it?

- anton
--
M. Anton Ertl
an...@mips.complang.tuwien.ac.at
http://www.complang.tuwien.ac.at/anton/

Anton Ertl

unread,
Nov 12, 2014, 4:07:35 AM11/12/14
to
"BartC" <b...@freeuk.com> writes:
>It does, however, run into problems, such as frequently ending up with
>non-working compilers and/or support programs, and with most of the bridges
>burnt! Then recovering can be tricky.

To deal with that, you have a "stable" and executable (or buildable)
version somewhere, and make sure that your compiler builds with that
version. So if the current version no longer builds with itself, you
build it with the stable version; for that you also need a simple way
to switch from building with the current version (default) to building
with the stable version.

E.g., if Gforth no longer builds itself because of some bug, we do
"make distclean" and invoke the script "BUILD_FROM_SCRATCH", which
invokes the stable version of Gforth for the self-hosting.

>It is possible to compile a compiler with itself, and perhaps even repeat
>that again, but still end up with something that apparently still works, but
>has now developed a bug. Perhaps because it affects some code that is not
>executed in the compiler, because the particular language construct to
>invoke the bug doesn't occur in the compiler source.

Sure, a test cannot prove the absence of bugs. You need additional
tests, especially for features not exercised in the compiler (say,
vectorization).

>[It's time to reread Ken Thompson's Turing award lecture,
>available here: http://cm.bell-labs.com/who/ken/trust.html
>-John]

Self-hosting does not make binary distributions of code (including
compilers) any more prone to hidden functions. These days we have
even more stuff to worry about, with stuff like system-management mode
and VM functionality making it very easy to hide functions, and huge
amounts of code in the firmware of the PC, or even in the firmware of
USB sticks.

BartC

unread,
Nov 12, 2014, 9:43:59 PM11/12/14
to
"Anton Ertl" <an...@mips.complang.tuwien.ac.at> wrote in message
> "BartC" <b...@freeuk.com> writes:
>>It does, however, run into problems, such as frequently ending up with
>>non-working compilers and/or support programs, and with most of the
>>bridges
>>burnt! Then recovering can be tricky.
>
> To deal with that, you have a "stable" and executable (or buildable)
> version somewhere, and make sure that your compiler builds with that
> version. So if the current version no longer builds with itself, you
> build it with the stable version; for that you also need a simple way
> to switch from building with the current version (default) to building
> with the stable version.

There are still problems, especially with on-going development of a language
and its compiler.

For example, you make an enhancement in the language. The compiler source is
modified, and the change is tested and it works. Then you decide to make use
of the new feature in the compiler code.

The problem comes when something else doesn't work and you need to revert
back, but those old binaries will not support the new feature that now
litters the compiler sources!

In the case of bytecode languages, there is a more specific problem: the
bytecode format could be modified in such a way that new binary bytecode
files will be incompatible with both old bytecode files, and with the
interpreter used. (This might be as simple as adding a new bytecode
instruction.)

Now, with a self-hosting system where all the support programs (the
compiler, any loader (a program that does a similar job to a linker but for
bytecode), libraries, and even IDEs and editors) will all be using version
X, and the interpreter runs only version X, everything needs to be switched
over to version Y. But everything can't be done simultaneously, so it gets
messy (with X-X, X-Y, and Y-Y versions being involved).

Without self-hosting and using external tools maintained by someone else,
most of these headaches would disappear. (I do it because I like the
satisfaction of being independent.)

"Anton Ertl" <an...@mips.complang.tuwien.ac.at> wrote in message

> Another reason to self-host a compiler is the signal it sends: If you
> claim that your language is general-purpose, and you write the
> compiler for it in a different language, you are obviously not very
> convinced of the capabilities of your language, so why should anybody
> else use it?

Sometimes you have to use a different language to get started. Or you may be
modifying a compiler which is in another language (perhaps one of yours) to
compile a new language, and it would be silly to rewrite it in the new
language just for the purpose of being able to say it's self-hosted.

Another problem with self-hosting is distributing the program so that
someone else can use it. But is that done as a binary executable, or as
source-code?

With source-code in a new language, no-one will yet have a working binary
compiler to build it! While binary distributions have all the usual problems
(of supporting multiple platforms and getting people to trust it).

(If I was distributing a new language compiler, it would likely be in the
form of a single C source file. However, because my compilers aren't written
in C, and are interpreted, that single file would actually be the
interpreter, plus the compiler bytecode represented as data within the C
source.)

--
Bartc
[That's fine -- tools like bison are written in themselves, but they
distribute enough C code to bootstrap it. -John]

federat...@netzero.com

unread,
Dec 29, 2014, 6:09:57 PM12/29/14
to
On Tuesday, November 4, 2014 10:27:29 PM UTC-6, Gerry Rzeppa wrote:
> [And with that, this thread is over unless someone has something
> to say that's related to compilers. Thanks, all. -John]
> Exactly how does one go about writing a Plain English compiler?

What you should be focusing on (and to sorta abide by the moderator's wish) is
what goes on in the semantics, and on the back end more so than how the
language is presented to someone.

That means that the "fundamentals"
> 1. Type definitions;
> 2. Global Variable definitions;
> 3. Routine Headers;
> 4. Conditional Commands; and
> 5. Unconditional Commands.
have to be properly addressed; and first you have to ask whether these really
are the fundamentals, and if not, then what they should be.

In fact, you can get away with just the following for the procedural element:
* the comma statement A, B (do A, discard its results, then do B [with
results])
* the conditional A? B: C. The comma can be treated as a special case A, B
= A? B: B.
* recursive definitions of continuation or call-return segments. Doing
things by continuations is preferable since you can define call-return
semantics in terms of it, but not so easily the other way around.
* the ability to reference continuations (i.e. gotos).

Fortran actually headed in that direction early on (as did the scripting
language DC), but decided to fall back in more recent years.

The distinction between locals and globals has to go one level deeper: to
include also the distinction between thread-local and shared.

Finally, as for type systems: contemporary languages have been progressively
trying to incorporate more and more of the Curry-Howard isomorphism (or
"Type-Proposition" correspondence), and in particular the various
modifications that try to extend this correspondence to include quantifiers
in predicate logic (which then requires indexed or parametrized types). These
are embodied by the various higher-order type theories that have cropped up
since Carnac and Gödel (Gödel's Dialectica interpretation, Martin-Löf,
Carnac's systems B and C, Lambek/Scott's formalism). I think C++ was moving
in the direction of Hindley-Milner type theory.

This aspect of the language design -- which is (in fact) the aspect that bears
the greatest affinity to natural language (think Montague semantics) -- is
highly non-trivial. This is borne out by the large number of ways that
different languages have tried to approach this issue.

In particular, what's non-trivial about it is that objects which can be named
and addressed can be contained in or contain other objects. Fortran actually
went a step further allowing *overlapping* objects (by way of the equivalence
statement). Languages like C++, in contrast, seem to shy away from this.

The lambda operator *can* be incorporated *seamlessly* into any of these
languages, provided you recognize the continuation for what it really is: an
infinitary lambda expression (i.e. an expression with an infinitary parse
tree). So, for instance, the cyclic statement (while (E) S) followed by a
continuation Q would correspond to the infinitary expression
E? S (E? S (E? S (...): Q): Q): Q, which is called "rational" since it has
only a finite number of distinct subexpressions. A goto label is simply a
label of a subexpression, and a goto statement is just a compact reference to
the labelled subexpression.
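The rational-tree reading can be made concrete with a small trampoline
sketch (my own illustration, with made-up names): the label L names its own
loop subexpression, a goto is a call that hands back the next label, and the
driver loop keeps jumping until it reaches a non-label value (the
continuation Q).

```python
# Trampoline sketch of `while (E) S` followed by Q as the rational
# expression L = E? S L: Q, where the label L references itself.

def run(n0):
    def L(n, acc):
        if n > 0:                          # E
            return lambda: L(n - 1, acc + n)   # S, then goto L
        return ("done", acc)               # Q
    step = lambda: L(n0, 0)
    while callable(step):                  # drive the trampoline
        step = step()
    return step

print(run(4))    # -> ('done', 10), i.e. 4 + 3 + 2 + 1
```

There are only finitely many distinct subexpressions here (the body of L and
the final Q), even though the unrolled expression tree is infinite, which is
exactly the "rational" property described above.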

This works quite well with continuation semantics, since the call and return
are already factored out by the time you get to this semantics. All you have
are jumps. So, I won't belabor how you represent calls and returns in
continuation form.

The correspondence goes all the way back to Landin, via the following
equivalence for assignments to simple variables (x = A, Q) <-> (lambda x.Q) A,
where Q is a continuation. But Landin never incorporated the above-mentioned
insight on cyclic control-flow structures, so the whole programme languished
in the 1960's.

Ultimately, the correspondence is of the following form: to each statement S
is a context S[] (i.e. an infinitary expression with a missing subexpression)
such that S[Q] is the result of applying Q after executing S. Thus, for
instance, the context corresponding to the simple assignment statement x = A
is just (lambda x.[]) A.
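Landin's equivalence (x = A, Q) <-> (lambda x.Q) A can be checked directly in
any language with first-class functions. A tiny sketch (my own, not from the
post):

```python
# Landin's correspondence: the statement `x = A, Q` denotes the
# application (lambda x. Q) A.

A = 6 * 7

# Statement form: assign, then continue with Q (here Q is `x + 1`).
x = A
stmt_result = x + 1

# Lambda form: the continuation Q, abstracted over x, applied to A.
lam_result = (lambda x: x + 1)(A)

print(stmt_result, lam_result)   # -> 43 43
```

The context form in the text, (lambda x.[]) A, is the same thing with the
continuation Q left as a hole to be filled in later.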

But the non-trivial part: the type system, the objects residing within it and
pointers. What's non-trivial about it is what goes on with the
object-subobject hierarchy; i.e. what does lambda V[I].Q mean, for instance,
for the I element of the vector V? There is, in fact, a facility native to
the lambda calculus for addressing this very issue: the equivalence
lambda V[I].Q = (V[I] = [], Q)
= (V = (lambda J.(I == J? []: V[J])), Q)
= (lambda V.Q) (lambda J.(I == J? []: V[J]))
which effectively treats the structured object as a function applied to its
"designators" (here, the index [I]) and the assignment as a partial update to
the function itself.

A good translation method will be one that treats *all* instances equivalent
to this type of statement the same (no matter whether they come from an
assignment to a designated subobject or not) while avoiding any unnecessary
structure-wide copying of unchanged elements.

A proper account of how pointers fit into this meshes completely with the
question of how the inheritance and object/subobject hierarchies are defined
in the language. To each object type T, the corresponding pointer operator *
is actually a universal object (i.e. the type-cast pointer operator *(T *)) so
that if P is a pointer variable, then P is the designator that bears the same
relation to *P that the index I bears to component V[I]. This fits snugly into
the type/subtype hierarchy in such a way that if T' is derived from T, then
the operator *(T' *) is a subobject of *(T *).