1) Dogma. Let's skip that.
2) Efficient, easy to write parsers. That is no longer a major
issue, and I can deliver that in other ways.
3) Diagnosability of errors. I can deliver that in other ways.
So WHY should I use a context-free grammar? Good reasons appreciated.
Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email: nm...@cam.ac.uk
Tel.: +44 1223 334761 Fax: +44 1223 334679
[The best argument I've heard is that to a first approximation, CFGs
match languages that people can understand. Of course, since I write
everything in perl these days, I suppose I don't believe that, either.
-John]
> I have been thinking about a programming language, and have good
> reasons to abandon context-free grammars completely. So what I am
> asking is what reasons are there to favour them - I know of the
> following:
>
> 1) Dogma. Let's skip that.
>
> 2) Efficient, easy to write parsers. That is no longer a major
> issue, and I can deliver that in other ways.
>
> 3) Diagnosability of errors. I can deliver that in other ways.
>
> So WHY should I use a context-free grammar? Good reasons appreciated.
> [The best argument I've heard is that to a first approximation, CFGs
> match languages that people can understand. Of course, since I write
> everything in perl these days, I suppose I don't believe that, either.
> -John]
I agree with John that CFG's are relatively easy for humans to
understand, and using a stronger class of grammars can make programs
harder for people (as well as compilers) to parse.
An advantage of CFG's is also reasonably compact definitions.
Depending on what you want to replace them with, you might lose that.
A third advantage is orthogonality: It is fairly easy to add new
constructions to your language without destroying old ones. This is
more or less the definition of being context-free I suppose. In
practice, though, this isn't entirely true due to requirements of
non-ambiguity in addition to context-freeness. And I suppose that
this may be another reason for using CFG's: You can get automatic
"proofs" of non-ambiguity from your parser generators. And absence of
ambiguity is certainly an important property of a language definition
(at least where the ambiguity leads to several observably different
behaviours).
Torben
> I have been thinking about a programming language, and have good
> reasons to abandon context-free grammars completely.
This portends a delicious language-design discussion, if our esteemed
moderator allows such (as opposed to pure compiler-specific
discussions). ;-)
I favor context-free grammars because I understand them, both as a user
of a language, and as a compiler writer.
Now, please tell us, why you "have good reasons to abandon" them
"completely". Tell us the good reasons. And the alternative.
> [The best argument I've heard is that to a first approximation, CFGs
> match languages that people can understand. Of course, since I write
> everything in perl these days, I suppose I don't believe that, either.
> -John]
Sorry, but I'm not impressed with Perl.
- Bob
[Your moderator is happy to have language design discussions insofar as
they relate to the way that compilers have to deal with the language.
As the FAQ says, once it degnerates into where the semicolon goes, it's
over.
Perl's syntax is gross, but it's the most productive language I've
ever used, partly because of its nice complete garbage collected type
system, partly because of the vast set of application libraries
available. -John]
Parsers generated from explicit grammars, as opposed to hand-written
parsers, have the benefit that you can easily state precisely what language
they accept -- you've got the grammar right there. It takes a lot more
discipline to make sure you can concisely describe what a hand-written
parser accepts, beyond "run it and see". If the grammar formalism you
use happens not to be context-free grammars, I don't think that's
necessarily a big deal. But not being able to state precisely what
language you're parsing is a mistake.
Tom Duff wrote, in a paper introducing a new shell called rc:
It is remarkable that in the four most recent editions
of the UNIX system programmer's manual the Bourne
shell grammar described in the manual page does not
admit the command "who|wc". This is surely an oversight,
but it suggests something darker: nobody really knows
what the Bourne shell's grammar is. Even examination
of the source code is little help. The parser is implemented
by recursive descent, but the routines corresponding to
the syntactic categories all have a flag argument that
subtly changes their operation depending on the context.
Rc's parser is implemented using yacc, so I can say
precisely what the grammar is.
Russ
The best reason I know of is correctness. The argument proceeds like
this.
Recursive descent parsers have a 1-1 correspondence with LL grammars.
That is, if you can write the rules for your language looking at only
a fixed number of tokens ahead, then you can write that down as an LL
grammar (or a recursive descent parser) and you can derive the other
one mechanically.
Next, if you have an LL grammar and if you use a tool, your tool can
(and any good tool will) detect when you make mistakes and make
certain your grammar is still LL. As long as that is true, one can
simply read the grammar and know exactly what language it parses.
To me that's a pretty strong guarantee.
A fairly similar guarantee holds for LR languages. The only thing one
loses from the guarantee with LR is that you can't look at the first
tokens of a rule and tell whether it (or some other similar rule)
applies. If one isn't careful, then losing that guarantee can be too
much, which is why I recommend striving for an LL grammar.
Now, if you have good reasons to abandon context-free grammars, can
you be certain your programming language description describes what
you want? If so, go for it.
Hope this helps,
-Chris
*****************************************************************************
Chris Clark Internet : com...@world.std.com
Compiler Resources, Inc. Web Site : http://world.std.com/~compres
23 Bailey Rd voice : (508) 435-5016
Berlin, MA 01503 USA fax : (978) 838-0263 (24 hours)
Thanks for all the responses. I shall reply here, as spawning threads
is merely confusing :-)
>I favor context-free grammars because I understand them, both as a user
>of a language, and as a compiler writer.
Yes. One thing that I should have made clear, and clearly didn't, is
that I fully agree that one should design a mathematically clean
grammar and work from that. While there ARE people who can code a
parser and end up with an unambiguous language, I am not one. Nor
were Backus, Kernighan and Ritchie :-)
I am specifically referring to using a grammar that includes context.
One obvious solution is to go in for van Wijngaarden grammars and
related technologies, but I have never found a comprehensible
description. In fact, I am 90% certain that they are out of fashion
for precisely that reason!
>Now, please tell us, why you "have good reasons to abandon" them
>"completely". Tell us the good reasons. And the alternative.
Right. Firstly, I want to bring types into the grammar, in ways that
are similar to Algol 68 but more extensive. For example:
<conditional expression> ::=
'(' <integer expression> ? <type_1 expression> ':'
<type_1 expression> ':' <type_1 expression> |
'(' <boolean expression> ? <type_2 expression> ':'
<type_2 expression> ')'
might allow three values for integers (negative, zero and positive)
and constrain the value types to be the same. Now, this can clearly
be done using two-level grammars, but my problem with them is as
above. That example can be, and was done, in Algol 68.
Secondly, I can see no good reason to stop programs from extending
the syntax by adding such rules. With tolerable constraints, it is
easy for the compiler to detect the introduction of an ambiguity
(regular expression overlap is a solved problem - or, at least, I
invented an algorithm and could publish if it has not already been
done).
Thirdly, I want to bring some of the scoping rules into the grammar,
for things like exception handling, breaking out of loops etc.
Even Algol 68 tended to use English for some of this, so I clearly
cannot do it completely - but it isn't hard to add extra predicates
to a basic BNF-style grammar.
Note that I am tending to assume a recursive descent parser, but one
where certain, clearly-defined aspects of the context are visible
to the parsing rules. Provided that you ensure that all such context
is visible before it is needed, there is no difficulty - but I remain
amazed by how many languages (INCLUDING Algol 68, Fortran and C) have
missed that obvious point.
>> [The best argument I've heard is that to a first approximation, CFGs
>> match languages that people can understand. Of course, since I write
>> everything in perl these days, I suppose I don't believe that, either.]
>
>Sorry, but I'm not impressed with Perl.
Oh, it's clearly a CFG - Compiles Files of Gibberish. I use Python
because, when I make a mistake, it will usually tell me I am an idiot.
Perl is more forgiving, and will do what I might have meant in another
universe - I checked every single branch and path in my first 24-line
Perl program on artificial data, and it then failed when I first used
it, because I had misread the description and made a syntax error.
I have never used it since when anything else will do, and regard
it as far too dangerous ever to use as superuser.
[ Oh, I was also the person who did a draft port of Perl 4 to MVS
(think K&R,ASCII,Unix => ISO,EBCDIC,MVS), which coloured my view
of it as a piece of software engineering. ]
My intent is to be as far from Perl as possible :-)
Regards,
Nick Maclaren.
[Odd, I never have that problem writing perl applications, and mine
are somewhat longer than 24 lines. Maybe I'm better at reading the
manual, although I admit to staying away from the internal
implementation.
With respect to the prior discussion, I would worry about error
diagnostics and general readability. When you start building types
into the syntax, that means that many type errors now are likely to
produce "syntax error" rather than "string found where boolean
expected", and I have to say your 2 vs. 3 way branch is grosser than
anything I've done in perl. As far as extending the syntax on the
fly, that avenue was extensively investigated in the 1970s in
languages like IMP-72 and EL/1, all of which died. We discovered that
if you can write every program in a slightly different language, six
months later you won't remember what each language was and you won't
be able to read any of them. OOP style type extensions seem to be
reasonably easy to understand, but new and improved ternary operators
and mobius loops are not. -John]
(1) Regular grammars are not sufficient for generating programming
languages since in most of the PLs we need matching procedure
call-return structures, many balanced structures for statements like
if-then-else etc which a PDA can easily keep track of.
(2) Well known parsing algorithms are known to exist for CFLs.
One important thing he was saying was that some constraints like 'The
type associated with the use of variable in an expression should
always match with it's declared type' can't be expressed within a CFG
itself ( although it can be achieved using attributes attached during
syntax-directed translation ). Also he was saying that a
context-sensitive grammar could express these kinds of constraints.
Is the above assertion true ? If so, wouldn't using CSGs be more
helpful for purposes like type checking etc ?
Thanks.
[See my notes about two messages ago. Yes, you can build the types into
the grammar by using context sensitivity, but no, it's not usually a
good idea because it ruins the diagnotstics. Use attributes. That's
what they're for. -John]
http://languagemachine.sourceforge.net
The language machine implements applies grammatical rewriting or
substitution rules to a stream of input symbols. The application of a
rule has a recognition phase and a substitution phase. The rules are
completely general - the recognition phae can produce any number of
symbols to be recognised, and the substitution phase can produce any
number of symbols to be substituted, and either can be empty. The
lm-diagram (lots of examples on the site) lets you see exactly how
this works in practice.
The effect of an analysis is described by means of variable bindings
and side effect actions which can include calls on external procedures
in the D or C languages. Translations are constructed as fragments
which are bound as variable values and which are typically not
evaluated until the end of a unit of analysis. In effect each fragment
is a closure with scope that relates to the nesting structure of
recognition phases.
I have used predecessors of this system to deal with type information
in several compilers and translators.
In my view, W-grammars are hard to understand for reasons which also
led to an unhealthy obsession with CFGs: however useful for theory, it
was unhelpful for practical purposes to think of the grammar as
generating the language.
The generative theory of grammar takes string rewriting rules as given.
But rewriting systems that allow nesting in both recognition and
substitution phases and that deal in terminal and nonterminal symbols
are themselves grammatical engines which are relatively easy to
understand, can be reasonably efficient, and can apply any kind of
grammatical rule.
I would be interested to hear what you think. If you have questions,
there is a mailing list and also a forum.
Regards
Peri Hankey
Two languages that I have used recently that allow syntax changes, TeX
and mathematica, don't seem to be going away so fast.
There are some very strange errors that you can get in both languages.
TeX even allows changes to the character codes defining which characters
are letters and which aren't, so that the tokenizing can change.
Changes must be done carefully so that future text isn't expanded until
the change has taken effect.
-- glen
[You're right, but all the TeX users I know use the pre-written macros
in LaTeX and avoid doing any syntax magic of their own. -John]
That is what two-level (van Wijngaarden) grammars were designed for
but, as I said, I know of no comprehensible description. I have found
nothing outside the "program proving" area that can handle scoping
etc., and it is a lifetime task to extract useful information from the
morass of obfuscated drivel that surrounds it.
It is very sad that there is such a gulf between theoreticians and
practitioners :-(
>[See my notes about two messages ago. Yes, you can build the types
>into the grammar by using context sensitivity, but no, it's not
>usually a good idea because it ruins the diagnotstics. Use
>attributes. That's what they're for. -John]
No, that is wrong. What you mean is that the parsing techniques that
produce reasonable diagnostics for CFGs cease doing so when such
context is added. There is no fundamental incompatibility between the
use of context and good diagnostics. ALGOL68C had good ones, and I
have also seen good ones in the very highly context-dependent
interpretive languages used in areas like statistics.
I fully agree that just bolting types onto an existing language and
parsing system will lead to poor diagnostics :-)
Regards,
Nick Maclaren.
But some packages for LaTeX use such a magic very actively. For
example, babel -- multilingual support.
--
Ivan Boldyrev
> http://languagemachine.sourceforge.net
Whoop! This is a real nice surprise to me.
About ten Years ago a friend and i started research on portable systems and
generative programming. I was so happy when the book "Generative
Programming" was published that i went to the ecoop 2000 in Budapest to
participate in a workshop held by Czarnecki. Years passed and i simplified
my ideas a lot away from trying to make a programming environment towards a
grammar substitution engine to make a proper start.
>From time to time i was astonished that i couldn't find projects steering
into that direction. I just found about the language machine and after a
quick glance it seems you have done what i couldn't find the time for to do
properly!
I'll take a deeper look into it this week and i'll save my congrats until i
know for sure what i'm talking about ;)
Regards
- Robert Figura
--
/* mandlsig.c v0.23 (c) by Robert Figura */
I=1702;float O,o,i;main(l){for(;I--;putchar("oO .,\nm>cot.bitamea\
@urigrf <raguFit erobR"[I%74?I>837&874>I?I^833:l%5:5]))for(O=o=l=
0;O*O+o*o<(16^l++);o=2*O*o+I/74/11.-1,O=i)i=O*O-o*o+I%74*.04-2.2;}
The impression I got from reading the revised report on Algol 68 was that
the grammar involved a lot of fiddly programming at the wrong level of
abstraction - it wasted effort on recursive definitions of numerals, for
example. A good formalism should make things clearer, but the vW grammaer
is just obscure. Furthermore it has a reputation for being brittle - the
grammar was difficult to change as the language was refined.
Modern programming language theory doesn't generally bother itself with
syntax, probably because the type theory is interesting and complicated
enough without getting mired in low-level irrelevance.
Tony.
--
f.a.n.finch <d...@dotat.at> http://dotat.at/
FAEROES: SOUTH OR SOUTHWEST 6, INCREASING GALE 8 TO STORM 10, PERHAPS VIOLENT
STORM 11 LATER. RAIN OR SHOWERS. GOOD BECOMING MODERATE OR POOR.
Not to say Lisp :-)
>There are some very strange errors that you can get in both languages.
Yes, indeed. TeX, in particular, is ghastly - but that has little
to do with its extensibility.
But there are extensions and extensions and, while I agree that too
much flexibility leads to confusion, there are examples of controlled
extensibility with good diagnostics. Consider the Phoenix command
language (a Cambridge phenomenon) or Genstat. Both allow the addition
of user-defined statements - as do a fair number of other "higher
level" languages. The point with those is that any syntax error is
localised by the fixed language.
Even when you get more general (e.g. Algol 68 and languages with
type-dependent expressions), restricting them only slightly allows
for diagnostics like:
In the context of a <numbat> statement, there is no expression
that matches 'AARDVARK <integer array expression> : ...'.
While that doesn't tell you if you meant to type AARDWOLF, you forgot
to subscript the array or you typed ':' meaning ';', such diagnostics
are better that most you get from C! I can't remember which language
I used produced ones very like that.
Back to compilation techniques. My observation is that the keys to
good diagnostics are:
Bounded look-ahead at ALL levels - Algol 68 got it wrong with
modes and operators, C99 has got it seriously wrong with inline
(and, historically static/extern), and Fortran got it wrong with
external functions.
All context must be set at a serialisation point before it is
used, not just by the time it is needed. Many languages get this
wrong, though not usually seriously. Without this, diagnostics often
have to refer to the internal names of types, which is very confusing.
Any extensions must be controlled to balance flexibility and
recoverability. In particular, allowing an extension to mean that
a user error creates an ambiguity in the nature of a syntactic
object is a bad idea. Well, the same applies to built-in syntax.
If you have those three techniques, a compiler can always decide
where the lexically next error occurs, and can avoid the diagnostics
that say "Somewhere in the program, there was a syntax error." It
can also produce messages that link the erroneous area back to the
definitions it is using at that point.
Note that I am NOT saying this is possible in a compiler generated
using the advanced methods, though it may be. But it is certainly
possible in a single-pass, recursive descent compiler. Without
those restrictions, good diagnostics are often impossible using any
method at all.
Regards,
Nick Maclaren.
> Use attributes. That's what they're for.
John, suggestion has a lot of merit, irrespective of error processing.
Atrributes (i.e. little expressions one adds to the grammar that
compute some context sensitive information) are a really good solution
to the problem of adding well-typed expressions to a grammar.
Trying to put types directly into a grammar is a disaster. It makes
the grammar grow exponentially. That's the general failure of VW
grammars--such grammars are a way of succinctly describing the
explosion. Unfortunately, when one is actually parsing with them, one
has to explode them.
In contrast, attribute expressions effectively become another pass.
One parses the grammar for some rough syntactic correctness using the
CFG. Then once, one has identified the expressions and their
syntactic structure, one walks the resulting tree (or dag) and
computes whether the types can be consistently applied. If one can,
one proceeds to the next step in compilation. If one can't, one can
issue a diagnostic that says, "This part of the expression looked like
it wanted to be this type (or set of types), but this part of the
expression wanted to be this type (or set), and the types are not
compatible, because the following rule is violated...."
The notation of attributes are even correct for expressing type
calculations. They really are calculations and generally form some
form of functional programing language. And, what one wants, as
described above is to "unify" two expressions. Unify meaning, can we
find a setting of the not yet defined values which makes this
expression consistent. This can be used to insert the implied
promotions, conversions, coercions, and/or casts which are used to
make the expression consistent. (And, if it were my language, I would
add a step (compiler output) which gave the fully disambiguated
expressions, so the user could verify that the right interpretation
was made, but that's another topic.)
Enough advice for one day.
The language your compiler accepts /already/ isn't context free, if it
checks that variables are bound and that expressions typecheck -- ie,
the set of strings that the compiler will do other than emit an error
on is not context free.
Splitting the task of parsing into phases -- a) lexing, b) parsing,
and c) type-checking -- is just plain old good engineering. Each phase
creates stronger invariants on the things it passes on to the next, so
that writing them is easier. It's easier to form an AST when you know
what's a keyword and what's not. It's easier to typecheck a program
when you know your input is an abstract syntax tree -- your
typechecker doesn't have to have cases for what to do if a paren
doesn't have a closing brace. Making each phase simpler, makes it
easier to understand and debug, and yields more modular and
maintainable code.
--
Neel Krishnaswami
ne...@cs.cmu.edu
Shouldn't the question be whether it would have been possible to design
a language as productive as Perl, perhaps with a similar syntax in many
areas, but which would have been CF or even LL1 ?
We all know that some languages are not LL or not CF, but is there an
example, a feature that makes Perl special that actually *depends* on
the fact that it is or isn't CF, or LL ? I can't think of one, but I'd
be glad to be shown wrong in this matter.
Darius.
[Python has a much cleaner syntax than perl, but never got perl's critical
mass. I think this tells us that syntactic elegance isn't a big issue for
language users. -John]
> Shouldn't the question be whether it would have been possible to design
> a language as productive as Perl, perhaps with a similar syntax in many
> areas, but which would have been CF or even LL1 ?
>
> [Python has a much cleaner syntax than perl, but never got perl's critical
> mass. I think this tells us that syntactic elegance isn't a big issue for
> language users. -John]
That could well be true. How else could a syntactic abomination like
C++ ever gain popularity?
I guess most users of languages with complicated rules use only a
small subset of the language where they don't get confused about the
syntax and only occasionally (if at all) "break out" of this small
subset.
So, perhaps, the negative impact of complicated syntax isn't so much
reduced productivity as:
- Occasional unexpected behaviour when the compiler reads a
programmer typo as correct syntax that is just different from what
the programmer intended.
- Inconsistent behaviour of different compilers due to different
interpretations of what the syntax rules actually are.
- Increased complexity of source code tools such as code analysers,
program transformations, pretty printers etc.
And just about the only advantage is the ability to write really
obfuscated code. And _possibly_ a slight increase in code density,
but there are other factors that are much more important for this than
syntax.
Torben
Well, the data I have on programming language popularity
<http://www.complang.tuwien.ac.at/anton/comp.lang-statistics/>
certainly indicates that Python got a critical mass. Indeed, it seems
to gain popularity, while Perl seems to lose popularity; indeed,
Python seems to be more popular than perl since 2003. Of course, my
methodology for estimating the popularity is questionable (as is any I
have seen), but it certainly indicates that Python is very popular.
> I think this tells us that syntactic elegance isn't a big issue for
>language users. -John]
Conversely, maybe the increase in the popularity of Python is based on
its syntactic characteristsics, at least in part.
Disclaimer: I have never programmed in Python, and very little
in Perl.
- anton
--
M. Anton Ertl
an...@mips.complang.tuwien.ac.at
http://www.complang.tuwien.ac.at/anton/home.html
[I think if you look at the number of people running web sites in perl
vs. in python, perl still wins big. There's nothing wrong with python,
but it's not enough nicer than perl to get me to give up the vast CPAN
libraries. -John]
I fully agree. It has shown since COBOL, PL/1, C++, Perl and others.
However, IMHO, from an engineering point of view, the question should
not be whether people care for syntactic elegance, but whether they
would complain if their language was syntactically restricted to meet LL
constraints.
I think they would not, and therefore, I see no need to design non-CF
design languages. Non-CF constructs do not lead to better expressions of
programming constructs.
But I'd be glad to be shown an example where a non-CF construct actually
brings some added value that so CF construct would have.
Darius.
As someone who used it a fair amount, in several different variants, I
agree with that. Given my complete failure to find a comprehensible
description of that type of grammar, my suspicion is that it should be
regarded as a blind alley.
[ And to Chris Clark ]
Yes, attributes are good, but I react against making them part of the
'active' syntax - i.e. I feel that they are better used to impose
constraints and improve diagnostics than allow alternative syntax.
Others may disagree.
> Trying to put types directly into a grammar is a disaster. It makes
> the grammar grow exponentially. That's the general failure of VW
> grammars--such grammars are a way of succinctly describing the
> explosion. Unfortunately, when one is actually parsing with them, one
> has to explode them.
But, if you start to use attributes for 'active' syntax, you have to
explode THEM (directly or indirectly)! TANSTAAFL.
However, I fully agree that you need some way of controlling the
exponential explosion. You can't JUST add types to a traditional
design and expect things to work.
Regards,
Nick Maclaren.
My observations indicate that it has a critical mass and is growing in
relative popularity, but I would not go as far as you.
|> > I think this tells us that syntactic elegance isn't a big issue for
|> >language users. -John]
Anyone like to justify Fortran's or C's? Yes, quite.
|> Conversely, maybe the increase in the popularity of Python is based on
|> its syntactic characteristsics, at least in part.
|>
|> Disclaimer: I have never programmed in Python, and very little
|> in Perl.
I have, and do. It is one of my main reasons, though really only as
part of the "When I make an error, tell me I am an idiot, don't just
do something unpredictable."
Regards,
Nick Maclaren.
That sounds perilously close to dogma! I agree that the use of such
constructs should be justified.
However, my experience is that most people think in a context-full
fashion, and so there is a strong argument for designing a language
where controlled aspects of the context are first-class parts of the
syntax.
|> But I'd be glad to be shown an example where a non-CF construct
|> actually brings some added value that so CF construct would have.
Easy. Consider the switch/case statement. The syntax of the labels
is dependent on the type of the controlling expression. Now, that can
be handled by the use of context-free grammars and attributes, at the
expense of requiring an expression to be parsable typelessly.
Where I was thinking of using it more generally was to allow
extensible syntax of the following form (simplified for example):
<keyword> <type expression> <rest of construct>
Expressions and each such construct would be describable in a
context-free fashion, and there would be some very strong way of
recognising the end of a construct. The selection of construct is
determined by a combination of the keyword and type of the expression.
The point is that this is easy to parse and diagnose. Never mind the
theory - think of how to write a compiler for it :-) The Phoenix
command language used very much this model, and it was/is also common
in application-level languages.
Regards,
Nick Maclaren.
I wouldn't recommend them but I did find the treatment in the
following comprehensible, though YMMV :-
Grammars for Programming Languages, J. Craig Cleaveland and Robert
C. Uzgalis, Elsevier, 1977.
You may also find the following interesting :-
Affix grammars, C. H. A. Koster, pp 95-109, ALGOL 68 Implementation,
J. E. L. Peck, North-Holland, 1972.
The abstract of which is :-
The purpose of this paper is to present a type of
two-level grammar, akin to that used in the definition of ALGOL 68,
but better suited for syntax directed parsing techniques
> > I have been thinking about a programming language, and have good
> > reasons to abandon context-free grammars completely. So what I am
> > asking is what reasons are there to favour them ....
>
> The best reason I know of is correctness. The argument proceeds like
> this.
>
> Recursive descent parsers have a 1-1 correspondence with LL grammars.
...
> Now, if you have good reasons to abandon context-free grammars, can
> you be certain your programming language description describes what
> you want? If so, go for it.
I'm not sure I understand how your argument supports preferring (LL)
context-free grammars to context-sensitive grammars or two-level
grammars, although I share your preference for LL grammars.
Wouldn't it be possible to combine the advantages of an LL grammar with
the power of two-level grammars? As I understand it (if I do at all) a
VWG can be viewed as a finite specification of an infinite CFG. So I
would assume, given a string to parse, you would generate as much of the
CFG as is necessary to parse the string.
Would it be wrong to say that for a given VWG and a finite string W in
the language, it is possible to produce a finite subset (of the
generated "infinite" CFG generated by the VWG), which parses W?
Is there anything that prevents adding a requirement that the generated
grammar be LL? And how much would this restriction "cost"? Would a
restricted VWG (LLVWG?) be less powerful than a full VWG?
I believe this is the method used by A. J. Fisher in _Practical
LL(1)-Based Parsing of van Wijngaarden Grammars_ (1985).
Nick Maclaren was looking for a comprehensible description of VWG,
perhaps I may recommend J. Craig Cleaveland & Robert C. Uzgalis:
_Grammars for Programming Languages_ (1977)? (As I still am quite unsure
of my understanding of VWG, my recommendation of the book probably isn't
worth much, though.)
-Lasse
> However, my experience is that most people think in a context-full
> fashion, and so there is a strong argument for designing a language
> where controlled aspects of the context are first-class parts of the
> syntax.
I agree with the premise, but I don't understand how you derive the
conclusion from it. Context-sensitive aspects of programming
languages are routinely handled by subsequent passes (type checking,
elaboration, etc.) There is no particularly good reason to move this
stuff into the parser, and there are several good reasons (which
people here have already enumerated) for not doing so.
Cheers,
Matthias
The fact that there is a strong argument in favour of A doesn't mean
that there aren't even stronger arguments against it! All is means is
that A shouldn't be rejected without good reason.
My logic is that it is often necessary to play revolting games in
conventional languages to get around restrictions imposed by the
syntax, often in ways that could quite easily be relaxed. Most such
relaxations are best done in a context-free fashion, I agree (why
complicate things?), but some aren't.
Something that has been forgotten by some of the language community is
that the conventional approach is just that. It has become
conventional because it has a good theory behind it that maps fairly
well into practice, but that doesn't mean that it is the only
worthwhile approach. Designing languages and writing compilers is
like writing tribal lays - see Kipling :-)
Regards,
Nick Maclaren.
In a vW grammar, you can have a protonotion which encodes different
trees at the same time. If this happens the parser cannot exploit a
large order structure to the attributed parse tree it is created, and
would be hopelessly inefficient.
I don't know a simple or straightforward to explain it, but a given vW
grammar might be regarded as similar to attribute grammar with a
skeleton of context free grammar and metanotions attached that act as
attribute variables. Instead of attribute assignments, you want to use
a more generalised unification (assign or verify existing value).
NEST :: EMPTY; NEST LAYER.
LAYER :: new; LAYER DEF.
DEF :: define TAG.
program : LAYER series defining LAYER.
NEST series defining LAYER :
where NEST defines TAG, reference token, TAG token.
NEST series defining LAYER :
begin token, NEST LAYER1 series defining LAYER1, end token.
NEST LAYER0 series defining LAYER define TAG:
unless LAYER0 defines TAG, reference token, TAG token.
WHETHER NEST LAYER define TAG1 defines TAG:
where (TAG1) is (TAG), WHETHER true;
unless (TAG1) is (TAG), WHETHER NEST LAYER defines TAG.
WHETHER new defines TAG: WHETHER false.
WHETHER NEST LAYER new defines TAG: WHETHER NEST LAYER defines TAG.
Now let '=' represent unification, '!=' anti-unification', and '<'...'>'
mark off a structure. Then rewrite the grammar as
program :
T1 series defining LAYER,
T1 = <LAYER>.
NEST series defining LAYER :
where NEST defines TAG, reference token, TAG token.
NEST series defining LAYER :
begin token,
T2 series defining LAYER1,
end token,
T2 = <NEST LAYER1>.
T3 series defining T4:
T3 = <NEST LAYER0>,
T4 = <LAYER T5>,
T5 = <define TAG>,
unless T6 defines TAG, reference token, TAG token,
T6 = <LAYER0>.
where T7 defines TAG:
T7 = <NEST T8>,
T8 = <LAYER T9>,
T9 = <define TAG1>,
TAG1 = TAG, where true;
T7 = <NEST T8>,
T8 = <LAYER T9>,
T9 = <define TAG1>,
TAG1 != TAG, where T11 defines TAG,
T11 = <NEST LAYER>.
where T13 defines TAG:
T12 = <T13>,
T13 = <new>,
where false.
where NEST LAYER new defines TAG:
T14 = <NEST T15>,
T15 = <LAYER T16>,
T16 = <new>,
where NEST LAYER defines TAG,
T17 = <NEST LAYER>.
unless T7 defines TAG:
T7 = <NEST T8>,
T8 = <LAYER T9>,
T9 = <define TAG1>,
TAG1 = TAG, unless true;
T7 = <NEST T8>,
T8 = <LAYER T9>,
T9 = <define TAG1>,
TAG1 != TAG, unless T11 defines TAG,
T11 = <NEST LAYER>.
unless T13 defines TAG:
T12 = <T13>,
T13 = <new>,
unless false.
unless NEST LAYER new defines TAG:
T14 = <NEST T15>,
T15 = <LAYER T16>,
T16 = <new>,
unless NEST LAYER defines TAG,
T17 = <NEST LAYER>.
# Is there anything that prevents adding a requirement that the
# generated grammar be LL? And how much would this restriction "cost"?
# Would a restricted VWG (LLVWG?) be less powerful than a full VWG?
Once you got 'metanotion-grammar' (not exactly the classic
attribute-grammar), the CF subgrammar are just a bunch of rules you
peel off and feed into your parser generator, LL(k), LR(k), whatever.
--
SM Ryan http://www.rawbw.com/~wyrmwif/
Who's leading this mob?
This can apply even to things that are not traditionally thought of as
languages. The second or third time I implemented POSIX regular
expressions, I experimented with using a distinct scanner and parser
rather than trying to do everything in a parser. I was startled at how
much easier this was, and how much cleaner the code was. All sorts of
things that had previously contorted the parser logic became two lines of
code apiece in the scanner. In particular, the distinction between
"basic" (ed-style) and "extended" (egrep-style) REs became essentially
a choice of two different scanners, invisible to the parser.
--
spsystems.net is temporarily off the air; | Henry Spencer
mail to henry at zoo.utoronto.ca instead. | he...@spsystems.net
Yes, but that is not necessarily the best split. My rule for
deciding whether to split up or join modules in programs or aspects
of specifications is quite simple: if the PRECISE specification of
A, B and their interface is simpler or cleaner than that of A+B,
split them; and conversely. Where the skill comes in is in choosing
a good place to do the split :-)
In your example, splitting was clearly the right solution. There are
others where it could be clearly the wrong one. The mistake made by
the dogmatic splitters (remember the "no routine should exceed 20
lines" brigade?) is to ignore the fact that the interface
specification can be more complex than the entity specifications.
In the case I am thinking of, which started this, I am thinking of at
least three phases: lexing, parsing/type matching, and scope and
inheritance checking. To keep the middle phase within bounds, it is
clearly necessary to keep the type matching really simple and really
just part of the parsing.
But the basic principle is the same as Neelakantan Krishnaswami
suggests - the divisions are just slightly different.
Regards,
Nick Maclaren.
It sounds like you are assuming a language with a nugatory type system.
Doesn't scope checking affect type checking?
Tony.
--
f.a.n.finch <d...@dotat.at> http://dotat.at/
VIKING NORTH UTSIRE SOUTH UTSIRE: SOUTHEAST 6 TO GALE 8, PERHAPS SEVERE GALE 9
LATER. MAINLY FAIR. MODERATE WITH FOG PATCHES.
No. It is hard to explain briefly, but I will try. In most sane
languages, the syntactic nature of an object (e.g. an object or type
name) is visible during parsing. Algol 68 broke that with modes and
operators, which caused havoc far beyond its usefulness. Fortran
breaks that with external functions. C99 sort of breaks it in a few
places.
However, many languages either don't do that for types or have rules
such that the type of a just-parsed construct involves some evaluation
that goes beyind parsing. This very rarely adds anything to the
language except the ability to leave out casts, but does mean that the
type of a construct is not reliably and easily visible during parsing.
So you had better NOT use them in the syntax rules.
My assertion (based on experience) is that requiring explicit type
specifications is a minor problem for competent programmers and, while
hated by incompetent ones, helps them against themselves. And it also
allows the use of types as first-class syntactic properties. As I
say, it has been done.
>Doesn't scope checking affect type checking?
It depends on the language :-) But you may not understand what I mean
by that - I am talking about much more advanced checking than the
simple static scopes of the Algols, C or Fortran 90 and am wanting to
introduce the ability for rigorous, largely compiled, checking of the
relative scope of pointers and objects. That has been a neglected
approach since Algol 68, and (for reasons that are nor germane) the
myth that garbage collectors are a solution is, er, so much collective
garbage.
I am also talking about the integration of things like exception
handling (including signals), BCPL/C longjmp etc. In a statically
checked fashion! All easy, and with prior art, PROVIDED that the
language makes it possible.
Regards,
Nick Maclaren.
In a recent interview, Alan Kay suggested that while extensibility
*is* useful, there should be "a fence to hop" to get into
extension-writing mode. His observation is that even the best
programmers tend to do a very bad job on extension design when they're
doing it as part of a programming project, because the desire to get
on with the project discourages the careful thought that makes good
extensions.
This fits with something I've observed in the past: building a library
and building a program are two different jobs, which often want to use
somewhat different sets of language features.
Yes, indeed. However, even defining types is extending the language,
and the fence is usually very low. The reason that works is that the
extensions are very constrained.
>This fits with something I've observed in the past: building a
>library and building a program are two different jobs, which often
>want to use somewhat different sets of language features.
Again, quite. As someone with quite a lot of experience in designing
and implementing libraries and run-time systems, my belief is that few
language designers, compiler writers and operating system designers
either understand the issues or even attempt to provide the facility
to design and implement a decent third-party library or run-time
system.
It could be done, but it does involve a change in mind-set.
Regards,
Nick Maclaren.
This seems to lead to an interesting language design principle:
When I studied physics, I was told that one's notes to describe the
experiments one is doing much be sufficiently detailed that one self can
read them six months later. They quoted some guy who six months later
could not write the paper or thesis, whatever it was, due to insufficient
notes explaining how that special, crucial experiment was performed.
So, in language design, there must be a balance: The language must be
sufficiently rich to admit the programmer express the programming task
at hand. If the language is not sufficient in itself, the
extensibility features must be such that the semantics can fairly
easily be deduced.
I am somewhat interested in this question, writing on a theorem prover,
because in pure math, there is not universal accepted notation: each paper
can in principle have its own syntax, even though heavily dictated by the
notions at hand, as well by traditions. So this calls for some
extensibility. But too much extensibility will probably be quite unusable
from the practical point of view. So there is a question what might be
suitable to tuck this syntax problem conveniently away, from the
implementation point of view.
--
Hans Aberg
But TeX (like METAFONT) is closely tailored to its domain, and within
that domain, is quite elegant. Of course it can be 'ghastly' for tasks
of a different character - Knuth himself gives several examples in The
TeXbook. Anything is *possible* but not necessarily easy.
Similarly I find that many people who complain about Perl - especially
comparing it to languages such as Ruby or Python - have missed the
point. Perl has its sweet spot domains (as a child of sed, C, sh, etc)
and while it is a powerful general purpose programming language, that
was not its guiding principle. It was highly adapted from birth.
--Toby
Normally a theorem prover has some sort of interactive interface where
you can develop a script that proves your theorems. The formulas
entered are unreadable. If you want to prove some mathematical theorem,
then you have to enter a formula as readable as a formula written in
TeX. If you want to prove some some aspect of a program, then you have
to deal with formulas so long, that they are difficult to understand
because of that. The formulas are not only those entered by the human,
they are also formulas generated by the theorem prover during the
interactive process of proving theorems.
In both cases you need the theorem prover to present the formulas in a
way such that they are understandable to human readers. (You need the
theorem prover to present your input in a human readable way so that you
can proofread your input.) You need some sort of pretty printer, that
can present math formulas in the way used in books on math, and computer
programs and states in the way used in computer science books. This
should be integrated into some sort of IDE like tool, that acts as a
front end to the theorem prover just like an IDE acts as a front end to
the compiler and debugger.
I think such a front end and pretty printer is much more important than
a facility for changing the grammar of the input.
Karsten Nyblad
148f3wg02 at sneakemail dot com
No, that wasn't my point. One of the "gotchas" is that spaces are
syntactically significant, and you cannot introduce layout (ANY form
of layout) for clarity without changing the meaning of the program.
And, because TeX doesn't have a precise description (merely a guide on
how to use it), it is very hard to analyse a program for why it
doesn't do what you understand the TeX book to say that it should do.
>Similarly I find that many people who complain about Perl -
>especially comparing it to languages such as Ruby or Python - have
>missed the point. Perl has its sweet spot domains (as a child of sed,
>C, sh, etc) and while it is a powerful general purpose programming
>language, that was not its guiding principle. It was highly adapted
>from birth.
Again, you have missed the point. Perl is bad, even for the purposes
that it was designed for and is most often used for - system scripts.
People who care about RAS really, really do NOT want a privileged
script to do something unexpected. Humans make errors, but Perl is
such that most errors make it do something unpredictable rather than
issuing an error message.
People may have heard before, but my first and last Perl program was
20 lines long, with every branch tested both for correctness of the
condition and that each path was being correctly executed. It gave
wrong answers the first time I used it.
I had misunderstood the manual, and typed completely broken syntax.
This behaved as I expected during my tests and not on real data.
Not really related to that, I also half converted it to MVS, ISO C
and EBCDIC (don't ask), and so had to stufy its code. I have rarely
seen anything so bad, and its quality made it very clear why it was
likely to behave in that sort of way. I could go into details, but
they are irrelevant. I have heard that newer versions are better,
but my reaction is that they could scarcely fail to be.
Regards,
Nick Maclaren.
[You keep mentioning your failure to write a correct perl program on
the first try, but I can't figure out what conclusion we're supposed
to draw. That someone who is a skilled programmer in some languages
is still a novice in languages he doesn't know? That you aren't very
good at learning languages from the manual? That preconceptions make
it hard to learn new or different languages? I don't think my first
perl program worked either, nor did my first Lisp, C, Basic, Algol 60,
PL/I, Fortran, or Varian 620 assembler program, but I don't think it
was the languages' fault. (Well, maybe for C.)
Oh, by the way, I wrote a python program using my favorite text editor
which has four character tab stops. It didn't work, so nobody should
use python. -John]
> and implementing libraries and run-time systems, my belief is that few
> language designers, compiler writers and operating system designers
> either understand the issues or even attempt to provide the facility
> to design and implement a decent third-party library or run-time
> system.
>
> It could be done, but it does involve a change in mind-set.
I have also observed this phenomenon so many times.
The change in mind-set that you are talking about is
the different perspective of a lib's user and its implementor.
The inside view and the outside view. Even skilled implementors
are usually unable to change their mind-set (temporarily).
How can we change this ? What do I have to tell a skilled
implementor to make him change his mind when his design is bad ?
Whenever I try to do this, I run the risk of beeing treated
as a "bean-counter who simply doesnt understand".
> [You keep mentioning your failure to write a correct perl program on
> the first try, but I can't figure out what conclusion we're supposed
> to draw.
(snip)
> I don't think my first
> perl program worked either, nor did my first Lisp, C, Basic, Algol 60,
> PL/I, Fortran, or Varian 620 assembler program, but I don't think it
> was the languages' fault. (Well, maybe for C.)
I think one is that some languages make it easier for the compiler to
recognize when you did something wrong. C is not one of those
languages.
One Fortran feature that I still remember from watching so many
beginning programmers run into it is the ability to pass a function
address to a subroutine or function. It is easy for beginning
programmers to forget (as in not ever realize) that arrays must be
dimensioned in subroutines in addition to the calling program. If you
forget and only use the array on the right side of assignment
statements there is no compilation error. Fortran instead assumes it
is being passed a function address and executes the contents of the
array.
I believe that more language designers are recognizing the problem and
designing languages to reduce it. Also, more compilers issue warnings
for legal but confusing constructs, though sometimes I am not so happy
with those warnings.
-- glen
Whereas I get told that I am an academic theoretician who doesn't
understand - despite my record in implementation and support, some of
which has been commercial!
I don't know. What I do know is that X amount of effort spent in
designing for implementation cleanliness, failure handling, error
detection and diagnosis saves 3-10 X effort in debugging and support,
and often speeds up the time to the first genuinely working product.
But it slows down the time to demonstration considerably, which is bad
for researchers who have no interest in anything beyond that and
developers controlled by "all that matters is time to market"
salesdroids.
The difference in providing facilities for end users and for
third-party developers is similar, but is a lot more subtle. I do
know that one of the major reasons that it is so hard to get the
messages across is that few people nowadays have any real familiarity
with all aspects of developing, supporting and using products. That
never was common, but is as rare as hen's teeth in people under 50.
Regards,
Nick Maclaren.
> Normally a theorem prover has some sort of interactive interface where
> you can develop a script that proves your theorems. The formulas
> entered are unreadable. If you want to prove some mathematical theorem,
> then you have to enter a formula as readable as a formula written in
> TeX.
It turns out that TeX is wholly unsuitable, as it is a graphically
oriented format. The formulas should be entered in a semantically
correct manner, and one should develop a suitable syntax for
that. Therefore, I am developing my own, better suited, format. Once
the formulas have been properly entered and parsed, it becomes easy to
add suitable pretty-printing format.
The current pretty-printer tries to write closely to the input
theorems and proofs, with additional diagnostics, i.e., whether a
statement was proved or not. One can also write out the proofs in a
human readable format. This is needed for proper debugging.
> In both cases you need the theorem prover to present the formulas in a
> way such that they are understandable to human readers. (You need the
> theorem prover to present your input in a human readable way so that you
> can proofread your input.)
This is, as you say one very important aspect. If the formulas are not
readable to me, then I cannot debug properly. Right know, the most
difficult part is to make sure the proofs are completely accurate
(according to standard metamath).
In fact I have come quite far. Instead of a Prolog first depth search, I
make a breadth first search, and I have a format to write out the proofs
by which I can easily check their accuracy. (I have been able to attain
some automated induction proofs, i.e., the theorem prover finds the
induction predicate, and the user only needs to guess which formulas to
use in order to cut down the proof search paths.)
In addition, I have recently started to use Unicode UTF-8, using correct
mathematical symbols instead of ASCII words. This turns out to help up the
code a great deal.
> I think such a front end and pretty printer is much more important than
> a facility for changing the grammar of the input.
As always in math, notions and notation are closely intertwined. In
the computer language design version, semantics and syntax should be
closely worked together. Since I am the only user, it suffices right
now to use a static grammar, handled by Flex and Bison. But the
question is what dynamic extensibility will lingers on the horizon.
> You need some sort of pretty printer, that
> can present math formulas in the way used in books on math, and computer
> programs and states in the way used in computer science books. This
> should be integrated into some sort of IDE like tool, that acts as a
> front end to the theorem prover just like an IDE acts as a front end to
> the compiler and debugger.
Right now, I want to avoid doing anything more than the core
math. Thus, I have only a very small math set which contains only some
math core components to play around with.
But it runs out that one can do quite much with a text only syntax,
both using only ASCII, but also using UTF-8. A graphical pretty
printer would be beneficial for handling sup-/super-scripts and other
more complex math notation. But right now, I do not have so much need
for that.
--
Hans Aberg
I never found it particularly difficult. The TeXbook is superb
documentation, and for the truly desperate, there's the source. :-)
> >Similarly I find that many people who complain about Perl -
> >especially comparing it to languages such as Ruby or Python - have
> >missed the point. Perl has its sweet spot domains (as a child of sed,
> >C, sh, etc) and while it is a powerful general purpose programming
> >language, that was not its guiding principle. It was highly adapted
> >from birth.
>
> Again, you have missed the point. Perl is bad, even for the purposes
> that it was designed for and is most often used for - system scripts.
I disagree. At least you can admit it's no worse than bash.
> People who care about RAS really, really do NOT want a privileged
> script to do something unexpected. Humans make errors, but Perl is
> such that most errors make it do something unpredictable rather than
> issuing an error message.
I agree with that much.
> People may have heard before, but my first and last Perl program was
> 20 lines long, with every branch tested both for correctness of the
> condition and that each path was being correctly executed. It gave
> wrong answers the first time I used it.
Why should I accept your judgments of Perl then? Just asking. I would
not offer such criticisms of a language I'd only used once (and
incorrectly).
> I had misunderstood the manual, and typed completely broken syntax.
> This behaved as I expected during my tests and not on real data.
Why do you expect to get it right first time? One thing I have learned
about humans versus technology is that we are only truly proficient at
things we do every day; and that it takes 10 years to get good at
anything.
>
> Not really related to that, I also half converted it to MVS, ISO C
> and EBCDIC (don't ask), and so had to stufy its code. I have rarely
> seen anything so bad, and its quality made it very clear why it was
> likely to behave in that sort of way. I could go into details, but
> they are irrelevant. I have heard that newer versions are better,
> but my reaction is that they could scarcely fail to be.
It's clearly not for you. But there are plenty of wonderful languages
out there. Perl and TeX I consider to be domain specific, and therefore
perhaps prone to the learning-curve and fragility problems that you
encountered.
--Toby
>
> Regards,
> Nick Maclaren.
>
> [You keep mentioning your failure to write a correct perl program on
> the first try, but I can't figure out what conclusion we're supposed
> to draw. That someone who is a skilled programmer in some languages
> is still a novice in languages he doesn't know? That you aren't very
> good at learning languages from the manual? That preconceptions make
> it hard to learn new or different languages? I don't think my first
> perl program worked either, nor did my first Lisp, C, Basic, Algol 60,
> PL/I, Fortran, or Varian 620 assembler program, but I don't think it
> was the languages' fault. (Well, maybe for C.)
>
> Oh, by the way, I wrote a python program using my favorite text editor
> which has four character tab stops. It didn't work, so nobody should
> use python. -John]
Touché, John. At what point do your editorials become too long for
interjection and require participation as a civilian? :-)
[When they're longer than the original article, I guess.
This is veering away from compiler design into language design, so with
one more message in the queue I'm going to decree this thread to be over.
-John]
> I don't know. What I do know is that X amount of effort spent in
> designing for implementation cleanliness, failure handling, error
> detection and diagnosis saves 3-10 X effort in debugging and
> support, and often speeds up the time to the first genuinely working
> product. But it slows down the time to demonstration considerably,
> which is bad for researchers who have no interest in anything beyond
> that and developers controlled by "all that matters is time to
> market" salesdroids.
Yes, that's the way it is. Reminds me of some really bad software I
have seen. I would get into legal trouble if I gave examples. So, I
find that there is no way out of this.
> The difference in providing facilities for end users and for
> third-party developers is similar, but is a lot more subtle. I do
> know that one of the major reasons that it is so hard to get the
> messages across is that few people nowadays have any real
> familiarity with all aspects of developing, supporting and using
> products. That never was common, but is as rare as hen's teeth in
> people under 50.
A way out of this may be to educate these people differently. If
there is one principal designer (the Software Architect) who writes
the software interfaces (.h files or .ads in ADA), while someone else
writes the implementations. Is this likely to happen ? No, "we cant
afford to pay two when one will do".
[Thus endeth the thread. Thanks to all, it was certainly interesting. -John]
Out of curiosity, IS there a newsgroup for language design? The best I
could come up with are comp.lang and alt.comp.lang, both of which only seem
to contain only spam.
- Oliver
[ Try comp.lang.misc. -John]
There is ll1 discussion list that might be interesting:
<https://lists.csail.mit.edu/mailman/listinfo/ll-discuss>.
Though it is a list, not a newsgroups.
--
Ivan Boldyrev
XML -- new language of ML family.
You might also want to have a look at Lamport's paper "How to write a
long formula" (DEC SRC-119, 1994), possibly still findable at
<http://gatekeeper.dec.com/pub/DEC/SRC/research-reports/SRC-119.pdf>,
in which he relates some experience with notations for presenting
lengthy program-correctness formulas.
> >[human-readable theorem-prover output]
> >In addition, I have recently started to use Unicode UTF-8, using correct
> >mathematical symbols instead of ASCII words. This turns out to help up the
> >code a great deal.
>
> You might also want to have a look at Lamport's paper "How to write a
> long formula" (DEC SRC-119, 1994), possibly still findable at
> <http://gatekeeper.dec.com/pub/DEC/SRC/research-reports/SRC-119.pdf>,
> in which he relates some experience with notations for presenting
> lengthy program-correctness formulas.
I have a vague memory of coming across this paper before. The kind of
constructions suggested here are already use in math in various forms,
used to simplify notation. The reason it does not show up so clearly
in Lamport's context, is that he is only focusing on formulas alone,
whereas I write on a program supporting statements like theorems
grouped into theories as well. Then it turns out that mathematicians
use those structures to simplify formulas, simply by putting
in definitions ahead of the formulas, instead of the "let .. in"
construction that Lamport suggests.
And as for the "if else" construct he discusses, my program is built
up around a semantic type, which right now is quite similar to that of
abstract syntax tree (AST), then called simply "semantic tree". One
difference is that the latter is usually an intermediate in language
translation, whereas in my theory, the former is the primary object.
(Another difference is that one may not use trees; in an alternate
theory, bound variables could for example be represented by arrows to
the heads that bind them. This then has certain advantages and
disadvantages.) Thus, the parser is just an interface for the
translation into the semantic tree, and there is no fixed requirement
of what syntax a parser should handle. This approach is necessary in a
context of pure math, because there is no agreement of what a syntax
should be, and if the mathematician cannot let notions and notation
flow together, the formulas quickly become unreadable. (One should
note that this approach also breaks off from much traditional
metamathematics, which is usually syntactic, not semantic, in its
study of the object mathematical theory.) Then, once one has internal
semantic trees to compute around, there is the opposite problem, the
"expressing" or "pretty-printing", and again, there is no fixed
requirement of what it should be. Right now, I have some hardwired
parsing and expressing that have the effect that the output is a quite
similar, but formatted version, of the input (which unproved
substatements marked down), also giving the chance, at will, to
inspect the found proofs as a form of debugging. This suffices for me
right now, but I am thinking of hooking up some more general parsing
and expressing engines later, and one reason of doing so is that it
will simplify programming, as I then can focus more on the computing
engine. So, if carried out, Lamport's "if else" suggestion can be done
by the one designing an input grammar, and a person which does not
like a grammar used for a particular input, could format the output
into something else.
As for the third suggestion, the numbering of subformulas,
similar constructs are in use in math, namely, in the form of a
theorem with subparts that should be proved separately. In this case,
one sometimes wants to do it, because a more general proof method can
be applied to a sequence of formulas. I am looking to implement such
things as well, which is not difficult, just taking up time, but
important, if one should be able to enter the theories in an efficient
human way.
Otherwise, the comparison between pure mathematics and the structure
of modern computer languages is very interesting. For example, pure
math is definitely declarative, and so is my program. The proof of a
theorem can have subtheorems, which correspond to nested environments
in a computer language. Theorems and other statements are grouped into
theories, and the latter correspond to the modules of a computer
language. For example, a reductio ad absurdum proof in math (which
appears frequently, hidden in various forms), is formally a relation
between two different theories, one with the contradiction one
without. So if one should adhere to strict formality, theories
developed as modules become necessary. And it seems that the lambda
calculus, which corresponds to functional languages, will show up
fairly quickly, needed to trace back statements when these have a
special form.
So, many of the logical structures used in pure mathematics and
computer language design, apparently developed the last couple of
decades wholly independently of each other, use similar underlying
logical superstructures. Perhaps it is due to the human way, perhaps
it is due to cross-feeding; I do not know. It suggests however,
though, that the two, in the future, will flow together more.
--
Hans Aberg
In designing the metalanguage for the language machine I deliberately
avoided encouraging deeply bracketing structures. As I show in the
lambda experiment, you can think of rules that recognise and
subsititute grammatical sequences as equivalent to functions with
matching in the style of ML and HASKELL, except that nested recognition
phases can and do most of the time occur as part of the matching
process. So if you encourage deep nesting, it could occur on both the
left- and right-sides of rules, and the results can become pretty
difficult to arrange on the page.
Of course rules in the language machine are essentially simple
replacements, and the metalanguage is intended to emphasise that fact.
Incidentally, once you understand that unrestricted rules that
recognise and substitute grammatical sequences are so closely related
to functions in functional languages, it becomes clear why it is so
hard to make sense of unrestricted rules when they are written in the
traditional Chomsky/BNF generative direction - it's the equivalent of
having to write the name and arguments of the function after its body.
Our brains are very flexible, true, but they can also get pretty bent.
Analysis is the coalface of language, generative engines cannot
directly do any kind of analysis, and thinking indirectly about things
that can be tackled directly is probably bad for the brain.
Somewhere in the history of this topic was quite a lot of discussion
about van Wijngaarden w-grammars. It seems to me that they were a
valiant attempt to stretch the envelope of what you can do with the
generative model of language. But now, in the language machine you have
a reasonably efficient and usable system which is now shown to be
Turing-complete, and which directly applies unrestricted analytical
rules.
Any volunteers to attempt translating w-grammars to rules in the
language machine? It might even be feasible - whether desirable or
useful I'm really not sure.
Considering the technical realm it occurs to me that there is a simple
and straightforward implementation for self modifying grammars.
The appeal of domain specific languages come to mind and also lots of
reasons to take great care. the tower of babel is coming...?
This must have been the reason why few popular high level languages
provide true metaprogramming and even less programmers trust this
technology. Still i find e.g. the c++ mechanisms for compile-time
programming disgusting.
In my practical work i gained lots of positive experiences in
generative programming, which almost always was accomplished by
preprocessors or generators, usually as writen in the same language as
the generated code.
My desire for clean design has been calling for relief for years and
it won't stop. My hope is that we might find better ways to bring
marriages to our language zoo in the future.
> Any volunteers to attempt translating w-grammars to rules in the
> language machine? It might even be feasible - whether desirable or
> useful I'm really not sure.
In spite of the fact that i'm currently quite occupied, my first interest
would be to accrete more language platforms into your project. A proper
c(++) port would be a boost.
Very interesting as well and probably OnTopic here is an abstraction layer
for conversion between turing complete languages. I cannot say i have found
useful ressources but i cannot imagine that the should be no opinion about
it out there.
The Language of the LanguageMachine feels hairy here. Seems to be more of an
assembly than a structured way to formulate. But the cure is inside. Time
again...
Regards
- Robert Figura
If weeks last for months, you may wonder how long a day is.
--
/* mandlsig.c v0.23 (c) by Robert Figura */
I=1702;float O,o,i;main(l){for(;I--;putchar("oO .,\nm>cot.bitamea\
@urigrf <raguFit erobR"[I%74?I>837&874>I?I^833:l%5:5]))for(O=o=l=
0;O*O+o*o<(16^l++);o=2*O*o+I/74/11.-1,O=i)i=O*O-o*o+I%74*.04-2.2;}
Remember the original PL/1 language and the original Algol 68 report?
Finite time is not enough - it needs to be bounded by a reasonable
function in the size of a program.
>My desire for clean design has been calling for relief for years and
>it won't stop. My hope is that we might find better ways to bring
>marriages to our language zoo in the future.
I sympathise, but am likely to retire with my desire unsatisfied.
You may be younger, but you may well do the same.
Regards,
Nick Maclaren.
The lm-diagram shows that for unrestricted rules you need two nesting
systems that may and often will overlap - the notational equivalent is
a system of two interleaved bracket pairs (eg '(' ')' for start-end of
recognition phase, '[' ']' for start-end of substitution phase), as in:
.------------------------. .-----.
| .-----------. | | |
| | | | | |
( ... ( ... )[ ... )[ ... ( ... ] ... ] ... )[ ... ]
| | | | | |
| '-----' | | |
'-------------------' '------------------'
Here '...' represents a sequence of symbols and variable bindings that
have been matched/consumed in the course of the analysis, the brackets
are as described above, and the ascii art above and below the line
represents overlapped nestings of recognition phases (below the line)
and substitution phases (above the line) . Any symbol that is not
covered by nesting above the line comes from the external input. The
scope rules for variable reference relate to the nesting structure of
the recognition phases in which variables are bound.
My lambda experiment shows that this rule system in effect contains the
lambda calculus - and incidentally it does this without any kind of
self-modifying grammar - I have designed the rule system specifically
to permit grammatical rules to be added and deleted at run-time, but
have never yet needed to add the operations that would make this
possible.
The lambda experiment is intended as an educational and informative
exercise that would help place the language machine in relation to
theory and in relation to questions that are well understood - I don't
intend to vanish utterly into the 'Turing tarpit'.
On the question of the metalanguage notation, I have had this piece of
feedback:
> the syntax is weird at first glance but is in fact really concise
and efficient. I love it
I agree that the metalanguage is in effect the high-level assembler of
the language machine - but my experience has been that it gets the job
done, and I have put immense effort into making it concise and easy to
use. If you add structured complexity to the notation, you are setting
up yet another level of counterpoint, and as the lm-diagram shows, you
are already dealing with a kind of counterpoint at run-time (JS Bach
would have loved it).
I also agree about the disgustingness of c++ compile-time features, and
about the possible usefulness of doing a c++ implementation, analyser,
or converter - perhaps I'll just have to hold my nose and get on with
it.
In fact Remy Moueza has already modified my d2d translator to help in
converting c++ to D, but there are details that I need to look into
before posting the results on the website. During the 1980s I used a
previous implementation of these language techniques to translate C to
a variant/derivative of Algol68. This used grammatical rules and symbol
tables to deal with type analysis, and proved capable of porting itself
and a minimal runtime library to a completely different machine where
it continued in use for some time. Certainly c++ is harder than C, but
I doubt if there is anything that the language machine cannot handle.
So here's a question: would anyone sponsor a project that will use the
language machine to produce a retargettable frontend for c++ and at
least a demonstration backend for use in analysing and converting c++?
I could do this either within the existing project at sourceforge, or
start a new project to do it.
Peri Hankey - http://languagemachine.sourceforge.net - the language
machine