I'm the maintainer of the GCC for BCPL project
(http://gccbcpl.sourceforge.net) and have recently got to the stage
where I need some help from fellow compiler-people!
Recently work has been spent on validating the BCPL grammar, but this
has forced a re-evaluation of using Bison to create the parser. The
original BCPL parsers were recursive-descent and certain language
features (particularly the optional semicolon problem) seem to point
towards their use now. However, does anyone with experience of BCPL
feel that Bison is adequate for generating a BCPL parser? As soon as
this stage is over, the development focus can move back to GCC
integration and the building of the (GCC 4.0) intermediate
representations.
Another minor issue, which was overlooked when I posted it to the GCC
mailing lists recently, is the potential re-use of a generic symbol
table structure that may already exist within the GCC
infrastructure. Does anyone know if there is one, or if there are any
guidelines for symbol tables when developing front ends with GCC? I
have already re-used an existing table, but I wondered whether there
was any need to do so. [Apologies for the GCC-centric nature of this
question.]
Thanks and regards,
Tom Crick
tomc...@users.sourceforge.net
http://gccbcpl.sourceforge.net
Hmm. I know BCPL, but not Bison :-)
My reaction is that that isn't a very good way to proceed, and that it
might well be simpler and cleaner to do a bootstrapping job via a
BCPL interpreter. I.e.:
    Take a BCPL compiler/interpreter (such as Martin Richards's!),
    and add an alternate mode to generate the GCC internal form (or
    something suitable to read in to generate it).
    Bootstrap by running that on itself and feeding its output
    into GCC - voila! - a GCC compiler for BCPL.
Of course, things won't be quite as simple as that. Have you asked
Martin Richards what he thinks?
Regards,
Nick Maclaren.
> It might be interesting to explain why you abandoned recursive
> descent, and what the reasons were for doing so.
>
> >Recently work has been spent on validating the BCPL grammar, but this
> >has forced a re-evaluation of using Bison to create the parser. The
> >original BCPL parsers were recursive-descent and certain language
> >features (particularly the optional semicolon problem) seem to point
> >towards their use now.
That was partly my reason for posting to the list! The use of a
bottom-up parser generator such as Bison seems to create some problems
when trying to adequately describe some of the language features in
BCPL. It seems that certain features (particularly the optional
semicolon as a command separator, unlike C where it is a terminator)
would be easier to handle with a top-down, recursive-descent parser,
like the original BCPL parsers. I had wondered if anyone had any
experience with BCPL grammars in Bison and whether it would be
worthwhile to rewrite the parser.
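
To make the separator point concrete, here is a minimal sketch in Python
of how a hand-written recursive-descent parser can treat the semicolon
as an optional separator. It is not taken from the GCC for BCPL sources
or from any existing BCPL compiler. The toy grammar (a sequence of
`NAME := NUMBER` commands), the names `Token`, `tokenize`, `Parser` and
`parse`, and the rule that a line break before a token that can start a
command implies a separator are all assumptions made for illustration;
the real BCPL rule covers many more token kinds.

```python
import re
from typing import List, NamedTuple, Tuple


class Token(NamedTuple):
    kind: str   # "NAME", "NUMBER", "ASSIGN", "SEMI" or "EOF"
    text: str
    line: int   # source line, needed to detect an implied separator


# Hypothetical toy lexer: names, numbers, ":=" and ";" only.
TOKEN_RE = re.compile(
    r"\s*(?:(?P<NAME>[A-Za-z][A-Za-z0-9]*)|(?P<NUMBER>\d+)"
    r"|(?P<ASSIGN>:=)|(?P<SEMI>;))"
)


def tokenize(source: str) -> List[Token]:
    tokens = []
    lineno = 0
    for lineno, line in enumerate(source.splitlines(), start=1):
        line = line.rstrip()
        pos = 0
        while pos < len(line):
            m = TOKEN_RE.match(line, pos)
            if m is None:
                raise SyntaxError(f"line {lineno}: unexpected character")
            tokens.append(Token(m.lastgroup, m.group(m.lastgroup), lineno))
            pos = m.end()
    tokens.append(Token("EOF", "", lineno + 1))
    return tokens


class Parser:
    def __init__(self, tokens: List[Token]) -> None:
        self.tokens = tokens
        self.pos = 0

    def peek(self) -> Token:
        return self.tokens[self.pos]

    def expect(self, kind: str) -> Token:
        tok = self.tokens[self.pos]
        if tok.kind != kind:
            raise SyntaxError(f"line {tok.line}: expected {kind}, got {tok.kind}")
        self.pos += 1
        return tok

    def parse_command(self) -> Tuple[str, int]:
        # command ::= NAME ":=" NUMBER
        name = self.expect("NAME")
        self.expect("ASSIGN")
        value = self.expect("NUMBER")
        return (name.text, int(value.text))

    def parse_sequence(self) -> List[Tuple[str, int]]:
        # sequence ::= command { separator command }
        # The separator is ";" or, by assumption, a line break followed
        # by a token that can start a command (here: a NAME).
        commands = [self.parse_command()]
        while True:
            prev = self.tokens[self.pos - 1]
            nxt = self.peek()
            if nxt.kind == "SEMI":
                self.pos += 1                      # explicit separator
            elif nxt.kind == "NAME" and nxt.line > prev.line:
                pass                               # implied separator
            else:
                break
            commands.append(self.parse_command())
        self.expect("EOF")
        return commands


def parse(source: str) -> List[Tuple[str, int]]:
    return Parser(tokenize(source)).parse_sequence()
```

The decision about whether a separator is present is an ordinary
conditional that looks at the previous and next tokens. A Bison grammar
would probably need the lexer to synthesise the missing semicolons
before the parser sees them, or need extra productions that are likely
to introduce conflicts.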
Cheers,
Tom
As a (non-BCPL) data point on this, in "The Design and Evolution of C++",
Bjarne Stroustrup comments (p. 68):
    In 1982 when I first planned Cfront [the preprocessor from C++ to C],
    I wanted to use a recursive descent parser because I had experience
    writing and maintaining such a beast, because I liked such parsers'
    ability to produce good error messages, and because I liked the idea
    of having the full power of a general-purpose programming language
    available when decisions had to be made in the parser. However, being
    a conscientious young computer scientist I asked the experts. Al Aho
    and Steve Johnson were in the Computer Science Research Center and
    they, primarily Steve, convinced me that writing a parser by hand was
    most old-fashioned, would be an inefficient use of my time, would
    almost certainly result in a hard-to-understand and hard-to-maintain
    parser, and would be prone to unsystematic and therefore unreliable
    error recovery. The right way was to use an LALR(1) parser generator,
    so I used Al and Steve's YACC.

    For most projects, it would have been the right choice. For almost
    every project writing an experimental language from scratch, it would
    have been the right choice. For most people, it would have been the
    right choice. In retrospect, for me and C++ it was a bad mistake.

He was dealing with much the same situation: C was originally designed
for recursive descent, and fitting it into LALR(1) wasn't easy. PCC
had a yacc parser for C, but in fact it wasn't right: it handled some
of the more obscure cases, later significant in C++, incorrectly. An
LALR(1) grammar for C eventually appeared as part of the ANSI C work,
too late. There have been repeated LALR(1) problems as C++ has
evolved.
--
"Think outside the box -- the box isn't our friend." | Henry Spencer
-- George Herbert | he...@spsystems.net