Assuming that this will raise a chorus of "NO's", I'll then put
this forward: if I have to create one, what would be a good
starting point? I'm considering stripping the central works out
of either BSD YACC or Allen Holub's Occs, and reusing the service
routines and overall organization. I could also look at
DeRemer's LALR(2) parser-generator for I.B.M. mainframes. I
know -- there's supposed to be a bug in it, but I didn't run into
it when I was using it a few years ago; anyhow, I'd keep my eyes
open.
BTW, what's the current state of Holub's Occs? The stuff that I
got from him has files dated in 1991.
All the best !
John H. Lindsay,
Department of Mathematics and Computer Science,
Royal Military College of Canada, (**)
Kingston, Ontario, Canada, K7K 5L0.
Internet: Lind...@RMC.CA
Phone: (613) 541-6000 Extension 6419
Fax: (613) 542-8129
(**) Royal Military College is both a degree-granting university
and the military college for Canada's Armed Forces (our Army,
Navy and Air Force are all parts of just one organization which
is VERY frequently called on for U.N. peacekeeping service).
R.M.C. is small; we only had 750 undergraduates until just
recently, have about 1200 for 1995-96 and will have about 900 in
the future due to Canada having seriously downsized the military
and closed our other two Military Colleges, Royal Roads Military
College in Victoria, British Columbia, and College Militaire
Royal in Saint-Jean, Quebec (We'll miss them!). The students
and a few staff were transferred here. We have about 200
graduate students and we have programmes in Engineering, Arts,
and the hard Sciences, each in both English and French (just
over 1/3 of Canada speaks French). In spite of R.M.C.'s size,
we seem to produce a Rhodes Scholar every 2 or 3 years.
--
Send compilers articles to comp...@iecc.com,
meta-mail to compiler...@iecc.com.
> I'm working with a language definition that needs an LALR(2)
> parser-generator for a couple of good reasons; does anyone have
> one or know where I can get one ?
The SYNTAX scanner and parser generator, developed at INRIA by Pierre
Boullier <Pierre....@inria.fr>, has a Regular-LR parser generator
(among many other goodies) that subsumes LALR(2) (I think).
In addition to being that powerful, SYNTAX is also a very mature,
production-quality system that is free for education and research
purposes.
Unfortunately, Pierre put too much effort into making SYNTAX better than
its competitors and not enough into making it more famous...
Disclaimer: I've been working side by side with Pierre for a long time
(we've even co-authored a paper on SYNTAX's nice error recovery techniques)
and I'm a very satisfied user of SYNTAX for my own purposes.
I hope this helps.
Martin Jourdan
Action Charme, INRIA, Rocquencourt, France
Phone +33-1-39-63-54-35, fax +33-1-39-63-56-98, Martin....@inria.fr
> I'm working with a language definition that needs an LALR(2)
> parser-generator for a couple of good reasons;
Although it is quite common to find non-LALR(1) constructs in
programming language grammars, in most cases they are easily (?)
removed by rewriting the grammar. I'm curious about the reasons
for needing LALR(2) power. What kind of language structure creates
such an unusual requirement? Wouldn't a full LR(1) parser
generator perhaps be enough for the task?
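For illustration, here is a toy yacc-style sketch (made-up symbols,
not taken from any real grammar) of the kind of rewrite I mean. The
first fragment needs two tokens of lookahead to pick a reduction; the
second removes the problem by delaying the decision:

    /* Not LALR(1): after shifting 'x' the parser must reduce to either
       a_ctx or b_ctx, but the next token is ID in both cases; only the
       token after that (';' vs '=') decides, so this is LALR(2). */
    stmt  : a_ctx ID ';'
          | b_ctx ID '=' ID ';'
          ;
    a_ctx : 'x' ;
    b_ctx : 'x' ;

    /* LALR(1) rewrite: fold the offending nonterminals into their uses,
       so no early reduction decision is needed. */
    stmt  : 'x' ID ';'
          | 'x' ID '=' ID ';'
          ;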
Paulo S. L. M. Barreto -- Software Analyst -- Unisys Brazil
Standard disclaimer applies ("I do not speak for Unisys", etc.)
Hello John,
a good alternative to an LALR(n), n>1, or even LR(n), n>1, tool is
a tool with locally definable trial parsing (or backtrack parsing).
One advantage of such an approach is better speed, since in most
productions no lookahead longer than 1 has to be analysed. Another
advantage is that such trial parsing is de facto an analysis with
unlimited lookahead, so that you can formally describe even
nondeterministic grammars like the C++ grammar.
I am aware of two such tools based on bottom-up techniques:
LADE by Xorian Ltd and Lark by J. Grosch and his CocoLab GmbH.
During an evaluation phase I observed that LADE-generated parsers
are very, very slow. Compared to that, the parsers generated by Lark
are really quick. It is known that, without trial parsing, the
lalr/Lark parsing machine outperforms the yacc parsing machine
by a factor of 1.5-3, depending on the particular grammar and input,
_and_ provides automatic error detection/correction on the fly.
With trial parsing, some additional computational effort is
necessary (for example, one or two if's per token to ask for the
current parsing mode). Furthermore, Lark provides fine support for
a number of languages, whereas LADE supports just C++.
Lark supports Modula-2, ANSI C, C++, Eiffel and Ada
(and possibly some more).
It's a pleasure to use this tool and to parse almost
impossible looking grammars with it!
Thomas.
--
Thomas Herter email: thomas...@mch.sni.de
+89 636-49973 10027...@compuserve.com
> Although it is quite common to find non-LALR(1) constructs in
> programming language grammars, in most cases they are easily (?)
> removed by rewriting the grammar.
Of course this is true: AFAIK, there is a theorem that states that every
language that has an LR(k) grammar also has an LR(1) grammar. The main
problem when applying such grammar transformations is that the resulting
grammar may be quite far from the "natural" grammar and may make semantic
processing quite awkward. I guess this is part of the "couple of very good
reasons" the original poster invoked to justify his need.
In some sense, this is similar to the high-level programming language
debate: you can write any program in assembly language, it's just much
simpler and more natural in Pascal, C, ML, you name it. You can describe
any language with an LR(1) grammar, it's just much simpler and more
natural when you don't have to bother with the "k" in LR(k).
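As a toy sketch of that awkwardness (hypothetical rule and action
names, yacc-style): in the natural grammar each context carries its own
action, while the LR(1)-friendly rewrite flattens the contexts away,
so the actions have to be moved and duplicated.

    /* Natural but LALR(2): after TYPE the parser must pick a head to
       reduce, yet the deciding token (';' vs '=') is two tokens away. */
    stmt      : decl_head ID ';'
              | init_head ID '=' NUM ';'
              ;
    decl_head : TYPE   { begin_decl($1); }      ;
    init_head : TYPE   { begin_init_decl($1); } ;

    /* LALR(1) rewrite: the heads are inlined, so their actions are
       duplicated and pushed to the end of each rule; keeping them as
       mid-rule actions would re-introduce the conflict, because yacc
       turns mid-rule actions into hidden nonterminals. */
    stmt : TYPE ID ';'            { begin_decl($1); }
         | TYPE ID '=' NUM ';'    { begin_init_decl($1); }
         ;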
Martin Jourdan
Action Charme, INRIA, Rocquencourt, France
Phone +33-1-39-63-54-35, fax +33-1-39-63-56-98, Martin....@inria.fr
> a good alternative to an LALR(n), n>1, or even LR(n), n>1, tool is
> a tool with locally definable trial parsing (or backtrack parsing).
There's also a free backtracking variant of yacc, which may be particularly
interesting if you already have experience with ordinary yacc, or need to
extend a grammar that is already available for yacc. I don't remember
where I got it from, but you may be able to find out by mailing the author,
whose email address is given in the README file which I'm appending below.
------------------------------------------------------------------------------
BTYACC -- backtracking yacc.
BTYACC was created by Chris Dodd using ideas from many places and lots
of code from the Berkeley Yacc distribution, which is a public domain
yacc clone put together by the good folks at Berkeley. This code is
distributed with NO WARRANTY and is public domain. It is certain to
contain bugs, which you should report to:
do...@csl.sri.com
See the README.BYACC, NEW_FEATURES, and ACKNOWLEDGEMENTS files for more
about Berkeley Yacc and other sources of info.
BTYACC is a modified version of yacc that supports automatic
backtracking and semantic disambiguation to parse ambiguous grammars, as
well as syntactic sugar for inherited attributes (which tend to
introduce conflicts). Whenever a btyacc-generated parser runs into a
shift-reduce or reduce-reduce conflict in the parse table, it remembers
the current parse point (yacc stack and input stream state), and goes
into trial parse mode. It then continues parsing, ignoring most rule
actions. If it runs into an error (either through the parse table or
through an action calling YYERROR), it backtracks to the most recent
conflict point and tries a different alternative. If it finds a
successful parse (reaches the end of the input or an action calls
YYVALID), it backtracks to the point where it first entered trial parse
mode, and continues with a full parse (executing all actions),
following the path of the successful trial.
Actions in btyacc come in two flavors -- {}-actions, which are only
executed when not in trial mode, and []-actions which are executed
regardless of mode. There are also inherited attributes, which look
like arguments (they're enclosed in `()') and act like []-actions.
What this buys you:
* No more lexer feedback hack. In yacc grammars for C, a standard hack,
known as the `lexer feedback hack', is used to find typedef names. The
lexer uses semantic information to decide if any given identifier is
a typedef-name or not and returns a special token. With btyacc, you
no longer need to do this; the lexer should just always return an
identifier. The btyacc grammar then needs a rule of the form:
typename: ID [ if (!IsTypeName(LookupId($1))) YYERROR; ]
While the hack works adequately well for parsing C, it becomes a
nightmare when you try to parse something like C++, where treating
an ID as a typedef becomes heavily dependent on context.
* Easy disambiguation via simple ordering. Btyacc runs its trials via
the rule ``try shifting first, then try reducing by the order that the
conflicting rules appear in the input file''. This means you can deal
with a semantic disambiguation rule like:
[1] If it looks like a declaration it is, otherwise
[2] If it looks like an expression it is, otherwise
[3] it is a syntax error
[Ellis & Stroustrup, Annotated C++ Reference Manual, p93]
To deal with this, you need only put all the rules for declarations
before the rules for expressions in the grammar file (see the sketch
after this list).
* No extra cost if you don't use it. Backtracking is only triggered when
the parse hits a shift/reduce or reduce/reduce conflict in the table. If
you have no conflicts in your grammar, there's no extra cost, other than
some extra code which will never be invoked.
* C++ and ANSI C compatible parsers. The parsers produced by btyacc
can be compiled with C++ correctly. If you `#define' YYSTYPE to be
some C++ type with constructor and destructor, everything will work fine.
My favorite is `#define YYSTYPE SmartPointer', where SmartPointer is a
smart pointer type that does garbage collection on the pointed-to
objects.
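As a rough sketch of the ordering rule from the second point above
(hypothetical nonterminals, not taken from a real C++ grammar), putting
the declaration alternative first makes btyacc try it before falling
back to the expression reading:

    statement
        : declaration       /* tried first: "if it looks like a
                               declaration it is" */
        | expression ';'    /* tried only if the declaration trial fails */
        ;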
BTYACC was originally written to make it easy to write a C++ parser
(my goal was to be able to use the grammar out of the back of the
ARM with as few modifications as possible). Anyone who has ever looked
at Jim Roskind's public domain C++ yacc grammar, or the yacc-based
grammar used in g++ knows how difficult this is. BTYACC is very
useful for parsing any ambiguous grammar, particularly ones that
come from trying to merge two (or more) complete grammars.
Inherited attributes in btyacc:
Inherited attributes look a lot like function arguments to non-terminals,
which is what they end up being in a recursive descent parser, but NOT
how they're implemented in btyacc. Basically they're just syntactic
sugar for embedded semantic actions and $0, $-1, ... in normal yacc.
btyacc gives you two big advantages besides just the syntax:
1. it does type checking on the inherited attributes, so you don't
have to specify $<type>0, and it makes sure you give the correct
number of arguments (inherited attributes) to every use of a
non-terminal.
2. It `collapses' identical actions that are produced from
inherited attributes. This eliminates many potential reduce-reduce
conflicts arising from the inherited attributes.
You use inherited attributes by declaring the types of the attributes in
the preamble with a type declaration and declaring names of the attributes
on the lhs of the yacc rule. You can of course have more than one
rule with the same lhs, and you can even give them different names in each,
but the type and number must be the same.
Here's a small example:
%type <t1> lhs(<t1>, <t2>) /* lhs takes two inherited attributes */
stuff(<t1>, <t2>)
%%
lhs($i1, $i2) : { $$ = $i1; }
    | lhs($i1, $i2) stuff($1, $i2) { $$ = $2; }
    ;
This is roughly equivalent to the following yacc code:
lhs : { $$ = $<t1>-1; }
| lhs [ $<t1>$ = $1; ] [ $<t2>$ = $<t2>0; ] stuff { $$ = $4; }
;
See the file `test/t2.y' for a longer and more complete example.
At the current time, the start symbol cannot have any arguments.
Marc Wachowitz <m...@ipx2.rz.uni-mannheim.de>
Sometimes you can use the scanner lookahead to help with this kind of problem,
for example:
Scanner:
id - Letter Letter*
Parser:
S -> Xid "=" Xid "."
Xid -> id | id "." id
In LALR(1) you will have a shift/reduce conflict with ".". But it is
LALR(2).
Using the scanner lookahead, the grammar can be rewritten as:
Scanner:
id - Letter Letter*
struct_id = Letter Letter* LOOKAHEAD("." Letter Letter*)
Parser:
S -> Xid "=" Xid "."
Xid -> id | struct_id "." id
And this new LALR(1) grammar doesn't have a shift/reduce conflict.
The struct_id token will only be recognized if ("." id) follows the first id.
for the input:

    a = a.x.
    |   | |
    |   | |-- id
    |   |---- struct_id
    |-------- id
In lex the lookahead operator is "/"; many other scanner/parser generators
have this operator as well.
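For instance, a rough lex-style sketch of the rewritten scanner (the
token names are made up for the illustration; the first rule uses the
"/" trailing-context operator to return STRUCT_ID only when the
identifier is followed by "." and another identifier):

    %{
    /* Token codes would normally come from the parser; made up here. */
    enum { ID = 257, STRUCT_ID, DOT, EQ };
    %}
    %%
    [A-Za-z]+/"."[A-Za-z]+    return STRUCT_ID;
    [A-Za-z]+                 return ID;
    "."                       return DOT;
    "="                       return EQ;
    [ \t\n]+                  ;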
Maybe you can use something like this to help you use an LALR(1) grammar
for your language.
Best Regards,
Frankie Arzu
Computer Science Depatment,
Universidad del Valle de Guatemala
E-mail: fa...@uvg.edu.gt
sunguat!assist!fra...@sun.com
Thanks to lind...@rmc.ca (John H. Lindsay) who asked for such a tool two
months ago. I took this question as the impetus to finish my implementation.
Best regards
Josef Grosch
CoCoLab
Hagsfelder Allee 16
D-76131 Karlsruhe
Germany
Tel.: +49-721-697061
Fax : +49-721-661966
Mail: gro...@cocolab.sub.com