Yacc/Lex for Java?

Yinghao Ma

unread,

Apr 3, 1997, 3:00:00 AM4/3/97

to

I'm doing work on a compiler using Java. Does anyone know if there is some
tools like Yacc/Lex produce Java code?
Thank for your help!

Yinghao

----------------------------------------------------------------------------
y...@cis.ohio-state.edu http://www.cis.ohio-state.edu/~yma

Rune Braathen

unread,

Apr 4, 1997, 3:00:00 AM4/4/97

to

Yinghao Ma <y...@peach.cis.ohio-state.edu> writes:

> I'm doing work on a compiler using Java. Does anyone know if there is some
> tools like Yacc/Lex produce Java code?

Indeed it is:

JavaLex
http://www.cs.princeton.edu/~appel/modern/java/JavaLex/

JavaCUP (yacc-ish)
http://www.cc.gatech.edu/gvu/people/Faculty/hudson/java_cup/home.html

jax/jell (not maintained, I think)
http://www.blackdown.org/~kbs/old.html

--
__________________________________________________________________
runeb / cF - ru...@td.org.uit.no - http://www.td.org.uit.no/~runeb
a new life awaits you, in the off-world colonies.

gru...@bnl.gov

unread,

Apr 4, 1997, 3:00:00 AM4/4/97

to

Yinghao Ma (y...@peach.cis.ohio-state.edu) wrote:
: I'm doing work on a compiler using Java. Does anyone know if there is some

: tools like Yacc/Lex produce Java code?

: Thank for your help!

JavaCC is the Java equivalent. Unlike JavaCup and JavaLex, which
are independent projects, JavaCC is directly supported by Sun Labs.
This in itself, of course, does not make it better, just more official.
However I use it, and it is not bad. The designers are also fairly responsive
in terms of support. Pick it up (for free) at:

http://www.suntest.com/JavaCC/

George Ruban
gru...@gte.com

Bob Horvath

unread,

Apr 4, 1997, 3:00:00 AM4/4/97

to

On 04 Apr 1997 11:21:19 +0200, Rune Braathen <ru...@td.org.uit.no>
wrote:

>Yinghao Ma <y...@peach.cis.ohio-state.edu> writes:
>
>> I'm doing work on a compiler using Java. Does anyone know if there is some
>> tools like Yacc/Lex produce Java code?
>
>

>Indeed it is:
>
>JavaLex
>http://www.cs.princeton.edu/~appel/modern/java/JavaLex/
>
>JavaCUP (yacc-ish)
>http://www.cc.gatech.edu/gvu/people/Faculty/hudson/java_cup/home.html
>
>jax/jell (not maintained, I think)
>http://www.blackdown.org/~kbs/old.html
>
>--

There is also JavaCC (formerly known as Jack) that is more like PCCTS
than lex and yacc. It is a different kind of parser, which having
worked with lex and yacc (but not their Java counterparts) I prefer.

Check it out at:

http://www.suntest.com/JavaCC/

Robert Edison

unread,

Apr 4, 1997, 3:00:00 AM4/4/97

to

Checkout PCCTS at http://java.magelang.com/antlr/download.html

Bob Edison
Kronos Inc

Yinghao Ma wrote:
>
> I'm doing work on a compiler using Java. Does anyone know if there is some
> tools like Yacc/Lex produce Java code?

Bob Horvath

unread,

Apr 4, 1997, 3:00:00 AM4/4/97

to

On Fri, 04 Apr 1997 13:08:50 -0500, Robert Edison <bed...@kronos.com>
wrote:

>Checkout PCCTS at http://java.magelang.com/antlr/download.html
>

I have looked a little at PCCTS, but have already done most of what I
need to do in JavaCC. I was curious if you could comment on
differences or similarities between the two. I know that JavaCC is
loosely based on PCCTS, so I am guessing that it might be further
along on some things that JavaCC doesn't have (such as an error
recover mechanism).

Also, is antlr much different than the C/C++ PCCTS implementation?

Bob Horvath

unread,

Apr 4, 1997, 3:00:00 AM4/4/97

to

On Fri, 04 Apr 1997 17:58:43 -0500, Liwei Zhao
<lwz...@cis.ohio-state.edu> wrote:

>Thank for all your helps!
>
>Maybe I have sticked to Yacc/Lex. JavaCC seems a little bit strange to
>me at the
>first glance, since it's top-down structure and can't handle
>left-recursion. If JavaCup/JavaLex can work together(hopefully they
>will), and don't have too much bugs, probably I prefer it. Any comments?
>

Yes. I much prefer the top down approach to building parsers. You
can pass data up and down the grammar as need be to build a parse
tree. They also have a thing called JJTree that automatically builds
a parse tree. I haven't tried that particular part of it yet.

Once you get used to it, then using the top down style is much more
intuitive.

Consider the following. Lets say you have a list of 0 or more things.
In lex an yacc, you end up doing something like (with syntactic
simplified to make it easier to make a point):

list : /* nothing */
| thing
| list thing

meaning that a list can be nothing, one thing, or recursively a list
followed by a new thing.

In JavaCC you would have the following:

list()

(thing())*

meaning that a list is zero or more things.

I am not sure if a simple example like this is informative enough, but
as things get more complicated, it is a lot easier to maintain the
JavaCC way rather than the lex/yacc way. I always found I had a
trickle down effect in lex and yacc, whereas the JavaCC grammars I
have done so far seem to be much close to the problem then the lex and
yacc grammars I have built.

I think if I had to build a C based parser, I would check out PCCTS
instead of lex and yacc as it is top down based as well.

Paul Hepworth

unread,

Apr 4, 1997, 3:00:00 AM4/4/97

to

In article <Pine.SUN.3.95.97040...@peach.cis.ohio-state.edu>, Yinghao Ma <y...@peach.cis.ohio-state.edu> writes:
> I'm doing work on a compiler using Java. Does anyone know if there is some
> tools like Yacc/Lex produce Java code?
> Thank for your help!

You have several options:

For parsers, you have four options that I know of:

There is JavaLex. I have used this and been very pleased. It is similar
to traditional lex, but has some syntax differences and some OOP features.

There is jf, which is a Java program that converts flex (or lex) output into
java code. (i.e. it uses the "real" flex) I have not used this, since I
already had my lexer done in JavaLex when I discovered it. (Also see jb.)

There is Jell, which I have not used. (Also see Jax.)

Lastly, I believe Sun has a lexer to complement its parser (q.v.).

For parsers, you also have four options:

There is a LALR(1) parser called JavaCUP. I have not used it because I
required yacc's accociativity and precedence rules for ambiguous grammars,
which CUP does not support. (Ambiguous grammars are often used in conjunction
with precedence and associativity rules to make more efficient arithmetic
expression parsers.)

There is jb, which is a Java program that converts bison (or yacc) output into
java code. (i.e. it uses the "real" bison) I have used this system and was
very pleased. (Although I did have to do a bit of hacking of the jb templates
to get it to work in my application. Noteably, it duplicates constants across
classes rather than placing them in an interface; boo! ) I suspect that if
you used both jf and jb, rather than forcing jb to work with JavaLex, as I
did, you wouldn't have to hack the templates (althought I'd still encourage
you to fix the duplicated constants isssue).

There is Jax, the counterpart of Jell. (Sorry if I have their names reversed.)
This parser is an LL(1) parser. While not as powerful as yacc or CUP, it is
simpler to understand and might make your parsers easier to debug.

There is Sun's LL(k) parser, formerly called Jack (I forget its new name).
This one comes with a working grammar for the Java language, which demonstrates
that even a complex language may be specified with LL(k). LL(k) parsers about
equally as powerful as LALR(1) parsers. (Just increase k until it will take
your grammar :-) Note that this is LL(k) not LL(1). (You choose the k.)
I have not used this one (yet).

I have not provided URLs; you can look them up as easy as I.

IHTH.

Paul

Liwei Zhao

unread,

Apr 4, 1997, 3:00:00 AM4/4/97

to gru...@bnl.gov

Thank for all your helps!

Maybe I have sticked to Yacc/Lex. JavaCC seems a little bit strange to
me at the
first glance, since it's top-down structure and can't handle
left-recursion. If JavaCup/JavaLex can work together(hopefully they
will), and don't have too much bugs, probably I prefer it. Any comments?

Yinghao

VISWANADHA SREENIVASA R

unread,

Apr 4, 1997, 3:00:00 AM4/4/97

to

Yinghao Ma <y...@peach.cis.ohio-state.edu> wrote:
>I'm doing work on a compiler using Java. Does anyone know if there is some
>tools like Yacc/Lex produce Java code?

Check out JavaCC available at :

http://www.suntest.com/JavaCC

Sreeni.

Steve E. Chapel

unread,

Apr 5, 1997, 3:00:00 AM4/5/97

to

hor...@enteract.com (Bob Horvath) writes:

>On Fri, 04 Apr 1997 17:58:43 -0500, Liwei Zhao
><lwz...@cis.ohio-state.edu> wrote:

>>..JavaCC seems a little bit strange... since it's top-down...

>I much prefer the top down approach to building parsers...

[Traditional yacc/bison code]

> list : /* nothing */
> | thing
> | list thing

>In JavaCC you would have the following:

>list()
> (thing())*

That seems to me to simply be syntactic sugar that has nothing to
do with whether the parser is parsing bottom-up or top-down.

Granted, the sugar is nice, but I think your implication that the
reason JavaCC allows it while yacc and bison do not has something
to do with whether the parser is LL(k) or LALR(1) is just wrong.

[Help stamp out rampant cross-posting. Newsgroups trimmed.]
--
Steve Chapel sch...@cs.ucsb.edu http://www.cs.ucsb.edu/~schapel
Billy: Hello. I'm Billy. I hear radio waves in my head.
DJ Jim: You hear radio waves in your head? Uh, heh-heh, is there
a request that you have tonight for KAOS?

Bob Horvath

unread,

Apr 6, 1997, 4:00:00 AM4/6/97

to

On 5 Apr 1997 22:03:02 -0800, sch...@cs.ucsb.edu (Steve E. Chapel)
wrote:

>hor...@enteract.com (Bob Horvath) writes:
>
>>On Fri, 04 Apr 1997 17:58:43 -0500, Liwei Zhao
>><lwz...@cis.ohio-state.edu> wrote:
>
>>>..JavaCC seems a little bit strange... since it's top-down...
>
>>I much prefer the top down approach to building parsers...
>
>[Traditional yacc/bison code]
>
>> list : /* nothing */
>> | thing
>> | list thing
>
>>In JavaCC you would have the following:
>
>>list()
>> (thing())*
>
>
>That seems to me to simply be syntactic sugar that has nothing to
>do with whether the parser is parsing bottom-up or top-down.
>
>Granted, the sugar is nice, but I think your implication that the
>reason JavaCC allows it while yacc and bison do not has something
>to do with whether the parser is LL(k) or LALR(1) is just wrong.
>
>

You are correct. That is more a EBNF issue vs. a BNF issues I guess.

If you are more up on your compiler theory, please help explain the
difference between LALR(1) and LL(k).

I just know that, in my opinion, JavaCC grammars are easier to
maintain as they are closer to the problem domain. You can pass
arguments up AND down the tree, which is very helpful, and you don't
have to jump thorugh hoops to do things like lists.

Scott Stanchfield

unread,

Apr 7, 1997, 3:00:00 AM4/7/97

to

I don't play a compiler writer on TV, but I work as one in real-life...

About the basic difference between LL and LALR schemes of parsing... (LALR
is a nice enhancement on LR that my boss, Frank DeRemer, came up with
(plug, plug))

LL parsing is completely "prediction based." The parser looks at a certain
number of tokens (words, operators, strings, etc) and has to decide which
way the parse should proceed. In an LL(1) parse, the parser gets to look
at exactly the next token and decide which alternative to proceed with.
There are several nice enhancements to LL parsing, such as adding
predicates to allow semantic information help guide the parse, and
backtracking (if one alt fails, just go back and try another). LL parsing
can be implemented using a stack or, more commonly, as a mutually-exclusive
set of rule functions (known as a recursive-descent parser).

LALR parsing takes a "keep looking at input until we can make a decision"
standpoint. Until a decision can be made, the current input token is
pushed on a stack. When we get to a point where the stack matches only one
possible production, we "reduce" the stack, replacing the matched, topmost
portion with the production non-terminal that matched it. In most LALR
parser-generators, such as yacc, a large state-machine is used to keep
track of the current "stack" state.

There are several pros and cons to each, as few of which are (my opinion ==
"I think")

Debugging -- I think LL is easier -- not an objective opinion, mind you...

Strength of representation -- LALR grammars are much easier to write -- LL
grammars cannot have left-recursion in rules or common prefixes (they must
be left-factored)

Speed -- Depends very much on the implementation, size of generated tables
(for LALR), depth of rule calls (for recursive-descent LL)

Error recovery/reporting -- LL is stronger here as there is more context
information. Think of LL as gathering information from the top of the file
being parsed, LALR as gathering groups of information at the top of the
stack -- LALR matches from the rightmost-edge of the rule being matched; LL
matches from the left

Tools -- there are several good tools for both methods. A few common ones:
LALR -- yacc is the standard - there are several variants & several other
tools (I think GMD is LALR?)
LL -- PCCTS (my favorite), JavaCC

A lot depends on how comfortable you are with the methodology and the tools
available. (Personally I think yacc is rather poor...)

I'm a _bit_ biased toward PCCTS as I'm currently working on a nifty
debugger for PCCTS 2.0 (written in and generating Java source), and I'm
working on a tutorial book for it.

You can find info on some parser tools that generate Java at:

Purdue/Parr Compiler Construction Tool Set (PCCTS):
http://java.magelang.com/antlr
Java Compiler Compiler (JavaCC aka Jack): http://www.suntest.com

Both are LL tools, but I think there are some LALR tools out there too...

For more compiler info, try
comp.compilers -- general compiler talk
comp.compilers.tools.pccts -- PCCTS talk

- Scott
--
---------
Scott Stanchfield, Santa Cruz CA
See my PCCTS Tutorial at
http://www.scruz.net/~thetick/pcctstut

Bob Horvath <hor...@enteract.com> wrote in article
<3347e8be....@news.enteract.com>...

Bob Horvath

unread,

Apr 7, 1997, 3:00:00 AM4/7/97

to

On 7 Apr 1997 02:25:15 GMT, "Scott Stanchfield" <the...@scruz.net>
wrote:

>I don't play a compiler writer on TV, but I work as one in real-life...
>

>About the basic difference between LL and LALR schemes of parsing...

[edited out the very good explanation - thanks]

>There are several pros and cons to each, as few of which are (my opinion ==
>"I think")
>
>Debugging -- I think LL is easier -- not an objective opinion, mind you...

I would tend to agree.

>
>Strength of representation -- LALR grammars are much easier to write -- LL
>grammars cannot have left-recursion in rules or common prefixes (they must
>be left-factored)

I find the opposite. I found there was a lot of trickle down effect
when I worked with yacc. Can you give some examples of why you think
left recursion is so important?

So far what I have tried doing in JavaCC (my only LL experience),
didn't make me feel like I needed much left recursion, but it is not a
very good example. I have not done a lot in either yacc or javaCC (and
none with PCCTS).

>Tools -- there are several good tools for both methods. A few common ones:
> LALR -- yacc is the standard - there are several variants & several other
>tools (I think GMD is LALR?)
> LL -- PCCTS (my favorite), JavaCC
>
>A lot depends on how comfortable you are with the methodology and the tools
>available. (Personally I think yacc is rather poor...)
>
>I'm a _bit_ biased toward PCCTS as I'm currently working on a nifty
>debugger for PCCTS 2.0 (written in and generating Java source), and I'm
>working on a tutorial book for it.
>

Have you looked at JavaCC at all to be able to comment on similarities
or differences. I know that JavaCC is loosely based on PCCTS, so I am
assuming that PCCTS, (or Antlr as it seems to be called now), is
further along. As someone who might consider Antlr in the future, but
is currently a JavaCC user, here is your chance for a plug.

BTW, what does ANTLR stand for?

Steve E. Chapel

unread,

Apr 8, 1997, 3:00:00 AM4/8/97

to

"Scott Stanchfield" <the...@scruz.net> writes:

>> On 5 Apr 1997 22:03:02 -0800, sch...@cs.ucsb.edu (Steve E. Chapel) wrote:
>> If you are more up on your compiler theory, please help explain the
>> difference between LALR(1) and LL(k).
>> I just know that, in my opinion, JavaCC grammars are easier to
>> maintain as they are closer to the problem domain. You can pass
>> arguments up AND down the tree, which is very helpful, and you don't
>> have to jump thorugh hoops to do things like lists.

Just for the record, I didn't write any of that.

Please be careful when quoting... since my header is followed by >>, then
logically anything I wrote would be followed by >>> anyway.

Scott Stanchfield

unread,

Apr 9, 1997, 3:00:00 AM4/9/97

to

(Sorry -- bad cut -- at least I wasn't flaming you...)
-- Scott

--
---------
Scott Stanchfield, Santa Cruz CA
See my PCCTS Tutorial at
http://www.scruz.net/~thetick/pcctstut

Steve E. Chapel <sch...@cs.ucsb.edu> wrote in article
<5iego6$6...@wacko.cs.ucsb.edu>...

> "Scott Stanchfield" <the...@scruz.net> writes:
>
> >> On 5 Apr 1997 22:03:02 -0800, sch...@cs.ucsb.edu (Steve E. Chapel)
wrote:

[snip]

Nik Shaylor

unread,

Apr 11, 1997, 3:00:00 AM4/11/97

to

On 4 Apr 1997 14:47:09 GMT, gru...@bnl.gov () wrote:

>Yinghao Ma (y...@peach.cis.ohio-state.edu) wrote:
>: I'm doing work on a compiler using Java. Does anyone know if there is some
>: tools like Yacc/Lex produce Java code?

>: Thank for your help!
>
>JavaCC is the Java equivalent. Unlike JavaCup and JavaLex, which
>are independent projects, JavaCC is directly supported by Sun Labs.

Yes but JavaCC is LL(k) and yacc is LALR(1) and that's not exactly equivalent.

>This in itself, of course, does not make it better, just more official.
>However I use it, and it is not bad. The designers are also fairly responsive
>in terms of support.

I'm sure that is so. I have used JavaCUP and JavaLex in a couple of projects.
JavaCUP has always performed flawlessly, but I have had a few problems with
JavaLex. I would like to see a better version of this.

Nik Shaylor

Bill Foote, HotJava Developer

unread,

Apr 12, 1997, 3:00:00 AM4/12/97

to

In article <334e36ef...@news.tcp.co.uk>,

Nik Shaylor <nsha...@tcp.co.uk> wrote:
>On 4 Apr 1997 14:47:09 GMT, gru...@bnl.gov () wrote:
>
>>Yinghao Ma (y...@peach.cis.ohio-state.edu) wrote:
>>: I'm doing work on a compiler using Java. Does anyone know if there is some
>>: tools like Yacc/Lex produce Java code?
>>: Thank for your help!
>>
>>JavaCC is the Java equivalent. Unlike JavaCup and JavaLex, which
>>are independent projects, JavaCC is directly supported by Sun Labs.
>
>Yes but JavaCC is LL(k) and yacc is LALR(1) and that's not exactly equivalent.

Right. JavaCC is more ANTLR-like than yacc-like.

Personally, I've found ANTLR and JavaCC easier to deal with than yacc.
I've had pretty good luck with JavaCC -- I was able to take the 1.0.2-based
Java grammar and hack together a working Java 1.1 grammar pretty quickly
as part of a Java code browser I hacked together.

Doing some earlier work with ANTLR, I found that I spent _much_ less
time chasing down mysterious error messages (like the infamous "5
shift/reduce conflicts" messages). I also found it quite straightforward
to swap parts out when I got to performance-tuning time (it was a
simple grammar, and in that case a hand-tuned lexer was trivial to
write and considerably faster).

If you have yacc grammars hanging around that you want to port, though,
a yacc-like product would probably be easier to use.
--
Bill Foote bill....@eng.sun.com
HotJava Developer, JavaSoft http://java.sun.com/people/billf/

Steve E. Chapel

unread,

Apr 15, 1997, 3:00:00 AM4/15/97

to

sl...@cc.usu.edu (Paul Hepworth) writes:

>...There is Sun's LL(k) parser, formerly called Jack (I forget its new name).

JavaCC, for Java Compiler-Compiler

>This one comes with a working grammar for the Java language, which demonstrates
>that even a complex language may be specified with LL(k). LL(k) parsers about
>equally as powerful as LALR(1) parsers. (Just increase k until it will take

>your grammar :-) Note that this is LL(k) not LL(1). (You choose the k.)...

According to a diagram in _Modern_Compiler_Implementation_in_Java:_Basic_
_Techniques_ by Andrew Appel, LALR(1) includes some languages that LL(k)
doesn't, and LL(k) includes some languages that LALR(1) doesn't. It appears
that "reasonable" languages are generally both LL(k) and LALR(1).

I wonder if anyone could tell me what a language that is LALR(1) but not
LL(k) would be like, or what a language that is LL(k) but not LALR(1) would
be like? Are all modern programming languages really LL(k)? And what about
specific details about how you parse top-down? I know you'll need k tokens
of lookahead, and this lookahead will determine which leaf in the parse
tree built so far to expand (which derivation of the grammar to apply), but
exactly how is this done? A reference to an appropriate URL or book would
be appreciated. None of the books I've seen go into the details of LL(k)
parsing. :-(

--
Steve Chapel sch...@cs.ucsb.edu http://www.cs.ucsb.edu/~schapel

Psi Factor quote of the week:
"I don't see anything in the Vegas' profile that would indicate why
Mrs. Vega has supernatural powers."

Scott Hudson

unread,

Apr 16, 1997, 3:00:00 AM4/16/97

to

sch...@cs.ucsb.edu (Steve E. Chapel) writes:
>
>According to a diagram in _Modern_Compiler_Implementation_in_Java:_Basic_
>_Techniques_ by Andrew Appel, LALR(1) includes some languages that LL(k)
>doesn't, and LL(k) includes some languages that LALR(1) doesn't.

Comparing LL(*K*) to LALR(1) is not exactly a fair comparison. Also,
this statement is about the set of languages that can be handled by the
techniques. It is usually more to the point to consider the set of
grammars that are accepted. There are normally many grammars that
generate the same language. The question is often whether you have to
translate a "simple" grammar into a "convoluted" one in order to make
it work under the technique.

A little cleaner way to look at all this is to note that LL(1) is a
strict subset of LALR(1), both in terms of languages accepted *and*
grammars (i.e., all LL(1) grammars are also LALR(1), but definitely not
the other way around). To my mind, this gives LALR a definite edge if
you consider only syntax.

The other issues that come up in the LL vs. LR question are usually in
regard to the ease with which semantic actions can be placed within the
grammar. Adding actions to an LR grammar in the middle of productions
can introduce conflicts (which are sometimes difficult to understand
and work around). This is not the case for LL -- you can put actions
anywhere and it always works.

For this reason some people conclude that LL is easier to deal with in
regard to semantic actions. However, this is a bit of red herring,
since you can show that if you feed any valid LL grammar (including
arbitrary actions) into an LR parser generator, it will never produce a
conflict. So, you only get action induced conflicts when the grammar
is beyond the things LL can handle. [Note, this result is for LR(1).
See "Compilers: Principles, Techniques, and Tools" by Aho, Sethi, and
Ullman, Section 5.6. The result may hold for LALR(1), but I'm not
certain, and don't have time to work it out at present.]

> It appears
>that "reasonable" languages are generally both LL(k) and LALR(1).

I would agree with that statement for languages. For grammars, I'm not
sure I would make such a statement. No left-recursive grammars are
LL(k) for any k, so this means that you must always do left-recursion
removal. This makes the typical grammar for expressions, for example,
a bit messier. Instead of something like:

E ::= E + T | T
T ::= T * F | F
F ::= number

you have to transform into something like:

E ::= T R1
R1 ::= + T | empty
T ::= F R1
R2 ::= * F | empty
F ::= number

Based on all the above, I would personally pretty much always choose
LALR(1) over LL(1).

Scott Hudson
[Author of CUP -- a Java-based LALR parser generator]

Steve E. Chapel

unread,

Apr 16, 1997, 3:00:00 AM4/16/97

to

hud...@cc.gatech.edu (Scott Hudson) writes:

[snip]

>...No left-recursive grammars are

>LL(k) for any k, so this means that you must always do left-recursion
>removal. This makes the typical grammar for expressions, for example,
>a bit messier. Instead of something like:

> E ::= E + T | T
> T ::= T * F | F
> F ::= number

>you have to transform into something like:

> E ::= T R1
> R1 ::= + T | empty
> T ::= F R1
> R2 ::= * F | empty
> F ::= number

[snip]

This is another point that several posters have made that I've been wondering
about. Can't the compiler-compiler remove the left recursion manually? You
could then feed it a grammar like the top, and it would automatically
transform the rules until left recursion was eliminated. I remember learning
an algorithm for doing this, and it didn't seem too complex, or memory or
time consuming. Why don't LL(k) compiler-compilers use this algorithm?

Bernie Lofaso

unread,

Apr 17, 1997, 3:00:00 AM4/17/97

to Nik Shaylor

Nik Shaylor wrote:

> I'm sure that is so. I have used JavaCUP and JavaLex in a couple of projects.
> JavaCUP has always performed flawlessly, but I have had a few problems with
> JavaLex. I would like to see a better version of this.

I had numerous problems using JavaLex so I abandoned it and wrote my
own.
It's called JLex. It supports lex-type patterns but does not implement
context patterns or lexer states (the latter may get implemented later
this
year). I'll provide copies to people who are interested as long as you
understand there's no warrenty and no support. (Bug reports accepted but
feature requests are likely to be ignored.)

Bernie Lofaso
Applied Research Labs

Scott Stanchfield

unread,

Apr 17, 1997, 3:00:00 AM4/17/97

to

Steve E. Chapel wrote:
>
> This is another point that several posters have made that I've been wondering
> about. Can't the compiler-compiler remove the left recursion manually? You
> could then feed it a grammar like the top, and it would automatically
> transform the rules until left recursion was eliminated. I remember learning
> an algorithm for doing this, and it didn't seem too complex, or memory or
> time consuming. Why don't LL(k) compiler-compilers use this algorithm?

Big problems in LALR-->LL conversion

(Left-recursion in a minute...)

Problem 1 -- left factoring
First and foremost in my mind -- recursive-descent parsers that are
generated by most LL tools (such as PCCTS) have the great property that
the generated code maps very nicely to the rules. This makes debugging
much easier, as you can trace the code in a source debugger and know how
the code relates to the rules. (Of course this will be moot when my
PCCTS debugger comes out this summer... ;) )

And a more "real" reason...
"Because the algorithm only accounts for _plain_ grammars."

If the grammar only has terminals and non-terminals, no problem.
The problem comes when you have actions and other things such as
predicates in the grammar. For example:

rule
: {action1;} a B C
| {action2;} a D E
;

How do you automatically left factor it? (I've been thinking about
doing something like this for PCCTS for a while, but this is one of hte
cases that result in big problems.) You'd get something like:

rule
: {action1 OR action2} a
( B C
| D E
)
;

(if you used subrules).

How do you determine which action to perform? If you moved the actions
_after_ the reference as in

rule
: a ( {action1;} B C
| {action2;} D E
)
;

But -- what if something in action1 or action2 depended on the fact that
"a" had not yet been matched? Problem. Big problem.
Of course, auto-left factoring could be done if several restrictions
were placed on where actions were placed...

Problem 2 -- left recursion
The algorithm for left recursion removal generates some just-plain ugly
code... (OK -- my opinion speaking here.) This hurts the debugging
argument above.

I think this could be done (I don't think action placement hurts here)
but I think left-recursion should be discouraged when tools support much
nicer (and more readable) notations.

Take your typical yacc "list of stuff"

listOfStuff
: widget
| listOfStuff COMMA widget
;

much less readable than

listOfStuff
: widget (COMMA widget)*
;

in a tool like PCCTS. Give in to your feelings! Trust the Horse!
(huh?)

Left recursion can get really nasty to figure out, and in most cases it
just takes a little though to rewrite it as a nice EBNF closure.

-- Scott

> --
> Steve Chapel sch...@cs.ucsb.edu http://www.cs.ucsb.edu/~schapel
> Psi Factor quote of the week:
> "I don't see anything in the Vegas' profile that would indicate why
> Mrs. Vega has supernatural powers."

--
Scott Stanchfield Santa Cruz,CA
Visit my PCCTS Tutorial at http://www.scruz.net/~thetick/pcctstut