P6C: Parser Weirdness

Abhijit A. Mahabal

unread,

May 8, 2004, 11:55:30 PM5/8/04

to perl6-i...@perl.org

I was writing a few tests for the P6 parser and ran into a weird problem.
If I have the following in a file in languages/perl6, it works as
expected:

The file contains:
===========
use P6C::Parser;
use Perl6grammar;

my $grammar = new Perl6grammar;
print $grammar->sigil('$'),"\n";
print $grammar->signature('(+$x)'), "\n";
===========

It prints out:
=============================
P6C::sigil=ARRAY(0x8a1000c)
P6C::signature=ARRAY(0x8a10738)
=============================

And that is what I expected.

Now, if I put exactly the same contents in a file in
languages/perl6/t/parser, then I get the following output:

==========================
P6C::sigil=ARRAY(0x89ffbc8)
at /u/amahabal/site_perl/lib/perl5/site_perl/5.8.0/Parse/RecDescent.pm
line 2822

Parse::RecDescent::_parserepeat('Parse::RecDescent=HASH(0x8a8abfc)','+$x)','CODE(0x87b9598)',0,1,'','undef')
called at Perl6grammar.pm line 6938

Parse::RecDescent::namespace000001::sigparam('Parse::RecDescent=HASH(0x8a8abfc)','+$x)','','','undef')
called at Perl6grammar.pm line 24049

Parse::RecDescent::namespace000001::signature('Parse::RecDescent=HASH(0x8a8abfc)','(+$x)','undef','undef','CODE(0x89ff214)')
called at
/u/amahabal/site_perl/lib/perl5/site_perl/5.8.0/Parse/RecDescent.pm line
2797

Parse::RecDescent::AUTOLOAD('Parse::RecDescent=HASH(0x8a8abfc)','(+$x)')
called at t/parser/foo3.t line 6
==============================

(I added a confess to Parse::Recdescent where it was failing).

Clearly, something happened right as the ->sigil() case worked okay. I
have not yet been able to hunt the problem down, and was hoping that
somebody knows whats happening off the top of their heads. I do not want
to put all the tests in the top directory :)

--Abhiit

Abhijit Mahabal http://www.cs.indiana.edu/~amahabal/

Abhijit A. Mahabal

unread,

May 9, 2004, 10:51:20 PM5/9/04

to perl6-i...@perl.org

On Sat, 8 May 2004, Abhijit A. Mahabal wrote:

> I was writing a few tests for the P6 parser and ran into a weird problem.
> If I have the following in a file in languages/perl6, it works as
> expected:

[...]

> Now, if I put exactly the same contents in a file in

> languages/perl6/t/parser, then I get the following error:

Okay, I traced the problem to a "use FindBin" in P6C::Parser.pm. Is it
okay to change

use lib $FindBin::Bin/../../lib;

to

use lib ../../lib;

or is there a good reason not to? The current version makes it necessary
to put all files that "use P6C::Parser" in the same directory, and the
change would allow:

perl t/parser/foo.t

to work.

--Abhijit

Steve Fink

unread,

May 10, 2004, 1:06:48 PM5/10/04

to Abhijit A. Mahabal, perl6-i...@perl.org

On May-09, Abhijit A. Mahabal wrote:
> On Sat, 8 May 2004, Abhijit A. Mahabal wrote:
>
> > I was writing a few tests for the P6 parser and ran into a weird problem.
> > If I have the following in a file in languages/perl6, it works as
> > expected:
>
> [...]
>
> > Now, if I put exactly the same contents in a file in
> > languages/perl6/t/parser, then I get the following error:
>
>
> Okay, I traced the problem to a "use FindBin" in P6C::Parser.pm. Is it
> okay to change
>
> use lib $FindBin::Bin/../../lib;
>
> to
>
> use lib ../../lib;
>
> or is there a good reason not to?

Neither of those seems right to me. The first keys off of the position
of the binary, which could be anywhere with respect to the library
module you're in; the second is relative to whatever the current
directory is while you're running the script. I would think that
something like

use File::Basename qw(dirname);
use lib dirname($INC{"P6C/Parser.pm"})."/../../../../lib";

(untested and probably not quite the right path) would be better. Or
perhaps it should be ripped out entirely, and any script using
P6C::Parser should be required to set the lib path correctly? It
partly depends on whether we want to ensure that P6C::Parser
preferentially uses the Parse::RecDescent from parrot/lib rather than
a system-provided one. Which probably is not the case?

> The current version makes it necessary
> to put all files that "use P6C::Parser" in the same directory, and the
> change would allow:
>
> perl t/parser/foo.t
>
> to work.

Just make sure

cd t; perl parser/foo.t

works too.

Abhijit A. Mahabal

unread,

May 10, 2004, 1:24:55 PM5/10/04

to Steve Fink, perl6-i...@perl.org

On Mon, 10 May 2004, Steve Fink wrote:

> use File::Basename qw(dirname);
> use lib dirname($INC{"P6C/Parser.pm"})."/../../../../lib";

I had already tried that, and it doesn't seem to work. I guess it is some
timing issue: $INC{"P6C/Parser.pm"} gets defined after P6C::Parser.pm is
loaded (I think).

>
> Just make sure
>
> cd t; perl parser/foo.t
>
> works too.
>

It doesn't yet :( We do need a clean solution.

--abhijit

Allison Randal

unread,

May 10, 2004, 1:34:53 PM5/10/04

to Abhijit A. Mahabal, Steve Fink, perl6-i...@perl.org

It doesn't work for any of them. You can only run the tests from the
languages/perl6 directory. This needs to change at some point.

Even worse, I started working on some unit tests for the various P6C
modules and discovered some really nasty 'use' circles (X uses Y uses Z
uses X), that meant the modules had to be loaded in a very specific
order. I cleaned up a few of them, but I'm sure there are more lurking.

Anyway, yeah, we need a clean solution. All suggestions cheerfully
welcomed. :)

And don't forget, this is just a prototype. All the code will be
replaced as soon as we get a working grammar engine.

Allison

Joseph Ryan

unread,

May 10, 2004, 1:41:12 PM5/10/04

to Steve Fink, perl6-i...@perl.org

Steve Fink wrote:

> Neither of those seems right to me. The first keys off of the position
>
>of the binary, which could be anywhere with respect to the library
>module you're in; the second is relative to whatever the current
>directory is while you're running the script. I would think that
>something like
>
> use File::Basename qw(dirname);
> use lib dirname($INC{"P6C/Parser.pm"})."/../../../../lib";
>
>(untested and probably not quite the right path) would be better. Or
>perhaps it should be ripped out entirely, and any script using
>P6C::Parser should be required to set the lib path correctly? It
>partly depends on whether we want to ensure that P6C::Parser
>preferentially uses the Parse::RecDescent from parrot/lib rather than
>a system-provided one. Which probably is not the case?
>

The Parse::RecDescent in parrot/lib is a hacked version that removes
a bunch of stuff (tracing code, iirc) from the outputted grammer so
that it runs many orders faster than the regular version. Or, to
put it another way, it increases P6C's runspeed from "infuriating"
to "slow" :)

- Joe

Dan Sugalski

unread,

May 10, 2004, 1:42:32 PM5/10/04

to Allison Randal, Abhijit A. Mahabal, Steve Fink, perl6-i...@perl.org

At 12:34 PM -0500 5/10/04, Allison Randal wrote:
>And don't forget, this is just a prototype. All the code will be
>replaced as soon as we get a working grammar engine.

I'm not so sure about that. (Not that it won't be replaced, but that
it needs the grammar engine) I'm pretty sure that grammars as Larry's
defined 'em are recursive-descent, and if that's true then I've this
really nasty feeling we're going to find it insufficiently fast to
parse perl. Could be wrong, but even if we pick up two orders of
magnitude in speed over Parse::RecDescent things'll still be
significantly more sluggish than the current perl 5 parser.

(OTOH, if the grammar engine's not required to be recursive-descent
that makes things rather faster, which is fine with me :)
--
Dan

--------------------------------------"it's like this"-------------------
Dan Sugalski even samurai
d...@sidhe.org have teddy bears and even
teddy bears get drunk

Allison Randal

unread,

May 10, 2004, 2:24:49 PM5/10/04

to Dan Sugalski, perl6-i...@perl.org

Dan Sugalski wrote:
>
> I'm not so sure about that. (Not that it won't be replaced, but that
> it needs the grammar engine) I'm pretty sure that grammars as Larry's
> defined 'em are recursive-descent, and if that's true then I've this
> really nasty feeling we're going to find it insufficiently fast to
> parse perl. Could be wrong, but even if we pick up two orders of
> magnitude in speed over Parse::RecDescent things'll still be
> significantly more sluggish than the current perl 5 parser.
>
> (OTOH, if the grammar engine's not required to be recursive-descent
> that makes things rather faster, which is fine with me :)

That's a question with multiple layers of answers. :)

The abstraction we need to meet is: a) everything that can be done in
the parser that ships with Perl 6 can be done in Perl 6; and b) it is
possible to change the way Perl 6 is parsed within Perl 6 code (by
loading a custom grammar module, for example).

But, there's some flexibility in how we satisfy that abstraction. It
might be reasonable to have "fast parsing" that merely comes close to
simulating recursive-descent, as long as we also provide a (slower) full
recursive-descent parser that produces the same PIR output.

Certainly some things won't be recursive-descent anyway (operator
precedence).

Also, it may be a little too early to decide that one method of parsing
will be fast or slow. IIRC we nearly ditched CPS because it was too
slow, then found a way to speed it up.

Allison

Dan Sugalski

unread,

May 11, 2004, 10:48:22 AM5/11/04

to Allison Randal, perl6-i...@perl.org

At 1:24 PM -0500 5/10/04, Allison Randal wrote:
>Dan Sugalski wrote:
>>
>> I'm not so sure about that. (Not that it won't be replaced, but that
>> it needs the grammar engine) I'm pretty sure that grammars as Larry's
>> defined 'em are recursive-descent, and if that's true then I've this
>> really nasty feeling we're going to find it insufficiently fast to
>> parse perl. Could be wrong, but even if we pick up two orders of
>> magnitude in speed over Parse::RecDescent things'll still be
>> significantly more sluggish than the current perl 5 parser.
>>
>> (OTOH, if the grammar engine's not required to be recursive-descent
>> that makes things rather faster, which is fine with me :)
>
>That's a question with multiple layers of answers. :)
>
>The abstraction we need to meet is: a) everything that can be done in
>the parser that ships with Perl 6 can be done in Perl 6; and b) it is
>possible to change the way Perl 6 is parsed within Perl 6 code (by
>loading a custom grammar module, for example).

Right, that bit I get. :)

>But, there's some flexibility in how we satisfy that abstraction. It
>might be reasonable to have "fast parsing" that merely comes close to
>simulating recursive-descent, as long as we also provide a (slower) full
>recursive-descent parser that produces the same PIR output.

The problem there is that if we do this then the first time someone
overrides a parser rule they'll go from millisecond parse times to
second parse times.

Recdescent parsers do an extraordinary amount of work for what you
get. It's a massively brute-force way of parsing, which is part of
the problem--that brute-force path is mandated by the parsing
algorithm. The grammar you build a recdescent parser out of, as
Damian has pointed out to me, isn't inherently different than the one
you'd build a bottom-up parser from, it's the behavior of the built
parser that's the issue. There are constructs that aren't valid in
recdescent parsers (that whole left-recursive thing mainly) but
that's true of all the different parser types.

I suppose I could go try convince Larry to lift the guarantees of
behaviour for the grammar engine (well, besides correctness, at
least) for this. Dunno that it'll be too successful without abysmal
benchmarks to show with it, though. :)

>Certainly some things won't be recursive-descent anyway (operator
>precedence).

Oh, don't be too sure of that. :) Operator precedence can be done in
a recdescent grammar straightforwardly enough. It's a bit longwinded,
but doable.

>Also, it may be a little too early to decide that one method of parsing
>will be fast or slow. IIRC we nearly ditched CPS because it was too
>slow, then found a way to speed it up.

Nope, CPS was originally not an option because of brain-cramp issues,
not speed. The difference in speeds between top-down and bottom-up
parsers run in the orders of magnitude, and there's not too much we
can do to make that faster.

Matt Fowles

unread,

May 11, 2004, 11:17:40 AM5/11/04

to Dan Sugalski, Allison Randal, perl6-i...@perl.org

All~

If I recall my dragon book correctly, shift reduce parsers (which are
what most compilers use) can actually parse a larger class of grammars
then recursive descent parsers which can only parse LL(k) grammars. So
that is another good reason to have it. Perhaps Perl 6 grammars should
provide an is parsed trait that allows one to specify which type of
parsing to use, then we could dictate that the default behavior for
parsing or perl itself is shift reduce parsing rather than recursive
descent.

At least this way people know when they are making a trade off between,
speed and other things.

Matt

Steve Fink

unread,

May 11, 2004, 12:12:03 PM5/11/04

to Joseph Ryan, perl6-i...@perl.org

On May-10, Joseph Ryan wrote:
>
> The Parse::RecDescent in parrot/lib is a hacked version that removes
> a bunch of stuff (tracing code, iirc) from the outputted grammer so
> that it runs many orders faster than the regular version. Or, to
> put it another way, it increases P6C's runspeed from "infuriating"
> to "slow" :)

I think I've been told that at least once before, and forgotten. Is
there a good place on the wiki FAQ where this could be inserted?

Steve Fink

unread,

May 11, 2004, 12:46:02 PM5/11/04

to Matt Fowles, Dan Sugalski, Allison Randal, perl6-i...@perl.org

Top-down and bottom-up are not mutually exclusive. At least not
completely. But self-modifying parsers are *much* easier to do with
top-down than bottom-up, because the whole point of bottom-up is that
you can analyze the grammar at "compile" (parser generation) time, and
propagate the knowledge throughout the rule engine.

A simple example is /fish|job|petunia/. Rather than trying to match
/fish/ and upon failure, trying /job/, and as a last resort /petunia/,
you could do a dispatch table on the first letter and never have to
fall back to anything. In the case of /fish|job|<petunia>/, however,
you can't guarantee that FIRST(<petunia>) will always be "p".

You could generate both parsers and then use a notification or
dependency system to pick which to use, but done naively you end up
with an exponential number of parsers and the logic is likely to be
both slow and a bug magnet.

You could compile the parser assuming everything is final and then at
runtime regenerate the entire parser if anything changes. Which
wouldn't work for long, but perhaps you could break the grammar down
into components that are individually compiled bottom-up, but
coordinated top-down. Then you could limit the scope of recompiles in
the bottom-up components, and not need recompiles in the top-down
structure. The decisions of where to break things down would be quite
similar to a regular compiler's inlining decisions. It may be that
grammars are just too recursive for this to help much, though.

I suspect a slight variant of the above may work best. Rather than
doing a full-out LALR(1) parser for the bottom-up components, you'd do
a somewhat more naive but still table-driven (shift/reduce) parser,
carefully limiting what it is assuming about the FIRST() etc. of the
rules within it. That should limit the impact of changes, and simplify
the logic of what needs to be done differently when a change is
detected.

On May-11, Matt Fowles wrote:
>
> Perhaps Perl 6 grammars should provide an is parsed trait that
> allows one to specify which type of parsing to use, then we could
> dictate that the default behavior for parsing or perl itself is
> shift reduce parsing rather than recursive descent.

Optimization hints could also be very helpful, or we could even
default to a total recursive-descent parser and only attempt bottom-up
precomputation if the grammar author specifically says it's ok. The
main problem being that people will say lots of things if it makes
their code faster, without having any idea what it actually means.

Allison Randal

unread,

May 11, 2004, 5:14:12 PM5/11/04

to Steve Fink, perl6-i...@perl.org

Steve Fink wrote:
>
> I suspect a slight variant of the above may work best. Rather than
> doing a full-out LALR(1) parser for the bottom-up components, you'd do
> a somewhat more naive but still table-driven (shift/reduce) parser,
> carefully limiting what it is assuming about the FIRST() etc. of the
> rules within it. That should limit the impact of changes, and simplify
> the logic of what needs to be done differently when a change is
> detected.

It's the best alternative I've heard so far. If anyone had the time,
skills, and interest, I'd be inclined to set them on it. Even if the
results didn't end up meeting our needs for Perl 6, they'd be useful for
other HLLs.

> Optimization hints could also be very helpful, or we could even
> default to a total recursive-descent parser and only attempt bottom-up
> precomputation if the grammar author specifically says it's ok. The
> main problem being that people will say lots of things if it makes
> their code faster, without having any idea what it actually means.

In which case, they get exactly what they deserve. :)

Allison

Larry Wall

unread,

May 12, 2004, 12:59:51 PM5/12/04

to perl6-i...@perl.org

On Tue, May 11, 2004 at 10:48:22AM -0400, Dan Sugalski wrote:
: >Certainly some things won't be recursive-descent anyway (operator

: >precedence).
:
: Oh, don't be too sure of that. :)

Oh, I think Allison can be quite sure of that. :-)

In fact, I'd go so far as to say that it's almost impossible to do
recursive descent when you allow for defining new operator precedence
levels on the fly as Perl 6 does.

: Operator precedence can be done in

: a recdescent grammar straightforwardly enough. It's a bit longwinded,
: but doable.

Yes, of course, it *can* be done that way. The point is you don't
*want* to do that part in recursive descent, because that's where
most of the overhead goes, plunging down 24 or so levels of recursion
on *every* term because of all the precedence levels. In contrast,
operator precedence is an easily implemented bottom-up technique
that works quite well for expressions. And you'll note that operator
syntax is precisely where Perl 6 abandons rules and lets people define
operators with precedence. So what we're aiming for is parsing
the gross structure of the program down to the expression level by
recursive descent, then switching to operator precedence for as long
as it makes sense. It's pretty easy for operator precedence to tell
when it should give up and return to the outer top-down parser. And,
of course, terms that include blocks may end up recursing down into
the top-down parser again. (As might macros with peculiar is_parsed
rules.)

: The difference in speeds between top-down and bottom-up

: parsers run in the orders of magnitude, and there's not too much we
: can do to make that faster.

Certainly, which is why I've been advocating a hybrid approach from
the getgo. We don't have to fall into the either/or fallacy here.
Intentionally building in an operator precedence engine will take
most of the performance pressure off of the recursive descent engine,
and will naturally optimize between bottom-up and top-down without
the programmer having to throw in cargo-cultish pragmas.

Larry

Abhijit A. Mahabal

unread,

May 13, 2004, 11:26:32 AM5/13/04

to Larry Wall, perl6-i...@perl.org

On Wed, 12 May 2004, Larry Wall wrote:

> In fact, I'd go so far as to say that it's almost impossible to do
> recursive descent when you allow for defining new operator precedence
> levels on the fly as Perl 6 does.
>
> : Operator precedence can be done in
> : a recdescent grammar straightforwardly enough. It's a bit longwinded,
> : but doable.
>
> Yes, of course, it *can* be done that way. The point is you don't
> *want* to do that part in recursive descent,

I see a terminology issue looming here. Er, what is recdescent? The dragon
book and Parse::RecDescent do not seem to use the same defn. According to
aho/ullman "A parser that uses a set of recursive procedures to recognize
its input with no backtracking is called a recursive descent parser". (p
180). Parse::RecDescent does not shy away from backtracking, and in
Aho/Ullman's eyes it would be a top-down parser, but not recdescent.

As usual, I may have missed something. I am not sure which of the two you
mean, but I feel it is the Parse::RecDescent sense because it is powerful
and easier for users (and does not require recomputing the parsing table),
but it is also (much?) slower.

Or, of course, you may mean both. When a program uses a lot of evals then
it may be worthwhile when compiling the program to calculate the
parse-tables and other no-backtracking book-keeping so that the eval's
happen faster at runtime.

> Larry

--abhijit

Larry Wall

unread,

May 13, 2004, 12:35:48 PM5/13/04

to perl6-i...@perl.org

On Thu, May 13, 2004 at 10:26:32AM -0500, Abhijit A. Mahabal wrote:
:

: On Wed, 12 May 2004, Larry Wall wrote:
:
: > In fact, I'd go so far as to say that it's almost impossible to do
: > recursive descent when you allow for defining new operator precedence
: > levels on the fly as Perl 6 does.
: >
: > : Operator precedence can be done in
: > : a recdescent grammar straightforwardly enough. It's a bit longwinded,
: > : but doable.
: >
: > Yes, of course, it *can* be done that way. The point is you don't
: > *want* to do that part in recursive descent,
:
: I see a terminology issue looming here. Er, what is recdescent? The dragon
: book and Parse::RecDescent do not seem to use the same defn. According to
: aho/ullman "A parser that uses a set of recursive procedures to recognize
: its input with no backtracking is called a recursive descent parser". (p
: 180). Parse::RecDescent does not shy away from backtracking, and in
: Aho/Ullman's eyes it would be a top-down parser, but not recdescent.

Well, I think the dragon book does violence to the language here.
In plain terms, a top-down parser "descends" by definition, and
typically uses recursion to do so, even if that recursion is emulated.
Whether it also supports backtracking is somewhat orthogonal.

<sermon>
As with many CS offerings from the Modern era, the dragon book
unconsciously tends to try to classify implementations into "pure"
and "impure", so one feels discouraged from using techniques like
backtracking or hybrid compilers. In this Postmodern era, we tend to
value whatever tickles our fancy, and it happens that computers are
getting fast enough that parsing doesn't always have to be optimized
to death for speed. Instead, we can occasionally optimize for other
things like fun and flexibility. And that's what Perl 6 is aiming for.
</sermon>

: As usual, I may have missed something. I am not sure which of the two you

: mean, but I feel it is the Parse::RecDescent sense because it is powerful
: and easier for users (and does not require recomputing the parsing table),
: but it is also (much?) slower.

We mean it in the more general sense. Whether a grammar can be parsed
with or without backtracking depends on how you write the grammar,
unless you happen to have a language with no "dangling syntax".
We try to avoid dangling syntax in Perl, but don't always succeed.
Nevertheless, if the grammar *usually* doesn't need to backtrack, it
can effectively be just about as fast as a non-backtracking grammar.
Plus there are tricks you can do short of analyzing all the rules into
a state diagram. For instance, the design of Perl 6 rules allows
you to pass in a parameter that could supply a list of "stoppers"
to a subrule so it knows where to stop, just as you might do in a
standard recursive descent parser.

: Or, of course, you may mean both. When a program uses a lot of evals then

: it may be worthwhile when compiling the program to calculate the
: parse-tables and other no-backtracking book-keeping so that the eval's
: happen faster at runtime.

Hmm, well, there's something to be said for discouraging people from
using C<eval> by making it slow. :-)

Larry

Message has been deleted

Luke Palmer

unread,

May 13, 2004, 1:49:51 PM5/13/04

to Dan Sugalski, perl6-i...@perl.org

Dan Sugalski writes:
> And personally I'd be happy to do violence to the dragon book. (Not
> that it's *entirely* horrible, as I occasionally need to prop doors
> open or shim a broken table leg temporarily...)
>
> But, anyway, snipping out the rest of this stuff...
>
> The big problem is that I don't know *how* to implement a mixed-type
> parser generator. I'm not big on parsers in general, so I'm mostly
> stuck with the literature if I need to write one from scratch.
>
> Since you've been itching to get into the internals anyway, this'd be
> a good place to start, y'know. :-P

I'd be happy to draft out a model and do a first run implementation in
a little while. I'm definitely a parsophile.

I've actually been pondering a Perl 5 alternative to P::RD that doesn't
become painfully slow once things get complex enough. Pushing that in
Parrot would be even better though.

Luke

Abhijit A. Mahabal

unread,

May 13, 2004, 2:01:12 PM5/13/04

to Dan Sugalski, perl6-i...@perl.org

On Thu, 13 May 2004, Dan Sugalski wrote:
>
> The big problem is that I don't know *how* to implement a mixed-type
> parser generator. I'm not big on parsers in general, so I'm mostly
> stuck with the literature if I need to write one from scratch.

I have been thinking the following about what larry said earlier. Is this
what you meant, larry?

$grammar = q{
class_defn: "class" block .. etc (normal top-down stuff)
...
term: { call Parse::Yapp or something }
}

What I mean by that weird pseudocode is that everything happens in a
parse-recdescent parser, except that the production of some non-terminals
calls into a bottom-up parser. That part seems trivially doable. What I
need to figure out is how the bottom-up parser can relinquish control to
the top-down parser if it hits a block or such.

A preliminary thought: A traditional bottom-up parser can take the four
actions shift, reduce, accept, and error. We will need a fifth: call the
top down parser. I have not thought through how that happens or if it can
be done at all. It is the only way that I currently see.

I think I'll try to coax Parse::Recdescent and Parse::Yapp to work
together. I have never used the latter, but the documentation seems to
suggest that it can do what Bison can. Or maybe I should write two small
parsers by hand (one recdescent and the other an operator precedence
shift/reduce) and make them be nice to each other.

--abhijit

Larry Wall

unread,

May 13, 2004, 3:22:09 PM5/13/04

to perl6-i...@perl.org

On Thu, May 13, 2004 at 01:01:12PM -0500, Abhijit A. Mahabal wrote:
: I have been thinking the following about what larry said earlier. Is this

: what you meant, larry?
:
: $grammar = q{
: class_defn: "class" block .. etc (normal top-down stuff)
: ...
: term: { call Parse::Yapp or something }

: }

Er, I've never used Parse::Yapp, so I couldn't say. I'd tend to roll
my own operator precedence parser anyway.

: What I mean by that weird pseudocode is that everything happens in a

: parse-recdescent parser, except that the production of some non-terminals
: calls into a bottom-up parser. That part seems trivially doable. What I
: need to figure out is how the bottom-up parser can relinquish control to
: the top-down parser if it hits a block or such.

The bottom up parser doesn't know that it is relinquishing control. It
has to have some kind of tokener/lexer, and that lexer recognizes that it
has a complex term that needs a subparser, calls the parser, and then
returns the subparse as a single token to the bottom-up engine.

: A preliminary thought: A traditional bottom-up parser can take the four

: actions shift, reduce, accept, and error. We will need a fifth: call the
: top down parser. I have not thought through how that happens or if it can
: be done at all. It is the only way that I currently see.

No, you still have the four basic actions. Subparsing is all hidden in
the lexer. As far as the parse is concerned, it's processing a single
token that happens to be something complicated. The only complicated
thing is making sure that all the constructs sub-parsed by the lexer
know when they should terminate. That's trivially easy for bracketed
constructs like blocks. Things get a little tricker for things like
unparenthesized list operators that have to be treated like a left
parenthesis that has no corresponding right parenthesis. Things like
semicolon will be slightly context sensitive: outside brackets, they
terminate the current statement, while inside brackets, they separate
slice terms.

And, of course, the lexer has to be sensitive to whether we're currently
expecting a term or an operator...

: I think I'll try to coax Parse::Recdescent and Parse::Yapp to work

: together. I have never used the latter, but the documentation seems to
: suggest that it can do what Bison can. Or maybe I should write two small
: parsers by hand (one recdescent and the other an operator precedence
: shift/reduce) and make them be nice to each other.

I'd take the latter approach myself, since in any event it will
probably need tweaks that are foreign to whatever tool you choose.
In particular, the fact that Perl 6 uses string comparison rather than
numeric comparison to do precedence levels is going to give almost
any standard tool a hissyfit. (We do that so that the user never has
to specify a precedence level--all levels are specified relative to
an existing operator's precedence level. And with strings, we can
fit as many new precedence levels into the interstices as the user
likes without ever running out of numeric precision.)

Larry

Dan Sugalski

unread,

May 13, 2004, 3:27:43 PM5/13/04

to perl6-i...@perl.org

At 12:22 PM -0700 5/13/04, Larry Wall wrote:
>I'd take the latter approach myself, since in any event it will
>probably need tweaks that are foreign to whatever tool you choose.
>In particular, the fact that Perl 6 uses string comparison rather than
>numeric comparison to do precedence levels is going to give almost
>any standard tool a hissyfit. (We do that so that the user never has
>to specify a precedence level--all levels are specified relative to
>an existing operator's precedence level. And with strings, we can
>fit as many new precedence levels into the interstices as the user
>likes without ever running out of numeric precision.)

I think the prudent thing to do there, since we're going to very
rarely be adding new operators, is to assign the darned things real
precedence numbers which get dynamically set. Add a new operator
between two others and everything gets renumbered.

Larry Wall

unread,

May 13, 2004, 3:54:30 PM5/13/04

to perl6-i...@perl.org

On Thu, May 13, 2004 at 03:27:43PM -0400, Dan Sugalski wrote:
: I think the prudent thing to do there, since we're going to very

: rarely be adding new operators, is to assign the darned things real
: precedence numbers which get dynamically set. Add a new operator
: between two others and everything gets renumbered.

I think that's the hard way.

Larry

Dave Mitchell

unread,

May 13, 2004, 4:41:54 PM5/13/04

to perl6-i...@perl.org

On Thu, May 13, 2004 at 12:22:09PM -0700, Larry Wall wrote:
> No, you still have the four basic actions. Subparsing is all hidden in
> the lexer.

Hence why the lexer in Perl 5 is 8000 lines long ;-)

--
Wesley Crusher gets beaten up by his classmates for being a smarmy git,
and consequently has a go at making some friends of his own age for a
change.
-- Things That Never Happen in "Star Trek" #18

Larry Wall

unread,

May 13, 2004, 6:52:30 PM5/13/04

to perl6-i...@perl.org

On Thu, May 13, 2004 at 09:41:54PM +0100, Dave Mitchell wrote:

: On Thu, May 13, 2004 at 12:22:09PM -0700, Larry Wall wrote:
: > No, you still have the four basic actions. Subparsing is all hidden in
: > the lexer.
:
: Hence why the lexer in Perl 5 is 8000 lines long ;-)

Well, actually, the lexer in Perl 5 manages to be that long without
ever calling back into a subparser. Everything ends up bubbling back
out to the main yacc grammar, albeit highly transmogrified in the
case of literal strings. How Perl 5 does two-pass string processing
is one really good reason we're not doing it that way in Perl 6... :-)

Larry