The file contains:
===========
use P6C::Parser;
use Perl6grammar;
my $grammar = new Perl6grammar;
print $grammar->sigil('$'),"\n";
print $grammar->signature('(+$x)'), "\n";
===========
It prints out:
=============================
P6C::sigil=ARRAY(0x8a1000c)
P6C::signature=ARRAY(0x8a10738)
=============================
And that is what I expected.
Now, if I put exactly the same contents in a file in
languages/perl6/t/parser, then I get the following output:
==========================
P6C::sigil=ARRAY(0x89ffbc8)
at /u/amahabal/site_perl/lib/perl5/site_perl/5.8.0/Parse/RecDescent.pm
line 2822
Parse::RecDescent::_parserepeat('Parse::RecDescent=HASH(0x8a8abfc)','+$x)','CODE(0x87b9598)',0,1,'','undef')
called at Perl6grammar.pm line 6938
Parse::RecDescent::namespace000001::sigparam('Parse::RecDescent=HASH(0x8a8abfc)','+$x)','','','undef')
called at Perl6grammar.pm line 24049
Parse::RecDescent::namespace000001::signature('Parse::RecDescent=HASH(0x8a8abfc)','(+$x)','undef','undef','CODE(0x89ff214)')
called at
/u/amahabal/site_perl/lib/perl5/site_perl/5.8.0/Parse/RecDescent.pm line
2797
Parse::RecDescent::AUTOLOAD('Parse::RecDescent=HASH(0x8a8abfc)','(+$x)')
called at t/parser/foo3.t line 6
==============================
(I added a confess to Parse::Recdescent where it was failing).
Clearly, something happened right as the ->sigil() case worked okay. I
have not yet been able to hunt the problem down, and was hoping that
somebody knows whats happening off the top of their heads. I do not want
to put all the tests in the top directory :)
--Abhiit
Abhijit Mahabal http://www.cs.indiana.edu/~amahabal/
> I was writing a few tests for the P6 parser and ran into a weird problem.
> If I have the following in a file in languages/perl6, it works as
> expected:
[...]
> Now, if I put exactly the same contents in a file in
> languages/perl6/t/parser, then I get the following error:
Okay, I traced the problem to a "use FindBin" in P6C::Parser.pm. Is it
okay to change
use lib $FindBin::Bin/../../lib;
to
use lib ../../lib;
or is there a good reason not to? The current version makes it necessary
to put all files that "use P6C::Parser" in the same directory, and the
change would allow:
perl t/parser/foo.t
to work.
--Abhijit
Neither of those seems right to me. The first keys off of the position
of the binary, which could be anywhere with respect to the library
module you're in; the second is relative to whatever the current
directory is while you're running the script. I would think that
something like
use File::Basename qw(dirname);
use lib dirname($INC{"P6C/Parser.pm"})."/../../../../lib";
(untested and probably not quite the right path) would be better. Or
perhaps it should be ripped out entirely, and any script using
P6C::Parser should be required to set the lib path correctly? It
partly depends on whether we want to ensure that P6C::Parser
preferentially uses the Parse::RecDescent from parrot/lib rather than
a system-provided one. Which probably is not the case?
> The current version makes it necessary
> to put all files that "use P6C::Parser" in the same directory, and the
> change would allow:
>
> perl t/parser/foo.t
>
> to work.
Just make sure
cd t; perl parser/foo.t
works too.
> use File::Basename qw(dirname);
> use lib dirname($INC{"P6C/Parser.pm"})."/../../../../lib";
I had already tried that, and it doesn't seem to work. I guess it is some
timing issue: $INC{"P6C/Parser.pm"} gets defined after P6C::Parser.pm is
loaded (I think).
>
> Just make sure
>
> cd t; perl parser/foo.t
>
> works too.
>
It doesn't yet :( We do need a clean solution.
--abhijit
It doesn't work for any of them. You can only run the tests from the
languages/perl6 directory. This needs to change at some point.
Even worse, I started working on some unit tests for the various P6C
modules and discovered some really nasty 'use' circles (X uses Y uses Z
uses X), that meant the modules had to be loaded in a very specific
order. I cleaned up a few of them, but I'm sure there are more lurking.
Anyway, yeah, we need a clean solution. All suggestions cheerfully
welcomed. :)
And don't forget, this is just a prototype. All the code will be
replaced as soon as we get a working grammar engine.
Allison
> Neither of those seems right to me. The first keys off of the position
>
>of the binary, which could be anywhere with respect to the library
>module you're in; the second is relative to whatever the current
>directory is while you're running the script. I would think that
>something like
>
> use File::Basename qw(dirname);
> use lib dirname($INC{"P6C/Parser.pm"})."/../../../../lib";
>
>(untested and probably not quite the right path) would be better. Or
>perhaps it should be ripped out entirely, and any script using
>P6C::Parser should be required to set the lib path correctly? It
>partly depends on whether we want to ensure that P6C::Parser
>preferentially uses the Parse::RecDescent from parrot/lib rather than
>a system-provided one. Which probably is not the case?
>
The Parse::RecDescent in parrot/lib is a hacked version that removes
a bunch of stuff (tracing code, iirc) from the outputted grammer so
that it runs many orders faster than the regular version. Or, to
put it another way, it increases P6C's runspeed from "infuriating"
to "slow" :)
- Joe
I'm not so sure about that. (Not that it won't be replaced, but that
it needs the grammar engine) I'm pretty sure that grammars as Larry's
defined 'em are recursive-descent, and if that's true then I've this
really nasty feeling we're going to find it insufficiently fast to
parse perl. Could be wrong, but even if we pick up two orders of
magnitude in speed over Parse::RecDescent things'll still be
significantly more sluggish than the current perl 5 parser.
(OTOH, if the grammar engine's not required to be recursive-descent
that makes things rather faster, which is fine with me :)
--
Dan
--------------------------------------"it's like this"-------------------
Dan Sugalski even samurai
d...@sidhe.org have teddy bears and even
teddy bears get drunk
That's a question with multiple layers of answers. :)
The abstraction we need to meet is: a) everything that can be done in
the parser that ships with Perl 6 can be done in Perl 6; and b) it is
possible to change the way Perl 6 is parsed within Perl 6 code (by
loading a custom grammar module, for example).
But, there's some flexibility in how we satisfy that abstraction. It
might be reasonable to have "fast parsing" that merely comes close to
simulating recursive-descent, as long as we also provide a (slower) full
recursive-descent parser that produces the same PIR output.
Certainly some things won't be recursive-descent anyway (operator
precedence).
Also, it may be a little too early to decide that one method of parsing
will be fast or slow. IIRC we nearly ditched CPS because it was too
slow, then found a way to speed it up.
Allison
Right, that bit I get. :)
>But, there's some flexibility in how we satisfy that abstraction. It
>might be reasonable to have "fast parsing" that merely comes close to
>simulating recursive-descent, as long as we also provide a (slower) full
>recursive-descent parser that produces the same PIR output.
The problem there is that if we do this then the first time someone
overrides a parser rule they'll go from millisecond parse times to
second parse times.
Recdescent parsers do an extraordinary amount of work for what you
get. It's a massively brute-force way of parsing, which is part of
the problem--that brute-force path is mandated by the parsing
algorithm. The grammar you build a recdescent parser out of, as
Damian has pointed out to me, isn't inherently different than the one
you'd build a bottom-up parser from, it's the behavior of the built
parser that's the issue. There are constructs that aren't valid in
recdescent parsers (that whole left-recursive thing mainly) but
that's true of all the different parser types.
I suppose I could go try convince Larry to lift the guarantees of
behaviour for the grammar engine (well, besides correctness, at
least) for this. Dunno that it'll be too successful without abysmal
benchmarks to show with it, though. :)
>Certainly some things won't be recursive-descent anyway (operator
>precedence).
Oh, don't be too sure of that. :) Operator precedence can be done in
a recdescent grammar straightforwardly enough. It's a bit longwinded,
but doable.
>Also, it may be a little too early to decide that one method of parsing
>will be fast or slow. IIRC we nearly ditched CPS because it was too
>slow, then found a way to speed it up.
Nope, CPS was originally not an option because of brain-cramp issues,
not speed. The difference in speeds between top-down and bottom-up
parsers run in the orders of magnitude, and there's not too much we
can do to make that faster.
If I recall my dragon book correctly, shift reduce parsers (which are
what most compilers use) can actually parse a larger class of grammars
then recursive descent parsers which can only parse LL(k) grammars. So
that is another good reason to have it. Perhaps Perl 6 grammars should
provide an is parsed trait that allows one to specify which type of
parsing to use, then we could dictate that the default behavior for
parsing or perl itself is shift reduce parsing rather than recursive
descent.
At least this way people know when they are making a trade off between,
speed and other things.
Matt
I think I've been told that at least once before, and forgotten. Is
there a good place on the wiki FAQ where this could be inserted?
A simple example is /fish|job|petunia/. Rather than trying to match
/fish/ and upon failure, trying /job/, and as a last resort /petunia/,
you could do a dispatch table on the first letter and never have to
fall back to anything. In the case of /fish|job|<petunia>/, however,
you can't guarantee that FIRST(<petunia>) will always be "p".
You could generate both parsers and then use a notification or
dependency system to pick which to use, but done naively you end up
with an exponential number of parsers and the logic is likely to be
both slow and a bug magnet.
You could compile the parser assuming everything is final and then at
runtime regenerate the entire parser if anything changes. Which
wouldn't work for long, but perhaps you could break the grammar down
into components that are individually compiled bottom-up, but
coordinated top-down. Then you could limit the scope of recompiles in
the bottom-up components, and not need recompiles in the top-down
structure. The decisions of where to break things down would be quite
similar to a regular compiler's inlining decisions. It may be that
grammars are just too recursive for this to help much, though.
I suspect a slight variant of the above may work best. Rather than
doing a full-out LALR(1) parser for the bottom-up components, you'd do
a somewhat more naive but still table-driven (shift/reduce) parser,
carefully limiting what it is assuming about the FIRST() etc. of the
rules within it. That should limit the impact of changes, and simplify
the logic of what needs to be done differently when a change is
detected.
On May-11, Matt Fowles wrote:
>
> Perhaps Perl 6 grammars should provide an is parsed trait that
> allows one to specify which type of parsing to use, then we could
> dictate that the default behavior for parsing or perl itself is
> shift reduce parsing rather than recursive descent.
Optimization hints could also be very helpful, or we could even
default to a total recursive-descent parser and only attempt bottom-up
precomputation if the grammar author specifically says it's ok. The
main problem being that people will say lots of things if it makes
their code faster, without having any idea what it actually means.
It's the best alternative I've heard so far. If anyone had the time,
skills, and interest, I'd be inclined to set them on it. Even if the
results didn't end up meeting our needs for Perl 6, they'd be useful for
other HLLs.
> Optimization hints could also be very helpful, or we could even
> default to a total recursive-descent parser and only attempt bottom-up
> precomputation if the grammar author specifically says it's ok. The
> main problem being that people will say lots of things if it makes
> their code faster, without having any idea what it actually means.
In which case, they get exactly what they deserve. :)
Allison
Oh, I think Allison can be quite sure of that. :-)
In fact, I'd go so far as to say that it's almost impossible to do
recursive descent when you allow for defining new operator precedence
levels on the fly as Perl 6 does.
: Operator precedence can be done in
: a recdescent grammar straightforwardly enough. It's a bit longwinded,
: but doable.
Yes, of course, it *can* be done that way. The point is you don't
*want* to do that part in recursive descent, because that's where
most of the overhead goes, plunging down 24 or so levels of recursion
on *every* term because of all the precedence levels. In contrast,
operator precedence is an easily implemented bottom-up technique
that works quite well for expressions. And you'll note that operator
syntax is precisely where Perl 6 abandons rules and lets people define
operators with precedence. So what we're aiming for is parsing
the gross structure of the program down to the expression level by
recursive descent, then switching to operator precedence for as long
as it makes sense. It's pretty easy for operator precedence to tell
when it should give up and return to the outer top-down parser. And,
of course, terms that include blocks may end up recursing down into
the top-down parser again. (As might macros with peculiar is_parsed
rules.)
: The difference in speeds between top-down and bottom-up
: parsers run in the orders of magnitude, and there's not too much we
: can do to make that faster.
Certainly, which is why I've been advocating a hybrid approach from
the getgo. We don't have to fall into the either/or fallacy here.
Intentionally building in an operator precedence engine will take
most of the performance pressure off of the recursive descent engine,
and will naturally optimize between bottom-up and top-down without
the programmer having to throw in cargo-cultish pragmas.
Larry
> In fact, I'd go so far as to say that it's almost impossible to do
> recursive descent when you allow for defining new operator precedence
> levels on the fly as Perl 6 does.
>
> : Operator precedence can be done in
> : a recdescent grammar straightforwardly enough. It's a bit longwinded,
> : but doable.
>
> Yes, of course, it *can* be done that way. The point is you don't
> *want* to do that part in recursive descent,
I see a terminology issue looming here. Er, what is recdescent? The dragon
book and Parse::RecDescent do not seem to use the same defn. According to
aho/ullman "A parser that uses a set of recursive procedures to recognize
its input with no backtracking is called a recursive descent parser". (p
180). Parse::RecDescent does not shy away from backtracking, and in
Aho/Ullman's eyes it would be a top-down parser, but not recdescent.
As usual, I may have missed something. I am not sure which of the two you
mean, but I feel it is the Parse::RecDescent sense because it is powerful
and easier for users (and does not require recomputing the parsing table),
but it is also (much?) slower.
Or, of course, you may mean both. When a program uses a lot of evals then
it may be worthwhile when compiling the program to calculate the
parse-tables and other no-backtracking book-keeping so that the eval's
happen faster at runtime.
> Larry
--abhijit
Well, I think the dragon book does violence to the language here.
In plain terms, a top-down parser "descends" by definition, and
typically uses recursion to do so, even if that recursion is emulated.
Whether it also supports backtracking is somewhat orthogonal.
<sermon>
As with many CS offerings from the Modern era, the dragon book
unconsciously tends to try to classify implementations into "pure"
and "impure", so one feels discouraged from using techniques like
backtracking or hybrid compilers. In this Postmodern era, we tend to
value whatever tickles our fancy, and it happens that computers are
getting fast enough that parsing doesn't always have to be optimized
to death for speed. Instead, we can occasionally optimize for other
things like fun and flexibility. And that's what Perl 6 is aiming for.
</sermon>
: As usual, I may have missed something. I am not sure which of the two you
: mean, but I feel it is the Parse::RecDescent sense because it is powerful
: and easier for users (and does not require recomputing the parsing table),
: but it is also (much?) slower.
We mean it in the more general sense. Whether a grammar can be parsed
with or without backtracking depends on how you write the grammar,
unless you happen to have a language with no "dangling syntax".
We try to avoid dangling syntax in Perl, but don't always succeed.
Nevertheless, if the grammar *usually* doesn't need to backtrack, it
can effectively be just about as fast as a non-backtracking grammar.
Plus there are tricks you can do short of analyzing all the rules into
a state diagram. For instance, the design of Perl 6 rules allows
you to pass in a parameter that could supply a list of "stoppers"
to a subrule so it knows where to stop, just as you might do in a
standard recursive descent parser.
: Or, of course, you may mean both. When a program uses a lot of evals then
: it may be worthwhile when compiling the program to calculate the
: parse-tables and other no-backtracking book-keeping so that the eval's
: happen faster at runtime.
Hmm, well, there's something to be said for discouraging people from
using C<eval> by making it slow. :-)
Larry
I'd be happy to draft out a model and do a first run implementation in
a little while. I'm definitely a parsophile.
I've actually been pondering a Perl 5 alternative to P::RD that doesn't
become painfully slow once things get complex enough. Pushing that in
Parrot would be even better though.
Luke
I have been thinking the following about what larry said earlier. Is this
what you meant, larry?
$grammar = q{
class_defn: "class" block .. etc (normal top-down stuff)
...
term: { call Parse::Yapp or something }
}
What I mean by that weird pseudocode is that everything happens in a
parse-recdescent parser, except that the production of some non-terminals
calls into a bottom-up parser. That part seems trivially doable. What I
need to figure out is how the bottom-up parser can relinquish control to
the top-down parser if it hits a block or such.
A preliminary thought: A traditional bottom-up parser can take the four
actions shift, reduce, accept, and error. We will need a fifth: call the
top down parser. I have not thought through how that happens or if it can
be done at all. It is the only way that I currently see.
I think I'll try to coax Parse::Recdescent and Parse::Yapp to work
together. I have never used the latter, but the documentation seems to
suggest that it can do what Bison can. Or maybe I should write two small
parsers by hand (one recdescent and the other an operator precedence
shift/reduce) and make them be nice to each other.
--abhijit
Er, I've never used Parse::Yapp, so I couldn't say. I'd tend to roll
my own operator precedence parser anyway.
: What I mean by that weird pseudocode is that everything happens in a
: parse-recdescent parser, except that the production of some non-terminals
: calls into a bottom-up parser. That part seems trivially doable. What I
: need to figure out is how the bottom-up parser can relinquish control to
: the top-down parser if it hits a block or such.
The bottom up parser doesn't know that it is relinquishing control. It
has to have some kind of tokener/lexer, and that lexer recognizes that it
has a complex term that needs a subparser, calls the parser, and then
returns the subparse as a single token to the bottom-up engine.
: A preliminary thought: A traditional bottom-up parser can take the four
: actions shift, reduce, accept, and error. We will need a fifth: call the
: top down parser. I have not thought through how that happens or if it can
: be done at all. It is the only way that I currently see.
No, you still have the four basic actions. Subparsing is all hidden in
the lexer. As far as the parse is concerned, it's processing a single
token that happens to be something complicated. The only complicated
thing is making sure that all the constructs sub-parsed by the lexer
know when they should terminate. That's trivially easy for bracketed
constructs like blocks. Things get a little tricker for things like
unparenthesized list operators that have to be treated like a left
parenthesis that has no corresponding right parenthesis. Things like
semicolon will be slightly context sensitive: outside brackets, they
terminate the current statement, while inside brackets, they separate
slice terms.
And, of course, the lexer has to be sensitive to whether we're currently
expecting a term or an operator...
: I think I'll try to coax Parse::Recdescent and Parse::Yapp to work
: together. I have never used the latter, but the documentation seems to
: suggest that it can do what Bison can. Or maybe I should write two small
: parsers by hand (one recdescent and the other an operator precedence
: shift/reduce) and make them be nice to each other.
I'd take the latter approach myself, since in any event it will
probably need tweaks that are foreign to whatever tool you choose.
In particular, the fact that Perl 6 uses string comparison rather than
numeric comparison to do precedence levels is going to give almost
any standard tool a hissyfit. (We do that so that the user never has
to specify a precedence level--all levels are specified relative to
an existing operator's precedence level. And with strings, we can
fit as many new precedence levels into the interstices as the user
likes without ever running out of numeric precision.)
Larry
I think the prudent thing to do there, since we're going to very
rarely be adding new operators, is to assign the darned things real
precedence numbers which get dynamically set. Add a new operator
between two others and everything gets renumbered.
I think that's the hard way.
Larry
Hence why the lexer in Perl 5 is 8000 lines long ;-)
--
Wesley Crusher gets beaten up by his classmates for being a smarmy git,
and consequently has a go at making some friends of his own age for a
change.
-- Things That Never Happen in "Star Trek" #18
Well, actually, the lexer in Perl 5 manages to be that long without
ever calling back into a subparser. Everything ends up bubbling back
out to the main yacc grammar, albeit highly transmogrified in the
case of literal strings. How Perl 5 does two-pass string processing
is one really good reason we're not doing it that way in Perl 6... :-)
Larry