Attempting to come up with a simplistic math grammar that has one possible
operand (A) and one possible operator (*) - so that things like A, A*A, and
A*A*A*A*A are all parsed. This simplistic example (thanks to spinclad on
#perl6) cause PGE to explode.
$ cat ta.p6r
grammar f;
rule atom { A }
rule binary { <expr> \* <atom> }
rule expr { <binary> }
$ ./parrot compilers/pge/demo.pir
input "rule <pattern>", "glob <pattern>", "save <name>",
target string, "pir", "exp", "trace", "next", or "use <file>"
use ta.p6r
Loaded f::atom
Loaded f::binary
Loaded f::expr
input "rule <pattern>", "glob <pattern>", "save <name>",
target string, "pir", "exp", "trace", "next", or "use <file>"
rule <f::atom>
input "rule <pattern>", "glob <pattern>", "save <name>",
target string, "pir", "exp", "trace", "next", or "use <file>"
A
match succeeded
$/: <A @ 0> 0
$/<f::atom>: <A @ 0> 0
input "rule <pattern>", "glob <pattern>", "save <name>",
target string, "pir", "exp", "trace", "next", or "use <file>"
B
match failed
input "rule <pattern>", "glob <pattern>", "save <name>",
target string, "pir", "exp", "trace", "next", or "use <file>"
rule <f::binary>
input "rule <pattern>", "glob <pattern>", "save <name>",
target string, "pir", "exp", "trace", "next", or "use <file>"
B
maximum recursion depth exceeded
<SNIP>
Also, $ ./parrot compilers/pge/demo.pir
input "rule <pattern>", "glob <pattern>", "save <name>",
target string, "pir", "exp", "trace", "next", or "use <file>"
use ta.p6r
Loaded f::atom
Loaded f::binary
Loaded f::expr
input "rule <pattern>", "glob <pattern>", "save <name>",
target string, "pir", "exp", "trace", "next", or "use <file>"
rule <f::expr>
input "rule <pattern>", "glob <pattern>", "save <name>",
target string, "pir", "exp", "trace", "next", or "use <file>"
A
Bus error
That probably shouldn't die so horribly, but it should die. We don't
support left-recursive grammars, and this is one. A
non-left-recursive grammar that matches the same language is:
grammar f;
rule atom { A }
rule binary { <atom> \* <binary> }
Luke
Not to pick gnits or show off my ignorance too much, but shouldn't the
binary rule make the second part optional somehow?
rule binary { <atom> [\* <binary>]? }
Matt
--
"Computer Science is merely the post-Turing Decline of Formal Systems Theory."
-???
On Thu, Jun 30, 2005 at 02:56:27PM -0400, Will Coleda wrote:
> Attempting to come up with a simplistic math grammar that has one possible
> operand (A) and one possible operator (*) - so that things like A, A*A, and
> A*A*A*A*A are all parsed. This simplistic example (thanks to spinclad on
> #perl6) cause PGE to explode.
>
> $ cat ta.p6r
> grammar f;
> rule atom { A }
> rule binary { <expr> \* <atom> }
> rule expr { <binary> }
> $ ./parrot compilers/pge/demo.pir
Well.... yes, of course, since there's a left-recursive rule there it's
very likely to explode. (f::expr calls <binary>, which calls <expr>,
which calls <binary>, which calls <expr>, which calls <binary>, which ... )
It's basically the same as what one would get from doing
sub binary { expr(); atom(); }
sub expr { binary(); }
and I'm not sure PGE should be responsible -- at least not yet --
for trapping such infinite loops.
If you rewrite the grammar to be right-recursive I think you'll
find it works fine:
grammar f;
rule atom { A }
rule binary { <atom> \* <expr> }
rule expr { <binary> }
> Also, $ ./parrot compilers/pge/demo.pir
> ...
> input "rule <pattern>", "glob <pattern>", "save <name>",
> target string, "pir", "exp", "trace", "next", or "use <file>"
> rule <f::expr>
>
> input "rule <pattern>", "glob <pattern>", "save <name>",
> target string, "pir", "exp", "trace", "next", or "use <file>"
> A
Well, "A" isn't a valid match for f::expr in the grammar you've
provided -- the f::expr rule calls <binary>, which absolutely
requires at least one \* . If you want to match "A", "A*A", "A*A*A",
etc., I suggest:
grammar f;
rule atom { A }
rule binary { <atom> [ \* <atom> ]? }
rule expr { <binary> }
which eliminates the recursion altogether as well as recognizing a
single <atom> as a valid expression.
Pm
My opinion is that it's "not a bug" -- the normal behavior for
most programs with infinite recursive loops is that they
eventually explode. The original set of rules were
rule atom { A }
rule binary { <expr> \* <atom> }
rule expr { <binary> }
which is analogous to writing something like
sub atom { print "A"; }
sub binary { expr(); print "*"; atom(); }
sub expr { binary(); }
and most people would consider "explode" to be, well, if not correct,
at least not a bug in the language translator.
Pm
> > [pmichaud - Sat Jul 02 15:27:11 2005]:
Sounds reasonable. I'm sure it would be nearly impossible to catch all
possible cases of infinite recursion when the grammar is compiled but
how difficult would it be to trap at runtime? A simple warning along
the lines of "Warning: production foo has recursed 10,000 times without
matching any terminal\n" would certainly be useful in figuring why it
blew up...
-J
--
I'll tag this bug a TODO and update the subject.
DISCLAIMER: I know about diddly-squat about parers.
A possible naive implementation would be to have a counter on every
production that gets incremented by one each time it is matched. If any
counter has a value > $RECURSION_LIMIT a warning is generated or an
error is raised. Any time a terminal is matched all counters are reset.
A lightly simpler version might have just a single global counter but I
think per production counters might also be useful for
debugging/optimizing a grammar.
Cheers,
-J
--
We can enter "trap recursions without terminals" as a possible PGE
todo item, but unless someone knows an easy implementation or wants
to take on the task of implementation it's not likely to be resolved
anytime soon. :-)
Pm
The theoretical implementation of this is quite simple. Keep a
counter. everytime a token is consumed from the input stream reset it.
Every time a rule is followed increment the counter. If the counter
is ever greater than the number of productions in the grammer, then
you have gone into infinite recursion. The only place where this
technique will fail is where a rule that it is attempting to match
changes dynamically (i.e. mid match). In such an event you have to
consider the number of productions infinite.
Matt
--
"Computer Science is merely the post-Turing Decline of Formal Systems Theory."
-Stan Kelly-Bootle, The Devil's DP Dictionary
On 10/10/05, Patrick R. Michaud <pmic...@pobox.com> wrote:
> On Mon, Oct 10, 2005 at 09:11:02AM -0400, Matt Fowles wrote:
> > Patrick~
> >
> > The theoretical implementation of this is quite simple. Keep a
> > counter. everytime a token is consumed from the input stream reset it.
> > Every time a rule is followed increment the counter. If the counter
> > is ever greater than the number of productions in the grammer, then
> > you have gone into infinite recursion.
>
> What happens with something like:
>
> rule atom { A }
> rule binary { . <atom> | <expr> \* <atom> }
> rule expr { <binary> }
>
> "AB" ~~ / <expr> /
>
> Here, the '.' in <binary> keeps "consuming" the initial "A" (thus
> resetting the counter), but the <atom> rule fails, so we take
> the alternation which recurses back to <expr> and does the whole
> thing all over again (with the counter having been reset to zero).
> So I guess we'd need to backtrack the counter as well...?
>
> Perhaps what we want to be watching is "positions", so that we detect
> when the depth of nested subrules activated from a common input position
> is greater than the total number of defined rules?
>
> And, of course, in the case of rules containing closures, all bets
> should be off for detecting infinite recursion, since the closure
> could be handling the termination of the recursion based on some
> criteria other than tokens consumed from the input stream. (But it's
> also probably easy to know when we've executed a closure and thus
> disable the check for infinite recursion when that happens.)
You are right about backtracking, rather than a simple counter we need
to keep track of the current recursion depth from a particular
position. I agree that we would have to disable this for rules
containing code. Perhaps a better approach would be to perform a bit
of static analysis on the grammar and look for left recursions at
creation time (I believe that is a known problem). Then we can just
warn once up front and go on our merry way recursing at matchtime.
On 10/10/05, Patrick R. Michaud <pmic...@pobox.com> wrote:
> On Mon, Oct 10, 2005 at 10:45:54AM -0400, Matt Fowles wrote:
> > Perhaps a better approach would be to perform a bit
> > of static analysis on the grammar and look for left recursions at
> > creation time (I believe that is a known problem). Then we can just
> > warn once up front and go on our merry way recursing at matchtime.
>
> I think this approach falls back into my category of
> "unless someone wants to take on the task of implementation it's not
> likely to be resolved anytime soon..." :-) :-)
>
> In a dynamic environment like the one we have (parrot/perl) I find it's
> really hard to say what constitutes "creation time", especially since
> we can refer to subrules dynamically ( <$foo>, <::$foo>, etc. ) and
> our grammars have inheritance in which a rule defined by a parent
> grammar can invoke a rule defined later in a child grammar. As with
> closures, we could probably punt on performing any analysis of
> rules with dynamic references or that refer to rules outside of
> the current grammar. But given that we can't really know until
> matchtime what subrule will be invoked by an expression like <binary>,
> it's hard for me to imagine a whole lot we can do in the way of
> static analysis beforehand. However, that could easily just be a
> limit of my imagination.
I completely forgot about inheritance. That basically throws static
analysis out the window.
What happens with something like:
rule atom { A }
rule binary { . <atom> | <expr> \* <atom> }
rule expr { <binary> }
"AB" ~~ / <expr> /
Here, the '.' in <binary> keeps "consuming" the initial "A" (thus
resetting the counter), but the <atom> rule fails, so we take
the alternation which recurses back to <expr> and does the whole
thing all over again (with the counter having been reset to zero).
So I guess we'd need to backtrack the counter as well...?
Perhaps what we want to be watching is "positions", so that we detect
when the depth of nested subrules activated from a common input position
is greater than the total number of defined rules?
And, of course, in the case of rules containing closures, all bets
should be off for detecting infinite recursion, since the closure
could be handling the termination of the recursion based on some
criteria other than tokens consumed from the input stream. (But it's
also probably easy to know when we've executed a closure and thus
disable the check for infinite recursion when that happens.)
Pm
I think this approach falls back into my category of
"unless someone wants to take on the task of implementation it's not
likely to be resolved anytime soon..." :-) :-)
In a dynamic environment like the one we have (parrot/perl) I find it's
really hard to say what constitutes "creation time", especially since
we can refer to subrules dynamically ( <$foo>, <::$foo>, etc. ) and
our grammars have inheritance in which a rule defined by a parent
grammar can invoke a rule defined later in a child grammar. As with
closures, we could probably punt on performing any analysis of
rules with dynamic references or that refer to rules outside of
the current grammar. But given that we can't really know until
matchtime what subrule will be invoked by an expression like <binary>,
it's hard for me to imagine a whole lot we can do in the way of
static analysis beforehand. However, that could easily just be a
limit of my imagination.
Pm