Help with libmarpa ranking and ambiguity

43 views
Skip to first unread message

ioba...@gmail.com

unread,
Aug 31, 2016, 7:43:17 PM8/31/16
to marpa parser
I'm working on Ruby bindings for libmarpa.  They're coming along quite well.  Right now I'm trying to implement prioritized alternatives ('||' in SLIF).  I think I'm close, but the following example has me scratching my head.  Please be patient as I try to give just enough background.  (It's really not that much if you skip the code and dumps.)

First of all, to implement priorities in the Ruby wrapper, I'm using the rule ranking.  I'm not modifying the symbol rankings at all.  Here is the Perl version.  (I know there are plenty of examples on how to do the calculator grammar "right."  It's the most straightforward example I could come up with.)

    my $grammar_str = "
    :default ::= action => [name,values]
    :start ::= expr
    expr ::= number ||
             expr '*' expr ||
             expr '+' expr
    number ~ [\\d]+
    ";
    my $input = '3+4*5+6*7*8';
   
    my $grammar = Marpa::R2::Scanless::G->new({ source => \$grammar_str });
    my $recce = Marpa::R2::Scanless::R->new({ grammar => $grammar});
   
    # check for complete parsing
    my $len_read = $recce->read( \$input );
    die "Incomplete read" if $len_read != length $input;
   
    # check for ambiguous parse result
    if ( my $ambiguous_status = $recce->ambiguous() ) {
        chomp $ambiguous_status;
        die "Parse is ambiguous\n", $ambiguous_status;
    }

I get a single parse result with the '+' before the 6 at the top of the expression tree.

In my Ruby bindings, I'm assigning the ranks so that the "number" rule has the highest rank, the multiply rule has the middle rank, and the addition rule has the lowest rank.  Here is the output of the 'show_rules' method for my grammar.  (The default rank is 0.  If the rank is equal to zero, it is not printed.)

    R0: S2 ::= S3
         number ::= number.rule
    R1: S1 ::= S2  rank 2
         expr.rule ::= NUMBER
    R2: S4 ::= S0 S5 S0
         EXPR /\*/ EXPR ::= EXPR /\*/ EXPR
    R3: S1 ::= S4  rank 1
         expr.rule ::= EXPR /\*/ EXPR
    R4: S6 ::= S0 S7 S0
         EXPR /\+/ EXPR ::= EXPR /\+/ EXPR
    R5: S1 ::= S6
         expr.rule ::= EXPR /\+/ EXPR
    R6: S0 ::= S1
         expr ::= expr.rule

The DSL gets in the way a little on the LHS of the productions (no, this is not a CSG).  Remember, it's a work in progress. :-)  The point is that I'm generating the rules with the ranks I expect because the output above reads the rule rank back from libmarpa.

The parsing result I get has five results, with the '*' operators at the top of the tree.  In prefix notation, the results are as follows.

    "(* (* (* (+ 3 4) (+ 5 6)) 7) 8)"
    "(* (* (+ 3 4) (* (+ 5 6) 7)) 8)"
    "(* (* (+ 3 4) (+ 5 6)) (* 7 8))"
    "(* (+ 3 4) (* (* (+ 5 6) 7) 8))"
    "(* (+ 3 4) (* (+ 5 6) (* 7 8)))"

This looks like an equivalent of the 'dangling-else' ambiguity.

I have two questions.

1.  Why is my priority backwards?  Marpa::R2 gave me the '+' rule at the outer-most level, while my Ruby wrapper gave the '*' rule at the outer-most level every time.  In other words, help me understand what 'rank' means in the Marpa context, and how it is applied.

2.  Why does Marpa::R2 give an unambiguous result?  Even with the '+' rule on the outside, there are two choices for associativity.

Thanks!

- Ryan

Jeffrey Kegler

unread,
Aug 31, 2016, 9:25:05 PM8/31/16
to Marpa Parser Mailing LIst
Marpa::R2 uses a *rewrite* to implement priority.  It's done in Marpa::R2::Internal::MetaAST_Nodes::priority_rule::evaluate() in the Marpa::R2 code.

As for using rule ranks, the SLIF allows the user to use them as well, so I don't use them in the SLIF to implement prioritization -- I am not even sure they can be used in this way.

--
You received this message because you are subscribed to the Google Groups "marpa parser" group.
To unsubscribe from this group and stop receiving emails from it, send an email to marpa-parser+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Jeffrey Kegler

unread,
Aug 31, 2016, 10:55:21 PM8/31/16
to Marpa Parser Mailing LIst
I've remembered a major reason for preferring the rewrite approach.  If you parse then rank, you're using ambiguous parsing -- Marpa is cool with that, but it is not necessarily as fast as an unambiguous parse.  If the expression is of size X the parse could be as bad as O(X^3) and that can make a difference if you have a lot of long expressions.

If the grammar enforces precedence, ambiguous parsing can be avoided.

On Wed, Aug 31, 2016 at 4:43 PM, <ioba...@gmail.com> wrote:

ioba...@gmail.com

unread,
Aug 31, 2016, 11:15:12 PM8/31/16
to marpa parser
Interesting.  Can you suggest any references on how to implement this grammar rewriting?

Thanks!

- Ryan
To unsubscribe from this group and stop receiving emails from it, send an email to marpa-parser...@googlegroups.com.

Jeffrey Kegler

unread,
Aug 31, 2016, 11:20:48 PM8/31/16
to Marpa Parser Mailing LIst

To unsubscribe from this group and stop receiving emails from it, send an email to marpa-parser+unsubscribe@googlegroups.com.

Anton Dyudin

unread,
Aug 31, 2016, 11:24:26 PM8/31/16
to marpa-...@googlegroups.com
Speaking as someone who isn't fluent in Perl, and doesn't see a "||" anywhere on that page, would you be willing to give some broad strokes of what the code is doing? :3

Jeffrey Kegler

unread,
Aug 31, 2016, 11:46:13 PM8/31/16
to Marpa Parser Mailing LIst
At this point I have trouble with tracking all my stuff, but here's I think the lastest and/or most accurate: https://github.com/jeffreykegler/kollos/blob/master/notes/design/precedenced.md

On Wed, Aug 31, 2016 at 8:24 PM, Anton Dyudin <antec...@gmail.com> wrote:
Speaking as someone who isn't fluent in Perl, and doesn't see a "||" anywhere on that page, would you be willing to give some broad strokes of what the code is doing? :3

Jeffrey Kegler

unread,
Aug 31, 2016, 11:56:38 PM8/31/16
to Marpa Parser Mailing LIst
As a pedagogic point -- if it's not immediately obvious how SLIF precedenced statements could be implemented by hand, work a least one simple one yourself.  That will make what my algorithm is doing very obvious.

Start with really simple, with something like:

Expression ::=
        Number
       || Expression '*' Expression
       || Expression '+' Expression

and translate it to pure BNF.  Trying to understand the algorithm before you're solid on the basic concept will probably just confuse things.

Anton Dyudin

unread,
Sep 1, 2016, 12:04:42 AM9/1/16
to marpa-...@googlegroups.com
Expression[Mul] ::= Number | Expression[Mul] '*' Expression[Mul]
Expression[Add] ::= Expression[Mul] | Expression[Add] '+' Expression[Add]
Expression ::= Expression[Add]

I think I was looking for the prioritized_symbol section, thank you

On Wednesday, 31 August 2016, Jeffrey Kegler <jeffre...@jeffreykegler.com> wrote:
As a pedagogic point -- if it's not immediately obvious how SLIF precedenced statements could be implemented by hand, work a least one simple one yourself.  That will make what my algorithm is doing very obvious.

Start with really simple, with something like:

Expression ::=
        Number
       || Expression '*' Expression
       || Expression '+' Expression

and translate it to pure BNF.  Trying to understand the algorithm before you're solid on the basic concept will probably just confuse things.

Jeffrey Kegler

unread,
Sep 1, 2016, 12:10:01 AM9/1/16
to Marpa Parser Mailing LIst
Just in case it was unclear, that last message was aimed at the general reader of this read, and was not ad hominem.

@Anton: Off the top of my head, 2+3+4 is ambiguous in that grammar -- the BNF does not implement associativity.

On Wed, Aug 31, 2016 at 9:04 PM, Anton Dyudin <antec...@gmail.com> wrote:
Expression[Mul] ::= Number | Expression[Mul] '*' Expression[Mul]
Expression[Add] ::= Expression[Mul] | Expression[Add] '+' Expression[Add]
Expression ::= Expression[Add]

I think I was looking for the prioritized_symbol section, thank you

Anton Dyudin

unread,
Sep 1, 2016, 12:16:04 AM9/1/16
to marpa-...@googlegroups.com
Oh sure, presumably you'd want to take each "x | y OP y" and swap either the left or right y with an x, depending on what kind of associating it you want; as you said in precedenced.md. Which answers iobass' question about ambiguity I think: there's an interpretation of "num || exp '+' exp" that's ambiguous, but it's not the one R2 is using :P

ioba...@gmail.com

unread,
Sep 1, 2016, 2:48:16 PM9/1/16
to marpa parser
Thanks for the pointer to prose on the rule rewriting.  I can read Perl about as well as middle English -- lots of effort and guessing, and usually getting pretty close. :-)

Further comments inline below.


On Wednesday, August 31, 2016 at 10:10:01 PM UTC-6, Jeffrey Kegler wrote:
Just in case it was unclear, that last message was aimed at the general reader of this read, and was not ad hominem.

No offense taken -- it sounds like great advice to me in particular!

@Anton: Off the top of my head, 2+3+4 is ambiguous in that grammar -- the BNF does not implement associativity.

So SLIF has an implied, default associativity?  I can find one IRC comment to this effect, but I can't find it anywhere in the docs.  (There's a great potential spot in Scanless/DSL.pod where it discusses associativity....)

This explains why my parse was not ambiguous.  What if I *want* my grammar to be ambiguous in this case?  I saw another IRC comment asking for an adverb setting for associativity of "none".  Is this possible?

Thanks!

- Ryan

<snip>

ioba...@gmail.com

unread,
Sep 2, 2016, 4:30:04 PM9/2/16
to marpa parser
The rewrite approach seems to work great for recursive rules, but it doesn't do anything for non-recursive rules.  Running the following gives me an ambiguous parse.  Since opt1a is a higher priority, I expect to see two productions of opt1a with no ambiguity.  Am I expecting too much here?  Thanks for your help!


    use strict;
    use Marpa::R2;
    use Data::Dumper;
    
    my $grammar = "
    options ::= option*
    option  ::= opt1a || opt2a
    opt1a   ::= 'a'
    opt2a   ::= 'a' 'a'
    
    :discard ~ whitespace
    whitespace ~ [\\s]+
    ";
    
    my $grammar = Marpa::R2::Scanless::G->new( { source => \$grammar } );
    my $input = 'a a';
    my $value_ref = $grammar->parse( \$input );
    
    my $value = ${$value_ref};
    print "Output:\n".Dumper($value);


On Wednesday, August 31, 2016 at 8:55:21 PM UTC-6, Jeffrey Kegler wrote:
I've remembered a major reason for preferring the rewrite approach.  If you parse then rank, you're using ambiguous parsing -- Marpa is cool with that, but it is not necessarily as fast as an unambiguous parse.  If the expression is of size X the parse could be as bad as O(X^3) and that can make a difference if you have a lot of long expressions.

<snip>

Jeffrey Kegler

unread,
Sep 2, 2016, 4:36:17 PM9/2/16
to Marpa Parser Mailing LIst
If you use the rewrite option, it remains possible to write a precedenced statement that is inherently ambiguous.  Rewriting is to prevent the *implementation* of precedenced rules from *introducing* new ambiguities not actually in the precedenced rule itself.


Anton Dyudin

unread,
Sep 2, 2016, 4:37:11 PM9/2/16
to marpa-...@googlegroups.com
Isn't the whole point of a precedence rule to provide a disambiguition?

Jeffrey Kegler

unread,
Sep 2, 2016, 4:41:16 PM9/2/16
to Marpa Parser Mailing LIst
If you *want* ambiguity, consider avoiding the precedenced rule syntax and writing it directly in BNF.  Precedenced rules are intended to cover the ordinary cases.  If you're doing advanced and tricky stuff, laying it out in pure BNF forces you to make it clear exactly what you meant.

I find that many systems, with their advanced features and syntaxes, go too far in the DWIMery.  This is handy if their "Do What I Mean" is in fact what you meant.  But if the DWIMery guesses wrong, the system just makes things more complicated.  So I try to not make my advanced features *too* smart.

Jeffrey Kegler

unread,
Sep 2, 2016, 4:49:29 PM9/2/16
to Marpa Parser Mailing LIst
A lot of people use ambiguity to mean "ambiguous if you use a parser with the power of recursive descent or LALR", which is kind of a figurative extension of the term.  By ambiguous I mean ambiguous even if a general context-free parser is used.  That's the standard academic meaning of the term.

On Fri, Sep 2, 2016 at 1:37 PM, Anton Dyudin <antec...@gmail.com> wrote:
Isn't the whole point of a precedence rule to provide a disambiguition?

Anton Dyudin

unread,
Sep 2, 2016, 4:49:45 PM9/2/16
to marpa-...@googlegroups.com
It seems clear to me that

    option  ::= opt1a || opt2a
    opt1a   ::= 'a'
    opt2a   ::= 'a' 'a'

doesn't want ambiguity; but that the BNF rewrite does introduce it, by treating the || as an ambiguous option. Or is it supposed to be precedenced only in recursive contexts?

Jeffrey Kegler

unread,
Sep 2, 2016, 5:04:19 PM9/2/16
to Marpa Parser Mailing LIst
I'll confess to not having looked closely at the example, which is sort of a degenerate case.  But enforcing precedence is *not* the same as removing ambiguity.

Note that the precedence, especially in this case, is *general precedence*, as opposed to the operator precedence you'll find in the textbooks.  Marpa implements operator precedence as a subcase of general precedence.  General precedence is AFAICT  my discovery -- no parser before Marpa could have implemented it efficiently, and I don't know of it being described anywhere in the literature.

If you think the above does not "want" ambiguity, you should work out an alternative interpretation of the meaning of operatorless precedenced statements.  I can (I think) claim the first word in general precedence parsing, but that does not mean it will be the last word. :-)

Jeffrey Kegler

unread,
Sep 2, 2016, 5:13:53 PM9/2/16
to Marpa Parser Mailing LIst
@iobass16 -- you're right.  The default associativity is left, but this is not documented AFAICT.  I've filed a bug report so that when and if there is a new Marpa::R2 release, the docs should be fixed: https://github.com/jeffreykegler/Marpa--R2/issues/272

On Thu, Sep 1, 2016 at 11:48 AM, <ioba...@gmail.com> wrote:

Jeffrey Kegler

unread,
Sep 2, 2016, 5:17:09 PM9/2/16
to Marpa Parser Mailing LIst
Looking back, I see that I addressed the relationship between precedence and ambiguity in the docs: https://metacpan.org/pod/distribution/Marpa-R2/pod/Scanless/DSL.pod#Precedence-and-ambiguous-grammars

ioba...@gmail.com

unread,
Sep 2, 2016, 7:24:58 PM9/2/16
to marpa parser
A degenerate case isn't necessarily invalid. :-)  I tried for a very simple non-recursive precedence rule.  

So what does "precedence" mean in a Marpa grammar then?  AFAICT, it means whatever grammar you get from the rewriting rules.  It's clear that the results for recursive arithmetic examples match my understanding of the term.  In this case, using the precedenced choice operator removes ambiguity.  I didn't expect recursion to be an essential ingredient.

If convenient, I would enjoy seeing a "precedenced statement that remains ambiguous."  Like @Anton Dyudin, I expected to be able to prune ambiguity via precedence.

FYI, this isn't idle speculation.  My target language to parse, VHDL, dictates different semantics in some contexts for identifiers versus general expressions [1].  But of course, a lone identifier is also a valid expression.  To achieve the Marpa ideal of "semantic parsing straight from BNF", I hoped to use precedence to give the identifier priority and suppress the ambiguity.

Please forgive any whininess that may have slipped through in this post.  You have done an amazing job with Marpa!  My aims are to (1) understand Marpa's capabilities, intents, definitions, etc., and (2) figure out how to best use Marpa for my application (parsing VHDL).

Thanks!

- Ryan

[1] For an actual in an association list, a bare signal name (identifier) is hooked directly to the formal.  An expression in this same context infers an anonymous signal, which incurs a delta delay relative to the straight identifier.  The full rules (VHDL-2008) are more complex.

Jeffrey Kegler

unread,
Sep 2, 2016, 8:21:20 PM9/2/16
to Marpa Parser Mailing LIst
About ambiguity and precedence.  You easily run into it if you try to have implied operators.

Imagine someone trying to leave multiplication as implied.  Eliminate the unary plus and it becomes (I think) unambiguous:

E ::= Number
   || E '+' E
   || E E

It's unambiguous, that is, '1 + 2 3 + 4' parses as (1+2)(3+4).

This is very, very cool and far beyond what other parsers can do.  The reason I haven't bragged on this is that is you do something reasonable like add unary plus:

E ::= Number
   || '+' E
   || E '+' E
   || E E

you get these parses:

((1,+,2),(3,+,4))
((1,(+,2)),(3,+,4))
(((1,+,2),3),(+,4))
(((1,(+,2)),3),(+,4))

All of these obey precedence.

Jeffrey Kegler

unread,
Sep 2, 2016, 8:24:49 PM9/2/16
to Marpa Parser Mailing LIst
@Anton: precedence is about what parses are allowed in nested expressions.  Precedence with zero-depth nesting is like a triangle with a zero-degree angle.  Mathematicians call things like this degenerate -- yes, a 0-degree triangle *is* a valid triangle, but it's also an annoying, tricky and ultimately uninteresting special case.

Anton Dyudin

unread,
Sep 2, 2016, 9:44:45 PM9/2/16
to marpa-...@googlegroups.com
I wouldn't rush to declare victory for 1 + 2a + b being (* (+ 1 2) (+ a b)), but I get your point. Precedence in "option = a || a a" can't escape to affect option* around it.


On Friday, 2 September 2016, Jeffrey Kegler <jeffre...@jeffreykegler.com> wrote:
@Anton: precedence is about what parses are allowed in nested expressions.  Precedence with zero-depth nesting is like a triangle with a zero-degree angle.  Mathematicians call things like this degenerate -- yes, a 0-degree triangle *is* a valid triangle, but it's also an annoying, tricky and ultimately uninteresting special case.

Jeffrey Kegler

unread,
Sep 3, 2016, 6:42:53 PM9/3/16
to Marpa Parser Mailing LIst
I reworked the example a bit.  Among the improvements, the plus and (implied) multiplication operators are used with their traditional precedence.

With the input: 1 2 + 3 4

This expression:
E ::= Number
   || E E
   || E '+' E

produces this one parse: ((1,2),+,(3,4))

Add unary plus:

E ::= Number
   || '+' E
   || E E
   || E '+' E

and there are two parses which obey precedence:

(((1,2),(+,3)),4)
((1,2),+,(3,4))


On Fri, Sep 2, 2016 at 6:44 PM, Anton Dyudin <antec...@gmail.com> wrote:
I wouldn't rush to declare victory for 1 + 2a + b being (* (+ 1 2) (+ a b)), but I get your point. Precedence in "option = a || a a" can't escape to affect option* around it.
Reply all
Reply to author
Forward
0 new messages