Ambiguous parsing with Scanless and with my own technique


Andrew Rodland

unread,
Jan 8, 2013, 12:45:22 PM1/8/13
to marpa-...@googlegroups.com
Forgive the vague question here, but I've been working on refining my parsing technique from TAP::Spec::Parser into a standalone module (which will probably never be released as-is, but I've uploaded it to CPAN as MarpaX::Lex::Easy for the sake of discussion).

Now compare these two different implementations of a toy arithmetic expression calculator:


and


Note that the rule for NUMBER deliberately allows a leading sign (+ or -), which creates an ambiguity with PLUS and MINUS: is "1+1" "NUMBER PLUS NUMBER" or "NUMBER NUMBER"? Obviously the grammar resolves that ambiguity by not admitting "NUMBER NUMBER".

Given "1+1", the scanless script dies with

Found lexemes @0-1: NUMBER; value="1"
Found lexemes @1-3: NUMBER; value="+1"
Error in string_read: Parse exhausted
* Error was at string position: 1
* Error was at lexemes: NUMBER
* String before error:
1
* String after error:
+1
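The behavior in that trace can be reproduced outside of Marpa with a minimal longest-token lexer. The sketch below is an illustration in Python, not Marpa's actual lexer; the pattern names mirror the grammar above, and the sign-accepting NUMBER pattern is what causes "+1" to out-match "+":

```python
import re

# Token patterns; NUMBER deliberately allows a leading sign,
# mirroring the grammar under discussion.
PATTERNS = [
    ("NUMBER", re.compile(r"[+-]?\d+")),
    ("PLUS",   re.compile(r"\+")),
    ("MINUS",  re.compile(r"-")),
]

def tokenize(text):
    """Longest-token matching: at each position every pattern is tried
    and the longest match wins, regardless of parser expectations."""
    pos, tokens = 0, []
    while pos < len(text):
        best = None
        for name, pat in PATTERNS:
            m = pat.match(text, pos)
            if m and (best is None or len(m.group()) > len(best[1])):
                best = (name, m.group())
        if best is None:
            raise ValueError(f"no token at position {pos}")
        tokens.append(best)
        pos += len(best[1])
    return tokens

print(tokenize("1+1"))  # [('NUMBER', '1'), ('NUMBER', '+1')]
```

At position 1, NUMBER's match "+1" (length 2) beats PLUS's match "+" (length 1), so the lexer emits NUMBER NUMBER and the parse is exhausted, exactly as in the trace.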

I suppose my question is: is this a defect in Scanless, a defect in my use of it, or just the way it ought to behave?

Thanks,

Andrew

Jeffrey Kegler

unread,
Jan 8, 2013, 1:28:07 PM1/8/13
to marpa-...@googlegroups.com
It's doing longest token matching, and "+1" is a longer token than "+",
so "+1" wins.

I would deal with this by treating the signs as unary operators.
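A minimal sketch of that approach, in Python rather than Marpa (the parser here is an illustrative hand-rolled one, not anything Marpa does internally): NUMBER no longer matches a sign, so "+" and "-" always lex as operator tokens, and the grammar decides whether they are unary or binary.

```python
import re

def tokenize(text):
    # Sign-free NUMBER: "+" and "-" always lex as operators, so the
    # grammar, not the lexer, disambiguates unary vs. binary uses.
    return re.findall(r"\d+|[+-]", text)

def parse(tokens):
    """Evaluate an expression with unary and binary +/- (a sketch of
    the unary-operator approach, not Marpa's implementation)."""
    def expr(pos):
        value, pos = term(pos)
        while pos < len(tokens) and tokens[pos] in "+-":
            op = tokens[pos]
            rhs, pos = term(pos + 1)
            value = value + rhs if op == "+" else value - rhs
        return value, pos

    def term(pos):
        if tokens[pos] == "+":   # unary plus
            v, pos = term(pos + 1)
            return +v, pos
        if tokens[pos] == "-":   # unary minus
            v, pos = term(pos + 1)
            return -v, pos
        return int(tokens[pos]), pos + 1

    value, _ = expr(0)
    return value

print(parse(tokenize("1+1")))   # 2
print(parse(tokenize("-1+1")))  # 0
```

Because the ambiguity never reaches the lexer, longest-token matching is harmless here, and inputs like "-1+1" still work via the unary rule.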

I could enhance the Scanless interface so that G0 takes into account
which tokens are expected, and will therefore not accept a number after
the first one. But there's an efficiency cost, depending on how I do
that -- the unary operator approach is fast, given the current
implementation. And the Scanless if's current approach is consistent
with the way current lexers do it -- they are (unless you get into state
hacks) clueless about the upper layer's expectations.
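The enhancement described here can be sketched in miniature: a lexer that does longest-token matching only among the tokens the parser currently expects. This is a hypothetical illustration in Python; `expected_after` is a stand-in for querying a real parser for its expected-terminal set, not an actual Marpa API.

```python
import re

PATTERNS = {
    "NUMBER": re.compile(r"[+-]?\d+"),
    "PLUS":   re.compile(r"\+"),
    "MINUS":  re.compile(r"-"),
}

def tokenize_predictive(text, expected_after):
    """Longest match *among the tokens the parser currently expects*.
    `expected_after` maps the previous token name (None at start) to
    the set of acceptable next tokens -- a stand-in for asking a real
    parser which terminals it expects. (Hypothetical interface.)"""
    pos, tokens, prev = 0, [], None
    while pos < len(text):
        best = None
        for name in expected_after[prev]:
            m = PATTERNS[name].match(text, pos)
            if m and (best is None or len(m.group()) > len(best[1])):
                best = (name, m.group())
        if best is None:
            raise ValueError(f"no expected token at position {pos}")
        tokens.append(best)
        prev = best[0]
        pos += len(best[1])
    return tokens

# After a NUMBER the grammar expects an operator, never another NUMBER,
# so the sign-accepting NUMBER pattern is not even tried there:
EXPECT = {
    None:     {"NUMBER"},
    "NUMBER": {"PLUS", "MINUS"},
    "PLUS":   {"NUMBER"},
    "MINUS":  {"NUMBER"},
}

print(tokenize_predictive("1+1", EXPECT))
# [('NUMBER', '1'), ('PLUS', '+'), ('NUMBER', '1')]
```

Even with NUMBER still allowing a leading sign, "1+1" now lexes as NUMBER PLUS NUMBER, because NUMBER is excluded from consideration after a NUMBER; the cost is the extra expectation bookkeeping on every token.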

-- jeffrey

Andrew Rodland

unread,
Jan 8, 2013, 2:25:49 PM1/8/13
to marpa-...@googlegroups.com
Alright, that's fine. My goal is to lower the barrier to entry as far as possible, even if that means taking the performance hit. I'll keep developing the idea.

Good call on unary operators as a practical way to solve the problem if you do need to work with longest-token matching, though.

Thanks,

Andrew

Jeffrey Kegler

unread,
Jan 8, 2013, 5:51:08 PM1/8/13
to marpa-...@googlegroups.com
@Andrew: I am glad to hear you'll pursue this.  My own semi-traditional longest tokens approach was dictated in large part by

1.) Time.  Especially while Marpa was a solo effort, this was a major issue.
2.) Conservatism.  Nobody has yet criticized Marpa for lacking new ideas.  If I am going to make a mistake, I want it to be in sticking to the kind of interface people are already familiar with.

Maybe your work or some of it can go into Marpa::R3.

Durand Jean-Damien

unread,
Jan 8, 2013, 6:26:59 PM1/8/13
to marpa-...@googlegroups.com
In fact, traditionally lexers are greedy, AFAIK ;-)