Ambiguous parsing with Scanless and with my own technique


Andrew Rodland

unread,
Jan 8, 2013, 12:45:22 PM1/8/13
to marpa-...@googlegroups.com
Forgive the vague question here, but I've been working on refining my parsing technique from TAP::Spec::Parser into a standalone module (which will probably never be released as-is, but I've uploaded it to CPAN as MarpaX::Lex::Easy for the sake of discussion).

Now compare these two different implementations of a toy arithmetic expression calculator:


and


Note that the rule for NUMBER deliberately allows a leading sign (+ or -), which creates an ambiguity with PLUS and MINUS: is "1+1" "NUMBER PLUS NUMBER" or "NUMBER NUMBER"? Obviously the grammar resolves that ambiguity by not admitting "NUMBER NUMBER".

Given "1+1", the scanless script dies with

Found lexemes @0-1: NUMBER; value="1"
Found lexemes @1-3: NUMBER; value="+1"
Error in string_read: Parse exhausted
* Error was at string position: 1
* Error was at lexemes: NUMBER
* String before error:
1
* String after error:
+1
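The behavior in that trace can be reproduced outside of Marpa with a minimal longest-token lexer. The sketch below is an illustration in Python, not Marpa's actual lexer; the pattern names mirror the grammar above, and the sign-accepting NUMBER pattern is what causes "+1" to out-match "+":

```python
import re

# Token patterns; NUMBER deliberately allows a leading sign,
# mirroring the grammar under discussion.
PATTERNS = [
    ("NUMBER", re.compile(r"[+-]?\d+")),
    ("PLUS",   re.compile(r"\+")),
    ("MINUS",  re.compile(r"-")),
]

def tokenize(text):
    """Longest-token matching: at each position every pattern is tried
    and the longest match wins, regardless of parser expectations."""
    pos, tokens = 0, []
    while pos < len(text):
        best = None
        for name, pat in PATTERNS:
            m = pat.match(text, pos)
            if m and (best is None or len(m.group()) > len(best[1])):
                best = (name, m.group())
        if best is None:
            raise ValueError(f"no token at position {pos}")
        tokens.append(best)
        pos += len(best[1])
    return tokens

print(tokenize("1+1"))  # [('NUMBER', '1'), ('NUMBER', '+1')]
```

At position 1, NUMBER's match "+1" (length 2) beats PLUS's match "+" (length 1), so the lexer emits NUMBER NUMBER and the parse is exhausted, exactly as in the trace.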

I suppose my question is: is this a defect in Scanless, a defect in my use of it, or just the way it ought to behave?

Thanks,

Andrew

Jeffrey Kegler

unread,
Jan 8, 2013, 1:28:07 PM1/8/13
to marpa-...@googlegroups.com
It's doing longest token matching, and "+1" is a longer token than "+",
so "+1" wins.

I would deal with this by treating the signs as unary operators.
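A minimal sketch of that approach, in Python rather than Marpa (the parser here is an illustrative hand-rolled one, not anything Marpa does internally): NUMBER no longer matches a sign, so "+" and "-" always lex as operator tokens, and the grammar decides whether they are unary or binary.

```python
import re

def tokenize(text):
    # Sign-free NUMBER: "+" and "-" always lex as operators, so the
    # grammar, not the lexer, disambiguates unary vs. binary uses.
    return re.findall(r"\d+|[+-]", text)

def parse(tokens):
    """Evaluate an expression with unary and binary +/- (a sketch of
    the unary-operator approach, not Marpa's implementation)."""
    def expr(pos):
        value, pos = term(pos)
        while pos < len(tokens) and tokens[pos] in "+-":
            op = tokens[pos]
            rhs, pos = term(pos + 1)
            value = value + rhs if op == "+" else value - rhs
        return value, pos

    def term(pos):
        if tokens[pos] == "+":   # unary plus
            v, pos = term(pos + 1)
            return +v, pos
        if tokens[pos] == "-":   # unary minus
            v, pos = term(pos + 1)
            return -v, pos
        return int(tokens[pos]), pos + 1

    value, _ = expr(0)
    return value

print(parse(tokenize("1+1")))   # 2
print(parse(tokenize("-1+1")))  # 0
```

Because the ambiguity never reaches the lexer, longest-token matching is harmless here, and inputs like "-1+1" still work via the unary rule.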

I could enhance the Scanless interface so that G0 takes into account
which tokens are expected, and will therefore not accept a number after
the first one. But there's an efficiency cost, depending on how I do
that -- the unary operator approach is fast, given the current
implementation. And the Scanless if's current approach is consistent
with the way current lexers do it -- they are (unless you get into state
hacks) clueless about the upper layer's expectations.
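The enhancement described here can be sketched in miniature: a lexer that does longest-token matching only among the tokens the parser currently expects. This is a hypothetical illustration in Python; `expected_after` is a stand-in for querying a real parser for its expected-terminal set, not an actual Marpa API.

```python
import re

PATTERNS = {
    "NUMBER": re.compile(r"[+-]?\d+"),
    "PLUS":   re.compile(r"\+"),
    "MINUS":  re.compile(r"-"),
}

def tokenize_predictive(text, expected_after):
    """Longest match *among the tokens the parser currently expects*.
    `expected_after` maps the previous token name (None at start) to
    the set of acceptable next tokens -- a stand-in for asking a real
    parser which terminals it expects. (Hypothetical interface.)"""
    pos, tokens, prev = 0, [], None
    while pos < len(text):
        best = None
        for name in expected_after[prev]:
            m = PATTERNS[name].match(text, pos)
            if m and (best is None or len(m.group()) > len(best[1])):
                best = (name, m.group())
        if best is None:
            raise ValueError(f"no expected token at position {pos}")
        tokens.append(best)
        prev = best[0]
        pos += len(best[1])
    return tokens

# After a NUMBER the grammar expects an operator, never another NUMBER,
# so the sign-accepting NUMBER pattern is not even tried there:
EXPECT = {
    None:     {"NUMBER"},
    "NUMBER": {"PLUS", "MINUS"},
    "PLUS":   {"NUMBER"},
    "MINUS":  {"NUMBER"},
}

print(tokenize_predictive("1+1", EXPECT))
# [('NUMBER', '1'), ('PLUS', '+'), ('NUMBER', '1')]
```

Even with NUMBER still allowing a leading sign, "1+1" now lexes as NUMBER PLUS NUMBER, because NUMBER is excluded from consideration after a NUMBER; the cost is the extra expectation bookkeeping on every token.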

-- jeffrey

Andrew Rodland

unread,
Jan 8, 2013, 2:25:49 PM1/8/13
to marpa-...@googlegroups.com
Alright, that's fine. My goal is to lower the barrier to entry as far as possible, even if that means taking the performance hit. I'll keep developing the idea.

Good call on unary operators as a practical way to solve the problem if you do need to work with longest-token matching, though.

Thanks,

Andrew

Jeffrey Kegler

unread,
Jan 8, 2013, 5:51:08 PM1/8/13
to marpa-...@googlegroups.com
@Andrew: I am glad to hear you'll pursue this.  My own semi-traditional longest tokens approach was dictated in large part by

1.) Time.  Especially while Marpa was a solo effort, this was a major issue.
2.) Conservatism.  Nobody has yet criticized Marpa for lacking new ideas.  If I am going to make a mistake, I want it to be in sticking to the kind of interface people are already familiar with.

Maybe your work or some of it can go into Marpa::R3.

Durand Jean-Damien

unread,
Jan 8, 2013, 6:26:59 PM1/8/13
to marpa-...@googlegroups.com
In fact, traditionally lexers are greedy, AFAIK ;-)