Ambiguous parses induced in external lexer

37 views
Skip to first unread message

Thomas Weigert

unread,
Feb 15, 2015, 1:53:35 AM2/15/15
to marpa-...@googlegroups.com
I am wondering whether it is possible to generate ambiguous parses with the help of an external lexer.

I have the following situation: Upon a rejection event, I am cleaning up in the lexer to be able to continue the parse. However, there are two pairs of ruby slippers I can use. What I would like to do is not to arbitrarily choose one pair and continue the parse, but to use both pairs to continue both parses, potentially resulting in a parse forest.

From studying the manual and the available examples, I cannot see how to do this.

I would greatly appreciate if anybody could advise whether this is possible, and if so, point me into the right direction.

Thanks, Th.

Ruslan Shvedov

unread,
Feb 15, 2015, 5:12:52 AM2/15/15
to marpa-...@googlegroups.com
Marpa allows ambiguous lexemes; in external lexing you can read those several alternative lexemes via several calls to lexeme_alternative() [1] and then a single call to lexeme_complete() [2] and, provided the grammar has the appropriate rules, this will lead to a parse forest (use example of a parse forest with internal lexing can be found in [3]).

Marpa will use both pairs to continue both parses if you give it a grammar which defines rules for such both pairs, example is also in [3]. And, if you have enough rules, Marpa will parse the input according to them and will allow you to handle the multiple parse results.


--
You received this message because you are subscribed to the Google Groups "marpa parser" group.
To unsubscribe from this group and stop receiving emails from it, send an email to marpa-parser...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Thomas Weigert

unread,
Feb 15, 2015, 3:29:28 PM2/15/15
to marpa-...@googlegroups.com
Ruslan,

thanks for your reply. So sorry for not being clear enough.

I am not looking to handle the situation you describe which I believe pertains to that one recognizes two or more conflicting lexical symbols and then continues the parse with all these symbols as possible continuations.

My situation is as follows: I encounter a rejection in the parse and am recovering. I have two or more ruby slippers prepared, which in this case are all applicable. If I step into one of the ruby slippers, it will whisk me away back into the parser and the parser will continue until the ruby slippers have done their thing. However, at this point we have advanced in the parse and might have consumed G1 locations and are not in the same position anymore that we were when we stepped into the ruby slippers.

But I had other ruby slippers I could have used, and had I stepped into one of these, I would have ended up with a different parse, potentially. So I am trying to find a way of stepping into each of the ruby slippers, one at a time, and getting a partial parse. After I found all the partial parses from the applicable ruby slippers, I would like for each of these partial parses to be reflected in the parse forest.

I would have to do something for this to happen. Recognizing alternative lexemes is not the way to get there, I believe. I have partial parses that I need to hand back to the G routines, so they can embed them into their constructed data structures.

Maybe I am outside of what is feasible here, but it would be good to know at least....



On Sunday, February 15, 2015 at 4:12:52 AM UTC-6, rns wrote:
Marpa allows ambiguous lexemes; in external lexing you can read those several alternative lexemes via several calls to lexeme_alternative() [1] and then a single call to lexeme_complete() [2] and, provided the grammar has the appropriate rules, this will lead to a parse forest (use example of a parse forest with internal lexing can be found in [3]).

Marpa will use both pairs to continue both parses if you give it a grammar which defines rules for such both pairs, example is also in [3]. And, if you have enough rules, Marpa will parse the input according to them and will allow you to handle the multiple parse results.


Jeffrey Kegler

unread,
Feb 15, 2015, 3:49:03 PM2/15/15
to Marpa Parser Mailing LIst
If you add ambiguous lexemes, and they allow more than one parse, Marpa will track both parses for as long as either remains possible, so backtracking should not be necessary

I read your new message as you taking it that ambiguous lexemes only have a localized effect.  Marpa, in fact, pursues all the possible parses as far as they go.  The choice of which lexeme to use is *not* made until the parse is complete.

If the question is how to look at all those parses, you use repeated calls to $recce->value() to iterate them.  If you really want to traverse an ASF as a forest, top-down, Marpa has a interface for this which is AFAIK ground-breaking.  Ruslan S. is an expert at this top-down ASF interface.

If we are still talking past each other, you might try creating a mini-example, showing what it is that you think Marpa cannot do.


Thomas Weigert

unread,
Feb 15, 2015, 4:16:52 PM2/15/15
to marpa-...@googlegroups.com
Jeff,

yes, I am sorry, I believe we are still not talking about the same situation. I'll concoct an example, but let me try one more time now....

The ambiguity in my situation does not come from that the lexer encounters an ambiguous lexeme. My problem is that I encounter a parsing problem and try to fix it up. However, there are two or more legitimate ways of fixing it up. Any of these ways of fixing up the parse will lead to a good parse. I want to have all of these reflected in the final parse forest.

The way I fix up the interrupted parse is by using ruby slippers. I grab an arbitrary slipper (all of the ruby slippers will lead to good parses) and step into them. When I kick of the ruby slipper with resume it parses the whole text handed over by the ruby slippers, which results in a moving of the G1 location. I get back control in the external lexer after this, but I am not in the previous location any more as far as the parse is concerned. I can now not just step into the next ruby slipper and try again, as this assumes that I am still in the previous location. So while I know exactly what the other possible parses are (they would result from stepping into the other ruby slippers), I can no longer step into them as they are no longer applicable. The first ruby slipper has whisked me away.

If I could somehow access before I step into the ruby slippers the parser location and roll back when I return from the ruby slipper journey, then I could step into the next slipper. But all I got is the position in the input stream.

Th.

Jeffrey Kegler

unread,
Feb 15, 2015, 4:23:05 PM2/15/15
to Marpa Parser Mailing LIst
If you use ambiguous lexemes, you can step into all the Ruby Slippers at once.  So you do get whisked away, but it is to everywhere.

Ruslan Shvedov

unread,
Feb 16, 2015, 8:40:21 AM2/16/15
to marpa-...@googlegroups.com
Thomas, thanks for making it more clear for me. I'm not sure I've got all the details right, so, based on my current understanding:

Looks like you can incorporate your ruby slippers into grammar as lexemes/rules so that Marpa recognize the input according to them. Marpa is good at ambiguity so it will gather all possible parses for you into a parse forest and let you build your (partial) parses via ASF (abstract syntax forest). 

You can also save partial parses from semantic action subs defined for the corresponding rules or use completion events to gather the data you need at the parse time, like Ron Savage has been doing recently with his Text::Delimited::Marpa module [1]. 

Again, I'm not sure I've got all the details, but I honestly can't see why you need to handle every rejection manually via ruby slippers instead of defining them in the grammar and letting Marpa do its job and using events or actions to gather data.





Ron Savage

unread,
Feb 16, 2015, 6:13:48 PM2/16/15
to marpa-...@googlegroups.com
I should add to rns' compliment that I do not use code which promotes ambiguity. In all the cases I've dealt with so far, I've been able to avoid it without that interfering with the quality of the software.

I either pre-empt ambiguity by crafting the grammar in such a way as to stop it, or I accept that some inputs will force Marpa to return more that one event simultaneously, when I check the list of events triggered at each pause.

In the first case I run the code and modify the grammar as appropriate. In the second case I choose which event to accept, and output (to my code, not back to Marpa) 1 particular token [*]. If it's a combination of events I have not allowed for, the code dies.

[*] I should clarify: I accept the token (lexeme) Marpa identified as triggering multiple events, but I choose how to identify that token. For example, a char or string could be a member of 2 sets, whose ranges overlap, so I have to choose which set to regard the token as belonging to.

So, this is different from your situation I know, but perhaps it will help.

Reply all
Reply to author
Forward
0 new messages