Conditional parsing

26 views
Skip to first unread message

progr...@gmail.com

unread,
May 1, 2014, 4:54:12 PM5/1/14
to modgr...@googlegroups.com
I am working on parsing a Wiki markup language. I started with regex, came to PLY and now I am looking at modgrammar (which I think is the solution I am looking for). However I miss the feature of conditional parsing that I made use of in PLY (http://www.dabeaz.com/ply/ply.html#ply_nn21) quite heavily - or I simply don't get how this could be done with modgrammar.

In pseudo code I am looking for a kind of behavior like this:

if grammar1 matches the next few characters in input_text:
    parse the next lines using grammarA
else:
    parse the next lines using grammar2


I could imagine to define grammar1 and as a subclass of it a grammarA and then do a grammar like

class TotalGrammar(Grammar):
    grammar = G(grammar1) | G(grammar2)

Small problem that I see is that for a real world wiki markup language I will have lots of grammars and OR statements which will lead to:

class TotalGrammar(Grammar):
    grammar = G(grammar1) | G(grammar2) | G(grammar3) | ... | G(grammarN)

And finally I need to repeat it:
class FinalGrammar(Grammar):
    grammar = REPEAT(TotalGrammar)
to get everything out - but it would work in my view.

But the crucial point in this approach is the subclass grammarA:
I would need to downstream the remainder() of the input_text to grammarA and somehow skip afterwords in input_text the text absorbed by grammarA (for the next REPEAT iteration). I think this can be done - but this is a kind of hack in my view (hard to get it coded and hard to understand after a few weeks).

Is there a better / more elegant way to implement conditional parsing with the help of modgrammar?


Alex Stewart

unread,
May 2, 2014, 5:27:12 PM5/2/14
to modgr...@googlegroups.com
Hi there!

I'm not sure I entirely understand what you're wanting to do..  it might be helpful if you could provide a simple example?

If I am understanding some of it correctly, though, I don't think you should need to use subclassing here (it's generally very rare that you'll ever want to subclass grammar classes in Modgrammar anyway)..  Is there some reason you couldn't just do something like the following?

grammar = G(grammar1, grammarA) | grammarB

(this will try to match the next bit of input against grammar1, and if it succeeds will then match the next bit after that against grammarA, or if any of that fails, it will proceed to try matching (from the beginning) against grammarB instead.  Note that Modgrammar always tries grammars in left-to-right order, so this will always have a deterministic (first try grammar1, else try grammarB) result.)

I have to admit, I'm not too familiar with "conditional lexing", but it seems like it's really a bit of a hack that was developed for systems which perform parsing and lexing in separate stages, as sometimes the lexer doesn't have all the information it might need from the parser to figure things out otherwise.  Modgrammar, however, performs both parsing and lexing all in one process, so this sort of thing really shouldn't be needed.  You should be able to just specify your whole grammar as one large tree, and let Modgrammar handle all the tricky if-this-then-the-next-thing-should-be-that questions internally..

Regarding the complexity of long OR clauses and such, that's where grouping things into subgrammars and then building up your grammar using multiple smaller grammar classes can be useful.  I could probably give you some pointers if I knew more of the specifics about your particular grammar..

--Alex



--
You received this message because you are subscribed to the Google Groups "modgrammar" group.
To unsubscribe from this group and stop receiving emails from it, send an email to modgrammar+...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

progr...@gmail.com

unread,
May 3, 2014, 3:47:43 PM5/3/14
to modgr...@googlegroups.com
Dear Alex,

thanks for your swift reply, it helped a lot. The question was a kind of "strategy" question as I need to refactor my code to be based on modgrammar (therefore no code snippets).

The hint you gave:
grammar = G(grammar1, grammarA) | grammarB
is the right one (for the if/then/else problem) - but effectively I realized that the hidden problem on my side was (is) my limited knowledge and experience on recursion (which is the true explanation for the hack I was thinking about).

So I need a deep dive there (did the first one already today).

The approach I am taking now is the following (exemplary for a simplified markup language): 
* sections indicated by "="s at the front and the end
* templates indicated by double curly braces "{{" "}}"
* some other grammar classes indicated by ... to dig out content

class Template(Grammar):
    grammar = ("{{", REF(Template_Recursion), "}}")

class Template_Recursion(Grammar):
    grammar = ( ... | ... | Templates)

class Section(Grammar):
    grammar = ( "=", REF(Section_Recursion), "=")

class Section_Recursion(Grammar):
    grammar = ( ... | ... | Section)

class ML(Grammar):
    grammar = (Template_Recursion | Section_Recursion)

;) Sounds like a plan - to me this looks kind of beautiful.

This solves the "local" problem in my view. The "global" problem (ML will stop after one iteration) I haven't decided yet. A simple REPEAT(ML) grammar could solve it - however with a quite large markup language definition I fear a bit the devil in the detail (e.g. infinite loops due to small language errors in the content of the wiki or small errors in my grammar definitions).

;) I might come back with questions on ParseError

Any comments welcome - but I think I am back on the right track. In case of interest I can share my experience (might take a bit).

Thanks for modgrammar! It is a great tool. It simply "feels" right.

Best Regards

Kai

progr...@gmail.com

unread,
May 4, 2014, 4:18:50 PM5/4/14
to modgr...@googlegroups.com
As the code above just shows the idea, here a small snippet for the nested templates that I got to run through on some examples:

class OpenCB(Grammar):
    grammar = L('{{')


class CloseCB(Grammar):
    grammar = L('}}')

class TemplateContent(Grammar):
    grammar = ANY_EXCEPT('{}')


class TemplateFrame(Grammar):
    grammar = (OpenCB, REF('Template'), CloseCB)


class TemplateNested(Grammar):
    grammar = G(OPTIONAL(TemplateContent), TemplateFrame, OPTIONAL(TemplateContent))


class Template(Grammar):
    grammar = (TemplateContent | REPEAT(TemplateNested) | TemplateFrame)

if __name__ == '__main__':

    Template.grammar_resolve_refs()
    myparser = Template.parser()
    text = myparser.parse_string('{{ABC{{DEF}}GHI{{JKL}}}}')
    print(text)
    print(text.elements)


Only thing I am wondering at the moment is wether the TemplateNested grammar could be implemented using recursion (the Optional/Repeat structure looks like as if). ;) But as stated - I need to dig deeper.

Thanks for your help. I am on the right track.


Kai
Reply all
Reply to author
Forward
0 new messages