String matching fails without anonymous subgrammar


FatsieFS

Jan 20, 2020, 9:19:59 AM
to modgrammar
I have the following code:

from modgrammar import *

class String(Grammar):
    grammar_whitespace_mode = "explicit"
    grammar = L('"'), ZERO_OR_MORE(ANY_EXCEPT('\n"\\')|(L('\\'), ANY)), L('"')

class Strings(Grammar):
    grammar = ONE_OR_MORE(String)

class StringG(Grammar):
    grammar_whitespace_mode = "explicit"
    grammar = G(L('"'), ZERO_OR_MORE(ANY_EXCEPT('\n"\\')|(L('\\'), ANY)), L('"'))

class StringGs(Grammar):
    grammar = ONE_OR_MORE(StringG)

The Strings grammar fails to parse a string containing a single space, while StringGs is OK:
In [3]: p = Strings.parser()
In [4]: ss = p.parse_string('" "')
---------------------------------------------------------------------------
ParseError                                Traceback (most recent call last)
<ipython-input-4-2e3c7158d7cf> in <module>
----> 1 ss = p.parse_string('" "')

~/anaconda2/envs/nmigen/lib/python3.7/site-packages/modgrammar/__init__.py in parse_string(self, string, data)
    519     self.reset()
--> 520     for result in self._parse_text(string, True, True, data, 'complete'):
    521       # This will always just return the first result
    522       return result

~/anaconda2/envs/nmigen/lib/python3.7/site-packages/modgrammar/__init__.py in _parse_text(self, string, bol, eof, data, matchtype)
    453
    454     while True:
--> 455       count, obj = self._parse(pos, session, matchtype)
    456       if count is None:
    457         # Partial match

~/anaconda2/envs/nmigen/lib/python3.7/site-packages/modgrammar/__init__.py in _parse(self, pos, session, matchtype)
    393         char = self.char + errpos
    394         line, col = util.calc_line_col(self.text.string, errpos, self.line, self.col, self.tabs)
--> 395         raise ParseError(self.grammar, self.text.string, errpos, char, line=line, col=col, expected=expected)
    396       if count is None:
    397         # We need more input

ParseError: [line 1, column 3] Expected '\\' or ANY_EXCEPT('\n"\\'): Found '"'

In [5]: pg = StringGs.parser()
In [6]: ssg = pg.parse_string('" "')

Why is this? Can this be considered a bug?

FatsieFS

Jan 20, 2020, 2:08:41 PM
to modgrammar

> Why is this? Can this be considered a bug?

I now understand why the second parser does not fail: using G() generates an anonymous subgrammar that uses the default whitespace mode, which is optional.
If I change:
    grammar = G(L('"'), ZERO_OR_MORE(ANY_EXCEPT('\n"\\')|(L('\\'), ANY)), L('"'))
to:
    grammar = G(L('"'), ZERO_OR_MORE(ANY_EXCEPT('\n"\\')|(L('\\'), ANY)), L('"'), grammar_whitespace_mode="explicit")
I get the same error.
I do think this is a bug: the grammar should parse the single-space string when the mode is "explicit".

FatsieFS

Jan 20, 2020, 3:00:53 PM
to modgrammar
OK, it seems I need to create the ZERO_OR_MORE with explicit whitespace mode. The following code does seem to do the right thing:

class String(Grammar):
    grammar_whitespace_mode = "explicit"
    grammar = L('"'), ZERO_OR_MORE(ANY_EXCEPT('\n"\\') | (L('\\'), ANY), grammar_whitespace_mode="explicit"), L('"')


Alex Stewart

Jan 20, 2020, 11:26:22 PM
to modgr...@googlegroups.com
Aha.. it took me a bit of pondering, but I think I see what's going on. Because ZERO_OR_MORE is using the default optional whitespace mode, it skips over any whitespace in front of a sub-element before trying to match it. Here, once the space is skipped, the sub-element expression finds only the closing quote and fails to match, so the repetition ends up with zero matching elements. That would be OK, except that when ZERO_OR_MORE matches nothing, it doesn't consume the whitespace either. The space is left for the outer grammar to deal with, and since the outer grammar is in explicit whitespace mode it neither ignores the space nor explicitly matches it, so the parse fails. This is really yet another example of how messy things get when mixing whitespace modes in the same grammar.
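
The behaviour described above can be illustrated with a toy model. This is a plain-Python sketch and not modgrammar's implementation; the function names match_repetition and match_string are hypothetical, and the only thing the sketch assumes is the rule stated above: whitespace skipped in front of an element that then fails to match is not consumed.

```python
def match_repetition(text, pos, skip_whitespace):
    """Toy ZERO_OR_MORE: returns the position after the last matched element.

    An element is either one character other than newline, quote or
    backslash, or a backslash followed by any character.
    """
    end = pos  # end of the last element that matched
    while True:
        cur = end
        if skip_whitespace:
            # "optional" mode: skip whitespace before trying the next element
            while cur < len(text) and text[cur] in " \t":
                cur += 1
        if cur < len(text) and text[cur] not in '\n"\\':
            end = cur + 1
        elif cur + 1 < len(text) and text[cur] == "\\":
            end = cur + 2
        else:
            # The element failed, so any whitespace skipped in front of it
            # is NOT consumed: we report the end of the last real match.
            return end


def match_string(text, skip_whitespace_in_repetition):
    """Toy outer grammar in "explicit" mode: quote, repetition, quote."""
    if not text.startswith('"'):
        return False
    pos = match_repetition(text, 1, skip_whitespace_in_repetition)
    # Explicit mode: the very next character must be the closing quote.
    return text[pos:] == '"'


print(match_string('" "', skip_whitespace_in_repetition=True))
print(match_string('" "', skip_whitespace_in_repetition=False))
```

With whitespace skipping enabled in the repetition, the space in '" "' is skipped, the element fails on the closing quote, and the outer matcher is left looking at the space. With skipping disabled, the space is matched as an ordinary character and the closing quote follows directly.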

I'm actually not sure whether this qualifies as a bug or not.  Looking at things one way, it seems like it should be, but on the other hand if the behavior is changed then it might cause unexpected/undesirable effects in other cases.  I'll have to think on that a bit more to figure out whether there's actually a "right" answer to this sort of thing..

At least it looks like you've already figured out an appropriate workaround for it on your own, though..  Well done :)

FYI, for this particular case, there's actually already a grammar to match the standard quoted-string-with-backslash-escapes pattern provided in the (unfortunately, poorly documented) modgrammar.extras module:

modgrammar.extras.QuotedString

If nothing else, that one should be somewhat more efficient to use since it takes advantage of faster regular-expression matching internally instead of building things up from sub-grammars.
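
As a rough illustration of the regular-expression approach, here is a minimal sketch using only Python's standard re module. The pattern and the name QUOTED_STRING are assumptions made for this example; this is not necessarily the pattern that modgrammar.extras.QuotedString uses internally.

```python
import re

# A double-quoted string: any run of characters that are not a quote,
# backslash or newline, or a backslash followed by any single character.
# re.DOTALL lets "." match a newline after a backslash, like ANY does.
QUOTED_STRING = re.compile(r'"(?:[^"\\\n]|\\.)*"', re.DOTALL)

for candidate in ['" "', r'"a\"b"', '"abc', '"a"b"']:
    matched = QUOTED_STRING.fullmatch(candidate) is not None
    print(repr(candidate), matched)
```

Because the space is just another character inside the character class, no whitespace mode is involved and the single-space string matches directly.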

I really should document those extras a bit better, though..

--
You received this message because you are subscribed to the Google Groups "modgrammar" group.
To unsubscribe from this group and stop receiving emails from it, send an email to modgrammar+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/modgrammar/2c7ca787-8f8f-4a9e-ac23-dbcdf4636c8b%40googlegroups.com.

FatsieFS

Jan 21, 2020, 5:18:22 AM
to modgrammar

> I really should document those extras a bit better, though..

One of the main reasons I like modgrammar is that, in optional whitespace mode, you don't need to specify whitespace everywhere in your rules. But for the grammars I'm writing for existing languages, I needed to switch to explicit mode in some low-level rules. Now that I have found RE and REGrammar, I think I can replace these rules with regular expressions, so I fully agree with your statement :)

FatsieFS

Jan 21, 2020, 5:37:42 AM
to modgrammar

> modgrammar.extras.QuotedString


How do I use QuotedString? I get the following error when trying to include it in a grammar:

(nmigen) [verhaegs@localhost pdkmaster]$ ipython
Python 3.7.6 (default, Jan  8 2020, 19:59:22)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.11.1 -- An enhanced Interactive Python. Type '?' for help.

In [1]: from modgrammar import *

In [2]: from modgrammar.extras import *

In [3]: class String(Grammar):
   ...:     grammar = QuotedString
   ...:

In [4]: p = String.parser()

In [5]: print(p.parse_string('" "'))
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-5-240f7c39ba59> in <module>
----> 1 print(p.parse_string('" "'))

~/anaconda2/envs/nmigen/lib/python3.7/site-packages/modgrammar/__init__.py in parse_string(self, string, data)
    519     self.reset()
--> 520     for result in self._parse_text(string, True, True, data, 'complete'):
    521       # This will always just return the first result
    522       return result

~/anaconda2/envs/nmigen/lib/python3.7/site-packages/modgrammar/__init__.py in _parse_text(self, string, bol, eof, data, matchtype)
    453
    454     while True:
--> 455       count, obj = self._parse(pos, session, matchtype)
    456       if count is None:
    457         # Partial match

~/anaconda2/envs/nmigen/lib/python3.7/site-packages/modgrammar/__init__.py in _parse(self, pos, session, matchtype)
    367         if debugger:
    368           parsestate = debugger.debug_wrapper(parsestate, self.grammar, pos, self.text)
--> 369         count, obj = next(parsestate)
    370       else:
    371         count, obj = parsestate.send(self.text)

~/anaconda2/envs/nmigen/lib/python3.7/site-packages/modgrammar/__init__.py in grammar_parse(cls, text, index, session)
    702           s = debugger.debug_wrapper(s, g, pos, text)
    703         while True:
--> 704           offset, obj = next(s)
    705           while offset is None:
    706             text = yield (None, None)

~/anaconda2/envs/nmigen/lib/python3.7/site-packages/modgrammar/__init__.py in grammar_parse(cls, text, index, session)
    637     """
    638
--> 639     grammar = cls.grammar
    640     grammar_min = cls.grammar_min
    641     grammar_max = cls.grammar_max

AttributeError: type object 'QuotedString' has no attribute 'grammar'


Alex Stewart

Jan 21, 2020, 12:49:13 PM
to modgr...@googlegroups.com
Sorry, I should have been a bit more explicit. Grammar sub-elements need to be instances of grammar classes (not the classes themselves), so in order to use QuotedString you need to say something like:

grammar = QuotedString()

(this distinction is hidden a bit with a lot of the built-in grammars but is important whenever you're using a grammar class that's not one of the builtins)
