Parser Results in NoneType

Nekroze Embrace

Dec 30, 2012, 12:19:29 AM
to modgr...@googlegroups.com
I am new to modgrammar and would like to use it for my latest project, a transcompiler written in Python. However, I am trying to write a simple grammar that matches only what it knows about and skips over everything else (possibly saving it for later, but that is another issue for now).

Here is my current code:
from modgrammar import *
class Import(Grammar):
    """ Matches: import frock.panties|import fruitbowl """
    grammar = ("import", WORD("A-Za-z"), LIST_OF(WORD("A-Za-z"), sep='.', min=0))
    def elem_init(self, sessiondata):
        self.rpath = (self[1].string + self[2].string).replace('.', '/') + ".tc"
class ImportRetreiver(Grammar):
    grammar = (ZERO_OR_MORE(Import | REST_OF_LINE))
parser = ImportRetreiver.parser()
string = """import foobar"""
result = parser.parse_string(string)
importPaths = [imp.rpath for imp in result.find_all(Import)]
print(importPaths)

result always ends up as None. Why is this? As far as I can see, it should detect and return any line that is a valid import statement (pretty much the same as a basic Python import statement), and everything else should match REST_OF_LINE and be ignored.

I have tried removing the "| REST_OF_LINE" part from the ImportRetreiver grammar, and it still returned None even though all of the input matches the Import grammar.

Any assistance would be massively appreciated, especially in this holiday season, so thanks even for reading!

Alex Stewart

Jan 3, 2013, 3:54:20 PM
to modgr...@googlegroups.com
Hi there :)

You're not the first person to be tripped up by this.  The parser is designed to allow you to submit text to it repeatedly, in whatever chunks are convenient, so it doesn't assume that the end of the string is necessarily the end of the input (there might be more input coming in another call to parse_string).  Whenever you receive None back from a parse call, what it means is "I don't know whether I've matched the full thing yet, I need more input to be sure".

With this particular grammar, and the input you've given it, there are several things it can't be sure about:
  1. There could be more "."+word terms coming as part of your import line
  2. There could be more import lines coming
  3. There could be more text coming that would match the REST_OF_LINE

and so on.

If the end of the string is actually the end-of-input, then what you need to do is call parse_string with the "eof=True" parameter.  This tells it "there's no more input coming, so match whatever you can with what you've got".  (Any parser call with "eof=True" is guaranteed to either return a result or raise a ParseError.  It will never return None.)
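
To make the None-versus-eof behaviour concrete, here is a toy stand-in written with only the standard library. It is not modgrammar's implementation: ToyLineParser is a made-up class whose parse_string merely imitates the contract described above (None means "need more input"; with eof=True you get either a result or an exception).

```python
class ToyLineParser:
    """Hypothetical stand-in (not modgrammar): matches one line of text."""

    def __init__(self):
        self.buffer = ""

    def parse_string(self, text, eof=False):
        # Accumulate input across calls, like an incremental parser would.
        self.buffer += text
        if "\n" in self.buffer:
            # A newline proves the line is complete, so we can return it.
            line, self.buffer = self.buffer.split("\n", 1)
            return line
        if not eof:
            # The line might continue in a later call: ask for more input.
            return None
        if not self.buffer:
            # With eof=True we never return None: a result or an error.
            raise ValueError("nothing left to match")
        line, self.buffer = self.buffer, ""
        return line


parser = ToyLineParser()
first = parser.parse_string("import foo")      # no newline yet, not eof
second = parser.parse_string("bar", eof=True)  # end of input: commit
print(first, second)
```

In the real library, the equivalent of the second call is the parse_string(string, eof=True) call mentioned above.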

Now, that having been said, there is another problem with your grammar as it stands.  If you add "eof=True" to the parse_string call, you will probably notice that it never actually returns.  This is because you have ZERO_OR_MORE and inside it you have REST_OF_LINE.  If you're at the end of the file, REST_OF_LINE will still successfully match, it will just match the empty string.  In combination with a repetition like ZERO_OR_MORE, the grammar then ends up matching an infinite series of empty strings at the end of the input.

(This is actually a bug.  It's supposed to error out if you do this, but for some reason it's not doing it, and I'm going to look into that.  In any case, this construct is probably not going to do what you want it to do.)
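
The empty-match problem is easy to reproduce in a few lines of plain Python. This is only a sketch of the general idea, not how modgrammar implements repetition: rest_of_line, rest_of_line_then_eol and repeat are hypothetical helpers, and the cap argument exists purely so the runaway case terminates in the demo.

```python
def rest_of_line(text, pos):
    """Toy REST_OF_LINE: consume up to the next newline (possibly nothing)."""
    end = text.find("\n", pos)
    return len(text) if end == -1 else end


def rest_of_line_then_eol(text, pos):
    """Toy (REST_OF_LINE, EOL): fails (None) when no newline is left."""
    end = text.find("\n", pos)
    return None if end == -1 else end + 1


def repeat(matcher, text, pos, stop_on_empty, cap=50):
    """Toy ZERO_OR_MORE: count successful matches, giving up after `cap`."""
    count = 0
    while count < cap:
        new_pos = matcher(text, pos)
        if new_pos is None:
            break  # the sub-grammar failed, so the repetition ends
        if stop_on_empty and new_pos == pos:
            break  # guard: an empty match would repeat forever
        pos = new_pos
        count += 1
    return count


# Unguarded: one real match, then empty matches until the cap is hit.
unguarded = repeat(rest_of_line, "import foobar", 0, stop_on_empty=False)
# Guarded: stops as soon as a match consumes nothing.
guarded = repeat(rest_of_line, "import foobar", 0, stop_on_empty=True)
# Requiring a newline after each line makes every match consume input.
with_eol = repeat(rest_of_line_then_eol, "a\nb\n", 0, stop_on_empty=False)
print(unguarded, guarded, with_eol)
```

The third case is the interesting one: once each repetition has to consume at least a newline, the loop ends on its own without any special guard.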

It should probably be documented more clearly, but in general, REST_OF_LINE should always be followed by EOL (or some other grammar that will match an end-of-line).  Of course, EOL also counts as whitespace so in order to match it we need to make the grammar non-whitespace-consuming (or redefine whitespace to not include newlines) by changing the grammar_whitespace setting.  You probably want to do this anyway, because for this sort of syntax you probably don't want people to be able to use just any old whitespace (or no whitespace) anywhere they want.  For example, currently the following would all be valid matches for Import:

  import foo.bar
  import foo . bar
  importfoo.bar
  import foo\n.\nbar

(the last three probably should not be valid, I'm thinking)

So what you probably want to do is set grammar_whitespace = False, and then explicitly require space where you want it, i.e.:

grammar_whitespace = False

SPACE = WORD(' \t')

class ImportLine(Grammar):
    """ Matches: import frock.panties|import fruitbowl """
    grammar = ("import", SPACE, LIST_OF(WORD("A-Za-z"), sep='.'), OPTIONAL(SPACE), EOL | EOF)
    def elem_init(self, sessiondata):
        self.rpath = self[2].string.replace('.', '/') + ".tc"

class ImportRetreiver(Grammar):
    grammar = (ZERO_OR_MORE(ImportLine | (REST_OF_LINE, EOL)))

(Note that I also changed your "WORD, LIST_OF(WORD, min=0)" construct to just be a simple LIST_OF instead.  It's a simpler way to match the same thing.)
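
For comparison, the difference between whitespace-consuming and explicit-whitespace matching can be approximated with two regular expressions from the standard library. This is an analogy only: LOOSE and STRICT are names invented here, and the patterns are my rough translation of the two grammars, not something modgrammar generates.

```python
import re

# Rough analogue of the whitespace-consuming grammar: any whitespace
# (including none, or newlines) is tolerated between the terms.
LOOSE = re.compile(r"import\s*[A-Za-z]+(?:\s*\.\s*[A-Za-z]+)*\s*")

# Rough analogue of grammar_whitespace = False with an explicit SPACE:
# spaces or tabs are required after "import" and allowed nowhere else
# except at the end of the line.
STRICT = re.compile(r"import[ \t]+([A-Za-z]+(?:\.[A-Za-z]+)*)[ \t]*")

candidates = [
    "import foo.bar",
    "import foo . bar",
    "importfoo.bar",
    "import foo\n.\nbar",
]

loose_ok = [bool(LOOSE.fullmatch(c)) for c in candidates]
strict_ok = [bool(STRICT.fullmatch(c)) for c in candidates]

# Same path computation as elem_init in the grammar above.
rpath = STRICT.fullmatch(candidates[0]).group(1).replace(".", "/") + ".tc"

print(loose_ok)
print(strict_ok)
print(rpath)
```

The loose pattern accepts all four candidates, while the strict one accepts only the first.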

Note that there are also a few things which will be changing in the next couple of releases of Modgrammar which will probably affect your grammar:
  • The default meaning of SPACE will be changing (so you won't need to redefine it like I did above)
  • Grammars will default to non-whitespace-consuming mode (which again, is more convenient for you)
  • You will be able to specify a "whitespace required" mode for certain grammars, which means you won't have to put in SPACE between all your terms all over the place
  • parse_string will act more like you expected it to act originally (i.e. it will assume the end-of-string is the end-of-input).  If you want the old behavior you'll need to use parse_text instead.

Some of this is already in the 0.9 release which I'm just packaging up now to be released in the next day or two.  The other (non-backwards-compatible) changes will come with 0.10.  (When 0.9 is out you should probably start working against that and run your stuff with the '-Wd' flag to python to print deprecation warnings.  That will make sure you'll be compatible with 0.10 when it comes out.)
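
If passing '-Wd' on the command line is inconvenient, the same "default" warning action can be switched on from inside a script using the standard warnings module. A minimal sketch, where old_api is a hypothetical function standing in for any deprecated library call (it is not part of modgrammar):

```python
import warnings


def old_api():
    """Hypothetical deprecated function, used only for this demo."""
    warnings.warn("old_api is deprecated", DeprecationWarning, stacklevel=2)
    return 42


# record=True collects warnings in a list instead of printing them,
# which makes it easy to see what was emitted.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("default")  # the action that -Wd selects
    value = old_api()

messages = [
    str(w.message) for w in caught
    if issubclass(w.category, DeprecationWarning)
]
print(value, messages)
```

Outside of a demo you would normally drop the catch_warnings block and just call warnings.simplefilter("default") once at startup, so the warnings are printed to stderr.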

I hope this helps.

--Alex