Problem with WORD


aditya...@gmail.com

Mar 16, 2014, 9:07:03 AM
to modgr...@googlegroups.com
Hi all,
I am trying to use modgrammar to parse a custom grammar.
When I tried this:

```
from modgrammar import *

class Alphabet(Grammar):
    grammar = (WORD("a-z"))

def main():
    alparser = Alphabet.parser()
    print(alparser.parse_string("a"))


if __name__ == "__main__":
    main()
```

I get None as output.
Can anyone please help?

Thanks,
Aditya

Aditya Shah

Mar 16, 2014, 10:46:07 AM
to modgr...@googlegroups.com
Hi all,

I am actually trying to write the parser for the standard grammar

E -> E + T | T
T -> T * F | F
F -> (E) | Alphabet | Number

My code is:

from modgrammar import *

class Alphabet(Grammar):
    grammar = (L("x") | L("y") | L("z"))

class Number(Grammar):
    grammar = (L("0") | L("1") | L("2") | L("3") | L("4") | L("5") | L("6") | L("7") | L("8") | L("9"))

class F(Grammar):
    grammar = (("(", REF("E"), ")") | Alphabet | Number)

class T(Grammar):
    grammar = ((REF("T"), "*", F) | F)

class E(Grammar):
    grammar = ((REF("E"), "+", T) | T)

def main():
    E.grammar_resolve_refs()
    sparser = E.parser()
    res = sparser.parse_string("1")
    print(res)


if __name__ == "__main__":
    main()

The output that I get is:

RuntimeError: maximum recursion depth exceeded

Can anyone tell me if I am doing this wrong?

Thanks,
Aditya

Aditya Shah

Mar 21, 2014, 12:19:33 AM
to modgr...@googlegroups.com
Hi all,
I am using this for a project. Can anyone please help?

Thanks,
Aditya

Alex Stewart

Mar 21, 2014, 3:40:59 PM
to modgr...@googlegroups.com
Hey Aditya,

Sorry for the delay in responding.. life's been a bit up in the air for me the past couple of weeks..

Regarding your first question (WORD grammar returning None), what version of Modgrammar are you using?  I cannot reproduce the behavior you mention using the current version of Modgrammar (0.10).

(Note that if you are using an old version (pre-0.10), parse_string behaved slightly differently, and you needed to tell it explicitly that there's no more input coming (by using the eof=True argument).  Otherwise, it couldn't be sure that you weren't intending to call it again with more input (which might be more of the WORD).  In this case it would return None to indicate that it needed more input to be sure whether it had matched the end of the grammar or not.  In 0.10 the default behavior for parse_string was changed to assume eof=True, because this was the more common/expected case for most people.  If you are not using 0.10 (which it sounds like you're not), I would highly recommend you upgrade Mhttps://pythonhosted.org/modgrammar/tutorial.html#left-recursionodgrammar, as there are several improvements which make things easier to use in various ways.)

Regarding your second question ("maximum recursion depth exceeded"), what you have there is a classic case of left-recursion (https://pythonhosted.org/modgrammar/tutorial.html#left-recursion).  Unfortunately, there's just no good way to handle that sort of construct properly in a top-down, recursive-descent parser (which Modgrammar is).
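
To make the failure mode concrete, here is a minimal sketch of a hand-written top-down recognizer for the rule `E -> E "+" T | T`. It is a toy written for this illustration, not Modgrammar's implementation, and the function names are made up. Because the first alternative starts with `E` itself, the function calls itself before consuming any input, so the recursion never bottoms out. (Python 3 raises `RecursionError` here; on Python 2 the same situation surfaces as the `RuntimeError` quoted above.)

```python
# Toy top-down recognizer, NOT Modgrammar's implementation.
# All function names here are hypothetical.

def parse_T(text, pos):
    """T -> digit. Return the position after the digit, or None on failure."""
    if pos < len(text) and text[pos].isdigit():
        return pos + 1
    return None


def parse_E_left_recursive(text, pos):
    """E -> E "+" T | T, transcribed literally and tried in that order."""
    # The first alternative begins with E, so we recurse at the same
    # position without having consumed a single character.
    after_e = parse_E_left_recursive(text, pos)
    if after_e is not None and text[after_e:after_e + 1] == "+":
        after_t = parse_T(text, after_e + 1)
        if after_t is not None:
            return after_t
    return parse_T(text, pos)


try:
    parse_E_left_recursive("1", 0)
    outcome = "parsed"
except RecursionError:
    outcome = "RecursionError"

print(outcome)  # RecursionError
```

The input never matters: the second alternative (`T` alone) is never reached, which is why even the one-character string "1" blows the stack.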

The good news is that in most cases, there are other ways to define the grammar which have the same end result, but don't rely on left-recursion to do the job.  I have to admit, I'm having a little trouble wrapping my brain around the grammar you're wanting to use, so I'm not exactly sure, but it looks like you may be wanting to do something akin to infix math expressions, in which case you may want to take a look at the example of that in the Modgrammar docs for some ideas:


If that doesn't help, I might be able to provide some more suggestions if you can give me a better example of the sort of inputs you want to parse and the sort of output (parse trees) you're hoping to get out of them..

--Alex




Alex Stewart

Mar 21, 2014, 3:46:32 PM
to modgr...@googlegroups.com
(Grrr..  Please ignore that first URL-like bit of text in my previous message.. it should have read "I would highly recommend you upgrade Modgrammar, as there are...".  Apparently Gmail's editor decided to screw up my message before I sent it by sticking extra bits of text in where they didn't belong..)

Aditya Shah

Mar 21, 2014, 11:05:04 PM
to modgr...@googlegroups.com
@Alex Thanks so much for the reply. I would like to clarify a few things. Since I am working on a Python 2.x library (SymPy), I am using a Python 2 backport of Modgrammar which I found at http://rembish.github.io/modgrammar-py2/. I think it is based on version 0.7 or thereabouts, but I am not quite sure.
I used eof=True and that solved the problem, thanks!
I also applied the standard left-recursion elimination algorithm, and voila! The parser now works.
The modified grammar for the same problem is:

E -> TE1
E1 -> +TE1 | e
T -> FT1
T1 -> *FT1 | e
F -> (E) | alphabet | number

The code is 
```

from modgrammar import *

class Alphabet(Grammar):
    grammar = (WORD("a-z"))

class Number(Grammar):
    grammar = (WORD("0-9"))

class F(Grammar):
    grammar = (("(", REF("E"), ")")| Alphabet | Number)

class T1(Grammar):
    grammar = (OPTIONAL("*", F, REF("T1")))

class T(Grammar):
    grammar = (F, T1)

class E1(Grammar):
    grammar = (OPTIONAL("+", T, REF("E1")))

class E(Grammar):
    grammar = (T, E1)

def main():
    E.grammar_resolve_refs()
    sparser = E.parser()
    res = sparser.parse_string("(x*2)+3", eof=True)
    print res


if __name__ == "__main__":
    main()

```
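
For comparison, the same transformed grammar can be sketched as a hand-written recognizer with no library at all. This is only an illustration of why the rewritten rules terminate (every recursive call happens after at least one character has been consumed); it is not how Modgrammar works internally, the function names are made up, and unlike Modgrammar it does no whitespace handling.

```python
import string

# Illustrative recognizer for the right-recursive grammar:
#   E  -> T E1        E1 -> "+" T E1 | empty
#   T  -> F T1        T1 -> "*" F T1 | empty
#   F  -> "(" E ")" | alphabet | number
# Each function returns the position after its match, or None on failure.


def take_while(s, i, allowed):
    """Consume one or more characters from `allowed`, starting at i."""
    j = i
    while j < len(s) and s[j] in allowed:
        j += 1
    return j if j > i else None


def parse_F(s, i):
    if s[i:i + 1] == "(":
        j = parse_E(s, i + 1)
        if j is not None and s[j:j + 1] == ")":
            return j + 1
        return None
    j = take_while(s, i, string.ascii_lowercase)
    if j is None:
        j = take_while(s, i, string.digits)
    return j


def parse_T1(s, i):
    if s[i:i + 1] == "*":
        j = parse_F(s, i + 1)
        if j is not None:
            return parse_T1(s, j)
    return i  # the empty alternative: consume nothing


def parse_T(s, i):
    j = parse_F(s, i)
    return None if j is None else parse_T1(s, j)


def parse_E1(s, i):
    if s[i:i + 1] == "+":
        j = parse_T(s, i + 1)
        if j is not None:
            return parse_E1(s, j)
    return i  # the empty alternative: consume nothing


def parse_E(s, i):
    j = parse_T(s, i)
    return None if j is None else parse_E1(s, j)


def accepts(s):
    """True if the whole string is an E."""
    return parse_E(s, 0) == len(s)


print(accepts("(x*2)+3"))  # True
print(accepts("1+"))       # False
```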

Thanks,
Aditya

Aditya Shah

Mar 21, 2014, 11:35:19 PM
to modgr...@googlegroups.com
@Alex I have one more question in mind.

As in the code above, when I parse '(x*2)+3' the parser recognizes the string properly, but when I print the elements of the result I only get the top-level pieces:

```
(T<'(x*2)', ''>, E1<'+3'>)
```

I require the output to be something of this sort

```
('(', Alphabet<x>, "*", Number<2>, ')', "+", Number<3>)
```

Can you suggest any way to achieve this?

Thanks,
Aditya

Alex Stewart

Mar 22, 2014, 2:31:39 PM
to modgr...@googlegroups.com
The parse result returned by Modgrammar is a tree structure (each element contains its direct sub-elements, which contain their sub-elements, etc).  However, what it sounds like you're after is just a list of the terminal symbols.  For this, you probably want to take a look at the .terminals() or .tokens() methods (http://pythonhosted.org//modgrammar/libref.html#modgrammar.Grammar.terminals)..

If you want something more sophisticated than that, you might want to look into the "tags" mechanism, which allows you to identify specific element types with a particular tag, and then get a list of all of the result objects with that tag using the .find_all() method..
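
As a rough picture of the two ideas above (flattening a tree to its terminals, and picking out tagged nodes), here is a small self-contained sketch. The `Node` class and the `leaves` and `with_tag` helpers are hypothetical stand-ins invented for this example; they are not Modgrammar classes or methods, and the tree is built by hand rather than produced by a parser.

```python
class Node:
    """Toy parse-tree node (a stand-in, not a Modgrammar result object)."""

    def __init__(self, name, children=(), text="", tags=()):
        self.name = name
        self.children = list(children)
        self.text = text
        self.tags = set(tags)


def leaves(node):
    """Return the leaf nodes of the tree, left to right."""
    if not node.children:
        return [node]
    found = []
    for child in node.children:
        found.extend(leaves(child))
    return found


def with_tag(node, tag):
    """Return every node in the tree carrying `tag`, in depth-first order."""
    found = [node] if tag in node.tags else []
    for child in node.children:
        found.extend(with_tag(child, tag))
    return found


# Hand-built tree loosely shaped like a parse of "(x*2)+3".
tree = Node("E", [
    Node("T", [
        Node("L", text="("),
        Node("Alphabet", text="x", tags={"operand"}),
        Node("L", text="*"),
        Node("Number", text="2", tags={"operand"}),
        Node("L", text=")"),
    ]),
    Node("E1", [
        Node("L", text="+"),
        Node("Number", text="3", tags={"operand"}),
    ]),
])

leaf_texts = [n.text for n in leaves(tree)]
operand_texts = [n.text for n in with_tag(tree, "operand")]

print(leaf_texts)     # ['(', 'x', '*', '2', ')', '+', '3']
print(operand_texts)  # ['x', '2', '3']
```

Printing only the top-level children of `tree` would show just the `T` and `E1` nodes, which is the situation in the question; walking down to the leaves gives the flat sequence that was asked for.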

--Alex



Aditya Shah

Mar 22, 2014, 2:51:21 PM
to modgr...@googlegroups.com
@Alex This definitely helped, thanks a lot.

Regards,
Aditya

happy puppy

May 6, 2015, 10:40:09 AM
to modgr...@googlegroups.com
Hi Alex,

I have a similar problem, but I have modgrammar 0.10 installed and am using Python 3.4.3.

When I run this code I get None printed:


import modgrammar as mg

grammar_whitespace_mode = 'optional'


class Expression(mg.Grammar):
    grammar = (mg.WORD('h'),)


parser = Expression.parser()
result = parser.parse_text('hhhhh')

print(result)

When I debug into code I can see 0.10 files like C:\Python34\Lib\site-packages\modgrammar-0.10-py3.4.egg!\modgrammar\__init__.py

But if I parse_text('hhhhh<SPACE>') i.e. parse_text('hhhhh ') then I get a match. Am I doing something wrong?

happy puppy

Alex Stewart

May 12, 2015, 3:27:10 PM
to modgr...@googlegroups.com
Hey there.. sorry, I was actually away last week and not checking my email regularly, so I didn't see this earlier..

Getting None back from the parser pretty much always means "I need more text".  This may mean that it hasn't actually fully matched anything yet, or it may mean that it does have what could be a match, but it doesn't know whether there's still more coming that might also be part of the match, so it isn't sure whether it's actually at the end yet or not.
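
The "I need more text" behavior can be modeled with a small toy matcher. This is only a sketch of the idea under simplifying assumptions (a single rule, "one or more of the given characters"); the `IncrementalWord` class and its `feed` method are invented for this example and are not part of Modgrammar.

```python
class IncrementalWord:
    """Toy matcher for 'one or more of `chars`', fed text in chunks.

    Illustrative only; this is not how Modgrammar is implemented.
    """

    def __init__(self, chars):
        self.chars = set(chars)
        self.buffer = ""

    def feed(self, text, eof=False):
        self.buffer += text
        # Length of the leading run of allowed characters.
        n = 0
        while n < len(self.buffer) and self.buffer[n] in self.chars:
            n += 1
        # The word is known to have ended only if something else follows
        # it in the buffer, or if the caller says no more input is coming.
        ended = n < len(self.buffer) or eof
        if not ended:
            return None  # the word might continue in the next chunk
        if n == 0:
            raise ValueError("no match")
        word, self.buffer = self.buffer[:n], self.buffer[n:]
        return word


first = IncrementalWord("h").feed("hhhhh")             # could still grow
second = IncrementalWord("h").feed("hhhhh ")           # the space ends it
third = IncrementalWord("h").feed("hhhhh", eof=True)   # told it is the end

print(first, second, third)  # None hhhhh hhhhh
```

This mirrors the two observations in the question: without a trailing space (or an end-of-input signal) the matcher cannot tell whether more `h` characters are on the way, so it reports nothing yet.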

parse_text is intended to be called multiple times with chunks of text to parse, so Modgrammar does not make any assumptions about the chunks you provide or whether it's at the end or not.  If you know you are at the end of the input text, you need to tell Modgrammar this by passing eof=True to parse_text:

result = parser.parse_text('hhhhh', eof=True)

Alternately, if your entire text to be parsed is contained in a single string (you don't need to feed it a bit at a time to the parser), you probably want to use parse_string instead of parse_text:

result = parser.parse_string('hhhhh')

(parse_string is basically equivalent to calling parse_text with the reset=True, eof=True, and matchtype='complete' arguments set)

Hope this helps..

--Alex
