OR operator between multiple grammars

raghavan

Apr 11, 2014, 3:13:30 PM
to modgr...@googlegroups.com
As I'm still wrangling with syntax, I have another problem:

I'm trying to combine two subgrammars like this:

grammar = (FirstName, OPTIONAL(MiddleName), OPTIONAL(LastName)) |  (LastName, LITERAL(","), FirstName)

I get an error: TypeError: unsupported operand type(s) for |: 'tuple' and 'tuple'


However, each of them works on its own:

grammar1 = (FirstName, OPTIONAL(MiddleName), OPTIONAL(LastName))

grammar2 = (LastName, LITERAL(","), FirstName)


How would one combine multiple sub-grammars? Also, if I want to implement a probabilistic context-free grammar (PCFG), where each of these sub-grammars has a weight (from 0 to 1), how could I specify that?


Thanks!

Alex Stewart

Apr 11, 2014, 3:55:33 PM
to modgr...@googlegroups.com
That is actually a limitation in Python.  Expressing grammars as:

(GrammarOne, GrammarTwo, GrammarThree)

in most cases is a shortcut that the Modgrammar code recognizes and automatically treats the same as:

GRAMMAR(GrammarOne, GrammarTwo, GrammarThree)

However, the Python interpreter itself doesn't know that the two are necessarily the same.  As far as it knows, the first example is a tuple, not a Grammar.  The "or" operator ("|") is defined for the Grammar type, but not for tuples.

Because of this, and because of Python's operator lookup rules (if the left operand doesn't support "|", Python asks the right operand instead), the expression works as long as at least one of the two terms is obviously a Grammar. For example, these all work:

GrammarOne | GrammarTwo  # Grammar OR Grammar --> Grammar
(GrammarOne, GrammarTwo) | GrammarThree  # tuple OR Grammar (automatically convert the tuple to a Grammar) --> Grammar
GrammarOne | (GrammarTwo, GrammarThree)  # Grammar OR tuple (automatically convert the tuple to a Grammar) --> Grammar

But this won't work:

(GrammarOne, GrammarTwo) | (GrammarThree, GrammarFour)  # tuple OR tuple --> Python doesn't know how to do this.
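To make the lookup rule concrete, here is a minimal sketch in plain Python. The Seq and Alt classes and the as_seq helper are hypothetical stand-ins invented for illustration; they are not Modgrammar's actual code. They only show how __or__ and __ror__ let one "real" operand rescue a tuple, and why two bare tuples fail:

```python
class Alt:
    """Hypothetical result of an OR: holds the two alternatives."""
    def __init__(self, left, right):
        self.left, self.right = left, right


class Seq:
    """Hypothetical stand-in for a grammar; not Modgrammar's real class."""
    def __init__(self, *parts):
        self.parts = parts

    def __or__(self, other):
        # Called for: Seq | something
        return Alt(self, as_seq(other))

    def __ror__(self, other):
        # Called for: something | Seq, when 'something' has no "|" of its own
        return Alt(as_seq(other), self)


def as_seq(value):
    """Wrap a plain tuple in a Seq; leave an existing Seq untouched."""
    return value if isinstance(value, Seq) else Seq(*value)


a = Seq("A") | ("B", "C")   # Seq.__or__ converts the tuple
b = ("A", "B") | Seq("C")   # tuple has no "|", so Python tries Seq.__ror__

try:
    ("A", "B") | ("C", "D")  # neither side knows what "|" means
    tuple_or_failed = False
except TypeError:
    tuple_or_failed = True
```

In the last case neither operand is a Seq, so the custom methods are never consulted and Python raises the same TypeError as in the original question.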

The way to fix this is to make one (or preferably both) explicitly a Grammar instead of a tuple, like so:

G(GrammarOne, GrammarTwo) | G(GrammarThree, GrammarFour)  # This works

("G(...)" is a shortcut for saying "GRAMMAR(...)", which would also work)

Another way to do the same thing would be to use the OR(...) function instead of the operator, like so:

OR((GrammarOne, GrammarTwo), (GrammarThree, GrammarFour))

(Personally, though, I prefer using the G(...) syntax, as I think it's a bit easier to read and understand.)

Incidentally, you will also sometimes run into the same problem with literals, for basically the same reason:

GrammarOne | "foo"  # This works
"foo" | GrammarOne  # This works
"foo" | "bar"  # string OR string --> Python doesn't know how to do this

To solve that issue, you can wrap your literal strings in L(...) to force Python to realize they're actually grammars, not string objects (or just use OR(...) again):

L("foo") | L("bar")  # This works
OR("foo", "bar")  # This works too

Hope this helps.

As for PCFGs, I'm afraid I've never really gone very deeply down that rabbit hole, so I'm not really sure what would be required.  Modgrammar was not really built with that in mind, but it might be possible with some additional post-processing.

I think you might be able to do something akin to PCFGs by calling parse_text/parse_string with the matchtype='all' argument (which will return all possible matches, not just the first one found), and then taking the resulting set of parse trees and calculating all their weights to pick the best one.  (Note that this might not be the most efficient way to do that sort of matching, though.  It would probably depend on the complexity of your grammars and how many possible matches they could return.)
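The "calculate the weights and pick the best one" step could look roughly like the sketch below. It deliberately leaves Modgrammar out: it assumes you have already reduced each candidate parse to a list of the sub-grammar names it used, which is a step not shown here. The rule names, the weights, and the score and best_parse helpers are all made up for illustration:

```python
import math

# Hypothetical weights for the two name layouts from the question.
# Weights must be greater than 0, since log(0) is undefined.
RULE_WEIGHTS = {
    "first_middle_last": 0.7,
    "last_comma_first": 0.3,
}


def score(rules_used, weights):
    """Log-probability of one parse: the sum of the log-weights of its rules."""
    return sum(math.log(weights[rule]) for rule in rules_used)


def best_parse(candidates, weights):
    """Return the candidate whose rules have the highest combined weight."""
    return max(candidates, key=lambda rules: score(rules, weights))


# Each candidate is the list of rule names one parse tree used
# (assumed to have been extracted from the parser's results beforehand).
candidates = [["first_middle_last"], ["last_comma_first"]]
best = best_parse(candidates, RULE_WEIGHTS)
```

Summing logarithms instead of multiplying the raw weights avoids underflow when a parse uses many rules, and it ranks the candidates in the same order.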

I will say, if you do end up using Modgrammar for PCFGs, I'd be very interested in hearing the results, as I don't think that's something anybody else has done yet.

--Alex

