Aha, I think you've stumbled upon a bug, so I'm cc'ing the dev list too.
Apparently the default logic parser (nltk.sem.logic.LogicParser) requires that the argument is a str object, not unicode:
$ grep -n 'str)' logic.py
87: assert isinstance(name, str), "%s is not a string" % name
271: assert isinstance(type_string, str)
1748: assert isinstance(expr, str), "%s is not a string" % expr
1759: assert isinstance(expr, str), "%s is not a string" % expr
1770: assert isinstance(expr, str), "%s is not a string" % expr
Since you called .decode("UTF-8"), you get a unicode string, which fails these assertions. Unfortunately there's not much you can do until the bug is fixed, except to skip the conversion and pass the undecoded str.
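For what it's worth, here is a minimal sketch of how such a check could be relaxed to accept both string types. This is only an illustration, not an actual NLTK patch: the names string_types and check_is_string are made up for the example, and the real fix may well look different.

```python
# Hypothetical sketch: accept both str and unicode where the
# current assertions only accept str. Not actual NLTK code.
try:
    string_types = (str, unicode)  # Python 2: two distinct string types
except NameError:
    string_types = (str,)          # Python 3: str is already unicode

def check_is_string(expr):
    """Mimic the assertion in logic.py, but allow unicode input too."""
    assert isinstance(expr, string_types), "%s is not a string" % expr
    return expr

# A plain string passes, and so does a unicode string such as the
# one returned by .decode("UTF-8").
check_is_string("first")
check_is_string(u"first")
```

With a check along these lines, the value returned by .decode("UTF-8") would pass, while non-string arguments would still be rejected.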
best,
Peter
On 31 May 2012 at 02:46, Mat Bettinson wrote:
> Here's a strange one.
>
> cfg = nltk.data.load('file:zhongwen_zhi.fcfg','fcfg',verbose=True,cache=False)
>
> This works okay but the following equivalent:
>
> grammarrules = open("zhongwen_zhi.fcfg").read().decode("UTF-8")
> cfg = nltk.parse_fcfg(grammarrules)
>
> ... produces "AssertionError: first is not a string"
>
> on this grammar line:
>
> N[SEM=<first>] -> 'xian'
>
> Which makes me think calling parse_fcfg(string) is not equivalent to nltk.data.load('file:blah','fcfg').
>
> It should be, shouldn't it?
>
> --
> Regards,
>
> Mat Bettinson