Hi,
I have the following code:
import os, sys; sys.path.insert(0, os.path.join("..", ".."))
from pattern.web import Wikipedia
from pattern.web import Google, plaintext
from pattern.web import SEARCH, URL
from pattern.en import parse, tag
engine = Wikipedia(language="en")
article = engine.search("alice in wonderland", cached=True, timeout=30)
for s in article.sections:
    print s.title.upper()
    para = s.content
    for word, tag in tag(para, tokenize=True, encoding='unicode'):
        if tag == "NN":
            print word, tag
But when I run this, I get the following error:
TypeError: 'unicode' object is not callable
The output from my ipython is:
In [60]: word
Out[60]: u'.'
In [61]: tag
Out[61]: u'.'
The stdout is:
ALICE'S ADVENTURES IN WONDERLAND
novel NN
girl NN
rabbit NN
hole NN
fantasy NN
world NN
tale NN
logic NN
story NN
popularity NN
nonsense NN
genre NN
narrative NN
course NN
structure NN
imagery NN
culture NN
literature NN
fantasy NN
genre NN
BACKGROUND
Changing the loop to just "for word, tag in tag(para)" doesn't help either. Do you know what setting I have to use? I am using OSX 10.7.5, ipython 0.12.1 and Xcode 2.6.5.
I also regularly get an error that it can't parse u'\u2014', which I believe is an em dash. Is this a common error? The examples provided in the source code don't work for many Wikipedia pages due to this encoding issue. Thanks.
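On the u'\u2014' problem: the exact traceback is not shown, so the following is only a guess. It assumes the failure is a UnicodeEncodeError raised when text containing an em dash is converted to ASCII, for example when printing to a terminal whose encoding is ASCII. The sample string is made up for illustration and does not come from the Wikipedia article.

```python
# Sample text containing an em dash (U+2014); invented for this sketch.
text = u"Alice \u2014 a novel"

# Encoding to ASCII fails because U+2014 has no ASCII representation.
try:
    text.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# Encoding to UTF-8 keeps the dash as a three-byte sequence.
utf8_bytes = text.encode("utf-8")

# If ASCII output is required, unsupported characters can be replaced.
ascii_bytes = text.encode("ascii", "replace")
```

If this is indeed the cause, explicitly encoding the string before printing it (for example with .encode("utf-8")) may avoid the error, although that depends on what the terminal can display.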
Regards,
Brendan
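A possible explanation for the TypeError, judging from the output in the post: the inner loop uses "tag" as a loop variable, so after the first section the name "tag" refers to the last part-of-speech string (u'.' in the IPython session) instead of the tagging function. When the outer loop reaches the second section ("BACKGROUND"), calling tag(...) then tries to call a string. The sketch below reproduces this without pattern installed. fake_tag is a hypothetical stand-in for pattern.en.tag, and the sample sentences are invented.

```python
# Hypothetical stand-in for pattern.en.tag, for illustration only:
# returns (word, part-of-speech) pairs for a whitespace-split string.
def fake_tag(text):
    nouns = ("novel", "girl", "rabbit", "hole")
    return [(w, "NN" if w in nouns else "XX") for w in text.split()]

sections = ["a novel about a girl", "a rabbit hole"]

def run_buggy(sections):
    tag = fake_tag  # at first, 'tag' names the function
    found = []
    try:
        for text in sections:
            # The loop variable 'tag' rebinds the name to a string,
            # so the call on the next section fails.
            for word, tag in tag(text):
                if tag == "NN":
                    found.append(word)
    except TypeError as err:
        return found, str(err)
    return found, None

def run_fixed(sections):
    found = []
    for text in sections:
        # A different loop variable name leaves the function untouched.
        for word, pos in fake_tag(text):
            if pos == "NN":
                found.append(word)
    return found

buggy_nouns, buggy_error = run_buggy(sections)
fixed_nouns = run_fixed(sections)
```

In the original code, renaming the loop variable (for example "for word, pos in tag(para, ...)") and comparing pos == "NN" should be enough. In an IPython session where the name has already been overwritten, the function would need to be imported again first.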