POS tagging issues with NLTK


Toddy Mladenov

Mar 6, 2016, 3:08:02 PM
to nltk-...@googlegroups.com
Hello,

I just installed the latest NLTK and am trying to POS-tag a simple sentence, but I get the following error:

Python 3.4.3 (v3.4.3:9b73f1c3e601, Feb 24 2015, 22:43:06) [MSC v.1600 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> custtext = "Official account for Microsoft Customer Service & Support. We’re available Monday- Friday from 4a-10p PST and on Saturday and Sunday from 4a-4p PST."
>>> import nltk
>>> sent_tokens = nltk.word_tokenize(custtext)
>>> sent_tokens
['Official', 'account', 'for', 'Microsoft', 'Customer', 'Service', '&', 'Support', '.', 'We', "'re", 'available', 'Monday-', 'Friday', 'from', '4a-10p', 'PST', 'and', 'on', 'Saturday', 'and', 'Sunday', 'from', '4a-4p', 'PST', '.']
>>> nltk.pos_tag(sent_tokens)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\PYTHON34\lib\site-packages\nltk-3.2-py3.4-win32.egg\nltk\tag\__init__.py", line 110, in pos_tag
    tagger = PerceptronTagger()
  File "C:\PYTHON34\lib\site-packages\nltk-3.2-py3.4-win32.egg\nltk\tag\perceptron.py", line 141, in __init__
    self.load(AP_MODEL_LOC)
  File "C:\PYTHON34\lib\site-packages\nltk-3.2-py3.4-win32.egg\nltk\tag\perceptron.py", line 209, in load
    self.model.weights, self.tagdict, self.classes = load(loc)
  File "C:\PYTHON34\lib\site-packages\nltk-3.2-py3.4-win32.egg\nltk\data.py", line 801, in load
    opened_resource = _open(resource_url)
  File "C:\PYTHON34\lib\site-packages\nltk-3.2-py3.4-win32.egg\nltk\data.py", line 924, in _open
    return urlopen(resource_url)
  File "C:\PYTHON34\lib\urllib\request.py", line 161, in urlopen
    return opener.open(url, data, timeout)
  File "C:\PYTHON34\lib\urllib\request.py", line 463, in open
    response = self._open(req, data)
  File "C:\PYTHON34\lib\urllib\request.py", line 486, in _open
    'unknown_open', req)
  File "C:\PYTHON34\lib\urllib\request.py", line 441, in _call_chain
    result = func(*args)
  File "C:\PYTHON34\lib\urllib\request.py", line 1252, in unknown_open
    raise URLError('unknown url type: %s' % type)
urllib.error.URLError: <urlopen error unknown url type: c>

Any idea what the issue is?

Thanks!
Toddy

avitalp

Mar 6, 2016, 5:05:36 PM
to nltk-users
Hi Toddy,

I believe it's looking for the pickled averaged perceptron tagger model file. Run nltk.download() first and either use the GUI that pops up to download and install it or, if you're in CLI mode, select "Download" and type "averaged_perceptron_tagger" when prompted for the id. Once that completes, I believe your code should run fine.

Hope this helps,
--Avi
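
For a non-interactive version of the step Avi describes, something like the following should work. It is only a sketch: `ensure_tagger_model` is a hypothetical helper name, and the resource id is the one mentioned in this thread for NLTK 3.2 (other NLTK versions may use a different id). It checks for the model with `nltk.data.find()` and calls `nltk.download()` only if the lookup fails.

```python
def ensure_tagger_model(resource_id="averaged_perceptron_tagger"):
    """Return True if the tagger model is present or was downloaded.

    Hypothetical helper; the default id is the one named in this thread.
    """
    import nltk  # imported lazily so the helper can be defined without NLTK

    try:
        # Raises LookupError when the resource is not in any nltk_data directory.
        nltk.data.find("taggers/" + resource_id)
        return True
    except LookupError:
        # Downloads into the default nltk_data directory; needs network access.
        return nltk.download(resource_id)
```

Calling `ensure_tagger_model()` once before `nltk.pos_tag(...)` covers the missing-model case only. It does not address a model that is present but fails to load.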

Toddy Mladenov

Mar 7, 2016, 12:53:22 AM
to nltk-...@googlegroups.com
Hi Avi,

I already downloaded the complete set of collections using the GUI, and averaged_perceptron_tagger is already available; however, I still get the error.

Best,
Toddy

david.ha...@cnu.edu

Mar 9, 2016, 4:09:07 PM
to nltk-users
I was having the same issue, and downgrading to the previous version seems to have fixed it.

Toddy Mladenov

Mar 10, 2016, 5:48:02 PM
to nltk-...@googlegroups.com
Hi David,

I assume you mean downgrading to the previous version of NLTK. How did you do that? The only version available in pip is 3.2, which is the one that fails.

Best,
Toddy

Tim Santos

Mar 11, 2016, 4:43:58 AM
to nltk-users
Seems like downgrading to 3.1 fixed this for me too.

Alexis

Mar 12, 2016, 4:04:08 PM
to nltk-...@googlegroups.com
To downgrade to an earlier version, find and delete the current nltk package in the site-packages directory, then ask pip for version 3.1 like this:

    pip install nltk==3.1

I came across this information here: http://stackoverflow.com/q/35827859/699305


A pity that this is necessary... Let's hope version 3.2.1 will be out soon with a bugfix!

Alexis

alvations

Mar 13, 2016, 11:29:27 AM
to nltk-users
Possibly we need more tests on Windows in the future.

Meanwhile, here's a temporary hack that avoids downgrading:

>>> from nltk.tag import PerceptronTagger
>>> from nltk.data import find
>>> PICKLE = "averaged_perceptron_tagger.pickle"
>>> AP_MODEL_LOC = 'file:'+str(find('taggers/averaged_perceptron_tagger/'+PICKLE))
>>> tagger = PerceptronTagger(load=False)
>>> tagger.load(AP_MODEL_LOC)
>>> pos_tag = tagger.tag
>>> pos_tag('The quick brown fox jumps over the lazy dog'.split())
[('The', 'DT'), ('quick', 'JJ'), ('brown', 'NN'), ('fox', 'NN'), ('jumps', 'VBZ'), ('over', 'IN'), ('the', 'DT'), ('lazy', 'JJ'), ('dog', 'NN')]
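
The 'file:' prefix in the hack above appears to be what makes the difference. The original traceback ends with "unknown url type: c", which is consistent with a bare Windows path being handed to urlopen and the drive letter being read as a URL scheme. The sketch below illustrates this with the standard library only; `model_path` is a hypothetical location standing in for whatever `find()` returns on a given machine, and the `pathlib` line is an alternative that is not taken from the thread.

```python
from pathlib import PureWindowsPath
from urllib.parse import urlparse

# Hypothetical model location, standing in for the path returned by find().
model_path = (
    r"C:\nltk_data\taggers\averaged_perceptron_tagger"
    r"\averaged_perceptron_tagger.pickle"
)

# A bare Windows path parses as a URL whose scheme is the drive letter.
bare_scheme = urlparse(model_path).scheme

# Prefixing "file:", as the hack does, yields a scheme urllib can open.
prefixed_scheme = urlparse("file:" + model_path).scheme

# Alternative (not from the thread): let pathlib build a well-formed file URL.
file_url = PureWindowsPath(model_path).as_uri()

print(bare_scheme)      # c
print(prefixed_scheme)  # file
print(file_url)
```

On non-Windows systems paths start with "/", so no scheme is detected. That would explain why the problem was reported only on Windows.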