Status: New
Owner: ----
Labels: Type-Defect Priority-Medium
New issue 748 by samboosa...@gmail.com: nltk.tag.pos_tag() fails with UnicodeDecodeError
http://code.google.com/p/nltk/issues/detail?id=748
nltk.__version__
'3.0a2'
# # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #
To reproduce:
import nltk
text = 'the dog eats'
tokens = nltk.tokenize.word_tokenize(text)
tags = nltk.tag.pos_tag(tokens)
Expected output:
[('the', 'DT'), ('dog', 'NN'), ('eats', 'NNS')]
Actual output:
/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/site-packages/nltk-3.0a2-py3.3.egg/nltk/data.py in load(resource_url, format, cache, verbose, logic_parser, fstruct_parser, encoding)
640 elif format == 'pickle':
--> 641 resource_val = pickle.load(opened_resource)
642 elif format == 'yaml':
643 import yaml
UnicodeDecodeError: 'ascii' codec can't decode byte 0xcb in position 0: ordinal not in range(128)
# # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #
I fixed it locally by adding "encoding='ISO-8859-1'" to the "pickle.load()" call on line 641 of the data.py file shown in the error message.
I don't know whether this breaks anything else, whether it is the right way to fix it, or what the root cause is (e.g. pickling across versions).
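The same class of failure can be reproduced without NLTK. In Python 3, pickle.load() decodes 8-bit str objects written by Python 2 using its default encoding='ASCII', so any byte above 0x7f raises a UnicodeDecodeError like the one above. The sketch below assumes that is what happens here; that the tagger pickle loaded by data.py was written by Python 2 is an assumption, not something confirmed in this report. PY2_PICKLE and the load_* helpers are hypothetical names, and the byte string is a hand-built stand-in, not the real model file.

```python
import pickle

# Hypothetical stand-in for a pickle written by Python 2 (NOT the real NLTK
# tagger file). Opcodes: PROTO 2, SHORT_BINSTRING of length 1 holding the
# single byte 0xcb, then STOP.
PY2_PICKLE = b"\x80\x02U\x01\xcb."


def load_default(data):
    """Unpickle with Python 3's default encoding='ASCII' (fails on 0xcb)."""
    return pickle.loads(data)


def load_latin1(data):
    """Unpickle as the local fix does: decode Python 2 strings as Latin-1."""
    return pickle.loads(data, encoding="ISO-8859-1")


def load_bytes(data):
    """Alternative: keep Python 2 8-bit strings as raw bytes objects."""
    return pickle.loads(data, encoding="bytes")


if __name__ == "__main__":
    try:
        load_default(PY2_PICKLE)
    except UnicodeDecodeError as exc:
        print("default:", exc)
    print("latin-1:", repr(load_latin1(PY2_PICKLE)))
    print("bytes:  ", repr(load_bytes(PY2_PICKLE)))
```

ISO-8859-1 (Latin-1) maps every byte 0x00-0xff to a code point, so decoding can never fail, which is why the local fix makes the error go away. If the original Python 2 strings held UTF-8 text, though, they come back as mojibake rather than the intended characters; encoding='bytes' avoids guessing by returning raw bytes, at the cost of the calling code having to handle them.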