I'm trying to put a list containing Unicode strings into nltk.Text(),
but an error occurs:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/nltk/text.py", line 283, in __init__
    self.name = " ".join(map(str, tokens[:8])) + "..."
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)
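The failing line passes each token through str(). In Python 2 (the 2.5 install in the traceback), calling str() on a unicode object implicitly encodes it with the default 'ascii' codec, and that encode is what raises. Below is a minimal sketch of that step. It is written for Python 3, which no longer does the conversion implicitly, so the ASCII encode is spelled out. The helper name py2_style_str and the sample tokens are made up for illustration:

```python
# Emulate what Python 2's str() does to a unicode token: encode it with
# the default 'ascii' codec. Python 3 does not do this implicitly, so the
# encode is written out explicitly here.

def py2_style_str(token):
    """Hypothetical helper: mimic Python 2's str(u"...") conversion."""
    return token.encode("ascii")  # strict mode: raises on any non-ASCII char


tokens = [u"ไทย", u"ภาษา"]  # sample Thai tokens, assumed for illustration

caught = None
try:
    # Same shape as the line in text.py, but operating on byte strings.
    name = b" ".join(map(py2_style_str, tokens[:8])) + b"..."
except UnicodeEncodeError as err:
    caught = err  # keep a reference; `err` is unbound after the except block
    print(err)    # 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)
```

The first token has three non-ASCII characters, so the error reports positions 0-2, the same shape as in the traceback.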
I tried starting the Python shell with "python -U", but that doesn't work either.
Does anyone have any idea about this?
Or should I work around it by changing the code in text.py to something
like u" ".join(....)?
cheers,
Art
P.S. Thanks to Steven Bird for fixing the nltk-users link on
http://www.nltk.org/ :)
Thank you, it works!
Instead of appending u"xxx", I now append u"xxx".encode("utf-8", "replace").
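Encoding each token to UTF-8 up front works in Python 2 because str() returns an existing byte string unchanged, so the ASCII codec is never invoked. Below is a small Python 3 sketch of the same transformation, using assumed sample tokens. In Python 3, str() on bytes returns their repr (e.g. "b'...'") rather than the text, so there this workaround would give an odd-looking name instead of an error:

```python
tokens = [u"ไทย", u"ภาษา"]  # sample tokens, assumed for illustration

# The workaround from the thread: encode each token to UTF-8 up front.
# UTF-8 can encode every ordinary character, so "replace" only kicks in
# for unpaired surrogates.
encoded = [tok.encode("utf-8", "replace") for tok in tokens]

# Joining byte strings never touches the ASCII codec.
name = b" ".join(encoded[:8]) + b"..."

# Decoding recovers the original text.
print(name.decode("utf-8"))  # ไทย ภาษา...
```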
thanks,
Art
--
The only thing the people will have to protect themselves
is the mercy and strength of the state,
not the rights and liberties wrapped around them.
(Argh!)
http://bit.ly/4DXd