Issue with nltk.data.load


Den Hol

Aug 14, 2016, 9:32:25 AM
to nltk-users
Hello,

I'm a newbie with NLTK and am encountering the following issue.

I am working on a 64-bit Windows 7 machine with a clean install of Python 3.5 (32-bit) and NLTK 3.2.1.

I am now going through the initial examples in the "Python 3 Text Processing with NLTK 3 Cookbook".

I encounter an issue when I try to execute the following command:

import nltk.data
tokenizer = nltk.data.load('tokenizers/punkt/PY3/english.pickle')

and get the following error:

OSError: No such file or directory: 'C:\\nltk_data\\tokenizers\\punkt\\PY3\\PY3\\english.pickle'

==

I checked, and there is a C:/nltk_data/tokenizers/punkt/PY3/english.pickle file on my computer.

But I don't understand why the system tries to look in 'C:/nltk_data/tokenizers/punkt/PY3/PY3', since there is no such folder!

Note that when I execute tokenizer = nltk.data.load('tokenizers/punkt/english.pickle') instead, things work fine...
BUT I was told by a colleague that for Python 3 I should only use the PY3 folders.

==

Any hints would be much appreciated!

D

Steven Bird

Aug 14, 2016, 9:37:28 AM
to nltk-users

The "PY3" is added to the path automatically if you're using Python 3. Don't include it in the corpus name yourself.

-Steven Bird
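
To make the doubled "PY3\\PY3" in the error message concrete, here is a minimal sketch of the behaviour described above. It is a simplified model only, not NLTK's actual source: the function name insert_py3 and its package parameter are hypothetical and exist purely for illustration.

```python
# Simplified, hypothetical model of the automatic "PY3" insertion described
# in this thread. This is NOT NLTK's real implementation.
def insert_py3(resource_name, package="punkt"):
    """Insert a 'PY3' component right after the package directory."""
    parts = resource_name.split("/")
    if package in parts:
        idx = parts.index(package)
        parts.insert(idx + 1, "PY3")
    return "/".join(parts)


# Name without PY3: the result points at the Python 3 pickle.
good = insert_py3("tokenizers/punkt/english.pickle")

# Name that already contains PY3: the component ends up doubled.
bad = insert_py3("tokenizers/punkt/PY3/english.pickle")

print(good)  # tokenizers/punkt/PY3/english.pickle
print(bad)   # tokenizers/punkt/PY3/PY3/english.pickle
```

Under this model, the call that worked in the original post, nltk.data.load('tokenizers/punkt/english.pickle'), is also the one that ends up reading the file inside the PY3 folder.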


--
You received this message because you are subscribed to the Google Groups "nltk-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to nltk-users+...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Dimitriadis, A. (Alexis)

Aug 15, 2016, 5:46:33 AM
to nltk-...@googlegroups.com
Sounds like it would be a useful feature to check for the presence of “PY3” before adding it to the path…

Alexis


Dr. Alexis Dimitriadis | Assistant Professor and Senior Research Fellow | Utrecht Institute of Linguistics OTS | Utrecht University | Trans 10, 3512 JK Utrecht, room 2.33 | +31 30 253 65 68 | a.dimi...@uu.nl | www.hum.uu.nl/medewerkers/a.dimitriadis

Dimitriadis, A. (Alexis)

Aug 15, 2016, 5:53:56 AM
to nltk-...@googlegroups.com
(That wasn’t very clear so let’s try it again:)

I think it would be a useful feature if nltk.data.load checked for the presence of “PY3” before inserting it into a user-supplied path.

Alexis
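
A rough sketch of what such a check could look like, assuming the insertion happens right after the package directory. The function name insert_py3_once and its package parameter are made up for illustration; this is not code from NLTK.

```python
# Hypothetical sketch of the suggested check: add "PY3" only when the
# user-supplied name does not already contain it after the package directory.
# This is an illustration, not NLTK code.
def insert_py3_once(resource_name, package="punkt"):
    parts = resource_name.split("/")
    if package in parts:
        idx = parts.index(package)
        # Slice comparison avoids an IndexError when the package is the
        # last component of the name.
        if parts[idx + 1:idx + 2] != ["PY3"]:
            parts.insert(idx + 1, "PY3")
    return "/".join(parts)


without_py3 = insert_py3_once("tokenizers/punkt/english.pickle")
with_py3 = insert_py3_once("tokenizers/punkt/PY3/english.pickle")

# Both spellings resolve to the same name.
print(without_py3)  # tokenizers/punkt/PY3/english.pickle
print(with_py3)     # tokenizers/punkt/PY3/english.pickle
```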


Steven Bird

Aug 18, 2016, 11:50:45 PM
to nltk-users
I think it's an error to include "PY3" in the corpus pathname.

We've had to provide Python 3 versions of some corpora alongside the existing corpora in order to support NLTK in both Python 2 and 3.

Once we drop Python 2 support, these corpora will be simplified and there'll be no need for a PY3 subdirectory. At that time, any user code that includes "PY3" in a corpus pathname will break.

So I think it's best not to be lax about this, and leave things as-is.

-Steven Bird
