How to use nltk.data.load() to load corpora from a shared folder?

Lucas Champollion

Oct 2, 2020, 4:01:08 PM

to nltk-users

I am teaching a large class using JupyterHub. I want to deploy a Jupyter Notebook that uses nltk. Each student should be able to access nltk's version of the Brown corpus via a shared folder in a central location, so that no student has to download this corpus. The same goes for other datasets.

Using nltk.download("brown") I have downloaded the Brown corpus and others into a folder nltk_data and moved this folder into the shared directory:

(otter-env) instructor@jupyter-lc126:~$ ls shared/nltk_data/corpora/
brown gutenberg inaugural reuters.zip state_union.zip webtext.zip
brown.zip gutenberg.zip inaugural.zip state_union webtext
(otter-env) instructor@jupyter-lc126:~$ ls shared/nltk_data/corpora/brown

ca01 ca28 cb10 cc10 ce03 (...)

How do I allow students to load these corpora from the shared directory?

Failed attempt:

nltk.data.load('/home/instructor/shared/nltk_data/corpora/brown/', format="text")

returns:

IsADirectoryError: [Errno 21] Is a directory: '/home/instructor/shared/nltk_data/corpora/brown'

Best

Lucas

nltk.data.path.append('/path/to/data')

sumit srivastava

Oct 2, 2020, 4:14:52 PM

to nltk-...@googlegroups.com

Hey Lucas,

I would recommend adding the NLTK data folder to NLTK's data search path at the start of your script.

E.g.,

import nltk

nltk.data.path.append('path_to_nltk_data')
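For illustration, here is a minimal sketch of registering the shared folder with NLTK's own data search path, `nltk.data.path`, so that the regular corpus readers can find it. It assumes the folder from the original post is readable by every student account; the helper name `register_nltk_data_dir` and the constant `SHARED_NLTK_DATA` are made up for this example and are not part of NLTK.

```python
import os

import nltk
from nltk.corpus import brown  # lazy loader: nothing is read from disk yet

# Assumption: the shared nltk_data folder from the original post.
SHARED_NLTK_DATA = "/home/instructor/shared/nltk_data"


def register_nltk_data_dir(path):
    """Add `path` to NLTK's data search path unless it is already there."""
    if path not in nltk.data.path:
        nltk.data.path.append(path)
    return nltk.data.path


register_nltk_data_dir(SHARED_NLTK_DATA)

# Only touch the corpus if the shared folder really exists on this machine.
if os.path.isdir(os.path.join(SHARED_NLTK_DATA, "corpora", "brown")):
    print(brown.words()[:5])
```

Note that the entry is the `nltk_data` folder itself, not `nltk_data/corpora/brown`; NLTK looks for `corpora/brown` below each entry in `nltk.data.path`.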

Regards
Sumit Srivastava

Naveen Kumar Baskaran

Oct 5, 2020, 4:08:54 AM

to nltk-...@googlegroups.com

Hi,

It seems that '/home/instructor/shared/nltk_data/corpora/brown' is a directory path, not a file.

If you are using nltk.data.load, please specify the file name along with your path.

Or, if you want to add a search path, use nltk.data.path.
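A small sketch of the first option, loading one file rather than a directory with `nltk.data.load`. Because a file such as `ca01` has no extension, the format is passed explicitly. To keep the example runnable anywhere, it builds a throwaway folder with a made-up resource (`corpora/demo_corpus/doc01`) instead of the real Brown files; the helper `load_corpus_file` and its parameter names are also hypothetical.

```python
import os
import tempfile

import nltk


def load_corpus_file(data_root, resource, fmt="text"):
    """Load a single file below `data_root` through nltk.data.load.

    `resource` is a path relative to `data_root`, written with forward
    slashes, and it must name a file, not a directory.
    """
    if data_root not in nltk.data.path:
        nltk.data.path.append(data_root)
    return nltk.data.load(resource, format=fmt)


# Stand-in for a shared nltk_data folder (hypothetical content).
demo_root = tempfile.mkdtemp()
demo_dir = os.path.join(demo_root, "corpora", "demo_corpus")
os.makedirs(demo_dir)
with open(os.path.join(demo_dir, "doc01"), "w", encoding="utf-8") as f:
    f.write("The/at Fulton/np-tl County/nn-tl Grand/jj-tl Jury/nn-tl\n")

# format="text" returns the file content as a str.
text = load_corpus_file(demo_root, "corpora/demo_corpus/doc01")
print(text)
```

With the real shared folder, the same call would presumably look like `load_corpus_file("/home/instructor/shared/nltk_data", "corpora/brown/ca01")`; for working with the whole corpus, the corpus reader (`nltk.corpus.brown`) is usually more convenient than loading files one by one.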

Hope this helps!
