How to use nltk.data.load() to load corpora from a shared folder?


Lucas Champollion

Oct 2, 2020, 4:01:08 PM
to nltk-users
I am teaching a large class using JupyterHub. I want to deploy a Jupyter Notebook that uses NLTK. Each student should be able to access NLTK's version of the Brown corpus via a shared folder in a central location, so that no student has to download this corpus. The same goes for other datasets.

Using nltk.download("brown"), I downloaded the Brown corpus and others into a folder named nltk_data and moved this folder into the shared directory:

(otter-env) instructor@jupyter-lc126:~$ ls shared/nltk_data/corpora/
brown      gutenberg      inaugural      reuters.zip  state_union.zip  webtext.zip
brown.zip  gutenberg.zip  inaugural.zip  state_union  webtext
(otter-env) instructor@jupyter-lc126:~$ ls shared/nltk_data/corpora/brown
ca01  ca28      cb10  cc10  ce03  (...)

How do I allow students to load these corpora from the shared directory?

Failed attempt:

nltk.data.load('/home/instructor/shared/nltk_data/corpora/brown/', format="text")

raises:

IsADirectoryError: [Errno 21] Is a directory: '/home/instructor/shared/nltk_data/corpora/brown'

Best
Lucas

nltk.data.path.append('/path/to/data')
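The line above registers an extra directory on nltk.data.path, the list of folders NLTK searches when it resolves resource names such as corpora/brown. A minimal sketch of the idea, assuming a temporary folder stands in for /home/instructor/shared/nltk_data and that it holds a hypothetical one-file corpus, corpora/demo_corpus; the helper add_shared_nltk_data is also hypothetical:

```python
import os
import tempfile

import nltk
from nltk.corpus.reader import PlaintextCorpusReader


def add_shared_nltk_data(shared_dir):
    """Hypothetical helper: put shared_dir first on NLTK's search path (idempotent)."""
    if shared_dir not in nltk.data.path:
        nltk.data.path.insert(0, shared_dir)


# Stand-in for /home/instructor/shared/nltk_data, laid out like nltk.download() output:
# <shared_dir>/corpora/demo_corpus/doc1.txt (demo_corpus is a hypothetical corpus).
shared_dir = tempfile.mkdtemp()
corpus_dir = os.path.join(shared_dir, "corpora", "demo_corpus")
os.makedirs(corpus_dir)
with open(os.path.join(corpus_dir, "doc1.txt"), "w", encoding="utf-8") as f:
    f.write("The jury said the election was conducted.\n")

add_shared_nltk_data(shared_dir)

# nltk.data.find resolves a relative resource name against each entry of
# nltk.data.path and returns a path pointer to the first match.
found = nltk.data.find("corpora/demo_corpus")

# A corpus reader can then be rooted at the resolved directory.
reader = PlaintextCorpusReader(found.path, r".*\.txt")
print(reader.fileids())
print(reader.words()[:3])
```

On the actual JupyterHub, calling add_shared_nltk_data('/home/instructor/shared/nltk_data') at the top of the notebook should let `from nltk.corpus import brown` find corpora/brown in the shared folder, since NLTK's lazy corpus loaders look resources up through the same search path. Inserting at position 0 rather than appending makes the shared copy take precedence over any per-user ~/nltk_data.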


sumit srivastava

Oct 2, 2020, 4:14:52 PM
to nltk-...@googlegroups.com
Hey Lucas,

I would recommend adding the NLTK data folder to the system path at the start of your script.

E.g.,
import sys
sys.path.append('path_to_nltk_data')

Regards
Sumit Srivastava

The power of imagination makes us infinite...


--
You received this message because you are subscribed to the Google Groups "nltk-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to nltk-users+...@googlegroups.com.
To view this discussion on the web, visit https://groups.google.com/d/msgid/nltk-users/2e886004-fd9c-4ed7-aff5-9f67478906aan%40googlegroups.com.
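A note on the sys.path suggestion: sys.path controls where Python finds importable modules, whereas NLTK resolves data resources through nltk.data.path, which it seeds from the NLTK_DATA environment variable (a list separated by os.pathsep) when nltk is first imported. For a JupyterHub class, setting NLTK_DATA in each student's kernel environment is one way to point every notebook at the shared folder without per-notebook code; how to set it hub-wide depends on your spawner configuration and is not shown here. The sketch below, assuming a temporary folder in place of the shared one, launches a fresh interpreter with NLTK_DATA set and reads back its search path:

```python
import json
import os
import subprocess
import sys
import tempfile

# Stand-in for /home/instructor/shared/nltk_data.
shared_dir = tempfile.mkdtemp()

# NLTK reads NLTK_DATA when nltk.data is first imported, so the variable has to be
# in the environment of the process (e.g. a notebook kernel) before `import nltk` runs.
env = dict(os.environ, NLTK_DATA=shared_dir)

# Start a fresh interpreter, as a newly launched kernel would be, and dump its search path.
result = subprocess.run(
    [sys.executable, "-c", "import json, nltk; print(json.dumps(nltk.data.path))"],
    env=env,
    capture_output=True,
    text=True,
    check=True,
)
search_path = json.loads(result.stdout)
print(search_path[0])
```

Inside a kernel that has already imported nltk, changing os.environ has no effect on the search path; there, modify nltk.data.path directly.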

Naveen Kumar Baskaran

Oct 5, 2020, 4:08:54 AM
to nltk-...@googlegroups.com
Hi,


It seems that '/home/instructor/shared/nltk_data/corpora/brown' is a directory path, not a file.
If you are using nltk.data.load, please include the file name in your path.
Or, if you want to add a search path, use nltk.data.path.


Hope this helps..!
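Both suggestions above can be sketched as follows, assuming a temporary folder stands in for the shared nltk_data directory and a hypothetical corpora/toy_corpus/doc1.txt stands in for a real corpus file such as corpora/brown/ca01. nltk.data.load opens a single resource, so an absolute path has to name a file; a relative resource name is resolved against nltk.data.path instead:

```python
import os
import tempfile

import nltk

# Stand-in for the shared nltk_data folder, holding one hypothetical corpus file.
shared_dir = tempfile.mkdtemp()
corpus_dir = os.path.join(shared_dir, "corpora", "toy_corpus")
os.makedirs(corpus_dir)
with open(os.path.join(corpus_dir, "doc1.txt"), "w", encoding="utf-8") as f:
    f.write("The Fulton County Grand Jury said Friday.\n")

# Option 1: an absolute path that names a file. Passing the directory itself
# (corpus_dir) is what produced IsADirectoryError on Linux in the question.
text_abs = nltk.data.load(os.path.join(corpus_dir, "doc1.txt"), format="text")

# Option 2: register the shared folder on nltk.data.path, then use a relative name.
nltk.data.path.insert(0, shared_dir)
text_rel = nltk.data.load("corpora/toy_corpus/doc1.txt", format="text")

print(text_abs == text_rel)
```

For the Brown corpus itself, loading files one by one this way returns the raw tagged text; once the shared folder is on nltk.data.path, a corpus reader such as the one behind `from nltk.corpus import brown` is usually more convenient.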

