I am teaching a large class using JupyterHub and want to deploy a Jupyter Notebook that uses nltk. Each student should be able to access nltk's version of the Brown corpus from a shared folder in a central location, so that no student has to download the corpus individually. The same applies to other datasets.
Using nltk.download("brown"), I downloaded the Brown corpus and several others into a folder named nltk_data and moved that folder into the shared directory:
(otter-env) instructor@jupyter-lc126:~$ ls shared/nltk_data/corpora/
brown gutenberg inaugural reuters.zip state_union.zip webtext.zip
brown.zip gutenberg.zip inaugural.zip state_union webtext
(otter-env) instructor@jupyter-lc126:~$ ls shared/nltk_data/corpora/brown
ca01 ca28 cb10 cc10 ce03 (...)
How do I allow students to load these corpora from the shared directory?
Failed attempt:
nltk.data.load('/home/instructor/shared/nltk_data/corpora/brown/', format="text")
This raises:
IsADirectoryError: [Errno 21] Is a directory: '/home/instructor/shared/nltk_data/corpora/brown'
Best
Lucas
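Add the shared nltk_data directory (the folder that contains corpora/, not the brown folder itself) to nltk's search path. The standard corpus readers such as nltk.corpus.brown will then find the data:
nltk.data.path.append('/path/to/data')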
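To make this concrete for the setup in the question, here is a minimal sketch of what a student notebook could run. It assumes the shared folder is readable by students at /home/instructor/shared/nltk_data (taken from the failed attempt above; adjust it if students see the share under a different path). The helper names add_shared_data_dir and load_brown_words are made up for illustration and are not part of nltk.

```python
# Path to the directory that CONTAINS corpora/ (assumption: taken from the
# question; students may see the share mounted somewhere else).
SHARED_NLTK_DATA = "/home/instructor/shared/nltk_data"


def add_shared_data_dir(search_path, shared_dir=SHARED_NLTK_DATA):
    """Append shared_dir to an nltk-style search path list, avoiding duplicates.

    Hypothetical helper: it only manipulates the list it is given, so it can
    be re-run safely in a notebook cell.
    """
    if shared_dir not in search_path:
        search_path.append(shared_dir)
    return search_path


def load_brown_words(shared_dir=SHARED_NLTK_DATA, n=10):
    """Return the first n words of the Brown corpus from the shared folder."""
    # Imported here so the helper above can be used without nltk installed.
    import nltk
    from nltk.corpus import brown

    # nltk.data.path is the list of directories nltk searches for data.
    add_shared_data_dir(nltk.data.path, shared_dir)
    # The corpus reader resolves corpora/brown under the search path on first use.
    return list(brown.words()[:n])
```

In a notebook, calling load_brown_words() should then work without any download, provided the shared folder is readable. As an alternative that needs no code in the notebook, nltk also reads the NLTK_DATA environment variable when it is imported, so setting NLTK_DATA to the shared nltk_data directory in the students' environment should have the same effect.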