how to use nltk.path.load() to load corpora from a shared folder?

Lucas Champollion

Oct 2, 2020, 4:01:08 PM
to nltk-users
I am teaching a large class using JupyterHub. I want to deploy a Jupyter Notebook that uses nltk. Each student should be able to access nltk's version of the Brown corpus via a shared folder in a central location, so that no student has to download the corpus individually. The same goes for other datasets.

Using nltk.download("brown") I have downloaded the Brown corpus and others into a folder nltk_data and moved this folder into the shared directory:

(otter-env) instructor@jupyter-lc126:~$ ls shared/nltk_data/corpora/
brown      gutenberg      inaugural  state_union  webtext
(otter-env) instructor@jupyter-lc126:~$ ls shared/nltk_data/corpora/brown
ca01  ca28      cb10  cc10  ce03  (...)

How do I allow students to load these corpora from the shared directory?

Failed attempt: nltk.data.load('/home/instructor/shared/nltk_data/corpora/brown/', format="text")

returns:

IsADirectoryError: [Errno 21] Is a directory: '/home/instructor/shared/nltk_data/corpora/brown'


sumit srivastava

Oct 2, 2020, 4:14:52 PM
Hey Lucas,

I would recommend adding the NLTK data folder path to the system path at the start of your script.

import sys

Sumit Srivastava
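
A rough sketch of one way to register the folder at the start of a script. Note that sys.path only controls where Python looks for modules; NLTK locates its data through the NLTK_DATA environment variable and the nltk.data.path list. The sketch below uses the environment variable and assumes the shared folder from the question; the helper name prepend_nltk_data_dir is made up for illustration.

```python
import os

SHARED_NLTK_DATA = "/home/instructor/shared/nltk_data"  # path taken from the question


def prepend_nltk_data_dir(shared_dir=SHARED_NLTK_DATA, environ=os.environ):
    """Put shared_dir at the front of NLTK_DATA (hypothetical helper)."""
    # NLTK_DATA may already hold several directories joined by os.pathsep.
    existing = [d for d in environ.get("NLTK_DATA", "").split(os.pathsep) if d]
    if shared_dir not in existing:
        existing.insert(0, shared_dir)
    environ["NLTK_DATA"] = os.pathsep.join(existing)
    return environ["NLTK_DATA"]


# NLTK reads NLTK_DATA when it is imported, so this has to run
# before the first `import nltk` in the process.
prepend_nltk_data_dir()
```

If nltk has already been imported in the notebook kernel, changing the variable afterwards should have no effect on that session; in that case the kernel would need a restart, or nltk.data.path can be edited directly.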

Naveen Kumar Baskaran

Oct 5, 2020, 4:08:54 AM
Hi,

It seems that '/home/instructor/shared/nltk_data/corpora/brown' is a directory path, not a file.
If you are using nltk.data.load(), please specify the file name along with your path.
Or, if you want a path, use

Hope this helps!
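
For the shared-folder setup in the question, here is a minimal sketch, assuming the corpora sit under /home/instructor/shared/nltk_data as in the directory listing and that students have read access to it. NLTK searches the directories listed in nltk.data.path, so the idea is to register the parent nltk_data folder (not corpora/brown itself) and then use the normal corpus readers. The helper name use_shared_nltk_data is hypothetical.

```python
SHARED_NLTK_DATA = "/home/instructor/shared/nltk_data"  # path taken from the question


def use_shared_nltk_data(shared_dir=SHARED_NLTK_DATA):
    """Put shared_dir at the front of NLTK's data search path (hypothetical helper)."""
    import nltk.data  # local import: the helper can be defined before nltk is loaded

    # nltk.data.path is a plain list of directories that NLTK searches in order.
    if shared_dir not in nltk.data.path:
        nltk.data.path.insert(0, shared_dir)
    return nltk.data.path
```

In a notebook, students would call use_shared_nltk_data() once in the first cell; after that, `from nltk.corpus import brown` followed by `brown.words()` should pick up the files from shared/nltk_data/corpora/brown without any per-student download.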

