I'm not sure what your unicode_csv_reader does, but I suspect the problem is there, since NLTK works with unicode. My guess is that inside unicode_csv_reader you are encoding or decoding something with the wrong codec.
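I can't see your implementation, but a typical Python 2 unicode_csv_reader looks something like this (assuming UTF-8 input; the codec in that decode call is the thing to double-check):

    import csv

    def unicode_csv_reader(utf8_file, **kwargs):
        # Python 2: the csv module can't read unicode directly, so iterate
        # over the raw UTF-8 bytes and decode each cell afterwards. If the
        # file isn't actually UTF-8, this decode is where things break.
        for row in csv.reader(utf8_file, **kwargs):
            yield [cell.decode("utf-8") for cell in row]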
I tried to make a simple web app to test the interaction of NLTK with PythonAnywhere but received a "500 internal server error". What I tried to do was to get a text query from the user and return nltk.word_tokenize(). My __init__.py function contains code along these lines:
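(Simplified sketch; I'm using Flask here, and "query" stands in for my actual form field name.)

    from flask import Flask, request
    import nltk

    app = Flask(__name__)

    @app.route("/tokenize", methods=["POST"])
    def tokenize():
        # Tokenize whatever text the user submitted in the form.
        text = request.form.get("query", "")
        return str(nltk.word_tokenize(text))

The server's error log showed: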
    Resource u'tokenizers/punkt/english.pickle' not found.  Please
    use the NLTK Downloader to obtain the resource:  >>> nltk.download()
    Searched in:
        - '/home/funderburkjim/nltk_data'
        - '/usr/share/nltk_data'
        - '/usr/local/share/nltk_data'
        - '/usr/lib/nltk_data'
        - '/usr/local/lib/nltk_data'
        - u''
Thanks for the details on getting the nltk download. For anyone else who may need the particular file required by nltk.word_tokenize, the download code is 'punkt', so nltk.download('punkt') does the download. Incidentally, the download puts the file in a place that the calling nltk method knows about, which is a nice detail.
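As a quick sanity check (these are standard NLTK calls):

    import nltk

    nltk.download('punkt')                       # fetches tokenizers/punkt into ~/nltk_data
    print(nltk.word_tokenize("Hello, world!"))   # ['Hello', ',', 'world', '!']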
Python 2 and 3 live in different worlds: they have their own environments and packages. In this case, if you just need a globally installed package available from the system Python 3 environment, you can use apt to install python3-nltk:
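    sudo apt install python3-nltk

After that, import nltk should work from the system python3 interpreter.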
.raw() is another method that exists in most corpora. By specifying a file ID or a list of file IDs, you can obtain specific data from the corpus. Here, you get a single review, then use nltk.sent_tokenize() to obtain a list of sentences from the review. Finally, is_positive() calculates the average compound score for all sentences and associates a positive result with a positive review.
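Putting those pieces together, is_positive() might look roughly like this (a sketch, assuming the movie_reviews corpus and VADER's SentimentIntensityAnalyzer, with the punkt, movie_reviews, and vader_lexicon resources already downloaded):

    from statistics import mean

    import nltk
    from nltk.corpus import movie_reviews
    from nltk.sentiment import SentimentIntensityAnalyzer

    sia = SentimentIntensityAnalyzer()

    def is_positive(review_id):
        """True if the review's sentences average a positive compound score."""
        text = movie_reviews.raw(review_id)      # one review, fetched by file ID
        sentences = nltk.sent_tokenize(text)     # split the review into sentences
        compound = mean(sia.polarity_scores(s)["compound"] for s in sentences)
        return compound > 0

For example, is_positive(movie_reviews.fileids()[0]) scores the first review in the corpus.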
The app will attempt to download this file for every instance, which is inefficient.
I also have this issue in my app, which has 4 nltk dependencies. Including this in my main file will definitely increase latency.
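One way around the repeated downloads (a sketch using nltk.data.find(), which raises LookupError when a resource is missing; the resource list here is just an example) is to fetch each resource only if it isn't already on disk:

    import nltk

    def ensure_nltk_data(resources=("tokenizers/punkt", "corpora/stopwords")):
        # Download each resource only if nltk can't already find it locally,
        # so the startup cost is paid once per machine, not once per run.
        for path in resources:
            try:
                nltk.data.find(path)
            except LookupError:
                nltk.download(path.split("/")[-1])

    ensure_nltk_data()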