2.4 GB NLTK data

Harald Schilly

Dec 14, 2016, 11:05:47 AM
to sage-cloud, sage-cloud-members
There is no longer any need to download anything extra. The NLTK
(www.nltk.org) corpus data is now hosted on SMC:

http://blog.sagemath.com/python/2016/12/14/nltk-corpus.html
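
To check from Python whether a given corpus is already visible to NLTK, something like the sketch below may help. It assumes the hosted data sits on NLTK's normal data search path inside a project, which is not confirmed here; `corpus_available` is a made-up helper name, and "corpora/brown" and "corpora/wordnet" are only example resource names.

```python
import importlib.util


def corpus_available(resource):
    """Return True if NLTK can find `resource` on its data search path.

    Hypothetical helper for illustration; not part of NLTK itself.
    """
    # If NLTK itself is not installed, nothing can be found.
    if importlib.util.find_spec("nltk") is None:
        return False
    import nltk.data
    try:
        # nltk.data.find raises LookupError when the resource is missing.
        nltk.data.find(resource)
        return True
    except LookupError:
        return False


if __name__ == "__main__":
    # Example resource names only; substitute the corpora you need.
    for name in ("corpora/brown", "corpora/wordnet"):
        status = "found" if corpus_available(name) else "missing"
        print(name, "->", status)
```

If a resource shows up as missing, printing `nltk.data.path` lists the directories NLTK searched.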

-- Harald

Samuel Lelièvre

Dec 15, 2016, 5:33:45 AM
to sage-cloud, sage-clou...@googlegroups.com
For anyone wondering, NLTK stands for "Natural Language Toolkit".
Quoting the NLTK home page at http://www.nltk.org:

"""
NLTK is a leading platform for building Python programs to work with
human language data. It provides easy-to-use interfaces to over 50
corpora and lexical resources such as WordNet, along with a suite of
text processing libraries for classification, tokenization, stemming,
tagging, parsing, and semantic reasoning, wrappers for
industrial-strength NLP libraries, and an active discussion forum.

Thanks to a hands-on guide introducing programming fundamentals alongside
topics in computational linguistics, plus comprehensive API documentation,
NLTK is suitable for linguists, engineers, students, educators, researchers,
and industry users alike. NLTK is available for Windows, Mac OS X, and Linux.
Best of all, NLTK is a free, open source, community-driven project.

NLTK has been called “a wonderful tool for teaching, and working in,
computational linguistics using Python,” and “an amazing library to play
with natural language.”

Natural Language Processing with Python provides a practical introduction
to programming for language processing. Written by the creators of NLTK,
it guides the reader through the fundamentals of writing Python programs,
working with corpora, categorizing text, analyzing linguistic structure,
and more. The book is being updated for Python 3 and NLTK 3. (The original
Python 2 version is still available at http://nltk.org/book_1ed.)
"""
