Hi Steven,
Indeed, I used NLTK for my EC2 experiments in the PyCon talk. I tried two approaches:
(a) I created a new Amazon Machine Image (AMI) with NLTK installed as part of the filesystem.
(b) Dumbo lets you specify which eggs to ship to EC2 for a streaming job via its '-libegg' option, so I simply specified the nltk and yaml eggs on the (local) command line.
To be honest, (b) was easier and worked better than (a), but it obviously makes only the NLTK code available, not the data. Getting the data onto the nodes would require extra work to install it as part of the filesystem. Since I was using larger external datasets, that was not a problem for me.
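To make (b) concrete, here is a rough sketch of what such a job might look like (the egg filenames, Hadoop path, and input/output locations below are placeholders, not the ones I actually used):

    # wordcount.py: count word tokens using an NLTK tokenizer.
    # WordPunctTokenizer is purely regex-based, so it needs no
    # downloaded NLTK data, only the code shipped in the egg.
    import dumbo
    from nltk.tokenize import WordPunctTokenizer

    tokenizer = WordPunctTokenizer()

    def mapper(key, value):
        # value is one line of input text
        for token in tokenizer.tokenize(value):
            yield token, 1

    def reducer(key, values):
        yield key, sum(values)

    if __name__ == "__main__":
        dumbo.run(mapper, reducer)

which you would then launch with something like:

    dumbo start wordcount.py -hadoop /path/to/hadoop \
        -input input.txt -output counts \
        -libegg nltk.egg -libegg PyYAML.egg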
To replicate my PyCon experiments, see
http://www.umiacs.umd.edu/~nmadnani/pycon/replicate.pdf
I have also been thinking about creating a public AMI that would contain all of NLTK, all of its data, and all of the contrib stuff. However, the general problem is that you then have to create a new AMI every time you want to update the NLTK codebase. Option (b) is better since you can just specify the relevant egg file.
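With that workflow, picking up a new NLTK release is just a matter of rebuilding the egg (if the source tree uses a standard setuptools setup, something like 'python setup.py bdist_egg' should do it) and pointing -libegg at the new file, with no AMI rebuild required.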
I would be more than happy to be involved if you wanted to create some NLTK-based EC2 resources.
Cheers,
- Nitin