Glitch in "Hands-on Exercise 2"

John Valenti

Aug 11, 2010, 11:50:12 PM

to VSCSE Big Data for Science 2010

I'm looking at the page http://salsahpc.indiana.edu/tutorial/hadoopwc2.html

First, thanks for these improved instructions for doing the tutorials!

I'm trying to run the Hadoop wordcount on our cluster. I started by
installing Hadoop stable from Apache (so I'm already ignoring your
instructions...). Then I retrieved the Hadoop-WordCount.zip. Then
following the instructions, I tried to copy wordcount.jar, but it
doesn't exist where the instructions say. I finally found it, in your
version of Hadoop-standalone.tar.

My reason for using the Apache Hadoop.tar was to use their
instructions and continue on to setting up a hadoop cluster. And I'll
be losing my FutureGrid account shortly.

Perhaps the jar file should be included in the zip, to make your
instructions more generalized?
thanks!

Saliya Ekanayake

Aug 12, 2010, 9:15:34 AM

to vscse-big-data-...@googlegroups.com

Hi,

Thanks for the feedback. In fact, wordcount.jar is created when you build the sample using build.sh. That is why we have not kept an already-built jar inside the sample. Since it seems this would be useful, though, we will try to host a set of jar files for the samples in a separate location for anyone wishing to use them out of the box.

Regards,
Saliya

John Valenti

Aug 12, 2010, 12:03:21 PM

to VSCSE Big Data for Science 2010

Ah, I see that now. I had skipped Exercise 1 since it seemed
Hadoop-WordCount was already written and I just wanted to run it. So I
never ran build.sh.

I was able to run wordcount in your version of Hadoop, but not in the
Apache stable hadoop. Still working on that.

Your instructions are probably better (more details, easier to follow
steps) than the wordcount example that Apache uses for a tutorial. It
would be great if they could be generalized to run in other
environments.

Saliya Ekanayake

Aug 12, 2010, 12:24:59 PM

to vscse-big-data-...@googlegroups.com

Please see the in-line comments.

On Thu, Aug 12, 2010 at 12:03 PM, John Valenti <val...@msu.edu> wrote:

> Ah, I see that now. I had skipped Exercise 1 since it seemed
> Hadoop-WordCount was already written and I just wanted to run it. So I
> never ran build.sh.
>
> I was able to run wordcount in your version of Hadoop, but not in the
> Apache stable hadoop. Still working on that.

Great, for the first part! What error do you get when you run it with your local Hadoop installation?

> Your instructions are probably better (more details, easier to follow
> steps) than the wordcount example that Apache uses for a tutorial. It
> would be great if they could be generalized to run in other
> environments.

Are you referring to the instructions in our tutorial? We tried to keep the instructions for running the samples as independent as possible from the Hadoop environment that you have, and to push all the configuration details to a separate page. As you have suggested, we will see where we can improve them to make it less painful to run in an environment other than FutureGrid (FG).

Thank you,
Saliya
