Looking for a Hadoop example working on the Hadoop profile


Massimo Canonico

Jun 6, 2018, 8:38:49 AM
to cloudlab-users
Hi All,

I'm new to Hadoop and I was trying to get the "WordCount" example
working on the Hadoop profile.

Following the instructions in the "official" guide here:

https://hadoop.apache.org/docs/stable/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html#Purpose

It seems that something is missing concerning environment variables, and maybe even some packages, since (once the WordCount.java file is created) running this command as suggested in the guide:

/usr/local/bin/hadoop-2.7.3/bin/hadoop com.sun.tools.javac.Main WordCount.java

I got some errors about missing libraries.
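
For reference, the guide assumes some environment variables are exported before that command; the JAVA_HOME path below is my guess and probably differs on the profile:

$ export JAVA_HOME=/usr/lib/jvm/default-java    # adjust to wherever the JDK lives on the node
$ export PATH=${JAVA_HOME}/bin:${PATH}
$ export HADOOP_CLASSPATH=${JAVA_HOME}/lib/tools.jar    # so hadoop can find the javac compiler class
$ /usr/local/bin/hadoop-2.7.3/bin/hadoop com.sun.tools.javac.Main WordCount.java
$ jar cf wc.jar WordCount*.class    # package the compiled classes for "hadoop jar"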

So, I'm looking for a working example for the Hadoop profile. I'm OK with WordCount or any other "hello world"-like example.

Thanks,
Massimo

Gary Wong

Jun 6, 2018, 5:13:02 PM
to Massimo Canonico, cloudlab-users
If you're new to Hadoop, I suggest you first experiment with a
single-node Hadoop installation as described here:

https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/SingleCluster.html

because jumping into a distributed installation that somebody else has
already configured for you (if you've had no prior Hadoop configuration
experience) is much more complex.
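
For a first smoke test, the standalone-mode example from that guide looks roughly like this (paths assume the stock 2.7.3 tarball layout, so adjust as needed):

$ mkdir input
$ cp etc/hadoop/*.xml input    # use the bundled config files as sample input
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar grep input output 'dfs[a-z.]+'
$ cat output/*    # the matched strings and their counts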

However, using this profile:

https://www.cloudlab.us/instantiate.php?profile=bac3e68d-4fc1-11e7-91c5-90e2ba22fee4

suggested to you earlier, the following steps work for me to import/export
the necessary data in/out of HDFS and submit a distributed word count job:

$ cd /usr/local/hadoop-2.7.3
$ sudo bin/hdfs dfs -mkdir /tmp/gary
$ sudo bin/hdfs dfs -chown gary /user/gary
$ bin/hdfs dfs -put etc/hadoop input
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar grep input output 'dfs[a-z].+'
$ bin/hdfs dfs -get output /tmp/output
$ cat /tmp/output/*
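
(The run above uses the grep example; since you asked for word count specifically, the same examples jar should also contain a wordcount driver; the output directory name here is just my choice:)

$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount input wc-output
$ bin/hdfs dfs -cat wc-output/*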

Thanks,
Gary.
--
Gary Wong g...@flux.utah.edu http://www.cs.utah.edu/~gtw/

Massimo Canonico

Jun 7, 2018, 5:31:07 AM
to Gary Wong, cloudlab-users

Hi Gary,

thanks for your reply. I was trying to use the Hadoop profile and to follow the instructions you provided. I think something could be wrong in your sequence.

1    $ cd /usr/local/hadoop-2.7.3
2    $ sudo bin/hdfs dfs -mkdir /tmp/gary
3    $ sudo bin/hdfs dfs -chown gary /users/gary
4    $ bin/hdfs dfs -put etc/hadoop input
5    $ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar grep input output 'dfs[a-z].+'
6    $ bin/hdfs dfs -get output /tmp/output
7    $ cat /tmp/output/*

(of course I have substituted gary with my username)

In 3, should "users" be "tmp" (to match the directory created in 2)?

Before 4, should I create the input directory first, with  sudo bin/hdfs dfs -mkdir /tmp/gary/input ?

The command in 4, without sudo, says that I do not have the right permissions; with sudo I get a Java exception:
File /tmp/mex/input/hadoop/mapred-queues.xml.template._COPYING_ could only be replicated to 0 nodes instead of minReplication (=1).  There are 0 datanode(s) running and no node(s) are excluded in this operation.

Are the commands in 5, 6 and 7 runnable without sudo?

I'm running your sequence on the resource manager node.

Thanks in advance.

Best,
M

Gary Wong

Jun 7, 2018, 10:58:13 AM
to Massimo Canonico, cloudlab-users
On Thu, Jun 07, 2018 at 11:31:03AM +0200, Massimo Canonico wrote:
> thanks for your reply. I was trying to use the hadoop profile and follow
> the instruction you provided. I think that something could be wrong in
> your sequence.
>
> 1 $ cd /usr/local/hadoop-2.7.3
> 2 $ sudo bin/hdfs dfs -mkdir /tmp/gary
> 3 $ sudo bin/hdfs dfs -chown gary /*users*/gary
> 4  $ bin/hdfs dfs -put etc/hadoop input
> 5 $ bin/hadoop jar
> share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar grep input
> output 'dfs[a-z].+'
> 6 $ bin/hdfs dfs -get output /tmp/output
> 7 $ cat /tmp/output/*

Whoops! Sorry, I copied and pasted one of the commands from the
wrong place. The correct instructions should be:

$ cd /usr/local/hadoop-2.7.3
$ sudo bin/hdfs dfs -mkdir /user/gary
$ sudo bin/hdfs dfs -chown gary /user/gary
$ bin/hdfs dfs -put etc/hadoop input
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar grep input output 'dfs[a-z].+'
$ bin/hdfs dfs -get output /tmp/output
$ cat /tmp/output/*
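
If the -put step still complains that 0 datanodes are running, it may be worth checking whether any datanodes have actually registered with the namenode; something along these lines should show them (the web UI port is the stock 2.x default, which the profile may have changed):

$ bin/hdfs dfsadmin -report    # the "Live datanodes" section should list the slaves

or browse the namenode web UI at http://<namenode>:50070/.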

Thanks,
Gary.

Massimo Canonico

Jun 7, 2018, 12:38:22 PM
to cloudla...@googlegroups.com
Hi, after these steps:

mcanonic@resourcemanager:/usr/local/hadoop-2.7.3$ sudo bin/hdfs dfs -mkdir /user/gary
mcanonic@resourcemanager:/usr/local/hadoop-2.7.3$ sudo bin/hdfs dfs -mkdir /user/mcanonic
mcanonic@resourcemanager:/usr/local/hadoop-2.7.3$ sudo bin/hdfs dfs -chown mcanonic /user/mcanonic

the following command generates an error:
> bin/hdfs dfs -put etc/hadoop input

mcanonic@resourcemanager:/usr/local/hadoop-2.7.3$ bin/hdfs dfs -put etc/hadoop input
18/06/07 10:26:08 WARN hdfs.DFSClient: DataStreamer Exception
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /user/mcanonic/input/capacity-scheduler.xml._COPYING_ could only be replicated to 0 nodes instead of minReplication (=1).  There are 0 datanode(s) running and no node(s) are excluded in this operation.
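
To check whether the datanode daemons are actually up, I suppose I could run something like this on each slave (assuming the JDK's jps tool is on the PATH):

$ jps    # a "DataNode" entry should be listed on each slave if the daemon is running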

What surprises me is that the Hadoop profile boots without problems with all of its machines (1 resource manager, 3 slaves and 1 namenode), and yet no datanodes seem to be running.

So it could be that:
- there is something else to do in the Hadoop CloudLab profile,
- the Hadoop CloudLab profile is missing a "datanode" machine,
- or something else?

Best,
Massimo