MapReduce using KFS as the file system in Hadoop

Nikhil Agarwal

Feb 22, 2013, 4:07:30 AM

to kfs-...@googlegroups.com

Hi,

I want to understand how MapReduce works when KFS is used as the file system in Hadoop. The KFS-with-Hadoop wiki says:

# ./bin/start-mapred.sh
If the map/reduce job/task trackers are up, all I/O will be done to KFS.

So, suppose my input files are scattered across different nodes (Kosmos servers). How do I, as a Hadoop client using KFS as the file system, issue a MapReduce command (I mean the format of the MapReduce command)? Please correct me if I am wrong, but I suppose the location of the input files is returned to MapReduce by the function getFileBlockLocations(FileStatus, long, long).

Moreover, after I issue a MapReduce command, would my Hadoop client fetch all the data from the different servers to my local machine and run MapReduce there, or would it start the TaskTracker daemons on the machine(s) where the input file(s) are located and run MapReduce there?

Thank you very much for your time and for helping me out.

Regards, Nikhil

Alex Kashirin

Feb 23, 2013, 6:13:30 AM

to kfs-...@googlegroups.com

Regarding this part:

# ./bin/start-mapred.sh

If the map/reduce job/task trackers are up, all I/O will be done to KFS.

That means one thing: instead of the Hadoop file system doing the additional work of rebalancing the file system (replicas, missing blocks, and so on), that work is done on other hard drives (the ones set up with KFS), in order to reduce the additional I/O on HDFS.

You're welcome, hope that helps.

Kashirin Alex

www.EOP-net.com

Nikhil Agarwal

Feb 24, 2013, 3:29:33 AM

to kfs-...@googlegroups.com

Hi Alex,

Thank you for taking the time to reply to my query. What I actually need to know is this: is the data on which MapReduce needs to run fetched from those hard drives (the ones set up with KFS), or is the MR job (TaskTracker) made to run on those hard drives themselves?

Also, are there any free Kosmos servers (at some public URL) that I can connect to with a Hadoop client, so that I can see for myself how MR actually works on them?

Regards,

Nikhil

Alex Kashirin

Feb 25, 2013, 9:53:36 AM

to kfs-...@googlegroups.com

I do not know of any free service that offers a KFS instance.

The TaskTrackers and the JobTracker are the ones that use KFS, instead of doing the I/O on the Hadoop hardware.
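
As a rough illustration of the locality question raised earlier in the thread: in stock Hadoop, the input format asks the file system for block locations (via getFileBlockLocations), and the JobTracker uses those host hints to prefer TaskTrackers that are already running on a node holding the data, rather than pulling everything to the client. Whether the KFS binding reports useful hosts depends on its implementation, so treat the Python below as a toy model under that assumption, not as Hadoop or KFS code. The names BlockLocation and assign_map_tasks, and the node names, are hypothetical.

```python
from collections import namedtuple

# Hypothetical stand-in for the information a block-location call returns:
# a byte range of the file plus the hosts that store a replica of it.
BlockLocation = namedtuple("BlockLocation", ["offset", "length", "hosts"])


def assign_map_tasks(block_locations, tasktracker_hosts):
    """Choose a TaskTracker host for each block's map task.

    Prefers a host that stores the block (a data-local task) and falls back
    to any running TaskTracker otherwise. Among candidates, the least-loaded
    host wins; ties go to the first candidate listed.
    Returns a list of (offset, host, is_local) tuples.
    """
    load = {host: 0 for host in tasktracker_hosts}
    assignments = []
    for block in block_locations:
        # Hosts that both hold the block and run a TaskTracker.
        local = [h for h in block.hosts if h in load]
        candidates = local if local else list(tasktracker_hosts)
        host = min(candidates, key=lambda h: load[h])
        load[host] += 1
        assignments.append((block.offset, host, bool(local)))
    return assignments


if __name__ == "__main__":
    # Assumed layout: three blocks, one of which lives on a node with no TaskTracker.
    blocks = [
        BlockLocation(0, 64, ["node1", "node2"]),
        BlockLocation(64, 64, ["node2"]),
        BlockLocation(128, 64, ["node9"]),
    ]
    trackers = ["node1", "node2", "node3"]
    for offset, host, is_local in assign_map_tasks(blocks, trackers):
        kind = "data-local" if is_local else "remote read"
        print(f"block at offset {offset} -> {host} ({kind})")
```

In this model the TaskTrackers are already running on the nodes before the job is submitted; the job does not start them, it only decides which of them gets which piece of the input.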