Hi,
I would like to understand how MapReduce works when KFS is used as the file system in Hadoop. The KFS-with-Hadoop wiki says:
# ./bin/start-mapred.sh
If the map/reduce job/task trackers are up, all I/O will be done to KFS.
So, suppose my input files are scattered across different nodes (Kosmos servers). How do I, as a Hadoop client using KFS as the file system, issue a MapReduce command (I mean, what is the format of the command)? Please correct me if I am wrong, but I assume the locations of the input files are returned to MapReduce by the function getFileBlockLocations(FileStatus, long, long).
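To show what I mean by that last assumption, here is a toy model in plain Java of how I picture the lookup. It does not call Hadoop or KFS at all; the class Block, the method blocksInRange and the host names are made up by me for illustration, and the real function may well behave differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BlockLookupSketch {
    // Hypothetical stand-in for one block of a file and the servers that store it.
    static final class Block {
        final long offset;
        final long length;
        final List<String> hosts;

        Block(long offset, long length, String... hosts) {
            this.offset = offset;
            this.length = length;
            this.hosts = Arrays.asList(hosts);
        }
    }

    // Returns the blocks that overlap the byte range [start, start + len).
    static List<Block> blocksInRange(List<Block> blocks, long start, long len) {
        List<Block> hits = new ArrayList<>();
        for (Block b : blocks) {
            boolean overlaps = b.offset < start + len && start < b.offset + b.length;
            if (overlaps) {
                hits.add(b);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Made-up layout: three 64-byte blocks, each on a different server.
        List<Block> file = Arrays.asList(
            new Block(0, 64, "kfs-node1"),
            new Block(64, 64, "kfs-node2"),
            new Block(128, 64, "kfs-node3"));

        // Ask which servers hold bytes 50..79 of the file.
        for (Block b : blocksInRange(file, 50, 30)) {
            System.out.println("offset " + b.offset + " -> " + b.hosts);
        }
    }
}
```

In this toy layout the requested range spans the first two blocks, so the lookup reports two different servers for a single read.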
Moreover, after I issue a MapReduce command, would my Hadoop client fetch all the data from the different servers to my local machine and run MapReduce there, or would it start the TaskTracker daemons on the machine(s) where the input file(s) are located and run MapReduce there?
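Put differently, is the second alternative roughly what the toy sketch below does? This is only my guess written as plain Java, not Hadoop code; pickHost and the host names are hypothetical, and the fallback to a remote read is my own assumption.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class LocalitySketch {
    // Chooses where a map task would run under my assumption:
    // prefer a TaskTracker host that also stores the block,
    // otherwise fall back to any TaskTracker (which would read remotely).
    static String pickHost(List<String> blockHosts, Set<String> trackerHosts) {
        if (trackerHosts.isEmpty()) {
            throw new IllegalArgumentException("no TaskTracker hosts available");
        }
        for (String host : blockHosts) {
            if (trackerHosts.contains(host)) {
                return host; // data-local: the task runs next to its input
            }
        }
        return trackerHosts.iterator().next(); // not local: input is fetched over the network
    }

    public static void main(String[] args) {
        // Made-up cluster: TaskTrackers run on node1 and node2 only.
        Set<String> trackers = new LinkedHashSet<>(Arrays.asList("node1", "node2"));

        // Block stored on node2 and node3: a local TaskTracker exists.
        System.out.println(pickHost(Arrays.asList("node2", "node3"), trackers));

        // Block stored only on node4: no local TaskTracker, so some other one is used.
        System.out.println(pickHost(Arrays.asList("node4"), trackers));
    }
}
```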
Thank you very much for your time and help.
Regards, Nikhil