copying a text file with integers to all the data nodes in the cluster


Rajasri

Jan 29, 2013, 9:18:38 AM1/29/13
to chenn...@googlegroups.com
Hi all. I want to copy a text file containing integers to all the data nodes in the cluster. I am using Java. Which Hadoop API can I use? Will I need MapReduce for this? When I tried DistributedCache, I got an error while writing the code itself, i.e. at addCacheFile(uri, configuration). Any suggestions or solutions? Another doubt: for Mapper<K,V,K,V>, what data types should I give for K and V to pass a text file with integer contents? Thanks in advance!

Regards,

Rajasri

Ashwanth Kumar

Jan 29, 2013, 10:28:27 AM1/29/13
to chenn...@googlegroups.com
To answer your second question: everything emitted by map() / reduce() via context.write() is a tuple, where both items implement the Writable interface. So you can either pass the entire text content or, if the file has to be read from local disk / HDFS, pass the file path; in both cases the K/V can be Text. It depends on how you want to reduce and what output you want.
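(A minimal sketch, not from the thread: assuming the file's lines are whitespace-separated integers, the parsing a mapper would do on each incoming Text value can be shown in plain Java without the Hadoop classes. The class and method names here are illustrative.)

```java
import java.util.Arrays;

public class ParseLine {
    // What a mapper receiving Text values would do with each line:
    // split on whitespace and parse the integers out of the raw text.
    static int[] parseIntegers(String line) {
        return Arrays.stream(line.trim().split("\\s+"))
                     .mapToInt(Integer::parseInt)
                     .toArray();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parseIntegers("3 14 15 92")));
        // [3, 14, 15, 92]
    }
}
```

Since the mapper can always parse integers back out of the text, passing the data around as Text is usually the simplest choice.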

Quick Questions -- 
  1. Why would you want to copy a text file to all the nodes and process it separately on each? 
  2. If it is a single text file and the contents can be held in memory, why don't you add it as a resource to your JAR and process it in each mapper? 
  3. Is concatenating all the file contents (if the number of files is large) into a single file an option? Then you can effectively MRify the file to your needs. 
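(A hedged sketch of suggestion 2, assuming the file is bundled on the classpath; the resource name texture.txt and the helper loadVectors are illustrative, not from the thread. Each mapper would read the bundled file once, e.g. in setup(), into an in-memory table.)

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class JarResource {
    // Parse every line of the stream into an int[] -- the in-memory
    // table each mapper would hold after reading the bundled file.
    static List<int[]> loadVectors(InputStream in) {
        List<int[]> rows = new ArrayList<>();
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = r.readLine()) != null) {
                if (line.trim().isEmpty()) continue;  // skip blank lines
                String[] tokens = line.trim().split("\\s+");
                int[] row = new int[tokens.length];
                for (int i = 0; i < tokens.length; i++) {
                    row[i] = Integer.parseInt(tokens[i]);
                }
                rows.add(row);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rows;
    }

    public static void main(String[] args) {
        // In a mapper's setup() the stream would come from the JAR:
        //   getClass().getResourceAsStream("/texture.txt")
        InputStream demo = new ByteArrayInputStream(
                "1 2 3\n4 5 6\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(loadVectors(demo).size()); // 2
    }
}
```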



--

Ashwanth Kumar / ashwanthkumar.in

Rajasri janakiraman

Jan 29, 2013, 10:38:41 AM1/29/13
to chenn...@googlegroups.com
1. Since you said HDFS is distributed, I dropped the idea of copying the text file to all nodes.
2. My actual need is to write the feature vector of an image into a text file as a single line [say img1.txt]. I will have a text file in my HDFS [say texture.txt] which contains a few lines of numbers representing the feature vectors of various other images stored in HDFS; each line represents one image. What I want to do is compare the 1st token of img1.txt with the 1st token of every line of texture.txt, and then repeat the same for every token of img1.txt against the corresponding columns of texture.txt.
Regards,
Rajasri.J

Ashwanth Kumar

Jan 29, 2013, 10:56:51 AM1/29/13
to chenn...@googlegroups.com
Emit each column of the img1.txt and texture.txt with the same key, so that you can reduce them. 
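(A sketch of that emit step in plain Java; the Hadoop Text wrappers and context.write are omitted, and the tag:value format is just one possible convention, not from the thread. Keying each token by its column index means the reducer for a given key sees the values from both files for that column together.)

```java
import java.util.ArrayList;
import java.util.List;

public class ColumnEmit {
    // For each token of a line, emit (columnIndex, tag + ":" + token).
    // In the real mapper this would be:
    //   context.write(new Text(String.valueOf(i)),
    //                 new Text(tag + ":" + tokens[i]));
    static List<String[]> emitColumns(String tag, String line) {
        List<String[]> pairs = new ArrayList<>();
        String[] tokens = line.trim().split("\\s+");
        for (int i = 0; i < tokens.length; i++) {
            pairs.add(new String[] { String.valueOf(i), tag + ":" + tokens[i] });
        }
        return pairs;
    }

    public static void main(String[] args) {
        for (String[] p : emitColumns("img1", "5 9 2")) {
            System.out.println(p[0] + " -> " + p[1]);
        }
        // 0 -> img1:5
        // 1 -> img1:9
        // 2 -> img1:2
    }
}
```

Running the same emit over the lines of texture.txt (with a different tag per line or per file) groups every file's value for column 0 under key "0", column 1 under key "1", and so on, which is exactly what the reducer needs for the comparison.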

Rajasri janakiraman

Jan 29, 2013, 11:11:00 AM1/29/13
to chenn...@googlegroups.com
Thank you :) Will try it out!
Regards,
Rajasri.J