NLineInputFormat in RHadoop


Salman Toor

Apr 10, 2014, 5:58:55 PM
to rha...@googlegroups.com
Hi, 

It seems that NLineInputFormat is available.


Can I use it in RHadoop? I want to pass a fixed number of lines to each mapper, so that every mapper gets an equal share of the load. If it's possible, how should I use it? 

I am using hadoop-1.2.1, and NLineInputFormat is listed in its docs. 

If that format is not supported, are there other options I can use? 

Regards.
Salman. 

Antonio Piccolboni

Apr 10, 2014, 6:30:28 PM
to RHadoop Google Group
I don't see why not, but you need to try. The starting point is make.input.format, which will be called like this:

make.input.format(format = function(con) {...}, mode = "text", streaming.format = "org.apache.hadoop.mapred.lib.NLineInputFormat")

The format argument is a function that takes an open connection and returns a key-value pair, so it could be something like:

function(con){line = readLines(con, n = 1); keyval(NULL, strsplit(line, split = some.separator)[[1]])}
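A base-R sketch of that format function that can be tried without a cluster. Assumptions: the helper name parse.nline.record is hypothetical, the separator is a tab, and a plain list(key, val) stands in for rmr2's keyval so the block runs without rmr2 loaded. Note that strsplit's argument is named split, and readLines(con, n = 1) reads a single line. Returning NULL at end of input is an assumption about how an rmr2 input format signals that there are no more records.

```r
# Hypothetical helper, base R only: read one line from an open connection
# and split it into fields, mirroring the format function above.
# In a real rmr2 job the return value would be keyval(NULL, fields).
parse.nline.record <- function(con, sep = "\t") {
  line <- readLines(con, n = 1)
  # readLines returns character(0) at end of input; signal that with NULL
  # (assumed end-of-input convention)
  if (length(line) == 0) return(NULL)
  # fixed = TRUE treats the separator literally rather than as a regex
  fields <- strsplit(line, split = sep, fixed = TRUE)[[1]]
  list(key = NULL, val = fields)
}

# Local check on an in-memory connection instead of HDFS input
con <- textConnection(c("a\tb\tc", "d\te"))
first <- parse.nline.record(con)
second <- parse.nline.record(con)
third <- parse.nline.record(con)
close(con)
```

To use it in a job, replace the list with keyval(NULL, fields) and pass the function as the format argument of make.input.format.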

I don't know what the exact record format is, so details may vary. The only problem is setting how many lines each mapper gets. Because of Hadoop streaming limitations, the only way to pass arguments to the input format is through the jobconf. In Hadoop 1.x, the old-API class org.apache.hadoop.mapred.lib.NLineInputFormat reads that number from the jobconf property mapred.line.input.format.linespermap, which defaults to one, so setting it (for example with -D mapred.line.input.format.linespermap=100) should change it. Otherwise it will work at the default of one line per mapper. I hope that works for you.
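To make "a fixed number of lines per mapper" concrete, here is a local base-R illustration (not Hadoop's implementation) of how NLineInputFormat groups its input: consecutive lines are cut into splits of N lines, each split goes to one map task, and the last split may be shorter. The helper name chunk.lines is hypothetical, and the sketch treats the input as a single file.

```r
# Local illustration only (not Hadoop code): group input lines into
# consecutive chunks of n lines, the way NLineInputFormat assigns
# n lines to each input split, and hence to each map task.
chunk.lines <- function(lines, n = 1) {
  stopifnot(n >= 1)
  # split index for each line: 1 for lines 1..n, 2 for n+1..2n, ...
  idx <- ceiling(seq_along(lines) / n)
  unname(split(lines, idx))
}

input <- paste("line", 1:7)
splits <- chunk.lines(input, n = 3)
```

With 7 lines and n = 3 this yields splits of 3, 3 and 1 lines, so every mapper gets the same amount of input except possibly the last one.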


Antonio





--
post: rha...@googlegroups.com ||
unsubscribe: rhadoop+u...@googlegroups.com ||
web: https://groups.google.com/d/forum/rhadoop?hl=en-US
---
You received this message because you are subscribed to the Google Groups "RHadoop" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rhadoop+u...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Salman Toor

Apr 10, 2014, 6:43:10 PM
to rha...@googlegroups.com
Oh wow, you are super fast! 

Thank you very much for your explanation. I think this will also answer many of the older questions in your group about giving a fixed amount of input to each mapper. 

I will give it a try and report back. 

Thanks once again. 

/Salman. 






--
Salman Toor, PhD
Uppsala University, Sweden.

Antonio Piccolboni

Apr 10, 2014, 10:31:36 PM
to RHadoop Google Group
You helped yourself by finding that input format class! Please let the group know how it pans out.

Antonio