Mapper scope variables in RHadoop


thwu

Jul 23, 2013, 2:04:45 PM
to rha...@googlegroups.com
Hi All,

I am currently writing an RHadoop function that returns the top k entries of a data set, ranked by value. I plan to do this by finding the top k entries within each individual mapper, then finding the overall top k among those in the reducer.
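The two-phase plan above can be sketched as follows. The thread contains no code, so this is a language-neutral illustration written in Python, not RHadoop/rmr code. The names top_k, map_phase, reduce_phase and run are hypothetical, and each "chunk" stands in for the records that one map call sees. The approach works because the global top k is always contained in the union of the per-chunk top-k lists: an entry that is beaten by k others in its own chunk cannot be in the overall top k.

```python
import heapq
from typing import Iterable, List, Tuple

Entry = Tuple[str, float]  # (key, value) pair; entries are ranked by value


def top_k(entries: Iterable[Entry], k: int) -> List[Entry]:
    """Return the k entries with the largest values, largest first."""
    return heapq.nlargest(k, entries, key=lambda kv: kv[1])


def map_phase(chunk: List[Entry], k: int) -> List[Entry]:
    # Each map call only needs to emit the local top k of the records it sees.
    return top_k(chunk, k)


def reduce_phase(candidates: Iterable[Entry], k: int) -> List[Entry]:
    # The global top k is contained in the union of the local top-k lists.
    return top_k(candidates, k)


def run(chunks: List[List[Entry]], k: int) -> List[Entry]:
    # Simulate the job: map every chunk, then reduce all emitted candidates.
    candidates = [e for chunk in chunks for e in map_phase(chunk, k)]
    return reduce_phase(candidates, k)


if __name__ == "__main__":
    chunks = [
        [("a", 5.0), ("b", 1.0), ("c", 9.0)],
        [("d", 7.0), ("e", 3.0)],
        [("f", 8.0), ("g", 2.0), ("h", 6.0)],
    ]
    print(run(chunks, 3))  # [('c', 9.0), ('f', 8.0), ('d', 7.0)]
```

Each mapper sends at most k records per map call to the reducer, so the reducer's input stays small regardless of how large the data set is.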

In the Java API, you can give the Mapper class a member variable, such as a TreeMap, that keeps track of entries across all the records an individual mapper processes. However, I'm not sure how this would work with RHadoop, or whether it works with streaming in general. Insights?

Thanks for your help

Antonio Piccolboni

Jul 23, 2013, 5:09:33 PM
to RHadoop Google Group

In rmr, each map call processes keyval.length items, so you can just take the top k of those and return them. Alternatively, you can use a stateful map function; read about closures in R.

Antonio
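The stateful-map-function suggestion can be sketched with a closure. Again this is Python rather than rmr: make_stateful_mapper and reducer are hypothetical names, and the nested mapper plays the role of an R function that captures a variable from its enclosing environment (in R you would update such a variable with <<-). The captured min-heap keeps the running top k across calls, and each call emits only the entries that enter it. Because the heap's smallest value never decreases, every entry of the mapper's final top k is emitted when it is first seen, so this sketch does not assume any end-of-task hook.

```python
import heapq
from typing import Callable, List, Tuple

Entry = Tuple[str, float]  # (key, value); entries are ranked by value


def make_stateful_mapper(k: int) -> Callable[[List[Entry]], List[Entry]]:
    """Return a map function whose running top-k state persists across calls."""
    # Captured state: a min-heap of (value, key), so heap[0] is the smallest kept entry.
    heap: List[Tuple[float, str]] = []

    def mapper(chunk: List[Entry]) -> List[Entry]:
        emitted: List[Entry] = []
        for key, value in chunk:
            if len(heap) < k:
                # Fewer than k entries kept so far: keep and emit unconditionally.
                heapq.heappush(heap, (value, key))
                emitted.append((key, value))
            elif value > heap[0][0]:
                # Beats the current k-th largest: evict it, then keep and emit this one.
                heapq.heapreplace(heap, (value, key))
                emitted.append((key, value))
            # Otherwise k kept entries are at least as large, so this entry
            # can never reach the mapper's final top k and is dropped.
        return emitted

    return mapper


def reducer(candidates: List[Entry], k: int) -> List[Entry]:
    """Pick the overall top k, largest first, from everything the mappers emitted."""
    return heapq.nlargest(k, candidates, key=lambda kv: kv[1])


if __name__ == "__main__":
    mapper = make_stateful_mapper(2)
    emitted: List[Entry] = []
    for chunk in ([("a", 5.0), ("b", 1.0)], [("c", 9.0), ("d", 3.0)], [("e", 4.0), ("f", 6.0)]):
        emitted.extend(mapper(chunk))
    print(emitted)              # [('a', 5.0), ('b', 1.0), ('c', 9.0), ('f', 6.0)]
    print(reducer(emitted, 2))  # [('c', 9.0), ('f', 6.0)]
```

Emitting on entry means a mapper may output somewhat more than k records, but the reducer's top k over those candidates is still correct.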
