map reduce in R for accumulo

Madhvi Gupta

Sep 14, 2015, 11:12:46 AM
to RHadoop

I am trying to run MapReduce from R over Accumulo data stored in HDFS.
You have already done this for HBase, but I cannot figure out where to
bind the Java code written for the Accumulo input format. Can you please
help me with this? I would also like to know what exactly needs to be done
with the Accumulo input format to make it compatible with MapReduce. Can I
use the AccumuloInputFormat class that is provided for running MapReduce on
Accumulo in Java?

Antonio Piccolboni

Sep 15, 2015, 11:38:03 AM
to rha...@googlegroups.com
That would be my first step. Run a streaming job with the option -inputformat whatever.whatever.AccumuloInputFormat and a trivial map, e.g. -mapper cat, and see what is in the output. If that's something you can parse from R, then you have to write an rmr input format, unless one of the built-ins, typically csv with some options, does the job. If the mapping of the Accumulo data model to R data using this approach isn't good enough, then you have to write your own Java class, but that's only plan B. Another thing you can do in a second phase, for efficiency, is to use the streaming binary representation; it is explained here. Then in R you would use the typedbytes input format. To recap:


data -> java input format -> text or typedbytes -> rmr input format -> map function
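
A rough sketch of that first test job, not a verified recipe: the script below only assembles and prints a Hadoop streaming command with -inputformat and -mapper cat, and submits it only when RUN=1 is set. The jar paths, the input and output paths, and the fully qualified class name org.apache.accumulo.core.client.mapred.AccumuloInputFormat are assumptions to adapt to your installation. The input format will most likely also need Accumulo connection settings (instance, table, credentials), which are not shown here.

```shell
#!/bin/sh
# Dry-run sketch of a Hadoop streaming test job with a custom input format.
# Every path and the class name below are placeholders (assumptions).
set -eu

STREAMING_JAR="${STREAMING_JAR:-/path/to/hadoop-streaming.jar}"
ACCUMULO_JAR="${ACCUMULO_JAR:-/path/to/accumulo-core.jar}"
INPUT_FORMAT="${INPUT_FORMAT:-org.apache.accumulo.core.client.mapred.AccumuloInputFormat}"
INPUT_PATH="${INPUT_PATH:-/tmp/placeholder-input}"
OUTPUT_DIR="${OUTPUT_DIR:-/tmp/accumulo-streaming-test}"

# Generic options (-libjars) go before the streaming options.
# "-mapper cat" is the trivial map: it copies each record to the output.
set -- hadoop jar "$STREAMING_JAR" \
    -libjars "$ACCUMULO_JAR" \
    -inputformat "$INPUT_FORMAT" \
    -input "$INPUT_PATH" \
    -output "$OUTPUT_DIR" \
    -mapper cat

CMD_LINE="$*"

if [ "${RUN:-0}" = "1" ]; then
    "$@"                # actually submit the job
else
    echo "$CMD_LINE"    # default: just show what would be run
fi
```

If the printed command looks right, rerun with RUN=1 and then look at the files in the output directory (for example with hadoop fs -cat) to see what R would have to parse.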

Makes sense?

Antonio
 


Madhvi Gupta

Oct 1, 2015, 1:37:51 AM
to RHadoop, ant...@piccolboni.info
Thank you for your help. But please let me know where and how to run the streaming job you are talking about. What should be done in it?