Distributed Machine Learning libraries in RHadoop

S Kumar

Jun 26, 2014, 2:30:57 PM
to rha...@googlegroups.com
Does RHadoop support any machine learning libraries that allow parallelization of algorithms (i.e., distributed libraries/packages for machine learning algorithms), or do we have to write them all on our own (detailed steps without the help of functions in packages)?
If that is the case, what is the real essence of RHadoop? 
Kindly help me with this.

Antonio Piccolboni

Jun 26, 2014, 2:44:47 PM
to rha...@googlegroups.com


On Thursday, June 26, 2014 11:30:57 AM UTC-7, S Kumar wrote:
Does RHadoop support any machine learning libraries that allow parallelization of algorithms (i.e., distributed libraries/packages for machine learning algorithms)

The answer probably hinges on what you mean by "support". If I am guessing right and you mean "can RHadoop work as an additional backend, like snow or parallel, for libraries that already let you choose among those?", the answer is no, and not for the foreseeable future: the programming models are different, and so are the goals (the size of the computations being targeted).
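To make "the programming models are different" concrete: with snow or parallel you roughly apply a function to each element of a list and get the results back as a list in the master session's memory. MapReduce instead has the mapper emit (key, value) pairs, the framework group all values by key, and a reducer run once per key. The sketch below is a single-process Python simulation of that key-value model; `local_mapreduce` and the sales data are hypothetical illustrations, not part of rmr2 or Hadoop.

```python
from collections import defaultdict


def local_mapreduce(records, mapper, reducer):
    """Simulate one MapReduce pass in a single process (hypothetical helper)."""
    groups = defaultdict(list)
    # Map phase: each input record may emit any number of (key, value) pairs.
    for record in records:
        for key, value in mapper(record):
            # Shuffle phase: collect every value emitted under the same key.
            groups[key].append(value)
    # Reduce phase: one reducer call per distinct key.
    results = {}
    for key in sorted(groups):
        for out_key, out_value in reducer(key, groups[key]):
            results[out_key] = out_value
    return results


def sales_mapper(record):
    # Hypothetical record layout: (region, amount).
    region, amount = record
    yield region, amount


def mean_reducer(region, amounts):
    # Sees all amounts for one region, no matter which mapper emitted them.
    yield region, sum(amounts) / len(amounts)


sales = [("east", 10.0), ("west", 4.0), ("east", 20.0), ("west", 6.0), ("north", 7.0)]
per_region = local_mapreduce(sales, sales_mapper, mean_reducer)
print(per_region)
```

An algorithm written as "apply f to each chunk" has to be restated in terms of keys and per-key aggregation to fit this model, which is why a library cannot simply switch its backend from parallel to Hadoop.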
 
or do we have to write them all on our own (detailed steps without the help of functions in packages)?

If calling the functions in the package is not of any help, then I am not sure in what other ways you could use a package.
 
If that is the case, what is the real essence of RHadoop? 

Somebody reported redoing an R Hadoop project in about 1/30 of the time using rmr2 (part of RHadoop). It's only an anecdote and probably a sweet spot, but the essence of rmr2 is in the other 29/30 of the time, spent on a tropical beach or equivalent while other people are still struggling with issues like serialization, temporary file management, type system mapping, etc. The point is to multiply your productivity as an algorithm implementor several-fold. rmr2 doesn't provide analysis algorithms; that is an important but separate task, and one that RHadoop aims to facilitate.
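As an illustration of the "algorithm implementor" part that is left to you, here is a minimal sketch of distributed least squares for a one-feature linear model: each mapper computes partial sums for the normal equations over its chunk, a reducer adds them up, and the driver solves a 2x2 system. This is plain Python, not rmr2's R API; the function names and data are hypothetical, and the chunks are processed in one process here, whereas a real job would spread them across nodes while the framework handles serialization and temporary files.

```python
def map_chunk(chunk):
    """Mapper: partial sums for X^T X and X^T y over one chunk, with X = [1, x]."""
    n = sx = sxx = sy = sxy = 0.0
    for x, y in chunk:
        n += 1
        sx += x
        sxx += x * x
        sy += y
        sxy += x * y
    return (n, sx, sxx, sy, sxy)


def reduce_sums(partials):
    """Reducer: element-wise sum of the partial statistics from every chunk."""
    partials = list(partials)
    return tuple(sum(p[i] for p in partials) for i in range(5))


def solve_normal_equations(stats):
    """Driver: solve [[n, sx], [sx, sxx]] @ [b0, b1] = [sy, sxy] by Cramer's rule."""
    n, sx, sxx, sy, sxy = stats
    det = n * sxx - sx * sx
    if det == 0:
        raise ValueError("singular design matrix")
    intercept = (sxx * sy - sx * sxy) / det
    slope = (n * sxy - sx * sy) / det
    return intercept, slope


# Hypothetical data on the line y = 1 + 2x, split into three "input splits".
chunks = [[(0.0, 1.0), (1.0, 3.0)], [(2.0, 5.0), (3.0, 7.0)], [(4.0, 9.0)]]
partials = [map_chunk(c) for c in chunks]  # would run in parallel on a cluster
intercept, slope = solve_normal_equations(reduce_sums(partials))
print(intercept, slope)
```

The design choice here is that only a handful of numbers per chunk cross the network, so the reduce step stays cheap no matter how many rows there are; deciding on that decomposition is exactly the algorithmic work a framework like rmr2 leaves to the implementor.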


Antonio