Difference between Scalding, Hadoop and Spark


stephe...@gmail.com

Feb 23, 2014, 10:46:18 PM
to cascadi...@googlegroups.com
Dear all,
   I'm wondering what the difference is between Scalding, Hadoop, and Spark.
   For Scalding versus Hadoop: when writing a MapReduce job, Scalding code is almost pure domain logic with very little boilerplate compared to raw Hadoop.
   It needs fewer infrastructure types and less configuration, keeps the focus on the algorithm, and maximizes expressiveness and extensibility.
   That is its advantage over Hadoop.
   This leads to my questions:

   Q1: Apart from being a wrapper around Hadoop, is Scalding any more efficient than Hadoop?

   Q2: What is the difference between Scalding and Spark? Which is better suited to machine learning?

Thanks,
   stephen

Oscar Boykin

Feb 24, 2014, 2:25:12 AM
to cascadi...@googlegroups.com
On Sun, Feb 23, 2014 at 7:46 PM, <stephe...@gmail.com> wrote:
   Q1: Apart from being a wrapper around Hadoop, is Scalding any more efficient than Hadoop?

Not really. The caveat is that writing correct and fast Hadoop jobs by hand is hard, whereas writing correct jobs with Scalding is much easier, and Scalding applies some optimizations for you: for instance, it automatically performs commutative operations map-side with no extra work from the programmer, and Scalding 0.9.0 adds composable joins, which would be VERY hard to write by hand and run in one map-reduce phase. Basically, ask: what is the advantage of using C when we have assembly?
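To make the map-side point concrete, here is a minimal sketch in plain Python (not Scalding or Hadoop code) of what pre-aggregating on the mapper buys you. The function names and the two input splits are hypothetical, chosen only for illustration. Summation is both commutative and associative, which is what makes it safe to fold values before the shuffle.

```python
from collections import Counter, defaultdict

def map_words(lines):
    """Map step: emit one (word, 1) pair per word."""
    return [(word, 1) for line in lines for word in line.split()]

def combine(pairs):
    """Map-side combine: pre-sum the pairs produced by a single mapper."""
    partial = Counter()
    for key, value in pairs:
        partial[key] += value
    return list(partial.items())

def reduce_counts(shuffled_pairs):
    """Reduce step: sum all values that share a key."""
    totals = defaultdict(int)
    for key, value in shuffled_pairs:
        totals[key] += value
    return dict(totals)

def run_job(splits, use_combiner):
    """Run the toy job; return (result, number of records shuffled)."""
    shuffled = []
    for split in splits:
        pairs = map_words(split)
        if use_combiner:
            pairs = combine(pairs)
        shuffled.extend(pairs)
    return reduce_counts(shuffled), len(shuffled)

# Two hypothetical input splits, one per mapper.
splits = [["a b a", "a c"], ["b b a"]]
plain_result, plain_shuffled = run_job(splits, use_combiner=False)
combined_result, combined_shuffled = run_job(splits, use_combiner=True)

print(plain_result == combined_result)    # same answer either way
print(plain_shuffled, combined_shuffled)  # fewer records cross the "network"
```

Both runs produce the same counts, but the combined run sends fewer records from mappers to reducers. In hand-written Hadoop you would have to supply the combiner yourself; the claim above is that Scalding does the equivalent for you.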
 
   Q2: What is the difference between Scalding and Spark? Which is better suited to machine learning?

The main difference, I think, is that Spark prefers to keep items in memory and does not force a total sort of the data when going from mappers to reducers. If your data fits in memory, Spark will be faster. If your data does not fit in memory, you may struggle a bit to productionize it.
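As a rough illustration of the sorting point, the sketch below contrasts two ways of summing values per key in plain Python. It is not Spark's or Hadoop's actual shuffle code, and the function names and sample data are hypothetical. The first version sorts by key before grouping, as a MapReduce-style reducer input does; the second keeps one in-memory accumulator per key and never sorts.

```python
from itertools import groupby
from operator import itemgetter

def sum_by_key_sorted(pairs):
    """Sort-based: order by key, then fold each run of equal keys."""
    ordered = sorted(pairs, key=itemgetter(0))
    return {
        key: sum(value for _, value in group)
        for key, group in groupby(ordered, key=itemgetter(0))
    }

def sum_by_key_hashed(pairs):
    """Hash-based: single pass, one in-memory accumulator per key, no sort."""
    totals = {}
    for key, value in pairs:
        totals[key] = totals.get(key, 0) + value
    return totals

# Hypothetical records arriving at a reducer in arbitrary order.
pairs = [("b", 2), ("a", 1), ("c", 5), ("a", 3), ("b", 4)]

sorted_totals = sum_by_key_sorted(pairs)
hashed_totals = sum_by_key_hashed(pairs)
print(sorted_totals == hashed_totals)
```

The two approaches agree on the result. The hash-based version skips the sort but must hold an accumulator for every distinct key in memory, whereas a sort-based pipeline can spill sorted runs to disk and merge them, which matches the trade-off described above for data that does not fit in memory.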
 

--
Oscar Boykin :: @posco :: http://twitter.com/posco

stephe...@gmail.com

Feb 24, 2014, 4:56:58 AM
to cascadi...@googlegroups.com
Thanks, Oscar
   I get your point.

On Monday, February 24, 2014 at 3:25:12 PM UTC+8, Oscar Boykin wrote: