Spark comparison


Ankur Chauhan

Dec 3, 2013, 12:03:17 PM
to stratosp...@googlegroups.com
Hi all,


Sitting at Spark Summit 2013, I was wondering whether anyone has done a feature comparison and/or benchmarks against Spark, Storm, etc. This could also serve as a "compatibility matrix". It would help a lot when people want to compare the projects, and it would help us understand the strengths and weaknesses of each one.

-- Ankur

Ufuk Celebi

Dec 3, 2013, 12:19:57 PM
to stratosp...@googlegroups.com
Hey Ankur,

I like the idea of a comparison matrix. We already tried something similar with Hadoop (parts of it are on the front page of our website) and used it for a local summit here. Comparing Stratosphere to Spark in the same way would be a natural extension. ;-)

Internally, we ran some benchmarks against Spark 0.7.3 (unfortunately right before the 0.8 release). We didn't publish the results because certain aspects make the comparison unfair. For example, we currently have no fault tolerance, whereas Spark does. As soon as we (re-)introduce fault tolerance mechanisms, we will re-run the benchmarks.

I can publish the code for the Stratosphere and Spark programs we looked at on GitHub. If I add Scala versions of the Stratosphere programs, that would also move in the direction you proposed: a direct, side-by-side comparison.

Is there a specific use case you want to see numbers for, or are you interested more generally in how both systems perform?

Best,

Ufuk

--
You received this message because you are subscribed to the Google Groups "stratosphere-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to stratosphere-d...@googlegroups.com.
Visit this group at http://groups.google.com/group/stratosphere-dev.
For more options, visit https://groups.google.com/groups/opt_out.

Ankur Chauhan

Dec 7, 2013, 4:05:13 AM
to stratosp...@googlegroups.com
I am more interested in analytics and log-processing capabilities, but a general comparison would probably be a good starting point. I'll have a look at the Hadoop comparison on the site.

Nirvanesque

Aug 29, 2014, 7:57:35 AM
to stratosp...@googlegroups.com
Ufuk and the Flink team,

By now you and your team are familiar with this comparison (Ze Ni's Master's thesis at KTH):
http://www.diva-portal.org/smash/get/diva2:605106/FULLTEXT01.pdf

I would like to hear your views on it.

Thanks in advance,
Anirvan
Hadoop-Stratosphere-Spark comparison.pdf