Disco benchmark suite?


Samuel Harrold

Mar 14, 2014, 2:54:48 PM
to disc...@googlegroups.com
Hello,

I'm evaluating Disco for use by researchers here at UT Austin, and I'd like to make sure I'm testing appropriately. I'm conducting tests similar to those in the post "RE: Performance comparison - Disco vs Hadoop" from 1/2012. My project is to compare Disco's performance with that of Hadoop as measured by the HiBench Hadoop benchmark suite.

Within the standard HiBench Hadoop benchmarks, the most important ones for these research applications are WordCount, TeraSort, and K-means Clustering. Could you recommend any analogous benchmark programs for Disco? If there are none, could you recommend any guidelines for adapting the example scripts wordcount.py/count_words.py and kclustering.py?

Thanks for any advice you can offer, and thanks for making such a useful tool!
Sam
--------------------
Samuel Harrold
Intern, PhD student
Texas Advanced Computing Center
University of Texas at Austin

Shayan Pooya

Mar 19, 2014, 4:19:12 PM
to samuel....@gmail.com, disc...@googlegroups.com
Hello Sam,

* There is currently no counterpart to HiBench for Disco.
* Using the examples as benchmarks should be straightforward: just run an example on your dataset of choice.
* I just added some comments to the kclustering example that should clear things up a little bit:
https://github.com/discoproject/disco/blob/develop/examples/datamining/kclustering.py

* We will be adding more examples soon. The Disco integration tests are also worth consulting for some of the trickier things that can be done with Disco.

* The current kclustering example uses map-reduce and is not very efficient. It will be ported to Disco pipelines to show a better way to implement this kind of algorithm.
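As a rough illustration of "just run an example on your dataset," here is a minimal single-process word-count sketch with a wall-clock timer, which could serve as a local baseline before timing the same input on a cluster. It does not use Disco's API; `wc_map`, `wc_reduce`, `run_wordcount`, and `time_run` are hypothetical names, and the map/reduce split only mirrors the general shape of a word-count job (map emits (word, 1), reduce groups by word and sums).

```python
import time
from itertools import groupby
from operator import itemgetter


def wc_map(line):
    # Hypothetical map phase: emit (word, 1) for each whitespace-separated token.
    for word in line.split():
        yield word, 1


def wc_reduce(pairs):
    # Hypothetical reduce phase: sort so equal words are adjacent, then sum counts per word.
    for word, group in groupby(sorted(pairs), key=itemgetter(0)):
        yield word, sum(count for _, count in group)


def run_wordcount(lines):
    # Chain map over every input line, then reduce; returns {word: count}.
    pairs = (kv for line in lines for kv in wc_map(line))
    return dict(wc_reduce(pairs))


def time_run(fn, *args, repeats=3):
    # Return the best wall-clock time over several runs, plus the last result.
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = fn(*args)
        best = min(best, time.perf_counter() - start)
    return best, result


if __name__ == "__main__":
    sample = ["the quick brown fox", "the lazy dog", "the end"]
    seconds, counts = time_run(run_wordcount, sample)
    print(counts["the"], f"{seconds:.6f}s")
```

Taking the best of several runs reduces noise from caching and warm-up; for a meaningful comparison, the input would need to be the same dataset used for the Hadoop runs.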

Regards.


--
You received this message because you are subscribed to the Google Groups "Disco-development" group.
To unsubscribe from this group and stop receiving emails from it, send an email to disco-dev+...@googlegroups.com.
To post to this group, send email to disc...@googlegroups.com.
Visit this group at http://groups.google.com/group/disco-dev.
For more options, visit https://groups.google.com/d/optout.

Samuel Harrold

Mar 19, 2014, 4:28:51 PM
to Shayan Pooya, disc...@googlegroups.com
Hi Shayan,

Thank you for the pointers. I'll certainly keep them in mind as I compare the test results. Thanks for adding the comments to kclustering. I look forward to more examples of good Disco integration.

Thank you

Parkway

Mar 28, 2014, 4:26:49 AM
to disc...@googlegroups.com, Shayan Pooya
Samuel: Will the Disco vs. Hadoop benchmarks be published when they are complete? I'm very interested in the performance difference between the Erlang/Python and Java implementations.

Vivian Delplace

Mar 28, 2014, 5:25:56 AM
to disc...@googlegroups.com
Samuel: For my Master's thesis, I did similar work comparing Disco to Mars, a MapReduce implementation for GPUs (link). My focus, however, was energy consumption. I hope it helps.



Samuel Harrold

Apr 25, 2014, 8:42:51 PM
to disc...@googlegroups.com
Hi Parkway,
Sorry for missing your message. The results of my tests will be incorporated into a guide for users at the Texas Advanced Computing Center. As the project stands, researchers on our end will probably choose Disco or other tools based on which one they can program in most quickly rather than on computational performance. My tests are meant to be quick examples of working with big data, not authoritative assessments. If/when the guide reaches a useful final state, I'll post a link back to this thread.

Samuel Harrold

Apr 25, 2014, 8:46:37 PM
to disc...@googlegroups.com
Hi Vivian,
Thanks for the link!