Hello,
I'm evaluating Disco for use by researchers here at UT Austin, and I'd like to make sure I'm testing appropriately. I'm conducting tests similar to those in the post "RE: Performance comparison - Disco vs Hadoop" from 1/2012. My project is to compare Disco's performance with that of Hadoop as measured by the HiBench Hadoop benchmark suite.
Within the standard HiBench Hadoop benchmarks, the most important ones for these research applications are WordCount, TeraSort, and K-means Clustering. Could you recommend any analogous benchmark programs for Disco? If there are none, could you recommend any guidelines for adapting the example scripts wordcount.py/count_words.py and kclustering.py?
Thanks for any advice you can offer, and thanks for making such a useful tool!
Sam
--------------------
Samuel Harrold
Intern, PhD student
Texas Advanced Computing Center
University of Texas at Austin