This post is for folks who want to join the effort of creating a
sound clustering design.
It is hard to build such a complicated system on the first attempt,
so I suggest that we first solve a different, well-defined problem
that will give us a vital understanding of Dremel clustering.
We will take the famous distributed word count problem, the classic
example in most explanations of Map-Reduce. (You can read about it
here:
http://hadoop.apache.org/common/docs/r0.20.2/mapred_tutorial.html)
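For reference, the classic formulation can be sketched in a single
process as below. This is a minimal Python illustration of the map,
shuffle and reduce phases, not Hadoop's API; the function names are
hypothetical, and the sample input is the one used in the Hadoop
tutorial linked above.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(document: str) -> List[Tuple[str, int]]:
    # Emit (word, 1) for every whitespace-separated token.
    return [(word, 1) for word in document.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Group emitted values by key, as the framework does between phases.
    groups: Dict[str, List[int]] = defaultdict(list)
    for word, count in pairs:
        groups[word].append(count)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    # Sum the grouped values for each word.
    return {word: sum(counts) for word, counts in groups.items()}


def word_count(documents: List[str]) -> Dict[str, int]:
    # Run map over every document, then shuffle and reduce.
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return reduce_phase(shuffle(pairs))


if __name__ == "__main__":
    docs = ["Hello World Bye World", "Hello Hadoop Goodbye Hadoop"]
    print(sorted(word_count(docs).items()))
```

In a real Map-Reduce job each phase runs on many machines and the
shuffle moves data over the network, which is exactly where latency
comes from.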
So the task is: how would you implement a distributed word count on
a cluster of computers? The main requirement (aside from the counting
itself) is low latency: the answer must come back in no more than
1 second.
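One possible starting point, offered only as a sketch to react to:
scatter the input shards to workers in parallel, count locally, and
merge the partial counts within a fixed time budget. Here threads
stand in for cluster nodes (an assumption; a real cluster would need
RPC and data placement), and `scatter_gather_count`, `count_shard`
and `LATENCY_BUDGET_S` are hypothetical names.

```python
import time
from collections import Counter
from concurrent.futures import ThreadPoolExecutor, wait
from typing import List, Tuple

LATENCY_BUDGET_S = 1.0  # hypothetical constant for the 1-second requirement


def count_shard(shard: List[str]) -> Counter:
    # Runs on one "node" (a thread here): count words in its local shard.
    counts: Counter = Counter()
    for doc in shard:
        counts.update(doc.split())
    return counts


def scatter_gather_count(shards: List[List[str]],
                         budget_s: float = LATENCY_BUDGET_S) -> Tuple[Counter, bool]:
    # Fan shards out in parallel, then merge whatever arrives before the deadline.
    deadline = time.monotonic() + budget_s
    pool = ThreadPoolExecutor(max_workers=max(1, len(shards)))
    try:
        futures = [pool.submit(count_shard, shard) for shard in shards]
        remaining = max(0.0, deadline - time.monotonic())
        done, not_done = wait(futures, timeout=remaining)
        total: Counter = Counter()
        for fut in done:
            total.update(fut.result())  # add partial counts together
        # The result is complete only if every shard answered in time.
        return total, len(not_done) == 0
    finally:
        # Do not block past the budget waiting for stragglers.
        pool.shutdown(wait=False)


if __name__ == "__main__":
    shards = [["Hello World Bye World"], ["Hello Hadoop Goodbye Hadoop"]]
    counts, complete = scatter_gather_count(shards)
    print(dict(counts), complete)
```

Returning partial counts with a completeness flag, instead of
blocking, is one way to keep the 1-second bound when a node is slow;
whether partial answers are acceptable is an open design question.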
Please reply to this mail with a brief explanation of how you would
implement such a task. Feel absolutely free to use any tools or
libraries.
Regards,
David