The issue here, as I understand it, is the following:
1) Map(Tree)Reduce creates a tree starting from the first node on
which the call is made. As it is a tree, every child that receives a
message is a member of the tree.
2) All members of the tree have the Map function called on them.
3) In some cases, we actually want only a subset of the tree (say, the
nodes in some address range) to execute their map function.
How to handle this?
Three solutions:
1) Make the first MapReduce call via RPC using a Greedy sender on the
node exactly in the middle of the range you want to broadcast over.
Or, if you prefer to load balance, on a randomly selected address in
the range. Since normal routing skips all the MapReduce work
until the message reaches the node closest to the target, no nodes
outside of the range will be reached.
2) The Map function can take arguments (map_args) generated by the
GenerateTree function. We could pass an argument to the map function
such as a boolean value "run". If run is false, just return false.
If run is true, run the map function.
The two approaches above keep MapReduce simple and don't change the
existing structure. I think these approaches are best. If there is
sufficient need:
3) We can extend the MapReduce framework to deal with selecting
subsets of the tree, at the cost of added complexity. I don't
immediately see an
elegant way to do this. If anyone has any ideas, I'm interested in
hearing them. I am generally biased against adding complexity.
Simple rules are best.
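
To make options 1 and 2 concrete, here is a rough sketch in Python. It is
not the actual MapReduce code: the names pick_start_address, make_map_args
and guarded_map are made up for illustration, addresses are treated as
plain integers, and the range is assumed not to wrap around the ring.

```python
import random


def pick_start_address(lo, hi, load_balance=False, rng=random):
    """Option 1 (hypothetical helper): choose where to send the first call.

    Returns the midpoint of [lo, hi], or a random address in the range
    when load balancing is preferred.
    """
    if load_balance:
        return rng.randint(lo, hi)  # randint is inclusive on both ends
    return lo + (hi - lo) // 2


def make_map_args(node_address, lo, hi, payload):
    """Stand-in for the map_args a GenerateTree step might hand to Map.

    The "run" flag is true only for nodes inside the wanted range.
    """
    return {"run": lo <= node_address <= hi, "payload": payload}


def guarded_map(map_fn, map_args):
    """Option 2: call map_fn only when the "run" flag is set.

    If run is false, just return False, as described above.
    """
    if not map_args.get("run", False):
        return False
    return map_fn(map_args["payload"])


if __name__ == "__main__":
    lo, hi = 100, 200
    print("start at", pick_start_address(lo, hi))
    for addr in (50, 150, 250):
        args = make_map_args(addr, lo, hi, payload=addr)
        print(addr, guarded_map(lambda x: x * 2, args))
```

One caveat with the "run" flag: a map function that legitimately returns
false cannot be told apart from a node that skipped its map, so the reduce
step would need to tolerate that.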
POB.
--
P. Oscar Boykin
http://boykin.acis.ufl.edu
Assistant Professor, Department of Electrical and Computer Engineering
University of Florida