MapReduce boundedbroadcast range

Kyungyong Lee

unread,

Dec 12, 2009, 2:00:19 PM12/12/09

to aci...@googlegroups.com

Hi all,

I notice a point to think about for MapReduce and boundedbroadcast operation. If we use bi-directional boundedbroadcast, we can specify both start address and end address as a generate tree argument. Let’s assume that node “A” got a MapReduce task whose boundedbroadcast range is from “C” to “E”, which does not contain node “A”. In this case, is it a correct operation for node “A” to process the Map task though it is not in the boundedbroadcast region? Using current MapReduceComputation.Start() method, the MapReduce task initiation node always performs Map task without checking the node is in the boundedbroadcast region or not. How do you think about it?

Best Regards,

Kyungyong Lee

twcho...@gmail.com

unread,

Dec 14, 2009, 3:40:22 PM12/14/09

to acis.p2p

I'm not sure if I fully understand your question or not.
Correct me if I'm wrong.

As you can see in the code, 'bi-directional' bounded broadcast class
supports
'out of range' operation.
In the 'out of range' mode, node 'A' tries to find a node whose
address is nearest to a node in the middle of range
(C,E) by greedy routing.
Since current 'bi-directional' bounded broadcast class can deal with
both 'in-range' and 'out-range' operation,
I think, it is not necessary to check for the node to be in the range.
Now, what do you think?

Kyungyong Lee

unread,

Dec 14, 2009, 3:58:31 PM12/14/09

to aci...@googlegroups.com

Thank you for the reply Taewoong-hyung("hyung" is a Korean word when people
call each other :)

What I wanted to address is : When a node, which is located outside
boundedbroadcast region, receives MapReduce task, should the node perform
"Map" task or not? For example, address space is from "A" to "Z", and 26
nodes are in the network. I want to count a number of nodes in the range
[C,E]. The mapreduce task is sent to node "A". Because node "A" is not
located in the [C,E], it will forward the MapReduce task to C,D, or E. If
node "A" performs Map task while forwarding the task, returned result will
be "4" (node count for A, C, D, and E). If node "A" does not perform Map
task, because it is not in the boundedbroadcast range, returned result will
be "3" (node count for C, D, and E). How do you think about it? Which one
would be appropriate action?

Best Regards,
Kyungyong Lee

--

You received this message because you are subscribed to the Google Groups
"acis.p2p" group.
To post to this group, send email to aci...@googlegroups.com.
To unsubscribe from this group, send email to
acisp2p+u...@googlegroups.com.
For more options, visit this group at
http://groups.google.com/group/acisp2p?hl=en.

twcho...@gmail.com

unread,

Dec 14, 2009, 4:16:41 PM12/14/09

to acis.p2p

I might say that depends on your purpose.
In the case you mentioned, the return value should be 3 instead of 4.
Yes, we don't have any method to exclude 'A' from counting.

I will try to resolve this issue.
It would be good if a user can select either way.
Let me know if you have any idea.

P. Oscar Boykin

unread,

Dec 15, 2009, 12:23:31 PM12/15/09

to aci...@googlegroups.com

The issue here, as I understand it, is the following:

1) Map(Tree)Reduce creates a tree starting from the first node on
which the call is made. As it is a tree, every child that receives a
message is a member of the tree.
2) All members of the tree have the Map function called on them.
3) In some cases, we actually want a subset of the tree (say those in
some address range only) to execute their map function.

How to handle this?

Three solutions:

1) Make the first MapReduce call via RPC using a Greedy sender on the
node exactly in the middle of the range you want to do broadcast on.
Or, if you prefer to load balance, on a randomly selected address in
the range. Since the normal routing will skip all the MapReduce work
until you reach the node closest to the target, no nodes outside of
the range will be reached.

2) The Map function can take arguments (map_args) generated by the
GenerateTree function. We could pass an argument to the map function
such as a boolean value "run". If run is false, just return false.
If run is true, run the map function.

The above keep MapReduce simple and don't change the existing
structure. I think these approaches are best. If there is sufficient
need:

3) We can extend the complexity of the MapReduce framework to deal
with selection subsets of the Tree. I don't immediately see an
elegant way to do this. If anyone has any ideas, I'm interested in
hearing them. I am generally biased against adding complexity.
Simple rules are best.

POB.

--
P. Oscar Boykin http://boykin.acis.ufl.edu
Assistant Professor, Department of Electrical and Computer Engineering
University of Florida

Kyungyong Lee

unread,

Dec 15, 2009, 6:48:03 PM12/15/09

to aci...@googlegroups.com

Thank you for the comment, Dr. Boykin. Comments inlined.

Best,
Kyungyong Lee

-----Original Message-----
1) Make the first MapReduce call via RPC using a Greedy sender on the
node exactly in the middle of the range you want to do broadcast on.
Or, if you prefer to load balance, on a randomly selected address in
the range. Since the normal routing will skip all the MapReduce work
until you reach the node closest to the target, no nodes outside of
the range will be reached.

[Kyungyong Lee] If no node exists in the range where I want to send a
broadcast message, the MapReduce will be sent to outside the range. In this
case, MapReduceRangeCounter will return "count = 1", which is not still
correct.

2) The Map function can take arguments (map_args) generated by the
GenerateTree function. We could pass an argument to the map function
such as a boolean value "run". If run is false, just return false.
If run is true, run the map function.

[Kyungyong Lee] It seems to be a feasible and simple method. I will try it,
and see what would happen.

Reply all

Reply to author

Forward