On Fri, Sep 18, 2009 at 2:13 PM, Eric Christeson <
Eric.Ch...@ndsu.edu> wrote:
> This sounds like an interesting project topic. I could research and
> evaluate possible options, depending on how much you've done, and possibly
> move into prototype/implementation if time allows.
> I'm interested enough in the project that I'd like to continue contributing
> beyond the end of class.
Well, I've really just been thinking about it. Here is one idea I had:
Let's say we have two computers, each with four "buckets" of data. We label the buckets as follows:
Server A:
A1
A2
A3
A4
Server B:
B1
B2
B3
B4
Clearly in a realistic scenario each computer may have millions of buckets, but for purposes of explanation I'll keep it simple.
Each bucket contains several pieces of data, which may have "transitions" to data in other buckets. A "transition" occurs when a thread reads one piece of data and then reads another. If a transition crosses between machines, it marks a place where the continuation must be serialized and transferred across the network. We know how many transitions exist between every pair of buckets across both servers.
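To make "transition" concrete, here's a minimal sketch of how the counts could be gathered, assuming we have per-thread traces of bucket accesses (the trace format and function name are my own illustration, not part of any existing system):

```python
from collections import Counter

def count_transitions(traces):
    """Count bi-directional transitions between buckets.

    `traces` is a list of per-thread access sequences such as
    ["A1", "B3", "A1"].  Each consecutive pair of distinct
    buckets counts as one transition; storing each pair in
    sorted order makes the counts symmetric, matching the
    mirrored matrix further down.
    """
    counts = Counter()
    for trace in traces:
        for a, b in zip(trace, trace[1:]):
            if a != b:
                counts[tuple(sorted((a, b)))] += 1
    return counts

# One thread bounces between servers, one stays on server A.
print(count_transitions([["A1", "B3", "A1"], ["A2", "A3"]]))
# Counter({('A1', 'B3'): 2, ('A2', 'A3'): 1})
```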
Our goal is to rearrange these buckets to minimize the number of transitions between the two machines. So how do we do this?
My proposal is to break it down into a hierarchical problem. First, we group our buckets:
G1: A1, A2
G2: A3, A4
G3: B1, B2
G4: B3, B4
Now, we ask whether, by swapping A2 and A4, we can reduce the number of transitions between G1 and G2.
For example, let's say the following transitions exist (note that I've made all transitions bi-directional, so this matrix is symmetric about the diagonal):
          G1      G2
        A1  A2  A3  A4
G1  A1  --   4   2   4
    A2   4  --   3   3
G2  A3   2   3  --   2
    A4   4   3   2  --
Currently the number of transitions between G1 and G2 is:
A1 -> A3 = 2 +
A1 -> A4 = 4 +
A2 -> A3 = 3 +
A2 -> A4 = 3
= 12
If we swapped A2 and A4 (making G1 = {A1, A4} and G2 = {A3, A2}), it would be:
A1 -> A2 = 4 +
A1 -> A3 = 2 +
A4 -> A3 = 2 +
A4 -> A2 = 3
= 11
So in this case we do swap and the result is that we reduce the number of transitions between G1 and G2 by 1.
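The arithmetic above is simple enough to sketch in code. Here's a minimal version of the swap test, with the transition counts taken straight from the example matrix (the function names are mine, just for illustration):

```python
# Transition counts from the example matrix: symmetric, so one
# entry per unordered pair of buckets.
T = {
    ("A1", "A2"): 4, ("A1", "A3"): 2, ("A1", "A4"): 4,
    ("A2", "A3"): 3, ("A2", "A4"): 3, ("A3", "A4"): 2,
}

def t(a, b):
    return T[tuple(sorted((a, b)))]

def cut(g1, g2):
    """Number of transitions crossing between two groups."""
    return sum(t(a, b) for a in g1 for b in g2)

def maybe_swap(g1, g2, x, y):
    """Swap bucket x (in g1) with bucket y (in g2) if doing so
    lowers the cut; return the possibly updated groups."""
    new_g1 = [y if v == x else v for v in g1]
    new_g2 = [x if v == y else v for v in g2]
    if cut(new_g1, new_g2) < cut(g1, g2):
        return new_g1, new_g2
    return g1, g2

g1, g2 = ["A1", "A2"], ["A3", "A4"]
print(cut(g1, g2))                    # 12, as computed above
g1, g2 = maybe_swap(g1, g2, "A2", "A4")
print(g1, g2, cut(g1, g2))            # ['A1', 'A4'] ['A3', 'A2'] 11
```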
We then do the same for G3 and G4 by seeing whether we can reduce the number of transitions between G3 and G4 by swapping B2 and B4. If so, we do it.
Now we do this again, but one level up: we define two new groups, G5 and G6, which actually correspond to the physical computers:
G5: G1, G2
G6: G3, G4
As before, we ask whether we can reduce the number of transitions between G5 and G6 by swapping G2 and G4. If so, we do it.
And if we had more than just 8 buckets, we'd repeat the operation - each time swapping on a larger and larger scale, until eventually we are swapping at the scale of servers, or perhaps even datacenters.
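Here's a rough sketch of what one bottom-up pass might look like, assuming the buckets are held in a flat list and each level exchanges the second halves of adjacent groups (this is just my reading of the scheme above, reusing the example counts for the four A buckets):

```python
T = {  # transition counts from the example matrix
    ("A1", "A2"): 4, ("A1", "A3"): 2, ("A1", "A4"): 4,
    ("A2", "A3"): 3, ("A2", "A4"): 3, ("A3", "A4"): 2,
}

def t(a, b):
    return T[tuple(sorted((a, b)))]

def cross_cut(g1, g2):
    """Transitions crossing between two groups of buckets."""
    return sum(t(a, b) for a in g1 for b in g2)

def hierarchical_pass(buckets):
    """One bottom-up pass: at group sizes 2, 4, 8, ... pair up
    adjacent groups and exchange their second halves whenever
    that reduces the number of crossing transitions."""
    b = list(buckets)
    n, s = len(b), 1
    while 4 * s <= n:
        for i in range(0, n - 4 * s + 1, 4 * s):
            left = b[i : i + 2 * s]
            right = b[i + 2 * s : i + 4 * s]
            # Candidate: swap the second half of each group.
            new_left = left[:s] + right[s:]
            new_right = right[:s] + left[s:]
            if cross_cut(new_left, new_right) < cross_cut(left, right):
                b[i : i + 4 * s] = new_left + new_right
        s *= 2
    return b

# On the four A buckets this performs exactly the A2/A4 swap:
print(hierarchical_pass(["A1", "A2", "A3", "A4"]))
# ['A1', 'A4', 'A3', 'A2']
```

With eight or more buckets the same loop keeps doubling `s`, so the last iteration swaps at the scale of whole servers, as described above.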
And that is my proposed algorithm. I haven't tried to simulate it yet, even though that should be relatively easy, so I don't know whether it will work or how well, but at least it provides a baseline.
Properties we need in such an algorithm include:
- It needs to be able to perform its primary function: to rearrange data to reduce the number of transitions between servers
- It needs to be efficient, both in time and inter-computer communication
- It must not "lock" large amounts of data while it's operating; it needs to work in a concurrent environment
I hope that makes sense,
Ian.