Re: [scala-tools] Swarm


Ian Clarke

Sep 19, 2009, 3:19:00 PM
to Eric Christeson, swarm-...@googlegroups.com
On Fri, Sep 18, 2009 at 2:13 PM, Eric Christeson <Eric.Ch...@ndsu.edu> wrote:
> This sounds like an interesting project topic.  I could research and
> evaluate possible options, depending on how much you've done, and possibly
> move into prototype/implementation if time allows.
> I'm interested enough in the project that I'd like to continue contributing
> beyond the end of class.

Well, I've really just been thinking about it.  Here is one idea I had:

Let's say that we have two computers, each with four "buckets" of data. We label these buckets as follows:

Server A:
A1
A2
A3
A4

Server B:
B1
B2
B3
B4

Clearly in a realistic scenario each computer may have millions of buckets, but for purposes of explanation I'll keep it simple.

Each bucket contains several pieces of data which may have "transitions" to data in other buckets.  A "transition" occurs when a thread reads one piece of data and then reads another.  If a transition crosses between machines, it marks a place where the continuation must be serialized and transferred across the network.  We know how many transitions exist between every pair of buckets across both servers.

Our goal is to rearrange these buckets to minimize the number of transitions between the two machines.  So how do we do this?

My proposal is to break it down into a hierarchical problem.  First, we group our buckets:

G1: A1, A2
G2: A3, A4
G3: B1, B2
G4: B3, B4


Now, we ask whether, by swapping A2 and A4, we can reduce the number of transitions between G1 and G2.

For example, let's say the following transitions exist (note that I've made all transitions bidirectional, so the matrix is symmetric about the diagonal):

        G1      G2
      A1  A2  A3  A4
G1 A1 --   4   2   4
   A2  4  --   3   3
G2 A3  2   3  --   2
   A4  4   3   2  --

Currently the number of transitions between G1 and G2 is:

A1 -> A3 = 2   +
A1 -> A4 = 4   +
A2 -> A3 = 3   +
A2 -> A4 = 3
= 12

If we swapped A2 and A4 it would be:

A1 -> A2 = 4 +
A1 -> A3 = 2 +
A4 -> A3 = 2 +
A4 -> A2 = 3
= 11

So in this case we do swap and the result is that we reduce the number of transitions between G1 and G2 by 1.
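For concreteness, here is a minimal Scala sketch of that swap test. The matrix, bucket indices and group layout are just the toy example above, nothing from the Swarm codebase:

object SwapTest {
  // Symmetric transition counts for the toy example; indices 0-3 are A1, A2, A3, A4.
  val transitions: Array[Array[Int]] = Array(
    Array(0, 4, 2, 4),
    Array(4, 0, 3, 3),
    Array(2, 3, 0, 2),
    Array(4, 3, 2, 0)
  )

  // Total number of transitions crossing between two groups of bucket indices.
  def crossTransitions(g1: Seq[Int], g2: Seq[Int]): Int =
    (for (a <- g1; b <- g2) yield transitions(a)(b)).sum

  def main(args: Array[String]): Unit = {
    val before = crossTransitions(Seq(0, 1), Seq(2, 3)) // G1 = {A1, A2}, G2 = {A3, A4} -> 12
    val after  = crossTransitions(Seq(0, 3), Seq(2, 1)) // after swapping A2 and A4     -> 11
    if (after < before) println(s"Swap: $before -> $after cross-group transitions")
    else println(s"Keep current layout ($before <= $after)")
  }
}

Running it reproduces the arithmetic above: the swap brings the cross-group count from 12 down to 11.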

We then do the same for G3 and G4 by seeing whether we can reduce the number of transitions between G3 and G4 by swapping B2 and B4.  If so, we do it.

Now we do this again, but this time we define two new groups, G5 and G6, which correspond to the physical computers:

G5: G1, G2
G6: G3, G4


As before, we ask whether we can reduce the number of transitions between G5 and G6 by swapping G2 and G4.  If so, we do it.

And if we had more than just 8 buckets, we'd repeat the operation - each time swapping on a larger and larger scale, until eventually we are swapping at the scale of servers, or perhaps even datacenters.

And that is my proposed algorithm.  I haven't tried to simulate it yet, even though that should be relatively easy, so I don't know whether it will work, or how well, but at least it provides a baseline.
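For the repeated, larger-scale passes, here is a rough Scala sketch of one refinement level. Representing a group member as a set of bucket indices (a single bucket at the lowest level, a whole group higher up) and searching all single swaps exhaustively are my own illustrative choices, not anything from Swarm:

object HierarchicalBalance {
  // One refinement level: given two sibling groups, try every swap of one member
  // from each side and keep the best swap if it strictly reduces the number of
  // transitions crossing between the two groups.
  def refine(transitions: Array[Array[Int]],
             left: Vector[Set[Int]],
             right: Vector[Set[Int]]): (Vector[Set[Int]], Vector[Set[Int]]) = {

    // Transitions crossing between the union of `ls` and the union of `rs`.
    def cross(ls: Vector[Set[Int]], rs: Vector[Set[Int]]): Int =
      (for (l <- ls; r <- rs; a <- l.toSeq; b <- r.toSeq) yield transitions(a)(b)).sum

    val candidates =
      for (i <- left.indices; j <- right.indices)
        yield (left.updated(i, right(j)), right.updated(j, left(i)))

    if (candidates.isEmpty) (left, right)
    else {
      val best = candidates.minBy { case (l, r) => cross(l, r) }
      if (cross(best._1, best._2) < cross(left, right)) best else (left, right)
    }
  }
}

Called with single-bucket members this reproduces the A2/A4 test above; called with each member being a whole group it answers the G2/G4 question, and so on up the hierarchy.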

Properties we need in such an algorithm include:
  • It needs to be able to perform its primary function: to rearrange data to reduce the number of transitions between servers
  • It needs to be efficient, both in time and inter-computer communication
  • It must not "lock" large amounts of data while it's operating; it needs to operate in a concurrent environment
I hope that makes sense,

Ian.

--
Ian Clarke
CEO, Uprizer Labs
Email: i...@uprizer.com
Ph: +1 512 422 3588
Fax: +1 512 276 6674

Ian Clarke

Sep 21, 2009, 2:10:02 PM
to akka...@googlegroups.com, swarm-...@googlegroups.com
On Mon, Sep 21, 2009 at 2:15 AM, Jonas Bonér <jbo...@gmail.com> wrote:
> I've been looking at Swarm for a couple of years now. I really like
> your ambitious goals.

Thanks, although the Swarm concept has yet to reach its first birthday, so perhaps you meant a couple of months?

> Akka might help with other parts of Swarm as well, such as providing
> fault-tolerance and remoting.

> Thanks for reaching out. I'm excited about it.

Thanks Jonas!  Really the scale of the overall task is somewhat daunting, but fortunately I think it can be broken down into interesting and manageable sub-problems, each (I think) well suited to a several-week academic project, which is why I hope to reach out to students in particular.

I'd love to get some discussion going on the Swarm mailing list (see http://groups.google.com/group/swarm-discuss) so if you have any thoughts please feel free to post them there, or to respond to any existing posts.

Regards,

Eric J. Christeson

Sep 22, 2009, 8:50:36 PM
to Swarm Discussion


On Sep 19, 2:19 pm, Ian Clarke <ian.cla...@gmail.com> wrote:
> On Fri, Sep 18, 2009 at 2:13 PM, Eric Christeson <Eric.Christe...@ndsu.edu>
>    - It needs to be able to perform its primary function: to rearrange data
>    to reduce the number of transitions between servers
>    - It needs to be efficient, both in time and inter-computer communication
>    - It must not "lock" large amounts of data while its operating, it needs
>    to operate in a concurrent environment

A few questions; hopefully they are relevant.

1. Are the transitions relatively easy to find through code analysis?

2. Is there a central node that is responsible for scheduling/moving
continuations? Maybe it's a loosely organized system where the node
that starts a computation is responsible for distributing data?

3. How to handle node failure? I know Hadoop schedules jobs on more
than one node for redundancy.

4. Some sort of directory will be needed. With this algorithm, each
node would only need to know about other nodes involved in its
transitions.

5. I assume there is not a limit on data bins, they could be from a
few bytes to multi-gigabyte files.

6. Will this algorithm scale? Maybe it's no worse than anything else.

Just some random thoughts as I digest what's been put out so far.

Eric

Ian Clarke

Sep 22, 2009, 9:05:57 PM
to swarm-...@googlegroups.com
On Tue, Sep 22, 2009 at 7:50 PM, Eric J. Christeson <eric.j.c...@gmail.com> wrote:
> 1. Are the transitions relatively easy to find through code analysis?

I doubt it; it would probably amount to solving the halting problem :-)

My assumption is that a "supervisor" process would observe the program running and spot frequent transitions that way (which would be easy).  This is similar to the approach taken with JIT compilation.
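To illustrate the idea (the BucketId type and the recordAccess hook below are hypothetical, not part of Swarm), such a supervisor could count observed bucket-to-bucket transitions roughly like this:

import java.util.concurrent.ConcurrentHashMap
import java.util.concurrent.atomic.LongAdder
import scala.jdk.CollectionConverters._

object TransitionMonitor {
  type BucketId = String // hypothetical identifier for a bucket

  // Observed (previous bucket, current bucket) transition counts.
  private val counts = new ConcurrentHashMap[(BucketId, BucketId), LongAdder]()

  // The last bucket each thread touched.
  private val lastBucket = new ThreadLocal[Option[BucketId]] {
    override def initialValue(): Option[BucketId] = None
  }

  // Hypothetical hook: called whenever a thread reads a piece of data in `bucket`.
  def recordAccess(bucket: BucketId): Unit = {
    lastBucket.get().foreach { prev =>
      if (prev != bucket)
        counts.computeIfAbsent((prev, bucket), _ => new LongAdder).increment()
    }
    lastBucket.set(Some(bucket))
  }

  // Snapshot of the observed counts, e.g. as input to the swap algorithm above.
  def snapshot(): Map[(BucketId, BucketId), Long] =
    counts.asScala.map { case (pair, adder) => pair -> adder.sum() }.toMap
}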
 
> 2. Is there a central node that is responsible for scheduling/moving
> continuations?  Maybe it's a loosely organized system where the node
> that starts a computation is responsible for distributing data?

I certainly wouldn't rule out some form of centralization in the architecture if it proves necessary; however, I'm not quite sure why it would be needed for moving continuations.

If the node running the code knows the IP address and port of the node containing the data, then why would it need a third party to transfer the continuation to it?
 
> 3. How to handle node failure?  I know Hadoop schedules jobs on more
> than one node for redundancy.

Redundancy is definitely something Swarm will need to support.
 
> 4. Some sort of directory will be needed.  With this algorithm, each
> node would only need to know about other nodes involved in its
> transitions.

Right, but I would suggest that the reference to the data itself would contain all the information required to transfer the continuation to the node holding the data.  I don't think there is any need for nodes to be persistently aware of each other for the purpose of transferring continuations (load balancing and garbage collection are a different matter).

> 5. I assume there is not a limit on data bins, they could be from a
> few bytes to multi-gigabyte files.

Yes, the bottom-level data bins should be small, perhaps even consisting of individual objects; however, the top-level data bins could represent gigabytes, terabytes, hell, they could be yottabytes :-)
 
> 6. Will this algorithm scale?  Maybe it's no worse than anything else.

I don't know, but I think it will.  It can be parallelized, and assuming parallelization it should scale logarithmically.
 
> Just some random thoughts as I digest what's been put out so far.

Of course, happy to answer any further questions.

Ian Clarke

Sep 22, 2009, 9:12:25 PM
to swarm-...@googlegroups.com
On Tue, Sep 22, 2009 at 8:05 PM, Ian Clarke <ian.c...@gmail.com> wrote:
> On Tue, Sep 22, 2009 at 7:50 PM, Eric J. Christeson <eric.j.c...@gmail.com> wrote:
>> 4. Some sort of directory will be needed.  With this algorithm, each
>> node would only need to know about other nodes involved in its
>> transitions.
>
> Right, but I would suggest that the reference to the data itself would contain all the information required to transfer the continuation to the node holding the data.  I don't think there is any need for nodes to be persistently aware of each other for the purpose of transferring continuations (load balancing and garbage collection are a different matter).

Oops, sorry - I didn't read your question carefully.

You are right: we will need a mechanism for nodes to share information with each other about transitions, so that a decision can be made about swapping in my proposed balancing algorithm.

Ian.
 

Ben Harris

Sep 30, 2009, 12:15:33 PM
to Swarm Discussion
The most important thing is to make sure the hypervisor that handles
redistribution of data is scalable and pluggable.

There are some interesting applications of ant colony optimisation to
problems similar to this (e.g. traffic routing in large networks).
The benefit of ant colony optimisation is that it can be used as an
online algorithm that continuously adapts to changes in how the swarm
is running. I haven't spent any time actually looking at this problem,
but if the system is pluggable then it will allow for custom solutions
that match certain problems, and for testing other solutions that are
good for general problems.


Ian Clarke

Sep 30, 2009, 12:23:40 PM
to swarm-...@googlegroups.com
On Wed, Sep 30, 2009 at 11:15 AM, Ben Harris <bhar...@gmail.com> wrote:
> The most important thing is to make sure the hypervisor that handles
> redistribution of data is scalable and pluggable.
>
> There are some interesting applications of ant colony optimisation to
> problems similar to this (e.g. traffic routing in large networks).
> The benefit of ant colony optimisation is that it can be used as an
> online algorithm that continuously adapts to changes in how the swarm
> is running. I haven't spent any time actually looking at this problem,
> but if the system is pluggable then it will allow for custom solutions
> that match certain problems, and for testing other solutions that are
> good for general problems.

I agree; I certainly don't assert that my proposal is the best approach, or even that it will work at all!  We'll have to simulate various options.

It's a hard but interesting problem, and I hope that someone can take this on. The first step would be to create a simple simulator (it can be completely independent of the Swarm codebase) and try out a few different options.  It would be a fun, self-contained sub-project.
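For anyone tempted, a possible starting point might look something like this sketch (the bucket count, the random transition counts and the two-server split are all just illustrative assumptions):

import scala.util.Random

object SwarmSimulator {
  def main(args: Array[String]): Unit = {
    val buckets = 8
    val rng     = new Random(42)

    // Random symmetric transition matrix with an empty diagonal.
    val t = Array.tabulate(buckets, buckets)((i, j) => if (i < j) rng.nextInt(10) else 0)
    for (i <- 0 until buckets; j <- 0 until i) t(i)(j) = t(j)(i)

    // Naive initial placement: first half of the buckets on server A, rest on B.
    val serverA = (0 until buckets / 2).toSet
    val serverB = (buckets / 2 until buckets).toSet

    // Transitions that cross the server boundary.
    def cross(a: Set[Int], b: Set[Int]): Int =
      (for (i <- a.toSeq; j <- b.toSeq) yield t(i)(j)).sum

    println(s"Cross-server transitions before balancing: ${cross(serverA, serverB)}")
    // Plug in a balancing strategy here (the hierarchical swap pass, ant colony
    // optimisation, ...) and print the count again to compare approaches.
  }
}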

Any takers? :-)