Load Balancing - Was: How to access data on a remote server?


Rick R

Oct 12, 2009, 4:38:03 PM
to swarm-...@googlegroups.com
So that's an interesting issue. Will the load balancer need to intercept every reduction in order to determine (by statistics and locality) where best to apply the computation? This seems like it could be a detriment to performance, especially if it makes the effort to move the computation right as it's completing the expensive process.

Has there been much research/discussion on this topic?

On Mon, Oct 12, 2009 at 4:22 PM, Ian Clarke <ian.c...@gmail.com> wrote:
On Mon, Oct 12, 2009 at 3:07 PM, Patrick Wright <pdou...@gmail.com> wrote:
vRem = Ref(new Location(myLocation.address, 9997), "test remote string");

Creates a ref with location = remote server, then serializes itself to
the remote server, saves the content of the ref on the remote server,
and continues executing on the remote server.

I'm not sure why just creating a ref with a remote location would
cause the transfer,

So that is calling Ref.apply(location, value) defined on line 30 of Ref.scala.  Note that on line 32 there is a call to Swarm.moveTo() - this is why the continuation moves to the remote server if location is non-local (of course, if location is the local node then Swarm.moveTo() has no effect).
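A minimal sketch of the mechanism being described, with invented stand-in types; the actual signatures and bodies in Ref.scala may differ:

case class Location(address: String, port: Int)

object Swarm {
  val local = Location("127.0.0.1", 9997)
  // Serializes the current continuation and resumes it on `location`;
  // a no-op when `location` is already the local node.
  def moveTo(location: Location): Unit =
    if (location != local) () // ship the continuation, resume remotely
}

class Ref[T](val location: Location, var value: Option[T] = None) {
  // Dereferencing also moves execution to wherever the data lives.
  def apply(): T = { Swarm.moveTo(location); value.get }
}

object Ref {
  // Roughly what Ref.apply(location, value) does in the demo: move first,
  // then store the value on whatever node we are now executing on.
  def apply[T](location: Location, value: T): Ref[T] = {
    val ref = new Ref[T](location)
    Swarm.moveTo(location)
    ref.value = Some(value)
    ref
  }
}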
 
and why server A should be able to push data into
the Store in server B.

You wouldn't; typically the decision to locate a piece of data on a remote server would be made automatically, perhaps for purposes of load-balancing.

I do it manually in this demo just to demonstrate how the code will automatically be moved to the location of a Ref, because the stuff which would automatically move data around isn't implemented yet.
 
Also, to what extent should Swarm expose or make clear operations
which may cause a move, and to what extent should it be hidden (though
possibly easier to use)?

In theory, the programmer shouldn't need to be aware of when a move occurs; it should be entirely transparent to them. It is explicit in this demo (and the other one) simply so that I can demonstrate this functionality, because the stuff to handle this automatically is yet to be implemented.

Adding this line
println(format("%s:%s:%s:%s:%s",vRem(), vLoc(), vRem(), vLoc(),vRem()))

to the demo causes multiple hops between servers; imagine iterating
Refs inside a list.

Yup.  In practice, a load balancing mechanism would try to avoid something like that by ensuring that all those Refs are on the same machine.
 
Very interesting stuff, though. Hope I am closer to grokking it.

I think so, hope this helps.

Ian.

--
Ian Clarke
CEO, Uprizer Labs
Email: i...@uprizer.com
Ph: +1 512 422 3588
Fax: +1 512 276 6674




Ian Clarke

Oct 12, 2009, 4:56:17 PM
to swarm-...@googlegroups.com
On Mon, Oct 12, 2009 at 3:38 PM, Rick R <rick.ri...@gmail.com> wrote:
So that's an interesting issue. Will the load balancer need to intercept every reduction in order to determine (by statistics and locality) where best to apply the computation? This seems like it could be a detriment to performance, especially if it makes the effort to move the computation right as it's completing the expensive process.

Has there been much research/discussion on this topic?

Not much, here is a summary of my thoughts at the moment: http://groups.google.com/group/swarm-discuss/browse_thread/thread/b51f0d961fa37909

The idea is that load balancing (i.e. moving data between computers to minimize the number of transmissions of continuations that must occur, while trying to distribute workload evenly among servers) would be a "supervisor" process running in the background, a bit like a concurrent garbage collector or "just in time" compilation.

Of course, there will be a trade-off between how aggressively we rebalance data, given that this will have a significant overhead if we are over-aggressive.
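A minimal sketch of that background supervisor idea, assuming hypothetical collectStats and rebalance hooks (none of this is existing Swarm code):

import java.util.concurrent.{Executors, TimeUnit}

object LoadSupervisor {
  // Periodically gathers access statistics and rebalances; a real version
  // would only move data when the estimated saving in continuation
  // transmissions exceeds the cost of moving it.
  def start(collectStats: () => Map[String, Long],
            rebalance: Map[String, Long] => Unit): Unit = {
    val exec = Executors.newSingleThreadScheduledExecutor()
    exec.scheduleAtFixedRate(new Runnable {
      def run(): Unit = rebalance(collectStats())
    }, 30, 30, TimeUnit.SECONDS)
  }
}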

Ian.

Rick R

Oct 12, 2009, 5:18:07 PM
to swarm-...@googlegroups.com

Interesting. I'll have a look at the current system and see how quickly I can hack in such a background supervisor. I have a 1hr train ride out of the city and need something to do :)

Ian Clarke

Oct 12, 2009, 6:11:47 PM
to swarm-...@googlegroups.com
That would be great!  

Although, before you start hacking on Swarm itself, you'll definitely want to try simulating the algorithm I described, since I don't know for sure that it will work.

Ian.

Razie

Oct 12, 2009, 9:28:57 PM
to Swarm Discussion
On the one hand, the direction of this talk of load-balancing implies
the existence of a data-moving capability. This also implies some
minimal meta-data (what's moveable and how vs. what's not).

On the other hand, is there a way to hook up with the compiler, to
detect the data dependencies that a piece of code would have? I figure
that would help an automated algorithm to figure out which data points
to co-locate. Finding the types to move will then be easy, but we'd
need explicit query-attribute mapping to make explicit the logic
behind associating the data points we need (so they can be co-located
more often than not). Hmm...interesting: the Ref does not need to be a
unique ID but a query with one result.

For instance, in your example with the dating site, the load-balancing
algorithm should know the attributes I use to find the people, not the
people themselves...the query, not the rows. Although, I see that this
could now be moving the "free" code into an association-language, so
maybe not :)

...And so the story develops...I find it an interestingly fresh angle on
distributed computing...

Cheers

Rick R

Oct 13, 2009, 11:04:36 AM
to swarm-...@googlegroups.com
I spent my 1hr train ride consumed in thought. Then I got home and spent 2 hours scribbling ideas. So I haven't actually produced anything but I'm much closer.
What I was banging my head on was efficiency, specifically when dealing with 1 node/bucket at a time. For instance, suppose you wish to map a function across a list; the problem I see is that deciding a strategy for how to run the function will take as much time as running the function.

As you mentioned, avoiding excessive copying of data and movement of continuations is the goal, but that implies some central intelligence. What I realized, though, is that the central manager doesn't have to be an all-overseeing Queen (my first idea).

The (probably not novel) idea I came up with is location-aware data structures. We define a List of refs. From the outside it is just a list; internally, it is actually a list of lists. Each sub-list lives on a separate server, so really, our list is purely a master list (of lists) of remote references. As nodes are added to the list, it groups the sub-lists according to location; this is the most efficient mechanism I can come up with.

What this allows us to do is that, when we apply a function to all of the nodes, we know exactly where to copy the continuation; in fact, we can do it in parallel, and it requires no pre-calculation at all to decide where to apply the functions.
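A rough sketch of the location-aware list being proposed; LocatedList, Ref and Location here are illustrative stand-ins, not Swarm's API:

case class Location(address: String, port: Int)
case class Ref[T](location: Location, key: String)

class LocatedList[T](refs: Seq[Ref[T]]) {
  // Internally a list of sub-lists, one per node that holds data.
  private val byLocation: Map[Location, Seq[Ref[T]]] = refs.groupBy(_.location)

  // One hop (or one parallel dispatch) per node instead of one per element:
  // conceptually, move the continuation to `loc` once and run `f` over the
  // whole co-located group before moving on.
  def foreachGrouped(f: Ref[T] => Unit): Unit =
    byLocation.foreach { case (loc, group) => group.foreach(f) }
}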

Further, if we provide each data structure with the same LocationAware interface, then when we do have an overseer, it can quickly gather intelligence as to where to spread its operations.

It seems that this scheme would benefit greatly from pure functional data structures. My only problem so far is that the elegant implementation of these data structures would be a very odd combination of pure functional and stateful, and the only way I could think to achieve it would be via Monads.

Thoughts?

Rick R

Oct 13, 2009, 11:11:23 AM
to swarm-...@googlegroups.com
If we formalize our data-sets into graphs and trees, the problem you describe becomes easily solvable.
Luckily, most modern data models fit into a graph quite nicely.

Ian Clarke

Oct 14, 2009, 10:25:27 AM
to swarm-...@googlegroups.com
On Mon, Oct 12, 2009 at 8:28 PM, Razie <razv...@gmail.com> wrote:
On the one hand, the direction of this talk of load-balancing implies
the existence of a data-moving capability. This also implies some
minimal meta-data (what's moveable and how vs. what's not).

Well, perhaps I'm naive, but with simple objects we can just check to see whether they implement Serializable. With collections, we can try to serialize them and abort if they throw a NotSerializableException.
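For example, something along these lines (a sketch, not code from Swarm):

import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Simple objects: check the marker interface. Collections: actually try to
// serialize, since the elements matter too, and abort on failure.
def isMovable(x: AnyRef): Boolean = x match {
  case s: java.io.Serializable =>
    try {
      new ObjectOutputStream(new ByteArrayOutputStream()).writeObject(s)
      true
    } catch {
      case _: NotSerializableException => false
    }
  case _ => false
}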
 
On the other hand, is there a way to hookup with the compiler, to
detect the data dependencies that a piece of code would have? I figure
that would help an automated algorithm to figure out which data points
to co-locate. Finding the types to move will then be easy, but we'd
need explicit query-attribute mapping to make explicit the logic
behind associating the data points we need (so they can be co-located
more often then not). Hmm...interesting: the Ref does not need to be a
unique ID but a query with one result.

For instance, in your example with the dating site, the load-balancing
algorithm should know the attributes I use to find the people, not the
people themselves...the query, not the rows. Although, I see that this
could now be moving the "free" code into an association-language, so
maybe not :)

Well, I'm assuming that we're just talking about an object graph, possibly with various collections classes built on top of that (Map, Set, List, etc), and possibly with a higher-level SQL-like query mechanism built on top of that.

I would imagine that the load balancing algorithm would look at it at the level of the object graph though.

The load balancing algorithm could see the way that, for example, a Map is traversed in-order to find a good way to split the map across multiple computers.
 
Ian.

Ian Clarke

Oct 14, 2009, 10:31:01 AM
to swarm-...@googlegroups.com
On Tue, Oct 13, 2009 at 10:04 AM, Rick R <rick.ri...@gmail.com> wrote:
What I was banging my head on was efficiency, specifically when dealing with 1 node/bucket at a time. For instance, you wish to map a function across a list, the problem I see is that deciding a strategy for how to run the function will take as much time as running the function.

I don't really understand this; what do you mean by "strategy"? The program would just traverse the list; if the list happens to be split across two or more computers, then the continuation would be serialized at that point and moved as necessary. No advance planning is required at runtime; the execution just follows the data wherever it happens to be.

The smart stuff happens offline when the load balancing algorithm decides where the data should be.

Ian.

Rick R

Oct 14, 2009, 10:53:49 AM
to swarm-...@googlegroups.com
A strategy for optimal processing is required for two reasons.

The 1st reason is the idea of minimizing the number of hops that the continuation has to take in order to traverse the list. This might actually be confounded by a load balancer. For instance, let's say I have a round-robin load balancer across 5 nodes. When I create the list and concat 100 widgets onto it, 1 at a time, it would put 20 widgets on each node. The problem is that widget 10 is on node 0 and widget 11 is on node 1, so if we traverse in order, we'll have roughly 100 jumps.

If we made the list data structure location aware, then we could aggregate the data into sub-lists by location, and then traverse those. So we would not be traversing the list in order, but we could do it in 5 jumps.
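A quick back-of-the-envelope check of those numbers, assuming strict round-robin placement:

val nodes = 5
val placement = (0 until 100).map(_ % nodes)   // widget i lives on node i % 5

// In-order traversal: one hop every time consecutive widgets sit on different nodes.
val inOrderHops = placement.sliding(2).count { case Seq(a, b) => a != b }   // 99

// Location-aware traversal: visit each node's sub-list once.
val groupedHops = placement.distinct.size                                    // 5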

The 2nd reason for a "strategy" is this: if we have these sub-lists, then we can send 1 function to each of the 5 nodes and do the traversal in parallel. One major reason for distributing data is to distribute the computation.

So someone's response might be: "Hey, just don't do round-robin load balancing". In the absence of scaling issues, it would seem that the optimal algorithm is to put as much stuff on one computer as possible. When that one fills up, we spill over to the next, and the next... but going past a few machines, we'll have to begin to address the issues that the location-aware structure approach addresses.

This may be the right approach, but to reconcile the issue of overcrowding with the issue of locality, we'll need intelligent graph bisection/annealing algorithms, which will require location-aware data structures.
Further, to address the locations of thousands (or billions) of widgets, we will want to deal with those widgets at the largest granularity possible; e.g. I would prefer moving lists of hundreds of objects at a time to moving 1 object at a time.

Am I completely off base here?  I tend to get ahead of myself. When encountering complex systems I tend to conjure the most complex cases first. But I like to make sure the system *can* scale before building it.

Ian Clarke

Oct 14, 2009, 2:06:06 PM
to swarm-...@googlegroups.com
Well, I'm not sure we are on the same page yet, but the first thing is that a round-robin load balancer would be a terrible way to decide where to put newly created objects for exactly the reason you cite: if you were creating, for example, a linked-list, you'd have a different node on each computer, exactly what we don't want!

The load-balancer may need to decide where to put a newly created object, but in the vast majority of cases it should create it on whatever computer the continuation is currently executing on. In fact, I'd say the only circumstance under which you'd create a new object on a remote machine is if there is no more space left on this one.

You then leave it to the load-balancer, working in the background, to migrate some or all of the linked-list elsewhere if this is deemed beneficial.

What I envision is that the load balancer operates below the level of specific data structures (such as List or Set collections), in much the same way that the garbage collector does.  The GC doesn't need to know that something is a LinkedList, a HashSet, or a ConcurrentSkipListMap; it just treats everything as an object graph.

Similarly, the load balancer also treats objects as a graph, except rather than the edges being references as with the GC's graph, an edge between objects A and B represents the fact that a thread may access A then B.  The weight of the edges would be proportional to the frequency with which A is accessed by a thread, and then B is accessed.

The task of the load balancer is to minimize the number of edges between objects on different computers, prioritizing heavier edges over lighter ones.
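In other words, roughly this objective (Edge and the placement map are invented for illustration):

case class Edge(a: String, b: String, weight: Long)   // weight ~ how often a thread touches a, then b

// The quantity the load balancer tries to minimize: total edge weight that
// crosses a machine boundary under a given object-to-machine placement.
def crossMachineCost(edges: Seq[Edge], placement: Map[String, Int]): Long =
  edges.collect { case Edge(a, b, w) if placement(a) != placement(b) => w }.sum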

So, to summarize, this all happens at the object level, so to the load balancer a list is just a graph of objects in a particular configuration; it doesn't know, or need to know, that it is a data structure called a list.

Does that make any sense?

Rick R

Oct 14, 2009, 2:56:14 PM
to swarm-...@googlegroups.com

Yes. This certainly seems like a valid approach, and probably the easiest. However, it also seems like the least optimal. I would guess that the way this would work is that every time a Ref was dereferenced, it would report its statistics to the load balancer. Since this load balancer is centralized, that's a significant amount of overhead right there. It's not just storing simple statistics, but building a graph from a series of accesses. Then, once it makes a decision and begins to move buckets around, you might have to deal with a transaction or some form of locking. Then there is the copying of data...

Maybe I'm prematurely optimizing, but I think if we take the approach that we need to minimize copying data, and think more about optimal algorithms for representing data in such a way as to optimize access, it would be successful, and still easy to implement.

The approach I would take would be to design the data structures from the very beginning so that they efficiently align themselves with an optimal access strategy. I agree that round-robin placement would be a horrible idea, but even in that case, in which the data is evenly spread around the cluster, the location-aware list I propose would still allow efficient access. It could even parallelize access.

Bayani Portier

Oct 14, 2009, 7:06:38 PM
to swarm-...@googlegroups.com
You're probably already thinking this, but rather than pushing raw statistics up, perhaps a strategy like map-reduce would be in order, so that responses can be distilled, and sent in clumps to be distilled further up the food chain to the final load balancer.
 
Just a thought.
 
-b

Ian Clarke

Oct 15, 2009, 9:55:51 AM
to swarm-...@googlegroups.com
On Wed, Oct 14, 2009 at 1:56 PM, Rick R <rick.ri...@gmail.com> wrote:
> Yes. This certainly seems like a valid approach, and probably the easiest.
> However, it also seems like the least optimal. I would guess that the way
> this would work is that every time a Ref was dereferenced, it would report
> its statistics to the load balancer.

Well, perhaps increment a counter (not a significant overhead), yes.

> Since this load balancer is
> Centralized, that's a significant amount of overhead right there.

I don't think the load balancer needs to be centralized, and even if
it were, it would only require summary statistics - you wouldn't
report every single event as it happens.

> It's not
> just storing simple statistics, but building a graph from series of access.

Right, it needs a matrix showing the interactions between different
blocks of data.
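A sketch of what that per-node bookkeeping might look like (names are hypothetical, not Swarm code):

import java.util.concurrent.ConcurrentHashMap
import java.util.concurrent.atomic.LongAdder
import scala.jdk.CollectionConverters._

class AccessMatrix {
  // Counts of "block a was accessed, then block b" per thread.
  private val counts = new ConcurrentHashMap[(String, String), LongAdder]()
  private val last = new ThreadLocal[String]()

  def recordAccess(block: String): Unit = {
    val prev = last.get()
    if (prev != null)
      counts.computeIfAbsent((prev, block), _ => new LongAdder()).increment()
    last.set(block)
  }

  // Only this summary ever travels to the load balancer, not individual events.
  def snapshot(): Map[(String, String), Long] =
    counts.asScala.map { case (k, v) => k -> v.sum() }.toMap
}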

> Then, once it makes a decision and begins to move buckets around, you might
> have to deal with a transaction or some form of locking. Then there is the
> copying of data...

Yes, we'll definitely need to deal with concurrency issues.

> Maybe I'm prematurely optimizing, but I think if we take the approach that
> we need to minimize copying data and think more about optimal algorithms for
> representing data in such a way to optimize access, it would be successful,
> and still easy to implement.

Right, I think the load balancing algorithm would only move data when
there is a clear benefit to doing so.

> The approach I would take would be to design the data structures from the
> very beginning so that they efficiently align themselves with an optimal
> access strategy. I agree that a round robin placement would be a horrible
> idea, but using that case in which the data is evenly spread around the
> cluster, the solution I offer with my location aware list would allow access
> in an efficient manner. It could even parallelize access.

I just don't think this should occur at the level of specific
collection types; rather, it should occur at the level below this, at
the level of the object graph, similar to garbage collection.

It may be the case that this can't be done at the level of the object
graph, but so far I haven't seen any compelling reason that it can't.

Rick R

Oct 15, 2009, 10:11:10 AM
to swarm-...@googlegroups.com
I now realize that both solutions are achievable with the same implementation. An advanced programmer may simply choose to store her data structures within a Ref, which will obviously force the load balancer to work with the data at a higher level. She would then be responsible for doing whatever organization is deemed necessary.

So let me see if I have this clear:
On start-up, a load balancer, let's say the Queen, pings each Swarm instance to notify them of her existence. They then collect statistics of references, a Summary, from their Store and pass it to the Queen at regular (configurable) intervals.

Once a Queen has received all necessary Summaries, she then does her magical matrix calculations and issues edicts to move batches of data from one machine to the next.
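If that reading is roughly right, the protocol might boil down to a handful of messages, something like this (all names hypothetical):

case class Hello(queenAddress: String)                        // Queen announces herself on start-up
case class Summary(node: String,
                   refAccessCounts: Map[(String, String), Long])  // periodic stats from each Store
case class Move(refKey: String, from: String, to: String)     // edict: migrate this data

// Each Swarm instance answers Hello by scheduling periodic Summary messages;
// the Queen folds Summaries into her matrix and replies with Move edicts.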

So far I have one issue with the implementation:
Currently it seems the only thing doing any work is the continuation. What do you think about building the Swarm, Store and Queen out of RemoteActors? This way, all of these maintenance tasks I just described can be performed via simple RPC in an "out-of-band" fashion. I think it would make the implementation a bit more straightforward, and more scala-esque.

Patrick Wright

Oct 15, 2009, 10:24:19 AM
to swarm-...@googlegroups.com
> So let me see if I have this clear;
> On start-up, a load balancer, let's say the Queen, pings each Swarm instance
> to notify them of her existence. They then collect statistics of references,
> a Summary, from their Store and pass it to the queen at regular
> (configurable) intervals.
>
> Once a Queen has received all necessary Summaries, she then does her magical
> matrix calculations and issues edicts to move batches of data from one
> machine to the next.

I think the technical report on Project Darkstar's multi-node data
store is relevant in this regard. I'd again recommend reading it (and
seeing what lessons are to be learned).

>
> So far I have one issue with the implementation:
> Currently it seems the only thing doing any work is the continuation. What
> do you think about building the Swarm, Store and Queen out of RemoteActors?
> This way, all of these maintenance tasks I just described can be performed
> via simple RPC in an "out-of-band" fashion. I think it would make the
> implementation a bit more straightforward, and more scala-esque.

From my own POV, I think the continuation is a killer feature of Swarm
ATM, which doesn't obviate the possibility of using other
intra-process communication channels. An important aspect of
continuations as Ian has demo'ed them is that from the programmer's
POV, it looks like a normal method, which means one can implement e.g.
algorithms the way one expects them to look in code. That's a big
deal, IMO. I suspect that any serious implementation of Swarm would
involve more than just continuations, though.

My own focus right now is on using a Tuple Spaces design accessed via
the same APIs Ian coded, which I've got running locally in a crude
demo.

Patrick

Rick R

Oct 15, 2009, 11:11:07 AM
to swarm-...@googlegroups.com
On Thu, Oct 15, 2009 at 10:24 AM, Patrick Wright <pdou...@gmail.com> wrote:

I think the technical report on Project Darkstar's multi-node data
store is relevant in this regard. Recommend again to read (and see
what lessons are to be learned).

 
I've read most of the other Darkstar literature since you mentioned it. The TR is downloaded to my laptop, just waiting for me to have a free moment.

 An important aspect of
continuations as Ian has demo'ed them is that from the programmer's
POV, it looks like a normal method, which means one can implement e.g.
algorithms the way one expects them to look in code. That's a big
deal, IMO. I suspect that any serious implementation of Swarm would
involve more than just continuations, though.

 
I agree, this is really some special sauce. I did a lot of similar work in D on a project I called Scatter. The principle was that data and computation got distributed. I designed the API so that manually distributing work and workers was as easy as possible (so I thought). But even that got cumbersome to use for a project of any scale.
 
My own focus right now is on using a Tuple Spaces design accessed via
the same APIs Ian coded, which I've got running locally in a crude
demo.


Interesting! I would love to see a modern tuple spaces implementation. Do you have it (or plan to have it) hooked up to a REPL? That would be the icing on the Swarm cake if you could interact with and analyze objects in a cloud.

 
Patrick



Patrick Wright

Oct 15, 2009, 11:26:30 AM
to swarm-...@googlegroups.com
> Interesting! I would love to see a modern tuple spaces implementation. Do
> you have it (or plan to have it) hooked up to a REPL? That would be the
> icing on the Swarm cake if you could interact with and analyze objects in a
> cloud.

No, not yet. My first step (apart from learning how to dance with Git)
is to get the Maven blah blah Jini blah blah configuration correct so
that others automatically download the right jars and start the
required pieces without any hassle. I'm starting with JavaSpaces since
I have some experience with Jini and since the JS API is drop-dead
simple. Once that configuration is prepared we can play around with
things like the REPL, but it shouldn't be too hard (I imagine it will
just be classpath issues at that point).

Having that configuration in place will also make synchronous calls
via Jini trivial.

Eric J. Christeson

Oct 15, 2009, 2:56:51 PM
to Swarm Discussion


On Oct 15, 9:11 am, Rick R <rick.richard...@gmail.com> wrote:

> So let me see if I have this clear;
> On start-up, a load balancer, let's say the Queen, pings each Swarm instance
> to notify them of her existence. They then collect statistics of references,
> a Summary, from their Store and pass it to the queen at regular
> (configurable) intervals.
>
> Once a Queen has received all necessary Summaries, she then does her magical
> matrix calculations and issues edicts to move batches of data from one
> machine to the next.

Maybe this is what you're talking about, but do we even need a Queen
node, or is the Queen a process that runs on all the nodes?
I was thinking that this Queen/hypervisor could be a process or thread
that distributes itself from node to node via continuation, and picks
up meta-data and decides to shuffle data at some point.

If we had some sort of self-organizing structure where nodes keep
track of some subset of total nodes, they could pass the Queen
around. It would be like a token process traveling between nodes to
gather the data it needs to decide what data needs to be moved.

Am I totally off-base, or is this feasible?

Eric

Rick R

Oct 15, 2009, 3:12:42 PM
to swarm-...@googlegroups.com
That seems like a good approach.

Depending on how much state the Queen needs to pack with her, it might be more efficient for her to stay in one place. However, the major advantage of the method that you suggest is that Swarms/Stores don't need to know about a Queen or where she is. The swarms would simply have to offer an interface for querying statistics. This seems like a big win.

Razie

Oct 16, 2009, 9:40:31 AM
to swarm-...@googlegroups.com
The queen shouldn't break the pattern: it should itself be a
continuation, moving from node to node as it needs the local statistics.

Next up: how do we ensure redundancy? The queen's accumulated data
should not be lost if the current node decides to exit, stage left.

Also, this is now a model of parallelism: the queen could continue on
two or more nodes. All we need is a merge/join mechanism for joining
later, say when two queen instances collide on the same node. Oh, I
love this! I had this problem in my agent system for a distributed
database, and came up with a small interface that is needed.
--
Sent from my mobile device

Ian Clarke

Oct 16, 2009, 2:56:38 PM
to swarm-...@googlegroups.com
I think we need to settle on a load-balancing algorithm before we get
too deep into how we'll do this.

Someone said they might simulate the algorithm I proposed a while
back; did anyone get around to that? Anyone still planning to?

Rick R

Oct 16, 2009, 6:07:44 PM
to swarm-...@googlegroups.com
I started implementing a load balancing simulator and I realized that the current Swarm system is so simple that it is the perfect "simulation" environment.

I am in the process of reworking the system slightly so that we could plug in new load balancing algorithms easily. In addition, I would like to come up with some tests so that we can rate and tweak the load balancer.

Ian Clarke

Oct 16, 2009, 9:09:13 PM
to swarm-...@googlegroups.com
Rick,

A simple test might be a binary tree, too big to fit on a single node.
The algorithm that wins is the algorithm that minimizes the number of
"hops".

The hardest test would be randomly interconnected data, which is
basically a worst-case scenario for any load balancing algorithm.
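A sketch of how such a test could be scored; the placement function is the piece each candidate load-balancing algorithm would supply:

case class Node(id: Int, left: Option[Node], right: Option[Node])

// Count cross-machine hops for one full traversal of the tree, given a
// mapping from node id to the machine that holds it.
def hops(root: Node, placement: Int => Int): Int = {
  def walk(n: Node, currentMachine: Int): Int = {
    val hop = if (placement(n.id) != currentMachine) 1 else 0
    val children = n.left.toList ++ n.right.toList
    hop + children.map(walk(_, placement(n.id))).sum
  }
  walk(root, placement(root.id))
}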

Ian.

Eric J. Christeson

Oct 20, 2009, 11:19:44 AM
to Swarm Discussion
On Oct 16, 1:56 pm, Ian Clarke <ian.cla...@gmail.com> wrote:
> I think we need to settle on a load-balancing algorithm before we get
> too deep into how we'll do this.
>
> Someone said they might simulate the algorithm I proposed a while
> back, did anyone get around to that?  Anyone still planning to?

I was planning to work on this for a class. I've been busy with a few
other things, but will have a little time in the next few days to
prototype it.

Eric

Ian Clarke

Oct 20, 2009, 1:26:33 PM
to swarm-...@googlegroups.com

Ok, please keep us up to date to avoid duplicated effort, or if you
realize you can't work on it please let us know so that nobody else is
discouraged from addressing it.
