Re: Swarm

Ian Clarke

Aug 17, 2013, 1:54:55 PM
to Paul Schoenfelder, James Earl Douglas, swarm-...@googlegroups.com
[copying the discussion mailing list for the benefit of others, hope you guys don't mind]

On Fri, Aug 16, 2013 at 6:32 PM, Paul Schoenfelder <paulscho...@gmail.com> wrote:
I think being able to accommodate both scenarios will make Swarm useful for a wider array of use cases, but being able to efficiently deal with the latter (computations seeking the data) seems like the more important of the two. The way Swarm is written right now, though, it appears that the former (distributing computations wherever there are available resources) is not directly supported, and I'm wondering whether trying to do both would introduce complexity to an already complex problem, or if you feel that they are fairly compatible.

It's an interesting question.  I guess my original assumption was that it always made sense for the computation to occur where the data is.  You can therefore rebalance computation load by rebalancing the data that it acts upon.  Of course, I have no problem with questioning that assumption :-)
 
It looks like right now you would have to explicitly mark the data needed for the computation as being located somewhere else (even if it isn't) in order to offload the computation to another machine - instead of something that reads more like "perform this computation somewhere else and continue once complete" - essentially ignoring the location of the data.

Right, we use the Ref class to refer to an object that may or may not be on a remote computer.  If you dereference the object (by calling apply()) and it is local, then it just returns the object; but if the object is on a remote node, then this will cause the continuation to be serialized and migrated to wherever the object is before it is returned.  To the programmer there is no difference.

Note also that the object returned doesn't need to be serializable data, it could be System.out, for example.
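
To make that concrete, here's a rough Scala sketch of how it reads from the caller's side.  Only Ref and apply() are real; NodeId, Account, and totalBalance are made up for illustration:

    // Illustrative sketch only: Ref and apply() come from Swarm, the rest
    // is invented to show the calling convention.
    case class NodeId(host: String)

    trait Account { def balance: Long }

    class Ref[A](location: NodeId, uid: Long) {
      // If the referent is local, apply() just returns it; if it lives on a
      // remote node, the Swarm runtime serializes the continuation and
      // migrates it to `location` before resuming, so the caller can't tell.
      def apply(): A = sys.error("resolved by the Swarm runtime")
    }

    // Remote and local refs read identically to the programmer:
    def totalBalance(a: Ref[Account], b: Ref[Account]): Long =
      a().balance + b().balance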
 
I think my next project after looking into SimpleRepository/Store, is to tackle the clustering algorithm. If you have any suggestions or ideas you've already come up with for how that will work, let me know.

I have ideas but they are pretty half-baked at this point.

One nice thing is that it is quite easy to state the problem (apologies for the overloading of the term "node"):

We have an undirected[1] graph, with nodes and edges, where each edge has a number associated with it, let's call it an "affinity"[2].  We have a number of groups, and each node must be in one of these groups, and let's say that the nodes should be equally distributed between the groups (not a strict requirement, but it prevents the trivial solution of putting all the nodes in the same group).  Our goal is to distribute the nodes between the groups so as to minimize the total affinity of the edges that connect nodes in different groups.

[1] Undirected because we assume that moving the continuation in one direction is just as expensive as moving it in the other.
[2] Affinity would correspond to the number of dereferences that occur along this edge in a given time period.
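
To pin the objective down, here's a tiny Scala sketch (the names are all made up) of the quantity we'd be trying to minimize:

    // Each data node lives in exactly one group (machine); the cost we want
    // to minimize is the total affinity of edges that cross groups.
    case class Edge(a: Int, b: Int, affinity: Double)

    def crossGroupCost(groupOf: Map[Int, Int], edges: Seq[Edge]): Double =
      edges.filter(e => groupOf(e.a) != groupOf(e.b)).map(_.affinity).sum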

Our solution should have some important practical qualities.  We must be able to apply it incrementally to a graph that will change over time beyond the control of the algorithm (nodes may be added or removed, and affinities may change).

The algorithm should ideally be applied in a decentralized fashion, perhaps where two swarm nodes (a.k.a. two "groups") can have a conversation and agree to swap data so as to come up with a better graph.  We don't want to have some centralized controller monitoring everything.

We can help the algorithm with the simple heuristic that when new data is created, it should be created on the machine where the computation is (which would be the natural thing to do anyway).

One thought might be to start a process periodically, where you take an edge that crosses two groups, say it's between nodes A and B.  You then recursively ask "will I improve the overall graph if I move node A to B's group, or if I move B to node A's group?".  You do a greedy best-first search with backtracking (similar to how in-car navigation systems find routes), moving nodes progressively, trying to find a better graph, within a certain depth.

Of course, for very large object graphs this could be pretty inefficient, but another thing to think about is that if a group of nodes in the graph is densely connected internally, but sparsely connected externally, then we could treat that set of nodes as a single node for the purposes of this planning algorithm, which should make it more efficient.
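
In rough Scala, a single greedy pass might look something like this (illustrative only - it reuses the Edge/crossGroupCost sketch above, and leaves out the depth-limited backtracking and the equal-size constraint):

    // For each edge that crosses groups, try pulling either endpoint into
    // the other's group, and keep whichever assignment has the lowest total
    // cross-group affinity (ties keep the current assignment).
    def greedyPass(groupOf: Map[Int, Int], edges: Seq[Edge]): Map[Int, Int] =
      edges.foldLeft(groupOf) { (g, e) =>
        if (g(e.a) == g(e.b)) g // edge is already internal to one group
        else {
          val moveA = g.updated(e.a, g(e.b)) // move A into B's group
          val moveB = g.updated(e.b, g(e.a)) // move B into A's group
          Seq(g, moveA, moveB).minBy(crossGroupCost(_, edges))
        }
      }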

Anyway, I guess that represents my best thinking so far (not much, I know) - what do you think?

Ian.

--
Ian Clarke
CEO, Uprizer Labs LLC
Email: i...@uprizer.com
Ph: +1 512 422 3588

Ian Clarke

Aug 18, 2013, 8:13:32 PM
to Paul Schoenfelder, James Earl Douglas, swarm-...@googlegroups.com
On Sat, Aug 17, 2013 at 5:37 PM, Paul Schoenfelder <paulscho...@gmail.com> wrote:
So if I understand correctly, the supervisor algorithm (as I'll call it for now) would be performed on a per-group (which translates to per-machine, or per-datacenter?) level

Yes, sorry - I'm overloading terms.  A "group" is a single machine.
 
, and would need to monitor how many times a dereference results in a move from its group to another (and vice versa),

Well, actually it should just keep track of the number of times it is dereferenced within a given time period, regardless of whether it is a local or remote dereference (the latter necessitating a transfer of the continuation).
 
and at a certain threshold, would communicate with the other group's supervisor to arrange moving the data from that group to its own in order to reduce the number of dereferences. Is that correct?

Sort of - having verified that the move would help, and having determined whether moving additional data might be warranted.
 
One thing I'm wondering is how do we track the moves? Increment a counter? Send a message to the supervisor, which maintains its own internal counter?

I think every time a reference is dereferenced we should record the time since the previous dereference in some kind of decaying running average.  We can use this to quickly determine the "dereferences per second" of any given reference.  This would be the "affinity" metric that I mentioned previously.
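
Something like this Scala sketch, say (the class, the smoothing constant, and the method names are all invented):

    // Exponentially decaying average of the interval between dereferences,
    // from which we derive an approximate dereferences-per-second rate.
    class AffinityMeter(alpha: Double = 0.1) {
      private var lastDeref = System.nanoTime()
      private var avgIntervalNs = Double.NaN

      def recordDereference(): Unit = {
        val now = System.nanoTime()
        val interval = (now - lastDeref).toDouble
        lastDeref = now
        avgIntervalNs =
          if (avgIntervalNs.isNaN) interval // first sample seeds the average
          else alpha * interval + (1 - alpha) * avgIntervalNs
      }

      // The "affinity" metric for this reference.
      def derefsPerSecond: Double =
        if (avgIntervalNs.isNaN) 0.0 else 1e9 / avgIntervalNs
    }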
 
Also, if we plan on abstracting the level at which the algorithm runs (per-machine vs per-datacenter), how do we handle coordinating the move of data such that the receiving end is actually capable of storing that data? 

Well, we wouldn't move the data unless we knew in advance that the receiver had sufficient room.
 
It seems to me that we would have to write the supervisor to run at the per-machine level so that it is aware of what resources it has free (possibly not only for handling moves of data, but for passing along a computation if local CPU/memory resources are too low?)

The thing is that passing a computation to another computer would be somewhat pointless if the data it must act upon is on the local computer, since the computation would just come straight back.
 
and that supervisors would also have to have some concept of a higher-level grouping (supervisor A knows that it is in the same group as supervisor B, and since supervisor C is requesting data from A and B's group often, they coordinate to send the data in question to C's group, but evenly distributed over the machines in C's group).

I think my explanation was confusing, I apologize.  I think the problem comes from the fact that a "node" is a machine in the language of peer-to-peer, but a node is a piece of information in the language of graphs - which I was using to describe the clustering problem.  I mixed these two terminologies, and in doing so I did a bad job of explaining my thoughts.

My previous explanation doesn't deal with datacenters at all.  A "group" of "nodes" in my explanation of the clustering algorithm referred to a machine (the group), and the data on that machine (the nodes).
 
I wonder if it might be helpful to jump on a Google Hangout in the next day or two so we can hash this out verbally?  James is very welcome also.  Where are you located, and what is your availability over the next few days?

Ian.

Paul Schoenfelder

Aug 17, 2013, 6:37:50 PM
to Ian Clarke, James Earl Douglas, swarm-...@googlegroups.com
Good idea on bringing in the list - we should probably keep these conversations around for posterity, for whenever more volunteers come along.
It's an interesting question.  I guess my original assumption was that it always made sense for the computation to occur where the data is.  You can therefore rebalance computation load by rebalancing the data that it acts upon.  Of course, I have no problem with questioning that assumption :-)
I think that assumption is spot on, I guess my point assumed that the data is available on a set of machines equally, and you have a frontend application (say a web server) that wants to delegate work to that set of machines. From the application server's perspective, I should be able to write code that is location-agnostic when location of the data is irrelevant, but processing power is - as well as write code that will find the data when location does matter, and processing power is less important. I hope I'm making sense, but I think you get the gist of what I'm saying.

Right, we use the Ref class to refer to an object that may or may not be on a remote computer.  If you dereference the object (by calling apply()) and it is local, then it just returns the object; but if the object is on a remote node, then this will cause the continuation to be serialized and migrated to wherever the object is before it is returned.  To the programmer there is no difference.
This solves the problem of location-aware data processing really well, but it obviously doesn't work well with the idea of location-agnostic data processing - this is where the complexity gets introduced with my suggestion I think. 

One thought might be to start a process periodically, where you take an edge that crosses two groups, say it's between nodes A and B.  You then recursively ask "will I improve the overall graph if I move node A to B's group, or if I move B to node A's group?".  You do a greedy best-first search with backtracking (similar to how in-car navigation systems find routes), moving nodes progressively, trying to find a better graph, within a certain depth.

Of course, for very large object graphs this could be pretty inefficient, but another thing to think about is that if a group of nodes in the graph is densely connected internally, but sparsely connected externally, then we could treat that set of nodes as a single node for the purposes of this planning algorithm, which should make it more efficient.

Anyway, I guess that represents my best thinking so far (not much, I know), what do you think?

So if I understand correctly, the supervisor algorithm (as I'll call it for now) would be performed on a per-group (which translates to per-machine, or per-datacenter?) level, and would need to monitor how many times a dereference results in a move from its group to another (and vice versa), and at a certain threshold, would communicate with the other group's supervisor to arrange moving the data from that group to its own in order to reduce the number of dereferences. Is that correct?

One thing I'm wondering is how do we track the moves? Increment a counter? Send a message to the supervisor, which maintains its own internal counter? Also, if we plan on abstracting the level at which the algorithm runs (per-machine vs. per-datacenter), how do we handle coordinating the move of data such that the receiving end is actually capable of storing that data?

It seems to me that we would have to write the supervisor to run at the per-machine level so that it is aware of what resources it has free (possibly not only for handling moves of data, but for passing along a computation if local CPU/memory resources are too low?), and that supervisors would also have to have some concept of a higher-level grouping (supervisor A knows that it is in the same group as supervisor B, and since supervisor C is requesting data from A and B's group often, they coordinate to send the data in question to C's group, but evenly distributed over the machines in C's group).

Hopefully I'm making sense ( just tell me if I'm not, I can take it :) ).

Paul

Paul Schoenfelder

Aug 19, 2013, 5:59:18 PM
to Ian Clarke, James Earl Douglas, swarm-...@googlegroups.com
I kind of injected the datacenter thing myself, so I think I contributed to the confusion a bit, sorry about that. In any case, I should be able to make some time in the evenings (I'm in Saint Paul, Minnesota, so anytime after 5PM CDT). Just let me know a day that works best for you (and James if he's interested).

Paul

Michael

Aug 19, 2013, 8:11:34 PM
to swarm-...@googlegroups.com, Ian Clarke, James Earl Douglas
Would this conversation be public? I'm professionally working on something with design goals very similar to Swarm's (although an entirely different set of problems), and would gladly contribute ideas and code to solve any distributed load-balancing problems.

Michael

Aug 19, 2013, 8:17:44 PM
to swarm-...@googlegroups.com, Ian Clarke, James Earl Douglas
To clarify, I was referring to the G+ hangout. Apologies.

Ian Clarke

Aug 20, 2013, 10:00:32 AM
to swarm-...@googlegroups.com, James Earl Douglas
Michael,

The more the merrier!  I've set up a poll here to pick a time that works for everyone: http://doodle.com/d4gcf892euvynrqb

(I haven't tried this service before - I hope it works.)  Once we have a time, I'll create a Google Hangout.

Ian.


Michael

Aug 22, 2013, 10:33:42 AM
to swarm-...@googlegroups.com, James Earl Douglas, i...@uprizer.com
It looks as if 18:00 central time is the consensus. Looking forward to it!

Ian Clarke

Aug 23, 2013, 4:58:25 PM
to swarm-...@googlegroups.com, James Earl Douglas
Sincere apologies, guys - I dropped the ball on this (something blew up at work), and I didn't get the email that Doodle promised to send me.  (Also, I'm not liking their service - it's pretty confusing and clumsy.)

Let's continue discussion via email for the moment, I'll work on an email clarifying my thoughts on the clustering issue.

Ian.

Paul Schoenfelder

Aug 23, 2013, 7:26:49 PM
to swarm-...@googlegroups.com, James Earl Douglas, i...@uprizer.com
No worries, I wasn't going to be available until now (16:30 CDT) anyway. Next week would be far better for a Hangout session. In the meantime, I'll keep an eye out for that email.

Michael

Aug 24, 2013, 6:42:14 PM
to swarm-...@googlegroups.com, James Earl Douglas, i...@uprizer.com
I'd like to think most people in the computer science/IT world know all about this situation. No worries, Ian.

James Earl Douglas

Aug 20, 2013, 9:11:36 PM
to swarm-...@googlegroups.com
I'm fully booked up, so I'll have to sit this one out. :[