Vertex-Centric + Swarm Computing

178 views
Skip to first unread message

Marko Rodriguez

unread,
Jul 3, 2013, 10:43:54 PM7/3/13
to gremli...@googlegroups.com
Hello,

For those that are interested in Furnace…here is a brain dump of a thought I had. If you have the patience to read/understand this, it is well worth it.


PROBLEM:

For the last week, I've been trying to generalize vertex-centric computing to support the ability do computations on derived graphs


For example, while we can do PageRank on a friendship graph (personA--friend-->personB), we can not express PageRank on a derived coauthorship graph (where the explicit graph is personA--wrote-->paper<--wrote--personB). To do this, you would have to compute those coauthor-edges, insert them into the graph, and then do PageRank on personA--coauthor-->personB. In short, make coauthor an edge, not a path.


To support "on the fly"-computations of derived graphs, I went through a long winded exercise in implementing "BSP waves" (like clock cycles on a chip). "Friendship PageRank" would evaluate every 1-step of BSP while "Coauthor PageRank" would evaluate every 2-steps of BSP (as you need two steps to go from person, to paper, to person). In this way, you have multiple PageRank energies flowing at different phases within the BSP computation where the B-BSP (Bulk-BulkSynchronousParallel) would happen on the longest derivation by path length. While psychedelic, the implementation got messy --- when done from the vertex-centric perspective. 

SOLUTION:

With a kink in perspective, warm butter drips over your face...

In vertex-centric computing you think of every vertex as being on its own thread of execution. In this model, you write a VertexProgram that does "VertexProgram.execute(Vertex vertex, GraphMemory memory)" on a loop. Each loop is a BSP-step.


However, when you think of a "particle" as being the thread of computation moving from vertex-to-vertex, a new pictures emerges. With swarm computing, you have a particle with the same method.

…where the role of the vertex is to simply loop through all the particles it currently holds and execute the particles' execute() methods. Thus, the swarm model is still vertex-centric (as can be seen by its implementation): 


In this model, particles move from vertex-to-vertex and aggregate on vertices (in fact, they can get reference to each other and can have interactions -- though we will talk about that at a later date -- more psychedelic stuff there). So how would you do "Friendship PageRank" and "Coauthor PageRank" in the same graph and within the same computation using this model? You simply define a FriendshipParticle and a CoauthorshipParticle. Inside each of these particles there is a state machine/grammar that tells them where to go next. State machine? Sounds like Gremlin.

v.out('friend')
v.out('authored').in('authored')

So, each particle is taking its own path through the graph according to the path descriptions described and doing it over and over and over… If you think about this hard enough you realize --- "Marko, this is stupid. In a global graph computation you will have SOOO many particles emanating at every branch in the graph. This is an object creation nightmare and you will quickly run out of memory." And yes, this is true if you do it like that. However, you don't care about each individual particle as they are identical at the level of their type (CoauthorParticle or FriendshipParticle -- and assuming NO history of where the particle was is needed. Thus they have no identity and their path can be represented by a regular expression/state_machine). Therefore, identity-less particles can exist on a vertex as counts (Map<Class<Particle>,Long>). Interestingly enough, this is analogous to how Faunus computes Gremlin expressions without blowing up the computation.


In this way, a swarm of heterogeneous particles is moving over the graph computing these derived edges. Easy-peasy oh so lemon squeezy….

CRAZY PART:

So you may now ask -- "You just described computing derived edges, but where does PageRank come in?" Check this -- every particle can have a VertexComputer and when its done with its cycle (its derivation) it calls the VertexProgram.execute(Vertex vertex, GraphMemory memory) on its current vertex! More specifically, CoauthorParticle can be provided a PageRankProgram to execute when it gets to the end of its derivation (when it reaches the inferred coauthor). Thus, this model generalizes such that ANY VertexProgram you write can be run by a swarm where the swarm is simply providing the "BSP waves" discussed in the beginning. Moreover -- and this blows my mind --- you can calculate PageRank, SpreadingActivation, AllPairsShortestPath, etc. etc. in parallel all within the same GraphComputer computation:

- the engine that executes the VertexProgram threads.

…You have different "frequencies" (vertex programs) at different "phases" (particles/BSP-waves) running through the graph all at once. Access the graph once and compute numerous algorithms on that same disk/memory access. Blood drips from ears… Can these "frequencies" interact? modulate one another? Is there something like Fourier transforms here? Yes!

Extra reading:


Trust me -- it will be slick...
Marko.

Antony Stubbs

unread,
Jul 26, 2013, 4:45:57 PM7/26/13
to gremli...@googlegroups.com
The silence is deafening. Anyone else as lost as I am?

Viktoras Veitas

unread,
Aug 26, 2014, 4:49:21 PM8/26/14
to gremli...@googlegroups.com
Hi,

On Thu, Jul 4, 2013 at 4:43 AM, Marko Rodriguez <okram...@gmail.com> wrote:
[...]
However, when you think of a "particle" as being the thread of computation moving from vertex-to-vertex, a new pictures emerges. With swarm computing, you have a particle with the same method.

…where the role of the vertex is to simply loop through all the particles it currently holds and execute the particles' execute() methods. Thus, the swarm model is still vertex-centric (as can be seen by its implementation): 


In this model, particles move from vertex-to-vertex and aggregate on vertices (in fact, they can get reference to each other and can have interactions -- though we will talk about that at a later date -- more psychedelic stuff there). [...]

Just wondering if any of this psychodelic stuff is going to appear in TinkerPop3?

Thanks,
Viktoras

Marko Rodriguez

unread,
Aug 26, 2014, 4:54:57 PM8/26/14
to gremli...@googlegroups.com

Viktoras Veitas

unread,
Aug 26, 2014, 5:28:51 PM8/26/14
to gremli...@googlegroups.com
On Tue, Aug 26, 2014 at 10:54 PM, Marko Rodriguez <okram...@gmail.com> wrote:
 
Yeah, read this and still trying to figure out how to write a custom one... and whether it is worth to bang my head against the wall for that.

What in particular are you interested in swarm computing for?

A few things:
1) Calculating 'maximum coherence' of the network of propositions or facts the way Paul Thagard has proposed, which is basically a fuzzy constraint satisfaction problem, as far as I get;
2) I have read/watched a bit about Probabilistic Soft Logic and was thinking of how their queries can be expressed in terms of graph computations with TinkerPop;
3) It sounded REALLY cool ... ;).

Thanks,
Viktoras


Reply all
Reply to author
Forward
0 new messages