Broadcast variables in Spargel


m.neuma...@gmail.com

Jun 1, 2014, 9:57:31 AM
to stratosp...@googlegroups.com
Hej,

I'm implementing PageRank using Spargel. The only addition to a normal PageRank is a preprocessing step that computes the size of the node set, using a simple map + reduce step to count. As a result, I have the size value in a DataSet.

I need to hand this over to the Spargel part of the computation, but I'm not 100% sure how to do that. For normal DataSets I would use a broadcast variable.

On a normal map function it works as described in the documentation with:
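
For readers unfamiliar with the counting step, here is a minimal sketch of the map + reduce idea in plain Java. It deliberately uses no Stratosphere classes; the class name `NodeCountSketch` and the method `countNodes` are made up for illustration and are not the code from the actual job.

```java
import java.util.Arrays;
import java.util.List;

// Plain-Java stand-in for the counting pre-step (hypothetical names, no Stratosphere API).
class NodeCountSketch {

    // Map every node to 1, then reduce by summing: the result is the node count.
    static long countNodes(List<Long> nodeIds) {
        return nodeIds.stream()
                .map(id -> 1L)          // map step: emit one per node
                .reduce(0L, Long::sum); // reduce step: add the ones up
    }

    public static void main(String[] args) {
        List<Long> nodes = Arrays.asList(10L, 20L, 30L, 40L);
        System.out.println(countNodes(nodes)); // prints 4
    }
}
```

In the real job the same pattern runs on a DataSet, so the count ends up as a one-element DataSet rather than a local value, which is why it has to be handed over explicitly.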
nodeList.map(new VertexInit()).withBroadcastSet(numNodes, "#Nodes");

If I call the Spargel part as follows, I don't have the option to add .withBroadcastSet.
initNodeList.runOperation(VertexCentricIteration.withPlainEdges(...));

QUESTION:
How can I hand over the BroadcastSet to Spargel, or, if that is not possible, how can I pass the size value I computed in a different way?

I have to present this functionality tomorrow (that's why I'm working on a Sunday) so I hope for a fast answer.


cheers Martin

Stephan Ewen

Jun 1, 2014, 10:24:24 AM
to stratosp...@googlegroups.com
Hi!

You are right, the Spargel API currently lacks a hook to register broadcast variables.

It should not be a big deal to fix that. I can add it to a custom branch or 0.5.1-SNAPSHOT and ping you once it is done.

Greetings,
Stephan

Stephan Ewen

Jun 1, 2014, 10:46:36 AM
to stratosp...@googlegroups.com
Just out of curiosity: Is it possible to do the computation that depends on the broadcast variable outside the Spargel code, as initialization logic?

Stephan

Martin Neumann

Jun 1, 2014, 10:59:12 AM
to stratosp...@googlegroups.com
I can put the computation anywhere; all I need is to find out the size of the data set and somehow hand it over to the Spargel PageRank computation.

Maybe I can do some tricks with storing the value directly on the nodes, but I hesitate to do so since that would mean duplicating the value several billion times.



Stephan Ewen

Jun 1, 2014, 11:04:22 AM
to stratosp...@googlegroups.com
Yes, I agree that is not the best solution.

In the test PageRank example I implemented, I needed the value for the initial ranks (1 / #numvertices). That initialization happens before the vertex-centric part starts.

Do you need it in more places? For the damping factor or the random jump, for example?

Martin Neumann

Jun 1, 2014, 12:13:43 PM
to stratosp...@googlegroups.com
I need it for node initialization and the random jump factor.
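
To make the two uses concrete, here is a rough sketch in plain Java of where the node count enters a textbook PageRank. It is not Spargel code: the class `PageRankSketch`, its methods, and the adjacency-array graph representation are assumptions for illustration, and it assumes every vertex has at least one outgoing edge.

```java
import java.util.Arrays;

// Textbook PageRank on a tiny in-memory graph (hypothetical names, no Spargel API).
class PageRankSketch {

    // Use 1 of the node count: every vertex starts with rank 1 / numVertices.
    static double[] initialRanks(int numVertices) {
        double[] ranks = new double[numVertices];
        Arrays.fill(ranks, 1.0 / numVertices);
        return ranks;
    }

    // One iteration. outEdges[v] holds the targets of vertex v.
    // Assumes every vertex has at least one outgoing edge.
    static double[] step(double[] ranks, int[][] outEdges, double damping) {
        int numVertices = ranks.length;
        double[] next = new double[numVertices];
        // Use 2 of the node count: the random jump term (1 - damping) / numVertices.
        Arrays.fill(next, (1.0 - damping) / numVertices);
        for (int v = 0; v < numVertices; v++) {
            // Each vertex splits its damped rank evenly among its targets.
            double share = damping * ranks[v] / outEdges[v].length;
            for (int target : outEdges[v]) {
                next[target] += share;
            }
        }
        return next;
    }

    public static void main(String[] args) {
        int[][] outEdges = { {1}, {2}, {0} }; // a 3-cycle: 0 -> 1 -> 2 -> 0
        double[] ranks = step(initialRanks(3), outEdges, 0.85);
        System.out.println(Arrays.toString(ranks));
    }
}
```

The first use happens once, before the iteration starts, so it can live outside the vertex-centric part. The random jump term is needed in every superstep, which is why the count has to be available inside the iteration itself.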



Stephan Ewen

Jun 1, 2014, 1:10:19 PM
to stratosp...@googlegroups.com, m.neuma...@gmail.com
I am preparing a patch right now. Hope to have it online soon.