Weird, it is working for me. Can you try again?
The topic is a label that identifies which graph each edge belongs to; PageRank is run separately for each topic. If you're only running on a single graph, you can just use a constant such as 0 for the topic.
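If your edge file has no topic column at all, one option is to load it without that column and attach the constant yourself, in place of the LOAD statement in the script below. This is a minimal sketch; the input path 'single_graph_edges' and the alias raw_edges are hypothetical.

```pig
-- Hypothetical edge file with only (source, dest, weight) columns
raw_edges = LOAD 'single_graph_edges' AS (source:INT, dest:INT, weight:DOUBLE);
-- Attach the constant topic 0 so the relation has the schema the script expects
topic_edges = FOREACH raw_edges GENERATE 0 AS topic, source, dest, weight;
```

The rest of the script can then stay unchanged, since it only reads the topic, source, dest and weight fields of topic_edges.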
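-- Make the DataFu UDFs available (the jar path is a placeholder; point it at your copy)
REGISTER /path/to/datafu.jar;
-- PageRank UDF with dangling-node handling enabled
define PageRank datafu.pig.linkanalysis.PageRank('dangling_nodes','true');
-- One edge per line; the default loader (PigStorage) expects tab-separated fields
topic_edges = LOAD 'input_edges' as (topic:INT,source:INT,dest:INT,weight:DOUBLE);

-- One tuple per (topic, source) holding a bag of that node's outgoing (dest, weight) pairs
topic_edges_grouped = GROUP topic_edges by (topic, source);
topic_edges_grouped = FOREACH topic_edges_grouped GENERATE
    group.topic as topic,
    group.source as source,
    topic_edges.(dest,weight) as edges;
-- One tuple per topic holding the adjacency lists of that whole graph
topic_edges_grouped_by_topic = GROUP topic_edges_grouped BY topic;
-- Run PageRank once per topic and flatten the resulting (source, rank) pairs
topic_ranks = FOREACH topic_edges_grouped_by_topic GENERATE
    group as topic,
    FLATTEN(PageRank(topic_edges_grouped.(source,edges))) as (source,rank);
topic_ranks = FOREACH topic_ranks GENERATE
    topic, source, rank;

-- Write the results (the output path is illustrative)
STORE topic_ranks INTO 'output_ranks';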
Sample input:
0 2 3 1.0
0 3 2 1.0
0 4 1 1.0
0 4 2 1.0
0 5 4 1.0
0 5 2 1.0
0 5 6 1.0
0 6 5 1.0
0 6 2 1.0
0 100 2 1.0
0 100 5 1.0
0 101 2 1.0
0 101 5 1.0
0 102 2 1.0
0 102 5 1.0
0 103 5 1.0
0 104 5 1.0
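Note that the topic is 0 since there is just a single graph. Nodes A-F were given IDs 1-6, and the remaining nodes were given IDs starting at 100.
The Pig script first groups the edges by (topic, source), so that each node carries a bag of its outgoing (dest, weight) pairs, and then groups those tuples by topic, so that the PageRank UDF receives one bag of adjacency lists per graph.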
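To see that intermediate shape for yourself, you can inspect the grouped relation before PageRank runs. This is a sketch that assumes the aliases from the script above are in scope; node5_edges is a hypothetical alias introduced only for this check.

```pig
-- Print the schema of the per-source adjacency lists
DESCRIBE topic_edges_grouped;
-- Keep only node 5's adjacency list and print it
node5_edges = FILTER topic_edges_grouped BY source == 5;
DUMP node5_edges;
```

With the sample input, the DUMP should print a single tuple for topic 0 and source 5 whose bag holds node 5's three outgoing edges (to 4, 2 and 6), in no guaranteed order.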
Sample output:
0 104 0.016169477
0 1 0.03278149
0 103 0.016169477
0 5 0.08088569
0 100 0.016169477
0 102 0.016169477
0 6 0.03908709
0 3 0.34291038
0 2 0.38440096
0 4 0.03908709
0 101 0.016169477
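As a rough sanity check, the ranks in the sample output above add up to approximately 1.0, so the per-topic totals of your own run are worth comparing against that. A minimal sketch, assuming the topic_ranks relation from the script is in scope (ranks_by_topic and rank_totals are hypothetical aliases):

```pig
-- Group the ranks by topic and add them up (sanity check only)
ranks_by_topic = GROUP topic_ranks BY topic;
rank_totals = FOREACH ranks_by_topic GENERATE
    group AS topic,
    SUM(topic_ranks.rank) AS total_rank;
DUMP rank_totals;
```

Small deviations from 1.0 are expected from floating-point rounding; a total far from 1.0 would suggest the input was not parsed as intended (for example, fields not tab-separated).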