trying mr_page_rank.py

86 views
Skip to first unread message

djinn

unread,
Jun 15, 2011, 6:05:54 AM6/15/11
to mrjob
I have just built myself a hadoop node. I was trying to run mr_wc.py
which runs superbly well.
Sadly I was trying to run mr_page_rank.py which came to miserable
failure. Trying to comprehend the code, I came to a conclusion it
requires a relation in form of [node_id, node_id]. Which I presented
as plaintext as well as json array per line.

plainttext
nodeid, nodeid

json
[nodeid, nodeid]


I get error at both type of runs.

> /usr/bin/python mr_page_rank.py --step-num=9 --mapper --protocol json --output-protocol json --input-protocol json --iterations 10 --damping-factor 0.85 /tmp/mr_page_rank.ss.20110615.100511.734687/step-8-reducer
writing to /tmp/mr_page_rank.ss.20110615.100511.734687/step-9-mapper
counters: {'Undecodable input': {'ValueError': 100000}}


Please help.

regards

Cpt Caveman

unread,
Jun 17, 2011, 2:33:46 PM6/17/11
to mr...@googlegroups.com
I have not executed this myself but it appears that the input should be: 
nodeId, node 
where node is a dictionary, it has keys such as: score and prev_score.  
The input format is json so everything should be json encoded.  

This is an iterative job, so the input of the mapper (send_score) is the output of the reducer (receive_score).  Investigate what the reducer does to see what you need to put in.  
Reply all
Reply to author
Forward
0 new messages