Webgraph


Tie hky

Sep 14, 2014, 9:06:33 AM
to web-data...@googlegroups.com

Hi,

Thanks for sharing the Web graph data.

WDC provides the web graph data in several formats. Could you please let me know which tools are used to convert the data between these formats, and what the format of the original data is?

Because the WebGraph data is significantly smaller, I was able to download it successfully, and I'd like to retrieve some information from it.
I am new to WebGraph. Is it just a graph display tool, or does it also provide functions to retrieve information from the data?


I tried to run java -cp "webgraph-3.4.1.jar" ... to view the data, but I am not sure what the command line should be. Could you please let me know what I should do to view the data? Thanks.

Thanks again.

Robert Meusel

Sep 15, 2014, 2:27:33 AM
to web-data...@googlegroups.com
Hi,

the original format, which was produced by the extraction framework and some Pig scripts, is a simple list of arcs (ID = origin, ID' = target), with the two IDs separated by a tab. This list is accompanied by an index file, in which each ID can be mapped to its original URL. The "tools" are all self-built. To convert the file into the WebGraph format, we used BVGraph.store().
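
For readers following along, the arc-list layout described above can be sketched in plain Java. This is only an illustration under stated assumptions, not the WDC tooling and not the WebGraph API: the class and method names (ArcListSketch, parseArcs, parseIndex) are hypothetical, the sample data is made up, and the column order of the index file (URL first, then ID) is an assumption that should be checked against the real files.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of WebGraph or the WDC tools.
final class ArcListSketch {

    // Parses lines of the form "originId<TAB>targetId" into {origin, target} pairs.
    static List<int[]> parseArcs(String arcText) {
        List<int[]> arcs = new ArrayList<>();
        for (String line : arcText.split("\n")) {
            if (line.trim().isEmpty()) continue; // skip blank lines
            String[] parts = line.split("\t");
            arcs.add(new int[] {
                Integer.parseInt(parts[0].trim()),
                Integer.parseInt(parts[1].trim())
            });
        }
        return arcs;
    }

    // Parses index lines of the form "url<TAB>id" (ASSUMED column order)
    // into a map from ID to URL.
    static Map<Integer, String> parseIndex(String indexText) {
        Map<Integer, String> idToUrl = new HashMap<>();
        for (String line : indexText.split("\n")) {
            if (line.trim().isEmpty()) continue; // skip blank lines
            String[] parts = line.split("\t");
            idToUrl.put(Integer.parseInt(parts[1].trim()), parts[0].trim());
        }
        return idToUrl;
    }

    public static void main(String[] args) {
        // Made-up sample data standing in for the real arc and index files.
        String arcText = "0\t1\n0\t2\n1\t2\n";
        String indexText = "a.example\t0\nb.example\t1\nc.example\t2\n";

        Map<Integer, String> idToUrl = parseIndex(indexText);
        for (int[] arc : parseArcs(arcText)) {
            // Resolve both endpoints of each arc back to their URLs.
            System.out.println(idToUrl.get(arc[0]) + " -> " + idToUrl.get(arc[1]));
        }
    }
}
```

The sketch keeps everything in memory, which is fine for a sample but would not scale to the full graph; that is the gap the compressed BV format is meant to fill.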

WebGraph is a pure calculation tool, built to process large graphs with rather small resources (in comparison to the size of the graph).
You can find more information here: http://webgraph.di.unimi.it/

Hope this helps,
Robert

Tie hky

Sep 15, 2014, 10:02:57 AM
to web-data...@googlegroups.com
Thanks Robert.
Will take a look at the website.

Eugene Nana Opoku

Jun 18, 2015, 10:26:09 AM
to web-data...@googlegroups.com
This discussion is of much interest to me. The webspam-uk-2006 and WDC 2012 datasets both provide the page/URL graph in compressed BV format.
I recently installed WebGraph from https://github.com/lhelwerd/WebGraph. Although it built successfully, I get an error when creating the offset file: the command "java it.unimi.dsi.webgraph.BVGraph -O uk-2006-05-nat" gives "Error: could not find or load main class it.unimi.dsi.webgraph.BVGraph -O uk-2006-05-nat". The MATLAB interface for BVGraph only permits simple operations. My questions are:
1. What other alternatives can be used to view a BV graph?
2. From the command line, how can I retrieve the in-links of a specific node?

Robert Meusel

Jun 19, 2015, 4:03:58 AM
to web-data...@googlegroups.com
Hi Eugene,

Seems to me that you are simply missing the "-cp webgraph.jar" in the command. Might that fix the problem?
To view the graph, or parts of it, the library/tool of Massimo might help you: http://code.google.com/p/py-web-graph/
I know that there is a function outdegree(int node), which gives you the outdegree of a specific node. Unfortunately I have no idea whether this works from the command line, as we mostly use the WebGraph library within our Maven projects. Please note that there is no "indegree" function, so you first have to transpose the graph (reverse the direction of the arcs) and then calculate the outdegree on the transposed graph.
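
The transpose-then-outdegree idea can be sketched on a tiny in-memory arc list. This is a plain-Java illustration of the principle only: it does not use the WebGraph library, the class and method names (TransposeSketch, successors, transpose) are hypothetical, and the three arcs are made-up sample data.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; it does not call into the WebGraph library.
final class TransposeSketch {

    // Builds successor (out-link) lists for nodes 0..numNodes-1 from {origin, target} arcs.
    static List<List<Integer>> successors(int numNodes, int[][] arcs) {
        List<List<Integer>> succ = new ArrayList<>();
        for (int i = 0; i < numNodes; i++) {
            succ.add(new ArrayList<>());
        }
        for (int[] arc : arcs) {
            succ.get(arc[0]).add(arc[1]);
        }
        return succ;
    }

    // Transposes the graph by swapping origin and target of every arc.
    static int[][] transpose(int[][] arcs) {
        int[][] reversed = new int[arcs.length][];
        for (int i = 0; i < arcs.length; i++) {
            reversed[i] = new int[] { arcs[i][1], arcs[i][0] };
        }
        return reversed;
    }

    public static void main(String[] args) {
        // Made-up sample graph: 0 -> 1, 0 -> 2, 1 -> 2.
        int[][] arcs = { {0, 1}, {0, 2}, {1, 2} };

        // The successors of a node in the transposed graph are its in-links
        // in the original graph, so their count is the indegree.
        List<List<Integer>> inLinks = successors(3, transpose(arcs));
        int node = 2;
        System.out.println("in-links of " + node + ": " + inLinks.get(node));
        System.out.println("indegree of " + node + ": " + inLinks.get(node).size());
    }
}
```

On a graph of real size the transposition would need to be done by the library itself rather than in memory like this; the sketch only shows why the outdegree of the transposed graph equals the indegree of the original.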

Hope this helps,
Robert

Eugene Nana Opoku

Jun 23, 2015, 6:20:02 AM
to web-data...@googlegroups.com
I set up WebGraph as a Maven project in Eclipse as you directed. I ran BVGraph.java and passed my .graph file at run time. It almost loaded after I increased the JVM heap to -Xmx1024m, but it failed because I allocated half of my memory. I'm new to WebGraph, so kindly bear with me. I wish I could load the graph with the offset file, but the uk-2006.graph data, which is my smallest dataset, doesn't have an offset file. I read that I can use the lazyIterator method to get quick access to the graph file without decompressing it in full. My questions are:

1. How do I use lazyIterator to generate the out-links?
2. Where will the output be written?

Robert Meusel

Jul 2, 2015, 6:50:34 PM
to web-data...@googlegroups.com
Hi,

As I am not the author of the library, I do not really have deep knowledge of it. Maybe you could read through the documentation at http://webgraph.di.unimi.it/ or ask Sebastiano Vigna directly.

Sorry for the late reply, I am currently travelling.

Robert