Download database dump?


adamros...@gmail.com

Apr 27, 2015, 11:52:07 PM
to wiki...@googlegroups.com
Is there anywhere one can download a dump of the database to perform offline queries against it?  Or are there plans to make such a dump available?

emer...@gmail.com

Jan 28, 2016, 12:31:56 AM
to WikiRank, adamros...@gmail.com
I too am looking for a data dump for a project. Please let me know if you find one!

Sebastiano Vigna

Jan 28, 2016, 2:45:25 AM
to emer...@gmail.com, WikiRank, adamros...@gmail.com

> On 28 Jan 2016, at 06:31, emer...@gmail.com wrote:
>
> I too am looking for a data dump for a project. Please let me know if you find one!
>

1) Download

http://vigna.di.unimi.it/wikirank.tar.bz2

2) Untar it, go into the wikirank directory and set your CLASSPATH to include all the jars in the jars directory

3)

java -server -Dorg.eclipse.jetty.util.log.class=org.eclipse.jetty.util.log.StdErrLog -Dorg.eclipse.jetty.LEVEL=INFO WikiRankServer -R . -h enwiki-h.ranks -p enwiki-pr-3.ranks -i enwiki-indegree.ranks -v enwiki-pv.ranks -P8080 -t enwiki.fcl enwiki-instanceof enwiki-genre enwiki-occupation enwiki-citizenship enwiki-gender enwiki-language enwiki-cast enwiki-director enwiki-birthplace enwiki-country

-P8080 is the port :)

Ciao,

seba

a.fenst...@gmail.com

Feb 17, 2016, 4:17:33 AM
to WikiRank, emer...@gmail.com, adamros...@gmail.com, vi...@di.unimi.it
Hi Sebastiano,

I really like the idea and your work! Is there any chance you could provide CSV files of the Harmonic Centrality, PageRank, Page views, and Indegree rankings?

Kind regards,
Andreas

Sebastiano Vigna

Feb 17, 2016, 4:37:06 AM
to a.fenst...@gmail.com, WikiRank, emer...@gmail.com, adamros...@gmail.com

> On 17 Feb 2016, at 10:17, a.fenst...@gmail.com wrote:
>
> Hi Sebastiano,
>
> I really like the idea and your work! Is there any chance you could provide CSV files of the Harmonic Centrality, PageRank, Page views, and Indegree rankings?
>

The .ranks files are just Java binary files. You can dump them to ASCII in several different ways. One possibility is to download the LAW library (http://law.di.unimi.it/software/download/law-2.3-bin.tar.gz) and do something like

java it.unimi.dsi.law.io.tool.DataInput2Text -t double something.ranks something.txt
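If you would rather not pull in the LAW library, a minimal Java sketch like the following reads a .ranks file directly and writes it out as CSV. It assumes, without confirmation from this thread, that a .ranks file is a flat, headerless sequence of 8-byte big-endian doubles as written by java.io.DataOutput (which is what the "-t double" option of DataInput2Text suggests). The class name RanksToCsv and the "index,score" column layout are hypothetical.

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class RanksToCsv {
    // ASSUMPTION: a .ranks file is a flat sequence of big-endian doubles
    // written with java.io.DataOutput (8 bytes per value, no header).
    static double[] readDoubles(Path path) throws IOException {
        long count = Files.size(path) / Double.BYTES;
        double[] values = new double[(int) count];
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(Files.newInputStream(path)))) {
            for (int i = 0; i < values.length; i++) {
                values[i] = in.readDouble();
            }
        }
        return values;
    }

    // Writes one "index,score" row per value; index is just the position in the file.
    static void writeCsv(double[] values, Path out) throws IOException {
        try (PrintWriter w = new PrintWriter(Files.newBufferedWriter(out))) {
            w.println("index,score");
            for (int i = 0; i < values.length; i++) {
                w.println(i + "," + values[i]);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java RanksToCsv input.ranks output.csv
        writeCsv(readDoubles(Paths.get(args[0])), Paths.get(args[1]));
    }
}
```

For example, java RanksToCsv enwiki-pr-3.ranks enwiki-pr-3.csv. The index column is only the position of each value in the file; mapping it to a Wikipedia page title needs a separate list of page names, which this sketch does not provide.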

Ciao,

seba

a.fenst...@gmail.com

Feb 17, 2016, 12:18:27 PM
to WikiRank, a.fenst...@gmail.com, emer...@gmail.com, adamros...@gmail.com, vi...@di.unimi.it
Thanks Sebastiano!

Andreas

mao.che...@gmail.com

Jul 13, 2016, 10:33:34 PM
to WikiRank, a.fenst...@gmail.com, emer...@gmail.com, adamros...@gmail.com, vi...@di.unimi.it
Hello Sebastiano,

I followed the steps and converted the files. The files contain the ranks, but how do I know which Wikipedia page each rank corresponds to? Also, I followed the data dump steps and got the server running locally. However, the homepage on port 8080 only shows documentation. I have looked around for a while and am still not sure how to make a query to get, for example, the top 100 pages, or the top n pages for a certain category.

Looking forward to your reply. 

Thank you in advance for your time.
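For the "top 100 pages" part of the question, one offline approach (a sketch, not a WikiRank server query, whose query syntax the thread does not show) is to load a ranking's scores into a double[] in file order, for example from the DataInput2Text output, and select the n largest with a bounded min-heap. It assumes that a higher score means a better rank; the class TopN and method topN are hypothetical names. The result is a list of positions in the file, so turning them into page titles still needs a separate id-to-title list, which the thread does not identify.

```java
import java.util.Arrays;
import java.util.PriorityQueue;

public class TopN {
    // Returns the indices of the n largest scores, highest score first.
    // ASSUMPTION: a higher score means a better rank.
    // Runs in O(m log n) for m scores, keeping at most n indices in the heap.
    static int[] topN(double[] scores, int n) {
        int k = Math.max(0, Math.min(n, scores.length));
        // Min-heap ordered by score: the root is the weakest of the current top k.
        PriorityQueue<Integer> heap =
            new PriorityQueue<Integer>((a, b) -> Double.compare(scores[a], scores[b]));
        for (int i = 0; i < scores.length && k > 0; i++) {
            if (heap.size() < k) {
                heap.add(i);
            } else if (scores[i] > scores[heap.peek()]) {
                heap.poll();   // drop the current weakest entry
                heap.add(i);
            }
        }
        int[] result = new int[heap.size()];
        for (int j = result.length - 1; j >= 0; j--) {
            result[j] = heap.poll(); // poll() yields the smallest remaining score
        }
        return result;
    }

    public static void main(String[] args) {
        double[] scores = {0.1, 0.7, 0.3, 0.9, 0.5};
        System.out.println(Arrays.toString(topN(scores, 3))); // [3, 1, 4]
    }
}
```

A bounded heap avoids sorting the whole ranking, which matters when the file holds one score per Wikipedia page and you only want the first hundred.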