How to implement the “page popularity” with JWPL?


丁磊

Aug 1, 2017, 11:06:22 PM
to jwpl-users
Hi,

I am working on such a task:
When I search Wikipedia data with a query, I want the candidate pages to be ranked by their popularity with respect to that query. (Here, “popularity” means the prior probability that a candidate page appears given my query.)

For example, for the query “Steve”, Steve Jobs is the page with the highest popularity, and Steve Nash ranks a little lower.

I used to get the “Pageview statistics” data through the Wikipedia API, but that is not efficient enough.
I found that page view data is not available in the Wikipedia dumps, so I want to count how often each page appears in the page text instead.

Is there a specification or a description of the data in “Page.txt”?
I guess the content between “[[” and “]]” is the page title, but I’m not sure. Also, how should I deal with disambiguation pages?

Thank you.

Torsten Zesch

Aug 2, 2017, 5:44:56 AM
to jw...@googlegroups.com
Your approach sounds good.
If a mention of "Steve" links to the disambiguation page instead of a specific page, I think there is little that can be done.
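
For illustration, the counting idea could be sketched roughly as below. This is not JWPL code and makes no claim about the layout of “Page.txt”; it assumes the raw wikitext of each page is already available as a Python string. It relies on the standard MediaWiki link syntax, where [[Target]] links to a page and [[Target|anchor text]] links to it under a different label. The function names, the crude namespace filter, and the "(disambiguation)" suffix check are all assumptions made for this sketch. Links to disambiguation pages are simply dropped, since they cannot be resolved to a specific page.

```python
import re
from collections import Counter, defaultdict

# Matches [[Target]], [[Target|anchor]] and [[Target#Section|anchor]].
# Group 1 is the link target (page title), group 2 the optional anchor text.
WIKILINK = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|([^\[\]]*))?\]\]")


def count_links(texts):
    """Count, for every anchor text, how often it links to each target page."""
    counts = defaultdict(Counter)
    for text in texts:
        for match in WIKILINK.finditer(text):
            target = match.group(1).strip()
            # With no explicit anchor, the title itself is the visible text.
            anchor = (match.group(2) or target).strip()
            if not target or not anchor:
                continue
            # Assumption: a colon marks a namespace link (File:, Category:, ...).
            # This crude filter also drops article titles that contain a colon.
            if ":" in target:
                continue
            # Assumption: disambiguation pages carry this title suffix.
            if target.endswith("(disambiguation)"):
                continue
            counts[anchor.lower()][target] += 1
    return counts


def popularity(counts, query):
    """Return (page, P(page | query)) pairs, most probable first."""
    targets = counts.get(query.lower(), Counter())
    total = sum(targets.values())
    ranked = [(page, n / total) for page, n in targets.items()]
    return sorted(ranked, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    sample = [
        "[[Steve Jobs|Steve]] co-founded Apple. Later [[Steve Jobs|Steve]] returned.",
        "[[Steve Nash|Steve]] played basketball. See [[Steve (disambiguation)|Steve]].",
    ]
    for page, prob in popularity(count_links(sample), "Steve"):
        print(page, round(prob, 3))
```

Counting over anchor texts rather than bare titles is what ties the estimate to the query: the same page can be linked under many different labels, and each label gets its own distribution over target pages.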

-Torsten 
