How to implement the “page popularity” with JWPL?

Aug 1, 2017, 11:06:22 PM
to jwpl-users

I am working on the following task:
When running a query against Wikipedia data, I want the candidate pages to be ranked by their popularity with respect to my query. (Here, “popularity” means the prior probability of a candidate page appearing given my query.)

For example, for the query “Steve”, Steve Jobs is the page with the highest popularity and Steve Nash ranks a little lower.

I used to get the “Pageview statistics” data through the Wikipedia API, but that is not efficient enough.
I found that the page view data is not available in the Wikipedia dumps, so I want to count how often each page appears in the page text instead.

I wonder whether there is a specification or a description of the data in “Page.txt”.
I guess the content between “[[” and “]]” is the page title, but I’m not sure. Also, how should I deal with disambiguation pages?
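
For reference, in wikitext an internal link is written as [[Target]] or [[Target|displayed text]], so the part before the first "|" is the target page title and the part after it, if present, is the anchor text. A minimal sketch of the counting idea follows, assuming the raw wikitext of each page is already available as a Python string (how it is read from “Page.txt” is not shown and not assumed here). The function names, the skipped namespace prefixes, and the exact-match lookup on the lowercased anchor are illustrative choices, not part of JWPL.

```python
import re
from collections import Counter, defaultdict

# Matches [[Target]] and [[Target|anchor]]. Links containing nested
# brackets (e.g. image captions with links inside) are not handled.
LINK_RE = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]*))?\]\]")

# Assumption: only these namespace prefixes are filtered out.
SKIPPED_PREFIXES = {"file", "image", "category"}


def normalize_title(raw):
    """Drop a #section suffix, treat '_' as space, upper-case the first letter."""
    title = raw.split("#", 1)[0].replace("_", " ").strip()
    return title[:1].upper() + title[1:]


def extract_links(wikitext):
    """Yield (anchor, target) pairs for the internal links in wikitext."""
    for match in LINK_RE.finditer(wikitext):
        target = normalize_title(match.group(1))
        prefix, sep, _ = target.partition(":")
        if not target or (sep and prefix.strip().lower() in SKIPPED_PREFIXES):
            continue
        # With no (or an empty) display text, the title itself is the anchor.
        anchor = (match.group(2) or target).strip()
        yield anchor, target


def count_link_targets(texts):
    """Map each lowercased anchor to a Counter of the pages it links to."""
    counts = defaultdict(Counter)
    for text in texts:
        for anchor, target in extract_links(text):
            counts[anchor.lower()][target] += 1
    return counts


def rank_candidates(counts, query):
    """Return (page, estimated prior) pairs, most frequent target first."""
    targets = counts.get(query.lower(), Counter())
    total = sum(targets.values())
    return [(page, n / total) for page, n in targets.most_common()]
```

The prior for a page is then simply the share of links with that anchor text that point to it. Matching the query against the whole anchor is the simplest option; matching on individual anchor words would be a separate design decision.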

Thank you.

Torsten Zesch

Aug 2, 2017, 5:44:56 AM
Your approach sounds good.
If a mention of "Steve" links to the disambiguation page instead of a specific page, I think there is little that can be done.
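
One way to live with that limitation is to leave such links out of the statistics and renormalize over the remaining targets. The sketch below assumes the per-query link counts are already available as a mapping from page title to count. Both function names are hypothetical, and the title-suffix check is only a heuristic, since not every disambiguation page carries that suffix; if the data source flags disambiguation pages explicitly, that flag is the better predicate.

```python
from collections import Counter


def has_disambiguation_suffix(title):
    """Heuristic only: many, but not all, disambiguation titles end this way."""
    return title.endswith("(disambiguation)")


def priors_without_disambiguation(target_counts, is_disambiguation):
    """Drop disambiguation targets and renormalize the remaining counts."""
    kept = Counter({
        title: n
        for title, n in target_counts.items()
        if not is_disambiguation(title)
    })
    total = sum(kept.values())
    if total == 0:
        return {}
    return {title: n / total for title, n in kept.most_common()}


# Usage with made-up counts for the anchor text "Steve":
example = {"Steve Jobs": 6, "Steve Nash": 2, "Steve (disambiguation)": 2}
priors = priors_without_disambiguation(example, has_disambiguation_suffix)
```

The links that pointed at the disambiguation page are simply ignored here; they carry no information about which specific page was meant.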

