Hi,
I am working on the following task:
When searching for a query on Wikipedia data, I want the candidate pages to be ranked by their popularity with regard to my query. (Here, “popularity” means the prior probability that a candidate page is the intended one, given my query.)
For example, for the query “Steve”, Steve Jobs is the page with the highest popularity and Steve Nash ranks a little lower.
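To make the ranking I have in mind concrete, here is a minimal sketch. It assumes I already have a list of (anchor text, target page) pairs, and that matching the query against the anchor text exactly (case-insensitively) is good enough. The function names and the toy counts are made up for illustration only; they are not real statistics.

```python
from collections import Counter, defaultdict


def build_counts(pairs):
    """Count how often each anchor text points to each target page.

    `pairs` is an iterable of (anchor_text, target_title) tuples.
    """
    counts = defaultdict(Counter)
    for anchor, target in pairs:
        counts[anchor.strip().lower()][target] += 1
    return counts


def rank_candidates(counts, query):
    """Return [(page, estimated P(page | query)), ...], highest first."""
    targets = counts.get(query.strip().lower(), Counter())
    total = sum(targets.values())
    if total == 0:
        return []
    return [(page, n / total) for page, n in targets.most_common()]


# Toy data, invented for illustration only.
toy_pairs = [("Steve", "Steve Jobs")] * 3 + [("Steve", "Steve Nash")]
ranking = rank_candidates(build_counts(toy_pairs), "steve")
print(ranking)
```

With these toy pairs, Steve Jobs would come out ahead of Steve Nash, which is the behaviour I am after.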
I used to use the “Pageview statistics” data with the Wikipedia API, but it is not efficient enough.
I found that page view data is not available in the Wikipedia dumps, so instead I want to count how many times each page appears in the page text.
Is there a specification or a description of the data in “Page.txt”?
I guess the content between “[[” and “]]” is the page title, but I’m not sure. Also, how should I deal with disambiguation pages?
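Here is a rough sketch of how I would count the links if my guess is right. It assumes the text is standard wikitext, where an internal link looks like [[Target]] or [[Target|displayed text]], optionally with a #Section part after the target. Skipping the File:, Image: and Category: prefixes and dropping titles ending in “(disambiguation)” are my own assumptions; I know some disambiguation pages do not have that suffix, so this would not catch all of them.

```python
import re
from collections import Counter

# [[Target]] or [[Target|displayed text]]; the target may not contain brackets or a pipe.
LINK_RE = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]*))?\]\]")

# Assumption: links with these prefixes are not article links.
SKIP_PREFIXES = ("file:", "image:", "category:")


def extract_links(wikitext):
    """Return a list of (target_title, anchor_text) for internal links."""
    links = []
    for match in LINK_RE.finditer(wikitext):
        # Drop a "#Section" suffix so that only the page title remains.
        target = match.group(1).split("#", 1)[0].strip()
        if not target or target.lower().startswith(SKIP_PREFIXES):
            continue
        # Without a pipe (or with an empty label) the title itself is displayed.
        anchor = (match.group(2) or target).strip()
        links.append((target, anchor))
    return links


def count_targets(wikitext):
    """Count link targets, skipping titles that end in "(disambiguation)"."""
    return Counter(
        target
        for target, _ in extract_links(wikitext)
        if not target.endswith("(disambiguation)")
    )


sample = (
    "[[Steve Jobs]] and [[Steve Nash|Nash]]; see also "
    "[[Steve (disambiguation)]]. [[File:x.png|thumb]]"
)
print(extract_links(sample))
print(count_targets(sample))
```

The (target, anchor) pairs from extract_links are what I would then count per query.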
Thank you.