Re: WSD and PageRank

Aitor Soroa

Feb 17, 2016, 3:04:53 AM
to Gianluca Quercini,
Hi Gianluca,

(I'm forwarding the message to the ukb mailing list, as others may find
it useful)

See answers below.

On Tue, Feb 16, 2016 at 03:30:33PM +0100, Gianluca Quercini wrote:
> [...]
> I only have a couple of questions:
> 1) In the approach using personalized PageRank, how do you assign the
> initial weights to the nodes?
> If my understanding is correct, you assign to each node representing a
> word in the context a weight 1/|W|, where |W| is the number of words
> in the context, while the other nodes have a weight 0. Is that
> correct?

The exact initial weight of a node v, PV[v], is calculated as follows:

for each cw in context:
    for each v such that cw->v is in the dictionary:
        PV[v] += normalized_cw_w * e[cw->v] / Sum_{cw->u}(e[cw->u])

- normalized_cw_w = weight(cw) / Sum_{w in context} weight(w)
- e[cw->v] = weight of the word->synset relation (usually 1, but see below)

To calculate e[cw->v] you can use the --dict-weight option; the
frequencies present in the dictionary will then be used.
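
The computation above can be sketched in Python roughly as follows. This
is only an illustration, not UKB's actual code: the function name
initial_pv, the two plain dicts used as inputs, and the example synset
labels are all assumptions made for the example.

```python
from collections import defaultdict


def initial_pv(context, dictionary):
    """Sketch of the initial personalization vector PV.

    context    -- hypothetical dict: context word -> weight(cw)
    dictionary -- hypothetical dict: word -> {synset: e[cw->v]}
    """
    total_weight = sum(context.values())
    pv = defaultdict(float)
    if total_weight == 0:
        return {}
    for cw, weight in context.items():
        edges = dictionary.get(cw, {})
        edge_sum = sum(edges.values())  # Sum_{cw->u}(e[cw->u])
        if edge_sum == 0:
            continue  # word not in the dictionary: contributes nothing
        normalized_cw_w = weight / total_weight
        for v, e in edges.items():
            pv[v] += normalized_cw_w * e / edge_sum
    return dict(pv)


# Two context words with equal weight; "bank" has two senses whose
# dictionary frequencies are 3 and 1 (made-up numbers).
context = {"bank": 1.0, "river": 1.0}
dictionary = {
    "bank": {"bank#n#1": 3.0, "bank#n#2": 1.0},
    "river": {"river#n#1": 1.0},
}
pv = initial_pv(context, dictionary)
```

With all e[cw->v] set to 1, each word's share (1/|W| when context words
have equal weight) is split evenly among its synsets; with dictionary
frequencies, the split is proportional to those frequencies.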

> 2) I'd like to try your implementation that I found on GitHub on the latest version of Wikipedia.
> I found in your Github repository some scripts to convert WordNet to
> the format required by your algorithm, but I did not find any script
> to convert Wikipedia. Is there any?

Unfortunately, creating UKB graphs from Wikipedia is not
straightforward. You can download the graph and dictionary extracted
from English Wikipedia here:

We also have some scripts for extracting the graph from Wikipedia XML
dumps; you will find them attached. There is no proper documentation,
and I'm afraid that I won't be able to offer any support regarding
those.

The idea is to download a Wikipedia XML dump (usually called
enwiki-latest-pages-articles.xml.bz2), cd into the directory, and run
the script. This should create some files like

Once the extraction is over, you can run "" to create the
dictionary and "" for creating the graph relations.

Hope this helps,


Dr. Srinivas Rao

Oct 18, 2016, 5:51:57 AM
to ukblist,
