Hi,
On Thu, Nov 14, 2013 at 03:33:14AM -0800,
mor...@gmail.com wrote:
>
> Thank you for answering.
> I have another question about input words. In the cited paper
> (Personalizing PageRank for WordSense Disambiguation) you talking about
> probability mass, but i could not find any references in the README.txt of
> the ukb tool.
> How can I handle with it? Is it about the "control" field or weight "field"?
>
Your question is quite general, but I'll try to answer it as well as I can :-)
When we talk about distributions, we almost always refer to the value of
the teleport vector v in equation (1) in the paper. The vector is
initialized using the following information:
- weight of each context word
- weight of links between words and senses
Each context word "cw" contributes to the v vector using this equation:
  For each sense s of cw:
    v[s] += normalized_cw_w * e[cw->s] / Sum_{cw->b}(e[cw->b])
Where normalized_cw_w is the normalized weight of the context word (its
weight divided by the sum of all context weights), "e[cw->s]" is the
weight of the link between context word cw and synset s, and
"Sum_{cw->b}(e[cw->b])" is the sum of all link weights for context word
cw.
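
As a rough illustration, the update above can be sketched in a few lines of
Python. This is not ukb code: the function name, the data layout (a list of
(word, weight) pairs plus a word -> {sense: link weight} mapping) and the
sense identifiers are all made up for the example.

```python
from collections import defaultdict


def teleport_vector(context, dictionary):
    """Build the teleport vector v from weighted context words.

    context:    list of (word, weight) pairs (hypothetical layout)
    dictionary: word -> {sense: link weight} (hypothetical layout)
    """
    total_cw_weight = sum(weight for _, weight in context)
    v = defaultdict(float)
    for cw, weight in context:
        # Weight of the context word divided by the sum of all context weights.
        normalized_cw_w = weight / total_cw_weight
        links = dictionary[cw]
        # Sum of all link weights leaving this context word.
        link_sum = sum(links.values())
        for s, e in links.items():
            v[s] += normalized_cw_w * e / link_sum
    return dict(v)


# Toy input: "bank" counts twice as much as "money", and the first
# sense of "bank" has three times the link weight of the second.
context = [("bank", 2.0), ("money", 1.0)]
dictionary = {
    "bank": {"bank#1": 3.0, "bank#2": 1.0},
    "money": {"money#1": 1.0},
}
v = teleport_vector(context, dictionary)
```

With this toy input, "bank" holds 2/3 of the total mass and splits it 3:1
between its two senses, while "money" puts its 1/3 on its single sense, so
the entries of v add up to 1.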
So the way to handle it is to a) change the weights of the context words
(the weight field), and b) change the weights between a word and its senses
(via the dictionary).
Regarding ukb options:
- If option --ctx_noweight is set, the weight of every context word is 1.
If not, the weight is taken from the "weight" field (1 if absent).
- If option --dict_weight is _not_ set, the link weights from context
words to senses are all 1.
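
The effect of the two options can be summarized with a small Python sketch.
Again, this is only an illustration and not how ukb is written: the helper
names and boolean parameters are hypothetical, and I am assuming here that
when --dict_weight is set the link weight is the value stored in the
dictionary.

```python
def effective_context_weight(field_weight, ctx_noweight):
    """Weight used for a context word.

    field_weight: value of the "weight" field, or None if it is absent
    ctx_noweight: True if --ctx_noweight is set
    """
    if ctx_noweight:
        return 1.0
    return 1.0 if field_weight is None else field_weight


def effective_link_weight(dictionary_weight, dict_weight):
    """Weight used for a word -> sense link.

    dictionary_weight: weight stored in the dictionary for this link
    dict_weight:       True if --dict_weight is set
    """
    if not dict_weight:
        return 1.0
    # Assumption: with the option set, the dictionary value is used as is.
    return dictionary_weight
```

So with --ctx_noweight set and --dict_weight not set, every context word
spreads the same amount of mass evenly over its senses.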
I know the explanation is a bit dense, but nevertheless I hope it helps,
aitor