A question about the translation model

len_yu

Jun 26, 2008, 9:37:13 AM
to cs410pku
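Something came up during the second half of that lecture and I missed it.

Question:
I can't type some of the symbols, so I will write them out in words.
The translation model is as follows:
P(Q|D,R) = prod_i ( sum_j ( P(qi|wj) * P(wj|D) ) )

For a given query, documents are ranked by the probability computed above.
My question is: when computing P(Q|D,R), is the probability P(qi|wj) in the formula the same for every document?
From the formula it seems so,
but if it is, that doesn't make sense to me, because then there would be no need to compute it.

I don't understand this.
Could anyone help explain? Thanks.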

吉阳生

Jun 26, 2008, 9:57:32 AM
to cs41...@googlegroups.com
P(Q|D,R) = prod_i ( sum_j ( P(qi|wj) * P(wj|D) ) )

My understanding is as follows:
P(Q|D,R) = prod_i [ P(Qi|D,R) ]
         = prod_i [ sum_j [ P(Qi|Wj,D,R) P(Wj|D,R) ] ]
         = prod_i [ sum_j [ P(Qi|Wj) P(Wj|D,R) ] ]   if Qi is conditionally independent of D and R given Wj
So I think the formula makes sense.

END.
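
For reference, the derivation in the post above can be typeset in LaTeX. This only restates the same three steps under the same assumptions (query words are generated independently, and q_i is conditionally independent of D and R given w_j). The symbol V for the vocabulary is a label introduced here for clarity and does not appear in the thread.

```latex
\begin{align*}
P(Q \mid D, R) &= \prod_{i} P(q_i \mid D, R) \\
               &= \prod_{i} \sum_{w_j \in V} P(q_i \mid w_j, D, R)\, P(w_j \mid D, R) \\
               &= \prod_{i} \sum_{w_j \in V} P(q_i \mid w_j)\, P(w_j \mid D, R)
\end{align*}
```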

len_yu

Jun 26, 2008, 10:23:17 AM
to cs410pku
Thanks.
My understanding is this:
given a query, we compute P(Q|D,R) for every document in the collection (call it P0),
and then sort the documents by their P0 values.

To compute P0, we first need P(qi|wj).
My question is whether P(qi|wj) is the same for every document.
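
The ranking procedure described in this post can be sketched in Python. This is a minimal illustration, not the course's reference implementation: the names `trans`, `doc_model`, `score` and `rank`, the partial translation table, and the two toy documents are all made up for this sketch, and P(w|D) is assumed to be the unsmoothed relative frequency of w in D.

```python
from collections import Counter

# Hypothetical (partial) word-to-word translation table: trans[q][w] = P(q | w).
# The values are invented for illustration only.
trans = {
    "car":  {"car": 0.7, "auto": 0.5, "engine": 0.1},
    "fast": {"fast": 0.8, "quick": 0.6},
}

def doc_model(tokens):
    """Unsmoothed estimate of P(w | D): relative frequency of w in D."""
    counts = Counter(tokens)
    total = len(tokens)
    return {w: c / total for w, c in counts.items()}

def score(query, tokens):
    """P0 = prod_i sum_j P(q_i | w_j) * P(w_j | D)."""
    p_w_d = doc_model(tokens)
    p0 = 1.0
    for q in query:
        row = trans.get(q, {})
        # Words missing from the table row are treated as P(q | w) = 0.
        p0 *= sum(row.get(w, 0.0) * p for w, p in p_w_d.items())
    return p0

def rank(query, docs):
    """Return document ids sorted by decreasing P0."""
    return sorted(docs, key=lambda d: score(query, docs[d]), reverse=True)

docs = {
    "d1": ["auto", "quick", "auto", "engine"],
    "d2": ["car", "slow", "road", "map"],
}
print(rank(["car", "fast"], docs))
```

In this toy run d1 ranks first even though it contains neither query word literally, because "auto" and "quick" translate to them. d2 scores zero because no word in it translates to "fast" and the sketch applies no smoothing.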


吉阳生

Jun 26, 2008, 10:29:30 AM
to cs41...@googlegroups.com
Sorry for the ambiguity in my last email.
I meant that when computing prod_i [ sum_j [ P(Qi|Wj) P(Wj|D,R) ] ],
Wj should belong to document D, so P(Qi|Wj) should be different for different documents.

Thanks.

Zhai

Jun 27, 2008, 5:41:27 AM
to cs410pku
The probability you asked about is the word-word translation
probability, which encodes our knowledge about relatedness of words,
so it's document-independent, thus the same set of probability values
(i.e., { p(w|w')} for all words w and w') would be used to compute the
score for every document.

The other term p(wj|D,R) would distinguish different documents.
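
A small Python sketch may make this distinction concrete. It assumes a made-up translation table `TRANS`, a made-up vocabulary `VOCAB`, and two made-up document models; none of the numbers come from the course. The table is defined once and used unchanged for every document, while only the p(w|D) argument varies. The sketch also compares summing over the whole vocabulary with summing only over words that occur in D: when p(w|D) is an unsmoothed estimate, words outside D contribute zero, so the two sums agree without the table itself depending on D.

```python
import math

# Hypothetical document-independent table: TRANS[(q, w)] = p(q | w).
TRANS = {
    ("car", "car"): 0.7, ("car", "auto"): 0.5,
    ("fast", "fast"): 0.8, ("fast", "quick"): 0.6,
}
VOCAB = ["car", "auto", "fast", "quick", "road"]

def score(query, p_w_d, words):
    """prod_i sum_{w in words} p(q_i | w) * p(w | D), using the shared TRANS."""
    return math.prod(
        sum(TRANS.get((q, w), 0.0) * p_w_d.get(w, 0.0) for w in words)
        for q in query
    )

# Two made-up unsmoothed document models p(w | D); these are what differ.
p_d1 = {"auto": 0.5, "quick": 0.5}
p_d2 = {"car": 0.25, "fast": 0.25, "road": 0.5}

query = ["car", "fast"]
for name, p_w_d in [("d1", p_d1), ("d2", p_d2)]:
    full = score(query, p_w_d, VOCAB)          # sum over the whole vocabulary
    in_doc = score(query, p_w_d, list(p_w_d))  # sum over words in D only
    print(name, full, in_doc)
```

With a smoothed p(w|D), words outside D would receive nonzero probability and the two sums would no longer coincide; the translation table would still be the same for every document.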

jxufe

Jun 28, 2008, 12:19:37 AM
to cs410pku
Thanks, everyone, for the explanations!

Hongfei Yan

Jun 28, 2008, 3:53:34 AM
to cs41...@googlegroups.com
You may also check page 2 of the paper below for a detailed explanation.

http://net.pku.edu.cn/~course/cs410/reading/p105-xu.pdf
[Xu et al. 01] J. Xu, R. Weischedel, and C. Nguyen. Evaluating a probabilistic model for cross-lingual information retrieval. In Proceedings of the ACM-SIGIR 2001, pages 105-110.

吉阳生

Jun 28, 2008, 12:43:33 PM
to cs41...@googlegroups.com
I see. I made a mistake.
Thank you.
