Unit 5 - Machine Learning Bayes' Theorem problem


Richard Neilsen

Oct 30, 2011, 1:27:27 AM
to aiclass-...@googlegroups.com
Hey folks, I'm confused by a question in the quizzes for Unit 5 - Machine Learning. If anybody can help with this it would be great.


So for the straight application of Bayes' Theorem, we use

P(spam | "secret","is","secret")   =   P("secret","is","secret" | spam) * P(spam)  /  P("secret","is","secret")

In the explanation, he uses total probability for the denominator, and calculates this:

P("secret","is","secret")   
=   P("secret","is","secret" | spam) * P(spam)  +  P("secret","is","secret" | ham) * P(ham)
=   P("secret" | spam)^2 * P("is" | spam) * P(spam)   +   P("secret" | ham)^2 * P("is" | ham) * P(ham)
=   3/9 * 3/9 * 1/9 * 3/8   +   1/15 * 1/15 * 1/15 * 5/8
=   1 / 216  +  1 / 5400
=   13 / 2700
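The arithmetic above can be checked with exact fractions. This is only a sketch that reuses the numbers quoted in this post; the helper name `joint` and the dictionary layout are made up for illustration, and the final posterior is simply what those numbers imply when plugged into Bayes' Theorem.

```python
from fractions import Fraction as F

# Values quoted in the post (priors and per-class word frequencies).
p_spam, p_ham = F(3, 8), F(5, 8)
p_word_given_spam = {"secret": F(3, 9), "is": F(1, 9)}
p_word_given_ham = {"secret": F(1, 15), "is": F(1, 15)}

message = ["secret", "is", "secret"]


def joint(words, likelihoods, prior):
    """P(words | class) * P(class), treating words as independent given the class."""
    p = prior
    for word in words:
        p *= likelihoods[word]
    return p


spam_term = joint(message, p_word_given_spam, p_spam)  # 1/216
ham_term = joint(message, p_word_given_ham, p_ham)     # 1/5400
evidence = spam_term + ham_term                        # 13/2700
posterior_spam = spam_term / evidence                  # Bayes' Theorem

print(spam_term, ham_term, evidence, posterior_spam)
```

With these inputs the denominator comes out to 13/2700, matching the explanation, and the posterior works out to 25/26.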

However, I thought it would be easier to calculate P("secret","is","secret") directly, by counting how many words are in the dictionary in total, and how many instances of those are the specified word:

P("secret","is","secret")   
=   P("secret")^2 * P("is")
=   4/24 * 4/24 * 2/24
=   1 / 432
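For comparison, here is the same pooled-count calculation as a small sketch, again with exact fractions. The variable names are hypothetical; the counts (24 words, 4 "secret", 2 "is") and the 13/2700 figure are the ones quoted in this post.

```python
from fractions import Fraction as F

# Pooled word counts from the post: 24 words in total across all messages.
total_words = 24
counts = {"secret": 4, "is": 2}

message = ["secret", "is", "secret"]

# Multiply the pooled frequencies, ignoring which class each word came from.
pooled = F(1)
for word in message:
    pooled *= F(counts[word], total_words)

# Denominator obtained via total probability earlier in the post.
by_total_probability = F(13, 2700)

print(pooled, by_total_probability, pooled == by_total_probability)
```

The two results differ (1/432 versus 13/2700), which is the discrepancy being asked about.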

I can't see why my calculation is incorrect. Any ideas?

Richard
--
Richard Neilsen
richard...@gmail.com

"If we do not steer, we run the danger of ending up where we are going."
-- Eliezer Yudkowsky

Tim Josling

Oct 30, 2011, 1:39:59 AM
to aiclass-...@googlegroups.com
Richard,

You are calculating the probability of any given word that comes in being 'secret'.

He is calculating the probability of a given word being 'secret' within each kind of message (spam or ham), and then weighting by how likely each kind of message is.

To illustrate the difference, imagine that there were just 2 training messages. One said "secret message" (HAM) and the other said "secret (repeated 1,000 times) message" (SPAM).

The probability of the word "secret" as you calculate it is P("secret") = 1001/1003 ~= 0.998, but his calculation would be

P("secret") = P("secret" | spam) * P(spam)  +  P("secret" | ham) * P(ham)
= 0.999 * 0.5 + 0.5 * 0.5
~= 0.75 (!)

In essence, he is weighting each class's word frequencies by how often that class of message occurs, whereas you are pooling all the words together regardless of which message they came from.
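The two-message illustration can be reproduced in a few lines. This is a rough sketch: the lists simply spell out the two hypothetical training messages described above, and the variable names are invented for the example.

```python
from fractions import Fraction as F

# The two hypothetical training messages from the illustration.
ham = ["secret", "message"]
spam = ["secret"] * 1000 + ["message"]

# Pooled estimate: fraction of all training words that are "secret".
pooled = F((ham + spam).count("secret"), len(ham) + len(spam))

# Class-weighted estimate: per-class frequency times the class prior.
p_spam = p_ham = F(1, 2)  # one message of each kind
weighted = (
    F(spam.count("secret"), len(spam)) * p_spam
    + F(ham.count("secret"), len(ham)) * p_ham
)

print(float(pooled), float(weighted))
```

The pooled estimate is dominated by the one very long spam message, while the class-weighted estimate gives each message class equal say.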

Statistics gives me a headache.

Tim Josling

Lee Shepherd

Oct 30, 2011, 11:37:04 PM
to aiclass-...@googlegroups.com
Awesome explanation, Tim!

Lee