Unit 5 - Machine Learning Bayes' Theorem problem

Richard Neilsen

Oct 30, 2011, 1:27:27 AM

to aiclass-...@googlegroups.com

Hey folks, I'm confused by a question in the quizzes for Unit 5 - Machine Learning. If anybody can help with this it would be great.

It's "12. Question". Link: https://www.ai-class.com/course/video/quizquestion/95

So for the straight application of Bayes' Theorem, we use

P(spam | "secret","is","secret") = P("secret","is","secret" | spam) * P(spam) / P("secret","is","secret")

In the explanation, he uses total probability for the denominator, and calculates this:

P("secret","is","secret")

= P("secret","is","secret" | spam) * P(spam) + P("secret","is","secret" | ham) * P(ham)

= P("secret" | spam)^2 * P("is" | spam) * P(spam) + P("secret" | ham)^2 * P("is" | ham) * P(ham)

= 3/9 * 3/9 * 1/9 * 3/8 + 1/15 * 1/15 * 1/15 * 5/8

= 1 / 216 + 1 / 5400

= 13 / 2700
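The arithmetic above can be checked with exact fractions. This is only a minimal sketch: the names `p_word`, `prior` and `joint` are made up for illustration, and the per-class probabilities are simply the numbers that appear in the calculation above.

```python
from fractions import Fraction as F

# Per-class word probabilities and class priors, copied from the
# worked numbers above (illustrative names, not from the course).
p_word = {
    "spam": {"secret": F(3, 9), "is": F(1, 9)},
    "ham": {"secret": F(1, 15), "is": F(1, 15)},
}
prior = {"spam": F(3, 8), "ham": F(5, 8)}

message = ["secret", "is", "secret"]


def joint(label):
    """Return P(message | label) * P(label), multiplying per-word terms."""
    p = prior[label]
    for word in message:
        p *= p_word[label][word]
    return p


# Total probability: sum the joint term over both classes.
evidence = joint("spam") + joint("ham")
posterior_spam = joint("spam") / evidence

print(evidence)        # 13/2700
print(posterior_spam)  # 25/26
```

With these inputs the denominator comes out as 13/2700, matching the explanation, and the resulting P(spam | message) is 25/26.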

However, I thought it would be easier to calculate P("secret","is","secret") directly, by counting how many words there are in total across all the messages, and how many of those are the specified word:

P("secret","is","secret")

= P("secret")^2 * P("is")

= 4/24 * 4/24 * 2/24

= 1 / 432
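As a sanity check, the pooled figure can be plugged into Bayes' Theorem as the denominator. This is a rough sketch with illustrative variable names; the numerator is the spam term 3/9 * 3/9 * 1/9 * 3/8 from the total-probability calculation earlier in this post.

```python
from fractions import Fraction as F

# Pooled word frequencies: 24 training words in total,
# 4 occurrences of "secret" and 2 of "is" (counts from this post).
p_secret = F(4, 24)
p_is = F(2, 24)
pooled_evidence = p_secret ** 2 * p_is

# Numerator of Bayes' Theorem: P(message | spam) * P(spam).
numerator = F(3, 9) * F(1, 9) * F(3, 9) * F(3, 8)

print(pooled_evidence)              # 1/432
print(numerator / pooled_evidence)  # 2
```

The ratio comes out as 2, which cannot be a probability, so 1/432 is not a usable denominator here.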

I can't see why my calculation is incorrect. Any ideas?

Richard
--
Richard Neilsen
richard...@gmail.com

"If we do not steer, we run the danger of ending up where we are going."
-- Eliezer Yudkowsky

Tim Josling

Oct 30, 2011, 1:39:59 AM

to aiclass-...@googlegroups.com

Richard,

You are calculating the probability of any given word that comes in being 'secret'.

He is calculating the probability of a given word being 'secret' in a particular message.

To illustrate the difference, imagine that there were just 2 training messages: one said "secret message" (HAM) and the other said "secret (repeated 1,000 times) message" (SPAM).

The probability of the word "secret" as you calculate it is P("secret") = 1001/1003 ~= 0.998, but his calculation would be

P("secret") = P("secret" | spam) * P(spam) + P("secret" | ham) * P(ham)
= 1000/1001 * 0.5 + 0.5 * 0.5
~= 0.75 (!)
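The two-message example can be reproduced in a few lines. This is a sketch under the assumptions stated above (one HAM message, one SPAM message, so each class has prior 1/2); the variable names are made up for illustration.

```python
from fractions import Fraction as F

# The two hypothetical training messages from the example above.
ham_words = ["secret", "message"]
spam_words = ["secret"] * 1000 + ["message"]

# Pooled estimate: count "secret" over every word, ignoring the class.
all_words = ham_words + spam_words
pooled = F(all_words.count("secret"), len(all_words))

# Class-weighted estimate: per-class frequency times the class prior.
p_spam = p_ham = F(1, 2)
weighted = (
    F(spam_words.count("secret"), len(spam_words)) * p_spam
    + F(ham_words.count("secret"), len(ham_words)) * p_ham
)

print(pooled, float(pooled))      # 1001/1003, about 0.998
print(weighted, float(weighted))  # 3001/4004, about 0.7495
```

The long spam message dominates the pooled count, but it contributes only half of the class-weighted figure.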

In essence he is weighting each word's frequency by how often its class of message (spam or ham) occurs, whereas you are pooling all the words together.

Statistics gives me a headache.

Tim Josling

Lee Shepherd

Oct 30, 2011, 11:37:04 PM

to aiclass-...@googlegroups.com

Awesome explanation, Tim!

Lee
