How do I represent this statement using Bayes notation?


Aditya Mathur

Oct 27, 2011, 5:57:09 PM
to stanford...@googlegroups.com
Given this sentence
"1% of women at age forty who participate in routine screening have breast cancer."

Let the set BC = set of all women with Breast Cancer
           Set 40W = set of all 40 year old women

In probability notation, would this be denoted as P(BC|40W) = 0.01, or as P(BC, 40W) = 0.01?

P(BC, 40W) means the probability that you are a 40-year-old woman AND you have breast cancer.

P(BC|40W) means the probability that you have breast cancer, given that you are a 40-year-old woman.

Both of these notations seem to correctly represent the given problem description (but of course, only one of them can be right).

Can someone please help me understand which is the right representation for such a statement and why?
Thanks a lot

Shahab Eslamian

Oct 27, 2011, 6:15:58 PM
to stanford...@googlegroups.com
P(BC|40W) is the correct representation.

P(BC,40W) means P(BC) x P(40W) :P(BC) is not equal to .01


aditya mathur

Oct 27, 2011, 7:12:27 PM
to stanford...@googlegroups.com

by P(BC) x P(40W) :P(BC)
I assume you mean P(40W | BC)*P(BC), correct?
If so, I don't see how that product cannot be equal to 0.01 given that statement.



aditya mathur

Oct 27, 2011, 7:15:11 PM
to aditya mathur, stanford...@googlegroups.com
I am not saying that it is 0.01.
I am saying that I don't think we can deduce that it cannot be 0.01.


Sarah Norell

Oct 27, 2011, 7:25:21 PM
to stanford...@googlegroups.com
I would say it is P( have breast cancer | 40 year old woman and
screened) = 0.01, so your sets aren't quite right to start with, since the
sentence only talks about women going for screening, not all 40 year old women.

We could restrict it just to women or to people in general depending on
what other probabilities we have/want.

BC = set of all people/women with Breast cancer


40W = set of all 40 year old women

S= set of all routinely screened people/women

Then it becomes

P(BC | 40W, S) = 0.01
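
To make the difference between the two readings concrete, here is a minimal sketch in Python. The counts are invented purely for illustration (nothing in the thread supplies them), and the sample space is assumed to be all women. The conditional reading divides by the group the sentence talks about; the joint reading divides by the whole sample space.

```python
from fractions import Fraction

# Hypothetical counts, invented only for illustration (not from the thread).
total_women = 100_000          # assumed sample space: all women
women_40_screened = 1_000      # women who are in both 40W and S
women_40_screened_bc = 10      # women who are in BC, 40W and S

# Conditional: the denominator is the group the sentence talks about.
p_bc_given_40w_s = Fraction(women_40_screened_bc, women_40_screened)

# Joint: the denominator is the whole sample space.
p_bc_and_40w_and_s = Fraction(women_40_screened_bc, total_women)

print(float(p_bc_given_40w_s))      # 0.01
print(float(p_bc_and_40w_and_s))    # 0.0001
```

With these made-up counts, only the conditional comes out at 1%, because "1% of women at age forty who participate in routine screening" names the denominator explicitly.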

Shahab Eslamian

Oct 27, 2011, 7:25:59 PM
to stanford...@googlegroups.com
No, I just meant that P(BC) is not the 1% mentioned in the question;
it is the probability of breast cancer for all ages, whereas 1% is the
probability of breast cancer for over 40.
P(BC, 40W) means the probability that someone has "breast cancer" and
is "over 40", which are considered independently.

P(BC|40W) means the probability of "breast cancer" for people "over 40".

David Weiseth

Oct 27, 2011, 8:05:29 PM
to stanford...@googlegroups.com
I am just commenting on what you wrote, not on any other question information given somewhere else.

This is a joint probability, not a conditional probability as stated. No other information that you have provided gives you the ability to say anything other than

P(BC,40W) = .01
P( BC | 40W ) P(40W) = .01  
P(40W | BC ) P( BC ) = .01

Hope that helps.  --David
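
Whichever reading of the sentence is right, the two factorizations of the joint probability always agree. Here is a small Python sketch that checks this on a made-up joint distribution over (BC, 40W); all four numbers are assumptions chosen only so that the arithmetic is easy to follow.

```python
from fractions import Fraction as F

# Hypothetical joint distribution over (has BC, is 40W); numbers are made up.
joint = {
    (True, True): F(3, 1000),
    (True, False): F(17, 1000),
    (False, True): F(297, 1000),
    (False, False): F(683, 1000),
}
assert sum(joint.values()) == 1

p_bc = sum(p for (bc, _), p in joint.items() if bc)     # marginal P(BC)
p_40w = sum(p for (_, w), p in joint.items() if w)      # marginal P(40W)
p_bc_given_40w = joint[(True, True)] / p_40w            # P(BC | 40W)
p_40w_given_bc = joint[(True, True)] / p_bc             # P(40W | BC)

# Product rule: both factorizations recover the same joint probability.
assert p_bc_given_40w * p_40w == joint[(True, True)]
assert p_40w_given_bc * p_bc == joint[(True, True)]

print(float(p_bc_given_40w))         # 0.01
print(float(joint[(True, True)]))    # 0.003
```

In this invented table the conditional P(BC | 40W) is 1% while the joint P(BC, 40W) is 0.3%, so the two quantities are related by the product rule but are not interchangeable.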

David Weiseth

Oct 27, 2011, 8:14:53 PM
to stanford...@googlegroups.com
one more thing 

BC is not a set, it is a characteristic.
40W is not a set, it is a characteristic.

Probabilities are ratios of (what you want to find in a set ) / (whole set)

You can approach your final answer from intermediate points of filtering of either the set to find or the whole set, which is where the conditional probabilities come in

You want to find all elements in the whole set that have both characteristics, this is why it is a joint probability. 

The conditional probabilities just allow you to approach this result with different starting information, it does not change the destination, just a different way to get there.

Hope that helps as well --David

I like using a graph/tree to reason my way through complex probability problems. Then I can just see which branches to add up for the total probability that answers the question. It is a bit tedious, but it makes sense to me.

Shahab Eslamian

Oct 27, 2011, 8:24:05 PM
to stanford...@googlegroups.com
No David, I believe what Sarah stated was the most correct one:

"BC = set of all people/women with Breast cancer


40W = set of all 40 year old women

S= set of all routinely screened people/women

Then it becomes

P(BC | 40W, S) = 0.01"

It's not a joint probability. Only women at age 40 are being examined.
If we were to screen women of every age then we would have a joint
probability.

Sarah Norell

Oct 27, 2011, 8:42:49 PM
to stanford...@googlegroups.com
David, if you want it strictly speaking, it should be P( x in BC | x in
40W and x in S) = 0.01.

deepakjnath

Oct 28, 2011, 1:58:20 AM
to stanford...@googlegroups.com
P(BC,40W) means the probability that you have BC and that you are 40W.

P(BC,40W) = P(BC) x P(40W). Now, the probability that you are a 40 year old woman = total number of 40 year old women / total number of women.

P(BC) is not equal to 1% because P(BC) is the probability of having cancer for all women.

P(BC|40W) = 1%

If we assume P(40W) = 30%, then

P(BC|40W) = 1%; the probability that you have cancer given that you are a woman at 40.

P(BC,40W) = P(BC)xP(40W) = .01 x .3 = 0.3%; P(BC,40W) means the probability that you have BC and that you are 40W.




Ehsan Tadayon

Oct 28, 2011, 2:20:22 AM
to stanford...@googlegroups.com
OK...
Great discussion!
The point is that you have gathered all 40-year-old women, and 1% of them have BC. So among 40-year-old women, 1% have BC, and we don't have any information about the rest of the population.
So we have no idea what the probability of a woman being 40 years old is, and we cannot talk about p(BC,40W) because we would need p(40W). But we do know p(BC|40W), because we gathered and screened only 40-year-old women.
tricky question!
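
A quick way to see why the joint cannot be pinned down from the sentence alone is this minimal Python sketch. The helper name `joint_from_conditional` and the candidate values for P(40W) are assumptions made for illustration; the only number taken from the sentence is the 1%.

```python
def joint_from_conditional(p_a_given_b: float, p_b: float) -> float:
    """Product rule: P(A, B) = P(A | B) * P(B)."""
    return p_a_given_b * p_b

p_bc_given_40w = 0.01  # the 1% from the sentence, read as P(BC | 40W)

# Hypothetical values for P(40W); the sentence itself does not supply one.
for p_40w in (0.006, 0.013, 0.3):
    joint = joint_from_conditional(p_bc_given_40w, p_40w)
    print(f"P(40W) = {p_40w}: P(BC, 40W) = {joint:.5f}")
```

The conditional stays fixed at 0.01 while the joint changes with every choice of P(40W), so the sentence by itself cannot be a statement about the joint.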

Ehsan Tadayon

Oct 28, 2011, 2:29:46 AM
to stanford...@googlegroups.com
What Sarah did does not rule out p(BC,40W,S), though. And I think the question is whether it's a conditional probability or a joint probability.

aditya mathur

Oct 28, 2011, 7:33:22 AM
to stanford...@googlegroups.com
Yeah, exactly: the question is whether it should be represented as a joint probability or a conditional one.

Also, deepak..
P(BC,40W) != P(BC)*P(40W)
P(BC,40W)= P(40W | BC )* P(BC) = P(BC | 40W)* P(40W)



aditya mathur

Oct 28, 2011, 7:40:39 AM
to stanford...@googlegroups.com
I agree with these sets, I missed out set S

But that still doesn't rule out the statement being represented as
P(BC, 40W, S) rather than P(BC | 40W, S).

aditya mathur

Oct 28, 2011, 8:00:36 AM
to stanford...@googlegroups.com
P(BC,40W) = P(BC) x P(40W) only if the two are independent. 

Also, there is no other information given in that statement, so I don't think we can solve anything the way you have. I am just asking what the correct representation of the statement would be in probability notation.
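
The independence point can be illustrated with a short Python sketch. Here P(40W) = 30% is the value assumed earlier in the thread, P(BC | 40W) = 1% is the sentence read as a conditional, and the overall rate P(BC) = 2% is a purely hypothetical number added so there is something to compare against.

```python
from fractions import Fraction as F

p_40w = F(30, 100)          # assumed earlier in the thread, for illustration
p_bc_given_40w = F(1, 100)  # the 1% from the sentence, read as a conditional
p_bc = F(2, 100)            # hypothetical overall rate, not from the thread

joint = p_bc_given_40w * p_40w   # product rule, always valid
naive = p_bc * p_40w             # equals the joint only under independence

print(float(joint))              # 0.003
print(float(naive))              # 0.006
print(joint == naive)            # False

# Independence would require P(BC | 40W) == P(BC).
print(p_bc_given_40w == p_bc)    # False
```

Under these assumed numbers the product of the marginals is twice the true joint, which is exactly the gap that appears whenever P(BC | 40W) differs from P(BC).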

Sarah Norell

Oct 28, 2011, 8:15:38 AM
to stanford...@googlegroups.com
One question I do have for all those who say it is the joint
probability: what is the sample space? If it's just women, then it's
saying something very different from if it's all people.

aditya mathur

Oct 28, 2011, 8:20:36 AM
to stanford...@googlegroups.com
We just have that statement.... Also, I am not saying that it is a joint probability; I am saying that I am not sure whether it is a joint or a conditional probability.


Sarah Norell

Oct 28, 2011, 8:57:00 AM
to stanford...@googlegroups.com
I'm just going to play with the figure depending on the sample space
to see what it could really be saying. I'm going to assume it's the
joint probability where the sample space is first all people, and
secondly just women.

What does it say if the sample space is all people? It says that 1 in
100 people is a 40 year old woman who goes for screening and has breast
cancer. From a demographic point of view, are 1 in 100 people women aged
40, let alone ones with cancer?

At a rough estimate, going on figures from
http://www.indexmundi.com/world/demographics_profile.html, and
assuming that there are an equal number of people at each age from
15-64, we have 0.6% of the world population being female and 40, that
is P(Woman and age 40)= 0.006 < P(Woman and age 40 and breast cancer
and screening)=0.01.

If instead of the sample space being everyone, we consider just
women, then the percentage of women aged 40 is around 1.3%. Let's see,
if a 40 year old woman goes for a screening, how likely she is
to have cancer.

P(breast cancer and screening|Woman and age 40)=P(Woman and age 40 and
breast cancer and screening)/P(Woman and age 40) = 0.01/0.013 = 0.77.
Yikes! A 77% chance she has breast cancer and goes for screening!

Maybe there was a baby boom that year (USA year 2000 baby boomers were
around 40). In the USA in 2000, approximately 0.8% of the population
was female and 40.
(http://en.wikipedia.org/wiki/Demographics_of_the_United_States)
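
The sanity check above can be reproduced in a few lines of Python. This is only a sketch: the 0.006 and 0.013 figures are the rough demographic estimates quoted in this post, and the variable names are made up for the example.

```python
p_joint_claim = 0.01       # the 1% read as P(woman, age 40, screened, BC)
p_woman_40_all = 0.006     # rough share of 40 year old women among all people
p_40_among_women = 0.013   # rough share of 40 year olds among women

# A joint probability can never exceed the probability of one of its parts,
# so this comparison should be True for any consistent set of numbers.
print(p_joint_claim <= p_woman_40_all)    # False

# With women as the sample space, the joint reading implies this conditional:
implied = p_joint_claim / p_40_among_women
print(round(implied, 2))                  # 0.77
```

Both results are implausible, which is the argument against reading the 1% as a joint probability under either sample space.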

Aditya Mathur

Oct 28, 2011, 10:01:52 AM
to stanford...@googlegroups.com
Sarah, I am afraid you have gone off on a tangent here :)
The statement "1% of women at age forty who participate in routine screening have breast cancer." does not tell you anything about the sample space. There is not even a probability question to solve here, so I am not sure why we should be looking for a solution. This statement could just be the start of a question and not the entire question. It might as well have been:
"1% of all A that participate in B have C"
The sample space can only be deduced from the probability we are asked to calculate. There is no such question here.
My question is just how such a statement should be represented using probability notation, not how to calculate the probability (which would require us to correctly judge the sample space, as you so rightly pointed out earlier).




Sarah Norell

Oct 28, 2011, 11:03:36 AM
to stanford...@googlegroups.com
I am well aware it tells you nothing of the sample space. The reason I
did the calculation was to explore the meaning of the joint
probability, which I consider relevant, but as you say it is not, so
be it. I'm sorry to have wasted everyone's time.

Aditya Mathur

Oct 28, 2011, 11:26:59 AM
to stanford...@googlegroups.com
I did not mean to imply that you were wasting my time (or anyone else's). This is just a discussion; I am not sure why you are taking my reasoning personally. You brought up a good point (many, in fact). I just thought that you were deviating from the question a little bit and so pointed it out (very politely). Maybe it didn't translate well over email. My intention was not to mock you or to disprove what you are saying, just to add to the discussion.

Aditya Mathur

Oct 28, 2011, 11:29:30 AM
to stanford...@googlegroups.com
Also, I still have no idea whether it should be joint or conditional, and I am merely pointing out that your reasoning doesn't provide me with a concrete reason why it should be either. (I gave my reason earlier.)

P.S. I am at work, so I am not able to write long emails. If my reasons were not elaborate enough, I apologize.