Reply to queries

Aritra Mandal

Feb 1, 2021, 5:22:34 AM

to Discussion forum for Introduction to Probability with examples using R

Hello learners,

I am Aritra Mandal, your TA. I have tried to address all the queries.

Please go through the following. Feel free to reply if there is any doubt.

Question 1> I have a question about what is happening in this video at 00:15:21 -

http://www.youtube.com/watch?v=uLn7gaxeC_A#t=921s.

Sir, in this example we get the probability of choosing a red ball as 121/360. But if we ignore the boxes and treat all the balls in the table as one collection, the probability of choosing a red ball is the number of red balls divided by the total number of balls, which is 10/30 = 1/3. Also, 121/360 > 1/3. Why does this happen? I expected the probability of a red ball from the conditional probability method to be 1/3 as well (because the probabilities of B1, B2 and B3 sum to 1). How does it come out greater than 1/3? How does the method used in the example increase the chance of choosing a red ball? Please clarify the doubt.

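Answer> If you calculate the probabilities of choosing a green ball and a blue ball in the same way, you will get

P(green ball is chosen) = 123/360 > 1/3.

But P(blue ball is chosen) = 116/360 < 1/3.

So you see, because of the additional layer of first choosing a box and then a ball, the probabilities differ from 1/3, although the three of them still add up to 1.

This deviation occurs because the boxes are equally likely to be chosen but hold different numbers of balls, so a ball in a smaller box is more likely to be drawn than a ball in a larger box.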
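The three values can be checked with exact fractions. This is a minimal sketch, assuming the box contents listed later in this thread (B1: 4 red, 3 green, 5 blue; B2: 3 red, 3 green, 2 blue; B3: 3 red, 4 green, 3 blue) match the table in the lecture; the names `boxes` and `prob_colour` are illustrative only.

```python
from fractions import Fraction

# Box contents as listed later in this thread (assumed to match the lecture table).
boxes = {
    "B1": {"red": 4, "green": 3, "blue": 5},
    "B2": {"red": 3, "green": 3, "blue": 2},
    "B3": {"red": 3, "green": 4, "blue": 3},
}
p_box = Fraction(1, 3)  # each box is equally likely to be chosen


def prob_colour(colour):
    """Total probability: sum over boxes of P(box) * P(colour | box)."""
    return sum(
        p_box * Fraction(counts[colour], sum(counts.values()))
        for counts in boxes.values()
    )


probs = {colour: prob_colour(colour) for colour in ("red", "green", "blue")}
for colour, p in probs.items():
    # Fractions print in lowest terms, e.g. 123/360 is shown as 41/120.
    print(colour, p, float(p))
```

The three probabilities are 121/360, 123/360 and 116/360, which sum to 1 even though none of them equals 1/3.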

Question 2> I have a question about what is happening in this video at 00:04:32 -

http://www.youtube.com/watch?v=uLn7gaxeC_A#t=272s.

Sir, in this theorem shouldn't we specify that the sum of the probabilities of the Bi's is 1, or that the cardinality of the sample space is n, because only then can we have at most n disjoint subsets (events)?

Answer> I will try to address your query using an example.

We roll two dice and note down the outcomes. So the sample space is

U = {(1,1),(1,2),...,(1,6),

(2,1),(2,2),...,(2,6),

(3,1),(3,2),...,(3,6),

(4,1),(4,2),...,(4,6),

(5,1),(5,2),...,(5,6),

(6,1),(6,2),...,(6,6)}

Now, let B_1 = first die shows 2; B_2 = first die shows 4; B_3 = first die shows 6. Then B_1 = {(2,1),(2,2),...,(2,6)}, B_2 = {(4,1),(4,2),...,(4,6)} and B_3 = {(6,1),(6,2),...,(6,6)}.

Then B_1, B_2 and B_3 are disjoint events. Take A = both dice show an even number.

Then A = {(2,2),(2,4),(2,6),(4,2),(4,4),(4,6),(6,2),(6,4),(6,6)} and A is a subset of the union of B_1, B_2, B_3.

Now we can apply Theorem 1.3.5 to get the probability of event A.

You can see that the union of events B_1, B_2, B_3 is not equal to the sample space U.

So the sum of the probabilities of events B_1, B_2, B_3 is 3 × 6/36 = 1/2 < 1, and the theorem still applies because it only needs the B_i to be disjoint and A to lie inside their union.
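The dice example can be verified by enumerating all 36 outcomes. This is a small sketch, taking A as "both dice show an even number" and assuming Theorem 1.3.5 is the total probability formula P(A) = sum of P(A|B_i) P(B_i) for disjoint B_i whose union contains A; the helper name `prob` is illustrative only.

```python
from fractions import Fraction
from itertools import product

# Sample space: 36 equally likely ordered pairs (first die, second die).
U = list(product(range(1, 7), repeat=2))


def prob(event):
    """Probability of an event (a set of outcomes) under equally likely outcomes."""
    return Fraction(len(event), len(U))


# B_k: the first die shows k, for k = 2, 4, 6.
B = {k: {w for w in U if w[0] == k} for k in (2, 4, 6)}
# A: both dice show an even number.
A = {w for w in U if w[0] % 2 == 0 and w[1] % 2 == 0}

sum_pB = sum(prob(Bk) for Bk in B.values())

# Total probability: sum over k of P(A | B_k) * P(B_k),
# with P(A | B_k) = P(A and B_k) / P(B_k).
total = sum((prob(A & Bk) / prob(Bk)) * prob(Bk) for Bk in B.values())

print(sum_pB, total, prob(A))
```

Here the B_k together cover only half of the sample space, yet the sum over the three boxes still reproduces P(A) = 1/4.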

I hope it clears your doubt.

Question 3> I have a question about what is happening in this video at 00:00:05 -

http://www.youtube.com/watch?v=whOmwh682oM#t=5s.

Sir can you please explain the proof of Theorem 1.2.1 ..

Answer> Dear learner, I feel the proof is well explained in the video.

If you can point out which step of the proof is unclear, I can answer that.

Question 4> I have a question about what is happening in this video at 00:40:09 -

http://www.youtube.com/watch?v=uLn7gaxeC_A#t=2409s.

How does P(R2) come out as P(R2 intersection R1 or R2 intersection B1)? Please explain.

Answer> R_1=Red ball was drawn in the 1st draw

R_2=Red ball was drawn in the 2nd draw

B_1=Black ball was drawn in the 1st draw

So you see, R_2 can happen in the following ways:

1) A red ball was drawn in the 1st draw and then a red ball was drawn in the 2nd draw

2) A black ball was drawn in the 1st draw and then a red ball was drawn in the 2nd draw

So, "Red ball was drawn in the 2nd draw" is the same as "Red ball was drawn in the 1st draw and then red ball was drawn in the 2nd draw, or black ball was drawn in the 1st draw and then red ball was drawn in the 2nd draw".

This implies R_2 is the same as "R_1 intersection R_2 or B_1 intersection R_2". Since these two cases cannot happen together, P(R_2) = P(R_1 intersection R_2) + P(B_1 intersection R_2).
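The decomposition of R_2 can be checked by listing every ordered pair of draws. The urn contents below (3 red, 2 black) and drawing without replacement are assumptions made only for illustration, since the lecture's numbers are not quoted in this thread; the identity itself does not depend on them.

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical urn, NOT taken from the lecture: 3 red and 2 black balls.
urn = ["red"] * 3 + ["black"] * 2

# Two draws without replacement: every ordered pair of distinct balls
# (identified by position in the urn) is equally likely.
draws = list(permutations(range(len(urn)), 2))


def prob(pred):
    """Probability that the colours (first, second) satisfy pred."""
    favourable = sum(1 for i, j in draws if pred(urn[i], urn[j]))
    return Fraction(favourable, len(draws))


p_R2 = prob(lambda first, second: second == "red")
p_R1_and_R2 = prob(lambda first, second: first == "red" and second == "red")
p_B1_and_R2 = prob(lambda first, second: first == "black" and second == "red")

print(p_R2, p_R1_and_R2 + p_B1_and_R2)
```

Both printed values agree, because every outcome with a red second ball has either a red or a black first ball, and never both.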

I hope the concept is clear now.

Thanks and regards

Himanshu

Feb 1, 2021, 11:47:47 AM

to Discussion forum for Introduction to Probability with examples using R, ariga...@gmail.com

Hello ma'am,

Thank you for the replies. I have read the answer to Question 1, but I have a doubt about it that I am not able to resolve.

In the problem, the probability of a red ball given box 1 is calculated as P(R|B1) = (number of red balls in box 1)/(total number of balls in box 1) = 4/12. Similarly, P(R|B2) and P(R|B3) are calculated as 3/8 and 3/10 respectively.

If I try to find this probability using the formula of conditional probability, I get P(R|B1) = P(R intersection B1)/P(B1) = (number of red balls in B1 / total number of balls) / (probability of choosing B1) = (4/30)/(1/3) = 4/10.

I have two doubts-

1) Why do I get a different answer? Intuitively 4/12 makes more sense, but formula-wise we get 4/10. Why is this happening? Why is 4/12 chosen in the problem and not 4/10?

2) If I calculate P(R|B2) and P(R|B3) formula-wise, I get 3/10 and 3/10 respectively. If I now calculate the probability of R, I get P(R) = P(R|B1)*P(B1) + P(R|B2)*P(B2) + P(R|B3)*P(B3) = [(4/10)+(3/10)+(3/10)]*(1/3) = (10/10)*(1/3) = 1/3, which is the proportion of red balls among all the balls.

Ma'am, I am very confused about what is happening. Could you please clarify the doubt?

Thanks!

Vinay M

Feb 3, 2021, 4:09:59 AM

to Discussion forum for Introduction to Probability with examples using R, jman...@gmail.com, ariga...@gmail.com

Hi, very nice question, and a subtle one. Thanks for asking it, because it lets us see clearly what is really happening here. P(R|B1) = |R∩B1| / |B1| is true, and it is one of the first true statements on the way to the definition of conditional probability, P(R|B1) = P(R∩B1) / P(B1). The mistake lies in the step between the two: reading the counts as probabilities assumes an equally likely outcome setting. Let's see how.

P(R|B1) = |R∩B1| / |B1| = (|R∩B1| / |S|) ÷ (|B1| / |S|). The ratio is unchanged, but in general the denominator |B1| / |S| is not the same as P(B1); think about it, this is the subtle part. The much more interesting part is that the numerator |R∩B1| / |S| is not equal to P(R∩B1) either. All this happens because of a wrong assumption about what is in the sample space. If we take all 30 balls as the sample space, they are not equally likely, because they sit in boxes and the boxes have their own fixed probability distribution: they are equally likely among themselves, regardless of how the balls are distributed in them. So making the probability of choosing a box depend on how many of the 30 balls it holds is wrong. The boxes have a separate, fixed probability, and their sample space contains just the 3 boxes, which is not at all concerned with the number of balls in them.
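One way to make this concrete is to write down a single sample space of (box, ball) pairs in which each pair gets probability P(box) × 1/(balls in that box). This is an illustrative sketch using the box contents quoted in this thread; the names `joint` and `P` are hypothetical helpers, not anything from the course.

```python
from fractions import Fraction

# Box contents as strings of colours: R = red, G = green, B = blue.
boxes = {"B1": "RRRRGGGBBBBB", "B2": "RRRGGGBB", "B3": "RRRGGGGBBB"}

# Outcome = (box, position of the ball inside it).
# Its probability is P(box) * P(ball | box) = 1/3 * 1/len(box).
joint = {
    (box, i): Fraction(1, 3) * Fraction(1, len(balls))
    for box, balls in boxes.items()
    for i in range(len(balls))
}


def P(pred):
    """Probability of the set of outcomes whose (box, colour) satisfies pred."""
    return sum(p for (box, i), p in joint.items() if pred(box, boxes[box][i]))


p_R_and_B1 = P(lambda box, colour: box == "B1" and colour == "R")
p_B1 = P(lambda box, colour: box == "B1")

# Counting-based values obtained by treating all 30 balls as equally likely.
naive_numerator = Fraction(4, 30)
naive_denominator = Fraction(12, 30)

print(p_R_and_B1, naive_numerator)             # the two numerators differ
print(p_B1, naive_denominator)                 # the two denominators differ
print(p_R_and_B1 / p_B1, naive_numerator / naive_denominator)  # same ratio
```

In this sketch a ball in B2 has probability 1/24 while a ball in B1 has 1/36, so the 30 balls are not equally likely, which is exactly why 4/30 and 12/30 cannot be read as P(R∩B1) and P(B1).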

Maybe this will help clear your confusion. It is a nice confusion to have, because thinking it through deeply leads to a proper understanding.

Himanshu

Feb 3, 2021, 8:35:22 AM

to Discussion forum for Introduction to Probability with examples using R, vina...@gmail.com, Himanshu, ariga...@gmail.com

@Vinay, thanks for the reply. I have read your answer, but I still have some doubts. I will try my best to explain them from the basics, so please bear with me if this is slightly long.

Basically, P(R|B1) is the probability of choosing a red ball if we know we have chosen box 1. Then it is simply the number of red balls in B1 divided by the total number of balls in B1.

So, from the set-theoretic point of view, we can say that B1={R,R,R,R,G,G,G,B,B,B,B,B} and R∩B1={R,R,R,R}. So, P(R|B1)=|R∩B1|/|B1|=4/12=1/3. Up to that point it is fine.

Similarly, B2={R,R,R,G,G,G,B,B}, R∩B2={R,R,R} and B3={R,R,R,G,G,G,G,B,B,B}, R∩B3={R,R,R}.

Now, we have a total of three boxes B1, B2 and B3. So the sample space is S'={B1',B2',B3'}. I have put a prime on the elements of this sample space, but they represent the usual boxes B1, B2 and B3.

P(B1')=P(B2')=P(B3')=1/3. In this sample space, we don't have information about how the balls are distributed in the boxes. So this probability is irrespective of the distribution of balls inside the boxes.

But if we use the result P(R)=P(R|B1)P(B1)+P(R|B2)P(B2)+P(R|B3)P(B3), the conditions are that R is a subset of (B1UB2UB3) and that the Bi's are disjoint events. And the sample space should remain the original one, i.e., S={R,....,R, G,.....,G, B,.....,B}= B1UB2UB3, which is different from S'.

I have two doubts-

1) Now we can see that B1∩B2∩B3 is not equal to the empty set, so B1, B2 and B3 are not disjoint. Is that the reason we have used B1', B2' and B3', because they are disjoint? But then there is also a problem, because we have changed our sample space from S to S'.

2) Is the set B1=B1'? B1' does not contain any information about the distribution of balls in it, but the set B1 seen above contains the full information about how many balls of each colour are in it.

So, is R∩B1' defined?

It might be a silly question, but I am finding it difficult to grasp the concept.

Vinay M

Feb 3, 2021, 10:04:54 AM

to Discussion forum for Introduction to Probability with examples using R, jman...@gmail.com, Vinay M, ariga...@gmail.com

B1', B2', B3' are used to find the probability of choosing the boxes, because the question states that we first randomly choose one box out of three, all equally likely. Here B1' is the same as B1, which is just box 1. Yes, the sample space changes from S to S', as you say, for finding the probabilities of choosing the boxes, because that is how the question asks us to consider them, irrespective of how many balls are in each. Don't worry about the set B1' itself and what is in it. Don't bring out the elements of B1' to calculate the probability of selecting the box: it is not required, and it is wrong in this context. Consider the details only after the box is chosen. I have to remind you again that B1' = B1, they are the same box. Maybe you prefer to say that B1' is just looking at the box, while B1 is when we also look at the details of what is inside.

R∩B1 or R∩B1' is the event of choosing box 1 and a red ball from it. But again, please do not bring out the details in the box to calculate this probability as |R∩B1|/30, because this implicitly makes the probability of selecting B1 depend on the details in the box, which as we have seen is wrong.

The following equation is true: P(R∩B1) = P(B1) x P(R|B1).

Himanshu

Feb 3, 2021, 1:14:04 PM

to Discussion forum for Introduction to Probability with examples using R, vina...@gmail.com, Himanshu, ariga...@gmail.com

@Vinay, thank you so much for the discussion. I have read your replies several times. The whole point of my confusion was that I was treating every ball as equally likely to be chosen. The 30 balls are not placed together in a single box, they are distributed over 3 boxes. To pick any ball I first have to put my hand in one of the 3 boxes, and only then can I pick a ball. That is why the balls are not equally likely and conditional probability comes into the picture.
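A quick way to see the difference between the two pictures: if the boxes were chosen with probability proportional to their sizes (12/30, 8/30, 10/30) instead of 1/3 each, every one of the 30 balls would be equally likely and P(R) would come out as 10/30. The sketch below compares both choices; the size-proportional rule is a hypothetical variant introduced only for contrast and is not part of the original problem.

```python
from fractions import Fraction

red = {"B1": 4, "B2": 3, "B3": 3}      # red balls per box
size = {"B1": 12, "B2": 8, "B3": 10}   # total balls per box
total_balls = sum(size.values())


def p_red(p_box):
    """Total probability of red for a given distribution over the boxes."""
    return sum(p_box[b] * Fraction(red[b], size[b]) for b in size)


# The problem as stated: each box is chosen with probability 1/3.
equal = {b: Fraction(1, 3) for b in size}

# Hypothetical variant: box chosen with probability proportional to its size,
# which is equivalent to picking one of the 30 balls uniformly at random.
proportional = {b: Fraction(size[b], total_balls) for b in size}

print(p_red(equal), p_red(proportional))
```

Only the size-proportional rule gives 1/3; with equally likely boxes the answer is 121/360.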

But in the lecture and in the professor's book, why is the conditional probability P(A|B) defined as P(A∩B)/P(B)? This seems to hold only if the outcomes in A∩B and B are equally likely.

I am attaching two pictures from the book.

If the outcomes are equally likely, then P(A|B) = |A∩B|/|B| = P(A∩B)/P(B). But if they are not equally likely, how can we be sure that |A∩B|/|B| = P(A∩B)/P(B), given that in that case P(A∩B) ≠ |A∩B|/|S| and P(B) ≠ |B|/|S|?

In the last step of Theorem 1.3.5, P(A∩Bi)=P(A|Bi)P(Bi) is also used. How is this justified if the outcomes in A and A∩Bi are not equally likely?

Vinay M

Feb 4, 2021, 1:25:51 AM

to Discussion forum for Introduction to Probability with examples using R, jman...@gmail.com, Vinay M, ariga...@gmail.com

Very good question. P(A|B) = P(A∩B)/P(B) holds even if the events A and B in this formula DO NOT share the same sample space. But let's see what is happening in our example problem.

In our problem P(R|B1) = P(R∩B1)/P(B1) is true, and P(R|B1) = |R∩B1|/|B1| is also true, but for this second equation we have to look at the details inside the event set B1.

Now, in the equation P(R|B1) = |R∩B1|/|B1|, I can divide both numerator and denominator by |S|, which is just a number; here let's say it is the cardinality |B1 ∪ B2 ∪ B3| = 30. Dividing both by |S| does not change the answer. Things go wrong only when we interpret the numerator |R∩B1|/|S| as P(R∩B1) and the denominator |B1|/|S| as P(B1). I think your doubt is right here: if P(A|B) = P(A∩B)/P(B) (according to conditional probability) and also P(A|B) = |A∩B|/|B|, then why not divide numerator and denominator by a common number, say |S|, and why doesn't it make sense to read |A∩B|/|S| as P(A∩B) and |B|/|S| as P(B)?

It is wrong to interpret it that way because, in P(B), the event B may have a different sample space than the event A, as has happened in our example: the sample space for P(B1) and the sample space for R are not the same. In fact the sample space of all 30 balls is a wrong one and should not be used anywhere here, because it forces the event B1 to take that 30-ball sample space, which is wrong. Now consider P(R|B1) = P(R∩B1)/P(B1). The numerator is the probability of both event R (selecting a red ball) and event B1 (selecting box 1) happening, but these two events do not share that 30-ball sample space.

The sample space of R is {R from B1, R from B2, R from B3}, but these elements are not equally likely, because a red ball does not come equally from the different boxes: a box where red balls make up a larger share gives a red ball a higher probability than the other boxes do. Notice that the selection of R is a two-step process. The first step is the box selection, which has sample space S' = {B1, B2, B3}. The second step is selecting a ball from the chosen box, whose sample space is one of these three depending on the box: SB1 = {R,R,R,R,G,G,G,B,B,B,B,B}, SB2 = {R,R,R,G,G,G,B,B}, SB3 = {R,R,R,G,G,G,G,B,B,B}.

Therefore P(R∩B1) = P(B1) x P(R from SB1), which is the same as writing P(R∩B1) = P(B1) x P(R|B1).
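The two-step process described above can also be simulated directly: pick a box uniformly, then pick a ball uniformly from that box. This is a rough Monte Carlo sketch; the function name `estimate_p_red`, the number of trials and the seed are arbitrary choices made for illustration.

```python
import random

# Box contents as strings of colours: R = red, G = green, B = blue.
BOXES = {"B1": "RRRRGGGBBBBB", "B2": "RRRGGGBB", "B3": "RRRGGGGBBB"}


def estimate_p_red(n_trials, seed=0):
    """Estimate P(R) by repeating the two-step experiment n_trials times."""
    rng = random.Random(seed)            # local generator, reproducible runs
    names = list(BOXES)
    reds = 0
    for _ in range(n_trials):
        box = rng.choice(names)          # step 1: choose a box, each with prob. 1/3
        ball = rng.choice(BOXES[box])    # step 2: choose a ball uniformly from it
        reds += ball == "R"
    return reds / n_trials


estimate = estimate_p_red(200_000)
print(estimate, 121 / 360)
```

With enough trials the estimate should settle near 121/360 ≈ 0.336 rather than 1/3 ≈ 0.333, although the two values are close enough that a small simulation cannot tell them apart reliably.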

Himanshu

Feb 4, 2021, 2:21:28 PM

to Discussion forum for Introduction to Probability with examples using R, vina...@gmail.com, Himanshu, ariga...@gmail.com

@Vinay, I appreciate your reasoning. I am also trying to prove the same thing by treating the experiment as a single Bernoulli trial.

Each trial can be thought of as composed of two subtrials (subtrial 1: choosing the box; subtrial 2: choosing the ball). Success is choosing a red ball in the second subtrial, after the box has been chosen in the first.

So the sample space becomes S = {(B1,Success), (B1,Failure), (B2,Success), (B2,Failure), (B3,Success), (B3,Failure)}.

Here B1={r_B1,...,r_B1,g_B1,..,g_B1,b_B1,...,b_B1}, B2={r_B2,...,r_B2,g_B2,..,g_B2,b_B2,...,b_B2}, B3={r_B3,...,r_B3,g_B3,..,g_B3,b_B3,...,b_B3}. The labels are used to show that B1, B2 and B3 are mutually disjoint.

The sample space for subtrial 1 is {B1,B2,B3}.

We can easily see that the sample space for choosing a red ball from B1 (R|B1) is B1.

The sample spaces of the two subtrials are different. Also, P(R|B1) remains the same whether or not box B1 is actually chosen. So both subtrials are independent.

The probability of choosing box B1 and then a red ball (caution: this is not the probability of choosing a red ball from B1, which is P(R|B1)) is P(R∩B1) = P(B1).P(R|B1).

As P(B1)>0, P(R|B1) = P(R∩B1)/P(B1). This formula holds irrespective of whether the sample spaces of the numerator and the denominator are the same. If the sample space of both numerator and denominator is the same and each outcome is equally likely, then P(R|B1) = |R∩B1|/|B1|.

Thanks for the wonderful discussion. It helped me analyze the concepts better.
