I have another probability theory question. Borel's paradox refers to
the ambiguities that can occur when defining a conditional distribution
for continuous random variables, where the issue of conditioning on an
event of probability zero arises, as in conditional pdf
f_{Y|X=x}(y) := f_{X,Y}(x,y) / f_X(x). (With hopefully obvious
notation.)
E.g., if X and Y are iid exponential rv's, the conditional density of
X+Y given X=Y can be approached either by (a) changing variables to
U=X+Y and V=X/Y and noting that X=Y iff V=1, which makes the conditional
distribution of U given V=1 come out gamma; or (b) changing to U=X+Y and
V=X-Y, so that X=Y iff V=0, which makes the conditional distribution of
U given V=0 come out exponential, with the same parameter as X and Y to
begin with.
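To spell out the two computations (taking both rates to be lambda; my
own arithmetic, so check it): in (a), x = uv/(1+v) and y = u/(1+v) with
Jacobian u/(1+v)^2, so

  f_{U,V}(u,v) = lambda^2 e^{-lambda u} u/(1+v)^2,

and f_{U|V=1}(u) is proportional to u e^{-lambda u}, a gamma(2, lambda)
density; in (b), x = (u+v)/2 and y = (u-v)/2 with Jacobian 1/2, so

  f_{U,V}(u,v) = (lambda^2/2) e^{-lambda u} on |v| < u,

and f_{U|V=0}(u) is proportional to e^{-lambda u}, exponential(lambda).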
The difference here is that in (a), we're conditioning on the event
{Y \leq X \leq (1+h)Y} with h --> 0, whereas in (b) it's {Y \leq X \leq
Y+h}.
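Here's a quick Monte Carlo check (Python; rate 1 and thickness h = 0.01
are my own choices) showing the two thickenings really do disagree in
the limit:

import numpy as np

rng = np.random.default_rng(0)
n = 5_000_000
x = rng.exponential(1.0, n)   # iid Exp(1); numpy's argument is the scale
y = rng.exponential(1.0, n)
h = 0.01

# (a) thicken {X = Y} multiplicatively: Y <= X <= (1+h)Y, i.e. X/Y near 1
ratio = (x >= y) & (x <= (1 + h) * y)
# (b) thicken {X = Y} additively: Y <= X <= Y + h, i.e. X - Y near 0
diff = (x >= y) & (x <= y + h)

print((x + y)[ratio].mean())  # about 2, the gamma(2,1) mean
print((x + y)[diff].mean())   # about 1, the exponential(1) mean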
My question is: is there an underlying difference in sigma-fields going
on here? If so, what are the sigma-fields? More specifically, if a
sigma-field is specified, is there then only one way to construct a
conditional pdf that naturally follows?
Or is it all just a matter of convention and nothing to do with
sigma-fields at all?
Well, as you can see, I don't understand the role of sigma-fields here.
Any comments would be much appreciated.
Norm
Really, the conditional distribution of X+Y given X=Y is not
defined. Or you can define it to be anything. Here is a summary of the
modern definition of conditional probability which I posted in response
to another question earlier.
Kuba Olejnik wrote:
> "Stephen J. Herschkorn" ...
>> To define conditional probability precisely requires an appeal to
>> measure theory, where we find that P(A|X) is a random variable which
>> is technically not uniquely defined.
> I would be truly grateful for a bit closer hint in measure theory.
> Esp. the problem of non-uniqueness bothers me the most.
O.K., since you asked:
Let (S,F,P) be a probability space, let X be an integrable random variable on S, and let G be a sub-sigma-field
of F. By definition, the random variable Y is a (version of the) conditional expectation E[X|G] iff Y is
G-measurable and E[YI] = E[XI] for all indicator variables I for events in G. By the Radon-Nikodym theorem,
conditional expectation exists. It is in general not unique, but if Y and Z are two versions of E[X|G], then Y =
Z almost surely. (I believe that if G is discrete with no null atoms, then E[X|G] is unique, but I am not sure.)
Some examples:
* If X is G-measurable, then E[X|G] = X a.s.
* E[X | { {}, S}] = EX
* The usual definitions in applied probability of conditional expectation for discrete and continuous random
variables.
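To see how that last example falls out of the abstract definition (a
standard specialization, filled in here for concreteness): suppose G is
generated by a countable partition B_1, B_2, ... of S with each
P(B_i) > 0. A G-measurable Y is constant on each B_i, say Y = c_i there,
and taking I = the indicator of B_i in the defining property forces

  c_i P(B_i) = E[YI] = E[XI],   i.e.,   c_i = E[X 1_{B_i}] / P(B_i),

which is precisely the elementary E[X | B_i].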
Given a random variable Y, E[X|Y] = E[X | s(Y)], where s(Y) denotes the sigma-field generated by Y. E[X | Y=y] =
E[X|Y] evaluated on the event {Y=y}.
Conditional probability P(A|G) = E[1_A | G] where 1_A is the indicator variable for A. Note that the statement 0 <=
P(A|G) <= 1 holds only almost surely; countable additivity holds only almost surely as well. P(A|B) = E[1_A | { {}, B,
B^c, S}] evaluated on B.
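In particular, taking X = 1_A in the partition computation above, with
G = s(B) = { {}, B, B^c, S } and 0 < P(B) < 1, gives

  P(A|G) = P(A ∩ B)/P(B) on B,   P(A ∩ B^c)/P(B^c) on B^c,

so evaluating on B recovers the familiar P(A|B) = P(A ∩ B)/P(B).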
Another, equivalent approach to conditional probability uses the concept of a conditional probability kernel. I do
not recall the details (from the lectures fifteen years ago!), and unfortunately I have no reference to offer. I am not
even familiar with a readable reference for the first approach. You can take a look at Billingsley,
_Probability_and_Measure_; Halmos, _Measure_Theory_ also discusses the topic briefly. Readers, any other references?
--
Stephen J. Herschkorn hers...@rutcor.rutgers.edu
> Really, the conditional distribution of X+Y given X=Y is not
> defined. Or you can define it to be anything. Here is a summary of the
> modern definition of conditional probability which I posted in response
> to another question earlier.
Ok, thanks. I'll work on what you wrote later on in that post. At the risk of being obtuse, when I looked up the Billingsley book,
I found his problem on Borel's paradox and part of it stumps me. He considers longitude Theta and latitude Phi rv coordinates
on the sphere, specifying a point uniformly distributed on the sphere. The problem asks you to show that Phi conditional on
Theta = theta has density |cos phi|/4, hence points on a great circle longitudinally (eg great circle thru Greenwich) are
*not* uniformly distributed. I can show that. Similarly, I can show that Theta conditional on Phi = phi is uniformly
distributed. In particular, points on the equator *are* uniformly distributed. The "paradox" here being that great circles
are indistinguishable on a uniformly distributed spherical surface, yet the equator is uniform, whereas longitudinal great
circles are not. I can see all that. *But* what eludes me is his concluding comment:
"This shows again the inadmissibility of conditioning with respect to an isolated event of probability 0. The relevant
sigma-field must not be lost sight of."
I don't get this. Can you (or anyone else) help me understand what the relevant sigma-field(s) is (are) here? I really would
like to understand the underlying mechanism of all this.
Thanks,
Norm
Norm-
>At the risk of being obtuse, when I looked up the Billingsley book, I found his problem on Borel's paradox and part of it
>stumps me. [...] *But* what eludes me is his concluding comment:
>
>"This shows again the inadmissibility of conditioning with respect to an isolated event of probability 0. The relevant
>sigma-field must not be lost sight of."
>
>I don't get this. Can you (or anyone else) help me understand what the relevant sigma-field(s) is (are) here?
In the problem whereto you refer, let us give the sphere an orientation
in R^3, where the third coördinate (z) is vertical. Then the underlying
probability space (S, F, P) is given by S = the unit sphere in R^3, F =
the Borel sets of S, and PA = (surface area of A)/(4Pi). Define six
random variables on S: X, Y = the first and second projection maps;
Theta1 = longitude with range (-pi, pi]; Phi1 = latitude with range
[-pi/2, pi/2]; Theta2 = longitude with range [-pi/2, pi/2], and Phi2 =
latitude with range (-pi, pi]. Letting s(·) indicate sigma-field
generation, s(Theta1) and s(Phi2) are generated by the surfaces of
wedges (i.e., regions on the surface between two great semicircles of
the same orientation), whereas s(Phi1) and s(Theta2) are generated by
horizontal and vertical strips, respectively.

The event B = {(x,y,z) in S: z = 0} is in both s(Phi1) and s(Phi2), and
you have noted that, by using the elementary definitions of conditional
distribution, that of (X,Y) given Phi1 = 0 differs from that of (X,Y)
given (Phi2 = 0 or Phi2 = pi), although the conditioning events both
describe B. Were you to ignore the angular coördinates, you would
probably say that the conditional distribution of (X,Y) given B is the
same as the first of these (i.e., the uniform one). An important feature
here is that PB = 0.
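If you want to see the two limits numerically, here is a quick Monte
Carlo sketch (my own; it takes the polar axis of the second system to be
the x-axis, so that Phi2 = atan2(z, y) and {Phi2 = 0} together with
{Phi2 = pi} describes B):

import numpy as np

rng = np.random.default_rng(0)
n = 5_000_000
p = rng.normal(size=(n, 3))                    # normalized Gaussians are
p /= np.linalg.norm(p, axis=1, keepdims=True)  # uniform on the sphere
x, y, z = p[:, 0], p[:, 1], p[:, 2]
h = 0.02

psi = np.arctan2(y, x)       # position angle along the circle B
phi1 = np.arcsin(z)          # latitude (system 1)
phi2 = np.arctan2(z, y)      # longitude about the x-axis (system 2)

strip = np.abs(phi1) < h                                 # event in s(Phi1)
wedge = (np.abs(phi2) < h) | (np.abs(phi2) > np.pi - h)  # event in s(Phi2)

# E|sin psi| is 2/pi ~ 0.64 for the uniform law on B and
# pi/4 ~ 0.79 for the law with density proportional to |sin psi|
print(np.abs(np.sin(psi[strip])).mean())   # ~ 0.64: uniform limit
print(np.abs(np.sin(psi[wedge])).mean())   # ~ 0.79: non-uniform limit

The strip limit spreads points uniformly along B, while the wedge limit
weights a point by its distance |sin psi| from the poles (+-1, 0, 0) of
the second system, since the wedges pinch to zero width there.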
Thus, the conditioning sigma-field determines how we, in practice,
define our conditional probabilities on null events. (See my elementary
take below.) This, I think, is what Billingsley means.

Strictly speaking, with the measure-theoretic definition of conditional
probability, P(A|B) is necessarily constant only when B is an atom in
the conditioning sigma-field G (that is, the only proper subset of B in
G is the empty set), since P(A|G) is a G-measurable random variable. G
is determined by context; without other specification, it is usually
assumed to be the four-set sigma-field s(B). If B is an atomic null
event in G, then P(A|B) can be defined to be anything (e.g., -17) in a
version of P(A|G). So, in our example, we could justifiably define
P(Theta1 <= x | B) = -3x^2 if x >= 10, e if x < 10, since P(· | G) is a
probability measure only almost surely.
Here are some exercises to help you with this:
- Let X and Y be jointly continuous random variables, U = X+Y, V = X -
Y, and W = Y/X. Describe s(V) and s(W). Using elementary definitions,
determine, in terms of the joint density of X and Y, the conditional
density of U given V=0 and that of U given W=1. Explain why these differ.
- Let (S,F,P) be a probability space, X an integrable random variable on S,
G a sub-sigma-field of F, A a non-null atom in G, and I the indicator
variable for A. Show that E[X | G] I = E[X | s(A)] I. (I think this is
true - I haven't worked out the details.)
And here is my more elementary take on Borel's paradox: Essentially,
you want to define P(A | B) when B is a null event, so you try the limit
of P(A | B_n), where (B_n) is a sequence of non-null events descending
to B. The problem with this definition is that the limiting value
generally depends on the sequence of events you choose. The limit may be
unique if you restrict the sequence to a specific sigma-field (e.g.,
s(Theta1) in our example).
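In the exponential example that opened this thread (my arithmetic, with
both rates equal to 1), the dependence on the sequence is easy to see:

  P(Y <= X <= (1+h)Y | Y = y) = e^{-y} - e^{-(1+h)y} ~ h y e^{-y},
  P(Y <= X <= Y + h | Y = y)  = e^{-y} - e^{-(y+h)}  ~ h e^{-y},

so, weighting by the density e^{-y} of Y and letting h -> 0, the first
thickening gives Y a conditional density proportional to y e^{-2y} while
the second gives one proportional to e^{-2y}; since X + Y ~ 2Y on either
event, the limits are gamma and exponential, respectively.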
Do not worry about your difficulty understanding Billingsley,
Probability and Measure, which, while comprehensive, is not an easy book
whence to learn the subject. Unfortunately, I myself do not know of good
textbooks on the subject. Some other tentative suggestions: the chapter
on probability in Halmos, Measure Theory and the brief discussion of
measure-theoretic conditional expectation in the chapter on martingales
in Karlin and Taylor, A First Course in Stochastic Processes. Do readers
out there have a good suggestion?
Also, are you taking a first-semester measure-theoretic course in
probability or teaching yourself the material right now? When I took a
graduate course on the subject, we did not touch conditional probability
until the second semester.
Hope this helps,
Sorry to take so long getting back to you; I've been out of town with work. Thanks for your reply below. It was very helpful and
cleared up many issues for me.
The example you gave with the sphere makes a lot of sense to me now. I'd already realized that with a different way of describing
the "longitude" and "latitude", you get different conditional distributions, so I was starting to see that things were very much
coordinate-dependent here. Your explanation of the underlying sigma-fields highlighted that and made sense of what I was beginning
to perceive.
I'll try the exercises you mentioned.
In answer to your final question, I'm reading on my own. I did a math degree several years ago including measure theory but not a
good probability theory course. I'm just curious and trying to get back into it a bit on the side. Beats being out on the street
stealing hubcaps. ;-)
Thanks again for a most informative post!
Regards,
Norm
"Stephen J. Herschkorn" wrote:
> Norm-