
Brain Teaser


Steven E. Landsburg

Dec 27, 2010, 10:28:00 PM
There's an old brain teaser about a country where every couple wants to
have a boy. Therefore each couple keeps having children until one is
a boy; then they stop. Question: What is the expected ratio of
female-births to all-births?

A thread on MathOverflow (with a particularly helpful comment from Douglas
Zare) reminded me that there's a valuable lesson to be learned from
this teaser, namely this:

The fact that two random variables have an expected *difference* of
zero does not imply that they have an expected *ratio* of one!

This in turn is because:

The expectation of a ratio is not (in general) equal to the ratio
of the expectations.

This inspired me to make a couple of blog posts, here and here:

http://www.thebigquestions.com/2010/12/21/are-you-smarter-than-google/

http://www.thebigquestions.com/2010/12/22/a-big-answer-2/

In the second post, I give a particularly simple counterexample
(involving four families on my block) to the assertion that
Expected Difference = 0 => Expected ratio = 1.

This triggered objections of the form "But the example is not exactly
like the problem", from people who apparently didn't realize that *any*
counterexample is enough to invalidate a purported lemma, and that
any proof that invokes that lemma is therefore invalidated, even in
contexts that look quite unlike the counterexample.

This, of course, is to be expected, and after much back-and-forth in
the comment section, I daresay most of these people ended up learning
something, which is a very good thing. Certainly many very very smart
people have the wrong intuition about this problem at first.

But, the Internet being what it is, there are a small number of very
cranky cranks who belligerently defend the wrong answer and (of course)
conclude that I am very foolish for not deferring to their belligerence.
One of these (whose comment on my blog I deleted for its lack of
content and abusive tone --- something I almost never find necessary)
came from a faculty member in the physics department at an Ivy League
university. Another came from the ever-cranky physicist Lubos Motl.

Probably one should just shrug this stuff off, but the tone of these
idiots pissed me off enough that I've offered a $15,000 challenge to
Motl and his ilk. Yes, I'm dead serious about this. If you think Motl
is right, you can get in on the action; the offer is here:

http://www.thebigquestions.com/2010/12/22/a-big-answer-2/

Please note that I'm not using the word "idiot" to describe people who
don't at first understand this, or even people who *never* understand it.
I am using it to describe people who repeat the same false argument
18 times over, even after they've been shown an explicit counterexample
to the "lemma" they're invoking, and having had the relevance of that
counterexample pointed out to them 62 times in 62 ways.

Anyway, I thought sci.math denizens might find this fun. Hence this post.

Steven E. Landsburg
http://www.TheBigQuestions.com/blog

quasi

Dec 28, 2010, 1:17:10 AM
On 27 Dec 2010 22:28:00 -0500, land...@panix.com (Steven E.
Landsburg) wrote:

Definitely an intriguing problem -- thanks.

It's clear to me that you're right (of course you knew that).

Here's a sketch of my analysis.

Let n be the number of families.

For n=1, the expected ratio is

(0/1)*(1/2^1) + (1/2)*(1/2^2) + (2/3)*(1/2^3) + ...

   = sum ((k-1)/k)*(1/2^k), for k = 1 to oo

   = 1 - ln(2),

which is approximately 0.307.

The very fact that for n=1 the answer is less than 1/2 makes it clear
that the answer is not "1/2 for all n".
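
A throwaway snippet (added, just to cross-check that constant
numerically; compile with -lm):

#include <stdio.h>
#include <math.h>

int main(void)
{
    double s = 0.0;
    for (int k = 1; k <= 60; k++)            /* the 2^-60 tail is negligible */
        s += ((k - 1.0) / k) * pow(0.5, k);
    printf("series = %.12f   1 - ln(2) = %.12f\n", s, 1.0 - log(2.0));
    return 0;
}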

Next, for arbitrary n, consider the distribution of the number of
girls.

The expected number of girls per family is

0*(1/2^1) + 1*(1/2^2) + 2*(1/2^3) + 3*(1/2^4) + ...

= sum ((k-1)/2^k), for k = 1 to oo

= 1.

Now suppose n is large, but fixed. I'll sketch a heuristic argument.

For large n, the number of girls will be approximately normally
distributed with mean equal to

n*(expected number of girls per family)

= n

and standard deviation on the order of sqrt(n) (I think).

The number of boys will be exactly n

Hence, for any open interval centered at 1/2, however small, the
probability of the ratio (girls/(girls+boys)) falling in that interval
can, for sufficiently large n, be forced arbitrarily close to 1. But
all possible ratios are bounded below by 0 and above by 1, hence the
expectation of the ratio can't be pulled away by outliers. Thus, as n
approaches infinity, the expected ratio approaches 1/2.

Since for n = 1, the expected ratio is 1 - ln(2), or approximately
0.3, this suggests that the expected ratio increases towards 1/2 as n
approaches infinity.

To recap, for n families, the expected number of girls is n, and the
number of boys is exactly n, and hence the difference (girls - boys)
has expectation zero, but the expected ratio (girls/(girls+boys)) is
not 1/2 -- it's less, but approaches 1/2 as n approaches infinity.
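
If the n-dependence seems surprising, a quick-and-dirty Monte Carlo
makes it visible. A minimal C sketch (purely illustrative -- the
function name is made up and rand() is crude, so trust only a couple
of digits):

#include <stdio.h>
#include <stdlib.h>

static double estimate(int n, long trials)
{
    double sum = 0.0;
    for (long t = 0; t < trials; t++) {
        long g = 0;
        for (int i = 0; i < n; i++)
            while (rand() & 1)          /* 1 = girl: keep going until a boy */
                g++;
        sum += (double) g / (g + n);    /* each family has exactly one boy */
    }
    return sum / trials;
}

int main(void)
{
    int ns[] = { 1, 2, 4, 10, 100 };
    for (int i = 0; i < 5; i++)
        printf("n = %3d   E[g/(g+b)] ~ %.4f\n", ns[i], estimate(ns[i], 500000));
    return 0;
}

For n = 1 the estimate should come out near 0.31, and it creeps up
toward 1/2 as n grows.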

quasi

Will O'the Wisp

Dec 28, 2010, 1:35:08 AM

quasi:

There are a lot of valuable observations in your post, but I particularly
like this one:

quasi

Dec 28, 2010, 12:39:24 PM

A few more comments ...

Fix n, the number of families.

Define random variables g,b by

g = # of girls
b = # of boys

It's easy to show that E(g) = n.

By hypothesis, b = n, so of course E(b) = n.

Since b is constant, b is independent of pretty much anything.

In particular, b+1 and g+1 are independent.

Now by linearity, it is true that

E((g+1)/(b+1)) = 1

but even though b+1 and g+1 are independent, it is not true that

E((b+1)/(g+1)) = 1

In fact, for all n, E((b+1)/(g+1)) is always greater than 1.

Of course, by linearity, it is true that

E((b+1)/(g+1)) = (n+1)*E(1/(g+1))

but while E(g+1) = n+1, it is not true that E(1/(g+1)) = 1/(n+1).
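
(A quick check for n = 1: E(1/(g+1)) = sum_{k>=0} (1/(k+1))*(1/2)^(k+1)
= ln(2) ~ 0.693, which is larger than 1/(n+1) = 1/2, so
E((b+1)/(g+1)) = 2*ln(2) ~ 1.386 > 1.)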

quasi

quasi

Dec 28, 2010, 2:57:02 PM

Ok, here's a sketch of a proof that

E(g/(g+b)) < 1/2

for all n.

First a lemma, stated without proof, but as far as I can see, it's a
simple argument based on convexity, and moreover, the result is
probably well known and just a standard exercise (assuming that it's
actually true -- I'm not 100% sure).

lemma:

If X is a nonconstant discrete random variable with only positive real
values, then E(1/X) > 1/E(X).

Next, assuming the lemma,

and noting that b is independent of 1/(g+b),

E(b/(g+b))
= E(b)*E(1/(g+b))
> E(b)*(1/E(g+b))
= n*(1/(2n))
= 1/2.

But E(b/(g+b)) > 1/2 implies E(g/(g+b)) < 1/2 (since those two
expectations must add to 1), which completes the proof.

quasi

Ray Vickson

Dec 28, 2010, 4:41:11 PM
On Dec 27, 10:17 pm, quasi <qu...@null.set> wrote:
> On 27 Dec 2010 22:28:00 -0500, landsb...@panix.com (Steven E.

If n = number of families and G = number of girls born (in the whole
population), the number of births = n+G because each family has
exactly 1 boy. If p = P{boy} per birth, the number of girls in
family i is Gi with P{Gi = k} = p*(1-p)^k, k=0, 1, 2, .... . Note that
Hi = 1+Gi is a standard geometric, with mean 1/p and variance (1-p)/
p^2, so E[Gi] = (1-p)/p and Var{Gi} = (1-p)/p^2. G is the sum of n iid
Gi's, so has mean EG = n*(1-p)/p and Var(G) = n*(1-p)/p^2. For the
case p = 1/2 we have EG = n and Var(G) = 2*n.

The ratio of girls to births is R = G/(G+n), so
P{R <= r} = P{G <= r*n/(1-r)}. In the continuous limit, with density
g(x) for G we have density h(r) for r given by
h(r) = (d/dr)P{G <= nr/(1-r)} = [n/(1-r)^2]*g(nr/(1-r)). If
we use the normal approximation for g and assume that approxER =
integral_{r=0..1} r*h(r) dr, we can get numerical results for various
n using Maple.

On the other hand (for p = 1/2) the exact distribution of G is
P{G = k} = C(n+k-1,k)/2^(n+k), so the exact expected ratio is
ER = sum_{k=0..infinity} (k/(k+n))*P{G = k}. Maple gives this as ER
= n/(n+1)*(-(-n-1)/n-(n+1)*LerchPhi(-1,1,n)), where LerchPhi(z,a,v) =
sum_{m=0..infinity} z^m/(v+m)^a. Numerical evaluation using Maple
gives:

n approx ER exact ER
1 0.411794 0.306853
2 0.425246 0.386294
3 0.436341 0.420558
4 0.445168 0.439255
5 0.452246 0.450931
6 0.457993 0.458883
7 0.462713 0.464636
8 0.466631 0.468987
9 0.469916 0.472390
10 0.472695 0.475123
11 0.475064 0.477365
12 0.477099 0.479238
13 0.478860 0.480825
14 0.480394 0.482188
15 0.481738 0.483370
16 0.482923 0.484405
17 0.483973 0.485319
18 0.484909 0.486132
19 0.485747 0.486860
20 0.486500 0.487516
21 0.487181 0.488109
22 0.487799 0.488648
23 0.488361 0.489141
24 0.488875 0.489592
25 0.489346 0.490008
26 0.489780 0.490392
27 0.490180 0.490747
28 0.490551 0.491077
29 0.490894 0.491384
30 0.491214 0.491671
31 0.491512 0.491940
32 0.491791 0.492191
33 0.492052 0.492428
34 0.492297 0.492650
35 0.492527 0.492860
36 0.492744 0.493058
37 0.492949 0.493246
38 0.493143 0.493423
39 0.493326 0.493592
40 0.493500 0.493752
41 0.493665 0.493904
42 0.493821 0.494049
43 0.493970 0.494188
44 0.494113 0.494320
45 0.494248 0.494446
46 0.494378 0.494566
47 0.494501 0.494682
48 0.494620 0.494793
49 0.494733 0.494899
50 0.494842 0.495001
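
For anyone without Maple, the exact column can be reproduced by
summing the series directly. A small added C sketch (names are just
illustrative); it advances the negative-binomial weight with
P{G=k+1} = P{G=k} * (n+k)/(2*(k+1)):

#include <stdio.h>

/* exact_er(n) = sum_{k>=0} (k/(k+n)) * C(n+k-1,k) / 2^(n+k) */
static double exact_er(int n)
{
    double p = 1.0, sum = 0.0;
    for (int i = 0; i < n; i++)
        p *= 0.5;                           /* P{G = 0} = 2^(-n) */
    for (int k = 0; k <= 100000; k++) {
        sum += ((double) k / (k + n)) * p;
        p *= 0.5 * (n + k) / (k + 1.0);     /* advance to P{G = k+1} */
    }
    return sum;
}

int main(void)
{
    for (int n = 1; n <= 10; n++)
        printf("n = %2d   exact ER = %.6f\n", n, exact_er(n));
    return 0;
}

It should print 0.306853 for n = 1, matching the first row of the
table above.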

R.G. Vickson

quasi

Dec 28, 2010, 5:07:43 PM

Still more comments ...

In the above proof, I used the fact that b and 1/(g+b) are independent
to justify the claim

E(b/(g+b)) = E(b)*E(1/(g+b))

but since b is constant, independence was unnecessary -- linearity
suffices.

However let me propose a variation of the original problem. In this
variation, b is no longer constant. For simplicity, I'll use just one
family.

Suppose a family produces children (singlets) until two consecutive
births consist of a boy and a girl, in that order.

Of the births, let

g = # of girls
b = # of boys

Here are some claims:

(1) g,b are independent.

(2) E(g) = E(b) = 2, and hence E(g-b) = 0.

(3) E(g/b) = E(b/g) > 1.

(4) E(g/(g+b)) = E(b/(g+b)) = 1/2.
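
(One way to see (1), (2) and (4): a completed family under this rule
reads g^i b^j g with i >= 0 and j >= 1, occurring with probability
2^-(i+j+1), so g = i+1 and b = j are independent, each geometric on
{1, 2, 3, ...} with mean 2; the symmetry of that joint distribution
gives (4).)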

Claim (3) seems somewhat anti-intuitive in that _both_ of the ratios
g/b and b/g have expectation exceeding 1. Perhaps my claim is false?

quasi

quasi

Dec 28, 2010, 5:19:01 PM

Ok, claim (3) is true -- the proof follows from the previously stated
lemma.

Since g,b are independent, g and 1/b are independent, hence

E(g/b) = E(g)*E(1/b) = 2*E(1/b) > 2*(1/E(b)) = 1

Similarly, b and 1/g are independent, hence

E(b/g) = E(b)*E(1/g) = 2*E(1/g) > 2*(1/E(g)) = 1

quasi

quasi

Dec 30, 2010, 4:29:00 AM
On 27 Dec 2010 22:28:00 -0500, land...@panix.com (Steven E.
Landsburg) wrote:

>There's an old brain teaser about a country where every couple
>wants to have a boy. Therefore each couple keeps having children
>until one is a boy; then they stop. Question: What is the
>expected ratio of female-births to all-births?

As has been made clear in previous replies, if n is the number of
families, g the number of female-births, b the number of male-births,
then E(g/(g+b)) depends on n, but is always less than 1/2.

I'll modify the question in 2 ways ...

(1) Allow for a stopping rule which consists of an arbitrary nonempty
finite string of g's and b's. If consecutive births match the stopping
rule, that completes the birth process.

(2) For simplicity, restrict to the case of just 1 family.

For a given stopping rule s, let v(s) = E(g/(g+b)).

Thus, for example,

v("b") = 1 - ln(2)

which was the original question for the case n=1.

For a given stopping rule s, let s' denote the complementary rule,
with g's and b's interchanged. It's clear that

v(s') = 1 - v(s)

hence we can restrict our analysis to stopping rules s which end in b.

I think v(s) is irrational (in fact transcendental) for all s, hence
v(s) is never equal to 1/2.

Question:

Characterize the stopping rules s for which v(s) > 1/2.

Some numerical simulation-based approximate results ...

v("gb") ~ .48

v("bb") ~ .36

v("ggb") ~ .52

v("gbb") ~ .44

v("bgb") ~ .47

v("bbb") ~ .39

v("ggbb") ~ .488

v("gggbb") ~ .504

Note that by the complement property,

v("ggbb") < 1/2 implies v("bbgg") > 1/2

One might conjecture that v(s) > 1/2 if s contains more g's than b's.

I'm not sure.

The question arose just as a curiosity.
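
For anyone who wants to re-run these numbers independently, here is a
rough, self-contained Monte Carlo sketch in C (added; not the
simulation used above, all names are just illustrative, and rand() is
only good for 2-3 digits):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* v_estimate(s, trials): one family keeps having children until the
   last strlen(s) births spell out the stopping string s (e.g. "gb").
   Assumes strlen(s) < 64. */
static double v_estimate(const char *s, long trials)
{
    size_t len = strlen(s);
    char last[64];                          /* rolling window of recent births */
    double sum = 0.0;

    for (long t = 0; t < trials; t++) {
        long g = 0, b = 0, n = 0;
        for (;;) {
            char c = (rand() & 1) ? 'b' : 'g';
            if (c == 'g') g++; else b++;
            if ((size_t) n < len) last[n] = c;
            else { memmove(last, last + 1, len - 1); last[len - 1] = c; }
            n++;
            if ((size_t) n >= len && memcmp(last, s, len) == 0)
                break;                      /* stopping string just occurred */
        }
        sum += (double) g / (g + b);
    }
    return sum / trials;
}

int main(void)
{
    const char *rules[] = { "b", "gb", "bb", "ggb", "bggb" };
    for (int i = 0; i < 5; i++)
        printf("v(\"%s\") ~ %.4f\n", rules[i], v_estimate(rules[i], 1000000));
    return 0;
}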

quasi

quasi

Dec 30, 2010, 5:28:08 AM

Ok, so here's the full conjecture -- based partly on data, partly on
heuristic reasoning ...

Conjecture:

v(s) > 1/2 iff either

(1) s has more g's than b's

(2) s has an equal number of g's and b's, but starts with b.

quasi

Ilmari Karonen

Dec 30, 2010, 1:50:38 PM
On 2010-12-28, quasi <qu...@null.set> wrote:
>
> First a lemma, stated without proof, but as far as I can see, it's a
> simple argument based on convexity, and moreover, the result is
> probably well known and just a standard exercise (assuming that it's
> actually true -- I'm not 100% sure).
>
> lemma:
>
> If X is a nonconstant discrete random variable with only positive real
> values, then E(1/X) > 1/E(X).

It is indeed a simple convexity argument. In fact, it follows from a
more general lemma:

If X is a random variable over an interval S, and f: S -> R is convex,
then E(f(X)) >= f(E(X)). Further, if f is strictly convex, then the
inequality is strict unless X is constant.

Proof is left as an exercise for the reader, but the first part
follows fairly straightforwardly from considering the random variable
Z = (X, f(X)) over S x R and the fact that E(Z) = ( E(X), E(f(X)) )
must lie in the convex hull of the support of Z. The second part can
then be shown by contradiction.
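
(A concrete instance of the strict case: if X takes the values 1 and 3
with probability 1/2 each, then E(X) = 2 and 1/E(X) = 1/2, while
E(1/X) = (1 + 1/3)/2 = 2/3 > 1/2.)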

--
Ilmari Karonen
To reply by e-mail, please replace ".invalid" with ".net" in address.

quasi

Dec 31, 2010, 3:43:33 AM

I now mostly believe that the conjecture is false (about 70-30, false
to true), but I don't have a counterexample.

quasi

Phil H

Dec 31, 2010, 11:48:38 AM

For 4 families, EV of m/f
1m 2families
1f, 1m 1 family
2.5f, .5m 1 family ratio 2.5/3.5

For an infinite number of families.
1m 50% of families
1f,1m 25% of families
2f,1m 12.5% of families
3f,1m 6.25% of families....etc

If n is the number of families, then the number of males is the Sum of
1/2n + 1/4n + 1/8n....etc. The number of females will be the Sum of 0
+ 1/4n + 2/8n + 3/16n + 4/32n....etc.
As n approaches infinity, the ratio will be n/n.
Phil H


quasi

Dec 31, 2010, 12:39:51 PM
On Fri, 31 Dec 2010 08:48:38 -0800 (PST), Phil H <phol...@gmail.com>
wrote:

No, the above is wrong -- you made a numerical error.

The expected value of the ratio m/f is always 1, regardless of the
number of families.

>For an infinite number of families.
>1m 50% of families
>1f,1m 25% of families
>2f,1m 12.5% of families
>3f,1m 6.25% of families....etc
>
>If n is the number of families, then the number of males is the
>Sum of 1/2n + 1/4n + 1/8n....etc. The number of females will be
>the Sum of 0 + 1/4n + 2/8n + 3/16n + 4/32n....etc.
>As n approaches infinity, the ratio will be n/n.

If n is nonzero, neither of the above infinite sums depends on n, so
letting n approach infinity has no relevance.

Moreover, you are answering the wrong question.

The problem asks for the expected value of m/(m+f), not the expected
value of m/f.

quasi

quasi

Dec 31, 2010, 1:52:56 PM

Never mind -- early morning illusion -- the above looks OK. I fell
into the same trap that I was defending against in earlier replies.

>The expected value of the ratio m/f is always 1, regardless of the
>number of families.
>
>>For an infinite number of families.
>>1m 50% of families
>>1f,1m 25% of families
>>2f,1m 12.5% of families
>>3f,1m 6.25% of families....etc
>>
>>If n is the number of families, then the number of males is the
>>Sum of 1/2n + 1/4n + 1/8n....etc. The number of females will be
>>the Sum of 0 + 1/4n + 2/8n + 3/16n + 4/32n....etc.
>>As n approaches infinity, the ratio will be n/n.
>
>If n is nonzero, neither of the above infinite sums depends on n, so
>letting n approach infinity has no relevance.

Again, I misunderstood. What you are doing is taking limits of finite
sums, where each finite sum represents the expected value of m/f for n
families. Then you take the limit as n approaches infinity. That seems
OK -- I retract my objection.

>Moreover, you are answering the wrong question.
>
>The problem asks for the expected value of m/(m+f),

Correction: The problem asks for the expected value of f/(m+f),

quasi

Dec 31, 2010, 1:57:15 PM

A temporary bout of doubt. Cancel the above -- I believe, I believe.

My original intuition seems OK -- I'm fairly sure (95%) that the
conjecture is true.

quasi

quasi

Dec 31, 2010, 2:23:53 PM

No, mind again.

Firstly, it looks like you made a numerical error. Shouldn't the 1m in
the first line be multiplied by 2? Then you would get 3.5 / 3.5

But then it's still wrong, since the reasoning is wrong. It looks like
you fell into the trap that many others fell into (including myself at
least once).

To get the expected ratio you need to do a weighted average of the
possible ratios (weighted by probabilities), not the ratio of the
expected values.

When you do that, you won't even bother trying to get E(m/f) since,
for fixed n, the variable f has a positive probability of being 0.

>>The expected value of the ratio m/f is always 1, regardless of the
>>number of families.
>>
>>>For an infinite number of families.
>>>1m 50% of families
>>>1f,1m 25% of families
>>>2f,1m 12.5% of families
>>>3f,1m 6.25% of families....etc
>>>
>>>If n is the number of families, then the number of males is the
>>>Sum of 1/2n + 1/4n + 1/8n....etc. The number of females will be
>>>the Sum of 0 + 1/4n + 2/8n + 3/16n + 4/32n....etc.
>>>As n approaches infinity, the ratio will be n/n.
>>
>>If n is nonzero, neither of the above infinite sums depends on n,
>>so letting n approach infinity has no relevance.
>
>Again, I misunderstood. What you are doing is taking limits of
>finite sums, where each finite sum represents the expected value
>of m/f for n families. Then you take the limit as n approaches
>infinity. That seems OK -- I retract my objection.

But there _are_ objections -- the same objections as I made for your
analysis of the case n=4.

>>Moreover, you are answering the wrong question.
>>
>>The problem asks for the expected value of m/(m+f),
>
>Correction: The problem asks for the expected value of f/(m+f),
>
>>not the expected value of m/f.

Sorry for the confusion -- I think I'm mostly awake now -- ignore my
earlier objections and retractions. I think this reply correctly
identifies the flaws in your analysis.

quasi

Robert Israel

Dec 31, 2010, 2:54:35 PM
quasi <qu...@null.set> writes:

I don't think so.

> >>>Question:
> >>>
> >>>Characterize the stopping rules s for which v(s) > 1/2.
> >>>
> >>>Some numerical simulation-based approximate results ...
> >>>
> >>> v("gb") ~ .48

The n-child families (n >= 2) with this rule will have j boys
(0 <= j <= n-2), then n-j-1 girls, then a boy.
So v("gb") = sum_{n=2}^infty sum_{j=0}^{n-2} 2^(-n) (n-j-1)/n
= sum_{n=2}^infty 2^(-n-1) (n-1) = 1/2

unless I've made a mistake.
--
Robert Israel isr...@math.MyUniversitysInitials.ca
Department of Mathematics http://www.math.ubc.ca/~israel
University of British Columbia Vancouver, BC, Canada

quasi

Dec 31, 2010, 3:35:52 PM

Yes, I think you've made a mistake (hehe -- a rarity).

The probability of an n-child family is not 2^(-n).

Unless I've made a mistake (which is far less rare).

quasi

Robert Israel

Dec 31, 2010, 5:01:12 PM
quasi <qu...@null.set> writes:

Huh? I thought we were assuming each child is equally likely
to be a girl or boy, independent of previous births. The probability
of any given n-child pattern of boys and girls that is allowed by the
stopping rules is then 2^(-n).

> Unless I've made a mistake (which is far less rare).
>
> quasi

quasi

Dec 31, 2010, 6:12:53 PM
On Fri, 31 Dec 2010 16:01:12 -0600, Robert Israel
<isr...@math.MyUniversitysInitials.ca> wrote:

Yep, my mistake.

An error both in my heuristic reasoning, at least for the case where
the stopping string has an equal number of g's and b's, and an error
in my simulation.

Of course, I knew that if one of us was in error, it was more likely
to be me than you, but I put too much faith in the simulation results
(which as is clear now, were wrong).

So my conjecture is now simpler ...

Conjecture:

v(s) < 1/2 if s has fewer g's than b's.

v(s) = 1/2 if s has an equal number of g's and b's.

v(s) > 1/2 if s has more g's than b's.

The proof for the case of an equal number of g's and b's is probably a
straightforward generalization of Robert Israel's calculation for the
case s = "gb".

The other cases may be just as easy, I'm not sure.

Apologies to Robert Israel for falsely accusing him of having made an
error (but I'll catch him in an error one of these days).

quasi

Tim Little

Dec 31, 2010, 9:03:00 PM
On 2010-12-31, quasi <qu...@null.set> wrote:
> The problem asks for the expected value of m/(m+f), not the expected
> value of m/f.

This is a good thing, because m/f has no expected value. For any
number of families there is always the possibility of f=0, for which
the ratio is undefined.


- Tim

Edward Green

Jan 1, 2011, 12:56:47 AM
On Dec 28, 1:17 am, quasi <qu...@null.set> wrote:
> On 27 Dec 2010 22:28:00 -0500, landsb...@panix.com (Steven E.

>
>
>
> Landsburg) wrote:
> >There's an old brain teaser about a country where every couple
> >wants to have a boy. Therefore each couple keeps having children
> >until one is a boy; then they stop. Question: What is the
> >expected ratio of female-births to all-births?

<...>

> Here's a sketch of my analysis.
>
> Let n be the number of families.
>
> For n=1, the expected ratio is
>
> (0/1)*(1/2^1) + (1/2)*(1/2^2) + (2/3)*(1/2^3) + ...
>
> = sum ((k-1)/k)*(1/2^k)), for k = 1 to oo
>
> = 1 - ln(2).

How did you calculate that sum?

> which is approximately 0.3
>
> The very fact that for n=1 the answer is less than 1/2 makes it clear
> that the answer is not "1/2 for all n".

I am confused right off by your answer. Why should "n" enter into it?
The expectation of a random variable is a property of a population of
i.i.d. random variables as much as it is a property of one exemplar of
the random variable. At least it is for an intensive random variable,
like a ratio. "n" has nothing to do with it. It's like asking what the
expected fraction of heads is for tossing a fair coin: it doesn't make
any difference if you toss one coin or toss a thousand identical
coins, the answer is still 1/2.

It's been a while since I've thought seriously about random variables,
so I may be putting my foot into it. Please educate me in my feeble
minded error.

> Next, for arbitrary n, consider the distribution of the number of
> girls.
>
> The expected number of girls per family is
>
> 0*(1/2^1) + 1*(1/2^2) + 2*(1/2^3) + 3*(1/2^4) + ...
>
> = sum ((k-1)/2^k), for k = 1 to oo
>
> = 1.
>
> Now suppose n is large, but fixed. I'll sketch a heuristic argument.
>
> For large n, the number of girls will be approximately normally
> distributed with mean equal to
>
> n*(expected number of girls per family)
>
> = n
>
> and standard deviation on the order of sqrt(n) (I think).

Yes I think that's right.

> The number of boys will be exactly n
>
> Hence, for any open interval centered at 1/2, however small, the
> probability of the ratio (girls/(girls+boys)) falling in that interval
> can, for sufficiently large n, be forced arbitrarily close to 1. But
> all possible ratios are bounded below by 0 and above by 1, hence the
> expectation of the ratio can't be pulled away by outliers. Thus, as n
> approaches infinity, the expected ratio approaches 1/2.
>
> Since for n = 1, the expected ratio is 1 - ln(2), or approximately
> 0.3, this suggests that the expected ratio increases towards 1/2 as n
> approaches infinity.

Hmm... I see now why you think the answer depends on n, but I still
think there is something wrong with your argument. Which will come to
me any minute once the effects of the champagne wear off. :-}

> To recap, for n families, the expected number of girls is n, and the
> number of boys is exactly n, and hence the difference (girls - boys)
> has expectation zero, but the expected ratio (girls/(girls+boys)) is
> not 1/2 -- it's less, but approaches 1/2 as n approaches infinity.

The expected ratio of girls to boys is what you calculated for "n =
1", period. I think.

If you beat me up a little I will no doubt come to remember what I
once knew about random variables, and mend the error of my ways.

Edward Green

Jan 1, 2011, 1:10:27 AM
On Jan 1, 12:56 am, Edward Green <spamspamsp...@netzero.com> wrote:

> The expected ratio of girls to boys is what you calculated for "n =
> 1", period. I think.

Oops. I meant what the problem asked for: the expected ratio of girls
to all births.

k...@kymhorsell.com

Jan 1, 2011, 1:39:33 AM
Steven E. Landsburg <land...@panix.com> wrote:
> There's an old brain teaser about a country where every couple wants to
> have a boy. Therefore each couple keeps having children until one is
> a boy; then they stop. Question: What is the expected ratio of
> female-births to all-births?
...

Is this one of those funny situations where any initial population
collapses because (under most realistic assumptions) births won't replace
deaths and if the population was sufficiently large to start with the
ratio of female births/all births -> 0?

--
nice try, but wikipedia is not a credible source.
-- Animal06 ["10,000 lakes"] <Wher...@Friday.com>, 03 Dec 2010 13:05:47 -0500

David Bernier

Jan 1, 2011, 10:41:04 AM
[...]

We're assuming no twins, triplets, or multiplets, I think ...
Realistically, there's a minimum time interval from a
birth in a family until the next possible birth.

Then, the parents die some day, but not before having had
a boy. Or maybe they to have a boy either until a boy is
born or one of them dies. Or maybe child-bearing isn't possible
past some age ...

David Bernier

Jan 1, 2011, 11:11:36 AM
David Bernier wrote:
> Edward Green wrote:
>> On Dec 28, 1:17 am, quasi<qu...@null.set> wrote:
>>> On 27 Dec 2010 22:28:00 -0500, landsb...@panix.com (Steven E.
>>>
>>>
>>>
>>> Landsburg) wrote:
>>>> There's an old brain teaser about a country where every couple
>>>> wants to have a boy. Therefore each couple keeps having children
>>>> until one is a boy; then they stop. Question: What is the
>>>> expected ratio of female-births to all-births?
[...]

> We're assuming no twins, triplets, or multiplets, I think ...
> Realistically, there's a minimum time interval from a
> birth in a family until the next possible birth.
>
> Then, the parents die some day, but not before having had
> a boy. Or maybe they to have a boy either until a boy is

^^^ "keep having children"


> born or one of them dies. Or maybe child-bearing isn't possible
> past some age ...

The end of life scenarios aren't clear ...

quasi

Jan 1, 2011, 1:54:39 PM

No time to beat you up right now -- I'm on my way out.

Maybe later.

quasi

Tim Little

Jan 1, 2011, 9:31:21 PM
On 2011-01-01, k...@kymhorsell.com <k...@kymhorsell.com> wrote:
> Is this one of those funny situations where any initial population
> collapses because (under most realistic assumptions) births won't
> replace deaths

This problem assumes no deaths, because if the parents die then the
stated requirement that they keep having children until they get a boy
cannot be guaranteed.


- Tim

k...@kymhorsell.com

Jan 1, 2011, 9:56:13 PM

Well... each woman can *try* to keep having children and go
infertile when the first male child is born. :)

With immortality and a pure birth process based on the 2 groups
fertile females, and "others" (males + females that have
produced a male child) we seem to have a simple limit on females
born/all births as elsewhere described.

--
[Full metal rebuttal:]
Not true.
-- John Stafford <nh...@droffats.net>, 08 Dec 2010 10:16:59 -0600

quasi

Jan 2, 2011, 2:39:32 AM
On Fri, 31 Dec 2010 21:56:47 -0800 (PST), Edward Green
<spamsp...@netzero.com> wrote:

>On Dec 28, 1:17 am, quasi <qu...@null.set> wrote:
>> On 27 Dec 2010 22:28:00 -0500, landsb...@panix.com (Steven E.
>>
>> Landsburg) wrote:
>> >There's an old brain teaser about a country where every couple
>> >wants to have a boy. Therefore each couple keeps having children
>> >until one is a boy; then they stop. Question: What is the
>> >expected ratio of female-births to all-births?
>
><...>
>
>> Here's a sketch of my analysis.
>>
>> Let n be the number of families.
>>
>> For n=1, the expected ratio is
>>
>> (0/1)*(1/2^1) + (1/2)*(1/2^2) + (2/3)*(1/2^3) + ...
>>
>> = sum ((k-1)/k)*(1/2^k)), for k = 1 to oo
>>
>> = 1 - ln(2).
>
>How did you calculate that sum?

Change (1/2) to x to get a power series. Then differentiate and sum
the resulting series. Then integrate back, choosing the appropriate
constant to match the value of the original power series at x=0.
Finally, substitute x=1/2.
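
Spelled out: with f(x) = sum ((k-1)/k)*x^k for k = 1 to oo, termwise
differentiation gives f'(x) = sum (k-1)*x^(k-1) = x/(1-x)^2, so
f(x) = 1/(1-x) + ln(1-x) - 1 (the constant chosen so that f(0) = 0),
and f(1/2) = 2 - ln(2) - 1 = 1 - ln(2).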

>> which is approximately 0.3
>>
>> The very fact that for n=1 the answer is less than 1/2 makes
>> it clear that the answer is not "1/2 for all n".
>
>I am confused right off by your answer. Why should "n" enter
>into it? The expectation of a random variable is a property of
>a population of i.i.d. random variables as much as it is a
>property of one exemplar of the random variable. At least it
>is for an intensive random variable, like a ratio. "n" has
>nothing to do with it. It's like asking what the expected
>fraction of heads is for tossing a fair coin: it doesn't make
>any difference if you toss one coin or toss a thousand identical
>coins, the answer is still 1/2.

For a population of n families, the expected value of the ratio
g/(g+b) is not the sum of the expected ratio of g/(g+b) per family
divided by n, but rather the weighted average of the possible values
of g/(g+b) for the entire population, weighted by the probabilities of
those ratios occurring. The fact that the expected ratio varies as a
function of n is due to the bias introduced by the stopping rule.

If you're not convinced try a simulation.

As you noted in the next reply, the ratio in question is g/(g+b), not
g/b.

For n families, the expected value of g/b is 1, independent of n,
however the ratio g/(g+b) is always less than 1/2 and definitely
_does_ depend on n, increasing towards 1/2 as n approaches infinity.
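
(For instance, the exact value for n = 2 works out to 2*ln(2) - 1,
about 0.386 -- the 0.386294 in Ray Vickson's table -- versus
1 - ln(2), about 0.307, for n = 1.)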

>If you beat me up a little I will no doubt come to remember what I
>once knew about random variables, and mend the error of my ways.

If there was no bias in the stopping rule, the expectation of g/(g+b)
would be 1/2, independent of n, but in this problem, there is a bias.

quasi

Phil H

Jan 2, 2011, 12:52:12 PM

f = 0 is not in the range of expected values.
f/(m+f)....possible outcomes - M,FM,FFM,FFFM,FFFFM......etc
The expected value of males and females for a single family.....
For males this will be 1/2+1/4+1/8+1/16.....etc. Sum of infinite series
n/2^n = <1. My reasoning here is the possibility of infinite females
(no males).
For females, expected value will be 1/4+2/8+3/16+4/32+5/64...etc. Sum
of infinite series n/2^(n+1)= <1. Of course the mathematical behavior
will never match reality where a woman having an infinite number of
female children is not possible.
f/(m+f) = <1/(<1+<1)= 1/2.....there may be some distinction between <1
for males versus females but this is not clear to me.
Phil H

quasi

Jan 2, 2011, 1:24:06 PM
Phil H <phol...@gmail.com> wrote:

>
> quasi wrote:
>>
>>The problem asks for the expected value of m/(m+f),
>>not the expected value of m/f.
>>
>>This is a good thing, because m/f has no expected value.  
>>For any number of families there is always the possibility
>>of f=0, for which the ratio is undefined.
>>
>>f = 0 is not in the range of expected values.

For a given n, sure it is.

>f/(m+f)....possible outcomes - M,FM,FFM,FFFM,FFFFM......etc

The case M gives f = 0, m = 1.

>The expected value of males and females for a single family.....
>
>For males this will be 1/2+1/4+1/8+1/16.....etc
>Sum of infinite series n/2^n = <1

Did you mean 1/2^n? But if so, the sum would be equal to 1, not less
than 1. However for males, there's no need for a sum, since for each
family, the rules force m = 1.

>My reasoning here is the possibility of infinite females (no males),

which has probability 0, so can be ignored.

>For females, expected value will be 1/4+2/8+3/16+4/32+5/64...etc
>Sum of infinite series n/2^(n+1)= <1.

No, the above sum is equal to 1.

>Of course the mathematical behavior will never match reality
>where a woman having an infinite number of female children
>is not possible.

This is just a play problem, there's no need to try to force it to be
real-world. Just view it as flipping a coin, heads or tails, until the
first head.

>f/(m+f) = <1/(<1+<1)= 1/2.....

No, that's been disproved several times in this thread.

The expected value of f/(m+f) is _less_ than 1/2, regardless of the
number of families.

>there may be some distinction between <1 for males versus females

>but this is not clear to me.

If you write a simulation program (or else flip coins and keep track),
you'll quickly realize that E(f/(m+f)) < 1/2.

quasi

ChasBrown

Jan 2, 2011, 1:45:49 PM
On Jan 2, 10:24 am, quasi <qu...@null.set> wrote:

<snip>

> If you write a simulation program (or else flip coins and keep track),
> you'll quickly realize that E(f/(m+f)) < 1/2.
>

Whenever I encounter a probability conundrum like this, I think: how
can I turn this into a sucker bet?

The best I can get here is: you and I each drop $500 into a
hat. You flip a coin until heads come up; at that point you get $1000
* (tails/total flips) and I get the remainder.

Hmmm... a bit too much calculation for a bar bet...
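
(Worked out: the coin-flipper's expected share is $1000*(1 - ln(2)),
about $307, so the other side of the hat expects to net roughly $193.)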

Cheers - Chas

> quasi

quasi

Jan 2, 2011, 1:52:51 PM
On Sun, 02 Jan 2011 13:24:06 -0500, quasi <qu...@null.set> wrote:

>Phil H <phol...@gmail.com> wrote:
>>
>> quasi wrote:
>>>
>>>The problem asks for the expected value of m/(m+f),
>>>not the expected value of m/f.

Sorry -- I accidentally lost a reference ...

The remark below was from Tim Little, not quasi (I thought I had said
it).

***


>>>This is a good thing, because m/f has no expected value.  
>>>For any number of families there is always the possibility
>>>of f=0, for which the ratio is undefined.

***

>>
>>f = 0 is not in the range of expected values.
>
>For a given n, sure it is.
>
>>f/(m+f)....possible outcomes - M,FM,FFM,FFFM,FFFFM......etc
>
>The case M gives f = 0, m = 1.
>
>>The expected value of males and females for a single family.....
>>
>>For males this will be 1/2+1/4+1/8+1/16.....etc
>>Sum of infinite series n/2^n = <1
>
>Did you mean 1/2^n? But if so, the sum would be equal to 1, not
>less than 1. However for males, there's no need for a sum, since

>for each family, the rules force m = 1.


>
>>My reasoning here is the possibility of infinite females
>>(no males),
>
>which has probability 0, so can be ignored.
>
>>For females, expected value will be
>>1/4+2/8+3/16+4/32+5/64...etc
>>Sum of infinite series n/2^(n+1)= <1.
>
>No, the above sum is equal to 1.
>
>>Of course the mathematical behavior will never match reality
>>where a woman having an infinite number of female children
>>is not possible.
>
>This is just a play problem, there's no need to try to force
>it to be real-world. Just view it as flipping a coin, heads
>or tails, until the first head.

Also, the line quoted below has a serious error.

The expected value

E(f/(m+f))

is _not_ the same as the ratio

E(f) / E(m+f)

>>f/(m+f) = <1/(<1+<1)= 1/2.....
>

>No, that's been disproved several times in this thread.

Edward Green

Jan 2, 2011, 4:20:26 PM
On Jan 2, 2:39 am, quasi <qu...@null.set> wrote:
> On Fri, 31 Dec 2010 21:56:47 -0800 (PST), Edward Green
>
>
>
> <spamspamsp...@netzero.com> wrote:
> >On Dec 28, 1:17 am, quasi <qu...@null.set> wrote:
> >> On 27 Dec 2010 22:28:00 -0500, landsb...@panix.com (Steven E.
>
> >> Landsburg) wrote:
> >> >There's an old brain teaser about a country where every couple
> >> >wants to have a boy.  Therefore each couple keeps having children
> >> >until one is a boy; then they stop.  Question:  What is the
> >> >expected ratio of female-births to all-births?

<...>

> >> Let n be the number of families.


>
> >> For n=1, the expected ratio is
>
> >>    (0/1)*(1/2^1) + (1/2)*(1/2^2) + (2/3)*(1/2^3) + ...
>
> >>    = sum ((k-1)/k)*(1/2^k)), for k = 1 to oo
>
> >>    = 1 - ln(2).
>
> >How did you calculate that sum?
>
> Change (1/2) to x to get a power series. Then differentiate and sum
> the resulting series. Then integrate back, choosing the appropriate
> constant to match the value of the original power series at x=0.
> Finally, substitute x=1/2.

Damn, you're good.

> >>    which is approximately 0.3
>
> >> The very fact that for n=1 the answer is less than 1/2 makes
> >> it clear that the answer is not "1/2 for all n".
>
> >I am confused right off by your answer. Why should "n" enter
> >into it? The expectation of a random variable is a property of
> >a population of i.i.d. random variables as much as it is a
> >property of one exemplar of the random variable. At least it
> >is for an intensive random variable, like a ratio. "n" has
> >nothing to do with it. It's like asking what the expected
> >fraction of heads is for tossing a fair coin: it doesn't make
> >any difference if you toss one coin or toss a thousand identical
> >coins, the answer is still 1/2.
>
> For a population of n families, the expected value of the ratio
> g/(g+b) is not the sum of the expected ratio of g/(g+b) per family
> divided by n, but rather the weighted average of the possible values
> of g/(g+b) for the entire population, weighted by the probabilities of
> those ratios occurring. The fact that the expected ratio varies as a
> function of n is due to the bias introduced by the stopping rule.
>
> If you're not convinced try a simulation.

Hmm... You are probably right. I have to think about this some more.

> For n families, the expected value of g/b is 1, independent of n,
> however the ratio g/(g+b) is always less than 1/2 and definitely
> _does_ depend on n, increasing towards 1/2 as n approaches infinity.
>
> >If you beat me up a little I will no doubt come to remember what I
> >once knew about random variables, and mend the error of my ways.
>
> If there was no bias in the stopping rule, the expectation of g/(g+b)
> would be 1/2, independent of n, but in this problem, there is a bias.

Do you have an intuitive feel for why the expected ratio grows towards
the asymptote 1/2 with population size?

Phil H

Jan 2, 2011, 6:40:49 PM
On Jan 2, 11:24 am, quasi <qu...@null.set> wrote:

> Phil H <pholma...@gmail.com> wrote:
>
> > quasi wrote:
>
> >>The problem asks for the expected value of m/(m+f),
> >>not the expected value of m/f.
>
> >>This is a good thing, because m/f has no expected value.  
> >>For any number of families there is always the possibility
> >>of f=0, for which the ratio is undefined.
>
> >>f = 0 is not in the range of expected values.
>
> For a given n, sure it is.

Maybe our definitions of expected value are different? For a single
coin toss, the expected number of heads is .5 and the expected number
of tails is .5. Is this incorrect?

>
> >f/(m+f)....possible outcomes - M,FM,FFM,FFFM,FFFFM......etc
>
> The case M gives f = 0, m = 1.
>
> >The expected value of males and females for a single family.....
>
> >For males this will be 1/2+1/4+1/8+1/16.....etc

> >Sum of infinite series 1/2^n = <1


>
> Did you mean 1/2^n? But if so, the sum would be equal to 1, not less
> than 1. However for males, there's no need for a sum, since for each
> family, the rules force m = 1.
>
> >My reasoning here is the possibility of infinite females (no males),
>
> which has probability 0, so can be ignored.

This is worth discussing. For a function with a vertical asymptote for
a given x value, say c, the limit of the function as x-->c is infinity
but the function value never reaches infinity because it's undefined.
We then limit the function domain to something like [0,c)U(c,infin).
Doesn't this analogy apply to our series? Because infinity is
undefined, the sum of expected values will never reach 1.

>
> >For females, expected value will be 1/4+2/8+3/16+4/32+5/64...etc
> >Sum of infinite series n/2^(n+1)= <1.
>
> No, the above sum is equal to 1.

Repeat of previous comment.

>
> >Of course the mathematical behavior will never match reality
> >where a woman having an infinite number of female children
> >is not possible.
>
> This is just a play problem, there's no need to try to force it to be
> real-world. Just view it as flipping a coin, heads or tails, until the
> first head.
>
> >f/(m+f) = <1/(<1+<1)= 1/2.....
>
> No, that's been disproved several times in this thread.
>
> The expected value of f/(m+f) is _less_ than 1/2, regardless of the
> number of families.
>
> >there may be some distinction between <1 for males versus females
> >but this is not clear to me.
>
> If you write a simulation program (or else flip coins and keep track),
> you'll quickly realize that E(f/(m+f)) < 1/2.

But you said previously f = 1 and m = 1 so how can 1/(1+1) = <1/2?
>
> quasi
I'll assume that I'm missing something ......
Phil H

k...@kymhorsell.com

Jan 2, 2011, 6:44:34 PM
Edward Green <spamsp...@netzero.com> wrote:
...

> Do you have an intuitive feel for why the expected ratio grows towards
> the asymptote 1/2 with population size?

The number of breeding females falls over time, so the "boy bias"
reduces over time.

Provided the population doesn't "get stuck" because the number of
breeding females becomes zero (i.e. they all have a boy and switch off)
the population will move toward the "birth ratio". If 49% of births
are male, the population -> 51% females. Etc.

Robert Israel

Jan 2, 2011, 9:01:03 PM

quasi <qu...@null.set> writes:

I don't think so. In the case of "gb", there's an easy symmetry: given that
there are a total of n >= 2 children and the first occurrence of gb is at the
end, the first n-2 must be j b's followed by n-2-j g's, 0 <= j <= n-2,
which I'll denote by b^j g^(n-2-j).
The symmetry b^j g^(n-2-j) gb <-> b^(n-2-j) g^j gb
makes it obvious that E[g/(g+b) | g+b = n] = 1/2, and therefore
E[g/(g+b)] = 1/2.
Now let's try the stopping rule "bggb". You could have 7 children
in 7 ways, where the first 3 could be anything except bgg. So this
breaks the symmetry, and E[b/(g+b) | g+b = 7] = 25/49 > 1/2. I suppose it
could still happen that E[g/(g+b)] = 1/2, but I'd guess that's unlikely.
Calculating it exactly looks complicated, though.

Tim Little

Jan 2, 2011, 9:06:47 PM

E(f) = 1 and E(m) = 1, and even E(m+f) = E(f) + E(m) = 2.
However, E(f/(m+f)) is not equal to E(f) / E(m+f).


- Tim

Tim Little

Jan 2, 2011, 9:32:12 PM
On 2011-01-02, ChasBrown <cbr...@cbrownsystems.com> wrote:
> The best I can get here is: you and I each drop $500 dollars into a
> hat. You flip a coin until heads come up; at that point you get $1000
> * (tails/total flips) and I get the remainder.
>
> Hmmm... a bit too much calculation for a bar bet...

You could illustrate the general principle of E(X/Y) =/= E(X) / E(Y)
more simply, e.g. by betting $10 on a roll of a die (with expected
value 3.5), and you get $30 divided by the result.

Someone who believes that E(X/Y) = E(X) / E(Y) should accept that bet,
because E(X)/E(Y) = 30/3.5 ~= $8.57. However, E(X/Y) = $12.25.
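
(Spelled out: E(X/Y) = (30/6)*(1 + 1/2 + 1/3 + 1/4 + 1/5 + 1/6)
= 5*(49/20) = $12.25.)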


--
Tim

David Bernier

Jan 3, 2011, 3:27:08 AM

There's the concept of random walks on graphs. Here, a node
or state would correspond to the last four births, with
zeros for < 4 births:

0000, 000b, 000g, 00bb, 00bg, 00gb, 00gg, 0bbb, 0bbg, 0bgb, 0bgg,
0gbb, 0gbg, 0ggb, 0ggg, and the admissible sequences of
four of 'b' and 'g' : <= 31 states, I believe.

For the state gggg, there are the possible previous states
bggg and gggg. The two possible following states would be:
gggg and gggb. The stopping state, "bggb" is a sink, with
no following states. The "0000" state is a source, with nothing
before. The expected number of visits in the state "0000" is 1,
and also E[N("bggb")] = 1, with N("bggb") denoting the r.v. of
number of passages through the "bggb" state. For any state other
than sink and source, there's an equation from transitions to
and transitions from the state:

E[N("bggg")]/2 + E[N("gggg")]/2 = E[N("gggg")] and
E[N("gggg")] = E[N("gggb")]/2 + E[N("gggg")]/2 .

This gives several equations, which may be solvable for the up to
31 expectations if things work out as in Doyle and Snell's
"Random-Walks and Electric-Networks", Mathematical Association of America(1984).

Except for the sink "bggb", any other admissible state would be
followed 50% of the time by the birth of a boy, and
50% of the time by the birth of a girl.

Now assuming we had the expected number of passages though each
of the (up to) 31 states, does that help in computing
E[g/(g+b) ] ? Or at least, could it help in finding
the distribution of families of four, five, etc. ?
I'm not sure...

David Bernier
--
AVEN:
http://www.asexuality.org/home/

David Bernier

Jan 3, 2011, 6:21:53 AM

I think I intuitively understand your symmetry argument for
the "gb" stopping rule. I'm working under the assumption of
just one family, which is quasi's condition (2) above.

For the "gb" stopping rule, I wrote a program that uses one
of George Marsaglia's recommended pseudo-random number
generators. I decided ahead of time to stop after
5 billion trials. E[g/(g+b)] was approximated as
the average g/(g+b) over all trials up to the current iteration,
and the average of the g/(g+b) ratios was updated and printed every
10 million trials. The simulation agrees well with E[g/(g+b)] = 1/2 :

5000000000 trials, average g/(g+b) ratio = 0.499998295.

The program can be modified for simulating the stopping rule "bggb".

I'm posting the source code in case anyone is interested in
experimenting.

David Bernier

-------------- simbg07a.c source code --------------------------------

#include <stdio.h>
#include <math.h>
#define THESEED 0xf9d7a60e

/* Marsaglia's KISS generator: a carry-based lagged generator MWC(),
   a linear congruential step CNG and an xorshift XS, combined by
   addition in the KISS macro.  (As posted, Q[] is never seeded, so
   MWC() returns 0 and KISS effectively reduces to CNG+XS.) */
static unsigned int xs=521288629,xcng=362436069,Q[4691];
unsigned int MWC(void)
{unsigned int t,x; static unsigned int c=0,j=4691;
j=(j<4690)? j+1:0;
x=Q[j]; t=(x<<13)+c;
if(t<c) {c=(x>>19)+1; t=t+x;}
else {t=t+x; c=(x>>19)+(t<x);}
return (Q[j]=t);
}
#define CNG ( xcng=69069*xcng+123 )
#define XS ( xs^=(xs<<13), xs^=(xs>>17), xs^=(xs<<5) )
#define KISS ( MWC()+CNG+XS )


int main(void)
{
unsigned int the_seed_off_this = THESEED;
unsigned int xenon;
long long k;
int not_done;
int child1, child2, nextchild;
long nboy, ngirl;
long boys_plus_girls;
double ratio;
long double cumulative_sum;
long double avg;

cumulative_sum = (long double) 0;

for(k=1; k<= 100000000000; k++)
{

child1 = 3;
child2 = 3;

nboy = (long) 0;
ngirl = (long) 0;

not_done = 1;

while(not_done == 1)
{
// new child ....

xenon=KISS^the_seed_off_this;

nextchild = xenon%2;

// 0 means girl
// 1 means boy

if(nextchild == 0)
{
ngirl++;
}

if(nextchild == 1)
{
nboy++;
}

// if(nextchild == 0)
// {
// printf("g ");
// fflush(stdout);
// }

// if(nextchild == 1)
// {
// printf("b ");
// fflush(stdout);
// }

// advance one step ....

child1 = child2;
child2 = nextchild;

if( (child1 == 0) && (child2 == 1) )
{
not_done = 0;
}
}

// printf(" boys: %ld ", nboy);

// printf(" girls: %ld ", ngirl);

boys_plus_girls = nboy + ngirl;

ratio = ((double) ngirl)/((double) boys_plus_girls);

// printf(" g/(g+b) = %lf ", ratio);

cumulative_sum = cumulative_sum + ((long double) ratio);

// printf("\n");
// fflush(stdout);

avg = cumulative_sum/((long double)k);


if( 0 == (k%10000000))
{
printf("%lld trials, average g/(g+b) ratio = %.9Lf\n", k, avg);
fflush(stdout);
}


}

printf("\n");

return 0;
}
--
AVEN:
http://www.asexuality.org/home/

David Bernier

Jan 3, 2011, 11:15:39 AM
David Bernier wrote:
[...]

> For the "gb" stopping rule, I wrote a program that uses one
> of George Marsaglia's recommended pseudo-random number
> generators. I decided ahead of time to stop after
> 5 billion trials. E[g/(g+b)] was approximated as
> the average g/(g+b) over all trials up to the current iteration,
> and the average of the g/(g+b) ratios was updated and printed every
> 10 million trials. The simulation agrees well with E[g/(g+b)] = 1/2 :
>
> 5000000000 trials, average g/(g+b) ratio = 0.499998295.
>
> The program can be modified for simulating the stopping rule "bggb".

I'm currently running a simulation for the stopping rule "bggb".
Sometimes, the family is large (60 or more children).

22570000000 trials, average g/(g+b) ratio = 0.4979994779
for the stopping rule "bggb".

The program using Marsaglia's generator is written for 64-bit computing,
by which I mean the x86_64 architecture, such as Intel Core 2 (duo/quad):

< http://en.wikipedia.org/wiki/X86-64#AMD64_implementations > .

David Bernier

Robert Israel

Jan 4, 2011, 2:43:34 AM
David Bernier <davi...@videotron.ca> writes:

> David Bernier wrote:
> [...]
> > For the "gb" stopping rule, I wrote a program that uses one
> > of George Marsaglia's recommended pseudo-random number
> > generators. I decided ahead of time to stop after
> > 5 billion trials. E[g/(g+b)] was approximated as
> > the average g/(g+b) over all trials up to the current iteration,
> > and the average of the g/(g+b) ratios was updated and printed every
> > 10 million trials. The simulation agrees well with E[g/(g+b)] = 1/2 :
> >
> > 5000000000 trials, average g/(g+b) ratio = 0.499998295.
> >
> > The program can be modified for simulating the stopping rule "bggb".
>
> I'm currently running a simulation for the stopping rule "bggb".
> Sometimes, the family is large (60 or more children).
>
> 22570000000 trials, average g/(g+b) ratio = 0.4979994779
> for the stopping rule "bggb".

Here's my analysis for the "bggb" rule. Actually I calculated the ratio
b/(g+b) = 1 - g/(g+b). Consider the "states" 0, b, bg, bgg. For state
s let u_s(n) be the number of ordered n-tuples x of b's and g's such that
the concatenation sx end with bggb and has no bggb before the end, and
let v_s(n) be the total number of b's in the n-tuples counted by u_s(n).
Thus u_bgg(1) = v_bgg(1) = 1, u_s(1) = v_s(1) = 0 otherwise. These satisfy
the system of recurrences (for n >= 2)

u_0(n) = u_0(n-1) + u_b(n-1)
u_b(n) = u_b(n-1) + u_bg(n-1)
u_bg(n) = u_b(n-1) + u_bgg(n-1)
u_bgg(n) = u_0(n-1)
v_0(n) = u_b(n-1) + v_0(n-1) + v_b(n-1)
v_b(n) = u_b(n-1) + v_b(n-1) + v_bg(n-1)
v_bg(n) = u_b(n-1) + v_b(n-1) + v_bgg(n-1)
v_bgg(n) = v_0(n-1)

The generating function for v_0(n) is, if I'm not mistaken,

g(t) = sum_{n=1}^infty v_0(n) t^n
= t^4 (2 - 3 t + t^3 - t^4)/(1 - 2 t + t^3 - t^4)^2

The expected ratio
E[b/(b+g)] = sum_{n=1}^infty v_0(n)/n 2^(-n)
= int_0^{1/2} g(t)/t dt

which Maple evaluates as

   34/55
     + 2/55 * Sum(_R*ln(14426825/1956444783*_R^3 + 26986905/652148261*_R^2
                        - 28247821370/21520892613*_R + 2967778123/652148261), _R = %2)
     + Sum(_R*ln(4455/71*_R^3 + 3795/71*_R^2 - 809/71*_R + 120/71), _R = %1)
     - 2/55 * Sum(_R*ln(6587704507/1304296522 + 14426825/1956444783*_R^3
                        + 26986905/652148261*_R^2 - 28247821370/21520892613*_R), _R = %2)
     - Sum(_R*ln(311/142 - 4455/71*_R^3 + 3795/71*_R^2 - 809/71*_R), _R = %1)

   %1 := RootOf(275*_Z^4 - 275*_Z^3 + 90*_Z^2 - 20*_Z + 1)

   %2 := RootOf(275*_Z^4 - 58580*_Z^2 + 457875*_Z - 873499)

with a numerical value of .5020006457616504598
(and thus E[g/(g+b)] = .4979993542383495402, agreeing quite well with
David's simulation).
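
A small added C sketch, for anyone who wants to reproduce this without
Maple: it iterates the recurrences above in the scaled form
w_s(n) = u_s(n)/2^n, x_s(n) = v_s(n)/2^n (to avoid overflow) and
accumulates sum_n v_0(n)/n * 2^(-n); it should converge to the same
~0.5020006:

#include <stdio.h>

int main(void)
{
    /* states 0,1,2,3 correspond to the prefixes "", b, bg, bgg */
    double w[4] = { 0, 0, 0, 0.5 };   /* n = 1: u_bgg(1) = 1, scaled by 2^-1 */
    double x[4] = { 0, 0, 0, 0.5 };   /* n = 1: v_bgg(1) = 1, scaled by 2^-1 */
    double eb = 0.0;                  /* running sum for E[b/(b+g)] */

    for (int n = 2; n <= 2000; n++) {
        double wn[4], xn[4];
        wn[0] = 0.5 * (w[0] + w[1]);
        wn[1] = 0.5 * (w[1] + w[2]);
        wn[2] = 0.5 * (w[1] + w[3]);
        wn[3] = 0.5 *  w[0];                    /* u_bgg(n) = u_0(n-1) */
        xn[0] = 0.5 * (w[1] + x[0] + x[1]);
        xn[1] = 0.5 * (w[1] + x[1] + x[2]);
        xn[2] = 0.5 * (w[1] + x[1] + x[3]);
        xn[3] = 0.5 *  x[0];                    /* v_bgg(n) = v_0(n-1) */
        for (int s = 0; s < 4; s++) { w[s] = wn[s]; x[s] = xn[s]; }
        eb += x[0] / n;
    }
    printf("E[b/(b+g)] ~ %.10f   E[g/(g+b)] ~ %.10f\n", eb, 1.0 - eb);
    return 0;
}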

David Bernier

Jan 4, 2011, 4:59:09 PM

That's quite astounding. Does sum_{ _R = %2 } [ expression in _R ]
signify to sum over all four roots of the second polynomial?
Or else an infinite sum starting at an integer value?

Perhaps this could be a good test for pseudo-random number generators.
My program writes its "g/(g+b)" average to a file every 10^7 trials.

From time to time, I ask for the last ten lines of this growing file.
The last time the average was below your 0.497999354238.... was
quite a while ago:

6020000000 trials, average g/(g+b) ratio = 0.4979994717
6030000000 trials, average g/(g+b) ratio = 0.4979994159
6040000000 trials, average g/(g+b) ratio = 0.4979994397
6050000000 trials, average g/(g+b) ratio = 0.4979995203
6060000000 trials, average g/(g+b) ratio = 0.4979995095
6070000000 trials, average g/(g+b) ratio = 0.4979994455
6080000000 trials, average g/(g+b) ratio = 0.4979994036
6090000000 trials, average g/(g+b) ratio = 0.4979993518 <====
6100000000 trials, average g/(g+b) ratio = 0.4979992776 <====
6110000000 trials, average g/(g+b) ratio = 0.4979992763 <====

The above was after ~= 6.1 billion trials.

The most recent end-file data is this:
183060000000 trials, average g/(g+b) ratio = 0.4979994210
183070000000 trials, average g/(g+b) ratio = 0.4979994182
183080000000 trials, average g/(g+b) ratio = 0.4979994153
183090000000 trials, average g/(g+b) ratio = 0.4979994169
183100000000 trials, average g/(g+b) ratio = 0.4979994207
183110000000 trials, average g/(g+b) ratio = 0.4979994185
183120000000 trials, average g/(g+b) ratio = 0.4979994176
183130000000 trials, average g/(g+b) ratio = 0.4979994153
183140000000 trials, average g/(g+b) ratio = 0.4979994123
183150000000 trials, average g/(g+b) ratio = 0.4979994134

Now it's at 183 billion trials, I think.

If we let Y = the r.v. g/(g+b) , then E[Y] is about 0.498.

I have no idea what the variance of Y is. But I think I should
get a good enough estimate by simulating for
E[Y^2], and then subtracting (E[Y])^2 .

Using the variance from simulation and the central limit
theorem, I believe it's standard procedure to construct
a [so-called] 95% confidence interval for E[Y].

Luckily, I have other PRNG generators that I could use
as a comparison method for the results with the current PRNG.

David Bernier

Robert Israel

Jan 4, 2011, 7:58:33 PM
David Bernier <davi...@videotron.ca> writes:

> Robert Israel wrote:

Yes, it's the sum of the expression over all roots of the polynomial.

David Bernier

Jan 5, 2011, 7:34:57 PM
David Bernier wrote:
[...]

> I'm currently running a simulation for the stopping rule "bggb".
> Sometimes, the family is large (60 or more children).
>
> 22570000000 trials, average g/(g+b) ratio = 0.4979994779
> for the stopping rule "bggb".
[...]

I posted the source code for the simulation with the stopping rule "gb".
The pseudo-random number generator is by Marsaglia, and named KISS
for "keep it simple stupid". That generator was a topic of discussion
in sci.math around 2009 or so. I have code with an earlier generator,
also called KISS.

There's a "long double" floating point variable cumulative_sum
in the program. It's the sum of the ratios for every trial.
After N trials, "cumulative_sum" will be roughly N x 0.497999 ,
say N/2 . With N at 10^10 or larger, adding a new sample
ratio "r" could cause a loss of precision in decimal digit #7 or
#8 (for example) if the mantissa (fractional part) of
a floating point of the type like "cumulative_sum" hasn't
enough bits. The type is called a C-language "long double",
and I'm not sure that the number of bits in the fractional
part has been standardized across compilers/processors;
I imagine anywhere from 53 to 80 bits.

I started anew with KISS, but this time using the C MPFR library
with 120-bit multi-precision numbers, when desirable, in the
source code.

This is the latest estimate (with MPFR) for the "bggb" stopping rule, same
KISS generator, same boy/girl outcomes in all trials,
when comparing to the un-posted "bggb" simulation program
without MPFR:


$ tail my_mp_data
46030000000 trials, average g/(g+b) ratio = 4.979994402331924755542671841e-1
46040000000 trials, average g/(g+b) ratio = 4.979994422824728425257535428e-1
46050000000 trials, average g/(g+b) ratio = 4.979994331001575695165645802e-1
46060000000 trials, average g/(g+b) ratio = 4.979994357806791562958800767e-1
46070000000 trials, average g/(g+b) ratio = 4.979994305195598008128094211e-1
46080000000 trials, average g/(g+b) ratio = 4.979994264791759091749787758e-1
46090000000 trials, average g/(g+b) ratio = 4.979994371369548910770820735e-1
46100000000 trials, average g/(g+b) ratio = 4.979994353386294333441470424e-1
46110000000 trials, average g/(g+b) ratio = 4.979994256329465903975976811e-1
46120000000 trials, average g/(g+b) ratio = 4.979994144033058317761845721e-1
[now at 46,120,000,000 trials in multi-precision ... ]


Of course, the last line amounts to a ratio of 0.4979994144; if we knew
the variance of the random variable g/(g+b), then we could
construct a PRNG-generated confidence interval and see whether or
not it contains Robert Israel's analytical/Maple value of
E[g/(g+b)] ~= 0.4979993542383495402 ;
see for example < http://groups.google.com/group/sci.math/msg/5cbfd3e60613aba8 > .

David Bernier

Robert Israel

unread,
Jan 5, 2011, 7:49:00 PM1/5/11
to
David Bernier <davi...@videotron.ca> writes:

Why don't you estimate the variance at the same time as you are estimating the
expected value?

> David Bernier

LudovicoVan

unread,
Jan 5, 2011, 8:23:38 PM1/5/11
to
On Jan 6, 12:34 am, David Bernier <david...@videotron.ca> wrote:

> There's a "long double" floating point variable cumulative_sum
> in the program.  It's the sum of the ratios for every trial.
> After N trials, "cumulative_sum" will be roughly  N x 0.497999 ,
> say N/2 .  With N at 10^10 or larger, adding a new sample
> ratio "r" could cause a loss of precision in decimal  digit #7 or
> #8 (for example) if the mantissa (fractional part) of
> a floating point of the type like "cumulative_sum" hasn't
> enough bits.  The type is called a C-language "long double",
> and I'm not sure that the number of bits in the fractional
> part has been standardized across compilers/processors;
> I imagine anywhere from 53 to 80 bits.

The relevant standard is the IEEE standard for floating-point: it
leaves some room for flexibility on the implementation side, and the
remaining details are implementation-specific and are supposed to be
spelled out in the compiler's documentation.
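
One quick way to see what a given compiler/CPU actually gives you is
to ask <float.h> (a trivial sketch):

#include <stdio.h>
#include <float.h>

int main(void)
{
    /* *_MANT_DIG counts base-FLT_RADIX digits in the significand,
       i.e. bits when FLT_RADIX is 2 (the usual case).              */
    printf("FLT_RADIX     = %d\n", FLT_RADIX);
    printf("DBL_MANT_DIG  = %d\n", DBL_MANT_DIG);
    printf("LDBL_MANT_DIG = %d\n", LDBL_MANT_DIG);
    return 0;
}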

As for the loss of precision (e.g. in your code when you take the
ratio and when you accumulate it into the running sum): that is a
very well known problem in the literature on floating-point. A couple
of (obvious) starting links:

http://en.wikipedia.org/wiki/Floating_point#Accuracy_problems
http://en.wikipedia.org/wiki/Kahan_summation_algorithm

-LV

David Bernier

unread,
Jan 5, 2011, 9:16:22 PM1/5/11
to
[...]

Yes, thanks for the suggestion; it's mostly a lack of time. Also, my
programming methodology is to first reproduce earlier computations
or approximations, and only then add new features, such as estimating
the second moment of g/(g+b) .

I've got the first "bggb" program continuing without MPFR help:
337570000000 trials, average g/(g+b) ratio = 0.4979994203
337580000000 trials, average g/(g+b) ratio = 0.4979994199
337590000000 trials, average g/(g+b) ratio = 0.4979994191
337600000000 trials, average g/(g+b) ratio = 0.4979994204
337610000000 trials, average g/(g+b) ratio = 0.4979994202
[ 337 billion trials ]

I don't know what others think, but the almost-constant '41' or '42'
after .497999 looks suspicious, assuming your Maple value is
correct to 12+ decimal digits. At first I figured KISS, version II
might be failing; for one thing, I saw sample families with 63
children in testing. Obviously, the first time "bggb" appears
is also the last.

Then, in the past day I began to suspect that accuracy in estimating
the average g/(g+b) ratio might be getting lost because the floating-point
significands have too few bits.

The MPFR version using the "KISS, version II" shouldn't suffer from
lack of floating-point precision, with 120 bits for g/(g+b) fractions,
the cumulative sum of g/(g+b) fractions and g/(g+b) averages.
The MPFR version seems slower.

The old KISS by Marsaglia, which I call "KISS, version I", was in C code
I wrote back when everything was compiled in 32-bit mode. Now 64-bit mode
is often the default, so it was some trouble getting the needed
32-bit C libraries and finding the right compiler option.

In that old combination PRNG, I combine, by exclusive-or,
"KISS, version I", bits from the Mersenne Twister, and bits
from "alleged RC4", which allegedly duplicates the crypto
RC4 algorithm of R. Rivest. I switched from "KISS, version II"
to ("KISS, version I" .xor. MTwister .xor. "alleged RC4").

David Bernier

David Bernier

unread,
Jan 5, 2011, 9:53:43 PM1/5/11
to

I saw that the GNU gcc C compiler supports 80-bit floats and 128-bit floats:
< http://gcc.gnu.org/onlinedocs/gcc/Floating-Types.html >

I think the types are: __float80 and __float128 (it looks like two underscores).

David

David Bernier

unread,
Jan 6, 2011, 1:02:54 AM1/6/11
to
[...]

This is what I get [ meaning: all Ok ]:

My confidence interval calculation from
simulation results using the PRNG "KISS, version II":

? N = 359980000000
%1 = 359980000000
? moment2 = 0.2605943082
%2 = 0.26059430820000000000000000000000000000
? moment1 = 0.4979994221
%3 = 0.49799942210000000000000000000000000000
? vari = moment2 - moment1*moment1
%4 = 0.012590883788066031590000000000000000000
? sigm = sqrt(vari)
%5 = 0.11220910742032498491110399381166182852
? sigtitwo = 2*sigm
%6 = 0.22441821484064996982220798762332365703
? lowbound = moment1 - sigtitwo/sqrt(N)
%7 = 0.49799904805925176715595176781706848474
? highbound = moment1 + sigtitwo/sqrt(N)
%8 = 0.49799979614074823284404823218293151526


The 95% two-sided prng confidence interval for E[g/(g+b)] is:
[0.497999048 , 0.497999796].

Robert Israel's analytical/Maple E[g/(g+b)] falls comfortably within
the prng confidence interval.

David Bernier

quasi

unread,
Jan 6, 2011, 2:26:18 AM1/6/11
to

Nice.

I'll guess then that the only stopping strings for which

E[g/(g+b)] = 1/2

are the strings s which are one of

"gb"
"gbgb"
"gbgbgb"

etc.

and their complements (with g's and b's switched).

For stopping strings s with an equal number of g's and b's, I wonder
which ones have

E[g/(g+b)] < 1/2

??

There should be some pattern but at this point it isn't clear.

quasi

David Bernier

unread,
Jan 6, 2011, 2:44:20 AM1/6/11
to

If I may say so, I think the guess above is quite interesting.
It challenges the would-be counterexample discoverer
to come up with a stopping rule with potentially
delicate combinatorics that will still yield:

E[g/(g+b)] = 1/2 exactly.

best wishes,

David Bernier

quasi

unread,
Jan 6, 2011, 5:30:18 AM1/6/11
to
On Thu, 06 Jan 2011 02:44:20 -0500, David Bernier
<davi...@videotron.ca> wrote:

>> I'll guess then that the only stopping strings for which
>>
>> E[g/(g+b)] = 1/2
>>
>> are the strings s which are one of
>>
>> "gb"
>> "gbgb"
>> "gbgbgb"
>>
>> etc.
>>
>> and their complements (with g's and b's switched).
>
>If I may say so, I think the guess above is quite interesting.
>It challenges the would-be counterexample discoverer
>to come up with a stopping rule with potentially
>delicate combinatorics that will still yield:
>
> E[g/(g+b)] = 1/2 exactly.

But I guessed too quickly.

The stopping rules

"ggbb" and "bbgg"

also have E[g/(g+b)] = 1/2.

My latest guess is this ...

E[g/(g+b)] = 1/2 iff the stopping rule has one of the forms

(g^j b^j)^k

(b^j g^j)^k

where j,k are positive integers.

>> For stopping strings s with an equal number of g's and b's, I wonder
>> which ones have
>>
>> E[g/(g+b)] < 1/2
>>
>> ??
>>
>> There should be some pattern but at this point it isn't clear.

And the above pattern is still a mystery.

quasi

quasi

unread,
Jan 6, 2011, 5:55:41 AM1/6/11
to
On Thu, 06 Jan 2011 05:30:18 -0500, quasi <qu...@null.set> wrote:

>On Thu, 06 Jan 2011 02:44:20 -0500, David Bernier
><davi...@videotron.ca> wrote:
>
>>> I'll guess then that the only stopping strings for which
>>>
>>> E[g/(g+b)] = 1/2
>>>
>>> are the strings s which are one of
>>>
>>> "gb"
>>> "gbgb"
>>> "gbgbgb"
>>>
>>> etc.
>>>
>>> and their complements (with g's and b's switched).
>>
>>If I may say so, I think the guess above is quite interesting.
>>It challenges the would-be counterexample discoverer
>>to come up with a stopping rule with potentially
>>delicate combinatorics that will still yield:
>>
>> E[g/(g+b)] = 1/2 exactly.
>
>But I guessed too quickly.
>
>The stopping rules
>
> "ggbb" and "bbgg"
>
>also have E[g/(g+b)] = 1/2.

Cancel that -- I think the above may be wrong.

I have two programs -- one is a simulation, the other produces exact
rational lower and upper bounds. For the stopping rule "ggbb", they
disagree.

The simulation suggests E[g/(g+b)] is approximately 0.492.

The bounding program suggests E[g/(g+b)] is exactly 1/2.

Obviously, at least one of the programs has a bug, but I'm too tired
right now to do any debugging.

quasi

David Bernier

unread,
Jan 6, 2011, 5:14:39 PM1/6/11
to
David Bernier wrote:
> David Bernier wrote:
>> Robert Israel wrote:
>>> David Bernier<davi...@videotron.ca> writes:
[...]

>>>> I started anew with KISS, but this time using the C MPFR library
>>>> with 120-bit multi-precision numbers, when desirable, in the
>>>> source code.
>>>>
>>>> This is the latest estimate (with MPFR) for the "bggb" stopping rule,
>>>> same
>>>> KISS generator, same boy/girl outcomes in all trials,
>>>> when comparing to the un-posted "bggb" simulation program
>>>> without MPFR:
>>>>
>>>>
[...]

[ MPFR simulation numbers with Marsaglia's "KISS, version II"
ca. 2009]

[...]

With N equal to about 300 billion, the standard deviation
of the average of N trials is about 2E-7, or
0.0000002 . None of the PRNG-based simulation estimates
for the "bggb" stopping rule looks suspect; they behave just
as one would expect with true random numbers.

So I'll be ending the "bggb" computations soon.

David Bernier
--
120039822036992245303534619191166796374

David Bernier

unread,
Jan 7, 2011, 8:33:11 AM1/7/11
to
[...]

I had a look at the Kahan summation algorithm at Wikipedia.
I didn't check or take the time to really understand
how the "compensation term(s)" work, but I take it on trust.

I wrote my first C program using the type: __float128 .
I'm summing the harmonic series, as before. It's
not much faster than MPFR, I think, but it's
much easier to write. For printing,
I cast the __float128-type quad-precision numbers
to "long double", because "%Lf" is the printf()
format for "long doubles"; but I couldn't find a
reliable way to print __float128-type numbers to
their full 33-digit precision.
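
One thing I have not tried, and which may need a newer gcc than the
one here: the libquadmath that ships with more recent gcc releases has
a quadmath_snprintf() that understands a "Q" length modifier, so the
cast to "long double" could be avoided entirely. A sketch, assuming
that library is available (link with -lquadmath):

#include <stdio.h>
#include <quadmath.h>

int main(void)
{
    __float128 x = 1.0Q / 3.0Q;
    char buf[64];

    /* format a __float128 to (about) its full 33 significant digits */
    quadmath_snprintf(buf, sizeof buf, "%.33Qg", x);
    printf("1/3 as __float128: %s\n", buf);
    return 0;
}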

The source code appears below.

The last 5 lines of output:

$ tail -5 my_harmonic_output.txt
Sum to 71600000000 is: 25.57157657582152718821
Sum to 71700000000 is: 25.57297224946049219244
Sum to 71800000000 is: 25.57436597790908630427
Sum to 71900000000 is: 25.57575776658189912506
Sum to 72000000000 is: 25.57714762087094377205

David Bernier
--------------- C source code ---------------
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <endian.h>

int main (void)
{
    unsigned long long i, n;
    __float128 s, t, u;            /* s = running sum, t = 1; u is unused */

    n = 100000000000000;
    printf("n = %llu\n", n);
    fflush(stdout);

    t = (__float128) 1;
    s = (__float128) 0;

    for (i = 1; i <= n; i++)
    {
        s = s + t/((__float128) i);        /* add the next harmonic term 1/i */
        if (0 == (i % 100000000))          /* report every 10^8 terms */
        {
            printf("Sum to %llu is: ", i);
            printf("%.20Lf", (long double) s);   /* cast down for printing */
            printf("\n");
            fflush(stdout);
        }
    }

    return 0;
}
--
120039822036992245303534619191166796374

LudovicoVan

unread,
Jan 7, 2011, 9:47:42 AM1/7/11
to
> > I saw that the GNU gcc C compiler supports 80-bit floats and 128-bit
> > floats:
> > <http://gcc.gnu.org/onlinedocs/gcc/Floating-Types.html>
>
> > I think the types are: __float80 and __float128 (it looks like two
> > underscores).
>
> [...]
>
> I had a look at the Kahan summation algorithm at Wikipedia.
> I didn't check or take the time to really understand
> how the "compensation term(s)" work, but I take it on trust.
>
> I wrote my first C program using the type: __float128 .
> I'm summing the harmonic series, as before.  It's
> not much faster than MPFR, I think, but it's
> much easier to write.

A few sparse thoughts:

Indeed, using the built-in types, i.e. the types that are
intrinsically supported by the hardware (and the compiler), you get
the best performance: but that's not really a fair comparison. As
soon as you start controlling/bounding errors (not an easy task at
all with floating-point!), you find yourself with performance that
is not that different from ad-hoc software-level solutions, where
the lack of intrinsic, hardware-level support is largely compensated
by the ability to control error bounds much more easily.

If I were you, I'd have used exact *rationals* all the way for this
task (GMP's mpq type -- MPFR itself is floating-point only): I haven't
checked, but I'd even guess you would get *exact* results for this
problem. Besides, even if that would be a bit slower under an
operation-by-operation comparison, OTOH it would generally give
faster convergence thanks to the better bounding and control of
errors.

-LV

cwldoc

unread,
Jan 7, 2011, 2:10:14 PM1/7/11
to
> On 27 Dec 2010 22:28:00 -0500, land...@panix.com (Steven E.
> Landsburg) wrote:
>
> [...]
>
> Definitely an intriguing problem -- thanks.
>
> It's clear to me that you're right (of course you
> knew that).
>
> Here's a sketch of my analysis.
>
> Let n be the number of families.
>
> For n=1, the expected ratio is
>
> (0/1)*(1/2^1) + (1/2)*(1/2^2) + (2/3)*(1/2^3) + ...
>
> = sum ((k-1)/k)*(1/2^k), for k = 1 to oo
>
> = 1 - ln(2).

How do you get that result? I know that
ln 2 = 1 - 1/2 + 1/3 - . . .

>
> which is approximately 0.3
>
> The very fact that for n=1 the answer is less than
> 1/2 makes it clear
> that the answer is not "1/2 for all n".
>

> Next, for arbitrary n, consider the distribution of
> the number of girls.
>
> The expected number of girls per family is
>
> 0*(1/2^1) + 1*(1/2^2) + 2*(1/2^3) + 3*(1/2^4) + ...
>
> = sum ((k-1)/2^k), for k = 1 to oo
>
> = 1.
>
> Now suppose n is large, but fixed. I'll sketch a
> heuristic argument.
>
> For large n, the number of girls will be
> approximately normally
> distributed with mean equal to
>
> n*(expected number of girls per family)
>
> = n
>
> and standard deviation on the order of sqrt(n) (I
> think).
>
> The number of boys will be exactly n
>
> Hence, for any open interval centered at 1/2, however
> small, the
> probability of the ratio (girls/(girls+boys)) falling
> in that interval
> can, for sufficiently large n, be forced arbitrarily
> close to 1. But
> all possible ratios are bounded below by 0 and above
> by 1, hence the
> expectation of the ratio can't be pulled away by
> outliers. Thus, as n
> approaches infinity, the expected ratio approaches
> 1/2.
>
> Since for n = 1, the expected ratio is 1 - ln(2), or approximately
> 0.3, this suggests that the expected ratio increases towards 1/2
> as n approaches infinity.
>
> To recap, for n families, the expected number of
> girls is n, and the
> number of boys is exactly n, and hence the difference
> (girls - boys)
> has expectation zero, but the expected ratio
> (girls/(girls+boys)) is
> not 1/2 -- it's less, but approaches 1/2 as n
> approaches infinity.
>
> quasi

The expected number of girls per family is
Eg = (0)(1/2) + (1)(1/4) + (2)(1/8) + (3)(1/16) + . . . = 1

The expected number of boys per family is
Eb = 1

The expected value of girls/(girls + boys) is
Er = (0/1)(1/2) + (1/2)(1/4) + (2/3)(1/8) + . . . = 1 - ln(2)

The original poster noted that, in this case,
Er is not equal to Eg/(Eb + Eg) = 1/2.

David Bernier

unread,
Jan 20, 2011, 11:46:18 AM1/20/11
to
[...]

After 13 days, the average summing rate is
2,600,000 terms added per second:
Sum to 2904900000000 is: 29.27463574713371125512

In other words,

sum_{k=1 ... 2904900000000} 1/k ~= 29.27463574713371125512

According to PARI-gp:
? psi( 2904900000001)-psi(1)
29.274635747133711255505803957873420364081564... // the answer

29.27463574713371125512 is Ok for 20-digit accuracy.
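
As a rough cross-check that is independent of PARI, the asymptotic
H_N ~= ln(N) + gamma gives ln(2904900000000) + 0.5772 ~= 28.6974 + 0.5772
= 29.2746, which agrees with the leading digits above.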

In the C code, the format string was/is: "%.20Lf" ....
I think it just means: format with 20 digits after the decimal point.

In the program, s is of type __float128 , a quad-float or 128 bits.
For printing, "s" was cast as a "long double".
Was there a loss of precision in the conversion?

How is one supposed to write the format string for
printing a type __float128 ? Example: for a "long double",
the formatting string is "%Lf" .

David Bernier

=== source code: ==================


#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <endian.h>

int main (void)
{
    unsigned long long i, n;
    __float128 s, t, u;            /* s = running sum, t = 1; u is unused */

    n = 100000000000000;
    printf("n = %llu\n", n);
    fflush(stdout);

    t = (__float128) 1;
    s = (__float128) 0;

    for (i = 1; i <= n; i++)
    {
        s = s + t/((__float128) i);        /* add the next harmonic term 1/i */
        if (0 == (i % 100000000))          /* report every 10^8 terms */
        {
            printf("Sum to %llu is: ", i);
            printf("%.20Lf", (long double) s);   /* cast down for printing */
            printf("\n");
            fflush(stdout);
        }
    }

    return 0;
}

James Waldby

unread,
Jan 20, 2011, 4:34:54 PM1/20/11
to
[possibly comp.programming might provide better info]

On Thu, 20 Jan 2011 11:46:18 -0500, David Bernier wrote:
...


> In the C code, the format string was/is: "%.20Lf" .... I think it just
> means: format with 20 digits after the decimal point.
>
> In the program, s is of type __float128 , a quad-float or 128 bits. For
> printing, "s" was cast as a "long double". Was there a loss of
> precision in the conversion?
>
> How is one supposed to write the format string for printing a type
> __float128 ? Example: for a "long double", the formatting string is
> "%Lf" .

...


> int main (void)
> {
> unsigned long long i, n;
> __float128 s, t, u;

...
> s = (__float128)0;
...
s= s + t/((__float128)i);
...


> printf("%.20Lf", (long double)s);

...

While gcc 4.3 supports 112-bit-mantissa arithmetic with items of
__float128 type, apparently (i.e., as I understand the web pages
about it) it does not at the moment have format specifiers for a
full-precision printout. The following program (between dashed
lines) illustrates one kludge for printing small positive values.
----------------------------------------------------------
#include <stdio.h>
#include <math.h>
void sho (__float128 a) {
long double c = floor ((long double)(1e10Q * a));
__float128 d = c, f = a - d/1e10Q, g=1e30Q * f;

printf("%0.10Lf%020.0Lf\n", c/1e10L, (long double)g);
}

int main (void) {
printf ("%0.30Lf\n", nextafterl(1.0Q, 2.0Q));
sho (nextafterl(1.0Q, 2.0Q)); // a long double value
sho (1 + 1e-18Q);
sho (1 + 1e-19Q);
sho (1 + 1e-20Q);
sho (1 + 1e-24Q);
sho (1 + 1e-28Q);
sho (+1234.56789012345678901234567890123456789Q);
sho (+12345.6789012345678901234567890123456789Q);
sho (+123456.789012345678901234567890123456789Q);
puts ("Following lines illustrate incorrect output");
sho (-1234.56789012345678901234567890123456789Q);
sho (+1234567890.12345678901234567890123456789Q);
sho (-1234567890.12345678901234567890123456789Q);
sho (1e13 + 1e-18Q);
return 0;
}
----------------------------------------------------------
Program output on one of my computers:
1.000000000000000000108420217249
1.000000000000000000108420217249
1.000000000000000001000000000000
1.000000000000000000100000000000
1.000000000000000000010000000000
1.000000000000000000000001000000
1.000000000000000000000000000100
1234.567890123456789012345678901236
12345.678901234567890123456789012348
123456.789012345678901234567890123456
Following lines illustrate incorrect output
-1234.567890123543210987654321098764
1234567890.123456716872212345678901234610176
-1234567890.1234567168-72212345678901234610176
9999999999999.9991607666838860800000000999586529280

--
jiw
