Large Population, small sample and some combinatorics

Silvan Urfer

Jan 20, 2002, 3:42:14 PM

Hello

This is probably a banal question for a full-time statistician, but I
need some advice on the validity of a method I intend to use.

Scenario: In population 1 of purebred dogs, the chance of reaching a
certain age is established fairly well (sample of roughly 400 dogs
during the last 20 years). We have the lifespans of 9 randomly chosen
dogs from an earlier population 2 (between 1900 and 1930). Obtaining
lifespan data from back then is extremely difficult; these nine dogs are
the result of scanning roughly 150 individuals from this timespan.

In population 1, the chance of reaching 10 years of age is p=0.17; the
chance of reaching 13.5 is q=0.01.

Now, the sample from population 2 contains both a dog that reached 10
and a dog that reached 13.5. It should be stressed that this is not the
result of taking several samples and choosing the most convenient one.

It is easily established that the chance of at least one 10-year-old in
a sample of nine would be 1-(1-0.17)^9 = 0.81
(in general notation, 1-(1-p)^n)

Likewise, the chance of at least one 13.5-year-old would be
1-(1-0.01)^9 = 0.086
(1-(1-q)^n)
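
The two figures above can be checked with a few lines of Python. This is only a sketch: the helper name chance_at_least_one is made up for illustration, and it assumes the nine dogs are independent draws with the population 1 probabilities.

```python
def chance_at_least_one(p, n):
    """Chance that at least one of n independent dogs reaches an age
    that a single dog reaches with probability p: 1 - (1 - p)^n."""
    return 1.0 - (1.0 - p) ** n

n = 9  # sample size from population 2

# p = 0.17: chance of reaching 10 years in population 1
print(round(chance_at_least_one(0.17, n), 3))  # about 0.813

# q = 0.01: chance of reaching 13.5 years in population 1
print(round(chance_at_least_one(0.01, n), 3))  # about 0.086
```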

No big deal.

Alone, neither of the two dogs gives a significant difference, so either
could well be coincidence. I strongly suspect that having both together
in the sample of nine would be significant. However, I have a problem
with calculating the exact chance.

Here is the problem:

For dog a, we have the chance of 1-(1-p)^n
For dog b, we have the chance of 1-(1-q)^(n-1)

But it could equally well be said that

For dog a, we have the chance of 1-(1-p)^(n-1)
For dog b, we have the chance of 1-(1-q)^n

Which gives us two different chances for each dog. Thus, order plays a
role. Now, my problem is: How to calculate the chance without order
having an influence? I hope someone will be able to explain.

Secondly, if the chance of this age distribution in a random sample of 9
is as small as I suspect it to be, would you consider this a valid proof
that population 2 had a higher chance of reaching the ages of 10 and
over than population 1 has?

Thanks for reading

Silvan

Neville X. Elliven

Jan 23, 2002, 12:24:38 AM

Silvan Urfer wrote:

>Scenario: In population 1 of purebred dogs, the chance of reaching a
>certain age is established fairly well (sample of roughly 400 dogs
>during the last 20 years). We have the lifespans of 9 randomly chosen
>dogs from an earlier population 2 (between 1900 and 1930). Obtaining
>lifespan data from back then is extremely difficult; these nine dogs are
>the result of scanning roughly 150 individuals from this timespan.
>
>In population 1, the chance of reaching 10 years of age is p=0.17; the
>chance of reaching 13.5 is q=0.01.

I take this to mean:
P{death before age 10} = 0.83
P{death before age 13.5} = 0.99

>Now, the sample from population 2 contains both a dog that reached 10
>and a dog that reached 13.5. It should be stressed that this is not the
>result of taking several samples and choosing the most convenient one.
>
>It is easily established that the chance of at least one 10-year-old in
>a sample of nine would be 1-(1-0.17)^9 = 0.81
>(in general notation, 1-(1-p)^n)
>
>Likewise, the chance of at least one 13.5-year-old would be
>1-(1-0.01)^9 = 0.086
>(1-(1-q)^n)

Correct.

>Alone, neither of the two dogs gives a significant difference, so either
>could well be coincidence. I strongly suspect that having both together
>in the sample of nine would be significant. However, I have a problem
>with calculating the exact chance.

It's a multinomial distribution, with the following probabilities:
P{death before age 10} = 0.83
P{death between ages 10 and 13.5} = 0.16
P{death at age 13.5 or after} = 0.01

The probability of a sample of size nine having two members in the first
category, and one member in each of the next two categories is:
9!/(7!*1!*1!)*(0.83)^7*(0.16)*(0.01) = 0.03, approximately.

This low probability could mean that the two populations have different
distributions of age, or it could mean that the method of choosing the
sample of nine was not random, or it could mean that you are observing a
relatively unusual random event.
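
As a rough numerical check of the multinomial figure, the arithmetic for seven deaths before age 10, one between 10 and 13.5, and one at 13.5 or later can be sketched in Python. The variable names are illustrative only, and the category probabilities are the population 1 values quoted above.

```python
from math import factorial

# Category probabilities taken from population 1
p_before_10 = 0.83       # death before age 10
p_10_to_13_5 = 0.16      # death between ages 10 and 13.5
p_13_5_or_later = 0.01   # death at age 13.5 or after

# Number of ways to arrange 7 + 1 + 1 dogs among 9: 9!/(7!*1!*1!) = 72
coefficient = factorial(9) // (factorial(7) * factorial(1) * factorial(1))

probability = (coefficient
               * p_before_10 ** 7
               * p_10_to_13_5 ** 1
               * p_13_5_or_later ** 1)

print(coefficient)            # 72
print(round(probability, 3))  # about 0.031
```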

Silvan Urfer

Jan 24, 2002, 6:49:38 PM

Neville X. Elliven wrote:

> >In population 1, the chance of reaching 10 years of age is p=0.17; the
> >chance of reaching 13.5 is q=0.01.
>
> I take this to mean:
> P{death before age 10} = 0.83
> P{death before age 13.5} = 0.99

Yup, that's what I meant

> It's a multinomial distribution, with the following probabilities:
> P{death before age 10} = 0.83
> P{death between ages 10 and 13.5} = 0.16
> P{death at age 13.5 or after} = 0.01
>
> The probability of a sample of size nine having two members in the first
> category, and one member in each of the next two categories is:
> 9!/(7!*1!*1!)*(0.83)^7*(0.16)*(0.01) = 0.03, approximately.

I suspect that should be seven instead of two... anyway, thanks a lot for
pointing out the solution. You helped me a great deal here.

Now, to use general notation, would that be:

n!/((n-2)!*1!*1!)*(1-p)^(n-2)*(q-p)^1*(1-q)^1

In case it is, I think I have got the message. Thanks again.

Silvan

Neville X. Elliven

Jan 26, 2002, 2:23:45 AM

Silvan Urfer wrote:

>> It's a multinomial distribution, with the following probabilities:
>> P{death before age 10} = 0.83
>> P{death between ages 10 and 13.5} = 0.16
>> P{death at age 13.5 or after} = 0.01
>>
>> The probability of a sample of size nine having two members in the
>> first category, and one member in each of the next two categories is:
>> 9!/(7!*1!*1!)*(0.83)^7*(0.16)*(0.01) = 0.03, approximately.
>
> I suspect that should be seven instead of two...

Yes, that is correct; what I *meant* to write was:

The probability of a sample of size nine having *seven* members in the
first category, and one member in each of the next two categories is:
9!/(7!*1!*1!)*(0.83)^7*(0.16)*(0.01) = 0.03, approximately.

>Now, to use general notation, would that be:
>
>n!/((n-2)!*1!*1!)*(1-p)^(n-2)*(q-p)^1*(1-q)^1

The general form is:
n!/[n(1)!*n(2)!*...*n(k)!]*p(1)^n(1)*p(2)^n(2)*...*p(k)^n(k)
where there are n members in k categories, with
n(i) = count of members in category i
p(i) = probability of membership in category i
n(1) + n(2) + ... + n(k) = n
p(1) + p(2) + ... + p(k) = 1
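
For the case in this thread, with p = 0.17 and q = 0.01 as defined in the first post, the three category probabilities are 1-p, p-q and q, and the counts are n-2, 1 and 1, so the general form reduces to
n!/((n-2)!*1!*1!)*(1-p)^(n-2)*(p-q)*q

The general form can also be sketched as a small Python function. The name multinomial_pmf is hypothetical and chosen only for this illustration; the input checks are an assumption about sensible usage, not part of the formula.

```python
from math import factorial, isclose

def multinomial_pmf(counts, probs):
    """Probability of seeing counts[i] members in category i,
    where probs[i] is the chance of membership in category i."""
    if len(counts) != len(probs):
        raise ValueError("counts and probs must have the same length")
    if not isclose(sum(probs), 1.0):
        raise ValueError("category probabilities must sum to 1")

    n = sum(counts)

    # n! / (n(1)! * n(2)! * ... * n(k)!), kept as an exact integer
    coefficient = factorial(n)
    for c in counts:
        coefficient //= factorial(c)

    # p(1)^n(1) * p(2)^n(2) * ... * p(k)^n(k)
    probability = float(coefficient)
    for c, p in zip(counts, probs):
        probability *= p ** c
    return probability

# The sample of nine dogs discussed in this thread
n, p, q = 9, 0.17, 0.01
print(round(multinomial_pmf([n - 2, 1, 1], [1 - p, p - q, q]), 3))  # about 0.031
```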