relatives' relative age

Mark Spahn

Oct 18, 2007, 6:22:20 PM

Imagine two siblings, Sib0 and Sib1, taken randomly from a given
population (e.g., everyone in the world who has ever lived)
and consider the probability that their age difference
x = age of Sib1 - age of Sib0
assumes a given value. What does this probability distribution
function look like? It must be symmetrical about 0, but its value at
0 must be less than its value at -1 or +1 years. Why?
Because few siblings are of the same age (only twins and
adopted brothers/sisters). The value of this probability
distribution function must be at its lowest between -9 months
and +9 months, and must be very low where x < -10 and x > 10 years.
My guess is that 90% of siblings are within, oh, 7 years in age;
i.e., P{-7 < x < 7} = .9 (imagine the area under the probability
distribution function's graph from x=-7 to x=7).

Now consider the probability distribution function of
x = age of parent of person P - age of person P,
where P is taken randomly from a given population.
The probability that P's parent is < 16 years older or
> 40 years older than person P must be low, but
I don't know the shape of the graph for 16 < x < 40.
(Come to think of it, the curve for
x = age of mother of person P - age of person P
is somewhat to the left of the curve for
x = age of father of person P - age of person P.)

But once we know these two probability distribution functions,
we can calculate the probability distribution function for any
pair of relatives. On average, how much older than a person
is his/her aunt or uncle? This is some combination of the
parent-child age difference and the sibling-sibling age difference.
This (person's parent's sibling's age - person's age) probability
distribution must be more spread-out than that for the parent-child
age difference.
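
As a rough illustration of how the two distributions combine: the aunt/uncle difference is the sum of a parent-child gap and a signed sibling gap, so if the two are treated as independent, its distribution is the convolution of theirs. The Python sketch below simulates that sum. Both input distributions (a normal parent-child gap with mean 28 and standard deviation 6 years, and a sibling gap of at least 0.75 years with an exponential tail) are made-up placeholders, not genealogical data, and the independence assumption is itself open to question.

```python
import random
import statistics

def parent_gap(rng):
    # ASSUMPTION: parent-child age gap ~ Normal(mean 28, sd 6) years. Placeholder, not real data.
    return rng.gauss(28.0, 6.0)

def sibling_gap(rng):
    # ASSUMPTION: siblings are at least 0.75 years apart, with an exponential
    # tail (mean 3 years) beyond that, and either one is equally likely to be older.
    magnitude = 0.75 + rng.expovariate(1 / 3.0)
    return magnitude if rng.random() < 0.5 else -magnitude

def simulate(n=100_000, seed=0):
    rng = random.Random(seed)
    parent = [parent_gap(rng) for _ in range(n)]
    # (aunt/uncle age - person's age) = (parent - person) + (parent's sibling - parent)
    aunt = [p + sibling_gap(rng) for p in parent]
    return parent, aunt

parent, aunt = simulate()
print("parent-child spread (sd, years):", round(statistics.pstdev(parent), 2))
print("aunt/uncle spread (sd, years):  ", round(statistics.pstdev(aunt), 2))
print("share of aunts/uncles younger than the person:",
      sum(a < 0 for a in aunt) / len(aunt))
```

Under independence the variances add, which is why the aunt/uncle curve comes out wider than the parent-child curve. Note that the last figure is for one random aunt or uncle, not the chance of having at least one younger aunt or uncle.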

Similarly, the age difference between cousins must be more
spread out than the age difference between siblings.
And we can expand this analysis to grandparent-grandchild
age differences, and age differences between 2nd, 3rd, and
18th cousins. What do these various age-difference probability
distribution functions look like? They should all be derivable
from the first two age-difference probability distribution functions.
Then we can ask and answer questions like:
What is the probability that you have an aunt or uncle who
is younger than you are?
What is the probability that your age is within 5 years of
the age of a random 2nd cousin of yours?
For what value n is the probability 95% that your age is within
n years of the age of a random 8th cousin of yours?
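
One way to make the cousin questions concrete: two nth cousins descend from a pair of siblings through n parent-child links on each side, so their age difference is the sibling gap plus n parent-child gaps on one line minus n on the other. The Python sketch below simulates that under the assumption that all gaps are independent, using the same kind of invented placeholder distributions (normal parent-child gap, mean 28 and sd 6 years; sibling gap of at least 0.75 years with an exponential tail). None of these numbers come from data, and the function names are only illustrative.

```python
import random

def cousin_gap(n, rng):
    """Signed age difference between two random n-th cousins (n = 0 means siblings)."""
    # ASSUMPTION: placeholder sibling gap, at least 0.75 years, random sign.
    magnitude = 0.75 + rng.expovariate(1 / 3.0)
    gap = magnitude if rng.random() < 0.5 else -magnitude
    for _ in range(n):
        # ASSUMPTION: each parent-child gap ~ Normal(28, 6), independent of all others.
        # One generation is added on each of the two lines of descent.
        gap += rng.gauss(28.0, 6.0) - rng.gauss(28.0, 6.0)
    return gap

def prob_within(n, years, trials=50_000, seed=0):
    """Estimate P(|age difference| < years) for n-th cousins."""
    rng = random.Random(seed)
    hits = sum(abs(cousin_gap(n, rng)) < years for _ in range(trials))
    return hits / trials

def quantile_abs(n, q=0.95, trials=50_000, seed=0):
    """Estimate the value below which a fraction q of |age difference| falls."""
    rng = random.Random(seed)
    gaps = sorted(abs(cousin_gap(n, rng)) for _ in range(trials))
    return gaps[int(q * (trials - 1))]

print("P(within 5 years of a random 2nd cousin):", prob_within(2, 5))
print("95% bound for a random 8th cousin (years):", round(quantile_abs(8), 1))
```

With independent gaps the variance of the nth-cousin difference is Var(sibling gap) + 2n Var(parent-child gap), so the spread grows roughly like the square root of n.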

Does anybody have access to the sibling-sibling and parent-child
age difference probability distribution functions (or to the raw
genealogical data from which they can be derived)?
I hope the questions I am asking are clear enough, and if I am
using improper terminology, corrections are welcome.
These questions strike me as the kind that genealogists must
have answered at least a hundred years ago.

-- Mark Spahn (West Seneca, NY)

William Elliot

Oct 18, 2007, 11:41:12 PM

On Thu, 18 Oct 2007, Mark Spahn wrote:

> Imagine two siblings, Sib0 and Sib1, taken randomly from a given
> population (e.g., everyone in the world who has ever lived) and consider
> the probability that their age difference x = age of Sib1 - age of Sib0
> assumes a given value.

I propose that |age(S1) - age(S2)| would be a more useful statistic.

> What does this probability distribution function look like? It must be
> symmetrical about 0, but its value at 0 must be less than its value at
> -1 or +1 years. Why? Because few siblings are of the same age (only
> twins and adopted brothers/sisters). The value of this probability
> distribution function must be at its lowest between -9 months and +9
> months, and must be very low where x < 10 and x > 10 years. My guess is
> that 90% of siblings are within, oh, 7 years in age; i.e., P{-7 < x < 7}
> = .9 (imagine the area under the probability distribution function's
> graph from x=-7 to x=7).
>

If they're twins then the measure is zero.
If two babies are adopted together by a family,
then the measure is anything >= 0.
Then again, if the two siblings aren't related,
then the measure would have larger spread.

You did not make clear whether the siblings are related,
how they are related, or whether any two people
who have a brother or sister are considered siblings.

The measure is symmetrical about zero only because
each pair gets entered twice: once as (S1, S2) and
once as (S2, S1). Hence the suggestion of absolute
value for the measure.

hagman

Oct 19, 2007, 3:45:58 AM

On 19 Okt., 00:22, "Mark Spahn" <msp...@localnet.com> wrote:
> Imagine two siblings, Sib0 and Sib1, taken randomly from a given
> population (e.g., everyone in the world who has ever lived)

How are they taken?
Take two persons A,B and repeat until they are siblings?
Someone with many siblings is then more likely to be picked.

I'm quite sure that we are not talking about independent random
experiments.
Decisions about having many or few children, having them early or
late, etc. may be correlated between parents and their children by
family tradition or local customs.

Mike

Oct 22, 2007, 10:59:24 PM

In article <13hfn49...@corp.supernews.com>, msp...@localnet.com says...

> Imagine two siblings, Sib0 and Sib1, taken randomly from a given
> population (e.g., everyone in the world who has ever lived)
> and consider the probability that their age difference
> x = age of Sib1 - age of Sib0
> assumes a given value. What does this probability distribution
> function look like? It must be symmetrical about 0, but its value at
> 0 must be less than its value at -1 or +1 years. Why?
> Because few siblings are of the same age (only twins and
> adopted brothers/sisters). The value of this probability
> distribution function must be at its lowest between -9 months
> and +9 months, and must be very low where x < 10 and x > 10 years.
> My guess is that 90% of siblings are within, oh, 7 years in age;
> i.e., P{-7 < x < 7} = .9 (imagine the area under the probability
> distribution function's graph from x=-7 to x=7).
>
> Now consider the probability distribution function of
> x = age of parent of person P - age of person P,
> where P is taken randomly from a given population.
> The probability that P's parent is < 16 years older or
> > 40 years older than person P must be low, but
> I don't know the shape of the graph for 16 < x < 40.

Hey it isn't that rare - I am 40 years and 2 months older than my son and certainly in my society that isn't uncommon.

> (Come to think of it, the curve for
> x = age of mother of person P - age of person P
> is somewhat to the left of the curve for
> x = age of father of person P - age of person P.)
>

A quick google on "age of mother at birth" found me several sets of data that provide pretty much exactly the type of
data you are looking for. I'm sure you can find some too.

Mike

Proginoskes

Oct 24, 2007, 4:16:59 AM

At the risk of putting my head on the chopping block again ...

On Oct 18, 3:22 pm, "Mark Spahn" <msp...@localnet.com> wrote:
> Imagine two siblings, Sib0 and Sib1, taken randomly from a given
> population (e.g., everyone in the world who has ever lived)
> and consider the probability that their age difference
> x = age of Sib1 - age of Sib0
> assumes a given value. What does this probability distribution
> function look like? It must be symmetrical about 0, but its value at
> 0 must be less than its value at -1 or +1 years.

I would expect a spike at 0 (twins), then the probability distribution
would drop down to 0 until you get to around +/- 3/4 of a year, when it
would start going up again.

> Why?
> Because few siblings are of the same age (only twins and
> adopted brothers/sisters). The value of this probability
> distribution function must be at its lowest between -9 months
> and +9 months, and must be very low where x < 10 and x > 10 years.

There _are_ families with lots of children. And if you count multiple
marriages, the ages between "siblings" can be on the order of 20
years. It's happened in my (extended) family.

Which brings up the issue of how you are choosing a pair of siblings.
Are you choosing a family, then choosing two siblings out of that
family? In that case, families with a small number of children would
dominate the statistics. If you make a list of all pairs of siblings
(S1,S2), then pick a pair at random (with equal probability), then
this favors families with a large number of children. For instance, a
family with 2 children would only occur once (or twice?) on the list,
but a family of 5 would show up ten (or twenty) times. That would push
up the expected value of the (absolute value of the) difference.
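
To put numbers on the two sampling schemes, here is a small Python sketch. The family counts are invented purely for illustration; only the pair counts (1 unordered pair for a family of 2, 10 for a family of 5) follow from the argument above.

```python
from math import comb

# ASSUMPTION: a made-up population, mapping children per family -> number of such families.
families = {2: 100, 3: 50, 5: 10}

def weights_family_first(families):
    # Scheme 1: pick a family uniformly at random, then a pair within it.
    total = sum(families.values())
    return {size: count / total for size, count in families.items()}

def weights_pair_list(families):
    # Scheme 2: list every unordered sibling pair, then pick one pair uniformly.
    pairs = {size: count * comb(size, 2) for size, count in families.items()}
    total = sum(pairs.values())
    return {size: p / total for size, p in pairs.items()}

print("pairs per family of 2:", comb(2, 2))   # 1 unordered pair (2 if ordered)
print("pairs per family of 5:", comb(5, 2))   # 10 unordered pairs (20 if ordered)
print("family-first weights:", weights_family_first(families))
print("pair-list weights:   ", weights_pair_list(families))
```

In this toy population the 5-child families supply 6.25% of the draws under the first scheme but about 29% under the second, which is the bias described above.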

What about in China, where families used to kill baby girls? Should
_they_ be counted? If not, that will increase the expected value as
well.

People are living longer now and are thus able to have children over a
longer span of time. How does this affect the sampling?

> My guess is that 90% of siblings are within, oh, 7 years in age;
> i.e., P{-7 < x < 7} = .9 (imagine the area under the probability
> distribution function's graph from x=-7 to x=7).
>
> Now consider the probability distribution function of
> x = age of parent of person P - age of person P,
> where P is taken randomly from a given population.
> The probability that P's parent is < 16 years older or
> > 40 years older than person P must be low, but

Again, this will depend on the culture, the longevity of the
population at that time, etc.

> I don't know the shape of the graph for 16 < x < 40.

You should be able to get the data from the Census Bureau, for the USA
over the past century or so, at the very minimum.

> These questions strike me as the kind that genealogists must
> have answered at least a hundred years ago.

Or asked about, anyway. They may not have had access to the data, so
they couldn't answer them.

Try the US Census Bureau, though.

--- Christopher Heckman

Dan in NY

Oct 24, 2007, 2:51:53 PM

relationships
"Mark Spahn" <msp...@localnet.com> wrote in message
news:13hfn49...@corp.supernews.com...

&&&
Greetings Mark and other amr readers,

When I read this, I thought of life insurance companies and their experience
tables. These tables are used to determine the rates of life insurance
policies. (A web search of "experience table" resulted in over 100,000
hits.) Years ago, I took an introductory undergraduate course in statistics
and probability. I learned some of the simpler concepts. I don't remember
much about it now but I'll try to put your questions into my perspective.

To form an experience table, lots of data is needed. For life insurance, a
mortality table is created and used. The data starts simple but soon
becomes complicated. The data is the age at death of a large number of
people. But what people and when? The data varies depending on how, when
and where it is gathered. One table might be prepared from data for deaths
occurring in a given year and country. Since life insurance rates (and many
other things) depend on it, new tables may be prepared every year or even
more often. There may be tables depending on sex, religion or denomination
and many other categories.

I said, "a large number of people." You wrote, "My guess is that 90% of
siblings are within, oh, 7 years in age ... ." To take the "guess" out of
it, mathematical (or actuarial) definitions are needed. These definitions
make it possible to determine how much data is needed for a specified level
of accuracy.
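
As one example of what such a definition buys you: if the quantity of interest is the proportion of sibling pairs within 7 years of each other, the usual normal-approximation formula n = z^2 p (1 - p) / E^2 gives the sample size needed for a margin of error E. The Python sketch below applies it. It assumes a simple random sample of independent sibling pairs, which real census or genealogical data would not automatically provide, and the function name is only illustrative.

```python
import math
from statistics import NormalDist

def sample_size_for_proportion(p_guess, margin, confidence=0.95):
    """Pairs needed to estimate a proportion to within +/- margin.

    ASSUMPTION: simple random sample of independent pairs, normal approximation.
    """
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.ceil(z * z * p_guess * (1 - p_guess) / margin ** 2)

# Using the 90% guess as a planning value and asking for +/- 1 percentage point:
print(sample_size_for_proportion(0.9, 0.01))  # 3458
```

A wider tolerance needs far less data: the same guess with a margin of 5 percentage points needs only 139 pairs.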

Then you mention an 8th cousin and genealogists. I am an amateur
genealogist (where "amateur" means I haven't taken any courses or testing in
it). Mostly I work with facts about my relatives. The word "cousin" is
used in many ways but to be definite about something, a definition is
needed. The simplest definition I use is that all cousins are descendants
of a common ancestor. Also among those descendants are aunts and uncles;
parents, grandparents etc.; nieces and nephews. First cousins all have the
same grandparent. (Sometimes it requires both grandparents to be the same.)
Your "first cousins once removed" are children of your first cousins.  8th
cousins have a common ancestor, and both are 9 generations away from
their nearest common ancestor (first cousins, who share a grandparent, are
2 generations away).

I have a copy of a chart (published by Blake and Blake) that shows family
relationships for five generations. It was copied from a chart where
official wills are kept so I suppose the purpose of the chart is for legal
determination of an order of next of kin.

I propose that none of your questions can be answered without gathering
experience data. For example, the census of West Seneca, NY for the year
2000 likely contains the age of the children who live with given parents.
This data could give a general idea of the age difference of siblings but it
has many things to qualify it. Some of the parents will have more children,
some of the children have died, moved away, and/or live with someone other
than their parents, etc.

You could use this census and compile that data. You could then form a
statement about the siblings who lived in West Seneca, NY for the year 2000.
With a sample of the data you could use the mathematics of statistics to
make the statement more precise. My county is between 400 and 500 miles
from there. I could do the same for my county. It seems likely to me that
this data couldn't be used for the statement you imply. Your example,
"everyone in the world who has ever lived" must require data that was never
recorded.  For that example, if you consider families in which the parents
have died and those who may have more children, the results will be much
different.

IMHO very little data has been compiled that would assist in providing the facts
you suggest.  If you search for web sites you might be able to find someone who
could help you more. The Blake and Blake genealogist's web site may be of
help. I found this link there:
http://www.familysearch.org/Eng/Library/FHL/library_main.asp, established
for LDS, but will help anyone. They have a history center in Buffalo
(Williamsville) as shown on the web site.

I could go on and on. I think I already have a good start doing that but I
shouldn't post this much. I have sometimes thought I might like to have the
same or similar pieces of data as those you mention. If you find any of the
type of information you mention, please let me know by posting here or by
sending email.
--
Dan in NY
(for email, exchange y with g in
dKlinkenbery at hvc dot rr dot com)
