How many names?

Kathryn M Rogers

Sep 21, 2009, 8:23:12 PM

Hello Colleagues,

I have FTM 2005 and I am wondering how many names will it take before it has
a meltdown? I have about 75,000 now and don't want an unexpected crash. Of
course I do a backup regularly but is there an optimum number of names at
which you should divide your database?

Regards,
Kathryn Rogers

Mick

Sep 21, 2009, 8:37:55 PM

http://gen.custhelp.com/cgi-bin/gen.cfg/php/enduser/std_adp.php?p_faqid=1680

Ask on their Tech Help board.

MickG

--
One of the main causes of the fall of the Roman Empire was that, lacking
zero, they had no way to indicate successful termination of their C
Programs. (Robert Firth)

Paul Blair

Sep 21, 2009, 10:29:19 PM

The limit is file size, not the number of individuals you have.

I believe that particular model tops out at 2GB - so have a look in your
directory to see how big your main file is. It will also depend on how
much other stuff you have in there - pictures in particular.

I have 8000 people in 32MB, with almost no graphics. At that rate, I'd
never hit the 2GB limit, but YMMV!

Paul

Wes Groleau

Sep 22, 2009, 12:35:28 AM

Paul Blair wrote:

> Kathryn M Rogers wrote:
>> I have FTM 2005 and I am wondering how many names will it take before
>> it has a meltdown? I have about 75,000 now and don't want an
>

> The limit is file size, not the number of individuals you have.
> I believe that particular model tops out at 2GB - so have a look in your

Forgive my aging memory--is FTM the one that was recently identified
as being based on Microsoft Access?

If so, then should you be so fortunate as to get within one record
of a two gig file size, that one more record will completely trash
the whole thing.

But, it's unlikely you will get that far. Long before you get there,
Access gets very slow and very unreliable.

--
Wes Groleau

Unusual ways of learning?
http://Ideas.Lang-Learn.us/WWW?itemid=96

Paul Blair

Sep 22, 2009, 12:52:11 AM

Wes Groleau wrote:
> Paul Blair wrote:
>> Kathryn M Rogers wrote:
>>> I have FTM 2005 and I am wondering how many names will it take before
>>> it has a meltdown? I have about 75,000 now and don't want an
>>
>> The limit is file size, not the number of individuals you have.
>> I believe that particular model tops out at 2GB - so have a look in your
>
> Forgive my aging memory--is FTM the one that was recently identified
> as being based on Microsoft Access?
>
> If so, then should you be so fortunate as to get within one record
> of a two gig file size, that one more record will completely trash
> the whole thing.
>
> But, it's unlikely you will get that far. Long before you get there,
> Access gets very slow and very unreliable.
>

Forgiven, Wes. Legacy is based on Access, FTM has its own data system.

On my figuring, it would be the 500,000th entry that would do the
damage. Spread the word...
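
The half-million figure can be checked with a little arithmetic. This is only a rough sketch: it assumes the file grows in proportion to the number of people (pictures and notes will upset that), it takes the 2GB ceiling as given, and the function names are made up for illustration.

```python
# Back-of-envelope estimate of how many people fit under a file-size cap.
# Assumes file size grows linearly with the number of people, which is
# only roughly true (pictures and notes change the picture a lot).

MB = 1000 * 1000          # decimal megabyte, to match the round figures quoted
GB = 1000 * MB


def bytes_per_person(file_bytes: int, people: int) -> float:
    """Average storage used per person in an existing file."""
    return file_bytes / people


def people_at_limit(limit_bytes: int, per_person: float) -> int:
    """How many people fit before the file reaches limit_bytes."""
    return int(limit_bytes // per_person)


per_person = bytes_per_person(32 * MB, 8000)      # 8000 people in 32MB
capacity = people_at_limit(2 * GB, per_person)    # assumed 2GB ceiling

print(f"{per_person:.0f} bytes per person")
print(f"about {capacity:,} people at the 2GB mark")
print(f"75,000 people would need about {75_000 * per_person / MB:.0f} MB")
```

Using binary units (MiB and GiB) instead gives 512,000, so "about half a million" holds either way.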

Paul

Charlie Hoffpauir

Sep 22, 2009, 9:47:14 AM

On Tue, 22 Sep 2009 14:52:11 +1000, Paul Blair <pbl...@pcug.org.au>
wrote:

Actually based on my experience as an (ex) FTM user, it's as likely to
corrupt your data with a few hundred names as with a few thousand. And
I don't recall any limit as to size, either number of names or file
size. But I do know that after about 30,000 names things could get
awful slow, especially generating an All-in-one tree.

singhals

Sep 22, 2009, 10:46:53 AM

to gen...@rootsweb.com

Kathryn M Rogers wrote:

;^) About 6 names short of a logical break-point, would be
my guess. Lord knows, that's how it works in other programs!

Realistically, I'd imagine it's more a matter of RAM and
file-size than a simple head-count. If I've got 100,000
names with almost no data other than a name and a
relationship, it's going to be a MUCH smaller file than a
25,000 name database where I have full birth, marriage,
death, burial info, along with photographs and copious notes
and source citations on everyone ... even if I use web-sized
photos.

Cheryl

J. Hugh Sullivan

Sep 22, 2009, 2:57:30 PM

On Tue, 22 Sep 2009 10:46:53 -0400, singhals <sing...@erols.com>
wrote:

>Realistically, I'd imagine it's more a matter of RAM and
>file-size than a simple head-count. If I've got 100,000
>names with almost no data other than a name and a
>relationship, it's going to be a MUCH smaller file than a
>25,000 name database where I have full birth, marriage,
>death, burial info, along with photographs and copious notes
>and source citations on everyone ... even if I use web-sized
>photos.
>
>Cheryl

From this thread it seems to me that people are happy just gathering
names. Otherwise how could one have 50,000-100,000 names? I have done
the genealogy of the Bible and Irish Mythology/History of the
Sullivans in separate files and don't have anything like that many
names.

I only have 6,000+ names but I have varying amounts of data on each.
Most are as a result of "meeting" someone and exchanging data. If all
they had was a couple of names, I excluded them.

I have walked the areas where three generations before me lived along
with all the siblings. I have visited the gravesites of every one that
could be found - some so remote that my Deep Woods Off quit working.

I'm not critiquing methods, but it seemed time to point out that
making genealogy very personal has a place also.

Hugh

Steve Hayes

Sep 22, 2009, 9:44:42 PM

On Tue, 22 Sep 2009 18:57:30 GMT, Ea...@bellsouth.net (J. Hugh Sullivan)
wrote:

I don't know about the new version, but the old PAF 2.x had a names database.
It stored names once, and then referred to them by pointers. So if you had 25
people with the name Mary, it would just store Mary once, and the person
records for those 25 people would point to the location where the name was
stored.
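
The pointer scheme described above can be sketched roughly as follows. The class and method names are hypothetical, and PAF's real file layout is not shown anywhere in this thread; this only illustrates the idea of storing a name once and referring to it by position.

```python
# Sketch of a "store each name once" table, as described for PAF 2.x.
# The class and method names are hypothetical; this is not PAF's format.


class NameTable:
    """Stores each distinct name string once and hands out integer indexes."""

    def __init__(self) -> None:
        self._names = []     # index -> name
        self._index = {}     # name -> index

    def intern(self, name: str) -> int:
        """Return the index for name, adding it only if it is new."""
        if name not in self._index:
            self._index[name] = len(self._names)
            self._names.append(name)
        return self._index[name]

    def lookup(self, idx: int) -> str:
        """Turn a stored index back into the name text."""
        return self._names[idx]

    def __len__(self) -> int:
        return len(self._names)


names = NameTable()

# 25 person records, each holding a pointer (index) rather than the text.
people = [{"id": i, "given": names.intern("Mary")} for i in range(25)]
people.append({"id": 25, "given": names.intern("John")})

print(len(people), "person records,", len(names), "stored names")
print(names.lookup(people[0]["given"]))
```

The count of person records and the count of stored names come out very different, which is the distinction the next paragraph draws.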

I think what people are talking about is not actually names, but how many
person records a database will hold.

--
Steve Hayes from Tshwane, South Africa
Web: http://hayesfam.bravehost.com/stevesig.htm
Blog: http://methodius.blogspot.com
E-mail - see web page, or parse: shayes at dunelm full stop org full stop uk

Peter J Seymour

Sep 23, 2009, 5:03:44 AM

Ideally, someone with time on their hands should do some experiments and
find out. And no not me, I'm too busy with other things like trying to
maximise the amount of data Gendatam Suite can squeeze into a given
amount of RAM. (Answer: RAM space to accommodate a gedcom file is
currently down to approximately an 8x multiplier on filesize, it depends
to some extent on the make-up of the file though).
Peter

J. Hugh Sullivan

Sep 23, 2009, 9:36:52 AM

>Steve Hayes from Tshwane, South Africa

They were talking about storage, Steve. But I didn't think newbies
should wonder if they needed 90,000 names to be considered a
genealogist. It escapes me how anyone could do justice to that many
names other than to occupy disc space. Again, no criticism intended.

Hugh

singhals

Sep 23, 2009, 10:37:35 AM

to gen...@rootsweb.com

J. Hugh Sullivan wrote:

Some of my more unwieldy databases are what you could call
One Name collections. I don't know for certain which John
Smith, so I collected 'em all when I found one until I found
the one that matched the rest of what I knew. Or, Someone
was once looking for Ada Kuykendall. I have an interest in
some Kuykendalls, and so I helped look -- and never purged
those who weren't mine or his from the file.

I like to put in "stories" about people, but to be up-front
about it, many of the specific stories I remember about
people aren't things you'd /tell/ their grandkids because it
sets a bad example that could be used by those grands. (g)

BTW -- hear anything from Blackeagle?

Cheryl

J. Hugh Sullivan

Sep 23, 2009, 12:41:22 PM

On Wed, 23 Sep 2009 10:37:35 -0400, singhals <sing...@erols.com>
wrote:

>Some of my more unwieldy databases are what you could call
>One Name collections. I don't know for certain which John
>Smith, so I collected 'em all when I found one until I found
>the one that matched the rest of what I knew. Or, Someone
>was once looking for Ada Kuykendall. I have an interest in
>some Kuykendalls, and so I helped look -- and never purged
>those who weren't mine or his from the file.

Comprende, Senora.

>I like to put in "stories" about people, but to be up-front
>about it, many of the specific stories I remember about
>people aren't things you'd /tell/ their grandkids because it
>sets a bad example that could be used by those grands. (g)

I'm different there, too. My gg grandfather had 4 children by 3 ladies
before he married the one who bore him 2.

We call him Stud Sullivan. 8-)

>BTW -- hear anything from Blackeagle?
>
>Cheryl

When last heard he lived about 30 miles from here. But after he moved
to WI, nothing.

Hugh

Percival P. Cassidy

Sep 23, 2009, 2:16:49 PM

Steve Hayes wrote:

>> I only have 6,000+ names but I have varying amounts of data on each.
>> Most are as a result of "meeting" someone and exchanging data. If all
>> they had was a couple of names, I excluded them.
>>
>> I have walked the areas where three generations before me lived along
>> with all the siblings. I have visited the gravesites of every one that
>> could be found - some so remote that my Deep Woods Off quit working.
>>
>> I'm not critiquing methods, but it seemed time to point out that
>> making genealogy very personal has a place also.

> I don't know about the new version, but the old PAF 2.x had a names database.
> It stored names once, and then referred to them by pointers. So if you had 25
> people with the name Mary, it would just store Mary once, and the person
> records for those 25 people would point to the location where the name was
> stored.

And that is how a database should be organized, according to what I have
read. And that is how I constructed my bibliography database using
DataPerfect (a wonderful program by WordPerfect Corporation, which was
thrown into the dumpster by its subsequent owners): each publisher and
each author occurred once each, with pointers from each entry in the
title file to the appropriate entries in the author and publisher files.
(That's the simplified explanation anyway).
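
For anyone who wants to see that layout in something current, here is a minimal sketch using SQLite from Python rather than DataPerfect. The table names, column names and sample rows are all invented for illustration, and like the description above it is simplified to one author per title.

```python
import sqlite3

# Sketch of the layout described above, using SQLite instead of DataPerfect.
# Table names, column names and sample data are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE author    (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE publisher (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE title (
        id           INTEGER PRIMARY KEY,
        title        TEXT NOT NULL,
        author_id    INTEGER NOT NULL REFERENCES author(id),
        publisher_id INTEGER NOT NULL REFERENCES publisher(id)
    );
    """
)

# Each author and publisher is entered once...
conn.execute("INSERT INTO author (id, name) VALUES (1, 'A. N. Author')")
conn.execute("INSERT INTO publisher (id, name) VALUES (1, 'Example Press')")

# ...and every title row points at them by id.
conn.executemany(
    "INSERT INTO title (title, author_id, publisher_id) VALUES (?, ?, ?)",
    [("First Book", 1, 1), ("Second Book", 1, 1)],
)

# Joining the three tables rebuilds the full entry for display.
rows = conn.execute(
    """
    SELECT t.title, a.name, p.name
    FROM title AS t
    JOIN author AS a ON a.id = t.author_id
    JOIN publisher AS p ON p.id = t.publisher_id
    ORDER BY t.id
    """
).fetchall()

for row in rows:
    print(row)
```

Correcting a misspelt publisher then means changing one row in the publisher table, and every title that points at it picks up the fix.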

Perce

Steve Hayes

Sep 24, 2009, 12:27:03 AM

Indeed most relational databases are organised in that way, though they don't
always get down to individual names -- an author might be recorded once, but
in most cases not the elements of the author's name.

Peter J Seymour

Sep 24, 2009, 4:10:46 AM

Steve Hayes wrote:
> On Wed, 23 Sep 2009 14:16:49 -0400, "Percival P. Cassidy"
> <Nob...@NotMyISP.net> wrote:
>
>
>>Steve Hayes wrote:
>>
>>

.....

>>>I don't know about the new version, but the old PAF 2.x had a names database.
>>>It stored names once, and then referred to them by pointers. So if you had 25
>>>people with the name Mary, it would just store Mary once, and the person
>>>records for those 25 people would point to the location where the name was
>>>stored.
>>
>>And that is how a database should be organized, according to what I have
>>read. And that is how I constructed my bibliography database using
>>DataPerfect (a wonderful program by WordPerfect Corporation, which was
>>thrown into the dumpster by its subsequent owners): each publisher and
>>each author occurred once each, with pointers from each entry in the
>>title file to the appropriate entries in the author and publisher files.
>>(That's the simplified explanation anyway).
>
>
> Indeed most relational databases are organised in that way, though they don't
> always get down to individual names -- an author might be recorded once, but
> in most cases not the elements of the author's name.
>

This is an example of how database design can get firmly enmeshed in the
meaning of the data. I feel the sensible way is to focus on the person.
A person, identified by a set of names, does only occur once (assuming
identification is sufficiently specific). On the other hand treating
name values as unique can be problematic. Does a particular name
spelling have only one derivation and meaning? Some forenames seem to
have multiple potential meanings suggesting that the name is a
conflation of different ones or at least that the writer was not sure on
the matter. A major source of confusion can be simple misspelling such
that two different names end up spelt the same. I suspect it is more of
a problem with surnames. For instance, how many different historical
sources does the modern day surname Seymour have? More than one. For
these reasons I prefer to treat names as unique to a person/family and
not attempt to break them out into a separate common table. The PAF
behaviour described above seems a bit perverse, but then I don't know
what they were trying to achieve.
Peter

Ian Goddard

Sep 24, 2009, 4:30:34 AM

The two are different cases. The first seems to be an index.

There might be different authors with the same name. If you're simply
treating the author file as an index without distinguishing between
synonyms that might be OK for a bibliographic database where you're
treating the files as indexes.

However it wouldn't do for other purposes which would require proper
normalisation. If the same publisher has John Smith who writes
blockbuster novels and John Smith who writes books on esoteric
genealogical software you can bet that one of the John Smiths will have
fairly robust views about keeping the two distinct when it comes to
paying royalties.

From the point of view of a genealogical database, which I hope would
be normalised, the same person might be represented many times and in
different roles. A man's name might be the subject of separate birth,
baptism, death and burial records; the groom in his marriage records;
the husband in records pertaining to his wife/widow; the father in
records pertaining to his children; the testator in his will and the
beneficiary in others etc. A woman's name might appear less frequently,
at least in earlier times, as PRs often only name the father in baptisms
of legitimate children. We also have to bear in mind that names are not
always consistently spelled so even if we tried to join all these roles
in the various events to a single person record we would find that some
people would have more than one name variation.

In fact, although joining the roles to a single person record might
initially seem to be a valid thing to do it isn't really. The core
activity of genealogy is in deciding how events and people are linked -
is the John Smith who is the son in this baptismal record the same
person as the John Smith who is the groom in this marriage and is he the
same John Smith who is the father in that baptismal record? We
really should keep the event records, with their names, distinct so that
we can relink them later if we change our minds. So we should allow for
the fact that a database will need to maintain a great many more names
than the number of people it represents.
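
As a rough illustration of keeping event records and their names distinct, here is a minimal sketch. The class and field names are made up, and the three records simply restate the John Smith example above; nothing in it decides whether the three are one man.

```python
from dataclasses import dataclass, field

# Sketch of keeping event records distinct from any conclusion about who
# the people were. All class and field names are illustrative.


@dataclass
class NameInRole:
    name: str      # as written in the record, spelling untouched
    role: str      # e.g. "son", "groom", "father"


@dataclass
class EventRecord:
    record_id: int
    kind: str                                   # "baptism", "marriage", ...
    names: list = field(default_factory=list)   # NameInRole entries


records = [
    EventRecord(1, "baptism", [NameInRole("John Smith", "son")]),
    EventRecord(2, "marriage", [NameInRole("John Smith", "groom")]),
    EventRecord(3, "baptism", [NameInRole("John Smith", "father")]),
]

# Three name occurrences are stored, however many people they turn out to be.
name_count = sum(len(r.names) for r in records)
john_smiths = [
    (r.record_id, n.role)
    for r in records
    for n in r.names
    if n.name == "John Smith"
]

print(name_count, "names held")
print(john_smiths)
```

Because each occurrence keeps its own record id and role, a later decision about who is who can be changed without touching the records themselves.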

--
Ian

Hotmail is for spammers. Real mail address is igoddard
at nildram co uk

Ian Goddard

Sep 24, 2009, 4:51:09 AM

Peter J Seymour wrote:
> I feel the sensible way is to focus on the person.

How would you do this? The person no longer exists having died years
ago. All we have are the records, possibly ambiguous, possibly
contradictory, which provide evidence of the person. The "person" who
emerges from our research is a historical reconstruction and until that
reconstruction is complete, or at least, well under way, we have no
"person" on which to focus.

John, son of William Goddard was baptised in 1753
John, son of Jonathan Goddard was baptised in 1753

Clearly there are two people here.

John Goddard was buried, aged 61 in December 1814

Clearly this John Goddard had been a person and could have been one of
the two represented by the baptisms - and probably was - but which? It
took several years of investigation and a whole slew of other,
interlocking records to establish to my satisfaction that he was, in
fact, the son of Jonathan.
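
A quick sketch of why the burial on its own settles nothing. It assumes the stated age of 61 is exact (burial-register ages often are not) and treats the baptism year as a stand-in for the birth year, which is also an assumption; the function name is made up.

```python
# Why the burial alone cannot pick between the two baptisms.
# A person buried aged 61 in 1814 was born in 1752 or 1753,
# assuming the stated age is exact (burial-register ages often are not).


def birth_year_range(burial_year: int, age: int) -> range:
    """Possible birth years for someone buried at the given age."""
    return range(burial_year - age - 1, burial_year - age + 1)


# Baptism year is used here as a stand-in for birth year (an assumption).
baptisms = {
    "John, son of William Goddard": 1753,
    "John, son of Jonathan Goddard": 1753,
}

possible = birth_year_range(1814, 61)
candidates = [who for who, year in baptisms.items() if year in possible]

print(list(possible))
for who in candidates:
    print("consistent with the burial:", who)
```

Both baptisms pass the test, so the date arithmetic cannot choose between them and only other records can.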

I repeat, all we have are the records. Focus on those.

Steven Gibbs

Sep 24, 2009, 5:28:39 AM

"Ian Goddard" <godd...@hotmail.co.uk> wrote in message
news:aKWdnSPuGs5grybX...@pipex.net...

>
> John, son of William Goddard was baptised in 1753
> John, son of Jonathan Goddard was baptised in 1753
>
> Clearly there are two people here.
>
> John Goddard was buried, aged 61 in December 1814

What is the problem? I enter three records in my database. I create three
people in my database. I hope that at some stage I can merge two of those
people. Whether I see that as concentrating on the person, or as
concentrating on the record is immaterial.

Steven

Ian Goddard

Sep 24, 2009, 5:53:24 AM

Steven Gibbs wrote:
> "Ian Goddard" <godd...@hotmail.co.uk> wrote in message
> news:aKWdnSPuGs5grybX...@pipex.net...
>> John, son of William Goddard was baptised in 1753
>> John, son of Jonathan Goddard was baptised in 1753
>>
>> Clearly there are two people here.
>>
>> John Goddard was buried, aged 61 in December 1814
>
> What is the problem? I enter three records in my database. I create three
> people in my database.

There's the problem. Your database has records for three people when
there are almost certainly no more than two.

Steven Gibbs

Sep 24, 2009, 6:16:39 AM

"Ian Goddard" <godd...@hotmail.co.uk> wrote in message

news:EOednV9P0bkJ3CbX...@pipex.net...

> Steven Gibbs wrote:
>> "Ian Goddard" <godd...@hotmail.co.uk> wrote in message
>> news:aKWdnSPuGs5grybX...@pipex.net...
>>> John, son of William Goddard was baptised in 1753
>>> John, son of Jonathan Goddard was baptised in 1753
>>>
>>> Clearly there are two people here.
>>>
>>> John Goddard was buried, aged 61 in December 1814
>>
>> What is the problem? I enter three records in my database. I create
>> three people in my database.
>
> There's the problem. Your database has records for three people when
> there are almost certainly no more than two.

So does any working genealogical database, whatever its form. You had three
people in your working database, even if that database was in the form of
scraps of written notes. Eventually you had the evidence to merge two of
them.

Of course, the details aren't going into my super-duper 100% (I hope)
correct publishable database, but I am rarely able to touch that these days.
Even then, I wouldn't be distraught if I found that Mary Unknown, mother of
X, and Mary, daughter of Y turned out to be the same person.

Steven

Steve Hayes

Sep 24, 2009, 7:47:13 AM

On Thu, 24 Sep 2009 09:10:46 +0100, Peter J Seymour <mo...@pjsey.demon.co.uk>
wrote:

What they were trying to achieve was a saving on disk space when many people
were using 360k floppies to store their data, and 1,2 meg HD floppies were
state of the art, while 1,4 Mb stiffies were ahead of the envelope.

If you have 100 Marys, 70 Johns, 50 Peters and so on, then if you store each
of them only once, and have pointers to them, you can save a fair bit of disk
space. That's how relational databases are supposed to work -- it's the
somethingth normal form, store each discrete piece of information once and
only once.
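
To put rough numbers on that saving, here is a small sketch. The one-byte-per-character and two-byte-pointer figures are assumptions for illustration, not anything known about PAF's actual format, and any overhead for the name table itself is ignored.

```python
# Rough estimate of the space saved by storing each name once.
# Assumptions (not from PAF itself): one byte per character and a
# two-byte pointer per person record; table overhead is ignored.

POINTER_BYTES = 2

counts = {"Mary": 100, "John": 70, "Peter": 50}

# Every record carries its own copy of the name text.
inline_bytes = sum(len(name) * n for name, n in counts.items())

# Each name stored once, plus one pointer per person record.
table_bytes = sum(len(name) for name in counts)
pointer_bytes = POINTER_BYTES * sum(counts.values())
shared_bytes = table_bytes + pointer_bytes

print("stored in every record:", inline_bytes, "bytes")
print("stored once plus pointers:", shared_bytes, "bytes")
print("saving:", inline_bytes - shared_bytes, "bytes")
```

Under these assumptions the saving is about half. It grows with longer and more frequently repeated names, and it turns into a small loss for a name that is used only once.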

Ian Goddard

Sep 24, 2009, 9:21:01 AM

Steven Gibbs wrote:
> "Ian Goddard" <godd...@hotmail.co.uk> wrote in message
> news:EOednV9P0bkJ3CbX...@pipex.net...
>> Steven Gibbs wrote:
>>> "Ian Goddard" <godd...@hotmail.co.uk> wrote in message
>>> news:aKWdnSPuGs5grybX...@pipex.net...
>>>> John, son of William Goddard was baptised in 1753
>>>> John, son of Jonathan Goddard was baptised in 1753
>>>>
>>>> Clearly there are two people here.
>>>>
>>>> John Goddard was buried, aged 61 in December 1814
>>> What is the problem? I enter three records in my database. I create
>>> three people in my database.
>> There's the problem. Your database has records for three people when
>> there are almost certainly no more than two.
>
> So does any working genealogical database, whatever its form. You had three
> people in your working database, even if that database was in the form of
> scraps of written notes. Eventually you had the evidence to merge two of
> them.

That's not how I'd look at it. What I had was three (and eventually
more) records of contemporary origin some of which were to be eventually
*linked* to a reconstructed single individual. But they were able to
remain unlinked for as long as they need to be and even once linked they
can, in the light of newer and better information (e.g. a Will of
William's showing John's children to be his grandchildren), be unlinked
and relinked elsewhere. Picking apart a merged entity is likely to be
much more messy and makes the whole structure a good deal more fragile.

The trouble is that a failure to distinguish between two entities, a
name and the person named, is a common feature of genealogical databases
and firmly embodied in Gedcom.

> my super-duper 100% (I hope) correct ... database, but I am rarely able to touch that these days.

Why? Would it be because it's too difficult to revise if you merge the
wrong entities?

Steven Gibbs

Sep 24, 2009, 11:17:59 AM

"Ian Goddard" <godd...@hotmail.co.uk> wrote in message

news:_L2dnejYw-qj7ybX...@pipex.net...

> Steven Gibbs wrote:
>> "Ian Goddard" <godd...@hotmail.co.uk> wrote in message
>> news:EOednV9P0bkJ3CbX...@pipex.net...
>>> Steven Gibbs wrote:
>>>> "Ian Goddard" <godd...@hotmail.co.uk> wrote in message
>>>> news:aKWdnSPuGs5grybX...@pipex.net...
>>>>> John, son of William Goddard was baptised in 1753
>>>>> John, son of Jonathan Goddard was baptised in 1753
>>>>>
>>>>> Clearly there are two people here.
>>>>>
>>>>> John Goddard was buried, aged 61 in December 1814
>>>> What is the problem? I enter three records in my database. I create
>>>> three people in my database.
>>> There's the problem. Your database has records for three people when
>>> there are almost certainly no more than two.
>>
>> So does any working genealogical database, whatever its form. You had
>> three people in your working database, even if that database was in the
>> form of scraps of written notes. Eventually you had the evidence to
>> merge two of them.
>
> That's not how I'd look at it. What I had was three (and eventually more)
> records of contemporary origin some of which were to be eventually
> *linked* to a reconstructed single individual.

No. You had a set of names, from a set of records, each of which (almost
certainly) link to one of a set of individuals. You're seeing the records,
because you're discarding those records that link to the *wrong*
individuals. I see the people, because I keep *all* the people in my
working database.

> But they were able to remain unlinked for as long as they need to be and
> even once linked they can, in the light of newer and better information
> (e.g. a Will of William's showing John's children to be his
> grandchildren), be unlinked and relinked elsewhere. Picking apart a
> merged entity is likely to be much more messy and makes the whole
> structure a good deal more fragile.

Yes and no. Picking apart a wrongly merged entity is a pain in the
proverbial, but, on the other hand, I've found that, when the risk is small,
making the obvious assumptions in my working databases allows a much clearer
view of what's going on. I don't know how many times I've chased people
through the IGI and FreeBMD, found a whole lot of circumstantial evidence,
thought "I know it's right, but I don't have enough proof" and had to
discard it because my database only contained relationships I was certain
of. Now I've changed my methods, it *all* goes in, and, yes I can see I've
got multiple people who might be the same, but I can't merge. But as more
data gets added, sometimes they converge, sometimes they diverge. And if a
contradiction does emerge, at least I've got all the source data to hand
that tempted me to go wrong in the first place.

What with the LMA records becoming available, I've just been working on my
HAZELTONs of London database. I have a lot of multiple people along the
lines of, for example,

Sidney Arthur C. HAZELTON
Birth Registered Q4 1904 at Richmond Surrey Registration District

and

Sidney A.C. HAZELTON
Marriage Registered Q1 1929 at Richmond Surrey Registration District
Spouse: Prewett

Are they the same person? Of course they are. Can I merge them? No, not
yet. Do I worry that I've got him as two different people? Not at all. Is
he one of mine? I very much doubt it, but if he turns out to be, I'll not
have to go back over old ground. But equally, I have been able to resolve
very many of the similar 19th century obvious duplications from the LMA
data. Almost all, with very few exceptions, were as expected. Some of them
even did turn out to be from my lines.

>> my super-duper 100% (I hope) correct ... database, but I am rarely able
>> to touch that these days.
>
> Why? Would it be because it's too difficult to revise if you merge the
> wrong entities?

No. Because it's almost impossible to find any further records, let alone
adequate confirmation for my pre-1837 ancestors.

Steven

Peter J Seymour

Sep 24, 2009, 12:17:39 PM

Steve Hayes wrote:
> On Thu, 24 Sep 2009 09:10:46 +0100, Peter J Seymour <mo...@pjsey.demon.co.uk>
> wrote:
>

.....

>>The PAF
>>behaviour described above seems a bit perverse, but then I don't know
>>what they were trying to achieve.
>
>
> What they were trying to achieve was a saving on disk space when many people
> were using 360k floppies to store their data, and 1,2 meg HD floppies were
> state of the art, while 1,4 Mb stiffies were ahead of the envelope.
>
> If you have 100 Marys, 70 Johns, 50 Peters and so on, then if you store each
> of them only once, and have pointers to them, you can save a fair bit of disk
> space. That's how relational databases are supposed to work -- it's the
> somethingth normal form, store each discrete piece of information once and
> only once.
>
>

Hmmm, I wonder how much space that would actually save bearing in mind
the need for pointers and presumably other supporting stuff, but you are
probably correct about the motivation.
I feel that you are not really picking up my other point. Two pieces of
textually identical data in the same field type might have different
historical contexts and to treat them as necessarily identical could
lead to misunderstanding. Presumably a field (not specifically the
contents) that would occur in multiple record types is what is stored
once and that further coalescing identical data values was something you
might or might not do depending on whether it prejudiced the application.
Peter

Peter J Seymour

Sep 24, 2009, 12:23:02 PM

Ian Goddard wrote:
> Peter J Seymour wrote:
>
>> I feel the sensible way is to focus on the person.
>
>
> How would you do this? The person no longer exists having died years
> ago. All we have are the records, possibly ambiguous, possibly
> contradictory, which provide evidence of the person. The "person" who
> emerges from our research is a historical reconstruction and until that
> reconstruction is complete, or at least, well under way, we have no
> "person" on which to focus.
>
> John, son of William Goddard was baptised in 1753
> John, son of Jonathan Goddard was baptised in 1753
>
> Clearly there are two people here.
>

...

>
> I repeat, all we have are the records. Focus on those.
>

I think you are being too picky. There is nothing in what you say that
challenges my comments. Well okay, we could use some term such as
persona rather than person, but I think we understand the meaning either
way.
Peter

Steve Hayes

Sep 24, 2009, 2:08:38 PM

On Thu, 24 Sep 2009 17:17:39 +0100, Peter J Seymour <mo...@pjsey.demon.co.uk>
wrote:

I'm not sure what you are getting at there.

The historical context has nothing to do with it. Peter is Peter. When you
type it on your keyboard, the keyboard does not need to know the historical
context. In the same way, the disk location where it is stored doesn't need to
know anything about the historical context either. That's something you need
to know.

Just say you have two records such as

Peter Marmaduke Constantine von Lilienstein
and
James Peter Smith

Instead of storing the "Peter" in two different locations in memory, it stores
it in just one. And you don't need to know where the program stores it, and
the program doesn't need to know the historical context, just so long as when
you look at the record for the persons concerned, the Peter is where you
expect it to be.

Ian Goddard

Sep 24, 2009, 4:09:24 PM

Steven Gibbs wrote:
> No. You had a set of names, from a set of records, each of which (almost
> certainly) link to one of a set of individuals. You're seeing the records,
> because you're discarding those records that link to the *wrong*
> individuals. I see the people, because I keep *all* the people in my
> working database.

No, see below.

>> But they were able to remain unlinked for as long as they need to be and
>> even once linked they can, in the light of newer and better information
>> (e.g. a Will of William's showing John's children to be his
>> grandchildren), be unlinked and relinked elsewhere. Picking apart a
>> merged entity is likely to be much more messy and makes the whole
>> structure a good deal more fragile.
>
> Yes and no. Picking apart a wrongly merged entity is a pain in the
> proverbial, but, on the other hand, I've found that, when the risk is small,
> making the obvious assumptions in my working databases allows a much clearer
> view of what's going on. I don't know how many times I've chased people
> through the IGI and FreeBMD, found a whole lot of circumstantial evidence,
> thought "I know it's right, but I don't have enough proof" and had to
> discard it because my database only contained relationships I was certain
> of. Now I've changed my methods, it *all* goes in, and, yes I can see I've
> got multiple people who might be the same, but I can't merge. But as more
> data gets added, sometimes they converge, sometimes they diverge. And if a
> contradiction does emerge, at least I've got all the source data to hand
> that tempted me to go wrong in the first place.

Let me offer you an idea that would extend what you're doing and make it
easier.

On the one side you have the primary evidence - events and the list of
name/role combinations derived from them, e.g. the evidence entity
being a baptismal record of William, son of John Smith giving William
Smith, son and John Smith, father as the name/role entities emerging
from it. All these entities would also have IDs.

On the other you have your interpretation, your reconstruction of what
you think happened.

A key part of this is an entity which represents your individual. This
needs to carry little information in itself, just an ID, a standardised
version of the name and an epithet to distinguish individuals of the
same name (for instance I need to distinguish John Goddard, son of
Jonathan, from his grandfather on the one hand and his son and grandson
on the other, all of whom were called John Goddard). This gives two
handles for the individual, an ID to stitch the database together and an
extended name such as "John Goddard of Scholes" or "John Goddard III" by
which to think of him. This entity is fairly simple because it is
primarily a hub for a series of links.

The first set of links are between the evidential entities and the
reconstruction. The core of this type of link entity is the pair of IDs
pointing to the entities being linked. The prime use of this approach
would be to link different types of evidence name role entities (child
baptised, groom, parent of child baptised etc) to the reconstructed
individual. This structure, however, has some interesting properties.

Firstly, you can supplement the basic link with other information such
as your confidence rating of the link and notes to explain your
reasoning. If you start out unsure of the link you can give it a low
confidence and increase the confidence as more information accumulates
and document your reasons. If you want to underline the fact that you
don't think this evidence relates to this reconstructed individual you
could even set up a link with a negative confidence (I use Gramps which
persists in offering to merge individuals who are not the same, even
parent and child if they have the same name; I would dearly like to be
able to set up an indicator to say "don't offer me this merge again").

Secondly, you don't need to decide which alternative evidence entity belongs
to a reconstruction; you can hedge your bets and link to both. In my
example of the two John Goddards, I could have a link to both the
baptismal records with maybe a 50/50 degree of confidence and then, as I
gain more information, I can increase the confidence on one and reduce
that on the other until I decide I can drop a link - or leave it with
zero or negative confidence. This is why I replied "No" above - I can
keep all the records in.

Thirdly the structure is actually one which implements a many-to-many
association. Not only can the reconstruction be linked to multiple
evidence entities (baptism, marriage, baptism of children etc), one
evidence entity can link to multiple reconstructions. For instance I
have a tithe map with a plot marked "John Goddard's estate". I don't
know whether this applies to the father, whose widow had just died, or
the son but with the system I've outlined I could have a link to both.
Although I wouldn't do it that way you could handle my original
problem by setting up hubs for the son of William & the son of Jonathan
& linking the burial, marriage, children's baptisms, etc to both of them.
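
A minimal sketch of the evidence/reconstruction/link structure described above, in Python. Everything in it is an assumption made for illustration: the class and field names, the confidence scale (1.0 for certain, zero or below for "not this person"), the epithets, and the later change of mind about the tithe map. It is not taken from Gramps or any other program.

```python
from dataclasses import dataclass, field
from itertools import count

_next_id = count(1)  # one shared ID sequence for every entity


def new_id():
    return next(_next_id)


@dataclass
class NameRole:
    """Evidence side: one name/role combination read from a record."""
    name: str
    role: str
    source: str
    id: int = field(default_factory=new_id)


@dataclass
class Individual:
    """Reconstruction side: a hub carrying only an ID, a name and an epithet."""
    std_name: str
    epithet: str
    id: int = field(default_factory=new_id)


@dataclass
class Link:
    """Joins one evidence entity to one reconstructed individual."""
    evidence_id: int
    individual_id: int
    confidence: float  # assumed scale: 1.0 certain, 0 or below "not this person"
    note: str = ""


def individuals_for(evidence_id, links):
    """Every link from one evidence entity, most confident first."""
    hits = [link for link in links if link.evidence_id == evidence_id]
    return sorted(hits, key=lambda link: link.confidence, reverse=True)


father = Individual("John Goddard", "the father")  # epithets are placeholders
son = Individual("John Goddard", "the son")
tithe = NameRole("John Goddard", "estate holder", "tithe map")

# Hedge the bet: the same piece of evidence is linked to both hubs.
links = [
    Link(tithe.id, father.id, 0.5, "his widow had just died"),
    Link(tithe.id, son.id, 0.5, "could equally be the son"),
]

# Suppose later evidence favours the son: only the links change.
links[0].confidence = -1.0
links[0].note += "; ruled out"
links[1].confidence = 0.9
```

Because a link is its own entity, the many-to-many association falls out for free: one hub can collect any number of evidence entities, and one evidence entity can point at several hubs until the question is settled.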

The second part of the reconstruction is a set of entities which
represents relationships between reconstructed individuals. The
principle would be similar to what I've outlined above: an entity to
represent the relationship itself, say a family, and another set of
links to represent roles within that relationship - mother, father or child.

The key to all this is that the things which we're likely to change are
the links, things which we think of as "real" can be kept fixed. If you
decide you really have two reconstruction entities representing the same
individual all you need to do is update the IDs in the links pointing to
one of them to point to the other and then delete the unlinked entity.
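
The merge described above can be sketched in a few lines of Python. The dictionary layout, the IDs and the function name are assumptions made for illustration only; a real database would also repoint any relationship links that refer to the dropped hub.

```python
def merge_individuals(individuals, links, keep_id, drop_id):
    """Repoint every link from drop_id to keep_id, then delete the unlinked hub.

    Evidence entities are never touched; only the links change.
    """
    for link in links:
        if link["individual_id"] == drop_id:
            link["individual_id"] = keep_id
    del individuals[drop_id]


# Two hubs that turn out to be the same man (IDs are made up).
individuals = {1: "John Goddard (hub A)", 2: "John Goddard (hub B)"}
links = [
    {"evidence_id": 101, "individual_id": 1, "confidence": 0.9},
    {"evidence_id": 102, "individual_id": 2, "confidence": 0.6},
]

merge_individuals(individuals, links, keep_id=1, drop_id=2)
```

Undoing a wrong merge is the same operation in reverse: create a new hub and move the relevant links back to it.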

I think it fuses your two approaches. Relationships can be left in even
if they're not quite proved because their status will be clearly
indicated. And because we manipulate links instead of merging you don't
have the problems of your second approach. As far as I can see what I'm
proposing actually supports what you're trying to do whereas what you're
actually doing is constrained by existing software.

Ian Goddard

Sep 24, 2009, 4:13:48 PM

Peter J Seymour wrote:
> I think you are being too picky.

I spent many years working in an environment where there was no such
thing as being too picky about differentiating between evidence and its
interpretation. The records we inherit from the past, including the
names in them, are evidence. The "people" in our genealogies are our
interpretations of that evidence. I'm differentiating between them.

Wes Groleau

Sep 24, 2009, 11:22:06 PM

Ian Goddard wrote:
> and relinked elsewhere. Picking apart a merged entity is likely to be
> much more messy and makes the whole structure a good deal more fragile.

What if the wrongly merged "entity" is a name?

If you want to focus on three separate records,
how does it help to have the name that happens to be on each record
stored as a single name just because they all happen to
spell it the same?

Now I put my computer scientist hat on. How much space are we saving
by putting "John" (four to ??? bytes depending on the implementation)
in a single place instead of three? How much space are we saving when
we remember that each of those three places has a record number instead,
which is probably a four-byte integer?

How much extra code do we have to write, test, debug, etc. to
dereference the record number every time we refer to one of
those records/persons?

Normalization is not _always_ of as much value as purists think it is.
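
The arithmetic above can be checked with a small Python sketch. It assumes UTF-8 text and a four-byte integer record number, and it ignores length prefixes, terminators and index overhead, so the figures are only a rough guide; the function names are made up for this example.

```python
import struct

ID_SIZE = struct.calcsize("<I")  # a four-byte unsigned integer


def inline_bytes(names):
    """Bytes used if every record stores its own copy of the name."""
    return sum(len(name.encode("utf-8")) for name in names)


def pooled_bytes(names, id_size=ID_SIZE):
    """Bytes used if each distinct name is stored once and every
    record holds a fixed-size record number pointing at it."""
    pool = sum(len(name.encode("utf-8")) for name in set(names))
    return pool + id_size * len(names)


short = ["John"] * 3
long_ = ["Bartholomew"] * 3

print(inline_bytes(short), pooled_bytes(short))  # 12 16
print(inline_bytes(long_), pooled_bytes(long_))  # 33 23
```

For a four-letter name the pooled layout is larger than the inline one; it only starts to pay off for longer names that are repeated often.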

--
Wes Groleau

"What progress we are making! In the Middle Ages, they would have
burnt me; nowadays they are content with burning my books."
-- Sigmund Freud, 1933
"He was never to know that even that was only an illusory progress,
that ten years later they would have burned his body as well."
-- Ernest Jones, 1953

Wes Groleau

Sep 24, 2009, 11:25:20 PM

Ian Goddard wrote:
> I spent many years working in an environment where there was no such
> thing as being too picky about differentiating between evidence and its
> interpretation. The records we inherit from the past, including the
> names in them, are evidence. The "people" in our genealogies are our
> interpretations of that evidence. I'm differentiating between them.

For me, the people and their stories are the goal.
The evidence is a tool, not the goal.

--
Wes Groleau

German Teachers
http://Ideas.Lang-Learn.us/WWW?itemid=81

bi...@harrisongenealogy.co.uk

Sep 25, 2009, 4:04:31 AM

to grolea...@freeshell.org, gen...@rootsweb.com

Wes is right !

REMEMBER that ALL info, whether it be statutory BMD info or Parish Registers
or whatever ....... it's only hearsay ... it's been given VERBALLY by
individuals and may or may not be correct! So when you get as much info
as possible together you need to evaluate it all and make a value
judgement.

regards

Bill


Peter J Seymour

Sep 25, 2009, 4:18:32 AM

What is missing in your argument is context for the two instances of
"Peter". If that doesn't bother you then fine. Yes, the user doesn't
need to know the detail technicalities of the program, but what the user
does want (should want/ need?) is retention of any supporting evidence
or interpretation that they have chosen to enter and the ability to hang
this on the right pieces of data. For instance there may be a "Seymour"
that was originally a "de St Maur" and another that was originally a
"Seymer". This would not be evident in a recent generation but you may
want to nevertheless note the situation. My point is that depending on
the sophistication of the tools available, it may be alright to record
all instances of "Seymour" as one text literal (and indeed at a detail
level, some systems ultimately do that), on the other hand analysis may
benefit from, at the application level, maintaining the instances as
separate entities. You pays your money and takes your choice as they say.
There are levels of complexity in these matters and we may not be
arguing from the same premises. Perhaps you are regarding a name as just
a text string, while I am regarding it as an object with attributes (one
attribute would be the text value). I have given thought to whether
Gendatam Suite should coalesce name values in a literal pool. I came to
the conclusion that it would in general cost storage space rather than
save it. Also, any transient operations such as counting instances of
each name value are easily performed as the program just has to work
down a record index.
I'm not sure where this gets us, I'm just explaining how things work for me.
Peter
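
As a rough illustration of the point about names as objects, here is a Python sketch in which each name instance keeps its own attributes and counting the text values is a single pass down the record index. The class, its fields and the sample records are assumptions made for this example; this is not how Gendatam Suite is actually implemented.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Name:
    """A name instance: the text value is just one attribute among several."""
    value: str
    original_form: str = ""  # e.g. an earlier spelling the user wants to note
    note: str = ""


# Each record owns its own Name object, so the two Seymours stay distinct.
record_index = [
    Name("Seymour", original_form="de St Maur"),
    Name("Seymour", original_form="Seymer"),
    Name("Goddard"),
]


def count_name_values(index):
    """Transient count of each text value, made by working down the index."""
    return Counter(name.value for name in index)


counts = count_name_values(record_index)
print(counts["Seymour"], counts["Goddard"])  # 2 1
```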

Steve Hayes

Sep 25, 2009, 2:32:04 PM

On Fri, 25 Sep 2009 09:18:32 +0100, Peter J Seymour <mo...@pjsey.demon.co.uk>
wrote:

I think you are talking about something completely different.

I was talking about the meaning of the subject line -- how many "names" will a
program store, as opposed to how many records of persons it will store.

Rickey

Sep 25, 2009, 4:41:49 PM

"Kathryn M Rogers" <kmro...@bigpond.net.au> wrote in message
news:QNUtm.42101$ze1....@news-server.bigpond.net.au...

> Hello Colleagues,
>
> I have FTM 2005 and I am wondering how many names will it take before it
> has a meltdown? I have about 75,000 now and don't want an unexpected
> crash. Of course I do a backup regularly but is there an optimum number of
> names at which you should divide your data base?
>
> Regards,
> Kathryn Rogers
>
>

Approx 250,000

http://tinyurl.com/yc7d37l

The above link will take you to the Family Tree Maker FAQ site, which shows
the Program Limitations.

Hope this helps

Rickey

Peter J Seymour

Sep 26, 2009, 4:03:39 AM

Well done. Now why didn't we get there in the first place? The figure
refers to individuals, not names, but should be a good enough guide.
Peter

Gordon Burditt

Oct 19, 2009, 4:10:06 AM

>I'm not sure what you are getting at there.
>
>The historical context has nothing to do with it. Peter is Peter. When you
>type it on your keyboard, the keyboard does not need to know the historical
>context. In the same way, the disk location where it is stored doesn't need to
>know anything about the historical context either. That's something you need
>to know.

There's one important qualifier here: if this compression is being
done "behind your back", it's fine as long as when you create a
record that says "First Name: Peter", and you later retrieve the
record, it STILL says "First Name: Peter". You don't really care
that much how it got that info, although if it takes a lot of time
to come up with "Peter", maybe this wasn't such a good optimization.

But you need to worry about another issue: if you've got 6 Peters
in your database, and you find, after discovering more records, that
one of them really should be named "Pater" and make the
change, then the other 5 Peters had better still be "Peter", not
"Pater". It would also be nice if, when I changed the *last* "Peter" to
something else, the name "Peter" got taken out of the list
entirely. Otherwise, the list will get filled up with every spelling
error I ever make, then correct. And it's important not to delete
names still in use.

Now, I don't think any existing programs forgot about this issue,
because it would become visible fairly quickly, and as you try to
fix the problem, it would become repeatedly obvious. Glitches in
corner situations (perhaps with merges) are not impossible. If
you're putting this stuff in a relational database yourself, you
need to worry about such things.
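
Here is a minimal Python sketch of the bookkeeping those requirements imply: each distinct name is stored once, records hold an integer pointer, and a reference count decides when a name can leave the list. The class and function names are made up for illustration and do not describe any particular genealogy program.

```python
class NamePool:
    """Stores each distinct name once; records hold integer ids into the pool."""

    def __init__(self):
        self._id_by_name = {}
        self._name_by_id = {}
        self._refs = {}
        self._next_id = 1

    def acquire(self, name):
        """Return the id for name, adding it to the pool on first use."""
        name_id = self._id_by_name.get(name)
        if name_id is None:
            name_id = self._next_id
            self._next_id += 1
            self._id_by_name[name] = name_id
            self._name_by_id[name_id] = name
            self._refs[name_id] = 0
        self._refs[name_id] += 1
        return name_id

    def release(self, name_id):
        """Drop one reference; remove the name only when nothing uses it."""
        self._refs[name_id] -= 1
        if self._refs[name_id] == 0:
            name = self._name_by_id.pop(name_id)
            del self._id_by_name[name]
            del self._refs[name_id]

    def lookup(self, name_id):
        return self._name_by_id[name_id]

    def names(self):
        return sorted(self._id_by_name)


def rename(pool, records, rin, new_name):
    """Change one record by moving its pointer; other records are untouched."""
    old_id = records[rin]
    records[rin] = pool.acquire(new_name)
    pool.release(old_id)


pool = NamePool()
records = {rin: pool.acquire("Peter") for rin in range(1, 7)}  # six Peters

rename(pool, records, 3, "Pater")  # only record 3 changes
after_one_fix = pool.names()       # both names are now in use

for rin in (1, 2, 4, 5, 6):        # change the remaining Peters
    rename(pool, records, rin, "Petrus")
after_all_fixes = pool.names()     # "Peter" has left the list
```

Acquiring the new name before releasing the old one keeps the count correct even when a record is "renamed" to the name it already has.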

You do have to be careful about using crappy media. Hopefully,
floppies are dead. One byte error in the wrong place can transform
*every* "Smith" into "SmZth".

Steve Hayes

Oct 20, 2009, 2:03:01 AM

On Mon, 19 Oct 2009 03:10:06 -0500, gordon...@burditt.org (Gordon Burditt)
wrote:

>>I'm not sure what you are getting at there.
>>
>>The historical context has nothing to do with it. Peter is Peter. When you
>>type it on your keyboard, the keyboard does not need to know the historical
>>context. In the same way, the disk location where it is stored doesn't need to
>>know anything about the historical context either. That's something you need
>>to know.
>
>There's one important qualifier here: if this compression is being
>done "behind your back", it's fine as long as when you create a
>record that says "First Name: Peter", and you later retrieve the
>record, it STILL says "First Name: Peter". You don't really care
>that much how it got that info, although if it takes a lot of time
>to come up with "Peter", maybe this wasn't such a good optimization.

Well that's how it works in PAF 2.x -- I don't know about other versions.

>But you need to worry about another issue: if you've got 6 Peters
>in your database, and you find, after discovering more records, that
>one of them really should be named "Pater" and make the
>change, then the other 5 Peters had better still be "Peter", not
>"Pater". It would also be nice if, when I changed the *last* "Peter" to
>something else, the name "Peter" got taken out of the list
>entirely. Otherwise, the list will get filled up with every spelling
>error I ever make, then correct. And it's important not to delete
>names still in use.

Again, that's how it works in PAF 2.x.

We have several Peters. Then when I entered a Pater (for Agnes Pater), it
beeped and asked if I wanted to enter that new name. I responded Y, and so it
stored Pater in the database too. If I discover that one of the Peters should
have been Pater, I retype it, and it simply shifts the pointer from Peter to
Pater. If what I type is not already in the names database, it beeps and asks
if I want to store it: for a genuinely new name I press Y and it is stored,
and for a typo I press N and then correct it.

I think there's a cleanup routine to remove names from the names database if
there are no actual names corresponding to them.

For what it's worth, here is a list of datafiles in PAF 2.x:

Volume in drive E is PROGRAMS
Volume Serial Number is 07DD-2A18

Directory of E:\PAF

2004-12-20  08:06 PM    785,036  INDIV2.DAT
2004-05-22  10:27 AM     81,956  MARR2.DAT
2004-12-05  07:16 AM    113,253  NAME2.DAT
2002-04-19  08:04 AM        448  NAMADD2.DAT
2004-12-20  08:06 PM    921,088  NOTES2.DAT
2002-04-18  11:52 AM        184  REPTITL2.DAT
2001-07-20  06:54 PM        248  GIEDEF.DAT
2002-01-02  01:32 PM      7,263  REPORT.DAT
1995-05-30  04:41 PM     28,186  BIO.DAT
2004-01-14  04:50 AM    102,720  ALPHA.DAT
1990-11-08  09:20 AM      1,536  MISC.DAT
1994-12-14  01:31 PM    145,636  FIRST.DAT
2000-03-11  07:57 AM     10,929  BIOTRANS.DAT
1997-09-20  09:40 AM          0  GWAY4.DAT
              14 File(s)      2,198,483 bytes
               0 Dir(s)   3,406,536,704 bytes free

NAME2.DAT is the file that stores the names, INDIV2.DAT is the record of
individuals in the database -- record number (RIN), names, and dates of birth
and death. The name fields do not store the actual names, but pointers to
NAME2.DAT.

As you can see from the dates, I've stopped using it, mainly because it's not
Y2K compatible, and beeps every time I enter a date after 31 Dec 1999. And
also because most modern printers are too crippled to print directly from DOS
programs, and so one has to resort to clumsy workarounds like printing to a
file, importing the file into a Windows program, and printing from that.

But until then it worked quite well, and I never had any problems with it
finding the wrong names.

>Now, I don't think any existing programs forgot about this issue,
>because it would become visible fairly quickly, and as you try to
>fix the problem, it would become repeatedly obvious. Glitches in
>corner situations (perhaps with merges) are not impossible. If
>you're putting this stuff in a relational database yourself, you
>need to worry about such things.

No doubt. That's why all the programming text books go on about "referential
integrity".
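
For anyone putting this in a relational database themselves, here is a small sketch using Python's built-in sqlite3 module. The table and column names are invented for the example (they only loosely echo the NAME2/INDIV2 split above), and SQLite enforces foreign keys only after the PRAGMA shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.executescript("""
    CREATE TABLE name (
        id    INTEGER PRIMARY KEY,
        value TEXT UNIQUE NOT NULL
    );
    CREATE TABLE indiv (
        rin      INTEGER PRIMARY KEY,
        given_id INTEGER NOT NULL REFERENCES name(id)
    );
    INSERT INTO name (id, value) VALUES (1, 'Peter'), (2, 'Pater');
    INSERT INTO indiv (rin, given_id) VALUES (1, 1), (2, 1), (3, 2);
""")


def try_delete_name(conn, name_id):
    """Delete a name; the database refuses while any record still uses it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("DELETE FROM name WHERE id = ?", (name_id,))
        return True
    except sqlite3.IntegrityError:
        return False


deleted_in_use = try_delete_name(conn, 1)  # two records still point at 'Peter'

# Record 3 turns out to be a Peter after all: move its pointer.
conn.execute("UPDATE indiv SET given_id = 1 WHERE rin = 3")
deleted_unused = try_delete_name(conn, 2)  # nothing points at 'Pater' now

remaining = [row[0] for row in conn.execute("SELECT value FROM name")]
```

With the constraint declared, "don't delete names still in use" is enforced by the database itself rather than by every piece of application code remembering to check.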

>You do have to be careful about using crappy media. Hopefully,
>floppies are dead. One byte error in the wrong place can transform
>*every* "Smith" into "SmZth".

And hardware.

One computer I had had a faulty disk controller. My family history database
frequently had records 935-940 overwritten with random text or data from
another place altogether - a word processing file I had saved or something. I
kept having to re-enter it. Eventually I got a new hard disk and controller
and the problem disappeared.