How Should We Store Evidence in Genealogical Databases?

Tom Wetmore

May 23, 2011, 7:52:05 AM

This thread is an offshoot from the Linux thread that is going off on a number of tangents.

How should we store evidence in genealogical databases?

You get a marriage record in the mail; you find an image of a census record at Ancestry.com; you find the record of an event on a page in a book you found on Google Books. What are you going to do with those three records? Here are some possible answers.

First, if you are a careful genealogist, you're going to record the sources of those records in your database as source records. Got that out of the way.

Second, as far as the "physical records" are concerned, let's say you carefully file the paper marriage record away in your paper filing system, and you go to your big ancestry folder area on your computer and keep copies of those two images. Dandy.

Now, what are you going to do with the information in those three physical records? (Let's call the image files "physical" for the sake of argument.)

Here's the "normal" answer in my opinion. You look at the physical records, you decide who the persons mentioned in those records were, you go to your genealogy program and you find the appropriate person records, creating them if need be, and you edit in the new information. In other words you extract information from the physical records and you add that information directly to person records. Note that the information from the physical records only enters into your database as items inside person records.

Here's another possibility advocated by some genealogists. After you create the source records for where the physical records came from, you edit those source records, adding to them the information that you got from those sources that you believe is important. You probably have to do this as "unstructured notes." Then you link persons to those sources and you also "copy up" from the stuff you added to the source records into the person records.

Here's another possibility, advocated by programs like Gramps or Family Tree Maker. You first create event records from information in the physical records, say birth, death, or marriage events, and then you add a link from some person in your database, creating that person record if need be, to that event record. The events really don't stand alone; you have to link person records to them.
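To make the event-based arrangement concrete, here is a minimal Python sketch. The class and field names are hypothetical and are not the actual schema of Gramps or Family Tree Maker; the point is only that an event is its own record and a person reaches it through a link that carries a role. The sample marriage is taken from the appendix later in this thread.

```python
from dataclasses import dataclass, field

@dataclass
class Event:
    """An event extracted from a record, e.g. a marriage or a burial."""
    event_id: int
    kind: str      # "birth", "marriage", "death", ...
    date: str
    place: str

@dataclass
class Person:
    person_id: int
    name: str
    # (event_id, role) pairs; the event records themselves live elsewhere
    event_links: list = field(default_factory=list)

def link(person, event, role):
    """Attach an existing event to a person in a given role."""
    person.event_links.append((event.event_id, role))

def events_for(person, events):
    """Resolve a person's links into (Event, role) pairs."""
    return [(events[eid], role) for eid, role in person.event_links]

# One marriage event shared by two person records.
events = {1: Event(1, "marriage", "24 May 1841", "St John, Bedford")}
groom = Person(10, "Joseph BRIMLEY")
bride = Person(11, "Elizabeth WELLS")
link(groom, events[1], "groom")
link(bride, events[1], "bride")
```

Nothing in the Event record itself says who took part; until somebody is linked to it, the event is simply sitting there, which is the limitation described above.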

All these techniques work fine while you are in the realm of "person-based genealogy" or "conclusion-based genealogy". In this realm you either already know who the people you are researching are, or you have such a solid trail of vital and other records back to them that you can be sure whom you are researching. You know whether any particular record belongs to a person you are researching or not; you ignore the records that don't, and you simply copy information out of the records that do. In my opinion 98% of genealogical software is devoted to people working in this mode.

Eventually every genealogist reaches the point when he or she has delved far enough back in time that the solid, firm trail of records has dried up. When we reach this point our task changes from one of simply elaborating persons we know or can learn about easily, to one of true historical research. We embark on the chore of trying to find whatever sources we can, wherever the creative recesses of our minds or our experience take us. From the sources we manage to find, we have to keep any information that mentions people who might eventually be of interest to us, and we must record that information somehow so we can keep referring to it. We have faith that at some time in the future we will have found enough records to figure out who all those people are and how they are related. At that time, maybe far in the future, maybe after many serendipities in our record searching, we'll finally be able to create new persons in our database and add the hard-fought information to them.

When we reach this point we are in the realm of "record-based genealogy." This has been described as "crossing a chasm." We are now true historians. We must collect lots of records, but we don't know yet whom they belong to.

What are you going to do with this evidence? If you use some of the approaches above you're kind of stuck. You can add paper copies to your files, or image files to your computer, but what else are you going to do? There are no person records around to stick them to. You can bloat source records with notes, but how will you find any of that unstructured info in the future?

To do your research effectively, to be able to reason about the data you've collected, you have to have some way of finding the information and arranging it. Are you going to do this by spreading sheets of paper on your desk, keeping lots of windows to image files open on your computer, taking lots of notes on 3x5 cards, sketching out possible family groups with paper and pencil?

Wouldn't you want all that evidence information codified somehow inside your genealogy program so you can search for names, search for dates, search for places, see the relationships mentioned in the evidence, and so on? How would you want your genealogy application to support you after you have "crossed the chasm?"
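As one hedged illustration of what "codified" evidence could look like, the sketch below keeps each record's names, date, place, and stated relationships exactly as the record gives them, without pointing at any person record. All names in the code (EvidenceRecord, stated_relationships) are hypothetical; the two sample records are transcribed from the Brimley appendix later in the thread.

```python
from dataclasses import dataclass, field

@dataclass
class EvidenceRecord:
    """One record, transcribed as stated; no links to person records required."""
    record_id: int
    source: str                                   # citation for the record
    date: str = ""
    place: str = ""
    names: list = field(default_factory=list)     # names exactly as written
    # (name, relation, other name) exactly as the record states it
    relationships: list = field(default_factory=list)

def stated_relationships(records, name):
    """Every relationship any record states for someone written with this name."""
    return [(rec.record_id, relation, other)
            for rec in records
            for who, relation, other in rec.relationships
            if who == name]

marriage = EvidenceRecord(
    1, "Marriage register, St John, Bedford", "24 May 1841", "Bedford",
    names=["Joseph BRIMLEY", "Elizabeth WELLS"],
    relationships=[("Joseph BRIMLEY", "son of", "James BRIMLEY"),
                   ("Elizabeth WELLS", "daughter of", "Joseph WELLS")])
census = EvidenceRecord(
    2, "1851 Census HO107/1752 f.262 p.18", "1851", "Cardington",
    names=["Joseph BRIMLEY", "Elizabeth BRIMLEY", "Sarah BRIMLEY",
           "Mary Ann BRIMLEY"],
    relationships=[("Mary Ann BRIMLEY", "sister of", "Joseph BRIMLEY"),
                   ("Sarah BRIMLEY", "daughter of", "Joseph BRIMLEY")])
```

Because the relationships are stored as the records state them, a query can show everything the evidence says about a name before anyone has decided which real person it belongs to.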

I have my answers to these questions, but I'll stop at this point. I hope you'll think about this. What would you want your dream software system to do to support you? Say you collected 100 records that mention people with interesting names, that is, names that might be those of persons you are interested in. Because you are so far back in time, in the fog of uncertainty, you don't yet know who the real persons mentioned in those 100 records are, but you have reason to believe that some of them will be the persons you are looking for, and that some of those people are probably mentioned many times in those records. How do you want to record those 100 records so you can work with them to decide who the real persons were? Do you want them somehow in your genealogy program? Or do you want that information kept out of your genealogical database until you feel you know who they were? What features would you want your genealogical software to have, or what requirements should it meet, to handle this evidence and make it easier to work with?

Tom

J. Hugh Sullivan

May 23, 2011, 8:56:09 AM

On Mon, 23 May 2011 04:52:05 -0700 (PDT), Tom Wetmore
<tt...@verizon.net> wrote:

>This thread is an offshoot from the Linux thread that is going off on a
>number of tangents.
>
>How should we store evidence in genealogical databases?
>

>You get a marriage record in the mail; you find an image of a census record
>at Ancestry.com; you find the record of an event on a page in a book you
>found on Google Books. What are you going to do with those three records? Here
>are some possible answers.

It's easy for me - I establish parameters. If it is online or in a
book I don't want a hard copy unless it is my direct line. Most people
who have all that paper can't find anything anyhow. And finding it
serves no more purpose than listing the source where others can see
it. After I have proved it to myself I have absolutely no need to
prove it to others. If they are not satisfied with what I have they
can do their own research.

Hugh

singhals

May 23, 2011, 10:12:46 AM

to gen...@rootsweb.com

Moreover, the relatives who have even a minor interest in
any of this have about 1/4 as much interest in where I found
something or what it really says. If I share sources with
them, they aren't in footnotes; the narrative text gets
generated with footnotes because that's how software does
it, but I go back and edit the document pulling those
footnotes into the prose, as in, "I finally found this
marriage in the next county over (Tyler) in the
chronological record but just not in the county we thought."
Everyone is happy, particularly after I send the one who
does care a copy of the unedited version.

If someone doesn't wish to believe me, showing them papers
won't change their mind and not showing them papers won't
change the minds of those who do believe me.

Cheryl

steve

May 23, 2011, 12:18:57 PM

Can't you simply catalog every piece of information as number such and
such? The copy of the marriage certificate from Aunt Mary is #123456
and the email from cousin Leroy is #123457 and the transcription of
Hidden Valley marriages is #123458.

An individual record can then refer to the various item numbers that
support its conclusions. I don't see that the item has to point back
to the people. Presumably the item would speak for itself.
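Steve's numbering scheme is simple enough to sketch directly. This is a hypothetical illustration, not an existing program: items get sequential catalog numbers, person records hold lists of those numbers, and the reverse question (which people cite this item?) is answered by a lookup rather than stored on the item. The person name is made up.

```python
from itertools import count

class Catalog:
    """Sequentially numbered evidence items."""

    def __init__(self, start=123456):
        self._numbers = count(start)
        self.items = {}

    def add(self, description):
        number = next(self._numbers)
        self.items[number] = description
        return number

catalog = Catalog()
aunt_mary = catalog.add("Copy of marriage certificate from Aunt Mary")
leroy = catalog.add("Email from cousin Leroy")
hidden_valley = catalog.add("Transcription of Hidden Valley marriages")

# A hypothetical person record citing the items that support its conclusions.
people = {"Mary JONES": [aunt_mary, hidden_valley]}

def people_citing(item, people):
    """Reverse lookup on demand; the item itself never points back."""
    return [name for name, items in people.items() if item in items]
```

This works well once there is a person record to hold the numbers; it says nothing about items that no person cites yet.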

What a genealogy program can or should do depends on what the user is
doing. A program that is great for presenting a family tree may not
be suitable for doing a one name study. How do you record that the
John SMITH who enlisted in the 1st Regiment of Alabama Infantry just
might be the same John SMITH who married Mary JONES over in the
neighboring county, but you're not sure?
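One hedged way to record "might be the same John SMITH, but I'm not sure" is to store the hypothesis itself as data, linking two personas (names as they appear in particular records) without merging them. The class names and the confidence labels below are made up for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Persona:
    """A name as it appears in one particular record."""
    name: str
    record: str

@dataclass
class PossibleIdentity:
    """A tentative claim that two personas are the same individual."""
    a: Persona
    b: Persona
    confidence: str   # hypothetical scale: "possible", "probable", ...
    reason: str

enlisted = Persona("John SMITH", "Enlistment, 1st Regiment of Alabama Infantry")
married = Persona("John SMITH", "Marriage to Mary JONES, neighboring county")
hypotheses = [PossibleIdentity(enlisted, married, "possible",
                               "same name, neighboring counties")]

def candidates_for(persona, hypotheses):
    """Other personas that might be the same individual, with confidence."""
    found = []
    for h in hypotheses:
        if h.a == persona:
            found.append((h.b, h.confidence))
        elif h.b == persona:
            found.append((h.a, h.confidence))
    return found
```

Neither persona is changed, so dropping the hypothesis later costs nothing.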

Right now I'm sorta trying to do a locality study. I'm completely
clueless as to how I should organize things. I wind up just making
lots of transcriptions and notes and saving them as text files.
Surely there is a better way.

Steve

--

Steve Hayes

May 23, 2011, 12:23:07 PM

On Mon, 23 May 2011 04:52:05 -0700 (PDT), Tom Wetmore <tt...@verizon.net>
wrote:

>This thread is an offshoot from the Linux thread that is going off on a number of tangents.

>
>How should we store evidence in genealogical databases?

Your message was difficult to read as there is something wrong with your
wordwrap.

--
Steve Hayes from Tshwane, South Africa
Web: http://hayesfam.bravehost.com/stevesig.htm
Blog: http://methodius.blogspot.com
E-mail - see web page, or parse: shayes at dunelm full stop org full stop uk

Tom Wetmore

May 23, 2011, 12:21:37 PM

to gen...@rootsweb.com

Cheryl,

I'm not asking what you should store so you can convince others of the accuracy of your work.

I'm asking what do you want to do with the evidence you have gathered about people you MIGHT be interested in, but BEFORE you have figured out who was who. Since the evidence might apply to persons you are interested in, I assume you wouldn't throw it out. Since you don't know who it refers to yet, you can't add it to any person record already in your database. In what form would you want that evidence, and what would you like to be able to do with it?

Tom

Tom Wetmore

May 23, 2011, 1:16:44 PM

to haye...@yahoo.com

>
> Your message was difficult to read as there is something wrong with your
> wordwrap.

If I could fix that I would. I'm using the Google interface with Safari on a Mac. Suggestions?

Tom

Tom Wetmore

May 23, 2011, 2:30:28 PM

On Monday, May 23, 2011 12:18:57 PM UTC-4, steve wrote:

> On May 23, 6:52 am, Tom Wetmore <t....@verizon.net> wrote:

> > How should we store evidence in genealogical databases?

> > Tom

> Can't you simply catalog every piece of information as number such and
> such? The copy of the marriage certificate from Aunt Mary is #123456
> and the email from cousin Leroy is #123457 and the transcription of
> Hidden Valley marriages is #123458.
>
> An individual record can then refer to the various item numbers that
> support its conclusions. I don't see that the item has to point back
> to the people. Presumably the item would speak for itself.
>
> What a genealogy program can or should do depends on what the user is
> doing. A program that is great for presenting a family tree may not
> be suitable for doing a one name study. How do you record that the
> John SMITH who enlisted in the 1st Regiment of Alabama Infantry just
> might be the same John SMITH who married Mary JONES over in the
> neighboring county, but you're not sure?
>
> Right now I'm sorta trying to do a locality study. I'm completely
> clueless as to how I should organize things. I wind up just making
> lots of transcriptions and notes and saving them as text files.
> Surely there is a better way.
>
> Steve
>

Steve,

Your approach is great for person-based genealogy. You collect your
evidence, you catalog it, and you reference each item of evidence from the
person records in your database that the evidence refers to.

My question has to do with what happens when you have reached the point where
you need to do record-based genealogy, and you have collected a great deal of
evidence about people you are not sure of yet. You can catalog that
evidence as you suggest. But you don't have any person records to refer to
that evidence yet, so you can't add any of that data to your database yet.

So say you have 100 catalogued records. Some are on paper. Some are image
files. Maybe you transcribed some of them into word processing files.

How do you want to work with that evidence to decide who the real persons
were? Imagine -- you have collected 100 items of evidence. You want to compare
them in many ways to decide who was who. So you need to search through them
looking for a specific name, or a specific year, or a specific place. You
don't know who the people are yet, and you are trying to figure that out. If
all these records were on paper, or in word processing files or in image
files, how are you going to find the data you need quickly?

I am convinced that once you reach record-based genealogy, you need to get
your evidence records codified into records in a software program. I
would like that software program to be the same program I use for genealogy,
but today's programs don't seem to support this idea very well. If you had
those evidence records codified somehow, you would have an excellent solution
to the searching and thinking problem. Your software could instantly retrieve
all evidence records containing a certain name, or a certain year, or a
certain place.
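A minimal sketch of that kind of retrieval, assuming each evidence record has been transcribed to plain text: an inverted index maps every word and number to the records containing it, and a query returns the records that contain all of the terms. The function names are hypothetical, and the sample transcriptions are abbreviated from the Brimley appendix later in the thread.

```python
import re
from collections import defaultdict

def build_index(records):
    """Map each lowercased word or number to the ids of records containing it."""
    index = defaultdict(set)
    for record_id, text in records.items():
        for token in re.findall(r"[A-Za-z]+|\d+", text):
            index[token.lower()].add(record_id)
    return index

def search(index, *terms):
    """Ids of the records that contain every term (an AND query)."""
    hits = [index.get(term.lower(), set()) for term in terms]
    return set.intersection(*hits) if hits else set()

records = {
    1: "1841 Census, Cardington: Joseph BRIMLY 25, wheelwright",
    2: "24 May 1841, marriage at St John, Bedford: Joseph BRIMLEY to Elizabeth WELLS",
    3: "1851 Census, Cardington: Joseph BRIMLEY 33, wheelwright, born Cople",
}
index = build_index(records)
```

For example, `search(index, "brimley", "1841")` finds only the marriage, because the 1841 census spells the name BRIMLY. An exact-match index like this misses spelling variants, so a working tool would probably also want some form of name-variant matching.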

So again my question. Given you are doing record-based genealogy, and you have
lots of evidence records about people you are not sure of yet, how do you want
to record that evidence so you can effectively use it to make your decisions
about who was who?

Tom

Steve Hayes

May 23, 2011, 4:57:14 PM

On Mon, 23 May 2011 11:30:28 -0700 (PDT), Tom Wetmore <tt...@verizon.net>
wrote:

>How do you want to work with that evidence to decide who the real persons
>were? Imagine -- you have collected 100 items of evidence. You want to compare
>them in many ways to decide who was who. So you need to search through them
>looking for a specific name, or a specific year, or a specific place. You
>don't know who the people are yet, and you are trying to figure that out. If
>all these records were on paper, or in word processing files or in image
>files, how are you going to find the data you need quickly?
>
>I am convinced that once you reach record-based genealogy, you need to get
>your evidence records codified into records in a software program. I
>would like that software program to be the same program I use for genealogy,
>but today's programs don't seem to support this idea very well. If you had
>those evidence records codified somehow, you would have an excellent solution
>to the searching and thinking problem. Your software could instantly retrieve
>all evidence records containing a certain name, or a certain year, or a
>certain place.

askSam can do most of that.

singhals

May 23, 2011, 6:08:26 PM

to gen...@rootsweb.com

I want a piece of paper, filed where it seemed to me to be a
good idea to file it.

Paper because I find it easier to shuffle paper than
electrons -- for one thing, I can spread two dozen pieces of
paper on the table-top and STILL be able to read them,
something I find I cannot do with even 4 windows open on the
monitor.

Cheryl

singhals

May 23, 2011, 6:18:27 PM

to gen...@rootsweb.com

Tom Wetmore wrote:

[Snip]

> I am convinced that once you reach record-based genealogy, you need to get
> your evidence records codified into records in a software program. I

You might check at your local university/college for a
professor of history who is doing active research and
publishing what he uses.

Tom Wetmore

May 23, 2011, 6:20:30 PM

to haye...@yahoo.com

> askSam can do most of that.
>

As a Mac guy I didn't know much about askSam. Looking at it now, it seems a
good solution. Personally, I want my evidence in my genealogy program with a
seamless connection between evidence and conclusion data, but if I were forced
to use two programs, one for evidence and one for my persons, this looks like
a good solution.

I see there are some possible Mac programs to help out, one named CircusPonies.

Tom

Tom Wetmore

May 23, 2011, 6:22:54 PM

to gen...@rootsweb.com

> I want a piece of paper, filed where it seemed to me to be a
> good idea to file it.
>
> Paper because I find it easier to shuffle paper than
> electrons -- for one thing, I can spread two dozen pieces of
> paper on the table-top and STILL be able to read them,
> something I find I cannot do with even 4 windows open on the
> monitor.
>
> Cheryl

Cheryl,

Far be it from me to question you on this, as that is how historians have done their
jobs for centuries!! But I think in this day and age of computers there are better
ways.

Glad I'm getting some answers to my questions. Yours is to use paper, Steve's is
to use askSam.

Hoping I'll get a few more answers.

Tom

Ian Goddard

May 23, 2011, 6:51:44 PM

First of all, I happen to live in the same area where most of my
ancestors lived. I have surnames in my family tree in, say the C18th,
who were recorded hereabouts in the 1379 subsidy roll. There are
transcripts of PRs either in print, formerly in print but now on
archive.org or on data CDs which cover most of what I need. This means
that filing and retrieval of most physical records isn't a problem.
Also I've been able to grab some of the machine readable stuff into an
RDBMS which now contains most of the C18th & parts of adjacent centuries
of baptisms for 3 parishes (the exact coverage varies), plus about a century
of marriages for another plus I have almost all for late C16th & half
the C17th in a holding table. Naturally transcriptions are only as good
as the transcriber so it means occasional trips to the library a couple
of miles away to check things on microfiche - assuming they're legible
;) In this respect I'm probably a good deal more fortunate than most
although my wife's family are a different matter altogether as she's
from Ireland and we have trouble in some cases getting beyond the late
C19th.

However, one of the penalties of having one's ancestors in the same
place for generations is that lots of collaterals are also around and
often using the same names. That means that problems start to arise
long before records dry up. It means that I have to try to sort out
which child of George Boothroyd belongs to which of the several
contemporary George Boothroyds etc. My solution to that is to use a
spreadsheet (recounted here in previous posts) to shuffle records into
coherent family groups based on clues of location and occupation of
fathers and likely assumptions about minimum intervals between baptisms
of children to the same family. The sort of rule-based approach which
Richard mentioned in the previous thread could be really useful here.
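Ian's rules could be expressed in code roughly as follows. This is a hypothetical sketch, not his spreadsheet: the 15-month minimum gap is an invented threshold, months are counted from an arbitrary starting point to keep the arithmetic simple, the children's names and parishes are made up, and greedy first-fit is just one possible strategy.

```python
from dataclasses import dataclass, field

MIN_GAP_MONTHS = 15   # hypothetical minimum interval between siblings' baptisms

@dataclass
class Baptism:
    child: str
    father: str
    place: str
    occupation: str
    month: int         # months since an arbitrary starting point

@dataclass
class Family:
    father: str
    place: str
    occupation: str
    children: list = field(default_factory=list)

    def accepts(self, b):
        """Same father, place and occupation, and far enough after the last child."""
        last = self.children[-1]
        return (b.father == self.father
                and b.place == self.place
                and b.occupation == self.occupation
                and b.month - last.month >= MIN_GAP_MONTHS)

def group(baptisms):
    """Greedy first-fit: each baptism joins the first compatible family, else starts one."""
    families = []
    for b in sorted(baptisms, key=lambda b: b.month):
        home = next((f for f in families if f.accepts(b)), None)
        if home is None:
            home = Family(b.father, b.place, b.occupation)
            families.append(home)
        home.children.append(b)
    return families

baptisms = [
    Baptism("Ann", "George BOOTHROYD", "Parish A", "clothier", 0),
    Baptism("John", "George BOOTHROYD", "Parish A", "clothier", 6),
    Baptism("Hannah", "George BOOTHROYD", "Parish B", "farmer", 10),
    Baptism("Mary", "George BOOTHROYD", "Parish A", "clothier", 24),
]
families = group(baptisms)
```

Here John, baptised only six months after Ann, is pushed into a second George BOOTHROYD family. Greedy first-fit can also misplace a child when two families both pass the rules, which is where the researcher's judgment still has to come in.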

The spreadsheet approach is made easier where the baptisms come from the
period covered by my database. When it gets earlier than that I have
sometimes resorted to grabbing IGI PR transcripts as GEDCOMs & loading
them into Gramps to sort which is why it hurt when the GEDCOM download
broke at the turn of the year.

I also use Gramps for recording finished work. I also keep a Gramps
database containing location information and merge exports of that with
the various family databases to keep the locations consistent.

--
Ian

The Hotmail address is my spam-bin. Real mail address is iang
at austonley org uk

Steven Gibbs

May 23, 2011, 7:47:11 PM

When I started a one-place study about ten years ago, I decided that
standard software didn't do the job I wanted. So I wrote my own. Over the
years I've developed the ideas into something fairly coherent, but, since
I'd rather be getting on with data input than coding, the execution leaves a
lot to be desired, and I'd do a lot different if I was starting from
scratch.

Basically, I really only have two types of object, persons and documents. I
input a document and create a new persona for each name in the document.
Then I check each persona to see if I am comfortable about merging it into
an existing person. The only real linkage is that each person has a father
and a mother. I treat names as attributes of a person, and estimate other
attributes algorithmically, such as estimated date of birth. By clicking on
a name I can bring up a complete list of documents referring to that person,
and I can open multiple persons at any one time, allowing easy comparisons.
An example is given as an appendix.

I can search easily on multiple parameters - it's no problem to get a list
of everybody with a spouse called Jane, a mother called Mary, and a
connection to Cardington, for example.
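A search like that is easy to express once persons carry their relationships and places as data. The sketch below is hypothetical (invented names and sample people, not Steven's program), and it matches on first names only, which a real program would probably want to refine.

```python
from dataclasses import dataclass, field

@dataclass
class Person:
    name: str
    mother: str = ""
    spouses: list = field(default_factory=list)
    places: set = field(default_factory=set)   # places mentioned by linked documents

def first_name(full):
    return full.split()[0] if full else ""

def find(people, spouse=None, mother=None, place=None):
    """Names of persons matching every criterion given; None means 'any'."""
    hits = []
    for p in people:
        if spouse is not None and spouse not in {first_name(s) for s in p.spouses}:
            continue
        if mother is not None and first_name(p.mother) != mother:
            continue
        if place is not None and place not in p.places:
            continue
        hits.append(p.name)
    return hits

people = [
    Person("William SMITH", "Mary SMITH", ["Jane COX"], {"Cardington", "Cople"}),
    Person("Thomas SMITH", "Mary SMITH", ["Ann WEST"], {"Cardington"}),
    Person("John WEST", "Mary WEST", ["Jane BROWN"], {"Bedford"}),
]
```

Each criterion narrows the list independently, so `find(people, spouse="Jane", mother="Mary", place="Cardington")` returns only William SMITH.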

At the moment I have only coded up specific types of documents with fixed
formats such as census records, parish registers and civil registration
indexes. (Note that this is English research.) I haven't quite worked out
a design for freeform text such as wills, but I would expect it to be
something along the lines of delineating names, places and occupations with
tags.

I didn't cater easily for unmerging persons. If I need to unmerge a person,
I have to create a new person for every original persona in the documents
for a person, unlink any parents and children and then remerge each of the
personas again. If I was starting again, I'd keep a record of each merge so
I could recreate the last two persons that were merged, as experience shows
that it's usually the most recent merge that was in error.
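Keeping a record of each merge, as Steven suggests, can be sketched as a stack: every merge is logged with the personas it moved, so the most recent merge can be reversed without rebuilding anything. The class and method names are hypothetical, and parent and child links are left out for brevity.

```python
class PersonStore:
    """Personas merged into persons, with a log so recent merges can be undone."""

    def __init__(self):
        self.persons = {}      # person id -> set of persona labels
        self.merge_log = []    # stack of (kept id, absorbed id, personas moved)
        self._next_id = 0

    def add_persona(self, persona):
        """Each new persona starts life as its own one-persona person."""
        pid = self._next_id
        self._next_id += 1
        self.persons[pid] = {persona}
        return pid

    def merge(self, keep, absorb):
        """Fold person `absorb` into person `keep`, logging what moved."""
        moved = self.persons.pop(absorb)
        self.persons[keep] |= moved
        self.merge_log.append((keep, absorb, moved))

    def undo_last_merge(self):
        """Reverse the most recent merge (merges are undone newest first)."""
        keep, absorb, moved = self.merge_log.pop()
        self.persons[keep] -= moved
        self.persons[absorb] = moved

store = PersonStore()
a = store.add_persona("1851 census: Joseph BRIMLEY")
b = store.add_persona("1861 census: Joseph BRIMLEY")
c = store.add_persona("1886 burial: Joseph BRIMLEY")
store.merge(a, b)
store.merge(a, c)
store.undo_last_merge()   # the burial persona is its own person again
```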

Steven

Appendix: Example of screen output.

From this I can easily see that I haven't found a baptism for Joseph (but it
looks as though it should be at Cople), and that I haven't seen the marriage
record - personally, I'm not bothered about the lack of sourcing for this,
since it's a private project, but if I was to give this data to anyone I'd
ensure that the IGI batch number I used was included. I can also easily see
that there is a problem with the age of his daughter Sarah. I note that
he's living with his sister Mary Ann, so that if I can confirm parentage for
Joseph, I would hope to be able to confirm parentage for Mary Ann, or vice
versa (while being aware that they might only be half-brother and -sister).
Joseph's estimated birthdate of 1817 shows in the index, but not in the
output.

Joseph BRIMLEY

_________________________________________________

Father: James BRIMLEY
_________________________________________________

1841 Census: Cardington, Bedfordshire
Piece: HO107/9 Folio: 13 Page: 20
Dwelling:
Joseph BRIMLY 25 M Bedfordshire
Occ: Wheelwright
Elizabeth BRIMLY 25 F Bedfordshire
_________________________________________________

Joseph BRIMLEY
Marriage Registered Q2 1841 at Bedford Registration District

_________________________________________________

24 May 1841: Marriage at St John, Bedford, Bedfordshire.

Joseph BRIMLEY to Elizabeth WELLS

Joseph BRIMLEY
Age:
Condition:
Occupation:
Son of James BRIMLEY,

Elizabeth WELLS
Age:
Condition:
Occupation:
Daughter of Joseph WELLS,

_________________________________________________

1851 Census: Cardington, Bedfordshire
Piece: HO107/1752 Folio: 262 Page: 18
Dwelling:
Joseph BRIMLEY M 33 M Cople, Bedfordshire
Rel: Head
Occ: Wheelwright
Elizabeth BRIMLEY M 33 F Harrold, Bedfordshire
Rel: Wife
Sarah BRIMLEY U 9 F Cardington, Bedfordshire
Rel: Daughter
Mary Ann BRIMLEY U 37 F Cople, Bedfordshire
Rel: Sister
_________________________________________________

1861 Census: Cardington, Bedfordshire
Piece: RG9/992 Folio: 53 Page: 22
Dwelling: (partial entry - boarder)
Joseph BRIMLEY M 43 M Cople, Bedfordshire
Rel: Head
Occ: Wheelwright
Elizabeth BRIMLEY M 43 F Harrold, Bedfordshire
Rel: Wife
Occ: Wheelwright's wife
Sarah BRIMLEY U 12 F Cardington, Bedfordshire
Rel: Daughter
Occ: Scholar
Mary Ann BRIMLEY U 47 F Cople, Bedfordshire
Rel: Sister
_________________________________________________

1871 Census: Cardington, Bedfordshire
Piece: RG10/1545 Folio: 33 Page: 10
Dwelling: Bedford Road
Joseph BRIMLEY M 53 M Cople, Bedfordshire
Rel: Head
Occ: Wheelwright
Elizabeth BRIMLEY M 53 F Harrold, Bedfordshire
Rel: Wife
Mary COWLAND U 22 F Wilstead, Bedfordshire
Rel: Boarder
_________________________________________________

1881 Census: Cardington, Bedfordshire
Piece: RG11/1624 Folio: 43 Page: 7
Dwelling: 1 Church Side
Joseph BRIMLEY M 63 M Cople, Bedfordshire
Rel: Head
Occ: Wheelwright
Susannah BRIMLEY M 61 F Willington, Bedfordshire
Rel: Wife
Mary COWLAND U 32 F Wilstead, Bedfordshire
Rel: Boarder
Occ: Companion Dom
_________________________________________________

30 September 1886: Burial at Cardington, Bedfordshire

Joseph BRIMLEY, aged 69.

_________________________________________________

Joseph BRIMLEY

Death Registered Q3 1886 at Bedford Registration District
Age: 69

_________________________________________________

Known children:-

Sarah (1841/1848)
_________________________________________________
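The estimated birth year can be reproduced with simple interval arithmetic. In this sketch (an assumption about how one might do it, not necessarily how Steven's program works), an age A recorded in year Y allows birth in Y-A-1 or Y-A, depending on whether the birthday had passed, and the candidates from all records are intersected. The 1841 entry is left out because adult ages in that census were supposed to be rounded down to a multiple of five.

```python
def birth_years(year, age):
    """Someone aged `age` at an event in `year` was born in one of two years."""
    return {year - age - 1, year - age}

def estimate(observations):
    """Intersect candidate birth years from (year, age) pairs.
    An empty result means the recorded ages cannot all be right."""
    candidates = None
    for year, age in observations:
        years = birth_years(year, age)
        candidates = years if candidates is None else candidates & years
    return candidates if candidates is not None else set()

# Ages taken from the appendix above (1841 census omitted).
joseph = [(1851, 33), (1861, 43), (1871, 53), (1881, 63), (1886, 69)]
sarah = [(1851, 9), (1861, 12)]
```

For Joseph the censuses allow 1817 or 1818 and the burial allows 1816 or 1817, leaving 1817. For Sarah the result is empty: ages 9 in 1851 and 12 in 1861 cannot both be right, which is the problem noted above.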

Tom Wetmore

May 23, 2011, 8:37:20 PM

Steven,

Very nice. I'm working on a similar solution. I handle the unmerge problem
by never really merging. Instead of merging I build up a tree of person
records. If I decide two personae refer to the same person I create a new
person record that simply refers to the two personae. I can add a
justification to that new record to explain the rationale for joining. If I
decide later that the personas refer to different people, I just delete the
higher level person.

I let the higher level person inherit facts from the lower level personae.
This works fine unless there is some conflict in the facts in the different
personae. If there are, I tell the person which persona to use for which
fact. I don't have different data structures for the persona and person, so
there is no reason to restrict these "person trees" to just two levels. I like
this approach because at the leaves of the trees we have all the
personae, and at all other levels we have persons that were created
explicitly because of some conclusion I came to as the result of
analyzing all the evidence, and I can document that analysis at that point.
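Tom's person trees can be sketched in a few lines. This is a hypothetical illustration of the idea, not his code: one node type serves as both persona and person, a person inherits facts from the nodes beneath it, and a conflict has to be settled by naming which part to trust. Un-merging is just discarding the higher-level node, because the personae underneath are never altered. The sample facts come from the Brimley appendix.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """A persona (leaf holding facts from one record) or a person built from parts."""
    label: str
    facts: dict = field(default_factory=dict)    # only leaves hold facts directly
    parts: list = field(default_factory=list)    # lower-level nodes joined here
    choose: dict = field(default_factory=dict)   # fact -> label of the part to trust
    justification: str = ""                      # why these parts were joined

    def resolved(self):
        """Facts inherited from the parts; a conflict needs an explicit choice."""
        if not self.parts:
            return dict(self.facts)
        offers = {}
        for part in self.parts:
            for key, value in part.resolved().items():
                offers.setdefault(key, []).append((part.label, value))
        result = {}
        for key, pairs in offers.items():
            values = {v for _, v in pairs}
            if len(values) == 1:
                result[key] = values.pop()
            elif key in self.choose:
                result[key] = dict(pairs)[self.choose[key]]
            else:
                raise ValueError(f"{self.label}: conflicting {key!r}: {sorted(values)}")
        return result

c1841 = Node("1841 census", {"name": "Joseph BRIMLY", "occupation": "Wheelwright"})
c1851 = Node("1851 census", {"name": "Joseph BRIMLEY", "occupation": "Wheelwright",
                             "birthplace": "Cople"})
joined = Node("Joseph BRIMLEY", parts=[c1841, c1851],
              choose={"name": "1851 census"},
              justification="Same wheelwright in Cardington")
burial = Node("1886 burial", {"name": "Joseph BRIMLEY", "age at burial": "69"})
top = Node("Joseph BRIMLEY (all records)", parts=[joined, burial])
```

Since the tree can be any depth, `top` sits three levels above its leaves and still resolves cleanly, while joining the two censuses without the `choose` entry raises an error about the conflicting names.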

The issues I am working through are really those of finding a user interface
that makes these ideas easy to work with.

In summary, there are now four answers to my question that have appeared on
this thread:

1. Use paper.

2. Use special associative or full text indexed databases (e.g., askSam).

3. Use spreadsheets.

4. Use roll-your-own software based on a persona concept.

This is great; thanks.

Tom

Tom Wetmore

May 23, 2011, 8:49:49 PM

Ian,

Thanks. You have a very practical approach. Your data is in lots of forms
so you manage each form the way that seems to make the most sense.

You summarize the important information in a spreadsheet so you can get
a high-level overview of your data, and have it in a form where you can
rearrange information quickly and easily to experiment with different
ways of joining evidence into persons and persons into families.

Of the four ways of handling evidence I summarized in my response to
Steven, I think we'd have to add a fifth now:

5. An ad hoc approach in which we keep different types of evidence,
taken from different types of sources, in different formats that seem
best for that type.

Note one very interesting thing about the five answers so far. No one
says they use their genealogical application to store their evidence.
Is this a failing of genealogical software in general, or is handling the
evidence and handling the persons such two fundamentally different
things that we need two completely different programs for handling them?

My answer is that genealogical programs should be able to handle evidence,
but no one has figured out how to do it yet.

Tom

Robert Riches

May 24, 2011, 12:53:34 AM

Hi Tom,

(When you mentioned LifeLines a few postings ago, I recognized
your name from many years ago. I never got LifeLines compiled
back then. That was probably my loss.)

I don't know much about the Google interface. Unless their
interface is terribly broken, it _ought_ to be possible to
manually break lines at some reasonable limit (72 or so is the
old standard).

Have you considered using slrn with a non-Google news server?
There are a bunch of free and fairly low-cost options. When
Verizon dropped newsgroups, I switched to news.individual.net and
have been quite happy with them for about US$16/yr.

HTH

--
Robert Riches
spamt...@jacob21819.net
(Yes, that is one of my email addresses.)

Wes Groleau

May 24, 2011, 1:16:56 AM

On 05-23-2011 20:49, Tom Wetmore wrote:
> Note one very interesting thing about the five answers so far. No one
> says they use their genealogical application to store their evidence.

OK, I'll say it. Not a marvelous wonderful stupendous tool for it, but
better than anything else I've tried.

--
Wes Groleau

There are two types of people in the world …
http://Ideas.Lang-Learn.us/barrett?itemid=1157

Steve Hayes

May 24, 2011, 3:13:55 AM

And that's back to the Research Data Filer (RDF) that came with early versions
of PAF. You keep the paper evidence, and the computer indexes them --
something computers are particularly good at.

I've said it before and I'll say it again - I'd really like to see an updated
version of RDF, preferably one that can import the data from the old version.
Perhaps with a bit more pizazz, but nothing too complicated.

Steve Hayes

May 24, 2011, 3:20:42 AM

On Mon, 23 May 2011 15:20:30 -0700 (PDT), Tom Wetmore <tt...@verizon.net>
wrote:

>

My daughter speaks highly of OneNote, which comes with MS Office, which I
believe has a Mac version. I haven't tried it myself, though; I use askSam
for making notes from documents in archives, interviews with family members
and the like.

I wouldn't like that to be part of my regular genealogy program, because I use
it for making notes of other things as well, and genealogy programs are too
bloated as it is - remember the days when programs used to take up less disk
space than the data?

Peter J. Seymour

May 24, 2011, 3:19:40 AM

Absolutely. For all my interest in computers, my real underlying
interest is in information. Computers are only a tool to help dealing
with this. I keep the evidence (and some printouts) in lever arch files
roughly organised on a surname basis. Yes, you can take several pieces
of paper and spread them out in front of you to study and compare them.
That is something you cannot do on a computer.

The issue of evidence for people you MIGHT be interested in is a good
point. However, my feeling is that throwing it all straight onto a
computer is likely to be unhelpful even if the software can cope.

Peter

Steve Hayes

May 24, 2011, 3:24:21 AM

On Mon, 23 May 2011 15:22:54 -0700 (PDT), Tom Wetmore <tt...@verizon.net>
wrote:

>

Actually I use askSam to generate paper too!

But the historian thing reminds me of another program I'd like to see, which
has been discussed here occasionally in the past - an event-based program for
recording conclusions as well as evidence, which could be used by general
historians and not just family historians. It should take into account other
relationships than familial ones, and would be ideal for biographers.

Peter J. Seymour

May 24, 2011, 3:40:55 AM5/24/11

On 2011-05-24 01:49, Tom Wetmore wrote:
.....

>
> Note one very interesting thing about the five answers so far. No one
> says they use their genealogical application to store their evidence.
> Is this a failing of genealogical software in general, or is handling the
> evidence and handling the persons such two fundamentally different
> things that we need two completely different programs for handling them?
>
> My answer is that genealogical programs should be able to handle evidence,
> but no one has figured out how to do it yet.
>
> Tom

I'm not sure I am responding to the strictly correct post here as your
software is posting all your replies as new threads. However...

You are perhaps wanting things to be all one way rather than another and
are in danger of missing the point which is that evidence can be stored
at a number of levels, physical and virtual, according to its perceived
usefulness. I retain (almost) all evidence on paper, but selected
evidence is then also stored on computer. It is simply too much effort
to store all possible evidence in a useful form on computer, so some
pre-sifting makes life easier; in fact it makes the job practicable. Now
the interesting question is: since you have the paper-based evidence, how
much of a particular item do you store on a computer? In practice, this
computer data might be just a reference, or it might be all the relevant
data. It could even be all the information in that piece of evidence,
including what sort of paper, what colour, what condition and so on, but
that would be unusual and remarkably keen. The amount will depend on
what you want to do with it on the computer.
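
Peter's point that each item can be recorded on the computer at a different depth can be sketched as a record whose only required field is the reference back to the paper. This is a minimal Python sketch; the class name `EvidenceEntry`, its fields and the three level names are illustrative assumptions, not part of any program mentioned in this thread.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class EvidenceEntry:
    """Computer record for one piece of paper evidence (hypothetical layout)."""
    reference: str                                           # where the paper is filed; always kept
    extracted: dict[str, str] = field(default_factory=dict)  # just the "relevant data"
    transcription: Optional[str] = None                      # everything the document says
    physical: Optional[str] = None                           # paper, colour, condition and so on

    def level(self) -> str:
        """Report how much of the item has been put on the computer."""
        if self.transcription is not None or self.physical is not None:
            return "full"
        if self.extracted:
            return "relevant data"
        return "reference only"


# Usage: an item starts as a bare reference and gains detail only if useful.
entry = EvidenceEntry(reference="Lever arch file 'Collier', sleeve 12")
entry.extracted["Name"] = "Mary Collier"
print(entry.level())  # relevant data
```

Nothing forces every item to the same depth; the pre-sifting Peter describes is simply the decision about which optional fields to fill.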

Do you see the picture?

Peter

Ian Goddard

May 24, 2011, 4:38:24 AM5/24/11

Tom Wetmore wrote:
> On Monday, May 23, 2011 7:47:11 PM UTC-4, Steven Gibbs wrote:
>> Basically, I really only have two types of object, persons and documents. I
>> input a document and create a new persona for each name in the document.
>> Then I check each persona to see if I am comfortable about merging it into
>> an existing person

%><

>> I didn't cater easily for unmerging persons. If I need to unmerge a person,
>> I have to create a new person for every original persona in the documents
>> for a person, unlink any parents and children and then remerge each of the
>> personas again. If I was starting again, I'd keep a record of each merge so
>> I could recreate the last two persons that were merged, as experience shows
>> that it's usually the most recent merge that was in error.

> Very nice. I'm working on a similar solution. I handle the unmerge problem
> by never really merging. Instead of merging I build up a tree of person
> records. If I decide two personae refer to the same person I create a new
> person record that simply refers to the two personae. I can add a
> justification to that new record to explain the rationale for joining. If I
> decide later that the personas refer to different people, I just delete the
> higher level person.
>

Take it a step further. Have a separate entity for your "higher level
person" and then link entities between that and the personae records.
At a minimum the link could simply contain pointers in each direction
but you could expand it to contain a note and maybe some value to
indicate your confidence - including negative values to say you think
the persona does not refer to that person.
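
Ian's suggestion of a separate link entity between a person and each persona, carrying a note and a confidence value that can go negative, might look like this in Python. The -3 to +3 scale, the default threshold and every name below are assumptions for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class PersonaLink:
    """Hypothetical link entity between a higher-level person and one persona."""
    person_id: str
    persona_id: str
    confidence: int   # assumed scale: -3 certainly not, 0 undecided, +3 certain
    note: str = ""

    def __post_init__(self) -> None:
        if not -3 <= self.confidence <= 3:
            raise ValueError("confidence must be between -3 and +3")


def personae_of(person_id: str, links: list[PersonaLink], threshold: int = 1) -> list[str]:
    """Personae currently believed to be this person."""
    return [link.persona_id for link in links
            if link.person_id == person_id and link.confidence >= threshold]


def ruled_out(person_id: str, links: list[PersonaLink]) -> list[str]:
    """Personae positively judged NOT to be this person."""
    return [link.persona_id for link in links
            if link.person_id == person_id and link.confidence < 0]


def persons_of(persona_id: str, links: list[PersonaLink], threshold: int = 1) -> list[str]:
    """The same links read in the other direction, from persona to person."""
    return [link.person_id for link in links
            if link.persona_id == persona_id and link.confidence >= threshold]


links = [
    PersonaLink("P1", "enlistment", 1, "right age, neighbouring county"),
    PersonaLink("P1", "marriage", 2, "same father named"),
    PersonaLink("P1", "burial", -2, "buried before the marriage"),
]
print(personae_of("P1", links))  # ['enlistment', 'marriage']
print(ruled_out("P1", links))    # ['burial']
```

Storing the negative judgement as data means a rejected match is recorded, with its reason, rather than silently forgotten.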

Gramps, BTW, does have an undo for merges but some operations, such as
importing a fresh GEDCOM, wipe the history.
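
The merge record Steven Gibbs wished he had kept, and the merge undo Ian mentions, amount to a stack of merges that can be popped. This is a hedged Python sketch, not how Gramps implements it; `PersonStore`, `MergeRecord` and the method names are hypothetical, and parent and child links are ignored to keep it short.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Person:
    id: str
    persona_ids: set[str] = field(default_factory=set)


@dataclass
class MergeRecord:
    survivor_id: str
    absorbed: Person          # the removed person, kept exactly as it was
    moved: frozenset[str]     # personae that were newly added to the survivor


class PersonStore:
    def __init__(self) -> None:
        self.persons: dict[str, Person] = {}
        self.merge_log: list[MergeRecord] = []

    def add(self, person: Person) -> None:
        self.persons[person.id] = person

    def merge(self, keep_id: str, drop_id: str) -> None:
        """Fold drop_id into keep_id and log enough to reverse it."""
        keep = self.persons[keep_id]
        drop = self.persons.pop(drop_id)
        moved = frozenset(drop.persona_ids - keep.persona_ids)
        keep.persona_ids |= moved
        self.merge_log.append(MergeRecord(keep_id, drop, moved))

    def undo_last_merge(self) -> None:
        """Reverse the most recent merge (usually the one in error)."""
        record = self.merge_log.pop()
        self.persons[record.survivor_id].persona_ids -= record.moved
        self.persons[record.absorbed.id] = record.absorbed
```

Because the log is a stack, only the latest merge can be undone cleanly; reversing an older one would mean undoing the later merges first.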

Tom Wetmore

May 24, 2011, 5:13:04 AM5/24/11

to spamt...@verizon.net

Robert,

I am the LifeLines guy.

Yeah, the google interface allows infinitely long lines, and doesn't hard wrap them before sending to the news reader. So I'm hard wrapping them myself now.

Thanks.

Tom

Tom Wetmore

May 24, 2011, 5:18:38 AM5/24/11

to Newsg...@pjsey.demon.co.uk

Peter,

I understand your point. I can assure you I don't want it all one way. I keep
evidence in various ways depending on the type.

What I am interested in is the best way to represent your evidence
when you have lots of evidence about as-yet unknown persons, and you
are going through the inference process of deciding who those persons
are. I would have difficulty handling 100 index cards, or 100 pieces of paper,
or having 100 image files open on my computer. I would want a way of
putting, say, these five together in a group as a tentative person, and those
15 as another tentative person, and so on. It's hard for me to do this
grouping and thinking with paper, cards, open windows. I need some other
mechanism to help me.

So I'll keep my evidence on paper or in image files, as we all do, but I'm
looking for some additional "codified" form of the evidence as more or
less similarly formatted records on my computer so I can easily compare
and group them.

We are getting some good answers about this on this thread now.

Tom

Tom Wetmore

May 24, 2011, 5:27:07 AM5/24/11

> Ian

Ian,

Thanks. I have described my current thoughts for "doing genealogical research" as
building up "person-trees" with personae (person records codified directly from
evidence records) at the leaves and higher level person records as the roots and
"interior nodes" of the trees. This is a bit of a simplification
since it ignores issues of events codified from evidence, levels of confidence, adding
notes, adding conclusions, and so forth. My entire model is a bit more
complex than the simple tree of persons idea, but these concepts seem subtle enough
that I hesitate to obfuscate the core ideas with that complexity. I guess I'm trying
to say that my overall model does include the points you have just mentioned.
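
The person-tree idea Tom describes, with personae codified from evidence at the leaves and person records as interior nodes and roots, can be sketched in Python as follows. Like his summary, it leaves out events, confidence and notes; the class names, the `dissolve` helper and the sample data are assumptions for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Union


@dataclass
class Persona:
    """Leaf: one person exactly as named in one evidence record."""
    id: str
    name_as_written: str
    source: str


@dataclass
class PersonNode:
    """Interior node or root: the conclusion that its children are one individual."""
    id: str
    children: list[Node] = field(default_factory=list)
    justification: str = ""


Node = Union[Persona, PersonNode]


def leaves(node: Node) -> list[Persona]:
    """All the evidence-level personae underneath a node."""
    if isinstance(node, Persona):
        return [node]
    found: list[Persona] = []
    for child in node.children:
        found.extend(leaves(child))
    return found


def dissolve(node: PersonNode) -> list[Node]:
    """'Unmerge' by discarding the higher-level node; its children are untouched."""
    return list(node.children)


a = Persona("a", "Jno. Smith", "regimental muster roll")
b = Persona("b", "John Smith", "marriage register")
john = PersonNode("P1", [a, b], justification="same age and county")
print([p.name_as_written for p in leaves(john)])  # ['Jno. Smith', 'John Smith']
```

Nothing below the dissolved node is changed, which is what makes the "never really merge" approach safe to reverse.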

Thanks.

Tom

Ian Goddard

May 24, 2011, 5:46:41 AM5/24/11

Tom Wetmore wrote:
> Note one very interesting thing about the five answers so far. No one
> says they use their genealogical application to store their evidence.
> Is this a failing of genealogical software in general, or is handling the
> evidence and handling the persons such two fundamentally different
> things that we need two completely different programs for handling them?

Gramps has a feature which I do sometimes use for handling evidence
files but AFAIK it just points to a file it's told about. I don't think
it makes a copy in the database.

> My answer is that genealogical programs should be able to handle evidence,
> but no one has figured out how to do it yet.

Figured but not done. If you look back to the thread "Data import from
the new Familysearch site" I have an XML example where the "evidence" is
copy & pasted from the web-page:

<SourceObjects>
<SourceObject>
<Content><![CDATA[Name: Mary Collier
Gender: Female
Baptism/Christening Date: 27 Dec 1760
Baptism/Christening Place: ALMONDBURY,YORK,ENGLAND
Birth Date:
Birthplace:
Death Date:
Name Note:
Race:
Father's Name: George Collier
Father's Birthplace:
Father's Age:
Mother's Name:
Mother's Birthplace:
Mother's Age:
Indexing Project (Batch) Number: P01712-1
System Origin: England-ODM
Source Film Number: 230649
Reference Number:]]>
</Content>
<ObjectID>1C42DFF7-299E-44C8-91FB-F5A805A78AAE</ObjectID>
<MimeType>text/plain</MimeType>
</SourceObject>
</SourceObjects>
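
Ian's example can be read with Python's standard `xml.etree.ElementTree`, which returns the CDATA section as ordinary element text. The sketch below assumes the element names shown above and a simple "Label: value" layout for `text/plain` content; the function names are hypothetical.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def read_source_objects(xml_text: str) -> list[dict]:
    """Return one dict per <SourceObject>: its UUID, MIME type and raw content."""
    root = ET.fromstring(xml_text)
    return [
        {
            "id": obj.findtext("ObjectID"),
            "mime": obj.findtext("MimeType"),
            "content": obj.findtext("Content") or "",
        }
        for obj in root.findall("SourceObject")
    ]


def fields_from_text(content: str) -> dict[str, str]:
    """Split 'Label: value' lines of a text/plain transcript, skipping blank values."""
    fields = {}
    for line in content.splitlines():
        label, sep, value = line.partition(":")
        if sep and value.strip():
            fields[label.strip()] = value.strip()
    return fields
```

A reader would check `mime` before doing anything with the content: only `text/plain` is split into fields here, and other types would be passed to a suitable viewer.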

Several points:

1. The object itself is wrapped up with a mime type (
http://en.wikipedia.org/wiki/MIME ). This tells the program how to
display it, assuming the program can handle that type itself, or pass it
on to whatever utility the OS has associated with that type.

2. As a consequence of the above the system is future-proofed. If some
new format is introduced, it can be handled provided that a MIME type
is supplied for it.

3. A collection of objects can be wrapped in a binder. This makes
provision for multi-page scans, scan and transcription, transcript and
translation or whatever.

4. This is intended to be a publication medium.

5. A UUID is provided. This means that if I send a copy of this object
to you, and someone else who has also received a copy sends a further
copy to you, your S/W will be able to identify and discard the duplicate.

6. A consequence of 4 & 5 is that once the object is published it
should not be amended. If someone wants to make some amendment, such as
provide a transcript to a scan, a translation to a transcript, etc. it
should be done as a separate object with a pointer to the original.

7. There are a few changes I'd make. Firstly the binder object should
have the UUID instead of the inner object, to discourage breaking up the object.
Secondly there should be text as a human-readable
title/description/handle to the inner object. Thirdly there needs to be
a content encoding element as well as the mime type although this could
be optional.

8. The whole shebang, binder and all, would best be enclosed in a
further wrapper with its own UUID, a UUID pointing further up the source
tree and other information such as text to be used as a citation. In
the case of an amendment record the source UUID would be that of the
amended record.
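
Points 5 and 6 (a UUID on every published object, and amendments made as new objects pointing at the original rather than as edits) can be modelled with a frozen dataclass and a store keyed on UUID. This is a hedged sketch: `SourceObject`, `Library` and their fields are illustrative names, and the binder and outer wrapper are left out.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class SourceObject:
    """A published evidence object; frozen so it cannot be amended in place."""
    content: bytes
    mime_type: str                       # tells the program how to display it
    title: str = ""                      # human-readable handle
    amends: Optional[str] = None         # UUID of the object this one transcribes or translates
    id: str = field(default_factory=lambda: str(uuid.uuid4()))


class Library:
    """Hypothetical local store that keeps one copy per UUID."""

    def __init__(self) -> None:
        self._objects: dict[str, SourceObject] = {}

    def receive(self, obj: SourceObject) -> bool:
        """Keep obj unless a copy with the same UUID is already held."""
        if obj.id in self._objects:
            return False
        self._objects[obj.id] = obj
        return True

    def amendments_of(self, obj_id: str) -> list[SourceObject]:
        """Transcripts, translations, etc. published as separate objects."""
        return [o for o in self._objects.values() if o.amends == obj_id]
```

The store keeps whichever copy arrives first and reports later copies as duplicates; the frozen dataclass raises an error on any attempt to change a published object.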

Note that this is an external representation. S/W realisations would be
free to store the data in an RDBMS or whatever.

singhals

May 24, 2011, 9:55:38 AM5/24/11

to gen...@rootsweb.com

Tom Wetmore wrote:

> Note one very interesting thing about the five answers so far. No one
> says they use their genealogical application to store their evidence.
> Is this a failing of genealogical software in general, or is handling the
> evidence and handling the persons such two fundamentally different
> things that we need two completely different programs for handling them?

I won't speak for Steve or Ian, but in MY case I tried
putting it all into the genealogical database. It worked
fine for me while I was doing it EXCEPT for what I see as a
personality failure. If I were looking for the /exact/ date
of a death in the Mobil family, I'd invariably notice that
Ol' Skylar was in there 12 times ... and I'd end up opening
all 12 items to find the one with ancestors and descendants.
I considered that a waste of my time and energy, so I went
back to paper, which I handle only when I need to.

> My answer is that genealogical programs should be able to handle evidence,
> but no one has figured out how to do it yet.

I don't think one ought to mix results with process. In
fact, one ought to have two processes -- one double-checks
the other for legitimacy; keeping research/evidence on paper
and transferring it to the database provides both the
separation of tasks AND the cross-check.

Cheryl

Bob LeChevalier

May 24, 2011, 10:38:29 AM5/24/11

Tom Wetmore <tt...@verizon.net> wrote:
>This thread is an offshoot from the Linux thread that is going off on a number of tangents.
>
>How should we store evidence in genealogical databases? You get a marriage record in the mail; you find an image of a census record at Ancestry.com; you find the record of an event on a page in a book you found on Google books. What are you going to do with those three records? Here are some possible answers.
> First, if you are careful genealogists, you're going to record the source of the records in your database as source records. Got that out of the way.
> Second, as far as the "physical records" are concerned, let's say you carefully file the paper marriage record away in your paper filing system, and you go to your big ancestry folder area on your computer and keep copies of those two images. Dandy.
> Now, what are you going to do with the information in those three physical records (let's say we can call those image files "physical" for sake of argument).
> Here's the "normal" answer in my opinion. You look at the physical records, you decide who the persons were who are mentioned in those records, you go to your genealogy program and you find the appropriate person records, creating them if need be, and you edit in the new information. In other words you extract information from the physical records and you add that information directly to person records. Note that the information from the physical records only enters into your database as items inside person records.
> Here's another possibility advocated by some genealogists. After you create the source records for where the physical records came from, you edit those source records, adding to them the information that you got from those sources that you believe is important. You probably have to do this as "unstructured notes." Then you link persons to those sources and you also "copy up" from the stuff you added to the source records into the person records.
> Here's another possibility advocated by programs like Gramps or Family Tree Maker. You first create event records from information in the physical records, say birth, death or marriage events, and then you add a link from some person in your database, creating that person record if need be, to that event record. The events really don't stand alone; you have to link person records to them.

As I understand it, what NFS did with IGI records which were
transcriptions, is that it turned each such transcription into an
association of personas, represented in the source record. Thus a
transcribed marriage has two spouses, probably a marriage date and
location, and possibly parents. The personas involved in a given
transcription are not linked to any other data.

Later, users can select various "personas" representing raw
data and "combine" them into a compound or derived persona
representing more completely what they think is the data applicable to
a single person. Someone else could combine different extracted
source personas into a different compound. The original source persona
still exists, as do any combined ones. In theory, one can determine
whether a persona is extracted data or derived, but in practice the
LDS source data is incredibly verbose and redundant, and so
uninformative unless you know what to look for (somewhere deep in the
record, one can find the same sorts of things one could find in the
old IGI records to distinguish an extracted from a submitted record).

They say that they plan to improve upon their sourcing.

>Eventually every genealogist reaches the point when he or she has delved far enough back in time that the solid, firm trail of records has dried up.

I don't ever expect to reach that point. But then my genealogy has
never been confined to direct ancestors. Try to find all the
descendants of some ancestor born around the time of the revolution.
Every single one of those descendants is a cousin, and in many
families that task alone might take a lifetime.

lojbab
---
Bob LeChevalier - artificial linguist; genealogist
loj...@lojban.org Lojban language www.lojban.org

J. Hugh Sullivan

May 24, 2011, 10:40:27 AM5/24/11

On Mon, 23 May 2011 10:12:46 -0400, singhals <sing...@erols.com>
wrote:

>J. Hugh Sullivan wrote:
>> On Mon, 23 May 2011 04:52:05 -0700 (PDT), Tom Wetmore
>> <tt...@verizon.net> wrote:
>>
>>> This thread is an offshoot from the Linux thread that is going off on a number of tangents.
>>>
>>> How should we store evidence in genealogical databases?
>>>
>>> You get a marriage record in the mail; you find an image of a census record
>>> at Ancestry.com; you find the record of an event on a page in a book you
>>> found on Google books. What are you going to do with those three records?
>>> Here are some possible answers.
>>
>> It's easy for me - I establish parameters. If it is online or in a
>> book I don't want a hard copy unless it is my direct line. Most people
>> who have all that paper can't find anything anyhow. And finding it
>> serves no more purpose than listing the source where others can see
>> it. After I have proved it to myself I have absolutely no need to
>> prove it to others. If they are not satisfied with what I have they
>> can do their own research.
>
>Moreover, the relatives who have even a minor interest in
>any of this have about 1/4 as much interest in where I found
>something or what it really says. If I share sources with
>them, they aren't in footnotes; the narrative text gets
>generated with footnotes because that's how software does
>it, but I go back and edit the document pulling those
>footnotes into the prose; as in, "I finally found this
>marriage in the next county over (Tyler) in the
>chronological record but just not in the county we thought."
>Everyone is happy, particularly after I send the one who
>does care a copy of the unedited version.
>
>If someone doesn't wish to believe me, showing them papers
>won't change their mind and not showing them papers won't
>change the minds of those who do believe me.
>
>Cheryl

Approaching from the other side, I don't need to see the paper for a
fact posted by another - just the source in case I want to endorse it.

Unsourced "facts" are often just conclusions - those are pretty scary.

Hugh

Bob LeChevalier

May 24, 2011, 10:47:06 AM5/24/11

steve <shmar...@ticnet.com> wrote:
>Right now I'm sorta trying to do a locality study. I'm completely
>clueless as to how I should organize things. I wind up just making
>lots of transcriptions and notes and saving them as text files.
>Surely there is a better way.

For my French work, I use a separate spreadsheet page for each type of
record for each parish, so that the most common data items found in
extracted records are all in the same column(s), and more or less in
the same format. I can still put some freeform notes off to the
right, but I don't commonly need to. I also generate pages that have
the same records sorted on various keys.
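
Bob's layout, one sheet per record type per parish with the same records re-sorted on different keys, is easy to reproduce outside a spreadsheet. A minimal Python sketch, assuming each extracted record is a dict with the column names shown and ISO-style dates (YYYY-MM-DD) so that text order is date order; the function names, parish and people are invented.

```python
from __future__ import annotations

import csv
import io
from collections import defaultdict
from operator import itemgetter


def sheets(records: list[dict]) -> dict[tuple[str, str], list[dict]]:
    """One 'sheet' per (parish, record type), like one spreadsheet page each."""
    by_sheet: dict[tuple[str, str], list[dict]] = defaultdict(list)
    for record in records:
        by_sheet[(record["parish"], record["type"])].append(record)
    return dict(by_sheet)


def sorted_view(sheet: list[dict], *keys: str) -> list[dict]:
    """The same records re-sorted on one or more columns."""
    return sorted(sheet, key=itemgetter(*keys))


def to_csv(sheet: list[dict], columns: list[str]) -> str:
    """CSV text for a spreadsheet program; keys not in columns are dropped."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=columns, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(sheet)
    return out.getvalue()


records = [
    {"parish": "St-Pierre", "type": "baptism", "date": "1761-03-02",
     "name": "Marie", "father": "Jean Martin", "mother": "Anne Roux"},
    {"parish": "St-Pierre", "type": "baptism", "date": "1760-12-27",
     "name": "Pierre", "father": "Louis Blanc", "mother": "Jeanne Petit"},
    {"parish": "St-Pierre", "type": "burial", "date": "1762-01-05",
     "name": "Louis Blanc", "father": "", "mother": ""},
]
baptisms = sheets(records)[("St-Pierre", "baptism")]
print([r["name"] for r in sorted_view(baptisms, "date")])    # ['Pierre', 'Marie']
print([r["name"] for r in sorted_view(baptisms, "father")])  # ['Marie', 'Pierre']
```

Keeping the source sheets separate from any linked database matches Bob's two-pass approach: the sorted views help spot who is who before anyone is linked.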

Then, in a separate pass, I may add the people in these records to a
database and link them together based on parentage or espousal, when
I can identify the people. In my case, probably 90% of all people in
one kind of record show up in some other record (either as parent or
spouse, or a death record for someone who was born), and I thus end up
with a huge and convoluted tree relating almost everyone to everyone
else. But this is totally separate from my spreadsheets of source
data.

Ian Goddard

May 24, 2011, 12:44:24 PM5/24/11

singhals wrote:
> Tom Wetmore wrote:
>
>> Note one very interesting thing about the five answers so far. No one
>> says they use their genealogical application to store their evidence.
>> Is this a failing of genealogical software in general, or is handling the
>> evidence and handling the persons such two fundamentally different
>> things that we need two completely different programs for handling them?
>
> I won't speak for Steve or Ian, but in MY case I tried putting it all
> into the genealogical database. It worked fine for me while I was doing
> it EXCEPT for what I see as a personality failure. If I were looking for
> the /exact/ date of a death in the Mobil family, I'd invariably notice
> that Ol' Skylar was in there 12 times ... and I'd end up opening all 12
> items to find the one with ancestors and descendants. I considered that
> a waste of my time and energy, so I went back to paper, which I handle
> only when I need to.
>

I'm not sure exactly what you're describing here but the Gramps approach
is that you can add a media file into the Gallery. Just about any
category of item except repository can be linked to a Gallery item. The
database form has a Gallery tab on it. Click on that and the relevant
Gallery items will be shown as thumbnails.

So, for example, I can download a household image as a PDF for the Irish
1911 census (*free* - shame on the GB census!), enter it into the
Gallery and link to the family record, to each member of the household
and to the Place record. If I go into any of the records I can see the
thumbnail on the Gallery and click it to launch Acrobat & bring up the
full-size image. Supposedly it will also handle audio & video but I
haven't tried that.

singhals

May 24, 2011, 2:25:09 PM5/24/11

to gen...@rootsweb.com

Ian Goddard wrote:
> singhals wrote:
>> Tom Wetmore wrote:
>>
>>> Note one very interesting thing about the five answers so far. No one
>>> says they use their genealogical application to store their evidence.
>>> Is this a failing of genealogical software in general, or is handling the
>>> evidence and handling the persons such two fundamentally different
>>> things that we need two completely different programs for handling them?
>>
>> I won't speak for Steve or Ian, but in MY case I tried putting it all
>> into the genealogical database. It worked fine for me while I was doing
>> it EXCEPT for what I see as a personality failure. If I were looking for
>> the /exact/ date of a death in the Mobil family, I'd invariably notice
>> that Ol' Skylar was in there 12 times ... and I'd end up opening all 12
>> items to find the one with ancestors and descendants. I considered that
>> a waste of my time and energy, so I went back to paper, which I handle
>> only when I need to.
>>
>
> I'm not sure exactly what you're describing here but the Gramps approach
> is that you can add a media file into the Gallery. Just about any
> category of item except repository can be linked to a Gallery item. The
> database form has a Gallery tab on it. Click on that and the relevant
> Gallery items will be shown as thumbnails.
>

Yeah, even PAF allows that. And, I'm told, you can put the
image link into the Source template which does have a
repository field. I'm not enchanted with that field, unless
the repository is Library of Congress or the Family History
Library; no other US library is nearly as disaster-proof --
and even LoC burned once.

Having that available though doesn't cure my problem of not
instantly recognizing which of 12 men of the same name and
birthdate is the one I want and which 11 are data-holders only.

> So, for example, I can download a household image as a PDF for the Irish
> 1911 census (*free* - shame on the GB census!), enter it into the
> Gallery and link to the family record, to each member of the household
> and to the Place record. If I go into any of the records I can see the
> thumbnail on the Gallery and click it to launch Acrobat & bring up the
> full-size image. Supposedly it will also handle audio & video but I
> haven't tried that.

Yes, PAF too is said to handle audio and video, but since a
majority of my ancestors died before that technology was
invented it's even less useful. (g)

Cheryl

Ian Goddard

May 24, 2011, 2:41:44 PM5/24/11

singhals wrote:
> Yes, PAF too is said to handle audio and video, but since a majority of
> my ancestors died before that technology was invented it's even less
> useful. (g)

And even if it were invented would you really want a recording of a very
drunk ggfather singing out of tune?

Tom Wetmore

May 24, 2011, 4:21:49 PM5/24/11

> Bob LeChevalier wrote:

>> Tom Wetmore <tt...@verizon.net> wrote:
> As I understand it, what NFS did with IGI records which were
> transcriptions, is that it turned each such transcription into an
> association of personas, represented in the source record. Thus a
> transcribed marriage has two spouses, probably a marriage date and
> location, and possibly parents. The personas involved in a given
> transcription are not linked to any other data.
>
> Later, users can select various "personas" representing raw
> data and "combine" them into a compound or derived persona
> representing more completely what they think is the data applicable to
> a single person. Someone else could combine different extracted
> source personas into a different compound. The original source persona
> still exists, as do any combined ones. In theory, one can determine
> whether a persona is extracted data or derived, but in practice the
> LDS source data is incredibly verbose and redundant, and so
> uninformative unless you know what to look for (somewhere deep in the
> record, one can find the same sorts of things one could find in the
> old IGI records to distinguish an extracted from a submitted record).

Bob,

I've used nFS and read all the API documentation. They have personas
and persons and the user interface allows you to rearrange
personas within persons. It's a two-tier system. I think it's a great
example of how personas can be made to work. I think their data is
pretty stinky, but that has no bearing on the technique in my opinion.

>> Eventually every genealogist reaches the point when he or she has delved
>> far enough back in time that the solid, firm trail of records has dried up.

> I don't ever expect to reach that point. But then my genealogy has
> never been confined to direct ancestors. Try to find all the
> descendants of some ancestor born around the time of the revolution.
> Every single one of those descendants is a cousin, and in many
> families that task alone might take a lifetime.

What I think you are saying is that you never plan to have to cross the chasm
from person-based genealogy to record-based genealogy. The whole
genealogical application business, in my opinion, caters to people who
believe that. Obviously from all I've written about this, I don't believe
that. At my point in genealogical research I have 1000s of records
that I haven't been able to assign to real people yet.

Here is my overall genealogical project. I descend from Loyalist Wetmores
who were exiled to Canada at the end of the American Revolutionary
War. By the 1850s many of the children and grandchildren of those
families were returning to the United States where the economy was
stronger. I descend from one of those returning families. My project
is to understand that return migration by finding all the families who
were involved, what their patterns of migration were, where they
ended up, and where all their descendants are living now. This is a
full research project. It is definitely a record-based project. You can
probably understand my needs for effective ways to record all my
evidence so I can access it in many ways to support the process of
making conclusions.

You might argue that this project is not a genealogical project, but
rather a historical project, so I have no business expecting a
genealogical application to be able to support me. I don't see it that
way. I believe that genealogy is history, and the farther back in
time we go, the more we have to act like historians to make
progress. I want a genealogical application that can support what
I am doing with this project.

Tom

Tom Wetmore

May 24, 2011, 4:26:52 PM5/24/11

to gen...@rootsweb.com

I concur with Cheryl. Basically all genealogical applications allow their
records to contain links to external items, either files on the local file
system or URLs on the World Wide Web.

That doesn't solve the problem of where to store those links. As Cheryl
points out, if you put a link in a person record, you are making the
explicit statement that the linked-to evidence refers to that person.

The question I am trying to get to in this thread is: how are you going
to handle that evidence, a link to an external something or other in
this case, when you don't yet know what person should link to it?
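
One answer to Tom's question is to store the link on the evidence side: each external item gets its own record, which gains links to persons only once someone is identified. A minimal Python sketch; `EvidenceItem`, `EvidenceStore` and the example paths are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class EvidenceItem:
    id: str
    location: str                                      # local file path or URL of the external item
    description: str
    person_ids: set[str] = field(default_factory=set)  # empty until someone is identified


class EvidenceStore:
    def __init__(self) -> None:
        self.items: dict[str, EvidenceItem] = {}

    def add(self, item: EvidenceItem) -> None:
        self.items[item.id] = item

    def unassigned(self) -> list[EvidenceItem]:
        """Evidence not yet attributed to anyone: the working pile."""
        return [i for i in self.items.values() if not i.person_ids]

    def assign(self, item_id: str, person_id: str) -> None:
        """Record the conclusion that this item refers to this person."""
        self.items[item_id].person_ids.add(person_id)

    def for_person(self, person_id: str) -> list[EvidenceItem]:
        return [i for i in self.items.values() if person_id in i.person_ids]
```

A person record then never has to hold a link it cannot justify; the evidence sits in the unassigned pile until the analysis is done.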

Tom

Ian Goddard

May 24, 2011, 5:27:03 PM5/24/11

If I were starting from scratch it would go in a blob. The whole
workflow should be evidence > analysis > reconstruction. You aren't
going to have it linked to your reconstruction until you've got to the
end of that process.

In essence the database would be arranged as follows:

Evidence:
  Those blobs
  Hierarchy of source information providing a provenance and indicating
    where the material is currently located

Analysis:
  Events
  Personae (preserving names as originally written)
  Roles linking Personae to Events
  Locations linked to events (also preserving names as originally written)

Reconstruction:
  People (providing a standard spelling plus some disambiguation as
    needed, e.g. John Smith 3rd)
  Links to Personae
  Relationships - families, political, military, etc
  Places (providing a standard spelling but also put in hierarchies,
    civil, ecclesiastical, etc, date-bounded where appropriate)
  Links to Locations
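
Ian's three layers could be laid out as Python dataclasses, with a helper that walks from a reconstructed person back to the evidence. This is a sketch under assumptions: the field names and `evidence_for` are invented, and date-bounding of places is left out.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional

# ---- Evidence ----
@dataclass
class Source:
    id: str
    title: str
    parent_id: Optional[str] = None   # next level up the provenance hierarchy
    held_at: str = ""                 # where the material is now

@dataclass
class Blob:
    id: str
    source_id: str
    mime_type: str
    data: bytes

# ---- Analysis (names kept as written) ----
@dataclass
class Location:
    id: str
    name_as_written: str

@dataclass
class Event:
    id: str
    blob_id: str
    kind: str
    date_as_written: str
    location_id: Optional[str] = None

@dataclass
class Persona:
    id: str
    name_as_written: str

@dataclass
class Role:
    persona_id: str
    event_id: str
    role: str                         # e.g. "child", "father", "groom"

# ---- Reconstruction ----
@dataclass
class Person:
    id: str
    standard_name: str                # e.g. "John Smith 3rd"
    persona_ids: list[str] = field(default_factory=list)

@dataclass
class Place:
    id: str
    standard_name: str
    parent_id: Optional[str] = None   # civil or ecclesiastical parent
    location_ids: list[str] = field(default_factory=list)

@dataclass
class Relationship:
    kind: str                         # family, political, military, ...
    person_ids: tuple[str, ...]


def evidence_for(person: Person, roles: list[Role], events: list[Event]) -> list[str]:
    """Blob ids behind a reconstructed person, via personae, roles and events."""
    event_by_id = {e.id: e for e in events}
    return sorted({event_by_id[r.event_id].blob_id
                   for r in roles if r.persona_id in person.persona_ids})
```

Because People only point at Personae, and Personae only reach evidence through Roles and Events, nothing in the reconstruction layer has to be linked until the analysis supports it.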

Brian

May 24, 2011, 9:49:41 PM5/24/11

On Mon, 23 May 2011 18:23:07 +0200, Steve Hayes
<haye...@telkomsa.net> wrote:

>On Mon, 23 May 2011 04:52:05 -0700 (PDT), Tom Wetmore <tt...@verizon.net>
>wrote:
>
>>This thread is an offshoot from the Linux thread that is going off on a number of tangents.
>>
>>How should we store evidence in genealogical databases?
>
>Your message was difficult to read as there is something wrong with your
>wordwrap.

I was able to read it but it went all the way across the screen on a
15.7" laptop.

Brian

May 24, 2011, 9:51:11 PM5/24/11

On Mon, 23 May 2011 10:16:44 -0700 (PDT), Tom Wetmore
<tt...@verizon.net> wrote:

>>
>> Your message was difficult to read as there is something wrong with your
>> wordwrap.
>
>If I could fix that I would. I'm using the google interface with safari on a mac. Suggestions?
>
>Tom

I don't know. I use Agent and it has a setting for Word Wrap. Perhaps
what you use does also.

Brian

May 24, 2011, 9:56:23 PM5/24/11

On Tue, 24 May 2011 19:41:44 +0100, Ian Goddard
<godd...@hotmail.co.uk> wrote:

>And even if it were invented would you really want a recording of a very
>drunk ggfather singing out of tune?

Wouldn't have to be that although those would show up. According to my
mother, her father had a wonderful singing voice until a car he was
working on fell on him in the 30's. It's too bad there were no
recordings.

Steve Hayes

May 24, 2011, 11:00:07 PM5/24/11

On Mon, 23 May 2011 09:18:57 -0700 (PDT), steve <shmar...@ticnet.com> wrote:

>What a genealogy program can or should do depends on what the user is
>doing. A program that is great for presenting a family tree may not
>be suitable for doing a one name study. How do you record that the
>John SMITH who enlisted in the 1st Regiment of Alabama Infantry just
>might be the same John SMITH who married Mary JONES over in the
>neighboring county; but you're not sure.

In a one-name study, couldn't you use a simpler database format, rather than a
lineage-linked one?

>
>Right now I'm sorta trying to do a locality study. I'm completely
>clueless as to how I should organize things. I wind up just making
>lots of transcriptions and notes and saving them as text files.
>Surely there is a better way.

That's where askSam could help you:

www.asksam.com

It is a better way, in that it is easy to search and produce formatted reports
from notes and transcriptions.

And no, I don't have shares in askSam, I've just been using it for 20 years to
keep track of my genealogical research.

Steve Hayes

May 24, 2011, 11:01:10 PM5/24/11

Seems to be fixed now.

Wes Groleau

May 24, 2011, 11:25:08 PM5/24/11

On 05-24-2011 10:40, J. Hugh Sullivan wrote:
> Unsourced "facts" are often just conclusions - those are pretty scary.

Maybe not even that. Could be cousin Jane's memory of what her
93-year-old grandfather told her about events that happened before
his father was born.

(Been there, read that.)

--
Wes Groleau

There are two types of people in the world …
http://Ideas.Lang-Learn.us/barrett?itemid=1157

Steve Hayes

May 24, 2011, 11:29:17 PM5/24/11

On Tue, 24 May 2011 13:26:52 -0700 (PDT), Tom Wetmore <tt...@verizon.net>
wrote:

>The question I am trying to get to in this thread, is how are you going
>to handle that evidence, a link to an external something or other in
>this case, when you don't yet know what person should link to it?

And that's where I find programs like askSam useful.

I put such evidence in there, and when I find a person that I think the
evidence might link to, it's easy to find the evidence again to check.

One example is gravestones.

When I visit a cemetery where families I'm interested in are buried, I take
photos of the gravestones of persons I know are related, and of those with the
same surname whom I don't know to be related. I then put them into askSam,
transcribe the inscription (as much of it as I can read), and fill in fields
for name, place, and date of death/burial.

Then if I find another piece of evidence that links that one to someone in my
database, it is quite easy to find, and I have a photo of the original
gravestone to refer to.

The entry form looks something like this:

Surname[
First_Names[
Death_Date[
Burial_Date[
Cemetery[
Place[
Prov/County[
Country[
Date_Entered[ ^D
Inscription[
]
Picture[
]
Notes[
]
Research[
]
Other_Sources[
]

The fields with a closing square bracket on the following line are multi-line
fields, like memo fields in MS Access.

By default askSam will search for any word or phrase in any field, though you
can restrict searches by using the usual Boolean arguments.
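To make the askSam idea concrete, here is a minimal Python sketch of field-structured evidence records with askSam-style searching: a term matches if it appears in any field (or in one named field), and terms can be combined with AND and NOT. The field names are copied from the form above; `matches` and `search` are hypothetical helpers, not askSam's API or query syntax.

```python
from __future__ import annotations

from typing import Iterable, Optional

# A free-form evidence record is just a mapping of field name -> text,
# loosely modelled on the askSam entry form (field names copied from it).
Record = dict

def matches(record: Record, term: str, field: Optional[str] = None) -> bool:
    """Case-insensitive substring test on one field, or on every field by default."""
    values = [record.get(field, "")] if field else list(record.values())
    return any(term.lower() in str(v).lower() for v in values)

def search(records: Iterable[Record], all_of: Iterable[str] = (),
           none_of: Iterable[str] = (), field: Optional[str] = None) -> list:
    """Keep records containing every term in all_of (AND) and none in none_of (NOT)."""
    all_of, none_of = list(all_of), list(none_of)
    return [r for r in records
            if all(matches(r, t, field) for t in all_of)
            and not any(matches(r, t, field) for t in none_of)]

# Three entries taken from the index of monumental inscriptions.
stones = [
    {"Surname": "Riddle", "First_Names": "William", "Death_Date": "23 Apr 1848",
     "Place": "Cardinham", "Prov/County": "Cornwall"},
    {"Surname": "Riddle", "First_Names": "George", "Death_Date": "29 Nov 1878",
     "Place": "Cardinham", "Prov/County": "Cornwall"},
    {"Surname": "Hayes", "First_Names": "John", "Death_Date": "10 Jul 1912",
     "Place": "Axbridge", "Prov/County": "Somerset"},
]

for r in search(stones, all_of=["riddle", "cornwall"], none_of=["george"]):
    print(r["First_Names"], r["Surname"], r["Death_Date"])
```

Running it prints `William Riddle 23 Apr 1848`. Because every field is searched by default, a new clue (a surname, a place, a year) can be checked against all stored evidence without first deciding which person it belongs to.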

It can produce formatted reports based on any combination of fields, like
this:

Index of monumental inscriptions

Burnard, Magdalena M.C.                 Died 10 Dec 1906  Stellenbosch, Western Cape
Burnard, Sibella Margaretha             Died 5 Aug 1949   Stellenbosch, Western Cape
Coppin, Rebecca                         Died 22 Jan 1893  Cardinham, Cornwall
Hannan, Thomas                          Died 4 Oct 1890   Girvan, Ayrshire
Hayes, Albert Edward                    Died 7 Sep 1931   Bristol,
Hayes, James                            Died 16 Aug 1943  Axbridge, Somerset
Hayes, John                             Died 10 Jul 1912  Axbridge, Somerset
Raw, Donovan                            Died 14 Nov 1944  Lidgetton, Natal
Raw, Leonard Crick                      Died 29 Mar 1963  Lidgetton, Natal
Raw, Lynda                              Died 19 Nov 1977  Lidgetton, Natal
Raw, Margaret Amy                       Died 24 Nov 1867  Lidgetton, Natal
Raw, William George                     Died 7 Nov 1956   Lidgetton, Natal
Riddle, George                          Died 29 Nov 1878  Cardinham, Cornwall
Riddle, Mary Elizabeth                  Died 20 Sep 1858  Cardinham, Cornwall
Riddle, William                         Died 23 Apr 1848  Cardinham, Cornwall
Riddle, William                         Died 2 Oct 1854   Cardinham, Cornwall
Sandercock, Charlotte                   Died 11 Feb 1880  Cardinham, Cornwall
Sandercock, Henry                       Died 16 Jan 1887  Cardinham, Cornwall
Sandercock, William                     Died 24 Nov 1786  Cardinham, Cornwall
Stooke, Edmund                          Died Oct 1860     Ashton, Devon
Tribelhorn, Johannes Jacobus Ferdinand  Died 8 Jan 1968   Stellenbosch, Western Cape
______________
Printed 25 May 2011

(The columns don't line up when pasted into Usenet.) Click on any line and you
are taken to the original record to check, edit, or annotate it.

In that list, the Riddles in Cardinham, Cornwall, were ones I did not know
were related, but if I find other evidence that might link to Riddles, it is
easy enough to find them again.

In some cases the date of death on a gravestone differs from the date of death
on a death certificate or other record. In a lineage-linked genealogy program
you usually have one field for recording your conclusion, i.e. the date you
judge to be correct. But recording the evidence separately means you can
revise that conclusion if fresh evidence turns up.

If I understood your question correctly, I think this answers it.

Tom Wetmore

May 25, 2011, 4:11:32 AM

Ian,

Bingo. You've just described the ideal model.

Tom

Tom Wetmore

May 25, 2011, 4:25:00 AM

to haye...@yahoo.com

> Steve Hayes from Tshwane, South Africa

Steve,

Given that a genealogical application does not support evidence, it seems like
askSam is almost ideal as the "second program" to handle evidence.

I would sure like those askSam capabilities built into my genealogy app so I
didn't have to move back and forth between two programs. I might be
naive, but I don't think it would be difficult to add the capabilities to a
genealogy app. Every genealogy record can be viewed as a tree of
structured name/value pairs (just think of any GEDCOM, XML, or JSON
representation of anything you've seen before). If the askSam capability is
that of allowing generalized searches over the values of these pairs,
then this is a natural add-on feature for a genealogy app.
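Viewing a record as a tree of name/value pairs and searching over the values might look roughly like this in Python. It is a sketch under simplifying assumptions: it reads only the basic `level TAG value` shape of GEDCOM lines (no CONT/CONC continuation lines, no character-set handling), and the sample record is made up, borrowing the John SMITH example from earlier in the thread.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator

@dataclass
class Node:
    tag: str
    value: str = ""
    children: list = field(default_factory=list)

def parse_gedcom(text: str) -> list:
    """Turn GEDCOM-style 'level TAG value' lines into a forest of Nodes."""
    roots, stack = [], []
    for line in text.strip().splitlines():
        level_str, _, rest = line.strip().partition(" ")
        level = int(level_str)
        tag, _, value = rest.partition(" ")
        if tag.startswith("@"):          # "0 @I1@ INDI": keep the xref ID as the value
            tag, value = value, tag
        node = Node(tag, value)
        del stack[level:]                # stack now ends with this node's parent
        (stack[-1].children if stack else roots).append(node)
        stack.append(node)
    return roots

def search(nodes: list, term: str, path: tuple = ()) -> Iterator:
    """Yield (tag path, value) for every pair whose value contains term."""
    for n in nodes:
        here = path + (n.tag,)
        if term.lower() in n.value.lower():
            yield "/".join(here), n.value
        yield from search(n.children, term, here)

record = """
0 @I1@ INDI
1 NAME John /Smith/
1 NOTE Enlisted in the 1st Regiment of Alabama Infantry
1 RESI
2 PLAC Alabama
"""
roots = parse_gedcom(record)
print(list(search(roots, "alabama")))
```

With this made-up record, searching for "alabama" finds the NOTE and the RESI/PLAC values; the search does not care which record type or tag the value sits under, which is the generality Tom is after.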

My inclination is to use fairly strictly defined persona records to hold
most codified evidence, but using the askSam approach it seems that much
of the strictness could be removed, allowing "evidence records" to be
just about anything the user desired, just as long as they were structured
in such a way that askSam style searching could handle them.

Tom

steve

May 25, 2011, 6:59:32 AM

On May 24, 2:19 am, "Peter J. Seymour" <Newsgro...@pjsey.demon.co.uk>
wrote:

> On 2011-05-23 23:08, singhals wrote:
>
> > Tom Wetmore wrote:

> >> Cheryl,
>
> >> I'm not asking what you should store so you can convince others of the
> >> accuracy of your work.
>
> >> I'm asking what do you want to do with the evidence you have gathered
> >> about people you MIGHT be interested in, but BEFORE you have figured
> >> out who was who. Since the evidence might apply to persons you are
> >> interested in, I assume you wouldn't throw it out. Since you don't
> >> know who it refers to yet, you can't add it to any person record
> >> already in your database. In what form would you want that evidence,
> >> and what would you like to be able to do with it?
>
> > I want a piece of paper, filed where it seemed to me to be a good idea
> > to file it.
>
> > Paper because I find it easier to shuffle paper than electrons -- for
> > one thing, I can spread two dozen pieces of paper on the table-top and
> > STILL be able to read them, something I find I cannot do with even 4
> > windows open on the monitor.
>
> > Cheryl
>
> Absolutely. For all my interest in computers, my real underlying
> interest is in information. Computers are only a tool to help dealing
> with this. I keep the evidence (and some printouts) in lever arch files
> roughly organised on a surname basis. Yes, you can take several pieces
> of paper and spread them out in front of you to study and compare them.
> That is something you cannot do on a computer.
>
> The issue of evidence for people you MIGHT be interested in is a good
> point. However, my feeling is that throwing it all straight onto a
> computer is likely to be unhelpful even if the software can cope.
>
> Peter

Anybody know what kind of approach IBM used in programming Watson for
Jeopardy?

J. Hugh Sullivan

May 25, 2011, 7:56:56 AM

On Tue, 24 May 2011 23:25:08 -0400, Wes Groleau
<Grolea...@FreeShell.org> wrote:

>On 05-24-2011 10:40, J. Hugh Sullivan wrote:
>> Unsourced "facts" are often just conclusions - those are pretty scary.
>
>Maybe not even that. Could be cousin Jane's memory of what her
>93-year-old grandfather told her about events that happened before
>his father was born.
>
>(Been there, read that.)
>
>--
>Wes Groleau

Events do tend to get expanded as the years pass.

Hugh

singhals

May 25, 2011, 10:46:48 AM

to gen...@rootsweb.com

Ian Goddard wrote:
> singhals wrote:
>> Yes, PAF too is said to handle audio and video, but since a majority of
>> my ancestors died before that technology was invented it's even less
>> useful. (g)
>
> And even if it were invented would you really want a recording of a very
> drunk ggfather singing out of tune?
>

(G) EXcellent point!

Cheryl

Bob LeChevalier

May 25, 2011, 11:20:59 AM

Tom Wetmore <tt...@verizon.net> wrote:
>Bob,
>
>I've used nFS and read all the API documentation. They have personas
>and persons and the user interface allows you to rearrange
>personas within persons. It's a two-tier system. I think it's a great
>example of how personas can be made to work. I think their data is
>pretty stinky, but that has no bearing on the technique in my opinion.

Their stinky data is merely the old IGI, PRFs and AFs (along with new
stuff that is being constantly added, and which is generally better).

A lot of that old IGI stuff was pretty bad, but they had the option to
throw everything away, or keep it and find ways to weed out the bad
over time. They seem to be trying the latter, and in most cases it is
working.

They haven't solved the problem of what to do when someone 20 or 50
years ago conflated two people that weren't related, and recorded that
conflation as a "fact".

At some point they will possibly decide that certain "personas" aren't
salvageable, and remove them. (I have the impression that they can
and will do so now, but it is on a record by record basis after being
presented with documented evidence.)

>>> Eventually every genealogist reaches the point when he or she has delved
>>> far enough back in time that the solid, firm trail of records has dried up.
>
>> I don't ever expect to reach that point. But then my genealogy has
>> never been confined to direct ancestors. Try to find all the
>> descendants of some ancestor born around the time of the revolution.
>> Every single one of those descendants is a cousin, and in many
>> families that task alone might take a lifetime.
>
>What I think you are saying is that you never plan to have to cross the chasm
>from person-based genealogy to record-based genealogy.

On the contrary, I have had to do some of what you call "record-based
genealogy" in the 20th century (especially since many records for
"living" persons are locked up), and my French work is back in the
17th century and is entirely record-based. But I never found a chasm
to be crossed. Lots of little roadblocks, but they only affect one
person or one record out of a zillion lines.

When you reach the point where there are no records at all, then there
is a chasm. I put no priority on trying to cross them. I have
ancestors in pre-Revolution Russia and the Ottoman empire, but no
significant likelihood in my lifetime to get access to records that I
can use (though they might exist). So I've chosen other pursuits,
mostly not for my own family, directly.

>The whole genealogical application business, in my opinion, caters to people who
>believe that.

I think it caters to people who are interested in tree structures
rather than source data as an organizing principle. Because that is
what sells enough to warrant "catering".

>Obviously from all I've written about this, I don't believe
>that. At my point in genealogical research I have 1000s of records
>that I haven't been able to assign to real people yet.

As far as I am concerned, they are all assigned to real people.
Whether I can attach those people to my tree is another question. I
generally don't enter it into my database until I think such
attachment is likely if not definite. The source records I either
leave where they lie (because I don't give a damn about building a
pile of paper that no one including me will ever look at). Or in some
cases, build spreadsheets for my extractions therefrom - I don't need
a genealogical application for that.

I would not imagine that there would be any general solution for
the problems of organizing records of random types that does not
involve linking them together in SOME way, or possibly in MANY ways.

>Here is my overall genealogical project. I descend from Loyalist Wetmores
>who were exiled to Canada at the end of the American Revolutionary
>War. By the 1850s many of the children and grandchildren of those
>families were returning to the United States where the economy was
>stronger. I descend from one of those returning families. My project
>is to understand that return migration by finding all the families who
>were involved, what their patterns of migration were, where they
>ended up, and where all their descendants are living now. This is a
>full research project. It is definitely a record-based project. You can
>probably understand my needs for effective ways to record all my
>evidence so I can access it in many ways to support the process of
>making conclusions.

But the ways you want to organize such evidence are not genealogical
(i.e. not based on family relationships), because you don't KNOW the
family relationships. You want a genealogical application to do a
fundamentally non-genealogical task.

>You might argue that this project is not a genealogical project, but
>rather a historical project, so I have no business expecting a
>genealogical application to be able to support me. I don't see it that
>way. I believe that genealogy is history, and the farther back in
>time we go, the more we have to act like historians to make
>progress. I want a genealogical application that can support what
>I am doing with this project.

While one can consider genealogy a branch of history, once you leave
the realm of family relationships, it is no longer genealogy.

The market isn't there for such a general application covering
historical projects as opposed to genealogical ones, in part because
every "historical project" is different both in goals and in methods.
If that were not the case, then professional historians would be using
such applications. (some of them might be, but you'd have to consult
such a professional historian).

Bob LeChevalier

May 25, 2011, 11:27:14 AM

Tom Wetmore <tt...@verizon.net> wrote:
>I concur with Cheryl. Basically all genealogical applications allow their
>records to contain links to external items, either files on the local file
>system or to URL on the world wide web.
>
>That doesn't solve the problem of where to store those links. As Cheryl
>points out, if you put a link in a person record, you are making the
>explicit statement that the linked-to evidence refers to that person.

I'd quibble and say that it MIGHT refer to that person. Genealogical
software sometimes does have degree-of-confidence markings on source
data, not that I have found them to be at all useful.

>The question I am trying to get to in this thread, is how are you going
>to handle that evidence, a link to an external something or other in
>this case, when you don't yet know what person should link to it?

Not in a genealogical application, because until you can link it to a
tree, it is not ***genealogical*** data.

The only obvious answer is that you organize the data in whatever way
is most useful to you, based on whatever commonalities there are in
the data that you think are important. If there are no such obvious
commonalities, you might just be reduced to assigning an ID number to
the evidence, and keeping a list indexed by ID number.
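Bob's fallback, an ID number per item of evidence plus whatever keys happen to be useful, takes only a few lines. `EvidenceLog` and its methods are hypothetical names for this sketch, and the sample descriptions are illustrative.

```python
from __future__ import annotations

import itertools
from collections import defaultdict

class EvidenceLog:
    """Evidence items filed under sequential ID numbers, plus optional keys
    (surname, place, ...) for whatever commonalities the items happen to share."""

    def __init__(self) -> None:
        self._next_id = itertools.count(1)
        self.items: dict = {}
        self._by_key = defaultdict(set)

    def add(self, description: str, **keys: str) -> int:
        item_id = next(self._next_id)
        self.items[item_id] = {"description": description, **keys}
        for name, value in keys.items():
            self._by_key[(name, value.lower())].add(item_id)
        return item_id

    def find(self, name: str, value: str) -> list:
        """IDs of items filed under the given key, in ID order."""
        return sorted(self._by_key.get((name, value.lower()), set()))

log = EvidenceLog()
a = log.add("Gravestone photo, William Riddle", surname="Riddle", place="Cardinham")
b = log.add("Gravestone photo, George Riddle", surname="Riddle", place="Cardinham")
c = log.add("Marriage record received by mail")  # no obvious commonality: ID only
print(log.find("surname", "riddle"))  # [1, 2]
```

Items with no useful keys are still reachable by their ID number, which is exactly the minimal case described above.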

Wes Groleau

May 25, 2011, 12:28:45 PM

On 05-25-2011 06:59, steve wrote:
> Anybody know what kind of approach IBM used in programming Watson for
> Jeopardy?

http://www.sciencemag.org/content/331/6020/999.short

http://www.sciencedirect.com/science/article/pii/S0160289611000262

http://www.medscape.com/viewarticle/740079

http://portal.acm.org/citation.cfm?id=1854275

http://en.wikipedia.org/wiki/Watson_%28computer%29

http://www.pbs.org/wgbh/nova/tech/will-watson-win-jeopardy.html

http://www.pbs.org/wgbh/nova/tech/smartest-machine-on-earth.html

http://www.google.com/search?q=watson+jeopardy

Wes Groleau

May 25, 2011, 12:35:49 PM

On 05-25-2011 11:20, Bob LeChevalier wrote:
> Their stinky data is merely the old IGI, PRFs and AFs (along with new
> stuff that is being constantly added, and which is generally better).

Welll, ....

One of the collections is images of marriage records where a column on
the left is "date license issued" and a column on the right is "date of
marriage." Both columns legibly labeled, yet the index consistently
reports the left one as date of marriage, even when the right one
legibly says "license returned—marriage did not occur."

This makes all of the other collections suspect, since most of them
offer the indexes only—no verification possible.

Another one estimates the date of birth from the date of death, even
though the record _and_ the index have the actual date of birth stated.

Wes Groleau

May 25, 2011, 12:40:54 PM

On 05-24-2011 16:26, Tom Wetmore wrote:
> That doesn't solve the problem of where to store those links. As Cheryl
> points out, if you put a link in a person record, you are making the
> explicit statement that the linked-to evidence refers to that person.

Always? Many programs support some variation of GEDCOM's TYPE tag.

No reason a link couldn't have a TYPE subrecord, or a NOTE or ....
Even if GEDCOM doesn't officially support it.

Maybe some software out there somewhere has tried that.

(Please don't take my comments as an enthusiastic endorsement
of GEDCOM)
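Qualifying a link rather than asserting it could look like this. This is a hypothetical sketch: the `qualifier` vocabulary is invented, and as far as I know a `TYPE` line directly under `OBJE` is not part of the GEDCOM standard, so it would be a private extension that other programs may ignore. The file names are made up.

```python
from __future__ import annotations

from dataclasses import dataclass

@dataclass(frozen=True)
class Link:
    target: str     # local file path or URL
    qualifier: str  # hypothetical vocabulary, e.g. "confirmed" or "possible"

    def to_gedcom(self, level: int = 1) -> list:
        # Assumed non-standard: a TYPE line directly under OBJE is an
        # unofficial extension, not something GEDCOM defines.
        return [f"{level} OBJE",
                f"{level + 1} FILE {self.target}",
                f"{level + 1} TYPE {self.qualifier}"]

def confirmed(links) -> list:
    """Keep only links the researcher has marked as definitely about this person."""
    return [link for link in links if link.qualifier == "confirmed"]

links = [Link("marriage_record.jpg", "confirmed"),
         Link("census_page.jpg", "possible")]
print("\n".join(links[1].to_gedcom()))
```

The link still lives in a person record, but the qualifier records that the connection is tentative rather than an explicit claim.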

Tom Wetmore

May 25, 2011, 12:43:48 PM

On Wednesday, May 25, 2011 11:20:59 AM UTC-4, Bob LeChevalier wrote:
> Tom Wetmore <tt...@verizon.net> wrote:
> >Bob,

> >What I think you are saying is that you never plan to have to cross the chasm
> >from person-based genealogy to record-based genealogy.
>
> On the contrary, I have had to do some of what you call "record-based
> genealogy" in the 20th century (especially since many records for
> "living" persons are locked up), and my French work is back in the
> 17th century and is entirely record-based. But I never found a chasm
> to be crossed. Lots of little roadblocks, but they only affect one
> person or one record out of a zillion lines.

> When you reach the point where there are no records at all, then there
> is a chasm. I put no priority on trying to cross them. I have
> ancestors in pre-Revolution Russia and the Ottoman empire, but no
> significant likelihood in my lifetime to get access to records that I
> can use (though they might exist). So I've chosen other pursuits,
> mostly not for my own family, directly.
>

I disagree. The chasm doesn't occur when you run out of records; it occurs
when you cannot automatically assign the records you find to a real person
yet.

> >The whole genealogical application business, in my opinion, caters to
> >people who believe that.
>
> I think it caters to people who are interested in tree structures
> rather than source data as an organizing principle. Because that is
> what sells enough to warrant "catering".

Agreed.

> >At my point in genealogical research I have 1000s of records
> >that I haven't been able to assign to real people yet.
>
> As far as I am concerned, they are all assigned to real people.
> Whether I can attach those people to my tree is another question. I
> generally don't enter it into my database until I think such
> attachment is likely if not definite.

This is pretty close to my point. When you refer to real people that you can't
attach to your tree yet, I would say you're talking about something close to
the persona concept. So to my way of thinking you are storing your evidence
that can't go directly into persons in your tree, into "holding personas".

This is exactly what I am forced to do with my genealogy program, which, I
guess I am now ashamed to say, I wrote (LifeLines). It is chock-a-block full
of person records, some the real tree persons, and some my "on hold" evidence
personas. Except for the fact that I have no user interface mechanism that
allows me to group the personas into persons when I decide how to do it, the
system works middling well. I am still forced to merge, which I now believe
to be the wrong thing to do with personas.

I am working on a new application now where I use drag and drop to manipulate
persona records within person records. I'll be using this to experiment with
the multi-tiered person tree approach. If I can't find a user interface
metaphor that makes working with the data easy to comprehend, I will likely
be forced to conclude that the "person tree" concept, as beautiful as I think
it is, just isn't practical for the real world.

What do you do when you decide that one of those stand-alone persons is the
same as one already in your tree, or when you decide that two of those
stand-alone persons are the same person but you are still not ready to add
him to your tree? I assume you have to merge data, that is, reduce the number
of your person records. I don't like this because I think of the persona
records as being a codification of evidence. Of course, you don't have to
think of it that way.

> The source records I either
> leave where they lie (because I don't give a damn about building a
> pile of paper that no one including me will ever look at). Or in some
> cases, build spreadsheets for my extractions therefrom - I don't need
> a genealogical application for that.

I agree with the former, but think that genealogical applications can do more
for you than spreadsheets can.

> I would not imagine that there would be any general solution for
> the problems of organizing records of random types that does not
> involve linking them together in SOME way, or possibly in MANY ways.

Well, for doing genealogy there aren't that many kinds of record types you
have to deal with. I'd add persona, event, and place records to the list, and
maybe have a catch-all object for everything else.

> >Here is my overall genealogical project. I descend from Loyalist Wetmores
> >who were exiled to Canada at the end of the American Revolutionary
> >War. By the 1850s many of the children and grandchildren of those
> >families were returning to the United States where the economy was
> >stronger. I descend from one of those returning families. My project
> >is to understand that return migration by finding all the families who
> >were involved, what their patterns of migration were, where they
> >ended up, and where all their descendants are living now. This is a
> >full research project. It is definitely a record-based project. You can
> >probably understand my needs for effective ways to record all my
> >evidence so I can access it in many ways to support the process of
> >making conclusions.
>
> But the ways you want to organize such evidence are not genealogical
> (i.e. not based on family relationships), because you don't KNOW the
> family relationships. You want a genealogical application to do a
> fundamentally non-genealogical task.

But I'm trying to discover the family relationships. How I want to organize
the data isn't complicated: by name, by names of parents if known, by
names of spouses and children when known, and by important event dates and
places when known. These are exactly the same properties that any genealogist
wants any of his records organized by.
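Organizing personas by those properties (name, parents, spouses, event dates and places) amounts to an inverted index. A minimal sketch, with a hypothetical `Persona` type and made-up sample data based on the John SMITH example from earlier in the thread:

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, field

@dataclass
class Persona:
    pid: str
    name: str
    parents: list = field(default_factory=list)
    spouses: list = field(default_factory=list)
    events: list = field(default_factory=list)  # (kind, date, place); blanks allowed

def build_index(personas) -> dict:
    """Map (property, normalized value) -> set of persona IDs."""
    index = defaultdict(set)
    for p in personas:
        keys = [("name", p.name)]
        keys += [("parent", n) for n in p.parents]
        keys += [("spouse", n) for n in p.spouses]
        for _kind, date, place in p.events:
            keys += [("date", date), ("place", place)]
        for prop, value in keys:
            if value:  # skip properties not known yet
                index[(prop, value.strip().lower())].add(p.pid)
    return index

def lookup(index, prop: str, value: str) -> set:
    """Persona IDs having this property value (empty set if none)."""
    return index.get((prop, value.strip().lower()), set())

personas = [
    Persona("P1", "John Smith", events=[("enlistment", "", "Alabama")]),
    Persona("P2", "John Smith", spouses=["Mary Jones"]),
]
index = build_index(personas)
```

Here `lookup(index, "name", "john smith")` returns both personas, which is exactly the "might be the same man" question; nothing has to be decided before the evidence is stored and findable.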

> >You might argue that this project is not a genealogical project, but
> >rather a historical project, so I have no business expecting a
> >genealogical application to be able to support me. I don't see it that
> >way. I believe that genealogy is history, and the farther back in
> >time we go, the more we have to act like historians to make
> >progress. I want a genealogical application that can support what
> >I am doing with this project.
>
> While one can consider genealogy a branch of history, once you leave
> the realm of family relationships, it is no longer genealogy.

Technically, I agree. Practically I don't think it matters much. There is no
real difference in my mind between researching persons for the purpose of
finding their family tree, versus researching them for the purpose of
finding out as much about them and their milieux as possible; one just goes
a little further, but the process of finding records and reasoning from them
is the same.

> The market isn't there for such a general application covering
> historical projects as opposed to genealogical ones, in part because
> every "historical project" is different both in goals and in methods.
> If that were not the case, then professional historians would be using
> such applications. (some of them might be, but you'd have to consult
> such a professional historian).

I agree there isn't a market, but not for the reason you give. Historical
projects do have different goals, but their methods are identical and easily
supported by software. The methods are search for records, extract evidence
from the records, reason about the evidence you have found, make and justify
conclusions based on the evidence and your reasoning, and publish your
findings. There are certainly lots of record types, and searching for them
and extracting evidence from them can be pretty unique, but there is nothing
different between projects in the overall process or in the fact that a
single software application could support them all. I just think the market
is small because it is small!

Tom Wetmore

Wes Groleau

May 25, 2011, 1:12:06 PM

On 05-25-2011 04:25, Tom Wetmore wrote:
> Given that a genealogical application does not support evidence it seems like

It's still not a given to me.

> askSam is almost ideal as the "second program" to handle evidence.
>
> I would sure like those askSam capabilities built into my genealogy app so I
> didn't have to move back and forth between two program. I might be
> naive, but I don't think it would be difficult to add the capabilities to a
> genealogy app. Every genealogy record can be viewed as a tree of
> structured name/value pairs (just think of any GEDCOM, XML, or jSON

The basic search in webtrees (and probably other apps) is pretty much a
full text search, with advanced having specific name/value pairs.

(We had to add a little code to discard hits on ID fields.)
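A full-text search that ignores ID fields might be sketched as follows. This is not webtrees' code (webtrees is written in PHP); it assumes plain `level TAG value` GEDCOM lines and treats anything of the form `@...@` as a cross-reference ID to skip. The sample reuses a name, date, and place from the gravestone index above; the `@I12@` and `@F12@` IDs are made up.

```python
import re

XREF = re.compile(r"@[^@\s]+@")  # GEDCOM cross-reference IDs such as @I12@ or @F12@

def full_text_search(gedcom_text: str, term: str) -> list:
    """Return GEDCOM lines whose value contains term, ignoring record IDs and pointers."""
    term = term.lower()
    hits = []
    for line in gedcom_text.strip().splitlines():
        parts = line.strip().split(" ", 2)
        if len(parts) < 3 or parts[1].startswith("@"):
            continue  # no value, or a "0 @I12@ INDI" record header
        value = XREF.sub("", parts[2])  # drop pointers like "FAMS @F12@"
        if term in value.lower():
            hits.append(line.strip())
    return hits

sample = """
0 @I12@ INDI
1 NAME Thomas /Hannan/
1 FAMS @F12@
1 DEAT
2 DATE 4 OCT 1890
2 PLAC Girvan, Ayrshire
"""
print(full_text_search(sample, "girvan"))
```

Without the ID filtering, a search for "12" would hit the record header and the family pointer even though no real data contains it; with it, the search returns nothing.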

Tom Wetmore

May 25, 2011, 3:02:47 PM

to Grolea...@freeshell.org

On Wednesday, May 25, 2011 12:40:54 PM UTC-4, Wes Groleau wrote:
> On 05-24-2011 16:26, Tom Wetmore wrote:
> > That doesn't solve the problem of where to store those links. As Cheryl
> > points out, if you put a link in a person record, you are making the
> > explicit statement that the linked-to evidence refers to that person.
>
> Always? Many programs support some variation of GEDCOM's TYPE tag.
>
> No reason a link couldn't have a TYPE subrecord, or a NOTE or ....
> Even if GEDCOM doesn't officially support it.
>
> Maybe some software out there somewhere has tried that.
>
> (Please don't take my comments as an enthusiastic endorsement
> of GEDCOM)
>

Wes,

I wasn't clear enough. The idea I was getting at is this. You have found
an item of evidence that you either copy onto your computer as a file
or you have as a URL text string. You are pretty sure this evidence refers
to a person you are interested in, but you haven't gotten enough info
yet to be sure of this or to know exactly what person it refers to.

Cheryl made the point that she would keep a link to that file or URL in
a person record in her database. My question was directed to the
situation where you don't yet have such a person record to hold the link.

My preferred approach is to codify that evidence into new persona
records and let them sit in the database while you collect more data.
These persona records are indexed and searchable and manipulable
and editable as easily as regular person records.

I personally find that this simple mechanism solves all problems
I have with designing a single system that can seamlessly handle
both record-based and person-based genealogy. I just need the
software to give the UI to do this.

Tom

Tom Wetmore

May 25, 2011, 3:25:38 PM

to Grolea...@freeshell.org

On Wednesday, May 25, 2011 12:35:49 PM UTC-4, Wes Groleau wrote:
> On 05-25-2011 11:20, Bob LeChevalier wrote:
> > Their stinky data is merely the old IGI, PRFs and AFs (along with new
> > stuff that is being constantly added, and which is generally better).
>
> Welll, ....
>
> One of the collections is images of marriage records where a column on
> the left is "date license issued" and a column on the right is "date of
> marriage." Both columns legibly labeled, yet the index consistently
> reports the left one as date of marriage, even when the right one
> legibly says "license returned—marriage did not occur."
>
> This makes all of the other collections suspect, since most of them
> offer the indexes only—no verification possible.
>
> Another one estimates the date of birth from the date of death, even
> though the record _and_ the index have the actual date of birth stated.

Wes,

I'm sure there are many more horror stories like your example.

Personally I think that the LDS should scrap all the data in the nFS tree
and then pre-load it with personas created from high-quality data
taken from vital records, censuses, and other sources from which it
is possible to derive quality, sourced, personas. Then don't allow
people to blindly upload GEDCOM files. Force them to enter new
personas only by hand, as an attempt to filter out the crappiest of
the crap. Personally I think it's a lost cause as soon as you allow
any user to add any data they like, which basically means that these
family trees of all mankind are inherently flawed.

The only solution, in my very humble opinion, is to highly restrict
the personas that can be added, and to highly restrict the ability of
users to join personas into persons and to join persons into
families (it would not be hard for simple algorithms to check the basic
sanity of such joins). The only hope for these trees is quality
input and high-quality procedures to create persons and families.
Personally I much prefer automatic algorithms to do most of
this work, as I have seen these algorithms do wonderful things
in analogous contexts (I'll be happy to write about these some
time). These are controversial views that many disagree with.
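The "simple algorithms" for checking the basic sanity of a proposed join could start as small as this. The rules, the two-year tolerance, and the sample years are assumptions for illustration, not rules used by any actual tree system.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

@dataclass
class Persona:
    name: str
    sex: Optional[str] = None  # "M", "F", or None when the record is silent
    birth_year: Optional[int] = None
    death_year: Optional[int] = None

def join_problems(personas: list, birth_tolerance: int = 2) -> list:
    """Return reasons these personas cannot be one person (empty list = passes)."""
    problems = []
    if len({p.sex for p in personas if p.sex}) > 1:
        problems.append("conflicting sex")
    births = [p.birth_year for p in personas if p.birth_year is not None]
    deaths = [p.death_year for p in personas if p.death_year is not None]
    if births and max(births) - min(births) > birth_tolerance:
        problems.append(f"birth years differ by more than {birth_tolerance}")
    if births and deaths and min(deaths) < max(births):
        problems.append("a recorded death precedes a recorded birth")
    return problems

# Hypothetical personas; the years are made up.
a = Persona("John Smith", "M", birth_year=1820)
b = Persona("John Smith", birth_year=1821)
print(join_problems([a, b]))  # []
```

A real system would need fuzzier rules (estimated dates, name variants), but even checks this crude would block the most obviously impossible joins.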

Tom

singhals

May 25, 2011, 3:41:35 PM

to gen...@rootsweb.com

Tom Wetmore wrote:
> On Wednesday, May 25, 2011 12:40:54 PM UTC-4, Wes Groleau wrote:
>> On 05-24-2011 16:26, Tom Wetmore wrote:
>>> That doesn't solve the problem of where to store those links. As Cheryl
>>> points out, if you put a link in a person record, you are making the
>>> explicit statement that the linked-to evidence refers to that person.
>>
>> Always? Many programs support some variation of GEDCOM's TYPE tag.
>>
>> No reason a link couldn't have a TYPE subrecord, or a NOTE or ....
>> Even if GEDCOM doesn't officially support it.
>>
>> Maybe some software out there somewhere has tried that.
>>
>> (Please don't take my comments as an enthusiastic endorsement
>> of GEDCOM)
>>
> Wes,
>
> I wasn't clear enough. The idea I was getting at is this. You have found
> an item of evidence that you either copy onto your computer as a file
> or you have as a URL text string. You are pretty sure this evidence refers
> to a person you are interested in, but you haven't gotten enough info
> yet to be sure of this or to know exactly what person it refers to.
>
> Cheryl made the point that she would keep a link to that file or URL in
> a person record in her database. My question was directed to the
> situation where you don't yet have such a person record to hold the link.
>

Then I create a person-record/persona for it. Hence the
dozen or so different entries with a single name.

> My preferred approach is to codify that evidence into new persona
> records and let them sit in the database while you collect more data.
> These persona records are indexed and searchable and manipulable
> and editable as easily as regular person records.
>

Apparently, you're calling what I do "codifying"; I call it
saving.

> I personally find that this simple mechanism solves all problems
> I have with designing a single system that can seamlessly handle
> both record-based and person-based genealogy. I just need the
> software to give the UI to do this.

PAF produces a UID for each entry; I'm pretty certain other
programs do too, particularly the ones that allow
synchronization of multiple databases.

Cheryl

Tom Wetmore

May 25, 2011, 3:57:39 PM

to gen...@rootsweb.com

> > Cheryl made the point that she would keep a link to that file or URL in
> > a person record in her database. My question was directed to the
> > situation where you don't yet have such a person record to hold the link.

> Then I create a person-record/persona for it. Hence the
> dozen or so different entries with a single name.

I agree.

> > My preferred approach is to codify that evidence into new persona
> > records and let them sit in the database while you collect more data.
> > These persona records are indexed and searchable and manipulable
> > and editable as easily as regular person records.

> Apparently, you're calling what I do "codifying"; I call it
> saving.

Codify simply means to be systematic. To create those persona records you
have to take information found in evidence and systematically create
persona records from it.

> > I personally find that this simple mechanism solves all problems
> > I have with designing a single system that can seamlessly handle
> > both record-based and person-based genealogy. I just need the
> > software to give the UI to do this.

What's not mentioned here is what happens to the personas when
you decide who they are. The usual answer is either 1) merge the data in
the persona record into the person record, if the person record
already exists; or 2) create the first version of a new person record
from the persona record. Merging is the key step, and it is the main thing I
object to. I want the personas to keep their integrity/identity for
all time, since they are extracted directly from original evidence. This
is why I replace the merging operation with a tree-building
operation. This tree is not the genealogical pedigree tree; it is
the tree of records that represent the same person. The leaves of
this person tree are the persona records. The other nodes are the
person records.
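A minimal Python sketch of the person tree described above, assuming in-memory objects. The class names Persona and PersonNode and their fields are hypothetical, not the schema of LifeLines or any other program. Persona records are leaves and are never edited; deciding that records are the same person only adds or removes tree edges.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional, Union

# Hypothetical classes; not the schema of LifeLines or any other program.


@dataclass
class Persona:
    """Leaf: exactly the facts extracted from one item of evidence (never edited)."""
    pid: str
    source: str
    facts: dict[str, str]


@dataclass
class PersonNode:
    """Interior node: a conclusion that everything below it is one individual."""
    pid: str
    children: list[Union[PersonNode, Persona]] = field(default_factory=list)

    def personas(self) -> list[Persona]:
        """Collect every persona leaf under this node, depth first."""
        found: list[Persona] = []
        for child in self.children:
            if isinstance(child, Persona):
                found.append(child)
            else:
                found.extend(child.personas())
        return found

    def detach(self, child_id: str) -> Optional[Union[PersonNode, Persona]]:
        """Undo one identification by removing a direct child; the child survives intact."""
        for i, child in enumerate(self.children):
            if child.pid == child_id:
                return self.children.pop(i)
        return None


# Usage: two personas judged to be the same Thomas Smith.
baptism = Persona("P1", "parish register", {"name": "Thomas Smith", "baptism": "1790"})
marriage = Persona("P2", "marriage certificate", {"name": "Thomas Smith", "father": "John Smith"})
thomas = PersonNode("I1", [baptism, marriage])
```

Because a person node can have other person nodes as children, several partial conclusions can later be grouped under one higher node, and any of them can be detached again without losing data.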

Tom

singhals

May 25, 2011, 5:16:03 PM5/25/11

to gen...@rootsweb.com

WHY would I want to keep them separate once I've decided
they're the same person? My bc, my mc, and my son's bc all
refer to ME as ME; what would be the POINT of keeping ME in
there 3 times?

Cheryl

Tom Wetmore

May 25, 2011, 5:50:43 PM5/25/11

to gen...@rootsweb.com

> WHY would I want to keep them separate once I've decided
> they're the same person? My bc, my mc, and my son's bc all
> refer to ME as ME; what would be the POINT of keeping ME in
> there 3 times?
>
> Cheryl

Cheryl,

You don't have yourself three times. You have yourself once. You
just have separate records, if you want, for your birth, marriage,
residence, and your own record refers to these.

I know you think this makes no sense for yourself, your
parents, your grandparents, or your great-grandparents. No one
does it that way there; it would be crazy.

You have to get over the hurdle of thinking about the people you
know and the people it is easy to find by following a clear trail
of vital records or census records or similar records.

These problems with evidence only come to the fore when you
are far enough back, and far enough into the realm of uncertainty, that
when you find evidence about people with names you might be
interested in, you don't yet know enough to say whether they
really are the people you are looking for. That is the
evidence that has to be worried about. There is no person in
your database you can add that information to, because you
haven't figured out who the people are yet. Say you have found
400 records like this. The question I am asking is: what are
you going to do with those 400 items of evidence so you can
quickly search them, arrange them, and use them to help you
make your conclusions about who the people really were?

We have gotten some very good answers to that question on
this thread. I have my own preferred answer, which is just
one of many that are being described here. My answer is
to create persona records for each item of evidence and
add those persona records to my database. Then when I
decide that one of those persona records really does refer
to one of the persons I am interested in I make the record of
that person refer to that persona record. I don't want to
merge the data in for two very good reasons:

1. I might be wrong and want to undo the merge. Undoing
a merge is very messy, especially after multiple merges
have happened.

2. I want the persona to be a codified representation of the
real evidence. I want it to be a record in my database that
contains exactly and only the information about a person
that came from one item of evidence.
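To make the undo argument concrete, here is a minimal Python sketch contrasting the two styles, assuming personas are kept in a dict keyed by a hypothetical persona id. In merge style the person record ends up holding copies of facts with no memory of which persona supplied them; in link style the person record holds only persona ids, so undoing an identification is a single list removal.

```python
# Hypothetical persona store: id -> facts extracted from one item of evidence.
PERSONAS = {
    "P1": {"name": "Thomas Smith", "birth": "1790"},
    "P2": {"name": "Thomas Smith", "marriage": "1815"},
    "P3": {"name": "Thomas Smith", "death": "1850"},
}


def merge_into(person: dict, persona_id: str) -> None:
    """Merge style: copy facts into the person record; their origin is not kept."""
    for key, value in PERSONAS[persona_id].items():
        person.setdefault(key, value)


def person_view(links: list) -> dict:
    """Link style: the person record is just persona ids; facts are derived on demand."""
    facts: dict = {}
    for pid in links:
        for key, value in PERSONAS[pid].items():
            facts.setdefault(key, value)
    return facts


merged: dict = {}
for pid in ("P1", "P2", "P3"):
    merge_into(merged, pid)

links = ["P1", "P2", "P3"]
links.remove("P2")  # undo: P2 turned out to be a different Thomas Smith
```

Backing P2 out of `merged` would require knowing which keys P2 contributed, which is exactly what merging threw away. Conflicting values are resolved here by keeping the first one seen, a simplification a real program would need to handle more carefully.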

There are many programs that allow you to create all the
persona records you want in the guise of extra person
records. Basically all of them. But when you decide that two
of your records are the same person you HAVE to merge
them. Well, the only exception I am aware of is the New
FamilySearch Tree, which explicitly supports personas
and persons.

So you have the multiple persona records as the result
of difficult research and not wanting to muddle your
evidence. For people easy to research you can be as
sloppy as you want.

Tom

Wes Groleau

May 25, 2011, 7:53:52 PM5/25/11

On 05-25-2011 15:25, Tom Wetmore wrote:
> Personally I think that the LDS should scrap all the data in the nFS tree
> and then pre-load it with personas created from high-quality data
> taken from vital records, censuses, and other sources from which it
> is possible to derive quality, sourced, personas. Then don't allow

Unfortunately, the records I complained about were created by volunteer
indexers from images digitized from microfilm. Obviously there is a
quality control inadequacy.

They aren't in a "tree"

Bob LeChevalier

May 25, 2011, 9:18:59 PM5/25/11

Wes Groleau <Grolea...@FreeShell.org> wrote:
>On 05-25-2011 11:20, Bob LeChevalier wrote:
>> Their stinky data is merely the old IGI, PRFs and AFs (along with new
>> stuff that is being constantly added, and which is generally better).
>
>Welll, ....
>
>One of the collections is images of marriage records where a column on
>the left is "date license issued" and a column on the right is "date of
>marriage." Both columns legibly labeled, yet the index consistently
>reports the left one as date of marriage, even when the right one
>legibly says "license returned—marriage did not occur."

Supposedly, if it is a consistent error, you can contact someone with
specifics, and they'll fix it. It sounds like you are talking about
one of the new databases on the search site, rather than the lineages
on new.family.search.org

>This makes all of the other collections suspect, since most of them
>offer the indexes only—no verification possible.
>
>Another one estimates the date of birth from the date of death, even
>though the record _and_ the index has the actual date of birth stated.

You definitely seem to be talking about databases of transcribed
records here. Even such problems are trivial compared to some of the
people in the lineages with 5 different first names, born in 5
different states and sometimes multiple countries, married to several
people, some who were dead before they were born, and others
simultaneously in multiple states, and listing someone as a son who is
also one of their 5 different fathers.

Bob LeChevalier

May 25, 2011, 9:34:26 PM5/25/11

Tom Wetmore <tt...@verizon.net> wrote:
>> >At my point in genealogical research I have 1000s of records
>> >that I haven't been able to assign to real people yet.
>>
>> As far as I am concerned, they are all assigned to real people.
>> Whether I can attach those people to my tree is another question. I
>> generally don't enter it into my database until I think such
>> attachment is likely if not definite.
>
>This is pretty close to my point. When you refer to real people that you can't
>attach to your tree yet, I would say you're talking about something close to
>the persona concept. So to my way of thinking you are storing your evidence
>that can't go directly into persons in your tree, into "holding personas".
>
>This is exactly what I am forced to do with my genealogy program, which I
>guess I am now ashamed to say, I wrote (LifeLines). It is chock-a-block full
>of person records, some the real tree persons, and some my "on hold" evidence
>personas.

That is the difference in my approach. I generally don't add someone
to my data base unless I have connected them to at least one other
person in my data base. Unlinked individuals are better dealt with in
a flat table (spreadsheet) than in a relational data base.

>Except for the fact that I have no user interface mechanism that
>allows me to group the personas into persons when I decide how to do it, the
>system works middling well. I am still forced to merge, which I now believe
>to be the wrong thing to do with personas.

With a spreadsheet, you could "combine" records (without merging) by
adding a column for a unique combined-person-id, and putting that id
in all of the relevant records. You haven't merged them - separating
them is merely a matter of erasing the id from the records that no
longer seem to fit.

A "person record" is merely the set of "persona records" that share a
single id (sorting or selecting on values in one column is a basic
spreadsheet function).
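Bob's spreadsheet scheme can be sketched in a few lines of Python, assuming the sheet is saved as CSV with one row per persona and a column, here called person_id (a hypothetical name), holding the combined-person id. Grouping rows by that column yields the "person records", and blanking a cell separates a row again without touching its data.

```python
from __future__ import annotations

import csv
import io
from collections import defaultdict

# Hypothetical sheet: one row per persona; person_id is blank while undecided.
SHEET = """row,name,event,year,person_id
1,Thomas Smith,baptism,1790,X1
2,Thomas Smith,marriage,1815,X1
3,Thomas Smith,burial,1850,
4,William Smith,baptism,1788,X2
"""


def load(text: str) -> list[dict[str, str]]:
    return list(csv.DictReader(io.StringIO(text)))


def persons(rows: list[dict[str, str]]) -> dict[str, list[str]]:
    """A 'person' is simply the set of rows sharing a non-blank person_id."""
    groups: dict[str, list[str]] = defaultdict(list)
    for r in rows:
        if r["person_id"]:
            groups[r["person_id"]].append(r["row"])
    return dict(groups)


def separate(rows: list[dict[str, str]], row_no: str) -> None:
    """Undo an identification by erasing the id; the row's own data is untouched."""
    for r in rows:
        if r["row"] == row_no:
            r["person_id"] = ""


rows = load(SHEET)
```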

>What do you do when you decide that one of those stand alone persons is the
>same as one already in your tree, or if you decide that two of those stand
>alone persons are the same person but you still are not ready to add him to
>your tree.

I have to manually add the data at that point. But I have to manually
add the data at some point. I just don't do it until I have someone
ready to be linked.

>I assume you have to merge data, that is reduce the number of
>your person records.

I occasionally have to merge persons, especially since the data I am
working on now often circles back to families I have already added,
and I don't realize that two people I have added are the same person
until I look at the index of persons (actually, I let the program look
for dupes). But these persons being merged are already not-raw data.

singhals

May 25, 2011, 9:36:21 PM5/25/11

to gen...@rootsweb.com

singhals wrote:
> PAF produces a UID for each entry; I'm pretty certain other
> programs do too, particularly the ones that allow
> synchronization of multiple databases.

Well color me surprised.

PAF and Legacy both assign a UID, which one can see in the GED.

Oddly enough, and despite what I've heard said, when I
imported the PAF file into a LEGACY file (direct import, not
GED), the UIDs changed. Seems to me that's a bit awkward
for someone trying to use those UIDs for anything.

Bob LeChevalier

May 25, 2011, 9:42:46 PM5/25/11

Tom Wetmore <tt...@verizon.net> wrote:
>I wasn't clear enough. The idea I was getting at is this. You have found
>an item of evidence that you either copy onto your computer as a file
>or you have as a URL text string. You are pretty sure this evidence refers
>to a person you are interested in, but you haven't gotten enough info
>yet to be sure of this or to know exactly what person it refers to.
>
>Cheryl made the point that she would keep a link to that file or URL in
>a person record in her database. My question was directed to the
>situation where you don't yet have such a person record to hold the link.
>
>My preferred approach is to codify that evidence into new persona
>records and let them sit in the database while you collect more data.
>These persona records are indexed and searchable and manipulable
>and editable as easily as regular person records.

Thinking about this, one simple solution would be to have a
genealogical data base of evidence personas only, and a second
genealogical data base of (generally linked) persons. I suspect that
most genealogical software can import from another data base created
by the same program fairly trivially. (It is drag and drop in
split-screen mode in Legacy). Putting a persona record into your
persons database (and linking or merging them with some other record
there) does not affect your persona data, so destroying an erroneous
merged record isn't a problem (assuming that you have created all your
persons from personas).

It isn't quite as useful as the separating function on nFS. But I
think it fits what you are describing.

Steven Gibbs

May 26, 2011, 5:09:09 AM5/26/11

"Bob LeChevalier" <loj...@lojban.org> wrote in message
news:pmart65m4u6g5dni0...@4ax.com...

>
> That is the difference in my approach. I generally don't add someone
> to my data base unless I have connected them to at least one other
> person in my data base. Unlinked individuals are better dealt with in
> a flat table (spreadsheet) than in a relational data base.

How do you do that when the data in the record is inadequate to provide
linkage? I used to keep my parish register extractions sorted in text
files, but it became impossible to find things once the files became
significantly large.

Imagine that you have the will of a John Smith which names his sons as
William and Thomas. Imagine also that you have a marriage certificate for a
Thomas Smith that names his father as John Smith. Clearly on the evidence
I've presented they may or may not be the same people. Can you search your
text files easily to find all candidates for the Thomas Smith who married,
subject to the constraint that his father is called John? If, not having
looked at the family for a few years, you later come across a document which
confirms that Thomas has a brother William, or a document which suggests
that Thomas has no brothers, can you rearrange your thought processes to
take this into account?

I certainly couldn't. I was going through old bits of paper thinking "I'm
sure I decided that these two definitely weren't the same but I haven't a
clue why I thought that at the time". It was only when I started to build
up possible persons from the available data that the more obscure
coincidences or contradictions started to jump out at me. So often
previously I would build up a person, realise that this person wasn't
relevant to my research and discard it. Then I'd do all the same work again
a few years later. Now that person stays, fully formed, in my database, and
if I can link an otherwise vague document to that person, I can tell, within
a trivial amount of time, what the relevance of that document to me is.

Steven

Tom Wetmore

May 26, 2011, 6:38:45 AM5/26/11

to gen...@rootsweb.com

> Oddly enough, and despite what I've heard said, when I
> imported the PAF file into a LEGACY file (direct import, not
> GED), the UIDs changed. Seems to me that's a bit awkward
> for someone trying to use those UIDs for anything.

Cheryl,

In GEDCOM the UIDs (not actually called that by GEDCOM) "belong" to
GEDCOM. That is, a GEDCOM import process is free to change the
id values to anything it likes. You can see why this is needed
because of clashes between the ids on incoming records and the ids
already in the database. Most programs start assigning ids as I1, I2, I3
for person ids, so if you import two files that were both exported by
the same program, even though the data is completely different,
there will be id clashes to resolve.
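A minimal Python sketch of the clash resolution described above, under simplifying assumptions: the incoming file is a list of GEDCOM lines, every @...@ token is a cross-reference (real GEDCOM values can also contain @@ escapes, which this ignores), and new ids keep their letter prefix and take the next free number. The function name remap_xrefs is hypothetical. The point is that record ids and every pointer to them must be renamed consistently.

```python
from __future__ import annotations

import re

# Sketch only: treats every @...@ token as a pointer (ignores @@ escapes).
POINTER = re.compile(r"@([^@\s]+)@")


def remap_xrefs(lines: list[str], taken: set[str]) -> list[str]:
    """Rename every @ID@ in an incoming file so none clashes with ids in `taken`."""
    mapping: dict[str, str] = {}
    used = set(taken)

    def new_id(old: str) -> str:
        if old not in mapping:
            prefix = re.match(r"[A-Za-z]*", old).group()  # keep the I/F/S style prefix
            n = 1
            while f"{prefix}{n}" in used:
                n += 1
            mapping[old] = f"{prefix}{n}"
            used.add(mapping[old])
        return mapping[old]

    # The same old id always maps to the same new id, so pointers stay valid.
    return [POINTER.sub(lambda m: "@" + new_id(m.group(1)) + "@", line) for line in lines]
```

For example, importing a file whose I1 and F1 collide with existing records turns `0 @I1@ INDI` into `0 @I3@ INDI` and every `HUSB @I1@` pointer into `HUSB @I3@` when I1 and I2 are already taken.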

Many people feel they should "own" their ids, and they learn to
navigate through their databases by memorizing a few key ids.
GEDCOM has a solution for this in the REFN tag. The value of
the REFN tag is a user-assigned id value. If your program is
GEDCOM compatible it should allow you to search by REFN values
as well as id values. Of course, if you try to import a GEDCOM file
that has a REFN id that conflicts with one already in your
database, the import process will have to resolve that somehow.
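A sketch of searching by REFN, assuming plain GEDCOM lines in which REFN appears at level 1 under a level-0 record. refn_index is a hypothetical helper, not part of any program's API. It returns a list per REFN value so that an import clash (two records claiming the same user-assigned id) shows up instead of being silently overwritten.

```python
from __future__ import annotations

from collections import defaultdict


def refn_index(lines: list[str]) -> dict[str, list[str]]:
    """Map each user-assigned REFN value to the xref ids of the records carrying it.

    Hypothetical helper; assumes REFN at level 1 under a level-0 record.
    """
    index: dict[str, list[str]] = defaultdict(list)
    current = None  # xref id of the level-0 record we are inside
    for line in lines:
        parts = line.strip().split(" ", 2)
        if parts[0] == "0":
            current = parts[1].strip("@") if len(parts) > 1 and parts[1].startswith("@") else None
        elif parts[0] == "1" and len(parts) == 3 and parts[1] == "REFN" and current:
            index[parts[2]].append(current)
    return dict(index)
```

Any entry whose list has more than one id is a conflict the import process would have to resolve.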

Tom

Tom Wetmore

May 26, 2011, 6:45:53 AM5/26/11

On Wednesday, May 25, 2011 9:42:46 PM UTC-4, Bob LeChevalier wrote:

> Thinking about this, one simple solution would be to have a
> genealogical data base of evidence personas only, and a second
> genealogical data base of (generally linked) persons. I suspect that
> most genealogical software can import from another data base created
> by the same program fairly trivially. (It is drag and drop in
> split-screen mode in Legacy). Putting a persona record into your
> persons database (and linking or merging them with some other record
> there) does not affect your persona data, so destroying an erroneous
> merged record isn't a problem (assuming that you have created all your
> persons from personas).

I agree that this approach would prevent you from losing your personas
when you decide what person a persona belongs to. It would seem that
you would still have the unmerge problem (that is, the difficulty of
undoing the addition of a persona's data to a person), but with the
original personas still around, I bet clever software might be able to
figure out the undo.
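One way such clever software might work out the undo, sketched in Python under the assumption that the person record remembers, for each fact it holds, which persona ids supplied it. The MergedPerson class is hypothetical. Backing out a persona then drops only the facts that no other persona still supports, which is possible precisely because the original personas were kept.

```python
from __future__ import annotations

from collections import defaultdict


class MergedPerson:
    """A person built by copying persona facts in, remembering who supplied each fact.

    Hypothetical class, not any program's merge routine.
    """

    def __init__(self) -> None:
        # (field, value) -> ids of the personas that supplied it
        self.support: dict[tuple[str, str], set[str]] = defaultdict(set)

    def add(self, persona_id: str, facts: dict[str, str]) -> None:
        for name, value in facts.items():
            self.support[(name, value)].add(persona_id)

    def unmerge(self, persona_id: str) -> None:
        """Back out one persona: drop facts that no remaining persona supports."""
        for key in list(self.support):
            self.support[key].discard(persona_id)
            if not self.support[key]:
                del self.support[key]

    def facts(self) -> dict[str, list[str]]:
        """Current view of the person; a field may carry several competing values."""
        view: dict[str, list[str]] = defaultdict(list)
        for name, value in sorted(self.support):
            view[name].append(value)
        return dict(view)


p = MergedPerson()
p.add("P1", {"name": "Thomas Smith", "birth": "1790"})
p.add("P2", {"name": "Thomas Smith", "father": "John Smith"})
```

After `p.unmerge("P2")` the father fact disappears, while the name survives because P1 still supports it.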

If a new software application were designed to support this idea you
wouldn't have to have two databases.

I think this ranks as a new answer to this thread's question.

Tom

Tom Wetmore

May 26, 2011, 6:53:37 AM5/26/11

Steven,

That's the strongest argument for the persona method as the way to
store evidence that I think I've seen, and you didn't use the word
even once!

It also sounds like you have solved the merge problem by just not
doing any. So I surmise that when you decide that one of your personas
("fully formed" in your terms) is one of your persons of interest you
add info from the persona to the person. So when you decide you were
wrong you always have the original persona around to help you back
out your change to the person record.

I think this qualifies as another good approach to answering this
thread's question.

Tom

Bob LeChevalier

May 26, 2011, 9:06:45 AM5/26/11

"Steven Gibbs" <stev...@sgibbs1.freeserve.co.uk> wrote:
>"Bob LeChevalier" <loj...@lojban.org> wrote in message
>news:pmart65m4u6g5dni0...@4ax.com...
>>
>> That is the difference in my approach. I generally don't add someone
>> to my data base unless I have connected them to at least one other
>> person in my data base. Unlinked individuals are better dealt with in
>> a flat table (spreadsheet) than in a relational data base.
>
>How do you do that when the data in the record is inadequate to provide
>linkage?

Then I don't add it to the genealogical data base, and it remains in
the spreadsheet.

(Actually, for one parish, I simply add individuals unlinked, because
my experience has been that 90-95% of the records will be linked in
eventually as a parent, spouse or child of someone else. But this is
a special case even for me.)

>I used to keep my parish register extractions sorted in text
>files, but it became impossible to find things once the files became
>significantly large.

That is why I use spreadsheets. The key fields are kept together in a
single column, and I can sort or search or select from the entire
sheet or a single column. Finding does not seem to be a problem (I
have some 7500 vital records for one parish and 6500 for a second, so
these aren't small files).

>Imagine that you have the will of a John Smith which names his sons as
>William and Thomas. Imagine also that you have a marriage certificate for a
>Thomas Smith that names his father as John Smith. Clearly on the evidence
>I've presented they may or may not be the same people. Can you search your
>text files easily to find all candidates for the Thomas Smith who married,
>subject to the constraint that his father is called John? If, not having
>looked at the family for a few years, you later come across a document which
>confirms that Thomas has a brother William, or a document which suggests
>that Thomas has no brothers, can you rearrange your thought processes to
>take this into account?

I've done a little work in British genealogy, including one line of
Smiths, and indeed it is difficult. Alas, I don't accumulate evidence
for such lines - if I can't find a link with what is immediately
accessible, I simply move on - I have thousands more that I can spend
time on.

>I certainly couldn't. I was going through old bits of paper thinking "I'm
>sure I decided that these two definitely weren't the same but I haven't a
>clue why I thought that at the time".

I have occasionally made a text note that indicates that a certain
record does NOT apply to a specific someone in my tree, and why it
doesn't.

>It was only when I started to build
>up possible persons from the available data that the more obscure
>coincidences or contradictions started to jump out at me. So often
>previously I would build up a person, realise that this person wasn't
>relevant to my research and discard it. Then I'd do all the same work again
>a few years later. Now that person stays, fully formed, in my database, and
>if I can link an otherwise vague document to that person, I can tell, within
>a trivial amount of time, what the relevance of that document to me is.

If I have a person, fully formed, then likely I have several other
people who are definitely attached to him, even though they might or
might not be relevant to my tree. Your two Smith examples above, will
and marriage certificate, each link (at least) 3 people. You thus
have not several individual jigsaw puzzle pieces, but several
connected groups of pieces - these tend to be easier to fit, and in
any event are meaningful objects of study in themselves.

Steve Hayes

May 26, 2011, 1:24:19 AM5/26/11

to gen...@rootsweb.com

On Wed, 25 May 2011 21:36:21 -0400, in soc.genealogy.computing you wrote:

>singhals wrote:
>> PAF produces a UID for each entry; I'm pretty certain other
>> programs do too, particularly the ones that allow
>> synchronization of multiple databases.
>
>Well color me surprised.
>
>PAF and Legacy both assign a UID, which one can see in the GED.
>
>Oddly enough, and despite what I've heard said, when I
>imported the PAF file into a LEGACY file (direct import, not
>GED), the UIDs changed. Seems to me that's a bit awkward
>for someone trying to use those UIDs for anything.

What is a UID, and where can I find it?

I regularly export my data from FHS to Legacy.

FHS assigns an RID, which Legacy calls a RIN, but if it imports them from the
GEDCOM file, Legacy scrambles them.

So what I do is import the exported records to PAF 4, and import them directly
from PAF to Legacy. Then the Legacy RINs correspond to the FHS ones.

But what is the UID?

--
Steve Hayes from Tshwane, South Africa
Web: http://hayesfam.bravehost.com/stevesig.htm
Blog: http://methodius.blogspot.com
E-mail - see web page, or parse: shayes at dunelm full stop org full stop uk

singhals

May 26, 2011, 2:59:34 PM5/26/11

to gen...@rootsweb.com

Tom Wetmore wrote:
>> Oddly enough, and despite what I've heard said, when I
>> imported the PAF file into a LEGACY file (direct import, not
>> GED), the UIDs changed. Seems to me that's a bit awkward
>> for someone trying to use those UIDs for anything.
>
> Cheryl,
>
> In GEDCOM the UID's (not actually called that by GEDCOM) "belong" to
> GEDCOM. That is, a GEDCOM import process is free to change the

Uh, the two GEDs I created yesterday called 'em UID. The
UID is created by the program when the entry is first made;
matching these allows you to be sure you've picked up the
right person.

0 @I7@ INDI
1 NAME ss dd /NAME/
2 SURN name
2 GIVN ss dd
1 SEX M
1 BIRT
2 DATE 17 Nov 1889
2 PLAC Elstie, Cambria, PA
1 DEAT
2 DATE 14 Oct 1964
2 PLAC Altoona, Blair, PA
1 _UID 59360E835210204B9F0FE0685C9D1B7F7635
1 FAMS @F3@
1 FAMC @F4@

and

0 @I7@ INDI
1 NAME ss dd /NAME/
2 GIVN ss dd
2 SURN NAME
1 SEX M
1 BIRT
2 DATE 17 Nov 1889
2 PLAC Elstie, Cambria, PA
1 DEAT
2 DATE 14 Oct 1964
2 PLAC Altoona, Blair, PA
1 _UID 59360E835210204B9F0FE0685C9D1B7F7635

But, I see I erred mentioning it -- I picked up someone else
with the same name when I looked at the GEDs. This UID did
stay the same, which is what I thought would happen.
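To show why a stable _UID is useful, here is a small Python sketch that pairs INDI records from two GEDCOM exports by their _UID value, ignoring the xref ids that an import is free to renumber. It assumes _UID appears at level 1, as in the two exports above; _UID is a vendor extension tag, so other programs may place or format it differently, and the helper names are hypothetical.

```python
from __future__ import annotations

# Hypothetical helpers; _UID is a vendor extension, assumed here at level 1.


def uids(lines: list[str]) -> dict[str, str]:
    """Map each _UID value to the xref id of the INDI record that carries it."""
    found: dict[str, str] = {}
    current = None  # xref of the INDI record we are inside, if any
    for line in lines:
        parts = line.strip().split(" ", 2)
        if parts[0] == "0":
            current = parts[1].strip("@") if len(parts) == 3 and parts[2] == "INDI" else None
        elif parts[0] == "1" and len(parts) == 3 and parts[1] == "_UID" and current:
            found[parts[2]] = current
    return found


def match_by_uid(a: list[str], b: list[str]) -> dict[str, tuple[str, str]]:
    """Pair records from two exports that share a _UID, whatever their xref ids are."""
    ua, ub = uids(a), uids(b)
    return {uid: (ua[uid], ub[uid]) for uid in ua.keys() & ub.keys()}
```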

> id values to anything it likes. You can see why this is needed
> because of clashes between the ids on incoming records and the ids
> already in the database. Most programs start assigning ids as I1, I2, I3
> for person ids, so if you import two files that were both exported by
> the same program, even though the data is completely different,
> there will be id clashes to resolve.
>

You're talking about RINs and internal markers specific to a
database. Add someone, delete someone, and the RIN changes.
A UID is said to be, er, immutable.

> Many people feel they should "own" their ids, and they learn to
> navigate through their databases by memorizing a few key ids.
> GEDCOM has a solution for this in the REFN tag. The value of
> the REFN tag is a user-assigned id value. If your program is
> GEDCOM compatible it should allow you to search by REFN values
> as well as id values. Of course, if you try to import a GEDCOM file
> that has a REFN id that conflicts with one already in your
> database, the import process will have to resolve that somehow.

Yes, of course. But the UID is neither an RIN nor a REFN.

Cheryl

singhals

May 26, 2011, 3:03:39 PM5/26/11

to haye...@yahoo.com, gen...@rootsweb.com

Steve Hayes wrote:
> On Wed, 25 May 2011 21:36:21 -0400, in soc.genealogy.computing you wrote:
>
>> singhals wrote:

>>> PAF produces a UID for each entry; I'm pretty certain other
>>> programs do too, particularly the ones that allow
>>> synchronization of multiple databases.
>>
>> Well color me surprised.
>>
>> PAF and Legacy both assign a UID, which one can see in the GED.
>>
>> Oddly enough, and despite what I've heard said, when I
>> imported the PAF file into a LEGACY file (direct import, not
>> GED), the UIDs changed. Seems to me that's a bit awkward
>> for someone trying to use those UIDs for anything.
>

> What is a UID, and where can I find it?
>

Unique IDentifier -- new with PAF4 as I recall, and I think
inspired by another program that used them. As I recall, we
were told that if two people used two computers to enter the
same data each computer would generate a different UID for
that person. I don't /think/ you can see them anywhere but
the GED. The computer can, and in PAF when you run the
match/merge routine, it asks if you want to use UID.

> I regularly export my data from FHS to Legacy.
>
> FHS assigns an RID, which Legacy callas a RIN, but if it imports them from the
> GEDCOM file, Legacy scrambles them.
>
> So what I do is import the exported records to PAF 4, and import them directly
> from PAF to Legacy. Then the Legacy RINs correspond to the FHS ones.
>
> But what is the UID?

The UID is not a RID/RIN; it is specifically assigned to an
individual record and does not mutate when moved around.

Cheryl

singhals

May 26, 2011, 3:06:06 PM5/26/11

to gen...@rootsweb.com

singhals wrote:
> Well color me surprised.
>
> PAF and Legacy both assign a UID, which one can see in the GED.
>
> Oddly enough, and despite what I've heard said, when I
> imported the PAF file into a LEGACY file (direct import, not
> GED), the UIDs changed. Seems to me that's a bit awkward
> for someone trying to use those UIDs for anything.
>

[BLUSH] Sorry; it does what it's supposed to; I have two
guys with the same name and I picked up the wrong one when I
looked before.

The UIDs did remain stable and unchanged from one program to
the other.

The UID is NOT an RIN/RID; it is a multi-digit number said
to be totally unique (hence, UNIQUE Identifier).

Cheryl

Tom Wetmore

May 26, 2011, 3:11:03 PM5/26/11

to gen...@rootsweb.com

Cheryl,

I stand corrected. I was talking about standard GEDCOM. The _UID tag is an
extension tag used by some systems. As you say, UIDs should be immutable
for all time and space. But since it's not a standard tag, there's no telling what
might happen to it on import to arbitrary programs. Any program that changes
its value on import should be taken behind the barn and shot.

Tom

Steve Hayes

May 26, 2011, 9:11:42 PM5/26/11

On Thu, 26 May 2011 10:09:09 +0100, "Steven Gibbs"
<stev...@sgibbs1.freeserve.co.uk> wrote:

>
>"Bob LeChevalier" <loj...@lojban.org> wrote in message
>news:pmart65m4u6g5dni0...@4ax.com...
>>
>> That is the difference in my approach. I generally don't add someone
>> to my data base unless I have connected them to at least one other
>> person in my data base. Unlinked individuals are better dealt with in
>> a flat table (spreadsheet) than in a relational data base.
>
>How do you do that when the data in the record is inadequate to provide
>linkage? I used to keep my parish register extractions sorted in text
>files, but it became impossible to find things once the files became
>significantly large.
>
>Imagine that you have the will of a John Smith which names his sons as
>William and Thomas. Imagine also that you have a marriage certificate for a
>Thomas Smith that names his father as John Smith. Clearly on the evidence
>I've presented they may or may not be the same people. Can you search your
>text files easily to find all candidates for the Thomas Smith who married,
>subject to the constraint that his father is called John? If, not having
>looked at the family for a few years, you later come across a document which
>confirms that Thomas has a brother William, or a document which suggests
>that Thomas has no brothers, can you rearrange your thought processes to
>take this into account?

It might be easier if you transfer the data from a spreadsheet to a database
program. It is quite easy to do that in most cases. Database programs usually
have better reporting facilities.
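As an illustration of the kind of query Steven Gibbs describes (every Thomas Smith whose father is recorded as John), here is a minimal sketch using Python's built-in sqlite3 module. The table layout, one row per persona with a father_name column, is an assumption for illustration and not the schema of any particular genealogy program; the rows are invented.

```python
import sqlite3

# One row per persona extracted from a single source; nothing is merged yet.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE persona (
    id INTEGER PRIMARY KEY,
    source TEXT, role TEXT,
    given TEXT, surname TEXT, father_name TEXT)""")
conn.executemany(
    "INSERT INTO persona (source, role, given, surname, father_name) "
    "VALUES (?, ?, ?, ?, ?)",
    [
        ("Will of John Smith", "son", "Thomas", "Smith", "John Smith"),
        ("Will of John Smith", "son", "William", "Smith", "John Smith"),
        ("Marriage certificate", "groom", "Thomas", "Smith", "John Smith"),
        ("Another marriage certificate", "groom", "Thomas", "Smith", "Henry Smith"),
    ],
)

# All candidate Thomas Smiths whose father is recorded as a John.
candidates = conn.execute(
    """SELECT source, role FROM persona
       WHERE given = 'Thomas' AND surname = 'Smith'
         AND father_name LIKE 'John %'
       ORDER BY id"""
).fetchall()
for source, role in candidates:
    print(source, role)
```

Because nothing is merged, later evidence (a document showing Thomas had a brother William, or had none) only changes which personae you link, not the stored rows.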

Steve Hayes

May 26, 2011, 9:15:47 PM

On Thu, 26 May 2011 15:03:39 -0400, singhals <sing...@erols.com> wrote:

>Steve Hayes wrote:
>> On Wed, 25 May 2011 21:36:21 -0400, in soc.genealogy.computing you wrote:
>>
>>> singhals wrote:
>>>> PAF produces a UID for each entry; I'm pretty certain other
>>>> programs do too, particularly the ones that allow
>>>> synchronization of multiple databases.
>>>
>>> Well color me surprised.
>>>
>>> PAF and Legacy both assign a UID, which one can see in the GED.
>>>
>>> Oddly enough, and despite what I've heard said, when I
>>> imported the PAF file into a LEGACY file (direct import, not
>>> GED), the UIDs changed. Seems to me that's a bit awkward
>>> for someone trying to use those UIDs for anything.
>>
>> What is a UID, and where can I find it?
>>
>
>Unique IDentifier -- new with PAF4 as I recall, and I think
>inspired by another program that used them. As I recall, we
>were told that if two people used two computers to enter the
>same data each computer would generate a different UID for
>that person. I don't /think/ you can see them anywhere but
>the GED. The computer can, and in PAF when you run the
>match/merge routine, it asks if you want to use UID.

When I do that it asks me if I want to use the Ancestral File Number.

That works quite well if the person has an Ancestral File Number, but those
numbers have now been discontinued, and no new ones are being generated.

>
>> I regularly export my data from FHS to Legacy.
>>
>> FHS assigns an RID, which Legacy calls a RIN, but if it imports them from the
>> GEDCOM file, Legacy scrambles them.
>>
>> So what I do is import the exported records to PAF 4, and import them directly
>> from PAF to Legacy. Then the Legacy RINs correspond to the FHS ones.
>>
>> But what is the UID?
>
>The UID is not a RID/RIN; it is specifically assigned to an
>individual record and does not mutate when moved around.

I'll have to look at some recent Gedcom files to see if I can find them.

singhals

May 26, 2011, 10:04:08 PM

You could be right, but I would have sworn it was in the
GedStan. (shrug) I've been wrong before -- and /recently/.

Cheryl

Bob Melson

May 26, 2011, 11:31:45 PM

On Thursday 26 May 2011 20:04, singhals (sing...@erols.com) opined:

Y'all'd have to check the archives for the down'n'dirty, but Wes and I and
I don't remember who all else had a discussion of this very thing (_UID,
_UUID, _GUID) some time back. I contended then that UUID (Universal
Unique ID) and its close kin, UID and GUID, meant that the number
assigned to Cousin Mortimer should be the same universally - here, there,
wherever it appears. Not so. The "universe" is the machine where the
data appears and on which the software resides by which the UUID is
generated and it's only there that it's guaranteed to be unique. Take
that same data to another machine or other software and, guess what?, the
Universal Unique ID will be different. Or, say, Mort's data changes - ta
DA, you may (or may not) get yet another UUID.

The only use I can see for UUIDs is in determining the origin of a
particular record - if I publish Mort's data and include the UUID and some
time later find that Snively Whiplash has published the exact same data
with the exact same UUID and has claimed it to be his own, then I'll know
what to think about ol' Snively, won't I?

Swell Ol' Bob

--
Robert G. Melson | Rio Grande MicroSolutions | El Paso, Texas
-----
The greatest tyrannies are always perpetrated
in the name of the noblest causes -- Thomas Paine

Tom Wetmore

May 27, 2011, 3:55:01 AM

On Thursday, May 26, 2011 11:31:45 PM UTC-4, Bob Melson wrote:
>
> Y'all'd have to check the archives for the down'n'dirty, but Wes and I and
> I don't remember who all else had a discussion of this very thing (_UID,
> _UUID, _GUID) some time back. I contended then that UUID (Universal
> Unique ID) and its close kin, UID and GUID, meant that the number
> assigned to Cousin Mortimer should be the same universally - here, there,
> wherever it appears. Not so. The "universe" is the machine where the
> data appears and on which the software resides by which the UUID is
> generated and it's only there that it's guaranteed to be unique. Take
> that same data to another machine or other software and, guess what?, the
> Universal Unique ID will be different. Or, say, Mort's data changes - ta
> DA, you may (or may not) get yet another UUID.
>
> The only use I can see for UUIDs is in determining the origin of a
> particular record - if I publish Mort's data and include the UUID and some
> time later find that Snively Whiplash has published the exact same data
> with the exact same UUID and has claimed it to be his own, then I'll know
> what to think about ol' Snively, won't I?
>
> Swell Ol' Bob
>

Bob,

You were right and should have stuck by your guns!!

I don't know where the others got the interpretation that a UID should
be unique only within one database. This is certainly not a GEDCOM rule
since GEDCOM doesn't even have the UID concept. Their interpretation
can only be coming from some vendor's misinterpretation, so to treat that
misinterpretation as if it were a rule or the way things should be is wrong.

A UUID is intended to be unique for all time and place. If a vendor says they
use UIDs where the U means "universal" and they don't support this, then they
don't support UIDs. A program should not alter a UID value upon import.
If it does, though, who really cares, since there's no way to take advantage
of UIDs in genealogy. So no wonder it doesn't matter to anyone (yet).

Tom

Bob LeChevalier

May 27, 2011, 8:35:28 AM

Actually, nFS (new FamilySearch) is indeed generating new numbers, but calls
them something different. Legacy handles them as of version 7.5.
(Actually, it has space for a User ID, an AF number, and a FamilySearch
ID number, the latter being the new number.)

Of course the only way to get a Family Search ID number is to enter
your data on their site.

Bob Melson

May 27, 2011, 10:53:57 AM

On Fri, 27 May 2011 00:55:01 -0700 (PDT), Tom Wetmore<tt...@verizon.net>
wrote:

Seems to me at least one requirement for universality is that the same data
input on different machines results in the same output on those machines.
This is absolutely not true - in my experience - when dealing with UUIDs.
Worse yet, not only does the same data NOT result in identical UUIDs when
input on different machines running different software, it doesn't even
result in identical UUIDs when input on different machines using
_identical_ software. This would, IMO, support the contention that
uniqueness and universality are restricted to a single machine and
application on that machine.

Richard Smith

May 27, 2011, 12:33:03 PM

On May 23, 12:52 pm, Tom Wetmore <t...@verizon.net> wrote:
> This thread is an offshoot from the Linux thread that is going off on a number of tangents.
>
> How should we store evidence in genealogical databases?

[I've been away for a few days, so apologies for coming back into the
discussion rather late.]

I regard genealogical research as a seven stage process, and I tend to
handle the data generated at each stage in different ways.

1) Planning

Sometimes I've got a specific objective in mind -- something like
"find out who Thomas Smith's parents are". For each of these
objectives, I create a text file with a few notes about where might be
a good place to search for evidence, where I've already looked, and a
mixture of speculation and notes to myself. I name the file by
surname, name and some additional suffix (say "the boot-maker") to
make the person unique; if there's more than one plan per person
(there rarely is), I'll disambiguate it in some further way. I also
use symlinks (a bit like Windows shortcuts) to maintain an index of
such plans by ancestor number in a separate directory.
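A minimal Python sketch of that indexing scheme, assuming a POSIX system where symbolic links are available. The directory names, the plan file name and the mapping from plan to ancestor number (presumably an Ahnentafel-style number) are made up for illustration, and index_plans is a hypothetical helper, not Richard's own tooling.

```python
import os
import tempfile

def index_plans(plan_dir, index_dir, ancestor_numbers):
    """Create index_dir/<ancestor number> symlinks pointing at plan files.

    ancestor_numbers maps a plan file name to an ancestor number.
    """
    os.makedirs(index_dir, exist_ok=True)
    for plan_name, number in ancestor_numbers.items():
        link = os.path.join(index_dir, str(number))
        target = os.path.join(plan_dir, plan_name)
        if os.path.lexists(link):
            os.remove(link)  # refresh a stale link rather than failing
        # A relative link keeps working if the whole tree is moved or backed up.
        os.symlink(os.path.relpath(target, index_dir), link)

# Demonstration in a throwaway directory.
root = tempfile.mkdtemp()
plans = os.path.join(root, "plans")
os.makedirs(plans)
with open(os.path.join(plans, "smith-thomas-the-boot-maker.txt"), "w") as f:
    f.write("Goal: find Thomas Smith's parents\n")
index_plans(plans, os.path.join(root, "by-number"),
            {"smith-thomas-the-boot-maker.txt": 12})
```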

As I've got further back, I've found more and more frequently that I
don't have such a specific objective. The ultimate objective is usually
to push back one or more generations, but I'm no longer specifically
targeting records with that individual in mind; instead, I'm gathering
as much information as I can on the surname in the area. I have a
directory with a more general set of plan files with just a surname
and area (typically a parish name somewhere near the centre of the
area of interest).

I use a revision control system (currently CVS) to keep track of
changes to these plan files, and also to assist in backing them up.

2) Searching

Whenever I search for something, I try to note the fact that I've done
it in one of the plan files. This is particularly important if the
search fails to find anything. If I'm in a records office, I tend to
have a printout of the plan file and scribble on it, typing the notes
up later. Sometimes I do the same for on-line research.

I find on-line sites such as ancestry.com and familysearch.org
particularly troublesome in this regard -- it's far too easy to spend
an hour or two searching for things and forgetting to note anything
down. Neither site keeps a log (at least, not that's available to the
user) of what you've searched for, so you can't go back and write it
up later.

For this reason I no longer use familysearch.org directly. The only
time I ever used it was to look up things on the IGI, so I wrote a
Perl script to drive the (old) site, do searches for me, download the
full data set as GEDCOM and log each search I do to the appropriate
plan file. The program requires me to associate the search with a
specific plan, so I can't avoid recording the fact I've done a search.
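Richard's original was a Perl script driving the old IGI site; the sketch below reproduces only the logging half of the idea, in Python, with a hypothetical log_search function and an assumed one-line-per-search plan-file format. Making the plan file a required argument is what makes it impossible to search without recording it, and a search with zero hits is logged like any other.

```python
import datetime
import os
import tempfile

def log_search(plan_path, site, query, hits):
    """Append one dated line per search to an existing plan file.

    plan_path is mandatory: a search cannot be logged without naming its plan.
    """
    if not os.path.exists(plan_path):
        raise FileNotFoundError(f"no such plan: {plan_path}")
    stamp = datetime.date.today().isoformat()
    with open(plan_path, "a", encoding="utf-8") as plan:
        plan.write(f"[{stamp}] searched {site} for {query!r}: {hits} hit(s)\n")

# Demonstration: a failed search still leaves a trace in the plan.
plan = os.path.join(tempfile.mkdtemp(), "smith-southampton.txt")
open(plan, "w").close()
log_search(plan, "IGI", "Smith, Southampton, 1800-1820", 0)
```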

Putting these search logs into a database, and associating them with a
source and/or repository, would be an obvious improvement. I did
briefly experiment with gnote and mediawiki for the plan files but
gave up -- I found them both overkill for what I wanted.

The result of the search will vary. It might be a piece of GEDCOM (as
per the example above), or an image (e.g. a census image on
ancestry.com), or an entry in a book (in which case I may or may not have
been able to make a copy of it). Any paper copies I do end up with
get scanned, and everything gets stored in directories, classified by
type of record and surname. I'm not a big fan of putting things like
images in a database, though indexing them in a database would be
useful. At the moment, the only index I have is the directory
listing. (As with plan files, I sometimes use symlinks if one
document should be filed in multiple places.)

3) Transcribing

Having found a document, the next job is to transcribe it. Often the
result is a flat text file, again one file per source. I try to
transcribe the document as accurately as plain text will allow, and
there's the odd bit of ad hoc mark-up in it to document important bits
of formatting: e.g. [struck-through: my daughter Isabella] or
[inserted: Hampshire]. I very much like the idea that Nick Matthews
suggested elsewhere of using XML for this, and may well start doing
so.

In longer documents, such as wills, I tend to put asterisks around
people's names to assist in searching; similarly, I often add ISO-style
dates in brackets [2011-05-24]. I don't do similar tagging for
place names, though if I move to a light-weight XML format, I probably
will do.
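Because the mark-up is regular, it is easy to pull back out for searching. A small Python sketch, assuming names are wrapped in single asterisks and dates are written as [YYYY-MM-DD] as in the examples above; the transcript text and the index_transcript helper are invented for illustration. Note that other bracketed mark-up such as [struck-through: ...] is not mistaken for a date.

```python
import re

NAME_RE = re.compile(r"\*([^*]+)\*")                 # *Thomas Smith*
DATE_RE = re.compile(r"\[(\d{4}-\d{2}-\d{2})\]")     # [1790-05-10]

def index_transcript(text):
    """Return the tagged names and ISO dates found in a transcript."""
    return NAME_RE.findall(text), DATE_RE.findall(text)

transcript = (
    "I give to my son *William Smith* five pounds "
    "[struck-through: my daughter Isabella] and to my son *Thomas Smith* "
    "my house. Dated the tenth of May 1790 [1790-05-10]."
)
names, dates = index_transcript(transcript)
print(names)  # ['William Smith', 'Thomas Smith']
print(dates)  # ['1790-05-10']
```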

In other cases, the source is essentially a long table. Baptism
registers or census forms are good examples of this. In these cases,
I use a tab-separated text file to record each field. That makes it
easy to import into a spreadsheet or database, but at present the
primary version is simply in the text files. Sometimes I'll use a
spreadsheet to create them too, especially if I'm entering a large
number by hand. If I need to add extra notes, they end up in the
rightmost column.

Tabular data of this sort is, again, an example of something that
could usefully go into a database.

At the moment, the text files get stored in CVS to retain a version
history and to back them up.

4) Translating

This stage is often irrelevant as the source is often in English (the
only language I speak fluently). When it is necessary, I put the
translation below the original transcript, in the same file as it.
Even in English documents, there's sometimes an element of
translation: for example, I'll add a note to remind myself what I
think some obscure word or abbreviation means.

5) Extracting

This is the stage that seems to be causing all the excitement here.
It is when I extract the genealogical content from the source and put
it into some computer-readable form. Typically I use GEDCOM as the
destination format, simply because of its ubiquity. Sometimes I find
GEDCOM inadequate for the purpose. For example, if a will mentions
two grandchildren but gives no indication of whether the grandchildren
are siblings, there's no way of expressing this in GEDCOM. In such a
case, I'll either misuse GEDCOM to express what I need as best I can,
or simply not bother extracting that bit of information (perhaps
instead putting it into a text note).

For things like censuses, baptisms and so on, because the result of
the transcription is already in a nice easy-to-parse tabular form, I
have scripts that automatically create GEDCOM from the tables.
Sometimes it needs hand editing afterwards to add some extra
information that was in the source, but outside of the expected data
-- for example, I once found a census on which two children had been
grouped together with a big "}" and "twins" written next to them. In
earlier baptism registers, the data is often more or less tabular, but
with implicit fields recording whatever the priest felt was necessary;
and occasionally an entry will have extra information included. Such
cases need manual handling.
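The table-to-GEDCOM step might look roughly like the Python sketch below. This is an assumption about how such a script could work, not Richard's own code: the column names (date, child, father, mother), the use of a BAPM event, the xref scheme and the source xref @S1@ are all illustrative. It deliberately produces one persona-level family per row, so a couple who had several children baptised appears once per baptism.

```python
import csv
import io

def baptisms_to_gedcom(tsv_text, source_xref="@S1@"):
    """Turn a tab-separated baptism register into persona-level GEDCOM lines."""
    lines = []
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for n, row in enumerate(reader, start=1):
        child, fam = f"@I{n}C@", f"@F{n}@"
        lines += [f"0 {child} INDI", f"1 NAME {row['child']}",
                  "1 BAPM", f"2 DATE {row['date']}", f"2 SOUR {source_xref}",
                  f"1 FAMC {fam}"]
        parents = []
        for role, col in (("HUSB", "father"), ("WIFE", "mother")):
            if row[col]:  # blank cells mean the register did not name that parent
                xref = f"@I{n}{role[0]}@"
                lines += [f"0 {xref} INDI", f"1 NAME {row[col]}", f"1 FAMS {fam}"]
                parents.append(f"1 {role} {xref}")
        lines += [f"0 {fam} FAM"] + parents + [f"1 CHIL {child}"]
    return "\n".join(lines)

# Invented sample register: two baptisms to the same parents.
REGISTER = (
    "date\tchild\tfather\tmother\n"
    "12 MAR 1801\tWilliam /Smith/\tJohn /Smith/\tMary\n"
    "3 JUN 1803\tThomas /Smith/\tJohn /Smith/\tMary\n"
)
print(baptisms_to_gedcom(REGISTER))
```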

I've also got a number of scripts that create blank bits of GEDCOM --
templates, if you like -- that I can then fill in. The templates come
with suitable source information already filled in.

The result is hundreds of small GEDCOM files, one per source. Some
(e.g. from a gravestone) just contain a single individual and little
else; others (e.g. from a parish register or from an IGI search) may
contain hundreds of individuals, some of whom may be duplicates (for
example, if a couple have three children baptised, then the parents
will appear three times).

These GEDCOM files then get stored in CVS -- even the automatically
generated ones. I will sometimes upload them into a genealogy
program, but as I've not really settled on one that I like, I regard
the GEDCOM as the primary version and never (well, rarely, anyway) use
the program to make changes. It's just a tool to help me process or
visualise the information. I've also experimented with converting the
GEDCOM to RDF and importing into an RDF processor (typically the
Redland one) so that I can run SPARQL queries against it. This is
really powerful, but also painfully tedious to use. I do see a future
for something like this, though.

I've also got a script that can search a directory tree of GEDCOM
files looking for people that match specific criteria -- at the
moment, it's pretty primitive, basically just doing name, date at a
particular event, role in the event. It was originally designed for
looking for baptisms, but has expanded a bit.
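A rough Python equivalent of such a search, assuming the per-source files sit under one directory tree with a .ged extension and use simple "level tag value" lines. It matches only a surname (between GEDCOM's slashes) and a year range on chosen events; the helper names are hypothetical and this is a sketch of the idea rather than Richard's actual script.

```python
import os
import re
import tempfile

YEAR_RE = re.compile(r"\b(\d{4})\b")

def iter_individuals(path):
    """Yield (name, {event_tag: year}) for each INDI record in one GEDCOM file."""
    name, events, tag, in_indi = None, {}, None, False
    with open(path, encoding="utf-8") as f:
        for raw in f:
            parts = raw.rstrip("\r\n").split(" ", 2)
            level = parts[0]
            if level == "0":
                if in_indi:
                    yield name, events
                in_indi = len(parts) > 2 and parts[2] == "INDI"
                name, events, tag = None, {}, None
            elif level == "1" and len(parts) > 1:
                tag = parts[1]  # remember which event the following DATE belongs to
                if tag == "NAME" and len(parts) > 2:
                    name = parts[2]
            elif level == "2" and len(parts) > 2 and parts[1] == "DATE" and tag:
                year = YEAR_RE.search(parts[2])
                if year:
                    events[tag] = int(year.group(1))
    if in_indi:
        yield name, events

def search(root, surname, events=("BIRT", "BAPM", "CHR"), years=(0, 9999)):
    """Return (file, name, event, year) for matching people anywhere under root."""
    hits = []
    for dirpath, _, files in os.walk(root):
        for fname in sorted(files):
            if not fname.lower().endswith(".ged"):
                continue
            path = os.path.join(dirpath, fname)
            for name, evs in iter_individuals(path):
                if not name or f"/{surname.lower()}/" not in name.lower():
                    continue
                for ev in events:
                    if ev in evs and years[0] <= evs[ev] <= years[1]:
                        hits.append((path, name, ev, evs[ev]))
    return hits

# Demonstration on a throwaway directory holding one per-source file.
demo_root = tempfile.mkdtemp()
with open(os.path.join(demo_root, "register-1801.ged"), "w", encoding="utf-8") as f:
    f.write("0 @I1@ INDI\n1 NAME William /Smith/\n1 BAPM\n2 DATE 12 MAR 1801\n"
            "0 @I2@ INDI\n1 NAME John /Smith/\n0 TRLR\n")
hits = search(demo_root, "Smith", years=(1800, 1810))
```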

6) Reasoning

This is the stage that most people think of as genealogy. It's where
I try to work out how I need to combine the persona-level data
extracted from the sources into real people. Was the John Smith in
the 1851 census the one baptised in North Dunny or South Dunny, or
maybe neither? This typically involves looking through all of the
extracted persona-level data for people with the same (or a similar)
surname in the locality over quite a long period. I tend to the view
that unless I can understand every instance of the surname in the
source records, I cannot be confident that I've pieced it all together
correctly. (And sometimes even then I can't be confident of it.) An
unexplained burial could be evidence that what I had considered to be
one family was in fact two, for example, and that might have knock-on
effects elsewhere.

How I work at this stage depends on how many people I have. Sometimes
there are few enough personae that I can keep everything in my head.
For larger groups, I tend to print things out and spread everything
out on my dining room table. In the very largest cases that's
infeasible. For example, I once had an ancestor called John Smith and
all I knew was that he was a cobbler, from Southampton, and an
approximate date of birth from the 1841 census. Trying to sort out
all of the Smiths in a big town was a complex task. (In the end I
discovered evidence that he wasn't actually from Southampton after all
-- he'd just lived there for a while before his marriage.) In that
case, I created a spreadsheet with everyone in it. (And I still use an
extended version of that spreadsheet as an index to the other
records.)

Once I've sorted things out into groups, then I enter them into Gramps
(my current preferred program). I'll import bits of the persona-level
GEDCOM because that's a convenient way of keeping source information
with it. (Irritatingly I have to strip the repository from the GEDCOM
and manually reassign it because Gramps can't, so far as I know, merge
repositories as it can with other things, but that's a minor
difficulty.)

But what this doesn't do is give me any way of documenting why I've
merged the personae as I have. Sometimes the reason will be immediately
obvious from the sources, but other times it won't, and at times the
reasoning process is more sophisticated. I often start with a large
number of possibilities, consider each one and gradually discount
possibilities as being too improbable until only one remains which for
the time being I regard as probably correct. Documenting such things
is tricky, but I really do care about documenting things: not
primarily to justify my conclusions to others (though that is useful),
but so that I can easily revisit them as further evidence comes to
light, or as I correct any mistakes.

At present, I use the plan files that I create right at the start of
the whole research process to add notes on why I came to the
conclusions I did. But this means the documentation behind the
merging process is not kept with the merged individuals; nor is there
a computer-readable link from the source to the documentation. I
really want there to be so that if I have to correct a mistake in my
transcription / translation / interpretation of the source, I can
readily see what knock-on effects it might have.
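One way to get the computer-readable link Richard asks for is to record each reasoning step as an explicit conclusion that stores both its reasoning and the items it rests on, then walk that graph when a source is corrected. The Python sketch below is a hypothetical data model, not a feature of Gramps or any other program; the conclusion and source IDs are made up.

```python
from collections import defaultdict

# Hypothetical conclusions: each records why it was reached and what it rests on
# (sources, or other conclusions).
conclusions = {
    "C1": {"why": "only Smith baptised in the parish within five years of the census age",
           "rests_on": ["S-baptisms-1801", "S-census-1851"]},
    "C2": {"why": "occupation and wife's name agree across both records",
           "rests_on": ["C1", "S-marriage-1825"]},
    "C3": {"why": "named together as sons in the will",
           "rests_on": ["S-will-1790"]},
}

def affected_by(item):
    """Return every conclusion that rests, directly or indirectly, on item."""
    dependents = defaultdict(set)
    for cid, c in conclusions.items():
        for dep in c["rests_on"]:
            dependents[dep].add(cid)
    seen, stack = set(), [item]
    while stack:
        for cid in dependents[stack.pop()]:
            if cid not in seen:
                seen.add(cid)
                stack.append(cid)
    return seen

# Correcting the 1801 baptism transcript puts both C1 and C2 back in question.
print(sorted(affected_by("S-baptisms-1801")))  # ['C1', 'C2']
```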

7) Presentation

The final step is presenting the data in a good way. That might mean
drawing trees (which many programs seem quite poor at), drawing
ancestor tables (which they're much better at), or maybe just
producing indexes of people. But this step is really beyond the theme
of this discussion.

Like most people, I expect, in practice, these seven steps often get
blurred together, or some of them are not relevant. But whenever I
find myself thinking about how to store some new sort of data, or how
to rearrange the way I file things, I do find it very useful to think
in terms of these seven steps.

Richard

Wes Groleau

May 27, 2011, 8:18:44 PM

On 05-27-2011 03:55, Tom Wetmore wrote:
> You were right and should have stuck by your guns!!

I may be misremembering, but I think the disagreement was that someone,
perhaps Bob, thought that the UID would be a universal ID for a _person_
which could be used for that person in all databases. Others stated,
no, it distinguishes a _record_ from other records that otherwise might
seem identical.

> I don't know where the others got the interpretation that a UID should
> be unique only within one database. This is certainly not a GEDCOM rule

I don't recall seeing that interpretation.

Wes Groleau

May 27, 2011, 8:22:11 PM

On 05-27-2011 10:53, Bob Melson wrote:
> Seems to me at least one requirement for universality is that the same data
> input on different machines results in the same output on those machines.
> This is absolutely not true - in my experience - when dealing with UUIDs.
> Worse yet, not only does the same data NOT result in identical UUIDs when
> input on different machines running different software, it doesn't even
> result in identical UUIDs when input on different machines using
> _identical_ software. This would, IMO, support the contention that
> uniqueness and universality are restricted to a single machine and
> application on that machine.

UUID = Universal Unique ID. If two machines generate the same string,
then they have blown the Unique criteria. It is not intended to be a
tag that is common to distinct items that happen to be identical.
That would be a checksum. :-)

Richard Smith

May 27, 2011, 8:47:33 PM

On May 27, 3:53 pm, Bob Melson <amia9...@mypacks.net> wrote:

> Seems to me at least one requirement for universality is that the same data
> input on different machines results in the same output on those machines.

No, you're misunderstanding the meaning of the 'universal' in a UUID
(or 'global' in a GUID, if you prefer the Microsoft terminology). The
guarantee that a UUID provides is that every time you generate one, it
will be unique across all databases, past, present or future, on any
computer, anywhere in the world. UUIDs are not necessarily generated
for any particular data, though in practice in a genealogical
application, they may well be associated with a piece of data, such as
a person, event or place. What you're asking for is not possible with
a UUID, partly because there's no concept of generating a UUID for a
specific piece of data -- there is no input when you generate a UUID.
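Python's standard uuid module illustrates the point. uuid.uuid4() takes no argument at all and is built from random numbers, so there is nothing about Cousin Mortimer that could make two calls agree, and two calls in a row give different values.

```python
import uuid

# uuid4() takes no input describing the record: it is generated from random numbers.
first = uuid.uuid4()
second = uuid.uuid4()

print(first)
print(second)
print(first.version)  # 4
# A collision between the two is astronomically unlikely, so they differ in practice.
print(first == second)
```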

What you're asking for sounds more like a hash. This is where you
take everything you know (or perhaps just a specific part of the
data), and use it to generate a number -- the hash -- which can be
used as an identifier. Each time you have the same input, you'll get
the same hash out.
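By contrast, a hash is repeatable. A minimal Python sketch using hashlib, assuming "everything you know" is reduced to a few normalised fields; the field choice, the normalisation and the person_hash helper are illustrative assumptions. For what it's worth, the standard library's uuid.uuid5 is a name-based variant that works the same way (it hashes a namespace and a name with SHA-1), so it is repeatable rather than unique in the sense discussed above.

```python
import hashlib
import uuid

def person_hash(name, place, year=None):
    """Hash a normalised description of a person: same input, same output."""
    key = "|".join([name.strip().lower(), place.strip().lower(), str(year or "")])
    return hashlib.sha256(key.encode("utf-8")).hexdigest()

a = person_hash("John Smith", "London")
b = person_hash("  john smith ", "LONDON")   # a different researcher, same facts
print(a == b)  # True: repeatable

# Name-based UUIDs are hashes too: same namespace and name, same UUID.
# (The choice of NAMESPACE_URL here is arbitrary.)
u1 = uuid.uuid5(uuid.NAMESPACE_URL, "john smith|london")
u2 = uuid.uuid5(uuid.NAMESPACE_URL, "john smith|london")
print(u1 == u2)  # True
```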

But that's not actually very useful for genealogy. We've all got
individuals on our family trees about whom we know very little, and
what we do know is poorly documented. Someone in the family (but you
can't remember who) said that second-cousin Bob had a nephew called
John Smith who lived in London. All we know of this John Smith is
that he lived in London. That must describe hundreds or thousands of
different people. How do we ensure that the genuinely different John
Smiths end up with different identifiers, while also ensuring that two
different researchers without co-operating or even mutual knowledge of
each other, can end up assigning the same identifier to the same
individual even if they both have exactly the same information?

It can't be done. It's all well and good saying we'd like it, but
it's a technical impossibility. Even with the researchers co-
operating in generating identifiers (for example, by using some
central internet-based generator), it can't be done because "John
Smith in London" simply isn't a unique handle.

So we must compromise. Either we lose uniqueness -- that is, accept
that two different people might sometimes get assigned the same
"unique" identifier. Or we lose repeatability -- that is, accept that
the same person, with the same known information, will sometimes be
assigned multiple identifiers. In the former case, a hash is a good
implementation strategy; in the latter, a UUID is good. Or we may
decide that because we've lost at least one of these guarantees, we
may as well lose both and go for a simpler implementation, such as
the xrefs used in GEDCOM (these are the I0001-type things).
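The three options can be compared in a few lines of Python. The two "John Smith in London" records below are assumed to describe different men who happen to share the same known facts; the hash gives them the same identifier (uniqueness lost), uuid4 gives the same man entered twice two identifiers (repeatability lost), and sequential xrefs, loosely modelled on GEDCOM's I0001 style, guarantee neither outside one file. The hash_id helper and the records are hypothetical.

```python
import hashlib
import itertools
import uuid

def hash_id(facts):
    """Short repeatable identifier derived from the known facts."""
    return hashlib.sha256(repr(sorted(facts.items())).encode()).hexdigest()[:12]

known = {"name": "John Smith", "residence": "London"}
cobbler_john = dict(known)   # hypothetically, the boot-maker
banker_john = dict(known)    # hypothetically, someone else entirely

# Hash: repeatable, but two different men collide.
print(hash_id(cobbler_john) == hash_id(banker_john))  # True

# UUID: unique, but the same man entered twice gets two identifiers.
entry_1, entry_2 = uuid.uuid4(), uuid.uuid4()
print(entry_1 == entry_2)  # False in practice

# Sequential xrefs: simple, but only meaningful inside one file.
this_file = itertools.count(1)
other_file = itertools.count(1)
mine = f"I{next(this_file):04d}"     # your John Smith
theirs = f"I{next(other_file):04d}"  # an unrelated person in someone else's file
print(mine, theirs)  # I0001 I0001
```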

Richard

Bob Melson

May 27, 2011, 10:28:20 PM

On Friday 27 May 2011 18:22, Wes Groleau (Grolea...@FreeShell.org)
opined:

I'll answer this and the one immediately previous with this.

To start, I don't want to re-ignite the previous discussion regarding *IDs,
aside from saying that it seems to me that a {Globally|Universally} Unique
ID should indeed be unique everywhere; given the identical input even on
different machines; the resulting ID should, or so it seems to me, be
identical but unique to that input. This is not "blowing" the unique
criteria, any more than identical checksums derived from identical strings
on different machines "blows the checksum criteria". Matter of fact, I
think a checksum would be a helluva lot more useful than a *ID when you
come right down to it.

All that said, it's more than likely that I have a faulty understanding of
what "globally unique" or "universally unique" actually mean. Based
strictly on the meanings of the words, though ... The ID produced on my
machine uniquely identifies a record ON my machine but is otherwise of no
value and, to my mind, appears to be redundant as there are other record
identifiers that "uniquely" identify that record. That same record on
another machine will produce another unique ID, different from the one
produced on my machine and valid only on the machine producing it. Go to
a third (or a fourth or an Nth) machine with an identical record and
you'll get a 3d or 4th or Nth ID, different from all others and, IMO,
valueless for identifying the record anywhere except on the machine on
which the ID is produced. The end result is that we have N records with N
IDs, all unique, and none of 'em (the IDs) useful for any discernible
purpose.

So, will SOMEbody please 'splain me this thing called a
globally/universally unique ID and its place in the grand scheme of
things?

Stumped Ol' Bob

Wes Groleau

May 28, 2011, 12:03:46 AM

On 05-27-2011 22:28, Bob Melson wrote:
> So, will SOMEbody please 'splain me this thing called a
> globally/universally unique ID and its place in the grand scheme of
> things?

UUID / GUID were not created by genealogists to identify multiple
definitions of a person. They were created by computer types to
distinguish between two things that can't be told apart any other way.
Kind of like a serial number.

A checksum on the other hand, is a way to be almost certain that
two items have or don't have identical contents without actually
comparing byte by byte. Or to verify that the item has or hasn't
changed since the checksum was generated.
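Both uses of a checksum mentioned here (comparing contents, and detecting change) can be shown in a short Python sketch. SHA-256 from hashlib stands in for "checksum", and the record text is invented.

```python
import hashlib

def checksum(record_text):
    return hashlib.sha256(record_text.encode("utf-8")).hexdigest()

mine = "0 @I1@ INDI\n1 NAME Mortimer /Smith/\n1 BIRT\n2 DATE 1850\n"
yours = "0 @I1@ INDI\n1 NAME Mortimer /Smith/\n1 BIRT\n2 DATE 1850\n"

# 1. Are two items (almost certainly) identical? Compare the short digests.
print(checksum(mine) == checksum(yours))  # True

# 2. Has an item changed since the checksum was taken?
saved = checksum(mine)
mine = mine.replace("1850", "1851")
print(checksum(mine) == saved)  # False
```

Unlike a UUID, the checksum is derived entirely from the content, so it changes the moment the content does.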

If a record on your machine and one on my machine have identical
UUIDs, then either one of them was copied from the other (NOT
independently generated) or one of us (or our software) was naughty and
altered a UUID. If the UUIDs match and the items do not, then
either someone changed the UUID on another record, or changed the record
without giving it a new UUID. Well-behaved software will not
gratuitously change a UUID. But lots of programs will fail to create a
new UUID when the item ceases to be a copy of the other. That, in my
opinion makes them useless for genealogy. But they were never intended
to be some magic way of automatically identifying independently
generated records as being representations of the same entity.
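The reasoning above about what matching UUIDs can and cannot tell you fits in a small decision function. This is an illustrative Python sketch, assuming each record carries a UUID and that its content can be checksummed; the Record class and the wording of the labels are hypothetical.

```python
import hashlib
from dataclasses import dataclass

@dataclass
class Record:
    uuid: str
    content: str

    def digest(self):
        return hashlib.sha256(self.content.encode("utf-8")).hexdigest()

def compare(a, b):
    """Classify two records by UUID and content."""
    if a.uuid != b.uuid:
        # Different UUIDs say nothing about whether they describe the same person.
        return "independent records (same person or not: UUIDs cannot tell)"
    if a.digest() == b.digest():
        return "copy of the same record, unchanged"
    return "copied record that has since diverged (or a UUID was misused)"

original = Record("u-1", "Mortimer Smith, b. 1850")
print(compare(original, Record("u-1", "Mortimer Smith, b. 1850")))
print(compare(original, Record("u-1", "Mortimer Smith, b. abt 1851")))
print(compare(original, Record("u-2", "Mortimer Smith, b. 1850")))
```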

Bob Melson

May 28, 2011, 12:57:57 AM

On Friday 27 May 2011 22:03, Wes Groleau (Grolea...@FreeShell.org)
opined:

> On 05-27-2011 22:28, Bob Melson wrote:
>> So, will SOMEbody please 'splain me this thing called a
>> globally/universally unique ID and its place in the grand scheme of
>> things?
>

Thanks. It's about what I expected - and took away from the previous
exchange.

> UUID / GUID were not created by genealogists to identify multiple
> definitions of a person. They were created by computer types to
> distinguish between two things that can't be told apart any other way.
> Kind of like a serial number.

>
> A checksum on the other hand, is a way to be almost certain that
> two items have or don't have identical contents without actually
> comparing byte by byte. Or to verify that the item has or hasn't
> changed since the checksum was generated.

And here you have it. My mind insists that *IDs _should_ be like some sort
of super checksum, with the same "record" resulting in the same *ID no
matter _where_ the record might be found. Dunno if it'd be any more
useful than a *ID, but it certainly couldn't be any _less_ useful. (And
the pesky critter - my mind, that is - insists there's gotta be a
reasonable use for *IDs beyond taking up space in a database.)

>
> If a record on your machine and one on my machine have identical
> UUIDs, then either one of them was copied from the other (NOT
> independently generated) or one of us (or our software) was naughty and
> altered a UUID. If the UUIDs match and the items do not, then
> either someone changed the UUID on another record, or changed the record
> without giving it a new UUID. Well-behaved software will not
> gratuitously change a UUID. But lots of programs will fail to create a
> new UUID when the item ceases to be a copy of the other. That, in my
> opinion makes them useless for genealogy. But they were never intended
> to be some magic way of automatically identifying independently
> generated records as being representations of the same entity.

Then why bother with 'em?

I won't further belabor the point. Kinda like the Kipling poem "Oh, East is
East, and West is West, and never the twain shall meet, / Till Earth and Sky
stand presently at God's great Judgment Seat", except here we have GUID
and UUID and ...

>
> --
> Wes Groleau
>
> There are two types of people in the world …
> http://Ideas.Lang-Learn.us/barrett?itemid=1157

Sighin' Ol' Bob

Peter J. Seymour

May 28, 2011, 4:14:20 AM

On 2011-05-28 05:03, Wes Groleau wrote:
> On 05-27-2011 22:28, Bob Melson wrote:
>> So, will SOMEbody please 'splain me this thing called a
>> globally/universally unique ID and its place in the grand scheme of
>> things?
>
> UUID / GUID were not created by genealogists to identify multiple
> definitions of a person. They were created by computer types to
> distinguish between two things that can't be told apart any other way.

> .....

> But lots of programs will fail to create a
> new UUID when the item ceases to be a copy of the other. That, in my
> opinion makes them useless for genealogy. But they were never intended
> to be some magic way of automatically identifying independently
> generated records as being representations of the same entity.

.....

I have never used UUIDs (which are not the same as mere gedcom ids) but
I have always regarded them as a sort of Export Id so that for instance
you can send a bunch of records all marked with UUIDs, let the recipient
edit them as they see fit, and then send them back to you. The point is
that they can edit anything they like except the UUIDs, so that
data-wise the returned records could be difficult to identify except
that you have those unique ids matching those in your original records.
So basically the ids let you recognise records that you had generated,
and also recognise records that were not generated by you. And also
records that, although data-wise are now different, started out as the
same record. It does not matter if programs mess around with gedcom ids
as long as the uuids are unaltered. The uuids are not for identifying
different instances of the same person as such, although they may
achieve this in practice.
The idea that these ids are for uniquely identifying every individual
throughout history seems to me pointless and unworkable.
Probably the only way to obtain truly unique ids is to have them
generated by one central worldwide server. Failing that, unique ids
across a restricted community such as two individuals may be obtained by
having some sort of computer or personal id incorporated into the unique
id. You need this arrangement because that other person might also be
similarly sending you records for attention.
As I said I have never used them, but the above seems to cover the
essential points.
Peter
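
Peter's suggestion of building a personal or computer ID into each identifier can be sketched as follows. This is a minimal illustration under stated assumptions, not how PAF, Legacy or any other program actually generates its IDs: the class and function names are hypothetical, and the scheme is unique only as long as every participant in the exchange has a distinct site prefix.

```python
import itertools


class SiteIdGenerator:
    """Hypothetical ID generator: unique only within a community whose
    members each hold a distinct site_id (the arrangement Peter describes)."""

    def __init__(self, site_id: str) -> None:
        self.site_id = site_id
        self._counter = itertools.count(1)

    def new_id(self) -> str:
        # Combine the per-person/per-computer prefix with a local serial number.
        return f"{self.site_id}-{next(self._counter)}"


def originated_here(record_id: str, site_id: str) -> bool:
    # A returned record "started out" as one of ours if it carries our prefix,
    # however much its data was edited in the meantime. rpartition keeps this
    # working even if a site_id itself contains a hyphen.
    return record_id.rpartition("-")[0] == site_id


mine = SiteIdGenerator("PJS")
theirs = SiteIdGenerator("WG")
a, b = mine.new_id(), theirs.new_id()
print(a, b)                      # PJS-1 WG-1
print(originated_here(a, "PJS"))  # True
print(originated_here(b, "PJS"))  # False
```

The same prefix check lets you tell records you generated from records someone else generated, which is the round-trip use Peter has in mind; it says nothing about whether two records describe the same person.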

Richard Smith

May 28, 2011, 7:14:35 AM

On May 28, 3:28 am, Bob Melson <amia9...@mypacks.net> wrote:

> To start, I don't want to re-ignite the previous discussion regarding *IDs,
> aside from saying that it seems to me that a {Globally|Universally} Unique
> ID should indeed be unique everywhere; given the identical input even on
> different machines; the resulting ID should, or so it seems to me, be
> identical but unique to that input.

No. A UUID (or GUID if you prefer) is a specific piece of existing
technology. It is not a generic term that can be redefined at will.
It may or may not have been given the best name, but we're stuck with
the name now. We can argue all day about whether the phrase
"universally unique" carries the expectation that given the same
input, it will produce the same ID. But the standards defining UUIDs
are quite specific: the sole guarantee that it provides is that every
time you generate a new UUID, it will be different, even if the input
is the same. For some applications that's useful; for others that may
not be what you want, and if so, a UUID is probably the wrong piece of
technology to use.

Richard
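
Richard's description matches the random (version 4) UUIDs that Python's standard `uuid` module produces with `uuid.uuid4()`. For contrast, the same UUID standard (RFC 4122) also defines name-based versions 3 and 5, which do return the same ID for the same input, closer to what Bob expected. A short sketch; the URL used as a name is invented for illustration.

```python
import uuid

# Random (version 4) UUIDs: a fresh value on every call, whatever the "input".
first = uuid.uuid4()
second = uuid.uuid4()
print(first == second)  # False, barring an astronomically unlikely collision

# Name-based (version 5) UUIDs: the same namespace and name always yield
# the same UUID. The name string below is a made-up example.
ns = uuid.NAMESPACE_URL
name = "http://example.org/person/john-smith-1820"
print(uuid.uuid5(ns, name) == uuid.uuid5(ns, name))  # True
```

Even a name-based UUID only says that two inputs were textually identical. Two researchers who type the same string about two different men would get the same ID, so it does not settle whether two records describe the same person.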

singhals

May 28, 2011, 10:42:49 AM

to Grolea...@freeshell.org, gen...@rootsweb.com

Wes Groleau wrote:
> On 05-27-2011 03:55, Tom Wetmore wrote:
>> You were right and should have stuck by your guns!!
>
> I may be misremembering, but I think the disagreement was that someone,
> perhaps Bob, thought that the UID would be a universal ID for a _person_
> which could be used for that person in all databases. Others stated,
> no, it distinguishes a _record_ from other records that otherwise might
> seem identical.
>
>> I don't know where the others got the interpretation that a UID should
>> be unique only within one database. This is certainly not a GEDCOM rule
>
> I don't recall seeing that interpretation.
>

PAF's help files specifically state that; Legacy's help
files /imply/ it.

From PAF's help file:
PAF 5.0 assigns each record in the .paf file a unique record
serial number. Like a record identification number (RIN), a
Unique record serial number identifies each individual in a
.paf file. Unlike a RIN, a unique record serial number is
unique worldwide. This means that each individual in each of
the .paf files has a different number than all the other
individuals in all other .paf files that are made worldwide.
(para) The number does not change if you export a GEDCOM...

Legacy's help file mentions them (UIDs) indirectly under
Intellishare: One person in the group is designated the
"Keeper of the Records" (Keeper for short). This person
keeps the master Family File. Legacy automatically marks
all the individual records in the Master Family File with a
serial number that uniquely identifies each person ...

Note that both refer to it not as an ID but as a record
serial number.* That distinction virtually guarantees it
has to be within a single database; as an analogy, Excel and
QPro wouldn't know whether line 564 in "Monthly Expenses"
came before line 564 in "Appliance Repair", so why should
and how would PAF/Legacy/whatever know whether RIN 534 in
"McCOY.paf" came before or after RID 534 in "Hatfield.fdb"?

Cheryl

*Which, yes, inspires one to wonder why they didn't CALL it
a URN or USRN or USN ...

singhals

May 28, 2011, 10:51:09 AM

to gen...@rootsweb.com

AIUI at the time -- they were intended to be used in a
multi-person data-entry/research project which has a
master/main/official database. The database "keeper" to use
Legacy's term, shares the GED from that database with all
researchers, who can modify or extend the data as they see
fit before returning it to the "keeper". The "keeper" then
uses the UIDs to find who changed or added what, without
having to plod line by line through either the GED or the
database.

> I won't further belabor the point. Kinda like the Kipling poem "East is
> East and West is West, and never the twain shall meet, 'til earth and sky
> meet presently at God's great Judgement Seat", except here we have GUID
> and UUID and ...

(G)

Cheryl
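
The keeper workflow Cheryl describes amounts to matching records by UID and then comparing their contents. A minimal sketch, assuming each record has already been reduced to a hypothetical flat mapping of field names to values, and assuming every program in the round trip left the UIDs intact (which, as the PAF-to-Legacy import mentioned elsewhere in this thread shows, is not guaranteed). The function name and sample data are invented.

```python
from typing import Dict, List, Tuple

# Hypothetical record shape: field name -> value, e.g. {"NAME": ..., "BIRT.DATE": ...}
Record = Dict[str, str]


def diff_by_uid(master: Dict[str, Record], returned: Dict[str, Record]) -> dict:
    """Compare a researcher's returned copy against the keeper's master file,
    matching records by UID rather than by their (possibly edited) content."""
    added: List[str] = []
    changed: List[Tuple[str, List[str]]] = []
    for uid, rec in returned.items():
        if uid not in master:
            added.append(uid)  # researcher created a brand-new record
        elif rec != master[uid]:
            # List only the fields that differ, so nobody plods line by line.
            fields = sorted(k for k in set(rec) | set(master[uid])
                            if rec.get(k) != master[uid].get(k))
            changed.append((uid, fields))
    removed = sorted(uid for uid in master if uid not in returned)
    return {"added": added, "removed": removed, "changed": changed}


master = {"U1": {"NAME": "John /Smith/", "BIRT.DATE": "1820"},
          "U2": {"NAME": "Mary /Jones/"}}
returned = {"U1": {"NAME": "John /Smith/", "BIRT.DATE": "ABT 1821"},
            "U3": {"NAME": "Ann /Smith/"}}
print(diff_by_uid(master, returned))
```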

singhals

May 28, 2011, 10:52:55 AM

to haye...@yahoo.com, gen...@rootsweb.com

Steve Hayes wrote:
> On Thu, 26 May 2011 15:03:39 -0400, singhals<sing...@erols.com> wrote:
>
>> Steve Hayes wrote:
>>> On Wed, 25 May 2011 21:36:21 -0400, in soc.genealogy.computing you wrote:
>>>
>>>> singhals wrote:
>>>>> PAF produces a UID for each entry; I'm pretty certain other
>>>>> programs do too, particularly the ones that allow
>>>>> synchronization of multiple databases.
>>>>
>>>> Well color me surprised.
>>>>
>>>> PAF and Legacy both assign a UID, which one can see in the GED.
>>>>
>>>> Oddly enough, and despite what I've heard said, when I
>>>> imported the PAF file into a LEGACY file (direct import, not
>>>> GED), the UIDs changed. Seems to me that's a bit awkward
>>>> for someone trying to use those UIDs for anything.
>>>
>>> What is a UID, and where can I find it?
>>>
>>
>> Unique IDentifier -- new with PAF4 as I recall, and I think
>> inspired by another program that used them. As I recall, we
>> were told that if two people used two computers to enter the
>> same data each computer would generate a different UID for
>> that person. I don't /think/ you can see them anywhere but
>> the GED. The computer can, and in PAF when you run the
>> match/merge routine, it asks if you want to use UID.
>
> When I do that it asks me if I want to use the Ancestral File Number.
>

In PAF 5 the UID is immediately below the AFN on that screen.

> That works quite well if the person has an Ancestral File Number, but those
> numbers have now been discontinued, and no new ones are being generated.
>

GED is your friend. (G) If you can see it in a GED, you can
edit it. Copy/paste the UID into a field you can see/sort on.

Cheryl
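
In the spirit of "GED is your friend", the UIDs can also be pulled out of an exported GEDCOM file with a few lines of code and put somewhere you can see and sort them. This sketch assumes the UID is written as a level-1 `_UID` line under each `INDI` record; `_UID` is an underscore (vendor-defined) tag rather than part of standard GEDCOM 5.5, so check the layout of your own file first. The function name and the sample UID value are hypothetical.

```python
from typing import Dict, Optional


def uids_by_xref(ged_text: str) -> Dict[str, str]:
    """Map each INDI cross-reference (e.g. @I1@) to its _UID value, if any."""
    result: Dict[str, str] = {}
    current: Optional[str] = None
    for line in ged_text.splitlines():
        parts = line.strip().split(" ", 2)  # level, tag-or-xref, rest
        if len(parts) < 2:
            continue
        level = parts[0]
        if level == "0":
            # New top-level record; remember it only if it is an individual.
            current = parts[1] if len(parts) == 3 and parts[2] == "INDI" else None
        elif level == "1" and parts[1] == "_UID" and current and len(parts) == 3:
            result[current] = parts[2]
    return result


sample = ("0 HEAD\n1 SOUR PAF\n"
          "0 @I1@ INDI\n1 NAME John /Smith/\n1 _UID 0123ABCD\n"
          "0 @I2@ INDI\n1 NAME Mary /Jones/\n"
          "0 TRLR")
print(uids_by_xref(sample))  # {'@I1@': '0123ABCD'}
```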

singhals

May 28, 2011, 11:07:12 AM

to gen...@rootsweb.com

Tom Wetmore wrote:
> This thread is an offshoot from the Linux thread that is going off on a number of tangents.
>
> How should we store evidence in genealogical databases?
>

> You get a marriage record in the mail; you find an image of a census record at Ancestry.com; you find the record of an event on a page in a book you found on Google books. What are you going to do with those three records? Here are some possible answers.
>
> First, if you are careful genealogists, you're going to record the source of the records in your database as source records. Got that out of the way.
>
> Second, as far as the "physical records" are concerned, let's say you carefully file the paper marriage record away in your paper filing system, and you go to your big ancestry folder area on your computer and keep copies of those two images. Dandy.
>
> Now, what are you going to do with the information in those three physical records (let's say we can call those image files "physical" for sake of argument).
>
> Here's the "normal" answer in my opinion. You look at the physical records, you decide who the persons were who are mentioned in those records, you go to your genealogy program and you find the appropriate person records, creating them if need be, and you edit in the new information. In other words you extract information from the physical records and you add that information directly to person records. Note that the information from the physical records only enters into your database as items inside person records.
>
> Here's another possibility advocated by some genealogists. After you create the source records for where the physical records came from, you edit those source records, adding to them the information that you got from those sources that you believe is important. You probably have to do this as "unstructured notes." Then you link persons to those sources and you also "copy up" from the stuff you added to the source records into the person records.
>
> Here's another possibility advocated by programs like Gramps or Family Tree Maker. You first create event records from information in the physical records, say a birth or death or marriage events, and then you add a link from some person in your database, creating that person record if need be, to that event record. The events really don't stand alone; you have to link person records to them.
>
> All these techniques work fine while you are in the realm of "person-based genealogy" or "conclusion-based genealogy". When in this realm you either already know who the people are that you are researching, or you have such a solid vital record and other record trail back to them that you can be sure whom you are researching. You know whether any particular record belongs to a person you are researching or not; you ignore the records that don't, and you simply copy information out of the records that do. In my opinion 98% of the genealogical software is devoted to people working in this mode.
>

I should think we are all always in the realm of
person-based genealogy; otherwise, we could fill in our
pedigree charts with position-titles (great-grandmother,
great-great-grandfather, etc) instead of names of people who
hold that position/title. That we each HAVE 8th
great-grandparents doesn't need further proving; mathematics
does that for us.
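
The event-first approach Tom attributes to Gramps and similar programs can be sketched as event records that carry their own source, with person records linked to them afterwards in a role. The classes below are a hypothetical illustration, not Gramps' actual data model, and the names, date and place are invented.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Person:
    name: str


@dataclass
class Event:
    kind: str    # e.g. "MARRIAGE"
    date: str
    place: str
    source: str  # where the information came from
    roles: List[Tuple[Person, str]] = field(default_factory=list)

    def link(self, person: Person, role: str) -> None:
        # Persons are attached to an existing event, each in a role.
        self.roles.append((person, role))


def events_for(person: Person, events: List[Event]) -> List[Event]:
    # Identity check: two different men can share a name, so compare objects.
    return [e for e in events if any(p is person for p, _ in e.roles)]


john, mary = Person("John Smith"), Person("Mary Jones")
m = Event("MARRIAGE", "1845", "Boston", "Marriage record received by mail")
m.link(john, "groom")
m.link(mary, "bride")
print([e.kind for e in events_for(john, [m])])  # ['MARRIAGE']
```

The sketch still starts from known persons: an event only becomes useful once someone is linked to it, which is the limitation Tom goes on to raise.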

> Eventually every genealogist reaches the point when he or she has delved far enough back in time that the solid, firm trail of records has dried up. When we reach this point our task changes from one of simply elaborating persons we know or can learn about easily, to one of true historical research. We embark on the chore of trying to find whatever sources we can, from whatever creative recesses our minds or experience take us to. From the sources we manage to find, we have to keep whatever information mentions people who might eventually be of interest to us, and we must record that information somehow so we can continually be able to refer to it. We have faith that at some time in the future we will have found enough records that we'll be able to figure out who all those people are and how they are related. At that time, maybe far in the future, maybe after many serendipities in our record searching, we'll be able to finally create new persons in our database and add the hard-fought information to them.
>

If I've understood parts of the discussion so far, it seems
to me that more-than-one of the participants is suggesting
we can NEVER be certain that recordA about Tom Wetmore
refers to the same Tom Wetmore as recordB. How the
researcher stores, organizes, accesses, or displays either
record won't change that. That makes HOW to store,
organize, access, or display sort of like nailing Jello to
the wall: certainly do-able if you work hard enough, but
just as certainly futile in the long run.

> When we reach this point we are in the realm of "record-based genealogy." This has been described as "crossing a chasm." We are now true historians. We must collect lots of records, but we don't know yet whom they belong to.
>

As someone famous once said, now there you go again. ALL
genealogy is record-based, if one uses the word record as a
synonym for the word source. As someone else famous keeps
saying, genealogy without sources is myth. HOW one
organizes, stores, accesses or displays those source-records
doesn't change the fact that we all need them to de-myth our
genealogy.

> What are you going to do with this evidence? If you use some of the approaches above you're kind of stuck. You can add paper copies to your files, or image files to your computer, but what else are you going to do? There are no people records around to stick them to. You can bloat source records with notes, but how can you find any of that unstructured info in the future?
>

It's sometimes very difficult to follow the discussion of persona
vs person. Evidence does not exist in a vacuum any more
than people do. IF there's people, there's evidence; one
can always remove references to evidence if one can
conclusively prove it's irrelevant. IME, though, if I don't
SAY I looked hither, thither and yon, someone will suggest I
missed thither. So putting it in and then debunking it is
easier on my nerves. To debunk it, I need to attach it to
who it might but doesn't concern.

> To do your research effectively, to be able to reason about the data you've collected, you have to have some way of finding the information and arranging it. Are you going to do this by spreading sheets of paper on your desk, keeping lots of windows to image files open on your computer, taking lots of notes on 3x5 cards, sketching out possible family groups with paper and pencil?
>

Yes.

> Wouldn't you want all that evidence information codified somehow inside your genealogy program so you can search for names, search for dates, search for places, see the relationships mentioned in the evidence, and so on? How would you want your genealogy application to support you after you have "crossed the chasm?"

No. I want all that evidence raw if I need to re-evaluate
it. I don't want it filtered, as it will be by extracting
bits of it to codify. As someone mentioned, the bit you
don't extract may be the only important bit in there. And
creating a database for the purpose of comparing documents
when the documents to be compared are NOT identical in
content is going to drag AI into it; I'm not up to speed on
the current state of AI, but I know as recently as Y2K, it
couldn't make the same sort of judgments that people can
because people STILL use unconscious, subconscious, and even
subliminal info in analysis.

Cheryl
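
One way to reconcile Tom's wish for searchable evidence with Cheryl's insistence on keeping it raw is to store each source's mention of a person as a persona that keeps the full transcription alongside a few extracted fields, and to create a person only later, as a conclusion linking personae. This is a minimal sketch of that idea under stated assumptions, not a feature of any program named in this thread; all class names and sample data are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Persona:
    """A person exactly as one source presents them; no identity is claimed."""
    source: str
    transcript: str             # the raw text, kept unfiltered for re-evaluation
    name: str                   # extracted, searchable bits
    date: Optional[str] = None
    place: Optional[str] = None


@dataclass
class Person:
    """A research conclusion that several personae are the same individual."""
    name: str
    personae: List[Persona] = field(default_factory=list)
    rationale: str = ""


def search(personae: List[Persona], surname: str) -> List[Persona]:
    # Search the extracted fields, but return whole personae,
    # so the raw transcript is always at hand.
    return [p for p in personae if surname.lower() in p.name.lower()]


a = Persona("1850 census", "Jno. Smith 34 farmer", "Jno. Smith", "1850")
b = Persona("Marriage register", "John Smith of this parish", "John Smith")
c = Persona("1850 census", "Mary Jones 30", "Mary Jones", "1850")
hits = search([a, b, c], "smith")
john = Person("John Smith", hits, "Same parish, consistent ages")
print([p.transcript for p in john.personae])
```

Because the search returns whole personae, the unextracted part of each transcript stays one step away, which speaks to the worry that the bit you didn't extract may be the only important one. Undoing a conclusion means deleting a Person, not the evidence.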