In any case, I pulled some data for a surname of current interest to
me (Huston). I was sort of surprised at the entry, as it showed one
of the hits as having been baptized four months before they were
born. Very good planning on the parents' part, I guess.
So I went checking for another source of data for this, and found
that Christ Church has a lot of their data online. Same problem
as with Ancestry: you have to pick who you want to look at, and pull up
data for only them. But you can use it by hunting and pecking.
It was quickly clear that either Ancestry or Christ Church had a
number of errors in its database.
Which brought me back to a long-standing question that I periodically
look at: the error rate in old genealogical sources, and in the
modern transcriptions.
This isn't exactly specific to Ancestry, but at least I started out
with an Ancestry issue, and I thought some of the folks on this list
might be able to make some suggestions for me.
At any rate, I've taken a look at another set of church records, those
of the First Presbyterian in Philadelphia. These records were
originally transcribed by Linn and Egle in 1880. Subsequently Joe
Patterson and Judy Banja re-transcribed the Linn and Egle
transcription, and placed the material online (roughly 1,700 records).
I used the Patterson and Banja data to see if I could get at an error
rate for Linn and Egle. I assumed that Patterson and Banja's data
were a perfect transcription. (They aren't, I know, but I think their
methodology was good, and it looks to me like there are very few
errors attributable to them; almost all look like they are embedded
in Linn and Egle's transcription.)
Briefly, my approach was to take the separate entries for husband and
wife (Linn and Egle apparently made two separate transcriptions which
they merged, one with the husband's name first, the other with the
wife's name first) and compare them with each other to see how many
differences there were. For simplicity I'll call this the "error
rate." It's not really that, but it's easier to say "error rate" than
"differences between husband and wife records".
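A minimal sketch of the comparison described above, in Python. It assumes the husband-first and wife-first entries have already been matched up into pairs (which is the hard part, and is not shown). The `Entry` type, the `disagreement_rates` function, and the sample names and dates are all invented for illustration; they are not taken from the Linn and Egle records.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One transcribed marriage entry (hypothetical layout)."""
    husband: str
    wife: str
    date: str  # date of marriage, as transcribed


def disagreement_rates(pairs):
    """Return, per field, the fraction of pairs whose two entries differ.

    `pairs` is a list of (husband_first_entry, wife_first_entry) tuples
    that are believed to describe the same marriage.
    """
    if not pairs:
        raise ValueError("need at least one pair of entries")
    counts = {"husband": 0, "wife": 0, "date": 0}
    for first, second in pairs:
        for field in counts:
            if getattr(first, field) != getattr(second, field):
                counts[field] += 1
    return {field: n / len(pairs) for field, n in counts.items()}


# Invented sample data: four marriages, each transcribed twice.
sample_pairs = [
    (Entry("John Smith", "Mary Jones", "1720-05-01"),
     Entry("John Smith", "Mary Jones", "1720-05-01")),
    (Entry("Thomas Brown", "Ann White", "1721-03-12"),
     Entry("Thomas Browne", "Ann White", "1721-03-12")),  # name spelling differs
    (Entry("James Green", "Sarah Hill", "1722-11-30"),
     Entry("James Green", "Sarah Hill", "1722-11-03")),   # date differs
    (Entry("William Black", "Jane Wood", "1723-07-19"),
     Entry("William Black", "Jane Wood", "1723-07-19")),
]

rates = disagreement_rates(sample_pairs)
for field, rate in rates.items():
    print(f"{field}: {rate:.1%}")
```

An exact string comparison like this counts every spelling variant as a difference, which matches the loose sense of "error rate" used here; it makes no attempt to decide which of the two entries is the correct one.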
The error rate I found is not especially surprising, and is
comparable with what I've seen in other records.
When I looked at the dates of marriage alone, I got an 8.7% error
rate.
When I looked at differences in recorded names (mostly spelling
differences, but some considerably worse), the error rate was 8.2%
(husband) and 9.6% (wife).
I haven't looked at an overall error rate, but if you assume that
there was only one error per record, you'd get an overall error rate
of about 25%. The real overall rate is less than that, as you will on
occasion get simultaneous errors in husband's name, wife's name, and
date of marriage (DOM), but the overall rate is probably closer to 25%
than it is to 10%.
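As a rough check on that range, the three rates quoted above can be combined in a few lines of Python. The "one error per record" case is just the sum of the three rates. The independence case is an added assumption for illustration only (nothing in the data shows that the three kinds of error are independent), and the lower bound is the case where every bad record has all its errors together.

```python
# Per-field rates quoted above.
p_date, p_husband, p_wife = 0.087, 0.082, 0.096

# At most one error per record: the rates simply add (upper bound).
upper = p_date + p_husband + p_wife

# ASSUMPTION: the three kinds of error occur independently of each other.
independent = 1 - (1 - p_date) * (1 - p_husband) * (1 - p_wife)

# Errors always fall on the same records: the largest single rate (lower bound).
lower = max(p_date, p_husband, p_wife)

print(f"one error per record: {upper:.1%}")        # 26.5%
print(f"independent errors:   {independent:.1%}")  # 24.2%
print(f"fully overlapping:    {lower:.1%}")        # 9.6%
```

Under the independence assumption the overall figure comes out around 24%, which is consistent with "closer to 25% than to 10%".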
At any rate, I've placed a preliminary summary of my findings on WeRelate.
It's at
http://www.werelate.org/wiki/Data:Marriages_First_Presbyterian_Church%2C_Philadelphia%2C_1702-1746#Transcription_Error
One of my interests is in annual and seasonal variations in marriage
rates, and I've used these data to look at this. The error rate data
are at the end of the article.
I'd like to get some feedback from you on how I might best
further revise this article.
Thanks
Bill
AKA "Q" on WeRelate