How...

J. Hugh Sullivan

May 9, 2012, 2:17:45 PM

...do people with 50,000 names, plus or minus, keep everything
consistent?

Is County always Co., is every state 2 capital letters, has the
superfluous comma between city and state been removed - and I could go
on.

Keeping all that straight with only 7,000 names is tough but seems to
me that not using consistent standards results in sloppy work.

Hugh

Tom Wetmore

May 9, 2012, 7:41:45 PM

I enter the data in my preferred format. Do you do something else?

Tom Wetmore

Peter J. Seymour

May 10, 2012, 3:15:54 AM

By being alert to the formatting principles and taking the trouble to
apply them. But the question is, why does it matter? I'm not saying it
doesn't. I try to be as tidy as anyone. But what practical purposes are
served by being consistent with the formatting? The answer helps with
understanding what is really needed.

shmar...@ticnet.com

May 10, 2012, 7:02:18 AM

As others have said, I *try* to be consistent in my data entry, not always successfully. What I have done occasionally is generate a GEDCOM file and then bring it up in a text editor. I can then directly edit locations, dates, names, etc. On the extremely rare occasions that I have merged someone else's GEDCOM into mine, I can quickly add an identical NOTE to every individual telling where I got the information.
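
The "identical NOTE to every individual" step can also be scripted rather than done by hand in an editor. A minimal sketch, assuming a plain-text GEDCOM in which each individual record opens with a level-0 line such as `0 @I1@ INDI`; the function name and the note wording are made up for illustration:

```python
import re

# Matches the level-0 line that opens an individual record, e.g. "0 @I12@ INDI".
INDI_LINE = re.compile(r"^0 @[^@]+@ INDI\s*$")

def add_note_to_individuals(gedcom_text: str, note: str) -> str:
    """Return GEDCOM text with a level-1 NOTE added under every INDI record."""
    out = []
    for line in gedcom_text.splitlines():
        out.append(line)
        if INDI_LINE.match(line):
            # Insert the note directly below the record's opening line.
            out.append(f"1 NOTE {note}")
    return "\n".join(out) + "\n"

# Tiny invented sample file, only to show the effect.
sample = (
    "0 HEAD\n"
    "0 @I1@ INDI\n"
    "1 NAME John /Smith/\n"
    "0 @I2@ INDI\n"
    "1 NAME Mary /Jones/\n"
    "0 TRLR\n"
)
print(add_note_to_individuals(sample, "Merged from a cousin's file"))
```

Run it on a copy of the file, not the original, and apply it to the incoming GEDCOM before merging so that only the imported individuals get the note.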

Ian Goddard

May 10, 2012, 11:47:31 AM

One factor which would help would be the ability to reuse locations.
Gramps, for instance, has a Places section. Any dialog which requires a
location to be filled in gives the opportunity to select a Place which
has been entered already or to enter a new one. If multiple events
share the same Place entry then this cuts down the data entry.

It would certainly be possible to improve on this. For instance it
ought to be possible to take advantage of the fact that places are
hierarchical so that one could, for instance, enter lists of counties,
parishes, etc. & select from the list.

Another improvement would be to download and import geolocation
databases, e.g. http://www.opengeocode.org/download/cow.php to
pre-populate the location database in the genealogy app.

--
Ian

The Hotmail address is my spam-bin. Real mail address is iang
at austonley org uk

J. Hugh Sullivan

May 10, 2012, 1:30:20 PM

We don't start off being alert to formatting principles - we learn as
we go. So how we enter data varies over the years until we
actually have a standard.

I think it matters for several reasons: How can you trust someone who
does sloppy work? Doesn't it bother you when California is CA, Cal,
Calif. or County is Co., Cty., County? In such a case California would
be four locations on the Master Location List when it should only be
one. How 'bout Nash Co., Nash Co. NC and Nash Co. NC Vital Statistics
- plus the versions that spell out County and North Carolina? In my
records they were multiple sources but now there is only one (and
properly so).

Hugh

Denis Beauregard

May 10, 2012, 2:47:29 PM

On Wed, 09 May 2012 18:17:45 GMT, Ea...@bellsouth.net (J. Hugh
Sullivan) wrote in soc.genealogy.computing:

I would say it depends at least on the purpose and the tools you
have.

The purpose: standardization will improve the quality of the product
if you distribute it (as a commercial site or database, or to a family
society, etc.).

The tools: I developed my own software for genealogy. 20 years ago,
I was using a DOS-based text editor with this strict format:

1) Given_name, born code date, baptized code date, married code
date person, dead code date, buried code date.

Code was always a string of 4 or 5 characters: 2 letters for a Quebec
county, or 1 letter and 2 digits for a county in some provinces or
states (e.g. M27 in Mass.) or for some other area (e.g. F01 for France
and the département), followed by 2 letters for the place (town,
parish, township).

That way, I could check my data for some mistakes, and extract to
files when looking for records in some area (e.g. extracting all
records for a parish when looking at records of that parish).

Today, I am transferring everything to a set of LibreOffice files.
I still have codes for places (though they are different). I check
whether the code is correct, but also the format of dates (which helps
to find some mistakes), the sequence of dates (a person must be born,
then married), etc.
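
The three checks mentioned above (code shape, date format, order of events) could look roughly like this. It is a sketch under stated assumptions, not Denis's actual tooling: the code pattern follows the older 4- or 5-character scheme described earlier in this post, dates are assumed to be ISO `YYYY-MM-DD`, and the record layout is invented.

```python
import re
from datetime import date

# 2 letters, or 1 letter + 2 digits, for the area; then 2 letters for the place.
PLACE_CODE = re.compile(r"^(?:[A-Z]{2}|[A-Z]\d{2})[A-Z]{2}$")
# Expected order of life events.
EVENT_ORDER = ["born", "baptized", "married", "dead", "buried"]

def check_record(events: dict[str, tuple[str, str]]) -> list[str]:
    """events maps an event name to (place_code, iso_date); returns a list of problems."""
    problems = []
    parsed = {}
    for name, (code, iso) in events.items():
        if not PLACE_CODE.match(code):
            problems.append(f"{name}: bad place code {code!r}")
        try:
            parsed[name] = date.fromisoformat(iso)
        except ValueError:
            problems.append(f"{name}: bad date {iso!r}")
    # Compare each known event with the next one in the expected order.
    known = [n for n in EVENT_ORDER if n in parsed]
    for earlier, later in zip(known, known[1:]):
        if parsed[earlier] > parsed[later]:
            problems.append(f"{later} precedes {earlier}")
    return problems

print(check_record({
    "born": ("M27BO", "1850-03-01"),
    "married": ("M27BO", "1849-01-01"),
}))
```

A report like this only flags candidates for review; some "impossible" sequences turn out to be real once the source is checked.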

Denis

--
Denis Beauregard - généalogiste émérite (FQSG)
Les Français d'Amérique du Nord - www.francogene.com/genealogie--quebec/
French in North America before 1722 - www.francogene.com/quebec--genealogy/
Sur cédérom à 1780 - On CD-ROM to 1780

Peter J. Seymour

May 10, 2012, 3:50:29 PM

De-duping address (and part address) entries by reducing them to a
standard form can certainly be beneficial provided that one understands
that two identical part addresses are not necessarily referring to the
same property. I find that on importing a large gedcom, de-duping the
addresses can result in discarding a couple of thousand entries, making
the data so much simpler without loss of accuracy.
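
One way to picture the de-duping step: reduce each address to a crude standard form and group the originals under it. The sketch below only groups, it does not discard, because of the caveat above that identical part addresses need not be the same property. The reduction rule and the sample addresses are assumptions for illustration.

```python
import re
from collections import defaultdict

def address_key(raw: str) -> str:
    """Crude standard form: lower case, punctuation dropped, whitespace collapsed."""
    return " ".join(re.sub(r"[^\w\s]", " ", raw.lower()).split())

def group_duplicates(addresses):
    """Map each standard form to the list of original spellings that reduce to it."""
    groups = defaultdict(list)
    for a in addresses:
        groups[address_key(a)].append(a)
    return dict(groups)

# Invented sample addresses.
groups = group_duplicates([
    "12 High St., Leeds",
    "12 high st leeds",
    "14 High St, Leeds",
])
for key, originals in groups.items():
    print(key, "<-", originals)
```

Any group with more than one original is a candidate for merging after a quick look.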

Peter

J. Hugh Sullivan

May 10, 2012, 5:10:36 PM

On Thu, 10 May 2012 16:47:31 +0100, Ian Goddard
<godd...@hotmail.co.uk> wrote:

>J. Hugh Sullivan wrote:
>> ...do people with 50,000 names, plus or minus, keep everything
>> consistent?
>>
>> Is County always Co., is every state 2 capital letters, has the
>> superfluous comma between city and state been removed - and I could go
>> on.
>>
>> Keeping all that straight with only 7,000 names is tough but seems to
>> me that not using consistent standards results in sloppy work.
>
>One factor which would help would be the ability to reuse locations.
>Gramps, for instance, has a Places section. Any dialog which requires a
>location to be filled in gives the opportunity to select a Place which
>has been entered already or to enter a new one. If multiple events
>share the same Place entry then this cuts down the data entry.

It's certainly available with current programs, along with sources,
surnames, addresses... But data entered 15+ years ago, and carried
through a change of programs, might not catch up automatically. I
thought I ought to.

I have already gone through locations and sources to make sure there
were no duplicate entries and no unused ones. Changing all states to 2
letter caps is easy.

Hugh

J. Hugh Sullivan

May 10, 2012, 5:17:53 PM

On Thu, 10 May 2012 14:47:29 -0400, Denis Beauregard
<denis.b-at-f...@fr.invalid> wrote:

>On Wed, 09 May 2012 18:17:45 GMT, Ea...@bellsouth.net (J. Hugh
>Sullivan) wrote in soc.genealogy.computing:
>
>>...do people with 50,000 names, plus or minus, keep everything
>>consistent?
>>
>>Is County always Co., is every state 2 capital letters, has the
>>superfluous comma between city and state been removed - and I could go
>>on.
>>
>>Keeping all that straight with only 7,000 names is tough but seems to
>>me that not using consistent standards results in sloppy work.
>
>I would say it depends at least on the purpose and the tools you
>have.
>
>The purpose: standardization will improve the quality of the product
>if you distribute it (as a commercial site or database, or to a family
>society, etc.).

No one has my data base - I think I'm just trying to satisfy myself
and it turned out to be a lot of work. I need to pass it before very
long but I want to pass it to someone as serious as I am.

I don't understand the purpose of coding when the proper word would
seem to suffice.

>Today, I am transferring everything to a set of LibreOffice files.
>I still have codes for places (though they are different). I check
>whether the code is correct, but also the format of dates (which helps
>to find some mistakes), the sequence of dates (a person must be born,
>then married), etc.

What about a child born before marriage?

Have you used MS Word? If so are you really satisfied with
libreoffice? Is it absolutely compatible? Does it have drop down menus
like 2003 or the stupid ribbon like 2010?

Hugh

J. Hugh Sullivan

May 10, 2012, 5:27:51 PM

On Thu, 10 May 2012 20:50:29 +0100, "Peter J. Seymour"

Agreed.

>I find that on importing a large gedcom, de-duping the
>addresses can result in discarding a couple of thousand entries, making
>the data so much simpler without loss of accuracy.

Agreed again - except I never merge a gedcom. First, I don't trust them,
and second, no one has much that I don't already have. So I mostly just
look at the family or direct line.

Hugh

Denis Beauregard

May 10, 2012, 7:53:55 PM

On Thu, 10 May 2012 21:17:53 GMT, Ea...@bellsouth.net (J. Hugh
Sullivan) wrote in soc.genealogy.computing:

>On Thu, 10 May 2012 14:47:29 -0400, Denis Beauregard
><denis.b-at-f...@fr.invalid> wrote:
>
>>On Wed, 09 May 2012 18:17:45 GMT, Ea...@bellsouth.net (J. Hugh
>>Sullivan) wrote in soc.genealogy.computing:
>>
>>>...do people with 50,000 names, plus or minus, keep everything
>>>consistent?
>>>
>>>Is County always Co., is every state 2 capital letters, has the
>>>superfluous comma between city and state been removed - and I could go
>>>on.
>>>
>>>Keeping all that straight with only 7,000 names is tough but seems to
>>>me that not using consistent standards results in sloppy work.
>>
>>I would say it depends at least on the purpose and the tools you
>>have.
>>
>>The purpose: standardization will improve the quality of the product
>>if you distribute it (as a commercial site or database, or to a family
>>society, etc.).
>
>No one has my data base - I think I'm just trying to satisfy myself
>and it turned out to be a lot of work. I need to pass it before very
>long but I want to pass it to someone as serious as I am.
>
>I don't understand the purpose of coding when the proper word would
>seem to suffice.

My database was a plain text file, without columns. So I couldn't get
the same effect as a typical database or spreadsheet, that is, one
column for the place, one for the area, etc. Moreover, I wanted to
extract data by area (counties or states when I was looking at state
archives), which means the first part of the code must be the top-level
area. Most people write place, county, state, country, not the reverse.
Also, in this project, I needed to isolate items as words separated
by spaces, which meant no space inside the place item.
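
Putting the top-level area first is what makes extraction by area a simple prefix test, and it also makes a plain sort group records by area. A minimal sketch; the prefixes `M27` and `F01` come from the examples earlier in the thread, while the trailing place letters are invented:

```python
def records_for_area(codes, area_prefix):
    """Return the place codes that fall under one top-level area, sorted."""
    return sorted(c for c in codes if c.startswith(area_prefix))

# Hypothetical codes; only the area prefixes follow the thread's examples.
codes = ["M27BO", "F01PA", "M27SA", "M12WO"]
print(records_for_area(codes, "M27"))   # ['M27BO', 'M27SA']
```

With the usual place, county, state order, the same extraction needs a parser that knows where the state sits in each string.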

>>Today, I am transferring everything to a set of LibreOffice files.
>>I still have codes for places (though they are different). I check
>>whether the code is correct, but also the format of dates (which helps
>>to find some mistakes), the sequence of dates (a person must be born,
>>then married), etc.
>
>What about a child born before marriage?

I have a code for that. Actually, I have 2 options: one is to use a
special code to tag them, the other (which I use) is to identify them
when checking for mistakes. But I keep the special code so that I see
it when looking at my data.

>Have you used MS Word? If so are you really satisfied with
>libreoffice? Is it absolutely compatible? Does it have drop down menus
>like 2003 or the stupid ribbon like 2010?

It is not (and likely never will be) absolutely compatible. I had
Office 97 on my computer for years (I switched from Win 98 to 7 last
year because my old computer stopped working). I found that
Office 97 was not compatible in Win7 with some other software I had
(Forte Agent), so I preferred to install OOO and then LibreOffice
instead. I tried another computer where MS Office 2010 was already
installed and could never start Forte Agent (email client) on that
computer, so I presume FA has some problem with any MS Office
version, at least on Win Vista and 7.

Nonetheless, I read from time to time complex Word files with LO and
sometimes, a complex structure will cause some issue (but it is not
common). Format can be different (I had to correct the margins when I
installed LO to print CD covers).

I found something amazing : I have Excel xls files (not xlsx) and I
may save it from LO, then read it from MS Excel, and when reading it
back, it says there is a link to another file (which is false). Or
one file couldn't be read back with OOO 3.3 but was readable with
OOO 2.4 and LO 3.4.

Behaviour is different. It has the menus, not the ribbon (which I don't
like). But if you drag and drop, Excel will detect non-empty
cells (OOO/LO will not), while if you cut and paste, OOO/LO will and
Excel won't! Some macros available in Excel are not yet available
in LO.

I used for years this sequence:

=INDEX([conversions.xls]Feuil1!$B$1:$B$1007;EQUIV(D30;[conversions.xls]Feuil1!$A$1:$A$1007;0);1)

Using a couple/individual number as a reference, I could import
some information from another file. For example, I had 4 columns and
I imported the parents for a series of individuals using the parent's
number as the reference. It won't work in LO. I can't use a whole
column as an argument (I need to define it from line 1 to 65000, for
example). If importing from another file, it can
hang (i.e. freeze). If importing from the same file, I have to repeat
the process because the line references are relative, i.e. if you copy
the formula from line 1 to line 10, the imported range will be moved
from, say, 1:5000 to 10:5009. So you have to adapt to the different
behaviour.
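
For readers not used to the French function names: EQUIV is the French-locale name of MATCH, and Feuil1 is the default sheet name (Sheet1). The formula is therefore the familiar INDEX/MATCH exact-match lookup: find D30 in column A of the other file and return the value from the same row of column B. The same idea outside a spreadsheet, as a rough sketch with invented data:

```python
def lookup(key, keys, values):
    """Exact-match lookup: find key in keys, return the value at the same position."""
    try:
        return values[keys.index(key)]
    except ValueError:
        return None  # no match; a spreadsheet would show an error such as #N/A

# Invented stand-ins for columns A and B of the conversions file.
person_numbers = [101, 102, 103]
parent_names = ["Pierre", "Marie", "Jean"]

print(lookup(102, person_numbers, parent_names))
print(lookup(999, person_numbers, parent_names))
```

Because the lists are passed explicitly, there is no relative-reference drift of the kind described above when the lookup is repeated for many rows.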

I have to choose between Forte Agent (all 15 years of emails with
data) or MS Office. I have now very few programs that have no Linux
equivalent so I may jump to Linux at some time (when it will have
the bug fix that said we have to mount/unmount CDs, USB etc. that
are removable media).

J. Hugh Sullivan

May 10, 2012, 9:33:23 PM

On Thu, 10 May 2012 19:53:55 -0400, Denis Beauregard

I use Outlook (2003) for e-mail and Forte Free Agent for Newsgroups.
Both work well with 7 Ultimate and Home Premium.

Hugh

herman...@invalid.be.invalid

May 11, 2012, 4:17:52 AM

Denis Beauregard wrote:

....snip a lot...

> I have now very few programs that have no Linux
> equivalent so I may jump to Linux at some time (when it will have
> the bug fix that said we have to mount/unmount CDs, USB etc. that
> are removable media).

Hello, what is that issue??? I've been using Mandriva Linux for years now,
and I can't remember ever having to mount removable media. The system does
that automatically.

Herman Viaene

--
Veel mensen danken hun goed geweten aan hun slecht geheugen. (G. Bomans)

Lots of people owe their good conscience to their bad memory (G. Bomans)

Denis Beauregard

May 11, 2012, 7:25:45 AM

On Fri, 11 May 2012 10:17:52 +0200, herman...@invalid.be.invalid
wrote in soc.genealogy.computing:

>Denis Beauregard wrote:
>
>....snip a lot...
>I have now very few programs that have no Linux
>> equivalent so I may jump to Linux at some time (when it will have
>> the bug fix that said we have to mount/unmount CDs, USB etc. that
>> are removable media).
>
>Hello, what is that issue??? I've been using Mandriva Linux for years now,
>and I can't remember ever having to mount removable media. The system does
>that automatically.

I have a somewhat old version of Debian on my laptop (my main computer
has Windows 7) and it doesn't. I can't update it (there is a broken
link in the dependencies table and I don't know how to fix it), so I
have an obsolete (2006?) Linux. When I want to read my USB files (to
transfer them to the laptop), I have to go to the shell and mount the
USB key.

I tried to install Mandriva or Mandrake years ago and it was not
working. Ditto for Kubuntu 2 years ago.

Tony Proctor

May 11, 2012, 11:21:54 AM

"Ian Goddard" <godd...@hotmail.co.uk> wrote in message
news:a1268j...@mid.individual.net...

STEMMA (www.parallaxview.co/familyhistorydata) handles this by representing
each Place as part of a Place Hierarchy, e.g. [San Francisco, California,
US]. That's not unusual in itself but STEMMA takes this down to the level of
street and house/building if relevant. That means you have consistency of
the names, and their parentage (in the hierarchy), and you can attach other
data such as narrative to any level in the hierarchy.
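
To make the hierarchy idea concrete, here is a generic sketch of a place that knows its parent and can carry notes at any level. It is not STEMMA's actual data format; the class and field names are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Place:
    name: str
    parent: Optional["Place"] = None   # next level up in the hierarchy
    notes: str = ""                    # narrative can attach at any level

    def full_name(self) -> str:
        """Walk up the parents and join the names, most specific first."""
        parts = []
        node = self
        while node is not None:
            parts.append(node.name)
            node = node.parent
        return ", ".join(parts)

us = Place("US")
california = Place("California", us)
san_francisco = Place("San Francisco", california, notes="Example narrative.")
print(san_francisco.full_name())
```

Each name is typed once, so every event that points at `san_francisco` gets the same spelling and the same parentage.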

On the down-side, this database of Places has to be constructed by hand at
the moment. The associated research-notes
(www.parallaxview.co/familyhistorydata/research-notes/persons-places) make a
case for having a 'Place Authority' on the Internet which can be
interrogated to get the data. They also suggest some England+Wales resources
that would have made a great start for a street-level Place Authority
related to our census returns.

Tony Proctor

J. Hugh Sullivan

May 11, 2012, 11:30:54 AM

On Fri, 11 May 2012 01:33:23 GMT, Ea...@bellsouth.net (J. Hugh
Sullivan) wrote:

>>>Have you used MS Word? If so are you really satisfied with
>>>libreoffice? Is it absolutely compatible? Does it have drop down menus
>>>like 2003 or the stupid ribbon like 2010?

I just downloaded Kingsoft Office Suite. It looks much better than OO
and LO IMO.

Hugh

Wes Groleau

May 11, 2012, 11:20:57 PM

On 05-10-2012 03:15, Peter J. Seymour wrote:
> what practical purposes are served by being consistent with the formatting

The OP said five thousand persons. Such a collection requires software
assistance, hence the value of consistency.

--
Wes Groleau

What kind of smiley is C:\ ?

Peter J. Seymour

May 12, 2012, 3:34:38 AM

On 2012-05-12 04:20, Wes Groleau wrote:
> On 05-10-2012 03:15, Peter J. Seymour wrote:
>> what practical purposes are served by being consistent with the
>> formatting
>
> The OP said five thousand persons. Such a collection requires software
> assistance, hence the value of consistency.
>

And that is a good point. It may seem pedantic but the answer I would
want is still how does that help.
I already mentioned one way which is a potential reduction in the total
amount of data one has to peruse by removing duplicate but differently
spelt references.
I suppose the software itself does not need to be given consistency in
that given a reduction scheme it can achieve it for itself. That is
fairly easy with addresses for instance.
What I aim for is what data organisation features will help me more
easily understand the information encoded. I have to admit that careful
keying of the original data is an important part of that. For the
argumentative, we are well into the realms of altering evidence in
favour of a blatant interpretation.

Peter

Bob LeChevalier

May 12, 2012, 7:00:00 AM

"Peter J. Seymour" <Newsg...@pjsey.demon.co.uk> wrote:

>On 2012-05-12 04:20, Wes Groleau wrote:
>> On 05-10-2012 03:15, Peter J. Seymour wrote:
>>> what practical purposes are served by being consistent with the
>>> formatting
>>
>> The OP said five thousand persons. Such a collection requires software
>> assistance, hence the value of consistency.
>>
>And that is a good point. It may seem pedantic but the answer I would
>want is still how does that help.
>I already mentioned one way which is a potential reduction in the total
>amount of data one has to peruse by removing duplicate but differently
>spelt references.

The obvious reason is to make it easier to search and select relevant
data.

I want to find everyone in my data base born in Alabama in the 1800s.
If my search has to include 5 different ways of representing the state
name, not to mention a bunch of place references that didn't even
include the state name, my search will not be easy.
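
A rough illustration of the point: once the state is stored in one consistent form, the Alabama search is a single comparison plus a year range. The record layout and the sample people below are invented for the example.

```python
# Hypothetical records with the birth state already standardised.
people = [
    {"name": "A. Example", "birth_state": "AL", "birth_year": 1842},
    {"name": "B. Example", "birth_state": "AL", "birth_year": 1901},
    {"name": "C. Example", "birth_state": "GA", "birth_year": 1850},
]

def born_in(people, state, first_year, last_year):
    """Names of people born in the given state within the inclusive year range."""
    return [
        p["name"]
        for p in people
        if p["birth_state"] == state and first_year <= p["birth_year"] <= last_year
    ]

print(born_in(people, "AL", 1800, 1899))  # ['A. Example']
```

With five spellings of the state in the data, the same function would need five comparisons, and it would still miss the entries that omit the state altogether.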

>I suppose the software itself does not need to be given consistency in
>that given a reduction scheme it can achieve it for itself.

Software doesn't do anything for itself. It does (hopefully) what the
programmer told it to do.

Most people use canned software, none of which to my knowledge is
capable of finding all the place references in Alabama without my
specifying in detail how such references may be found. If the places
are not consistently formatted, I'm not going to be able to do that.

>That is fairly easy with addresses for instance.

Not if they aren't entered in a consistent format.

>What I aim for is what data organisation features will help me more
>easily understand the information encoded.

I'm more interested in being able to find the information first.

>For the
>argumentative, we are well into the realms of altering evidence in
>favour of a blatant interpretation.

Who is altering evidence? If I index a record and use standardized
spelling and formats, I haven't changed the record. I may have made
it easier to find, though. If I really care to check the
interpretation that I or someone else made, I can go back and look at
the original. But if I don't know that the original exists and is
relevant because it wasn't usefully indexed, that isn't going to
happen.

lojbab
---
Bob LeChevalier - artificial linguist; genealogist
loj...@lojban.org Lojban language www.lojban.org

J. Hugh Sullivan

May 12, 2012, 10:12:21 AM

On Sat, 12 May 2012 07:00:00 -0400, Bob LeChevalier
<loj...@lojban.org> wrote:

>Who is altering evidence? If I index a record and use standardized
>spelling and formats, I haven't changed the record. I may have made
>it easier to find, though. If I really care to check the
>interpretation that I or someone else made, I can go back and look at
>the original. But if I don't know that the original exists and is
>relevant because it wasn't usefully indexed, that isn't going to
>happen.

You make an excellent case. Speaking of not altering a record, I have
found Sullivan spelled 153 ways on official records so far. I don't
have 153 Sullivan surnames in my program, I have one.

Hugh

Graham

May 12, 2012, 6:09:18 PM

I thought my Doupains, with 29 different spellings, would put me right up
there... but compared to 153... I'm just an amateur.

Graham

J. Hugh Sullivan

May 12, 2012, 8:51:31 PM

On Sun, 13 May 2012 08:09:18 +1000, Graham <gra...@bigpond.net.au>
wrote:

Truth be told a number of the variations are caused by the poor
spelling of transcribers. But, if one isn't familiar with a name, that
can happen.

I expect you find, as I did, that 200 or more years ago spelling was
really phonetics.

The most interesting one I found is on adjacent tombstones for husband
and wife in AL - Sullivan and Sullivant. Maybe they argued a lot.

Hugh

Ian Goddard

May 13, 2012, 5:06:51 AM

The ideal solution would be to preserve the text of the original record
as found and link the record to a standard set of localities. Exactly
the same as the Persona/Person structure to deal with people.

Locations have their own problems, however, as the same name can apply to
different levels of a hierarchy. If the document simply records the
answer to a question such as "where were you born?" and the reply was
"New York", could you be sure whether he meant the city or the state?
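
Both points in this post, keeping the text as found and linking it to a standard locality only when that can be decided, fit in a very small structure. A sketch with invented identifiers and an invented table of standard places:

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical table of standard localities; the keys are invented identifiers.
STANDARD_PLACES = {
    "us-ny": "New York (state), US",
    "us-ny-nyc": "New York (city), New York, US",
}

@dataclass(frozen=True)
class PlaceCitation:
    as_recorded: str                 # text exactly as found in the source
    place_id: Optional[str] = None   # link to a standard locality, if resolved

    def standard_name(self) -> str:
        if self.place_id is None:
            return "(unresolved)"
        return STANDARD_PLACES[self.place_id]

census = PlaceCitation("New York")                   # city or state? left unresolved
baptism = PlaceCitation("N. York Cy", "us-ny-nyc")   # resolved to the city
print(census.standard_name(), "|", baptism.standard_name())
```

The original wording is never overwritten, so a later researcher can revisit any link, including the ones left unresolved.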

J. Hugh Sullivan

May 13, 2012, 10:51:31 AM

On Sun, 13 May 2012 10:06:51 +0100, Ian Goddard
<godd...@hotmail.co.uk> wrote:

>Locations have their own problems, however, as the same name can apply to
>different levels of a hierarchy. If the document simply records the
>answer to a question such as "where were you born?" and the reply was
>"New York", could you be sure whether he meant the city or the state?

>Ian

A good example of that is the formation of counties. I have found
people who lived in as many as three counties without moving.
Brunswick and Nansemond Cos. VA are examples. The pie-shaped wedge of
Nansemond that dipped into Bertie Co. NC became part of Bertie when
the state line was straightened.

Hugh

J. Hugh Sullivan

May 14, 2012, 8:33:00 PM

Yes, but my preferences have improved over the last 18 years.

Hugh

J. Hugh Sullivan

May 14, 2012, 8:34:08 PM

On Thu, 10 May 2012 04:02:18 -0700 (PDT), shmar...@ticnet.com wrote:

>On Wednesday, May 9, 2012 1:17:45 PM UTC-5, J. Hugh Sullivan wrote:
>> ...do people with 50,000 names, plus or minus, keep everything
>> consistent?
>>
>> Is County always Co., is every state 2 capital letters, has the
>> superfluous comma between city and state been removed - and I could go
>> on.
>>
>> Keeping all that straight with only 7,000 names is tough but seems to
>> me that not using consistent standards results in sloppy work.
>>
>> Hugh
>
>As others have said, I *try* to be consistent in my data entry, not always
>successfully. What I have done occasionally is generate a GEDCOM file and
>then bring it up in a text editor. I can then directly edit locations,
>dates, names, etc. On the extremely rare occasions that I have merged
>someone else's GEDCOM into mine, I can quickly add an identical NOTE to
>every individual telling where I got the information.

You can do that in Legacy directly in the program.

Hugh

Jack

May 16, 2012, 4:30:45 AM

"J. Hugh Sullivan" <Ea...@bellsouth.net> wrote in message
news:4fb1a454...@news.eternal-september.org...

Good software helps a lot.
I have checked my master location list a few times.
But I have also kept old-style spelling, at least in some cases. Somehow I
like it.

Get Legacy:
https://www.legacyfamilytreestore.com/SearchResults.asp?Cat=1&Click=1192

It can also pinpoint and plot important locations in ancestors' lives from
within Legacy.

J. Hugh Sullivan

May 16, 2012, 10:17:48 AM

On Wed, 16 May 2012 11:30:45 +0300, "Jack" <no...@INVALIDmail.com>
wrote:

>Good software helps a lot.
>I have checked my master location list a few times.
>But I have also kept old-style spelling, at least in some cases. Somehow I
>like it.

That's what I am doing at the moment. For years I did genealogy only
for myself so eccentricity satisfied me. But it might leave my
successors mystified.

My locations are consistent as are my sources. But I'm having to look
at every record...

My name, James "Hugh" Sullivan, like all names with quotes, has the
quoted name listed as an aka from 2 to 5 times. I'm eliminating those
akas. People were called by their middle name for years, and I indicate
it when I know.

I also noted that a ton of my name sources had been moved to
"unspecified". That was probably a result of using more than one
program, thinking GED transfer would be consistent. I'm up to the letter
"S" now.

It takes a lot of time if one is a nut about standardization and
consistency - and I am.

Hugh