GEDitCOM II "normalizing" the .ged file?

Ryan Hamilton

May 1, 2012, 4:00:42 PM
to GEDitCOM II Discussions
I'm finally upgrading from GEDitCOM to GEDitCOM II. (Thanks,
Lion :>) I've noticed that simply importing my old .ged file into
GEDitCOM II and saving it results in a number of changes that I did
not expect. Changes include:

* Removing "empty" nodes (e.g. "1 DEAT" with no children)
* Extending the line length for TEXT fields
* Changing "inline" OBJE and NOTE elements to top-level elements
referenced by xref
* Creating new top-level _PLC nodes that do not appear to be referenced
by any other nodes

Are these all expected? Is there any way to disable this conversion?
(Since I keep my .ged file in a revision control system I prefer to
keep changes to a minimum)

Cheers,

Ryan

John Nairn

May 1, 2012, 5:26:37 PM
to geditcom-ii...@googlegroups.com
These are all normal and, in my view, generally "improvements," but you can also go back (mostly) to the other style of GEDCOM if ever needed by using the Export -> GEDCOM Data... command. Here are some details:

On May 1, 2012, at 1:00 PM, Ryan Hamilton wrote:

> I'm finally upgrading from GEDitCOM to GEDitCOM II. (Thanks,
> Lion :>) I've noticed that simply importing my old .ged file into
> GEDitCOM II and saving it results in a number of changes that I did
> not expect. Changes include:
>
> * Removing "empty" nodes (e.g. "1 DEAT" with no children)

I would have to check to see exactly what GEDitCOM II does here. I don't recall deleting it. But, a line "1 DEAT" with no subordinate lines is often deleted or ignored by GEDCOM software and the standard even says it will be deleted. The standard method to record that a person has died but no other information is known is to make the line "1 DEAT Y". You get this line in GEDitCOM II by checking the "Has Died" check box for the individual. The standard actually goes on and says using "1 DEAT Y" with subordinate data is an error, but GEDitCOM II allows it and I don't see any confusion in that use.

It is a good idea to check that box for everyone known to be deceased when no death information is known. It makes your data more informative. GEDitCOM II has a script in the "Editing Tools" submenu of the scripts menu called "Check Has Died" that will go through all individuals and automatically check that box if it can tell they are deceased by looking at other dates.

> * Extending the line length for TEXT fields

According to the standard, lines of notes text (or any line of GEDCOM data) can be up to 255 characters (which itself is an archaic limit imposed by 8-bit character models). You can pick any maximum length you want when you use the Export -> GEDCOM Data... menu command.

> * Changing "inline" OBJE and NOTE elements to top-level elements
> referenced by xref

This is true internally. All multimedia links are in OBJE records rather than "in line". It makes handling of multimedia easier (i.e., you do not have to program interfaces for every possible way multimedia can be linked).

Similarly, having one style of notes (always in NOTE records) makes the GEDCOM data "simpler." Around GEDCOM 4 (i.e., in the 1980s), it was recommended that all GEDCOM notes be linked rather than "in line," and the GEDCOM 5.5 standard includes that recommendation. Many programs may ignore it, and the old GEDitCOM allowed in-line notes, but that is the recommendation. GEDitCOM II follows the recommendation (although I can't find the recommendation in the standard now).

> * Creating new top-level _PLC nodes that do not appear to be referenced
> by any other nodes

The _PLC records are a new feature in GEDitCOM II 1.7 that adds many new options for managing your places. They ARE all linked to your data, except the link is done by the full place name rather than a GEDCOM ID. In other words, the line

2 PLAC Ridgewood, NJ, USA

is actually a link to the _PLC record whose first name is "Ridgewood, NJ, USA". Using this method, the PLAC lines still look like normal GEDCOM data, but they double as a link to a record where you can store lots of place-specific information. The new Place Atlas feature lets you look up places and transfer that information to place records.

> Are these all expected? Is there any way to disable this conversion?
> (Since I keep my .ged file in a revision control system I prefer to
> keep changes to a minimum)

The best way to do version control is to place the entire .gedpkg file in your version control system. It is a Mac "package," which means it is a Unix folder as far as version control systems are concerned. Inside the package are one .ged file and a collection of text files (which work great in version control) and possibly thumbnail images, if they are saved. Some version control systems also write special folders in each controlled folder (e.g., SVN and CVS do, but Git does not), and GEDitCOM II is nice enough to look for those folders and make sure they are always PRESERVED whenever the file is saved. Thus the entire .gedpkg package folder can simply be added to any version control system. I have tested it with SVN, CVS, and Git. For details, see

http://www.geditcom.com/tutorials/collab.html

which focuses on SVN, but other systems can be used too.

If for some reason you need a standalone .ged in some other style because you need to interact with other software that needs it, you can use the Export -> GEDCOM Data... menu command. This command lets you choose the character set, characters per line, line-end characters, and various other options. The OBJE links can be left as is or moved to internal links. The various custom records defined by GEDitCOM II can be included or omitted. You can even encode various settings and thumbnail images if you want, but that is only useful if you will reopen the file in GEDitCOM II (or maybe in the future in a GEDitCOM iPad App). See "Saving Genealogy Files" in the Help topics for more details.

Because exporting with "in line" notes violates the recommendation of the standard, that option is not in the export process. If really needed, there is a script to export with embedded notes, although it would not have the other export options. The script could be enhanced for more export options, but it would only be needed if you later need to import that GEDCOM file into some system that only supports "in line" notes (e.g., some very outdated GEDCOM software).

------------
John Nairn
http://www.geditcom.com
Genealogy Software for the Mac

Ryan Hamilton

May 1, 2012, 8:21:56 PM
to geditcom-ii...@googlegroups.com
On Tue, May 1, 2012 at 2:26 PM, John Nairn <johna...@gmail.com> wrote:
> These are all normal and, in my view, generally "improvements," but you can also go back (mostly) to the other style of GEDCOM if ever needed by using the Export -> GEDCOM Data... command. Here are some details:

Thank you for the detailed response!  I think I agree with you that the changes are mostly improvements.  At the same time, I'm having culture shock because GEDitCOM (the original version) pretty much never touched the parts of my file that I wasn't directly changing.  I so loved that feature of the original version.

 
> On May 1, 2012, at 1:00 PM, Ryan Hamilton wrote:
>
> > I'm finally upgrading from GEDitCOM to GEDitCOM II.  (Thanks,
> > Lion :>)  I've noticed that simply importing my old .ged file into
> > GEDitCOM II and saving it results in a number of changes that I did
> > not expect.  Changes include:
> >
> > * Removing "empty" nodes (e.g. "1 DEAT" with no children)
>
> I would have to check to see exactly what GEDitCOM II does here. I don't recall deleting it. But, a line "1 DEAT" with no subordinate lines is often deleted or ignored by GEDCOM software and the standard even says it will be deleted. The standard method to record that a person has died but no other information is known is to make the line "1 DEAT Y". You get this line in GEDitCOM II by checking the "Has Died" check box for the individual. The standard actually goes on and says using "1 DEAT Y" with subordinate data is an error, but GEDitCOM II allows it and I don't see any confusion in that use.
>
> It is a good idea to check that box for everyone known to be deceased when no death information is known. It makes your data more informative. GEDitCOM II has a script in the "Editing Tools" submenu of the scripts menu called "Check Has Died" that will go through all individuals and automatically check that box if it can tell they are deceased by looking at other dates.

Totally agree with you about the "right" thing to do here.  As it happens the .ged file I work with has been hacked on with a variety of tools over the years, and has a pile of cruft :>  Perhaps it's time to bite the bullet and standardize it.  I noticed, by the way, that in my particular file, I start with:

1 DEAT
2 DATE

When I save the file the first time (after making some change), the "2 DATE" line is removed.  Then when I save it the second time, the 1 DEAT is removed.  Does GEDitCOM II have logic to delete "empty" lines in general?  (That seems like a reasonable thing to do).
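For illustration, here is a rough Python sketch of what a general "delete empty lines" rule could look like. It is not GEDitCOM II's actual logic (which is not shown in this thread); the function name `prune_empty` and the simplified line pattern are assumptions. Unlike the two-save behavior described above, it reaches the final result in one pass by walking the file backwards, so children are decided before their parent.

```python
import re

# Simplified GEDCOM line pattern (an assumption, not the full grammar):
# LEVEL [@XREF@] TAG [VALUE]
LINE_RE = re.compile(r"^\s*(\d+)\s+(?:(@[^@]+@)\s+)?(\S+)(?:\s(.*))?$")


def prune_empty(lines):
    """Drop lines that have no value and no surviving subordinate lines.

    Level-0 lines and lines carrying an xref are always kept.
    """
    kept_reversed = []
    # level -> True if a kept line at that level has been seen
    # since the last line at a shallower level (scanning backwards).
    kept_at = {}
    for raw in reversed(lines):
        m = LINE_RE.match(raw)
        if m is None:
            kept_reversed.append(raw)  # leave unrecognised lines untouched
            continue
        level = int(m.group(1))
        xref = m.group(2)
        value = (m.group(4) or "").strip()
        has_child = kept_at.get(level + 1, False)
        # Deeper levels belonged to this line; forget them before moving up.
        kept_at = {lv: seen for lv, seen in kept_at.items() if lv <= level}
        if level == 0 or xref or value or has_child:
            kept_at[level] = True
            kept_reversed.append(raw)
    return kept_reversed[::-1]
```

With this rule, "1 DEAT" followed by an empty "2 DATE" disappears entirely, while "1 DEAT Y" and any event with a non-empty subordinate line are kept.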

> > * Extending the line length for TEXT fields
>
> According to the standard, lines of notes text (or any line of GEDCOM data) can be up to 255 characters (which itself is an archaic limit imposed by 8-bit character models). You can pick any maximum length you want when you use the Export -> GEDCOM Data... menu command.

Gotcha.
 
> > * Changing "inline" OBJE and NOTE elements to top-level elements
> > referenced by xref
>
> This is true internally. All multimedia links are in OBJE records rather than "in line". It makes handling of multimedia easier (i.e., you do not have to program interfaces for every possible way multimedia can be linked).
>
> Similarly, having one style of notes (always in NOTE records) makes the GEDCOM data "simpler." Around GEDCOM 4 (i.e., in the 1980s), it was recommended that all GEDCOM notes be linked rather than "in line," and the GEDCOM 5.5 standard includes that recommendation. Many programs may ignore it, and the old GEDitCOM allowed in-line notes, but that is the recommendation. GEDitCOM II follows the recommendation (although I can't find the recommendation in the standard now).

*nod*  Since my file is such a Frankenstein, I have notes of both kinds.  Ideally, GEDitCOM would leave alone notes that I haven't modified, but I understand the motivation.
 
> > * Creating new top-level _PLC nodes that do not appear to be referenced
> > by any other nodes
>
> The _PLC records are a new feature in GEDitCOM II 1.7 that adds many new options for managing your places. They ARE all linked to your data, except the link is done by the full place name rather than a GEDCOM ID. In other words, the line
>
> 2 PLAC Ridgewood, NJ, USA
>
> is actually a link to the _PLC record whose first name is "Ridgewood, NJ, USA". Using this method, the PLAC lines still look like normal GEDCOM data, but they double as a link to a record where you can store lots of place-specific information. The new Place Atlas feature lets you look up places and transfer that information to place records.

Oh, I see.  I see the motivation, although the unfortunate consequence of this is that I've ended up with about 17K new lines / .5MB of new data in the new file.  This probably is not the end of the world, though it means that much more time to process.  I could work around this, as you say, by exporting to GEDCOM instead of looking at the .ged file in the .gedpkg/ directory.  I'm not in love with this approach because I prefer to have a single version of the .ged data.  But it could work.  (Or I could just cook up a Perl script to trim these records before I send the data up to my web server).
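A minimal sketch of such a trimming script, written here in Python rather than Perl. It assumes each place record starts with a level-0 line of the form "0 @xref@ _PLC" and runs until the next level-0 line; the function name `strip_records` is hypothetical. PLAC lines inside other records are left untouched.

```python
import re

# Matches the first line of any top-level record: "0 [@XREF@] TAG".
RECORD_START = re.compile(r"^\s*0\s+(?:@[^@]+@\s+)?(\S+)")


def strip_records(lines, tag="_PLC"):
    """Return lines with every top-level record of the given tag removed."""
    out = []
    skipping = False
    for line in lines:
        m = RECORD_START.match(line)
        if m:
            # A new record begins; skip it only if it has the unwanted tag.
            skipping = m.group(1) == tag
        if not skipping:
            out.append(line)
    return out


# Hypothetical usage (file names are placeholders):
# with open("family.ged", encoding="utf-8") as src:
#     trimmed = strip_records(src.readlines())
# with open("family-web.ged", "w", encoding="utf-8") as dst:
#     dst.writelines(trimmed)
```

The Export -> GEDCOM Data... command described earlier in the thread can also omit the custom records, so a script like this is only needed when working directly from the .ged file inside the package.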
 
> > Are these all expected?  Is there any way to disable this conversion?
> > (Since I keep my .ged file in a revision control system I prefer to
> > keep changes to a minimum)
>
> The best way to do version control is to place the entire .gedpkg file in your version control.
[snip]

*nod*  Definitely a possibility.  One of the things I really liked about the original version of GEDitCOM was that the .ged was a self-contained source of truth.  This was particularly convenient because it meant that the tool I used to view the data on my web server did not need to know anything about the tool I use to edit the data.  I guess the new supported solution is to export to GEDCOM in this case.  (As long as the xrefs don't change, that'll continue to be workable, I think)
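One way to check the xref concern is to compare the record xrefs in the old file against those in an export. The following Python sketch assumes records start with "0 @xref@ TAG" lines; the function names are hypothetical and nothing here is part of GEDitCOM II.

```python
import re

# First line of a top-level record that carries an xref: "0 @XREF@ TAG".
XREF_RECORD = re.compile(r"^\s*0\s+(@[^@]+@)\s+(\S+)")


def record_xrefs(lines):
    """Map each top-level xref (e.g. "@I1@") to its record tag (e.g. "INDI")."""
    found = {}
    for line in lines:
        m = XREF_RECORD.match(line)
        if m:
            found[m.group(1)] = m.group(2)
    return found


def xref_changes(old_lines, new_lines):
    """Return (missing, retagged, added) xrefs between two GEDCOM files."""
    old, new = record_xrefs(old_lines), record_xrefs(new_lines)
    missing = sorted(x for x in old if x not in new)
    retagged = sorted(x for x in old if x in new and old[x] != new[x])
    added = sorted(x for x in new if x not in old)
    return missing, retagged, added
```

If `missing` and `retagged` come back empty, every record the web viewer could link to is still reachable under the same ID; new NOTE, OBJE, or _PLC records would show up only in `added`.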

Cheers,

Ryan
