
Considerations for a better Import/Export Format


Tony Proctor

Sep 28, 2011, 2:37:42 PM
Some thoughts on a better textual import/export format for genealogical use.

I know I'm setting myself up as a target for trolls here but I promised
Cheryl I would have a go. Please read it as a genuine attempt to help. All
constructive suggestions welcome of course.

Reference is made to XML purely for illustration. Although it would be a
good candidate, it is not the only possible one and the recommendations
should be as generic as possible.


Goals
Define a universal import/export format
Flexibility. Store virtually anything without having to bend any rules
Locale independence
Potential use as a definitive backup-format or a load-format for databases
Zero-loss when operating between different software units


Locale-independence
The character set should be global, which nowadays means UTF-8. This is also
the default with XML. Although the header could explicitly provide a
non-default character set name (again, similar to XML) I think that would
over-complicate the processing and would put an onus for all possible
translations on the receiving software unit.

Another possibility is to use Unicode "escape sequences". There are zillions
of these, e.g. HTML uses a format like "&#x20AC;" whilst Java uses "\u20AC".
The problem with these is having to reserve a magic escape character for
their use ('&' and '\' in these 2 cases). See
http://billposer.org/Software/ListOfRepresentations.html for a good summary.

Data values should be in a locale-neutral format, as with the source code
for programming languages. For this reason, it is sometimes described as using
a 'programming locale'. This effectively means using a period in all decimal
numbers (not a comma), ISO 8601 format for (Gregorian-)dates (e.g.
yyyy-mm-dd), and unlocalised true/false or just 1/0 for booleans (e.g. for
option selections).
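
For illustration (the element and attribute names here are invented),
locale-neutral values might look like:

<Event Type="Birth" Date="1852-03-17"/>
<Measurement Units="m" Value="1.75"/>
<Option Name="ShowMaidenNames" Value="true"/>

so that the same bytes mean the same thing whichever locale the reading
software unit is running in.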

Time Zones (TZ) and Daylight Saving Time (DST) were discussed in this
thread. Although usually applicable to local clock times, they can also
apply to local calendar dates. The importance for genealogy is going to be
slim at best but the area should be clarified. ISO 8601 has no designators
for named time zones, although a numeric UTC offset can be attached; a 'Z'
suffix indicates UTC (Coordinated Universal Time) as opposed to the default
of a 'local date/time'. Local
date/times should be interpreted in the context of the data location rather
than the current location of the user but this would only be significant
when creating a timeline across TZ boundaries.
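
For example, all of the following are valid ISO 8601 forms:

1852-03-17 (local calendar date)
1852-03-17T14:30:00 (local date and time)
1852-03-17T14:30:00Z (the same clock reading asserted as UTC)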


Main Elements
The definition of a person (or place) should consist of a set of discrete
elements representing

<Events> - Something happening on a particular date (may be approx - see
below). Predefined ones must include BMD events but it should be open-ended
<Notes> - General narrative notes
<Extension> - Extensions to the set of elements, e.g. PlaceOfEducation.
These names should be interpreted only within the particular "namespace"
associated with the current dataset, thus preventing clashes with datasets
from other sources, or with any newer element names appearing in a future
revision of the schema.
Lineage - see below

All of these elements can contain narrative text, and that narrative text can
have references to people, places, dates, events, references, resources,
etc, embedded within it. This would be a powerful feature allowing a
viewing tool to provide hyperlinks to the associated material.

Each element should have a key associated with it (i.e. a simple name that
is local to the dataset) by which it can be referenced from other elements,
e.g. <Person Name=Tony123>.

If references need to be made across datasets (e.g. when comparing them)
then local names should be decorated with the dataset name to make them
unique, e.g. TonysTree:Tony123.
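
As a small sketch of how these pieces might hang together (all names
invented):

<Person Name="Tony123">
<Event Type="Birth" Date="1852-03-17"/>
<Note>Emigrated with his cousin <PersonRef Name="Mary456"/> in 1870.</Note>
</Person>

The embedded <PersonRef> is the sort of thing a viewing tool could turn
into a hyperlink to Mary's entry.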


Lineage
Formats like XML provide an automatic way of depicting a top-down
hierarchical relationship. Unfortunately, genealogical lineage is really a
'network' rather than a pure 'hierarchy'. In effect, a simple nesting of
"offspring" under their associated "parents" is insufficient.

There's also a problem with a top-down approach unless a specific union
between two people has a single representation in the data, but that then
causes further problems with the nature and the lifetime of that union. For
instance, if the father and mother have separate representations in the
data, and they each have links to their associated common offspring, then it
makes it difficult to bring the information together to identify family
units, and also to ensure that there exist two links to each offspring.

I believe it's easier to use a bottom-up representation. Each person has
just one progenitive father and one progenitive mother and so can have
upward links to their appropriate parents (where known). For instance,
<Father> and <Mother> elements. This also makes it easy to have other types
of parentage including <Guardian>, <FosterMother>, <AdoptedMother>, etc.
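
As a bottom-up sketch (element and attribute names invented):

<Person Name="Tony123">
<Father Ref="Jim122"/>
<Mother Ref="Ann121"/>
</Person>

Family units then fall out of the data rather than needing their own
records: people whose <Father> and <Mother> links point at the same
couple form a sibling group.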


References
A 'reference' would be a pointer to some item of information in an
external catalogue that's not directly accessible from your software. For
example, a BMD reference or a TNA reference. Computers have an extensible
mechanism for such varied names called a URN, or Uniform Resource Name. This
is a subset of a URI (Uniform Resource Identifier), similar to the
more-familiar URL and so has the same structure.
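
Purely for illustration - the namespace identifier and the rest of the
name here are invented, but the general shape of 'urn:' followed by a
namespace identifier and a namespace-specific string is the standard
one - a BMD index entry might be carried as:

<Reference URN="urn:example-bmd:births:1852:Q1:Dublin:2:567"/>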


Resources
When referring to resources that may be part of your data collection, or to
resources which may be accessed over the Internet, a URL should be employed.
URLs and URNs are both subsets of URIs and have similar formats. The 'scheme'
prefix makes them applicable to different stores, protocols and access
methods, e.g. file:// for local files and http:// which we all know about.


Attributes
All element types should accept certain attributes to modify their
interpretation. Suggestions might be:

Sensitivity: Public, Family, Private, Very Sensitive. Default=Public
Surety: Some percentage of how certain the data is. Default=100%
Source: Identify the source of a piece of data

For instance, <Note Sensitivity=Private Surety=20%>...some sensitive note
that I'm not sure about...</Note>


Names
As mentioned already in this thread, names around the world are not used in
the same way. As well as alternative spellings, nicknames, spellings in
alternative languages, and optional parts, the very structure may be
variable leaving the name with little uniqueness and no obvious
interpretation for our forename/middlename/surname concepts.

One possibility is to offer a prioritised set of patterns to match. There are
lots of 'pattern definition' languages around but I'll present a very simple
one that can be used for illustration. The stored format doesn't have to use
this syntax itself but it's very convenient when discussing the pattern and
showing written examples.

Let a 'full name' be defined by a list of possible 'sequences'. These would
be in priority order and indicate which should be tested first. Each
'sequence' would be an ordered set from the following:

name - simple name element, e.g. Tony
{name, ...} - exactly one of the listed alternatives must match
[name, ...] - optionally one of the listed alternatives

The following example might belong to someone called Grace Ann Murphy who
doesn't always use her middle name and sometimes goes as Gracie. However,
she's Irish and also has an Irish version of her name. This would require
two 'sequences':

{Grace,Gracie} [Ann] Murphy
Gráinne [Ann] "Ní Murchú"

An interesting issue here concerns the variations of individual name parts.
In this example, Grace accepts "Gracie" as an informal version of her
forename. However, the difference between Ann and Anne is more of a spelling
error, either during recording or a subsequent lookup. I think this should
be handled by the software unit, just as a soundex might. The same could
apply to using a middle initial but that is a very Western convention.
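
If XML were the carrier, one possible (entirely invented) encoding of the
two 'sequences' above might be:

<FullName>
<Sequence>
<Alt Optional="false"><Name>Grace</Name><Name>Gracie</Name></Alt>
<Alt Optional="true"><Name>Ann</Name></Alt>
<Name>Murphy</Name>
</Sequence>
<Sequence>
<Name>Gráinne</Name>
<Alt Optional="true"><Name>Ann</Name></Alt>
<Name>Ní Murchú</Name>
</Sequence>
</FullName>

where {...} maps to Optional="false", [...] maps to Optional="true", and
the document order of the <Sequence> elements carries the priority.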


Dates
The general representation of dates is mentioned above. When a date is
referenced in an element, it should have a margin of error associated with
it. This could be a +/- representation such as a day, a month, or a year,
etc., or a more explicit min/max representation.

When deterministic dates, such as our normal Gregorian ones, are loaded into
some type of indexing system like a database, it is expected that they can
all be stored as a pair of internal 'timestamp' values. Since these would be
binary long-integer representations, issues of date format,
uncertainty, TZ, etc., all become irrelevant and they can be handled
efficiently in the same manner.
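
As a sketch (attribute names invented), either of

<Event Type="Birth" Date="1852-03" Margin="P1M"/>
<Event Type="Birth" DateMin="1852-02-01" DateMax="1852-04-30"/>

would reduce, on loading, to the same pair of timestamp values.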

Dates expressed in different Calendars are a bit more challenging and I'll
skip that for now :-)


Tony Proctor


Tony Proctor

Sep 28, 2011, 2:40:47 PM

"Tony Proctor" <tony@proctor_NoMore_SPAM.net> wrote in message
news:j5vp67$t2b$1...@reader01.news.esat.net...
> Gráinne [Ann] "Ní Murchú"

>
> An interesting issue here concerns the variations of individual name
> parts. In this example, Grace accepts "Gracie" as an informal version of
> her forename. However, the difference between Ann and Anne is more of a
> spelling error, either during recording or a subsequent lookup. I think
> this should be handled by the software unit, just as a soundex might. The
> same could apply to using a middle initial but that is a very Western
> convention.
>
>
> Dates
> The general representation of dates is mentioned above. When a date is
> referenced in an element, it should have a margin of error associated with
> it. This could be a +/- representation such as a day, a month, or a year,
> etc., or a more explicit min/max representation.
>
> When deterministic dates, such as our normal Gregorian ones, are loaded
> into some type of indexing system like a database, it is expected that
> they can all be stored as a pair of internal 'timestamp' values. Since
> these would be a binary long-integer representations it would mean issues
> of date format, uncertainty, TZ, etc., all become irrelevant and they can
> be handled efficiently in the same manner.
>
> Dates expressed in different Calendars are a bit more challenging and I'll
> skip that for now :-)
>
>
> Tony Proctor
>

Sorry, the existing thread referred to here is "WHAT do you want to GED
in/out?" posted by singals. I decided to start a new thread for clarity and
forgot to change the reference

Tony Proctor


Ian Goddard

Sep 30, 2011, 4:58:43 AM
Tony Proctor wrote:

> Lineage
> Formats like XML provide an automatic way of depicting a top-down
> hierarchical relationship. Unfortunately, genealogical lineage is really a
> 'network' rather than a pure 'hierarchy'. In effect, a simple nesting of
> "offspring" under their associated "parents" is insufficient.
>
> There's also a problem with a top-down approach unless a specific union
> between two people has a single representation in the data, but that then
> causes further problems with the nature and the lifetime of that union. For
> instance, if the father and mother have separate representations in the
> data, and they each have links to their associated common offspring, then it
> makes it difficult to bring the information together to identify family
> units, and also to ensure that there exist two links to each offspring.
>
> I believe it's easier to use a bottom-up representation. Each person has
> just one progenitive father and one progenitive mother and so can have
> upward links to their appropriate parents (where known). For instance,
> <Father> and <Mother> elements. This also makes it easy to have other types
> of parentage including <Guardian>, <FosterMother>, <AdoptedMother>, etc.

I agree with you. Trees provide a ready-made structure which is
probably very attractive to anyone setting out to write genealogical
S/W. But it's a siren's song. Any substantial database structured that
way is going to have duplicated sub-trees when pedigree collapse is
encountered - or even worse, sub-trees that ought to be duplicate &
aren't quite - e.g. one copy gets Fred Flintstone's date of death &
another doesn't.

If you have the bottom-up elements of people & links the tree is
implicit in the data & can be created on the fly for reporting or display.

--
Ian

The Hotmail address is my spam-bin. Real mail address is iang
at austonley org uk

Ian Goddard

Sep 30, 2011, 5:47:52 AM
Tony Proctor wrote:
> Goals
> Define a universal import/export format
> Flexibility. Store virtually anything without having to bend any rules
> Locale independence
> Potential use as a definitive backup-format or a load-format for databases
> Zero-loss when operating between different software units

From what I've written elsewhere it should be no surprise that I want
to add another. It should distinguish between evidence and conclusions
drawn from that evidence.

This distinction seems very obvious and of prime importance to me but
seems to pass others by so I guess I'll have to have another try:

Evidence is real. If you, I, Cheryl, smart ol' Bob & anyone else view
the same document, we all have the same thing in front of us.

Conclusions are mental constructs. Having seen the same document we may
come to different conclusions about its meaning. If the writing's in a
difficult hand we might not even be able to agree about the text. It
would be great if we all agreed although, of course, that may mean
simply that we're all wrong. However, the structure of our data
shouldn't impede us in being able to share the same piece of evidence
and record different conclusions; in fact we should even be able to
record the fact that as individuals we can see more than one
interpretation and aren't able to distinguish which, if any, is correct.
One of the goals should be to enable us to do this. As a
sub-goal I think there should also be a means of expressing how we
derived our conclusion from the evidence and our confidence in it, and
this should be part of the overall structure and not stuck away in a
note somewhere.

A few other comments:
IMV universal implies that any unique IDs we give to some part of the
data should be universally unique.

Flexibility does have some impact on locale independence. If we're to
be flexible, there's going to be a need for adding ad hoc pieces of
information, which seems to demand the ability to invent new types of data
items, albeit in some carefully controlled manner.

If something is part of the standard structure then its meaning is
understood even if the structure is defined in a language which we don't
speak, irrespective of whether that's embedded in the data as with XML
tags or in some external document. An application using this would be
able to present the user with a localised caption.

If, however, the data item is ad hoc then the format is going to have to
make provision for ad hoc labels & unless the application has a set of
translation dictionaries it's just going to have to caption the data as
found. Even in the English-speaking world we're going to have to put up
with some records mentioning, to take one of Cheryl's examples, eye
color & others mentioning eye colour.

Tony Proctor

Sep 30, 2011, 6:02:06 AM

"Ian Goddard" <godd...@hotmail.co.uk> wrote in message
news:9elhi9...@mid.individual.net...
The fact versus inference is an excellent point Ian. I agree totally.

Regarding localisation of extended elements, I think any good software unit
would provide configuration options to define appropriate labels, and any
dataset that makes use of them should probably provide a default one. I'm
not sure if you're implying the person/company defining the extensions
should define the labels too or the person using the software unit should
configure it appropriately. As long as that information is separate from the
main dataset - thus keeping it locale independent - then I would be happy.

Tony Proctor


Ian Goddard

Sep 30, 2011, 6:28:35 AM
Tony Proctor wrote:
> Names
> As mentioned already in this thread, names around the world are not used in
> the same way. As well as alternative spellings, nicknames, spellings in
> alternative languages, and optional parts, the very structure may be
> variable leaving the name with little uniqueness and no obvious
> interpretation for our forename/middlename/surname concepts.
>
> One possibility is to offer a prioritised set of patterns to match. There are
> lots of 'pattern definition' languages around but I'll present a very simple
> one that can be used for illustration. The stored format doesn't have to use
> this syntax itself but it's very convenient when discussing the pattern and
> showing written examples.
>
> Let a 'full name' be defined by a list of possible 'sequences'. These would
> be in priority order and indicate which should be tested first. Each
> 'sequence' would be an ordered set from the following:
>
> name - simple name element, e.g. Tony
> {name, ...} - 1-or-many alternatives
> [name, ...] - 0-or-many alternatives
>
> The following example might belong to someone called Grace Ann Murphy who
> doesn't always use her middle name and sometimes goes as Gracie. However,
> she's Irish and also has an Irish version of her name. This would require
> two 'sequences':
>
> {Grace,Gracie} [Ann] Murphy
> Gráinne [Ann] "Ní Murchú"
>
> An interesting issue here concerns the variations of individual name parts.
> In this example, Grace accepts "Gracie" as an informal version of her
> forename. However, the difference between Ann and Anne is more of a spelling
> error, either during recording or a subsequent lookup. I think this should
> be handled by the software unit, just as a soundex might. The same could
> apply to using a middle initial but that is a very Western convention.

My point about evidence and conclusions bears heavily on this.

Take, for instance a marriage in which Hannah Kaye (as spelled in the
register & indexes) married George Fawley. Examination of the register,
however, shows that she signed rather than made a mark and spelled her
surname Kay. So we have two /names/ in the document, "Hannah Kaye" and
"Hannah Kay" but clearly these both refer to the one historical
/person/. I don't, however, think that this is a trivial difference to
be smudged over by soundex. The predominant spelling hereabouts is
"Kaye". However there was one family in the community which used the
"Kay" spelling and seems to have been quite punctilious about it (it
originated in a Kay/Kaye marriage; I haven't been able to resolve the
groom's identity but wonder whether he may have adopted the alternative
spelling to gloss over a fairly close degree of cousinship). As both
this family and at least one of the Kayes had daughters called Hannah
it's important to recognise both spellings in /analysis/ of the
/evidence/ and, of course, make use of it in my /conclusion/ about this
particular ancestor.

What this means, of course, is that we require more than one "name"
entity. One is the name as found in the original and one which
identifies the historical reconstruction. The former doesn't
necessarily present us with any formal structure such as "Given name[s]
Surname". It may well be something along the lines of "John son of
Jonathan Goddard" in which only the father's name is given in the
expected formal structure. On the whole I'm in favour of expressing the
evidential name as a simple string as found and restricting the
structured form to the reconstruction. Apart from anything else this
gets round the fact that some PRs Latinised the descriptions so that the
spelling as father is systematically different from the name as a data
subject and maybe different again from that in everyday life, e.g.
"Guillielmus f Guillielmi", AKA "William". The form used in
reconstruction may be usefully extended to include some additional
epithet which doesn't necessarily have any historical use but serves to
de-duplicate for our purposes, e.g. "William Goddard IV of Upperthong".

Ian Goddard

Sep 30, 2011, 6:40:39 AM
Tony Proctor wrote:
> "Ian Goddard"<godd...@hotmail.co.uk> wrote in message
> news:9elhi9...@mid.individual.net...
>> Tony Proctor wrote:
>>> Goals
>>> Define a universal import/export format
>>> Flexibility. Store virtually anything without having to bend any rules
>>> Locale independence
>>> Potential use as a definitive backup-format or a load-format for
>>> databases
>>> Zero-loss when operating between different software units
>>
>> From what I've written elsewhere it should be no surprise that I want to
>> add another. It should distinguish between evidence and conclusions drawn
>> from that evidence.
>>
>> This distinction seems very obvious and of prime importance to me but
>> seems to pass others by so I guess I'll have to have another try:
>>
>> Evidence is real. If you, I, Cheryl, smart ol' Bob & anyone else view
>> make provision for ad hoc labels & unless the application has a set of
>> translation dictionaries it's just going to have to caption the data as
>> found. Even in the English-speaking world we're going to have to put up
>> with some records mentioning, to take one of Cheryl's examples, eye color
>> & others mentioning eye colour.
>>
>> --
>> Ian
>>
>> The Hotmail address is my spam-bin. Real mail address is iang
>> at austonley org uk
>
> The fact versus inference is an excellent point Ian. I agree totally.
>
> Regarding localisation of extended elements, I think any good software unit
> would provide configuration options to define appropriate labels, and any
> dataset that makes use of them should probably provide a default one. I'm
> not sure if you're implying the person/company defining the extensions
> should define the labels too or the person using the software unit should
> configure it appropriately. As long as that information is separate from the
> main dataset - thus keeping it locale independent - then I would be happy.

Putting it in XML terms you might have something like:

<Person>
<PersonName>......</PersonName>
....
<OtherItems>
<Item name=.... value=..../>
<Item name=..... value=.../>
</OtherItems>
</Person>

so that the expected structured stuff all has its place and it doesn't
really matter whether the user speaks English or not, if the application
is properly localised it will be able to present a localised caption for
the contents of the PersonalName element. But the only way I can see to
provide for "virtually anything without having to bend any rules" is
some mechanism such as the OtherItems element where the originating user
would be able to define named items on the fly.

The best stab that the application would be able to make at localising
these would be to extend the Item element to include the language, have
a dictionary and still from time to time make hilarious mistranslations.

Tony Proctor

Sep 30, 2011, 7:54:23 AM

"Ian Goddard" <godd...@hotmail.co.uk> wrote in message
news:9eljuk...@mid.individual.net...
I've read this a few times and it sounds like you're making a case for
storing other name-like references to a person which would not automatically
be used by a software unit during the pattern matching, i.e. something the
user would have access to and could potentially make use of. Would you agree
Ian?

Tony Proctor


Ian Goddard

Sep 30, 2011, 8:35:54 AM
Tony Proctor wrote:
> I've read this a few times and it sounds like you're making a case for
> storing other name-like references to a person which would not automatically
> be used by a software unit during the pattern matching, i.e. something the
> user would have access to and could potentially make use of. Would you agree
> Ian?

What I'm saying is that there are two distinctly different types of
entity which have names sensu lato as attributes.

One is what have been labeled as personae in previous discussions, they
fill roles identified in events described by original texts. Clearly
these relate to real historical people. The other are the historical
people who we reconstruct from the evidence and will appear in family
trees etc. The first are best dealt with as strings representing
exactly what's found in the original. The others are best dealt with as
properly structured, disambiguated names.

For instance a baptism might record "Johanes f Willemi Goddard" which
gives us two personae. One would be recorded in that way, filling
the role of subject; the other will be recorded as "Willelmi Goddard",
filling the role of subject's father. The second entity will, for the
first persona, record something along the lines of

<PersonalName type="Modern English">
<GivenNames>John</GivenNames>
<Surname>Goddard</Surname>
</PersonalName>

and a corresponding element for the second. Clearly the John entity
will have a link to the Johanes persona, but also links to the personae
which might use the form "Johanis" where John's children are baptised.

In fact, that's only for people. You could say the same thing about places.

This is a distinction which Gedcom fails to make. Sure, the additional
information could be plonked in a note entity but I think it's far more
significant than that. Both entities are distinct parts of the data.

Tony Proctor

Sep 30, 2011, 9:31:57 AM

"Ian Goddard" <godd...@hotmail.co.uk> wrote in message
news:9elrda...@mid.individual.net...
OK, I see now Ian. That's a very subtle point that I think a lot of people
might miss - including myself. Very interesting point though.

We're implicitly heading towards something for professional usage but I
think that's the way it should be. A lot of hobbyists will eventually hit
brick walls with their popular desktop tools as their experience grows.
There's a lot more to this than being able to draw a pedigree chart that
matches your wallpaper. ;-)

Tony Proctor



Tony Proctor

Sep 30, 2011, 9:58:30 AM

"Tony Proctor" <tony@proctor_NoMore_SPAM.net> wrote in message
news:j5vp67$t2b$1...@reader01.news.esat.net...
> Gráinne [Ann] "Ní Murchú"
>
> An interesting issue here concerns the variations of individual name
> parts. In this example, Grace accepts "Gracie" as an informal version of
> her forename. However, the difference between Ann and Anne is more of a
> spelling error, either during recording or a subsequent lookup. I think
> this should be handled by the software unit, just as a soundex might. The
> same could apply to using a middle initial but that is a very Western
> convention.
>
>
> Dates
> The general representation of dates is mentioned above. When a date is
> referenced in an element, it should have a margin of error associated with
> it. This could be a +/- representation such as a day, a month, or a year,
> etc., or a more explicit min/max representation.
>
> When deterministic dates, such as our normal Gregorian ones, are loaded
> into some type of indexing system like a database, it is expected that
> they can all be stored as a pair of internal 'timestamp' values. Since
> these would be a binary long-integer representations it would mean issues
> of date format, uncertainty, TZ, etc., all become irrelevant and they can
> be handled efficiently in the same manner.
>
> Dates expressed in different Calendars are a bit more challenging and I'll
> skip that for now :-)
>
>
> Tony Proctor
>

Any suggestions for handling place names?

My post suggested that persons and places should have a similar structure,
including a 'key' by which they can be referenced. However, although I
considered using the same list of pattern 'sequences' for place-names as for
person-names, I'm less than convinced now.

There is a similar goal in being able to give each unique person or place
just a single entity in the stored data - no duplicates. Most products use a
sequence of name-parts for a place name that begins with the most local
(e.g. a street address) and continues to the most global (e.g. a county or
country).

There is a similarity in there being a sequence of name-parts, and we could
use a list of sequences to define them (e.g. Sunderland, {Durham,"Co.
Durham"}). However, we would lose something if we took advantage of it: the
hierarchical nature of places.

If you were viewing data on Sunderland, for instance, then it would be
convenient to be able to move up a level and look generally at Durham, e.g. a
county map.

In effect, each place has a type of parentage but I'm not sure how far to
take the analogy.
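
As a sketch of the analogy (names invented):

<Place Name="Sunderland1">
<PlaceName>Sunderland</PlaceName>
<ParentPlace Ref="Durham1"/>
</Place>
<Place Name="Durham1">
<PlaceName>Durham</PlaceName>
<PlaceName>Co. Durham</PlaceName>
</Place>

The 'move up a level' navigation would then fall straight out of the
<ParentPlace> links.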

Tony Proctor


NigelBufton

Sep 30, 2011, 10:34:46 AM
Tony Proctor used his keyboard to write :
> Gráinne [Ann] "Ní Murchú"
>
> An interesting issue here concerns the variations of individual name parts.
> In this example, Grace accepts "Gracie" as an informal version of her
> forename. However, the difference between Ann and Anne is more of a spelling
> error, either during recording or a subsequent lookup. I think this should be
> handled by the software unit, just as a soundex might. The same could apply
> to using a middle initial but that is a very Western convention.
>
>
> Dates
> The general representation of dates is mentioned above. When a date is
> referenced in an element, it should have a margin of error associated with
> it. This could be a +/- representation such as a day, a month, or a year,
> etc., or a more explicit min/max representation.
>
> When deterministic dates, such as our normal Gregorian ones, are loaded into
> some type of indexing system like a database, it is expected that they can
> all be stored as a pair of internal 'timestamp' values. Since these would be
> a binary long-integer representations it would mean issues of date format,
> uncertainty, TZ, etc., all become irrelevant and they can be handled
> efficiently in the same manner.
>
> Dates expressed in different Calendars are a bit more challenging and I'll
> skip that for now :-)
>
>
> Tony Proctor

Having followed this thread for a few days, I have seen very little that is not catered
for in the existing GEDCOM 5.5 standard (or the 5.5.1 proposal).

The issue, as always, is not so much an issue with the current standard, but the fact
that very few programs (I am aware of only two) adhere to the standard that exists.

Therefore, discussions of what might be a better standard would seem moot until
the creators of most products take the time to read the existing standard (which
they have had well over a decade to do).

The worst examples are those that create custom tags to do the job of a perfectly
appropriate standard construct, and the very worst are those that deny that
something is supported - for example Family Tree Maker's refusal to export
links to media files, with a help file that says that it is because GEDCOM does not
support it!

Nigel Bufton


Tony Proctor

Sep 30, 2011, 11:38:15 AM

"NigelBufton" <ni...@bufton.org> wrote in message
news:j64k26$2pg$1...@dont-email.me...
I have to disagree with you there Nigel.

For a start, GEDCOM is not standards based, even to the point of inventing
its own character set. The post I made cited several different modern
standards, including ISO 8601 which Gedcom 5.5.1 still doesn't address.

The method of extending elements proposed here uses proper namespaces. If
XML were used, for instance, then it would have a standard method of
defining the schema, a way of automated validation using schema files,
and a way of defining new elements through a standard namespace with a
unique URI identifying it.

GEDCOM might be fine for simple pedigrees but it leaves no room for the
general narrative which forms a good part of family history. The proposal
here not only allows such narrative to be associated with people and places
but allows it to be qualified according to Surety, Sensitivity, Source, and
(taking Ian's input) fact/inference. It also allows nesting of elements to
provide a way of putting hyperlinks in the presentation of the narrative in
a viewer.

I'm sorry but I see very little common ground at all.

Tony Proctor


Ian Goddard

Sep 30, 2011, 12:15:39 PM
> OK, I see now Ian. That's a very subtle point that I think a lot of people
> might miss - including myself. Very interesting point though.

As I wrote before, I spent half my working life in science in two
disciplines involved in investigation of the past knowing my reports
might be scrutinised quite closely so this is a distinction which is
burned into my thinking. And the other half was spent in IT, mostly
involving RDBMSs & more latterly XML....

> We're implicitly heading towards something for professional usage but I
> think that's the way it should be. A lot of hobbyists will eventually hit
> brick walls with their popular desktop tools as their experience grows.

Indeed. As one works on those brick walls one starts to find a lot in
common with the material local historians use. For instance I don't
think it's a coincidence that a couple of new surnames appear in what
was largely a strongly parliamentarian area just after the Civil War &
the Restoration from a few miles away where the Lords of the Manor were
from a dynasty which was & still is notably RC.

> There's a lot more to this than being able to draw a pedigree chart that
> matches your wallpaper. ;-)

Oh damn! Forgot to put that into the statement of requirements.

Tony Proctor

Sep 30, 2011, 6:00:22 PM

"Tony Proctor" <tony@proctor_NoMore_SPAM.net> wrote in message
news:j643n7$j15$1...@reader01.news.esat.net...
I didn't expand on the structure of the elements I'd proposed. However, if
XML were going to be used then their design would have to follow best
practices to ensure that a schema-based validation was possible. I've fallen
into the trap before of defining XML in a way that feels natural, and then
finding that it cannot adequately be described in a schema definition.

For the 'Notes', how about something along the lines of <Notes> including a
sequence of <Node> elements, each of which is a "mixed" element with both
narrative text and embedded reference-type elements, e.g.

<Notes>
<Note>
100% sure fact with public sensitivity
</Note>
<Note Surety=80% Fact=0>
80% sure inference
</Note>
</Notes>

The embedded reference-type nodes should have a different element name to
that in the definition of the thing being referenced - another trap it's
easy to fall into. For instance, <PersonRef> instead of <Person>, and
<PlaceRef> instead of <Place>.

If it's done properly then a viewing tool could present the sections of
narrative in clearly different ways. In fact, a relatively simple XSLT (if
such a thing exists) could generate HTML directly from it.
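
A minimal sketch of such a transform - assuming the <Notes>/<Note>
structure above, and a Name attribute on <PersonRef> as in my earlier
examples - might be:

<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<!-- wrap the notes in a skeleton HTML page -->
<xsl:template match="Notes">
  <html><body><xsl:apply-templates/></body></html>
</xsl:template>
<!-- each note becomes a paragraph -->
<xsl:template match="Note">
  <p><xsl:apply-templates/></p>
</xsl:template>
<!-- turn an embedded reference into a hyperlink -->
<xsl:template match="PersonRef">
  <a href="#{@Name}"><xsl:value-of select="@Name"/></a>
</xsl:template>
</xsl:stylesheet>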

Tony Proctor



Peter J. Seymour

Oct 1, 2011, 4:52:34 AM
On 2011-09-30 15:34, NigelBufton wrote:
>
> Having followed this thread for a few days, I have seen very little that is not catered
> for in the existing GEDCOM 5.5 standard (or the 5.5.1 proposal).
>
> The issue, as always, is not so much an issue with the current standard, but the fact
> that very few programs (I am aware of only two) adhere to the standard that exists.
>
> Therefore, discussions of what might be a better standard would seem moot until
> the creators of most products take the time to read the existing standard (which
> they have had well over a decade to do).
>
> The worst examples are those that create custom tags to do the job of a perfectly
> appropriate standard construct, and the very worst are those that deny that
> something is supported - for example Famaliy Tree Manager's refusal to export
> links to media files with a help file that says that it is become GEDCOM does not
> support it!
>
> Nigel Bufton
>
>
As has been pointed out many times before, the main problem is that it
does not seem to be in the vested interests of software producers to
follow the standard. This suggests that whatever standard there is, it
will only be followed to varying extents. Gedcom itself works very well
within a defined context.
As I have suggested previously, a way of dealing with the lack of
adherence to standards is to have a "universal" gedcom reader. You can
then massage the data into whatever form you want.

Peter

Peter J. Seymour

Oct 1, 2011, 5:04:15 AM
On 2011-09-30 16:38, Tony Proctor wrote:
>
> I have to disagree with you there Nigel.
>
> For a start, GEDCOM is not standards based, even to the point of inventing
> its own character set.

You are perhaps getting a bit too argumentative here. Gedcom is its own
standard. Presumably regarding character sets, you are referring to
ANSEL. Are you claiming that ANSEL is not a standard?

.....

> The post I made cited several different modern
> standards, including ISO 8601 which Gedcom 5.5.1 still doesn't address.
......

And presumably never will, but that doesn't matter. It has a defined
date format and it converts readily to ISO 8601 or whatever.

Peter

Tony Proctor

Oct 1, 2011, 5:35:54 AM

"Peter J. Seymour" <Newsg...@pjsey.demon.co.uk> wrote in message
news:rIAhq.4$Mi...@newsfe18.ams2...
OK, I probably misrepresented the character set. Although ANSEL has an ANSI
designation, there isn't a lot that uses it, is there? Plus I question the
validity of using it as a computer exchange format in the first place. It is
no surprise that software such as FTM doesn't acknowledge it Peter.

Regarding standards in general, GEDCOM is an "isolated standard". Good
standards are built on other standards and the point I was trying to make is
that there is nothing in the definition that acknowledges modern standards.

Perhaps the main thrust of my original post, though, was not so much a
standards one as the applicability to family history in general as opposed
to simple pedigrees and discrete properties. Much of my own research
contains narrative and I have no option but to store it separately in
Word/pdf/etc documents. It then becomes sidelined and wouldn't get used by a
desktop tool.

Tony Proctor


Ian Goddard

Oct 1, 2011, 5:40:26 AM
Ian Goddard wrote:
>
> Putting it in XML terms you might have something like:
>
> <Person>
> <PersonName>......</PersonName>
> ....
> <OtherItems>
> <Item name=.... value=..../>
> <Item name=..... value=.../>
> </OtherItems>
> </Person>

An additional point I should have mentioned is that if some data items
seem to be introduced fairly regularly by this means they could be given
their own elements in subsequent revisions.

Tony Proctor

Oct 1, 2011, 5:45:55 AM

"Peter J. Seymour" <Newsg...@pjsey.demon.co.uk> wrote in message
news:vxAhq.140$h45...@newsfe22.ams2...
It's true that adoption is probably governed more by prevailing usage than
standards in cases like this Peter. I wouldn't expect software vendors to
use a new format simply because it was standards based.

However, there are many advantages to a major redesign, including things I
mentioned like applicability to narrative history, globalisation, automated
validation of file structure, zero-loss exchange, acknowledged IT methods
for registering URI-schemes/namespaces/schema-revisions etc.

No one seemed to be putting that first foot forward and suggesting what
might be done. Fool that I am, that's what I volunteered to do ;-)

Tony Proctor


Ian Goddard

Oct 1, 2011, 5:51:06 AM
Tony Proctor wrote:
>
> I didn't expand on the structure of the elements I'd proposed. However, if
> XML were going to be used then their design would have to follow best
> practices to ensure that a schema-based validation was possible.

Agreed. One of the advantages of XML is that validation against a
schema makes it possible to reject a document outright even if only one
small part of it fails. That's what will keep the unofficial variations
out.

Because schema references are in the form of URLs an application could
keep abreast of the latest schemas even if it wasn't able to use newly
defined elements. This would, of course, enable a company to define its
own extended schema but unless it published it on the web it would be
automatically failed. And a program would be able to check that the
schema came from the official site and reject it if it didn't.
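
For example (the URLs here are invented), an instance document might open
with:

<GenData xmlns="http://example.org/gendata/v1"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://example.org/gendata/v1
    http://example.org/gendata/v1/gendata.xsd">

and a validating parser would fetch the schema from that location,
rejecting the document if any part of it failed to conform.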

Ian Goddard

Oct 1, 2011, 9:24:30 AM
Tony Proctor wrote:
> Lineage
> Formats like XML provide an automatic way of depicting a top-down
> hierarchical relationship. Unfortunately, genealogical lineage is really a
> 'network' rather than a pure 'hierarchy'. In effect, a simple nesting of
> "offspring" under their associated "parents" is insufficient.

Whilst XML does offer this alternative you don't have to use it. It's
quite acceptable to have something along the lines of:

<Wrapper>
<Person>...</Person>
<Person>...</Person>
<Family>...</Family>
<Place>...</Place>
</Wrapper>

In fact Gramps uses something along these lines.

> There's also a problem with a top-down approach unless a specific union
> between two people has a single representation in the data, but that then
> causes further problems with the nature and the lifetime of that union. For
> instance, if the father and mother have separate representations in the
> data, and they each have links to their associated common offspring, then it
> makes it difficult to bring the information together to identify family
> units, and also to ensure that there exist two links to each offspring.
>
> I believe it's easier to use a bottom-up representation. Each person has
> just one progenitive father and one progenitive mother and so can have
> upward links to their appropriate parents (where known). For instance,
> <Father> and <Mother> elements. This also makes it easy to have other types
> of parentage including <Guardian>, <FosterMother>, <AdoptedMother>, etc.

Taking an OO approach to design I'd start off with a very broad concept
such as Association which could then have a Family subclass, a
Guardianship subclass etc. You'd then have links of various types to
associate the individuals with the association and their role in the
association - father, mother, child and a set of rules - 0 or 1 father,
0 or 1 mother, 0 to many children.

However, it would also be possible to add further subclasses with their
own rule sets for things like BusinessPartnership to describe the family
business. It's an extensible approach.
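
In the stored data that might surface as something like (names invented):

<Association type="Family" ID="....">
  <Member role="father" ref="..."/>
  <Member role="mother" ref="..."/>
  <Member role="child" ref="..."/>
</Association>

with the cardinality rules for each subclass/type enforced by its schema.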

singhals

Oct 1, 2011, 10:03:56 AM
Tony Proctor wrote:

> OK, I probably misrepresented the character set. Although ANSEL has an ANSI
> designation, there isn't a lot that uses it is there. Plus I question the
> validity of using it as a computer exchange format in the first place. It is
> no surprise that software such as FTM doesn't acknowledge it Peter.

Ummm, I wouldn't /exactly/ call FTM the best choice as an
example of what's best in computer programs.

It's certainly widely popular and widely available, but it
does have its little eccentricities which can drive you
crazy. So, the fact that it does or doesn't do P Q or R
doesn't mean other programs do or don't.

Cheryl

Peter J. Seymour

Oct 1, 2011, 11:28:10 AM
On 2011-10-01 10:35, Tony Proctor wrote:
> "Peter J. Seymour"<Newsg...@pjsey.demon.co.uk> wrote in message
> news:rIAhq.4$Mi...@newsfe18.ams2...
>> On 2011-09-30 16:38, Tony Proctor wrote:
>>>
>>> I have to disagree with you there Nigel.
>>>
>>> For a start, GEDCOM is not standards based, even to the point of
>>> inventing
>>> its own character set.
>>
>> You are perhaps getting a bit too argumentative here. Gedcom is its own
>> standard. Presumably regarding character sets, you are referring to ANSEL.
>> Are you claiming that ANSEL is not a standard?
>>
>> .....
>>
>> The post I made cited several different modern
>>> standards, including ISO 8601 which Gedcom 5.5.1 still doesn't address.
>> ......
>>
>> And presumably never will, but that doesn't matter. It has a defined date
>> format and it converts readily to ISO 8601 or whatever.
>>
>> Peter
>
> OK, I probably misrepresented the character set. Although ANSEL has an ANSI
> designation, there isn't a lot that uses it is there. Plus I question the
> validity of using it as a computer exchange format in the first place. It is
> no surprise that software such as FTM doesn't acknowledge it Peter.

As I understand it, ANSEL originated in the early days of computing as a
standard for American library computer systems and focussed on
accommodating the character sets of certain languages. It was
effectively obsoleted by UTF-8. Another example of Gedcom showing its age.

.....


> Much of my own research
> contains narrative and I have no option but to store it separately in
> Word/pdf/etc documents. It then becomes sidelined and wouldn't get used by a
> desktop tool.
>
.....

I sympathise. I have a similar problem. I intend to improve the text
handling facilities in Gendatam Suite but I have had other priorities.

Peter

Tony Proctor

Oct 1, 2011, 12:38:28 PM

"Ian Goddard" <godd...@hotmail.co.uk> wrote in message
news:9eoike...@mid.individual.net...
Sounds like a reasonable approach Ian. I've tried to steer clear of any OO
aspects in this discussion because the storage format has some specific
requirements of its own.

When I gave this stuff a lot more thought - a couple of years ago - I
separated 'storage format' (i.e. interchange, import/export, backup, or load
format) from the run-time 'object model', and from the 'indexed storage'
(e.g. a database). Stuff I'd read before then never gave a clear cut
distinction between these and what requirements they might each have
separately. For instance:

Storage format. This is a definitive storage format - as discussed
throughout this thread - and not the indexed format. Giving this a standard
would allow import/export and other types of exchange without having to
mandate a particular database format. Similar in concept to GEDCOM but a lot
more far-reaching.

Indexed storage. All or part of the storage format could be loaded into a
database. That might be a standard relational one or a proprietary one. It
doesn't really matter since it's the choice of the designer of the software
unit. If they felt SQL databases were hamstrung then they might invent
another one, although it would be a mammoth task in itself. This is how
multi-dimensional OLAP databases came about - something I was heavily
involved in once upon a time.

Object model. This is the run-time object model, used in memory and
communications. This is where the OO aspect comes in. I believe there should
be a standard object model for run-time interoperability. This is a step
beyond offline import/exchange. It would allow live co-operation between
software units holding separate trees - whether on the same computer or not,
and irrespective of whether they were from the same vendor or not - and
allow comparison, merging, etc. This is probably a pipedream but I can foresee
family history being published "in the cloud" and your software being able
to connect to it and access parts of it in a controlled way. This is way
different to viewing someone's published pedigree on Ancestry, or where
ever, using a thin-client interface.


...I may go back to this when I finally kick the paid job :-)

Tony Proctor


Tony Proctor

Oct 1, 2011, 12:43:52 PM

"Tony Proctor" <tony@proctor_NoMore_SPAM.net> wrote in message
news:j643n7$j15$1...@reader01.news.esat.net...
>
I went back through some of my research, Ian, to check whether I'd
distinguished facts from inference. Of course I knew which was which but I
hadn't made it explicit. Bad boy Tony!

It was useful because I also found a third variation: conjecture. Whereas
inference could be linked to a logical analysis of available facts, and was
merely awaiting some level of substantiation, conjecture is really a
poorly-disguised alternative term for 'guess'.

Starting to sound like something that could be generalised.

Tony Proctor


Ian Goddard

Oct 1, 2011, 3:21:38 PM
An interesting experience.

I just paid a visit to the BetterGEDCOM project. They seem to have
mired themselves in a waterfall approach and have spent weeks if not
months wrangling a WHAT. What WHAT are they wrangling? Source type,
something which can be simply identified as a piece of data. All they
need is to concentrate on HOW. I left them an example:

<wrapper>
  <Source type="archive" ID="660f78b6-ec5a-11e0-b261-001636e96075">
    <ParentID/>
    <SourceName>Yorkshire Archaeological Society Archive</SourceName>
    <ShortName>Yorks Arch Soc Archive</ShortName>
    <BriefName>YAS Archive</BriefName>
    <AdHoc>
      <Item Name="Address" Value="Claremont"/>
      <!-- Add as many Items as required -->
    </AdHoc>
  </Source>
  <Source type="collection" ID="2a2ffc84-ec5b-11e0-a2f8-001636e96075">
    <ParentID>660f78b6-ec5a-11e0-b261-001636e96075</ParentID>
    <SourceName>H. L. Bradfer-Lawrence Collection</SourceName>
    <ShortName>Bradfer-Lawrence Collctn</ShortName>
  </Source>
  <Source type="collection" ID="9b4a55ea-ec5b-11e0-a42e-001636e96075">
    <ParentID>2a2ffc84-ec5b-11e0-a2f8-001636e96075</ParentID>
    <SourceName>Millar Collection</SourceName>
  </Source>
  <Evidence ID="0a77ffee-ec5c-11e0-b798-001636e96075">
    <ParentID>9b4a55ea-ec5b-11e0-a42e-001636e96075</ParentID>
    <EvidenceName>Gift with warranty MD335/5/108</EvidenceName>
    <Date>13th century</Date>
    <References>
      <Reference source="2a2ffc84-ec5b-11e0-a2f8-001636e96075">MD335/5/108</Reference>
      <Reference source="9b4a55ea-ec5b-11e0-a42e-001636e96075">Box 64 Millar 108</Reference>
    </References>
    <EvidentialObject mimeType="text/plain">
Gift with warranty MD335/5/108 [13th century]

Contents:
1. William de Fonte of Hennesale 2. Michael son of John de
Heck William has given to Michael one toft in the vill of Hennesale
(description given). To hold to Michael, rendering yearly to William 6
d. for all services. Witnesses: William son of Thomas de Povlington,
John de Heck, Henry de Goudale, Hugh his brother, John son of Adam de
Wittelay, William son of Adam of the same, William son of Mabel de
Snaith, Gamel son of Richard of the same, Ylard clerk of the same,
Thomas son of Godard de Mora. Bag for seal. Former number, in pencil
'202' [Former ref: Box 64 Millar 108]
    </EvidentialObject>
  </Evidence>
</wrapper>

and wondered if they'll produce anything usable before I join all the
links between me and Godard, father of Thomas.

Wes Groleau

Oct 2, 2011, 12:03:33 AM
On 09-28-2011 14:37, Tony Proctor wrote:
> Data values should be in a locale-neutral format, as with the source code
> for programming languages. For this reason, it is sometimes called using a
> 'programming locale'. This effectively means using a period in all decimal
> numbers (not a comma), ISO 8601 format for (Gregorian-)dates (e.g.
> yyyy-mm-dd), and unlocalised true/false or just 1/0 for booleans (e.g. for
> option selections).

For dates, I personally prefer GEDCOM's approach of allowing multiple
calendars, as long as the one being used is identified. ISO8601 (are
those the right digits? doesn't look right) isn't locale-neutral any
more than dd mmm yyyy is. Both are different locales, and it doesn't
matter which you use as long as which is clearly identified.

GEDCOM's flaws are not syntactic nor lexical, they are semantic.

XML does have the advantage of a wider selection of tools to work
with it, but if you put GEDCOM's data model into an XML syntax,
you haven't really accomplished anything.

Syntactically, one advantage of GEDCOM is that in a pinch,
humans can read it much more easily. In fact, for several
years, my database was a GEDCOM file, and my genealogy program
was Apple's TextEdit (similar to WordPad).

--
Wes Groleau

There are two types of people in the world …
http://Ideas.Lang-Learn.us/barrett?itemid=1157

Wes Groleau

Oct 2, 2011, 12:20:46 AM
On 09-28-2011 14:37, Tony Proctor wrote:
> [a] Define a universal import/export format
> [b] Flexibility. Store virtually anything without having to bend any rules
> [c] Locale independence
> [d] Potential use as a definitive backup-format or a load-format for databases
> [e] Zero-loss when operating between different software units

[a] & [e] - Vendors won't comply with GEDCOM. Who's going to
make them comply with anything else?

[b] Impossible--though it ought to be possible to get closer
to this than GEDCOM does.

[c] Unnecessary and impossible. Whatever format is used _is_
either a new locale or a pre-existing one. The important thing
is that the format be defined. GEDCOM at least does that.

[d] GEDCOM can do this, too, except where it can't. :-) That's just
as much a matter of the DB being incompatible with GEDCOM as
it is GEDCOM being incompatible with the DB. The same thing
can happen with any other format.

It's not that I want to defend GEDCOM--there's a lot wrong with it.
But some of the proposals I've seen reappear from time to time are
fixing things that aren't broken while preserving what IS broken.

And the biggest problem of all is adoption.

Wes Groleau

Oct 2, 2011, 12:55:09 AM
On 09-28-2011 14:37, Tony Proctor wrote:
> Some thoughts on a better textual import/export format for genealogical use.

Most of these give the impression of being presented as something better
than GEDCOM when in fact they are things GEDCOM already supports.

But you have an important exception:
> I believe it's easier to use a bottom-up representation. Each person has
> just one progenitive father and one progenitive mother and so can have
> upward links to their appropriate parents (where known). For instance,
> <Father> and <Mother> elements. This also makes it easy to have other types
> of parentage including <Guardian>, <FosterMother>, <AdoptedMother>, etc.

I have long wished for almost this. Only, change "parentage" to
"relationship." Sibling, Uncle, Godfather, Mistress, Teacher, .....

Originally, GEDCOM said nobody is related to anybody, instead we're
all related to families, and there are only three relationships: HUSB,
WIFE, CHIL. Eventually, they recognized that there _are_ other
relationships, so they invented ASSO [1]. So now, there are two
classes of relationships, and the "main" ones are still required
to be indirect.

Let people be directly related to other people, and let them be put
in all sorts of groups, not merely one kind, i.e., the traditional family.

[1] Which I am unable to look at without imagining giving the person
a derogatory anatomical title. You have a husband, a wife, and some
children. Everyone else is an ASSO. :-)

Wes Groleau

Oct 2, 2011, 1:00:12 AM
On 10-01-2011 05:51, Ian Goddard wrote:
> Agreed. One of the advantages of XML is that validation against a
> schema makes it possible to reject a document outright even if only one
> small part of it fails. That's what will keep the unofficial variations
> out.

I doubt it. Having an "official schema" doesn't stop Microsoft from
changing things. They just create their own schema and pretend everyone
else is non-standard. (And they're just an example--others
do it, too.)

Wes Groleau

Oct 2, 2011, 1:06:39 AM
PLEASE! For a discussion of this complexity,
it would be very helpful to see

> point K

Response to point K

(instead of)

> point A
> point B
> .....
> point ZY
> point ZZ

Response to a point that is up there somewhere

Wes Groleau

Oct 2, 2011, 1:14:33 AM
On 09-30-2011 11:38, Tony Proctor wrote:
> GEDCOM might be fine for simple pedigrees but it leaves no room for the
> general narrative which forms a good part of family history. The proposal

Of course it does.

> here not only allows such narrative to be associated with people and places
> but allows it to be qualified according to Surety, Sensitivity, Source, and

Surety: from zero to 100% is just as subjective as quality (QUAY) from
one to four. Perhaps QUAY is less so, since some attempt was made to
define what each of the four levels meant.

Sensitivity: restriction is _already_ in the GEDCOM spec. You are
proposing a refinement, not something totally new.

Source: Surely you are not unaware of this in GEDCOM?

> (taking Ian's input) fact/inference. It also allows nesting of elements to

Distinguishing content of documents from conclusions in a structured
manner is definitely something GEDCOM lacks.

Wes Groleau

Oct 2, 2011, 1:18:30 AM
On 10-01-2011 11:28, Peter J. Seymour wrote:
> As I understand it, ANSEL originated in the early days of computing as a
> standard for American library computer systems and focussed on
> accommodating the character sets of certain languages. It was
> effectively obsoleted by UTF-8. Another example of Gedcom showing its age.

And indeed, ANSEL handled European languages better than modern
"eight-bit" codes. But it doesn't handle scripts with non-Latin
characters very well.

Wes Groleau

unread,
Oct 2, 2011, 1:23:48 AM10/2/11
to
On 10-01-2011 09:24, Ian Goddard wrote:
> Taking an OO approach to design I'd start off with a very broad concept
> such as Association which could then have a Family subclass, a
> Guardianship subclass etc. You'd then have links of various types to
> associate the individuals with the association and their role in the
> association - father, mother, child and a set of rules - 0 or 1 father,
> 0 or 1 mother, 0 to many children.

I'd rather have a wide-variety of _relationships_ from one person to
another [1] directly [2], and a wide variety of types of groups that
might contain people in various _roles_

[1] Perhaps they might allow one-to-many, i.e., a list, instead of
one to one.

[2] Instead of GEDCOM's indirect INDI->FAM->INDI model.

NigelBufton

unread,
Oct 2, 2011, 3:01:34 AM10/2/11
to
Tony Proctor pretended :
GEDCOM 5.5 provides for ANSEL, UNICODE and ASCII. UNICODE was added in
GEDCOM 5.3; before that ANSEL had to be used for non-ASCII characters.

Nigel Bufton


NigelBufton

unread,
Oct 2, 2011, 3:09:52 AM10/2/11
to
It happens that Ian Goddard formulated :
GEDCOM does this via the ASSO tag in the INDI record. Programs that
are
compliant use this construct to do exactly that:
1 ASSO @I1234@
2 TYPE INDI
2 RELA Godfather
or
1 ASSO @F789@
2 TYPE FAM
2 RELA Witness at marriage

Nigel Bufton


NigelBufton

unread,
Oct 2, 2011, 3:31:38 AM10/2/11
to
Wes Groleau has brought this to us :
> On 10-01-2011 09:24, Ian Goddard wrote:
>> Taking an OO approach to design I'd start off with a very broad concept
>> such as Association which could then have a Family subclass, a
>> Guardianship subclass etc. You'd then have links of various types to
>> associate the individuals with the association and their role in the
>> association - father, mother, child and a set of rules - 0 or 1 father,
>> 0 or 1 mother, 0 to many children.
>
> I'd rather have a wide-variety of _relationships_ from one person to
> another [1] directly [2], and a wide variety of types of groups that
> might contain people in various _roles_
>
> [1] Perhaps they might allow one-to-many, i.e., a list, instead of
> one to one.
>
> [2] Instead of GEDCOM's indirect INDI->FAM->INDI model.

Although there is no Guardianship method, GEDCOM does provide for
adoption and fostering (and LDS sealing):
0 @I1@ INDI
1 FAMC @F1@
1 FAMC @F2@
2 PEDI adopted

The adoptive relationship can be further specified by the ADOP event:
1 ADOP
2 FAMC @F2@
3 ADOP HUSB

Admittedly we have two different sub-structures relating @I1@ to @F2@
which could lead to lack of integrity if a program did not manage the
situation according to the standard. However, GEDCOM is a communication
format, so programs should ensure that they communicate data in a state
of integrity.

Nigel Bufton


Tony Proctor

unread,
Oct 2, 2011, 6:01:17 AM10/2/11
to

"Wes Groleau" <Grolea...@FreeShell.org> wrote in message
news:j68nqm$qbg$1...@dont-email.me...
> There are two types of people in the world …
> http://Ideas.Lang-Learn.us/barrett?itemid=1157

It's definitely ISO 8601 Wes. See http://en.wikipedia.org/wiki/ISO_8601

I use this a lot with work. It was purposely defined for situations like
this. The ordering of elements is part of the standard rather than the
current locale. Also, the all-numeric format (yyyy-mm-dd) doesn't contain
any localised names such as Jan, January, etc

Tony Proctor


Tony Proctor

unread,
Oct 2, 2011, 6:04:16 AM10/2/11
to

"Wes Groleau" <Grolea...@FreeShell.org> wrote in message
news:j68oqu$u3f$1...@dont-email.me...
> There are two types of people in the world …
> http://Ideas.Lang-Learn.us/barrett?itemid=1157

I disagree strongly with your assessment of (c) Wes. It not only is possible
but it is being done all the time by (good-)XML designers, and there are
standards supporting it. Anyone putting locale-dependent data in public XML
content has sort of missed the point.

Tony Proctor


Tony Proctor

unread,
Oct 2, 2011, 6:05:36 AM10/2/11
to

"Wes Groleau" <Grolea...@FreeShell.org> wrote in message
news:j68qre$76c$1...@dont-email.me...
> There are two types of people in the world …
> http://Ideas.Lang-Learn.us/barrett?itemid=1157

Agreed Wes. I was using "parentage" more in the graphical sense than the
biological one. Apologies for that

Tony Proctor


Tony Proctor

unread,
Oct 2, 2011, 6:09:20 AM10/2/11
to

"Wes Groleau" <Grolea...@FreeShell.org> wrote in message
news:j68rh0$9sb$1...@dont-email.me...
> PLEASE! For a discussion of this complexity,
> it would be very helpful to see
>
> > point K
>
> Response to point K
>
> (instead of)
>
> > point A
> > point B
> > .....
> > point ZY
> > point ZZ
>
> Response to a point that is up there somewhere
>
> --
> Wes Groleau
>
> There are two types of people in the world …
> http://Ideas.Lang-Learn.us/barrett?itemid=1157

Lost me on this one I'm afraid...

Tony Proctor


Ian Goddard

unread,
Oct 2, 2011, 9:54:09 AM10/2/11
to
Wes Groleau wrote:
> On 10-01-2011 09:24, Ian Goddard wrote:
>> Taking an OO approach to design I'd start off with a very broad concept
>> such as Association which could then have a Family subclass, a
>> Guardianship subclass etc. You'd then have links of various types to
>> associate the individuals with the association and their role in the
>> association - father, mother, child and a set of rules - 0 or 1 father,
>> 0 or 1 mother, 0 to many children.
>
> I'd rather have a wide-variety of _relationships_ from one person to
> another [1] directly [2], and a wide variety of types of groups that
> might contain people in various _roles_
>
> [1] Perhaps they might allow one-to-many, i.e., a list, instead of
> one to one.
>
> [2] Instead of GEDCOM's indirect INDI->FAM->INDI model.
>

I think there are three possible models:

1. Direct - Ind to Ind.

2. Indirect - Ind to Assoc to Ind.

3. Double indirect - Ind to Link to Assoc to Link to Ind

I prefer the latter.

Say I send you a file which shows Young Fred as son of Old Fred (1st
form) or as a member of Old Fred's family (2nd form). This requires the
relationship to be expressed as some sort of pointer in either Young
Fred, Old Fred or the Family entity depending on the model and maybe in
more than one entity if we decide pointers have to be reciprocal. I
then realise that there was a different Old Fred so I update my data to
reflect this. We now have two versions of at least one entity
floating about, the original of which you have a copy and my corrected
one. This is not a good situation.

If we use the double indirect version none of these entities need to be
changed. All the potentially labile information is contained in the
Link entities and we can then be free about changing our minds. If, in
the example, you're of the view from the off that Young Fred is actually
the son of Old Bill you can simply discard my link and substitute your
own without changing any of the core entities.
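
A sketch of the double indirect form (names and IDs invented for
illustration). The Person and Association entities stay stable; only the
small Link entities ever need replacing:

<Person ID="P1"><Name>Old Fred</Name></Person>
<Person ID="P2"><Name>Young Fred</Name></Person>
<Association ID="A1" Type="Family"/>
<!-- discard and re-create these if we change our minds -->
<Link Person="P1" Association="A1" Role="Father"/>
<Link Person="P2" Association="A1" Role="Child"/>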

Tony Proctor

unread,
Oct 2, 2011, 10:37:43 AM10/2/11
to

"Tony Proctor" <tony@proctor_NoMore_SPAM.net> wrote in message
news:j67fkb$7bs$1...@reader01.news.esat.net...
Sorry, ignore that. A conjecture, of course, is simply an inference on very
limited evidence. Hence, it is already catered for using the <Surety>
attribute :-)

Tony Proctor


Ian Goddard

unread,
Oct 2, 2011, 12:41:50 PM10/2/11
to
Wes Groleau wrote:
> On 10-01-2011 05:51, Ian Goddard wrote:
>> Agreed. One of the advantages of XML is that validation against a
>> schema makes it possible to reject a document outright even if only one
>> small part of it fails. That's what will keep the unofficial variations
>> out.
>
> I doubt it. Having an "official schema" doesn't stop Microsoft from
> changing things. They just create their own schema and pretend everyone
> else is non-standard. (And they're just an example--others
> do it, too.)
>

I suppose one determining factor is whether they can get away with it.
Clearly nobody could get away with an attempt to make their own tweaks
to something like TCP/IP because it just wouldn't work at all.

ISTR that MS had their own version of some XML technology - schemas or
XSL - because the official version wasn't out quickly enough but enabled
use of the standard when it came along; I'd guess the non-standard
version must be dead by now. Again, in multi-vendor situations a
non-standard implementation would lead to exclusion.

There's also a legal option here - have the format owned by a
foundation, trademark the name and grant a licence to use it only on
condition that a product claiming compatibility validates all documents
on import and export, rejects invalid documents and validates only
against schemas from the foundation's site. It wouldn't stop anyone
from using tweaked versions but they could be sued for trademark
violations by the trademark owner and, depending on the jurisdiction,
sued or even prosecuted under consumer protection legislation if they
tried to claim compatibility.

The other factor is need. The driver for GEDCOMish (for want of a better
word) variations is, I presume, the inability of developers to represent
their data adequately using only the standard. If a protocol were developed
which was sufficient for their needs there would be no need for vendors
to tweak it except to lock in users. I'd like to think that eventually
users are going to get smart enough to realise that lock-in isn't to
their advantage although this may be simple optimism on my part.

Bob Melson

unread,
Oct 2, 2011, 2:42:17 PM10/2/11
to
On Saturday 01 October 2011 23:00, Wes Groleau (Grolea...@FreeShell.org)
opined:

> On 10-01-2011 05:51, Ian Goddard wrote:
>> Agreed. One of the advantages of XML is that validation against a
>> schema makes it possible to reject a document outright even if only one
>> small part of it fails. That's what will keep the unofficial variations
>> out.
>
> I doubt it. Having an "official schema" doesn't stop Microsoft from
> changing things. They just create their own schema and pretend everyone
> else is non-standard. (And they're just an example--others
> do it, too.)
>

Hear, hear! M$oft has the habit of creating its own competing standards
out of whole cloth and attempting to force others to use/accept them.
Anybody remember the not so distant past controversy over the open
document standard?
--
Robert G. Melson | Rio Grande MicroSolutions | El Paso, Texas
-----
The greatest tyrannies are always perpetrated
in the name of the noblest causes -- Thomas Paine

Tony Proctor

unread,
Oct 2, 2011, 3:44:25 PM10/2/11
to

"Bob Melson" <amia...@mypacks.net> wrote in message
news:MeWdnR3NLrYULRXT...@earthlink.com...
Yup! Happened to me too with OLAP databases. Didn't matter whether there
were "open standards" created by a representative group, or whether you had
patents registered. They just trump the whole lot with something tied into
their own technology stack and then rely on volume of sales to force it to
the top, whether it's better or worse.

Tony Proctor


Wes Groleau

unread,
Oct 2, 2011, 4:52:41 PM10/2/11
to
On 10-02-2011 12:41, Ian Goddard wrote:
> There's also a legal option here - have the format owned by a
> foundation, trademark the name and grant a licence to use it only on
> condition that a product claiming compatibility validates all documents
> on import and export, rejects invalid documents and validates only
> against schemas from the foundation's site. It wouldn't stop anyone
> from using tweaked versions but they could be sued for trademark
> violations by the trademark owner and, depending on the jurisdiction,
> sued or even prosecuted under consumer protection legislation if they
> tried to claim compatibility.

A foundation like ISO? This approach worked for Java, until Microsoft
decided to tamper with it and made their own version that broke the
portability between them and everyone else. Sun sued them to make
them stop using the name. Eventually, MS lost, but their solution
was to market "dot-Net" which was effectively the same thing in
functionality but still not portable.

LDS had a half-hearted version of the approach. AFAIK, they never
stopped anyone from using "GEDCOM", but they did offer an "official
certification" of your software if you jumped through a few hoops.

As you try to supplant GEDCOM with something better, learn from a little
history: Sun was trying to create a product, but kept having
delays because they were using the most error-prone popular language
there is (C). Finally, they decided to solve the problem by creating
a language in which the kinds of errors these experienced C programmers
kept making were impossible. They didn't do much about the kinds of
errors INexperienced C programmers make. They also deliberately banned
things they considered unsafe without bothering to do their homework
and find out how other languages had made those things safe.

Result: a language much better than C but far worse than it could have
been. But an improvement is an improvement, right? But is everyone
programming in Java now? No, C is just as popular as ever, C# may be
more popular than Java, and plenty of other languages (some of them
better than Java or C#) are still in wide use.

Short version: They made a better language, but it didn't get the
adoption they hoped, and it was largely replaced by another that
was not a significant improvement but was incompatible.

Wes Groleau

unread,
Oct 2, 2011, 4:56:44 PM10/2/11
to
On 10-02-2011 06:09, Tony Proctor wrote:
> Lost me on this one I'm afraid...

http://www.dmoz.org/Computers/Usenet/Etiquette/

No big deal on short posts, but when you quote five screens of stuff and
at the end put a comment on something somewhere in the middle, ....

--
Wes Groleau

There are two types of people in the world …
http://Ideas.Lang-Learn.us/barrett?itemid=1157

Wes Groleau

unread,
Oct 2, 2011, 4:59:09 PM10/2/11
to
On 10-02-2011 03:31, NigelBufton wrote:
> Admittedly we have two different sub-structures relating @I1@ to @F2@
> which could lead to lack of integrity if a program did not manage the

Three. There is also the ASSO.

Tony Proctor

unread,
Oct 2, 2011, 5:02:33 PM10/2/11
to

"Wes Groleau" <Grolea...@FreeShell.org> wrote in message
news:j6aiuq$4ea$1...@dont-email.me...
> There are two types of people in the world …
> http://Ideas.Lang-Learn.us/barrett?itemid=1157

Hmm. I don't want to sidetrack this thread but Java is very good nowadays,
especially since 'generics' were added. I agree they could have made it this
good right from the start, and their support classes did undergo a few
serious revisions.

Most people I know continued to use C because of familiarity, or fears over
performance. It may have changed recently but M$soft were once accused of
not using their own .Net languages for anything that they sold themselves.

I never did like C anyway, and C++ has a vile syntax IMHO. We can all
pick-and-choose our history lessons but without any progress at all then
we'd still be writing in assembler :-)

Tony Proctor


Wes Groleau

unread,
Oct 2, 2011, 5:04:43 PM10/2/11
to
On 10-02-2011 06:01, Tony Proctor wrote:
> I use this a lot with work. It was purposely defined for situations like
> this. The ordering of elements is part of the standard rather than the
> current locale. Also, the all-numeric format (yyyy-mm-dd) doesn't contain
> any localised names such as Jan, January, etc

Not for situations "like this." "This" needs support for ranges,
approximations, uncertainties, one or two of the three parts being
unknown (which the current GEDCOM only _partly_ handles).
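
For illustration, an XML format could keep the ISO 8601 lexical form and
add those qualifiers on top; the attribute names below are my own
assumptions, not part of any proposal:

<Date Value="1843-05-02"/>                  <!-- exact day known -->
<Date Value="1843" Qualifier="about"/>      <!-- year only, approximate -->
<Date From="1841-06-01" To="1841-06-30"/>   <!-- sometime in a range -->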

Localization is trivial, and has been solved in several ways already.

As for ordering, every date representation scheme has only one ordering
that makes any sense, whether it is explicitly stated or not.

--
Wes Groleau

There are two types of people in the world …
http://Ideas.Lang-Learn.us/barrett?itemid=1157

Wes Groleau

unread,
Oct 2, 2011, 5:13:57 PM10/2/11
to
On 10-02-2011 06:04, Tony Proctor wrote:
> I disagree strongly with your assessment of (c) Wes. It not only is possible
> but it is being done all the time by (good-)XML designers, and there are
> standards supporting it. Anyone putting locale-dependent data in public XML
> content has sort of missed the point.

A "locale" is a scheme for representing one or more of dates,
numbers, times, etc. How is what you want us to use not a
"scheme for representing …"

There needs to be a standard way of representing them, yes.
But that way is effectively another "locale"

For dates, changing from the GEDCOM "standard" to ISO 8601 would
make implementation of sorting simpler at the cost of dropping
a lot of existing flexibility.

--
Wes Groleau

There are two types of people in the world …
http://Ideas.Lang-Learn.us/barrett?itemid=1157

Wes Groleau

unread,
Oct 2, 2011, 5:20:41 PM10/2/11
to
On 10-02-2011 17:02, Tony Proctor wrote:
> I never did like C anyway, and C++ has a vile syntax IMHO. We can all
> pick-and-choose our history lessons but without any progress at all then
> we'd still be writing in assembler :-)

I didn't say we have to stick to GEDCOM. Just a caution about thinking
that your/our/that other alternative is going to save the day.

Adoption depends on people, and people are hard to predict.
Decades of griping about the flaws in GEDCOM and dozens of
alternative proposals have so far failed to make any
significant difference.

I was once rather vocal about my gripes, but I've pretty
much given up. Haven't changed my opinions about its
deficiencies, but I also haven't changed my opinion that,
bad as it is, it's still BETTER than most of the implementations
of it.

--
Wes Groleau

There are two types of people in the world …
http://Ideas.Lang-Learn.us/barrett?itemid=1157

Ian Goddard

unread,
Oct 3, 2011, 5:32:18 AM10/3/11
to
Wes Groleau wrote:
> On 10-02-2011 12:41, Ian Goddard wrote:
>> There's also a legal option here - have the format owned by a
>> foundation, trademark the name and grant a licence to use it only on
>> condition that a product claiming compatibility validates all documents
>> on import and export, rejects invalid documents and validates only
>> against schemas from the foundation's site. It wouldn't stop anyone
>> from using tweaked versions but they could be sued for trademark
>> violations by the trademark owner and, depending on the jurisdiction,
>> sued or even prosecuted under consumer protection legislation if they
>> tried to claim compatibility.
>
> A foundation like ISO?

No. I had in mind various foundations from the FOSS world.

> This approach worked for Java, until Microsoft
> decided to tamper with it and made their own version that broke the
> portability between them and everyone else. Sun sued them to make
> them stop using the name. Eventually, MS lost, but their solution
> was to market "dot-Net" which was effectively the same thing in
> functionality but still not portable.
>
> LDS had a half-hearted version of the approach. AFAIK, they never
> stopped anyone from using "GEDCOM", but they did offer an "official
> certification" of your software if you jumped through a few hoops.

AIUI one of the requirements of trademark law is that you do make proper
efforts.

> As you try to supplant GEDCOM with something better, learn from a little
> history:

Indeed. Recent history shows that providing the standard is good enough
it tends not to get fractured. The market would reject deviants. For
instance I haven't heard of anyone trying to impose their own variations
of PDF, MP3, JPEG, etc; it just wouldn't be worth the effort. Even the
example you quoted above proves the point although that needed a trip to
court.

But you missed my main point which is that the validation mechanism of
XML makes it straightforward to confirm whether the product does what it
says on the tin - or in this case, on the shrink-wrapped box. YMMV but
over here one remedy for an aggrieved -punter- consumer is to take his
complaint to Trading Standards who have considerable powers including
prosecution.

This line of discussion ignores one issue, however. Are the variations
on GEDCOM really attempts to lock-in customers or simply uncoordinated
attempts to extend the original format beyond its intended scope?

tms

unread,
Oct 3, 2011, 2:46:45 PM10/3/11
to
On Oct 1, 5:51 am, Ian Goddard <godda...@hotmail.co.uk> wrote:
> Tony Proctor wrote:
>
> > I didn't expand on the structure of the elements I'd proposed. However, if
> > XML were going to be used then their design would have to follow best
> > practices to ensure that a schema-based validation was possible.
>
> Agreed.  One of the advantages of XML is that validation against a
> schema makes it possible to reject a document outright even if only one
> small part of it fails.  That's what will keep the unofficial variations
> out.

That will also keep anyone from using such a program. Imagine, you
just downloaded a database from somewhere that contains a vital clue
you have been searching for for years, but your genealogy program
refuses to load the data because it contains an unrecognized tag, say
one of the tags I add to SOUR records to help BibTeX format them
nicely. Will you: 1) praise your program for being so diligent, or 2)
curse it for not letting you get the data you want, and switch to
another program?

XML schema are useful in some circumstances, but not when the data are
coming from multiple uncontrolled sources.

> Because schema references are in the form of URLs an application could
> keep abreast of the latest schemas even if it wasn't able to use newly
> defined elements.  This would, of course, enable a company to define its
> own extended schema but unless it published it on the web it would be
> automatically failed.  And a program would be able to check that the
> schema came from the official site and reject it if it didn't.

So users could not import data unless they were connected to the net?
And users would not be allowed to create their own tags, as Gedcom
allows?

tms

unread,
Oct 3, 2011, 3:07:53 PM10/3/11
to
On Oct 1, 5:35 am, "Tony Proctor" <tony@proctor_NoMore_SPAM.net>
wrote:
>
> Much of my own research
> contains narrative and I have no option but to store it separate in
> Word/pdf/etc documents. It then becomes sidelined and wouldn't get used by a
> desktop tool.

What's wrong with putting it in NOTEs? That's what I do, complete
with LaTeX markup. Works just fine.

Tony Proctor

unread,
Oct 3, 2011, 3:58:01 PM10/3/11
to

"tms" <tmsom...@gmail.com> wrote in message
news:c2c9b2fd-4547-44b7...@k6g2000yql.googlegroups.com...
The point Ian was making was that XML provides a way of registering extra
namespaces that can contain extended or custom tags. Also, that schema
definition files can be used to perform automated validation of all
contributions to an XML document.
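
For example (the URIs and the bib prefix are made up for illustration), an
extension could live in its own namespace, and a schema written with a
suitably lax wildcard could still validate the document as a whole:

<Source xmlns="http://example.org/genealogy"
        xmlns:bib="http://example.org/bibtex-extras">
   <Title>Parish register of St Mary</Title>
   <!-- foreign-namespace extension; ignorable by software
        that doesn't recognise it -->
   <bib:CiteKey>stmary1843</bib:CiteKey>
</Source>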

If a program were to reject an XML document then it would mean that either
the vendor didn't follow the standard, or they had a bug in their schema, or
the file itself was corrupt.

Either way I see that as a good thing. There should be no reason for a
vendor to step outside the standard if the XML is defined properly.

Contrast that with GEDCOM where programs are known to generate variations of
the standard, or openly discard bits they don't recognise or haven't
implemented per the standard

Tony Proctor


tms

unread,
Oct 3, 2011, 4:32:58 PM10/3/11
to
On Oct 3, 3:58 pm, "Tony Proctor" <tony@proctor_NoMore_SPAM.net>
wrote:
> "tms" <tmsomme...@gmail.com> wrote in message
The attributions above are messed up.

> The point Ian was making was that XML provides a way of registering extra
> namespaces that can contain extended or custom tags. Also, that schema
> definition files can be used to perform automated validation of all
> contributions to an XML document.

But why should I be forced to go to the trouble of registering a
namespace, etc., just to use a custom tag that probably only I care
about?

> If a program were to reject an XML document then it would mean that either
> the vendor didn't follow the standard, or they had a bug in their schema, or
> the file itself was corrupt.

Or a user defined his or her own tag, something allowed by Gedcom.
None of the given reasons, except possibly corruption, justifies
completely rejecting importation.

> Either way I see that as a good thing. There should be no reason for a
> vendor to step outside the standard if the XML is defined properly.

Don't forget that XHTML died because it insisted that non-conforming
pages be rejected.

> Contrast that with GEDCOM where programs are known to generate variations of
> the standard, or openly discard bits they don't recognise or haven't
> implemented per the standard

That is not a problem with Gedcom, but with the implementations.
Gedcom specifically allows user-defined tags, a feature that I find
very useful. I created some tags to help BibTeX format entries
nicely, and others to tell LaTeX how to layout images. With the
proposed strict XML, I'd be out of luck. Or rather, I'd find a
program that was not so strict. Is validating XML really so important
that programs should be otherwise crippled just to do it?

tms

unread,
Oct 3, 2011, 4:41:09 PM10/3/11
to
On Oct 2, 12:55 am, Wes Groleau <Groleau+n...@FreeShell.org> wrote:
>
> Originally, GEDCOM said nobody is related to anybody, instead we're
> all related to families, and there are only three relationships: HUSB,
> WIFE, CHIL.  Eventually, they recognized that there _are_ other
> relationships, so they invented ASSO [1].  So now, there are two
> classes of relationships, and the "main" ones are still required
> to be indirect.

The child is not really the descendant of a father and the descendant
of a mother, but it is the descendant of a father and a mother
simultaneously. I mean that, modulo things like test tubes, the child
is the product of its parents jointly, not severally (to use legal
terminology).

Ian Goddard

unread,
Oct 3, 2011, 5:29:04 PM10/3/11
to
tms wrote:
> On Oct 3, 3:58 pm, "Tony Proctor"<tony@proctor_NoMore_SPAM.net>
> wrote:
>> "tms"<tmsomme...@gmail.com> wrote in message
>>
>> news:c2c9b2fd-4547-44b7...@k6g2000yql.googlegroups.com...
>> On Oct 1, 5:51 am, Ian Goddard<godda...@hotmail.co.uk> wrote:
>>
>>> Tony Proctor wrote:

>> The point Ian was making was that XML provides a way of registering extra
>> namespaces that can contain extended or custom tags. Also, that schema
>> definition files can be used to perform automated validation of all
>> contributions to an XML document.

Actually I wasn't thinking about registering namespaces as such but of
providing spaces within the overall schema where ad hoc data could be
placed as name/value pairs. See the example below.

> But why should I be forced to go to the trouble of registering a
> namespace, etc., just to use a custom tag that probably only I care
> about?
>

You don't. An example.

<Source type="archive" ID="660f78b6-ec5a-11e0-b261-001636e96075">
<ParentID/>
<SourceName>Yorkshire Archaeological Society Archive</SourceName>
<ShortName>Yorks Arch Soc Archive</ShortName>
<BriefName>YAS Archive</BriefName>
<AdHoc>
<Item Name="Address" Value="Claremont"/>
<!-- Add as many Items as required -->
</AdHoc>
</Source>

The expectation would be that in programming terms at data entry time
the application would simply throw up some form of double column control
to allow the user to enter stuff.

Another application receiving it wouldn't be expected to parse it, index
it or whatever, just display it as found. A human user would impute
meaning to it, the application wouldn't. I think this fills Tom's
desire for somewhere to enter stuff only he cares about without
registering a namespace but also without causing validation problems as
AdHoc & Item elements would be part of the schema.

In practical terms, of course, when the data is displayed the Name part
of the Item replaces the caption that a program might supply for a the
contents of a regular element. This is why I said earlier that this
aspect of the interface can't be localised for the reader. Of course
this might not matter; if the caption is all Greek to the reader then
the data probably is too.

Ian Goddard

unread,
Oct 3, 2011, 5:40:37 PM10/3/11
to
Ian Goddard wrote:
>
> <Source type="archive" ID="660f78b6-ec5a-11e0-b261-001636e96075">
> <ParentID/>
> <SourceName>Yorkshire Archaeological Society Archive</SourceName>
> <ShortName>Yorks Arch Soc Archive</ShortName>
> <BriefName>YAS Archive</BriefName>
> <AdHoc>
> <Item Name="Address" Value="Claremont"/>
> <!-- Add as many Items as required -->
> </AdHoc>
> </Source>
>

Taking this off in another direction. The above example assumes that
there's a single Source element. If we take an OO approach to the
design we may decide we need to have a sub-class of Source just for
archives or maybe for any other type of Source with a street address
(e.g. a publisher). In that case an address element containing a number
of address lines might be a reasonable additional member of the
sub-class. Ad hoc information then might simply be used to add extra
info such as "Make appointments at least 24 hours in advance".

However a period of prototyping using an AdHoc element to sub-class on
the fly, as it were, might be quite a good way of identifying what
sub-classes are actually needed in practice.
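
A sketch of how that sub-classing could surface in an instance document,
using XML Schema's standard type-derivation mechanism (the ArchiveSource
type name and the Address layout are my own assumptions):

<Source ID="S1"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:type="ArchiveSource">
   <SourceName>Yorkshire Archaeological Society Archive</SourceName>
   <Address>
      <Line>Claremont</Line>
      <Line>23 Clarendon Road</Line>
      <Line>Leeds LS2 9NZ</Line>
   </Address>
</Source>

A validator would then permit the Address element only where the
sub-class declares it, rather than everywhere.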

Ian Goddard

unread,
Oct 3, 2011, 5:58:43 PM10/3/11
to
tms wrote:
> On Oct 1, 5:51 am, Ian Goddard<godda...@hotmail.co.uk> wrote:
>> Tony Proctor wrote:
>>
>>> I didn't expand on the structure of the elements I'd proposed. However, if
>>> XML were going to be used then their design would have to follow best
>>> practices to ensure that a schema-based validation was possible.
>>
>> Agreed. One of the advantages of XML is that validation against a
>> schema makes it possible to reject a document outright even if only one
>> small part of it fails. That's what will keep the unofficial variations
>> out.
>
> That will also keep anyone from using such a program. Imagine, you
> just downloaded a database from somewhere that contains a vital clue
> you have been searching for for years, but your genealogy program
> refuses to load the data because it contains an unrecognized tag, say
> one of the tags I add to SOUR records to help BibTeX format them
> nicely. Will you: 1) praise your program for being so diligent, or 2)
> curse it for not letting you get the data you want, and switch to
> another program?

Why would I want these? My application wouldn't know what to do with
them especially as they seem to belong to the presentation domain and
not to data at all.

> XML schema are useful in some circumstances, but not when the data are
> coming from multiple uncontrolled sources.

I think you're swapping problem and solution here.

>> Because schema references are in the form of URLs an application could
>> keep abreast of the latest schemas even if it wasn't able to use newly
>> defined elements. This would, of course, enable a company to define its
>> own extended schema but unless it published it on the web it would be
>> automatically failed. And a program would be able to check that the
>> schema came from the official site and reject it if it didn't.
>
> So users could not import data unless they were connected to the net?
> And users would not be allowed to create their own tags, as Gedcom
> allows?

I think one wouldn't expect new versions to come out very frequently.
And it seems pretty normal to store copies of schemas locally. I just
ran a find and count on my XML IDE's directory and it has 271 of them.
So access to the schemas when off-line wouldn't be a problem.

cecilia

unread,
Oct 3, 2011, 7:24:13 PM10/3/11
to
On Sun, 02 Oct 2011 16:56:44 -0400, Wes Groleau wrote:

>[...] when you quote five screens of stuff and
>at the end put a comment on something somewhere in the middle, ....

One should also remember that it is rarely necessary to quote
everything in the post to which one is replying. Judicious excision,
leaving only enough of the point(s) being responded to to enable the
reader to "keep up", is advised.

Wes Groleau

unread,
Oct 3, 2011, 9:41:44 PM10/3/11
to
On 10-03-2011 05:32, Ian Goddard wrote:
> This line of discussion ignores one issue, however. Are the variations
> on GEDCOM really attempts to lock-in customers or simply uncoordinated
> attempts to extend the original format beyond its intended scope?

Some of them are the first, some of them are the second, and some of
them are attempting to make up for a lack that doesn't exist.

Wes Groleau

unread,
Oct 3, 2011, 9:45:43 PM10/3/11
to
On 10-03-2011 15:58, Tony Proctor wrote:
> Contrast that with GEDCOM where programs are known to generate variations of
> the standard, or openly discard bits they don't recognise or haven't
> implemented per the standard

Programs exist that can check GEDCOM files for compliance.
Obviously, people can't be forced to use them, and can't
be forced to use schemas either.

Wes Groleau

unread,
Oct 3, 2011, 9:49:49 PM10/3/11
to
On 10-03-2011 17:58, Ian Goddard wrote:
> Why would I want these? My application wouldn't know what to do with
> them especially as they seem to belong to the presentation domain and
> not to data at all.

Wasn't one of the original motivators anger at a program discarding
data just because it doesn't know what to do with it?

But, even if you could persuade everyone to validate with a schema,
is there such a thing as a schema that can compare input to output and
verify nothing was omitted?

Wes Groleau

unread,
Oct 3, 2011, 9:56:05 PM10/3/11
to
I am not a purist about genetics. I am interested in relationships,
of which the genetic is one of many types. A data model that supports
documentation of many different kinds will not likely prevent
documentation of the genetic kind. One (like GEDCOM) that kludges on a
different approach to some relationships after finding its original
approach is inadequate, just "goes against the grain" for me.

tms

unread,
Oct 5, 2011, 1:54:19 PM10/5/11
to
On Oct 3, 9:56 pm, Wes Groleau <Groleau+n...@FreeShell.org> wrote:
> On 10-03-2011 16:41, tms wrote:
>
> > On Oct 2, 12:55 am, Wes Groleau<Groleau+n...@FreeShell.org>  wrote:
>
> >> Originally, GEDCOM said nobody is related to anybody, instead we're
> >> all related to families, and there are only three relationships: HUSB,
> >> WIFE, CHIL.  Eventually, they recognized that there _are_ other
> >> relationships, so they invented ASSO [1].  So now, there are two
> >> classes of relationships, and the "main" ones are still required
> >> to be indirect.
>
> > The child is not really the descendant of a father and the descendant
> > of a mother, but it is the descendant of a father and a mother
> > simultaneously.  I mean that, modulo things like test tubes, the child
> > is the product of its parents jointly, not severally (to use legal
> > terminology).
>
> I am not a purist about genetics.  I am interested in relationships,
> of which the genetic is one of many types.

I'm not a purist, either. It is just that the father-mother-child
relationship is fundamental to genealogy, and it makes sense to make
that relationship the basis of one's genealogical data. Also note
that it is a ternary, not binary, relation in fact, and so it should
be represented as such in the data.

>  A data model that supports
> documentation of many different kinds will not likely prevent
> documentation of the genetic kind.  One (like GEDCOM) that kludges on a
> different approach to some relationships after finding its original
> approach is inadequate, just "goes against the grain" for me.

Other kinds of relationships are without number (not literally, of
course), and any list is bound to leave some out. So a generic link
such as ASSO also makes sense.

Tony Proctor

unread,
Oct 5, 2011, 2:17:54 PM10/5/11
to

"tms" <tmsom...@gmail.com> wrote in message
news:b9cd80d0-4b5b-448c...@i28g2000yqn.googlegroups.com...
I would have to argue against the 'ternary' description, unless of course
you want a node representing the physical conception. Otherwise, each person
only has one progenitive father and one progenitive mother. Whether the
procreation was achieved inside a legal marriage, an illegal one, outside of
marriage, or with no knowledge that it had happened at all, is irrelevant
from that point of view.

Although there are many other types of linkage, I do believe that some of
the more common ones should be acknowledged using standard tags, e.g.
adoptive mother/father, foster mother/father, guardian mother/father.

The scheme I had originally suggested at the top of this thread would allow
for any number of generic person-to-person links through person-reference
elements being embedded in narrative notes.
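
To make the embedded-reference idea concrete, a sketch (the PersonRef
element and its attribute are invented here purely for illustration):

<Note>
   After the fire of 1852, <PersonRef Ref="P4">my great-uncle
   William</PersonRef> lodged with his former teacher
   <PersonRef Ref="P9">Mr. Hartley</PersonRef> in Leeds.
</Note>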

Tony Proctor


tms

unread,
Oct 5, 2011, 2:50:16 PM10/5/11
to

That's one of the things I dislike about XML: even its advocates
misuse it. Attributes are for metadata, not data.

Putting that aside, what advantage does this scheme give you over just
allowing user-defined tags? Or over the Gedcom equivalent: "2
_ADDRESS Claremont"? Sure, you get to run it through a validator, but
the validator won't understand the stuff inside the AdHoc tag, so
validation doesn't buy much. Especially since the validator can't
tell if the tags are being used correctly.

> The expectation would be that in programming terms at data entry time
> the application would simply throw up some form of double column control
> to allow the user to enter stuff.

Not all programs have GUIs.

tms

unread,
Oct 5, 2011, 2:55:19 PM10/5/11
to
On Oct 3, 9:45 pm, Wes Groleau <Groleau+n...@FreeShell.org> wrote:
> On 10-03-2011 15:58, Tony Proctor wrote:
>
> > Contrast that with GEDCOM where programs are known to generate variations of
> > the standard, or openly discard bits they don't recognise or haven't
> > implemented per the standard
>
> Programs exist that can check GEDCOM files for compliance.
> Obviously, people can't be forced to use them, and can't
> be forced to use schemas either.

Exactly. Nor can people or programs be forced to use Gedcom or XML
tags correctly. Validation may appear, superficially, to be useful,
but in the present context, it is not of much help at all, especially
if your program is going to reject any "invalid" input.

tms

unread,
Oct 5, 2011, 3:17:04 PM10/5/11
to
On Oct 3, 5:58 pm, Ian Goddard <godda...@hotmail.co.uk> wrote:
> tms wrote:
> > On Oct 1, 5:51 am, Ian Goddard<godda...@hotmail.co.uk>  wrote:
> >> Tony Proctor wrote:
>
> >>> I didn't expand on the structure of the elements I'd proposed. However, if
> >>> XML were going to be used then their design would have to follow best
> >>> practices to ensure that a schema-based validation was possible.
>
> >> Agreed.  One of the advantages of XML is that validation against a
> >> schema makes it possible to reject a document outright even if only one
> >> small part of it fails.  That's what will keep the unofficial variations
> >> out.
>
> > That will also keep anyone from using such a program.  Imagine, you
> > just downloaded a database from somewhere that contains a vital clue
> > you have been searching for for years, but your genealogy program
> > refuses to load the data because it contains an unrecognized tag, say
> > one of the tags I add to SOUR records to help BibTeX format them
> > nicely.  Will you: 1) praise your program for being so diligent, or 2)
> > curse it for not letting you get the data you want, and switch to
> > another program?
>
> Why would I want these?  My application wouldn't know what to do with
> them especially as they seem to belong to the presentation domain and
> not to data at all.

You probably don't want them, but I do, and I should be able to have
them.

> > XML schema are useful in some circumstances, but not when the data are
> > coming from multiple uncontrolled sources.
>
> I think you're swapping problem and solution here.

I don't understand what you mean. What I meant, if it was not clear,
was that XML schema are useful when one party is able to impose its
will on everyone else. If, for example, the new FamilySearch were to
use XML in their API ( I have no idea whether they do so), they would
be able to say to their users (and get away with it), "Use our schema
or else." But no one can force genealogy programs to only export
"correct" XML. Therefore, rejecting the import of "incorrect" XML
only hurts users by denying them access to exports they might want to
see.

> >> Because schema references are in the form of URLs an application could
> >> keep abreast of the latest schemas even if it wasn't able to use newly
> >> defined elements.  This would, of course, enable a company to define its
> >> own extended schema but unless it published it on the web it would be
> >> automatically failed.  And a program would be able to check that the
> >> schema came from the official site and reject it if it didn't.
>
> > So users could not import data unless they were connected to the net?
> > And users would not be allowed to create their own tags, as Gedcom
> > allows?
>
> I think one wouldn't expect new versions to come out very frequently.
> And it seems pretty normal to store copies of schemas locally.  I just
> ran a find and count on my XML IDE's directory and it has 271 of them.
> So access to the schemas when off-line wouldn't be a problem.

What prevents someone, say a vendor, from shipping a modified schema?

tms

unread,
Oct 5, 2011, 3:46:57 PM10/5/11
to
On Oct 5, 2:17 pm, "Tony Proctor" <tony@proctor_NoMore_SPAM.net>
wrote:
> "tms" <tmsomme...@gmail.com> wrote in message
>
> I'm not a purist, either.  It is just that the father-mother-child
> relationship is fundamental to genealogy, and it makes sense to make
> that relationship the basis of one's genealogical data.  Also note
> that it is a ternary, not binary, relation in fact, and so it should
> be represented as such in the data.
>
> I would have to argue against the 'ternary' description, unless of course
> you want a node representing the physical conception.

That would be the most correct way to represent the relation.

> Otherwise, each person
> only has one progenitive father and one progenitive mother. Whether the
> procreation was achieved inside a legal marriage, an illegal one, outside of
> marriage, or with no knowledge that it had happened at all, is irrelevant
> from that point of view.

Granted that Gedcom's FAM is a bit ambiguous, as it includes
biological children as well as step-, foster-, adopted-, and
raised-by-wolves children, but the ADOP and ASSO tags can resolve the
ambiguities. Not, perhaps, the most elegant solution, but it works.

> The scheme I had originally suggested at the top of this thread would allow
> for any number of generic person-to-person links through person-reference
> elements being embedded in narrative notes.

Which seems a step backward from Gedcom's ASSO, which is at least
structured.

Tony Proctor

unread,
Oct 5, 2011, 3:54:19 PM10/5/11
to

"tms" <tmsom...@gmail.com> wrote in message
news:9b9f5558-9e67-4a88...@t16g2000yqm.googlegroups.com...
Sorry, I may have confused you there. The scheme I proposed had specific
standard tags for all the cases I mentioned. However, the facility to embed
links from narrative to anyone at all (e.g my teacher, Santa Claus, etc)
would soak up a great many "other cases" without having to make use of the
custom-tag feature that was also proposed.

Tony Proctor


Wes Groleau

unread,
Oct 6, 2011, 12:39:34 AM10/6/11
to
On 10-05-2011 14:50, tms wrote:
> Putting that aside, what advantage does this scheme give you over just
> allowing user-defined tags? Or over the Gedcom equivalent: "2
> _ADDRESS Claremont"? Sure, you get to run it through a validator, but
> the validator won't understand the stuff inside the AdHoc tag, so
> validation doesn't buy much. Especially since the validator can't
> tell if the tags are being used correctly.

Nor can the validator prevent the program from discarding it.

Wes Groleau

unread,
Oct 6, 2011, 12:42:11 AM10/6/11
to
On 10-05-2011 15:17, tms wrote:
> will on everyone else. If, for example, the new FamilySearch were to
> use XML in their API ( I have no idea whether they do so), they would
> be able to say to their users (and get away with it), "Use our schema

They do. But they don't force anyone else to do anything based on that
fact. Instead (it seems to me) they send the XML to your browser along
with XSL to make it display nicely.

Steve Hayes

unread,
Oct 6, 2011, 2:57:28 AM10/6/11
to
On Wed, 5 Oct 2011 10:54:19 -0700 (PDT), tms <tmsom...@gmail.com> wrote:

>On Oct 3, 9:56 pm, Wes Groleau <Groleau+n...@FreeShell.org> wrote:
>> I am not a purist about genetics.  I am interested in relationships,
>> of which the genetic is one of many types.
>
>I'm not a purist, either. It is just that the father-mother-child
>relationship is fundamental to genealogy, and it makes sense to make
>that relationship the basis of one's genealogical data. Also note
>that it is a ternary, not binary, relation in fact, and so it should
>be represented as such in the data.

I think the only relationships that really need to be recorded in a genealogy
program are child-father and child-mother. If the sperm came from a sperm bank
then the father-mother relationship is zilch. It can be noted, but it is not
essential.

If you are interested in more relationships than genetic, then you need the
kind of program described here:

http://hayesgreene.blogspot.com/2011/05/event-based-history-and-genealogy.html

Unfortunately, no such program exists.

We are spoilt for choice when it comes to lineage-linked genealogy programs,
but I don't know any program that does that (and, before anyone asks, yes, I
HAVE looked at TMG, and Gramps, and no, they DON'T do that).


--
Steve Hayes from Tshwane, South Africa
Web: http://hayesfam.bravehost.com/stevesig.htm
Blog: http://methodius.blogspot.com
E-mail - see web page, or parse: shayes at dunelm full stop org full stop uk

Ian Goddard

unread,
Oct 6, 2011, 6:58:19 AM10/6/11
to
tms wrote:
>>> XML schema are useful in some circumstances, but not when the data are
>>> coming from multiple uncontrolled sources.
>>
>> I think you're swapping problem and solution here.
>
> I don't understand what you mean.

Then let me explain. Uncontrolled sources are the problem. That's
right. The *problem*.

As a general matter, if a program is going to handle data it (OK, I'm
being anthropomorphic; the program's programmer if we insist on being
pedantic) must understand that data. If the data source insists on
cobbling together whatever it feels like this proviso is not going to be
met. That's why multiple uncontrolled sources are a problem.

Let's take an example. The current IP addressing scheme of 4 octets is
running out of space. What if some of us decided to save the situation
by adding another octet to our addresses? None of the kit out there
gluing the internet together would understand our extended addresses.
Fortunately such invalid addresses would be rejected. The imposition
of rigid limits of what can and can't be transmitted is essential to
keeping things working and stopping uncontrolled sources wreaking havoc.

In XML it's schemas which provide that control. They are the solution,
not the problem.

Ian Goddard

unread,
Oct 6, 2011, 7:59:50 AM10/6/11
to
Where did you get that idea? Perhaps you've been reading the tutorial
on the W3C site which was clearly written by someone who doesn't like
attributes but even so it's prefaced by the statement "There are no
rules about when to use attributes or when to use elements." and
something similar to that is found in any textbook on XML I've ever read.

I used that particular form as being the best to point up the fact that
an Item was being constructed of Name/Value pairs. The following
alternative is indeed a more practical solution:

<AdHoc>
<Item Name="Address">
<![CDATA[Claremont,
23 Clarendon Road,
Leeds,
LS2 9NZ]]>
</Item>
</AdHoc>

> Putting that aside, what advantage does this scheme give you over just
> allowing user-defined tags? Or over the Gedcom equivalent: "2
> _ADDRESS Claremont"?

I did point out that in reality one might take an OO approach and
subclass Source where there was a requirement for an address.
Criticizing an example simplified for illustrative purposes for being
simplified does not help. However to deal with your more significant point:

> Sure, you get to run it through a validator, but
> the validator won't understand the stuff inside the AdHoc tag, so
> validation doesn't buy much. Especially since the validator can't
> tell if the tags are being used correctly.

The schema would specify an AdHoc element consisting of a list of Item
elements, and an Item consisting of a Name attribute and whatever
variation we finalise on for the Item contents. The validator would
have no problem whatsoever with understanding this. If this is what you
mean by being able to tell if the tags are being used correctly, then it
can.

From a semantic point of view the validator isn't going to check usage
because validators only check that the form matches the schema.

I think I also pointed out elsewhere that where there are data whose
semantics are significant to the application then a dedicated element
should be used. The entire AdHoc structure is provided for those data
which are not significant in that way and for which there is, therefore,
no dedicated element.

In fact, it exists to allow you to do what you wanted: invent your own
tags for that sort of thing and to do so in a manner which enables you
to follow the schema and generate a valid XML document.
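
For concreteness, a fragment of what that schema definition might look
like, matching the element-content variant of Item above (untested, and
assuming the usual xs prefix for http://www.w3.org/2001/XMLSchema):

<xs:element name="AdHoc">
   <xs:complexType>
      <xs:sequence>
         <!-- any number of user-named Items, each just a Name
              attribute plus free text content -->
         <xs:element name="Item" minOccurs="0" maxOccurs="unbounded">
            <xs:complexType>
               <xs:simpleContent>
                  <xs:extension base="xs:string">
                     <xs:attribute name="Name" type="xs:string"
                                   use="required"/>
                  </xs:extension>
               </xs:simpleContent>
            </xs:complexType>
         </xs:element>
      </xs:sequence>
   </xs:complexType>
</xs:element>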

>
> Not all programs have GUIs.
>

Curses. In fact, ncurses.

Peter J. Seymour

unread,
Oct 6, 2011, 11:20:40 AM10/6/11
to
On 2011-10-06 07:57, Steve Hayes wrote:
.....
>
> If you are interested in more relationships than genetic, then you need the
> kind of program described here:
>
> http://hayesgreene.blogspot.com/2011/05/event-based-history-and-genealogy.html
>
> Unfortunately, no such program exists.
>
> We are spoilt for choice when it comes to lineage-linked genealogy programs,
> but I don't know any program that does that (and, before anyone asks, yes, I
> HAVE looked at TMG, and Gramps, and no, they DON'T do that).
>
>

Having read the entry at the supplied link, I am not clear on what you
are after. Is it that you want links between events other than via a
person or couple or by date or place?

Tony Proctor

unread,
Oct 6, 2011, 12:06:51 PM10/6/11
to

"Steve Hayes" <haye...@telkomsa.net> wrote in message
news:jujq87peifp2mnht0...@4ax.com...
Interesting link Steve. It pretty much sums up what I've been trying to
describe with my approach to supporting lineage plus events plus narrative.
It certainly does have applicability beyond Family History, as you say.
Supporting pure Genealogy, by comparison, is a much easier goal.

I think the only point I slightly disagree with is that of designing
database tables around the requirement. Such tables would be an
implementation of the solution rather than the solution itself. In other
words, you wouldn't want to mandate either a particular database or a
particular set of tables so introducing them at such an early stage might be
misunderstood.

Tony Proctor


Steve Hayes

unread,
Oct 6, 2011, 1:05:11 PM10/6/11
to
>Interesting link Steve. It pretty much sums up what I've been trying to
>describe with my approach to supporting lineage plus events plus narrative.
>It certainly does have applicability beyond Family History, as you say.
>Supporting pure Genealogy, by comparison, is a much easier goal.
>
>I think the only point I slightly disagree with is that of designing
>database tables around the requirement. Such tables would be an
>implementation of the solution rather than the solution itself. In other
>words, you wouldn't want to mandate either a particular database or a
>particular set of tables so introducing them at such an early stage might be
>misunderstood.

The set of tables is a sample, to illustrate what I have in mind.

My hope is that others who feel the need for such a thing, and are willing to
work on trying to develop it, should look at it and suggest improvements.

Tony Proctor

unread,
Oct 6, 2011, 1:33:15 PM10/6/11
to

"Steve Hayes" <haye...@telkomsa.net> wrote in message
news:4mnr87lgajbt7qkq3...@4ax.com...

I started a similar project a while back, but from a different direction. I
had been keeping my notes in a written form with a well-defined syntax, i.e.
one that could be parsed in principle. This was because I didn't trust the
functionality of any of the commercial products, and also didn't want to get
locked in to any one of them.

I started to think about how I wanted to store my own data. This thread is
touching on an XML format I experimented with - to the point of actually
creating some examples. This is why I described it as a definitive
source/backup format as well as an import/export format and database load
format. The same format would serve all purposes.

The second phase was to define a run-time object model around that data.
Unfortunately, my paid work took over at that point.

My final stage would have been to create a POC incorporating a relational
database. However, both the definitive source format and the object model
would be generic, and could be standardised in principle.

I was hoping to get back to this when my contract finishes. However, if
you're still interested in it yourself then I'd be willing to take it
offline and compare notes. From this thread, I think I realise that the
people who actually "get it", and can see the advantages, are in a minority.
I think that means it's either going to need a "killer product" or Microsoft
to force the quantum leap :-)

Tony Proctor


Tony Proctor

unread,
Oct 6, 2011, 3:03:50 PM10/6/11
to

"Tony Proctor" <tony@proctor_NoMore_SPAM.net> wrote in message
news:j6icl4$vu5$1...@reader01.news.esat.net...
Having a node representing the conception is full of messy problems. It is
very unlikely you can put a date on it so it cannot be treated like an
event. It's possible, of course, that one or both of the parties are
unknown. However, for a biggish family - say 12 children by the same
couple - would your extra node be duplicated for each one? If you didn't do
that then it could make it difficult to manipulate two sets of offspring
where only one of the parents is common but the dates-of-birth are mixed.

In effect, I don't believe it represents anything useful, and a bottom-up
approach is much more practical and meaningful.
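
By contrast, a bottom-up sketch (using the <Father> and <Mother> element
names from the proposal quoted earlier in the thread; the rest is
invented) needs no extra node at all; each child simply carries its own
upward links:

<Person ID="P10">
   <Name>Child One</Name>
   <Father Ref="P1"/>
   <Mother Ref="P2"/>
</Person>
<Person ID="P11">
   <Name>Child Two</Name>
   <Father Ref="P1"/>
   <Mother Ref="P2"/>
</Person>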

The only universal events for a person are birth and death. All the others
like baptism, marriage, divorce, burial, cremation, etc., are culturally
dependent and so cannot be expected to be present. Multiple concurrent
marriages may even be legal in some cultures.

For my own part, even my own marriage cannot be represented accurately in
some products: I lived in England at the time, but I had a civil marriage in
the US, followed a week later by a religious one in Ireland.

Tony Proctor


Steve Hayes

unread,
Oct 6, 2011, 6:03:53 PM10/6/11
to
On Thu, 6 Oct 2011 18:33:15 +0100, "Tony Proctor"
<tony@proctor_NoMore_SPAM.net> wrote:

>
>"Steve Hayes" <haye...@telkomsa.net> wrote in message
>news:4mnr87lgajbt7qkq3...@4ax.com...
>> On Thu, 6 Oct 2011 17:06:51 +0100, "Tony Proctor"
>> <tony@proctor_NoMore_SPAM.net> wrote:

>>>I think the only point I slightly disagree with is that of designing
>>>database tables around the requirement. Such tables would be an
>>>implementation of the solution rather than the solution itself. In other
>>>words, you wouldn't want to mandate either a particular database or a
>>>particular set of tables so introducing them at such an early stage might
>>>be
>>>misunderstood.
>>
>> The set of tables is a sample, to illustrate what if have in mind.
>>
>> My hope is that others who feel the need for such a thing, and are willing
>> to
>> work on trying to develop it, shuold look at it and suggest improvements.

>I started a similar project a while back, but from a different direction. I
>had been keeping my notes in a written form with a well-defined syntax, i.e.
>one that could be parsed in principle. This was because I didn't trust the
>functionality of any of he commercial products, and also didn't want to get
>locked in to any one of them.
>
>I started to think about how I wanted to store my own data. This thread is
>touching on an XML format I experimented with - to the point of actually
>creating some examples. This is why I described it as a definitive
>source/backup format as well as an import/export format and database load
>format. The same format would serve all purposes.
>
>The second phase was to define a run-time object model around that data.
>Unfortunately, my paid work took over at that point.
>
>My final stage would have been to create a POC incorporating a relational
>database. However, both the definitive source format and the object model
>would be generic, and could be standardised in principle.
>
>I was hoping to get back to this when my contract finishes. However, if
>you're still interested in it yourself then I'd be willing to take it
>offline and compare notes. From this thread, I think I realise that the
>people who actually "get it", and can see the advantages, are in a minority.
>I think that means it's either going to need a "killer product" or Microsoft
>to force the quantum leap :-)

Ok, we're in the wrong thread here!

I was taking up Wes's point that he wanted relations other than genetic, and
that is not really an import/export question, but a matter of what you
want the program to do.

I use three lineage-linked genealogy programs regularly, and several more
occasionally, so I really don't see the need for another one of them.

I could probably have written the kind of application I want in Paradox, but
Paradox doesn't work on my present computer, and it also wouldn't be
transferable to anyone else. And with other database software I'm quite sure
that no sooner had I learnt how to do anything useful with it than it would
be obsolete, so I would spend the rest of my life trying to learn programs
and never actually being able to use them.

But by talking about it with others who might be interested in using such a
program we could chat about it and tinker with it and do a prototype and test
it and see if we could come up with something useful.

Steve Hayes

Oct 6, 2011, 6:12:43 PM
I'm not sure that I understand your question.

What I'm after is a program that can make a list of events in the life of a
person, family, organization or place, and link to the people who were
involved in each event, and in what role or capacity they were involved.
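
For illustration only - the element names here are invented, not a
proposal - such an event record might look something like:

<Event type="baptism" date="1854-03-12" place="St. Mary's, Old York">
  <Participant ref="P1" role="child"/>
  <Participant ref="P2" role="father"/>
  <Participant ref="P3" role="minister"/>
  <Participant ref="P4" role="godparent"/>
</Event>

Each Participant points at a person (or organization) record and names the
role explicitly, so the same event can be listed from the viewpoint of any
of the people involved.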

Peter J. Seymour

Oct 7, 2011, 3:47:22 AM
On 2011-10-06 23:12, Steve Hayes wrote:
> On Thu, 06 Oct 2011 16:20:40 +0100, "Peter J. Seymour"
> <Newsg...@pjsey.demon.co.uk> wrote:
>
>> On 2011-10-06 07:57, Steve Hayes wrote:
>> .....
>>>
>>> If you are interested in more relationships than genetic, then you need the
>>> kind of program described here:
>>>
>>> http://hayesgreene.blogspot.com/2011/05/event-based-history-and-genealogy.html
>>>
>>> Unfortunately, no such program exists.
>>>
>>> We are spoilt for choice when it comes to lineage-linked genealogy programs,
>>> but I don't know any program that does that (and, before anyone asks, yes, I
>>> HAVE looked at TMG, and Gramps, and no, they DON'T do that).
>>>
>>>
>>
>> Having read the entry at the supplied link, I am not clear on what you
>> are after. Is it that you want links between events other than via a
>> person or couple or by date or place?
>
> I'm not sure that I understand your question.
>
> What I'm after is a program that can make a list of events in the life of a
> person, family, organization or place, and link to the people who were
> involved in each event, and in what role or capacity they were involved.
>
>
I'm trying to see what I can understand from this. As far as I can see,
Gendatam Suite very nearly does all this, except that it is currently a bit
weak on making the results accessible.

Tony Proctor

Oct 7, 2011, 4:31:07 AM

"Steve Hayes" <haye...@telkomsa.net> wrote in message
news:aj8s87l228edp07ph...@4ax.com...
Import/export? That was an incidental usage I mentioned in passing.
Certainly not the main thrust here.

Tony Proctor


Wes Groleau

Oct 7, 2011, 9:16:32 PM

Sounds like PGV and webtrees do it, too, though some people point out
their displays aren't as "pretty" as the fluff sites.

tms

Oct 8, 2011, 2:45:14 PM
On Oct 6, 3:03 pm, "Tony Proctor" <tony@proctor_NoMore_SPAM.net>
wrote:

> "Tony Proctor" <tony@proctor_NoMore_SPAM.net> wrote in message
> news:j6icl4$vu5$1...@reader01.news.esat.net...
> > "tms" <tmsomme...@gmail.com> wrote in message
> Having a node representing the conception is full of messy problems. It is
> very unlikely you can put a date on it so it cannot be treated like an
> event. It's possible, of course, that one or both of the parties are
> unknown. However, for a biggish family - say 12 children by the same
> couple - would your extra node be duplicated for each one? If you didn't do
> that then it could make it difficult to manipulate two sets of offspring
> where only one of the parents is common but the dates-of-birth are mixed.

I wasn't clear. I was merely saying that the family node as used in
Gedcom is a better way to represent the relation than separate child-
father and child-mother relations. Granted that Gedcom's FAM is a bit
ambiguous in that it includes foster children, etc., but it's still
more useful than distinct child-parent links.
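
That is, a single node groups the whole relation in one place (GEDCOM
syntax; the pointer IDs are invented):

0 @F1@ FAM
1 HUSB @I1@
1 WIFE @I2@
1 CHIL @I3@
1 CHIL @I4@

rather than four separate child-parent links that the software must
reassemble.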

> In effect, I don't believe it represents anything useful, and a bottom-up
> approach is much more practical and meaningful.
>
> The only universal events for a person are birth and death. All the others
> like baptism, marriage, divorce, burial, cremation, etc., are culturally
> dependent and so cannot be expected to be present. Multiple concurrent
> marriages may even be legal in some cultures.

Gedcom will handle multiple simultaneous marriages.

> For my own part, even my own marriage cannot be represented accurately in
> some products: I lived in England at the time, but I had a civil marriage in
> the US, followed a week later by a religious one in Ireland.

Such things were common in Scotland, with marriages a week apart in
different parishes. Nothing keeps a Gedcom FAM node from having two
MARR subnodes.
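
For example (GEDCOM 5.5 syntax; the dates and places are invented):

0 @F1@ FAM
1 HUSB @I1@
1 WIFE @I2@
1 MARR
2 TYPE Civil
2 DATE 12 JUN 1985
2 PLAC Boston, Massachusetts, USA
1 MARR
2 TYPE Religious
2 DATE 19 JUN 1985
2 PLAC Dublin, Ireland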

tms

Oct 8, 2011, 2:47:08 PM
On Oct 6, 12:39 am, Wes Groleau <Groleau+n...@FreeShell.org> wrote:
> On 10-05-2011 14:50, tms wrote:
>
> > Putting that aside, what advantage does this scheme give you over just
> > allowing user-defined tags?  Or over the Gedcom equivalent: "2
> > _ADDRESS Claremont"?  Sure, you get to run it through a validator, but
> > the validator won't understand the stuff inside the AdHoc tag, so
> > validation doesn't buy much.  Especially since the validator can't
> > tell if the tags are being used correctly.
>
> Nor can the validator prevent the program from discarding it.

Nor can it prevent the program from playing Towers of Hanoi whenever
it encounters a date of 1 Apr in some event.

tms

Oct 8, 2011, 3:05:30 PM
On Oct 6, 7:59 am, Ian Goddard <godda...@hotmail.co.uk> wrote:
> tms wrote:
>
> > That's one of the things I dislike about XML: even its advocates
> > misuse it.  Attributes are for metadata, not data.
>
> Where did you get that idea?  Perhaps you've been reading the tutorial
> on the W3C site which was clearly written by someone who doesn't like
> attributes but even so it's prefaced by the statement "There are no
> rules about when to use attributes or when to use elements." and
> something similar to that is found in any textbook on XML I've ever read.

I confess that I haven't read anything about XML recently, but when I
did, what I read said what I said above. If attributes and content do
not have different purposes, why have both? Granted there is no way
to enforce different uses, but that doesn't mean they don't exist or
aren't best practices. Does something like this make any sense at
all:

<address city="some city" zip="12345">
  <type>postal</type>
  <street>Main St.</street>
  <state>Old York</state>
</address>
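
Compare a consistent rendering of the same data, with the data as element
content (type, being arguably metadata, could defensibly stay an attribute):

<address type="postal">
  <street>Main St.</street>
  <city>some city</city>
  <state>Old York</state>
  <zip>12345</zip>
</address>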

> > Putting that aside, what advantage does this scheme give you over just
> > allowing user-defined tags?  Or over the Gedcom equivalent: "2
> > _ADDRESS Claremont"?
>
> I did point out that in reality one might take an OO approach and
> subclass Source where there was a requirement for an address.
> Criticizing an example simplified for illustrative purposes for being
> simplified does not help.

I wasn't criticizing it for being simplified, I was asking what
advantage the XML syntax has over the Gedcom syntax, and what
advantage the AdHoc thing has over user-defined tags.

> However to deal with your more significant point:
>
> >  Sure, you get to run it through a validator, but
> > the validator won't understand the stuff inside the AdHoc tag, so
> > validation doesn't buy much.  Especially since the validator can't
> > tell if the tags are being used correctly.
>
> The schema would specify an AdHoc element consisting of a list of Item
> elements, and an Item consisting of a Name attribute and whatever
> variation we finalise on for the Item contents.  The validator would
> have no problem whatsoever with understanding this.  If this is what you
> mean as being able to tell if the tags are being used correctly, then it
> can.

The validator would understand the syntax fine, but it would not have
a clue about the semantics. Allowing the AdHoc element to contain
random name-value pairs gains absolutely nothing over allowing random
user-defined elements.
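
To make that concrete, a declaration along the lines Ian describes might
look something like this (XSD syntax; the AdHoc/Item/Name names are his,
everything else here is invented):

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="AdHoc">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Item" maxOccurs="unbounded">
          <xs:complexType>
            <xs:simpleContent>
              <!-- any text content, plus a mandatory Name attribute -->
              <xs:extension base="xs:string">
                <xs:attribute name="Name" type="xs:string" use="required"/>
              </xs:extension>
            </xs:simpleContent>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

Both of the following validate equally well against it:

<AdHoc><Item Name="PlaceOfEducation">Claremont</Item></AdHoc>
<AdHoc><Item Name="PlaceOfEducation">42</Item></AdHoc>

The syntax is checked; the meaning is not.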

>  From a semantic point of view the validator isn't going to check usage
> because validators only check that the form matches the schema.

Exactly my point. Validation against a schema only validates the
syntax, which is not a big deal compared to the semantics.

> I think I also pointed out elsewhere that where there are data whose
> semantics are significant to the application then a dedicated element
> should be used.  The entire AdHoc structure is provided for those data
> which are not significant in that way and for which there is, therefore,
> no dedicated element.
>
> In fact, it exists to allow you to do what you wanted: invent your own
> tags for that sort of thing and to do so in a manner which enables you
> to follow the schema and generate a valid XML document.

But you can invent your own tags without a schema, and still generate
valid XML.

> > Not all programs have GUIs.
>
> Curses.

Foiled again.

tms

Oct 8, 2011, 3:10:12 PM
On Oct 6, 12:42 am, Wes Groleau <Groleau+n...@FreeShell.org> wrote:
> On 10-05-2011 15:17, tms wrote:
>
> > will on everyone else.  If, for example, the new FamilySearch were to
> > use XML in their API (I have no idea whether they do so), they would
> > be able to say to their users (and get away with it), "Use our schema
>
> They do.  But they don't force anyone else to do anything based on that
> fact.  Instead (it seems to me) they send the XML to your browser along
> with XSL to make it display nicely.

I was thinking more of the request side of the transaction. They are
in a position to say, "Use our schema and use it correctly, or you
won't get a response from us." With regular genealogy programs,
however, none has the clout to force the others to conform to any
standard, and users will reject any program that won't let them import
"invalid" data.

Wes Groleau

Oct 8, 2011, 3:10:58 PM
On 10-08-2011 14:45, tms wrote:
> I wasn't clear. I was merely saying that the family node as used in
> Gedcom is a better way to represent the relation than separate child-
> father and child-mother relations. Granted that Gedcom's FAM is a bit
> ambiguous in that it includes foster children, etc., but it's still
> more useful than distinct child-parent links.

Obviously I disagree. If all relationships are stored, then the
GEDCOM-style FAM is trivially identified as the set of people
who have the same father and mother, along with that father
and mother.

tms

Oct 8, 2011, 3:23:22 PM
On Oct 6, 6:58 am, Ian Goddard <godda...@hotmail.co.uk> wrote:
> tms wrote:
> >>> XML schema are useful in some circumstances, but not when the data are
> >>> coming from multiple uncontrolled sources.
>
> >> I think you're swapping problem and solution here.
>
> > I don't understand what you mean.
>
> Then let me explain.  Uncontrolled sources are the problem.  That's
> right.  The *problem*.
>
> As a general matter, if a program is going to handle data it (OK, I'm
> being anthropomorphic; the program's programmer if we insist on being
> pedantic) must understand that data.

To be really pedantic, the programmer doesn't want to handle the data;
that's what he is writing the program for. But anthropomorphism is
okay; it's too awkward to avoid it, and I'm sure everyone here
understands it isn't meant literally.

> If the data source insists on
> cobbling together whatever it feels like, this proviso is not going to be
> met.  That's why multiple uncontrolled sources are a problem.

So far, so good.

> Let's take an example.  The current IP addressing scheme of 4 octets is
> running out of space.  What if some of us decided to save the situation
> by adding another octet to our addresses?  None of the kit out there
> gluing the internet together would understand our extended addresses.
>   Fortunately such invalid addresses would be rejected.  The imposition
> of rigid limits of what can and can't be transmitted is essential to
> keeping things working and stopping uncontrolled sources wreaking havoc.

Right. The net, collectively, is able to impose its will on anyone
who wants to use it. The alternative is for the rebels to create
their own network, which is usually not a feasible solution at all.

> In XML it's schemas which provide that control.  They are the solution,
> not the problem.

But it's only a solution if it can be enforced. The most any
genealogy program can do is refuse to import "invalid" data. No
genealogy program can force other programs to conform; no schema can
force anyone to use itself, or to use itself unaltered.

tms

Oct 8, 2011, 3:26:32 PM
On Oct 6, 2:57 am, Steve Hayes <hayes...@telkomsa.net> wrote:

> On Wed, 5 Oct 2011 10:54:19 -0700 (PDT), tms <tmsomme...@gmail.com> wrote:
> >On Oct 3, 9:56 pm, Wes Groleau <Groleau+n...@FreeShell.org> wrote:
> >> I am not a purist about genetics.  I am interested in relationships,
> >> of which the genetic is one of many types.
>
> >I'm not a purist, either.  It is just that the father-mother-child
> >relationship is fundamental to genealogy, and it makes sense to make
> >that relationship the basis of one's genealogical data.  Also note
> >that it is a ternary, not binary, relation in fact, and so it should
> >be represented as such in the data.
>
> I think the only relationships that really need to be recorded in a genealogy
> program are child-father and child-mother. If the sperm came from a sperm bank
> then the father-mother relationship is zilch. It can be noted, but it is not
> essential.

So if you had a family that raised a child conceived from anonymous donor
sperm, you would have no link between the child and the father who
raised it?

Steve Hayes

Oct 9, 2011, 12:45:07 AM
On Sat, 8 Oct 2011 11:45:14 -0700 (PDT), tms <tmsom...@gmail.com> wrote:

>I wasn't clear. I was merely saying that the family node as used in
>Gedcom is a better way to represent the relation than separate child-
>father and child-mother relations. Granted that Gedcom's FAM is a bit
>ambiguous in that it includes foster children, etc., but it's still
>more useful than distinct child-parent links.

Why is it better?

I use a program that uses child-father and child-mother relationships only.

When I only know one of those relationships, that's what I enter.

But when I export it to GEDCOM it reports on how many "families" it created in
doing so.

Peter J. Seymour

Oct 9, 2011, 5:27:24 AM
On 2011-10-08 19:45, tms wrote:
>.....

>>
>>>> Otherwise, each person
>>>> only has one progenitive father and one progenitive mother. Whether the
>>>> procreation was achieved inside a legal marriage, an illegal one, outside
>>>> of
>>>> marriage, or with no knowledge that it had happened at all, is irrelevant
>>>> from that point of view.
>>
>>> Granted that Gedcom's FAM is a bit ambiguous, as it includes
>>> biological children as well as step-, foster-, adopted-, and raised-by-
>>> wolves- children, but the ADOP and ASSO tags can resolve the
>>> ambiguities. Not, perhaps, the most elegant solution, but it works.
>>
.....
I considered this issue at length when designing the data records for
Gendatam Suite. To begin with I wanted to define a parent-child only
relationship, but on playing around with the possibilities I decided the
Gedcom FAM style of approach was more convenient. So in Gendatam there
is the COUPLE record. This links the two PERSON records for the partners
and records the nature of the relationship. It also links to individual
PERSON records for the children via PARENTAGE records recording the
nature of the parentage. Apart from needing the appropriate records in
place to record a parent/child relationship, everything is optional.
So for instance, you can have a COUPLE record with only one partner
defined. You could even have children linked to a COUPLE record but no
parents. It is also flexible. One person can be involved in any number
of COUPLEs, possibly with different types of relationship. A child can be
given links to different couples, presumably with a different type of
parentage in each case (normal, adopted etc), although it would not
actually be in error in data terms to have more than one set of normal
parents defined (just illogical).
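
To sketch the linkage in a made-up XML notation (illustrative only; this
is not Gendatam's actual record format):

<Person id="P1"/>
<Person id="P2"/>
<Person id="P3"/>

<Couple id="C1" relationship="marriage">
  <Partner ref="P1"/>
  <Partner ref="P2"/> <!-- either Partner may be omitted -->
</Couple>

<Parentage couple="C1" child="P3" nature="normal"/>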

It seems to work well. One thing it achieves is to keep the nature of
the relationship separate from the related persons. Also in Gendatam,
where a list of possible attributes is provided, "unset" and "unknown"
are always included, with "unset" tending to be the default value.

Hope this helps.

Peter

Ian Goddard

Oct 9, 2011, 8:24:21 AM
tms wrote:
>
> But you can invent your own tags without a schema, and still generate
> valid XML.

If you're without a schema then there's no such thing as validation.
I'm not sure but maybe you're confusing well-formedness and validity.

Well formed XML obeys certain structural constraints. For instance the
following isn't well formed XML:

<element1>
<element2>
</element1>
</element2>

The following is well formed XML:

<element1>
  <element2>
  </element2>
</element1>

However, it would not be valid against a schema which doesn't make
provision for an element2, or which doesn't allow it to be nested inside
element1. If you don't have a schema at all, it's just well formed, and
neither valid nor invalid, because the entire concept of validity doesn't
apply.
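
For instance, the second document above would be valid against a minimal
schema along these lines (XSD syntax):

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- element1 may contain a single optional element2 -->
  <xs:element name="element1">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="element2" type="xs:string" minOccurs="0"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

but a well-formed document containing, say, an element3, or an element2 at
the top level, would fail validation against it.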

If you're supposed to be following a schema, you can't invent your own
elements, nor can you invent your own attributes.
