common vocabulary


Jesper Zedlitz

May 26, 2008, 7:04:48 AM
to BeyondGen
An important point for data interoperability is a common vocabulary. The
GEDCOM specification gives us a vocabulary for name types and event types, but
that is not sufficient in many cases.

Examples of event types:
* birth
* baptism
* marriage
* death

Examples of name types:
* given name
* family name
* birth name
* German "Rufname" (I don't know if there is a proper English translation;
maybe "call name")
* German "Hofname" (a name you acquire when moving to a particular farm of
that name)

Types can be specializations of others:
* "Rufname" is a special kind of given name
* "killed in action" is a specialization of "death"
* "marriage in church" is a specialization of "marriage"

Instead of natural-language terms, these type definitions should use URIs as
identifiers; otherwise it would not be possible to process them
unambiguously.
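A minimal sketch of how URI-identified types with specialization links might be stored and queried. The `http://example.org/genealogy/` namespace is a placeholder assumption (no real URIs were defined in this thread), and `PARENT` and `is_a` are hypothetical names used only for illustration.

```python
# Placeholder namespace; an assumption, not an agreed vocabulary.
NS = "http://example.org/genealogy/"

# Each type URI maps to the URI of the type it specializes (None for a root type).
PARENT = {
    NS + "event/death": None,
    NS + "event/killed-in-action": NS + "event/death",
    NS + "event/marriage": None,
    NS + "event/church-marriage": NS + "event/marriage",
    NS + "name/given": None,
    NS + "name/rufname": NS + "name/given",
}


def is_a(type_uri: str, ancestor_uri: str) -> bool:
    """True if type_uri equals ancestor_uri or specializes it, directly or transitively."""
    current = type_uri
    while current is not None:
        if current == ancestor_uri:
            return True
        current = PARENT.get(current)
    return False


print(is_a(NS + "event/killed-in-action", NS + "event/death"))  # True
print(is_a(NS + "event/death", NS + "event/killed-in-action"))  # False
```

Because the comparison is on URIs rather than labels, two systems that label the same type "Rufname" and "call name" can still agree on what it means.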

Additional information about each type (e.g. "'birth' occurs only once in a
human's life") could be used to perform automatic tests on genealogical data.
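One way such additional information could drive automatic tests is to attach a maximum-occurrence constraint to each type and count a person's events against it. This is only a sketch: the `MAX_PER_LIFE` table, the plain-string type names, and `check_events` are assumptions for illustration.

```python
from collections import Counter

# Hypothetical constraints: how often an event type may occur in one person's life.
MAX_PER_LIFE = {"birth": 1, "death": 1}


def check_events(event_types: list[str]) -> list[str]:
    """Return one warning per type that occurs more often than its limit allows."""
    counts = Counter(event_types)
    warnings = []
    for event_type, count in counts.items():
        limit = MAX_PER_LIFE.get(event_type)
        if limit is not None and count > limit:
            warnings.append(f"'{event_type}' occurs {count} times (max {limit})")
    return warnings


print(check_events(["birth", "baptism", "birth", "marriage", "marriage"]))
# prints ["'birth' occurs 2 times (max 1)"]
```

Types without a recorded limit (here "marriage") are simply not checked.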

If we could create a mapping between different vocabularies, or even all use
the same one (which would require central vocabulary management), data would
be exchangeable between different systems.
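A mapping between two vocabularies could start as a plain lookup table from one system's type identifiers to the other's. The sketch below assumes two hypothetical vocabularies with made-up URIs (`a.example`, `b.example`) and reports unmapped types rather than guessing a translation.

```python
# Hypothetical mapping from vocabulary A's type URIs to vocabulary B's.
A_TO_B = {
    "http://a.example/event/birth": "http://b.example/types/Birth",
    "http://a.example/event/death": "http://b.example/types/Death",
}


def translate(type_uris: list[str]) -> tuple[list[str], list[str]]:
    """Translate A-URIs into B-URIs; return (translated, unmapped)."""
    translated, unmapped = [], []
    for uri in type_uris:
        if uri in A_TO_B:
            translated.append(A_TO_B[uri])
        else:
            unmapped.append(uri)  # no known equivalent: keep it for manual review
    return translated, unmapped


translated, unmapped = translate(["http://a.example/event/birth", "http://a.example/event/baptism"])
print(translated)  # ['http://b.example/types/Birth']
print(unmapped)    # ['http://a.example/event/baptism']
```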

Is someone already working on that topic?

Jesper

--
Jesper Zedlitz E-Mail : jes...@zedlitz.de
Homepage : http://www.zedlitz.de
ICQ# : 23890711


Wade Starks

May 27, 2008, 2:06:25 PM
to beyo...@googlegroups.com
Truly useful classifications often involve nested classification trees.
e.g.
Event
    Nativity
        Birth
        Christening
        Circumcision
        Introduction
        . . .
    Union
        Marriage
        Engagement
        Banns
        License
        . . .
    Mortality
        Death
        Burial
        Will Proved
        Killed in Action
        . . .
 
One of the advantages of nesting or grouping classifications is that if an application (e.g., a pedigree chart) doesn't care whether it gets a birth or a christening event, it can simply request "Nativity", which returns any event so classified, in order of preference. Birth might be returned ahead of Christening, and so forth: if no birth event is present, the next Nativity event in preference order is returned.
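The preference-ordered lookup described above might look like the following sketch. The category table reuses the tree's node names and assumes, for illustration only, that preference follows the order in which the children are listed; the `(type, date)` event tuples and `first_by_preference` are hypothetical.

```python
# Each category lists its member types in (assumed) preference order.
CATEGORIES = {
    "Nativity": ["Birth", "Christening", "Circumcision", "Introduction"],
    "Union": ["Marriage", "Engagement", "Banns", "License"],
    "Mortality": ["Death", "Burial", "Will Proved", "Killed in Action"],
}


def first_by_preference(events, category):
    """Return the event whose type ranks highest in the category's preference order.

    events is a list of (event_type, date) tuples; returns None if nothing matches.
    """
    preference = CATEGORIES[category]
    candidates = [e for e in events if e[0] in preference]
    if not candidates:
        return None
    return min(candidates, key=lambda e: preference.index(e[0]))


events = [("Christening", "1801-03-02"), ("Burial", "1870-05-11")]
print(first_by_preference(events, "Nativity"))  # ('Christening', '1801-03-02')
```

The pedigree chart only ever asks for "Nativity"; which concrete event it receives is decided by the vocabulary, not by the chart.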
 
Name
    Personal Name Phrase (un-analyzed name string)   
        Given Name Phrase (un-analyzed/unparsed given names) 
            Given Name
            Nickname
        Surname Phrase (un-analyzed surname string)
            Particle
            Article
            Adjective
            Noun
            Patronymic
            Patrilineal
            Matrilineal
            Toponymic
    note: types of name components
 
Given a personal name phrase of "Francesco Alberto de la Cruz Santa Vera", we might come up with the following structures and classifications:
 
<Name>
    <Piece type="Personal Name Phrase">Francesco Alberto de la Cruz Santa Vera</Piece>
</Name>
    or, analyzed into given name and surname phrases:
<Name>
    <Piece type="Given Name Phrase">Francesco Alberto</Piece>
    <Piece type="Surname Phrase">de la Cruz Santa Vera</Piece>
</Name>
    or, more fully analyzed into separate name tokens and cultures:
<Name>
    <Piece type="Given Name Phrase">
        <Piece type="Given Name" order="1">Francesco</Piece>
        <Piece type="Given Name" order="2">Alberto</Piece>
    </Piece>
    <!-- surname from both parents, with the father's part first -->
    <Piece type="Surname Phrase.Latin.Patri-bilineal">
        <!-- from the father -->
        <Piece type="Patrilineal" order="1">
            <Piece type="Particle.Preposition" order="1">de</Piece>
            <Piece type="Particle.Article" order="2">la</Piece>
            <Piece type="Noun" order="3">Cruz</Piece>
        </Piece>
        <!-- from the mother -->
        <Piece type="Matrilineal" order="2">
            <Piece type="Adjective" order="1">Santa</Piece>
            <Piece type="Noun" order="2">Vera</Piece>
        </Piece>
    </Piece>
</Name>
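To show how an application might consume the fully analyzed structure, the sketch below parses a well-formed copy of that fragment (plain text values, no inline comments) with Python's standard `xml.etree.ElementTree` and rebuilds each phrase by walking its leaf Pieces in `order`. The helper name `leaf_text` is hypothetical.

```python
import xml.etree.ElementTree as ET

NAME_XML = """
<Name>
  <Piece type="Given Name Phrase">
    <Piece type="Given Name" order="1">Francesco</Piece>
    <Piece type="Given Name" order="2">Alberto</Piece>
  </Piece>
  <Piece type="Surname Phrase.Latin.Patri-bilineal">
    <Piece type="Patrilineal" order="1">
      <Piece type="Particle.Preposition" order="1">de</Piece>
      <Piece type="Particle.Article" order="2">la</Piece>
      <Piece type="Noun" order="3">Cruz</Piece>
    </Piece>
    <Piece type="Matrilineal" order="2">
      <Piece type="Adjective" order="1">Santa</Piece>
      <Piece type="Noun" order="2">Vera</Piece>
    </Piece>
  </Piece>
</Name>
"""


def leaf_text(piece: ET.Element) -> list[str]:
    """Collect the text of leaf Pieces beneath `piece`, sorted by their order attribute."""
    children = sorted(piece.findall("Piece"), key=lambda p: int(p.get("order", "0")))
    if not children:
        return [(piece.text or "").strip()]
    words = []
    for child in children:
        words.extend(leaf_text(child))
    return words


root = ET.fromstring(NAME_XML)
given = root.find("Piece[@type='Given Name Phrase']")
surname = root.find("Piece[@type='Surname Phrase.Latin.Patri-bilineal']")
print(" ".join(leaf_text(given)))    # Francesco Alberto
print(" ".join(leaf_text(surname)))  # de la Cruz Santa Vera
```

The same unanalyzed string falls out of all three representations, so a less sophisticated consumer loses nothing by receiving the fully analyzed one.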
 
The name structure above is typical of Spanish culture.
The typical Portuguese culture is matri-bilineal, with the order of the parents' contributions reversed.
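The difference between the two cultures then comes down to the order in which the parental contributions are joined. A minimal sketch, assuming a hypothetical `CULTURE_ORDER` table keyed by culture name; real naming customs have many more variations than these two entries.

```python
# Hypothetical table: the order of parental contributions in each culture.
CULTURE_ORDER = {
    "Spanish": ("Patrilineal", "Matrilineal"),     # father's surname first
    "Portuguese": ("Matrilineal", "Patrilineal"),  # mother's surname first
}


def compose_surname(culture: str, contributions: dict[str, str]) -> str:
    """Join the parental surname contributions in the culture's customary order."""
    return " ".join(contributions[kind] for kind in CULTURE_ORDER[culture])


parts = {"Patrilineal": "de la Cruz", "Matrilineal": "Santa Vera"}
print(compose_surname("Spanish", parts))     # de la Cruz Santa Vera
print(compose_surname("Portuguese", parts))  # Santa Vera de la Cruz
```

Storing the contributions by lineage rather than by position is what lets one structure serve both cultures.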
 
Grouping articles and prepositions as particles simplifies sorting personal names according to the European rules, under which particles are ignored in the sort order. It is simpler to drop all particles with a single instruction than to drop prepositions and articles with separate instructions.
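Because both kinds of particle share the `Particle` parent type, a sort key can drop them with one prefix test. The sketch assumes each surname is a list of `(type, text)` tokens using the type names from the example; it illustrates the single-instruction idea and is not a full implementation of any national collation rule.

```python
def sort_key(tokens: list[tuple[str, str]]) -> str:
    """Build a sort key from surname tokens, skipping every Particle.* type."""
    kept = [text for piece_type, text in tokens if not piece_type.startswith("Particle")]
    return " ".join(kept).lower()


surnames = [
    [("Particle.Preposition", "de"), ("Particle.Article", "la"), ("Noun", "Cruz")],
    [("Noun", "Alvarez")],
    [("Particle.Preposition", "de"), ("Noun", "Borbón")],
]
for tokens in sorted(surnames, key=sort_key):
    print(" ".join(text for _, text in tokens))
# Alvarez
# de Borbón
# de la Cruz
```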
 
Structural and classification schemes such as those presented here are very comprehensive and powerful. They far exceed the ability of the current GEDCOM specification to describe genealogical information.
 
This is a subset of a much more comprehensive prototype data schema, which can describe most, if not all, historical documents with only a handful of unique objects.
 
Wade Starks
Information Architecture
Family Search (FHD)
    

--
Wade Starks

Jesper Zedlitz

May 28, 2008, 6:09:51 AM
to beyo...@googlegroups.com
Wade Starks wrote:
> Truely useful classifications often involve nested classification trees.
> e.g.
> Event
> Nativity
> Birth
> Christening
> Circumcision
> Introduction
> . . .
>
That is exactly what I had in mind. To get a more formal description of the
types, I created an OWL Lite ontology for the two classification trees.
An interesting case is "stillbirth": it is both a nativity and a mortality event.
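Because "stillbirth" sits under two categories, the classification is a directed acyclic graph rather than a strict tree (in OWL a class may have several superclasses). The sketch below models that with a set of direct parents per type; the type names come from this thread, but the data structure and the `ancestors` helper are illustrative assumptions, not the contents of the attached ontology.

```python
# Each type lists its direct parents; Stillbirth has two.
PARENTS = {
    "Event": set(),
    "Nativity": {"Event"},
    "Mortality": {"Event"},
    "Birth": {"Nativity"},
    "Death": {"Mortality"},
    "Stillbirth": {"Nativity", "Mortality"},
}


def ancestors(event_type: str) -> set[str]:
    """Return all direct and indirect parents of event_type."""
    result: set[str] = set()
    stack = list(PARENTS.get(event_type, ()))
    while stack:
        parent = stack.pop()
        if parent not in result:
            result.add(parent)
            stack.extend(PARENTS.get(parent, ()))
    return result


print(sorted(ancestors("Stillbirth")))  # ['Event', 'Mortality', 'Nativity']
```

With this shape, a query for "Nativity" and a query for "Mortality" both find a stillbirth record without duplicating it.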

> Such structural and classification schemes presented here are very
> comprehensive and powerful. They far exceed the ability of the current
> GEDCOM specifications to describe genealogical information.
>

In an OWL Full ontology we could state which new types are the same as
existing GEDCOM types, e.g. Death = DEAT. That makes it possible to process
legacy GEDCOM data with programs that use the new type hierarchy.
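Outside an ontology reasoner, the same equivalences could be applied as a plain lookup when reading legacy files. This sketch assumes level-1 event lines such as `1 BIRT` and uses the GEDCOM 5.5 event tags `BIRT`, `CHR`, `MARR`, `DEAT` and `BURI`; the mapping table and `event_types` function are hypothetical.

```python
# Hypothetical equivalences between GEDCOM 5.5 event tags and the new type names.
GEDCOM_TO_TYPE = {
    "BIRT": "Birth",
    "CHR": "Christening",
    "MARR": "Marriage",
    "DEAT": "Death",
    "BURI": "Burial",
}


def event_types(gedcom_lines: list[str]) -> list[str]:
    """Map level-1 GEDCOM event tags to new type names, skipping unknown tags."""
    found = []
    for line in gedcom_lines:
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "1" and parts[1] in GEDCOM_TO_TYPE:
            found.append(GEDCOM_TO_TYPE[parts[1]])
    return found


record = ["0 @I1@ INDI", "1 NAME Andreas /Larsen/", "1 BIRT", "2 DATE 1801", "1 DEAT"]
print(event_types(record))  # ['Birth', 'Death']
```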

Genealogical_Types.owl

Wade Starks

Jun 2, 2008, 10:54:49 AM
to beyo...@googlegroups.com
This controlled-vocabulary model allows each node, at any level, to be associated with a set of equivalent spellings by language. It could quite naturally be extended to treat GEDCOM as a language, associating the legacy GEDCOM tags with the new classification nodes.
 
Example:
Event
      Nativity
          Birth {{"German", "Geburt"}, {"Norwegian", "Født"}, {"GEDCOM", "BIRT"}, . . . }
          . . .
 
The name of the node is considered the universal name, to which native spellings or GEDCOM tags are assigned.
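A node carrying per-language spellings, with GEDCOM treated as one more language, could be sketched as below. The `Node` class, its fields, and `find_by_spelling` are hypothetical; the sketch uses `BIRT`, the birth tag defined in GEDCOM 5.5.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """A vocabulary node: a universal name plus spellings keyed by language."""
    name: str
    spellings: dict[str, str] = field(default_factory=dict)

    def spelling(self, language: str) -> str:
        # Fall back to the universal name when no spelling is recorded for the language.
        return self.spellings.get(language, self.name)


def find_by_spelling(nodes: list[Node], language: str, text: str) -> Optional[Node]:
    """Reverse lookup: which node does this language-specific spelling denote?"""
    for node in nodes:
        if node.spellings.get(language) == text:
            return node
    return None


birth = Node("Birth", {"German": "Geburt", "Norwegian": "Født", "GEDCOM": "BIRT"})
print(birth.spelling("German"))                           # Geburt
print(birth.spelling("French"))                           # Birth
print(find_by_spelling([birth], "GEDCOM", "BIRT").name)   # Birth
```

The reverse lookup is what turns a legacy tag into a node of the new classification.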
 
In my earlier discussion of several of the model objects, I purposely left out the entity/object in which the actual value is stored, in order to keep the initial presentation simpler.
 
The actual values of names, dates, places, etc. are not stored in the Piece, as shown earlier, but in an entity called Form, which is subordinate to Piece. The purpose of Form is to allow us to store "Actual", "Standard", "Calculated", . . ., and "System" forms of the value for that Piece. A Piece may include as many Forms as needed.
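The Piece/Form split might be modelled as below. This is a sketch under stated assumptions: the class and method names are hypothetical, and the default display preference (`Standard`, then `Actual`) is an illustrative choice, not something this thread specifies.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Form:
    type: str   # e.g. "Actual", "Standard", "Calculated", "System"
    value: str


@dataclass
class Piece:
    type: str
    forms: list[Form] = field(default_factory=list)

    def form(self, form_type: str) -> Optional[str]:
        """Return the value of the first Form of the given type, if any."""
        for f in self.forms:
            if f.type == form_type:
                return f.value
        return None

    def display(self, preference: tuple[str, ...] = ("Standard", "Actual")) -> Optional[str]:
        """Return the first available Form value in (assumed) preference order."""
        for form_type in preference:
            value = self.form(form_type)
            if value is not None:
                return value
        return None


given = Piece("Given Name", [Form("Actual", "Andrs."), Form("Standard", "Andreas")])
print(given.display())       # Andreas
print(given.form("Actual"))  # Andrs.
```

Keeping every Form, rather than overwriting the transcription with the standardized value, preserves the evidence as recorded.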
 
Example:
<Name>
    <Piece type="Given Name Phrase">
        <Piece type="Given Name">
            <Form type="Actual">Andrs.</Form>
            <Form type="Standard">Andreas</Form>
        </Piece>
    </Piece>
    <Piece type="Surname Phrase.Nordic">
        <Piece type="Patronymic" order="1">
            <Form type="Actual">Lars.</Form>
            <Form type="Standard">Larsen</Form>
        </Piece>
        <Piece type="Toponymic" order="2">
            <Label><Form type="Actual">Gaard Navn</Form></Label>
            <Form type="Actual">Baeksted</Form>
            <Form type="Standard">Bæksted</Form>
        </Piece>
    </Piece>
</Name>
 
You will also note I have introduced another entity, Label, in the Toponymic Piece.  It allows us to preserve how that Name Piece was labeled (called) in the document.
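Reading this structure back, an application can pick the Standard form for display while keeping the Actual transcription and the document's own Label. The sketch parses a well-formed copy of the Nordic example (plain text values) with Python's standard `xml.etree.ElementTree`; the `summarize` helper is hypothetical.

```python
import xml.etree.ElementTree as ET

NAME_XML = """
<Name>
  <Piece type="Given Name Phrase">
    <Piece type="Given Name">
      <Form type="Actual">Andrs.</Form>
      <Form type="Standard">Andreas</Form>
    </Piece>
  </Piece>
  <Piece type="Surname Phrase.Nordic">
    <Piece type="Patronymic" order="1">
      <Form type="Actual">Lars.</Form>
      <Form type="Standard">Larsen</Form>
    </Piece>
    <Piece type="Toponymic" order="2">
      <Label><Form type="Actual">Gaard Navn</Form></Label>
      <Form type="Actual">Baeksted</Form>
      <Form type="Standard">Bæksted</Form>
    </Piece>
  </Piece>
</Name>
"""


def summarize(root: ET.Element) -> list[tuple[str, str, str, str]]:
    """For each Piece with direct Form children, return
    (piece type, label or "", actual value, standard value)."""
    rows = []
    for piece in root.iter("Piece"):
        # Only direct Form children count; the Form inside Label is read separately.
        forms = {f.get("type"): f.text for f in piece.findall("Form")}
        if not forms:
            continue  # a phrase-level Piece with no value of its own
        label = piece.findtext("Label/Form[@type='Actual']", default="")
        rows.append((piece.get("type"), label, forms.get("Actual", ""), forms.get("Standard", "")))
    return rows


for row in summarize(ET.fromstring(NAME_XML)):
    print(row)
```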
 
Wade
 
--
Wade Starks