An Evidentiary Model expressed in XML


wcstarks

Jan 13, 2009, 12:46:40 PM
to Open Ancestry
I thought I would try to organize this discussion tree so as to
categorize the information and make it easier to follow. It
will work best if you view it in "tree mode".
I will initially create the following sub-headings under this subject:

General
Structural Classes
Authority Classes

Under each sub-heading I will place items to be discussed. It would
be best to respond only to the leaf nodes of these sub-headings,
rather than to the abstract sub-headings themselves, as the
information of interest is really at the leaf nodes below them. I can
add other sub-headings in time as we move on in this discussion.

For example, under Structural Classes I will submit separate responses for
each of the structural objects: "Role, Piece, Form, Attribute,
Reference, Label, Note"
Authority Classes: "Record, Event, Date, Time, Name, Age, Place,
Address, Persona, Group, Membership"

Note that in my original posting I left off the Membership class. I
should have looked at my documentation rather than taking it off the
top of my head. So there are actually 18 classes in all for this
evidentiary model.

wcstarks

Jan 13, 2009, 12:47:40 PM
to Open Ancestry
General

wcstarks

Jan 13, 2009, 12:52:04 PM
to Open Ancestry
Structural Classes

wcstarks

Jan 13, 2009, 12:52:43 PM
to Open Ancestry
Authority Classes

wcstarks

Jan 13, 2009, 12:54:35 PM
to Open Ancestry
Problem

The legacy document models are person-centric, generally based on the
lineage-linked model, rather than document-centric. As such, the
legacy databases include a multitude of similar fields that differ in
name only. Included in the field names are references to:

1) the event
2) the function of a person in the record
3) the relationship of one person to another in the record
E.g., Principal’s Given Name, Father’s Given Name, Mother’s Given
Name, Maternal Grandfather’s Given Name, et al.
These de-normalized legacy structures cause hundreds of fields to be
created, greatly increasing the difficulty for programmers who must
import and deal with the vast amounts of data held in our legacy
databases and in other databases using similar schemas. There is also
little standardization in the naming of fields between databases.

Model Requirements
1) Normalize and standardize legacy database fields
2) Provide a common data store to describe all historical documents
3) Preserve the integrity and structure of historical documents
4) Allow the import and export transformations needed to support
requirements external to the model
5) Be highly normalized, data-driven and readily extensible.

Principles
1) The model is document/record centric – not person centric.
2) A genealogical document generally includes events and participants
and should be modeled accordingly.
3) Persons recorded as participants in the document should be
instantiated as components of that document – not inferred.
4) These persons should be associated with the document by function
and with other persons in the document by function and relationships.
5) Persons in the document may not make assertions outside the domain
of that document. They may assert their roles in the document and
their associations with others in the document to the extent
provided by the document.
6) The model should only instantiate persons of the document who
actually participated in the activities recorded in the document.
7) Attributes of participants in the document should remain distinct
from attributes of the document.
8) Create full contexts in which to store the target data, avoiding
creation of ad-hoc fields, e.g., to store the birthplace of the father
of the principal in a census record, the model enforces creation of a
virtual person (as father) as a property of the principal, in which to
store an event of type “Birth”, which contains the place of the
father’s birth locality.
9) The model does not allow ad-hoc field definitions.
10) Each object in the model should perform a distinct function from
other model objects.
11) Similar functions should be handled by a common object (e.g., the
same object can handle ages from birthdays and from other epoch
events, such as how long married and how long a member).
12) Properties of the model objects should be controlled by
authoritative vocabularies.
13) The full meaning of model objects is defined by both context and
classification.
14) Classification schemes allow multi-level trees, when needed, to
support specific and more general descriptions and selections of model
objects.
15) Such schemes allow for cross-classification of nodes between the
class trees.
16) Model objects are either complex or simple objects. Complex
objects, by definition, contain multiple elements, whereas simple
objects do not.
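
As a rough illustration of principle 8, the following Python sketch builds the father's birthplace as a full context rather than as an ad-hoc field. It is only a sketch under assumptions: the nesting of a Persona inside a Persona, the Piece type "Full Place Phrase", the function name and the sample value "Virginia" are all hypothetical and chosen for illustration.

```python
import xml.etree.ElementTree as ET


def principal_with_father_birthplace(place_text):
    """Build a principal Persona holding a virtual father Persona whose
    Birth event carries the father's birthplace (principle 8)."""
    principal = ET.Element("Persona")
    ET.SubElement(principal, "Role", {"type": "Functional.Principal"})

    # The father is instantiated as a component of the principal,
    # not stored as an ad-hoc "father's birthplace" field.
    father = ET.SubElement(principal, "Persona")
    ET.SubElement(father, "Role", {"type": "Familial.Parent.Father"})

    # The birthplace lives inside a Birth event of the father.
    birth = ET.SubElement(father, "Event", {"type": "Birth"})
    place = ET.SubElement(birth, "Place")
    piece = ET.SubElement(place, "Piece", {"type": "Full Place Phrase"})
    form = ET.SubElement(piece, "Form", {"type": "Actual"})
    form.text = place_text
    return principal


principal = principal_with_father_birthplace("Virginia")
print(ET.tostring(principal, encoding="unicode"))
```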

It became immediately apparent that it would be necessary to get the
event, relationship and function information out of the personal field
names. The only reasonable strategy seemed to be to manage document
descriptions similar to how we deal with the entities in the lineage
linked models, where all relationship info is "external" to and
removed from the individual data.

That required making "real" the individuals implied in the documents
and managing the relationships and functions of the individuals in
their Roles. This normalization also extended beyond just the
individuals, to include making "real" the other implied authority
entities in the documents, such as the entities: Events, Names,
Dates, Places, Ages, etc.

This model has been extensively normalized, with the result that only
one Piece entity services all the authority entities having multiple
parts. There is also only one entity in which to store all document
data. In the process of developing this model, we chose to fully
normalize the entities so as to better understand the elements of the
model. It is easy enough to back up from this degree of normalization
if needed, e.g., to create separate classes for name pieces, place
pieces and date pieces, instead of using one generic piece for all.
While the Attribute entity in this model handles gender, since gender
is such an important property of individuals, it may be useful to
create a special class dedicated to gender.

The key to this more fully normalized schema is in the use of Role to
manage all the relevant functional and relationship information of
individuals in the document. This frees up the representation of each
person in the record to use the same terms for the same type of
attributes, just as is done in current lineage-linked schemas. In
fact, Role can be considered equivalent to the links between
individuals in the lineage-linked model.

wcstarks

Jan 13, 2009, 1:02:44 PM
to Open Ancestry
The most fundamental of the structural classes is the Form. All
"data", whether it be a name, date, place, etc., is stored in this
object. Form cannot exist in the data outside of some host object.

<Form>
Form specifies by its type whether the phrase that it holds is, for
example, “Actual,” “Standard,” “Display,” “Calculated,” “Generated,”
or “System” data. Form is a structural element in entities like
Piece, Role, Label, Attribute, et al. These entities need to store
their data according to one of these classifications, or other
classifications. The host entity may contain two or more Forms, if it
is to specify multiple forms of the same phrase, such as one for
“Actual” and one for “Standard” phrases. It only needs one Form each
to specify a “Calculated”, a “System“, or a “Generated” phrase.

Structure:
<Form type = [type value] > “[data value]” </Form> (1-M)
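
A minimal Python sketch of a host entity carrying two Forms of the same phrase, one “Actual” and one “Standard”. The helper names add_form and form_value and the sample values are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET


def add_form(host, form_type, value):
    """Attach a Form to a host entity; a Form never exists on its own."""
    form = ET.SubElement(host, "Form", {"type": form_type})
    form.text = value
    return form


def form_value(host, form_type):
    """Return the phrase held by the host's Form of the given type, if any."""
    for form in host.findall("Form"):
        if form.get("type") == form_type:
            return form.text
    return None


# A Month piece as written in the record and as standardized.
piece = ET.Element("Piece", {"type": "Month"})
add_form(piece, "Actual", "Augt")
add_form(piece, "Standard", "August")
print(ET.tostring(piece, encoding="unicode"))
```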

wcstarks

Jan 13, 2009, 1:07:55 PM
to Open Ancestry
The Attribute class allows us to handle certain simple and ad hoc
variables.

<Attribute>
This generic entity stores simple miscellaneous attributes, such as
gender, height, race, complexion, occupation, rank, function, ID,
etc., which do not contain multiple elements, or for which further
analysis is not needed. Each Attribute is distinguished from other
Attributes by its type and its duration of applicability. Although
not specifically shown here, it may be that an Attribute may also
contain an Attribute. This eventuality has not yet been proven by a
use case.

Attribute Structure:
<Attribute type = [type value]>
<Form> . . . </Form> [1-M]
<Event> . . . </Event> [0-1, when used, normally of type “duration”]
<Label> . . . </Label> [0-1]
<Reference> . . . </Reference> [0-1]
</Attribute>

wcstarks

Jan 13, 2009, 1:12:03 PM
to Open Ancestry
The Piece Structural Class
<Piece>
Piece is intended for use by such entities as Name, Date, Place, and
Age to specify each element of a multiple element phrase. The order
of the pieces in the host entity should coincide with the default
order of the pieces in the authority as specified for a given
culture. For dates, the associated calendar authority controls the
presence, order and dependencies of the date pieces. Pieces may need
to be nested to support certain needs of Names and Dates, etc.

Piece is used by these entities to specify either a full phrase, or
token phrases of the full phrase. The Name entity uses a Piece of type
“Personal Name Phrase”, or full, for example, to store the full,
unparsed and un-analyzed name string of “David C Ellingsworth.” Piece
can also deal with each given name and surname phrase as separate
elements and classify them accordingly. This will be discussed further
under the Name Authority Class.

For the Gregorian calendar, the Piece types would be “Year,” “Month,”
and “Day.” In this case, the dependency rules would allow: “Year”,
“Year and Month”, or “Year, Month and Day” to be present as their
standard Forms. For Age, the Pieces would be typed as “Years”,
“Months”, “Weeks”, “Days”, “Hours”, “Minutes”, “Seconds”, etc.

Pieces may be nested in certain host entities, such as the Name
entity. See more on this there.

Structure:
<Piece type = [type value]>
<Label> . . . </Label> [0-1]
<Attribute type = [value]> . . . </Attribute> [0-M]
<Form type = [value]> . . . </Form> [1-M]
<Piece type = [value]> . . . </Piece> [0-M]
</Piece>
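
The Gregorian dependency rule described above (“Year”, “Year and Month”, or “Year, Month and Day”) could be checked along these lines. This is a sketch only: the function name and the idea of expressing the rule as a prefix test on the ordered piece types are assumptions, not part of the model as posted.

```python
# Piece types of the Gregorian calendar, from largest to smallest division.
GREGORIAN_ORDER = ("Year", "Month", "Day")


def is_valid_standard_date(piece_types):
    """True when the pieces present are Year, Year+Month or Year+Month+Day."""
    present = set(piece_types)
    if not present or not present <= set(GREGORIAN_ORDER):
        return False
    # A smaller division may only appear when every larger one is present,
    # so the set present must equal a leading slice of the ordered types.
    return present == set(GREGORIAN_ORDER[:len(present)])


print(is_valid_standard_date(["Year", "Month"]))
print(is_valid_standard_date(["Month", "Day"]))
```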

wcstarks

Jan 13, 2009, 1:14:52 PM
to Open Ancestry
The Label Structural Class
<Label>
Label specifies actual and standard forms for column headings, or
field labels in documents. It may be self-embedded to allow sub-
headings, e.g., from the 1880 census: [Health, [Blind, Deaf and Dumb,
Idiotic, Insane, Disabled]]. Currently, Label does not use a type
property.

Structure:
<Label> [heading or main heading]
<Reference type = [value]> . . . </Reference>
[0-1, for indicating the locus of the label in a document]
<Form type = [value]> . . . </Form> [1-M]
<Attribute> . . . </Attribute> [0-M]
<Label> . . . </Label> [0-M, sub-heading(s)]
</Label>

Label can also be used to indicate the application field name
associated with the Form value of an entity, allowing a smooth bi-
directional transformation between this model and another app.

Example:
<Name>
<Piece type = “surname phrase”>
<Label application = “UDE”>
<Form type = “field name”> “PR_Name_Surname” </Form>
</Label>
<Form type = “Actual”> “Anderson” </Form>
</Piece>
</Name>
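
To suggest how such a Label could drive an export to another application, here is a small Python sketch that reads the example above (re-typed as well-formed XML with straight quotes) and produces a mapping from application field names to Form values. The function names and the choice of a plain dictionary as the export target are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

NAME_XML = """
<Name>
  <Piece type="surname phrase">
    <Label application="UDE">
      <Form type="field name">PR_Name_Surname</Form>
    </Label>
    <Form type="Actual">Anderson</Form>
  </Piece>
</Name>
"""


def form_text(host, form_type):
    """Return the text of the host's direct child Form of the given type."""
    for form in host.findall("Form"):
        if form.get("type") == form_type:
            return (form.text or "").strip()
    return None


def export_fields(entity, application, form_type="Actual"):
    """Map each application field name to the Form value of its Piece."""
    fields = {}
    for piece in entity.iter("Piece"):
        value = form_text(piece, form_type)
        for label in piece.findall("Label"):
            if label.get("application") != application:
                continue
            field_name = form_text(label, "field name")
            if field_name and value is not None:
                fields[field_name] = value
    return fields


name = ET.fromstring(NAME_XML)
print(export_fields(name, "UDE"))
```

The reverse direction (import) would look up the Piece whose Label carries the incoming field name and write the value into its Form.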

wcstarks

Jan 13, 2009, 1:18:41 PM
to Open Ancestry
<Role>
Role is used by Personas and Memberships. Role defines familial,
social and functional roles that personas play in historical records.
Familial roles are usually limited to nuclear family relationships. A
persona may play more than one role in an historical record, e.g., a
persona may be both the “father” and the “sponsor” of an immigrant.
Roles may be self-embedded to show extended relationships, e.g.,
“Maternal Grandmother [of the principal].” This produces three roles
in a persona, e.g., “Mother, Mother, Principal” nested as shown, from
top to bottom, the interpretation being: “the mother of the mother of
the principal.”

The structure of role has been designed in this way to allow the
normalization of all the other personal attributes. This is the same
simplifying feature found in common implementations of the lineage-
linked model. Instead of requiring fields to be labeled as the
principal’s given name, or the maternal grandfather’s surname, which
would greatly multiply the number of name fields for the same data
type, we put all that generational logic in the Role. This allows
each Persona to carry simply its name, age and birth information, so
that the personal attributes are fully normalized.

Some roles, such as those used in Membership, may contain an Event of
type “Duration” to show different roles in an organization over time.
As yet, Role has not been implemented to reference “ID”s as links. If
that became desirable, then Role could use the Reference entity for
that purpose.

Structure:
<Role type = [value]>
<Form type = “actual”> . . . </Form> [0-1 when actuals need to be
expressed]
<Event type = “duration”> . . . </Event> [0-1 usually used for roles
in memberships]
<Role type = [value]> . . . </Role> [0-1]
<Reference> . . . </Reference> [0-1]
</Role>

To demonstrate the convenience of self-embedding with Role, we
demonstrate the way a set of class nodes may spell out the full
classification tree, e.g., “Familial.Parent.Father,” for the role of
“Father.” In this way, we avoid ambiguity when a class node may be
used in multiple classification trees. “Father,” for example, is a
familial term in one context and a functional role in the Catholic
Church. The following nested Roles would express the meaning of the
phrase, “paternal grandmother [of the principal]”:

<Role type = “Familial.Parent.Mother”>
<Role type = “Familial.Parent.Father”>
<Role type = “Functional.Principal”> </Role>
</Role>
</Role>
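
A short Python sketch of this nesting, assuming a chain of role types is given from the outermost role down to the principal. The helper names nest_roles, role_chain and describe are hypothetical.

```python
import xml.etree.ElementTree as ET


def nest_roles(role_types):
    """Build self-embedded Role elements from an outermost-first chain."""
    if not role_types:
        raise ValueError("at least one role type is required")
    outer = ET.Element("Role", {"type": role_types[0]})
    current = outer
    for role_type in role_types[1:]:
        current = ET.SubElement(current, "Role", {"type": role_type})
    return outer


def role_chain(role):
    """Read the chain of role types back, from top to bottom."""
    chain = []
    while role is not None:
        chain.append(role.get("type"))
        role = role.find("Role")
    return chain


def describe(role):
    """Interpret the chain using the leaf term of each classification path."""
    leaves = [t.split(".")[-1].lower() for t in role_chain(role)]
    return "the " + " of the ".join(leaves)


paternal_grandmother = nest_roles([
    "Familial.Parent.Mother",
    "Familial.Parent.Father",
    "Functional.Principal",
])
print(describe(paternal_grandmother))
```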

It should be helpful to have more explanation of how this nesting of
the types of roles can work. Notice that the nested role types need
not be of the same category. For example, the phrase “Friend of the
Sister of the Principal” could be defined with the nested roles of
“Friend, Sister, Principal”, even though friend is a “Social” role,
sister is a “Familial” role, and principal is a “Functional” role.

Consider also the case of a christening record. Depending on business
rules, the persona representing the child being christened may be
defined with the role of either “Principal” or “Child.” The father of
the child would have the role of “Father of the Child” and likewise
the mother would have the role of “Mother of the Child.”

A further example is found in the case of a marriage record. The
persona representing the groom would have the role “Functional.Bridal
Couple.Groom” and the persona representing the bride would have the
role “Functional.Bridal Couple.Bride.” The role for the two personas
representing the groom’s parents would be “Familial.Parent.Father” of
the Groom and “Familial.Parent.Mother” of the Groom. The personas of
the bride’s parents would be represented in a similar fashion.

The Role can be chained even longer to identify as many generations as
included in the record. In addition, it may capture lateral and
descending relationships. Other personas not directly related to the
participant(s) might be identified simply by their role in the event,
e.g., “Witness,” or “Friend”, “Principal”, “Mother” and “Sister”.

The “type” value of Role should always be taken from a standard
controlled vocabulary. However there are cases where the role may
come from a document such as a census household, where each individual
has a stated role relative to the head of house (HoH). A template for recording a census
household can only expect there to be a Role, but it cannot know what
it is until each line (person) is processed. In such a case, it is
important to be able to store both the standard type of the Role as
well as the actual term. This may be accomplished using a lookup
function in the template.

<Role type = Lookup(“Role”, [value])>
<Form type = “actual”> [value] </Form>
</Role>

Here, the actual relationship to head of house term is fed to a lookup
function, which will return the standard representation of the actual
term. Then the actual term is preserved in a Form.
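
One way the lookup might behave is sketched below in Python. The vocabulary entries, the "Unrecognized" fallback and the normalization of the actual term (trimming, lower-casing, dropping a trailing period) are all assumptions; a real authority would supply its own controlled vocabulary.

```python
import xml.etree.ElementTree as ET

# Hypothetical controlled vocabulary: actual term -> standard role type.
VOCABULARIES = {
    "Role": {
        "wife": "Familial.Spouse.Wife",
        "son": "Familial.Child.Son",
        "dau": "Familial.Child.Daughter",
        "daughter": "Familial.Child.Daughter",
    },
}
UNRECOGNIZED = "Unrecognized"


def lookup(category, actual):
    """Return the standard representation of an actual term."""
    key = actual.strip().lower().rstrip(".")
    return VOCABULARIES.get(category, {}).get(key, UNRECOGNIZED)


def role_from_census_line(actual):
    """Store the standard type on the Role and keep the actual term in a Form."""
    role = ET.Element("Role", {"type": lookup("Role", actual)})
    form = ET.SubElement(role, "Form", {"type": "actual"})
    form.text = actual
    return role


role = role_from_census_line("Dau.")
print(ET.tostring(role, encoding="unicode"))
```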

The relationship of the personas is relegated to the Role. A persona
may also be identified with many other personal attributes; there
may be additional names, some personal event, membership, et al.

There is sometimes a situation where two or more generations are
documented. In this case it may be useful to generate separate index
entries for each combination of two or three personas in the record.
We may have a child plus one or two parents. To reference the child
and parents in an index, we would subtract “Principal” from both the
child’s role and his parents’ roles. This would leave the child as a
single individual and the parents as simply “Father” and “Mother”. To
reference spouses in an index, we would subtract “Groom” from the
groom’s role and from the roles of his parents, so as to have the
triplet of the individual, the “Father” and the “Mother.” Then, if
the mother’s parents were listed as well, we could create another
triplet by subtracting “Principal, Mother” from the mother’s role and
from those of her parents, reducing them to the individual, her
“Father” and her “Mother.”
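
The subtraction can be pictured as removing a common tail from each persona's role chain. The Python sketch below assumes chains are written from top to bottom (e.g., “Father, Mother, Principal”) and that only personas reduced to an empty chain or a single role belong in the triplet; the persona names and function names are invented for the example.

```python
def subtract_suffix(chain, suffix):
    """Remove the trailing roles in suffix from chain, or return None
    when the chain does not end with that suffix."""
    cut = len(chain) - len(suffix)
    if cut < 0 or chain[cut:] != suffix:
        return None
    return chain[:cut]


def index_triplet(personas, anchor):
    """Reduce roles relative to the anchor chain and keep the individual
    plus any persona left with a single role (e.g., Father, Mother)."""
    triplet = {}
    for name, chain in personas.items():
        reduced = subtract_suffix(chain, anchor)
        if reduced is None or len(reduced) > 1:
            continue
        triplet[reduced[0] if reduced else "Individual"] = name
    return triplet


# Hypothetical personas from a record documenting three generations.
personas = {
    "John": ["Principal"],
    "William": ["Father", "Principal"],
    "Mary": ["Mother", "Principal"],
    "Thomas": ["Father", "Mother", "Principal"],
    "Ann": ["Mother", "Mother", "Principal"],
}

print(index_triplet(personas, ["Principal"]))
print(index_triplet(personas, ["Mother", "Principal"]))
```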

We do not use extended relationships in the Role when they can be
avoided for the simple reason that extended relationships can be
derived by chaining the basic familial terms together into unambiguous
nuclear family memberships. For example, the meaning of maternal
grandfather can be derived with the chain of more basic terms:
“Father, Mother, Principal.” However, in the situation where a term
such as “grandfather” is actually used in a source record, it may, of
course, be used, but it remains ambiguous. For example, even though
“Grandfather, Principal” cannot be used to place the ambiguous
grandfather into a nuclear family group above the Principal, it may be
useful in matching algorithms that could be developed. There are
other cultures that have certain extended familial relationship terms
that are not ambiguous.

When using a Role term that is not gender specific, it may still be
possible to reconstruct the nuclear families without it. For example,
it is possible to add a child to a family without specifying its sex.
However, the current lineage-linked models available make it
difficult, if not impossible to add a parent or a spouse to an
individual whose sex cannot be specified.

We must also point out that there are other valid Role terms dictated
by the type of event. For example, in a census household event there
are “Head-of-House,” “Laborer,” etc., in a deed conveyance event there
are “Grantor,” “Grantee,” etc., in a probate or will event there are
terms like “Executor,” “Executrix,” “Administrator” and in other
events there are such terms as “Witness,” “Physician,” “Partner,” etc.

In our typical Western European cultures there is a relatively small
set of basic familial terms that should be standardized for use in
Role, and it would be through the use of these terms that processes
would produce the nuclear family memberships and linked extended
nuclear families that they identify. The following is an attempt to
establish adequate standards.

The familial role class may be divided into two sets. In the first
set, the leaf node terms of the hierarchy are fully resolvable to a
specific nuclear family; in the second set, the terms are not:

Nuclear Familial Role Classes [leaf node terms fully resolvable to a
specific nuclear family]
Parent [gender unspecified]
Father [gender male]
Mother [gender female]
Spouse [gender unspecified]
Husband [gender male]
Wife [gender female]
Child [gender unspecified]
Son [gender male]
Daughter [gender female]
Sibling [gender unspecified]
Brother [gender male]
Sister [gender female]

Extended Familial Role Classes [terms not fully resolvable to a
specific nuclear family]
Grandparent
Grandfather
Grandmother
Uncle
Aunt
Nephew
Niece
Cousin

Social Role Classes:
Friend [gender unspecified]
Boy Friend [gender male]
Girl Friend [gender female]
. . .
Couple [gender unspecified]
Groom [gender male]
Bride [gender female]

Functional Role Classes:
Sponsor
Agent
Executor [gender male]
Executrix [gender female]
Administrator [gender male]
Administratrix [gender female]
Consignor [gender unspecified]
Witness [gender unspecified]
Deponent [gender unspecified]
Giver
Grantor [gender unspecified]
Testator [gender unspecified]
Donor [gender unspecified]
Presenter [gender unspecified]
Receiver
Legatee [gender unspecified]
Grantee [gender unspecified]
Recipient [gender unspecified]
Consignee [gender unspecified]
Trustee [gender unspecified]

wcstarks

Jan 13, 2009, 1:22:01 PM
to Open Ancestry
The Reference Structural Class

<Reference>
The Reference entity provides references, which are either external,
or internal to a document.

External
There are at least two types of external references:
1) A link to a coverage of a collection
Among these are an ID, in the case of a database, or a URL. Information
about the collection, such as its name, publisher and media type,
would be obtained from the Collection entity.

2) A specific citation within a collection
The Reference entity provides the specific references within a
collection where the record (or document, or entry) may be found. It
may not be necessary to refer directly to volumes, books, etc., as
these are maintained in the collection and in coverage entities above
the document record, to which entity this document is already linked
via the “Link to a Collection” reference. In this usage, the
Reference would typically be used at the document record level.

Internal
Reference is also useful to describe the layout of a document - to
indicate relative positions, or absolute positions of the data
elements within an original document. It may be attached to a label,
or to an actual input item (variable). Example of relative positions
would be Attribute types of “Row” and “Column,” or “Row” and “Item,”
or larger structures such as “Box.” In this particular case the
References may be nested to show sub-attributes, just as was
illustrated for Types, Roles and Labels. For example, sub-rows and
sub-columns would be useful where a document shows two (or more) rows
of data within one row, column or box on the form.

Reference Structure:
<Reference type = [type value]>
<Piece type = [type value] occurs = “1:M”>
<Form type = [type value] occurs = “1:M”> . . . </Form>
</Piece>
<Reference occurs = “0:M”> . . . </Reference>
</Reference>

Reference can also be used to do Source Citations from the Assertion
entities. More on that later.

wcstarks

Jan 13, 2009, 1:33:42 PM
to Open Ancestry

<Record>
In the evidentiary domain, a Record commonly represents a source
record, as produced by some person or agency (legal person). Usually
the source is an historical event record of which the following are
only a few: birth (or christening) record, deed record, marriage
record, death (or burial) record, voyage, passenger manifest and will.
A source record, besides containing its own inherent attributes,
typically represents one or more events for one or more personas,
which after all are the individuals themselves as recorded in an
historical event record.

Typically a Record of a specified record type, say, “Christening,”
“Marriage,” “Death,” “Deed,” “Will,” will contain a single Event
entity (event as an Attribute) of a similar type. Such an event
represents the purpose for generating this record and is normally
considered to be “Primary.” The Record may also include other event
attributes, directly, or indirectly related to the primary event.
Such an event is usually considered “Secondary”. For example, a birth
event given in a death record is considered “Secondary” to the death
event. It is the death event which is primary to the death record.

A source Record should be associated with a certain Coverage of some
Collection. It should also indicate the language and the culture of
the record. The language and culture are attributes that are normally
inherited by the constituents of the Record.

A Record may represent organizations, passenger manifests, voyages and
ships, for example. A Record may contain other Records.

Structure:
<Record type = [type value] id = [value]>
<Label> . . . </Label> [0-M]
<Event> . . . </Event> [0-M]
<Persona> . . . </Persona> [0-M]
<Address> . . . </Address> [0-M]
<Age> . . . </Age> [0-1]
<Attribute> . . . </Attribute> [0-M]
<Reference> . . . </Reference> [0-M]
<Record> . . . </Record> [0-M]
</Record>

wcstarks

Jan 13, 2009, 1:39:47 PM
to Open Ancestry

<Event>
Events are maintained by the Event Authority. This object associates
a set of Attributes that together form a named set: event type, date
and place. These attributes are then used in this form to help
identify another entity, such as Personas (evidence domain) and
Individuals (assertion domain). It is also used as an attribute to
various other entities which require a date and/or place (depending on
the event type), e.g., Historical Event Records, Names, Localities,
etc.

An attribute of type “Substitute” is an explanation of why this event
never occurred, e.g., “Child” for the “LDS Baptism” type of event. In
this case, an event for LDS Baptism is created without a date or
place, but with the Event “Substitute” filled in to explain why the
date and place are not relevant.

An Event entity is required in order to give a date its context, which
it does through its type attribute. The same is generally true for a place.

Structure:
<Event type = [type value]>
<Attribute> . . . </Attribute> [0-M]
<Date> . . . </Date> [0-1]
<Time> . . . </Time> [0-1]
<Place> . . . </Place> [0-1]
<Reference> . . . </Reference> [0-1]
</Event>

While the above definition indicates all elements of Events are
optional, at least one useful element must be present to justify the
existence of an Event entity. An Event, besides its type, typically
includes both a date and a place. However, it need not include both
when only one or the other is known.
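
The rule that an Event needs at least one useful element could be checked roughly as follows. This Python sketch treats any child listed in the structure above as “useful” and treats a missing type as a problem; both readings, and the function name, are assumptions.

```python
import xml.etree.ElementTree as ET

USEFUL_CHILDREN = ("Attribute", "Date", "Time", "Place", "Reference")
SINGLE_OCCURRENCE = ("Date", "Time", "Place", "Reference")  # the [0-1] items


def validate_event(event):
    """Return a list of problems found in an Event element (empty if none)."""
    problems = []
    if event.get("type") is None:
        problems.append("missing type")
    if not any(event.find(tag) is not None for tag in USEFUL_CHILDREN):
        problems.append("no useful element present")
    for tag in SINGLE_OCCURRENCE:
        if len(event.findall(tag)) > 1:
            problems.append("more than one " + tag)
    return problems


place_only = ET.fromstring('<Event type="Birth"><Place/></Event>')
empty = ET.fromstring('<Event type="Birth"/>')
print(validate_event(place_only))
print(validate_event(empty))
```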

Typical classifications of event types may be represented as follows
(using English as the standard terms):

Event (data category)
Nativity (sub-category)
Birth
Christening
Circumcision
. . .
Union
Marriage
Engagement
Banns
License
. . .
Mortality
Death
Burial
Will Proved
Will Dated
. . .
. . .

For multi-language and/or multi-culture support, the classification
structure may be enhanced as follows. This supports the language of
the document and the language of the user.

Event
Nativity
Birth
{{English, Birth}, {German, Geburt}, {Norwegian, Fødsel}, . . .,
{“GEDCOM”, “BIRT”}}

Any class node at any level may be associated with a set of terms by
language, not just the leaf nodes as depicted here. This language
facility can be extended to also include associated GEDCOM tags to
provide compatibility with legacy GEDCOM.
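
A sketch of how a class node's terms might be keyed by language, with GEDCOM treated as just another "language". The dotted class path, the English fallback and the function names are assumptions; the terms themselves are those given above.

```python
# Terms for a class node, keyed by language; GEDCOM is handled the same way.
EVENT_TERMS = {
    "Event.Nativity.Birth": {
        "English": "Birth",
        "German": "Geburt",
        "Norwegian": "Fødsel",
        "GEDCOM": "BIRT",
    },
}


def term_for(class_path, language, fallback="English"):
    """Return the term for a class node in the user's language."""
    terms = EVENT_TERMS.get(class_path, {})
    return terms.get(language, terms.get(fallback))


def class_for(term, language):
    """Return the class node that a document's term belongs to."""
    for class_path, terms in EVENT_TERMS.items():
        if terms.get(language) == term:
            return class_path
    return None


# A German document term rendered for a Norwegian user.
node = class_for("Geburt", "German")
print(term_for(node, "Norwegian"))
```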

In order to import and store the event, legacy products such as PAF
require the Event type to be specified. If the product does not
recognize the event type, it cannot do anything with it and must
discard it. In reality a database should be able to store an event
regardless of its type. Since the basic difference between one event
and another is its classification, its data store is the same
regardless of its type.

A product exporting or importing genealogical data should only concern
itself with the data category, e.g., Name, Event, etc., not the type.
When an Event of an unrecognized (non-established) type is
encountered, it should still be accepted and stored. Once stored,
however, such an event may sit ignored in the database. It will never
get retrieved, if there is no template or screen in the product that
asks for its (non-established) event type. On the other hand, if
there were a template which requested to see all events regardless of
their types, then it could display all events whether or not the types
were recognized by the Event Authority.
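
The point can be sketched with a store that accepts every event and leaves the question of recognition to retrieval time. The class names, the sample list of established types and the "Confirmation" event are illustrative assumptions, not a description of any existing product.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical list of types the Event Authority currently recognizes.
ESTABLISHED_EVENT_TYPES = {"Birth", "Christening", "Marriage", "Death", "Burial"}


@dataclass
class Event:
    event_type: str
    date: Optional[str] = None
    place: Optional[str] = None


class EventStore:
    def __init__(self):
        self._events = []

    def add(self, event):
        """Accept the event regardless of its type; the data store is the same."""
        self._events.append(event)

    def by_type(self, event_type):
        """What a type-specific template or screen would retrieve."""
        return [e for e in self._events if e.event_type == event_type]

    def all_events(self):
        """What a template asking for all events would retrieve."""
        return list(self._events)


store = EventStore()
store.add(Event("Birth", date="1850", place="Ohio"))
store.add(Event("Confirmation", date="1864"))  # not an established type
print([e.event_type for e in store.all_events()])
```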

wcstarks

Jan 13, 2009, 1:45:27 PM
to Open Ancestry

<Date>
The Date entity is maintained by Date Authority. A date entity
records a date and time of day in the context of a calendar, e.g.,
“Julian,” “Gregorian.” It allows for storing the date as actually
found in a record and in a standard form. The date may be formatted
as a full date string, or as separate elements suited to the calendar
specified. The pieces, “Year,” “Month,” and “Day,” are appropriate
for the Gregorian and Julian calendars, but other calendar types may
have different constituent elements. The present Date Authority has a
standard date in the form of the Julian Day Number. This form can be
rendered common to the calendar systems of most, if not all cultures.

Dates inherit the language and culture of the host Event in which they
are defined and these attributes control the form and format of the
date elements. When implemented as designed, a date may be input
according to the form of one calendar, language and culture (those of
the record) and output either in the same calendar, language and
culture, or in some other calendar, language, and culture.

Dates may be full dates, with all constituent parts specified, or
partial dates with some components missing. The “Actual” date may
well allow any part to be missing. The “Standard” date, however,
requires at least the largest division to be specified. This would be
the year in Julian or Gregorian calendars. Dates may be self-embedded
so as to allow for date ranges. The precision on the date Piece
indicates what date components are present. The presence and
dependencies of date components are controlled by the associated
calendar.


Structure:
<Date type = [type value] calendar = [value]> [precision: e.g.,
Normal, Delta]
<Attribute> . . . </Attribute> [0-M, e.g., Confidence Level,
Precision]
<Piece type = [type value]> [1-M]
<Label> . . . </Label> [0-1]
<Attribute> . . . </Attribute> [0-M]
<Form type = [type value]> [1-M]
<Attribute type = “Precision”> . . . </Attribute> [0-1]
[value]
</Form>
</Piece>
<Time occurs = “0:3”> . . . </Time> [0-3]
<Date occurs = “0:3”> . . . </Date> [0-3, for date ranges]
</Date>

Or for date ranges such as between dates:
<Date type = “between”>
<Date type = “Begin”> . . . </Date>
<Date type = “End”> . . . </Date>
</Date>

Or for Delta dates (dates around a reference date):
<Date type = “Delta”>
<Date type = “Reference”> . . . </Date>
<Date type = “Before”> . . . </Date>
<Date type = “After”> . . . </Date>
</Date>

Typical Date Pieces for the Gregorian Calendar may be classified as:
Full Date Phrase (meaning the full unparsed, unanalyzed date string)
MonthDay Phrase (meaning the day and month, as in “20 Aug”, separate
from the year)
Day
Month
Year
Day-of-Week
All such date pieces may be present in the same date entity.

It is important to realize that the Julian Day Number can support
partial dates. For the Gregorian and Julian calendars we have “Year,”
“Year+Month,” “Year+Month+Day.” This is carried out through the use
of the Attribute of “Precision” indicating the lowest level recorded
by the Julian Day Number. Of course, a Julian day number stores
levels below the day: the time of day, or intervals, down to the
number of decimal places required. This is because the decimal part
of a Julian Day number represents a partial day, or fractional
interval of a day, or time. The standard Julian Day starts at Noon
(GMT), rather than Midnight, and begins numbering from 1 Jan 4713 BC
in the proleptic Julian calendar. The Julian Day Number on the morning
of 6 Jun 2007 (GMT) is 2,454,257; it advances to 2,454,258 at noon.
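
For Gregorian dates, a Julian Day Number with a “Precision” marker can be sketched in Python as below. The function name, the choice to store a partial date as the first day of the known period, and the returned (number, precision) pair are assumptions. The conversion relies on Python's proleptic Gregorian ordinal, so it covers years 1 to 9999 only.

```python
from datetime import date

# date.toordinal() counts days from 1 Jan of year 1 (proleptic Gregorian);
# adding this offset gives the Julian Day Number that begins at noon GMT
# on that civil date.
ORDINAL_TO_JDN = 1721425


def julian_day_number(year, month=None, day=None):
    """Return (JDN, precision) for a full or partial Gregorian date."""
    if month is None and day is not None:
        raise ValueError("a Day requires a Month")
    if month is None:
        precision = "Year"
    elif day is None:
        precision = "Month"
    else:
        precision = "Day"
    # Missing pieces default to the first day of the known period.
    jdn = date(year, month or 1, day or 1).toordinal() + ORDINAL_TO_JDN
    return jdn, precision


print(julian_day_number(2007, 6, 6))
print(julian_day_number(1880))
```

Under this convention the Julian Day beginning at noon on 6 Jun 2007 is 2,454,258, so the morning of that date still belongs to day 2,454,257.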

Date prefixes indicate: about, before, after, estimated, etc.

wcstarks

Jan 13, 2009, 1:47:34 PM
to Open Ancestry

<Time>
Time may be part of an event, or part of a date. When the time is
part of a date, it is understood in the context of the date, whose
standard value is incorporated in the Julian Day Number piece.
Self-embedding is provided in order to group time ranges.

Structure:
<Time type = [type value]>
<Attribute> . . . </Attribute> [0-M]
<Piece> . . . </Piece> [1-M]
<Time> . . . </Time> [0-3, for time ranges]
<Reference> . . . </Reference> [0-1]
</Time>

Or for time ranges such as between times:
<Time type = “between”>
<Time type = “Begin”> . . . </Time>
<Time type = “End”> . . . </Time>
</Time>

Or for Delta times (times around a reference time):
<Time type = “Delta”>
<Time type = “Reference”> . . . </Time>
<Time type = “Before”> . . . </Time>
<Time type = “After”> . . . </Time>
</Time>

If Time needs to specify a range of times, it becomes a parent Time
entity, and the two Times that specify either end of the range are
embedded within it. A time is specified as “Relative” when the time
zone is not known, or when the event to which it belongs occurred
before the introduction of time zones. In those days each locality set
its own time of day relative to when the sun crossed that locality’s
meridian. In either case, for Times in the context of a Date, the
standard form is the fractional part of the Julian Day Number,
relative to GMT (a.k.a. UTC). Although “Relative” times are stored as
if they were GMT, it must be understood that they are “Relative” to
the local area.
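
To make the “fractional part of the Julian Day Number” concrete, here is a small Python sketch that folds a time of day into a Julian date and recovers it again. The function names are made up for this example, and rounding to whole seconds is an assumption. A “Relative” time would be passed through the same arithmetic as if it were GMT, with its “Relative” type recorded separately.

```python
def julian_date(jdn, hour, minute=0, second=0):
    """Combine a Julian Day Number with a time of day.

    The Julian Day starts at noon, so 12:00 adds 0.0 and 00:00 adds -0.5
    (the second half of the previous Julian Day).
    """
    return jdn + (hour - 12) / 24 + minute / 1440 + second / 86400

def time_of_day(jd):
    """Recover (hours, minutes, seconds) from a Julian date, to the second."""
    seconds = round(((jd + 0.5) % 1) * 86400) % 86400
    hours, remainder = divmod(seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    return hours, minutes, secs

print(julian_date(2451545, 12))  # 2451545.0  (noon, 1 Jan 2000)
print(julian_date(2451545, 18))  # 2451545.25
print(julian_date(2451545, 0))   # 2451544.5
print(time_of_day(julian_date(2451545, 14, 30)))  # (14, 30, 0)
```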

wcstarks

Jan 13, 2009, 1:52:06 PM
to Open Ancestry

<Age>
Age allows an individual/persona, or certain other entities, to be
associated with an “age” expressed in specific intervals, such as
years, months, weeks and days, and in extreme cases even hours,
minutes and seconds. Age is used to show the length of time
elapsed since some epoch event relative to some host entity, e.g.,
years since birth, years since marriage, years as a member of some
organization, etc. The actual interval of the pieces may be
defined specific to a certain usage. Age can have both a “Full Age
Phrase” piece, e.g., “65 y 4 m 14 d”, and a set of specific pieces,
e.g., “Year”, “Month”, and “Day”; or only a “Full Age Phrase” piece;
or only a set of specific pieces.

For example: An Age of type "Marriage" says how many years married.

Structure:
<Age>
<Type> . . . </Type> [1-1, e.g., Birth, Marriage, Membership;
specifies the epoch event referenced by the age]
<Label> . . . </Label> [0-1]
<Piece> . . . </Piece> [1-N, e.g., Type: Full Age Phrase, Years,
Months, Weeks, or Days]
<Age> . . . </Age> [0-M, for age ranges]
<Attribute> . . . </Attribute> [0-N]
</Age>

Or for age ranges such as between ages:
<Age type = “between”>
<Age type = “Begin”> . . . </Age>
<Age type = “End”> . . . </Age>
</Age>

Or for Delta ages (ages around a reference age):
<Age type = “Delta”>
<Age type = “Reference”> . . . </Age>
<Age type = “Before”> . . . </Age>
<Age type = “After”> . . . </Age>
</Age>
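
For what it is worth, splitting a “Full Age Phrase” such as “65 y 4 m 14 d” into specific pieces can be sketched in a few lines of Python. The unit letters, the mapping table and the function name are assumptions for this example; real records would need a much richer, culture-aware parser.

```python
import re

# Hypothetical mapping from unit letters in the phrase to piece types.
UNIT_TO_PIECE = {"y": "Years", "m": "Months", "w": "Weeks", "d": "Days"}

def age_pieces(full_age_phrase):
    """Return (piece type, value) pairs, keeping the raw phrase as the first piece."""
    pieces = [("Full Age Phrase", full_age_phrase)]
    for number, unit in re.findall(r"(\d+)\s*([ymwd])\b", full_age_phrase.lower()):
        pieces.append((UNIT_TO_PIECE[unit], int(number)))
    return pieces

print(age_pieces("65 y 4 m 14 d"))
# [('Full Age Phrase', '65 y 4 m 14 d'), ('Years', 65), ('Months', 4), ('Days', 14)]
```

Keeping the unparsed phrase alongside the parsed pieces matches the idea that both may be present in the same Age entity.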

wcstarks

Jan 13, 2009, 1:56:02 PM
to Open Ancestry
<Place>
Places are localities and are managed by the Locality Authority. The
Place entity inherits the language and culture of the Event which owns
it (when associated with an Event). The Place entity stores both the
actual cultural form of the place (its name as it appears in a record)
and the standard form of the actual (as determined by the Locality
Authority). The Place may represent a set of localities having from
one to some specific number of elements. For example, it may represent
1) a country, or 2) a country and a county within that country, or 3)
a country, a county, and a town within that county. A set of
localities is not normally restricted to a particular number of
levels. The context may determine the number of levels, or the person
entering the data on the record may have made the decision.

The actual form of the “Full” Place piece name may be formatted as a
string including all locality levels each one separated by some
delimiter (normally a comma). It may be formatted as separate
locality Piece entities. The standard place stores the localities as
separate entities in another Piece, from which various cultural
display forms may be generated. The Path ID of the locality Piece
identifies the jurisdictional context in which the localities are
associated, e.g., “Geo-Political,” “Ecclesiastical,” “Judicial,” etc.

The locality Piece entity is used for either the actual or standard
forms. They differ in that 1) the additional attribute Locality ID is
used in the standard form, 2) the Locality Type may not be from a
controlled vocabulary in the actual form, whereas it always is in the
standard usage. Places may be self-embedded so as to allow for place
ranges or routes.

Structure:
<Place>
<Attribute> . . . </Attribute> [0-1, Confidence Level]
<Attribute> . . . </Attribute> [0-1, Prefix – e.g., near]
<Attribute> . . . </Attribute> [1-1, Path ID]
<Piece> . . . </Piece> [1-N, Locality - one piece for each locality
level]
<Place> . . . </Place> [0-1, when prefix is “Between”]
<Reference> . . . </Reference>
</Place>

Or for place ranges such as between places:
<Place type = “between”>
<Place type = “Begin”> . . . </Place>
<Place type = “End”> . . . </Place>
</Place>

Or for Delta places (places around a reference place):
<Place type = “Delta”>
<Place type = “Reference”> . . . </Place>
<Place type = “Before”> . . . </Place>
<Place type = “After”> . . . </Place>
</Place>
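
To show how a delimited “Full” place string and its separate locality Pieces might sit side by side, here is a hedged Python sketch using the standard xml.etree.ElementTree module. The element names come from the structure above, but the attribute layout (type = "Full", type = "Locality", type = "Path ID"), the function name and the sample place string are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

def build_place(full_place, path_id="Geo-Political", delimiter=","):
    """Build a Place element holding the full string and one Piece per locality level."""
    place = ET.Element("Place")
    ET.SubElement(place, "Attribute", type="Path ID").text = path_id
    full = ET.SubElement(place, "Piece", type="Full")
    ET.SubElement(full, "Form", type="actual").text = full_place
    for level in full_place.split(delimiter):
        level = level.strip()
        if level:  # skip empty levels such as "Town,,Country"
            piece = ET.SubElement(place, "Piece", type="Locality")
            ET.SubElement(piece, "Form", type="actual").text = level
    return place

# Illustrative place string, not taken from any record.
place = build_place("Orem, Utah, United States")
print(ET.tostring(place, encoding="unicode"))
```

The number of locality Pieces simply follows the number of levels found in the string, in line with the point that a set of localities is not restricted to a fixed number of levels.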

wcstarks

Jan 13, 2009, 2:00:41 PM
to Open Ancestry
<Address>
The Address entity allows the information typically known as an
address to be associated with contact information, a residence, a
business, a lot, etc. The address described here may seem too simple
because it is apparently missing certain important address elements.
The reasoning is that, since Address includes an Event (which allows
the address to be assigned for a period of time) and a Place entity,
the Place may store the address locality information. Pieces of Place
in Address may also represent such things as “Zip Code,” “Street,”
“House Number,” “Suite,” “Apartment,” “Number,” as well as the
locality hierarchy such as the city, county, state, and country.

Structure:
<Address>
<Type> . . . </Type> [1-1, e.g., residential, business]
<Attribute> . . . </Attribute> [0-N, e.g., Confidence Level]
<Event> . . . </Event> [0-N of type Duration, or events marking
starting and ending, e.g., Moving In and Moving Out/Away]
<Place> . . . </Place> [1-N]
<Label> . . . </Label> [0-1]
<Reference> . . . </Reference>
</Address>

A typical Place defining an address may be structured with pieces
having Names typed as follows: “City”, “State”, “Number,” “Suite,”
“Apartment”, “Rural route”, etc.

Here are some common elements of addresses that should be treated like
any other locality.

Postal Code
A Postal Code is assigned a polygon, which defines its domain of
addresses. It also has a name value, e.g., “84058” or “1K7 A6R.” It
can also be assigned a single coordinate just as a town can. A Postal
Code has a start date and sometimes a stop date. It may evolve out of
a parent, or spawn children. Its polygon may change over time.
Postal codes participate in multiple jurisdictional hierarchies over
time (Postal Path and Geo-political Path). All of these are
characteristics shared with traditional localities.

Street/Road/Highway
Streets, roads and highways are not much different in principle than
streams and rivers, which are considered localities, with all the same
kinds of attributes as other non-jurisdictional localities.

Locus
The name or number assigned to a building or lot, commonly called its
address, or street address, includes the specific information by which
its relative/absolute location along the street can be determined.
Like localities it can be defined by a polygon, a point coordinate and
a name. The name pieces of a locus name differ from those of a
personal name. As stated above, they may include a number, suite,
apartment, etc., but may still be considered name pieces and be
manipulated by rules like personal names.

With each Place entity and its name pieces properly identified, it
would be simple, by rules, to format the pieces to a standard address
label, or any other formats.
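
As a sketch of what “by rules” could look like, the following Python snippet formats typed name pieces into an address label from a simple template. The template layout, the function name and the sample address are all invented for this example; only the piece type names echo those mentioned above.

```python
# Hypothetical label rule: each inner list is one label line, naming the
# piece types it draws on, in display order.
LABEL_TEMPLATE = [
    ["House Number", "Street"],
    ["Apartment", "Suite"],
    ["City", "State", "Zip Code"],
    ["Country"],
]

def format_label(pieces, template=LABEL_TEMPLATE):
    """Format typed address pieces into label lines, skipping absent pieces."""
    lines = []
    for line_types in template:
        values = [pieces[t] for t in line_types if t in pieces]
        if values:
            lines.append(" ".join(values))
    return "\n".join(lines)

# Invented sample data for illustration.
pieces = {
    "House Number": "123",
    "Street": "Main Street",
    "City": "Orem",
    "State": "UT",
    "Zip Code": "84058",
}
print(format_label(pieces))
# 123 Main Street
# Orem UT 84058
```

A different culture or purpose would only need a different template, not different data.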

wcstarks

Jan 13, 2009, 2:07:46 PM
to Open Ancestry
<Name> [Entity]
The Name entity is managed by Name Authority. It organizes its names
by language, culture, and by time period of usage. It associates a
set of names to any Individual (assertion) or Persona (evidentiary).
A personal Name may contain given names, surnames, pre-posed titles,
post-posed titles, and even mid-posed titles. Each name piece is held
in a Piece entity. The Name entity may contain one or more name
pieces, each classified by its name piece type. The name piece
entities are ordered within the Name entity according to the normal
written or spoken order of the language and culture. A person may
have more than one set of names, each used at different times in his
life. This is why we use the Event entity of type “Duration.”

A name Piece is classified by type as given above. Since there are
various cultural forms of the surname, a name piece of type,
“Surname,” must also have a sub-class. This allows the further
distinction of surname phrases by the cultural form.

Structure:
<Name>
<Attribute> . . . </Attribute> [0-1, Culture ID]
<Attribute> . . . </Attribute> [0-1, Language ID]
<Attribute> . . . </Attribute> [0-1, Confidence Level]
<Type> . . . </Type> [1-1, controlled vocabulary; see examples
below]
<Piece> . . . </Piece> [1-N, Full, Given Name, Surname]
<Event> . . . </Event> [0-1 specifying duration of usage]
<Label> . . . </Label> [0-1]
<Reference> . . . </Reference> [0-1]
<Name> . . . </Name> [0-M]
</Name>

Examples of Names in the context of Personas:
PNP is short for “Personal Name Phrase”, GNP is short for “Given Name
Phrase”, SNP is short for “Surname Phrase”


Un-analyzed name string
<piece type = “Personal Name Phrase”>
<form type = “actual”> “Francesco Alberto de la Cruz Santa Vera” </form>
</piece>

or analyzed as given names and surname phrases

<piece type = “GNP”> <form type = “actual”> “Francesco Alberto” </form> </piece>
<piece type = “SNP”> <form type = “actual”> “de la Cruz Santa Vera” </form> </piece>

or more fully analyzed as given name and surname tokens

<piece type = “Given Name Phrase”>
<piece type = “Given Name”> <form type = “actual”> “Francesco” </form> </piece>
<piece type = “Given Name”> <form type = “actual”> “Alberto” </form> </piece>
</piece>
<piece type = “Surname Phrase.latin.patribilineal”>
<piece type = “patrilineal”>
<piece type = “particle.preposition”> <form type = “actual”> “de” </form> </piece>
<piece type = “particle.article”> <form type = “actual”> “la” </form> </piece>
<piece type = “noun”> <form type = “actual”> “Cruz” </form> </piece>
</piece>
<piece type = “matrilineal”>
<piece type = “adjective”> <form type = “actual”> “Santa” </form> </piece>
<piece type = “noun”> <form type = “actual”> “Vera” </form> </piece>
</piece>
</piece>
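
One way to check that the three levels of analysis agree is to rebuild the un-analyzed phrase from the leaf tokens. The Python sketch below does that with xml.etree.ElementTree. It assumes straight quotes and well-formed tags, and it wraps the two phrases in an outer “Personal Name Phrase” piece, which is my own addition for the example.

```python
import xml.etree.ElementTree as ET

NAME_XML = """
<piece type="Personal Name Phrase">
  <piece type="Given Name Phrase">
    <piece type="Given Name"><form type="actual">Francesco</form></piece>
    <piece type="Given Name"><form type="actual">Alberto</form></piece>
  </piece>
  <piece type="Surname Phrase.latin.patribilineal">
    <piece type="patrilineal">
      <piece type="particle.preposition"><form type="actual">de</form></piece>
      <piece type="particle.article"><form type="actual">la</form></piece>
      <piece type="noun"><form type="actual">Cruz</form></piece>
    </piece>
    <piece type="matrilineal">
      <piece type="adjective"><form type="actual">Santa</form></piece>
      <piece type="noun"><form type="actual">Vera</form></piece>
    </piece>
  </piece>
</piece>
"""

def phrase(piece):
    """Join the actual forms of all tokens under a piece, in document order."""
    return " ".join(form.text.strip() for form in piece.iter("form"))

root = ET.fromstring(NAME_XML.strip())
print(phrase(root))  # Francesco Alberto de la Cruz Santa Vera
for child in root.findall("piece"):
    print(child.get("type"), "->", phrase(child))
```

Because the pieces are stored in written order, a plain in-order walk is enough to regenerate the given name phrase, the surname phrase, or the whole personal name phrase.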

Examples:
1) Many Western cultures, such as those of the US, Canada, and other
English-speaking lands, use a form of surname called “Patrilineal.”
This means that the child inherits the family name through its
father. In Scandinavia the surname phrase is composed in part of a
“Patronymic” name phrase, meaning it is formed from the father’s given
name. In addition, the Scandinavian surname phrase may combine with a
“Toponymic” (farm) name phrase or with an “Occupational” name phrase.
As with Scandinavia, sub-classes of a name phrase piece of type
“Surname” are required for certain other cultures. In the Spanish
culture the “Bilineal” sub-class is defined as “Patriarchal”: the
surname takes the first part of the father’s surname first and the
first part of the mother’s surname second. In Portugal the sub-class
is defined as “Matriarchal”: the order of the name pieces is the
reverse of that in the Spanish culture.

2) East Asian, Slavic and some other cultures, which use multiple
scripts, call for an extension to the Name entity to manage these
multiple scripts of the same name. This might be handled using the
container concept, similar to how date, age and place manage range sets.

<Name culture = “Sinotypic”>
<Name script = “Syllabary.Katakana”> . . . </Name>
<Name script = “Ideographic.Kanji”> . . . </Name>
<Name script = “Syllabary.Hiragana”> . . . </Name>
</Name>

or

<Name culture = “Eurotypic.Slavic”>
<Name script = “Cyrillic”> . . . </Name>
<Name script = “Roman”> . . . </Name>
</Name>

wcstarks

Jan 13, 2009, 2:10:24 PM
to Open Ancestry
<Membership>
The Membership entity is managed by the Authorities Team. It allows
an individual or persona to be associated as a member of some
organization. In genealogical records, persons are often shown as
members of the military. Membership associates an individual by role
(function), rank (office) and time period with that organization. It
also indicates whether the individual is associated, for example, as an
“Employee” or as a “Volunteer”. Both Role and Rank may also be
constrained by time period.

Structure:
<Membership>
<Type> . . . </Type> [controlled vocabulary]
<Event> . . . </Event> [0-N to define boundary events, or duration]
<Role> . . . </Role>
<Attribute> . . . </Attribute> [0-N of type Rank]
<Age> . . . </Age> [0-1]
<Group> . . . </Group> [1-1, of type Organization]
</Membership>

wcstarks

Jan 13, 2009, 2:11:01 PM
to Open Ancestry
<Note>
The Note object is not formally defined here yet. It can be attached
to any entity, or entity element, as needed.

Structure:
<Note> . . . </Note> [free form text blob]

wcstarks

Jan 13, 2009, 2:15:40 PM
to Open Ancestry
<Persona>
A persona represents an individual as found in the evidentiary
domain. In contrast, Individuals, by definition, reside in the
assertion (compilation) domain. Individuals and personas are similar
in all other respects, except personas are not implemented as
independent records, as are individuals, but as artifacts of
historical records. Personas cannot stand alone outside of an
historical record, whereas individuals can stand alone in the
assertion domain. An individual can also be asserted to be represented
by separate personas found in multiple historical records. A persona
may not make assertions that the document does not support.

An historical record may include one to many personas. A persona’s
participation in an historical record is defined by that persona’s
role. The role defines principals and other personas, who may
participate in the event in secondary roles. The role is also used to
show how one persona in the historical event record is related to the
other personas in that same record. In addition to roles, personas
may be identified by other characteristics such as names, age at the
time of the event, membership in an organization, by personal events
and other miscellaneous attributes.

A persona may be embedded within another persona, where such a persona
is treated, by the event record, as an attribute of another persona,
e.g., when specifying the birth place of the father of the principal
in a census record. This avoids creating unique fields for such
information, thus preserving the normalized schema. Embedding also
avoids ambiguity when there are multiple personas in the record with
the same primary role.

Structure:
<Persona>
<Role> . . . </Role> [0-N]
<Reference> . . . </Reference> [0-N]
<Label> . . . </Label> [0-1]
<Name> . . . </Name> [0-N]
<Event> . . . </Event> [0-N]
<Membership> . . . </Membership> [0-N]
<Attribute> . . . </Attribute> [0-N, various misc. attributes such
as gender]
<Age> . . . </Age> [0-N]
<Persona> . . . </Persona> [0-N; of type persona, used as an
attribute of this persona]
</Persona>

Note: While Persona is not explicitly shown here with an ID, one can
be provided if needed, e.g., <Persona ID = [value]>.
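
To illustrate embedding, here is a rough Python sketch of a persona that carries another persona as an attribute, along the lines of the census case (the father's birth place recorded on the principal's line). The class and field names are simplifications I made up for this example and cover only a few of the elements listed in the structure; the sample data is invented.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Place:
    localities: List[str]

@dataclass
class Event:
    type: str
    place: Optional[Place] = None

@dataclass
class Persona:
    roles: List[str] = field(default_factory=list)
    names: List[str] = field(default_factory=list)
    events: List[Event] = field(default_factory=list)
    # Embedded personas act as attributes of this persona.
    personas: List["Persona"] = field(default_factory=list)

    def embedded(self, role):
        """Return the embedded personas that have the given role."""
        return [p for p in self.personas if role in p.roles]

# Invented census-style entry: the father appears only as an attribute
# of the principal, through his birth place.
father = Persona(roles=["Father"], events=[Event("Birth", Place(["Norway"]))])
principal = Persona(roles=["Principal"], names=["Anna"], personas=[father])

print(principal.embedded("Father")[0].events[0].place.localities)  # ['Norway']
```

No “Father's Birth Place” field is needed: the same Persona, Event and Place structures are reused one level down, and the embedding keeps the father tied to the right principal.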

wcstarks

Jan 13, 2009, 2:35:28 PM
to Open Ancestry
<Group>
The Group class, which was not fully developed before I retired,
allows us to describe things such as Family, Organization, Household,
etc. Group might be implemented as an abstract class with its
sub-classes ("Family", etc.) inheriting its general features. The
Group, or, if it is implemented as an abstract class, its sub-classes,
would generally have structures somewhat similar to <Record>. In fact,
we originally described families, households and organizations as
Records. We were uneasy with that. It now seems that Group is a more
appropriate object to use for these types of entities.

Structure:
<Group type = "Family">
<Attribute type = "[type]"> . . . </Attribute> [1-M]
<Event type = "[type]"> . . . </Event> [commonly the marriage and
divorce events]
<Persona> . . . </Persona> [1-M]
<Reference type = "[type]"> . . . </Reference>
. . .
</Group>

wcstarks

Jan 13, 2009, 9:12:55 PM
to Open Ancestry
I thought all might be interested to see the model used by the
Norwegian Digital Archives for accepting extracted records from
outside parties. While they didn't normalize as much as the
Evidentiary Model presented here does, they did create a
document-centric model which includes a persona for each participant
extracted from the event record. In addition, they implemented the
Role for personas just as is done in this model. The Kyrre model and
the Evidentiary Model were developed completely independently of each
other. I am considering inviting some people from the archive to join
this group.

Norwegian Digital Archives Model for ingesting 3rd party extraction
submissions to their online site

<kyrre> (submission set)
<kjelde> (Source – Locality - Author)
<prgjeld>Spydeberg</prgjeld> (Clerical District)
<ksokn>Hovin og Heli</ksokn> (Parish)
….. flere felt her ….. (more source info as needed)
</kjelde>
<registrering> (Information about the submission)
<reg_av>Guri Pettersen</reg_av> (person submitting)
<reg_naar>november 2005</reg_naar> (submission month and year)
….. flere felt her ….. (more submitter/submission info as needed)
</registrering>
<dp> ..... dåpshandling ..... (event Record (christening))
<side>5</side> (page)
<lopenr>1</lopenr> (entry number)
<aar>1817</aar> (christening year)
<dpdato>07.01</dpdato> (christening date)
….. flere felt her…. (more record level attributes as needed)
<dp_person> (Persona)
<rolle>barn</rolle> (Role - Child)
<forenamn>Anna Lovinda</forenamn> (Given Name)
<kjonn>k</kjonn> (Gender)
….. flere felt her ….. (More attributes as needed)
</dp_person>
<dp_person> (Persona)
<rolle>far</rolle> (Role – Father)
<forenamn>Mons Ola</forenamn> (Given Name)
<etternamn>Hansen</etternamn> (Surname)
<kjonn>m</kjonn> (Gender)
….. flere felt her ….. (more attributes as needed)
</dp_person>
….. flere personer her ….. (more Personas as needed)
</dp>
….. flere dåper her ….. (more event (Christening) records as needed)
</kyrre>
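
For anyone who wants to experiment, here is a small Python sketch that reads a trimmed copy of the Kyrre sample above and lists the personas in each christening record. It uses only the tags shown in the sample; the function name is mine, and the actual submission format may well carry more fields and rules than this assumes.

```python
import xml.etree.ElementTree as ET

KYRRE_XML = """
<kyrre>
  <kjelde>
    <prgjeld>Spydeberg</prgjeld>
    <ksokn>Hovin og Heli</ksokn>
  </kjelde>
  <dp>
    <side>5</side>
    <lopenr>1</lopenr>
    <aar>1817</aar>
    <dpdato>07.01</dpdato>
    <dp_person>
      <rolle>barn</rolle>
      <forenamn>Anna Lovinda</forenamn>
      <kjonn>k</kjonn>
    </dp_person>
    <dp_person>
      <rolle>far</rolle>
      <forenamn>Mons Ola</forenamn>
      <etternamn>Hansen</etternamn>
      <kjonn>m</kjonn>
    </dp_person>
  </dp>
</kyrre>
"""

def personas(record):
    """Return one dict per <dp_person>, mapping each child tag to its text."""
    return [{item.tag: item.text for item in person}
            for person in record.findall("dp_person")]

root = ET.fromstring(KYRRE_XML.strip())
for record in root.findall("dp"):
    print(record.findtext("aar"), record.findtext("dpdato"))
    for persona in personas(record):
        print(persona)
```

Each persona comes out as a flat set of fields keyed by role, which is where the Kyrre model stops and where the Evidentiary Model would go on to break names and dates into typed pieces.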