World War One Linked Open Data


Rob Warren

Feb 18, 2012, 11:22:09 PM
to Linked Open Data in Libraries, Archives, & Museums, ww1...@mailman.muninn-project.org
Hi there,

We're hitting a critical mass of people working on linked open data
approaches to the First World War. A few of us would like to get a
conversation going to swap ideas and encourage cross-linking of our
datasets. A Mailman mailing list has been created at
ww1-l...@mailman.muninn-project.org or http://mailman.muninn-project.org/cgi-bin/mailman/listinfo/ww1-lod.

One of my immediate concerns is the mark-up of personal names and
finding a class that people are comfortable using to represent
people. The most widespread standard at the moment is FOAF: it's well
documented and it makes the information immediately accessible to
everyone. However, I've had a lot of difficulty dealing with name changes
at marriage, partial initials, knighthoods and non-western names.

There has been talk in the FOAF community about names [1], but no
solutions have been forthcoming. As it stands, FOAF has three
competing sets of properties for simple western names but no property
for middle names. Dublin Core [2] has an interesting note on name
representation, but it peters out by essentially suggesting that you use
someone else's name authority file (e.g., copy their string exactly).
Similarly, the GEDCOM [3] standard for genealogical data has only
limited support for a 'surname'.

Don't get me started about MARC records or marking up royal styles.

So, how are you handling partial information like initials and complex
names? Do you have any designs you would like to share that we can
agree on to better support record linkage later on? I've experimented
with some DatatypeProperties for names, but these only address name
changes.

Look forward to hearing your ideas,
rhw


[1] http://wiki.foaf-project.org/w/NamesInFoaf
[2] http://dublincore.org/documents/name-representation/
[3] http://en.wikipedia.org/wiki/GEDCOM
[4] http://sites.google.com/site/opencatalogingrules/22-6-entry-under-title-of-nobility
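Rob's question about partial information can be made concrete with a small sketch. Assuming a hypothetical record in which each given name is stored either in full or only as an initial (the class and field names below are illustrative, not FOAF terms or any other standard's), a compatibility test can treat an initial as agreeing with any full name that shares its first letter. This is a simplification: comparing given names by position will miss a man who enlisted under his middle name.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class NamePart:
    """A single given name, recorded either in full or only as an initial."""
    value: str
    is_initial: bool = False


@dataclass
class PersonName:
    """Hypothetical structured name; these field names are illustrative, not FOAF terms."""
    surname: str
    given: List[NamePart] = field(default_factory=list)
    honorific: Optional[str] = None  # e.g. "Sir", kept apart from the name proper


def parts_compatible(a: NamePart, b: NamePart) -> bool:
    """An initial matches any name with the same first letter; two full names must be equal."""
    x, y = a.value.lower(), b.value.lower()
    if a.is_initial or b.is_initial:
        return x[:1] == y[:1]
    return x == y


def names_compatible(a: PersonName, b: PersonName) -> bool:
    """True if nothing recorded in one name contradicts the other."""
    if a.surname.lower() != b.surname.lower():
        return False
    # zip() stops at the shorter list, so a missing given name counts as
    # unknown rather than as a contradiction. The honorific is ignored.
    return all(parts_compatible(p, q) for p, q in zip(a.given, b.given))


# Usage with made-up names:
initials_only = PersonName("Smith", [NamePart("J", is_initial=True), NamePart("A", is_initial=True)])
full = PersonName("Smith", [NamePart("John"), NamePart("Alexander")], honorific="Sir")
other = PersonName("Smith", [NamePart("William")])
print(names_compatible(initials_only, full))  # True
print(names_compatible(other, full))          # False
```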

Jason Darwin

Feb 19, 2012, 1:46:44 AM
to lod...@googlegroups.com
Hi Rob,

Good to hear of your work in this area. We've been working on exposing the WWI data concerning New Zealand personnel, and have a rich dataset containing the details of around 110,000 New Zealand personnel who embarked to take part in WWI.

This dataset has been published online in a non-linked-data format (e.g. http://muse.aucklandmuseum.com/databases/cenotaph/35181.detail); however, we're now doing the work to allow this dataset to be queried through a SPARQL endpoint (though we're still a few weeks from being able to make this publicly available).

Regarding your question about mark-up and representation of personal names (or Authority Control, as it is termed in the world of libraries and archives), there are a number of standards out there which are of some use in this area, but perhaps the most useful are:

1. Metadata Authority Description Schema (MADS): http://www.loc.gov/standards/mads/
Maintained by the Library of Congress and now available in RDF, this standard has proven a little problematic in that it doesn't model all the elements and attributes that you would ideally want, and is perhaps a little too closely focused on the needs of the library sector.

2. Encoded Archival Context - Corporate Bodies, Persons, and Families (EAC-CPF): http://eac.staatsbibliothek-berlin.de/
This standard is very promising; it tries, I think, to address some of the limitations of MADS, and has been adopted by the National Library of Australia for their People Australia project (https://wiki.nla.gov.au/display/peau/Home;jsessionid=1antlq1xi5x2g19hvtt1hs450h)

3. Entity Authority Toolset (EATS): http://code.google.com/p/eats/
This has the advantage of being both a conceptual model and an implementation (based on the Django framework), and is geared towards allowing authority control of any entity: not only people and organizations, but anything else you might want to control, e.g. ships, events, etc.
Its only real downside is that the community behind it is rather small: although well thought out, it doesn't have significant institutional backing, unlike the others above.

I'd suggest examining the three above systems -- they are more involved than FOAF, but that means that they also allow for the capture of richer data and relationships.

At this stage, for the work we're doing with the WWI Cenotaph data we aren't using an authority control system, though it will become a consideration as we begin to link out to other collections.

Regards,

Jason Darwin

Richard Light

Feb 20, 2012, 3:26:18 AM
to lod...@googlegroups.com
When you ask about conventions for recording personal names, is the intention behind this simply to have consistent metadata relating to people, or is your goal for the normalized name to act, in itself, as an unambiguous identity for a single individual?

If the goal is to identify individuals, a simpler approach would be to have a random (numerical) identifier URL to act as the person's identity, and to associate with that identifier all the facts you know about that person, expressed as simple RDF assertions. This bundle of assertions could be queried to determine whether another person (represented by a different bundle of evidence) is actually the same individual.  This approach allows for the incomplete and inexact nature of the information we will have to hand about any individual, while also removing the need to embed key facts into the URL which acts as an identity for the person.  This in turn means that you never have to change the "identifier" URL simply because new facts come to light.

Richard
--
Richard Light
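Richard's suggestion can be sketched as follows, under stated assumptions: the `example.org` namespace, the predicate names and the sample values are hypothetical, and the store is a plain in-memory dictionary rather than a triple store. Each person gets a random identifier that carries no facts, and two bundles of assertions are compared for shared evidence and for outright conflicts.

```python
import uuid
from collections import defaultdict

BASE = "http://example.org/person/"  # hypothetical namespace


class AssertionStore:
    """Minimal in-memory bundle of (predicate, value) assertions per person URI."""

    def __init__(self):
        self.facts = defaultdict(set)

    def new_person(self) -> str:
        # The identifier is random and encodes no facts about the person,
        # so it never has to change when new facts come to light.
        return BASE + uuid.uuid4().hex

    def add(self, person: str, predicate: str, value: str) -> None:
        self.facts[person].add((predicate, value))

    def compare(self, a: str, b: str):
        """Return (shared assertions, predicates on which the two bundles disagree)."""
        fa, fb = self.facts[a], self.facts[b]
        shared = fa & fb
        conflicts = set()
        for pred in {p for p, _ in fa} & {p for p, _ in fb}:
            values_a = {v for p, v in fa if p == pred}
            values_b = {v for p, v in fb if p == pred}
            if not values_a & values_b:  # no value in common for this predicate
                conflicts.add(pred)
        return shared, conflicts


store = AssertionStore()
a, b = store.new_person(), store.new_person()
store.add(a, "surname", "Smith")
store.add(a, "serviceNumber", "12/345")
store.add(b, "surname", "Smith")
store.add(b, "serviceNumber", "12/345")
store.add(b, "birthYear", "1890")
shared, conflicts = store.compare(a, b)
print(sorted(shared))  # [('serviceNumber', '12/345'), ('surname', 'Smith')]
print(conflicts)       # set()
```

Deciding how much shared evidence justifies treating two identifiers as the same individual is left open here; in RDF such a conclusion is usually published as an `owl:sameAs` link rather than by rewriting the identifiers.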

Antoine Isaac

Feb 20, 2012, 10:50:41 AM
to lod...@googlegroups.com
Dear Rob,

Adding to Jason's list, a couple of possibly relevant pointers I came across while working on [1]:
- RDA vocabularies have some person-related properties, especially in group 2 [2]
- the German national library have coined their own extension for persons, a sort of application profile of other existing vocabularies (Gemeinsame NormDatei (GND) vocabulary, see [3])
- of course you may want to see what's happening at viaf.org and the person authorities published at the French national library [4]

But I'm not sure whether there are very detailed frameworks for names in these...

Best,

Antoine Isaac

PS: Europeana may register to your list. We've got quite a lot of WWI-related material, too.

[1] http://www.w3.org/2005/Incubator/lld/XGR-lld-vocabdataset/#Metadata_Element_Sets
[2] http://rdvocab.info/ElementsGr2/
[3] https://wiki.d-nb.de/display/LDS
[4] http://data.bnf.fr/semanticweb-en


Rob Warren

Feb 20, 2012, 11:58:14 AM
to Linked Open Data in Libraries, Archives, & Museums, ww1...@mailman.muninn-project.org
Jason,

Thank you for the pointers; I'm still working on the names problem.

As a side note, I've been dealing with a few modeling issues with
former British Dominions becoming autonomous and others being
amalgamated into new entities. A typical case is the Dominion of
Labrador and Newfoundland, which is now a province of Canada: depending
on which database you are looking at, Newfoundland troops are listed
sometimes under Newfoundland and sometimes under Canada. As you make
your data available in LOD form, would you consider creating an
organization instance for the Dominion of New Zealand? That would make
some concepts, like 'British Empire Troops', a bit easier to build.

rhw
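One way to express the Newfoundland/Canada problem is to give each polity its own organization instance and to date the 'part of' links between them. The sketch below assumes hypothetical `PartOf` records and organization labels; only the 31 March 1949 union date is historical, and the open start dates are deliberate placeholders. A 'British Empire Troops' grouping then becomes a query over memberships on a given date rather than a fixed label.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class PartOf:
    """A dated 'part of' link between two organization instances."""
    child: str
    parent: str
    start: date
    end: Optional[date] = None  # None: still in effect, or end not recorded


def memberships(links: List[PartOf], org: str, when: date) -> List[str]:
    """All organizations that `org` belonged to on `when`, followed transitively."""
    found: List[str] = []
    frontier = [org]
    while frontier:
        current = frontier.pop()
        for link in links:
            in_force = link.start <= when and (link.end is None or when < link.end)
            # The `not in found` check also stops the walk from looping on cycles.
            if link.child == current and in_force and link.parent not in found:
                found.append(link.parent)
                frontier.append(link.parent)
    return found


# Illustrative links only. The 1949 date is Newfoundland's union with Canada;
# date.min stands in for start dates this sketch deliberately leaves open.
links = [
    PartOf("Newfoundland", "British Empire", date.min),
    PartOf("Newfoundland", "Canada", date(1949, 3, 31)),
    PartOf("New Zealand", "British Empire", date.min),
]
print(memberships(links, "Newfoundland", date(1916, 7, 1)))  # ['British Empire']
```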

Rob Warren

Feb 20, 2012, 12:17:47 PM
to Linked Open Data in Libraries, Archives, & Museums, ww1...@mailman.muninn-project.org
Richard,

I'm looking for a way to normalize names well enough that reasonably
accurate record linkage can be performed on the data; after all, we
will eventually want to link some of this data automatically.

My initial design for persons and organizations is essentially your
URL suggestion. The pieces that make up a person's name contain a great
deal of cultural information, and there might be a way to extract value
from that by using the right markup / schema / ontology. Simply
stuffing everything into a string as 'preferredName' seems wasteful.
EATS, as pointed out by Jason, has a few ideas that I find interesting.
I should do a write-up on this soon.

rhw
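As an illustration of normalizing names for record linkage, the sketch below folds away diacritics and groups surnames into blocks using American Soundex, so that only records within the same block need detailed comparison. Soundex is our choice for the example, not something proposed in this thread; it is tuned to English-language surnames and would serve the non-western names mentioned earlier poorly.

```python
import unicodedata
from collections import defaultdict
from typing import Dict, List

# American Soundex digit classes; vowels, y, h and w get no digit.
_CODES = {ch: digit
          for group, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                               ("l", "4"), ("mn", "5"), ("r", "6"))
          for ch in group}


def fold(text: str) -> str:
    """Strip diacritics, e.g. 'Müller' -> 'Muller'."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def soundex(surname: str) -> str:
    """Four-character American Soundex code of a surname."""
    letters = [c for c in fold(surname).lower() if "a" <= c <= "z"]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = _CODES.get(letters[0], "")
    for c in letters[1:]:
        if c in "hw":
            continue  # h and w are skipped and do not separate equal codes
        code = _CODES.get(c, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code after it is kept
    return (result + "000")[:4]


def block_by_surname(surnames: List[str]) -> Dict[str, List[str]]:
    """Group surnames so that only names in the same block are compared in detail."""
    blocks: Dict[str, List[str]] = defaultdict(list)
    for name in surnames:
        blocks[soundex(name)].append(name)
    return dict(blocks)


print(block_by_surname(["Robert", "Rupert", "Müller", "Mueller", "Miller"]))
# {'R163': ['Robert', 'Rupert'], 'M460': ['Müller', 'Mueller', 'Miller']}
```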


Rob Warren

Feb 20, 2012, 12:28:22 PM
to Linked Open Data in Libraries, Archives, & Museums
Antoine,

Thank you for the pointers. I found the German National Library's
vocabulary interesting in its use of 'gnd:usedRules' to identify the
type of name representation. The lack of a language / localization tag
is a bit disturbing, though.

rhw
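On the localization point: RDF literals can carry a language tag, so several forms of the same name can coexist on one resource. A minimal sketch that writes N-Triples by hand, assuming a hypothetical `example.org` person URI and using `rdfs:label` as a generic name property (this says nothing about how GND itself models names):

```python
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def nt_literal(value: str, lang: str = "") -> str:
    """Serialize a string as an N-Triples literal, with an optional language tag."""
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
                    .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"@{lang}' if lang else f'"{escaped}"'


def nt_triple(subject: str, predicate: str, obj: str) -> str:
    """One N-Triples statement; `obj` must already be a serialized literal."""
    return f"<{subject}> <{predicate}> {obj} ."


person = "http://example.org/person/p1"  # hypothetical URI
for name, lang in (("Wilhelm II", "de"), ("William II", "en")):
    print(nt_triple(person, RDFS_LABEL, nt_literal(name, lang)))
```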


Jason Darwin

Feb 20, 2012, 1:17:05 PM
to Linked Open Data approaches to the Great War, Linked Open Data in Libraries, Archives, & Museums
Hi Rob,

No problem -- will bear that in mind.

Jason

_______________________________________________
ww1-lod mailing list
ww1...@mailman.muninn-project.org
http://mailman.muninn-project.org/cgi-bin/mailman/listinfo/ww1-lod

Owen Stephens

Feb 20, 2012, 5:35:30 PM
to Linked Open Data in Libraries, Archives, & Museums
It might be worth looking at event-driven models, especially for
aspects like marriage and ennoblement. I can't claim any special
expertise, I'm afraid, but CIDOC-CRM would be a possible starting
point: although as far as I know it doesn't deal with names directly,
it can represent other aspects like marriage events. The RDFS for
CIDOC-CRM is at http://www.cidoc-crm.org/rdfs/5.0.4/cidoc-crm, and
documentation on the entities is at http://www.cidoc-crm.org/docs/cidoc_crm_version_5.0.4.pdf
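A minimal sketch of the event-driven idea, assuming a hypothetical `NameChange` record rather than actual CIDOC-CRM classes: each name-changing event (a marriage, an ennoblement) carries a date and the resulting name, and the name in use on any date is found by replaying the events in date order. The person and dates are invented for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class NameChange:
    """An event that gives a person a new name form, e.g. a marriage or an ennoblement.
    `kind` is free text here; an event-centric model such as CIDOC-CRM would type
    these events with its own classes instead."""
    when: date
    kind: str
    new_name: str


def name_on(birth_name: str, events: List[NameChange], when: date) -> str:
    """Replay the events up to and including `when` to find the name then in use."""
    current = birth_name
    for event in sorted(events, key=lambda e: e.when):
        if event.when > when:
            break
        current = event.new_name
    return current


# A made-up person; events are deliberately listed out of date order.
events = [
    NameChange(date(1925, 4, 18), "remarriage", "Mary Jones"),
    NameChange(date(1917, 6, 2), "marriage", "Mary Smith"),
]
print(name_on("Mary Brown", events, date(1916, 1, 1)))  # Mary Brown
print(name_on("Mary Brown", events, date(1920, 1, 1)))  # Mary Smith
```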

Rob Warren

Feb 29, 2012, 12:22:05 PM
to Linked Open Data in Libraries, Archives, & Museums
Yes, I took a good look at their model. It handles events gracefully,
but the specific implementation of objects and 'Things' is left open.
I'm still working on this.

rhw
