Archival linked data: a discussion

Ethan Gruber

May 16, 2014, 4:22:56 PM
to lod...@googlegroups.com
Hi all,

I'm doing a lot of work with archival authorities (EAC-CPF) and EAD finding aids, including publishing both as linked open data. There's a lot of need in this area, as I think linked data methodologies have an important role to play in the dissemination of archival resources, and the community has not yet come together to discuss ontologies, models, and best practices.

As a software developer working on modeling EAC and EAD into RDF and pushing data into triplestores (http://eaditor.blogspot.com/2014/05/linking-archival-entities-and-resources.html), I think I've reached the limit of what I can do by myself. By that I mean that I hesitate to take the modeling further within the silo of my own applications. I would rather implement standards that emerge from the community.

At the moment, I am focusing on modeling EAD into RDF. I am applying Aaron Rubinstein's Arch ontology (http://gslis.simmons.edu/archival/arch/index.html), which I think is a great start. There are a few points with it that I'll raise for discussion:

1. How do you relate things together, e.g., a collection comprising sub-collections, which in turn contain individual items? There's no property for hierarchical relationships in the ontology, but I think dcterms:isPartOf would work. That is to say, an Item (such as a manuscript) dcterms:isPartOf a Collection.

2. arch:Manuscript. This class seems a little too specific. Maybe an Item class would work better? What if a collection contains photographs, glass plate negatives, or rock samples from the collection of a geologist? The particular type of item could be defined with a dcterms:format and a URI from the Getty AAT or another thesaurus.

3. Not that big of a deal, but some properties are in the http://purl.org/dc/elements/1.1/ namespace. I have casually observed that Dublin Core terms from http://purl.org/dc/terms/ are the de facto standard.

4. I think a proof of concept CIDOC-CRM model was developed for representing an archival collection, but I don't know how far down that rabbit hole the community is willing to go.
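A minimal sketch of points 1 to 3, using plain Python tuples as triples rather than any particular RDF library: units are linked upward with dcterms:isPartOf, the item gets a generic class plus a dcterms:format pointing at a thesaurus concept instead of a specific class such as arch:Manuscript, and legacy dc/elements/1.1/ predicates are rewritten into the dcterms namespace. The example.org URIs, the ex:Collection and ex:Item classes, and the AAT identifier are hypothetical placeholders.

```python
# Sketch only: triples are (subject, predicate, object) string tuples.
DCTERMS = "http://purl.org/dc/terms/"
DC_ELEMENTS = "http://purl.org/dc/elements/1.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/archive/"  # hypothetical base URI and classes

triples = {
    (EX + "coll1", RDF_TYPE, EX + "Collection"),
    (EX + "series1", RDF_TYPE, EX + "Collection"),
    (EX + "series1", DCTERMS + "isPartOf", EX + "coll1"),
    (EX + "item1", RDF_TYPE, EX + "Item"),
    (EX + "item1", DCTERMS + "isPartOf", EX + "series1"),
    # The kind of item comes from a thesaurus concept, not a subclass.
    # Placeholder URI: substitute a real Getty AAT concept.
    (EX + "item1", DCTERMS + "format", "http://vocab.getty.edu/aat/EXAMPLE"),
}


def ancestors(node, graph):
    """Follow dcterms:isPartOf upward and return every enclosing unit."""
    found = []
    frontier = [node]
    while frontier:
        current = frontier.pop()
        for s, p, o in graph:
            if s == current and p == DCTERMS + "isPartOf" and o not in found:
                found.append(o)
                frontier.append(o)
    return found


def to_dcterms(graph):
    """Rewrite dc/elements/1.1/ predicates to the same local name in dcterms."""
    out = set()
    for s, p, o in graph:
        if p.startswith(DC_ELEMENTS):
            p = DCTERMS + p[len(DC_ELEMENTS):]
        out.add((s, p, o))
    return out


print(ancestors(EX + "item1", triples))
```

Here ancestors(EX + "item1", triples) returns series1 and then coll1. One caveat on the namespace rewrite: some dcterms counterparts have formal ranges (dcterms:creator, for instance, is declared with the range dcterms:Agent), so literal values carried over from dc/elements/1.1/ may need to become URIs rather than simply being renamed.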

For those who are interested, here (https://github.com/ewg118/eaditor/blob/master/ui/xslt/serializations/ead/rdf.xsl) is a very rudimentary XSLT stylesheet for transforming an EAD file into RDF/XML following the arch ontology.
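For readers who don't work in XSLT, here is a rough Python rendering of the core idea. It is not a port of the linked stylesheet: it walks the <dsc> component tree and emits a dcterms:isPartOf triple from each component to its parent, plus a dcterms:title from <did>/<unittitle>. It assumes components carry @id attributes (falling back to a counter otherwise), mints URIs under a hypothetical example.org base, handles both <c> and numbered <c01>..<c12> components, and ignores XML namespaces by comparing local names.

```python
import itertools
import re
import xml.etree.ElementTree as ET

DCTERMS = "http://purl.org/dc/terms/"
BASE = "http://example.org/ead/"  # hypothetical base URI for minted identifiers
COMPONENT = re.compile(r"^c(\d\d)?$")  # unnumbered <c> or numbered <c01>..<c12>


def local(tag):
    """Drop any '{namespace}' prefix so namespaced and plain EAD both match."""
    return tag.rsplit("}", 1)[-1]


def unittitle(elem):
    """Return the text of elem's own <did>/<unittitle>, or None."""
    for did in elem:
        if local(did.tag) == "did":
            for child in did:
                if local(child.tag) == "unittitle":
                    return "".join(child.itertext()).strip()
    return None


def ead_to_triples(xml_text):
    root = ET.fromstring(xml_text)
    archdesc = next(e for e in root.iter() if local(e.tag) == "archdesc")
    counter = itertools.count(1)
    triples = []

    def describe(elem, uri):
        title = unittitle(elem)
        if title:
            triples.append((uri, DCTERMS + "title", title))

    def walk(elem, parent_uri):
        for child in elem:
            name = local(child.tag)
            if name == "dsc":
                walk(child, parent_uri)  # <dsc> only wraps components
            elif COMPONENT.match(name):
                uri = BASE + (child.get("id") or f"c{next(counter)}")
                triples.append((uri, DCTERMS + "isPartOf", parent_uri))
                describe(child, uri)
                walk(child, uri)

    coll_uri = BASE + archdesc.get("id", "collection")
    describe(archdesc, coll_uri)
    walk(archdesc, coll_uri)
    return triples


SAMPLE = """<ead><archdesc level="collection" id="papers">
  <did><unittitle>Family papers</unittitle></did>
  <dsc>
    <c01 level="series" id="s1"><did><unittitle>Correspondence</unittitle></did>
      <c02 level="item" id="i1"><did><unittitle>Letter, 1901</unittitle></did></c02>
    </c01>
  </dsc>
</archdesc></ead>"""

for triple in ead_to_triples(SAMPLE):
    print(triple)
```

Real finding aids carry far more (dates, extents, access points, digital objects), so this only shows the skeleton that the hierarchy question in point 1 depends on.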

Ethan Gruber
American Numismatic Society

Ingrid Mason

May 18, 2014, 12:11:01 AM
to lod...@googlegroups.com
Hi Ethan,

Last year I did some translation work on a humanities project.  I was working to translate metadata from different data sources using CIDOC-CRM and FRBR-oo to cross-walk to.  The project ended up going in a different direction translation-wise, and I haven't had this thinking and translation tested externally, so here goes :-)   

I'm pretty sure there will be gaffes in the translation work. I hadn't cross-walked to ontologies from schemas before, only between schemas, and I was new to both ontologies. I'll willingly admit that getting acquainted with both ontologies and the Europeana Data Model, plus the translation work itself, melted my brain a wee bit. In the end I slipped around the definition of a collection (between the conceptual and the actual) and items, and sought guidance from my colleague Conal Tuohy. I found this work from UKOLN quite useful from both a conceptual and a functional perspective (i.e. technical aggregation).

In the end, I decided a collection could be any of a range of CIDOC-CRM entities, to cover collections by theme, archival series, collections by name, physical collections, finding aids, registers, catalogues, datasets, etc. I got a bit stuck on the conceptual versus the physical, and on finding aids (each of which is both a collection of entries and a work), when looking at where CIDOC-CRM and FRBR-oo align and at the use of E89_Propositional_Object, E73_Information_Object, and E31_Document. I made some decisions about digital archives which may or may not fall within the definition of E78_Collection. I felt a bit mad after all that. Perhaps you or others here could verify where I went wrong and what I got right. It would be really good to have someone with a strong archival focus look at this, and to compare their view with that of someone with a library and museum focus.

Several of the datasets dealt with collections, items and people. I list them here, and apologise if the thinking behind this is opaque, but I thought it might be useful to see at least an attempt at cross-walking from different data sources that deal with collections, items and people to CIDOC-CRM.

PARADISEC: where I got to in attempting to unscramble collections

More fun than a bag of monkeys! :o) 

There are some EAC-CPF/EAD experts here in Australia, so I've pushed your post out to the 2cultures and ands-general Google Groups lists in the hope that those with more expertise might respond.

Hope that in some way helps.

Good wishes, Ingrid 



--
You received this message because you are subscribed to the Google Groups "Linked Open Data in Libraries, Archives, & Museums" group.
To unsubscribe from this group and stop receiving emails from it, send an email to lod-lam+u...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Ethan Gruber

May 20, 2014, 3:51:19 PM
to lod...@googlegroups.com
Hi Ingrid,

I have begun working with CIDOC-CRM in different contexts fairly recently (within the last year), and I definitely think that the CRM could apply to archival collections. Some preliminary work has been done on mapping EAD to CIDOC-CRM (http://www.cidoc-crm.org/workshops/finland_helsinki_20102801/N13_28Jan2010%20Christos%20Papatheodorou.pdf), and I think the attempt was pretty logical, but I don't know how far they took it with respect to putting that RDF into a production environment.

Mapping a complex and highly structural document model like EAD into RDF is a challenge. The first step is to recognize the advantages and disadvantages of linked data and not try to implement a model that becomes exponentially more complicated (and practically impossible to build an information system upon). Ideally, you'd want to start as simple as possible and leave room to add greater semantic complexity later.

My apprehension with the CRM is that it is a closed system that seems to encourage an all or nothing approach. I suppose there is nothing stopping me from defining an E31_Document that contains some dcterms:subjects that point to LCSH URIs, but it seems to be frowned upon. Why does the CRM forge ahead with its own geographic classes and properties when the GIS community has already developed more robust and widely used ontologies for geographic data? Maybe these are questions to be answered some other time.
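As a small illustration that mixing vocabularies is at least mechanically straightforward, this sketch serializes a CRM-typed E31_Document carrying a dcterms:subject that points at an LCSH URI, as N-Triples. The document URI and the LCSH identifier are hypothetical placeholders, and the IRI wrapper class is just a local convention for telling IRIs apart from literals.

```python
class IRI(str):
    """Marks a string as an IRI so it is serialized as <...>, not a literal."""


CRM = "http://www.cidoc-crm.org/cidoc-crm/"
DCTERMS = "http://purl.org/dc/terms/"
RDF_TYPE = IRI("http://www.w3.org/1999/02/22-rdf-syntax-ns#type")


def term(value):
    """Serialize one RDF term in N-Triples syntax."""
    if isinstance(value, IRI):
        return f"<{value}>"
    # Plain literal: escape backslashes first, then quotes and line breaks.
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def to_ntriples(triples):
    return "\n".join(f"{term(s)} {term(p)} {term(o)} ." for s, p, o in triples)


doc = IRI("http://example.org/doc/1")  # hypothetical document URI
triples = [
    (doc, RDF_TYPE, IRI(CRM + "E31_Document")),
    # Placeholder LCSH identifier: substitute a real id.loc.gov subject URI.
    (doc, IRI(DCTERMS + "subject"), IRI("http://id.loc.gov/authorities/subjects/shEXAMPLE")),
    (doc, IRI(DCTERMS + "title"), 'Minutes, "draft" copy'),
]
print(to_ntriples(triples))
```

Whether CRM practice would accept dcterms:subject alongside its own properties is the open question raised above; the syntax itself places no obstacle in the way.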

Do you have some RDF from PARADISEC that I can look at?

Thanks,
Ethan

Simon Spero

May 20, 2014, 7:54:23 PM
to lod...@googlegroups.com
There is a mailing list for the CRM Special Interest Group; its Mailman page is

There are also (unofficial) OWL versions of the CRM ontology available at https://github.com/erlangen-crm/ecrm; OWL's support for identity/difference assertions, as well as equivalent properties and classes, makes for easier integration with other sources.
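To make the integration point concrete, a minimal sketch of what owl:equivalentProperty and owl:equivalentClass buy you: once an equivalence is asserted, data described with one vocabulary can be queried with the other. The ex: terms and the equivalences themselves are hypothetical, and this applies only directly asserted equivalences in a single pass, whereas a real OWL reasoner would also follow chains of them.

```python
OWL = "http://www.w3.org/2002/07/owl#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def apply_equivalences(triples):
    """One pass of owl:equivalentProperty / owl:equivalentClass expansion.

    Only directly asserted equivalences are used; chains (A = B, B = C)
    would need a full reasoner or repeated passes.
    """
    prop_eq, class_eq = {}, {}
    for s, p, o in triples:
        if p == OWL + "equivalentProperty":
            table = prop_eq
        elif p == OWL + "equivalentClass":
            table = class_eq
        else:
            continue
        # Equivalence is symmetric, so record both directions.
        table.setdefault(s, set()).add(o)
        table.setdefault(o, set()).add(s)

    inferred = set(triples)
    for s, p, o in triples:
        for q in prop_eq.get(p, ()):
            inferred.add((s, q, o))
        if p == RDF_TYPE:
            for c in class_eq.get(o, ()):
                inferred.add((s, RDF_TYPE, c))
    return inferred


# Hypothetical local vocabulary aligned to Dublin Core and the CRM.
EX = "http://example.org/"
DCTERMS = "http://purl.org/dc/terms/"
CRM = "http://www.cidoc-crm.org/cidoc-crm/"
data = {
    (EX + "hasTitle", OWL + "equivalentProperty", DCTERMS + "title"),
    (EX + "Document", OWL + "equivalentClass", CRM + "E31_Document"),
    (EX + "doc1", EX + "hasTitle", "Minute book"),
    (EX + "doc1", RDF_TYPE, EX + "Document"),
}
result = apply_equivalences(data)
print((EX + "doc1", DCTERMS + "title", "Minute book") in result)
```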

Karen Wickett's dissertation formalizing some of the relationships between collection and item metadata is well worth reading: https://www.ideals.illinois.edu/handle/2142/42198

EAD does not map cleanly onto much of anything; it clearly shows its heritage as a document markup language.

[My pet peeve is the single subfield specifier allowed for subject headings, which makes mapping from EAD plain text into subdivided subject headings much, much harder (appropriately trained maximum-entropy part-of-speech taggers do pretty well, but the $x vs. $v probabilities are different for archival headings).]
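To make the $x vs. $v problem concrete, here is a deliberately naive stand-in, not the maximum-entropy tagger described above: split a "--"-delimited heading and guess a MARC subfield code for each subdivision. The FORM_TERMS set is a tiny hypothetical sample, and geographic ($z) subdivisions are not handled at all.

```python
import re

# Hypothetical sample of form terms; a real system would use a controlled
# list of form subdivisions or a trained classifier instead.
FORM_TERMS = {"correspondence", "diaries", "maps", "personal narratives", "photographs"}
YEAR = re.compile(r"\b\d{4}\b")


def subfields(heading):
    """Guess MARC subfield codes for a '--'-delimited subject heading."""
    parts = [part.strip() for part in heading.split("--") if part.strip()]
    if not parts:
        return []
    coded = [("a", parts[0])]
    for part in parts[1:]:
        if part.lower() in FORM_TERMS:
            code = "v"  # form subdivision
        elif YEAR.search(part):
            code = "y"  # chronological subdivision
        else:
            code = "x"  # general (topical) subdivision, the default guess
        coded.append((code, part))
    return coded


print(subfields("United States -- History -- Civil War, 1861-1865 -- Personal narratives"))
```

A fixed lookup like this cannot capture context-dependent cases, which is where a trained tagger earns its keep.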
 
Simon

Karen Wickett

May 20, 2014, 8:49:05 PM
to lod...@googlegroups.com
Hello,

If you would like to know what the current thinking is on representing collection-level objects and descriptions in an RDF-based cultural heritage data model (the Europeana Data Model), check out our recent D-Lib article here:
www.dlib.org/dlib/may14/wickett/05wickett.html

We specifically talk about representing the collection membership relationship there, and recommend defining a sub-property of dcterms:isPartOf, which we gave the name edm:isGatheredInto. If you want to be able to infer (for example) that something with collection members is a collection, you need a more specific relator than dcterms:isPartOf.
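A minimal sketch of the inference described here, using a hand-rolled forward chainer over two RDFS rules (subPropertyOf propagation and rdfs:range typing) rather than any particular reasoner. The EDM namespace URI and the ex:Collection range are assumptions for illustration; the point is that a range on the narrower property licenses the "is a collection" conclusion, while plain dcterms:isPartOf does not.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
DCTERMS = "http://purl.org/dc/terms/"
EDM = "http://www.europeana.eu/schemas/edm/"  # assumed namespace
EX = "http://example.org/"  # hypothetical data and range class


def rdfs_closure(triples):
    """Apply rdfs:subPropertyOf and rdfs:range rules until nothing new appears."""
    facts = set(triples)
    while True:
        sub = [(s, o) for s, p, o in facts if p == RDFS + "subPropertyOf"]
        rng = [(s, o) for s, p, o in facts if p == RDFS + "range"]
        new = set()
        for s, p, o in facts:
            for narrow, broad in sub:
                if p == narrow:
                    new.add((s, broad, o))  # x narrow y  =>  x broad y
            for prop, cls in rng:
                if p == prop:
                    new.add((o, RDF_TYPE, cls))  # x prop y  =>  y a cls
        if new <= facts:
            return facts
        facts |= new


schema = {
    (EDM + "isGatheredInto", RDFS + "subPropertyOf", DCTERMS + "isPartOf"),
    (EDM + "isGatheredInto", RDFS + "range", EX + "Collection"),
}
data = {
    (EX + "letter1", EDM + "isGatheredInto", EX + "papers"),
    (EX + "page3", DCTERMS + "isPartOf", EX + "letter1"),
}
closure = rdfs_closure(schema | data)
print((EX + "papers", RDF_TYPE, EX + "Collection") in closure)
```

The page-to-letter link uses plain dcterms:isPartOf, so the closure says nothing about letter1 being a collection, which is the distinction the narrower relator is meant to capture.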

In our model development work we were trying to be quite general and to come up with recommendations that make sense in archival contexts, but we did not consider EAD specifically. The technical report (http://hdl.handle.net/2142/45860) referenced in the D-Lib article proposes a property set for describing collections that it might be feasible to map EAD into.

Best,
Karen Wickett

Antoine Isaac

May 21, 2014, 3:45:14 AM
to lod...@googlegroups.com
Hi all,

A bit more reading from the Europeana side, if you're interested.

There was a prototype converting EAD data to RDF
http://pro.europeana.eu/ead-edm

This work has been pursued in the DM2E.eu project - I've asked them if they can share something.

Also, the APEx project (http://www.apex-project.eu) is sending us a lot of data in EDM, and thus in RDF. So is the German Digital Library.
These mappings are alluded to in a report we have at
http://pro.europeana.eu/web/network/europeana-tech/-/wiki/Main/Task+force+on+EDM+mappings+refinements+and+extensions

It is certainly possible to get more info, but this will require some digging around, so it will happen only if you are interested.

Otherwise, if you are interested in CRM, FRBR and the correspondences between them (again, in the Europeana context), there was also a task force reporting on this:
http://pro.europeana.eu/web/network/europeana-tech/-/wiki/Main/Task+Force+EDM+FRBRoo

Best,

Antoine


Ingrid Mason

Jan 15, 2015, 9:12:38 PM
to lod...@googlegroups.com
Hi Ethan,

Lord, I missed this email and I'm really sorry I did. I don't have access to RDF from PARADISEC; FYI, they encode in qualified DC and use the OLAC application profile.

I am still very interested to know how people are faring with reusing CIDOC-CRM, EAD, and EAC-CPF to support their encoding and any LOD work. The reason is that we had a session on CIDOC-CRM at the 2013 LODLAM Summit that was mostly about what it is best suited for, and more recently, at the Canberra THATCamp in late 2014, a bunch of us were interested in the capacity of CIDOC-CRM to serve some of the requirements of archaeologists and cultural heritage collections (using LOD methods). I've fielded (not well) some questions from newbies about how best to get started. I wanted to ask them to reconsider their own encoding and schemas (whether mapped to a standard or not), to work out what they understood the major entities and the relationships between them to be, and to move on from there to deciding how to approach creating LOD.

It would be good to get as much of this unpacked as possible, and it seems to me this might make for a couple of good sessions at the 2015 LODLAM Summit, given your points about efforts in the GIS community, the "all or nothing approach" with CIDOC-CRM, and the advice about "starting as simple as possible".

Thanks!  Ingrid   

