OAI-PMH: EAD identifiers


Scott Renton

Aug 29, 2018, 7:43:09 AM
to AtoM Users
Hi folks

(I've spoken to Justin about this in person). I'm harvesting Strathclyde's AtoM and our ArchivesSpace, the idea being to build a merged front end with a smattering of metadata which drives the user back to the original archival repository to view the item fully in context.

If I harvest DC, it works well: I get a public URL that can go on this record, e.g.:

<metadata>
<dc:title>Syllabus of lectures on 'Cities in Evolution'</dc:title>
<dc:description>
An introductory course of general sociology. University of Bombay.
</dc:description>
<dc:date>1919</dc:date>
<dc:format>1 item</dc:format>
<dc:identifier>5</dc:identifier>
<dc:language xsi:type="dcterms:ISO639-3">eng</dc:language>
<dc:rights>Open</dc:rights>
</oai_dc:dc>
</metadata>


With EAD, though, I can map the items appropriately to show on this site, but there is no URL (slug?) at the item (or piece, series or section) level that I could use for this, e.g.:
<c level="item">
<did>
<unittitle encodinganalog="3.1.2">
Plan for a garden for small children to be arranged in a public park
</unittitle>
<unitid encodinganalog="3.1.1" countrycode="GB" repositorycode="249">T-GED/7/5/8A</unitid>
<unitdate normal="1909/1909" encodinganalog="3.1.3">1909</unitdate>
<physdesc encodinganalog="3.1.5">1 plan</physdesc>
</did>
<odd type="publicationStatus">
<p>published</p>
</odd>
<scopecontent encodinganalog="3.3.1">
<p>Scale: 16 inches to 1 foot</p>
</scopecontent>
<accessrestrict encodinganalog="3.4.1">
<p>Open</p>
</accessrestrict>
</c>

Is that something that can be configured/added? It would need to be at section, series, item and piece level, as that appears to be what Dublin Core gives us when we harvest by set (collection).
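The traversal and mapping mentioned here could look something like the following minimal Python sketch. It assumes the harvested EAD 2002 carries no XML namespace (as in the excerpt above) and uses unnumbered `<c>` components; `summarise_components`, `text_of` and the dictionary keys are illustrative names, not part of AtoM or of any harvester.

```python
import xml.etree.ElementTree as ET

# Item-level component copied from the excerpt above (trimmed).
EAD_FRAGMENT = """
<c level="item">
  <did>
    <unittitle encodinganalog="3.1.2">
      Plan for a garden for small children to be arranged in a public park
    </unittitle>
    <unitid encodinganalog="3.1.1" countrycode="GB" repositorycode="249">T-GED/7/5/8A</unitid>
    <unitdate normal="1909/1909" encodinganalog="3.1.3">1909</unitdate>
  </did>
</c>
"""


def text_of(parent, path):
    """Return the stripped text of the first match for path, or None."""
    node = parent.find(path)
    if node is None or node.text is None:
        return None
    return node.text.strip()


def summarise_components(root):
    """Collect level, reference, title and date for every <c>, at any depth."""
    rows = []
    for c in root.iter("c"):  # includes the root itself when it is a <c>
        rows.append({
            "level": c.get("level"),
            "unitid": text_of(c, "did/unitid"),
            "title": text_of(c, "did/unittitle"),
            "date": text_of(c, "did/unitdate"),
        })
    return rows


print(summarise_components(ET.fromstring(EAD_FRAGMENT)))
```

Numbered components (`<c01>`, `<c02>` and so on) or a namespaced EAD would need the tag matching adjusted. Note that nothing in the resulting rows identifies the record's public URL, which is the gap described above.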

Dublin Core almost gives us what we need (everything but digital objects), and it would be great to use it as I wouldn't need to traverse and map the EAD. 
However, ArchivesSpace, which is the application the other repository uses, does not have Dublin Core set up in as usable a way. I have managed to get the relevant info onto the EAD for that material, so it would be nice to process both the same way (and we would get AtoM digital objects using EAD).

Any thoughts very welcome!
Thanks again
Scott


David Juhasz

Aug 29, 2018, 2:13:05 PM
to ica-ato...@googlegroups.com
Hi Scott,

AtoM currently implements the older EAD 2002 standard for its EAD finding aids module and OAI-PMH EAD responses.  I'm not an expert on EAD 2002, but from my exploration of the tag library and the examples provided I can't find any element or attribute that is intended to represent a URL for a component (e.g. <c>, <c01>) of an EAD 2002 finding aid.  The current EAD3 standard includes a component level @base attribute that is explicitly intended to "specify a base URI that is different than the base URI of the EAD instance", but upgrading AtoM's EAD module to support EAD3 is a significant development project that would require community funding or a code contribution to realize.

If you are interested in discussing sponsorship or code contribution to AtoM to implement EAD3, please contact us off list at in...@artefactual.com.  More information is available on the AtoM2 wiki about the AtoM 2 development philosophy and the Bounty model that drives AtoM2 development.


Best regards,
David
--

David Juhasz
Director, AtoM Technical Services Artefactual Systems Inc. www.artefactual.com


--
You received this message because you are subscribed to the Google Groups "AtoM Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ica-atom-user...@googlegroups.com.
To post to this group, send email to ica-ato...@googlegroups.com.
Visit this group at https://groups.google.com/group/ica-atom-users.
To view this discussion on the web visit https://groups.google.com/d/msgid/ica-atom-users/88b8f007-9d88-494b-95d9-173a39ea2534%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Dan Gillean

Aug 30, 2018, 2:51:08 PM
to ICA-AtoM Users
Hi Scott, 

One possibility that has occurred to me: 

We could potentially add an @id attribute to the <unitid> element of the EAD 2002 XML export in AtoM, and use the URL as the value of the <unitid> @id. EAD 2002 describes the id linking attribute as such: 

ID -- An identifier used to name the element so that it can be referred to, or referenced from, somewhere else. Each ID within a document must have a unique value. The ID attribute regularizes the naming of the element and thus facilitates building links between it and other resources.

Since AtoM's URLs are unique per resource, this seems to me like it could work, and it supports the intention of the attribute to facilitate "building links between it and other resources ... so it can be referred to, or referenced from, somewhere else."

You might have to customize your parser to know to scrape that <unitid> @id, but if this seems possible on your end, then I think it would be a small-ish piece of development to add this to our EAD, and ensure it is included in the OAI EAD response. I'd have to do a bit more analysis to confirm that this would still produce valid EAD, but at a glance, it seems so - and it wouldn't require any changes to our import code. 

Regards, 

Dan Gillean, MAS, MLIS
AtoM Program Manager
Artefactual Systems, Inc.
604-527-2056
@accesstomemory


David Juhasz

Aug 30, 2018, 4:36:11 PM
to ica-ato...@googlegroups.com
Hi Dan,

I think the EAD 2002 @id attribute implements the xsd:ID data type. I believe xsd:ID is too strict to represent HTTP URIs because it doesn't allow forward slashes. I may be wrong about HTTP URIs and xsd:ID being incompatible, but I'm pretty sure I've looked into using @id for URIs in the past and it was a no-go.
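The restriction can be checked quickly. xsd:ID values must be NCNames, which cannot contain colons or forward slashes and cannot begin with a digit. The sketch below uses a simplified, ASCII-only pattern (an assumption made for brevity; the real NCName production also admits many non-ASCII characters), and the function name and example values are hypothetical.

```python
import re

# Simplified, ASCII-only approximation of the NCName production that
# xsd:ID is derived from. Non-ASCII name characters are deliberately ignored.
ASCII_NCNAME = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def looks_like_valid_id(value):
    """Return True if value fits the simplified ASCII NCName pattern."""
    return ASCII_NCNAME.fullmatch(value) is not None


# Hypothetical values: a full URL, a bare slug, and a value starting with a digit.
for candidate in ("http://example.org/some-slug", "some-slug", "5-item"):
    print(candidate, looks_like_valid_id(candidate))
```

Under this simplified check a full HTTP URL is rejected because of the colon and slashes, while a bare slug that starts with a letter would pass. Whether a bare slug would be useful to a harvester is a separate question.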

Best regards,
David
--

David Juhasz
Director, AtoM Technical Services Artefactual Systems Inc. www.artefactual.com


janestev...@gmail.com

Sep 4, 2018, 10:44:05 AM
to AtoM Users
Hi there,

In case it is of interest, the Archives Hub (which is harvesting from AtoM as well as getting data from many other places) provides a URI for every item. At collection-level you can enter a URI within the <eadid> but at item level there isn't really anything appropriate so we create a URI from [countrycode][repositorycode]-[reference]. e.g.  for GB 71 THM/407/3/1/1/46 we create https://archiveshub.jisc.ac.uk/data/gb71-thm/407/thm/407/3/1/1/46.  It does not sit within the EAD, but the <unitid> has: 
<unitid countrycode="GB" identifier="THM/407/3/1/1/46" label="current" repositorycode="71">THM/407/3/1/1/46</unitid>

For the example below we would have the unique reference GB249-T-GED/7/5/8A and the URI https://archiveshub.jisc.ac.uk/data/gb249-t-ged/t-ged/7/5/8a

That does mean we are creating the URI within the Archives Hub. But most descriptions will not have them included and this way they are all consistent to our pattern. 
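As a rough illustration, the two example URIs above are consistent with the pattern sketched below. This is inferred only from those two examples and is not the Archives Hub's actual code; `hub_style_uri` and its parameters are hypothetical, and the collection reference is passed in separately because it cannot be derived from the unit reference alone (`THM/407` in one case, `T-GED` in the other).

```python
HUB_BASE = "https://archiveshub.jisc.ac.uk/data"


def hub_style_uri(country_code, repository_code, collection_ref, unit_ref):
    """Build a URI shaped like the two examples quoted in the message above."""
    # Collection part: country code + repository code + "-" + collection reference.
    collection_id = f"{country_code}{repository_code}-{collection_ref}".lower()
    # Unit part: the full reference of the unit, lower-cased.
    return f"{HUB_BASE}/{collection_id}/{unit_ref.lower()}"


print(hub_style_uri("GB", "71", "THM/407", "THM/407/3/1/1/46"))
print(hub_style_uri("GB", "249", "T-GED", "T-GED/7/5/8A"))
```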

cheers,
Jane 

Dan Gillean

Sep 5, 2018, 11:31:48 AM
to ICA-AtoM Users
Thanks Jane and David for the further insights!

Jane, in AtoM there are multiple user-configurable settings for how a description's slug (the unique part of the URL) might be generated. Reference code is one of these options, but so is identifier, title, and ref code excluding the country and repository code as a prefix (for disambiguation in international contexts). Because of this, we can't assume the URI of an export's source based on the reference code alone.

However, your comment led me to notice that the <eadid> element does include a @url attribute. I've checked that we do include the URL in this attribute when exporting EAD, but I haven't yet confirmed as to whether or not this element is included in the OAI response when serving up EAD XML. I'm on a borrowed computer at the moment as I await a new power cord, but I will try to look into this further when I have access to my local test environment again.
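For a harvester, reading that attribute could be as simple as the sketch below. The sample header, its URL and the `source_url` helper are invented for illustration, and the code assumes an EAD 2002 document without an XML namespace. Because `<eadid>` sits in the header, this yields one URL for the finding aid as a whole rather than one per component.

```python
import xml.etree.ElementTree as ET

# Hypothetical header: the identifier text and URL are made up for this example.
EAD_HEADER = """
<ead>
  <eadheader>
    <eadid countrycode="GB" url="http://example.org/example-fonds">example-fonds</eadid>
  </eadheader>
</ead>
"""


def source_url(ead_root):
    """Return the @url of <eadid>, or None if the element or attribute is absent."""
    eadid = ead_root.find("eadheader/eadid")
    return None if eadid is None else eadid.get("url")


print(source_url(ET.fromstring(EAD_HEADER)))
```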

Regards,

Dan Gillean, MAS, MLIS
AtoM Program Manager
Artefactual Systems, Inc.
604-527-2056
@accesstomemory


Scott Renton

Sep 5, 2018, 11:41:55 AM
to ica-ato...@googlegroups.com
Folks, thanks for all your responses. I'm looking at this again tomorrow, so will have more to say then!

Jane Stevenson

Sep 6, 2018, 9:21:05 AM
to ica-ato...@googlegroups.com
Hi Dan,

Yes, understood. I suppose the thing is that for the Hub we create URIs - we’ve made the decision how to create them and we stick to that in order that they are consistent. So, even if a contributor had a “URI” field with something in it (maybe an ARK), we would keep that in the data, but wouldn’t use it. We didn’t end up using the <eadid> for a URI because it only exists at collection level and we wanted persistent identifiers at all levels. And anyway, with all the exports we get the <eadid> is probably the most messed up element, so we actually discard what we get and create our own entry.

…that’s probably a bit off topic though…

cheers
Jane

Dan Gillean

Sep 6, 2018, 3:32:30 PM
to ICA-AtoM Users
Hi Jane, 

Actually, that's very helpful, but also opens another question.

Scott, if ArchivesHub is going to ignore any URI anyway in favor of building a new URI based on the identifier/reference code, then do you in fact need the URL in your OAI EAD response to be able to send the data to the Hub? I'm also wondering whether it is possible simply to harvest at the top level, since EAD XML is meant to capture an entire hierarchy, and any individual descriptions post-import in ArchivesHub would not use the same URI as where they originated. Perhaps this is about ArchivesHub being able to point back to source content? Trying to better understand your use case :)

Additionally, I believe that the DC XML response *should* include digital object paths - you can see an example of this in our documentation, in the second example response for the GetRecord request, here: https://www.accesstomemory.org/docs/latest/user-manual/import-export/oai-pmh/#get-record
Let me know if you are not seeing this in responses where it is expected, and I can do some testing to see if you've encountered a bug. 

Cheers, 

Dan Gillean, MAS, MLIS
AtoM Program Manager
Artefactual Systems, Inc.
604-527-2056
@accesstomemory


Jane Stevenson

Sep 10, 2018, 4:17:17 AM
to ica-ato...@googlegroups.com
Hi all,

Yes, so we do create our own ‘persistent identifiers’ from the country code, repository code and reference. We don’t use one that is within a description - otherwise we’d have a motley selection of identifiers, and anyway, most descriptions don’t come with any kind of identifier.

You can see on the Hub we’ve just added a ‘Cite’ facility: from https://archiveshub.jisc.ac.uk/data/gb986-jenp you click on ‘Cite’ and we include the full reference, which users can use in a citation. We also give them the citation for the Hub page itself using our URI.

We also have local sites for HE contributors, and we use the same kind of pattern for those, e.g. for Brunel they have a URI for a description: https://archiveshub.jisc.ac.uk/bruneluniversity/data/gb1975-ctun

If a contributor has an identifier for their description, we would hope that would be in the description, usually as a link in the ‘additional finding aids’ field, so that we use it to link to their website. However, we can also attempt to deal with ‘alternative identifiers’, such as with the British Library: https://archiveshub.jisc.ac.uk/data/gb58-addms89010 - so this has the EAD:

<unitid countrycode="GB" encodinganalog="3.1.1" identifier="Add MS 89010" label="current" repositorycode="58">Add MS 89010</unitid>
<unitid identifier="ark:/81055/vdc_100000000094.0x0000c1" label="alternative" type="ark">ark:/81055/vdc_100000000094.0x0000c1</unitid>
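A harvester that wants to tell the current reference from alternative identifiers could key on the `label` attribute, roughly as in this sketch. The enclosing `<did>` wrapper is added here only so the two elements parse as one document, and `unitids_by_label` is an illustrative name; EAD without a namespace is assumed.

```python
import xml.etree.ElementTree as ET

# The two <unitid> elements from the message above, wrapped in a <did>
# (the wrapper is an assumption made so the fragment is well-formed).
DID = """<did>
<unitid countrycode="GB" encodinganalog="3.1.1" identifier="Add MS 89010" label="current" repositorycode="58">Add MS 89010</unitid>
<unitid identifier="ark:/81055/vdc_100000000094.0x0000c1" label="alternative" type="ark">ark:/81055/vdc_100000000094.0x0000c1</unitid>
</did>"""


def unitids_by_label(did):
    """Map each <unitid> label attribute to the element's text."""
    return {u.get("label"): (u.text or "").strip() for u in did.findall("unitid")}


print(unitids_by_label(ET.fromstring(DID)))
```

If several `<unitid>` elements shared a label, the last one would win in this dictionary, so a real harvester might prefer a list per label.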

We would always encourage contributors to have a link to their own description, at any level of description, but we do get some instances of these going to 404s over time.

cheers
Jane




Scott Renton

Sep 28, 2018, 8:15:47 AM
to AtoM Users
Hi everyone, a quick update on where I've got to on all this. 

I've got control over Edinburgh's ArchivesSpace (but not over Strathclyde's AtoM, although I could ask), and my colleague, Grant Buttars, has had a look at what counts as valid EAD. We have added (eg):
<c id="xxxx" level="item">
    <did>
        <unittitle>XXXXX</unittitle>
        <unitid>XXXXX<extptr href="19019"/></unitid>
        <origination....

At each level, and that gives us what we need. I am now successfully pushing back from the front-end site to ArchivesSpace.
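On the harvesting side, picking up that pointer might look like the sketch below. It uses a well-formed, completed version of the fragment above (the placeholder values are kept, the elided `<origination>` is left out), assumes EAD without a namespace, and `component_pointers` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Well-formed stand-in for the fragment above; XXXXX and 19019 are the
# placeholder values from the message, not real data.
EAD = """
<c id="xxxx" level="item">
  <did>
    <unittitle>XXXXX</unittitle>
    <unitid>XXXXX<extptr href="19019"/></unitid>
  </did>
</c>
"""


def component_pointers(root):
    """Map each component's unitid text to the href of its <extptr>, if any."""
    pointers = {}
    for c in root.iter("c"):
        unitid = c.find("did/unitid")
        if unitid is None:
            continue
        extptr = unitid.find("extptr")
        if extptr is not None:
            # unitid.text is the text before the <extptr> child element.
            pointers[(unitid.text or "").strip()] = extptr.get("href")
    return pointers


print(component_pointers(ET.fromstring(EAD)))
```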

Do you think a similar change would be achievable on AtoM? Understand if not. I can achieve most of what I need with Dublin Core, but I would like to use EAD to get at Digital Objects, which are showing on EAD. I thought I could build up a list of slugs (for pushing back to AtoM) from the DC, and then build up a reference to those when harvesting the EAD. Unfortunately, there doesn't seem to be anything unambiguous I can match on across the two formats: 
  • DC has <identifier>oai:atom.lib.strath.ac.uk:ESU_2413</identifier>, which doesn't show in the EAD. 
  • EAD has <unitid encodinganalog="3.1.1" countrycode="GB" repositorycode="249">T-GED/7/5/30/16</unitid>, which doesn't show in the DC!
So, a change would be needed to allow that. Is there a customisation to one of the outputs that we could apply without having to trouble you for a release?

I think I can proceed with everything but digital objects without a change, and possibly slightly more detailed metadata, though, so it's not as if I'm stuck.

Cheers
Scott

*We know that extptr will be deprecated in EAD 3, but it works for us just now!

Scott Renton

Sep 28, 2018, 9:02:13 AM
to AtoM Users
Apologies as well folks, I've been quite selective in that reply. In the first instance, we're only interested in harvesting to our joint website with Strathclyde, but we should be doing things in a unified way so we can go to the Hub (I know Grant is keen to re-engage). So, the below works for this case, and we'd be aware that it would be replaced by a URI by the hub. Would that strengthen or weaken the case for using an extptr?!

I also forgot to mention that I did check for digital objects coming out on DC, and I'm definitely not seeing an equivalent to (eg) <dao linktype="simple" href="http://strathclyde.ica-atom.org/uploads/r/university-of-strathclyde-archives-united-kingdom/2/9/2986/T_GED_7_5_30_16_atom.jpg" role="master" actuate="onrequest" show="embed"/> in  the DC record that points to http://strathclyde.ica-atom.org/sections-and-elevations-of-proposed-steps-at-kings-wall-garden-by-norah-geddes.

Thanks again
Scott

Jane Stevenson

Oct 1, 2018, 2:58:42 AM
to ica-ato...@googlegroups.com
Hi Scott,

In terms of the Archives Hub linking the user back to the original archival repository to view the item, we generally do this only if the provider of the EAD has provided a link to their finding aid, and this usually happens through the <otherfindaid> tag, i.e. a specific link is given:

https://archiveshub.jisc.ac.uk/data/gb261-nationalmeteorologicalarchive links to National Meteorological Archive catalogue. So, very straightforward (although 404s are common because things move!).

But we have very few examples of links from individual units. Contributors don’t add links at this level, and we don’t do what you are doing - create links ourselves. With 330 contributors that would not be feasible.

In order for links to exist at this level, they would have to be provided. But with the number of 404s that we already get with digital objects, I would imagine there would be similar risks with links to component metadata.

Something like your example might provide the means to do this:

> <c id="xxxx" level="item">
> <did>
> <unittitle>XXXXX</unittitle>
> <unitid>XXXXX<extptr href="19019"/></unitid>
> <origination….

However, we’ve never used <extptr> and we’re not likely to introduce it, as it is made obsolete in EAD version 3, and whilst we are not moving to EAD3 any time soon, we do have an eye on tags that are not going to work when we do move to using it. We have generally used <archref> for links like this, though I must say that knowing which EAD tag to use for what is not always clear!

cheers,
Jane






Dan Gillean

Oct 1, 2018, 1:57:35 PM
to ICA-AtoM Users
Hi Scott, 

For reasons similar to Jane's, I think we would be hesitant to add the <extptr> element to AtoM as a public feature - and it would require development sponsorship to do so in any case. 

I've also just tested in both my local 2.4.1 development environment, and in the 2.4.0 demo site, and I can see digital object information in the DC XML output when an object is attached to the record where a GetRecord request is made. For example, here is the oai-dc GetRecord URL for the Clara Bernhardt fonds in our demo site: 
Note that our demo site resets every hour, so if you want to follow this link, you will likely have to log into the demo site, navigate to Admin > Plugins, and re-enable the arOAIPlugin. 

In the demo site, there is a digital object attached at the fonds level. The OAI response includes the following: 

[Attached image: oai-getRecord-clara.png]

This is not a <dao> link of course - because this is an EAD XML tag that is not available in DC. Nevertheless, I do see a link to both the master digital object, and the thumbnail, as outlined in our documentation. Because this is an externally linked digital object (rather than one that is locally uploaded), the master points out to the original source of the image. 

Remember that DC XML returns individual responses per record  - so if you are looking for a digital object attached to an image that is part of a collection, then the OAI response for the collection will not include the object link - only the item-level record will. 

Regards, 

Dan Gillean, MAS, MLIS
AtoM Program Manager
Artefactual Systems, Inc.
604-527-2056
@accesstomemory

Scott Renton

Oct 2, 2018, 7:21:43 AM
to AtoM Users
Thanks very much, Jane and Dan. I think I will proceed with EAD for the ArchivesSpace material and DC for AtoM as stated, with the knowledge that the extptr is not future-proof (I don't mind ArchivesSpace things because I can make changes there).

Is it possible that it's only returning the "link href" on a GetRecord for DC, Dan? I'm running ListRecords as it's obviously far easier. Maybe there's something wrong with my test record here:
when I export as Dublin Core XML from there (http://strathclyde.ica-atom.org/downloads/exports/dc/769675d7c11f336ae6573e7e533570ec.dc.xml) I get this (try it!):

<dc:title>Sections and elevations of proposed steps at King's Wall Garden, by Norah Geddes</dc:title>
<dc:creator>Mears, Norah, 1887-1967, née Geddes</dc:creator>
<dc:date>No date</dc:date>
<dc:type>image</dc:type>
<dc:format>image/jpeg</dc:format>
<dc:format>1 sketch</dc:format>
<dc:identifier>16</dc:identifier>
<dc:language xsi:type="dcterms:ISO639-3">eng</dc:language>
<dc:rights>Open</dc:rights>
</oai_dc:dc>

No link href there, but as I say maybe there is something wrong with the record as catalogued.

The other thing that would make us comfortable with DC would be the export of "Reference code: GB 249 T-GED/7/5/30/16" as per that record, as it tells us a bit more than the granular "16" that goes into the dc:identifier on the DC record.

Thanks very much for looking at this folks.

Cheers
Scott

Dan Gillean

Oct 3, 2018, 5:21:51 PM
to ICA-AtoM Users
Hi Scott, 

Ahhh, I see - sorry for the confusion! Yes, the digital object URL is only returned for GetRecord requests currently, not in the ListRecords response. 
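For reference, the two kinds of request differ only in their OAI-PMH arguments. The sketch below builds both URLs with the standard library; the base URL is a placeholder (the endpoint path is an assumption and should be checked against the repository being harvested), `oai_url` is a hypothetical helper, and the record identifier is the one quoted earlier in the thread.

```python
from urllib.parse import urlencode

# Placeholder endpoint: substitute the real OAI-PMH base URL of the repository.
BASE_URL = "http://atom.example.org/;oai"


def oai_url(verb, **arguments):
    """Return a request URL for the given OAI-PMH verb and arguments."""
    return f"{BASE_URL}?{urlencode({'verb': verb, **arguments})}"


# One record at a time, addressed by its OAI identifier.
get_record = oai_url(
    "GetRecord",
    identifier="oai:atom.lib.strath.ac.uk:ESU_2413",
    metadataPrefix="oai_dc",
)

# Many records per response, paged by the server.
list_records = oai_url("ListRecords", metadataPrefix="oai_dc")

print(get_record)
print(list_records)
```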

Regarding the reference code vs. identifier: With the EAD XML export, what is exported depends on the Inherit Reference code settings in Admin > Settings > Global. If you have this setting turned on, then on export, the EAD <unitid> element will include the full reference code at all levels. If you have this setting turned off, then the <unitid> will only contain the current level's identifier. 
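The effect of that setting can be pictured with a small sketch. This is an illustration of the behaviour described above, not AtoM's implementation: `unitid_value` is a hypothetical function, and both the "/" separator and the per-level identifiers are assumptions read off the reference code `T-GED/7/5/30/16` and the identifier `16` seen earlier in the thread.

```python
def unitid_value(identifiers, inherit, separator="/"):
    """Return what <unitid> would hold for the lowest level in identifiers.

    identifiers lists each level's own identifier, from the top-level
    description down to the current one.
    """
    if inherit:
        # Setting on: the full reference code built from every ancestor.
        return separator.join(identifiers)
    # Setting off: only the current level's identifier.
    return identifiers[-1]


levels = ["T-GED", "7", "5", "30", "16"]
print(unitid_value(levels, inherit=True))
print(unitid_value(levels, inherit=False))
```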

We could certainly implement the same logic for the DC export (and correspondingly, the DC OAI response). Doing so would require development, however - meaning we would need community sponsorship or code contributions to be able to implement this. 

Cheers, 

Dan Gillean, MAS, MLIS
AtoM Program Manager
Artefactual Systems, Inc.
604-527-2056
@accesstomemory

dls...@ed.ac.uk

Oct 4, 2018, 7:43:01 AM
to AtoM Users
Hi Dan,

No, I am sorry! I have now discovered that I can get at digital objects through ListRecords. As on a GetRecord, they seem to be listed in <record><about><feed><entry><link>, so I'm pretty happy I can reach them. It was hard to spot because they're the exception and only getting 100 records at a time + resumptionToken from ListRecords meant I needed to code something to check!
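A check of that kind could be sketched as follows. The loop follows the OAI-PMH paging rule (follow-up requests carry only the verb and the resumptionToken, and an empty token ends the list) and gathers any `href` found on `<link>` elements under `<about>`. Element names are matched by local name because the namespaces of the real responses are not shown in this thread; `harvest_object_links`, `fake_fetch` and the sample pages are all hypothetical.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip any '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def harvest_object_links(fetch, metadata_prefix="oai_dc"):
    """Page through ListRecords and collect link hrefs found under <about>.

    fetch is any callable that takes a dict of OAI-PMH arguments and
    returns the response body as a string.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    links = []
    while True:
        root = ET.fromstring(fetch(params))
        token = ""
        for element in root.iter():
            name = local_name(element.tag)
            if name == "about":
                for child in element.iter():
                    if local_name(child.tag) == "link" and child.get("href"):
                        links.append(child.get("href"))
            elif name == "resumptionToken":
                token = (element.text or "").strip()
        if not token:
            return links
        # Follow-up requests carry only the verb and the token.
        params = {"verb": "ListRecords", "resumptionToken": token}


# Two made-up pages standing in for real responses.
PAGES = {
    None: """<OAI-PMH><ListRecords>
      <record><about><feed><entry><link href="http://example.org/a.jpg"/></entry></feed></about></record>
      <record><metadata/></record>
      <resumptionToken>page-2</resumptionToken>
    </ListRecords></OAI-PMH>""",
    "page-2": """<OAI-PMH><ListRecords>
      <record><about><feed><entry><link href="http://example.org/b.jpg"/></entry></feed></about></record>
      <resumptionToken/>
    </ListRecords></OAI-PMH>""",
}


def fake_fetch(params):
    """Return the canned page for the requested resumptionToken (None = first page)."""
    return PAGES[params.get("resumptionToken")]


print(harvest_object_links(fake_fetch))
```

In a real harvest, `fetch` would issue the HTTP request; keeping it as a parameter makes the paging logic easy to test offline.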

As I'm doing DC, then I don't think in the first instance I'd need to ask for that development change around the full identifier- it was just if I needed to get some material from DC and some from EAD. I think I can push on with my work and will get back to you if I hit any other queries.

Thanks yet again for your help
Scott