OAI-PMH: how to determine the identifier

Paul C

Nov 9, 2018, 5:48:18 AM
to AtoM Users
I am experimenting with the OAI-PMH API. If I use ListRecords to get a list of records, I can then successfully use identifiers found in that list to get further details via GetRecord, e.g.


The question I have is: how can I determine the correct identifier for an item that I located through the Web interface? For example:


Based on the XML (right-hand link) it seems that the identifier would be "cromer" or "HOWE-HER-13", but these do not work in a GetRecord URI (I get the idDoesNotExist error).
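For reference, a minimal sketch of the two requests described above, written in Python. The base URL, the `;oai` endpoint path, and the sample identifier are all assumptions made for illustration; substitute the values from your own site and from an actual ListIdentifiers or ListRecords response.

```python
from urllib.parse import urlencode

# Hypothetical base URL. The ";oai" endpoint path is an assumption here;
# check your own AtoM installation for the real one.
BASE_URL = "https://atom.example.org/;oai"


def oai_request_url(verb, base_url=BASE_URL, **params):
    """Build an OAI-PMH request URL for the given verb and arguments."""
    query = {"verb": verb, **params}
    return f"{base_url}?{urlencode(query)}"


# Step 1: list the identifiers the repository exposes.
list_url = oai_request_url("ListIdentifiers", metadataPrefix="oai_dc")

# Step 2: request one record, using an identifier copied from step 1.
# The identifier below is a made-up placeholder, not a real record.
get_url = oai_request_url(
    "GetRecord",
    metadataPrefix="oai_dc",
    identifier="oai:atom.example.org:repo_12345",
)

print(list_url)
print(get_url)
```

Note that `urlencode` percent-encodes the colons in the identifier, which is what the OAI-PMH protocol expects for reserved characters in request arguments.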

Thanks,
Paul C.

Dan Gillean

Nov 9, 2018, 11:14:19 AM
to ica-ato...@googlegroups.com
Hi Paul, 

The short answer: Unfortunately, I don't think it's possible to determine the OAI identifier via the user interface. As far as I know, using the OAI module (with a ListRecords or ListIdentifiers request) is the only way at present to get the identifiers. It may be possible to craft a SQL query that would output a list of description slugs, identifiers, and OAI identifiers for reference, but I'm not sure. 

Now some thoughts on possible long-term (development required) solutions: 



I wasn't around when the OAI Repository functionality was first implemented in AtoM, so I'm not fully aware of the rationale behind the current implementation, but I will try to get more information from our team. I'm also trying to dig a bit further into the standard to determine what is possible and what's not when it comes to OAI identifiers. Specifically: 
As far as I can tell from a cursory glance through the above, I don't see why the sanitized slug of the resource couldn't be used for the local-identifier part of the OAI identifier. There are reserved characters that must be escaped in OAI identifiers (";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" | "$" | ","), but we sanitize those in the current slug settings anyway. So it could be possible to change AtoM to support an OAI identifier scheme that is easier to determine. However, several other factors would need to be carefully considered as well. 
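To make the reserved-character point concrete, here is a small Python sketch that checks a candidate local identifier against the character list quoted above and percent-escapes any matches. The function names are hypothetical and this is not how AtoM handles identifiers; it only illustrates the rule.

```python
# Reserved characters for OAI identifiers, as listed in this thread.
OAI_RESERVED = set(";/?:@&=+$,")


def reserved_chars_in(local_id):
    """Return the reserved characters found in local_id, sorted."""
    return sorted(set(local_id) & OAI_RESERVED)


def escape_reserved(local_id):
    """Percent-escape only the reserved characters listed above."""
    return "".join(
        f"%{ord(ch):02X}" if ch in OAI_RESERVED else ch
        for ch in local_id
    )


print(reserved_chars_in("cromer"))
print(reserved_chars_in("a/b:c"))
print(escape_reserved("a/b:c"))
```

A slug that has already been sanitized to letters, digits, and dashes passes through unchanged, which is why the current slug settings would be compatible with this scheme.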

First, in AtoM 2.5, there will be a more permissive slug configuration option that a) is case-sensitive (so my-slug, My-slug, and My-Slug would point to 3 different resources), and b) allows any unescaped special character supported by RFC 2396. In the AtoM implementation, only restricted characters ( / ? # { } ) (some of which are allowed in URIs but tend to mean other things, like slashes and pound/hash symbols) and literal spaces will be replaced with dashes '-', as we currently do. You can follow the development of that feature on ticket #11761. The OAI identifier requirements restrict more characters than the new permissive slug option does, so if we used the slug and a user had the permissive setting turned on, we'd have to fall back to the sanitized version for the OAI identifier, and it wouldn't always be easy to know the exact OAI identifier just by looking at the slug. 
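A rough Python sketch of that mismatch follows. It assumes the parentheses in the list above only delimit the restricted characters, and it is an illustration rather than AtoM's actual slug code; both function names are hypothetical.

```python
import re

# Characters the permissive slug option is described as replacing with
# dashes, plus literal spaces. Assumption: the surrounding parentheses in
# the post are delimiters, not part of the restricted set.
SLUG_RESTRICTED = re.compile(r"[/?#{} ]")

# Reserved characters for OAI identifiers, as listed earlier in the thread.
OAI_RESERVED = set(";/?:@&=+$,")


def permissive_slug(text):
    """Replace only the restricted characters and spaces with dashes."""
    return SLUG_RESTRICTED.sub("-", text)


def is_safe_oai_local_id(slug):
    """True if the slug contains none of the OAI reserved characters."""
    return not (set(slug) & OAI_RESERVED)


slug = permissive_slug("Reports:2018,v2")
print(slug, is_safe_oai_local_id(slug))
```

Here the colon and comma survive the permissive slug rule but are reserved in OAI identifiers, so the slug could not be reused as-is.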

There is also the question of persistence. OAI identifiers are meant to be persistent, but in AtoM there are a couple of ways to change the slug of a resource. Users can edit individual description slugs directly in the user interface using the Rename module, and an administrator could change the permalink settings (to use the reference code instead of title for slugs, for example), and then use the generate slugs task with the --delete option to regenerate new slugs for every description in AtoM based on the settings change. In that case, should the OAI identifier change, or remain the same?

Another approach, which could be used either instead of or in conjunction with an overhaul of how OAI identifiers are generated, would be to display the OAI identifier in the user interface somewhere. Options might include: 
  1. In the Admin area of a description's edit page
  2. In the Admin area of a description's view page
  3. In the search/browse result stubs
  4. In a CSV export column
1 is the least obtrusive - we are using a similar method for showing the "source name" from CSV imports that is saved in the keymap table (information that is really only needed and useful for admins trying to do imports to update existing and previously imported records). However, it's also the least accessible - you'd have to go to each description and enter edit mode to figure out the OAI identifier. 

2 is slightly more useful, but still requires visiting each description individually. 3 becomes more useful... but really only to site admins and expert users who might be using their own harvesting tools (likely a very small percentage of your public users... if any), and so you risk adding more confusing and unnecessary information to the public catalogue. We could try 2 or 3 but make it visible only to logged-in users - but again, without overhauling AtoM's permissions module to make it more scalable, there could be long-term unintended performance and scalability consequences for adding yet another check that must be performed before a page element is loaded. 

4 might be the best option, and it might even be possible to configure the column so it is only included in exports performed by authenticated (i.e. logged-in) users if desired, to keep information that will be unnecessary and possibly confusing to most researchers out of the default public export... but this will still require an admin to find and add any records to the clipboard, and then export them, when they want to determine the OAI identifiers. Since this would not be an everyday request, and since there is an option to include descendants in the export of Clipboard descriptions, this seems like an acceptable compromise to me... but I'm curious what your thoughts are in terms of meeting your use case and needs, were something like this implemented in a future AtoM release. 

Finally, if the SQL query I mentioned above to get the OAI identifiers is possible, then it's also possible to imagine creating a special CSV export that an admin could trigger from the OAI settings page to export a CSV of all descriptions with only relevant reference columns, like title, identifier, referenceCode, and a new oaiIdentifier column. Ideally such an export would also include a slug/permalink column as well. 

Do you have other ideas or thoughts on how we could improve access to the OAI identifiers?

In any case, all of the above options would require further analysis, and careful consideration around the repercussions for existing OAI users if changes are introduced. Ideally new settings would be added so existing behaviors could be preserved but users wishing to make changes would have the option to enable a setting. These ideas would also require community sponsorship - either in the form of community code contributions that follow the recommendations in our Development resources, or as development sponsorship. To learn more about how we develop and maintain AtoM, please see: 
In the meantime, if I learn anything of note about why and how the current OAI identifiers are generated, I'll post an update on this thread. 

Regards, 

Dan Gillean, MAS, MLIS
AtoM Program Manager
Artefactual Systems, Inc.
604-527-2056
@accesstomemory



paul.c...@millsarchive.org

Nov 12, 2018, 11:49:15 PM
to AtoM Users
Dan, thanks for your detailed expert response!

I actually assumed I was missing something, and expected a reply along the lines of "oh, you just have to add XYZ metadata prefix and then it will work". It does surprise me that a "unique identifier" isn't the one single thing you can rely on to appear alongside an item in every presentation format! But I see there is some history behind this.

Anyway, this was just something that would be very convenient for me to test with; I doubt we would actually need it in production.

Tim Hutchinson

Nov 21, 2018, 1:03:57 PM
to AtoM Users
Hi Paul,

To add to Dan's very comprehensive response, I wanted to point out that it's easy enough to look up the OAI identifier if you have access to the MySQL backend.
- In the slug table, look up the object_id based on the slug (i.e. unique part of the URL for a description)
- In the information_object table, search for the object_id you just found (under id); you'll see oai_local_identifier as the third field

And then you can check the syntax for the identifier under settings | OAI repository (which will be based on the OAI repository code you set).

If you don't have direct access to the MySQL database, here is a query you could request to have run (or run yourself to save the extra lookup step every time).

SELECT slug.slug, information_object.id, information_object.oai_local_identifier
FROM information_object
INNER JOIN slug ON information_object.id = slug.object_id;
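If you want to look up a single slug rather than list everything, the same join can be filtered. The Python sketch below uses an in-memory SQLite database as a stand-in for the MySQL backend so that it runs on its own; the table and column names come from the query above, while the sample row and the helper name are made up for illustration.

```python
import sqlite3

# In-memory SQLite stand-in for AtoM's MySQL database (assumption: only the
# columns mentioned in this thread are modelled; the real tables have more).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE information_object (
        id INTEGER PRIMARY KEY,
        oai_local_identifier INTEGER
    );
    CREATE TABLE slug (object_id INTEGER, slug TEXT);
    -- Hypothetical sample row, not real data.
    INSERT INTO information_object VALUES (501, 10001);
    INSERT INTO slug VALUES (501, 'cromer');
    """
)


def oai_local_identifier_for(conn, slug):
    """Return the oai_local_identifier for a slug, or None if not found."""
    row = conn.execute(
        "SELECT information_object.oai_local_identifier "
        "FROM information_object "
        "INNER JOIN slug ON information_object.id = slug.object_id "
        "WHERE slug.slug = ?",
        (slug,),
    ).fetchone()
    return row[0] if row else None


print(oai_local_identifier_for(conn, "cromer"))
```

The value returned is only the local part; the full identifier still has to be assembled using the syntax shown under settings | OAI repository.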

Tim