Download our full-size images via OAI-PMH XML API


paul.c...@millsarchive.org

Jun 5, 2019, 6:03:53 AM
to AtoM Users
Hello.

I am using the OAI-PMH 2.0 XML API documented here:
to connect to our archive here:

We will need to connect periodically and download the latest full-size images so that we can offer them for sale as downloads from our online shop. (This will be a new project: the current shop does not automatically deliver purchased images and requires one of our users to e-mail them manually to the purchaser.)

Take this record for example:
Its unique ID (obtained via ListRecords) is:
oai:catalogue.millsarchive.org:millsarchiveatom_83475

Note that there is a small/reduced image visible to the public:

If we are logged in to AtoM in the browser, we can also click that small image to see the full-size one (if not logged in, we get a 404 error):

From that same Web page we can export Dublin Core 1.1 XML:
but this format does not give us any image URI at all.
We can also export EAD:
which gives us the URI of the full-size image as desired.

However, I have two problems:

1. When I try to retrieve that EAD XML using the API, via GetRecord for ID "oai:catalogue.millsarchive.org:millsarchiveatom_83475", I do not get the XML back as expected. Instead I get the "cannotDisseminateFormat" error. Why is this?

2. Assuming I manage to resolve point 1, how can I automatically download the full-size image without getting a 404? Bear in mind this is not a Web browser where we can log in with a cookie session etc. but just a simple program calling the API. Is there a way to download the jpg perhaps by sending special authentication info in the HTTP headers or similar?

Thanks,
Paul Collins.

Dan Gillean

Jun 11, 2019, 10:41:40 AM
to ICA-AtoM Users
Hi Paul, 

Regarding your first question: 

EAD XML documents contain the entire descriptive hierarchy rather than a single record, so they tend to be too big to generate and serve on demand via the browser. To avoid this and ensure delivery, AtoM has an option to generate XML documents and cache them in advance. Cached documents can be served without needing to be generated first, which generally avoids the web browser timeout limit. This is described in our documentation here: 
You'll find more information on enabling this in the Settings here: 
There is also a command-line task that a system administrator can run to pre-generate and cache XML for all existing descriptions. See:
As the documentation notes: 

We strongly recommend users enable this setting and run the command-line task if you wish to make EAD 2002 XML available to harvesters via the OAI Repository module. If you do not, then AtoM will return a cannotDisseminateFormat error code to attempts by harvesters to request oai_ead.
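
To check from a harvesting script whether the cached EAD is being served yet, something like the following rough sketch might help. It builds a standard OAI-PMH GetRecord request for the oai_ead prefix and pulls the error code (if any) out of the response. The base URL below is a placeholder and the function names are my own for illustration; substitute the OAI endpoint of your own site.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Union
from urllib.parse import urlencode

# Namespace used by all OAI-PMH 2.0 response elements.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def build_get_record_url(base_url: str, identifier: str,
                         metadata_prefix: str = "oai_ead") -> str:
    """Return a GetRecord request URL for one record (hypothetical helper)."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


def oai_error_code(response_xml: Union[str, bytes]) -> Optional[str]:
    """Return the OAI-PMH error code in a response, or None if there is none."""
    root = ET.fromstring(response_xml)
    error = root.find(f"{OAI_NS}error")
    return None if error is None else error.get("code")


if __name__ == "__main__":
    # Placeholder endpoint: an assumption, not taken from this thread.
    url = build_get_record_url(
        "https://example.org/;oai",
        "oai:catalogue.millsarchive.org:millsarchiveatom_83475",
    )
    print(url)
```

Fetch the printed URL with whatever HTTP client you prefer and pass the response body to oai_error_code(). If it returns "cannotDisseminateFormat", the cached EAD has most likely not been generated yet.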
 
Regarding the second: 

I'm not actually sure about passing credentials via headers - I would be somewhat concerned about the security of doing so, and recommend you do some testing to see if using the API key might get you what you need. However, I wanted to point out that in most cases, the master digital object will be available at the same URL as that shared in the OAI response, minus the _142 or _141 found at the end of the URL, which specifies the derivatives. The following previous thread has further examples (and exceptions):
You might also find this older post of interest, where I explain how AtoM currently stores digital objects: 
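
The suffix rule mentioned above (drop the _141 or _142 from the end of the derivative URL) could be sketched roughly like this. The function name and the example URL are made up for illustration. The sketch also assumes the master has the same file extension as the derivative, which is one of the cases where exceptions come up, so treat the result as a guess to be tested rather than a guaranteed path.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Matches a trailing "_141" or "_142" that sits just before the file
# extension (or at the very end of the path if there is no extension).
_DERIVATIVE_SUFFIX = re.compile(r"_(?:141|142)(?=\.[^./]+$|$)")


def guess_master_url(derivative_url: str) -> str:
    """Strip the derivative suffix from the URL path (hypothetical helper).

    Only the path is touched; scheme, host and query string are kept as-is.
    URLs without a derivative suffix are returned unchanged.
    """
    parts = urlsplit(derivative_url)
    path = _DERIVATIVE_SUFFIX.sub("", parts.path, count=1)
    return urlunsplit(parts._replace(path=path))


if __name__ == "__main__":
    # Made-up example URL, not a real path on any AtoM site.
    print(guess_master_url("https://example.org/uploads/r/a/b/c/photo_142.jpg"))
```

Requesting the resulting URL will still be subject to whatever access restrictions apply to the master object, so this only addresses finding the path, not the 404 for anonymous requests.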
As a more long-term alternative: 

You could potentially store your images on a separate server or in a DAM of some kind that will provide you with the access you need, rather than uploading them to AtoM directly - so long as your DAM or server can provide you with a valid HTTP/S path that ends in the extension, then when you want to link them to AtoM, you can use the URI option. 

However, I'm sure that rethinking how you've been uploading all of your digital objects is not a preferred solution. I will try to see if our developers have any thoughts on getting access to the master object and passing credentials. 

Cheers, 

Dan Gillean, MAS, MLIS
AtoM Program Manager
Artefactual Systems, Inc.
604-527-2056
@accesstomemory

