Issues with harvesting files from DSpace repository via OAI-PMH


Carolyn Sullivan

Mar 18, 2026, 8:47:50 AM
to DSpace Technical Support
Hello all,

One of the groups that harvests our theses (Library and Archives Canada) is running into errors: some of our materials show 'Empty folder' when they try to access them. The problem is that the content they're trying to access is definitely available.


The thesis is available. The only odd thing about it is that when I open the download link given in the OAI-PMH record, the thesis PDF displays in my browser and the URL suffix changes from 'download' to 'content'; more recent thesis items don't do that. Example:

and looking at the record, the identifier given for the actual pdf is: 


which is changed to: 


So...
(1) Why do some theses in the collection display in browser when I check the download link given in the OAI-PMH record, and others automatically download?
(2) Does this possibly have something to do with the errors we're getting with harvests?  If not, any idea what could be the cause?

Thanks,
Carolyn.

DSpace Technical Support

Mar 19, 2026, 4:43:35 PM
to DSpace Technical Support
Hi Carolyn!

(1) I believe downloads are working as expected; the difference between opening in the browser and downloading automatically is probably the size of the PDFs. A setting called the "Content Inline Disposition Threshold" sets the file size above which the browser goes straight to automatic download. It defaults to 8MB but can be adjusted (see Configuration documentation).
  • As a side note, the URLs you are seeing are also as expected: /bitstreams/<:uuid>/download is a user-interface-only URL that acts as a proxy for the REST API endpoint, which is the second one, /api/core/bitstreams/<:uuid>/content (it will always redirect there).
(2) The harvesting errors are hard to narrow down without more information. Could you provide details about the method they are using to access the files and the exact error messages they are seeing (ideally the full error trace)?
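
For anyone scripting checks against these links, the mapping between the two URL forms described in (1) can be sketched roughly as below. This is only an illustration, not anything shipped with DSpace: `ui_download_to_rest_content` is a hypothetical helper, and the `/server` REST prefix is an assumption that will differ between installations.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Matches the UI-only path /bitstreams/<uuid>/download and captures the UUID.
_UI_DOWNLOAD = re.compile(r"^/bitstreams/([0-9a-fA-F-]{36})/download/?$")


def ui_download_to_rest_content(ui_url: str, rest_prefix: str = "/server") -> str:
    """Return the REST 'content' URL that a UI 'download' URL redirects to.

    rest_prefix is an assumption: it is wherever the REST API is mounted
    on a given site (hypothetical default '/server').
    """
    parts = urlsplit(ui_url)
    match = _UI_DOWNLOAD.match(parts.path)
    if match is None:
        raise ValueError(f"not a UI bitstream download URL: {ui_url}")
    uuid = match.group(1)
    rest_path = f"{rest_prefix.rstrip('/')}/api/core/bitstreams/{uuid}/content"
    # Keep scheme and host; drop any query string or fragment.
    return urlunsplit((parts.scheme, parts.netloc, rest_path, "", ""))
```

A harvester could use something like this to confirm that the 'content' URL it ends up on is simply the redirect target of the 'download' identifier in the OAI-PMH record.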

Best,
Lia

DSpace Technical Support

Mar 19, 2026, 4:53:28 PM
to DSpace Technical Support
Hi Carolyn,
I linked to an outdated version of the documentation for the Content Disposition Threshold, sorry about that. For DSpace 7.x and up, the configuration works a little differently: https://wiki.lyrasis.org/display/DSDOC7x/Configuration+Reference#ConfigurationReference-ContentInlineDispositionThreshold/Format

The threshold still works the same way, but you can also add file types to "webui.content_disposition_format" so that, for example, all PDFs are downloaded automatically regardless of size.
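
To make the interaction between the two settings concrete, here is a rough Python model of the behaviour described above. It is a sketch under stated assumptions, not DSpace's actual code: the function name and parameters are hypothetical, format matching on MIME type is an assumption, and so is treating a file exactly at the threshold as inline.

```python
# The 8MB default mentioned earlier in the thread, expressed in bytes.
DEFAULT_THRESHOLD_BYTES = 8 * 1024 * 1024


def content_disposition(size_bytes, mime_type,
                        threshold_bytes=DEFAULT_THRESHOLD_BYTES,
                        forced_download_formats=()):
    """Return 'inline' (opens in browser) or 'attachment' (downloads).

    Hypothetical model: forced_download_formats stands in for the
    formats listed in webui.content_disposition_format.
    """
    forced = {fmt.lower() for fmt in forced_download_formats}
    # A listed format is always downloaded, whatever its size.
    if mime_type.lower() in forced:
        return "attachment"
    # Otherwise only files larger than the threshold are downloaded.
    if size_bytes > threshold_bytes:
        return "attachment"
    return "inline"
```

Under this model a 2MB PDF opens in the browser and a 20MB PDF downloads, unless application/pdf is listed as a forced format, in which case both download.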

-Lia
