Incorrect bitstream URLs in OAI-PMH output?


Michael White

Feb 27, 2020, 6:52:00 AM
to DSpace Tech

Hi,

 

We’re using DSpace v6.2, JSPUI.

 

Whilst troubleshooting an issue with a large number of broken full text links harvested from our repository via OAI-PMH by the CORE service, CORE reported to us that "the provided full text link in the OAI-PMH dc:identifier field is broken."

 

For example, for this item in our repository:

 

https://dspace.stir.ac.uk/handle/1893/30142

 

- the link to the associated bitstream from this repository record is:

 

https://dspace.stir.ac.uk/retrieve/17570e9c-aa29-4c15-99b2-af5892853652/Revisions_Final_Chronic_wounds.pdf

 

- however, if harvested via OAI-PMH:

 

https://dspace.stir.ac.uk/oai/request?verb=GetRecord&identifier=oai:dspace.stir.ac.uk:1893/30142&metadataPrefix=oai_dc

 

- then the bitstream link in dc.identifier is wrong:

 

<dc:identifier>http://dspace.stir.ac.uk/bitstream/1893/30142/-1/Revisions_Final_Chronic_wounds.pdf</dc:identifier>

 

- i.e. it contains "-1" where I'd expect to see the bitstream UUID.
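A quick way to spot this symptom in harvested records could be sketched as below. This is only an illustration, not DSpace code: the class and method names are hypothetical, and it assumes the link has the /bitstream/<handle prefix>/<handle suffix>/<sequence ID>/<file name> shape shown above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of DSpace): flags harvested /bitstream/ links
// whose sequence ID segment is not a positive number, such as "-1".
final class OaiBitstreamUrlCheck {

    // Assumed URL shape: .../bitstream/<prefix>/<suffix>/<sequence ID>/<file name>
    private static final Pattern BITSTREAM_URL =
            Pattern.compile("/bitstream/[^/]+/[^/]+/(-?\\d{1,9})/[^/]+$");

    /** Returns true when the URL is a /bitstream/ link with a sequence ID <= 0. */
    static boolean hasInvalidSequenceId(String url) {
        Matcher m = BITSTREAM_URL.matcher(url);
        return m.find() && Integer.parseInt(m.group(1)) <= 0;
    }

    public static void main(String[] args) {
        String harvested = "http://dspace.stir.ac.uk/bitstream/1893/30142/-1/"
                + "Revisions_Final_Chronic_wounds.pdf";
        System.out.println(hasInvalidSequenceId(harvested)); // prints "true"
    }
}
```

Links in the /retrieve/<UUID>/ form do not match the pattern at all, so they are never flagged.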

 

And looking at the "raw" XOAI output, it appears to be wrong there too (so not an issue with the oai_dc crosswalk?):

 

https://dspace.stir.ac.uk/oai/request?verb=GetRecord&identifier=oai:dspace.stir.ac.uk:1893/30142&metadataPrefix=xoai

 

<field name="url">http://dspace.stir.ac.uk/bitstream/1893/30142/-1/Revisions_Final_Chronic_wounds.pdf</field>

 

However, a large number of the OAI-PMH bitstream links do work - e.g.:

 

https://dspace.stir.ac.uk/oai/request?verb=GetRecord&identifier=oai:dspace.stir.ac.uk:1893/58&metadataPrefix=oai_dc

 

- includes the correct bitstream URL:

 

<dc:identifier>http://dspace.stir.ac.uk/bitstream/1893/58/1/Thesis.pdf</dc:identifier>

 

I've tried clearing the cache, and rebuilding the OAI-PMH index, but this issue remains. I also searched the Mailing list archives and JIRA, but couldn't find anything that seemed to relate to this problem.

 

I'm not sure, but my current working theory is that links to "older" bitstreams work because they belong to records added to the repository before the upgrade that moved DSpace from numeric IDs to UUIDs, whereas records added since then, which use UUIDs, don't work (though I haven't managed to prove this theory yet!).

 

Has anyone else come across this? Does anyone know of a solution (I'm happy to hack code/apply patches if required)?

 

If you're on this version of DSpace, are all the bitstream URLs harvested via OAI-PMH from your repository correct?

 

If anyone has any fixes, thoughts, observations etc, they would be most welcome as I'm currently at a loss as to how to resolve this and, given the importance of CORE for supporting the upcoming REF here in the UK, my library colleagues are getting a bit jumpy ;-).

 

Cheers,

 

Mike

 

Michael White
Senior Developer

Business Applications and Integrations
Information Services


4B19, Cottrell

University of Stirling
Stirling
FK9 4LA


Tel:  +44 (0)1786 466877
Email:  michae...@stir.ac.uk
Web: stir.ac.uk/informationservices

The University achieved an overall 5 stars in the QS World University Rankings 2018
The University of Stirling is a charity registered in Scotland, number SC 011159.

Bram Luyten

Feb 28, 2020, 3:39:58 AM
to Michael White, DSpace Tech
Hi Michael,

thank you for reporting/sharing this.

Not a solution, but I wanted to share two observations to narrow the problem down. 

DSpace 6.3 XMLUI - Fresh install


Conclusion: Can't reproduce

DSpace 6.3 XMLUI - Upgraded instance & item that already exists pre-upgrade


Conclusion: Can't reproduce

Both of these installations have OAI enabled, but I didn't have the time to look at the record there.

Hope this helps!! Would be interested in learning whether this is specific to your institution/customization, JSPUI specific, ... as it may affect others as well !!

with kindest regards,

Bram

Bram Luyten
250-B Suite 3A, Lucius Gordon Drive, West Henrietta, NY 14586
Gaston Geenslaan 14, 3001 Leuven, Belgium
atmire.com



Michael White

Feb 28, 2020, 4:27:55 AM
to Bram Luyten, DSpace Tech

Thanks Bram,

 

> Conclusion: Can't reproduce

 

I’ve had a poke about in those 2 repositories, and I haven’t been able to reproduce the issue in either of them (though my investigation wasn’t particularly exhaustive!). Having said that, the issue is with the link to the bitstream in the OAI-PMH output and, as far as I could tell, the link to the bitstream isn’t included in the OAI-PMH output from either of those repositories (?)

 

However, I did some further investigation into the issue I’m seeing with our repository last night and believe that I have identified a bug (at least that is what it looks like to me!).

 

Firstly I note that the format of the link to the bitstream that appears on the Item View page in our Repository has 2 distinct forms – e.g.:

 

https://dspace.stir.ac.uk/handle/1893/58 has a bitstream link of the form: https://dspace.stir.ac.uk/bitstream/1893/58/1/Thesis.pdf

https://dspace.stir.ac.uk/handle/1893/30142 has a bitstream link of the form: https://dspace.stir.ac.uk/retrieve/17570e9c-aa29-4c15-99b2-af5892853652/Revisions_Final_Chronic_wounds.pdf

 

- however, for the latter, the bitstream link that appears in the OAI-PMH (https://dspace.stir.ac.uk/oai/request?verb=GetRecord&identifier=oai:dspace.stir.ac.uk:1893/30142&metadataPrefix=oai_dc) has the form: http://dspace.stir.ac.uk/bitstream/1893/30142/-1/Revisions_Final_Chronic_wounds.pdf

 

- i.e. the URL for this bitstream is being rendered with a URL of the “wrong” format.

 

The key observation here is that the Sequence ID (sid) that appears in that URL has the value “-1” (which is a “non-value”). Looking in the database, I can see that we have 3390 bitstreams with a sid value of “-1”. I can’t be 100% certain, but I’m guessing this coincides with the records that have been added to the system since we upgraded from v4 to v6.2 a couple of years ago (?)

 

So, the bug . . .

 

Looking at the code that renders the bitstream link in the JSPUI (in dspace-6.2-src-release/dspace-jspui/src/main/java/org/dspace/app/webui/jsptag/ItemTag.java, line 1054), I can see that it first checks the sid – if it is > 0 then the link is rendered in the first format, but if it is <= 0 (i.e. “-1”), then the link is rendered in the second format:

 

if ((handle != null) && (b.getSequenceID() > 0)) {
    bsLink = bsLink + "/bitstream/"
            + item.getHandle() + "/"
            + b.getSequenceID() + "/";
} else {
    bsLink = bsLink + "/retrieve/"
            + b.getID() + "/";
}

 

However, it looks to me like the code that renders the bitstream links for inclusion in OAI-PMH output (in dspace-6.2-src-release/dspace-oai/src/main/java/org/dspace/xoai/util/ItemUtils.java, line 211) doesn’t do this check and simply renders the link in the first format regardless:

 

if (handle != null && baseUrl != null) {
    url = baseUrl + "/bitstream/"
            + handle + "/"
            + sid + "/"
            + URLUtils.encode(bsName);
}

 

Therefore, my current thought is that if I replace the if statement above with:

 

// Updated code to handle both SID and UUID type bitstream URLs - MW: 27/2/20
if (handle != null && baseUrl != null) {
    if (bit.getSequenceID() > 0) {
        // Valid sequence ID: keep the existing handle-based URL
        url = baseUrl + "/bitstream/"
                + handle + "/"
                + sid + "/"
                + URLUtils.encode(bsName);
    } else {
        // No valid sequence ID (e.g. -1): use the UUID-based URL instead
        url = baseUrl + "/retrieve/"
                + bit.getID() + "/"
                + URLUtils.encode(bsName);
    }
}

 

- then the URLs that are rendered in the OAI-PMH output should be correct for both cases.
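To sanity-check that branching outside DSpace, the same logic can be pulled into a small standalone sketch. This is an illustration under assumptions, not the actual ItemUtils code: the class and method names are hypothetical, java.net.URLEncoder stands in for DSpace's URLUtils.encode (the two may encode some characters differently), and returning only the encoded file name when the handle or base URL is missing is my assumption for the case the snippets above don't cover.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.UUID;

// Hypothetical standalone sketch of the proposed branching; not DSpace code.
final class BitstreamUrlSketch {

    static String buildUrl(String baseUrl, String handle, int sequenceId,
                           UUID bitstreamId, String name) {
        String encodedName = encode(name);
        if (handle == null || baseUrl == null) {
            // Assumption: with no handle or base URL, fall back to the file name only.
            return encodedName;
        }
        if (sequenceId > 0) {
            // Valid sequence ID: handle-based /bitstream/ URL.
            return baseUrl + "/bitstream/" + handle + "/" + sequenceId + "/" + encodedName;
        }
        // Sequence ID of -1 (or any non-positive value): UUID-based /retrieve/ URL.
        return baseUrl + "/retrieve/" + bitstreamId + "/" + encodedName;
    }

    // Stand-in for URLUtils.encode; URLEncoder does form-style encoding.
    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }
}
```

Calling buildUrl with the handle 1893/30142, a sequence ID of -1 and the bitstream's UUID should give the /retrieve/ form, while 1893/58 with a sequence ID of 1 should keep the /bitstream/ form.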

 

My next step is to try applying this fix in our DEV system and see if it works as I expect, but I’d be interested to know if others agree with my analysis (and proposed fix), or if I’ve missed anything, or I’m proposing/doing anything daft!

Bram Luyten

Feb 28, 2020, 4:48:39 AM
to Michael White, DSpace Tech
Hi Michael,

impressive detective work!

Maybe an additional piece of the puzzle: I don't think that default DSpace exposes the direct bitstream links in OAI-PMH.

One piece of code where I know this has been added, is in the RIOXX patch:

The RIOXX patch in itself is only compatible with DSpace 5.x; a DSpace 6.x compatible version of the patch has yet to be made.

Maybe what you found is in effect, an incompatibility between the RIOXX patch for DSpace 5, and DSpace 6, and will potentially affect everyone with the RIOXX patch attempting DSpace upgrades to 6?

best regards,

Bram


