Error trying to harvest DSpace OAI-PMH

Yona

Dec 28, 2019, 12:59:52 PM
to DSpace Technical Support
Hi all,

I'm trying to list records over OAI-PMH, but when I try to see more records (after record 1700), I get this error.


I ran dspace oai import -c and dspace oai clean-cache, but that didn't help.

Thanks in advance

George Kozak

Jan 1, 2020, 6:31:46 PM
to Yona, DSpace Technical Support
Yona:
I had a similar problem a while ago. My problem was caused by a "bad" record that stopped the OAI indexing. I am not sure whether that is what is happening for you, but if you think it is, I have a potential fix.
I was helped by Emilio Lorenzo (elor...@arvo.es) and Adan Roman (adan....@gmail.com).  
They gave me a patch for the DSpace Java code, which I applied, and it let me find the record that causes the OAI index to fail.
Basically, with this patch the OAI import, when run in verbose mode, prints the ID of each item before it is indexed. So, when the OAI import stops, the last ID printed tells you exactly which record caused it to stop. Then you can fix the item and rerun the OAI import. The downside is that if you have multiple "bad" records, you have to keep rerunning the import and clean out the records one by one.
This is the patch:

org.dspace.xoai.app.XOAI.java:

    private SolrInputDocument index(Item item) throws SQLException, MetadataBindException, ParseException, XMLStreamException, WritingXmlException {
        // Patch: in verbose mode, print the item ID before the item is indexed
        if (verbose) {
            System.out.println("Trying to index into oai item with id:" + item.getID().toString());
        }
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("item.id", item.getID());
        // ... remainder of the method is not shown

Best Wishes.
George Kozak
Digital Library Specialist
218 Olin Library
Cornell University
Ithaca, NY 14853
