DSpace: Harvesting Error


Lewatle Johannes Phaladi

Feb 13, 2019, 2:43:36 AM
to DSpace Community
Hi DSpace Colleagues,

When I test my harvesting settings, DSpace says the settings are valid, but when other repositories try to harvest our site they get the attached error messages. In another attachment I have included a screenshot of a test I did on our dev DSpace, trying to harvest a collection from the production DSpace site. The TXT document contains the error received when running the import on another DSpace system as a test. Your advice on this error is much appreciated!

Regards,
Lewatle 
ir00889a.JPG
oai screenshot1.png
oai screenshot.png
screen error message.txt

Lewatle Johannes Phaladi

Feb 13, 2019, 3:45:22 AM
to DSpace Community
Hi, 

See the attached DSpace log file for more on what is happening on the server. Discovery comes up many times in it. Is there anything I should do on the server, such as re-indexing, to resolve this error?

Regards,
Lewatle 
dspace.log-2019-02-13.txt

David De La Croes

Feb 13, 2019, 7:54:33 AM
to Lewatle Johannes Phaladi, DSpace Community

Hi Lewatle,

I have tried harvesting OAI records for the collection that triggers the error (com_10539_45). I used shell-oaiharvester (https://github.com/wimmuskee/shell-oaiharvester), and it stopped harvesting when it tried to save the record with the handle http://hdl.handle.net/10539/15066. It seems that this record has been deleted from or hidden on your repository (or not indexed or imported into the OAI database). Perhaps re-indexing and importing your OAI records from scratch will correct some errors, which are mainly caused by diacritics or other “foreign” characters in your metadata. This normally happens when the PostgreSQL database has been migrated to another version.

 

Hope this helps!

 

Regards,

David
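
The harvest described in this message comes down to an OAI-PMH ListRecords request restricted to one set. A minimal Python sketch of how such a request URL could be built follows. The base URL is a placeholder (the thread does not give the repository's OAI endpoint), the function name is hypothetical, and oai_dc is assumed as the metadata format; only the set name com_10539_45 comes from the message above.

```python
from urllib.parse import urlencode


def build_list_records_url(base_url, set_spec, metadata_prefix="oai_dc"):
    """Return an OAI-PMH ListRecords URL limited to one set."""
    query = urlencode({
        "verb": "ListRecords",
        "metadataPrefix": metadata_prefix,
        "set": set_spec,
    })
    return f"{base_url}?{query}"


# Placeholder endpoint (an assumption); replace with the real OAI base URL.
url = build_list_records_url(
    "https://repository.example.org/oai/request", "com_10539_45"
)
print(url)
```

Fetching the resulting URL with a browser or any HTTP client shows the raw XML a harvester receives, which can make it easier to see at which record the response breaks.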

--
All messages to this mailing list should adhere to the DuraSpace Code of Conduct: https://duraspace.org/about/policies/code-of-conduct/
---
You received this message because you are subscribed to the Google Groups "DSpace Community" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dspace-communi...@googlegroups.com.
To post to this group, send email to dspace-c...@googlegroups.com.
Visit this group at https://groups.google.com/group/dspace-community.
For more options, visit https://groups.google.com/d/optout.


Lewatle Johannes Phaladi

Feb 15, 2019, 4:20:34 AM
to DSpace Community
Hi David,

Thank you very much. I am now busy re-indexing the DSpace database, and I will let you and all DSpace colleagues know the results after testing.

Regards,
Lewatle 



Lewatle Johannes Phaladi

Feb 19, 2019, 4:27:54 AM
to DSpace Community
Hi David and Colleagues,

I have re-indexed Discovery with -d, re-imported OAI with -o and restarted Tomcat, then retried harvesting the collection on our test server. I got an error complaining about https://hdl.handle.net/10539/26028. I deleted the item in question from the system after receiving the error and re-indexed again, but it still comes up as the root cause of the OAI error. Is there any way the harvester can bypass this item? I have also attached further error messages.

Regards,
Lewatle 

dspace havester error.txt
HTTP Status 500 – Internal Server Error.txt

David De La Croes

Feb 19, 2019, 5:01:22 AM
to Lewatle Johannes Phaladi, DSpace Community

Hi Lewatle,

We recently experienced problems similar to yours, which prevented external harvesters from receiving our complete OAI feed. We discovered that a number of records caused the OAI XML to become malformed, because of “foreign” characters, etc. Once we had identified those records (only 9), we edited each record in DSpace. Thereafter, we executed the command “bin/dspace oai import”, which updated the changed records in the OAI database and cache.

 

If you delete records, you may have to run the following commands to rebuild your OAI indexes:

bin/dspace oai clean-cache

bin/dspace oai import -c

 

Hope this helps!

Regards,

David
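
To locate where a saved OAI response stops being well-formed, a standard XML parser can report the position of the first error. The following Python sketch assumes the response has already been saved or fetched as text; the function name and the sample strings are hypothetical, and the vertical tab is just one example of a character that XML 1.0 forbids (the thread does not say which characters were involved).

```python
import xml.etree.ElementTree as ET


def find_xml_error(xml_text):
    """Return (line, column) of the first well-formedness error, or None."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        # ParseError.position is a (line, column) tuple.
        return err.position
    return None


# Accented letters are legal XML; a vertical tab (U+000B) is not.
good = "<record><title>Café</title></record>"
bad = "<record><title>Caf\x0b</title></record>"

print(find_xml_error(good))  # no error found
print(find_xml_error(bad))   # position of the offending character
```

The reported line and column point into the XML response, so the enclosing record's identifier still has to be read from the surrounding text to find the item to edit.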


Lewatle Johannes Phaladi

Feb 20, 2019, 9:13:09 AM
to DSpace Community
Hi David,

Thanks very much. I am now identifying many items with such characters in the abstract, where users copied and pasted text into DSpace. I am removing them while harvesting on development, and the number of harvested items is increasing, from 100 to 4574. I am hoping all will go well; I will let you know once everything is completed or if there is a new error.

Regards,
Lewatle 


Lewatle Johannes Phaladi

Feb 27, 2019, 6:58:15 AM
to DSpace Community
Hi David,

I found unknown characters in the description and abstract fields, where our submitters copied and pasted metadata from Word documents into DSpace. I corrected the items one by one, running bin/dspace oai clean-cache and bin/dspace oai import -c after cleaning each item, and then harvesting to the dev site. When I got a harvest error, I clicked the link inside it and found the identifier, which directed me to the item containing the characters. I then corrected the item, saved the update, ran the commands again, and harvested again after deleting the test community. In the end all 44 items were found and cleared. I have received confirmation from the external harvester that they managed to harvest the whole collection.

Thank you very much for sharing solution.

Regards,
Lewatle  

Jean Pierre

Aug 13, 2021, 8:10:40 PM
to DSpace Community
Hello.
Could you please help me identify strange characters? The same thing is happening in my DSpace and I want to track down the characters. How can I filter and identify the errors? And does the problem lie in the database or in the DSpace system itself?
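
One possible approach, offered as a sketch rather than a confirmed DSpace procedure: export the metadata to CSV (DSpace has a metadata-export command for this; check the documentation for your version for the exact options) and scan every cell for characters that XML 1.0 does not allow. The thread does not say which characters caused the errors, so treating XML-illegal control characters as the culprits is an assumption. The function name, the sample data and the reliance on an "id" column are also assumptions made for illustration.

```python
import csv
import io
import re

# Matches anything outside the XML 1.0 "Char" production
# (tab, LF, CR and most of Unicode are allowed; other control characters are not).
INVALID_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def scan_csv(csv_file):
    """Yield (row id, column name, code point) for each XML-illegal character."""
    for row in csv.DictReader(csv_file):
        for column, value in row.items():
            if not isinstance(value, str):
                continue  # skip missing cells and overflow columns
            for match in INVALID_XML_CHARS.finditer(value):
                yield row.get("id"), column, f"U+{ord(match.group()):04X}"


# Hypothetical export with a vertical tab pasted into one abstract.
sample = io.StringIO(
    "id,dc.title,dc.description.abstract\n"
    "1,Clean title,Plain abstract\n"
    '2,Another title,"Pasted\x0btext"\n'
)
for hit in scan_csv(sample):
    print(hit)
```

For a real export, the file would be opened with `open(path, newline="", encoding="utf-8")` and passed to `scan_csv`. Accented letters and typographic quotes are legal in XML and are deliberately not flagged by this check.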