Harvesting Error when running automatic harvesting scheduler

euler

Sep 12, 2018, 2:42:34 AM
to DSpace Technical Support
Dear All,

I'm having trouble using the automatic harvest scheduler, both in the XMLUI and on the command line. The server is running DSpace 6.3. When I run dspace harvest -S on the command line, it returns the following error:

Starting harvest loop... running.
org.dspace.core.exception.DatabaseSchemaValidationException: The schema validator returned: null
        at org.dspace.core.Context.init(Context.java:170)
        at org.dspace.core.Context.<init>(Context.java:126)
        at org.dspace.harvest.HarvestScheduler.scheduleLoop(HarvestScheduler.java:149)
        at org.dspace.harvest.HarvestScheduler.run(HarvestScheduler.java:140)
        at java.lang.Thread.run(Unknown Source)
Exception in thread "Thread-4" org.dspace.core.exception.DatabaseSchemaValidationException: The schema validator returned: null
        at org.dspace.core.Context.init(Context.java:170)
        at org.dspace.core.Context.<init>(Context.java:126)
        at org.dspace.harvest.HarvestScheduler.scheduleLoop(HarvestScheduler.java:234)
        at org.dspace.harvest.HarvestScheduler.run(HarvestScheduler.java:140)
        at java.lang.Thread.run(Unknown Source)


The log showed the message below after running the command dspace harvest -S:

2018-09-12 13:14:57,708 FATAL org.dspace.core.Context @ Cannot obtain the bean which provides a database connection. Check previous entries in the dspace.log to find why the db failed to initialize. The schema validator returned: null
2018-09-12 13:14:57,709 ERROR org.dspace.harvest.HarvestScheduler @ Exception on iteration: 0
2018-09-12 13:14:57,713 FATAL org.dspace.core.Context @ Cannot obtain the bean which provides a database connection. Check previous entries in the dspace.log to find why the db failed to initialize. The schema validator returned: null

When using the Harvesting tab in the XMLUI control panel, after clicking Start Harvester, the page returned this message:

failed to lazily initialize a collection of role: org.dspace.content.DSpaceObject.metadata, could not initialize proxy - no Session

However, when I press the browser's back button, the status says that the scheduler is waiting for collections to harvest. The automatic scheduler started from the UI appears to be running, but it has not harvested any new items from the collections I set up for harvesting: I keep receiving email messages from the server reporting a harvest error. The content of the harvest error message is below:

Collection 5bee1194-648c-4136-8671-ddc43fc3f85f failed on harvest:

Date:           9/11/18 6:25 PM
Status Flag:    -1

We need at least an eperson or a group in order to create a resource policy.

Exception:
java.lang.IllegalArgumentException: We need at least an eperson or a group in order to create a resource policy.
        at org.dspace.authorize.AuthorizeServiceImpl.createResourcePolicy(AuthorizeServiceImpl.java:777)
        at org.dspace.authorize.AuthorizeServiceImpl.addPolicy(AuthorizeServiceImpl.java:533)
        at org.dspace.content.WorkspaceItemServiceImpl.create(WorkspaceItemServiceImpl.java:99)
        at org.dspace.harvest.OAIHarvester.processRecord(OAIHarvester.java:539)
        at org.dspace.harvest.OAIHarvester.runHarvest(OAIHarvester.java:367)
        at org.dspace.harvest.HarvestThread.runHarvest(HarvestThread.java:57)
        at org.dspace.harvest.HarvestThread.run(HarvestThread.java:41)
        at java.lang.Thread.run(Unknown Source)

So I would like to ask whether anyone has encountered the errors above and, if so, how you resolved them. I tried harvesting the collection that is giving me these errors on the DSpace demo site, using the same harvest_type and metadata_format, and the harvest was successful. I just want to harvest automatically so that I don't have to run the harvest manually.

Thanks in advance,
euler

euler

Sep 27, 2018, 2:48:38 AM
to DSpace Technical Support
Dear All,

I am reposting this since I received no responses. It turned out that the Automatic Harvesting (Scheduler) from the Control Panel Screen in the demo server is also experiencing the same error that I stated below. Since I don't have access to the log files of the demo server, I can only assume that it is likely that what I am describing below is also happening in the demo server. I tried this on a fresh install of DSpace version 6.3 and I reproduced the errors mentioned. Right now, the only way I can successfully harvest (not on schedule or automatic) is via the command line by issuing dspace harvest -r -e my@email -c collection_handle.

Also, the error occurring on the demo server when enabling automatic harvesting in the control panel screen was posted to this mailing list, but unfortunately it has received no responses: OAI Harvesting Error: failed to lazily initialize a collection of role

Hoping for your positive response since the repository that I'm working on will have the majority of its contents from external dspace sources.

Thanks in advance!
euler

Tim Donohue

Sep 28, 2018, 12:11:49 PM
to euler, DSpace Technical Support
Hi Euler,

This sounds like a bug in the OAI Harvester to me (likely accidentally caused by the Java API refactor in 6.x).  The reason I think that is because of the error you are seeing from the Start Harvester:


failed to lazily initialize a collection of role: org.dspace.content.DSpaceObject.metadata, could not initialize proxy - no Session 

This error is one that we ran into a lot during the development of DSpace 6 (and in later bug fix releases).  It's generally caused by Hibernate attempting to use an object that has been "decached" from memory.  So, it sounds to me like the code here has a bug somewhere that is causing this.
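
To illustrate the failure mode described here, a plain-Java simulation may help. It is not Hibernate code: FakeSession, LazyValues and LazyLoadingDemo are hypothetical stand-ins, and the only thing taken from this thread is the error text. The sketch shows a lazily loaded collection failing when it is first touched after its session has closed, and working when it was initialized beforehand.

```java
import java.util.Collections;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical stand-in for a Hibernate session: it is either open or closed.
class FakeSession {
    private boolean open = true;

    boolean isOpen() { return open; }

    void close() { open = false; }
}

// Hypothetical stand-in for a lazily loaded collection such as DSpaceObject.metadata.
class LazyValues {
    private final FakeSession session;
    private final Supplier<List<String>> loader;
    private List<String> loaded; // stays null until the first access

    LazyValues(FakeSession session, Supplier<List<String>> loader) {
        this.session = session;
        this.loader = loader;
    }

    List<String> get() {
        if (loaded == null) {
            // The first access needs an open session to fetch the data.
            if (!session.isOpen()) {
                throw new IllegalStateException("could not initialize proxy - no Session");
            }
            loaded = loader.get();
        }
        return loaded;
    }
}

class LazyLoadingDemo {
    // Returns true if reading the values after the session has closed fails.
    static boolean failsAfterClose(boolean touchBeforeClose) {
        FakeSession session = new FakeSession();
        LazyValues metadata = new LazyValues(session, () -> Collections.singletonList("dc.title"));
        if (touchBeforeClose) {
            metadata.get(); // initialize while the session is still open
        }
        session.close();
        try {
            metadata.get();
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("untouched collection fails after close: " + failsAfterClose(false));
        System.out.println("initialized collection fails after close: " + failsAfterClose(true));
    }
}
```

In this simplified picture, the harvester would hit the first case when it reads an object that was loaded under a context that has since been closed or cleared. Whether that is the actual cause here is something the ticket would need to confirm.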

Here are other tickets that describe essentially this same behavior in other areas of DSpace:
https://jira.duraspace.org/browse/DS-3660

Could you create a new JIRA ticket for this issue (and essentially provide the same details you sent here via email)? I can then link it up to all these old tickets and see if we can find a volunteer to investigate & resolve it via a PR.

Thanks

Tim
 

--
Tim Donohue
Technical Lead for DSpace & DSpaceDirect
DuraSpace.org | DSpace.org | DSpaceDirect.org

euler

Sep 30, 2018, 10:51:57 PM
to DSpace Technical Support
Hi Tim,

Thanks for the info. I have created a JIRA issue DS-4028. I'm sorry, I can't seem to edit the details after I submitted the ticket but I linked this message thread to that issue.

Thanks again,
euler

Kev Gunn

Nov 1, 2018, 9:56:01 PM
to DSpace Technical Support
Hi,

I got this error as well, but as far as I can tell it means exactly what it says: there is no eperson or group, so the exception is thrown explicitly. At least, that is the case when the harvest is invoked by the scheduler rather than from the command line or the UI.

My previous issue was that the HARVESTED_COLLECTION entity was not updated: it was stuck with a status of QUEUED even though the harvest completed without error, when it should have been put back to READY.
So, to check whether the error was still present in DSpace 6.3, I checked out the repo at tag dspace-6.3, then compiled and installed it.
At first the line numbers in the exception trace were wrong, because the build had doubled up on DSpace JARs in the dspace lib directory and in my XMLUI webapp lib directory. I removed the older 6.0 versions to get the correct line numbers in the trace.

2018-11-02 11:36:37,490 ERROR org.dspace.harvest.OAIHarvester @ Error occurred while generating an OAI response: We need at least an eperson or a group in order to create a resource policy. null
java.lang.IllegalArgumentException: We need at least an eperson or a group in order to create a resource policy.
        at org.dspace.authorize.AuthorizeServiceImpl.createResourcePolicy(AuthorizeServiceImpl.java:777)
        at org.dspace.authorize.AuthorizeServiceImpl.addPolicy(AuthorizeServiceImpl.java:533)
        at org.dspace.content.WorkspaceItemServiceImpl.create(WorkspaceItemServiceImpl.java:99)
        at org.dspace.harvest.OAIHarvester.processRecord(OAIHarvester.java:539)
        at org.dspace.harvest.OAIHarvester.runHarvest(OAIHarvester.java:367)
        at org.dspace.harvest.HarvestThread.runHarvest(HarvestThread.java:57)
        at org.dspace.harvest.HarvestThread.run(HarvestThread.java:41)
        at java.lang.Thread.run(Unknown Source)

This leads you here:

@Override
public ResourcePolicy createResourcePolicy(Context context, DSpaceObject dso, Group group, EPerson eperson, int type, String rpType) throws SQLException, AuthorizeException {
    if(group == null && eperson == null)
    {
        throw new IllegalArgumentException("We need at least an eperson or a group in order to create a resource policy.");
    }

    ResourcePolicy myPolicy = resourcePolicyService.create(context);
    myPolicy.setdSpaceObject(dso);
    myPolicy.setAction(type);
    myPolicy.setGroup(group);
    myPolicy.setEPerson(eperson);
    myPolicy.setRpType(rpType);
    resourcePolicyService.update(context, myPolicy);

    return myPolicy;
}

For a scheduled harvest there is no session, so is there no current user in the context? Or is there some default? I think previously it might have been the system admin created by the 'dspace create-administrator' command.
...
// Create an item
Item item = itemService.create(context, workspaceItem);
item.setSubmitter(context.getCurrentUser());
...

and group is passed as null, so both eperson and group are null.
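
A minimal sketch of that failure path, using hypothetical stand-ins rather than the DSpace classes (SketchContext and PolicyGuardSketch are invented for illustration, and users are plain strings). It assumes a freshly created context has no current user until one is set, and it reproduces only the null check from the guard quoted above.

```java
// Hypothetical stand-in for a context; this is not org.dspace.core.Context.
class SketchContext {
    private String currentUser; // null until someone sets it, as assumed for a fresh background context

    String getCurrentUser() { return currentUser; }

    void setCurrentUser(String user) { currentUser = user; }
}

class PolicyGuardSketch {
    // Mirrors the quoted guard: at least one of group or eperson must be non-null.
    static String createPolicy(String group, String eperson) {
        if (group == null && eperson == null) {
            throw new IllegalArgumentException(
                "We need at least an eperson or a group in order to create a resource policy.");
        }
        return "policy for " + (eperson != null ? eperson : group);
    }

    public static void main(String[] args) {
        // Scheduled run: nobody set a user, and the group is passed as null.
        SketchContext scheduled = new SketchContext();
        try {
            createPolicy(null, scheduled.getCurrentUser());
        } catch (IllegalArgumentException e) {
            System.out.println("scheduled run failed: " + e.getMessage());
        }

        // Command-line style run: a user was supplied, so the guard passes.
        SketchContext manual = new SketchContext();
        manual.setCurrentUser("my@email");
        System.out.println(createPolicy(null, manual.getCurrentUser()));
    }
}
```

This would also be consistent with the manual run (dspace harvest -r -e my@email -c collection_handle) succeeding, since there an eperson is supplied explicitly.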

This is a brand-new install, and I didn't explicitly create any policies for the collection I created. I can't see any config option for an eperson to use in the scheduled scenario.

I'll keep digging, but I'm not familiar with DSpace. Does KevinVdV still work on this project? He seems to have written much of this code.

Cheers
KevinG

Kev Gunn

Nov 1, 2018, 11:17:25 PM
to DSpace Technical Support
After some more digging: the class HarvestScheduler uses the configuration property oai.harvester.eperson to define a harvestAdmin.
Currently this isn't passed to HarvestThread, which constructs a new context for the OAIHarvester to run in. I'm updating the constructor to pass the harvestAdmin to see whether that resolves the issue.
oai.harvester.eperson isn't defined in the default oai.cfg properties file, so I wasn't aware it was needed until I looked at the source code.
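
Because the property is absent from the default file, a missing value only shows up later as a confusing error. A rough sketch of reading it and failing early is below. It uses the JDK's java.util.Properties rather than DSpace's configuration service, HarvesterConfigSketch is a hypothetical name, and admin@example.org is a placeholder address, not a value from this thread.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

class HarvesterConfigSketch {
    static final String KEY = "oai.harvester.eperson";

    // Returns the configured e-mail address, or throws with a message naming the missing key.
    static String harvesterEmail(String cfgText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(cfgText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String email = props.getProperty(KEY);
        if (email == null || email.trim().isEmpty()) {
            throw new IllegalStateException(KEY + " is not set");
        }
        return email.trim();
    }

    public static void main(String[] args) {
        // The line as it might be added to oai.cfg (placeholder address).
        String cfg = "oai.harvester.eperson = admin@example.org\n";
        System.out.println(harvesterEmail(cfg)); // prints admin@example.org
    }
}
```

The address would presumably need to belong to an existing account on the repository, since it is later used to look up an eperson.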

Cheers

Kev Gunn

Nov 4, 2018, 7:36:12 PM
to DSpace Technical Support
So I made the updates and added the missing configuration, and the harvest ran, but now I encounter a thread-related Hibernate error, as reported in this post: https://groups.google.com/forum/#!topic/dspace-tech/37HCRaCyePw

Does the scheduled harvest work in DSpace 6.3? Can anyone confirm that for me please?

Log snapshot:
2018-11-05 10:16:22,553 ERROR org.dspace.harvest.OAIHarvester @ Error occurred while generating an OAI response: possible non-threadsafe access to session null
org.hibernate.AssertionFailure: possible non-threadsafe access to session
        at org.hibernate.action.internal.EntityInsertAction.execute(EntityInsertAction.java:92)
        at org.hibernate.engine.spi.ActionQueue.execute(ActionQueue.java:395)
        at org.hibernate.engine.spi.ActionQueue.executeActions(ActionQueue.java:387)
        at org.hibernate.engine.spi.ActionQueue.executeActions(ActionQueue.java:303)
        at org.hibernate.event.internal.AbstractFlushingEventListener.performExecutions(AbstractFlushingEventListener.java:349)
        at org.hibernate.event.internal.DefaultFlushEventListener.onFlush(DefaultFlushEventListener.java:56)
        at org.hibernate.internal.SessionImpl.flush(SessionImpl.java:1195)
        at sun.reflect.GeneratedMethodAccessor148.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.lang.reflect.Method.invoke(Unknown Source)
        at org.hibernate.context.internal.ThreadLocalSessionContext$TransactionProtectionWrapper.invoke(ThreadLocalSessionContext.java:352)
        at com.sun.proxy.$Proxy108.flush(Unknown Source)
        at org.dspace.core.HibernateDBConnection.commit(HibernateDBConnection.java:83)
        at org.dspace.core.Context.commit(Context.java:435)
        at org.dspace.harvest.OAIHarvester.intermediateCommit(OAIHarvester.java:436)
        at org.dspace.harvest.OAIHarvester.runHarvest(OAIHarvester.java:367)
        at org.dspace.harvest.HarvestThread.runHarvest(HarvestThread.java:61)
        at org.dspace.harvest.HarvestThread.run(HarvestThread.java:45)
        at java.lang.Thread.run(Unknown Source)
2018-11-05 10:16:22,623 INFO  org.hibernate.engine.jdbc.batch.internal.AbstractBatchImpl @ HHH000010: On release of batch it still contained JDBC statements
2018-11-05 10:16:22,623 WARN  org.hibernate.engine.jdbc.spi.SqlExceptionHelper @ SQL Error: 1, SQLState: 23000
2018-11-05 10:16:22,624 ERROR org.hibernate.engine.jdbc.spi.SqlExceptionHelper @ ORA-00001: unique constraint (DSPACE_EXT.SYS_C00147126) violated

2018-11-05 10:16:22,624 WARN  org.hibernate.engine.jdbc.spi.SqlExceptionHelper @ SQL Error: 1, SQLState: 23000
2018-11-05 10:16:22,624 ERROR org.hibernate.engine.jdbc.spi.SqlExceptionHelper @ ORA-00001: unique constraint (DSPACE_EXT.SYS_C00147126) violated

2018-11-05 10:16:22,628 ERROR org.hibernate.engine.jdbc.batch.internal.BatchingBatch @ HHH000315: Exception executing batch [could not execute batch]
2018-11-05 10:16:22,633 ERROR org.dspace.harvest.HarvestThread @ Runtime exception in thread: Thread[Thread-24,5,main]
2018-11-05 10:16:22,633 ERROR org.dspace.harvest.HarvestThread @ failed to lazily initialize a collection of role: org.dspace.content.Community.parentCommunities, could not initialize proxy - no Session null
2018-11-05 10:16:22,633 INFO  org.dspace.core.Context @ complete() was called on a closed Context object. No changes to commit.
2018-11-05 10:16:22,633 INFO  org.dspace.core.Context @ commit() was called on a closed Context object. No changes to commit.
2018-11-05 10:16:22,633 INFO  org.dspace.harvest.HarvestThread @ Thread for collection b2411369-fca4-44f3-ab14-4af19a562e9a ends - Harvested Collection: HarvestedCollection{id=1, collection=org.dspace.content.Collection@7fa79a09, oaiSource='https://ipubs.aims.gov.au:8543/oai/request', oaiSetId='col_11068_2', harvestType=1, harvestStatus=-1, harvestStartTime=Mon Nov 05 10:08:48 AEST 2018, lastHarvested=null, harvestMessage='Runtime error occurred while generating an OAI response', metadataConfigId='dim'}
2018-11-05 10:16:22,633 INFO  org.dspace.harvest.HarvestThread @ Thread for collection b2411369-fca4-44f3-ab14-4af19a562e9a completes.
2018-11-05 10:16:23,280 INFO  org.dspace.harvest.HarvestScheduler @ Done with iteration 7

Cheers

Diego Brice

Nov 7, 2018, 4:24:24 PM
to DSpace Technical Support
So, if I add the property "oai.harvester.eperson" to the oai.cfg file, will it work? Or did you also change the source code?

Regards

Diego

José Geraldo

Apr 1, 2019, 2:46:00 PM
to DSpace Technical Support
Hello,

I would like to report a possible solution to the error:

1 - Insert the oai.harvester.eperson property in the file oai.cfg

2 - Edit the class HarvestThread.java

private void runHarvest()
{
    Context context;
    Collection dso;
    HarvestedCollection hc = null;
    try {
        context = new Context();
        // Look up the eperson configured as oai.harvester.eperson and make it the current user
        String harvestAdminParam = ConfigurationManager.getProperty("oai", "harvester.eperson");
        context.setCurrentUser(EPersonServiceFactory.getInstance().getEPersonService().findByEmail(context, harvestAdminParam));
        dso = collectionService.find(context, collectionId);
        hc = harvestedCollectionService.find(context, dso);
        ...

What is the impact of this change on the system?
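
One thing to watch with a change like the one above: if the configured address does not match any account, the lookup presumably returns null and the context again has no current user, which would bring back the original "We need at least an eperson or a group" error. A minimal sketch of a fail-fast check follows. It uses hypothetical stand-ins (a Map in place of the EPerson lookup, strings in place of EPerson objects, and the invented class HarvestUserSketch), so it is not the DSpace API.

```java
import java.util.HashMap;
import java.util.Map;

class HarvestUserSketch {
    // Stand-in for an e-mail to account lookup; assumed to return null when nothing matches.
    static String findByEmail(Map<String, String> accounts, String email) {
        return email == null ? null : accounts.get(email);
    }

    // Resolves the harvest user, or fails with a message that names the configured address.
    static String resolveHarvestUser(Map<String, String> accounts, String configuredEmail) {
        String user = findByEmail(accounts, configuredEmail);
        if (user == null) {
            throw new IllegalStateException(
                "No account found for harvester e-mail: " + configuredEmail);
        }
        return user;
    }

    public static void main(String[] args) {
        Map<String, String> accounts = new HashMap<>();
        accounts.put("admin@example.org", "Harvest Admin"); // placeholder account

        System.out.println(resolveHarvestUser(accounts, "admin@example.org"));
        try {
            resolveHarvestUser(accounts, "typo@example.org");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With a check of this kind, a mistyped or unset address is reported at the start of the harvest rather than surfacing later as a resource policy error.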