OAI import error in DSpace 7.6.1


lucasangelo...@gmail.com

Jun 17, 2024, 10:23:18 AM
to DSpace Community
Dear Colleagues,

I am having a problem when trying to index the metadata using OAI import. In version 7.5, I was able to import all 42,000 items from the digital library. I installed version 7.6.1, and now when I run the command, only about 1/3 of the items get indexed before it fails with a Java heap error. Does anyone know if this is a common issue with version 7.6.1?

Thanks in advance.

Holger Lenz

Jun 17, 2024, 5:16:38 PM
to DSpace Community
Hi there,

Are you experiencing the error "java.lang.OutOfMemoryError: Java heap space", or is it a different error?

If it is the former, there is documentation on that (most likely a memory issue): https://wiki.lyrasis.org/display/DSDOC7x/Performance+Tuning+DSpace (subheading "Performance Tuning the Backend (REST API)")

Please let us know if this doesn't point you in the right direction.

Holger



lucasangelo...@gmail.com

Jun 19, 2024, 8:30:09 AM
to DSpace Community
Thank you for your response.
I am encountering this Java heap error, and I will follow the link you provided to apply the fix. Thank you again.

lucasangelo...@gmail.com

Jun 25, 2024, 5:11:35 PM
to DSpace Community

Dear colleague,

I applied the fix for the Java heap memory issue, setting the heap to 4 GB, but it is not enough: the import crashes while indexing 20k items. I also tried this on another DSpace 7.6.1 instance, and the same thing happens.


On Monday, June 17, 2024 at 18:16:38 UTC-3, Holger Lenz wrote:

DSpace Community

Jun 25, 2024, 5:55:28 PM
to DSpace Community
Hi,

I think we'd need more information on the exact command you are running.  You also should check your logs to see if errors are occurring *before* the Java heap issue.  See our troubleshooting guide: https://wiki.lyrasis.org/display/DSPACE/Troubleshoot+an+error#Troubleshootanerror-DSpace7.x(orabove)

I'm not aware of a memory issue in the "dspace oai import" command.  But, it is always possible that you've encountered a new/undiscovered issue with the command.  So, we need to understand exactly what command you are running in order to see if others can reproduce the issue.

Based on what you've shared so far, it does sound like you might be encountering some sort of bug (especially if it worked fine in 7.5 but the same command isn't working in 7.6.1).  So, you are also welcome to share the detailed information in a bug ticket (https://github.com/DSpace/DSpace/issues), and we can then look for volunteers to investigate what might be going on.

Tim

lucasangelo...@gmail.com

Jun 25, 2024, 7:55:04 PM
to DSpace Community
Hi,

I run: ./dspace oai import -c

After about 14k items, I get the following error in the console:
java.lang.OutOfMemoryError: Java heap space
        at java.base/java.util.Arrays.copyOf(Arrays.java:3745)
        at java.base/java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:120)
        at java.base/java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:95)
        at java.base/java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:156)
        at com.ctc.wstx.io.UTF8Writer.write(UTF8Writer.java:143)
        at com.ctc.wstx.sw.BufferingXmlWriter.flushBuffer(BufferingXmlWriter.java:1417)
        at com.ctc.wstx.sw.BufferingXmlWriter.fastWriteRaw(BufferingXmlWriter.java:1463)
        at com.ctc.wstx.sw.BufferingXmlWriter.writeStartTagStart(BufferingXmlWriter.java:763)
        at com.ctc.wstx.sw.BaseNsStreamWriter.doWriteStartTag(BaseNsStreamWriter.java:612)
        at com.ctc.wstx.sw.BaseNsStreamWriter.writeStartElement(BaseNsStreamWriter.java:310)
        at com.lyncode.xoai.util.XmlIOUtils.writeElement(XmlIOUtils.java:19)
        at com.lyncode.xoai.dataprovider.xml.xoai.Metadata.write(Metadata.java:95)
        at org.dspace.xoai.app.XOAI.index(XOAI.java:485)
        at org.dspace.xoai.app.XOAI.index(XOAI.java:320)
        at org.dspace.xoai.app.XOAI.indexAll(XOAI.java:265)
        at org.dspace.xoai.app.XOAI.index(XOAI.java:158)
        at org.dspace.xoai.app.XOAI.main(XOAI.java:618)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:566)


In dspace.log, I see the following error:

2024-06-25 23:46:36,955 INFO  unknown unknown org.dspace.xoai.util.ItemUtils @ Missing READ rights for license bitstream. Did not include license bitstream for item: 3e52cc21-e8f6-4468-8e59-1e7c371b6b2f.
2024-06-25 23:46:44,672 ERROR unknown unknown org.dspace.xoai.app.XOAI @ Java heap space
java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOf(Arrays.java:3745) ~[?:?]
        at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:120) ~[?:?]
        at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:95) ~[?:?]
        at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:156) ~[?:?]
        at com.ctc.wstx.io.UTF8Writer.write(UTF8Writer.java:143) ~[woodstox-core-6.2.4.jar:6.2.4]
        at com.ctc.wstx.sw.BufferingXmlWriter.flushBuffer(BufferingXmlWriter.java:1417) ~[woodstox-core-6.2.4.jar:6.2.4]
        at com.ctc.wstx.sw.BufferingXmlWriter.fastWriteRaw(BufferingXmlWriter.java:1463) ~[woodstox-core-6.2.4.jar:6.2.4]
        at com.ctc.wstx.sw.BufferingXmlWriter.writeStartTagStart(BufferingXmlWriter.java:763) ~[woodstox-core-6.2.4.jar:6.2.4]
        at com.ctc.wstx.sw.BaseNsStreamWriter.doWriteStartTag(BaseNsStreamWriter.java:612) ~[woodstox-core-6.2.4.jar:6.2.4]
        at com.ctc.wstx.sw.BaseNsStreamWriter.writeStartElement(BaseNsStreamWriter.java:310) ~[woodstox-core-6.2.4.jar:6.2.4]
        at com.lyncode.xoai.util.XmlIOUtils.writeElement(XmlIOUtils.java:19) ~[xoai-3.4.0.jar:3.4.0]
        at com.lyncode.xoai.dataprovider.xml.xoai.Metadata.write(Metadata.java:95) ~[xoai-3.4.0.jar:3.4.0]
        at org.dspace.xoai.app.XOAI.index(XOAI.java:485) ~[dspace-oai-7.6.1.jar:7.6.1]
        at org.dspace.xoai.app.XOAI.index(XOAI.java:320) ~[dspace-oai-7.6.1.jar:7.6.1]
        at org.dspace.xoai.app.XOAI.indexAll(XOAI.java:265) ~[dspace-oai-7.6.1.jar:7.6.1]
        at org.dspace.xoai.app.XOAI.index(XOAI.java:158) ~[dspace-oai-7.6.1.jar:7.6.1]
        at org.dspace.xoai.app.XOAI.main(XOAI.java:618) [dspace-oai-7.6.1.jar:7.6.1]
        at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:?]
        at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:?]
        at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
        at java.lang.reflect.Method.invoke(Method.java:566) ~[?:?]
        at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:283) [dspace-api-7.6.1.jar:7.6.1]
        at org.dspace.app.launcher.ScriptLauncher.handleScript(ScriptLauncher.java:134) [dspace-api-7.6.1.jar:7.6.1]
        at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:99) [dspace-api-7.6.1.jar:7.6.1]
2024-06-25 23:46:44,721 INFO  unknown unknown org.ehcache.core.EhcacheManager @ Cache 'org.dspace.content.MetadataSchema' removed from Eh107InternalCacheManager.
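Both traces fail at the same point: XOAI.index (XOAI.java:485) calls Metadata.write, which streams one item's metadata through Woodstox into a java.io.ByteArrayOutputStream, and the error is thrown in ByteArrayOutputStream.grow via Arrays.copyOf. One plausible reading, not confirmed in this thread, is that a single item produces a very large metadata record: while the buffer grows, the old and the new array are both live for a moment, so the transient cost exceeds the record itself. The sketch below is a simplified model, assuming the JDK 11 growth policy (start at 32 bytes, double the capacity, or jump straight to the required size if doubling is not enough). It does not call DSpace or XOAI, ignores the array-size cap, and the class name BufferGrowthModel is hypothetical.

```java
/** Simplified model (assumption: JDK 11 ByteArrayOutputStream growth policy). */
public class BufferGrowthModel {

    /** Capacity of a ByteArrayOutputStream created with the no-arg constructor. */
    static final long INITIAL_CAPACITY = 32;

    /**
     * Simulates writing recordBytes into the buffer in chunks of chunkBytes and
     * returns the largest footprint seen during a grow: the old array plus the
     * new array that Arrays.copyOf allocates.
     */
    static long peakTransientBytes(long recordBytes, long chunkBytes) {
        long capacity = INITIAL_CAPACITY;
        long written = 0;
        long peak = capacity;
        while (written < recordBytes) {
            long chunk = Math.min(chunkBytes, recordBytes - written);
            long needed = written + chunk;
            if (needed > capacity) {
                long newCapacity = capacity << 1;      // double...
                if (newCapacity < needed) {
                    newCapacity = needed;              // ...or jump to what is needed
                }
                peak = Math.max(peak, capacity + newCapacity); // old + new both live
                capacity = newCapacity;
            }
            written = needed;
        }
        return peak;
    }

    public static void main(String[] args) {
        long mib = 1024L * 1024L;
        // Hypothetical 300 MiB metadata record written in 8 KiB chunks.
        long peak = peakTransientBytes(300 * mib, 8 * 1024);
        System.out.println("peak transient MiB: " + peak / mib);
    }
}
```

With these made-up numbers, main prints "peak transient MiB: 768", about 2.5 times the record size, and that counts only the buffer, not anything else the indexer holds at the same time.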


DSpace Community

Jun 26, 2024, 4:44:24 PM
to DSpace Community
Hi Lucas,

When you run "./dspace oai import -c" you should see occasional messages like this...

___ items imported so far...

These are batches of items that are being committed every once in a while.  The batch size is defined by "oai.import.batch.size" in your [dspace]/config/oai.cfg  (default is 1,000).
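A rough model of the batching described here, written as a sketch rather than DSpace's actual XOAI code: items are added to a pending batch, the batch is committed when it reaches the configured size, and a progress line follows each commit. Only oai.import.batch.size and the "items imported so far..." message come from this thread; the class BatchCommitModel, the method progressLines, and printing a line for the final partial batch are assumptions for illustration. If the indexer holds items until commit, a smaller batch bounds how many are held at once, but it does nothing about the size of any single item's record.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified model of batched commits (not DSpace's actual XOAI implementation). */
public class BatchCommitModel {

    /**
     * Returns the progress messages emitted after each commit when totalItems
     * are indexed with the given batch size (cf. oai.import.batch.size).
     */
    static List<String> progressLines(int totalItems, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<String> lines = new ArrayList<>();
        int pending = 0;   // items held since the last commit
        int committed = 0; // items already committed
        for (int i = 0; i < totalItems; i++) {
            pending++;
            if (pending == batchSize) {
                committed += pending;
                pending = 0;
                lines.add(committed + " items imported so far...");
            }
        }
        if (pending > 0) { // model choice: commit the final partial batch too
            committed += pending;
            lines.add(committed + " items imported so far...");
        }
        return lines;
    }

    public static void main(String[] args) {
        progressLines(2500, 1000).forEach(System.out::println);
    }
}
```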

So, a few options exist:

  1. You could decrease the batch size to see if that avoids the Out of Memory error. Set that config to 100 or 500 in either your local.cfg or your oai.cfg, then rerun the script. It will likely run a bit slower, but it should use less memory.
  2. You could increase the memory available by ensuring that your command-line tools have more than 4GB of memory. See instructions at https://wiki.lyrasis.org/display/DSDOC7x/Performance+Tuning+DSpace#PerformanceTuningDSpace-GivetheCommandLineToolsMoreMemory
  3. Or, if none of that works, you could just keep running "./dspace oai import -c" again and again until everything is indexed. The script should start each time from where you left off (it will determine which Items are already indexed and skip them).
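Before comparing heap sizes, it may help to confirm which limit the import JVM actually gets. The page in Holger's reply covers the REST API backend, while option 2 concerns the separate JVM that bin/dspace starts for command-line scripts (per that wiki page, typically configured through JAVA_OPTS). A small standalone check, assuming it is run with the same java binary and JAVA_OPTS as the dspace script (for example `java $JAVA_OPTS HeapCheck.java` on Java 11+), prints the maximum heap that JVM will use. HeapCheck is a hypothetical helper, not part of DSpace.

```java
/** Prints the maximum heap this JVM will attempt to use (reflects -Xmx if set). */
public class HeapCheck {

    /** Maximum heap in MiB, as reported by Runtime.maxMemory(). */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
        System.out.println("JAVA_OPTS: "
                + System.getenv().getOrDefault("JAVA_OPTS", "(not set)"));
    }
}
```

The reported value is usually a little below the -Xmx setting, since the JVM does not count all heap regions toward it; a figure far below 4096 MiB would suggest the 4 GB setting is not reaching the command-line JVM.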

Hopefully that will help.  If we find this is a common issue, there may be a bug here in how memory is used (as it seems like we shouldn't be hitting this error at all), but hopefully those workarounds will help you get past this issue.

Tim

lucasangelo...@gmail.com

Jun 26, 2024, 8:25:41 PM
to DSpace Community

Hello,

I reduced the batch size to 100, but I ran into the same problem. This time it could not even index 14 thousand items, only 12 thousand. I then tried the other strategy of running ./dspace oai import -c repeatedly until all items were indexed. I ran the command 10 times, and each time 12 thousand items were indexed. However, the total shown in the OAI interface did not change; it stayed at 12 thousand items.

Nicholas Woodward

Jun 27, 2024, 3:28:38 PM
to DSpace Community
Hi Lucas,
Have you tried running OAI import with the -v flag and redirecting that output to a log file? That will indicate the last item to be indexed before the OOM error. If it's the same item every time then you could look at that item to see if there's anything different about it that may be causing the error.
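Following this suggestion, the verbose output could be captured with something like `./dspace oai import -c -v > oai-import.log 2>&1` and then scanned for the last item identifier logged before the failure. The format of the verbose lines is not shown in this thread, so the sketch below assumes only that an item's UUID appears somewhere on its line (as in the "Did not include license bitstream for item: <uuid>" message above) and that the failure line contains the text OutOfMemoryError. The class LastItemBeforeOom and the default file name oai-import.log are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Finds the last item UUID logged before the first OutOfMemoryError line. */
public class LastItemBeforeOom {

    // Canonical 8-4-4-4-12 hex UUID, the form DSpace 7 uses for item identifiers.
    private static final Pattern UUID = Pattern.compile(
            "[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}");

    static Optional<String> lastUuidBeforeOom(List<String> lines) {
        String last = null;
        for (String line : lines) {
            if (line.contains("OutOfMemoryError")) {
                break; // stop at the first failure
            }
            Matcher m = UUID.matcher(line);
            while (m.find()) {
                last = m.group(); // keep the most recent match
            }
        }
        return Optional.ofNullable(last);
    }

    public static void main(String[] args) throws IOException {
        Path log = Paths.get(args.length > 0 ? args[0] : "oai-import.log");
        List<String> lines = Files.readAllLines(log);
        System.out.println(lastUuidBeforeOom(lines)
                .map(u -> "Last item before OOM: " + u)
                .orElse("No UUID found before the error"));
    }
}
```

Depending on whether a verbose line is written before or after an item is processed, the reported UUID is either the item being processed when memory ran out or the one just before it; comparing the result across two runs is what shows whether the same item is involved each time.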

Between 7.5 and 7.6.1 there weren't many changes to the OAI import process, but there was a new XOAI Extension added to get the item's access status: https://github.com/DSpace/DSpace/blob/dspace-7_x/dspace-oai/src/main/java/org/dspace/xoai/app/plugins/AccessStatusElementItemCompilePlugin.java.

Nick