DSpace Java OutOfMemory Errors

clob...@swarthmore.edu

Sep 29, 2015, 4:13:46 PM
to DSpace Technical Support
We've been seeing DSpace crash a lot recently. Digging into the Catalina logs, I am seeing severe errors indicating a Java memory leak:

SEVERE: A web application registered the JBDC driver [org.postgresql.Driver] but failed to unregister it when the
web application was stopped. To prevent a memory leak, the JDBC Driver has been forcibly unregistered.
SEVERE: A web application created a ThreadLocal with key of type [org.springframework.core.NamedThreadLocal]
(value [Prototype beans currently in creation]) and a value of type [null] (value [null]) but failed to remove it when
the web application was stopped. To prevent a memory leak, the ThreadLocal has been forcibly removed.
SEVERE: A web application registered the JBDC driver [org.postgresql.Driver] but failed to unregister it when
the web application was stopped. To prevent a memory leak, the JDBC Driver has been forcibly unregistered.
SEVERE: A web application appears to have started a thread named [MultiThreadedHttpConnectionManager cleanup] but has
failed to stop it. This is very likely to create a memory leak.
SEVERE: A web application appears to have started a thread named [TP-Processor6] but has failed to stop it.
This is very likely to create a memory leak.
SEVERE: A web application created a ThreadLocal with key of type [org.springframework.core.NamedThreadLocal]
(value [Prototype beans currently in creation]) and a value of type [null] (value [null]) but failed to remove it when
the web application was stopped. To prevent a memory leak, the ThreadLocal has been forcibly removed.

Looking at the dspace logs, I am seeing many Java OutOfMemory errors:

2015-09-27 02:02:20,235 ERROR org.dspace.app.mediafilter.PDFFilter @ Error parsing PDF document Java heap space
java.lang.OutOfMemoryError: Java heap space
        at org.apache.pdfbox.io.RandomAccessBuffer.expandBuffer(RandomAccessBuffer.java:151)
        at org.apache.pdfbox.io.RandomAccessBuffer.write(RandomAccessBuffer.java:131)
        at org.apache.pdfbox.io.RandomAccessFileOutputStream.write(RandomAccessFileOutputStream.java:108)
        at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
        at java.io.BufferedOutputStream.write(BufferedOutputStream.java:126)
        at org.apache.pdfbox.pdmodel.encryption.SecurityHandler.encryptData(SecurityHandler.java:294)
        at org.apache.pdfbox.pdmodel.encryption.SecurityHandler.decryptStream(SecurityHandler.java:391)
        at org.apache.pdfbox.pdmodel.encryption.SecurityHandler.decrypt(SecurityHandler.java:363)
        at org.apache.pdfbox.pdmodel.encryption.SecurityHandler.decryptObject(SecurityHandler.java:337)
        at org.apache.pdfbox.pdmodel.encryption.SecurityHandler.proceedDecryption(SecurityHandler.java:177)
        at org.apache.pdfbox.pdmodel.encryption.StandardSecurityHandler.decryptDocument(StandardSecurityHandler.java:257)
        at org.apache.pdfbox.pdmodel.PDDocument.openProtection(PDDocument.java:1325)
        at org.apache.pdfbox.pdmodel.PDDocument.decrypt(PDDocument.java:796)
        at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:310)
        at org.dspace.app.mediafilter.PDFFilter.getDestinationStream(PDFFilter.java:101)
        at org.dspace.app.mediafilter.MediaFilterManager.processBitstream(MediaFilterManager.java:737)
        at org.dspace.app.mediafilter.MediaFilterManager.filterBitstream(MediaFilterManager.java:561)
        at org.dspace.app.mediafilter.MediaFilterManager.filterItem(MediaFilterManager.java:511)
        at org.dspace.app.mediafilter.MediaFilterManager.applyFiltersItem(MediaFilterManager.java:479)
        at org.dspace.app.mediafilter.MediaFilterManager.applyFiltersAllItems(MediaFilterManager.java:414)
        at org.dspace.app.mediafilter.MediaFilterManager.main(MediaFilterManager.java:333)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)


Our Java memory options are set to 4096M, which is half of the memory we have (8G ram):

# dspace requires tomcat to use UTF-8, and also recommends memory parameters
JAVA_OPTS="${JAVA_OPTS} -Xmx4096M -Xms4096M -Dfile.encoding=UTF-8"

We've just set up a cron to do nightly restarts of tomcat. I was just wondering if anyone else had some advice on handling
these out of memory errors? Is there anything else we can do aside from the nightly restart?

Thanks,
Chelsea

Andrea Schweer

Sep 29, 2015, 4:41:29 PM
to clob...@swarthmore.edu, DSpace Technical Support
Hi Chelsea,


On 30/09/15 09:13, clob...@swarthmore.edu wrote:
> We've been seeing DSpace crash a lot recently. Digging into the Catalina logs, I am seeing severe errors indicating a java memory leak:
>
> SEVERE: A web application registered the JBDC driver [org.postgresql.Driver] but failed to unregister it when the
> web application was stopped. To prevent a memory leak, the JDBC Driver has been forcibly unregistered.
> SEVERE: A web application created a ThreadLocal with key of type [org.springframework.core.NamedThreadLocal]
> (value [Prototype beans currently in creation]) and a value of type [null] (value [null]) but failed to remove it when
> the web application was stopped. To prevent a memory leak, the ThreadLocal has been forcibly removed.
> SEVERE: A web application registered the JBDC driver [org.postgresql.Driver] but failed to unregister it when
> the web application was stopped. To prevent a memory leak, the JDBC Driver has been forcibly unregistered.
> SEVERE: A web application appears to have started a thread named [MultiThreadedHttpConnectionManager cleanup] but has
> failed to stop it. This is very likely to create a memory leak.
> SEVERE: A web application appears to have started a thread named [TP-Processor6] but has failed to stop it.
> This is very likely to create a memory leak.
> SEVERE: A web application created a ThreadLocal with key of type [org.springframework.core.NamedThreadLocal]
> (value [Prototype beans currently in creation]) and a value of type [null] (value [null]) but failed to remove it when
> the web application was stopped. To prevent a memory leak, the ThreadLocal has been forcibly removed.

I think this is unrelated to the OutOfMemoryError in your DSpace log. These messages (unfortunately) are fairly typical when Tomcat is shut down. Do you see anything in the Catalina log from the time Tomcat crashed?
The OutOfMemoryError itself is coming from the media filter cron job, which isn't running under Tomcat at all.

> Our Java memory options are set to 4096M, which is half of the memory we have (8G ram):
>
> # dspace requires tomcat to use UTF-8, and also recommends memory parameters
> JAVA_OPTS="${JAVA_OPTS} -Xmx4096M -Xms4096M -Dfile.encoding=UTF-8"

I'm assuming these are the JAVA_OPTS for Tomcat? For the media filter, you'll also need to change Xmx / Xms in [dspace]/bin/dspace -- that's where the command-line tools get their settings from.

You're not saying which DSpace version you're on; in DSpace 5.x the setting is here: https://github.com/DSpace/DSpace/blob/dspace-5_x/dspace/bin/dspace#L77 and defaults to 256MB heap memory for the command-line tools.

Actually, are you sure Tomcat is using the JAVA_OPTS you gave above? Could you have a look at the Java Information tab in your Control Panel (assuming XMLUI) and verify that the values under "Runtime statistics" look right (maximum memory should correspond to the 4GB you set in the JAVA_OPTS).
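
As a cross-check from the shell, the heap limit a running JVM was started with can be read off its command line. This is a minimal sketch, assuming a Unix-like host where you know the Tomcat process ID. `TOMCAT_PID` is a placeholder and `extract_xmx` is a helper name made up for this example, not part of DSpace or Tomcat.

```shell
#!/bin/sh
# extract_xmx: print the value of the last -Xmx option found in a JVM
# command line passed as a single string. Prints nothing if there is none.
# (Hypothetical helper, named for this sketch only.)
extract_xmx() {
    printf '%s\n' "$1" | tr ' ' '\n' | sed -n 's/^-Xmx//p' | tail -n 1
}

# Demonstration with a literal command line using the options quoted above:
extract_xmx "java -Xmx4096M -Xms4096M -Dfile.encoding=UTF-8 org.apache.catalina.startup.Bootstrap start"

# Against a live Tomcat (TOMCAT_PID is a placeholder for the real process ID):
#   extract_xmx "$(ps -o args= -p "$TOMCAT_PID")"
```

If the helper prints nothing for the Tomcat process, the JAVA_OPTS line is probably not being picked up. The "maximum memory" figure in the Control Panel is typically a little lower than the -Xmx value, so a small difference between the two is not a cause for concern.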


> We've just set up a cron to do nightly restarts of tomcat. I was just wondering if anyone else had some advice on handling
> these out of memory errors? Is there anything else we can do aside from the nightly restart?

I think we'll need more information from the actual time Tomcat crashes. If you can't find anything, you might like to consider telling Tomcat to write out a heap dump when it runs out of memory that can then be analysed to figure out what's using all the memory:
JAVA_OPTS="${JAVA_OPTS} -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/path/to/dir"
The Eclipse Memory Analyzer Tool (MAT) is quite good for looking at the files produced this way, but of course you will need to have your Tomcat crash at least once with that setting on!

cheers,
Andrea

-- 
Dr Andrea Schweer
IRR Technical Specialist, ITS Information Systems
The University of Waikato, Hamilton, New Zealand
+64-7-837 9120

clob...@swarthmore.edu

Sep 29, 2015, 5:01:52 PM
to DSpace Technical Support, clob...@swarthmore.edu
Hi Andrea,

Thanks for the helpful pointers! I'll dig into them more tomorrow. I just wanted to quickly reply and say we are running DSpace 3.1.

- Chelsea

clob...@swarthmore.edu

Oct 2, 2015, 11:50:32 AM
to DSpace Technical Support, clob...@swarthmore.edu
Hi Andrea,

I checked the Xmx / Xms in [dspace]/bin/dspace and it's set to 256M:
 #Default Java to use 256MB of memory
 JAVA_OPTS=-Xmx256m

I checked the Java Information tab in the Control Panel, and Runtime Statistics report 4038 MiB, which is consistent with the 4G set for Tomcat.

I went back to the logs for more information. Java OOM errors are only happening when the media filter is running. Yesterday, our DSpace instance had to be restarted around 1:30 PM. I see no information in the DSpace logs or Catalina logs relating to the crash (no errors). The only other error I see happening repeatedly in the DSpace log is the following:

2015-01-10 21:12:13,700 ERROR org.dspace.app.xmlui.cocoon.DSpaceCocoonServletFilter @ Serious Error Occurred Processing Request!
ClientAbortException:  java.net.SocketException: Broken pipe
        at org.apache.catalina.connector.OutputBuffer.realWriteBytes(OutputBuffer.java:358)
        at org.apache.tomcat.util.buf.ByteChunk.flushBuffer(ByteChunk.java:434)
        at org.apache.catalina.connector.OutputBuffer.doFlush(OutputBuffer.java:309)
        at org.apache.catalina.connector.OutputBuffer.flush(OutputBuffer.java:288)
        at org.apache.catalina.connector.Response.flushBuffer(Response.java:548)
        at org.apache.catalina.connector.ResponseFacade.flushBuffer(ResponseFacade.java:279)
        at javax.servlet.ServletResponseWrapper.flushBuffer(ServletResponseWrapper.java:166)
        at org.apache.cocoon.servletservice.HttpServletResponseBufferingWrapper.flushBufferedResponse(HttpServletResponseBufferingWrapper.java:240)

I'm new to supporting DSpace so I'm not sure what this error means, but none of the timestamps correlate to our crashes. Are there any other logs to look in to gain more information?

I can try adding a heap dump to our JAVA_OPTS, but since I am not seeing Java OOM errors outside of the media filter, I'm not sure how helpful it would be.

- Chelsea

Alan Orth

Oct 4, 2015, 4:37:52 AM
to clob...@swarthmore.edu, DSpace Technical Support
Hi, Chelsea!

Out of memory errors only happen when the Linux kernel runs out of memory and, as a last-gasp effort before crashing, tries to reclaim some memory by killing a process that it deems responsible for using a lot of memory. The problem with Java's heap is that it just "takes it" all at once, and that memory cannot be used by the OS or any other applications, including cron jobs like the media filter.

You have apparently allocated 4GiB to Tomcat's Java heap, but I doubt you need that much. I'd recommend using 150% of the number you see in DSpace's control panel as "Used memory" and monitoring it. Then, in your media filter and other background tasks, you can just set the JAVA_OPTS on the command line or in your cron job:

JAVA_OPTS="-Xmx768M -Xms768M -Dfile.encoding=UTF-8"

Those are the settings we use in our cron tab, on a DSpace repository with 50,000 items. Other than that, we are only allocating 3GiB to Tomcat's Java heap. I think this should suffice for you. You don't want to starve the OS or other processes! Check the DSpace manual for a copy and paste-able example crontab.
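
To make the sizing advice above concrete, here is a minimal sketch of a wrapper script that a cron job could call. It relies on the dspace launcher honouring JAVA_OPTS from the environment, as described above. The /dspace default is a placeholder for your [dspace] directory, the helper names `suggest_tomcat_heap_mb` and `build_cli_java_opts` are made up for this example, and the 2000 MB "Used memory" figure is purely illustrative.

```shell
#!/bin/sh
# Sketch only: helper names and the /dspace default are assumptions.

# 150% of the "Used memory" figure (in MB) shown in the DSpace control panel.
suggest_tomcat_heap_mb() {
    echo $(( $1 * 3 / 2 ))
}

# Build a JAVA_OPTS string for the command-line tools with a fixed heap (in MB).
build_cli_java_opts() {
    echo "-Xmx${1}M -Xms${1}M -Dfile.encoding=UTF-8"
}

echo "Suggested Tomcat heap: $(suggest_tomcat_heap_mb 2000) MB"

# Run the media filter with its own heap setting, separate from Tomcat's.
DSPACE_DIR="${DSPACE_DIR:-/dspace}"
JAVA_OPTS="$(build_cli_java_opts 768)"
export JAVA_OPTS
if [ -x "$DSPACE_DIR/bin/dspace" ]; then
    "$DSPACE_DIR/bin/dspace" filter-media
else
    echo "dspace launcher not found under $DSPACE_DIR; would use JAVA_OPTS=$JAVA_OPTS"
fi
```

Keeping the heap size in one small function makes it easy to raise the limit for the media filter alone without touching Tomcat's settings.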

Also, you shouldn't have to restart Tomcat every day. That's not a permanent solution; it's a temporary hack!

Alan

--
Alan Orth
alan...@gmail.com
https://alaninkenya.org
https://mjanja.ch
"In heaven all the interesting people are missing." -Friedrich Nietzsche
GPG public key ID: 0x8cb0d0acb5cd81ec209c6cdfbd1a0e09c2f836c0

helix84

Oct 4, 2015, 6:21:58 AM
to Alan Orth, Chelsea Lobdell, DSpace Technical Support
On Sun, Oct 4, 2015 at 10:37 AM, Alan Orth <alan...@gmail.com> wrote:
> Out of memory errors only happen when the Linux kernel runs out of memory and, as a last-gasp effort before crashing, tries to reclaim some memory by killing a process that it deems responsible for using a lot of memory. The problem with Java's heap is that it just "takes it" all at once, and that memory cannot be used by the OS or any other applications, including cron jobs like the media filter.

Alan, the Linux OOM killer is not at play here. Were it to kill tomcat, you wouldn't see it in tomcat logs as tomcat wouldn't get the chance to even log a message.

The issue here is, as correctly pointed out before, that command line tools are a separate process with their own memory limit. It should also be said that when setting memory limits, swapping should be avoided at all costs. Therefore, to set memory limits, look at a) how much RAM the machine has, b) how much your system uses without Tomcat running, c) the -Xmx you allocated to Tomcat, and d) the -Xmx you allocated to command line tools. Then change them while keeping in mind that a > b + c + d.
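
The inequality can be checked with a few lines of shell arithmetic. This is a minimal sketch: the function name `check_memory_budget` and the 1024 MB figure for the base system are assumptions for illustration, while the 8192, 4096 and 256 MB values are the ones mentioned earlier in this thread.

```shell
#!/bin/sh
# Check the rule of thumb a > b + c + d (all values in MB).
#   $1 = a: total RAM in the machine
#   $2 = b: memory the system uses without Tomcat running
#           (hypothetical figure below; measure your own)
#   $3 = c: -Xmx allocated to Tomcat
#   $4 = d: -Xmx allocated to command line tools
check_memory_budget() {
    total=$1
    needed=$(( $2 + $3 + $4 ))
    if [ "$total" -gt "$needed" ]; then
        echo "ok: $(( total - needed )) MB headroom"
    else
        echo "over: short by $(( needed - total )) MB"
    fi
}

# 8 GB machine, 4096M for Tomcat, 256m default for the command line tools:
check_memory_budget 8192 1024 4096 256

# Same machine if the command line tools were also given 4096M:
check_memory_budget 8192 1024 4096 4096
```

Note that -Xmx only bounds the Java heap. Each JVM also needs some memory outside the heap, so it is worth keeping real headroom rather than aiming for a sum that only just fits.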

Regarding the specific original problem - raising the -Xmx for command line tools should help, but it just happens that there are sometimes larger PDFs than you have memory available to process them. The media filter will happily skip these and process the rest. If you don't have enough memory to extract fulltext from a particular PDF, you may consider extracting the text in another way (an offline tool or maybe even manually copying the text out of a PDF viewer) and adding a .txt bitstream with the same name as the .pdf to the TEXT bundle of the PDF's item. I'm not suggesting it's the recommended approach, but it will get the job done if you can't live without the indexed fulltext.


Regards,
~~helix84

Compulsory reading: DSpace Mailing List Etiquette
https://wiki.duraspace.org/display/DSPACE/Mailing+List+Etiquette

Alan Orth

Oct 4, 2015, 9:11:46 AM
to hel...@centrum.sk, Chelsea Lobdell, DSpace Technical Support

Right, helix, but I mentioned OOM because people seem confused by it, and in any case over-allocating the Tomcat Java heap potentially starves the OS and other processes like the media filter.

Alan

helix84

Oct 12, 2015, 5:44:39 AM
to Chelsea Lobdell, DSpace Technical Support
Sorry, I just realized that media filter may not skip OOM failures by
default. It's because I use these dspace.cfg options:

pdffilter.largepdfs = true
pdffilter.skiponmemoryexception = true