[Dspace-tech] filter-media always gets java.lang.OutOfMemoryError: Java heap space


Rui Ramos

Aug 25, 2015, 12:51:02 PM
to Dspace Tech
Hi *,

I'm getting this error when running filter-media:

...
ERROR filtering, skipping bitstream:

Item Handle: 10216/10063
Bundle Name: ORIGINAL
File Size: 122589469
Checksum: 7f9aa1bde30b1c3f17b6e8589bcf36f6 (MD5)
Asset Store: 0
org.pdfbox.exceptions.WrappedIOException: Java heap space
org.pdfbox.exceptions.WrappedIOException: Java heap space
at org.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:234)
at org.pdfbox.pdmodel.PDDocument.load(PDDocument.java:707)
at org.pdfbox.pdmodel.PDDocument.load(PDDocument.java:691)
at org.dspace.app.mediafilter.PDFFilter.getDestinationStream(PDFFilter.java:140)
at org.dspace.app.mediafilter.MediaFilterManager.processBitstream(MediaFilterManager.java:652)
at org.dspace.app.mediafilter.MediaFilterManager.filterBitstream(MediaFilterManager.java:554)
at org.dspace.app.mediafilter.MediaFilterManager.filterItem(MediaFilterManager.java:504)
at org.dspace.app.mediafilter.MediaFilterManager.applyFiltersItem(MediaFilterManager.java:472)
at org.dspace.app.mediafilter.MediaFilterManager.applyFiltersAllItems(MediaFilterManager.java:425)
at org.dspace.app.mediafilter.MediaFilterManager.main(MediaFilterManager.java:359)
java.lang.OutOfMemoryError: Java heap space
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space


I don't know whether this is caused by a large PDF file or by
something else.

Is it possible to prevent the filter command from opening PDF files
bigger than XX MB?

Or to skip some of them?

Any other thoughts on how to solve this would be appreciated.

Best regards, Rui



Stuart Lewis

Aug 25, 2015, 12:51:06 PM
to rra...@reit.up.pt, Dspace Tech
Hi Rui,

Which version of DSpace are you running? If you are running 1.5 or
later, you can set the following options in dspace.cfg:

# If true, larger PDFs are written to a temp file as they are indexed...this
# is slower, but helps ensure that PDFBox software DSpace uses doesn't eat up
# all your memory
#pdffilter.largepdfs = true

# If true, PDFs which still result in an Out of Memory error from PDFBox
# are skipped over...these problematic PDFs will never be indexed until
# memory usage can be decreased in the PDFBox software
#pdffilter.skiponmemoryexception = true
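
As quoted, both options are commented out with a leading "#". A sketch of the enabled form, assuming the property names are exactly as shown above and that removing the "#" is all that is needed to activate them:

```
pdffilter.largepdfs = true
pdffilter.skiponmemoryexception = true
```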

Thanks,


Stuart Lewis
Digital Services Programmer
Te Tumu Herenga The University of Auckland Library
Auckland Mail Centre, Private Bag 92019, Auckland 1142, New Zealand
Ph: 64 9 373-7599 x81928
http://www.library.auckland.ac.nz/

Rui Ramos

Aug 25, 2015, 12:51:22 PM
to Dspace Tech
Thanks for the info, Stuart.

It's version 1.5.1 at the moment. I made the changes you mentioned and it
processed some more. I still have a memory problem, though:

java.lang.Throwable: Warning: You did not close the PDF Document
at org.pdfbox.cos.COSDocument.finalize(COSDocument.java:418)
at java.lang.ref.Finalizer.invokeFinalizeMethod(Native Method)
at java.lang.ref.Finalizer.runFinalizer(Finalizer.java:83)
at java.lang.ref.Finalizer.access$100(Finalizer.java:14)
at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:160)
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space

I don't know if a bad PDF is causing this. Either way, the process
should continue with the rest, right?

Any thoughts on how to solve this issue?
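
For reference, the "skip and continue" behaviour being asked about can be sketched in plain Java. This is only an illustration of the pattern, not DSpace's actual MediaFilterManager code: the class SkipOnMemoryError, the Doc and Extractor types, and the size limit are all hypothetical names invented for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; none of these names exist in DSpace or PDFBox.
class SkipOnMemoryError {

    /** Stand-in for a bitstream: a file name plus its size in bytes. */
    static final class Doc {
        final String name;
        final long sizeBytes;

        Doc(String name, long sizeBytes) {
            this.name = name;
            this.sizeBytes = sizeBytes;
        }
    }

    /** Stand-in for the text extraction step (a real one would parse the PDF). */
    interface Extractor {
        String extract(Doc doc);
    }

    /**
     * Runs the extractor over every document and returns the names that were
     * processed. Documents larger than maxBytes are skipped up front, and a
     * document that triggers OutOfMemoryError is skipped instead of ending the run.
     */
    static List<String> filterAll(List<Doc> docs, long maxBytes, Extractor extractor) {
        List<String> processed = new ArrayList<>();
        for (Doc doc : docs) {
            if (doc.sizeBytes > maxBytes) {
                System.err.println("Skipping (too large): " + doc.name);
                continue;
            }
            try {
                extractor.extract(doc);
                processed.add(doc.name);
            } catch (OutOfMemoryError e) {
                // Catching per document lets the loop move on to the next one.
                System.err.println("Skipping (out of memory): " + doc.name);
            }
        }
        return processed;
    }

    public static void main(String[] args) {
        long limit = 100L * 1024 * 1024; // assumed threshold, chosen for the demo
        List<Doc> docs = Arrays.asList(
                new Doc("small.pdf", 2048L),
                new Doc("huge.pdf", 122589469L), // size taken from the error report above
                new Doc("bad.pdf", 4096L));
        // The extractor simulates a memory failure for one file.
        List<String> done = filterAll(docs, limit, d -> {
            if (d.name.equals("bad.pdf")) {
                throw new OutOfMemoryError("simulated");
            }
            return "text of " + d.name;
        });
        System.out.println("Processed: " + done);
    }
}
```

Catching OutOfMemoryError only helps if the memory held by the failed document can actually be released afterwards, so a pattern like this is a workaround rather than a fix for the underlying memory use.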

Cheers, Rui