[Dspace-tech] filter-media error in DSpace 1.4.2

Thornton, Susan M. (LARC-B702)[NCI INFORMATION SYSTEMS]

Aug 25, 2015, 10:55:43 AM
to dspac...@lists.sourceforge.net
No wonder I didn't get any responses to my previous message: no one
recognized the job name! :-) Sorry. The job that is failing is
"filter-media". It intermittently fails with the error below and with a
Java "heap space" error, which someone told me long ago was a known bug
that was going to be fixed.

Does anyone know if there is a fix for it yet? I'm afraid our full-text
search is not accurate because this job is blowing up mid-stream.

Thanks,
Sue

p.s. rim-filter is just our name for the media-filter job with a couple
of delete files added...

Sue Walker-Thornton
NASA Langley Research Center
757-224-4074


Error:
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.util.HashMap.addEntry(HashMap.java:753)
    at java.util.HashMap.put(HashMap.java:385)
    at org.fontbox.cmap.CMap.addMapping(CMap.java:132)
    at org.fontbox.cmap.CMapParser.parse(CMapParser.java:153)
    at org.pdfbox.pdmodel.font.PDFont.parseCmap(PDFont.java:535)
    at org.pdfbox.pdmodel.font.PDFont.encode(PDFont.java:387)
    at org.pdfbox.util.PDFStreamEngine.showString(PDFStreamEngine.java:325)
    at org.pdfbox.util.operator.ShowText.process(ShowText.java:64)
    at org.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:452)
    at org.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:215)
    at org.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:174)
    at org.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:336)
    at org.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:259)
    at org.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:216)
    at org.pdfbox.util.PDFTextStripper.getText(PDFTextStripper.java:149)
    at org.dspace.app.mediafilter.PDFFilter.getDestinationStream(PDFFilter.java:110)
    at org.dspace.app.mediafilter.MediaFilter.processBitstream(MediaFilter.java:155)
    at org.dspace.app.mediafilter.MediaFilterManager.filterBitstream(MediaFilterManager.java:340)
    at org.dspace.app.mediafilter.MediaFilterManager.filterItem(MediaFilterManager.java:309)
    at org.dspace.app.mediafilter.MediaFilterManager.applyFiltersItem(MediaFilterManager.java:274)
    at org.dspace.app.mediafilter.MediaFilterManager.applyFiltersAllItems(MediaFilterManager.java:242)
    at org.dspace.app.mediafilter.MediaFilterManager.main(MediaFilterManager.java:193)

-----Original Message-----
From: dspace home directory [mailto:dsp...@odyssey.larc.nasa.gov]
Sent: Thursday, June 05, 2008 1:05 AM
To: dsp...@odyssey.larc.nasa.gov
Subject: Output from "cron" command

Your "cron" job on odyssey
/dspace/bin/rim-filter > /dspace/bin/rim-filter.log

produced the following output:


------------------------------------------------------------------------
-
Check out the new SourceForge.net Marketplace.
It's the best place to buy or sell services for
just about anything Open Source.
http://sourceforge.net/services/buy/index.php
_______________________________________________
DSpace-tech mailing list
DSpac...@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-tech

Graham Triggs

Aug 25, 2015, 10:55:44 AM
to Thornton, Susan M. (LARC-B702)[NCI INFORMATION SYSTEMS], dspac...@lists.sourceforge.net
There is no fix - it is essentially a bug within PDFBox.

In 1.5, there is a workaround that catches the out-of-memory errors
and skips the record.

G

Thornton, Susan M. (LARC-B702)[NCI INFORMATION SYSTEMS]

Aug 25, 2015, 10:55:54 AM
to Graham Triggs, dspac...@lists.sourceforge.net
Would I be able to use this workaround within DSpace 1.4.2?
Thanks,
Sue

Graham Triggs

Aug 25, 2015, 10:56:02 AM
to Thornton, Susan M. (LARC-B702)[NCI INFORMATION SYSTEMS], dspac...@lists.sourceforge.net
Sue,

I *think* you can use the PDFFilter class from 1.5
(/dspace-api/src/main/java/org/dspace/app/mediafilter/PDFFilter.java)
without any problems.

There are changes post-1.4.2 in the media filter regarding the ability
to use 'self-named' plugins, but it doesn't look like this affects the
PDFFilter.

Otherwise, you should be able to get enough information from the
PDFFilter in 1.5 as to how it catches and handles the
OutOfMemoryError to port it back to 1.4.2 if necessary.
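
For anyone attempting the back-port, the catch-and-skip idea described above can be sketched roughly as follows. This is not the actual DSpace 1.5 PDFFilter code: the class SkipOnOutOfMemory, the TextExtractor interface, and the file names are hypothetical stand-ins, and the out-of-memory condition is simulated rather than produced by PDFBox. Catching OutOfMemoryError does not guarantee the JVM is left in a healthy state, so treat this as a stopgap for a batch job, not a fix.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: names here are illustrative, not DSpace or PDFBox APIs.
class SkipOnOutOfMemory {

    /** Stand-in for a per-file text extractor such as a PDF media filter. */
    interface TextExtractor {
        String extract(String name);
    }

    /**
     * Extracts text from each named file. If one file exhausts memory,
     * it is recorded in 'skipped' and the loop moves on to the next file
     * instead of letting the error end the whole run.
     */
    static List<String> extractAll(List<String> names,
                                   TextExtractor extractor,
                                   List<String> skipped) {
        List<String> texts = new ArrayList<>();
        for (String name : names) {
            try {
                texts.add(extractor.extract(name));
            } catch (OutOfMemoryError e) {
                // Log and skip this file; later files are still processed.
                skipped.add(name);
                System.err.println("Skipping " + name + ": " + e.getMessage());
            }
        }
        return texts;
    }

    public static void main(String[] args) {
        // Simulated extractor: pretends that "huge" files run out of memory.
        TextExtractor fake = name -> {
            if (name.startsWith("huge")) {
                throw new OutOfMemoryError("simulated");
            }
            return "text of " + name;
        };

        List<String> skipped = new ArrayList<>();
        List<String> texts =
            extractAll(Arrays.asList("a.pdf", "huge.pdf", "b.pdf"), fake, skipped);

        System.out.println("Extracted: " + texts);
        System.out.println("Skipped:   " + skipped);
    }
}
```

Keeping a list of the skipped files makes it possible to see which documents are missing from the full-text index after the run.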