[Dspace-tech] Getting image thumbnails


Branko Kovacevic

Aug 24, 2015, 5:31:00 PM
to dspac...@lists.sourceforge.net
Dear All,

So far we've been uploading jpg images into our DSpace system and had
no problems with getting thumbnails for them later.

Unfortunately, after recently uploading a dozen items with TIFF
images (between 4 and 15 MB each), we couldn't get thumbnails for
them. The filter-media script returns an error message. Here is the
portion of the log file with the critical messages:

ERROR filtering, skipping bitstream #7542
java.io.FileNotFoundException: no such entry: "0Table"
java.io.FileNotFoundException: no such entry: "0Table"
   at org.apache.poi.poifs.filesystem.DirectoryNode.getEntry(DirectoryNode.java:283)
   at org.textmining.text.extraction.WordExtractor.extractText(WordExtractor.java:60)
   at org.dspace.app.mediafilter.WordFilter.getDestinationStream(WordFilter.java:97)
   at org.dspace.app.mediafilter.MediaFilter.processBitstream(MediaFilter.java:155)
   at org.dspace.app.mediafilter.MediaFilterManager.filterBitstream(MediaFilterManager.java:327)
   at org.dspace.app.mediafilter.MediaFilterManager.filterItem(MediaFilterManager.java:296)
   at org.dspace.app.mediafilter.MediaFilterManager.applyFiltersItem(MediaFilterManager.java:266)
   at org.dspace.app.mediafilter.MediaFilterManager.applyFiltersAllItems(MediaFilterManager.java:234)
   at org.dspace.app.mediafilter.MediaFilterManager.main(MediaFilterManager.java:185)
java.lang.Throwable: Warning: You did not close the PDF Document
   at org.pdfbox.cos.COSDocument.finalize(COSDocument.java:384)
   at gnu.gcj.runtime.FinalizerThread.run(libgcj.so.70)
java.lang.Throwable: Warning: You did not close the PDF Document
   at org.pdfbox.cos.COSDocument.finalize(COSDocument.java:384)
   at gnu.gcj.runtime.FinalizerThread.run(libgcj.so.70)
java.lang.Throwable: Warning: You did not close the PDF Document
   at org.pdfbox.cos.COSDocument.finalize(COSDocument.java:384)
   at gnu.gcj.runtime.FinalizerThread.run(libgcj.so.70)
java.lang.Throwable: Warning: You did not close the PDF Document
   at org.pdfbox.cos.COSDocument.finalize(COSDocument.java:384)
   at gnu.gcj.runtime.FinalizerThread.run(libgcj.so.70)
java.lang.Throwable: Warning: You did not close the PDF Document
   at org.pdfbox.cos.COSDocument.finalize(COSDocument.java:384)
   at gnu.gcj.runtime.FinalizerThread.run(libgcj.so.70)
FILTERED: bitstream 7682 and created 'articles_bridging_20000615.pdf.txt'
FILTERED: bitstream 7683 and created 'articles_sustainable_developement_20000815.pdf.txt'
GC Warning: Repeated allocation of very large block (appr. size 20230144):
        May lead to memory leak and poor performance.
FILTERED: bitstream 7684 and created 'articles_venture_20001215.pdf.txt'
FILTERED: bitstream 7685 and created 'articles_rethinking_20010215.pdf.txt'
FILTERED: bitstream 7686 and created 'articles_relationship_20010515.pdf.txt'
FILTERED: bitstream 7687 and created 'articles_org_capacity_20021115.pdf.txt'
GC Warning: Out of Memory!  Returning NIL!
Exception in thread "main" java.lang.OutOfMemoryError
   <<No stacktrace available>>

Is there a limit on the size of files that filter-media can process?
Any help is highly appreciated.

Best regards,
Branko Kovacevic

Records Coordinator
Open Society Archives
Arany Janos u. 32
1051 Budapest, Hungary
phone: (36-1) 327-3266  or 327-2029
e-mail: kov...@ceu.hu
website: www.osa.ceu.hu
++++++++++++++++++++++++++++


Keith Gilbertson

Aug 24, 2015, 5:33:03 PM
to Branko Kovacevic, dspac...@lists.sourceforge.net
Hello -

The errors you posted appear to be related to filtering of PDF and Word
documents. I'm not sure of the limits.

For what it's worth, here are some steps you can try to build JPEG
thumbnails of TIFFs in DSpace:

- Enter TIFF in the DSpace bitstream format registry with mime-type
image/tiff and file extensions tiff and tif

- Edit the dspace.cfg file and add image/tiff and TIFF to the
filter.org.dspace.app.mediafilter.JPEGFilter.inputFormats line

- Download and install the Java Advanced Imaging I/O tools, currently
available here:
https://jai-imageio.dev.java.net/binary-builds.html . These tools
contain a TIFF plugin that will allow the JPEGFilter to read the TIFF
format.

- Verify that the bitstreams are marked as TIFF format, and then run
the filter-media script to build the JPEG thumbnails for the TIFFs. Be
patient. If you see memory errors with large TIFF files, you can try
increasing the "-Xmx256m" (maximum heap size) parameter in the dsrun
script to resolve the problem.
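
To confirm that the TIFF plugin from the third step is actually visible to
the JVM before running filter-media, a small check like the one below may
help. It is only a sketch: it assumes the JPEGFilter reads images through
the standard javax.imageio API, and the class and method names
(TiffReaderCheck, hasReaderFor, maxHeapMegabytes) are made up for this
example. It also prints the maximum heap size so you can see whether a
changed -Xmx value took effect.

```java
import java.util.Iterator;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;

// Hypothetical helper, not part of DSpace.
public class TiffReaderCheck {

    // True if at least one ImageIO reader plugin claims the given MIME type.
    public static boolean hasReaderFor(String mimeType) {
        Iterator<ImageReader> readers = ImageIO.getImageReadersByMIMEType(mimeType);
        return readers.hasNext();
    }

    // Maximum heap the JVM will try to use, in megabytes (reflects -Xmx).
    public static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("image/jpeg reader: " + hasReaderFor("image/jpeg"));
        System.out.println("image/tiff reader: " + hasReaderFor("image/tiff"));
        System.out.println("max heap (MB): " + maxHeapMegabytes());
    }
}
```

Run it with the same Java installation and classpath that the dsrun script
uses. If "image/tiff reader" prints false, the plugin is not being picked
up and the JPEGFilter will most likely not be able to read the TIFFs either.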

If you have certain types of images, you may need to write a custom
filter or modify the JPEGFilter to get better results. For example, if
you have large TIFF files that are primarily black and white, the
JPEGFilter will favor speed over appearance when resampling the image to
the thumbnail-sized JPEG, and the resulting thumbnail won't look much
like the original. You might need a filter that uses a different
resampling method.
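
As a rough illustration of what "a different resampling method" could look
like, here is a sketch that shrinks an image in repeated halving steps with
bilinear interpolation, which tends to preserve thin dark lines better than
a single fast resize. This is not the JPEGFilter's code and not a drop-in
DSpace filter; the class name ThumbnailSketch and the scale method are
invented for this example, and wiring it into a custom MediaFilter is left
out.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

// Hypothetical helper, not part of DSpace.
public class ThumbnailSketch {

    // Scale src to fit inside maxW x maxH, keeping the aspect ratio.
    // Images already small enough are copied at their original size.
    public static BufferedImage scale(BufferedImage src, int maxW, int maxH) {
        double ratio = Math.min((double) maxW / src.getWidth(),
                                (double) maxH / src.getHeight());
        ratio = Math.min(ratio, 1.0); // never enlarge
        int targetW = Math.max(1, (int) Math.round(src.getWidth() * ratio));
        int targetH = Math.max(1, (int) Math.round(src.getHeight() * ratio));

        BufferedImage current = src;
        int w = src.getWidth();
        int h = src.getHeight();
        do {
            // Halve each dimension per pass, but never go below the target.
            w = Math.max(targetW, w / 2);
            h = Math.max(targetH, h / 2);
            BufferedImage next = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
            Graphics2D g = next.createGraphics();
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                               RenderingHints.VALUE_INTERPOLATION_BILINEAR);
            g.drawImage(current, 0, 0, w, h, null);
            g.dispose();
            current = next;
        } while (w > targetW || h > targetH);
        return current;
    }
}
```

A typical use would be to load the TIFF with ImageIO.read, pass the result
to ThumbnailSketch.scale with your thumbnail dimensions, and write the
returned image with ImageIO.write(thumb, "jpg", outputFile). The trade-off
is speed and memory: each pass allocates a new image, so very large TIFFs
will need a generous heap setting.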

-- Keith
Systems Developer
OhioLINK


