Hello,
I am trying to get the XPDF-based PDF thumbnail creation working. It works fine in my DSpace 4.1 test instance.
The feature was already available in DSpace 1.8.2, which is still our production release. Instead of waiting until the new version is production-ready, I am installing the features step by step in the production environment.
On the production machine, I get this error:
esxh-15:/srv/dspace# bin/dspace filter-media -i 2339/4318 -v
The following MediaFilters are enabled:
Full Filter Name: org.dspace.app.mediafilter.HTMLFilter
org.dspace.app.mediafilter.HTMLFilter
Full Filter Name: org.dspace.app.mediafilter.WordFilter
org.dspace.app.mediafilter.WordFilter
Full Filter Name: org.dspace.app.mediafilter.JPEGFilter
org.dspace.app.mediafilter.JPEGFilter
Full Filter Name: org.dspace.app.mediafilter.XPDF2Text
org.dspace.app.mediafilter.XPDF2Text
Full Filter Name: org.dspace.app.mediafilter.XPDF2Thumbnail
org.dspace.app.mediafilter.XPDF2Thumbnail
Full Filter Name: org.dspace.app.mediafilter.PowerPointFilter
org.dspace.app.mediafilter.PowerPointFilter
SKIPPED: bitstream 27442 (item: 2339/4318) because 'Limmerstraße.pdf.txt' already exists
ERROR filtering, skipping bitstream:
Item Handle: 2339/4318
Bundle Name: ORIGINAL
File Size: 2667225
Checksum: 3db0096cb62b6d595c1e4bb77f6833d0 (MD5)
Asset Store: 0
javax.imageio.IIOException: Can't read input file!
javax.imageio.IIOException: Can't read input file!
at javax.imageio.ImageIO.read(ImageIO.java:1291)
at org.dspace.app.mediafilter.XPDF2Thumbnail.getDestinationStream(XPDF2Thumbnail.java:244)
at org.dspace.app.mediafilter.MediaFilterManager.processBitstream(MediaFilterManager.java:737)
at org.dspace.app.mediafilter.MediaFilterManager.filterBitstream(MediaFilterManager.java:561)
at org.dspace.app.mediafilter.MediaFilterManager.filterItem(MediaFilterManager.java:511)
at org.dspace.app.mediafilter.MediaFilterManager.applyFiltersItem(MediaFilterManager.java:479)
at org.dspace.app.mediafilter.MediaFilterManager.main(MediaFilterManager.java:353)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:622)
at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:183)
Updating search index:
Note that the text extraction took place in an earlier run of filter-media, so the message "Can't read input file!" is not very credible as a statement about the PDF itself. Also, the method in which the exception was thrown is XPDF2Thumbnail.getDestinationStream, which suggests that the problem may lie not with the input file but with creating the output file.
In 2012, Osama Alkadi reported a similar issue and solved it by updating the pdftoppm tool. On Debian and Ubuntu, the required tools are contained in the package poppler-utils. I have installed version 0.18.4 on both the test and the production machine. Here is the output:
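To see where this message comes from, here is a minimal sketch. It relies on the fact that ImageIO.read(File) throws an IIOException with exactly this text whenever File.canRead() is false for the file it is given, for example because the file does not exist. The class name ReadMissingFile, the helper tryRead and the file name are made up for illustration and are not part of DSpace.

```java
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

public class ReadMissingFile {
    // Try to read an image file and return the exception message, or "ok" if no exception was thrown.
    static String tryRead(File f) {
        try {
            ImageIO.read(f);
            return "ok";
        } catch (IOException e) {
            // IIOException is a subclass of IOException
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Hypothetical file name; the point is only that the file does not exist.
        File missing = new File(System.getProperty("java.io.tmpdir"),
                "no-such-dir-for-this-test/no-such-thumbnail-input.ppm");
        System.out.println(tryRead(missing)); // prints: Can't read input file!
    }
}
```

So the message only says that whatever file was handed to ImageIO.read at that point could not be read. That may well be a file the filter expected to find on disk, not the PDF in the assetstore.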
esxh-15:/srv/dspace# pdftoppm -v
pdftoppm version 0.18.4
Copyright 2005-2011 The Poppler Developers - http://poppler.freedesktop.org
Copyright 1996-2004 Glyph & Cog, LLC
The version numbering seems to have changed in unexpected ways, as Osama Alkadi wrote that he updated from 3.0 to 3.0.2. For the moment, this does not help much.
All other components involved are also the same on both machines. jai_imageio is version 1.1 and jai_core is 1.1.3.
As the file is hard to find in the assetstore, I downloaded it using the browser, copied it back to the server with scp, and converted it manually using pdftoppm -jpeg inputfile.pdf outputname. It worked.
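For reference, a manual check of this kind could also be scripted, which makes it easy to see the exit code and the names of the files pdftoppm actually writes. This is only a rough sketch under assumptions: the class PdftoppmCheck and its helpers buildCommand and producedFiles are made up for illustration, and limiting the run to page 1 with -f 1 -l 1 is my own choice for a thumbnail test, not necessarily what DSpace does.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class PdftoppmCheck {
    // Build the pdftoppm call: first page only, JPEG output, given output prefix.
    static List<String> buildCommand(File pdf, File outPrefix) {
        return Arrays.asList("pdftoppm", "-f", "1", "-l", "1", "-jpeg",
                pdf.getPath(), outPrefix.getPath());
    }

    // List the files next to the prefix whose names start with the prefix name.
    static List<String> producedFiles(File outPrefix) {
        List<String> names = new ArrayList<String>();
        File dir = outPrefix.getAbsoluteFile().getParentFile();
        File[] entries = (dir == null) ? null : dir.listFiles();
        if (entries != null) {
            for (File f : entries) {
                if (f.getName().startsWith(outPrefix.getName())) {
                    names.add(f.getName());
                }
            }
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        if (args.length != 2) {
            System.err.println("usage: java PdftoppmCheck inputfile.pdf outputname");
            return;
        }
        File prefix = new File(args[1]);
        // Run pdftoppm and pass its stdout/stderr through to the console.
        Process p = new ProcessBuilder(buildCommand(new File(args[0]), prefix))
                .inheritIO().start();
        System.out.println("pdftoppm exit code: " + p.waitFor());
        for (String name : producedFiles(prefix)) {
            System.out.println("produced: " + name);
        }
    }
}
```

Running it as the same user that runs bin/dspace (here tomcat7), on both machines, would show whether the generated file names or the exit code differ between the two installations.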
I exported the item containing the file using the AIP packager, transferred it to the test server running DSpace 4.1, imported it and ran filter-media there. It worked fine.
I compared the installation instructions of DSpace 4.1 and 1.8.2 and could not find a significant difference regarding the XPDF feature. The mvn package and ant update commands did not show any irregularities.
File permissions in the assetstore did not show any problems. On both machines, DSpace is run as the daemon user tomcat7. In both cases, I run Tomcat 7, albeit in slightly different versions. But Tomcat is not involved in running a command-line tool such as bin/dspace filter-media anyway.
So far, I have not found any clue where to look for the cause. If anybody has an idea, I'd be grateful.
Bye, Christian