[Dspace-tech] Filtering errors with XPDF Tools


Kurzenberger, Eric

Aug 25, 2015, 1:09:18 PM
to dspac...@lists.sourceforge.net
Hello,

I'm running DSpace 1.5.2 XMLUI on a Linux system, and I've been trying to install the XPDF tools for PDF thumbnails, following the recommendation of a user on this list. I followed the installation instructions in the 1.5.2 documentation and believe I've successfully installed the jai_imageio jar and set the configuration correctly. But when I try to run the filter-media command, I'm getting errors with pdftoppm.

The documentation mentions that a POM in the dspace-api module needs to be edited, but it doesn't specify which POM or which element needs to be added.

I've included a sample of the errors below. Thanks for any help you can provide.

Error:

ERROR filtering, skipping bitstream:

Item Handle: 10538/148
Bundle Name: ORIGINAL
File Size: 6898424
Checksum: 4637dd47354393dc87f3d2b881ec311d (MD5)
Asset Store: 0
java.io.IOException: Cannot run program "/usr/bin/pdftoppm ": java.io.IOException: error=2, No such file or directory
java.io.IOException: Cannot run program "/usr/bin/pdftoppm ": java.io.IOException: error=2, No such file or directory
at java.lang.ProcessBuilder.start(ProcessBuilder.java:459)
at java.lang.Runtime.exec(Runtime.java:593)
at java.lang.Runtime.exec(Runtime.java:466)
at org.dspace.app.mediafilter.XPDF2Thumbnail.getDestinationStream(XPDF2Thumbnail.java:251)
at org.dspace.app.mediafilter.MediaFilterManager.processBitstream(MediaFilterManager.java:668)
at org.dspace.app.mediafilter.MediaFilterManager.filterBitstream(MediaFilterManager.java:570)
at org.dspace.app.mediafilter.MediaFilterManager.filterItem(MediaFilterManager.java:520)
at org.dspace.app.mediafilter.MediaFilterManager.applyFiltersItem(MediaFilterManager.java:488)
at org.dspace.app.mediafilter.MediaFilterManager.applyFiltersAllItems(MediaFilterManager.java:427)
at org.dspace.app.mediafilter.MediaFilterManager.main(MediaFilterManager.java:359)
Caused by: java.io.IOException: java.io.IOException: error=2, No such file or directory
at java.lang.UNIXProcess.<init>(UNIXProcess.java:148)
at java.lang.ProcessImpl.start(ProcessImpl.java:65)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:452)


I have verified that the XPDF tools, including pdftoppm, are in the /usr/bin directory.

Cheers,

Eric

--

Eric Kurzenberger
Digital Media Coordinator
Yale School of Architecture
180 York St.
New Haven, CT 06511
T: 203.436.4176
F: 203.432.7175





Kurzenberger, Eric

Aug 25, 2015, 1:09:25 PM
to dspac...@lists.sourceforge.net
Just an update on this issue: it looks like there's a section missing from the DSpace documentation. I believe this dependency needs to be added to the pom.xml in the dspace-api directory:

<dependency>
    <groupId>com.sun.media</groupId>
    <artifactId>jai_imageio</artifactId>
    <version>1.0_01</version>
</dependency>

Unfortunately, this doesn't fix my issue: I'm still getting the "java.io.IOException: Cannot run program "/usr/bin/pdftoppm ": java.io.IOException: error=2, No such file or directory" errors when running filter-media.

Anyone have any ideas? I've gone through the other installation steps in the documentation repeatedly, to no avail.

Cheers,

Eric
_______________________________________________
DSpace-tech mailing list
DSpac...@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-tech


Thornton, Susan M. (LARC-B702)[RAYTHEON TECHNICAL SERVICES COMPANY]

Aug 25, 2015, 1:09:27 PM
to Kurzenberger, Eric, dspac...@lists.sourceforge.net
Hi Eric,
I was able to install and use pdftotext successfully, but got the same errors as you when trying to create thumbnails. I tried a number of different things, but wasn't able to get it working. I know some folks have gotten it working, though, so I'm not sure what the problem is.
Sue

Stuart Lewis

Aug 25, 2015, 1:09:27 PM
to Kurzenberger, Eric, dspac...@lists.sourceforge.net
Hi Eric,

When you installed XPDF, did it install the following application:

- /usr/bin/pdftoppm

The error message suggests that the executable is missing.

Thanks,


Stuart Lewis
IT Innovations Analyst and Developer
Te Tumu Herenga The University of Auckland Library
Auckland Mail Centre, Private Bag 92019, Auckland 1142, New Zealand
Ph: 64 9 373-7599 x81928
http://www.library.auckland.ac.nz/

Kurzenberger, Eric

Aug 25, 2015, 1:09:28 PM
to dspac...@lists.sourceforge.net
Hi Stuart,

pdftoppm is indeed in /usr/bin, along with the other XPDF tools, pdfinfo and pdftotext. I ran /usr/bin/pdftoppm manually on a test file, and it converted my test PDF to a PPM successfully. But for some reason, Java is giving those errors when running the command. Is there some other path I need to set for Java to find it, besides the ones in the dspace.cfg file and the pom.xml in dspace-api?

Cheers,

Eric

Stuart Lewis

Aug 25, 2015, 1:09:29 PM
to Kurzenberger, Eric, dspac...@lists.sourceforge.net
Hi Eric,

> Pdftoppm is indeed in /usr/bin, along with the other XPDF tools,
> pdfinfo and pdftotext. I ran /usr/bin/pdftoppm manually on a test
> file, and it converted my test pdf to a ppm successfully. But for
> some reason, Java's giving those errors when running the command.
> Is there some other path I need to set for Java to find it, besides
> in the dspace.cfg file and the pom.xml in dspace-api?


That is good - so there is probably a problem with the command DSpace
is using. If you change your logging level to DEBUG
(http://wiki.dspace.org/index.php/TechnicalFaq#Setting_logging_level_up_to_DEBUG)
you should see some statements in dspace.log along the lines of:

- "Running xpdf command: ..."

These may help to diagnose what is going wrong.
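
A detail worth checking alongside the DEBUG output: the exception quoted earlier in the thread prints the program name as "/usr/bin/pdftoppm " with a space before the closing quote. When a command is handed to Runtime.exec as an array, each element is used verbatim, so a stray space in a configured path would become part of the file name and produce exactly this "No such file or directory". Whether that is what happens here is an assumption, not something the thread confirms. A minimal Java sketch (the class and method names are hypothetical, not DSpace code) that tests a path string exactly as given:

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for illustration only; not part of DSpace.
class ConfiguredPathCheck {

    // True if the string, byte for byte as given, names an executable regular file.
    static boolean isRunnableAsGiven(String configured) {
        Path p = Paths.get(configured);
        return Files.isRegularFile(p) && Files.isExecutable(p);
    }

    // True if the value carries leading or trailing whitespace.
    static boolean hasStrayWhitespace(String configured) {
        return !configured.equals(configured.trim());
    }

    public static void main(String[] args) {
        // Default mirrors the string shown in the exception, trailing space included.
        String configured = args.length > 0 ? args[0] : "/usr/bin/pdftoppm ";
        // Brackets make any stray whitespace visible in the output.
        System.out.println("value:            [" + configured + "]");
        System.out.println("stray whitespace: " + hasStrayWhitespace(configured));
        System.out.println("runnable as is:   " + isRunnableAsGiven(configured));
        System.out.println("runnable trimmed: " + isRunnableAsGiven(configured.trim()));
    }
}
```

If the trimmed value is runnable and the untrimmed one is not, the place to look would be the line in dspace.cfg that sets the pdftoppm path.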

Larry Stone

Aug 25, 2015, 1:09:30 PM
to Kurzenberger, Eric, dspac...@lists.sourceforge.net
These errors imply that the JVM cannot access the executable file.
Since the file exists, make sure the user under whose UID the JVM is
running has read and execute access to the file /usr/bin/pdftoppm (as
well as the /usr and /usr/bin directories, although those are usually
world-rx). Better yet, use "su" or "sudo" to assume the UID under
which the JVM is running and make sure you can actually run that
command.

Java IOExceptions have a way of conflating all file-access errors to
"no such file or directory", even when there's e.g. a permission
problem, which makes it more of a challenge to discover what is really
wrong.

-- Larry
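
Larry's check can also be made from inside a JVM, which sees the file system with the same UID that filter-media would use, provided it is started under that account. The following is a rough sketch under that assumption; the class name ExecAccessCheck and its method are hypothetical. It walks up from the file to the root, confirming that every directory can be traversed and that the file itself is readable and executable:

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of DSpace.
class ExecAccessCheck {

    // Returns a list of access problems for the current user; an empty list means none were found.
    static List<String> problems(String file) {
        List<String> out = new ArrayList<>();
        Path target = Paths.get(file).toAbsolutePath();

        // Every ancestor directory needs execute (search) permission to be traversed.
        for (Path dir = target.getParent(); dir != null; dir = dir.getParent()) {
            if (!Files.isDirectory(dir)) {
                out.add("not a directory: " + dir);
            } else if (!Files.isExecutable(dir)) {
                out.add("cannot traverse: " + dir);
            }
        }

        if (!Files.exists(target)) {
            out.add("does not exist: " + target);
        } else {
            if (!Files.isReadable(target)) {
                out.add("not readable: " + target);
            }
            if (!Files.isExecutable(target)) {
                out.add("not executable: " + target);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String file = args.length > 0 ? args[0] : "/usr/bin/pdftoppm";
        List<String> found = problems(file);
        if (found.isEmpty()) {
            System.out.println("no access problems found for " + file);
        } else {
            for (String problem : found) {
                System.out.println(problem);
            }
        }
    }
}
```

The result is only meaningful when the program runs as the same user as the DSpace JVM; run as root or as a login user, it can report no problems even where the DSpace account would be denied.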

Kurzenberger, Eric

Aug 25, 2015, 1:09:35 PM
to dspac...@lists.sourceforge.net
Thanks for the responses. I checked the permissions on /usr/bin/pdftoppm and verified that I could run it as the dspace user, so it doesn't seem to be a permissions issue.

Turning on debug logging and running the filter-media command results in several errors of this type in the log:

2009-10-21 09:26:43,478 ERROR org.dspace.app.mediafilter.XPDF2Text @ PDF conversion proc failed, returns=-1, file=/tmp/DSfilt27010.pdf

I verified that the dspace user has access to the /tmp directory as well (permissions on it are 777). The /tmp directory doesn't contain any of the files shown in the log, so it looks like they're not being written.

Cheers,

Eric

Van Ly

Aug 25, 2015, 1:10:22 PM
to Kurzenberger, Eric, dspac...@lists.sourceforge.net

On 22/10/2009, at 12:57 AM, Kurzenberger, Eric wrote:

>
>
> Turning on debug logging and run the filter-media command results
> in the several errors of this type in the log:
>
> 2009-10-21 09:26:43,478 ERROR org.dspace.app.mediafilter.XPDF2Text
> @ PDF conversion proc failed, returns=-1, file=/tmp/DSfilt27010.pdf
>
> I verified that the dspace user has access to the /tmp directory as
> well (permissions on it are 777). The /tmp file doesn't contain
> any of the files shown in the log, so it looks like they're not
> being written.
>

Try to perform the conversion in the DSpace/Java user's context, with
all operations on files in /tmp. There may be an issue with the
filesystem for /tmp.
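
One way to exercise /tmp from Java, as a sketch only: write a small file into the JVM's temp directory, read it back, and delete it. The class name TmpRoundTrip is hypothetical, and the "DSfilt" prefix is borrowed from the file name in the log purely for illustration. The sketch assumes the java.io.tmpdir property points at the same directory the filter uses; it tests Java file I/O only, not pdftoppm itself.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper for illustration only; not part of DSpace.
class TmpRoundTrip {

    // Writes a small file into dir, reads it back, and reports whether the bytes match.
    static boolean roundTrip(String dir) {
        byte[] payload = "tmp write test".getBytes(StandardCharsets.UTF_8);
        Path file = null;
        try {
            file = Files.createTempFile(Paths.get(dir), "DSfilt", ".tmp");
            Files.write(file, payload);
            return Arrays.equals(payload, Files.readAllBytes(file));
        } catch (IOException e) {
            // Covers a missing directory, a permission problem, a full disk, and similar failures.
            System.err.println("round trip failed in " + dir + ": " + e);
            return false;
        } finally {
            if (file != null) {
                try {
                    Files.deleteIfExists(file);
                } catch (IOException ignored) {
                    // Cleanup failure does not change the result of the check.
                }
            }
        }
    }

    public static void main(String[] args) {
        String dir = System.getProperty("java.io.tmpdir");
        System.out.println(dir + " writable and readable: " + roundTrip(dir));
    }
}
```

As with any such check, it should be run as the dspace user so that the result reflects that account's view of the file system.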

Good luck.


Van Ly
vly at usyd dot edu dot au





Kurzenberger, Eric

Aug 25, 2015, 1:10:50 PM
to dspac...@lists.sourceforge.net
Thanks for the response, Van. I don't quite follow what you mean. I am running the filter-media command as the dspace user, but I'm not sure what is meant by the "all operations on files in /tmp" part, since the filter-media command seems to put the converted files there by default. Can you clarify?

Cheers,

Eric