PDF text in Windows: digitalobject:extract-text and digitalobject:regen-derivatives

Gomes Silva

Apr 20, 2015, 9:14:49 PM
to ica-ato...@googlegroups.com
Hello,

Windows 7 x64 user here.

I’m using the latest WAMP (v2.5) x64 and AtoM v2.1.2. I have no problems importing or exporting descriptions, importing multiple digital objects, or generating thumbnails for images and PDFs, so I can say AtoM works very well under Windows.

I’m only having trouble with two things:

1.    Enabling thumbnails for videos: although ffmpeg is included in ImageMagick-6.9.1-1-Q16-x64-dll.exe, I also downloaded the latest ffmpeg-20150420-git-82d9c4e-win64-shared.7z and added its folder to the PATH environment variable so I can run ffmpeg.exe.

2.    Extracting text from PDFs: this is the more important issue, since this feature is crucial. I downloaded xpdfbin-win-3.04.zip (the Windows build of Xpdf, which includes pdftotext, the same tool that poppler-utils provides on Linux) and added its folder to the PATH environment variable, so I can run pdftotext.exe from anywhere.

Basically, when I upload PDF files I get their thumbnails, but the text is not extracted. If I run “php symfony digitalobject:extract-text” I get the error: ‘which’ not recognized as an internal or external command, operable program, or batch file.

Every php symfony command works fine except that one and this other one: “php symfony digitalobject:regen-derivatives”. With this second command, images and video files are processed OK, but next to each PDF file the same message appears: “‘which’ not recognized as an internal or external command, operable program, or batch file”. The funny thing is that although it prints an error message, it actually seems to do the job correctly; “php symfony digitalobject:extract-text”, on the other hand, doesn't do anything.

Everything else is OK, so this ‘which’ error may have something to do with the QubitDigitalObject.php line [exec('which pdftotext', $output, $status);], but that doesn’t help me much because I don’t know anything about PHP or similar languages.


I don’t have any programming skills, but installing AtoM 2.1.2 on Windows (I installed it several times using WAMP) was simple except for these two (not required) dependencies. Basically, I don’t know how to set up pdftotext (xpdf) and ffmpeg (thumbnails) for AtoM on Windows, nor can I find any instructions on how to do that.


Thank you very much for your time and support. I hope to get some enlightenment.

Jesús García Crespo

Apr 21, 2015, 2:32:46 PM
to ica-ato...@googlegroups.com
Hi Gomes,

Unfortunately, AtoM makes some assumptions about the location of these tools that are only valid in UNIX environments. For example, the tool `which` is not available under Windows (as far as I know), and AtoM refers to binaries like ffmpeg or pdftotext without the ".exe" extension.
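
To illustrate the portability problem, here is a minimal sketch in Python (not AtoM's PHP, and not how AtoM is implemented) of locating a tool without shelling out to `which`. The helper names `find_tool` and `extract_text` are hypothetical; `shutil.which` is part of Python's standard library and, on Windows, also tries the extensions listed in PATHEXT (such as .exe).

```python
import shutil
import subprocess


def find_tool(name, path=None):
    """Return the full path of an executable, or None if it is not found.

    shutil.which is implemented in Python itself, so it does not depend
    on the UNIX `which` command being installed. `path` defaults to the
    PATH environment variable.
    """
    return shutil.which(name, path=path)


def extract_text(pdf_path):
    """Hypothetical helper: run pdftotext on a PDF and return its text.

    Returns None when pdftotext cannot be found on PATH.
    """
    tool = find_tool("pdftotext")
    if tool is None:
        return None
    # Passing "-" as the output file makes pdftotext write to stdout.
    result = subprocess.run([tool, pdf_path, "-"], capture_output=True, check=True)
    return result.stdout.decode("utf-8", errors="replace")
```

A caller could check `find_tool("ffmpeg")` in the same way before attempting to generate video thumbnails; an equivalent lookup in PHP would be needed for AtoM itself.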

We focus our development efforts on Linux environments. In particular, we recommend Ubuntu 14.04. However, we would be happy to merge code contributions from the community in order to improve the support of other platforms. You can also contact Artefactual Systems or other developers to do this work for you.

Regards,

--
Jesús García Crespo,
Software Engineer, Artefactual Systems Inc.
http://www.artefactual.com | +1.604.527.2056

Gomes Silva

Apr 22, 2015, 9:09:46 AM
to ica-ato...@googlegroups.com
Hello,

Thank you very much for your reply!

So, as of this date (April 2015), we can assume that AtoM is fully functional on MS Windows except for PDF text extraction and video thumbnails.

If by any chance developers want to fix those two limitations, see:
- ffmpeg for windows: http://ffmpeg.zeranoe.com/builds/
- pdftotext for windows: http://www.foolabs.com/xpdf/download.html

Thank you again for your help. I will try to install AtoM on Lubuntu or Ubuntu.

Best regards