Problem ingesting pdf files. ImageMagick issues.

Newton Kyari

May 21, 2017, 1:51:00 PM
to islandora
Hi guys,

I am back once again, this time with problems ingesting PDF files.

I have installed the following add-ons:

  • ImageMagick-7.0.5-Q16
  • XPDF 3.04 for pdftotext
  • Ghostscript 9.21
  • Executable found at C:\Program Files (x86)\Tesseract-OCR\tesseract.exe
  • Required Version: 3.02

I cannot upload a PDF without getting the following errors:
  • Debug: ImageMagick command:
    start "ImageMagick" /D "J:\Web Projects\NAKR" /B "C:\Program Files\ImageMagick-7.0.5-Q16\convert.exe" "C:\Windows\Temp\narc-40.OBJ.pdf[0]" -quality "75" -resize "200x200" -colorspace RGB -flatten "jpg:C:\Windows\Temp/narc-40.OBJ.TN.jpg"
    in _imagemagick_convert_exec() (line 495 of J:\Web Projects\NAKR\sites\all\modules\imagemagick\imagemagick.module).
  • Debug: ImageMagick error:
    convert.exe: unable to load module 'C:\Program Files\ImageMagick-7.0.5-Q16\modules\coders\IM_MOD_RL_PNG_.dll': The specified module could not be found.
    
     @ error/module.c/OpenModule/1266.
    convert.exe: no decode delegate for this image format `PNG' @ error/constitute.c/ReadImage/509.
    convert.exe: PDFDelegateFailed `[ghostscript library 9.21] -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 "-sDEVICE=pngalpha" -dTextAlphaBits=4 -dGraphicsAlphaBits=4 "-r72x72" -dFirstPage=1 -dLastPage=1 "-sOutputFile=C:/WINDOWS/Temp/magick-15532Z_-WJxQUc4Df%d" "-fC:/WINDOWS/Temp/magick-15532RmFaQk1YlIFh" "-fC:/WINDOWS/Temp/magick-15532E2Mp-w0gb50m": (null)' @ error/pdf.c/ReadPDFImage/793.
    convert.exe: no images defined `jpg:C:\Windows\Temp/narc-40.OBJ.TN.jpg' @ error/convert.c/ConvertImageCommand/3254.
    
    in _imagemagick_convert_exec() (line 500 of J:\Web Projects\NAKR\sites\all\modules\imagemagick\imagemagick.module).
  • Debug: ImageMagick command:
    start "ImageMagick" /D "J:\Web Projects\NAKR" /B "C:\Program Files\ImageMagick-7.0.5-Q16\convert.exe" "C:\Windows\Temp\narc-40.OBJ.pdf[0]" -quality "75" -resize "500x700" -colorspace RGB -flatten "jpg:C:\Windows\Temp/narc-40.OBJ.PREVIEW.jpg"
    in _imagemagick_convert_exec() (line 495 of J:\Web Projects\NAKR\sites\all\modules\imagemagick\imagemagick.module).
  • Debug: ImageMagick error:
    convert.exe: unable to load module 'C:\Program Files\ImageMagick-7.0.5-Q16\modules\coders\IM_MOD_RL_PNG_.dll': The specified module could not be found.
    
     @ error/module.c/OpenModule/1266.
    convert.exe: no decode delegate for this image format `PNG' @ error/constitute.c/ReadImage/509.
    convert.exe: PDFDelegateFailed `[ghostscript library 9.21] -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 "-sDEVICE=pngalpha" -dTextAlphaBits=4 -dGraphicsAlphaBits=4 "-r72x72" -dFirstPage=1 -dLastPage=1 "-sOutputFile=C:/WINDOWS/Temp/magick-17028KVXAP8hnc35K%d" "-fC:/WINDOWS/Temp/magick-17028WDD7D_1-E0D3" "-fC:/WINDOWS/Temp/magick-17028R24dWyyZqBJ7": (null)' @ error/pdf.c/ReadPDFImage/793.
    convert.exe: no images defined `jpg:C:\Windows\Temp/narc-40.OBJ.PREVIEW.jpg' @ error/convert.c/ConvertImageCommand/3254.
    
    in _imagemagick_convert_exec() (line 500 of J:\Web Projects\NAKR\sites\all\modules\imagemagick\imagemagick.module).
  • Derivatives successfully created.
  • "New Object" (ID: narc:40) has been ingested.

Error message

  • Warning: copy(C:\Windows\Temp/narc-40.OBJ.TN.jpg): failed to open stream: No such file or directory in NewFedoraDatastream->getContent() (line 885 of J:\Web Projects\NAKR\sites\all\libraries\tuque\Datastream.php).
  • Warning: copy(C:\Windows\Temp/narc-40.OBJ.PREVIEW.jpg): failed to open stream: No such file or directory in NewFedoraDatastream->getContent() (line 885 of J:\Web Projects\NAKR\sites\all\libraries\tuque\Datastream.php).

No PDF thumbnail is created, so the thumbnail image path is broken.

I need help resolving this issue. Any help in the right direction is appreciated.

Kind regards.

dp...@metro.org

May 22, 2017, 3:49:57 PM
to islandora
Hi, since you are running Islandora on Windows, some of the decoders ImageMagick needs are probably not being found: either they were not installed with the version you deployed, or you have a path issue.

I read that a portable exe version exists, and this is the download I found =)
More here
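
One rough way to confirm the "decoders not found" theory is to check whether the coder DLLs named in the error output are actually present. The sketch below is only an illustration: the helper name `missing_coder_modules` is made up, and the `IM_MOD_RL_<FORMAT>_.dll` file-name pattern is an assumption taken from the single PNG path shown in the error message, so other formats may be named differently.

```python
from pathlib import Path


def missing_coder_modules(install_dir, formats):
    """Return the formats whose coder DLL is absent from <install_dir>/modules/coders.

    Assumption: DLLs follow the IM_MOD_RL_<FORMAT>_.dll pattern seen in the
    error message (IM_MOD_RL_PNG_.dll).
    """
    coders_dir = Path(install_dir) / "modules" / "coders"
    missing = []
    for fmt in formats:
        dll = coders_dir / f"IM_MOD_RL_{fmt.upper()}_.dll"
        if not dll.is_file():
            missing.append(fmt)
    return missing


if __name__ == "__main__":
    # Install directory copied from the debug output in the original post.
    install_dir = r"C:\Program Files\ImageMagick-7.0.5-Q16"
    print("Missing coder modules:", missing_coder_modules(install_dir, ["PNG", "PDF"]))
```

If PNG shows up as missing, that would match the "no decode delegate for this image format `PNG'" line, since the Ghostscript delegate in the log renders the PDF page through the `pngalpha` device first.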

Best

Diego Pino
Metro.org

Newton Kyari

May 24, 2017, 9:40:56 AM
to islandora

Thanks Diego,

I downgraded ImageMagick to ImageMagick-6.7.7-Q16. Thumbnails are now created, which solves one part of the problem. The other part is that after ingesting a PDF file I cannot search for it. I can search for other formats such as images, audio, and video, but not PDF files. Attached is a screenshot of the datastreams of the PDF object.


Regenerating the FULL_TEXT datastream makes no difference.

Kind regards

dp...@metro.org

May 26, 2017, 10:02:40 AM
to islandora
Hi,

Good! You got that one solved.
About text search: your full text datastream is -1 bytes long, which is Islandora's not-so-friendly way of saying that the datastream is empty.
Are `pdftotext` and `gs` installed correctly, and are their paths set correctly (at admin/islandora/solution_pack_config/pdf) as the README.md file says?

This is the code that runs that derivative (text extraction),
so if that process fails you should see warnings in your Drupal log.
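
Separately from the derivative code referenced above, a quick sanity check of the `pdftotext` and `gs` paths could look like the sketch below. It is not part of Islandora; the helper name `resolve_tools` is hypothetical, and it assumes the Ghostscript console executable on Windows is `gswin64c` or `gswin32c`. The candidate names can be replaced with the exact paths entered on the configuration page.

```python
import shutil


def resolve_tools(candidates):
    """Map each tool label to the first candidate that resolves to an executable, or None."""
    resolved = {}
    for label, names in candidates.items():
        # shutil.which accepts both bare command names (searched on PATH)
        # and full paths (checked directly).
        resolved[label] = next(
            (found for found in map(shutil.which, names) if found), None
        )
    return resolved


if __name__ == "__main__":
    tools = {
        "pdftotext": ["pdftotext"],
        # Assumed Windows console executables first, then the Unix name.
        "ghostscript": ["gswin64c", "gswin32c", "gs"],
    }
    for label, path in resolve_tools(tools).items():
        print(f"{label}: {path or 'NOT FOUND'}")
```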

Also, not every PDF can have text extracted. This step only converts text that is stored as characters in the PDF into plain text; text that exists only as pixels inside an image (a scanned page, for example) won't be processed. This is not OCR. If you need OCR, you need the OCR solution pack and all its dependencies, and you will probably want to use the book solution pack instead.
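
To tell whether a given PDF has an extractable text layer at all, `pdftotext` can be run by hand outside Islandora. The following is a minimal sketch under a few assumptions: `pdftotext` is reachable by that name (otherwise pass its full path), the helper names are made up for illustration, and the sample file name is a placeholder.

```python
import subprocess


def extract_text(pdf_path, pdftotext="pdftotext"):
    """Run pdftotext on pdf_path and return the extracted text.

    Passing "-" as the output file makes pdftotext write to stdout.
    """
    result = subprocess.run(
        [pdftotext, pdf_path, "-"], capture_output=True, check=True
    )
    return result.stdout.decode("utf-8", errors="replace")


def has_text_layer(text):
    """True if the output contains anything besides whitespace and page breaks."""
    return any(not ch.isspace() for ch in text)


if __name__ == "__main__":
    # Placeholder path: point this at a copy of the PDF that was ingested.
    text = extract_text("sample.pdf")
    print("Text layer found" if has_text_layer(text) else "No extractable text")
```

If this reports no extractable text for a file that looks fine in a viewer, the PDF is most likely image-only, and an empty FULL_TEXT datastream would be the expected result.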

Cheers


Diego Pino N
Metro.org