PDF woes: "pixRead: image file not found error"

John Muccigrosso

Jun 23, 2016, 8:12:13 PM
to tesseract-ocr
I recently installed tesseract and am having some trouble with PDFs. The error is some form of:

Error in fopenReadStream: file not found
%���� in pixRead: image file not found: %PDF-1.3
%���� cannot be read!
Error during processing.

where the 1.3 may be 1.4 or 1.6. Things are fine with a jpg or tiff version of the same PDF (created by exporting from Preview.app).

System: Mac OS X 10.9.5.
"tesseract -v" reports:

tesseract 3.04.01
 leptonica-1.72
  libjpeg 8d : libpng 1.6.23 : libtiff 4.0.6 : zlib 1.2.5


I installed tesseract and leptonica with homebrew and "brew info tesseract" reports:

tesseract: stable 3.04.01 (bottled), HEAD
OCR (Optical Character Recognition) engine
https://github.com/tesseract-ocr/
/usr/local/Cellar/tesseract/3.04.01_1 (93 files, 39.5M) *
  Poured from bottle on 2016-05-27 at 15:41:15
From: https://github.com/Homebrew/homebrew-core/blob/master/Formula/tesseract.rb
==> Dependencies
Required: leptonica
Recommended: libtiff
==> Options
--with-all-languages
    Install recognition data for all languages
--with-opencl
    Enable OpenCL support
--with-training-tools
    Install OCR training tools
--without-libtiff
    Build without libtiff support
--HEAD
    Install HEAD version


I suspect some missing package or something similar, but don't know what exactly.

TIA.

Quan Nguyen

Jun 23, 2016, 9:09:53 PM
to tesseract-ocr
Tesseract cannot read PDF (which is a document format) directly. You'll need to convert it to an image format first.
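
One possible way to script that conversion is sketched below in Python. It assumes Poppler's pdftoppm is installed, which is not something mentioned in this thread; exporting images from Preview.app works just as well. The helper names (is_pdf, pdftoppm_command, tesseract_command, ocr_pdf) and the 300 dpi default are made up for illustration. The sketch checks for the "%PDF-" signature that shows up in the error message, renders each page to a PNG, and then runs tesseract on every page image.

```python
import subprocess
from pathlib import Path


def is_pdf(path):
    """Return True if the file begins with the PDF signature '%PDF-'."""
    with open(path, "rb") as handle:
        return handle.read(5) == b"%PDF-"


def pdftoppm_command(pdf_path, dpi=300):
    """Build a pdftoppm call that renders each page of the PDF to a PNG.

    Assumption: Poppler's pdftoppm is on the PATH (not mentioned in the thread).
    """
    prefix = Path(pdf_path).with_suffix("")
    return ["pdftoppm", "-r", str(dpi), "-png", str(pdf_path), str(prefix)]


def tesseract_command(image_path):
    """Build a tesseract call that writes <image stem>.txt next to the image."""
    output_base = Path(image_path).with_suffix("")
    return ["tesseract", str(image_path), str(output_base)]


def ocr_pdf(pdf_path, dpi=300):
    """Render a PDF to page images, then run tesseract on each page image."""
    if not is_pdf(pdf_path):
        raise ValueError(f"{pdf_path} does not look like a PDF")
    subprocess.run(pdftoppm_command(pdf_path, dpi), check=True)
    prefix = Path(pdf_path).with_suffix("")
    # pdftoppm names its output <prefix>-<page number>.png, one file per page.
    for image in sorted(prefix.parent.glob(prefix.name + "-*.png")):
        subprocess.run(tesseract_command(image), check=True)
```

Calling ocr_pdf("scan.pdf") (a hypothetical file name) should leave one .txt file per page next to the rendered images. The command lists are built separately from the code that runs them, so they can be inspected without having either tool installed.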

John Muccigrosso

Jun 24, 2016, 10:27:52 AM
to tesseract-ocr
Ugh, of course. Thanks! 