tif format is 4; unreadable


Tony Jones

Oct 18, 2016, 2:43:59 AM
to tesseract-ocr
Is there a solution to this, or am I going to have to dig into the sources? Thanks!

in.tif:  https://drive.google.com/open?id=0B4f6QpD8ItHyYmdYLWF1WGRFSTQ
[The actual TIF is nothing you'd ever want to OCR, but the error below impedes batch conversion of the document.]

$ file in.tif
in.tif: TIFF image data, little-endian, direntries=16, height=2558, bps=1, compression=none, PhotometricIntepretation=BlackIsZero, orientation=upper-left, width=1667

$ tesseract in.tif out -l eng pdf
Tesseract Open Source OCR Engine v3.04.01 with Leptonica
Page 1
Too few characters. Skipping this page
OSD: Weak margin (0.00) for 4 blob text block, but using orientation anyway: 0
Error in fopenWriteStream: stream not opened
Error in pixWrite: stream not opened
Error in fopenReadStream: file not found
Error in extractG4DataFromFile: stream not opened to file
Error in l_generateG4Data: datacomp not extracted
Error in pixGenerateCIData: g4 data not made
Error in l_generateCIDataForPdf: file in.tif format is 4; unreadable
Error during processing.

$ tesseract -v
tesseract 3.04.01
leptonica-1.73
libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.5.0) : libpng 1.6.25 : libtiff 4.0.6 : zlib 1.2.8 : libwebp 0.5.0 : libopenjp2 2.1.0
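
Since the real problem is that one bad page stops a batch conversion, a possible stopgap is to run each page separately and collect the failures instead of aborting. This is only a rough Python sketch, not a fix for the underlying Leptonica error. It assumes that tesseract either exits with a non-zero status or prints "Error during processing." on stderr when a page fails, and it assumes the same arguments as the command above. The names build_command and convert_all are hypothetical helpers, and deriving the output base name from the input file name is a choice made only for illustration.

```python
import subprocess
from pathlib import Path


def build_command(tif_path):
    """Mirror the invocation from the post: tesseract <in> <outbase> -l eng pdf.

    Assumption: the output base is the input path without its extension.
    """
    out_base = str(Path(tif_path).with_suffix(""))
    return ["tesseract", str(tif_path), out_base, "-l", "eng", "pdf"]


def convert_all(tif_paths, make_command=build_command):
    """Run one command per page and return (succeeded, failed) path lists."""
    succeeded, failed = [], []
    for tif_path in tif_paths:
        try:
            result = subprocess.run(
                make_command(tif_path),
                stdout=subprocess.PIPE,
                stderr=subprocess.PIPE,
            )
            # Assumption: a failed page shows up as a non-zero exit status
            # or as the "Error during processing." line seen in the log.
            ok = (
                result.returncode == 0
                and b"Error during processing." not in result.stderr
            )
        except OSError:
            # The executable is missing or could not be started.
            ok = False
        (succeeded if ok else failed).append(tif_path)
    return succeeded, failed
```

Calling convert_all(["in.tif", ...]) would then return the pages that converted and the ones that did not, so the failing TIFFs can be examined or skipped afterwards.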
