tif format is 4; unreadable

Tony Jones

Oct 18, 2016, 2:43:59 AM

to tesseract-ocr

Is there a solution to this, or am I going to have to dig into the sources? Thanks!

in.tif: https://drive.google.com/open?id=0B4f6QpD8ItHyYmdYLWF1WGRFSTQ
[The TIF itself is nothing you'd ever want to OCR, but the error below blocks batch conversion of the document.]

$ file in.tif
in.tif: TIFF image data, little-endian, direntries=16, height=2558, bps=1, compression=none, PhotometricIntepretation=BlackIsZero, orientation=upper-left, width=1667

$ tesseract in.tif out -l eng pdf
Tesseract Open Source OCR Engine v3.04.01 with Leptonica
Page 1
Too few characters. Skipping this page
OSD: Weak margin (0.00) for 4 blob text block, but using orientation anyway: 0
Error in fopenWriteStream: stream not opened
Error in pixWrite: stream not opened
Error in fopenReadStream: file not found
Error in extractG4DataFromFile: stream not opened to file
Error in l_generateG4Data: datacomp not extracted
Error in pixGenerateCIData: g4 data not made
Error in l_generateCIDataForPdf: file in.tif format is 4; unreadable
Error during processing.

$ tesseract -v
tesseract 3.04.01
leptonica-1.73
libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.5.0) : libpng 1.6.25 : libtiff 4.0.6 : zlib 1.2.8 : libwebp 0.5.0 : libopenjp2 2.1.0
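
As a stopgap for the batch run (not a fix for the Leptonica error itself), here is a minimal sketch that runs the same `tesseract <in> <out> -l eng pdf` command on each file and records failures instead of aborting. It assumes tesseract exits with a non-zero status when it prints "Error during processing."; the function name `convert_all` and its `run` parameter are hypothetical, introduced only for illustration.

```python
import subprocess
from pathlib import Path


def convert_all(paths, run=subprocess.run):
    """Run tesseract on each input file; return (succeeded, failed) lists.

    `run` defaults to subprocess.run and is injectable so the loop can be
    exercised without tesseract installed (an assumption of this sketch).
    """
    succeeded, failed = [], []
    for path in map(Path, paths):
        # tesseract appends ".pdf" to the output base name itself
        out_base = path.with_suffix("")
        cmd = ["tesseract", str(path), str(out_base), "-l", "eng", "pdf"]
        result = run(cmd, capture_output=True, text=True)
        if result.returncode == 0:
            succeeded.append(path)
        else:
            # Keep the error text so problem pages can be inspected later
            failed.append((path, result.stderr.strip()))
    return succeeded, failed
```

Calling `convert_all(sorted(Path(".").glob("*.tif")))` would then process every TIFF in the current directory and return the pages that failed, such as in.tif here, alongside their error output.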
