Hi Valent,
Sorry for taking a while to reply properly. This is the right place
for your questions. There are just rather more people asking questions
than answering them here at the moment.
I'll reply to you inline.
On Tue, Jul 10, 2012 at 01:40:23AM -0700, Valent wrote:
> I'm trying to OCR my gas meter [1] usage, and I stumbled upon issue
> that tesseract doesn't recognize anything in some tif images, just
> gives "empty page".
>
> Has anybody had this issue?
I presume you're using Tesseract 2? I have Tesseract 2.04 installed
on my Debian Squeeze box, and ran it on the three images you link.
They all returned text, with your third, grayscale-clean, coming out
best.
My guess is that your Tesseract isn't reading the TIFFs properly.
TIFF is a pretty diverse file format, and Tesseract only reads some
variants of it. Is your Tesseract compiled with compressed TIFF
support? If you can, I recommend using Tesseract 3.01, linked against
the Leptonica library. That way you can use PNGs, which are much
easier to deal with and more reliable. Failing that, see if you can
get ImageMagick to produce something that your Tesseract will read
reliably. Something like this ought to work:
    convert in.png -monochrome -density 600 -compress none out.tif
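If it helps, the conversion and the OCR run can be wrapped in a small
shell function. This is only a sketch: the function name ocr_meter and
the default output base name "meter" are made up for illustration, and
it assumes both convert (ImageMagick) and tesseract are on your PATH.

```shell
# Hypothetical helper: normalise an image to an uncompressed bilevel
# TIFF, then run Tesseract on it and print the recognised text.
# Usage: ocr_meter INPUT_IMAGE [OUTPUT_BASE]
ocr_meter() {
    in="$1"
    base="${2:-meter}"   # illustrative default name for the output files

    # Bail out early if the input image is missing.
    if [ ! -f "$in" ]; then
        echo "ocr_meter: no such file: $in" >&2
        return 1
    fi

    # Same ImageMagick options as the command above.
    convert "$in" -monochrome -density 600 -compress none "$base.tif" || return 1

    # Tesseract appends .txt to the output base name itself.
    tesseract "$base.tif" "$base" || return 1

    cat "$base.txt"
}
```

Calling it as "ocr_meter in.png reading" would leave reading.tif and
reading.txt in the current directory and print the text.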
> If possible I would like to use tesseract for automatically reading my
> gas meter usage, is this even possible?
Yes, and it looks like you're close. Good project, I like it :) Let
us know how you get on.
Best of luck,
Nick