What am I missing? Tesseract runs but produces no output


Bob Kuo

Feb 18, 2011, 9:53:00 AM
to tesser...@googlegroups.com
Hello all

Please forgive the newbie question. I've seen this posted several
times before, and I thought I had the right solution but apparently
not. Attached is a PNG that I'd like to run through tesseract. I
used ImageMagick's convert to change it into a tiff:

convert -density 200 -units PixelsPerInch test_page.png -type Grayscale +compress test_input.tif

(I've also tried to do this at -density 300 with the same results)

The resulting TIF is attached. When I run it through tesseract I get
an output file that is one byte and is basically blank. Command and
output below.

tesseract test_input.tif output -l eng
Tesseract Open Source OCR Engine
Image has 8 * 1 bits per pixel, and size (375,350)
Resolution=200
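Tesseract writes its text to `<outputbase>.txt`, so the command above produces `output.txt`. When batch-testing preprocessing variants it helps to flag the "one byte, basically blank" case automatically. The helper name `ocr_output_is_blank` is hypothetical; this is a minimal sketch that treats a missing file or whitespace-only content as blank.

```python
from pathlib import Path


def ocr_output_is_blank(txt_path: str) -> bool:
    """Return True if a tesseract text output file is missing or contains
    no visible characters (hypothetical helper, not part of tesseract)."""
    path = Path(txt_path)
    if not path.exists():
        return True
    # Decode leniently so stray bytes in OCR output never raise an error.
    text = path.read_bytes().decode("utf-8", errors="replace")
    # Newlines, spaces and form feeds alone count as "no output".
    return text.strip() == ""
```

For example, after running `tesseract test_input.tif output -l eng`, calling `ocr_output_is_blank("output.txt")` would return True for the one-byte file described in this post.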

I saw some other threads about a similar problem, but the solutions
were to scale it to 200 or 300 DPI, make sure it was in grayscale,
remove the alpha layer, and somewhere else it said it was fixed in
Tesseract 2.04. I'm using Tesseract 2.04 on Mac OS X 10.6.6 and
ImageMagick 6.6.7-1. Is my image just unsuitable for OCR-ing?

I appreciate any help.

Thanks,

Bob

Attachments: test_page.png, test_input.tif

zdenko podobny

Feb 18, 2011, 10:57:25 AM
to tesser...@googlegroups.com
Hi,

Just a quick reply:
I tried it on Windows XP with Tesseract 3.00 and it produced a bad result (nothing useful).

IrfanView's information dialog showed that the image has a resolution of 72x72 DPI, which is too low.
So I resampled it (with the Lanczos algorithm) from 100% to 300% size, set the DPI to 300, and reduced the number of colors to 16 (in IrfanView, because I have no time to play with ImageMagick's options ;-) ).
Then the OCR result was much better, with several mistakes (just a quick check).

So with a few image improvements you can get a good OCR result.
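For readers who would rather stay with ImageMagick, the IrfanView steps above can be approximated with `convert`. Note that `-density` on its own only rewrites the resolution tag and adds no pixels, which fits the first post's tesseract log still reporting a 375x350 image at Resolution=200; `-resize` is what actually upsamples. The sketch below only assembles the argument list (the function name `build_convert_args` and its defaults are assumptions chosen to mirror the 300% / 300 DPI / 16 colors described here), so it can be printed or passed to `subprocess.run`.

```python
from __future__ import annotations

import shlex


def build_convert_args(src: str, dst: str, scale_percent: int = 300,
                       dpi: int = 300, colors: int = 16) -> list[str]:
    """Assemble an ImageMagick `convert` command approximating the IrfanView
    steps: Lanczos upscale, set DPI, reduce colours (hypothetical helper)."""
    return [
        "convert", src,
        "-filter", "Lanczos",            # resampling filter used by -resize
        "-resize", f"{scale_percent}%",  # actually adds pixels
        "-units", "PixelsPerInch",
        "-density", str(dpi),            # only labels the resolution
        "-type", "Grayscale",
        "-colors", str(colors),          # quantize to N colours
        "+compress",                     # write an uncompressed TIFF
        dst,
    ]


if __name__ == "__main__":
    # Print a copy-pasteable shell command; to run it instead, use
    # subprocess.run(build_convert_args(...), check=True).
    print(shlex.join(build_convert_args("test_page.png", "test_input.tif")))
```

The resulting `test_input.tif` would then go through the same `tesseract test_input.tif output -l eng` command as before. Whether 16 colors or plain grayscale works better is a per-image judgment call.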

BR,

Zd.


--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To post to this group, send email to tesser...@googlegroups.com.
To unsubscribe from this group, send email to tesseract-oc...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/tesseract-ocr?hl=en.


Sriranga(78yrsold)

Feb 18, 2011, 11:22:25 AM
to tesser...@googlegroups.com
I checked it in FreeOCR (which includes Tesseract 3.01 alpha) and the output was in order, with a few minor mistakes.
With the help of IrfanView, I increased the resolution from 72 dpi to 300 dpi, saved it as an uncompressed TIF file, and tested it.
What zdenko says is correct.
-sriranga(78yrs)

Bob Kuo

Feb 18, 2011, 11:54:25 AM
to tesseract-ocr
Thanks everyone! I tried it again, got a slightly different section
from the original PDF and saved it as a PNG with 200 DPI. Then I ran
convert with the following options:

convert -density 200 -units PixelsPerInch -type Grayscale +compress test2.png test_input2.tif

I had to put in the -density 200 because without it the output went to
59 DPI even though the original PNG was at 200.
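When the DPI reported by different tools disagrees, it can help to look at what the PNG itself stores. PNG keeps resolution in an optional `pHYs` chunk as pixels per metre (unit 1) or as an aspect ratio only (unit 0), and the chunk must appear before the image data. The function `png_dpi` below is a hypothetical, minimal stdlib sketch: it walks the chunks, converts pixels per metre to DPI (1 inch = 0.0254 m), and does not verify CRCs.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dpi(data: bytes):
    """Return (x_dpi, y_dpi) from a PNG's pHYs chunk, or None when the file
    stores no physical resolution (hypothetical helper)."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        # Chunk layout: 4-byte length, 4-byte type, data, 4-byte CRC.
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        if ctype == b"pHYs" and length == 9:
            ppu_x, ppu_y, unit = struct.unpack(">IIB", body)
            if unit != 1:  # 0 = unit unknown, aspect ratio only
                return None
            # Unit 1 = pixels per metre; convert to pixels per inch.
            return (ppu_x * 0.0254, ppu_y * 0.0254)
        if ctype in (b"IDAT", b"IEND"):
            break  # pHYs is only valid before the image data
        pos += 12 + length
    return None
```

For instance, `png_dpi(open("test2.png", "rb").read())` would report roughly 72 for a file stored at 2835 pixels per metre, or None if the PNG carries no `pHYs` chunk at all, in which case each tool falls back to its own default.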

Yes, there are some minor errors but I'm quite happy with the output.

Again, thanks for everybody's help! I'll be writing up a blog post
about getting all this up and running on Mac OS X 10.6.6.

Bob

Quan Nguyen

Feb 18, 2011, 11:19:54 PM
to tesseract-ocr
I ran test_page.png through VietOCR 3.1 with Screenshot Mode enabled
and got acceptable results back. Since it's a Java program, it
certainly can run on OS X, provided that you build the Tess engine.
And if Ghostscript is installed, VietOCR can read PDF too.

Sriranga(78yrsold)

Feb 19, 2011, 5:53:24 AM
to tesser...@googlegroups.com
According to IrfanView, the resolution of the PNG was 72 dpi. I checked it with VietOCR after selecting Screenshot Mode under Image, and the result appears to be in order; see the attached txt file for reference.
VietOCR also supports screenshot-mode images.

-sriranga(78yrs).
Attachment: output test-page-png.txt