early errors.

John Culleton

Apr 29, 2014, 3:58:30 PM
to tesser...@googlegroups.com
In my first attempt I get these messages:
bash-4.2$ tesseract 81.png 81.  
Tesseract Open Source OCR Engine v3.02.02 with Leptonica
Error in boxClipToRectangle: box outside rectangle
Error in pixScanForForeground: invalid box

I get some output but most of it is garbage.

Two questions:

How do I get rid of the error messages?

What is the correct command line to start training the program on this particular input?



John Culleton

John Culleton

Apr 29, 2014, 5:11:34 PM
to tesser...@googlegroups.com
----------------------------------------------------------------------------------
Well, an online video tutorial taught me one magic trick: I resized the file using ImageMagick and converted it to TIFF.
convert 81.png -resize 5000 81.tiff

This got rid of the error messages and gave me a close to correct result.

Now what is the next step in training tesseract?


John Culleton
81.txt

Nick White

Apr 29, 2014, 7:24:07 PM
to tesser...@googlegroups.com
Hi John,

On Tue, Apr 29, 2014 at 02:11:34PM -0700, John Culleton wrote:
> Well, an online video tutorial taught me one magic trick: I resized the file
> using ImageMagick and converted it to TIFF.
> convert 81.png -resize 5000 81.tiff
>
> This got rid of the error messages and gave me a close to correct result.

Yes, the DPI is quite important with Tesseract. It's mentioned in
the FAQ too.
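
To make the DPI point concrete, here is a rough sketch of the scaling arithmetic. It assumes a hypothetical source scan at 72 DPI and a target of 300 DPI (a commonly quoted figure, not a value from this thread); the real resolution of 81.png is not known here. The helper name scale_percent is made up for illustration, and the script only prints the commands rather than running them.

```shell
#!/bin/sh
# Sketch: work out how much to enlarge a scan so it reaches a target DPI.
# The 72 and 300 DPI figures below are assumptions, not values from this thread.

# scale_percent CURRENT_DPI TARGET_DPI
# Prints the integer resize percentage, rounded up.
scale_percent() {
    current=$1
    target=$2
    # Integer ceiling of (target * 100 / current).
    echo $(( (target * 100 + current - 1) / current ))
}

pct=$(scale_percent 72 300)

# Dry run: show the commands instead of executing them.
echo "convert 81.png -resize ${pct}% 81.tiff"
echo "tesseract 81.tiff 81"
```

A bare number such as `-resize 5000` sets the output width in pixels, whereas the percentage form scales relative to the original size, so the same script works for scans of different dimensions.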

> Now what is the next step in training tesseract?

Follow the TrainingTesseract3 wiki instructions. Though if you're
not working with a different language or a radically different font,
your time will probably be much better spent following the advice on
the ImproveQuality wiki page.

Nick