Hi V.Lorz,
Firstly, it's Tesseract 3.02.02, not 3.2. We may release version 3.2
someday, but not for a long time yet ;)
Doing training is not going to help you, I'm afraid. The font is
quite standard, so you won't be able to train Tesseract for it any
better than eng.traineddata already does.
Out of curiosity, why did you think that training would help you
here? I ask as it's a very common misconception, but (AFAIK) our
documentation doesn't imply it anywhere.
You may just have to accept that the accuracy from Tesseract won't
be 100%, I'm afraid. Maybe someone else here has suggestions, but
the image looks alright to me, so the general advice of "more
preprocessing" may not be helpful.
Nick
On Wed, Mar 26, 2014 at 11:10:56AM -0700, V.Lorz wrote:
> Hi All,
>
> I started integrating tesseract (version 3.2, EMGV) into a project for
> recognizing short texts in scanned images. Using some very simple image
> processing, I extract the area of interest to speed up the process.
>
> The errors I get are in the recognition results: tesseract sometimes
> confuses the digits '6' and '5'. The image below is recognized as "4436695"
> instead of "4436696". I'm using the default eng.traineddata file bundled with
> the library. Using some other trained data files from around the Internet, I got
> the same errors with the same two digits (5 and 6). Before processing the image,
> I configure tesseract to recognize only digits.
>
>
> [image]
>
> Does anyone know what could be causing this error? How could I solve it?
>
> I started reading the guide for training the engine
> (http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3), as suggested
> in some other threads, but it is of next to no help to me. Is there any other
> guide around for 'dummies' like [presumably :(] me? In this case I want to
> train it using one image that I created from 40 sampled documents (attached
> here). Using jTessBoxEditor-1.0 I was able to generate and correct the box
> file. What should I do next?
>
>
> Thanks a lot in advance, V.Lorz
>
>
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to tesser...@googlegroups.com
> To unsubscribe from this group, send email to
> tesseract-oc...@googlegroups.com
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en