Low recognition success rate with self-trained data

Борис Щец

Oct 31, 2015, 11:22:03 AM

to tesseract-ocr

Hi, I'm trying to recognize shop tickets, but the recognition success rate is about 80%, which is low, right? I'm generating a new traineddata file from letters I cut out of real tickets. I generate a pseudo-text map for training purposes, but very different letters come out indistinguishable to Tesseract, and I've run out of ideas for how to reach at least a 99.9% success rate on images with practically no noise. Please help me find my mistake here.

I attached examples:

training image: vivat.plain.exp0.tif

box file for training image: vivat.plain.exp0.box

trained data: vivat.traineddata

input image example (single line): 23.png

output for input image: output.txt
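
The 80% and 99.9% figures are easier to compare between training runs if they are computed the same way each time. Below is a minimal sketch, assuming the rate means character-level accuracy (1 minus the edit distance divided by the ground-truth length). The post does not say how the 80% was measured, and the function names here are hypothetical, not part of Tesseract.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def char_accuracy(truth: str, ocr: str) -> float:
    """Character-level accuracy of ocr against truth, clamped to [0, 1].
    (Assumed definition; other definitions of 'success rate' exist.)"""
    if not truth:
        return 1.0 if not ocr else 0.0
    return max(0.0, 1.0 - levenshtein(truth, ocr) / len(truth))
```

Under this definition, 99.9% means at most one wrong character per thousand, so a meaningful check probably needs a few thousand ground-truth characters rather than a single line.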

Борис Щец

Oct 31, 2015, 4:10:38 PM

to tesseract-ocr

In addition, I should mention that the recognition error rate is high when printer faults occur:

10.png

12.png

9.png

output.txt

Борис Щец

Oct 31, 2015, 4:23:01 PM

to tesseract-ocr

There are also false positives: a few stray pixels (or even a single pixel) get recognized as an uppercase letter or a digit where there was obviously no symbol:

5.png

20.png

output.txt
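
One common way to suppress stray-pixel false positives like these is to drop very small connected components from the binarized image before passing it to Tesseract. Below is a minimal sketch, assuming the image has already been binarized into rows of 0 (background) and 1 (ink). `remove_small_components` and its `min_size` threshold are hypothetical names for illustration, not Tesseract options, and this step is not part of the workflow described in the thread.

```python
from collections import deque


def remove_small_components(img, min_size):
    """Return a copy of img (list of rows, 1 = ink, 0 = background) in which
    every 8-connected ink component smaller than min_size pixels is cleared."""
    h = len(img)
    w = len(img[0]) if h else 0
    out = [row[:] for row in img]
    seen = [[False] * w for _ in range(h)]

    for y in range(h):
        for x in range(w):
            if img[y][x] != 1 or seen[y][x]:
                continue
            # Breadth-first search to collect one connected component.
            component = []
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                component.append((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and img[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            # Assumption: anything below min_size pixels is noise, not a glyph.
            if len(component) < min_size:
                for cy, cx in component:
                    out[cy][cx] = 0
    return out
```

A threshold that is too high would also erase real small marks such as decimal points, so it is worth checking `min_size` against the smallest glyph printed on the tickets.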
