Low recognition success rate with self-trained data

Борис Щец

Oct 31, 2015, 11:22:03 AM
to tesseract-ocr
Hi, I am trying to recognize shop tickets, but the recognition success rate is only about 80%, which is low, right? I am generating a new traineddata file from letters I cut out of real tickets. I generate a pseudo-text map for training purposes, but very different letters come out indistinguishable to Tesseract, and I have run out of ideas for how to reach at least a 99.9% success rate on images with practically no noise. Please help me find my mistake here.

I attached examples:
training image: vivat.plain.exp0.tif
box file for training image: vivat.plain.exp0.box
trained data: vivat.traineddata
input image example (single line): 23.png
output for input image: output.txt
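
For reference, here is a minimal sketch of how a character-level success rate like the 80% figure could be measured. It assumes the rate is defined as 1 minus the edit distance between the expected text and the OCR output, divided by the length of the expected text; the thread does not say how the figure was actually obtained. The function names `edit_distance` and `char_accuracy` are hypothetical, not part of Tesseract.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def char_accuracy(truth: str, ocr: str) -> float:
    """Assumed metric: 1 - edit_distance / len(truth), clamped at 0."""
    if not truth:
        return 1.0 if not ocr else 0.0
    return max(0.0, 1.0 - edit_distance(truth, ocr) / len(truth))
```

Comparing the contents of output.txt against a hand-typed transcription of the same line this way would give one number per image, which makes it easier to tell whether a retrained traineddata file is actually an improvement.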

Борис Щец

Oct 31, 2015, 4:10:38 PM
to tesseract-ocr
In addition, I should mention that the recognition error rate is high when printer faults occur.
10.png
12.png
9.png
output.txt

Борис Щец

Oct 31, 2015, 4:23:01 PM
to tesseract-ocr
There are also false positives: a few stray pixels (or even a single pixel) are recognized as an uppercase letter or a digit where there was obviously no symbol.
5.png
20.png
output.txt
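
One possible mitigation for these stray-pixel false positives, not something tested in this thread, is to erase very small connected groups of ink pixels before the image reaches Tesseract. The sketch below assumes the image has already been binarized into a list of rows with 1 for ink and 0 for background. The function name `remove_specks` and the default threshold `min_pixels=3` are hypothetical choices for illustration.

```python
from collections import deque


def remove_specks(img, min_pixels=3):
    """Return a copy of a binary image (1 = ink) with every 8-connected
    ink component smaller than min_pixels erased."""
    h = len(img)
    w = len(img[0]) if h else 0
    out = [row[:] for row in img]
    seen = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if img[y][x] != 1 or seen[y][x]:
                continue
            # Breadth-first search collects one connected component.
            comp = []
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                comp.append((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and img[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            # Assumed rule: components below the threshold count as noise.
            if len(comp) < min_pixels:
                for cy, cx in comp:
                    out[cy][cx] = 0
    return out
```

The threshold has to stay below the size of the smallest real mark on the ticket, otherwise periods, commas and the dots of letters would be erased along with the noise.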