Errors when numeric and alphabetic data is mixed


ilochray

Dec 14, 2018, 5:14:00 PM
to tesseract-ocr
I am using the API to read data from an image. I have created training files for the fonts I process and I pre-process the image to deskew and clean it.
When I read entirely numeric data it reads perfectly e.g. 123456.
When I read entirely alphabetic data it reads perfectly e.g. ABCDEFGH.
The problem arises when I try to read text where the two are combined, e.g. 12ABC3456. In this case there are many errors (B and 8 are confused, for example).
I have tried setting load_system_dawg and load_freq_dawg to false, but that did not help. Are there any other configuration changes I can make to help?
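
For reference, a minimal sketch of one way to pass both settings at start-up. It assumes the `tesseract` command-line tool rather than the API, and the image name `scan.png`, the language name `myfont`, and the helper `build_tesseract_command` are hypothetical. As far as I know, both variables are read when the engine is initialised, so with the API they would need to be supplied at initialisation rather than set afterwards.

```python
def build_tesseract_command(image_path, lang, variables):
    """Build an argument list for the tesseract CLI.

    `variables` maps config variable names to string values; each one is
    passed with the CLI's `-c NAME=VALUE` option.
    """
    command = ["tesseract", image_path, "stdout", "-l", lang]
    for name, value in variables.items():
        command += ["-c", f"{name}={value}"]
    return command


# Hypothetical file and language names, used only for illustration.
command = build_tesseract_command(
    "scan.png",
    "myfont",
    {"load_system_dawg": "0", "load_freq_dawg": "0"},
)
print(" ".join(command))
```

The resulting list can be handed to `subprocess.run(command, capture_output=True, text=True)` if the `tesseract` binary is installed.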

Shree Devi Kumar

Dec 14, 2018, 8:09:56 PM
to tesser...@googlegroups.com
Try to include mixed data in your training files and see if that helps.
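
A minimal sketch of how mixed lines for a training text might be generated, assuming uppercase letters and digits only and a fixed line length of 9 as in the 12ABC3456 example. The function name `mixed_training_lines` and its parameters are hypothetical, and this is not part of any Tesseract tooling.

```python
import random
import string


def mixed_training_lines(count, length=9, seed=0):
    """Return `count` strings that each contain both letters and digits."""
    if length < 2:
        raise ValueError("length must be at least 2 to mix letters and digits")
    rng = random.Random(seed)  # seeded so the output is reproducible
    alphabet = string.ascii_uppercase + string.digits
    lines = []
    while len(lines) < count:
        candidate = "".join(rng.choice(alphabet) for _ in range(length))
        has_digit = any(ch.isdigit() for ch in candidate)
        has_alpha = any(ch.isalpha() for ch in candidate)
        # Keep only candidates that really mix the two character classes.
        if has_digit and has_alpha:
            lines.append(candidate)
    return lines


for line in mixed_training_lines(5):
    print(line)
```

The generated lines would still need to be rendered in the target fonts, or matched with real scanned samples, before they are usable as training data.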

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-oc...@googlegroups.com.
To post to this group, send email to tesser...@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/c9f291a5-0051-4d55-9a89-c5870838f49d%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Ian Lochray

Dec 15, 2018, 10:47:30 AM
to tesser...@googlegroups.com
Thank you for your reply.

I have tried that. In fact, I used the files I am trying to read to generate the training data.

Ian  
