Problem Recognizing Non-Words

Pedro Moreira

Dec 8, 2015, 8:33:07 AM
to tesseract-ocr
Hello, I am trying to recognize some text codes and I'm having some problems.
One example:
I have a paper sheet with the string:
"5XqaLB"
When I show all or part of it to the camera, the results are:
(original -> result)

"5XqaLB" -> "5anLB"
"XqaLB" -> "anLB"
"qaLB" -> "qaLB"
"5Xq" -> "5Xq"
"5Xqa" -> "5an"

The API replaces "Xqa" with "an".

I found this problem with this string, but there may be others with similar results.
I assume this is somehow related to the API trying to guess the word.
So, my question is: how do I prevent Tesseract from replacing the characters it detects?

Thanks

Riasat Al Jamil

Dec 10, 2015, 2:23:09 AM
to tesseract-ocr
I am looking for the same solution. I want to recognize text on a per-character basis and ignore the dictionary and word guessing (guessing individual characters is fine).

Riasat Al Jamil

Dec 10, 2015, 2:23:09 AM
to tesseract-ocr
Seems like this is the parameter to add if you do not want the dictionary:
 -c load_system_dawg=0
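
For anyone calling the command-line tool from a script, here is a minimal sketch of how that option might be passed. It assumes the `tesseract` binary is on the PATH and that the installed version accepts `-c` for this parameter. The helper names (`build_tesseract_command`, `run_ocr`) are made up for illustration. The second option, `load_freq_dawg=0`, is not mentioned in this thread; it is added on the assumption that the frequent-word list should be switched off as well.

```python
import subprocess


def build_tesseract_command(image_path, disable_dawgs=True):
    """Build the argument list for the tesseract CLI.

    'stdout' as the output base makes tesseract print the recognized
    text instead of writing it to a file.
    """
    cmd = ["tesseract", image_path, "stdout"]
    if disable_dawgs:
        # From the thread: do not load the main dictionary.
        cmd += ["-c", "load_system_dawg=0"]
        # Assumption (not from the thread): also skip the frequent-word list.
        cmd += ["-c", "load_freq_dawg=0"]
    return cmd


def run_ocr(image_path, disable_dawgs=True):
    """Run tesseract and return the recognized text (hypothetical helper)."""
    cmd = build_tesseract_command(image_path, disable_dawgs)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.strip()
```

Calling `run_ocr("code.png")` with and without `disable_dawgs` on the same image is a quick way to check whether the dictionary is what turns "Xqa" into "an". The file name here is only a placeholder.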


