Problem Recognizing Non Words

Pedro Moreira

Dec 8, 2015, 8:33:07 AM

to tesseract-ocr

Hello, I am trying to recognize some text codes and I'm having some problems.
One example:
I have a paper sheet with the string:
"5XqaLB"
When I show all or part of it to the camera, the results are:
(original -> result)

"5XqaLB" -> "5anLB"
"XqaLB" -> "anLB"
"qaLB" -> "qaLB"
"5Xq" -> "5Xq"
"5Xqa" -> "5an"

The API replaces "Xqa" with "an".

I found this problem with this string, but there may be others with similar results.
I assume this is somehow related to the API trying to guess the word.
So, my question is: how do I prevent Tesseract from replacing the characters it detects?

Thanks

Riasat Al Jamil

Dec 10, 2015, 2:23:09 AM

to tesseract-ocr

Looking for the same solution. I want to recognize text on a per-character basis and skip the dictionary/word guessing (guessing individual characters is fine).

Riasat Al Jamil

Dec 10, 2015, 2:23:09 AM

to tesseract-ocr

Seems like this is the parameter to add if you do not want the dictionary:

-c load_system_dawg=0
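
For anyone calling the command-line tool from a script, here is a rough sketch of how that flag could be passed. It is only an illustration, not a tested fix for the "Xqa" case: the helper names (build_tesseract_command, recognize) and the file name code.png are made up, and turning off load_freq_dawg as well is my own assumption, since the thread only mentions load_system_dawg. Depending on the Tesseract version, these settings may only take effect from a config file rather than from -c.

```python
import subprocess
from typing import List


def build_tesseract_command(image_path: str, disable_dictionaries: bool = True) -> List[str]:
    """Build a tesseract CLI call that prints the recognized text to stdout."""
    command = ["tesseract", image_path, "stdout"]
    if disable_dictionaries:
        # load_system_dawg=0 is the setting suggested in this thread.
        # load_freq_dawg=0 (frequent-word list) is an extra assumption.
        for name in ("load_system_dawg", "load_freq_dawg"):
            command += ["-c", f"{name}=0"]
    return command


def recognize(image_path: str) -> str:
    """Run tesseract on an image; needs the tesseract binary on PATH."""
    result = subprocess.run(
        build_tesseract_command(image_path),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

Calling recognize("code.png") would then run the equivalent of: tesseract code.png stdout -c load_system_dawg=0 -c load_freq_dawg=0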
