Seems like the dictionary isn't used

צביקה הרמתי

Sep 15, 2022, 11:28:43 AM
to tesseract-ocr
Hi.

1.
I have an image written in a "Science Fiction" style font, in which 'E' looks very similar to '='.
As a result, the attached image is recognized as:
"AR= YOU SURE YOU WANT TO QuIT >"

However, since Tesseract uses an English dictionary, I'd expect it to figure out that "ARE" is much more likely than "AR=".

I assume this can be controlled through some configuration option?
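
To make the expectation concrete, here is a rough sketch of the kind of dictionary-based correction I have in mind. This is not how Tesseract works internally, as far as I know; the tiny word list, the confusion table, and the function names are all made up for illustration.

```python
from itertools import product

# Hypothetical tiny word list standing in for an English dictionary.
WORDS = {"ARE", "YOU", "SURE", "WANT", "TO", "QUIT"}

# Hypothetical confusion table: glyphs that this font makes look alike.
CONFUSIONS = {"=": ["E"]}


def correct_token(token, words=WORDS, confusions=CONFUSIONS):
    """Return a dictionary word reachable by swapping look-alike characters."""
    upper = token.upper()
    if upper in words:
        return upper
    # For each character, try the character itself plus its look-alikes.
    options = [[ch] + confusions.get(ch, []) for ch in upper]
    for candidate in product(*options):
        word = "".join(candidate)
        if word in words:
            return word
    # No dictionary match: leave the OCR output untouched.
    return token


def correct_line(line):
    """Apply the token correction to every whitespace-separated token."""
    return " ".join(correct_token(tok) for tok in line.split())


print(correct_line("AR= YOU SURE YOU WANT TO QuIT >"))
# prints: ARE YOU SURE YOU WANT TO QUIT >
```

In this sketch "AR=" becomes "ARE" because it is the only candidate found in the word list, while the trailing ">" is left alone because no dictionary word matches it.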

2.
I tried https://www.newocr.com/, which is based on Tesseract, and it recognized the text correctly:
"ARE YOU SURE YOU WANT TO QUIT ?"
(I removed the line break.)

So, I assume it should be feasible.

3.
Note that https://www.newocr.com/ also correctly recognized the 'U' of "QUIT" as uppercase, as well as the closing question mark. I assume that can also be achieved with vanilla Tesseract; the question is how?

Thanks,
Zvika
view_975_loop_0_cel_0.png