Tesseract OCR can't recognize basic alphanumeric codes


Michael Studebaker

Sep 20, 2017, 5:17:29 AM
to tesseract-ocr
See https://stackoverflow.com/questions/46009161/tesseract-ocr-cant-recognize-basic-alphanumeric-codes

One answer there suggests training Tesseract, but I'm skeptical that this is the right solution: I'm not introducing any new fonts, and I want Tesseract to recognize these alphanumeric characters in any font.
I'm looking for opinions to either persuade or dissuade me from training Tesseract.

Thanks,
Michael

ShreeDevi Kumar

Sep 20, 2017, 10:02:31 AM
to tesser...@googlegroups.com
Works for me with Tesseract 4.0 alpha (latest code); see attached.

Recognition is better with tessdata_best/Latin.traineddata and the options --oem 1 --psm 6.
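The settings ShreeDevi mentions (Latin.traineddata from tessdata_best, --oem 1, --psm 6) can be wrapped in a small helper for repeated runs. This is a minimal Python sketch, not something posted in the thread. It assumes the tesseract binary is on PATH, and that the hypothetical tessdata_dir parameter points at a local directory containing the downloaded Latin.traineddata. In the tessdata_best repository the script models live in a script/ subdirectory, so lang="script/Latin" may be needed depending on how the files are laid out. The image name codes.png is a placeholder.

```python
import subprocess
from typing import List, Optional


def build_tesseract_cmd(
    image: str,
    lang: str = "Latin",
    oem: int = 1,
    psm: int = 6,
    tessdata_dir: Optional[str] = None,
) -> List[str]:
    """Build a tesseract command line that prints recognized text to stdout."""
    # Using "stdout" as the output base makes tesseract print the text
    # instead of writing <outputbase>.txt.
    cmd = ["tesseract", image, "stdout"]
    if tessdata_dir is not None:
        # Hypothetical local directory holding e.g. Latin.traineddata
        # downloaded from tessdata_best.
        cmd += ["--tessdata-dir", tessdata_dir]
    # --oem 1: LSTM engine only; --psm 6: assume a single uniform block of text.
    cmd += ["-l", lang, "--oem", str(oem), "--psm", str(psm)]
    return cmd


def run_ocr(image: str, **kwargs) -> str:
    """Run tesseract (assumed to be on PATH) and return the recognized text."""
    result = subprocess.run(
        build_tesseract_cmd(image, **kwargs),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

For example, run_ocr("codes.png", lang="eng") and run_ocr("codes.png", lang="Latin", tessdata_dir="/path/to/tessdata_best") would roughly correspond to the eng and Latin runs in the attached files; the image name and path are placeholders.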

ShreeDevi
____________________________________________________________
Bhajan - Kirtan - Aarti (devotional songs) @ http://bhajans.ramparivar.com

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscribe@googlegroups.com.
To post to this group, send email to tesser...@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/47396dd9-40c1-4e48-81bd-66d9df8367b0%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

jotJu-eng.txt
jotJu-Latin.txt