A few characters being misrecognized

Péter Györök

Jul 18, 2024, 11:12:43 PM
to tesseract-ocr
I'm using this command:
tesseract file.png - --psm 6 -l script/Latin

img1.png returns "JUCcCcsus" instead of "Juccsus".
img2.png returns "Bladé" instead of "Bladë".

Any suggestions on how to fix these?

Attachments: img1.png, img2.png

Zdenko Podobny

Jul 26, 2024, 8:59:45 AM
to tesser...@googlegroups.com
tesseract img1.png - --psm 6 -l fra
Juccsus

tesseract img2.png - --psm 6 -l fra
Bladë

Zdenko
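
For anyone who wants to repeat this comparison on their own images, here is a minimal Python sketch that runs the same command with different -l values. The helper names (tesseract_cmd, ocr, compare) are made up for illustration, and it assumes the tesseract binary is on PATH with the script/Latin and fra traineddata installed.

```python
import subprocess


def tesseract_cmd(image, lang, psm=6):
    """Build the argument list for the command used in this thread."""
    # "-" as the output base makes Tesseract write the text to stdout.
    return ["tesseract", image, "-", "--psm", str(psm), "-l", lang]


def ocr(image, lang, psm=6):
    """Run Tesseract once and return the recognized text, stripped."""
    result = subprocess.run(
        tesseract_cmd(image, lang, psm),
        capture_output=True,
        text=True,
        encoding="utf-8",  # keep characters such as "ë" intact
        check=True,        # raise if Tesseract exits with an error
    )
    return result.stdout.strip()


def compare(images, langs):
    """Return {(image, lang): text} for every combination."""
    return {(img, lang): ocr(img, lang) for img in images for lang in langs}
```

Calling compare(["img1.png", "img2.png"], ["script/Latin", "fra"]) should give the four results side by side, so it is easy to see which model handles which image.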


On Fri, Jul 19, 2024 at 5:12, Péter Györök <gyorok...@gmail.com> wrote:

Péter Györök

Jul 26, 2024, 9:58:44 AM
to tesseract-ocr
That might work for these particular images, but what if it breaks other cases (e.g. characters that don't occur in French)? There is no way to know programmatically which language model will be the best fit for each name, since names can come from any language that uses the Latin alphabet (in fact, they don't even have to come from the Latin alphabet, but that's a whole other story).
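
One possible heuristic, which nobody in this thread has proposed or tested, is to run each image through several models with TSV output (e.g. by appending the tsv config to the command) and keep the result with the highest mean word confidence. A rough Python sketch follows; the function names are hypothetical, and it assumes the TSV has the usual header row with conf and text columns. Confidence values from different models are not guaranteed to be comparable, so treat this as an experiment rather than a fix.

```python
import csv
import io


def mean_word_confidence(tsv_text):
    """Average the conf column over rows that contain recognized text."""
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    confs = []
    for row in reader:
        conf = float(row.get("conf") or -1)
        text = (row.get("text") or "").strip()
        # Structural rows (page, block, line) carry conf == -1 and no text.
        if conf >= 0 and text:
            confs.append(conf)
    return sum(confs) / len(confs) if confs else 0.0


def pick_best(tsv_by_lang):
    """Return the -l value whose TSV output has the highest mean confidence."""
    return max(
        tsv_by_lang,
        key=lambda lang: mean_word_confidence(tsv_by_lang[lang]),
    )
```

Here tsv_by_lang would be a dict such as {"script/Latin": tsv1, "fra": tsv2}, where each value is the raw TSV text Tesseract printed for the same image.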