Incorrect text detection


Thomas McGrew

Aug 10, 2025, 3:41:43 AM
to tesseract-ocr
I'm trying to understand why tesseract is detecting this text incorrectly.

--oem 0 has issues with italics, so I've been using --oem 1. However, on this one image (the only one I've noticed so far), the result is totally incorrect.

The image clearly contains only the text "'Kaay."
Yet with --oem 1, tesseract reads the text as "LECEVA".
--oem 0 does read the text correctly.

I'm using the default psm of 3, and none of the other modes I have tried read the text correctly either.
001951.png
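
For anyone reproducing this, here is a minimal sketch of how the engine and page segmentation modes could be compared from the command line. The helper names (`build_cmd`, `run_ocr`, `comparison_grid`) are illustrative and not part of tesseract or pyocr. The `-l eng` language and the list of psm values are assumptions. Note that --oem 0 only works with traineddata files that include the legacy engine.

```python
import subprocess


def build_cmd(image, oem, psm=3, lang="eng"):
    """Build a tesseract CLI call that prints the recognized text to stdout."""
    return ["tesseract", image, "stdout",
            "-l", lang, "--oem", str(oem), "--psm", str(psm)]


def run_ocr(cmd):
    """Run one command and return the stripped text (needs tesseract on PATH)."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.strip()


def comparison_grid(image, oems=(0, 1), psms=(3, 6, 7, 8, 13)):
    """One command per (oem, psm) pair; the psm choices are an assumption."""
    return [build_cmd(image, oem, psm) for oem in oems for psm in psms]


if __name__ == "__main__":
    # Only print the commands here; pass each one to run_ocr() to execute it.
    for cmd in comparison_grid("001951.png"):
        print(" ".join(cmd))
```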

Zdenko Podobny

Aug 10, 2025, 3:49:27 AM
to tesser...@googlegroups.com
You checked the documentation first, didn’t you?

Zdenko


On Sun, Aug 10, 2025 at 9:41, Thomas McGrew <tjmc...@gmail.com> wrote:
--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-oc...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/tesseract-ocr/017ef73f-c695-4a06-819a-9f2b46ab3e89n%40googlegroups.com.

Thomas McGrew

Aug 10, 2025, 4:25:23 AM
to tesser...@googlegroups.com
I read the man page and the command-line help, so unless you're referring to some other documentation, yes, I read it.

Thomas McGrew


Zdenko Podobny

Aug 10, 2025, 5:18:53 AM
to tesser...@googlegroups.com

On Sun, Aug 10, 2025 at 10:25, Thomas McGrew <tjmc...@gmail.com> wrote:

Thomas McGrew

Aug 10, 2025, 3:16:07 PM
to tesseract-ocr
I had looked through that some, but I looked again and I don't see anything in the documentation that addresses this problem. Is there something in particular in the documentation that I should read?

I know how to install the application, as I have already done so, and I know how to run it. I'm mostly using it via pyocr, but the command line gives the same result. I have the models for OSD, English, and Japanese installed. I have run tesseract on thousands of images like this, and 99% of the time it works fine.

If the model sometimes hallucinates and there is nothing to be done, then that's fine; it's just something I'll have to work around. I did find that, when this happens, scaling the image to a different size generally makes tesseract read the text correctly, for whatever reason.
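
The rescaling workaround could be sketched roughly like this: run OCR at a few scales and keep the most common non-empty answer. This is only an illustration. `ocr_with_rescaling` is a made-up name, and `ocr` and `resize` are callables you would supply yourself (for example a pyocr call, and Pillow's `Image.resize` with the scaled width and height). The scale factors are arbitrary assumptions.

```python
from collections import Counter


def ocr_with_rescaling(image, ocr, resize, scales=(1.0, 1.5, 2.0)):
    """Run OCR at several scales and return the most common non-empty result.

    `ocr(image) -> str` and `resize(image, scale) -> image` are supplied by
    the caller; both are hypothetical hooks, not a tesseract or pyocr API.
    """
    results = []
    for scale in scales:
        text = ocr(resize(image, scale)).strip()
        if text:
            results.append(text)
    if not results:
        return ""
    # Ties are broken in favor of the result that was seen first.
    return Counter(results).most_common(1)[0][0]
```

With Pillow, `resize` might be something like `lambda im, s: im.resize((round(im.width * s), round(im.height * s)))`. A majority vote only helps if the misread is specific to one size, which matches what I have seen so far but is not guaranteed.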

Zdenko Podobny

Aug 10, 2025, 3:44:50 PM
to tesser...@googlegroups.com

Thomas McGrew

Aug 10, 2025, 9:45:58 PM
to tesseract-ocr
You are correct, I did miss that section. Inverting the image seems to produce better results.
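
In case it helps someone else, a minimal sketch of the inversion step, written against a raw 8-bit grayscale buffer so it has no dependencies. The function names and the threshold of 128 are assumptions for illustration, and the heuristic assumes most pixels are background. In practice Pillow's `ImageOps.invert` on a mode "L" image does the same inversion.

```python
def mean_brightness(pixels):
    """Average value of an 8-bit grayscale buffer (0 = black, 255 = white)."""
    return sum(pixels) / len(pixels)


def invert(pixels):
    """Flip every 8-bit grayscale value: dark becomes light and vice versa."""
    return bytes(255 - p for p in pixels)


def dark_text_on_light(pixels, threshold=128):
    """Invert only when the image is mostly dark (assumed light-on-dark text).

    The threshold is an arbitrary assumption; it presumes the background
    dominates the pixel count.
    """
    if mean_brightness(pixels) < threshold:
        return invert(pixels)
    return bytes(pixels)
```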

Because the images are simple and the resulting text was not even close, I was in the mindset that this wasn't a quality problem so much as an option I was somehow missing, so that is what I was looking for.

Anyway, thank you for pointing it out.