Incorrect text detection

Thomas McGrew

Aug 10, 2025, 3:41:43 AM

to tesseract-ocr

I'm trying to understand why tesseract is detecting this text incorrectly.

--oem 0 has issues with italics, so I've been using --oem 1. However, on this one image (the only one I've noticed so far), the result is totally incorrect.

The image clearly contains only the text "'Kaay."

Yet tesseract reads the text with --oem 1 as "LECEVA"

--oem 0 does read the text correctly.

I'm using the default psm of 3, and none of the other modes I have tried read the text correctly either.

001951.png
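
For anyone reproducing this, here is a minimal sketch of the comparison described above: it runs the tesseract command line on the same image once per engine mode and page segmentation mode. The `--oem`, `--psm`, and `-l` flags and the `stdout` output target are standard tesseract options. The helper names, the particular list of modes, and the `eng` language are assumptions made for illustration, and `--oem 0` only works when the installed traineddata includes the legacy model.

```python
import subprocess


def build_command(image_path, oem, psm, lang="eng"):
    """Return the tesseract argument list for one engine/segmentation pair."""
    return [
        "tesseract", image_path, "stdout",
        "--oem", str(oem),
        "--psm", str(psm),
        "-l", lang,
    ]


def compare_modes(image_path, oems=(0, 1), psms=(3, 6, 7, 8)):
    """Run tesseract once per (oem, psm) pair and collect the recognized text."""
    results = {}
    for oem in oems:
        for psm in psms:
            proc = subprocess.run(
                build_command(image_path, oem, psm),
                capture_output=True, text=True, check=False,
            )
            # Keep whatever was printed, even if tesseract reported an error.
            results[(oem, psm)] = proc.stdout.strip()
    return results
```

Calling `compare_modes("001951.png")` returns a dict keyed by `(oem, psm)`, which makes it easy to see which combinations, if any, read the text correctly.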

Zdenko Podobny

Aug 10, 2025, 3:49:27 AM

to tesser...@googlegroups.com

You checked the documentation first, didn’t you?

Zdenko

On Sun, Aug 10, 2025 at 9:41, Thomas McGrew <tjmc...@gmail.com> wrote:

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-oc...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/tesseract-ocr/017ef73f-c695-4a06-819a-9f2b46ab3e89n%40googlegroups.com.

Thomas McGrew

Aug 10, 2025, 4:25:23 AM

to tesser...@googlegroups.com

I read the man page and the command-line help. Unless you're referring to some other documentation, yes, I read it.

Thomas McGrew

Zdenko Podobny

Aug 10, 2025, 5:18:53 AM

to tesser...@googlegroups.com

https://github.com/tesseract-ocr/tessdoc

Zdenko

On Sun, Aug 10, 2025 at 10:25, Thomas McGrew <tjmc...@gmail.com> wrote:

Thomas McGrew

Aug 10, 2025, 3:16:07 PM

to tesseract-ocr

I had looked through that some, but I looked again and I don't see anything in the documentation that addresses this problem. Is there something in particular in the documentation that I should read?

I know how to install the application, as I have already done so, and I know how to run it. I'm mostly using it via pyocr, but the command line gives the same result. I have the models for OSD, English, and Japanese installed. I have run tesseract on thousands of images like this, and 99% of the time it works fine.

If the model sometimes hallucinates and there is nothing to be done then that's fine and just something I'll have to work around. I did find that scaling an image to a different size does generally make tesseract read the text correctly when this happens, for whatever reason.
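
The rescaling workaround mentioned above could be sketched roughly as follows. This assumes Pillow is installed; the function names and the scale factors are hypothetical choices for illustration, not values recommended by the tesseract documentation.

```python
def scaled_size(size, factor):
    """Return (width, height) scaled by factor, rounded, never below 1 pixel."""
    width, height = size
    return (max(1, round(width * factor)), max(1, round(height * factor)))


def rescaled_variants(image_path, factors=(1.5, 2.0, 3.0)):
    """Yield (factor, image) pairs for the image resized by each factor."""
    from PIL import Image  # Pillow, imported lazily; assumed to be installed

    with Image.open(image_path) as img:
        for factor in factors:
            # LANCZOS is a high-quality resampling filter for resizing.
            yield factor, img.resize(scaled_size(img.size, factor), Image.LANCZOS)
```

Each variant can then be passed to whichever OCR wrapper is in use, or saved to disk and given to the command line, to check whether a different size changes the result.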

Zdenko Podobny

Aug 10, 2025, 3:44:50 PM

to tesser...@googlegroups.com

It seems you missed this: https://github.com/tesseract-ocr/tessdoc/blob/main/ImproveQuality.md...

Zdenko

On Sun, Aug 10, 2025 at 21:16, Thomas McGrew <tjmc...@gmail.com> wrote:

Thomas McGrew

Aug 10, 2025, 9:45:58 PM

to tesseract-ocr

You are correct, I did miss that section. Inverting the image seems to produce better results.

I think the fact that the images are simple and that the resulting text was not even close had me in the mindset that it wasn't a quality problem as much as an option I was missing somehow, so I was looking for something like that.

Anyway, thank you for pointing it out.
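
A minimal sketch of the inversion fix, for reference. The linked ImproveQuality page appears to recommend dark text on a light background for the LSTM engine, so this checks the average brightness and inverts only when the image is mostly dark. It assumes Pillow is installed; the helper names and the threshold of 128 are hypothetical, and the brightness heuristic is an illustration rather than anything tesseract itself does.

```python
def should_invert(gray_values, threshold=128):
    """Return True when the mean 8-bit gray level is below threshold."""
    values = list(gray_values)
    if not values:
        return False
    return sum(values) / len(values) < threshold


def dark_text_on_light(image_path):
    """Load an image as grayscale and invert it if it is mostly dark."""
    from PIL import Image, ImageOps  # Pillow, imported lazily; assumed installed

    with Image.open(image_path) as img:
        gray = img.convert("L")  # 8-bit grayscale copy
    # In mode "L", tobytes() yields one byte (0-255) per pixel.
    if should_invert(gray.tobytes()):
        gray = ImageOps.invert(gray)
    return gray
```

The returned image can be handed to pyocr or saved and passed to the command line in place of the original.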
