Same command for 2 files


Jean-Marc Spaggiari

Sep 25, 2025, 6:31:58 PM

to tesser...@googlegroups.com

Hi,

I have two fairly similar images that I want to OCR.

I think they are both pretty good quality. To OCR the 2nd one I'm using this command:

tesseract image_1758836841_box0_score0_87.jpg stdout --dpi 600 --psm 7 -l eng

And I'm getting exactly what is in the picture.

However, the same command for the first picture doesn't return anything.

Now, if I change the command to this one (without --psm 7):
tesseract image_1758836719_box0_score0_87.jpg stdout --dpi 600 -l eng

I'm getting some output with a lot of noise:
Detected 6 diacritics
— sl O

a e any aS |
Lightning Greaves

But for the Aurochs file I'm getting "Empty page!!". I have not been able to get a command working for both.

So I have a few questions here.

Is there a way to say something like "try without PSM and if empty page try with psm 7"?
Is it possible to provide my own list of possible words to look for? For example, can I provide "Aurochs, Greaves, Lightning" and force the OCR to use only those words?

Thanks,

Zdenko Podobny

Sep 28, 2025, 11:57:13 AM

to tesser...@googlegroups.com

Hi,

> But for the Aurochs file I'm getting "Empty page!!". I have not been able to get a command working for both.

Invest some time in reading the documentation: https://github.com/tesseract-ocr/tessdoc/blob/main/ImproveQuality.md

> Is there a way to say something like "try without PSM and if empty page try with psm 7"?

Tesseract is an OCR engine (with simple page layout detection), so if you need to apply some logic, you have to implement it yourself.
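
For what it's worth, that kind of fallback can live in a small shell wrapper. The following is only a sketch under a few assumptions: `ocr_with_fallback` is a made-up helper name, the flags are the ones from the commands quoted in this thread, and it assumes diagnostics such as "Empty page!!" are written to stderr so that only recognized text lands on stdout.

```shell
# Sketch: try the default page segmentation first, and retry with
# --psm 7 (treat the image as a single text line) only if nothing came back.
# ocr_with_fallback is a hypothetical helper, not part of Tesseract.
ocr_with_fallback() {
    img=$1
    # First attempt without --psm; stderr diagnostics are discarded.
    text=$(tesseract "$img" stdout --dpi 600 -l eng 2>/dev/null)
    # Output that is empty or only whitespace counts as "empty page".
    if [ -z "$(printf '%s' "$text" | tr -d '[:space:]')" ]; then
        text=$(tesseract "$img" stdout --dpi 600 --psm 7 -l eng 2>/dev/null)
    fi
    printf '%s\n' "$text"
}
```

It would be called as `ocr_with_fallback image_1758836841_box0_score0_87.jpg`. Note that the first attempt may still return noisy text rather than nothing, in which case the fallback never runs; a stricter check on the first result would be needed for that.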

> Is it possible to provide my own list of possible words to look for? For example, can I provide "Aurochs, Greaves, Lightning" and force the OCR to use only those words?

Yes, it is; the documentation explains how. But the effect of custom dictionaries is usually very limited.
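
Since a custom dictionary only nudges recognition, one alternative is to filter the OCR output afterwards against the known words. This is a post-processing step and not a Tesseract feature; `keep_known_words` and the word-list file are hypothetical names used for illustration, and the sketch only keeps exact, case-sensitive matches.

```shell
# Sketch: keep only the tokens of the OCR output that appear in a word list.
# Usage: some-ocr-command | keep_known_words WORDLIST_FILE
# WORDLIST_FILE holds one allowed word per line.
keep_known_words() {
    # Put every whitespace-separated token on its own line, then keep the
    # tokens that match a list entry exactly (-F fixed strings, -x whole line).
    tr -s '[:space:]' '\n' | grep -Fx -f "$1"
}
```

With a file `cards.txt` containing Aurochs, Greaves and Lightning on separate lines, something like `tesseract image.jpg stdout --dpi 600 -l eng 2>/dev/null | keep_known_words cards.txt` would drop tokens such as "sl" or "aS" and keep "Lightning" and "Greaves". Misread words (one wrong letter) are dropped too, so fuzzy matching would be a further step.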

Best regards,

Zdenko

On Fri, Sep 26, 2025 at 0:31, Jean-Marc Spaggiari <jean...@spaggiari.org> wrote:
