Same command for 2 files


Jean-Marc Spaggiari

Sep 25, 2025, 6:31:58 PM
to tesser...@googlegroups.com
Hi,

I have two fairly similar images that I want to OCR.

image_1758836719_box0_score0_87.jpg
image_1758836841_box0_score0_87.jpg
I think both are of pretty good quality. To OCR the second one, I'm using this command:
tesseract image_1758836841_box0_score0_87.jpg stdout --dpi 600 --psm 7 -l eng

And I'm getting exactly what is in the picture.
However, the same command for the first picture doesn't return anything.

Now, if I change the command to this one:
tesseract image_1758836719_box0_score0_87.jpg stdout --dpi 600 -l eng

I'm getting some output with a lot of noise:
Detected 6 diacritics
— sl O

a e any aS |
Lightning Greaves


But for the Aurochs file I'm getting "Empty page!!". I have not been able to get a command working for both.

So I have a few questions here. 
  • Is there a way to say something like "try without PSM, and if the page comes back empty, try with psm 7"?
  • Is it possible to provide my own list of words to look for? For example, can I provide "Aurochs, Greaves, Lightning" and force the OCR to use only those words?

Thanks,

JM

Zdenko Podobny

Sep 28, 2025, 11:57:13 AM
to tesser...@googlegroups.com
Hi,

But for the Aurochs file I'm getting "Empty page!!". I have not been able to get a command working for both.


Is there a way to say something like "try without PSM and if empty page try with psm 7"?

Tesseract is an OCR engine (with simple image layout detection), so if you need to apply some logic, you have to implement it yourself.
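
As a rough illustration of what "implement it yourself" could look like, here is a minimal Python sketch that runs the command without a PSM first and retries with --psm 7 when nothing comes back. The function names (run_tesseract, ocr_with_fallback) are hypothetical, and the sketch assumes that diagnostics such as "Empty page!!" do not end up on stdout, so an empty stdout means "no text found".

```python
import subprocess


def run_tesseract(image_path, extra_args=()):
    """Run the tesseract CLI once and return whatever it printed to stdout."""
    cmd = ["tesseract", image_path, "stdout", "--dpi", "600", "-l", "eng", *extra_args]
    # check=False: a failed run is treated like an empty result (an assumption of this sketch)
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return result.stdout.strip()


def ocr_with_fallback(image_path, attempts=((), ("--psm", "7")), runner=run_tesseract):
    """Try each set of extra arguments in order and return the first non-empty text."""
    for extra_args in attempts:
        text = runner(image_path, extra_args)
        if text:
            return text
    return ""  # every attempt came back empty
```

Calling ocr_with_fallback("image_1758836719_box0_score0_87.jpg") would try the default page segmentation first and only then --psm 7. The order of attempts is a tuple, so it can be swapped or extended with other modes.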

Is it possible to provide my own list of words to look for? For example, can I provide "Aurochs, Greaves, Lightning" and force the OCR to use only those words?

Yes, it is; the documentation describes how. However, the effect of custom dictionaries is usually very limited.
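
Since the effect of a custom dictionary may be limited, one alternative is to filter the OCR output afterwards. The sketch below is not a Tesseract feature: it is plain post-processing with Python's difflib, and the function name snap_to_vocabulary and the 0.8 cutoff are assumptions chosen for illustration. It keeps only the tokens that closely match a word from the allowed list.

```python
import difflib


def snap_to_vocabulary(ocr_text, vocabulary, cutoff=0.8):
    """Return the vocabulary words that closely match tokens in the OCR output."""
    # Compare case-insensitively, but return the caller's spelling
    canonical = {word.lower(): word for word in vocabulary}
    matches = []
    for token in ocr_text.split():
        close = difflib.get_close_matches(token.lower(), list(canonical), n=1, cutoff=cutoff)
        if close:
            matches.append(canonical[close[0]])
    return matches


allowed = ["Aurochs", "Greaves", "Lightning"]
noisy = "sl O\n\na e any aS |\nLightning Greaves"
print(snap_to_vocabulary(noisy, allowed))
```

With the noisy output quoted earlier in the thread, the short garbage tokens fall below the cutoff and only the known words remain. A lower cutoff tolerates more OCR errors but also lets more noise through.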

Best regards,

Zdenko


On Fri, Sep 26, 2025 at 0:31, Jean-Marc Spaggiari <jean...@spaggiari.org> wrote: