Guidance for unrecognized text


Jean-Marc Spaggiari

Oct 1, 2020, 8:58:49 AM
to tesseract-ocr
Hi,

I'm playing around with Tesseract to try to do some OCR on screen captures.

My picture looks like this:
name.png

But is recognized like this:
Eglise Chrétienne Evangélique de
sy oan 8)=1=

Place Je Me Souviens, Laval, QC H7L 1T9,
‘Tate lale|

Long lines are recognized fine, but short ones are definitely not. So I tried splitting the picture into one image per line. The last line now looks like this:
text_0277.png

But "tesseract filename.png out" gives me an empty output file without any text in it. Long lines are still fine even when there is just one line per file. Any idea why?

Thanks,

JMS

Lorenzo Bolzani

Oct 1, 2020, 12:59:09 PM
to tesser...@googlegroups.com
Invert the image.
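
A minimal sketch of what inverting could look like in Python, in case it helps. It assumes the Pillow library is installed, and the function names and the mean-brightness threshold of 128 are illustrative choices, not anything Tesseract itself prescribes. The heuristic also assumes the text covers only a small part of the picture, so a low average gray level means light text on a dark background.

```python
def is_dark_background(gray_bytes, threshold=128):
    """Return True when the mean 8-bit gray level is below the threshold.

    Assumption: text covers a minority of the pixels, so the mean
    reflects the background rather than the glyphs.
    """
    if not gray_bytes:
        raise ValueError("image has no pixels")
    return sum(gray_bytes) / len(gray_bytes) < threshold


def invert_gray(gray_bytes):
    """Invert 8-bit grayscale pixel values (0 becomes 255, 255 becomes 0)."""
    return bytes(255 - value for value in gray_bytes)


def prepare_for_ocr(src_path, dst_path):
    """Save a dark-text-on-light-background copy of the image at src_path."""
    # Pillow is a third-party dependency, imported here so the two
    # helpers above stay usable without it.
    from PIL import Image

    img = Image.open(src_path).convert("L")  # 8-bit grayscale
    data = img.tobytes()                     # one byte per pixel in mode "L"
    if is_dark_background(data):
        img = Image.frombytes("L", img.size, invert_gray(data))
    img.save(dst_path)
```

For example, `prepare_for_ocr("text_0277.png", "text_0277_inv.png")` followed by `tesseract text_0277_inv.png out` would run the same command as before on the inverted copy.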




Jean-Marc Spaggiari

Oct 1, 2020, 3:55:32 PM
to tesseract-ocr
I was curious why it works so well for some white-on-black text and not at all for others. I will try the inversion.

Thanks,

JMS

Zdenko Podobny

Oct 2, 2020, 5:29:12 AM
to tesser...@googlegroups.com
https://tesseract-ocr.github.io/tessdoc/ImproveQuality.html

The algorithm responsible for providing OCR results for "inverted images" is not reliable in Tesseract >= 4 (or perhaps only with the LSTM engine?)...

Zdenko

