Tesseract Strangely Thinks Text is Upside Down - ACCURACY


Umut Barış Korkut

Oct 10, 2019, 6:47:16 AM
to tesseract-ocr
Hey,

Tesseract sometimes thinks all the text on a page is upside down.
For example, the text "MOM" is recognized as "WOW".
Similarly, "GENERAL NOTES" is recognized as "SALON IWYANSAD".

How can I fix this? Are there any suggestions?

I have attached two similar images here. Tesseract does very well on one of them and extremely poorly on the other.

I tried both Tesseract 4 and 5, using psm 12.

Zdenko Podobny

Oct 12, 2019, 7:27:29 AM
to tesser...@googlegroups.com
Do not use psm 12. Default psm seems to work.

Zdenko
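
For comparing the two settings on the same page, a minimal sketch of how the command lines might be built is shown here. It assumes the `tesseract` binary is on PATH; the helper names `tesseract_cmd` and `run_ocr` and the file names are hypothetical, chosen only for illustration. Leaving out `--psm` keeps Tesseract's default page segmentation mode.

```python
import subprocess


def tesseract_cmd(image, out_base, psm=None, lang="eng"):
    """Build a tesseract command line as a list of arguments.

    psm=None omits --psm, so Tesseract uses its default mode.
    (Hypothetical helper, not part of any Tesseract API.)
    """
    cmd = ["tesseract", image, out_base, "-l", lang]
    if psm is not None:
        cmd += ["--psm", str(psm)]
    return cmd


def run_ocr(image, out_base, psm=None, lang="eng"):
    """Run tesseract; the recognized text is written to out_base + '.txt'.

    Requires the tesseract binary to be installed and on PATH.
    """
    subprocess.run(tesseract_cmd(image, out_base, psm, lang), check=True)
```

Calling `run_ocr("page.png", "out_default")` and `run_ocr("page.png", "out_psm12", psm=12)` on the same image would give two text files that can be compared side by side.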


--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-oc...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/de5c596d-79c9-4314-93fb-7c3f9b0ffb31%40googlegroups.com.

Umut Barış Korkut

Oct 17, 2019, 2:12:26 PM
to tesseract-ocr
The default psm works with these two pages, but it does not work with the other pages of the document because they have tables and vertical text.

Is it possible to give Tesseract the orientation of the page, or to disable detection of upside-down text?
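
One way to see what orientation Tesseract itself detects is to run it in OSD-only mode (`--psm 0`, which needs `osd.traineddata`) and read the "Key: value" report it prints. The sketch below only parses such a report; the helper names and the sample text are made up for illustration and are not output from the pages in this thread. Separately, psm 11 is the sparse-text mode without OSD, so it may be worth trying in place of psm 12; whether it avoids the flip on these pages is untested.

```python
def parse_osd(text):
    """Parse 'Key: value' lines from an OSD report into a dict."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines that contain no colon
            info[key.strip()] = value.strip()
    return info


def detected_rotation(text):
    """Return the reported orientation in degrees, or None if absent."""
    value = parse_osd(text).get("Orientation in degrees")
    if value is not None and value.isdigit():
        return int(value)
    return None


# Illustrative sample only; the numbers are invented, not measured.
SAMPLE_OSD = """Page number: 0
Orientation in degrees: 180
Rotate: 180
Orientation confidence: 2.43
Script: Latin
Script confidence: 1.11
"""
```

If a correctly oriented page is reported as 180 degrees, that would suggest the orientation detection step, rather than recognition, is where things go wrong.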




Lorenzo Bolzani

Oct 17, 2019, 2:28:30 PM
to tesser...@googlegroups.com

Maybe a problem with the exif rotation data?


Umut Barış Korkut

Oct 18, 2019, 2:23:15 AM
to tesseract-ocr
I've checked the EXIF rotation data; it is identical for the two pages. I also tried TIFF and PNG at different DPIs, with the same result.