Unable to extract text from cropped image with Tesseract

pts test

Aug 17, 2020, 8:11:06 AM
to tesseract-ocr
Here is an image cropped from the main source. I am unable to extract the text and numbers properly. I have tried Tesseract 4 and 5, increased the image resolution, and applied pre-processing steps, but I still cannot get an accurate OCR result ("inst.no" and "off rec 0719 page 0530" are not extracted).
MTG_719_530_1.jpg

Zdenko Podobny

Aug 18, 2020, 3:07:38 AM
to tesser...@googlegroups.com
Follow the documentation: https://github.com/tesseract-ocr/tessdoc/blob/master/ImproveQuality.md

For example, you did not crop the image properly: there are still black borders, and there are graphical elements (a signature).
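As a rough illustration of the cropping point (not something from the linked documentation), here is a minimal sketch that finds the bounds of the page area inside a dark frame. It assumes the image is already available as a list of rows of grayscale values (0 = black, 255 = white); the function name `trim_dark_border` and both thresholds are hypothetical and would need tuning for a real scan.

```python
def trim_dark_border(pixels, dark_threshold=60, min_light_fraction=0.5):
    """Return (top, bottom, left, right) bounds of the area inside a dark frame.

    pixels: list of rows, each a list of grayscale values in 0..255.
    A row or column is treated as border when fewer than
    min_light_fraction of its pixels are lighter than dark_threshold.
    Both thresholds are illustrative assumptions, not recommended values.
    """
    height = len(pixels)
    width = len(pixels[0])

    def is_border(values):
        light = sum(1 for v in values if v > dark_threshold)
        return light < min_light_fraction * len(values)

    # Shrink the top and bottom edges past the dark rows.
    top = 0
    while top < height and is_border(pixels[top]):
        top += 1
    bottom = height
    while bottom > top and is_border(pixels[bottom - 1]):
        bottom -= 1

    # Build columns only from the rows that survived, then shrink left and right.
    columns = [[row[x] for row in pixels[top:bottom]] for x in range(width)]
    left = 0
    while left < width and is_border(columns[left]):
        left += 1
    right = width
    while right > left and is_border(columns[right - 1]):
        right -= 1

    return top, bottom, left, right
```

The returned bounds are half-open (`bottom` and `right` are exclusive), so they can be used directly for slicing or converted to a crop box in whatever imaging library you use. This only handles the dark frame; graphical elements such as the signature would still need to be masked or cropped out separately.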

Also, the last line, with its different fonts and sizes, will fool the OCR. You need to implement custom segmentation for it and OCR each segment separately.
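One simple way to approach such segmentation is a horizontal projection: find runs of rows that contain dark pixels and treat each run as a separate text band. The sketch below is only one possible approach, under the same assumption as before that the image is a list of rows of grayscale values; `split_text_lines`, `ink_threshold`, and `min_gap` are hypothetical names and values.

```python
def split_text_lines(pixels, ink_threshold=128, min_gap=2):
    """Return half-open (start_row, end_row) ranges, one per band of text.

    pixels: list of rows, each a list of grayscale values in 0..255.
    A row "has ink" if any pixel is darker than ink_threshold.
    Bands are separated by at least min_gap consecutive blank rows;
    shorter gaps are kept inside the band.
    """
    has_ink = [any(v < ink_threshold for v in row) for row in pixels]

    bands = []
    start = None      # first row of the band currently being collected
    blank_run = 0     # consecutive blank rows seen since the last ink row

    for y, ink in enumerate(has_ink):
        if ink:
            if start is None:
                start = y
            blank_run = 0
        elif start is not None:
            blank_run += 1
            if blank_run >= min_gap:
                # Close the band just after its last ink row.
                bands.append((start, y - blank_run + 1))
                start = None
                blank_run = 0

    # Close a band that runs to the bottom of the image.
    if start is not None:
        bands.append((start, len(pixels) - blank_run))

    return bands
```

Each band can then be cropped out, scaled so the text height is reasonable, and passed to Tesseract on its own, for instance with page segmentation mode 7 (`--psm 7`, treat the image as a single text line). A line that mixes fonts or sizes may need a further split by columns using the same idea on the vertical axis.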

Zdenko

