Unable to extract text from cropped image with Tesseract

pts test

Aug 17, 2020, 8:11:06 AM

to tesseract-ocr

Here we have an image cropped from the main source. I am unable to extract the text and numbers properly. I have tried Tesseract 4 and 5, increased the image resolution, and applied pre-processing steps, but I still cannot get an accurate OCR result (inst.no and off rec 0719 page 0530 are not extracted).

MTG_719_530_1.jpg

Zdenko Podobny

Aug 18, 2020, 3:07:38 AM

to tesser...@googlegroups.com

Follow the documentation: https://github.com/tesseract-ocr/tessdoc/blob/master/ImproveQuality.md

E.g. you did not crop the image properly: there are still black borders, and there are graphical elements (a signature).
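To illustrate the cropping point, here is a minimal sketch of peeling mostly-dark rows and columns off each edge of a grayscale image before OCR. It assumes the image is already loaded as a list of rows of 0-255 integers; the function name `trim_dark_borders` and the thresholds `dark_below` and `max_dark_fraction` are hypothetical choices for illustration, not part of Tesseract, and would need tuning for a real scan.

```python
def trim_dark_borders(gray, dark_below=60, max_dark_fraction=0.9):
    """Return (top, bottom, left, right) of the region left after removing
    mostly-dark rows/columns from each edge. bottom and right are exclusive.

    gray: list of rows, each a list of 0-255 ints (assumed grayscale).
    dark_below, max_dark_fraction: illustrative thresholds, not tuned values.
    """
    top, bottom, left, right = 0, len(gray), 0, len(gray[0])

    def is_border(pixels):
        # A row/column counts as border if nearly all of its pixels are dark.
        dark = sum(1 for p in pixels if p < dark_below)
        return dark >= max_dark_fraction * len(pixels)

    changed = True
    while changed:
        changed = False
        if top < bottom and left < right and is_border(gray[top][left:right]):
            top += 1
            changed = True
        if top < bottom and left < right and is_border(gray[bottom - 1][left:right]):
            bottom -= 1
            changed = True
        if top < bottom and left < right and is_border(
            [gray[y][left] for y in range(top, bottom)]
        ):
            left += 1
            changed = True
        if top < bottom and left < right and is_border(
            [gray[y][right - 1] for y in range(top, bottom)]
        ):
            right -= 1
            changed = True
    return top, bottom, left, right


def crop(gray, bounds):
    """Cut the region described by bounds out of the image."""
    top, bottom, left, right = bounds
    return [row[left:right] for row in gray[top:bottom]]
```

Something like `crop(gray, trim_dark_borders(gray))` would then give the image to pass on to OCR. This only handles borders; graphical elements such as the signature need to be masked or cropped out separately.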

Also, the last line, with its different fonts/sizes, will fool the OCR: you need to implement custom segmentation for it and OCR each segment separately.
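One possible form of such custom segmentation is a horizontal projection profile: split the page into bands of rows that contain ink, then recognize each band on its own. The sketch below assumes the image has already been binarized into rows of 0/1 values (1 = ink). The names `find_line_bands` and `ocr_by_band` are hypothetical, and `recognize` is a placeholder for whatever OCR call you use; none of this is Tesseract's own API.

```python
def find_line_bands(binary, min_ink=1):
    """Return [(top, bottom), ...] for each run of rows containing ink.

    binary: list of rows of 0/1 ints, where 1 marks an ink pixel (assumed).
    bottom is exclusive. min_ink is an illustrative noise threshold.
    """
    bands, start = [], None
    for y, row in enumerate(binary):
        has_ink = sum(row) >= min_ink
        if has_ink and start is None:
            start = y                      # a new text band begins
        elif not has_ink and start is not None:
            bands.append((start, y))       # blank row closes the band
            start = None
    if start is not None:                  # band running to the last row
        bands.append((start, len(binary)))
    return bands


def ocr_by_band(binary, recognize):
    """Run recognize() on each band separately.

    recognize is a caller-supplied function (hypothetical placeholder)
    taking the cropped rows and returning the recognized text.
    """
    results = []
    for top, bottom in find_line_bands(binary):
        segment = binary[top:bottom]
        results.append(((top, bottom), recognize(segment)))
    return results
```

If pytesseract is installed, `recognize` could, for example, convert the segment back to an image and call `pytesseract.image_to_string(image, config="--psm 7")`, where page segmentation mode 7 treats the input as a single text line. The band heights also give a rough hint about which lines use a different font size and may need rescaling before recognition.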

Zdenko

