Wrong or missing Segmentation of Words

Thomas Zipproth

May 4, 2017, 8:12:32 AM
to tesseract-ocr
We tried to read English documents with the C++ API (Tesseract 3 and 4),
but in most cases many words are missing or the word rectangles are completely wrong.

In the attached example, you can see the missing words and one wrong rectangle marked in red.
I tried different page segmentation modes and other parameters, but without success.

The document resolution is 300 dpi; the image has been resized slightly (made smaller).
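
Because the image was made smaller after scanning, its effective resolution is below the nominal 300 dpi, and Tesseract's documentation recommends input of roughly 300 dpi or more. A minimal sketch of that arithmetic, assuming a hypothetical scale factor of 0.75 (the post only says "a bit" smaller); `effective_dpi` and `upscale_factor` are illustrative helper names, not part of the Tesseract API:

```cpp
#include <cassert>

// Hypothetical value: the post does not state the actual resize factor.
constexpr double kExampleScale = 0.75;

// Effective resolution of an image after it has been scaled by `scale`
// (1.0 = unchanged, values below 1.0 = made smaller).
double effective_dpi(double scan_dpi, double scale) {
    assert(scan_dpi > 0.0 && scale > 0.0);
    return scan_dpi * scale;
}

// Factor by which the image would have to be enlarged to reach `target_dpi`.
// Returns 1.0 when the image already meets or exceeds the target.
double upscale_factor(double current_dpi, double target_dpi) {
    assert(current_dpi > 0.0 && target_dpi > 0.0);
    return current_dpi >= target_dpi ? 1.0 : target_dpi / current_dpi;
}
```

With these assumed numbers, `effective_dpi(300.0, kExampleScale)` gives 225, and `upscale_factor(225.0, 300.0)` suggests enlarging the image by about 1.33 before passing it to Tesseract. Whether that would fix the missing words in this particular image is not established in this thread.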

Tesseract_Segmentation.jpg

ShreeDevi Kumar

May 4, 2017, 11:21:12 AM
to tesser...@googlegroups.com
Please provide your original image for testing. Thanks!

ShreeDevi

puneet sinha

May 22, 2017, 2:26:28 AM
to tesseract-ocr
Hi Thomas,

Were you able to extract the text from the given image? Could you share the steps you followed to get the OCR working correctly?