Trying to get the bounding boxes of all recognized words using python-tesseract


Mrinmoy Nath

Jun 6, 2017, 4:43:47 AM
to tesseract-ocr
Hi,

I am trying to extract each word from a .png image (converted from PDF documents), using Python 2.7 and the Tesseract 3.05 APIs.
For a few of the documents, however, instead of drawing a bounding box around each word, Tesseract draws one box around a larger area and misses some of the words.
I am using 1111.png as input; the output is in 1111_op.png.
Could you please help me understand what the reason could be?

Regards,
Mrinmoy
 
1111.png
1111_op.png
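A minimal sketch for collecting word-level boxes, assuming the pytesseract wrapper (not the python-tesseract bindings used in the post) and Tesseract 3.05 or later, since 3.05 is the first release with TSV output. pytesseract.image_to_data(..., output_type=pytesseract.Output.DICT) returns one row per page, block, paragraph, line and word, and only rows with level 5 are individual words. Keeping just those rows rules out drawing block or line rectangles by mistake; it will not fix segmentation errors inside Tesseract itself. The word_boxes helper, the min_conf threshold and the sample dict are hypothetical, made-up for illustration.

```python
# Sketch: keep only word-level boxes from the dict returned by
#   pytesseract.image_to_data(img, output_type=pytesseract.Output.DICT)
# Assumes pytesseract and Tesseract >= 3.05 (first release with TSV output).
# word_boxes and min_conf are hypothetical names, not part of pytesseract.

WORD_LEVEL = 5  # TSV levels: 1 page, 2 block, 3 paragraph, 4 line, 5 word


def word_boxes(data, min_conf=0):
    """Return (text, left, top, right, bottom) for each recognized word.

    `data` is a dict of parallel lists keyed by TSV column name.
    Rows whose confidence is below `min_conf` are dropped.
    """
    boxes = []
    for i in range(len(data["level"])):
        text = str(data["text"][i]).strip()
        if int(data["level"][i]) != WORD_LEVEL or not text:
            continue  # skip page/block/paragraph/line rows and empty words
        if float(data["conf"][i]) < min_conf:
            continue  # drop low-confidence words
        left, top = int(data["left"][i]), int(data["top"][i])
        right = left + int(data["width"][i])
        bottom = top + int(data["height"][i])
        boxes.append((text, left, top, right, bottom))
    return boxes


# Made-up input in the same shape as image_to_data's DICT output.
# With pytesseract installed, real data would come from something like:
#   from PIL import Image
#   import pytesseract
#   data = pytesseract.image_to_data(Image.open("1111.png"),
#                                    output_type=pytesseract.Output.DICT)
sample = {
    "level": [1, 5, 5, 5],
    "left": [0, 10, 60, 0],
    "top": [0, 20, 20, 0],
    "width": [200, 40, 30, 0],
    "height": [100, 12, 12, 0],
    "conf": [-1, 91, 87, -1],
    "text": ["", "Invoice", "No.", " "],
}

print(word_boxes(sample))
# [('Invoice', 10, 20, 50, 32), ('No.', 60, 20, 90, 32)]
```

The resulting (left, top, right, bottom) tuples can be passed directly to a drawing routine to mark each word, which makes it easy to compare against an output image like 1111_op.png.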

Dhrumil Barot

Jan 4, 2020, 9:39:01 AM
to tesseract-ocr
Did you solve this problem? I also have documents with a similar layout that contain handwritten digits.