Trying to get the bounding boxes of all recognized words using python-tesseract

Mrinmoy Nath

Jun 6, 2017, 4:43:47 AM

to tesseract-ocr

Hi,

I am trying to extract each word, together with its bounding box, from .png images (converted from PDF documents).

I am using Python 2.7 and the Tesseract 3.05 API.

For a few of the documents, however, Tesseract draws the bounding box around a larger area instead of around a single word, and misses some of the words.
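
For reference, one way to check which boxes are true word boxes is to look at the TSV output and keep only the word-level rows. The following is a minimal sketch, not the code used for the attached images. It assumes the TSV output of Tesseract 3.05 is available (for example via `tesseract 1111.png out tsv`) and that its columns follow the usual `level ... text` layout. `SAMPLE_TSV` and `word_boxes` are hypothetical names, and the sample rows are invented for illustration.

```python
# Hypothetical sample in the column layout of Tesseract's TSV output.
# These rows are made up for illustration; they are not from 1111.png.
SAMPLE_TSV = (
    "level\tpage_num\tblock_num\tpar_num\tline_num\tword_num"
    "\tleft\ttop\twidth\theight\tconf\ttext\n"
    "1\t1\t0\t0\t0\t0\t0\t0\t640\t480\t-1\t\n"
    "2\t1\t1\t0\t0\t0\t10\t10\t300\t40\t-1\t\n"
    "5\t1\t1\t1\t1\t1\t10\t12\t80\t20\t96\tInvoice\n"
    "5\t1\t1\t1\t1\t2\t100\t12\t60\t20\t91\tNo.\n"
)

# In the TSV output: 1 = page, 2 = block, 3 = paragraph, 4 = line, 5 = word.
WORD_LEVEL = 5


def word_boxes(tsv_text):
    """Return (text, left, top, right, bottom) for every word-level row."""
    lines = tsv_text.splitlines()
    header = lines[0].split("\t")
    boxes = []
    for line in lines[1:]:
        fields = line.split("\t")
        # Pad short rows so a missing trailing text field does not break zip().
        fields += [""] * (len(header) - len(fields))
        row = dict(zip(header, fields))
        # Skip page, block, paragraph and line rows, and empty words.
        if int(row["level"]) != WORD_LEVEL or not row["text"].strip():
            continue
        left, top = int(row["left"]), int(row["top"])
        right = left + int(row["width"])
        bottom = top + int(row["height"])
        boxes.append((row["text"], left, top, right, bottom))
    return boxes


if __name__ == "__main__":
    for box in word_boxes(SAMPLE_TSV):
        print(box)
```

Rows with a level below 5 describe larger regions (page, block, paragraph, line), so drawing those would produce boxes that span several words.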

I am using 1111.png as input. The output is in 1111_op.png.

Could you please help me understand what the reason could be?

Regards,

Mrinmoy

1111.png

1111_op.png

Dhrumil Barot

Jan 4, 2020, 9:39:01 AM

to tesseract-ocr

Did you solve this problem? I also have documents with a similar layout that contain handwritten digits.
