Tesseract: convert hOCR bbox to CSS?

fabio...@gmail.com

Sep 17, 2015, 2:12:20 PM
to tesseract-ocr
Hello.

I'm using Tess4J, a Java wrapper for the Tesseract DLL, and I'm trying to convert an hOCR result to CSS coordinates.

The reason I'm doing this is that I need to position the OCR result over the original image in a web page.

I noticed that hOCR files give each element's position as a 'bbox' in its title attribute:

<div class='ocr_page' id='page_1' title='image "c:\ocr.jpg"; bbox 0 0 827 1169; ppageno 0'>
   <div class='ocr_carea' id='block_1_1' title="bbox 98 35 747 78">
....
</div>

Is there a way to generate "position: absolute" with top, left, width and height values from the bbox information?

Thanks!

Helmut Wollmersdorfer

Sep 25, 2015, 2:41:05 AM
to tesseract-ocr
Yes, I do this with JavaScript in the browser.

You need to:
- know the dimensions of the image file on which the bbox coordinates are based
- know the dimensions of the image as displayed in the HTML page (scalable, scrollable, ...)
- parse the bbox numbers
- calculate the CSS positions and sizes
- add the style to each element

Then highlight, frame, render or whatever on mouseover, on click, on focus.
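
For illustration, the steps above could be sketched roughly as follows. This is not the code I actually use, just a minimal sketch under some assumptions: the bbox is "x0 y0 x1 y1" in pixels of the source image with the origin at the top left, the overlay sits in a container with "position: relative" that exactly covers the displayed image, and the names parseBbox and bboxToCss are made up for this example.

```javascript
// Parse "bbox x0 y0 x1 y1" out of an hOCR title attribute.
// Returns null when the title carries no bbox.
function parseBbox(title) {
  const m = /bbox\s+(-?\d+)\s+(-?\d+)\s+(-?\d+)\s+(-?\d+)/.exec(title);
  if (!m) return null;
  const [x0, y0, x1, y1] = m.slice(1).map(Number);
  return { x0, y0, x1, y1 };
}

// Convert a bbox to CSS values.
// scaleX / scaleY = displayed image size divided by source image size.
// parent is the bbox of the nearest absolutely positioned ancestor, if any:
// CSS offsets are relative to that ancestor, so its origin is subtracted.
function bboxToCss(bbox, scaleX, scaleY, parent = { x0: 0, y0: 0 }) {
  return {
    position: 'absolute',
    left: `${(bbox.x0 - parent.x0) * scaleX}px`,
    top: `${(bbox.y0 - parent.y0) * scaleY}px`,
    width: `${(bbox.x1 - bbox.x0) * scaleX}px`,
    height: `${(bbox.y1 - bbox.y0) * scaleY}px`,
  };
}

// Values taken from the hOCR snippet in the question.
const page = parseBbox('image "c:\\ocr.jpg"; bbox 0 0 827 1169; ppageno 0');
const block = parseBbox('bbox 98 35 747 78');

// Image shown at its original size, so both scale factors are 1.
console.log(bboxToCss(block, 1, 1, page));
// → { position: 'absolute', left: '98px', top: '35px', width: '649px', height: '43px' }
```

In the browser, the scale factors would typically come from the image element (for example img.clientWidth divided by the page bbox width), and the result can be applied with Object.assign(element.style, css). Recalculate when the image is resized.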

HTH

Helmut Wollmersdorfer