hOCR bbox viewer?

matthew christy

Oct 6, 2013, 3:26:58 PM
to tesser...@googlegroups.com
Does anyone know about a tool that already exists that allows you to see all the bounding boxes identified in Tesseract's hOCR output on one page?

Thanks,
Matt

Dev Doshi

Oct 6, 2013, 7:26:46 PM
to tesser...@googlegroups.com
Hi Matt,

I think you might find what you are looking for here:
https://code.google.com/p/tesseract-ocr/wiki/AddOns

Otherwise, you could code a simple jQuery script that traverses the
hOCR output and draws the boxes and the recognized text in HTML,
optionally drawing the page image behind the boxes. (I'd do it for a
fee :))

Hope this helps,
Dev
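Dev's suggestion of traversing the hOCR and drawing the boxes over the page does not strictly need jQuery. Below is a minimal Python sketch (standard library only), assuming the hOCR stores coordinates in `title` attributes of the form `bbox x0 y0 x1 y1`, as Tesseract's output does. It writes a standalone HTML page with one absolutely positioned, outlined `div` per box, colored by hOCR class, with the recognized text as a hover tooltip. The names `HocrBoxParser`, `hocr_to_overlay` and `COLORS`, and the color choices, are illustrative only and not part of any existing tool.

```python
import html
import re
from html.parser import HTMLParser

# Matches the "bbox x0 y0 x1 y1" property inside an hOCR title attribute.
BBOX_RE = re.compile(r"bbox\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)")

# Illustrative outline colors per hOCR element class (arbitrary choices).
COLORS = {
    "ocr_page": "gray",
    "ocr_carea": "blue",
    "ocr_par": "green",
    "ocr_line": "orange",
    "ocrx_word": "red",
}


class HocrBoxParser(HTMLParser):
    """Collects every element whose title carries a bbox, with its text."""

    def __init__(self):
        super().__init__()
        self.boxes = []   # finished boxes, in the order their elements close
        self._stack = []  # open elements as (tag, box-or-None)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        match = BBOX_RE.search(attrs.get("title") or "")
        box = None
        if match:
            box = {
                "cls": attrs.get("class") or "",
                "bbox": tuple(int(v) for v in match.groups()),
                "text": [],
            }
        self._stack.append((tag, box))

    def handle_endtag(self, tag):
        if not any(open_tag == tag for open_tag, _ in self._stack):
            return  # stray end tag: ignore it
        # Pop back to the matching start tag; this also closes void elements
        # such as <meta> or <br> that have no end tag of their own.
        while self._stack:
            open_tag, box = self._stack.pop()
            if box is not None:
                self.boxes.append(box)
            if open_tag == tag:
                break

    def handle_data(self, data):
        # Text belongs to every enclosing box (word, line, paragraph, ...).
        for _, box in self._stack:
            if box is not None:
                box["text"].append(data)


def hocr_to_overlay(hocr_source, image_url=None):
    """Return a standalone HTML page drawing one outlined div per hOCR bbox."""
    parser = HocrBoxParser()
    parser.feed(hocr_source)
    parser.close()
    parts = []
    if image_url:
        # Page image at native size, so pixel bboxes line up with it.
        parts.append(
            f'<img src="{html.escape(image_url)}" '
            'style="position:absolute;left:0;top:0">'
        )
    for box in parser.boxes:
        x0, y0, x1, y1 = box["bbox"]
        color = COLORS.get(box["cls"], "black")
        text = " ".join("".join(box["text"]).split())
        parts.append(
            f'<div class="{html.escape(box["cls"])}" title="{html.escape(text)}" '
            f'style="position:absolute;left:{x0}px;top:{y0}px;'
            f'width:{x1 - x0}px;height:{y1 - y0}px;'
            f'border:1px solid {color}"></div>'
        )
    return (
        '<!DOCTYPE html><html><body><div style="position:relative">'
        + "".join(parts)
        + "</div></body></html>"
    )
```

To use it, read the .hocr file as a string, pass it to `hocr_to_overlay` (optionally with the page image's path as `image_url`), and save the result as an .html file to open in a browser. hOCR bboxes are in the image's pixel coordinates, so the boxes should line up with the image when it is shown at its native size.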

Nick White

Oct 7, 2013, 5:24:57 AM
to tesser...@googlegroups.com
Hi Matt,

On Sun, Oct 06, 2013 at 12:26:58PM -0700, matthew christy wrote:
> Does anyone know about a tool that already exists that allows you to see all
> the bounding boxes identified in Tesseract's hOCR output on one page?

I wrote a very simple tool that does that called hocrdraw. It's in
my 'tesstrainingtools' repository:
https://gitorious.org/ancient-greek-training-for-tesseract/tesstrainingtools

Nick

Zeth Weissman

Oct 13, 2016, 9:59:28 AM
to tesseract-ocr
Better late than never, but I found this tool that will do what you want.


You just need to rename your .hocr or .html file (depending on your version of Tesseract) to .xml.
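The renaming step can be scripted. The tool itself is not named here, so this minimal Python sketch assumes only that it wants a file with an .xml extension; `copy_hocr_as_xml` is a hypothetical helper name. It copies rather than renames, so the original file stays usable by other tools, and it checks that the result is well-formed XML.

```python
import shutil
import xml.etree.ElementTree as ET
from pathlib import Path


def copy_hocr_as_xml(hocr_path):
    """Copy page.hocr (or page.html) to page.xml and confirm it parses as XML."""
    src = Path(hocr_path)
    dst = src.with_suffix(".xml")
    # Copy instead of rename so the original .hocr/.html file is kept.
    shutil.copyfile(src, dst)
    # Tesseract writes hOCR as XHTML, so an XML parser should accept it;
    # ET.parse raises ET.ParseError if the file is not well-formed.
    ET.parse(dst)
    return dst
```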

weish...@gmail.com

Mar 1, 2018, 4:54:04 AM
to tesseract-ocr
Thank you for sharing!