hOCR output?

Bill Janssen

Apr 21, 2011, 7:55:51 PM
to tesseract-ocr
So, how do I enable the hOCR output mode in tesseract 3?

Dmitri Silaev

Apr 22, 2011, 12:23:13 AM
to tesser...@googlegroups.com, bill.j...@gmail.com
You're still striving to know, since 2008 ))

Use this in a config file:
tessedit_create_hocr T
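
A minimal sketch of creating such a config file from a shell, assuming a Unix-like environment. The filename hocr.config is hypothetical (the thread does not name one); the only line that matters is the setting itself.

```shell
# Write a one-line Tesseract config file that turns on hOCR output.
# "hocr.config" is a hypothetical name chosen for this example.
printf 'tessedit_create_hocr T\n' > hocr.config

# Show what was written, to confirm the file holds exactly one setting.
cat hocr.config
```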

Warm regards,
Dmitri Silaev
www.CustomOCR.com

Bill Janssen

May 6, 2011, 12:57:49 PM
to tesseract-ocr
And invoke tesseract thusly:

tesseract page.tif output config-file

The hOCR will wind up in output.html.
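
Put together, the whole thing might look like the sketch below. It assumes a config file named hocr.config (a hypothetical name) in the current directory and an input image page.tif, and it skips the OCR step when either tesseract or the image is missing. The output name output.html is what tesseract 3 produced at the time of this thread.

```shell
# Hypothetical config filename; create it only if it is not already there.
config=hocr.config
[ -f "$config" ] || printf 'tessedit_create_hocr T\n' > "$config"

# Run OCR only when both the tesseract binary and the input image exist.
if command -v tesseract >/dev/null 2>&1 && [ -f page.tif ]; then
    # Arguments: input image, output base name, config file.
    tesseract page.tif output "$config"
    if [ -f output.html ]; then
        echo "hOCR written to output.html"
    else
        echo "no output.html found; check which files tesseract created"
    fi
else
    echo "skipped: tesseract or page.tif not available"
fi
```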

Bill

Dmitri Silaev

May 7, 2011, 6:48:30 AM
to tesser...@googlegroups.com
Yep, exactly. Without that "config-file" argument, it would instead spit out
plain text into "output.txt".