Color OCR information lost?


rs_nuke

Jul 31, 2007, 3:48:41 PM
to ocropus
Hello all, I must admit that I find the ocropus project to be a
great one.

I wonder how I can get 'color' information, or if that is even
possible? I am using ocropus to OCR "linked" (hyperlinked) documents,
and hyperlinks are usually underlined and colored blue. I wonder how
I can avoid losing this information. Is there a way to know which words
were underlined, or which were in blue?

Ilya Mezhirov

Aug 3, 2007, 11:43:29 AM
to ocropus
Hello,

Thanks for your interest.
Currently the color information is simply lost, and there are no plans
at the moment to pass it through.

Ilya

Thomas Breuel

Aug 7, 2007, 1:48:32 PM
to ocr...@googlegroups.com
Well, there are plans to support color and other font properties at some point, just not in the upcoming releases.

If you really need it, it's not that hard to write a post-processor that takes the hOCR output, reads the corresponding image, determines the color information from the image for the bounding boxes, and adds it to the hOCR info.  It's probably a few hundred lines of Python code (using PIL, DOM, and building on some of the hOCR sample tools in Python).
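
A rough sketch of what such a post-processor might look like is below. It is only an illustration under several assumptions, not ocropus code: it assumes the hOCR file parses as well-formed XML, that bounding boxes appear as "bbox x0 y0 x1 y1" in the title attribute of span elements, and that the page background is white. The names ink_color, add_colors and get_pixels are made up for this example, and "x_color" is a hypothetical property name, not something taken from the hOCR spec.

```python
import re
from xml.dom import minidom

# Matches the "bbox x0 y0 x1 y1" entry inside an hOCR title attribute.
BBOX_RE = re.compile(r"bbox\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)")


def ink_color(pixels, background=(255, 255, 255), min_dist=60):
    """Average the RGB pixels that differ clearly from the background.

    Returns None when no pixel is far enough from the background.
    The white background and the threshold are assumptions.
    """
    ink = [p for p in pixels
           if sum(abs(a - b) for a, b in zip(p, background)) >= min_dist]
    if not ink:
        return None
    return tuple(sum(p[i] for p in ink) // len(ink) for i in range(3))


def add_colors(hocr_xml, get_pixels):
    """Append a hypothetical 'x_color #rrggbb' entry to each span title.

    get_pixels(box) is a caller-supplied function returning the RGB
    tuples found inside box = (x0, y0, x1, y1) of the page image.
    """
    doc = minidom.parseString(hocr_xml)
    for span in doc.getElementsByTagName("span"):
        title = span.getAttribute("title")
        match = BBOX_RE.search(title)
        if not match:
            continue  # this span carries no bounding box
        box = tuple(int(v) for v in match.groups())
        color = ink_color(get_pixels(box))
        if color is not None:
            span.setAttribute(
                "title", "%s; x_color #%02x%02x%02x" % ((title,) + color))
    return doc.toxml()
```

With PIL, get_pixels could be something like lambda box: img.crop(box).getdata(), where img = Image.open(path).convert("RGB"). Underline detection is not covered here; one possible approach would be to look for a long horizontal run of ink pixels near the bottom of each box.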

Cheers,
Thomas.

Nitin Shinde

Aug 7, 2012, 11:18:59 PM
to ocr...@googlegroups.com
Are there any updates regarding support for color and other font properties?