Extract font size, style, colour from an image

mandar bandodekar

May 19, 2017, 8:55:17 AM
to tesseract-ocr
Hi,
Is it possible to extract font colour, font style (bold, italic), and size using Tesseract OCR?

Zdenko Podobný

May 19, 2017, 1:09:59 PM
to tesser...@googlegroups.com
Tesseract 3.05 (the current stable version) can detect some font characteristics, but the detection is not perfect; for example, it cannot detect colour, because OCR runs on binarized images.
You can test this with hOCR output or experiment with the API (ResultIterator and WordFontAttributes).
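As a rough sketch of the hOCR route: in Tesseract 3.x the hOCR output usually marks bold and italic words with <strong> and <em>, and, assuming your build supports the hocr_font_info variable (for example `tesseract image.tiff out -c hocr_font_info=1 hocr`), each word's title attribute also carries x_font and x_fsize. The snippet below reads those fields using only the Python standard library. SAMPLE_HOCR is a hand-written illustration, not real Tesseract output, and parse_title, HocrFontParser and parse_hocr_fonts are names made up for this example.

```python
from html.parser import HTMLParser

# Hand-written sample in the shape of Tesseract 3.x hOCR word spans (illustrative only).
SAMPLE_HOCR = """
<span class='ocr_line' title='bbox 36 92 580 122'>
<span class='ocrx_word' id='word_1_1' title='bbox 36 92 96 116; x_wconf 90; x_font Arial_Bold; x_fsize 12'><strong>Hello</strong></span>
<span class='ocrx_word' id='word_1_2' title='bbox 109 92 200 116; x_wconf 88; x_font Times_Italic; x_fsize 10'><em>world</em></span>
</span>
"""


def parse_title(title):
    """Split an hOCR title such as 'bbox 1 2 3 4; x_fsize 12' into a dict."""
    props = {}
    for piece in title.split(";"):
        fields = piece.strip().split(None, 1)
        if len(fields) == 2:
            props[fields[0]] = fields[1]
    return props


class HocrFontParser(HTMLParser):
    """Collect text, bold/italic flags, font name and size for each ocrx_word span.

    Assumes word spans contain no nested <span> elements.
    """

    def __init__(self):
        super().__init__()
        self.words = []
        self._current = None  # the word currently being read, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "span" and "ocrx_word" in (attrs.get("class") or ""):
            props = parse_title(attrs.get("title") or "")
            size = props.get("x_fsize")
            self._current = {
                "text": "",
                "bold": False,
                "italic": False,
                "font": props.get("x_font"),
                "size": int(size) if size is not None else None,
            }
        elif self._current is not None:
            if tag == "strong":
                self._current["bold"] = True
            elif tag == "em":
                self._current["italic"] = True

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        if tag == "span" and self._current is not None:
            self.words.append(self._current)
            self._current = None


def parse_hocr_fonts(hocr_text):
    """Return one dict per recognized word found in the hOCR text."""
    parser = HocrFontParser()
    parser.feed(hocr_text)
    parser.close()
    return parser.words


if __name__ == "__main__":
    for word in parse_hocr_fonts(SAMPLE_HOCR):
        print(word)
```

To use it on real output, read the .hocr file that Tesseract wrote and pass its contents to parse_hocr_fonts. Colour is not available this way, for the reason given above.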

Zdenko

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscribe@googlegroups.com.
To post to this group, send email to tesser...@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/62cf8b4e-a6c8-484c-856c-d47bbae878ba%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

akhil katpally

May 19, 2017, 2:25:58 PM
to tesseract-ocr
tesseract image.tiff image.txt -c tessedit_debug_fonts=1
This prints the detected font type and the confidence for that font type.

akhil katpally

May 19, 2017, 2:26:55 PM
to tesseract-ocr
It reports this at the character level.

Zdenko Podobný

May 19, 2017, 2:36:08 PM
to tesser...@googlegroups.com
Unfortunately, the only option is to delete the message, which I did a moment ago.

Zdenko

VANAM VISHAL

May 7, 2019, 7:31:44 AM
to tesseract-ocr
I am using Tesseract 3.05.00 (stable) together with tesserocr, and I couldn't use WordFontAttributes to check whether a word is bold, what its font size is, etc. I can get the recognized text, but not the size or whether a word is bold.
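For reference, a minimal sketch of how this is usually done with tesserocr, assuming its result iterator exposes WordFontAttributes() and returns a dict with keys such as font_name, bold, italic and pointsize (or None when no font information is available). describe_font and word_fonts are hypothetical helper names, and tesserocr is imported inside word_fonts so the formatting helper can be tried without it installed.

```python
def describe_font(attrs):
    """Format a WordFontAttributes()-style dict; None or empty means no font info."""
    if not attrs:
        return "no font info"
    parts = [str(attrs.get("font_name") or "unknown font")]
    size = attrs.get("pointsize")
    if size:
        parts.append("{}pt".format(size))
    # Append only the style flags that are set.
    parts.extend(name for name in ("bold", "italic") if attrs.get(name))
    return ", ".join(parts)


def word_fonts(image_path):
    """Yield (word, font description) pairs for an image.

    Requires tesserocr and a Tesseract install; imported here so the
    rest of the module loads without them.
    """
    from tesserocr import PyTessBaseAPI, RIL, iterate_level

    with PyTessBaseAPI() as api:
        api.SetImageFile(image_path)
        api.Recognize()  # must run before asking for the iterator
        for word in iterate_level(api.GetIterator(), RIL.WORD):
            yield word.GetUTF8Text(RIL.WORD), describe_font(word.WordFontAttributes())


if __name__ == "__main__":
    # "image.tiff" is a placeholder path.
    for text, font in word_fonts("image.tiff"):
        print(text, "->", font)
```

If every word comes back with no font info, it is worth checking which OCR engine and traineddata are in use, since the attributes depend on what the engine reports.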

