FontAwesome and Tesseract

Jason

May 21, 2019, 10:09:57 AM
to tesseract-ocr
I would like to be able to detect shapes like those contained in FontAwesome. Take, for example, the gear icon (https://fontawesome.com/icons?d=gallery&q=gear), which is Unicode character \uf013.
I thought this would be as simple as training a font via http://trainyourtesseract.com/, but that did not work. I am not sure why it failed; I suspect the Unicode range is the issue. Any insight on how to do this would be appreciated.
Also, I would fundamentally be training individual characters, not words.

Thank you.

Jason

Jun 17, 2019, 2:34:37 PM
to tesseract-ocr
Can I "bump" this? 

Even a high-level description of the process would help:
- How to make a box file (for Tesseract 4) of Unicode characters
- How to make the training size-invariant
Etc.

Many thanks!
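For reference, the classic Tesseract box file has one line per symbol in the form `symbol left bottom right top page`, with y measured from the bottom of the image. The sketch below assumes you already have a pixel bounding box for each glyph with y measured from the top (as most image libraries report it) and just flips the y axis. The helper names `box_line` and `box_file` are hypothetical, and Tesseract 4's LSTM training can also use line-level box layouts, so check the official training documentation before relying on this per-character format.

```python
from typing import List, Tuple


def box_line(symbol: str, x: int, y: int, w: int, h: int,
             image_height: int, page: int = 0) -> str:
    """Format one classic box-file line: 'symbol left bottom right top page'.

    (x, y, w, h) is a top-left-origin pixel box; box files measure y from
    the bottom of the image, so the vertical coordinates are flipped.
    """
    left, right = x, x + w
    bottom = image_height - (y + h)
    top = image_height - y
    return f"{symbol} {left} {bottom} {right} {top} {page}"


def box_file(boxes: List[Tuple[str, int, int, int, int]], image_height: int) -> str:
    """Join one line per (symbol, x, y, w, h); write the result out as UTF-8."""
    lines = [box_line(s, x, y, w, h, image_height) for s, x, y, w, h in boxes]
    return "\n".join(lines) + "\n"
```

For example, a 20x30 gear glyph at (10, 5) in a 100-pixel-high image becomes `\uf013 10 65 30 95 0`. Python strings treat Private Use Area code points like any other character, so the encoding side is not the hard part.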

Lorenzo Bolzani

Jun 18, 2019, 4:07:16 AM
to tesser...@googlegroups.com

How many different chars do you need to detect? What is the size range (in pixels)? What kind of images, scans, smartphone pictures, screenshots?

If you just want to locate the symbols, something like OpenCV's matchTemplate may work. Training an OpenCV or dlib HOG detector may work better if the symbols are skewed or the lighting is complex. Tesseract is not a text/symbol detector.
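To illustrate what template matching computes, here is a minimal plain-Python sketch of zero-mean normalized cross-correlation, the score OpenCV's `TM_CCOEFF_NORMED` mode produces. It is a reimplementation of the formula on small grayscale grids (lists of lists), not OpenCV's code, and `match_template` is a hypothetical name; in practice you would call `cv2.matchTemplate(image, template, cv2.TM_CCOEFF_NORMED)` and read the best location with `cv2.minMaxLoc`.

```python
import math
from typing import List, Tuple

Grid = List[List[float]]


def match_template(image: Grid, template: Grid) -> Tuple[float, Tuple[int, int]]:
    """Slide `template` over `image`; return (best_score, (x, y)) of the top-left corner.

    Score is zero-mean normalized cross-correlation in [-1, 1]; 1.0 means the
    window is a brightness/contrast-shifted copy of the template.
    """
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    n = th * tw
    t_mean = sum(map(sum, template)) / n
    t_dev = [[v - t_mean for v in row] for row in template]
    t_norm = math.sqrt(sum(d * d for row in t_dev for d in row))

    best_score, best_loc = -1.0, (0, 0)
    for y in range(ih - th + 1):
        for x in range(iw - tw + 1):
            patch = [row[x:x + tw] for row in image[y:y + th]]
            p_mean = sum(map(sum, patch)) / n
            num, p_sq = 0.0, 0.0
            for py in range(th):
                for px in range(tw):
                    d = patch[py][px] - p_mean
                    num += d * t_dev[py][px]
                    p_sq += d * d
            denom = math.sqrt(p_sq) * t_norm
            if denom == 0:
                continue  # flat window or flat template: correlation undefined
            score = num / denom
            if score > best_score:
                best_score, best_loc = score, (x, y)
    return best_score, best_loc
```

Because the score is normalized, it tolerates uniform brightness and contrast changes, but not rotation or scale changes, which is where a trained detector can do better.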

If you have multiple symbols, use multiple templates or train multiple detectors.
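One way to combine "one template per symbol" with a rough answer to the size question is to try each template at several scales and keep the best hit per symbol. The sketch below assumes a `matcher(image, template) -> (score, (x, y))` callable, for example a thin wrapper around `cv2.matchTemplate` plus `cv2.minMaxLoc`; the names, the scale list and the 0.8 threshold are illustrative assumptions, and nearest-neighbour resizing is used only to keep it dependency-free.

```python
from typing import Callable, Dict, List, Tuple

Grid = List[List[float]]
Matcher = Callable[[Grid, Grid], Tuple[float, Tuple[int, int]]]


def resize_nearest(grid: Grid, scale: float) -> Grid:
    """Nearest-neighbour rescale of a small grayscale grid (at least 1x1)."""
    h, w = len(grid), len(grid[0])
    nh, nw = max(1, round(h * scale)), max(1, round(w * scale))
    return [[grid[min(h - 1, int(y / scale))][min(w - 1, int(x / scale))]
             for x in range(nw)] for y in range(nh)]


def detect_symbols(image: Grid, templates: Dict[str, Grid], matcher: Matcher,
                   scales=(0.5, 0.75, 1.0, 1.5, 2.0), threshold=0.8):
    """Try every symbol's template at every scale; keep the best hit above threshold.

    Returns {symbol: (score, (x, y), scale)} for symbols that were found.
    """
    hits = {}
    for name, tmpl in templates.items():
        best = None
        for s in scales:
            t = resize_nearest(tmpl, s)
            if len(t) > len(image) or len(t[0]) > len(image[0]):
                continue  # template is larger than the image at this scale
            score, loc = matcher(image, t)
            if best is None or score > best[0]:
                best = (score, loc, s)
        if best is not None and best[0] >= threshold:
            hits[name] = best
    return hits
```

The scale list should cover the size range of your images, which is why the expected symbol size in pixels matters.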


Bye

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-oc...@googlegroups.com.
To post to this group, send email to tesser...@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/09a628f2-01a4-49fe-a8a5-55c17d44a4ce%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Shree Devi Kumar

Jun 20, 2019, 12:05:31 PM
to tesser...@googlegroups.com

Font Awesome maps its icons to the Unicode Private Use Area (PUA), so it did not work with text2image. I used other emoji fonts instead.

The script and training data used are also in the repo.
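A quick way to see the difference between Font Awesome glyphs and emoji is to ask Python's `unicodedata` module how each code point is classified. Private Use Area code points (U+E000 to U+F8FF in the Basic Multilingual Plane, plus planes 15 and 16) have general category `Co` and no standard name, while emoji are ordinary assigned symbols (category `So`). This only shows the classification; it makes no claim about how text2image handles PUA characters internally, and `describe` is a hypothetical helper.

```python
import unicodedata


def describe(ch: str) -> str:
    """Report a character's code point, general category, name and PUA status."""
    cat = unicodedata.category(ch)
    # Private-use code points have category "Co" and no name in the Unicode database.
    name = unicodedata.name(ch, "<no name>")
    return f"U+{ord(ch):04X} {cat} {name} private_use={cat == 'Co'}"


print(describe("\uf013"))      # Font Awesome gear
print(describe("\U0001F600"))  # an ordinary emoji
```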

