Train new font using Tesseract 5 with legacy tessdata 3.0.5


Kehinde Adeoya

May 19, 2022, 10:12:42 AM
to tesseract-ocr
Are there any tutorials that explain in detail how to train a new font using the latest Tesseract 5 with Tessdata 3.0.5? I have been searching for over two months and have not found any.

Zdenko Podobny

May 20, 2022, 5:32:12 AM
to tesser...@googlegroups.com
Can you please clarify what exactly you want to achieve? Do you want to train an LSTM model or a legacy model?

Zdenko


Kehinde Adeoya

May 20, 2022, 8:17:26 AM
to tesseract-ocr
Thanks, @Zdenko
I have successfully trained two new fonts, Ubuntu and Inter. I am using Tesseract 3.0.5 and Tessdata 3.0.4.
1. Tesseract does not recognize the fonts by name and keeps returning a strange font name instead. For Ubuntu it returned the 1809_Homer font name, which makes me wonder whether something went wrong in the training.
2. Tesseract does not seem to treat font-weight: 700 and font-weight: bold as equivalent. They are the same weight, but Tesseract reports text set with font-weight: 700 as a normal font. What can I do to remedy this?
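To make the equivalence in point 2 concrete, here is a minimal Python sketch of how the CSS keywords map to numeric weights (normal is 400, bold is 700). It is independent of Tesseract and says nothing about how Tesseract decides whether a glyph is bold. The names `normalize_font_weight` and `is_bold`, and the threshold of 700, are assumptions made up for this illustration.

```python
# Illustration only: CSS defines "normal" as weight 400 and "bold" as weight 700.
# The helper names and the bold threshold below are hypothetical, not a Tesseract API.
CSS_WEIGHT_KEYWORDS = {"normal": 400, "bold": 700}


def normalize_font_weight(value):
    """Return the numeric CSS weight for a keyword ("bold") or a number (700)."""
    text = str(value).strip().lower()
    if text in CSS_WEIGHT_KEYWORDS:
        return CSS_WEIGHT_KEYWORDS[text]
    weight = int(text)  # raises ValueError for anything that is not an integer
    if not 1 <= weight <= 1000:
        raise ValueError(f"font-weight out of range: {value!r}")
    return weight


def is_bold(value, threshold=700):
    """Treat any weight at or above the (assumed) threshold as bold."""
    return normalize_font_weight(value) >= threshold


if __name__ == "__main__":
    for sample in ("bold", 700, "normal", 400):
        print(sample, normalize_font_weight(sample), is_bold(sample))
```

With this mapping, `bold` and `700` normalize to the same value, so any difference in what Tesseract reports for the two would have to come from the rendered image or the trained data rather than from the CSS declaration.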

Zdenko Podobny

May 22, 2022, 10:22:17 AM
to tesser...@googlegroups.com
Tesseract is an OCR engine, not a font recognition tool. From your post I got the impression that you are trying to (mis)use Tesseract for a purpose it was not designed for...

Zdenko