Train new font using Tesseract 5 with legacy tessdata 3.0.5

Kehinde Adeoya

May 19, 2022, 10:12:42 AM

to tesseract-ocr

Are there any tutorials that detail how to train a new font using the latest Tesseract 5 with tessdata 3.0.5? I have been searching for over two months and have not found any so far.

Zdenko Podobny

May 20, 2022, 5:32:12 AM

to tesser...@googlegroups.com

Can you please clarify what exactly you want to achieve? Are you training an LSTM model or a legacy model?

Zdenko

Kehinde Adeoya

May 20, 2022, 8:17:26 AM

to tesseract-ocr

Thanks, @Zdenko

I have now successfully trained two new fonts, Ubuntu and Inter. I am using Tesseract 3.0.5 and tessdata 3.0.4.

1. I noticed that Tesseract does not recognize the fonts and keeps returning an unexpected font name instead. For example, it returned the font name 1809_Homer for Ubuntu, which made me wonder whether something went wrong during training.

2. Tesseract also seems unable to treat font-weight: 700 and font-weight: bold as equivalent. The two are the same weight, yet Tesseract reports text set with font-weight: 700 as a normal (non-bold) font. What can I do to remedy this?
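
To make the equivalence I mean in point 2 concrete, here is a minimal sketch in Python. It is not part of Tesseract or its training tools; the helper names `normalize_font_weight` and `same_weight` are hypothetical and only illustrate that, in CSS, the keyword `bold` corresponds to the numeric weight 700 and `normal` to 400. Relative keywords such as `bolder` and `lighter` are not handled and raise `ValueError`.

```python
# Keyword weights as defined by CSS: normal = 400, bold = 700.
CSS_WEIGHT_KEYWORDS = {"normal": 400, "bold": 700}


def normalize_font_weight(value):
    """Hypothetical helper: map a CSS font-weight value to a number."""
    text = str(value).strip().lower()
    if text in CSS_WEIGHT_KEYWORDS:
        return CSS_WEIGHT_KEYWORDS[text]
    weight = int(text)  # raises ValueError for unsupported keywords
    if not 1 <= weight <= 1000:
        raise ValueError(f"font-weight out of range: {value!r}")
    return weight


def same_weight(a, b):
    """Hypothetical helper: True if two font-weight values are equivalent."""
    return normalize_font_weight(a) == normalize_font_weight(b)


if __name__ == "__main__":
    print(same_weight("bold", 700))    # the two spellings compare equal
    print(same_weight("normal", 700))  # normal (400) differs from 700
```

A check like this only covers how the weight is written in the stylesheet that renders the text; it says nothing about what Tesseract reports for the rendered image.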

Zdenko Podobny

May 22, 2022, 10:22:17 AM

to tesser...@googlegroups.com

Tesseract is an OCR engine, not a font recognition tool. From your post I get the impression that you are trying to (mis)use Tesseract for a purpose it was not designed for...

Zdenko
