Tesseract unable to recognise Ubuntu and Inter fonts; it returns 1809_Homer


Kehinde Adeoya

May 20, 2022, 9:00:35 AM
to tesseract-ocr
I have successfully trained two new fonts, Ubuntu and Inter. I am using Tesseract 3.05 (3.05.02) with tessdata 3.04.

1. Tesseract does not recognise them; instead it keeps returning an unexpected font name, 1809_Homer, for both Ubuntu and Inter. This makes me wonder whether something went wrong with the training.
2. Tesseract does not seem to treat font-weight: 700 and font-weight: bold as the same. In CSS they are equivalent, but Tesseract reports text set with font-weight: 700 as a normal (non-bold) font. What can I do to fix this?
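For reference, the CSS Fonts specification defines the keyword bold as numeric weight 700 (and normal as 400), so the two declarations in point 2 should select the same face. Below is a minimal POSIX sh sketch of that mapping; css_weight is a hypothetical helper name, and relative values such as bolder and lighter are deliberately rejected because they depend on the parent element's weight. It only illustrates the CSS-level equivalence; it says nothing about how Tesseract decides whether a word is bold.

```shell
#!/bin/sh
# Hypothetical helper: normalise a CSS font-weight value to its numeric form.
# Per the CSS Fonts spec, "normal" is 400 and "bold" is 700; plain numeric
# weights from 1 to 1000 are passed through unchanged.
css_weight() {
  case "$1" in
    normal) echo 400 ;;
    bold)   echo 700 ;;
    [1-9]|[1-9][0-9]|[1-9][0-9][0-9]|1000) echo "$1" ;;
    # bolder/lighter are relative to the parent element, so they cannot be
    # resolved in isolation; anything else is not a valid weight.
    *) echo "unsupported font-weight: $1" >&2; return 1 ;;
  esac
}

# Both spellings resolve to the same numeric weight.
[ "$(css_weight bold)" = "$(css_weight 700)" ] && echo "bold == 700"
```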

This is how I trained the new tessdata:

PANGOCAIRO_BACKEND=fc sh tesstrain.sh \
  --fontlist "Ubuntu" "Ubuntu Bold" "Ubuntu Bold Italic" "Ubuntu Italic" \
    "Ubuntu Light" "Ubuntu Light Italic" "Ubuntu Medium" "Ubuntu Medium Italic" \
    "Inter" "Inter Bold" "Inter Heavy" "Inter Light" "Inter Medium" \
    "Inter Semi-Bold" "Inter Ultra-Bold" "Inter weight=250" \
  --fonts_dir /Library/Fonts \
  --lang nld \
  --langdata_dir /tessapp/langdata \
  --output_dir /fonts/samples \
  --training_text /tessapp/langdata/nld/nld.training_text \
  --tessdata_dir /tessapp/tesseract-3.05.02/tessdata

The output was nld.traineddata.