could not find fonts


Jingjing Lin

Jun 13, 2019, 10:38:39 AM
to tesseract-ocr
I was trying to fine-tune chi_sim on a few characters by running:

src/training/tesstrain.sh --fonts_dir /usr/share/fonts --lang chi_sim --linedata_only \
  --noextract_font_properties --langdata_dir ../langdata \
  --tessdata_dir ./tessdata --output_dir ~/tesstutorial/train

I'm getting an error:

Could not find font named 'AR PL UKai CN'.

Pango suggested font 'Arial'.

I checked which fonts I have using:
text2image --find_fonts --text ./langdata/chi_sim/chi_sim.training_text --outputbase ./langdata/chi_sim/ --min_coverage 0.999 --fonts_dir=/usr/share/fonts/

and indeed AR PL UKai CN was not in the list.
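
For a quick yes/no answer on a single font, a small shell helper along these lines may be easier to read than the full text2image listing. This is only a sketch: has_font is a made-up name, not part of Tesseract, and the usage line assumes fontconfig's fc-list is installed and sees the same fonts as the directory passed to --fonts_dir.

```shell
# has_font NAME
# Reads a list of font family names from stdin (one entry per line) and
# succeeds if NAME occurs in it. The match is a case-insensitive,
# fixed-string substring match, so "UKai" would also match "AR PL UKai CN".
# has_font is a hypothetical helper, not a Tesseract or fontconfig command.
has_font() {
  grep -qiF -- "$1"
}

# Usage against the real system (assumes fc-list is available):
#   fc-list : family | has_font 'AR PL UKai CN' && echo found || echo missing
```

If this reports "missing", the font has to be installed at the operating-system level before tesstrain.sh can use it.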

Now my question is,
How do I install necessary fonts for chi_sim?

I couldn't find a way to do it from here:

Thanks for your help!

Jingjing Lin

Jun 13, 2019, 12:13:57 PM
to tesseract-ocr
It turns out this is not a Tesseract problem but an operating-system one: the required fonts have to be installed on the system (Ubuntu in my case) via:
sudo apt-get install ***

A useful link is:

https://github.com/tesseract-ocr/tesseract/blob/master/src/training/language-specific.sh

It lists the fonts needed for each language.


In the end I couldn't find some of the fonts listed in the link above for chi_sim, so I added some other fonts to training/language-specific.sh and made sure they can also be found in langdata/font_properties.


I would appreciate it if anybody knows where to find the chi_sim fonts that were used for training, although I believe some of them are commercial.


To list the fonts available on the system, you can use: fc-list :lang=** (for Chinese, **=zh)
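
To compare a whole list of wanted fonts against what is installed, something like the following could work. It is a rough sketch under a few assumptions: missing_fonts is a made-up helper, and the required names are assumed to have been copied by hand into a plain text file (one font name per line), for example from the chi_sim entry in language-specific.sh.

```shell
# missing_fonts FILE
# FILE holds the required font names, one per line (hypothetical file,
# written by hand). Installed family names are read from stdin. Every
# required name that does not occur in the installed list is printed.
missing_fonts() {
  installed=$(cat)                      # slurp the installed list from stdin
  while IFS= read -r name; do
    [ -n "$name" ] || continue          # skip empty lines
    printf '%s\n' "$installed" | grep -qiF -- "$name" || printf '%s\n' "$name"
  done < "$1"
}

# Usage against the real system (assumes fc-list is available and that
# required_fonts.txt is a file you created yourself):
#   fc-list :lang=zh family | missing_fonts required_fonts.txt
```

Anything it prints still needs to be installed, or replaced by another font in language-specific.sh.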



On Thursday, June 13, 2019 at 10:38:39 AM UTC-4, Jingjing Lin wrote:

Shree Devi Kumar

Jun 13, 2019, 12:39:57 PM
to tesser...@googlegroups.com

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-oc...@googlegroups.com.
To post to this group, send email to tesser...@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/9f60b8bc-7254-44bc-bc4f-7d9373d90985%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


--

____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

Jingjing Lin

Jun 13, 2019, 2:46:51 PM
to tesseract-ocr
Thanks for the info.

On Thursday, June 13, 2019 at 12:39:57 PM UTC-4, shree wrote:

Jingjing Lin

Jun 13, 2019, 3:55:59 PM
to tesseract-ocr
Why hasn't the list below been updated, though? For chi_sim_fonts I only see the fonts used for base Tesseract. Do we just need to add the fonts to language-specific.sh manually?

On Thursday, June 13, 2019 at 12:39:57 PM UTC-4, shree wrote:
FYI