src/training/tesstrain.sh --fonts_dir /usr/share/fonts --lang chi_sim --linedata_only \
  --noextract_font_properties --langdata_dir ../langdata \
  --tessdata_dir ./tessdata --output_dir ~/tesstutorial/train

Could not find font named 'AR PL UKai CN'.
https://github.com/tesseract-ocr/tesseract/blob/master/src/training/language-specific.sh
There you can find the fonts required for your language.
In the end I couldn't find some of the fonts listed at the link above for chi_sim, so I added some other fonts to training/language-specific.sh and made sure those fonts can also be found in langdata/font_properties.
I would appreciate it if anybody knows where to find the chi_sim fonts that were used for training, although I believe some of them are commercial.
To find the fonts available on your system, you can use: fc-list :lang=** (replace ** with the language code; for Chinese, fc-list :lang=zh)
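
In case it helps, here is a rough sketch of how one might check for a missing font before running tesstrain.sh. The helper name has_font is made up for this example, and it assumes fontconfig's fc-list is installed and prints one entry per line with alternative family names separated by commas.

```shell
#!/bin/sh
# has_font NAME
# Hypothetical helper: reads font family names on stdin (one entry per line,
# comma-separated alternatives allowed) and succeeds if NAME matches one
# of them exactly.
has_font() {
    tr ',' '\n' | sed 's/^ *//; s/ *$//' | grep -Fxq -- "$1"
}

# Only query the system when fontconfig is actually available.
if command -v fc-list >/dev/null 2>&1; then
    # 'AR PL UKai CN' is the font named in the error message above.
    if fc-list :lang=zh family | has_font "AR PL UKai CN"; then
        echo "found: AR PL UKai CN"
    else
        echo "missing: AR PL UKai CN"
    fi
fi
```

If the font is reported missing, you either need to install it or substitute another one in language-specific.sh as described above.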
--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-oc...@googlegroups.com.
To post to this group, send email to tesser...@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/9f60b8bc-7254-44bc-bc4f-7d9373d90985%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
FYI