Other case Л of л is not in unicharset

roberty...@gmail.com

Aug 14, 2017, 4:49:47 AM
to tesseract-ocr
Hello,

I am using the new tutorial to fine-tune the traineddata. I want to add some specific symbols to the existing chi_sim.traineddata model.

First, I used the following command to create the new training data:

training/tesstrain.sh --fonts_dir /usr/share/fonts --lang chi_sim --linedata_only --noextract_font_properties --langdata_dir ../langdata --fontlist "SIMSUN" --tessdata_dir ./tessdata --output_dir ~/tesstutorial/trainspecial

However, some specific symbols could not be added to the unicharset file.

Part of the output is shown below:

=== Phase UP: Generating unicharset and unichar properties files ===
[2017年 08月 14日 星期一 15:59:17 CST] /usr/local/bin/unicharset_extractor -D /tmp/tmp.78WyISy4o7/chi_sim/ /tmp/tmp.78WyISy4o7/chi_sim/chi_sim.SIMSUN.exp0.box
Extracting unicharset from /tmp/tmp.78WyISy4o7/chi_sim/chi_sim.SIMSUN.exp0.box
Wrote unicharset file /tmp/tmp.78WyISy4o7/chi_sim//unicharset.
[2017年 08月 14日 星期一 15:59:17 CST] /usr/local/bin/set_unicharset_properties -U /tmp/tmp.78WyISy4o7/chi_sim/chi_sim.unicharset -O /tmp/tmp.78WyISy4o7/chi_sim/chi_sim.unicharset -X /tmp/tmp.78WyISy4o7/chi_sim/chi_sim.xheights --script_dir=../langdata
Loaded unicharset of size 1129 from file /tmp/tmp.78WyISy4o7/chi_sim/chi_sim.unicharset
Setting unichar properties
Other case Л of л is not in unicharset
Other case Υ of υ is not in unicharset
Other case Π of π is not in unicharset
Other case Β of β is not in unicharset
Mirror ∼ of ∽ is not in unicharset
Mirror ⧵ of ∕ is not in unicharset
Other case σ of Σ is not in unicharset
Other case Ρ of ρ is not in unicharset
Mirror 》 of 《 is not in unicharset
Other case j of J is not in unicharset
Mirror 【 of 】 is not in unicharset
Mirror 「 of 」 is not in unicharset
Other case K of k is not in unicharset
Mirror { of } is not in unicharset
Other case q of Q is not in unicharset
Mirror 〗 of 〖 is not in unicharset
Setting script properties
Warning: properties incomplete for index 57 = )
Warning: properties incomplete for index 60 = :
Warning: properties incomplete for index 64 = !
Warning: properties incomplete for index 67 = ?
Warning: properties incomplete for index 73 = >
Warning: properties incomplete for index 81 = ;
Warning: properties incomplete for index 82 = ~
Warning: properties incomplete for index 90 = .
Warning: properties incomplete for index 98 = (
Warning: properties incomplete for index 99 = ゜
Warning: properties incomplete for index 115 = <
Warning: properties incomplete for index 190 = ,
Writing unicharset to file /tmp/tmp.78WyISy4o7/chi_sim/chi_sim.unicharset


This shows that some specific symbols, such as 'Л', '》', ..., cannot be added to the unicharset.
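To get a complete list of the affected characters instead of reading the log by eye, a small script can collect them. The sketch below is a hypothetical helper (`missing_partners` is not part of Tesseract). It only parses the two message formats that appear in the output above ("Other case X of Y is not in unicharset" and "Mirror X of Y is not in unicharset") and makes no assumption about how set_unicharset_properties decides which characters are case or mirror partners.

```python
import re
from collections import defaultdict

# Matches the two message kinds printed by set_unicharset_properties in the
# log above, e.g. "Other case Л of л is not in unicharset" or
# "Mirror 》 of 《 is not in unicharset".
MISSING_RE = re.compile(r"^(Other case|Mirror) (\S+) of (\S+) is not in unicharset$")


def missing_partners(log_text):
    """Hypothetical helper: group the characters reported as missing.

    Returns a dict mapping "Other case" / "Mirror" to a list of
    (missing_char, char_already_in_unicharset) pairs, in log order.
    Lines that do not match either message format are ignored.
    """
    found = defaultdict(list)
    for line in log_text.splitlines():
        m = MISSING_RE.match(line.strip())
        if m:
            kind, missing, present = m.groups()
            found[kind].append((missing, present))
    return dict(found)


if __name__ == "__main__":
    # A few lines copied from the log above.
    sample = """Setting unichar properties
Other case Л of л is not in unicharset
Mirror 》 of 《 is not in unicharset
Other case j of J is not in unicharset
Setting script properties"""
    for kind, pairs in missing_partners(sample).items():
        # Print only the characters that are absent from the unicharset.
        print(kind + ": " + "".join(missing for missing, _ in pairs))
```

Run against the full log, this prints one line per message kind listing the characters that are absent from the unicharset, which can then be compared with the characters in the training text.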


How can I add these symbols to the unicharset? Should I add them manually?