src/training/tesstrain.sh --fonts_dir /usr/share/fonts --lang eng --linedata_only \
--noextract_font_properties --langdata_dir ../langdata \
--tessdata_dir ./tessdata \
--fontlist "Impact Condensed" --output_dir ~/tesstutorial/engeval
mkdir -p ~/tesstutorial/engoutput
training/lstmtraining --debug_interval 100 \
--traineddata ~/tesstutorial/engtrain/eng/eng.traineddata \
--net_spec '[1,36,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx256 O1c111]' \
--model_output ~/tesstutorial/engoutput/base --learning_rate 20e-4 \
--train_listfile ~/tesstutorial/engtrain/eng.training_files.txt \
--eval_listfile ~/tesstutorial/engeval/eng.training_files.txt \
--max_iterations 5000 &>~/tesstutorial/engoutput/basetrain.log
Here I am confused: I am currently in the tesseract directory, but I cannot find a training folder under it.
I assumed that after installing tesseract successfully the system would recognize the lstmtraining command, so I used this command instead:
lstmtraining --debug_interval 100 \
--traineddata ~/tesstutorial/engtrain/eng/eng.traineddata \
--net_spec '[1,36,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx256 O1c111]' \
--model_output ~/tesstutorial/engoutput/base --learning_rate 20e-4 \
--train_listfile ~/tesstutorial/engtrain/eng.training_files.txt \
--eval_listfile ~/tesstutorial/engeval/eng.training_files.txt \
--max_iterations 5000
There is an error:
mgr_.Init(traineddata_path.c_str()):Error:Assert failed:in file ../../src/lstm/lstmtrainer.h, line 110
Segmentation fault (core dumped)
I looked at the source code in lstmtrainer.h:
102 // assumed that the character set is to be re-mapped from old_traineddata to
103 // the new, with consequent change in weight matrices etc.
104 bool TryLoadingCheckpoint(const char* filename, const char* old_traineddata);
105
106 // Initializes the character set encode/decode mechanism directly from a
107 // previously setup traineddata containing dawgs, UNICHARSET and
108 // UnicharCompress. Note: Call before InitNetwork!
109 void InitCharSet(const std::string& traineddata_path) {
110 ASSERT_HOST(mgr_.Init(traineddata_path.c_str()));
111 InitCharSet();
112 }
113 void InitCharSet(const TessdataManager& mgr) {
114 mgr_ = mgr;
115 InitCharSet();
116 }
I don't know how to solve this problem. Can anyone help me? Thanks in advance, and sorry for my poor English.
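The assertion at line 110 fires when mgr_.Init(traineddata_path.c_str()) returns false, which usually suggests that the file given to --traineddata could not be loaded. A minimal sketch of a pre-flight check from the shell, assuming the paths used in the commands above; check_traineddata is a hypothetical helper name, not part of tesseract:

```shell
#!/bin/sh
# Hypothetical helper (not a tesseract tool): succeeds only if the
# traineddata file passed as $1 exists and is non-empty.
check_traineddata() {
    if [ ! -s "$1" ]; then
        echo "missing or empty traineddata: $1" >&2
        return 1
    fi
    return 0
}

# Example use, with the path from the lstmtraining command above:
#   check_traineddata ~/tesstutorial/engtrain/eng/eng.traineddata \
#       && echo "traineddata found, safe to start lstmtraining"
```

If the check fails, the earlier tesstrain.sh step that writes to ~/tesstutorial/engtrain probably did not finish, so it is worth rerunning that step and reading its output before starting lstmtraining.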
std::cerr << traineddata_path.c_str() << std::endl;
src/training/tesstrain.sh --fonts_dir /usr/share/fonts --lang eng --linedata_only \
--noextract_font_properties --langdata_dir ../langdata \
--tessdata_dir ./tessdata --output_dir ~/tesstutorial/engtrain
By the way, I am stuck at this point; tesseract seems to loop infinitely here.
--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-oc...@googlegroups.com.
To post to this group, send email to tesser...@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/6e11f4c3-3142-45a1-9f31-9a9f86504a93%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Yes, you need to install some fonts.
You can find a tutorial here: http://www.linuxandubuntu.com/home/how-to-install-microsoft-fonts-in-ubuntu-linux
If I remember correctly, the fonts that tesseract uses for this command are listed in the script language_specific.sh. To find the location of that script, run a simple whereis tesstrain.sh.
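To see whether a given font is already installed before running tesstrain.sh, something like the following may help. It is only a sketch: font_listed is a hypothetical helper, and it assumes the fontconfig tool fc-list is available on the system.

```shell
#!/bin/sh
# Hypothetical helper: reads a list of font names on stdin and succeeds
# if the name given as $1 appears in it (case-insensitive, fixed string).
font_listed() {
    grep -qiF -- "$1"
}

# Example use, assuming fontconfig's fc-list is installed:
#   fc-list : family | font_listed "Impact" || echo "Impact is not installed" >&2
#
# text2image (the tool tesstrain.sh uses to render pages) can also print
# the font names it sees in a directory:
#   text2image --list_available_fonts --fonts_dir /usr/share/fonts
```

The names printed by text2image are the ones to compare against --fontlist, since they may differ slightly from the family names that fc-list reports.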
training/lstmtraining --debug_interval 100 \
--traineddata ~/tesstutorial/engtrain/eng/eng.traineddata \
--net_spec '[1,36,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx256 O1c111]' \
--model_output ~/tesstutorial/engoutput/base --learning_rate 20e-4 \
--train_listfile ~/tesstutorial/engtrain/eng.training_files.txt \
--eval_listfile ~/tesstutorial/engeval/eng.training_files.txt \
--max_iterations 5000 &>~/tesstutorial/engoutput/basetrain.log
I looked at the basetrain.log file:
Warning: given outputs 111 not equal to unicharset of 110.
Num outputs,weights in Series:
1,36,0,1:1, 0
Num outputs,weights in Series:
C3,3:9, 0
Ft16:16, 160
Total weights = 160
[C3,3Ft16]:16, 160
Mp3,3:16, 0
Lfys48:48, 12480
Lfx96:96, 55680
Lrx96:96, 74112
Lfx256:256, 361472
Fc110:110, 28270
Total weights = 532174
Built network:[1,36,0,1[C3,3Ft16]Mp3,3Lfys48Lfx96Lrx96Lfx256Fc110] from request [1,36,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx256 O1c111]
Training parameters:
Debug interval = 100, weights = 0.1, learning rate = 0.002, momentum=0.5
null char=109
Loaded 72/72 pages (1-72) of document /home/yixin/tesstutorial/engtrain/eng.Arial_Bold.exp0.lstmf
Loaded 72/72 pages (1-72) of document /home/yixin/tesstutorial/engeval/eng.Impact_Condensed.exp0.lstmf
Loaded 72/72 pages (1-72) of document /home/yixin/tesstutorial/engtrain/eng.Century_Schoolbook_L_Medium.exp0.lstmf
Loaded 72/72 pages (1-72) of document /home/yixin/tesstutorial/engtrain/eng.Century_Schoolbook_L_Italic.exp0.lstmf
Loaded 72/72 pages (1-72) of document /home/yixin/tesstutorial/engtrain/eng.Arial_Italic.exp0.lstmf
Loaded 72/72 pages (1-72) of document /home/yixin/tesstutorial/engtrain/eng.Courier_New_Bold.exp0.lstmf
Loaded 72/72 pages (1-72) of document /home/yixin/tesstutorial/engtrain/eng.Century_Schoolbook_L_Bold.exp0.lstmf
Loaded 72/72 pages (1-72) of document /home/yixin/tesstutorial/engtrain/eng.Century_Schoolbook_L_Bold_Italic.exp0.lstmf
Loaded 72/72 pages (1-72) of document /home/yixin/tesstutorial/engtrain/eng.Arial_Bold_Italic.exp0.lstmf
Loaded 72/72 pages (1-72) of document /home/yixin/tesstutorial/engtrain/eng.Arial.exp0.lstmf
Starting sh -c "trap 'kill %1' 0 1 2 ; java -Xms1024m -Xmx2048m -jar ./ScrollView.jar & wait"
Error: Unable to access jarfile ./ScrollView.jar
sh: 1: kill: No such process
ScrollView: Waiting for server...
ScrollView: Waiting for server...
ScrollView: Waiting for server...
ScrollView: Waiting for server...
ScrollView: Waiting for server...
ScrollView: Waiting for server...
ScrollView: Waiting for server...
ScrollView: Waiting for server...
ScrollView: Waiting for server...
ScrollView: Waiting for server...
ScrollView: Waiting for server...
ScrollView: Waiting for server...
ScrollView: Waiting for server...
ScrollView: Waiting for server...
ScrollView: Waiting for server...
ScrollView: Waiting for server...
ScrollView: Waiting for server...
ScrollView: Waiting for server...
ScrollView: Waiting for server...
ScrollView: Waiting for server...
ScrollView: Waiting for server...
ScrollView: Waiting for server...
ScrollView: Waiting for server...
ScrollView: Waiting for server...
Change --debug_interval 100 to --debug_interval 0.
On Friday, 25 January 2019 at 08:17:28 UTC+1, Aodren BARY wrote:
Change --debug_interval 100 to --debug_interval 0.
It's not mandatory to install ScrollView.
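To confirm that a run is stalled on ScrollView rather than training slowly, the log can be scanned for the waiting message. A rough sketch, assuming the log was written to basetrain.log as in the command above; scrollview_waits is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: prints how many times the log file given as $1
# contains the "ScrollView: Waiting for server" message.
scrollview_waits() {
    grep -o 'ScrollView: Waiting for server' "$1" | wc -l | tr -d ' '
}

# Example use, with the log path from the lstmtraining command above:
#   scrollview_waits ~/tesstutorial/engoutput/basetrain.log
```

A non-zero and growing count suggests lstmtraining is still trying to reach the viewer, in which case rerunning the same command with --debug_interval 0 should avoid the ScrollView start-up altogether.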