I've trained on about 18,000 lines of Persian text. I generated the training data with this command:
bash -x tesstrain.sh --fonts_dir /usr/share/fonts --lang fas --training_text /home/zohreh/Desktop/tesseract-master/src/training/langdata/fas/fas.training_text.txt --wordlist /home/zohreh/Desktop/tesseract-master/src/training/langdata/fas/fas.wordlist.txt --linedata_only \
--noextract_font_properties --langdata_dir /home/zohreh/Desktop/tesseract-master/src/training/langdata \
--tessdata_dir /home/zohreh/Desktop/tesseract-master/tessdata \
--fontlist "Arial" --output_dir /home/zohreh/Desktop/tesseract-master/src/training/langdata/fas/Phase2
and then ran the training with this command:
sudo /home/zohreh/Desktop/tesseract-master/src/training/lstmtraining \
--traineddata /home/zohreh/Desktop/tesseract-master/src/training/langdata/fas/Phase2/fas/fas.traineddata --net_spec '[1,48,0,1Ct3,3,16Mp3,3Lfys64Lfx96Lrx96Lfx192O1c1]' \
--model_output /home/zohreh/Desktop/tesseract-master/src/training/langdata/fas/Out/base --learning_rate 0.001 \
--train_listfile /home/zohreh/Desktop/tesseract-master/src/training/langdata/fas/Phase2/fas.training_files.txt \
--eval_listfile /home/zohreh/Desktop/tesseract-master/src/training/langdata/fas/v/fas.training_files.txt \
--max_iterations 5000 &>/home/zohreh/Desktop/tesseract-master/src/training/langdata/fas/Out/basetrain.log
However, the log always shows "Compute CTC targets failed", and the resulting model performs very poorly.
I normalized my text, and each line of the text has at most 20 tokens.
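For reference, a minimal sketch of how the 20-token limit can be checked, assuming tokens are separated by whitespace. The helper name count_long_lines is hypothetical and not part of Tesseract; the path is the --training_text file from the tesstrain.sh command above. It prints the number of lines over the limit, so 0 means every line fits.

```shell
# Count lines with more than MAX whitespace-separated tokens.
# Usage: count_long_lines MAX < file
# (hypothetical helper for illustration, not a Tesseract tool)
count_long_lines() {
  awk -v max="$1" 'NF > max { n++ } END { print n + 0 }'
}

# Training text passed to tesstrain.sh via --training_text.
text=/home/zohreh/Desktop/tesseract-master/src/training/langdata/fas/fas.training_text.txt

# Only run the check if the file is readable on this machine.
if [ -r "$text" ]; then
  count_long_lines 20 < "$text"
fi
```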
Could you please help me?