lstmtraining generates only checkpoint files; how can I get traineddata?


enkhbaata...@unimedia.co.jp

Sep 20, 2017, 10:39:04 PM
to tesseract-ocr
I used fine-tuning to add new fonts to Japanese.traineddata, running the following command:
 training/lstmtraining \
   --model_output ../tesstutorial/jpntune \
   --continue_from /home/workspace/tesstrain/jpn/jpnoutput/Japanese.lstm \
   --traineddata /usr/local/share/tessdata/best/Japanese.traineddata \
   --train_listfile /home/workspace/tesstrain/jpn.training_files.txt \
   --max_iterations 8000
After training completed, I got checkpoint files in my model_output directory, but Japanese.traineddata was not updated. Is there another step needed to create the new traineddata?
 

ShreeDevi Kumar

Sep 20, 2017, 10:56:19 PM
to tesser...@googlegroups.com
Please read the training page on the wiki.

You will need a command such as the following to convert a checkpoint into a traineddata file:

training/lstmtraining --stop_training \
  --continue_from ../tesstutorial/vedic_deva/san_deva_checkpoint \
  --traineddata ../tesstutorial/deva/san/san.traineddata \
  --model_output ../tesstutorial/vedic_deva/san_deva.traineddata
  
cp  ../tesstutorial/vedic_deva/san_deva.traineddata  ../tessdata_best/
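
Applied to the paths from the original question, the same step might look like the sketch below. It is only an illustration: it assumes training left a file named jpntune_checkpoint next to the --model_output prefix (the same naming as san_deva_checkpoint above), and the output name jpntune.traineddata is a made-up choice, not something from the thread. The helper function finalize is also hypothetical; it just lets you print the command before running it.

```shell
#!/bin/sh
# Sketch: turn the last fine-tuning checkpoint into a .traineddata file.
# Assumption: training wrote "<model_output>_checkpoint", so the question's
# --model_output ../tesstutorial/jpntune gives ../tesstutorial/jpntune_checkpoint.
MODEL_OUTPUT="${MODEL_OUTPUT:-../tesstutorial/jpntune}"
BASE_TRAINEDDATA="${BASE_TRAINEDDATA:-/usr/local/share/tessdata/best/Japanese.traineddata}"
# Hypothetical output name: a new file, so the original Japanese.traineddata is left alone.
FINAL_MODEL="${FINAL_MODEL:-${MODEL_OUTPUT}.traineddata}"

# Append the --stop_training arguments to whatever command prefix is given.
# "finalize echo lstmtraining" prints the command; "finalize lstmtraining" runs it.
finalize() {
  "$@" --stop_training \
    --continue_from "${MODEL_OUTPUT}_checkpoint" \
    --traineddata "$BASE_TRAINEDDATA" \
    --model_output "$FINAL_MODEL"
}

if [ -f "${MODEL_OUTPUT}_checkpoint" ] && command -v lstmtraining >/dev/null 2>&1; then
  finalize lstmtraining
else
  # Checkpoint or binary missing: only show what would be run.
  finalize echo lstmtraining
fi
```

As with the cp line above, the resulting file still has to be copied into the tessdata directory that tesseract reads; tesseract selects a model by its file name without the extension, so this one would be used as -l jpntune.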

ShreeDevi

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscribe@googlegroups.com.
To post to this group, send email to tesser...@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/f5676080-7b58-4bd4-a2ac-e9fb36a7642c%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

enkhbaata...@unimedia.co.jp

Sep 20, 2017, 11:53:42 PM
to tesseract-ocr
Thanks a lot. I think the training page on the wiki needs more code examples to make the training part fully understandable.



sai sumanth Kalluri

Jun 21, 2019, 11:06:12 AM
to tesseract-ocr
May I know which directory I'm supposed to run these commands from? All I get are symbol lookup errors. Sorry for the very basic question, but I'm a newbie when it comes to Linux.

Shree Devi Kumar

Jun 21, 2019, 12:14:40 PM
to tesser...@googlegroups.com
Run them from the main tesseract directory.

The directory structure has changed: the binary is now at src/training/lstmtraining if you built it in the source tree.
If you installed it, it should be accessible as `lstmtraining`.
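
A small sketch of how one might check which of these locations applies on a given machine. The variable TESSERACT_SRC and the function find_lstmtraining are hypothetical names introduced here; the three locations tried are the ones mentioned in this thread (installed on PATH, src/training/, and the older training/).

```shell
#!/bin/sh
# Sketch: pick an lstmtraining binary from the locations mentioned in this thread.
# Assumption: TESSERACT_SRC (hypothetical variable) points at the tesseract
# source tree; it defaults to the current directory.
TESSERACT_SRC="${TESSERACT_SRC:-.}"

find_lstmtraining() {
  # 1. An installed copy on PATH.
  if command -v lstmtraining >/dev/null 2>&1; then
    command -v lstmtraining
    return 0
  fi
  # 2. The newer source layout, then the older one.
  for candidate in "$TESSERACT_SRC/src/training/lstmtraining" \
                   "$TESSERACT_SRC/training/lstmtraining"; do
    if [ -x "$candidate" ]; then
      echo "$candidate"
      return 0
    fi
  done
  return 1
}

if LSTMTRAINING="$(find_lstmtraining)"; then
  echo "using: $LSTMTRAINING"
else
  echo "lstmtraining not found; install it or build it under $TESSERACT_SRC" >&2
fi
```

Once a path is printed, the commands earlier in the thread can be run with that path in place of training/lstmtraining.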


