Fine Tuning Iterations

Ibr

Jun 22, 2017, 5:45:38 AM
to tesseract-ocr
Hi,

If I want to run this command:

training/lstmtraining --model_output ~/tesstutorial/full_japanese/new \
  --continue_from ~/tesstutorial/extracted_lstm/jpn.lstm \
  --train_listfile ~/tesstutorial/jpntrain/jpn.training_files.txt \
  --max_iterations 100000

how should I set --max_iterations so that every lstmf file listed in training_files.txt is trained on? For example, if training_files.txt lists 40 lstmf files, how many iterations are needed to be sure all 40 are covered?

Also, if I have trained on one set of lstmf files and then get a new set, can I continue training on the new set without repeating the first one? If so, is it enough to change the paths in training_files.txt to point to the new lstmf files while keeping --continue_from as it is?

Thanks

ShreeDevi Kumar

Jun 22, 2017, 6:01:13 AM
to tesser...@googlegroups.com
>what is the number of the iterations that will for sure cover the 40 lstmf files?

It depends on the number of lines in each file. For example, if each file has 1,000 lines, then 40,000 iterations should cover all the files once.
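
The estimate above is simple arithmetic, and a minimal sketch of it follows. It assumes, as the reply does, that one iteration consumes one text line; the function name and the line counts are hypothetical and are not part of Tesseract. If the files hold different numbers of lines, list each count separately.

```python
# Hypothetical helper, not part of Tesseract: estimates how many
# training iterations are needed to see every line once.
def iterations_for_one_pass(lines_per_file):
    """Return the total number of text lines across all lstmf files.

    Assumes one training iteration consumes exactly one line.
    """
    return sum(lines_per_file)


# 40 files with 1000 lines each (the example figures from this thread).
line_counts = [1000] * 40
one_pass = iterations_for_one_pass(line_counts)
print(one_pass)  # 40000

# Number of complete passes a given --max_iterations value would allow.
max_iterations = 100000
print(max_iterations // one_pass)  # 2
```

Under these assumptions, the 100000 iterations in the original command would allow two full passes over the 40 files, with some iterations left over.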

You can use --target_error_rate 0.01 instead of the number of iterations as a guide for how long to train.



ShreeDevi

Ibr

Jun 22, 2017, 6:32:07 AM
to tesseract-ocr
thanks


Ibr

Jun 22, 2017, 6:37:09 AM
to tesseract-ocr
How can I find out how many lines are in each lstmf file? I opened one in Notepad++ and it showed almost 70,000 lines, which can't be correct, since I trained with 61 fonts and 100,000 iterations.

