Interrupting and restarting lstmtraining


Adam Funk

Dec 23, 2019, 7:03:49 AM
to tesseract-ocr
Hi,

I have an lstmtraining job running without --max_iterations; it's been
going for a couple of weeks now (in a docker container in screen on a
server that I ssh into).

Can I safely use ctrl-C to stop it, use lstmtraining --stop_training
(with appropriate settings for --continue_from --traineddata and
--model_output) to create a trained model that I can drop into
tesseract somewhere else, and then restart the lstmtraining job?

I don't want to lose anything that's been produced so far.

Thanks,
Adam

Shree Devi Kumar

Dec 23, 2019, 12:13:24 PM
to tesseract-ocr
You can create traineddata with --stop_training while lstmtraining continues to run.

If you are using the tesstrain Makefile, it has a target called traineddata that generates a traineddata file for each intermediate checkpoint.

You can stop and start training, but I have a feeling that training runs longer in that case.


Adam Funk

Dec 24, 2019, 5:44:57 AM
to tesser...@googlegroups.com

That's useful to know --- thanks!

If I want to add more training data without waiting for lstmtraining to
stop itself, is this the right way to do it without losing the trained
model so far?

1. ctrl-C on the lstmtraining job
2. generate the *.lstmf files from the new image and box files
3. generate the training and test lists of filenames
4. start lstmtraining again with the same working directory and the new
lists of filenames

Thanks,
Adam
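
Step 3 of the list above could be scripted roughly as follows. This is a minimal sketch and not something confirmed in the thread: the function name write_lstmf_lists, the output names list.train and list.eval, and the fixed-size evaluation split are all assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write train/eval list files from the *.lstmf files
# found directly inside a directory.
# usage: write_lstmf_lists LSTMF_DIR OUT_DIR EVAL_COUNT
write_lstmf_lists() {
  local lstmf_dir=$1 out_dir=$2 eval_count=$3
  local all="$out_dir/all-lstmf"

  # Collect every .lstmf file; sorting keeps the split reproducible.
  # For real training runs a shuffled order is probably preferable.
  find "$lstmf_dir" -maxdepth 1 -name '*.lstmf' | sort > "$all"

  local total
  total=$(( $(wc -l < "$all") ))
  if [ "$total" -le "$eval_count" ]; then
    echo "need more than $eval_count .lstmf files, found $total" >&2
    return 1
  fi

  # The last EVAL_COUNT entries become the eval list, the rest the train list.
  head -n "$(( total - eval_count ))" "$all" > "$out_dir/list.train"
  tail -n "$eval_count" "$all" > "$out_dir/list.eval"
}
```

The two resulting files are what lstmtraining would then receive through --train_listfile and --eval_listfile when it is started again in step 4.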

Adam Funk

Jan 6, 2020, 8:40:36 AM
to tesser...@googlegroups.com

Thanks! Unfortunately I'm now getting the dreaded "Failed to read
continue from:" error.

I have used docker exec to start another terminal in the running
container, and am trying to use the following command:

lstmtraining \
--stop_training \
--continue_from /data/output/meme_checkpoint \
--traineddata /data/starter/eng/eng.traineddata \
--model_output /data/output/mem.traineddata

The file I'm using in --continue_from always has a fresh timestamp, but
the other checkpoints (with numbers in the filenames) in the same
directory are quite old.

I'd be grateful for any suggestions.





Adam Funk

Jan 8, 2020, 9:54:26 AM
to tesser...@googlegroups.com
Hi again,

I think I may have figured this out by experimentation (tinkering). The
following command succeeds:

lstmtraining --stop_training \
--continue_from /data/output/meme0.512_97818.checkpoint \
--traineddata /data/starter/eng/eng.traineddata \
--model_output /data/output/mem.traineddata

where meme0.512_97818.checkpoint is the most recent checkpoint other
than the one that (I presume) is currently being modified by the main
lstmtraining process.

So am I right in guessing that while lstmtraining is running, the
meme_checkpoint file is not only susceptible to being changed, but is
missing some kind of "completion" that is required to use the
stop_training subcommand to generate the model?

Thanks,
Adam
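
The workaround described above (use the most recent numbered checkpoint rather than the live meme_checkpoint file) can be sketched in shell. This is only an illustration: the function name newest_numbered_checkpoint is hypothetical, and it assumes that numbered checkpoints end in ".checkpoint" while the live file does not, as the file names in this thread suggest.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the most recently modified *.checkpoint file
# in a directory. A file such as meme_checkpoint does not match the glob,
# so the checkpoint the running trainer keeps rewriting is skipped.
# usage: newest_numbered_checkpoint CHECKPOINT_DIR
newest_numbered_checkpoint() {
  local dir=$1 newest="" f
  for f in "$dir"/*.checkpoint; do
    [ -e "$f" ] || continue            # glob matched nothing
    if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
      newest=$f
    fi
  done
  if [ -z "$newest" ]; then
    return 1                           # no numbered checkpoint found
  fi
  printf '%s\n' "$newest"
}

# Example, using the paths from the message above:
#   ckpt=$(newest_numbered_checkpoint /data/output) &&
#   lstmtraining --stop_training \
#     --continue_from "$ckpt" \
#     --traineddata /data/starter/eng/eng.traineddata \
#     --model_output /data/output/mem.traineddata
```

Selecting by modification time rather than by the numbers in the file name avoids having to parse the name, at the cost of relying on the timestamps being trustworthy.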

Shree Devi Kumar

Jan 8, 2020, 10:27:34 AM
to tesseract-ocr
As far as I understand, meme_checkpoint contains multiple models, so that training can go back and restart from a different model if there is no convergence. You may have noticed that its file size is bigger.

Sometimes during the training process, sub models are evaluated. You can review the log file for details. It is possible that meme_checkpoint cannot be used with --continue_from at such times.
