In the ocrd-train Makefile, here is the code for fine-tuning:
ifdef START_MODEL
$(LAST_CHECKPOINT): unicharset lists $(PROTO_MODEL)
	mkdir -p data/checkpoints
	lstmtraining \
	  --traineddata $(PROTO_MODEL) \
	  --old_traineddata $(TESSDATA)/$(START_MODEL).traineddata \
	  --continue_from data/$(START_MODEL)/$(START_MODEL).lstm \
	  --net_spec "[1,36,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx256 O1c`head -n1 data/unicharset`]" \
	  --model_output data/checkpoints/$(MODEL_NAME) \
	  --learning_rate 20e-4 \
	  --train_listfile data/list.train \
	  --eval_listfile data/list.eval \
	  --max_iterations 10000
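For context, the backtick substitution in the --net_spec line reads the first line of data/unicharset, which in a unicharset file is the symbol count; it sizes the O1c softmax output layer so there is one output per character. A minimal sketch of what that substitution produces (unicharset.sample and its contents are made up for illustration):

```shell
# A Tesseract unicharset file starts with the symbol count on its
# first line. Fabricate a tiny two-line stand-in:
printf '111\n<space> 0 0,0,0,0,0,0 NULL 0\n' > unicharset.sample

# This is what the Makefile's `head -n1` extracts, so the net spec
# ends up as O1c111, i.e. 111 output units:
head -n1 unicharset.sample
```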
Why do we need the following line? I thought --net_spec was only used when training from scratch.
--net_spec "[1,36,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx256 O1c`head -n1 data/unicharset`]" \
Should the learning rate be set lower for fine-tuning? Training from scratch already uses 20e-4, so it seems the fine-tuning rate should be significantly lower than that:
--learning_rate 20e-4 \
--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.