How to continue training with Makefile + EPOCHS

Ghinwa Choueiter

Dec 5, 2023, 8:47:43 AM
to tesseract-ocr
Hi there,

I trained a model as follows:

export TESSDATA_DIR=/<path>/tessdata_best/
export LANGDATA_DIR=/<path>/tesstrain/data

nohup make LANG_TYPE=RTL \
      MODEL_NAME=ara_plus \
      PSM=13 \
      START_MODEL=ara \
      TESSDATA=$TESSDATA_DIR \
      LANGDATA_DIR=$LANGDATA_DIR \
      EPOCHS=100 \
      RATIO_TRAIN=0.90 \
      DEBUG_INTERVAL=-1 training >> data/ara_plus.log &

1. Once I have the initial model, how do I run further iterations on the same data? Should I copy ara_plus.traineddata to $TESSDATA_DIR and specify START_MODEL=ara_plus, or is there another way?

2. When I specify EPOCHS > 0, I see that the Makefile sets the iteration count to -EPOCHS. What does that actually do? Will it run EPOCHS * (number of data points) iterations? I see we are using SGD, so I assume LSTM training processes each data point separately.

Thank you.
G

Keith Smith

Dec 5, 2023, 8:41:56 PM
to tesser...@googlegroups.com
From one novice to another ...

1. Yes, that is my understanding of how to run further iterations.

2. Yes, EPOCHS says to iterate that many times over your training set. I think I have heard that the generally recommended number of EPOCHS is 2, though I don't know how much science is behind that. I think 100 EPOCHS is too many and will overfit.
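
Point 1 above can be sketched as a dry-run shell script. This is a rough sketch, not a confirmed recipe. It assumes the first run left its model at data/ara_plus.traineddata and its ground truth in data/ara_plus-ground-truth (tesstrain's default locations, as far as I know). The names ara_plus_v2 and EPOCHS=10 are hypothetical values chosen for illustration, and run is a helper defined in the script itself, not part of tesstrain. Because a new MODEL_NAME changes the default ground-truth directory, GROUND_TRUTH_DIR points back at the original data.

```shell
#!/bin/sh
# Sketch only: continue training from the model produced by the first run.
# Assumptions (not from the thread): ara_plus_v2 and EPOCHS=10 are hypothetical,
# and data/ara_plus.traineddata is where the first run left its model.

TESSDATA_DIR="${TESSDATA_DIR:-/<path>/tessdata_best}"
LANGDATA_DIR="${LANGDATA_DIR:-/<path>/tesstrain/data}"

# Helper defined here (not part of tesstrain): print the command when
# DRY_RUN is 1 (the default), execute it when DRY_RUN is 0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Make the first-round model visible as a start model.
run cp data/ara_plus.traineddata "$TESSDATA_DIR/"

# Second round: same ground truth, new model name, START_MODEL is the old output.
# GROUND_TRUTH_DIR assumes the default location used for MODEL_NAME=ara_plus.
run make LANG_TYPE=RTL \
  MODEL_NAME=ara_plus_v2 \
  START_MODEL=ara_plus \
  GROUND_TRUTH_DIR=data/ara_plus-ground-truth \
  PSM=13 \
  TESSDATA="$TESSDATA_DIR" \
  LANGDATA_DIR="$LANGDATA_DIR" \
  EPOCHS=10 \
  RATIO_TRAIN=0.90 \
  DEBUG_INTERVAL=-1 \
  training
```

With the default DRY_RUN=1 the script only prints the two commands so they can be checked first; setting DRY_RUN=0 executes them from the tesstrain directory.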

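For point 2, the arithmetic behind EPOCHS can be sketched in shell. My reading is that tesstrain passes -EPOCHS as the maximum iteration count and that lstmtraining interprets a negative value as a number of epochs, meaning full passes over the training list; treat that as an assumption rather than a confirmed detail. The line count below is hypothetical, since the thread does not say how many samples were used, and an integer percentage stands in for RATIO_TRAIN=0.90 to keep the arithmetic in plain shell.

```shell
#!/bin/sh
# Sketch: estimate how many training iterations EPOCHS=100 amounts to,
# assuming one iteration processes one training line.

total_lines=1000      # hypothetical number of ground-truth lines
ratio_train_pct=90    # RATIO_TRAIN=0.90 expressed as an integer percent
epochs=100            # EPOCHS=100 from the original command

# Lines used for training vs. held out for evaluation.
train_lines=$(( total_lines * ratio_train_pct / 100 ))
eval_lines=$(( total_lines - train_lines ))

# One epoch is one pass over the training lines.
iterations=$(( epochs * train_lines ))

echo "training lines: $train_lines"
echo "eval lines:     $eval_lines"
echo "iterations:     $iterations"
```

Under these made-up numbers, lowering EPOCHS from 100 to 2 would reduce the run from 90000 iterations to 1800, which gives a feel for how much longer the 100-epoch run is.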

Ghinwa Choueiter

Dec 18, 2023, 9:49:53 AM
to tesseract-ocr
Thanks, will keep trying.
G
