Watching the learning iteration is a better method than watching the BCER

Des Bw

Oct 18, 2023, 3:10:00 AM

to tesseract-ocr

I am just writing a little observation here for beginners like me.

(I would love to be corrected if I am wrong.)

I am training by cutting off the top layer of a best model in order to improve the existing model. I have about 400,000 lines of text, and I generated the box and image files using text2image.

As I train the model, the BCER drops very low very quickly. It took less than two epochs for the BCER to reach 0.001. That might sound like a good thing to an inexperienced user like me. But when I try the output model, its accuracy is nowhere near as good as the default best model. So I had to lower the target_error parameter (to 0.0001) and keep training, and the model is getting better and better.

So it looks like watching the learning iteration, which is the first of the three numbers in the iteration count (https://tesseract-ocr.github.io/tessdoc/tess4/TrainingTesseract-4.00.html#iterations-and-checkpoints), is a better approach than watching the BCER. If the learning iteration keeps growing, the model is still learning, and you need to keep training regardless of the BCER.
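For anyone who wants to track this automatically, here is a rough Python sketch that pulls the three iteration counts and the character error rate out of the training log and checks whether the learning iteration is still growing. It assumes the progress lines look like the example on the linked tessdoc page ("At iteration 14615/695400/698614, ... char train=1.882%, ..."); I believe newer versions print "BCER train" instead of "char train", so the pattern accepts both. The function names and the `window` parameter are my own, not anything from Tesseract.

```python
import re

# Progress line format is an assumption taken from the linked tessdoc page.
# The three slash-separated counts are learning/training/sample iterations.
# "char train" and "BCER train" are both accepted as the error label.
LINE_RE = re.compile(
    r"At iteration (\d+)/(\d+)/(\d+),.*?(?:char|BCER) train=([\d.]+)%",
    re.IGNORECASE,
)


def parse_progress(line):
    """Return the iteration counts and error rate from one log line, or None."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    return {
        "learning": int(m.group(1)),
        "training": int(m.group(2)),
        "sample": int(m.group(3)),
        "bcer": float(m.group(4)),
    }


def still_learning(lines, window=3):
    """True if the learning iteration grew across the last `window` progress lines."""
    counts = [p["learning"] for p in map(parse_progress, lines) if p]
    recent = counts[-window:]
    return len(recent) >= 2 and recent[-1] > recent[0]
```

Feed `still_learning` the lines of your training log (for example, `open("train.log").read().splitlines()`, where `train.log` is whatever file you redirected the output to). If it returns True, the model is still learning even when the BCER already looks tiny.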

Des Bw

Oct 18, 2023, 3:13:16 AM

to tesseract-ocr

In other words, the BCER is an unreliable measure of accuracy. At least, that is my experience training from synthetic data.
