Training Metrics

Simon

Nov 22, 2023, 8:50:45 AM

to tesseract-ocr

While training my model I came across the following metrics:

E.g.:
At iteration 6345/6500/6500, Mean rms=6.246%, delta=7.139%, char train=68.07%, word train=92.2%, skip ratio=0%, New best char error = 68.07 wrote checkpoint.

Unfortunately, I can't find any proper, detailed description or explanation of these metrics on the web.

This information would be really helpful for evaluating the metrics; right now it feels like guessing which values are "good". Since most developers lack experience with this, it is pretty hard to tell whether a value is "good" or "bad".

Des Bw

Nov 22, 2023, 9:34:52 AM

to tesseract-ocr

The character error rate (CER) is the most common measure of the quality of your training.

- Train with a large dataset and run for a couple of epochs, so that your CER gets as close to 0.01 as possible. That is the most common strategy.

Simon

Nov 23, 2023, 4:28:35 AM

to tesseract-ocr

Alright,

this might be a little bit of a dumb question, but where exactly can I see the CER?

2 Percent improvement time=56, best error was 12.49 @ 8294
At iteration 8350/10000/10000, Mean rms=2.701%, delta=2.491%, char train=10.385%, word train=24.4%, skip ratio=0%, New best char error = 10.385 wrote best model:data/Common_num/checkpoints/Common_num10.385_8350.checkpoint wrote checkpoint.

Is it the "best char error"? Where do I have to look to find CER? Is the CER in the above example?

Also, what are the signs that my model is overfitted? Is there any way to recognize this from the output above?

Des Bw

Nov 23, 2023, 4:34:40 AM

to tesseract-ocr

I think they are abbreviations:

best char error = BCER

character error = CER

There are no signs in that output that tell you whether the model is overfit; I know of no diagnostics for that. For fine-tuning, running more than 400 iterations is always problematic because it destroys the base model.

- So, the common strategy is to increase your data and run just 300 iterations. The BCER is not that important in that case.

But for training from scratch or from a layer (network), you should try to get the BCER (error rate) as low as possible. Overfitting happens when the data set is too small and the iterations are too many. From my experience, running 2-5 epochs seems to generate good results. But I have seen experienced guys training for hundreds, even thousands, of epochs.
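To relate the iteration counts in the log to the epoch counts mentioned here, a rough conversion can help. This is only a sketch: it assumes that one iteration consumes one training line, so one epoch equals one pass over all training lines. The function names and the figure of 2000 training lines are hypothetical and not taken from this thread.

```python
def epochs_for(iterations, num_training_lines):
    """Approximate number of passes over the data for a given iteration count.

    Assumption (not confirmed in this thread): one iteration = one training line.
    """
    if num_training_lines <= 0:
        raise ValueError("num_training_lines must be positive")
    return iterations / num_training_lines


def iterations_for(epochs, num_training_lines):
    """Approximate iteration count needed to run a given number of epochs."""
    if num_training_lines <= 0:
        raise ValueError("num_training_lines must be positive")
    return epochs * num_training_lines


# Hypothetical example: 10000 iterations over 2000 training lines.
print(epochs_for(10000, 2000))
# Hypothetical example: iterations needed for 3 epochs over the same data.
print(iterations_for(3, 2000))
```

Under that assumption, 300 iterations on a data set of a few thousand lines is well below a single epoch, which fits the advice to keep fine-tuning runs short.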

Des Bw

Nov 23, 2023, 4:36:06 AM

to tesseract-ocr

BCER (best character error rate) is picked automatically by Tesseract: it is the lowest of all the character error rates (CER) seen so far during training.
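To make the relationship between the log line, CER and BCER concrete, here is a minimal Python sketch that pulls the "char train" value out of the progress lines quoted in this thread and keeps the lowest one. The regular expression is written against those two lines only, so it is an assumption that other Tesseract versions print the same layout; the helper names `parse_progress` and `best_char_error` are made up for this example.

```python
import re

# Pattern matching the progress lines quoted in this thread.
# Assumption: the training output always has this exact layout.
LINE_RE = re.compile(
    r"At iteration (\d+)/\d+/\d+, "
    r"Mean rms=([\d.]+)%, delta=([\d.]+)%, "
    r"char train=([\d.]+)%, word train=([\d.]+)%, "
    r"skip ratio=([\d.]+)%"
)


def parse_progress(line):
    """Return the metrics of one progress line as a dict, or None if no match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    return {
        "iteration": int(m.group(1)),      # first of the three iteration counters
        "rms": float(m.group(2)),
        "delta": float(m.group(3)),
        "char_error": float(m.group(4)),   # "char train": the CER, in percent
        "word_error": float(m.group(5)),   # "word train", in percent
        "skip_ratio": float(m.group(6)),
    }


def best_char_error(lines):
    """Return the parsed record with the lowest char error (the BCER), or None."""
    best = None
    for line in lines:
        rec = parse_progress(line)
        if rec is not None and (best is None or rec["char_error"] < best["char_error"]):
            best = rec
    return best


# The two progress lines posted earlier in this thread.
sample_lines = [
    "At iteration 6345/6500/6500, Mean rms=6.246%, delta=7.139%, "
    "char train=68.07%, word train=92.2%, skip ratio=0%, "
    "New best char error = 68.07 wrote checkpoint.",
    "At iteration 8350/10000/10000, Mean rms=2.701%, delta=2.491%, "
    "char train=10.385%, word train=24.4%, skip ratio=0%, "
    "New best char error = 10.385 wrote best model:"
    "data/Common_num/checkpoints/Common_num10.385_8350.checkpoint wrote checkpoint.",
]

best = best_char_error(sample_lines)
print(best["iteration"], best["char_error"])
```

For the two lines above, the lowest char error is the 10.385 reported at iteration 8350, which matches the "New best char error" and the checkpoint file name in the log.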
