I think they are abbreviations:
BCER = best character error rate
CER = character error rate
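
To make the two terms concrete, here is a minimal sketch of how a character error rate is commonly computed: the edit distance between the recognized text and the ground truth, divided by the ground-truth length. The function names (`edit_distance`, `cer`, `best_cer`) are my own for illustration, and treating BCER as "the lowest CER seen so far during training" is my reading of the abbreviation, not something I can confirm from the training tool itself.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(hyp) + 1))  # distances from the empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning i ref characters into an empty hyp costs i deletions
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate of hypothesis `hyp` against ground truth `ref`."""
    if not ref:
        raise ValueError("reference text must not be empty")
    return edit_distance(ref, hyp) / len(ref)


def best_cer(cer_history: list[float]) -> float:
    """Assumed meaning of BCER: the lowest CER observed so far."""
    return min(cer_history)
```

For example, `cer("hello world", "hallo world")` is one substitution over eleven characters, so about 0.09 (9%).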
There are no clear signs that tell you whether the model is overfit, and I don't know of any diagnostics for that. For fine-tuning, running more than 400 iterations is always problematic because it destroys the base model.
- So the common strategy is to increase your data and run just 300 iterations. The BCER is not that important in that case.
But for training from scratch or from a layer (network), you should try to get the BCER (error rate) as low as possible. Overfitting happens when the data set is too small and the iterations are too many. In my experience, running 2-5 epochs seems to give good results, but I have seen experienced people train for hundreds or even thousands of epochs.
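
Since the advice above mixes iterations and epochs, a rough conversion may help. This sketch assumes that one iteration processes exactly one training sample (one text line), which is how I understand it but may not match your trainer. The helper names and the sample count of 1000 lines are hypothetical.

```python
import math


def iterations_to_epochs(iterations: int, num_samples: int) -> float:
    """Number of passes over the data, assuming 1 iteration = 1 sample."""
    if num_samples <= 0:
        raise ValueError("num_samples must be positive")
    return iterations / num_samples


def epochs_to_iterations(epochs: float, num_samples: int) -> int:
    """Iterations needed for `epochs` passes, rounded up to a whole iteration."""
    if num_samples <= 0:
        raise ValueError("num_samples must be positive")
    return math.ceil(epochs * num_samples)


# Hypothetical training set of 1000 lines:
print(iterations_to_epochs(300, 1000))  # 300 iterations cover under one pass
print(epochs_to_iterations(3, 1000))    # 3 epochs need 3000 iterations
```

Under that assumption, 300 fine-tuning iterations on 1000 lines is well under one epoch, while 2-5 epochs from scratch would be 2000-5000 iterations.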