I am just writing a small observation here for beginners like me
(I would love to be corrected if I am wrong).
I am training by cutting off the top layer of a best model in order to improve the existing model. I have about 400,000 lines of text, and I generated the box and image files using text2image.
As I train the model, the BCER gets very low very fast. It took me less than two epochs to reach a BCER of 0.001. That might sound like a good thing to an inexperienced user like me. But when I try the output model, its accuracy is nowhere near as good as the default best model. So I had to lower the target_error parameter (to 0.0001) and keep on training, and the model is getting better and better.
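
For anyone who wants to try the same thing, here is a rough sketch of how the continued run could be put together. It only builds the `lstmtraining` command line; it does not run anything. I am assuming the parameter I called target_error corresponds to the `--target_error_rate` flag of `lstmtraining`. Every path, the `append_index` value and the `net_spec` string are placeholders that I made up for illustration; they have to match your own base model and unicharset.

```python
import shlex


def build_lstmtraining_cmd(continue_from, traineddata, train_listfile,
                           model_output, target_error_rate=0.0001,
                           append_index=None, net_spec=None,
                           max_iterations=None):
    """Build (but do not run) an lstmtraining command line as a list."""
    cmd = [
        "lstmtraining",
        "--continue_from", continue_from,    # .lstm extracted from the base model
        "--traineddata", traineddata,        # starter traineddata file
        "--train_listfile", train_listfile,  # list of .lstmf training files
        "--model_output", model_output,      # prefix for checkpoint files
        # Fixed-point formatting avoids scientific notation for tiny values.
        "--target_error_rate", format(target_error_rate, "f"),
    ]
    # Only needed when the top layer(s) are cut off and replaced.
    if append_index is not None and net_spec is not None:
        cmd += ["--append_index", str(append_index), "--net_spec", net_spec]
    if max_iterations is not None:
        cmd += ["--max_iterations", str(max_iterations)]
    return cmd


# Hypothetical file names and layer spec, for illustration only.
example_cmd = build_lstmtraining_cmd(
    continue_from="base/eng.lstm",
    traineddata="base/eng.traineddata",
    train_listfile="train/list.train",
    model_output="output/mymodel",
    target_error_rate=0.0001,
    append_index=5,
    net_spec="[Lfx256 O1c111]",
)
print(shlex.join(example_cmd))
```

The printed line can be pasted into a shell, or the list can be passed to `subprocess.run`. Since training stops as soon as the target is reached, a lower target simply lets the run go on longer, which matches what I saw.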