BCER = 0.01; the actual result is not satisfactory


Des Bw

Oct 6, 2023, 2:48:25 PM
to tesseract-ocr
I have been training on a large amount of data: about 390,000 lines of text per font, for 15 fonts. I ran around 1.2 million iterations. Progress was encouraging to some degree, but at some point the BCER started to drop quickly and reached 0.01. Training stopped at that point.
I was excited, thinking that my model was getting about as accurate as it could get. It turns out, though, that it is nowhere near as good as the default (Ray's) best model.

I am fine-tuning by removing the top layer of the network.
This was the command I used to train:
make training MODEL_NAME=amh START_MODEL=amh APPEND_INDEX=5 NET_SPEC='[Lfx256 O1c105]' TESSDATA=../tesseract/tessdata MAX_ITERATIONS=500000 DEBUG_INTERVAL=-1 training >> data/amh.log &

What do you think is going on here?
Do you think I am over-fitting the model?
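
One way I could check for over-fitting is to compare the training BCER with a character error rate measured on lines the model never saw during training. Below is a minimal sketch of that measurement in plain Python. It is not how Tesseract computes BCER internally; the function names `levenshtein` and `cer` are my own, and I am assuming the ground-truth and OCR output lines are already loaded as pairs of strings.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) between two strings."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def cer(pairs) -> float:
    """Character error rate in percent over (ground_truth, ocr_output) pairs."""
    errors = sum(levenshtein(gt, out) for gt, out in pairs)
    chars = sum(len(gt) for gt, _ in pairs)
    return 100.0 * errors / chars if chars else 0.0


if __name__ == "__main__":
    # Hypothetical held-out lines: (ground truth, model output).
    held_out = [("abcd", "abcd"), ("abcd", "abxd")]
    print(f"held-out CER: {cer(held_out):.2f}%")
```

If the held-out figure is much higher than the training BCER, or much higher than what the default model scores on the same lines, that would point toward over-fitting or toward training data that does not resemble my real input.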