Fine Tuning


Simon


Jan 23, 2024, 6:26:05 PM

to tesseract-ocr

Hello everybody,

I just finished fine tuning according to Ray's tutorial.

I did the following steps:

1. I used tesstrain.sh to create the training data and the starter traineddata. The training data consists of eng.training_text with the ± character added multiple times.
2. I used combine_tessdata to extract eng.lstm from the best eng.traineddata.
3. I used lstmtraining with the extracted eng.lstm and the starter traineddata from step 1 to train the model.
This is the end of training:
At iteration 1264/3000/3000, mean rms=0.202%, delta=0.003%, BCER train=0.020%, BWER train=0.072%, skip ratio=0.000%, New worst BCER = 0.020 wrote checkpoint.
Finished! Selected model with minimal training error rate (BCER) = 0.017
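For reference, the three steps above can be sketched as command lines. This is only a sketch, not the exact commands I ran: every path and file name below is a hypothetical placeholder, and the helper function names are made up for illustration. The `--old_traineddata` flag is included because the lstmtraining documentation uses it when the new unicharset differs from the one in the base model (as it would after adding ±); whether it applies here is an assumption. The final `--stop_training` call, which converts a checkpoint into a traineddata file, is likewise assumed.

```python
import shlex

# Hypothetical placeholder paths; none of these come from the original post.
BEST_MODEL = "tessdata_best/eng.traineddata"   # published "best" model
STARTER = "train/eng/eng.traineddata"          # starter traineddata from tesstrain.sh
EXTRACTED_LSTM = "extracted/eng.lstm"          # output of combine_tessdata -e
LISTFILE = "train/eng.training_files.txt"      # list of .lstmf files from tesstrain.sh
OUTPUT_PREFIX = "output/plusminus"             # prefix for checkpoints


def extract_lstm_cmd(best_traineddata, lstm_out):
    """Step 2: pull the LSTM component out of an existing traineddata file."""
    return ["combine_tessdata", "-e", best_traineddata, lstm_out]


def finetune_cmd(lstm, starter, old_traineddata, listfile, model_output, max_iterations):
    """Step 3: continue training from the extracted LSTM model."""
    return [
        "lstmtraining",
        "--continue_from", lstm,
        "--traineddata", starter,
        # Assumption: needed so the old character set can be mapped to the new one.
        "--old_traineddata", old_traineddata,
        "--train_listfile", listfile,
        "--model_output", model_output,
        "--max_iterations", str(max_iterations),
    ]


def finalize_cmd(checkpoint, starter, traineddata_out):
    """Assumed last step: convert a checkpoint into a usable traineddata file."""
    return [
        "lstmtraining",
        "--stop_training",
        "--continue_from", checkpoint,
        "--traineddata", starter,
        "--model_output", traineddata_out,
    ]


if __name__ == "__main__":
    commands = [
        extract_lstm_cmd(BEST_MODEL, EXTRACTED_LSTM),
        finetune_cmd(EXTRACTED_LSTM, STARTER, BEST_MODEL, LISTFILE, OUTPUT_PREFIX, 3000),
        finalize_cmd(OUTPUT_PREFIX + "_checkpoint", STARTER, "output/plusminus.traineddata"),
    ]
    # Print the commands instead of running them, so they can be reviewed first.
    for cmd in commands:
        print(shlex.join(cmd))
```

The script only prints the commands, so they can be compared against what was actually run before anything is executed.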
Then I took a screenshot of a text line in the same font I created the training data with and ran tesseract on it with the finished traineddata (the text also appears 1:1 in the training data).
This is the text in the image:
New Articles page ± 23 a To Service ~~ a details DC that don't

This is the result with the freshly trained model:
Ne Artic(Tes page = 23 aa To Bervice ww a detHiTs Dc that don lt

When I use the best eng.traineddata model I get this output:
New Articles page = 23 a To Service ~~ a details DC that don't
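To put a rough number on the difference, the two outputs can be scored against the expected text with a character error rate (edit distance divided by the length of the reference). This is a minimal sketch under my own assumptions: the helper names `edit_distance` and `cer` are made up here, and this is not necessarily the same BCER computation that lstmtraining reports.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = cur
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate of hypothesis relative to reference."""
    return edit_distance(reference, hypothesis) / len(reference)


# The three strings are copied from this post.
TRUTH = "New Articles page ± 23 a To Service ~~ a details DC that don't"
FINE_TUNED = "Ne Artic(Tes page = 23 aa To Bervice ww a detHiTs Dc that don lt"
BEST = "New Articles page = 23 a To Service ~~ a details DC that don't"

if __name__ == "__main__":
    for name, text in [("fine-tuned", FINE_TUNED), ("best eng", BEST)]:
        print(f"{name}: distance={edit_distance(TRUTH, text)}, CER={cer(TRUTH, text):.3f}")
```

The best model differs from the expected text only in the ± character, while the fine-tuned model has errors spread across the whole line.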

Can someone explain why I get such a bad result? The training seems fine. I don't get any error messages. Everything I get back from my "fine tuned" model is absolute crap and way worse than the original one.
