Fine Tuning

Simon

Jan 23, 2024, 6:26:05 PM
to tesseract-ocr
Hello everybody, 

I just finished fine-tuning according to Ray's tutorial.

I took the following steps:
  1. I used tesstrain.sh to create the training data and the starter traineddata. The training text is eng.training_text with the ± character added multiple times.

  2. I used combine_tessdata to extract eng.lstm from the best eng.traineddata.

  3. I used lstmtraining with the extracted eng.lstm and the starter traineddata from step 1 to train the model.
    This is the end of the training output:
    At iteration 1264/3000/3000, mean rms=0.202%, delta=0.003%, BCER train=0.020%, BWER train=0.072%, skip ratio=0.000%, New worst BCER = 0.020 wrote checkpoint.
    Finished! Selected model with minimal training error rate (BCER) = 0.017

  4. Then I took a screenshot of a text line in the same font I created the training data with and ran tesseract with the finished traineddata. (The text is also 1:1 in the training data.)
    This is the text in the image:
    New Articles page ± 23 a To Service ~~ a details DC that don't

    This is the result with the freshly trained model:
    Ne Artic(Tes page = 23 aa To Bervice ww a detHiTs Dc that don lt

    When I use the best eng.traineddata model I get this output:
    New Articles page = 23 a To Service ~~ a details DC that don't
    Can someone explain why I get such a bad result? The training seems fine, and I don't get any error messages. Yet everything I get back from my "fine-tuned" model is garbage and far worse than the output of the original model.
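
To put a number on "way worse", here is a minimal sketch that computes a character error rate (edit distance divided by the length of the reference text) for the two outputs quoted above. The helper names `edit_distance` and `cer` are made up for this illustration, and this is not necessarily the same calculation as the BCER that lstmtraining reports; it is only a rough way to compare the two results.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate relative to the reference text."""
    return edit_distance(reference, hypothesis) / len(reference)


# Strings copied from the post above.
truth = "New Articles page ± 23 a To Service ~~ a details DC that don't"
fine_tuned = "Ne Artic(Tes page = 23 aa To Bervice ww a detHiTs Dc that don lt"
best = "New Articles page = 23 a To Service ~~ a details DC that don't"

for name, text in [("fine-tuned", fine_tuned), ("best eng", best)]:
    print(f"{name}: CER = {cer(truth, text):.1%}")
```

The best eng model differs from the image text by a single substitution (± read as =), while the fine-tuned output needs many more edits, which is hard to reconcile with the 0.017 BCER reported at the end of training.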