to tesseract-ocr
Hello everybody,
I just finished fine-tuning a model by following Ray's tutorial.
I did the following steps:
1. I used tesstrain.sh to create the training data and the starter traineddata. The training data consists of eng.training_text with the ± character added multiple times.
2. I used combine_tessdata to extract eng.lstm from the best eng.traineddata.
3. I used lstmtraining with the extracted eng.lstm and the starter traineddata from step 1 to train the model. This is the end of the training output:
At iteration 1264/3000/3000, mean rms=0.202%, delta=0.003%, BCER train=0.020%, BWER train=0.072%, skip ratio=0.000%, New worst BCER = 0.020 wrote checkpoint.
Finished! Selected model with minimal training error rate (BCER) = 0.017
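For anyone trying to reproduce this, the three steps above can be sketched as command lines. This is only a sketch under assumptions: the `fine_tune_commands` helper, the directory layout, the `plusminus` output prefix, and the 3000-iteration limit are hypothetical and not taken from my actual run. The flags are the ones used in the Tesseract 4 training tutorial. The tutorial's ± example also passes `--old_traineddata`; it is left optional here. The code only builds the argument lists and does not run anything.

```python
from pathlib import Path


def fine_tune_commands(work_dir, best_traineddata, fonts_dir, langdata_dir,
                       tessdata_dir, max_iterations=3000, old_traineddata=None):
    """Build (but do not run) the command lines for the three steps."""
    work = Path(work_dir)
    # Assumed layout: tesstrain.sh writes the starter traineddata and the
    # list of training files below its --output_dir.
    starter = work / "eng" / "eng.traineddata"
    listfile = work / "eng.training_files.txt"
    lstm = work / "eng.lstm"

    # Step 1: render the training data and build the starter traineddata.
    make_data = [
        "tesstrain.sh", "--lang", "eng", "--linedata_only",
        "--noextract_font_properties",
        "--fonts_dir", str(fonts_dir),
        "--langdata_dir", str(langdata_dir),
        "--tessdata_dir", str(tessdata_dir),
        "--output_dir", str(work),
    ]

    # Step 2: extract the LSTM model from the best eng.traineddata.
    extract = ["combine_tessdata", "-e", str(best_traineddata), str(lstm)]

    # Step 3: continue training from the extracted model.
    train = [
        "lstmtraining",
        "--continue_from", str(lstm),
        "--traineddata", str(starter),
        "--train_listfile", str(listfile),
        "--model_output", str(work / "plusminus"),  # hypothetical prefix
        "--max_iterations", str(max_iterations),
    ]
    if old_traineddata is not None:
        # Optional: the tutorial's ± example passes this flag.
        train += ["--old_traineddata", str(old_traineddata)]

    return {"make_data": make_data, "extract": extract, "train": train}


if __name__ == "__main__":
    commands = fine_tune_commands(
        "work", "tessdata_best/eng.traineddata",
        "/usr/share/fonts", "langdata", "tessdata",
    )
    for name, argv in commands.items():
        print(name + ":", " ".join(argv))
```

Printing the lists makes it easy to compare them flag by flag with the commands that were actually run.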
Then I took a screenshot of a text line in the same font I created the training data with and ran tesseract with the finished traineddata. (The text also appears 1:1 in the training data.) This is the text in the image:
New Articles page ± 23 a To Service ~~ a details DC that don't
This is the result with the freshly trained model:
Ne Artic(Tes page = 23 aa To Bervice ww a detHiTs Dc that don lt
When I use the best eng.traineddata model I get this output:
New Articles page = 23 a To Service ~~ a details DC that don't
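To put a number on the difference between the two outputs, a character error rate can be computed against the text in the image. This is a minimal sketch, assuming a plain Levenshtein edit distance divided by the length of the reference text; it is not necessarily the way lstmtraining computes its BCER figure, and the function names are made up for illustration.

```python
def levenshtein(a, b):
    """Edit distance (insertions, deletions, substitutions) between a and b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = cur
    return prev[-1]


def char_error_rate(reference, hypothesis):
    """Edit distance normalised by the length of the reference text."""
    return levenshtein(reference, hypothesis) / len(reference)


# Strings copied from this post.
truth = "New Articles page ± 23 a To Service ~~ a details DC that don't"
tuned = "Ne Artic(Tes page = 23 aa To Bervice ww a detHiTs Dc that don lt"
best = "New Articles page = 23 a To Service ~~ a details DC that don't"

if __name__ == "__main__":
    print("fine-tuned model CER:", round(char_error_rate(truth, tuned), 3))
    print("best model CER:      ", round(char_error_rate(truth, best), 3))
```

The best model differs from the image text only in the ± character, while the fine-tuned model differs in many places.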
Can someone explain why I get such a bad result? The training seems fine. I don't get any error messages. Everything I get back from my "fine tuned" model is absolute crap and way worse than the original one.