LSTM training tesseract OCR high error rate


Mridul Davesar

Mar 12, 2024, 4:43:38 AM
to tesseract-ocr
Hey everyone,
I am training my own LSTM model on some specific images that I want Tesseract to handle well. I used the command
$ lstmtraining --model_output=my_output.lstm --traineddata="C:\Program Files\Tesseract-OCR\tessdata\eng.traineddata" --old_traineddata="C:\Program Files\Tesseract-OCR\tessdata\eng.traineddata" --train_listfile=traindata.txt

but it is giving me a high error rate:
At iteration 40/40/40, Mean rms=5.874000%, delta=47.785000%, BCER train=99.487000%, BWER train=100.000000%, skip ratio=0.000000%,  New worst BCER = 99.487000 wrote checkpoint.

Finished! Selected model with minimal training error rate (BCER) = 99.367

So my question is: what is the reason for this high error rate, given that my files contain ordinary English sentences?
I suspect my custom model is not leveraging the pretrained "eng.lstm" model.
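
For reference, my understanding is that fine-tuning from the pretrained network would look roughly like the sketch below: extract the LSTM component with combine_tessdata, then pass it to lstmtraining via --continue_from. This is only a sketch under assumptions that are not part of my setup above: the tessdata_best directory, the iteration count, and the output name are placeholders, and I am assuming the float model from tessdata_best is needed, since as far as I know the integer models cannot be trained further.

```shell
#!/bin/sh
# Sketch: fine-tune from the pretrained English LSTM instead of starting cold.
# ASSUMPTIONS: TESSDATA points at a copy of the float "best" eng.traineddata,
# and the path contains no spaces (this simple sketch does not quote for that).

TESSDATA="${TESSDATA:-./tessdata_best}"   # hypothetical model directory
LIST="${LIST:-traindata.txt}"             # list of .lstmf files, as in my command

# Step 1: extract the LSTM network from the traineddata file.
EXTRACT="combine_tessdata -e $TESSDATA/eng.traineddata eng.lstm"

# Step 2: continue training from that network (iteration count is a placeholder).
TRAIN="lstmtraining --continue_from=eng.lstm --traineddata=$TESSDATA/eng.traineddata --model_output=my_output --train_listfile=$LIST --max_iterations=400"

# Dry run by default: print both commands. Set RUN=1 to actually execute them.
if [ "${RUN:-0}" = "1" ]; then
  $EXTRACT && $TRAIN
else
  printf '%s\n' "$EXTRACT" "$TRAIN"
fi
```

By default the script only prints the two commands so they can be checked first; running it with RUN=1 executes them.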

Thanks