LSTM training tesseract OCR high error rate

Mridul Davesar

Mar 12, 2024, 4:43:38 AM

to tesseract-ocr

Hey everyone,

I am training my own LSTM model on some specific images that I want Tesseract to recognize well. I used this command:

$ lstmtraining --model_output=my_output.lstm --traineddata="C:\Program Files\Tesseract-OCR\tessdata\eng.traineddata" --old_traineddata="C:\Program Files\Tesseract-OCR\tessdata\eng.traineddata" --train_listfile=traindata.txt

but it gives a very high error rate:

At iteration 40/40/40, Mean rms=5.874000%, delta=47.785000%, BCER train=99.487000%, BWER train=100.000000%, skip ratio=0.000000%, New worst BCER = 99.487000 wrote checkpoint.

Finished! Selected model with minimal training error rate (BCER) = 99.367

So my question is: what is causing this high error rate, given that my file contains normal English sentences?

I suspect my custom model is not leveraging the pretrained "eng.lstm" model.
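If that suspicion is right, note that the command above never passes `--continue_from`, which is the `lstmtraining` flag that tells it which existing LSTM model to start from. Fine-tuning usually first extracts the LSTM component with `combine_tessdata -e` and then points `--continue_from` at it. The sketch below only builds and runs those two commands. It rests on several assumptions: the working directory, the `max_iterations` value of 400, and the helper names `finetune_commands`/`run_finetune` are all hypothetical, and the base `.traineddata` must be a float model (for example from `tessdata_best`), because integer `tessdata_fast` models cannot be fine-tuned.

```python
import subprocess
from pathlib import Path


def finetune_commands(base_traineddata, train_listfile, model_output="my_output",
                      max_iterations=400, workdir="."):
    """Build (but do not run) the commands for fine-tuning from a pretrained model.

    Hypothetical helper. Assumes base_traineddata is a float (tessdata_best)
    model; max_iterations=400 is an illustrative value, not a recommendation.
    """
    base = Path(base_traineddata)
    # Extract into a writable work dir (e.g. not C:\Program Files); eng.traineddata -> eng.lstm
    lstm_file = Path(workdir) / (base.stem + ".lstm")
    extract = ["combine_tessdata", "-e", str(base), str(lstm_file)]
    train = [
        "lstmtraining",
        f"--continue_from={lstm_file}",   # start from the pretrained LSTM weights
        f"--traineddata={base}",          # same charset/unicharset as the base model
        f"--model_output={model_output}",  # base name for checkpoints
        f"--train_listfile={train_listfile}",
        f"--max_iterations={max_iterations}",
    ]
    return extract, train


def run_finetune(*args, **kwargs):
    """Run both commands in order; raises CalledProcessError if either fails."""
    for cmd in finetune_commands(*args, **kwargs):
        subprocess.run(cmd, check=True)
```

For example, `run_finetune("tessdata_best/eng.traineddata", "traindata.txt")` would extract `eng.lstm` into the current directory and then continue training from it. This requires the Tesseract training tools on `PATH`, and the `tessdata_best` path is an assumption.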