Trying to train a new font (LCD screen style), unable to get error rate under 40%


Chaitanya Vermani

Feb 21, 2024, 1:12:37 AM
to tesseract-ocr
I have been trying to retrain Tesseract to read characters on an LCD screen, such as a 0 with a slash and certain forms of V, M, N, and A.
My setup:
Using Tesseract 5.3.2 on a Debian 12 machine for training.
to generate my training text. The training text is based on eng.training_text. I also have the correct eng.traineddata in the tesseract/tessdata folder.
The ground truth is then copied into tesstrain/data. Before that, I run make tesseract-langdata so that the langdata folder exists inside tesstrain.
I use this command: make training MODEL_NAME=lcd, START_MODEL=eng, TESSDATA=
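
A minimal sketch of how the ground-truth folder could be sanity-checked before running the command above. It assumes the usual tesstrain layout, where each line image (.tif or .png) sits next to a .gt.txt transcription with the same base name in data/MODEL_NAME-ground-truth. The function name check_ground_truth and the path data/lcd-ground-truth are hypothetical and are not taken from the setup described here.

```python
from pathlib import Path

# Assumption: line images are stored as .tif or .png next to each .gt.txt file.
IMAGE_SUFFIXES = (".tif", ".png")


def check_ground_truth(folder):
    """Return (paired_count, texts_without_image, empty_texts) for a folder."""
    folder = Path(folder)
    paired = 0
    missing_image = []
    empty_text = []
    for gt in sorted(folder.glob("*.gt.txt")):
        stem = gt.name[: -len(".gt.txt")]
        # A transcription is only usable if a matching line image exists.
        if not any((folder / (stem + suffix)).exists() for suffix in IMAGE_SUFFIXES):
            missing_image.append(gt.name)
            continue
        # An empty or whitespace-only transcription gives nothing to learn from.
        if not gt.read_text(encoding="utf-8").strip():
            empty_text.append(gt.name)
            continue
        paired += 1
    return paired, missing_image, empty_text


if __name__ == "__main__":
    # Hypothetical path for MODEL_NAME=lcd, relative to the tesstrain checkout.
    paired, missing_image, empty_text = check_ground_truth("data/lcd-ground-truth")
    print(f"usable pairs: {paired}")
    print(f"transcriptions without an image: {len(missing_image)}")
    print(f"empty transcriptions: {len(empty_text)}")
```

If the number of usable pairs is much lower than the number of lines generated, the training run is seeing less data than expected, which would be worth ruling out before tuning anything else.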



For my ground truth folder, I have tried different sample sizes, from 1,000 lines to all 195k lines, and trained Tesseract for up to a few thousand iterations. Almost every time, the error rate converges to about 45% and stays there.


Can someone help me reduce this error rate? I have been at it for quite a while and have gone through tessdoc and related material, but I still cannot find a solution. If any information is missing or required, let me know and I will update the post as soon as possible.