I have been trying to retrain Tesseract to read characters on an LCD screen, such as a slashed zero and certain forms of V, M, N, and A.
My setup:
I am using Tesseract 5.3.2 on a Debian 12 machine for training.
to generate my training text. The training text is based on eng.training_text. I also have the correct eng.traineddata in the tesseract/tessdata folder.
The ground truth is then copied into tesstrain/data, and I run make tesseract-langdata first so that the langdata folder exists inside tesstrain.
I then train with this command: make training MODEL_NAME=lcd START_MODEL=eng TESSDATA=
For my ground truth folder, I have tried different sample sizes, from 1000 lines to all 195k lines, and trained Tesseract for up to a few thousand iterations. The error rate almost always converges to 45% and stays there.
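In case it helps rule out a data problem, here is a minimal sketch of a sanity check on the ground truth folder. It assumes the tesstrain convention of pairing each line image (foo.tif or foo.png) with a transcription named foo.gt.txt; the function name count_unpaired and the example path are only illustrative, and images with double extensions such as .bin.png are not handled.

```shell
#!/usr/bin/env bash
# Count line images in a ground-truth directory that have no usable transcription.
# Assumption: each foo.tif / foo.png should be paired with a non-empty foo.gt.txt.
count_unpaired() {
  local dir="$1" missing=0 img
  for img in "$dir"/*.tif "$dir"/*.png; do
    [ -e "$img" ] || continue   # skip a glob pattern that matched nothing
    # -s is true only if the file exists and is not empty
    [ -s "${img%.*}.gt.txt" ] || missing=$((missing + 1))
  done
  echo "$missing"
}
```

It could be called as count_unpaired data/lcd-ground-truth (a hypothetical path, to be adjusted to wherever the ground truth actually lives). A non-zero result would mean some images have a missing or empty transcription.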
Can someone help me reduce this error rate? I have been at it for quite a while, going through the Tesseract docs and related material, but I am still unable to find a solution. If any information is missing or required, let me know and I'll update ASAP.