Hello,
I'm using Tesseract (actually the tesseract.js port) to recognize short 6-character codes like this one:
F65M0P
What are the optimal settings for this in terms of speed and correctness?
Things to note:
- Language is irrelevant.
- Codes are always exactly 6 characters long and consist of uppercase letters and digits.
- The font was chosen for its prevalence in OCR contexts.
- Each image contains the code and nothing else.
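Because the format is this strict, a small check on the OCR result can reject malformed reads (for example, to retry or flag them) before they are used. This is a minimal TypeScript sketch that encodes the constraints listed above. `CODE_PATTERN` and `isValidCode` are hypothetical names, not part of tesseract.js.

```typescript
// Hypothetical helper: the code format described above is
// exactly 6 characters, uppercase letters A-Z and digits 0-9.
const CODE_PATTERN = /^[A-Z0-9]{6}$/;

function isValidCode(text: string): boolean {
  // OCR output may carry surrounding whitespace or a trailing newline,
  // so trim before matching against the strict pattern.
  return CODE_PATTERN.test(text.trim());
}

console.log(isValidCode("F65M0P\n")); // true
console.log(isValidCode("F65M0"));    // false (only 5 characters)
console.log(isValidCode("f65m0p"));   // false (lowercase)
```

Note that a check like this cannot catch an S read in place of a 5, since both are valid characters; it only filters out results with the wrong length or unexpected characters.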
Tesseract seems quite slow on these images compared to larger texts, which I suspect is because it tries to force out a dictionary word. It also tends to prefer letters over digits, for example reading S instead of 5.
I am unfamiliar with Tesseract and OCR, and you might be unfamiliar with the JS port. I don't think I can train the engine, but I can set options like language_model_penalty_non_dict_word.
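For illustration, a candidate parameter set for this kind of input might look like the sketch below. The keys are standard Tesseract variable names, but the chosen values are assumptions to experiment with, not verified optimal settings. It also assumes the Tesseract build behind your tesseract.js version honors `tessedit_char_whitelist` (support has differed between engine versions). In tesseract.js v2 and later, an object like this can be passed to `worker.setParameters(...)`; older versions accepted such variables in the options of `recognize`. Check the documentation for the version in use. `codeParams` is a hypothetical name.

```typescript
// Hypothetical parameter set for recognizing a single 6-character code.
// Keys are Tesseract variable names; values are passed as strings.
const LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
const DIGITS = "0123456789";

const codeParams: Record<string, string> = {
  // Only allow uppercase letters and digits in the output.
  tessedit_char_whitelist: LETTERS + DIGITS,
  // Page segmentation mode 8: treat the image as a single word,
  // which skips layout analysis for an image holding only the code.
  tessedit_pageseg_mode: "8",
  // Assumption: removing the penalty for non-dictionary words should
  // reduce the bias toward dictionary-like readings of the code.
  language_model_penalty_non_dict_word: "0",
};

console.log(codeParams.tessedit_char_whitelist.length); // 36
```

Mode 7 ("single text line") is a plausible alternative to mode 8 if the code is ever split into several pieces; comparing both on sample images is the simplest way to choose.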
Thanks for your help.
Martin