Settings for non-language recognition of short codes

Martin Camitz

Aug 21, 2018, 3:27:14 PM
to tesseract-ocr
Hello,

I'm using Tesseract (actually the tesseract.js port) to recognize short, 6-character codes like this one:

F65M0P

What are the optimal settings for this in terms of speed and correctness?

Things to note:

- Language is irrelevant.
- Codes are always 6 characters long, uppercase, and mix digits and letters.
- The font is chosen for its prevalence in OCR contexts.
- The image contains the code and nothing else.
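Since the format is fixed, one cheap safeguard is to reject any OCR result that does not match it. The sketch below is a hypothetical helper, not part of tesseract.js; the pattern assumes the alphabet is exactly A-Z plus 0-9, as described in the list above.

```typescript
// Hypothetical helper: length and alphabet come from the code format
// described above (six characters, uppercase letters and digits).
const CODE_PATTERN = /^[A-Z0-9]{6}$/;

// Remove whitespace, such as the trailing newline OCR output often ends with.
function normalizeOcrText(raw: string): string {
  return raw.replace(/\s+/g, "");
}

// Accept only results that match the expected code format exactly.
function isValidCode(raw: string): boolean {
  return CODE_PATTERN.test(normalizeOcrText(raw));
}

console.log(isValidCode("F65M0P\n")); // true
console.log(isValidCode("F6SM0"));    // false (only five characters)
```

A result that fails the check could trigger a retry (for example with different preprocessing) instead of being accepted silently.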

Tesseract seems quite slow on these codes compared to larger texts, which I suspect is because it tries to force out a dictionary word. It also tends to prefer letters over digits, for example reading 5 as S.

I am unfamiliar with Tesseract and OCR, and you might be unfamiliar with the JS port. I don't think I can train the engine, but I can set options such as language_model_penalty_non_dict_word.
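For reference, a set of Tesseract variables one might experiment with for this case could look like the object below. The variable names are real Tesseract parameters, but the values are untested assumptions, and the language_model_* penalties belong to the legacy (non-LSTM) engine. Whitelist support also depends on the Tesseract version and engine. How the object is handed to tesseract.js varies by release (newer releases have `worker.setParameters`), so check the docs for the version in use.

```typescript
// Assumption: code alphabet is A-Z plus 0-9, as in the question.
const CODE_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";

// Tesseract variables are passed as strings; values here are a starting
// point for experimentation, not tuned settings.
const ocrParams: Record<string, string> = {
  // Only allow characters that can occur in a code.
  tessedit_char_whitelist: CODE_ALPHABET,
  // Page segmentation mode 8: treat the image as a single word.
  tessedit_pageseg_mode: "8",
  // Drop the legacy language model's penalties for non-dictionary words.
  language_model_penalty_non_dict_word: "0",
  language_model_penalty_non_freq_dict_word: "0",
};

console.log(Object.keys(ocrParams).length); // 4
```

Note that the whitelist alone does not settle the S/5 confusion, since both characters are legal in a code; the dictionary penalties are the part aimed at the letter-over-digit bias.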

Thanks for your help.

Martin