Settings for non-language recognition of short codes

Martin Camitz

Aug 21, 2018, 3:27:14 PM

to tesseract-ocr

Hello,

I'm using Tesseract (actually the tesseract.js port) to recognize short 6-character codes like this:

F65M0P

What are the optimal settings for this in terms of speed and correctness?

Things to note:

- Language is irrelevant.

- Codes are always 6 characters long, uppercase, both digits and letters.

- The font is chosen for its prevalence in OCR contexts.

- The image consists of the code and nothing else.
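
To make the format above concrete, here is a minimal sketch of a check that could be run on whatever text the OCR returns. The helper name `isValidCode` is made up for illustration and is not part of tesseract.js; it only encodes the rules listed above (exactly 6 characters, uppercase letters and digits).

```javascript
// Format stated above: exactly 6 characters, uppercase letters and digits only.
const CODE_PATTERN = /^[A-Z0-9]{6}$/;

// Hypothetical helper (not a tesseract.js function): trims surrounding
// whitespace from the raw OCR text and checks it against the format.
function isValidCode(rawText) {
  return CODE_PATTERN.test(String(rawText).trim());
}
```

A result that fails this check could be rejected or retried instead of being accepted as a code.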

I find Tesseract quite slow on these images compared to larger texts, which I suspect is because it tries to force out a dictionary word. It also tends to prefer letters over digits, for example S instead of 5.

I am unfamiliar with Tesseract and OCR, and you might be unfamiliar with the JS port. I don't think I can train the engine, but I can set options like language_model_penalty_non_dict_word.
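
As a rough sketch of the kind of options I mean: `tessedit_char_whitelist` and `tessedit_pageseg_mode` are parameter names from Tesseract itself, where page segmentation mode 8 treats the image as a single word. Whether the tesseract.js port forwards them unchanged, and whether they actually help with speed or accuracy here, is an assumption on my part. The function name `buildCodeOptions` is made up for illustration.

```javascript
// Characters the codes may contain: uppercase letters and digits.
const CODE_CHARS = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789';

// Hypothetical helper that collects candidate Tesseract parameters.
// Assumption: the port accepts these names and string values as-is.
function buildCodeOptions() {
  return {
    // Restrict recognition to the characters that can appear in a code.
    tessedit_char_whitelist: CODE_CHARS,
    // Tesseract page segmentation mode 8: treat the image as a single word.
    tessedit_pageseg_mode: '8',
  };
}
```

The resulting object would be passed wherever the port accepts recognition options.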

Thanks for your help.

Martin
