tesseract pattern not enforced?


MAtteo Acquarone

Sep 17, 2020, 3:28:50 AM

to tesseract-ocr

Hello,

I'm using Tesseract 4.1.0.0 to OCR a text field on the target that contains codes following a fixed pattern (implemented as a pattern file, in Tesseract terms):
P\n\n\n\n
C\n\n\n\n
B\n\n\n\n
U\n\n\n\n

In practice, each code starts with one letter (P, C, B, or U) followed by 4 hex digits.
The total length is always exactly 5 characters.

So, at least as I intended with this pattern file, correct output would be, for example:
P0123, P2EFD, C12EF, B2BCD and so on.
Running the OCR script thousands of times, I see that the vast majority of the output is as expected, but I also get some results like PPB, PFF3, CC3 and so on.
Is there a way to enforce adherence to the pattern more strictly? My current setup is:
user_patterns_file=C:\Util\Code_OCR.Pattern
tessedit_char_whitelist=PCBU0123456789ABCDEF
tessedit_char_blacklist=abcdefGgHhIiLlMmNnOopQqRrSsTtuVvZzJjYyKkWw-!|
load_system_dawg=F
load_freq_dawg=F
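
To show what I mean by "adherence", here is the kind of check I would expect every result to pass. This is only a minimal sketch run on the OCR output afterwards, not something Tesseract does itself; it assumes the format is exactly one of P, C, B, U followed by four uppercase hex digits, and the name `is_valid_code` is just a hypothetical helper of mine.

```python
import re

# Assumed format: one letter from P, C, B, U, then exactly four uppercase hex digits.
CODE_RE = re.compile(r"[PCBU][0-9A-F]{4}")


def is_valid_code(text: str) -> bool:
    """Return True if the OCR result matches the intended 5-character format."""
    # strip() drops the trailing newline/whitespace that OCR output often carries
    return CODE_RE.fullmatch(text.strip()) is not None


# Sample values taken from the examples in this post
results = ["P0123", "P2EFD", "C12EF", "B2BCD", "PPB", "PFF3", "CC3"]
rejected = [r for r in results if not is_valid_code(r)]
print(rejected)  # ['PPB', 'PFF3', 'CC3']
```

With a check like this I can at least discard or re-run the bad reads, but I would prefer Tesseract not to produce them in the first place.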