Hello everybody,
I want to recognize a string made up of a random combination of digits and letters, printed in the OCR-A font. I did not have any luck with that, so I tried to recognize reference texts in the same font and was surprised: the "normal" text was recognized without a problem, while my desired string is misread. Searching the web led me to disable, in the config file, the dictionaries Tesseract uses to match common words. Unfortunately, that didn't help either.
Furthermore, I tried the same text in a different font (Courier New) and was able to detect my desired string, so I assume the problem has something to do with the font.
String I want to detect: 0300FY9N457
My machine runs Windows 10, and the output of "tesseract --version" is:
tesseract v5.0.0-alpha.20191030
leptonica-1.78.0
libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.5.3) : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.0
Found AVX2
Found AVX
Found FMA
Found SSE
Found libarchive 3.3.2 zlib/1.2.11 liblzma/5.2.3 bz2lib/1.0.6 liblz4/1.7.5
Attached are the input and output files, as well as the config file and the generated .tif.
The command line used to generate the output file was: tesseract OCRA_Reference.jpg test [PATH]\config.txt
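For anyone who wants to reproduce this without opening the attachments, here is a minimal Python sketch of the setup described above. The config contents are an assumption (the attached config.txt may differ): load_system_dawg and load_freq_dawg are the Tesseract variables commonly set to 0 to switch off the dictionaries. The temporary directory and the helper names write_config and build_command are hypothetical and only stand in for my real config path.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

# Assumed config contents: these two Tesseract variables switch off the
# word-list dictionaries. The attached config.txt may differ.
CONFIG_LINES = ["load_system_dawg 0", "load_freq_dawg 0"]


def write_config(directory):
    """Write a config file with the dictionaries disabled and return its path."""
    config_path = Path(directory) / "config.txt"
    config_path.write_text("\n".join(CONFIG_LINES) + "\n", encoding="ascii")
    return config_path


def build_command(image, output_base, config_path):
    """Build the call in the form: tesseract <image> <outputbase> <configfile>."""
    return ["tesseract", str(image), str(output_base), str(config_path)]


if __name__ == "__main__":
    # Hypothetical working directory; it stands in for the real config location.
    with tempfile.TemporaryDirectory() as workdir:
        config = write_config(workdir)
        command = build_command("OCRA_Reference.jpg", "test", config)
        print(" ".join(command))
        # Only run OCR if the tesseract binary and the input image are available.
        if shutil.which("tesseract") and Path("OCRA_Reference.jpg").exists():
            subprocess.run(command, check=True)
```

Run from the folder containing OCRA_Reference.jpg, this should write the recognized text to test.txt, just like the command line above.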
All kinds of help/suggestions are much appreciated.
Best regards
Alex