Problems recognizing a specific string in the OCR A font


Alexander Bartl

Nov 7, 2019, 12:15:34 PM
to tesseract-ocr
Hello everybody, 

I want to recognize a random alphanumeric string (a mix of digits and letters) printed in the OCR A font. I had no luck with that, so I tried recognizing reference texts in the same font and was surprised: the "normal" text was recognized without a problem, while my target string was misread. Searching the web led me to disable, in the config file, the dictionaries Tesseract uses to match common words. Unfortunately, that didn't help either.
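For reference, "disabling the dictionaries" is commonly done by setting the `load_system_dawg` and `load_freq_dawg` parameters to false in the config file. The minimal Python sketch below writes such a file. It assumes those two parameters are the relevant ones for this setup (the attached config.txt may differ), and the optional `tessedit_char_whitelist` line is an added assumption whose effect can depend on the Tesseract version and engine. The helper names `config_text` and `write_config` are hypothetical.

```python
from pathlib import Path
from typing import Optional

# Tesseract config files are plain text with one "parameter value" pair per line.
# Setting these two to F (false) asks Tesseract not to load these word lists.
DICT_OFF = {
    "load_system_dawg": "F",  # main word dictionary
    "load_freq_dawg": "F",    # frequent-words dictionary
}


def config_text(whitelist: Optional[str] = None) -> str:
    """Return config-file contents; the whitelist line is an optional, assumed extra."""
    params = dict(DICT_OFF)
    if whitelist:
        params["tessedit_char_whitelist"] = whitelist
    return "".join(f"{name} {value}\n" for name, value in params.items())


def write_config(path: str, whitelist: Optional[str] = None) -> Path:
    """Write the config to `path`, e.g. the config.txt passed on the command line."""
    p = Path(path)
    p.write_text(config_text(whitelist), encoding="ascii")
    return p
```

For example, `write_config("config.txt", "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ")` would limit output to digits and capital letters, which covers 0300FY9N457. That restriction only makes sense for images that contain nothing but such codes, since ordinary lowercase text would be forced into the whitelisted characters.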

Furthermore, I tried the same text in a different font (Courier New), and there my string was recognized correctly. So I assume the problem is related to the font.

String I want to detect: 0300FY9N457

I'm running Windows 10, and the output of "tesseract --version" is:
tesseract v5.0.0-alpha.20191030
 leptonica-1.78.0
  libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.5.3) : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.0
 Found AVX2
 Found AVX
 Found FMA
 Found SSE
 Found libarchive 3.3.2 zlib/1.2.11 liblzma/5.2.3 bz2lib/1.0.6 liblz4/1.7.5


Attached are the input and output files, the config file, and the generated .tif.

The command line used to generate the output file was: tesseract OCRA_Reference.jpg test [PATH]\config.txt
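To see exactly which characters go wrong rather than comparing by eye, a small Python sketch can rerun the same command and report the differing positions. It assumes `tesseract` is on the PATH and that, as with the command above, the text ends up in `test.txt` (Tesseract appends `.txt` to the output base name). The helpers `run_tesseract` and `mismatches` are hypothetical names for illustration.

```python
import subprocess
from pathlib import Path
from typing import List, Tuple

EXPECTED = "0300FY9N457"


def run_tesseract(image: str, outbase: str, config: str) -> str:
    """Run `tesseract <image> <outbase> <config>` and return the text it wrote."""
    subprocess.run(["tesseract", image, outbase, config], check=True)
    # Tesseract writes plain-text output to <outbase>.txt.
    return Path(outbase + ".txt").read_text(encoding="utf-8")


def mismatches(expected: str, recognized: str) -> List[Tuple[int, str, str]]:
    """List (position, expected char, recognized char) wherever the two differ.

    Whitespace in the recognized text is removed first. Positions past the end
    of the shorter string are reported with "" for the missing side.
    """
    got = "".join(recognized.split())
    diffs = []
    for i in range(max(len(expected), len(got))):
        want = expected[i] if i < len(expected) else ""
        have = got[i] if i < len(got) else ""
        if want != have:
            diffs.append((i, want, have))
    return diffs
```

Since the reference image also contains the "normal" text, `mismatches` should be given only the output line that is supposed to hold the code, e.g. `mismatches(EXPECTED, line)`. The comparison is position by position, so a dropped or extra character shifts every later position; in that case an edit distance would be more informative, but for character swaps such as 0 read as O it shows the problem directly.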


Any help or suggestions are much appreciated.
Best regards 
Alex
config.txt
OCRA_Reference.JPG
tessinput.tif
test.txt