Problems recognizing certain string using font OCR A


Alexander Bartl

Nov 7, 2019, 12:15:34 PM

to tesseract-ocr

Hello everybody,

I want to recognize a string made up of a random combination of digits and letters, printed in the OCR A font. I did not have any luck with that, so I tried recognizing reference texts in the same font and was surprised: the "normal" text was recognized without a problem, while my desired string was misread. Searching the web led me to disable, in the config file, the dictionaries Tesseract uses to match common words. Unfortunately, that didn't help either.

Furthermore, I tried the same text in a different font (Courier New), and there my desired string was recognized correctly. So I assume the problem has something to do with the font.

String I want to detect: 0300FY9N457

My machine is Win10 and the output of "tesseract --version" is:

tesseract v5.0.0-alpha.20191030

leptonica-1.78.0

libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.5.3) : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.0

Found AVX2

Found AVX

Found FMA

Found SSE

Found libarchive 3.3.2 zlib/1.2.11 liblzma/5.2.3 bz2lib/1.0.6 liblz4/1.7.5

Attached are the input and output files, as well as the config file and the generated .tif.

The command line used to generate the output file was: tesseract OCRA_Reference.jpg test [PATH]\config.txt
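
For comparison, here is a minimal sketch of how the same call could be assembled with the dictionary switches passed directly through Tesseract's `-c VAR=VALUE` option instead of a config file. It assumes that "disabling the dictionaries" means setting the parameters `load_system_dawg` and `load_freq_dawg` to 0; the contents of the attached config.txt are not reproduced here, so this may differ from what it actually sets. The helper name `build_tesseract_command` and the optional `psm` argument are hypothetical and only for illustration.

```python
from typing import List, Optional


def build_tesseract_command(image: str, output_base: str,
                            disable_dictionaries: bool = True,
                            psm: Optional[int] = None) -> List[str]:
    """Assemble a tesseract CLI call as an argument list (hypothetical helper)."""
    cmd = ["tesseract", image, output_base]
    if psm is not None:
        # Optional page segmentation mode, e.g. 7 = treat the image as a single text line.
        cmd += ["--psm", str(psm)]
    if disable_dictionaries:
        # Assumption: these two parameters are the dictionary switches meant above.
        cmd += ["-c", "load_system_dawg=0", "-c", "load_freq_dawg=0"]
    return cmd


if __name__ == "__main__":
    cmd = build_tesseract_command("OCRA_Reference.jpg", "test")
    print(" ".join(cmd))
    # To actually run it where tesseract is on PATH:
    #   import subprocess; subprocess.run(cmd, check=True)
```

Passing the parameters on the command line makes it easy to confirm that they are really being applied, independent of the path to the config file. A page segmentation mode such as `--psm 7` would only make sense if the image were cropped down to the single line containing the string.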

All kinds of help/suggestions are much appreciated.
Best regards
Alex

config.txt

OCRA_Reference.JPG

tessinput.tif

test.txt
