Help with recognition please


Will Fetherolf

Oct 7, 2024, 9:33:12 PM
to tesseract-ocr
The application I'm attempting to OCR uses what I think is Arial for the font, but every time I run the attached image through Tesseract 5.4.0 on Windows I get "NVA" or "NIA", depending on which page segmentation mode (PSM) I use. With PSM 7, I always get back "NIA". I have tried training on a variety of data captured from my application, with no success.

Help me, Obi-Wan Kenobi, you're my only hope!
image.20241004100212.68.bmp
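
For anyone wanting to reproduce the PSM dependence described above, here is a minimal sketch of a sweep over a few page segmentation modes through the Tesseract command line. The helper names `build_tesseract_cmd` and `sweep_psm`, the choice of modes, and the `eng` language default are assumptions for illustration, not anything taken from the original setup.

```python
import shutil
import subprocess


def build_tesseract_cmd(image_path, psm, lang="eng"):
    """Build the argument list for one Tesseract CLI run that prints to stdout."""
    return ["tesseract", image_path, "stdout", "--psm", str(psm), "-l", lang]


def sweep_psm(image_path, modes=(6, 7, 8, 13)):
    """Run Tesseract once per PSM and return {psm: recognized text}.

    Modes used here: 6 = uniform block, 7 = single line,
    8 = single word, 13 = raw line.
    """
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    results = {}
    for psm in modes:
        proc = subprocess.run(
            build_tesseract_cmd(image_path, psm),
            capture_output=True,
            text=True,
            check=False,  # keep going even if one mode fails
        )
        results[psm] = proc.stdout.strip()
    return results
```

Calling `sweep_psm("image.20241004100212.68.bmp")` would return one result per mode, which makes it easier to compare outputs side by side after each preprocessing change.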

Art Rhyno

Oct 8, 2024, 11:12:51 AM
to tesser...@googlegroups.com

You could try resizing the image with ImageMagick, something like:

convert test.bmp -resize 200% test.png

That seems to be enough to separate out the "N" and the "/".

art


Will Fetherolf

Oct 8, 2024, 11:15:49 AM
to tesseract-ocr
I'll look into that.  I've got code in my automation system that blows up the image, but it's not doing any kind of smoothing.  I might have to code up a "nicer" image blowup function.
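
As a rough illustration of what a "nicer" blowup could look like, here is a minimal pure-Python sketch of bilinear upscaling on a grayscale pixel grid. The function name `upscale_bilinear` and the list-of-rows image layout are assumptions made for this example; in practice an imaging library or ImageMagick would do this far faster, and there is no guarantee that bilinear smoothing alone fixes the recognition.

```python
def upscale_bilinear(pixels, factor):
    """Upscale a 2D grid of grayscale values (list of rows) by an integer factor.

    Each output pixel is a weighted average of the four nearest source
    pixels, which smooths edges instead of producing hard blocks.
    """
    src_h, src_w = len(pixels), len(pixels[0])
    out = []
    for y in range(src_h * factor):
        # Map the centre of the output pixel back into source coordinates,
        # clamping so the borders reuse the edge pixels.
        sy = min(max((y + 0.5) / factor - 0.5, 0.0), src_h - 1)
        y0 = int(sy)
        y1 = min(y0 + 1, src_h - 1)
        fy = sy - y0
        row = []
        for x in range(src_w * factor):
            sx = min(max((x + 0.5) / factor - 0.5, 0.0), src_w - 1)
            x0 = int(sx)
            x1 = min(x0 + 1, src_w - 1)
            fx = sx - x0
            # Blend horizontally on the two rows, then vertically.
            top = pixels[y0][x0] * (1 - fx) + pixels[y0][x1] * fx
            bottom = pixels[y1][x0] * (1 - fx) + pixels[y1][x1] * fx
            row.append(round(top * (1 - fy) + bottom * fy))
        out.append(row)
    return out
```

A hard black-to-white edge such as `[[0, 255]]` becomes a gradual ramp when doubled, where a plain pixel-repeat blowup would keep the edge blocky.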

Pankaj Duggal

Oct 8, 2024, 11:16:17 AM
to tesser...@googlegroups.com

Will Fetherolf

Oct 9, 2024, 12:13:16 PM
to tesseract-ocr
Different interpolation methods for the magnification gave me different results, but none of them got Tesseract to return the "/" character.
Magnifying the image by 200% using Box, Triangle, or Catmull-Rom interpolation gave me "NIA". Using Mitchell, I got "NVA". The Cubic B-Spline result was too fuzzy for Tesseract to recognize any of the characters.
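
One way to see why the three cubic filters above behave so differently: Cubic B-Spline, Mitchell, and Catmull-Rom are usually defined as members of the same Mitchell-Netravali family, differing only in two parameters (B, C). The sketch below evaluates that kernel, assuming the tool in use follows the standard parametrisation; the function and dictionary names are made up for this example.

```python
def mitchell_netravali(x, b, c):
    """Weight of a source sample at distance x (in pixels) for the (B, C) cubic filter."""
    x = abs(x)
    if x < 1:
        return ((12 - 9 * b - 6 * c) * x ** 3
                + (-18 + 12 * b + 6 * c) * x ** 2
                + (6 - 2 * b)) / 6
    if x < 2:
        return ((-b - 6 * c) * x ** 3
                + (6 * b + 30 * c) * x ** 2
                + (-12 * b - 48 * c) * x
                + (8 * b + 24 * c)) / 6
    return 0.0


# Standard (B, C) values for the three named cubic filters.
CUBIC_FILTERS = {
    "Cubic B-Spline": (1.0, 0.0),
    "Mitchell": (1 / 3, 1 / 3),
    "Catmull-Rom": (0.0, 0.5),
}

for name, (b, c) in CUBIC_FILTERS.items():
    centre = mitchell_netravali(0.0, b, c)
    neighbour = mitchell_netravali(1.0, b, c)
    print(f"{name}: centre weight {centre:.3f}, neighbour weight {neighbour:.3f}")
```

Catmull-Rom gives the centre sample full weight and its direct neighbours none, so it keeps edges sharp. The B-Spline spreads a third of the weight onto the neighbours, which fits the "too fuzzy" result, and Mitchell sits in between.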

Does anyone have any further ideas?  I wish there was a way to tell Tesseract to ignore font embellishments, such as italics or underlining.

Will Fetherolf

Oct 9, 2024, 12:19:02 PM
to tesseract-ocr
I also understand that part of the problem is the kerning used by the TrueType fonts, and I do not have the ability to get the application switched to a monospaced font. If I could, this would be easy.