Help with recognition please

Will Fetherolf

Oct 7, 2024, 9:33:12 PM

to tesseract-ocr

The application I'm attempting to OCR uses what I think is Arial, but every time I run the attached image through Tesseract 5.4.0 on Windows I get "NVA" or "NIA" instead of "N/A", depending on which PSM (page segmentation mode) I use. With PSM 7 I always get "NIA". I have tried training on a variety of data captured from my application, with no success.

Help me, Obi-Wan Kenobi, you're my only hope!

image.20241004100212.68.bmp

Art Rhyno

Oct 8, 2024, 11:12:51 AM

to tesser...@googlegroups.com

You could try resizing the image with ImageMagick, something like:

convert test.bmp -resize 200% test.png

That seems to be enough to separate out the “N” and the “/”.
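The resize step can be scripted together with the OCR call so that each variant is easy to re-test. Below is a minimal Python sketch, assuming ImageMagick and Tesseract are installed and on PATH. `-filter`, `-resize`, `stdout` and `--psm` are standard options of those tools, while the helper names are hypothetical. On Windows, `convert` may resolve to the unrelated system `convert.exe`, so with ImageMagick 7 pass `program="magick"`.

```python
import shutil
import subprocess


def resize_command(src, dst, percent=200, filter_name=None, program="convert"):
    """Build an ImageMagick command like `convert test.bmp -resize 200% test.png`."""
    cmd = [program, src]
    if filter_name is not None:
        # -filter is a setting, so it must come before the -resize it should affect.
        cmd += ["-filter", filter_name]
    cmd += ["-resize", f"{percent}%", dst]
    return cmd


def ocr_command(image, psm=7):
    """Build a Tesseract command that writes the recognized text to stdout."""
    return ["tesseract", image, "stdout", "--psm", str(psm)]


def resize_and_ocr(src, dst, percent=200, psm=7, filter_name=None, program="convert"):
    """Upscale src into dst, then OCR dst (hypothetical helper; tools must be on PATH)."""
    for tool in (program, "tesseract"):
        if shutil.which(tool) is None:
            raise FileNotFoundError(f"{tool} was not found on PATH")
    subprocess.run(resize_command(src, dst, percent, filter_name, program), check=True)
    result = subprocess.run(ocr_command(dst, psm), check=True,
                            capture_output=True, text=True)
    return result.stdout.strip()
```

For example, `resize_and_ocr("test.bmp", "test.png", psm=7)` runs the command above and then Tesseract in single-line mode; changing `psm` or `filter_name` makes it quick to compare variants.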

art

Will Fetherolf

Oct 8, 2024, 11:15:49 AM

to tesseract-ocr

I'll look into that. I've got code in my automation system that blows up the image, but it's not doing any kind of smoothing. I might have to code up a "nicer" image blowup function.
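An enlargement with no smoothing behaves like nearest-neighbour replication, which keeps hard, blocky edges. A "nicer" version interpolates between source pixels. Here is a minimal pure-Python sketch contrasting the two, assuming an 8-bit grayscale image stored as a list of rows and an integer upscaling factor. The function names and the pixel-centre alignment convention are assumptions for illustration, not taken from the thread.

```python
def upscale_nearest(img, factor):
    """Replicate each source pixel into a factor x factor block (no smoothing)."""
    h, w = len(img), len(img[0])
    return [[img[y // factor][x // factor] for x in range(w * factor)]
            for y in range(h * factor)]


def _source_coord(i, factor, size):
    """Map output index i back to a clamped source coordinate (pixel-centre aligned)."""
    s = (i + 0.5) / factor - 0.5
    return min(max(s, 0.0), size - 1)


def upscale_bilinear(img, factor):
    """Interpolate linearly between the four nearest source pixels."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h * factor):
        sy = _source_coord(y, factor, h)
        y0 = int(sy)                 # sy >= 0, so int() is floor
        y1 = min(y0 + 1, h - 1)
        fy = sy - y0
        row = []
        for x in range(w * factor):
            sx = _source_coord(x, factor, w)
            x0 = int(sx)
            x1 = min(x0 + 1, w - 1)
            fx = sx - x0
            top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
            bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
            row.append(round(top * (1 - fy) + bottom * fy))
        out.append(row)
    return out
```

For a two-pixel row `[[0, 255]]` upscaled by 2, `upscale_nearest` gives `[0, 0, 255, 255]` in each output row, while `upscale_bilinear` gives `[0, 64, 191, 255]`, i.e. a graded edge instead of a hard step.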

Pankaj Duggal

Oct 8, 2024, 11:16:17 AM

to tesser...@googlegroups.com

Hi,
Did you try this trick?

Will Fetherolf

Oct 9, 2024, 12:13:16 PM

to tesseract-ocr

Using different interpolation methods for the magnification gave me different results, but none of them got Tesseract to recognize the "/" character.

Magnifying the image by 200% using a Box, Triangle, or Catmull-Rom interpolation algorithm gave me "NIA". Using Mitchell, I got "NVA". The Cubic B-Spline was too fuzzy for Tesseract to recognize any of the characters.
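Box, Triangle, Catmull-Rom, Mitchell and cubic B-spline are standard reconstruction kernels, and comparing them hints at why the B-spline result looked fuzzy. Below is a sketch of the kernels, using the Mitchell-Netravali cubic family with its commonly cited (B, C) values. It assumes the textbook definitions; the software used in the experiment may define or scale them differently. Catmull-Rom has weight 1 at x = 0 and 0 at x = ±1, so it passes through the original pixel values. The B-spline has weight only 2/3 at x = 0 and 1/6 at x = ±1, so it blends every pixel with its neighbours and softens edges even at the original sample positions.

```python
def box(x):
    """Box (nearest-neighbour) kernel, support [-0.5, 0.5)."""
    return 1.0 if -0.5 <= x < 0.5 else 0.0


def triangle(x):
    """Triangle (linear) kernel, support (-1, 1)."""
    x = abs(x)
    return 1.0 - x if x < 1.0 else 0.0


def mitchell_netravali(x, b, c):
    """Mitchell-Netravali cubic with parameters (B, C), support (-2, 2)."""
    x = abs(x)
    if x < 1.0:
        return ((12 - 9 * b - 6 * c) * x**3
                + (-18 + 12 * b + 6 * c) * x**2
                + (6 - 2 * b)) / 6
    if x < 2.0:
        return ((-b - 6 * c) * x**3
                + (6 * b + 30 * c) * x**2
                + (-12 * b - 48 * c) * x
                + (8 * b + 24 * c)) / 6
    return 0.0


# Commonly cited (B, C) pairs for the named cubic filters.
CUBICS = {
    "B-spline": (1.0, 0.0),
    "Mitchell": (1 / 3, 1 / 3),
    "Catmull-Rom": (0.0, 0.5),
}
```

Evaluating `mitchell_netravali(0, *CUBICS[name])` gives 1 for Catmull-Rom, 8/9 for Mitchell and 2/3 for the B-spline, which orders them from sharpest to softest.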

Does anyone have any further ideas? I wish there were a way to tell Tesseract to ignore font embellishments, such as italics or underlining.

Will Fetherolf

Oct 9, 2024, 12:19:02 PM

to tesseract-ocr

I also understand that part of the problem is the kerning in the TrueType font, and I have no way to get the application switched to a monospaced font. If I could, this would be easy.
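Kerning can let a slanted glyph such as "/" share pixel columns with its neighbour, leaving no blank column between "N" and "/". One quick way to check a captured image for this is a vertical projection profile. Below is a minimal sketch, assuming a grayscale list-of-rows image in which pixels below a hypothetical threshold of 128 count as ink. The helper names are hypothetical, and this is only a diagnostic on the image itself, not a model of how Tesseract's recognizer segments text.

```python
def column_ink(img, threshold=128):
    """Count dark pixels (value < threshold) in each column of a grayscale image."""
    return [sum(1 for row in img if row[x] < threshold) for x in range(len(img[0]))]


def split_on_blank_columns(img, threshold=128):
    """Return (start, end) column spans separated by at least one ink-free column."""
    spans, start = [], None
    for x, ink in enumerate(column_ink(img, threshold)):
        if ink and start is None:
            start = x                      # a run of inked columns begins
        elif not ink and start is not None:
            spans.append((start, x))       # a blank column ends the run
            start = None
    if start is not None:
        spans.append((start, len(img[0])))
    return spans
```

Two glyphs with a clear gap, such as `[[0, 255, 255, 0], [0, 255, 255, 0]]`, split into two spans. A diagonal stroke that overlaps its neighbour's columns, such as `[[0, 255, 0], [0, 0, 255]]`, comes back as a single span, which is the kerned case.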