Number Recognition


Harry Stevenson

Nov 5, 2023, 11:40:46 AM
to tesseract-ocr
I'm trying to extract numerical data from this image, but I'm not getting good results. Can anyone recommend other config options, or suggest how I should crop the image to help?
Here is what it currently recognises:
`$
8 ee
- oO eta
8334 .°8R 3
339 Sf 2 fe
2 Soe £3 BS oO
BRSal Sage
5 SEE:
papeo cee |`
with config: --psm 5 load_system_dawg=false load_freq_dawg=false
Thank you
MkWGeckoCodes.png

La Monte H. P. Yarroll

Nov 6, 2023, 4:46:22 PM
to tesser...@googlegroups.com
You need to reduce it to black and white, or at least greyscale. This image appears to be crafted specifically to thwart OCR. The color gradient in the background is echoed in the numbers, so select-by-color isn't much help. It looks like GIMP's fuzzy select can get you close, but the drop shadows around the digits are a real pain: they're not all the same color. If you can get the background and the shadows around the digits to a proper black, you may be able to invert the colors and get something useful.

After 20 minutes of mucking about, I've not been able to produce anything usable, unless you would be happy with just the labels and not the numbers.


Tom Morris

Nov 7, 2023, 3:47:58 PM
to tesseract-ocr
If that format is fixed, with an odd font and an extremely constrained set of symbols (0-9, minus sign, decimal point), you might be better off using something like OpenCV to do the symbol matching by hand.

Tom