Trouble with Apparently Simple Source Image


Rob

Feb 12, 2024, 12:41:02 PM
to tesseract-ocr
Hello,

I've run into some trouble using Tesseract OCR in a Python program that does some screen scraping. I can't quite wrap my head around why this one value is giving so much more trouble than the others on the same page, which have the same contrast and font.

This is the image in question:
It has been scraped from a 1080p resolution screenshot, sliced into individual images for the values in a grid, scaled up by 10x, inverted (from white-on-black to this), thresholded, and passed to Tesseract. I have also tried various Gaussian and median blurs but those seem to just make other strings fail more.

I have tried most of the PSM options that make sense, and I have restricted the allow list of characters to just numerals, $, comma, and decimal point. I've tried all the different interpolations OpenCV has to offer. Tesseract just constantly chokes on this value.
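
For readers following along, the kind of option string described above can be sketched as follows. This is only an illustration under assumptions: the helper name `build_config` and the default allow list are made up here and are not the exact settings used in the original program. `--psm` and the `tessedit_char_whitelist` variable (set with `-c`) are standard Tesseract options.

```python
def build_config(psm: int = 7, whitelist: str = "0123456789$,.") -> str:
    """Return a Tesseract option string with a page segmentation mode and
    a character allow list (hypothetical helper, for illustration only)."""
    # Tesseract 4 and 5 define page segmentation modes 0 through 13.
    if not 0 <= psm <= 13:
        raise ValueError("Tesseract PSM values range from 0 to 13")
    # The option string is later split on whitespace, so a space inside
    # the allow list would break the option apart.
    if any(ch.isspace() for ch in whitelist):
        raise ValueError("whitespace in the allow list would split the option")
    return f"--psm {psm} -c tessedit_char_whitelist={whitelist}"


print(build_config(psm=6))
```

With pytesseract, a string like this would typically be passed as the `config` argument of `pytesseract.image_to_string`.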

It's a little frustrating because the only OCR I've found that works with this value is an A9T9 model (I think) through the free API at ocr.space ( https://ocr.space/ocrapi#ocrengine2 ). Unfortunately there doesn't appear to be a way for me to run that locally, and the string seems like it should be simple for an OCR engine to read.

Any advice on poking Tesseract in the right way to read this, or some fancy filtering I could do to help make the image clearer for it?

Thanks!

Zdenko Podobny

Feb 12, 2024, 1:53:46 PM
to tesser...@googlegroups.com
tesseract I_read_docs_carefully_instead_of_a_lot_of_writing.png - --psm 6
$0.081

Zdenko
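
The command-line call in this reply (image file, `-` to write the text to stdout, `--psm 6`) could be wrapped from Python roughly as below. This is a minimal sketch, assuming the `tesseract` executable is on the PATH; the function names `build_command` and `read_text` and the file name `value.png` are hypothetical and used only for illustration.

```python
import subprocess
from typing import List, Optional


def build_command(image_path: str, psm: int = 6,
                  lang: Optional[str] = None) -> List[str]:
    """Build the argument list for `tesseract IMAGE - --psm N`.

    The "-" output base tells Tesseract to write the text to stdout.
    """
    cmd = ["tesseract", image_path, "-"]
    if lang:
        # Optional language selection, e.g. "eng".
        cmd += ["-l", lang]
    cmd += ["--psm", str(psm)]
    return cmd


def read_text(image_path: str, psm: int = 6,
              lang: Optional[str] = None) -> str:
    """Run Tesseract on one image and return the recognized text."""
    result = subprocess.run(
        build_command(image_path, psm, lang),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if Tesseract fails
    )
    return result.stdout.strip()


# Example (requires Tesseract to be installed and the image to exist):
# print(read_text("value.png"))
```

Passing the arguments as a list avoids shell quoting problems with file names.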



René JM Clais

Feb 12, 2024, 1:55:46 PM
to tesser...@googlegroups.com
Hi Rob,
I tried your picture with my own Python program and got the following result:
$0.081
Is this correct?
I used: custom_config = r' -l eng --psm 6  '
Does that help?
Cheers
René
