Trouble extracting date and time from image


Michael Schuh

Oct 30, 2025, 1:26:49 PM
to tesseract-ocr
I am trying to extract the date and time from this image:

time.png

I have successfully used tesseract to extract text from other images, but it does not find any text in the image above:

   michael@argon:~/michael/trunk/src/tides$ tesseract time.png out
   Estimating resolution as 142

   michael@argon:~/michael/trunk/src/tides$ cat out.txt

   michael@argon:~/michael/trunk/src/tides$ ls -l out.txt
   -rw-r----- 1 michael michael 0 Oct 30 08:58 out.txt

Any help you can give me would be appreciated.  I attached the time.png file I used above.

Thanks,
   Michael
time.png

Rucha Patil

Oct 30, 2025, 2:44:05 PM
to tesser...@googlegroups.com
This could be a color issue; the text also has shadows. Tesseract does better with black text on a white background. Try thresholding the image: detect the white text, make it black, and then make the rest of the background white. Also make sure you're using the correct page segmentation mode (--psm).

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-oc...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/tesseract-ocr/77ac0d2b-7796-4f17-8bc6-0e70a9653adan%40googlegroups.com.

Ger Hobbelt

Oct 30, 2025, 2:57:34 PM
to tesser...@googlegroups.com
I cannot emphasize this single item (in a long list of stuff one can/must do before feeding any image to an OCR engine) enough: tesseract has been trained to 'read' books, i.e. black text on a white background. Consequently, any image preprocessing step that gets you there is strongly advised.

This, and lots of other "I don't wanna hear this 🥴" important details show up in the documents and emails listed below: 
(I know people like twitter-sized or shorter text, but you've got some reading to do if you want to be successful at OCRing stuff. We all have to, it's not simple.)


and then a bunch of related messages. I'd rather not repeat myself, so please take your time and read those threads. Some of it may sound crazy at first, but you're doing something that touches on the edge of the original design goals, and that means you're bound to meet some "weird behaviour" along the way.

Before I let myself out, the second most important piece of advice I can give everyone: use HOCR (which is HTML content plus coordinates) or TSV output instead of anything else. Do not, I repeat, !DO NOT! output txt format just because every internet wizard out there does it in their blog. The txt (text) format is minimal-information, and you are way better off with a maximal-information output for when you need to diagnose trouble. Plus, now that you've seen the workflow diagram that's part of the info above, turning HOCR/TSV into TXT should be part of your postprocessing, AFAIAC.
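As a rough illustration of the "turn TSV into TXT in postprocessing" idea, here is a minimal Python sketch. It assumes the TSV came from something like `tesseract time_green.png out tsv` and has tesseract's usual columns (level, page_num, block_num, par_num, line_num, word_num, left, top, width, height, conf, text). The function name `tsv_to_lines`, the `min_conf` cutoff, and the sample rows are made up for the example; they are not output from a real run.

```python
import csv
import io


def tsv_to_lines(tsv_text, min_conf=0.0):
    """Group the word rows (level 5) of tesseract TSV output into text lines."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    lines = {}  # (page, block, paragraph, line) -> words in file order
    for row in reader:
        if row["level"] != "5":  # only level-5 rows carry recognized words
            continue
        word = (row["text"] or "").strip()
        if not word or float(row["conf"]) < min_conf:
            continue
        key = tuple(int(row[name]) for name in
                    ("page_num", "block_num", "par_num", "line_num"))
        lines.setdefault(key, []).append(word)
    return [" ".join(words) for _, words in sorted(lines.items())]


# Hypothetical sample rows, shaped like tesseract TSV output.
SAMPLE = "\n".join([
    "level\tpage_num\tblock_num\tpar_num\tline_num\tword_num"
    "\tleft\ttop\twidth\theight\tconf\ttext",
    "1\t1\t0\t0\t0\t0\t0\t0\t300\t80\t-1\t",
    "4\t1\t1\t1\t1\t0\t10\t10\t120\t30\t-1\t",
    "5\t1\t1\t1\t1\t1\t10\t10\t120\t30\t95.5\t10/29/2025",
    "4\t1\t1\t1\t2\t0\t10\t45\t130\t30\t-1\t",
    "5\t1\t1\t1\t2\t1\t10\t45\t90\t30\t96.1\t9:43:16",
    "5\t1\t1\t1\t2\t2\t105\t45\t35\t30\t91.0\tPM",
])

if __name__ == "__main__":
    for line in tsv_to_lines(SAMPLE):
        print(line)
```

Keeping the TSV around and deriving the text from it means the coordinates and per-word confidences are still there when a frame reads wrong and you need to see why.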
Other directly or tangentially relevant blurbs to read are listed here (again, consider reading the entire threads; OCR is one of those activities where 'quickly scanning my text books to pass my exam', as you previously learned at school, is not going to get you closer to success faster; on the contrary):


HTH

Met vriendelijke groeten / Best regards,

Ger Hobbelt

--------------------------------------------------
web:    http://www.hobbelt.com/
        http://www.hebbut.net/
mail:   g...@hobbelt.com
mobile: +31-6-11 120 978
--------------------------------------------------



Michael Schuh

Oct 30, 2025, 8:21:16 PM
to tesseract-ocr
Thanks.  I figured out how to use ImageMagick to change the mottled gray to green.

michael@argon:~/michael/trunk/src/tides$ convert time.png -fuzz 20% -fill "green" -opaque "gray(60%)" time_green.png

time_green.png

michael@argon:~/michael/trunk/src/tides$ tesseract time_green.png -
Estimating resolution as 147

10/29/2025
9:43:16 PM

Rucha Patil

Oct 31, 2025, 12:46:13 AM
to tesser...@googlegroups.com
Green? Why? I don't know whether this will resolve the issue. Let me know how it behaves; I'm curious. But you need an image that has a white background and black text. You can achieve this easily using cv2 (OpenCV) functions.


Ger Hobbelt

Oct 31, 2025, 8:52:26 PM
to tesser...@googlegroups.com
Indeed, why? (What is the thought that drove you to run this particular ImageMagick command?) While it might help with visually debugging something you're trying, the simplest path towards "black text on white background" is:

1. Convert the image to greyscale (and see for yourself whether that output is easily legible; if it's not, chances are the machine will have trouble too, so more preprocessing /before/ the greyscale transform is needed).
2. Use a 'threshold' (a.k.a. binarization) step to possibly help (though tesseract can oftentimes do a better job with greyscale than with hard black & white, as there's more 'detail' in the image pixels. YMMV).

You can do this in many ways: ImageMagick is one, OpenCV another. For one-offs I use Krita / Photoshop filter layers (stacking the filters to get what I want).
Anything, really, that gets you something approaching 'crisp dark/black text on a clean, white background, text characters about 30px high' will do (dpi is irrelevant, though often mentioned elsewhere: tesseract works on digital image pixels, not the classical printer mindset of dots-per-inch).

Note that 'simplest path towards' does not mean 'always the best way'.
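
The two steps above (greyscale, then threshold) can be sketched in plain Python to show what they do to the pixels. This is only an illustration on nested lists of (r, g, b) tuples; for real images you would use ImageMagick or OpenCV as mentioned. The luma weights are the common ITU-R BT.601 ones. The fixed threshold of 128, the `invert` flag, and the function names are assumptions for the example, and a fixed threshold will not suit every image.

```python
def to_greyscale(pixels):
    """Convert rows of (r, g, b) tuples into rows of 0-255 grey values."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row]
            for row in pixels]


def binarize(grey, threshold=128, invert=False):
    """Map each grey value to 0 (black) or 255 (white).

    With invert=True, light pixels become black and dark pixels become
    white, which turns light-on-dark text into dark-on-light text.
    """
    result = []
    for row in grey:
        new_row = []
        for value in row:
            is_light = value >= threshold
            if invert:
                is_light = not is_light
            new_row.append(255 if is_light else 0)
        result.append(new_row)
    return result


if __name__ == "__main__":
    # Hypothetical 2x2 image: near-white "text" pixels on a dark background.
    image = [[(30, 30, 30), (250, 250, 250)],
             [(240, 240, 240), (20, 20, 20)]]
    grey = to_greyscale(image)
    for row in binarize(grey, invert=True):
        print(row)
```

Looking at the greyscale result before thresholding matches the advice in step 1: if that intermediate image is hard to read, the threshold step will not rescue it.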

Met vriendelijke groeten / Best regards,

Ger Hobbelt


Michael Schuh

1:01 AM
to tesser...@googlegroups.com
Rucha > Green? Why?

Ger > Indeed, why? (What is the thought that drove you to run this particular imagemagick command?)

Fair questions. I saw both black and white in the text, so I picked a background color that does not exist in the text and has high contrast. tesseract did a great job with the green background. I want to process images to extract Palo Alto, California tide data, date, and time, and then plot the results against xtide predictions. I am close to processing a day's worth of images collected once a minute, so I will see how well the green background works. If I have problems, I will definitely try using your (Ger's and Rucha's) advice.
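
For the plotting step, the OCR text has to become a real timestamp first. A minimal sketch of that, assuming the output always looks like the `10/29/2025` and `9:43:16 PM` lines shown earlier in the thread (US month/day/year order, 12-hour clock); the function name `parse_ocr_timestamp` and the regular expression are my own choices for the example. It returns None for frames where no plausible timestamp was read, so those can be skipped or inspected.

```python
import re
from datetime import datetime

# Date, whitespace (including a newline), time, then AM or PM.
_STAMP = re.compile(
    r"(\d{1,2}/\d{1,2}/\d{4})\s+(\d{1,2}:\d{2}:\d{2})\s*([AP]M)",
    re.IGNORECASE,
)


def parse_ocr_timestamp(text):
    """Return a datetime parsed from OCR text, or None if none is found."""
    match = _STAMP.search(text)
    if match is None:
        return None
    date_part, time_part, meridiem = match.groups()
    try:
        # %p relies on English AM/PM names (the default "C" locale).
        return datetime.strptime(
            f"{date_part} {time_part} {meridiem.upper()}",
            "%m/%d/%Y %I:%M:%S %p",
        )
    except ValueError:
        # Digits were read, but they do not form a valid date or time.
        return None


if __name__ == "__main__":
    print(parse_ocr_timestamp("10/29/2025\n9:43:16 PM\n"))
```

With one image per minute, checking that consecutive parsed timestamps are about 60 seconds apart is a cheap way to spot misread digits.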

Thank you, Ger and Rucha, very much for your advice.

Best Regards,
   Michael
