Since you're talking screenshots:
- tesseract is designed and trained to process books and published papers, i.e. black printed text on a white background. If your UI is set to "dark mode", i.e. bright text on a dark background, you can help tesseract a lot by preprocessing your image, e.g. inverting the colors, so the input is much closer to black text on a white BG. Under the hood, tesseract only tries both ways (regular + inverted word image snippet) for any word/particle that scored below 0.7 confidence on the first try: by making sure your input image is as clean as possible and black text on a white/bright background, you save yourself and tesseract up to half of those OCR attempts (see the preprocessing sketch after this list).
- cleanliness is godliness in OCR ;-) : remove any noise from your input image, including window borders and other graphical elements that are not text. This saves tesseract time in its image-to-line/word segmenter and consequently produces fewer and cleaner bboxes (bounding boxes) of image snippets to feed into the neural net that transforms image pixels into text. Fewer pixels to munge means more speed going through a 'page' (= input image).
Tesseract has an internal image preprocessing stage which detects long lines (window borders and the like) and a few other kinds of graphic content, but that is a very generic machine: you can surely do better with a bespoke solution as part of your own image preprocessing stage of the entire screen-to-searchable-text process (the sketch after this list shows one way to do this with OpenCV).
- where text scraping is possible, it will always win: across the board it is cheaper in CPU than running an image-based neural net and has FAR fewer quality issues, given the inherent statistics of both procedures. OCR is, and always should be, a last resort.
- in the old days, with lower-res displays, yes, computer text was 'crisp' - in a very specific technical way that is not conducive to good generic OCR, which is usually trained and oriented towards printed books. Modern displays give you some human-visual improvements, but do realize those new 'crisp'-looking characters carry some edge noise, thanks to modern anti-aliasing (ClearType and other algorithms used by the various OSes and display drivers) and ubiquitous subpixel positioning. Hence, an 'A' here does not have to match an 'A' there, pixel for pixel, even within the same window+screenshot, any more.
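To make the first two points concrete, here's a minimal preprocessing sketch, assuming OpenCV (cv2) and pytesseract are available; the file name, the mean-brightness "dark mode" check and the 40-pixel line kernels are placeholder choices of mine, not anything tesseract itself prescribes:

```python
# Minimal sketch: invert dark-mode screenshots, binarize, strip long border
# lines, then hand the cleaned image to tesseract. Thresholds are placeholders.
import cv2
import numpy as np
import pytesseract

img = cv2.imread("screenshot.png", cv2.IMREAD_GRAYSCALE)

# Dark-mode UI: bright text on a dark background. Invert so tesseract sees
# something close to black-on-white print and (mostly) skips its second,
# inverted OCR pass for low-confidence words.
if np.mean(img) < 128:          # crude "is this dark mode?" heuristic
    img = cv2.bitwise_not(img)

# Binarize to clean up the anti-aliasing fringe around the glyphs.
_, img = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# Knock out long horizontal/vertical lines (window borders, separators) so the
# line/word segmenter doesn't waste time on them.
for kernel_shape in [(40, 1), (1, 40)]:
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, kernel_shape)
    lines = cv2.morphologyEx(255 - img, cv2.MORPH_OPEN, kernel)
    img[lines > 0] = 255        # paint the detected lines white (background)

print(pytesseract.image_to_string(img))
```

Tune the kernel lengths to the border thickness of your windows; the rest is pretty much the standard clean-up recipe.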
That being said, it might be worth checking other, more direct pattern recognition approaches when what you're decoding is rendered text consoles. Maybe look around at OpenCV, for instance; a rough sketch follows. I don't know for sure: I haven't dealt with your particular kind of input myself.
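For completeness, here's a rough sketch of what such a direct approach could look like with OpenCV's template matching, assuming a fixed-cell console font and per-glyph template images cropped beforehand; the 8x16 cell size, the file names and the character set are all made-up placeholders:

```python
# Rough sketch: decode a fixed-font console screenshot by matching each
# character cell against pre-cropped glyph templates (placeholder paths).
import cv2

CELL_W, CELL_H = 8, 16          # assumed character cell of the console font

# Hypothetical glyph templates, one cell-sized image per character.
templates = {ch: cv2.imread(f"glyphs/{ord(ch)}.png", cv2.IMREAD_GRAYSCALE)
             for ch in "ABC0123"}

def read_console(img):
    """Decode a console screenshot cell by cell via template matching."""
    rows = []
    for y in range(0, img.shape[0] - CELL_H + 1, CELL_H):
        row = []
        for x in range(0, img.shape[1] - CELL_W + 1, CELL_W):
            cell = img[y:y + CELL_H, x:x + CELL_W]
            # Pick the template with the lowest sum-of-squared-differences.
            best = min(templates,
                       key=lambda ch: cv2.matchTemplate(
                           cell, templates[ch], cv2.TM_SQDIFF).min())
            row.append(best)
        rows.append("".join(row))
    return "\n".join(rows)

screen = cv2.imread("console.png", cv2.IMREAD_GRAYSCALE)
print(read_console(screen))
```

With a truly fixed bitmap font this kind of exact matching sidesteps OCR entirely; with anti-aliased / subpixel-rendered text (see above) it won't work as-is.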
Cheers,
Ger