Hi,
Had a very quick look but got sidetracked into something else, so I never got around to writing the tesseract test script I wanted; TILAAEFTR. Here goes:
Your '4' output image is rather large for tesseract to treat as a 'single line'.
tess is known to deliver different accuracies at (*wildly*) different line sizes -- I seem to recall some research and graphs from 2019 where accuracy dropped both for too-small text (8-10px) and for *way too large* text (200+px), producing a somewhat /skewed/ bathtub curve for the OCR error rate. So the idea here is to rescale your extracted number images to a suitable size before feeding them to the OCR engine.
Test this remark/idea with a script:
```
let img = 'out.png' // the '4', f.e.
for (let h = 8; h < 500; h = ceil( h * 1.1 /* = +10% */ )) {
/* use imagemagick for scaling, f.e.? */
rescale(img, height: h, unit: 'px') -> img2
tesseract(img2) -> txt
}
```
(pseudocode above; write it in your favorite scripting language: bash, js, python, whatever)
Collect the `txt` OCR results, rank them, and see where your 'optimum height' lands. Then use that height in your application.
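Here's a minimal Python sketch of that sweep, assuming Pillow and pytesseract are installed; the `--psm 7` single-line mode and the LANCZOS resampling filter are my choices, not requirements:

```python
import math

def height_schedule(start=8, stop=500, factor=1.1):
    """Heights growing ~10% per step, mirroring the pseudocode loop."""
    heights, h = [], start
    while h < stop:
        heights.append(h)
        h = math.ceil(h * factor)
    return heights

def sweep(src="out.png"):
    """Rescale the digit image to each height and OCR it; returns {height: text}."""
    from PIL import Image   # pip install Pillow
    import pytesseract      # pip install pytesseract (needs the tesseract binary)
    img = Image.open(src)
    results = {}
    for h in height_schedule():
        w = max(1, round(img.width * h / img.height))  # keep the aspect ratio
        scaled = img.resize((w, h), Image.LANCZOS)
        # --psm 7: tell tesseract to treat the image as a single text line
        results[h] = pytesseract.image_to_string(scaled, config="--psm 7").strip()
    return results

if __name__ == "__main__":
    for h, txt in sweep().items():
        print(f"{h:4d}px -> {txt!r}")
```

(You could just as well shell out to imagemagick `convert -resize` and the `tesseract` CLI in a bash loop; same sweep, different plumbing.)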
Afterthought / Side thought:
I see you are grabbing a computer display screen and applying OCR to it. A few thoughts pop up immediately given the source type:
I see a rather organized screen, none of the noisy/chaotic background you get with burned-in subtitles, for example. Food for thought.
- doesn't it suffice to take the number (*digit*) images and compare them against a (created) master set, using an image similarity metric? As it's the machine rendering those numbers, they should be pretty consistent, save for some anti-aliasing or non-pixel-accurate positioning in the renderer, resulting in (slightly) different pixel values / images for each digit. (Feels like tesseract is an elephant gun for this. But then I probably missed several cues and may be utterly wrong...)
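A sketch of that similarity-metric idea, using mean absolute pixel difference as the metric (the metric choice and the `templates` dict layout are mine; the master set would be one clean crop per digit, all normalized to the same size first):

```python
import numpy as np

def match_digit(crop, templates):
    """Return the label of the template closest to `crop`.

    crop: 2D grayscale array; templates: {label: 2D array, same shape as crop}.
    Mean absolute pixel difference means mild anti-aliasing jitter only
    nudges the score instead of breaking an exact-match compare.
    """
    crop = crop.astype(np.float32)
    scores = {
        label: float(np.mean(np.abs(crop - tpl.astype(np.float32))))
        for label, tpl in templates.items()
    }
    return min(scores, key=scores.get)
```

In practice you'd also want to threshold the best score, so an unknown glyph gets rejected instead of mislabeled as the least-bad digit.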
- in that same vein, taking it one step further: since it's output from software, can't we hook into the software which produces these images and get the raw numeric / scoreboard data straight away? Iff we can, we avoid the significant overhead and data-accuracy challenges that come with reversing anything via OCR: it's never 100% accurate that way. (Software protections and other obstructions related to data commerce and ~ politics can keep us at a distance, where screengrabbing+OCR becomes the optimum viable solution if we want access to the data at all, but I would love to get away with less for the same (or better) result. :-S )
- is it me, or am I seeing more of these machine -> screengrab/scan/photograph (digitally or *analog*: phone snaps of other phones' screens) -> machine OCR data-transport queries lately ('22 / '23)? Have I missed something?
This looks like trade/score screens, and at least the traders would have *some* incentive to provide an API for this. (When you find the related paywall insurmountable, grab+OCR is the way to go, alas, but it will always be somewhat finicky.)