Hi Nicolas, I think what you did is good, you just need to play with pre-processing more.
I usually process the images with GIMP until I get a good result, then I try to reproduce the same processing with OpenCV/PIL.
You do not strictly need to threshold the image; a very strong contrast is enough and may work better. Play with curves, histogram normalization (cv2.normalize with cv2.NORM_MINMAX, cv2.equalizeHist, skimage.exposure.rescale_intensity, PIL.ImageOps.autocontrast), multiplying the image with itself, and sharpening. A difference of Gaussians could also give good results. A little blur/denoise should remove the small dust, or use close/erode after thresholding.
You can try CLAHE to normalize the illumination; uneven illumination looks like a big problem here. The left part is terrible, barely readable; maybe the focus is not even across the screen? The "EXT" text is very hard to recover. Try to get a better starting image if possible. The screen contrast in the other shots is very good, so try to understand why it gets lost. Maybe the viewing angle? Camera settings? Is the LCD bad when seen from above? Maybe shooting from the side or from below works better, or with a few degrees more tilt? Too much light washing out the black text (from the LEDs?)?
See the attached script for some ideas and examples (not fine tuned for this image, I used it for something else). Try the other one to understand where the text is coming from (I think you need libtesseract-dev to install tesserocr).
I would crop the image into five different lines, process each one individually, especially for adaptive stuff, Otsu, CLAHE, etc. You could also separate the text part (left) from the numbers and process them separately. Do you really need the text on the right? Isn't it fixed?
You do not need so much resolution; downscale the text so that each line is about 30 to 50 px high, and try different scales to see which works best. Usually it is better to downscale after the pre-processing.
I think the DPI is used only by the page segmentation part; I never set it, I downscale the text and use single lines.
I do not think you need to fine-tune the model right now; try as much as possible to avoid it. If you cannot get good results with hand-tuned GIMP pre-processing of individual lines, consider fine-tuning on this font. Another big advantage of fine-tuning is that you can limit the set of characters. There is a digits-only model around; you may try that on the numbers part.
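Before fine-tuning, a cheaper way to limit the character set on the numbers part is a character whitelist. This is a different technique from the digits-only model: it only restricts what the stock model may output. The sketch below builds a command in the same style as the one I ran, with --psm 7 (single text line) and the tessedit_char_whitelist variable. Note that, depending on the Tesseract version, the LSTM engine may ignore the whitelist, so check the output. The function names and the whitelist string are my own choices.

```python
import subprocess


def build_tesseract_cmd(image_path, whitelist="0123456789.", psm=7, lang="eng"):
    """Build a tesseract command line for one cropped line of digits."""
    return [
        "tesseract", "-l", lang,
        "--psm", str(psm),  # 7 = treat the image as a single text line
        "-c", f"tessedit_char_whitelist={whitelist}",
        image_path, "-",    # "-" sends the recognized text to stdout
    ]


def ocr_digits(image_path):
    """Run tesseract on image_path and return the recognized text."""
    result = subprocess.run(
        build_tesseract_cmd(image_path),
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()
```

This needs the tesseract executable on the PATH; the image path you pass is whatever cropped number strip you saved.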
This is what I get from tesseract:
$ tesseract -l eng --psm 6 step5-threshold.jpeg -
YY ALEIRES MESUREES
EXTER | EUR 21.9%
SHEE ANT 24.2%
EG EA 2a .e%T
HBETTOUE EAD ITNT 22.4%
Downscale the image to width 400:
YRLEURS MESUREES :
EETERIEDR 21 .9¢
SIE AMT 24.2%
CEG EAU 20.0%
SETOUR EAD INT 22.4%
This is almost perfect, but it is quite fragile; using width 500 I get this:
YALEURS MESUREES :
ENTER EUR 21.9%
SAEIEE AMT 24.2%
EG EA 26.8%
SETOUR EAD INT 22.4%
Cutting out the Celsius unit gives slightly better results. A darker threshold, where the characters are more connected, also seems to work better. Maybe the font needs some fine-tuning; the third line with the zeros is the main problem. Cutting individual lines or numbers does not help.
I started over with GIMP from step0, focusing on the numbers, and I got the attached image; with this one the results are more stable across different image downscales (400, 500, 600). Maybe the third line was just a bad case.
VAIL FURS NMESUREES
XTE (EI IF 21.9%
Sead T 24.2%
EAU 20.0%
EFAS INT 22.94%
Bye
Lorenzo