Hi Shree,
I'd love to, but it's a commercial project I'm working on, so I can't share the current solution.
I will try to find the old scripts I used for the first attempts. Basically it was something like this:
- normalize lightness
- make illumination uniform (CLAHE on HSV "V" channel)
- denoise/divide to remove the background (with a custom level based on noise estimation)
- normalize text size to a fixed value
- remove "dust" with morphological operations
- remove light gray shades with a "soft threshold"
- stretch contrast/histogram
- straighten text (and dewarp for very long lines)
I used OpenCV and PIL; a rough sketch of a few of these steps is below.
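Something like this for a few of them (all the constants here are placeholders that need per-image tuning, and "page.jpg" is just an example filename, not part of my actual pipeline):

```python
import cv2

img = cv2.imread("page.jpg")

# Make illumination uniform: CLAHE on the HSV "V" channel
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
hsv[:, :, 2] = clahe.apply(hsv[:, :, 2])
gray = cv2.cvtColor(cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR), cv2.COLOR_BGR2GRAY)

# Remove the background: divide by a heavily blurred copy of the image
background = cv2.GaussianBlur(gray, (0, 0), 21)
cleaned = cv2.divide(gray, background, scale=255)

# Remove "dust": closing removes dark specks smaller than the kernel
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3, 3))
cleaned = cv2.morphologyEx(cleaned, cv2.MORPH_CLOSE, kernel)

# Crude stand-in for the "soft threshold": push light gray shades to pure white
cleaned[cleaned > 200] = 255

# Stretch the contrast/histogram
cleaned = cv2.normalize(cleaned, None, 0, 255, cv2.NORM_MINMAX)

cv2.imwrite("cleaned.png", cleaned)
```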
The main problem is that a ton of fine tuning is required for each of these steps if the inputs are random pictures from smartphones, scanners, etc.
It also depends on how noisy the background is and whether color can be used as a hint for background detection. For example, converting the image to HSV makes it very simple to remove colored noise or a colored background: you select the parts with high saturation with a NumPy mask and set them to white or black depending on their luminance.
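Something like this (the saturation and luminance cut-offs are arbitrary placeholders, not tuned values):

```python
import cv2

img = cv2.imread("page.jpg")
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
sat, val = hsv[:, :, 1], hsv[:, :, 2]

colored = sat > 80                              # strongly saturated pixels = colored noise/background
img[colored & (val >= 128)] = (255, 255, 255)   # light colored areas -> white
img[colored & (val < 128)] = (0, 0, 0)          # dark colored areas -> black
```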
Measuring noise, blurriness, contrast, etc. helps you decide what processing to apply, or lets you apply it proportionally to the measured value.
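For example (these particular metrics and thresholds are just illustrative, not the ones I actually use):

```python
import cv2
import numpy as np

gray = cv2.imread("page.jpg", cv2.IMREAD_GRAYSCALE)

blurriness = cv2.Laplacian(gray, cv2.CV_64F).var()   # low variance -> blurry image
contrast = gray.std()                                # rough global contrast
residual = gray.astype(np.int16) - cv2.medianBlur(gray, 3).astype(np.int16)
noise = np.median(np.abs(residual))                  # crude noise estimate

# Process proportionally to the measurement, e.g. sharpen more when blurrier
# (the threshold and amounts are placeholders)
amount = 1.5 if blurriness < 100 else 0.5
blurred = cv2.GaussianBlur(gray, (0, 0), 3)
sharpened = cv2.addWeighted(gray, 1 + amount, blurred, -amount, 0)
```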
Many fine-tuning values also depend on the image/text size.
Gaussian difference and divide are the best approaches I found for general cleanup.
Sometimes multiply works great for detail enhancement of low-contrast images.
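Roughly like this (the sigma values are just examples, and "multiply" here is the plain multiply blend of the image with itself):

```python
import cv2
import numpy as np

gray = cv2.imread("page.jpg", cv2.IMREAD_GRAYSCALE)
blurred = cv2.GaussianBlur(gray, (0, 0), 15)

# Divide by a blurred copy: flattens the background, good general cleanup
divided = cv2.divide(gray, blurred, scale=255)

# Difference of Gaussians: band-pass that keeps stroke-scale detail
narrow = cv2.GaussianBlur(gray, (0, 0), 1).astype(np.int16)
wide = cv2.GaussianBlur(gray, (0, 0), 10).astype(np.int16)
dog = cv2.normalize(narrow - wide, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)

# Multiply blend of the image with itself: darkens faint strokes,
# which can help low-contrast images
multiplied = cv2.multiply(gray, gray, scale=1.0 / 255)
```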
I can try to put together a small sample script, because there aren't many around, or at least not easy to find. I don't have much time to do it, but I'll try.
Bye
Lorenzo