I don't really agree with your statement. There are a lot of things we had to consider in image processing before tesseract finally gave us accurate results, but they all make sense. Here is our actual pipeline:
1 - Clean up the image: remove any artifacts from the camera or scanning device, crop the paper accurately, remove noise, binarize
2 - Deskew the image: make the text lines as horizontal as possible
3 - Cut out the zone of interest: extract the text zones of interest from the document, using a DNN to recognize the zones
4 - Clean the text zone: remove any irrelevant parts of the image (like lines, tables, stamps)
5 - Create a whitelist of probable characters based on the zone (this one improves accuracy a lot!)
6 - Submit to tesseract with appropriate settings for the language
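To give an idea of what the "binarize" part of step 1 involves, here is a minimal sketch of Otsu thresholding in plain Python. This is only an illustration, not our production code: the function names `otsu_threshold` and `binarize` are made up for this example, the image is assumed to be a list of rows of 0-255 grayscale values, and in practice you would use an image library rather than nested lists.

```python
def otsu_threshold(gray):
    """Pick the threshold that best separates dark and light pixels (Otsu's method).

    `gray` is assumed to be a list of rows, each a list of ints in 0..255.
    """
    hist = [0] * 256
    for row in gray:
        for value in row:
            hist[value] += 1

    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_threshold, best_variance = 0, -1.0
    weight_dark, sum_dark = 0, 0
    for level in range(256):
        weight_dark += hist[level]
        sum_dark += level * hist[level]
        if weight_dark == 0:
            continue  # no pixels at or below this level yet
        weight_light = total - weight_dark
        if weight_light == 0:
            break  # every pixel is already in the dark class
        mean_dark = sum_dark / weight_dark
        mean_light = (sum_all - sum_dark) / weight_light
        # Between-class variance: larger means a cleaner dark/light split.
        variance = weight_dark * weight_light * (mean_dark - mean_light) ** 2
        if variance > best_variance:
            best_threshold, best_variance = level, variance
    return best_threshold


def binarize(gray):
    """Return a black (0) / white (255) copy of the image."""
    threshold = otsu_threshold(gray)
    return [[255 if value > threshold else 0 for value in row] for row in gray]


# Tiny made-up "page": light paper (200) with a few dark ink pixels (10).
page = [
    [200, 200, 10, 200],
    [200, 10, 10, 200],
]
clean = binarize(page)
```

A global threshold like this only works when the lighting is even. For photos with shadows you would probably need an adaptive (local) threshold instead.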
1: it is understandable how noise or image quality could affect recognition
2: tesseract expects lines of text to be straight
3: this reduces the processing time and allows us to focus on the zone for further cleaning (next steps) or custom parameters before submitting
4: lines, tables, and other things can alter recognition, because a piece of a line is sometimes recognised as |, -, _, l, or `1`. It can also affect nearby characters, especially when working with Chinese-based characters
5: whitelisting based on the content helps recognition a lot. A simple example: if you are looking for numbers, whitelist "1234567890", because 0 is close to O. Even humans make that mistake, which is why we banned O from Wi-Fi passwords :laugh:
6: tesseract settings can improve recognition a lot when working with non-English scripts or when the image is not perfect (tesseract works best at 300 dpi)
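For steps 5 and 6, a rough sketch of how a whitelist and a few settings can be passed to tesseract through the `pytesseract` wrapper. The helper names `build_tesseract_config` and `ocr_zone` are hypothetical, and the defaults (`--psm 7` for a single text line, `--dpi 300`, `lang="eng"`) are assumptions for a digits-only zone, not our exact parameters. Also note that, as far as I know, `tessedit_char_whitelist` is ignored by the LSTM engine in tesseract 4.0 and works again from 4.1, so check your version.

```python
import string


def build_tesseract_config(whitelist, psm=7, dpi=300):
    """Build the extra command-line options for one text zone.

    psm 7 tells tesseract to treat the image as a single line of text.
    """
    if not whitelist or any(ch.isspace() for ch in whitelist):
        # Whitespace would split the option when the config string is parsed.
        raise ValueError("whitelist must be non-empty and contain no whitespace")
    return f"--psm {psm} --dpi {dpi} -c tessedit_char_whitelist={whitelist}"


def ocr_zone(image, whitelist, lang="eng"):
    """Run tesseract on one cleaned zone (needs pytesseract and tesseract installed)."""
    import pytesseract  # third-party, imported lazily so the helper above works without it

    config = build_tesseract_config(whitelist)
    return pytesseract.image_to_string(image, lang=lang, config=config)


# Digits-only zone, e.g. an invoice number: letters like O or l cannot be returned.
digits_config = build_tesseract_config(string.digits)
```

Usage would look like `ocr_zone(zone_image, string.digits)`, where `zone_image` is the cropped and cleaned zone from steps 3 and 4 (for example a PIL image).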
We went from 10% accuracy to nearly 95% now. Each image is different and each may require different processing or parameters. Making a solution that fits all cases is very complex, but I still think it is possible if the application is specific enough. I guess that is why it is not included in tesseract: making it work very well for a specific use case would break others.
I guess you just have to find the right pre-processing for your kind of image.
Hope it helps