I'm working on a project to automatically process scanned documents. These documents contain handwriting over the printed text that breaks the OCR of the printed blocks; for example, a signature written over a name and job title. The handwriting is rounder and thinner than the printed text and is easy for a human reader to tell apart, but it does not differ in color, and I couldn't think of any simple image-processing step that separates it.
Usually these blocks of text aren't even recognized as character boxes, and I don't train on them because the noise is not consistent from one document to the next.
Has anyone dealt with a similar case? Can any Leptonica or Tesseract variables help improve recognition of these characters?
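To illustrate how the stroke-thickness difference might be exploited before OCR, here is a minimal sketch of a binary morphological opening (erosion followed by dilation) with a k×k square. This is a generic image-processing technique swapped in for illustration, not a Tesseract variable. Leptonica does provide equivalent binary morphology, but the function names here (`erode`, `dilate`, `remove_thin_strokes`) are hypothetical pure-NumPy helpers. The sketch assumes the page is already binarized into a boolean array where `True` means ink. Strokes narrower than `k` pixels disappear, while wider printed strokes survive. This also assumes that printed strokes are consistently wider than the handwriting at the scan resolution, which may not hold for thin fonts.

```python
import numpy as np

def erode(img: np.ndarray, k: int) -> np.ndarray:
    """Binary erosion with a k x k square (k odd): a pixel stays ink only if its whole window is ink."""
    pad = k // 2
    p = np.pad(img, pad, mode="constant", constant_values=False)
    h, w = img.shape
    out = np.ones_like(img, dtype=bool)
    for dy in range(k):
        for dx in range(k):
            out &= p[dy:dy + h, dx:dx + w]
    return out

def dilate(img: np.ndarray, k: int) -> np.ndarray:
    """Binary dilation with a k x k square (k odd): a pixel becomes ink if any pixel in its window is ink."""
    pad = k // 2
    p = np.pad(img, pad, mode="constant", constant_values=False)
    h, w = img.shape
    out = np.zeros_like(img, dtype=bool)
    for dy in range(k):
        for dx in range(k):
            out |= p[dy:dy + h, dx:dx + w]
    return out

def remove_thin_strokes(ink: np.ndarray, k: int = 3) -> np.ndarray:
    """Opening (erode, then dilate): strokes thinner than k pixels are removed, thicker ones are restored."""
    return dilate(erode(ink, k), k)

# Toy page: a 5-px-wide "printed" bar and a 1-px-wide "handwritten" line.
ink = np.zeros((20, 20), dtype=bool)
ink[2:18, 2:7] = True   # thick stroke
ink[2:18, 12] = True    # thin stroke
cleaned = remove_thin_strokes(ink, k=3)
```

On real scans, `k` would have to be tuned to the DPI. Because an opening also rounds corners and can break thin serifs, one option is to use the opened image only as a mask for locating printed text, rather than sending it to OCR directly.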
Thanks in advance,
Manuel