Removing handwriting from printed documents

Manuel Le Normand

Jul 30, 2014, 3:46:29 AM

to tesser...@googlegroups.com

I'm working on a project to automatically process scanned documents. These documents contain handwriting on top of the printed text, which degrades the OCR of the printed blocks. A typical case is a signature written over a printed name and job title. The handwritten strokes are thinner and more rounded than the print underneath and are easy for a human reader to tell apart, but they do not differ in color or in any other way that I could think of an easy image-processing step for.

Generally these blocks of text aren't even recognized as character boxes. I don't train on these blocks, because the noise is not constant from one document to the next.

A similar issue was discussed here, but with no hint of a solution: http://stackoverflow.com/questions/8158182/removing-noise-from-document-images

I was wondering whether any of you has had a similar case, and whether any Leptonica or Tesseract variables can help improve the recognition of these characters.
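To make the thin-stroke difference concrete, here is a minimal sketch of the kind of filter it suggests: a morphological opening that keeps only ink thick enough to contain a k x k square. This is an untested idea as far as real scans go, not a known fix. It assumes the page is already binarized (1 = ink) and that the handwriting is reliably thinner than the print; the function names and the toy page are made up for illustration, and a real pipeline would use an image library rather than Python lists.

```python
def erode(img, k):
    """Mark (y, x) only if the whole k x k window starting there is ink."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h - k + 1):
        for x in range(w - k + 1):
            if all(img[y + dy][x + dx] for dy in range(k) for dx in range(k)):
                out[y][x] = 1
    return out


def dilate(img, k):
    """Paint the k x k window starting at every marked pixel."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if img[y][x]:
                for dy in range(k):
                    for dx in range(k):
                        if y + dy < h and x + dx < w:
                            out[y + dy][x + dx] = 1
    return out


def open_image(img, k):
    """Opening = erosion then dilation: strokes thinner than k disappear."""
    return dilate(erode(img, k), k)


# Toy page (hypothetical data): a 3-pixel-wide printed bar with a
# 1-pixel-wide "pen stroke" running diagonally into it.
page = [[int(c) for c in row] for row in [
    "000000000",
    "011100010",
    "011100100",
    "011101000",
    "011110000",
    "011100000",
    "000000000",
]]

cleaned = open_image(page, 2)
for row in cleaned:
    print("".join(str(v) for v in row))
```

On the toy page the diagonal stroke is removed and the bar is left intact. On real scans the window size k would have to sit between the pen width and the thinnest printed stroke, and thin parts of the print (serifs, hairlines) would likely be damaged too, which is why I am asking whether there is a better approach.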

Thanks in advance,

Manuel
