recognizing partly struck-out text

Felix Deutsch

Aug 24, 2003, 7:39:47 AM
Is anybody aware of an OCR solution for recognizing partly struck-out text
(the sort that appears in published redacted government files)?

As you probably know, there is a judicial inquiry underway right now in the
UK, trying to find out about the circumstances surrounding the death of chief
chem/bio weapons expert Dr. Kelly.

Documents (like email printouts, notes, etc.) related to the inquiry are being
posted to http://www.the-hutton-inquiry.org.uk, but it happens that seemingly
interesting parts (especially those involving the efforts to link Iraq to al-Qaeda)
have been redacted with black marker.

A good example would probably be this:
http://www.the-hutton-inquiry.org.uk/content/cab/cab_11_0077to0078.pdf

As you can see, some parts of the redacted text can be made out by the naked
eye, but others that are only partly covered by the marker (sometimes with the
top and/or bottom parts of letters visible) could use some statistical,
automated help.

Can anyone point me to a readily available application/toolset optimized for
this task (free software preferred, operating system doesn't really matter)?

If anybody wants to have a go at it themselves, I'd appreciate it if you posted
your results (or mailed them to me), indicating the methodology/software used.

Thanks.
