On 17 January 2014 23:52, Anton Stoychev <anti...@gmail.com> wrote:
> However problem is still there because the PDF redaction is based on
> ImageMagick and the redaction generates a image-based pdf. So no text.
If the initial document is already a text-based PDF, rather than just
an image, then Alaveteli's censor rules already work with it just fine
— there's no need for any new functionality here at all[1]. We do
stuff like this on WhatDoTheyKnow all the time. Or have I
misunderstood the requirement?
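A minimal sketch of why a text-based PDF can be redacted with a plain text rule while an image-based one cannot, assuming the text is present as bytes in the (uncompressed) file. The function name `apply_censor_rule`, the sample data, and the same-length `x` replacement are illustrative assumptions, not Alaveteli's actual code:

```python
import re


def apply_censor_rule(data: bytes, pattern: bytes, fill: bytes = b"x") -> bytes:
    """Replace every match of `pattern` with same-length filler bytes.

    Hypothetical helper for illustration only. Keeping the length unchanged
    is an assumption, made so byte offsets elsewhere in the file stay valid.
    """
    return re.sub(pattern, lambda m: fill * len(m.group(0)), data)


EMAIL = rb"[\w.]+@[\w.]+"

# Text-based content: the words exist as bytes, so the rule can match them.
text_stream = b"(Contact: jane.doe@example.com) Tj"
redacted = apply_censor_rule(text_stream, EMAIL)

# Image-based content: only pixel data, so there is nothing for the rule to match.
image_stream = bytes([0x00, 0xFF, 0x10, 0x80])
unchanged = apply_censor_rule(image_stream, EMAIL)
```

A single pattern like this would probably not match text that the file stores as separate pieces, for example across a line break, which fits the multi-line limitation noted in [1].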
PDFs that are already image-based aren't going to be searchable etc.
anyway, so again you're not really losing anything by just blacking
part of them out (whether 'within' Alaveteli, or by just downloading
the document, manipulating it offline, and uploading it again over the
old one).
Of course, adding functionality to Alaveteli to do some sort of OCR
against _all_ image documents so we can index the text etc. would be
superb, but that's a more general issue that's not really related to
this sort of redaction issue, beyond making sure that redacting a
document retriggers that OCR.
Tony
[1] Well, there are some outstanding tickets to make this work better
/ more easily for redactions that span multiple lines, for example,
and patches for that would be very welcome indeed! :)