How do I redact text based on regular expression searches?

Aaron Gravesdale

Nov 19, 2013, 12:24:13 PM
to pdfne...@googlegroups.com
Q:

I have a question on the new improvements:

> ContentReplacer can search and replace strings on a PDF page with user defined patterns.

With this change, will it be possible to search and replace on a wildcard pattern, e.g., a social security number or phone number pattern such as nnn-nnn-nnnn or (nnn) nnn-nnnn? We often have to redact this type of personally identifiable information (PII) from documents.

A:

ContentReplacer is the wrong tool for this task. It cannot match regular expressions and is not intended for redaction.

A better solution would be to perform a text search to find the bounding boxes of text matching a regular expression, as shown in the TextSearch sample code:

http://www.pdftron.com/pdfnet/samplecode.html#TextSearch
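
As a rough illustration of the kind of pattern involved, here is a minimal sketch in plain Python using only the standard `re` module. It does not use PDFNet at all; the names `PII_PATTERN` and `find_pii` are hypothetical, and the expression only covers the two layouts mentioned in the question (nnn-nnn-nnnn and (nnn) nnn-nnnn). It is a way to test a pattern on plain text before using it in a regex-based text search, not the TextSearch sample itself.

```python
import re

# Hypothetical pattern covering the two layouts from the question:
#   nnn-nnn-nnnn   and   (nnn) nnn-nnnn
# The lookarounds keep it from matching inside a longer run of digits.
PII_PATTERN = re.compile(r"(?<!\d)(?:\(\d{3}\) |\d{3}-)\d{3}-\d{4}(?!\d)")


def find_pii(text):
    """Return (start, end, matched_text) for every match in text."""
    return [(m.start(), m.end(), m.group()) for m in PII_PATTERN.finditer(text)]


if __name__ == "__main__":
    sample = "Call 555-123-4567 or (555) 987-6543 today."
    for start, end, hit in find_pii(sample):
        print(start, end, hit)
```

In a real document the character offsets above would be replaced by the bounding boxes that the text search reports for each match.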

Then, to correctly redact the text, use the PDF Redactor add-on. The following sample code shows how:

http://www.pdftron.com/pdfnet/samplecode.html#PDFRedact
