Q: We are still pursuing the issue of turning off text-searching
within PDF files.
This was the question and answer we received from your team some time
back.
'What is the best way to disable text searching within a PDF?'
Forum reference:
http://groups.google.com/group/pdfnet-sdk/browse_thread/thread/96e5909b28510d3c/89b9bdeb1a08b5cf?lnk=gst&q=pdf+text+search#89b9bdeb1a08b5cf
Well, we have successfully rasterized the pages as images. I was
impressed with how easy PDFNet made this! Unfortunately, producing
output at a reasonable quality (DPI) makes the files too large, and
we really want to retain the excellent text quality under
magnification that we already have.
So we are once again investigating the font encoding options as
referred to in your answer. I would really be grateful for some
pointers as to how to make this work.
If we scramble the font – are we not scrambling the text as well? In
that case if we substitute the codes for the fonts, would we have to
make the same changes to the text? Would the spacing on proportional
fonts then be a problem? Some tips or links on this area would be very
helpful!!
------
A: You would need to scramble the font encoding. You would also need
to replace the text data with correspondingly re-encoded text. The PDF
will still look exactly the same as before; however, text extraction
or copy & paste would result in junk text. The only way to obtain text
from this type of scrambled PDF is to perform OCR on the rasterized
pages.
> In that case if we substitute the codes for the fonts, would we have to make the same changes to the text?
Correct, you would most likely need to update the text as well. This
can be implemented along the lines of the EditText sample project
(http://www.pdftron.com/net/samplecode.html#EditText).
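
To illustrate the idea (this is a library-independent sketch in
Python, not PDFNet code): the scrambling is a permutation of the
character codes, applied once to the font's code-to-glyph table and
once to every text string that uses the font. It assumes a simple
single-byte font; the function names are made up for this example.
Because each glyph's advance width moves to its new code along with
the glyph, the spacing of proportional fonts should not change.

```python
import random

def make_scramble(codes, seed=0):
    """Return a permutation {original_code: scrambled_code} over the used codes."""
    used = sorted(set(codes))
    shuffled = used[:]
    random.Random(seed).shuffle(shuffled)
    return dict(zip(used, shuffled))

def reencode(data: bytes, mapping) -> bytes:
    """Rewrite a single-byte text string so it uses the scrambled codes."""
    return bytes(mapping.get(b, b) for b in data)

def scramble_widths(widths, mapping):
    """Move each advance width to its glyph's new code (widths: {code: width})."""
    return {mapping[code]: w for code, w in widths.items()}
```

The same mapping has to be used for the font and for the text, and it
should be kept per font, since different fonts in the same document
may use different sets of codes.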
One approach to implementing this font scrambler would be to extract
the glyph outline for each referenced glyph using
pdftron.PDF.Font.GetGlyphPath(). These outlines can be used to
construct a new font (with scrambled encoding). Probably the simplest
approach would be to dynamically build a Type 3 font (i.e. a font
whose glyphs are defined by PDF content streams) using the PDFNet API
(i.e. using ElementBuilder, ElementWriter, and the SDF API). The other
option would be to rebuild a TTF or Type1 font, but this is probably
much more work.
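
As a rough sketch of what the Type 3 font dictionary would need
(again plain Python rather than PDFNet calls, with a made-up function
name and neutral glyph names such as g0, g1): a Type 3 font carries
FirstChar, LastChar, a Widths array, and an Encoding whose Differences
array maps codes to the glyph procedures in CharProcs. The helper
below derives those tables from a scrambled code-to-glyph map. The
widths are assumed to already be in the glyph space defined by the
font's FontMatrix.

```python
def type3_tables(glyphs):
    """Build Type 3 font tables from {scrambled_code: (glyph_name, width)}."""
    first, last = min(glyphs), max(glyphs)
    # Widths has one entry per code in FirstChar..LastChar; unused codes get 0.
    widths = [glyphs[c][1] if c in glyphs else 0 for c in range(first, last + 1)]
    # Differences lists a starting code followed by the names of consecutive codes.
    differences = []
    prev = None
    for code in sorted(glyphs):
        if prev is None or code != prev + 1:
            differences.append(code)
        differences.append("/" + glyphs[code][0])
        prev = code
    return {
        "FirstChar": first,
        "LastChar": last,
        "Widths": widths,
        "Differences": differences,
    }
```

Two things are worth checking in the result: the glyph names should
not reveal the original characters, and the new font should not carry
a ToUnicode map, since either could let a text extractor undo the
scrambling.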