Replacing text in existing PDF documents.


Support

Sep 14, 2009, 1:57:48 PM
to PDFTron PDFNet SDK
Q: I am attempting to search and replace text. I create a new page,
copy the source page to it, and replace text as I go. I create a new
element using ElementBuilder and copy info from the source element to
the new element (see code below). The problem is that if the new text
has any lowercase characters, they are overlaid on top of each other,
while uppercase text looks fine. The font is 'TimesNewRomanPSMT'. If I
create some other font for the new element, it works.

Element element2 = builder.CreateTextBegin(element.GetGState().GetFont(), element.GetGState().GetFontSize());

// Creating a new font instead of reusing the original one works:
// System.Drawing.Font font = new System.Drawing.Font("Times New Roman", 1, eStyle);
// Element element2 = builder.CreateTextBegin(pdftron.PDF.Font.CreateTrueTypeFont(m_doc, font, true, true), element.GetGState().GetFontSize());

writer.WriteElement(element2);

element2 = builder.CreateTextRun("fred"); // "FRED" would work
element2.SetTextMatrix(element.GetTextMatrix());
GState gs = element.GetGState();
GState gs2 = element2.GetGState();
gs2.SetCharSpacing(gs.GetCharSpacing());
gs2.SetTransform(gs.GetTransform());
gs2.SetFillColorSpace(gs.GetFillColorSpace());
gs2.SetFillColor(gs.GetFillColor());
gs2.SetStrokeColorSpace(gs.GetStrokeColorSpace());
gs2.SetStrokeColor(gs.GetStrokeColor());
gs2.SetFillOpacity(gs.GetFillOpacity());
gs2.SetStrokeOpacity(gs.GetStrokeOpacity());
gs2.SetWordSpacing(gs.GetWordSpacing());

writer.WriteElement(element2);

writer.WriteElement(builder.CreateTextEnd());


------------------
A: The problem is most likely caused by reusing an existing PDF font
for the replacement text. Some PDF creators subset fonts (remove
unreferenced glyphs) or omit information for glyphs that are not
referenced. In your case it is possible that the 'Widths' array in the
font dictionary is missing advance widths (or they are 0) for the
required lowercase characters, so each lowercase glyph is drawn at the
same position as the previous one.
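To check this diagnosis, you can look at the width data of the original font before reusing it. The sketch below is a library-independent model, not PDFNet code: the struct SimpleFontWidths and the functions AdvanceWidth and ZeroWidthChars are hypothetical names, and it assumes a simple font with a one-byte encoding where each character code equals its ASCII value. It mirrors the PDF rules for simple fonts: /Widths holds one entry per code starting at /FirstChar, and codes outside that range fall back to the descriptor's /MissingWidth (0 by default).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of the width data of a simple PDF font:
// /FirstChar, /Widths and the font descriptor's /MissingWidth.
struct SimpleFontWidths {
    int first_char = 0;          // /FirstChar
    std::vector<double> widths;  // /Widths, one entry per code from FirstChar
    double missing_width = 0;    // /MissingWidth (0 if absent)
};

// Advance width (in 1/1000 text space units) for a one-byte character code.
double AdvanceWidth(const SimpleFontWidths& f, unsigned char code) {
    int i = static_cast<int>(code) - f.first_char;
    if (i >= 0 && i < static_cast<int>(f.widths.size())) {
        return f.widths[static_cast<std::size_t>(i)];
    }
    return f.missing_width;  // code lies outside FirstChar..LastChar
}

// Returns each distinct character of `text` whose advance width is zero
// or negative, i.e. the glyphs that would be drawn on top of each other.
std::string ZeroWidthChars(const SimpleFontWidths& f, const std::string& text) {
    std::string bad;
    for (char c : text) {
        if (AdvanceWidth(f, static_cast<unsigned char>(c)) <= 0 &&
            bad.find(c) == std::string::npos) {
            bad += c;
        }
    }
    return bad;
}
```

For a font whose /Widths only covers 'A'..'Z', ZeroWidthChars returns nothing for "FRED" but every letter of "fred", which matches the symptom described in the question.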

All of this makes it hard to reuse existing PDF fonts from generic
documents. As a workaround, you could find a substitute font with a
similar name or other characteristics (similar to the commented-out
line in your code). To keep the file size low, you could cache this
font (e.g. as a static variable) and reuse it for all editing
operations throughout the document.
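The caching idea can be sketched as a small map from font name to font object, so each substitute font is created (and embedded) only once. This is an illustration under assumptions: SubstituteFontCache is a hypothetical class, FontT stands in for the SDK's font type, and the factory callback stands in for whatever call actually creates the font (for example pdftron.PDF.Font.CreateTrueTypeFont in the .NET code above).

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical cache: each substitute font is created on first use and
// then reused for every later edit in the same document.
template <typename FontT>
class SubstituteFontCache {
public:
    using Factory = std::function<FontT(const std::string&)>;

    explicit SubstituteFontCache(Factory factory) : factory_(std::move(factory)) {}

    // Returns the cached font for `name`, creating it only if it is missing.
    const FontT& Get(const std::string& name) {
        auto it = cache_.find(name);
        if (it == cache_.end()) {
            it = cache_.emplace(name, factory_(name)).first;
        }
        return it->second;
    }

    std::size_t size() const { return cache_.size(); }

private:
    Factory factory_;                       // creates a font the first time
    std::map<std::string, FontT> cache_;    // font name -> created font
};
```

The sketch keeps one cache object per document rather than a true static variable, because a font created for one PDF document belongs to that document's object graph.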

trn2

Sep 22, 2009, 8:14:46 PM
to PDFTron PDFNet SDK
Q: Thanks for the response. From the knowledge base articles I see
that people have had the same issue. However, I couldn't find any
sample code that would show how to figure out and create a "substitute
font". Do you know of any sample code that can get me started?

-----------
A: Windows can search for a font with the same name, or find a
substitute, when you create a 'System.Drawing.Font'.

For example:
System.Drawing.Font font = new System.Drawing.Font(fontname, 1, eStyle);
pdftron.PDF.Font f = pdftron.PDF.Font.CreateTrueTypeFont(doc, font, true, false);

The remaining task is to obtain the font name (i.e. fontname) and font
styles (e.g. font weight, italic angle, etc.) from the old PDF font.
All of this information can be used to select the appropriate font
from the user's system. For example:

string fn = oldfont.GetName();

Please note that in PDF the names of subset fonts usually start with a
six-letter tag followed by '+' (e.g. 'ABCDEF+'), and this prefix should
be removed. You should also remove any trailing text following a comma
(as in TimesNewRoman,Bold -> TimesNewRoman). Something along the
following lines:

string::size_type idx = fn.find_first_of('+');
if (idx == 6 && fn.size() > 7) { // remove the subset prefix, e.g. "ABCDEF+"
    fn = fn.substr(idx + 1);
}

// Extract the base name of TrueType fonts with extra styles
// (e.g. "TimesNewRoman,Bold" -> "TimesNewRoman").
bool is_bold = false, is_italic = false;
idx = fn.find_first_of(',');
if (idx != string::npos) {
    string style = fn.substr(idx + 1);
    fn = fn.substr(0, idx);
    if (style == "Bold") {
        is_bold = true;
    }
    else if (style == "Italic") {
        is_italic = true;
    }
    else if (style == "BoldItalic") {
        is_bold = true;
        is_italic = true;
    }
}
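The same name cleanup can be packaged as one testable function. This is a sketch, not SDK code: ParsedFontName, HasSubsetTag and ParsePdfFontName are hypothetical names. It is slightly stricter than the snippet above, since it only strips the prefix when the first six characters are uppercase letters followed by '+', which is the form the PDF specification gives for subset tags.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Family name and style flags recovered from a PDF font name.
struct ParsedFontName {
    std::string family;
    bool is_bold = false;
    bool is_italic = false;
};

// True if `fn` starts with a subset tag: six uppercase letters, then '+'.
static bool HasSubsetTag(const std::string& fn) {
    if (fn.size() <= 7 || fn[6] != '+') return false;
    for (std::size_t i = 0; i < 6; ++i) {
        if (fn[i] < 'A' || fn[i] > 'Z') return false;
    }
    return true;
}

// Strips the subset tag and splits off a ",Bold"-style suffix.
ParsedFontName ParsePdfFontName(std::string fn) {
    ParsedFontName r;
    if (HasSubsetTag(fn)) fn = fn.substr(7);  // "ABCDEF+Name" -> "Name"
    std::string::size_type idx = fn.find(',');
    if (idx != std::string::npos) {
        std::string style = fn.substr(idx + 1);
        fn = fn.substr(0, idx);
        if (style == "Bold") r.is_bold = true;
        else if (style == "Italic") r.is_italic = true;
        else if (style == "BoldItalic") r.is_bold = r.is_italic = true;
    }
    r.family = fn;
    return r;
}
```

For example, "QXWZTP+Arial,Bold" yields the family "Arial" with is_bold set, while a name such as "Abcdef+Foo" is left unchanged because its prefix is not a valid subset tag.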

You can obtain the font weight (and, in the same way, other font
descriptor entries) from a PDF font as follows:

double GetFontWeight(pdftron.PDF.Font font) {
double v = 400;
Obj font_desc = font.GetDescriptor();
if (font_desc != null) {
Obj a = font_desc.FindObj("FontWeight");
if (a != null) {
v = a.GetNumber();
}
}
return v;
}

Font weight is one of the values 100, 200, ..., 900, where each number
indicates a weight that is at least as dark as its predecessor. A
value of 400 indicates a normal weight; 700 indicates bold.
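Besides /FontWeight, the font descriptor's /Flags and /ItalicAngle entries also hint at the style. The sketch below assumes these values were already read from the descriptor (for example with FindObj, as in GetFontWeight above); DescriptorStyle, LooksBold and LooksItalic are hypothetical names. The bit positions come from the PDF specification (Italic is bit 7, ForceBold is bit 19, counting from 1), while the threshold of 600 for "bold" and treating ForceBold as a bold hint are assumptions of this sketch.

```cpp
#include <cassert>
#include <cstdint>

// Style-related values read from a PDF font descriptor.
struct DescriptorStyle {
    double font_weight = 400;  // /FontWeight; 400 if absent
    std::uint32_t flags = 0;   // /Flags bit field
    double italic_angle = 0;   // /ItalicAngle in degrees
};

// /Flags bits from the PDF specification (bit 1 is the least significant).
constexpr std::uint32_t kFlagItalic = 1u << 6;     // bit 7: Italic
constexpr std::uint32_t kFlagForceBold = 1u << 18; // bit 19: ForceBold

// Assumption: a weight of 600 or more, or the ForceBold flag, means bold.
bool LooksBold(const DescriptorStyle& d) {
    return d.font_weight >= 600 || (d.flags & kFlagForceBold) != 0;
}

// Italic if the Italic flag is set or the glyphs are slanted.
bool LooksItalic(const DescriptorStyle& d) {
    return (d.flags & kFlagItalic) != 0 || d.italic_angle != 0;
}
```

These results can be combined with the style suffix parsed from the font name to choose the style passed to System.Drawing.Font.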
---

If you would rather not rely on Windows font substitution, you can
also implement your own substitution, although it is a bit more work.
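A minimal form of custom substitution is a lookup table from cleaned PDF font names to installed family names, with a generic fallback. The sketch below is illustrative only: the table entries, the suffix list ("PSMT", "MT", "PS") and the fallback families are assumptions, and StripPsSuffix and SubstituteFamily are hypothetical names. The is_serif argument could come from the Serif bit (bit 2) of the descriptor's /Flags.

```cpp
#include <cassert>
#include <map>
#include <string>

// Removes one common PostScript-name suffix, e.g. "ArialMT" -> "Arial".
static std::string StripPsSuffix(const std::string& name) {
    static const char* const kSuffixes[] = {"PSMT", "MT", "PS"};
    for (const char* suffix : kSuffixes) {
        std::string s(suffix);
        if (name.size() > s.size() &&
            name.compare(name.size() - s.size(), s.size(), s) == 0) {
            return name.substr(0, name.size() - s.size());
        }
    }
    return name;
}

// Maps a PDF font name (subset tag and ",Style" already removed) to a
// family name to pass to System.Drawing.Font. Example table only.
std::string SubstituteFamily(const std::string& pdf_name, bool is_serif) {
    static const std::map<std::string, std::string> kTable = {
        {"TimesNewRoman", "Times New Roman"},
        {"Arial", "Arial"},
        {"CourierNew", "Courier New"},
    };
    auto it = kTable.find(StripPsSuffix(pdf_name));
    if (it != kTable.end()) return it->second;
    return is_serif ? "Times New Roman" : "Arial";  // generic fallback
}
```

With this table, 'TimesNewRomanPSMT' from the original question maps to "Times New Roman", and unknown names fall back to a serif or sans-serif family.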