What data is in Element.GetTextData?


Ryan

Dec 23, 2015, 1:57:37 PM
to PDFTron PDFNet SDK
Q:

We are trying to manipulate text data for Arabic text. What exactly is contained in Element.GetTextData and what goes into Element.SetTextData?

------------------------------------------------------------------------------------------------------------------------------------------------------

A:

The TextData in a text Element (Element.GetType() == e_text) is the raw sequence of character codes stored in the PDF. These character codes only have meaning with regard to the currently active font (Element.GetGState().GetFont()).

If the font is simple, each character code is a single byte (0-255). Otherwise, the data is multi-byte and should be treated as UTF16-BE.

byte[] text_data = element.GetTextData();
if (element.GetGState().GetFont().IsSimple())
{
    // each byte is a 'character'
}
else
{
    // multibyte data, treat as UTF16-BE
}
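
To make the two branches concrete, here is a minimal standalone sketch in plain Java that splits a raw byte array into character codes. It does not call PDFNet at all: the class name TextDataCodes, the helper toCharCodes, and the boolean flag standing in for the result of IsSimple() are all hypothetical, and the sample bytes are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class TextDataCodes {
    // Hypothetical helper, not part of PDFNet: splits raw text data into
    // character codes. fontIsSimple stands in for Font.IsSimple().
    public static List<Integer> toCharCodes(byte[] textData, boolean fontIsSimple) {
        List<Integer> codes = new ArrayList<>();
        if (fontIsSimple) {
            // Simple font: every byte is one code in the range 0-255.
            for (byte b : textData) {
                codes.add(b & 0xFF); // mask to undo Java's signed bytes
            }
        } else {
            // Otherwise: read big-endian 16-bit units, high byte first.
            // A trailing odd byte, if any, is ignored in this sketch.
            for (int i = 0; i + 1 < textData.length; i += 2) {
                codes.add(((textData[i] & 0xFF) << 8) | (textData[i + 1] & 0xFF));
            }
        }
        return codes;
    }

    public static void main(String[] args) {
        byte[] data = {0x06, 0x33, 0x06, 0x44}; // made-up sample bytes
        System.out.println(toCharCodes(data, true));  // [6, 51, 6, 68]
        System.out.println(toCharCodes(data, false)); // [1587, 1604]
    }
}
```

The same four bytes yield four codes under a simple font and two codes otherwise, which is why the font has to be checked before interpreting or rewriting the data.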

This means that switching fonts can completely break both the rendered appearance and the extracted Unicode values, because the two fonts may not share the same character encoding.
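
The effect of a font switch can be illustrated with a small sketch, again in plain Java and independent of PDFNet. The two code-to-Unicode tables below are invented for the example (real fonts define their own encodings), and FontSwitchDemo and decode are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

public class FontSwitchDemo {
    // Hypothetical helper: maps each character code through one font's
    // code-to-Unicode table. Codes the font does not define become U+FFFD.
    public static String decode(int[] codes, Map<Integer, Character> table) {
        StringBuilder sb = new StringBuilder();
        for (int code : codes) {
            sb.append(table.getOrDefault(code, '\uFFFD'));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Invented encoding for "font A": codes 1 and 2 are Arabic letters.
        Map<Integer, Character> fontA = new HashMap<>();
        fontA.put(1, '\u0633'); // ARABIC LETTER SEEN
        fontA.put(2, '\u0644'); // ARABIC LETTER LAM

        // Invented encoding for "font B": the same codes are Latin letters.
        Map<Integer, Character> fontB = new HashMap<>();
        fontB.put(1, 'A');
        fontB.put(2, 'B');

        int[] codes = {1, 2}; // identical text data in both cases
        System.out.println(decode(codes, fontA).equals(decode(codes, fontB))); // false
    }
}
```

The text data never changes here; only the table used to interpret it does, which mirrors what happens when the font in the graphics state is replaced without re-encoding the character codes.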
