font name UTF8 encoding bug

Han Chipset

Jul 14, 2021, 11:13:02 PM

to pdfium

On Windows, the font name that FPDFText_GetFontInfo writes to |buffer| is in ANSI encoding (Windows-1252?), not UTF-8. For English names the two encodings produce the same bytes, so there is no problem, but for other languages (such as Chinese) there is. For example, a Chinese font named SimSun is fine, but one named 宋体 comes back garbled.

The PDF file I tested with is attached to this post.
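One way to see the mismatch from the caller's side is to check whether the returned bytes are well-formed UTF-8. The sketch below is only an illustration: `is_valid_utf8` is a hypothetical helper, not part of PDFium, and it assumes the ANSI bytes for 宋体 are the code page 936 (GBK) sequence `CB CE CC E5`, which is a guess about what a Chinese Windows system would produce.

```c
#include <stddef.h>

/* Hypothetical helper (not a PDFium function): returns 1 if the first len
 * bytes of s are well-formed UTF-8, otherwise 0. */
static int is_valid_utf8(const unsigned char *s, size_t len) {
    size_t i = 0;
    while (i < len) {
        unsigned char c = s[i];
        size_t n;           /* continuation bytes expected after the lead */
        unsigned long cp;   /* code point being decoded */
        unsigned long min;  /* smallest code point allowed for this length */

        if (c < 0x80) {     /* plain ASCII byte */
            i++;
            continue;
        } else if ((c & 0xE0) == 0xC0) {
            n = 1; cp = c & 0x1F; min = 0x80;
        } else if ((c & 0xF0) == 0xE0) {
            n = 2; cp = c & 0x0F; min = 0x800;
        } else if ((c & 0xF8) == 0xF0) {
            n = 3; cp = c & 0x07; min = 0x10000;
        } else {
            return 0;       /* stray continuation byte or invalid lead */
        }

        if (len - i <= n) return 0;  /* sequence is cut off */
        for (size_t k = 1; k <= n; k++) {
            if ((s[i + k] & 0xC0) != 0x80) return 0;
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }
        /* Reject overlong forms, surrogates, and out-of-range values. */
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return 0;
        i += n + 1;
    }
    return 1;
}
```

With this helper, the UTF-8 bytes of 宋体 (`E5 AE 8B E4 BD 93`) and the ASCII name SimSun both pass, while the assumed GBK bytes fail because `CE` is not a continuation byte. A check like this cannot prove a buffer is UTF-8, but a failure is a strong hint that the name arrived in some other encoding.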

// Experimental API.
// Function: FPDFText_GetFontInfo
//          Get the font name and flags of a particular character.
// Parameters:
//          text_page - Handle to a text page information structure.
//                      Returned by FPDFText_LoadPage function.
//          index     - Zero-based index of the character.
//          buffer    - A buffer receiving the font name.
//          buflen    - The length of |buffer| in bytes.
//          flags     - Optional pointer to an int receiving the font flags.
//                      These flags should be interpreted per PDF spec 1.7
//                      Section 5.7.1 Font Descriptor Flags.
// Return value:
//          On success, return the length of the font name, including the
//          trailing NUL character, in bytes. If this length is less than or
//          equal to |buflen|, |buffer| is set to the font name and |flags| is
//          set to the font flags. |buffer| is in UTF-8 encoding. Return 0 on
//          failure.
FPDF_EXPORT unsigned long FPDF_CALLCONV
FPDFText_GetFontInfo(FPDF_TEXTPAGE text_page,
                     int index,
                     void* buffer,
                     unsigned long buflen,
                     int* flags);
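The return convention in the comment above (the length includes the trailing NUL, and 0 means failure) suggests a two-call pattern: ask for the size first, then fetch the name. The sketch below is a rough illustration under stated assumptions. `stub_get_font_info` is a hypothetical stand-in that mimics that convention so the code compiles without PDFium, and `get_font_name` is an illustrative wrapper, not a PDFium function. In real code the stub would be replaced by a call to FPDFText_GetFontInfo with a valid text page handle.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for FPDFText_GetFontInfo. It returns the name length
 * including the trailing NUL and writes to buffer only when buflen is large
 * enough, following the convention in the header comment. */
static unsigned long stub_get_font_info(int index, void *buffer,
                                        unsigned long buflen, int *flags) {
    static const char name[] = "SimSun";
    (void)index;  /* the stub reports the same font for every character */
    if (flags) *flags = 0;
    if (buffer && sizeof name <= buflen) memcpy(buffer, name, sizeof name);
    return (unsigned long)sizeof name;
}

/* Two-call pattern: query the required size, allocate, then fetch the name.
 * Returns a malloc'd NUL-terminated string that the caller must free, or
 * NULL on failure. */
static char *get_font_name(int index, int *flags) {
    unsigned long needed = stub_get_font_info(index, NULL, 0, NULL);
    if (needed == 0) return NULL;

    char *name = (char *)malloc(needed);
    if (!name) return NULL;

    if (stub_get_font_info(index, name, needed, flags) != needed) {
        free(name);
        return NULL;
    }
    return name;
}
```

The bytes returned this way are whatever encoding the library actually produced, so this pattern does not work around the encoding problem reported here. It only avoids guessing a buffer size.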

annotation1.pdf

Lei Zhang

Jul 14, 2021, 11:30:12 PM

to Han Chipset, pdfium

Thanks for bringing up this issue. Can you file a bug for it at
https://crbug.com/pdfium/new ?
