Issue 1608 in pdfium: Embed CJK font and insert text in PDF, some characters don't render correctly.

n… via monorail

Oct 29, 2020, 1:46:24 PM
to pdfiu...@googlegroups.com
Status: Assigned
Owner: ni...@chromium.org
Labels: Type-Defect Priority-Medium

New issue 1608 by ni...@chromium.org: Embed CJK font and insert text in PDF, some characters don't render correctly.
https://bugs.chromium.org/p/pdfium/issues/detail?id=1608

What steps will reproduce the problem?
1. Download a Noto Sans CJK font, such as NotoSansSC.
2. Use the following code to embed the font into a PDF and set a text string containing CJK characters. The example text string here is "这是第一句。 这是第二行。".

// Load the font data as an embedded CID font.
FPDF_FONT font = FPDFText_LoadFont(
    fpdf_document, reinterpret_cast<const uint8_t*>(font_data.data()),
    font_data.size(), FPDF_FONT_TYPE1, /*cid=*/true);
// Create an 8pt text object that uses the font, then set its text.
FPDF_PAGEOBJECT new_obj =
    FPDFPageObj_CreateTextObj(fpdf_document, font, 8.0f);
FPDFText_SetText(new_obj,
                 reinterpret_cast<FPDF_WIDESTRING>(replies_wtext.data()));
// Move the object to (50, 500), insert it, and regenerate the page content.
FPDFPageObj_Transform(new_obj, 1, 0, 0, 1, 50, 500);
FPDFPage_InsertObject(fpdf_page, new_obj);
FPDFPage_GenerateContent(fpdf_page);

3. Open the PDF in the Chrome PDF viewer or render it with pdfium_test. Some characters, such as "句", "行" and the space, are not rendered correctly and show up as tofu.
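
For anyone reproducing step 2, here is a minimal sketch of how a buffer like replies_wtext might be prepared. It assumes FPDFText_SetText() expects null-terminated UTF-16LE text and that the host is little-endian, so native 16-bit code units already have the right byte order. SampleText() and ToWideBuffer() are hypothetical helper names, not PDFium APIs.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// The example string from step 2, written with escapes so the source file
// encoding does not matter: "这是第一句。 这是第二行。"
std::u16string SampleText() {
  return u"\u8FD9\u662F\u7B2C\u4E00\u53E5\u3002 "
         u"\u8FD9\u662F\u7B2C\u4E8C\u884C\u3002";
}

// Hypothetical helper: copies the UTF-16 code units into a buffer and appends
// the terminating null code unit. On a little-endian host the in-memory
// layout is UTF-16LE (assumption stated above).
std::vector<uint16_t> ToWideBuffer(const std::u16string& text) {
  std::vector<uint16_t> buffer(text.begin(), text.end());
  buffer.push_back(0);
  return buffer;
}
```

The result of ToWideBuffer(SampleText()) could then stand in for replies_wtext in the snippet above, with the same reinterpret_cast<FPDF_WIDESTRING> applied to its data() pointer.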

n… via monorail

Oct 29, 2020, 1:47:28 PM
to pdfiu...@googlegroups.com
Updates:
Status: Started

Comment #1 on issue 1608 by ni...@chromium.org: Embed CJK font and insert text in PDF, some characters don't render correctly.
https://bugs.chromium.org/p/pdfium/issues/detail?id=1608#c1

(No comment was entered for this change.)

n… via monorail

Oct 29, 2020, 1:51:05 PM
to pdfiu...@googlegroups.com

Comment #2 on issue 1608 by ni...@chromium.org: Embed CJK font and insert text in PDF, some characters don't render correctly.
https://bugs.chromium.org/p/pdfium/issues/detail?id=1608#c2

The tofu boxes appear because the Unicode code points of some Chinese characters (such as "行", U+884C) were translated into the character code (CID) "0000", which was then written into the content stream of the PDF file. Later on, these "0000" CIDs are rendered as tofu because they don't match any glyph.

PDFium usually finds the CID for a Unicode code point either through a reverse lookup in the ToUnicode map or by looking it up in a CID map. CID maps are usually not required, so the main issue here is that the ToUnicode map inside the PDF is missing certain code points.

Both 884c and 535e are missing from the ToUnicode map. I stepped through the code that generates the ToUnicode map, and neither 884c nor 535e is ever encountered, even though these characters do exist in the font. This means something goes wrong when the ToUnicode map is generated from the embedded font.
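
To illustrate the failure mode described in this comment, here is a minimal sketch of a reverse lookup that falls back to CID 0 when a code point is missing from the map. ToUnicodeSketch and ReverseLookupCid() are hypothetical names for illustration only; this is not PDFium's actual implementation.

```cpp
#include <cstdint>
#include <map>

// Hypothetical stand-in for a ToUnicode map: CID -> Unicode code point.
using ToUnicodeSketch = std::map<uint32_t, uint32_t>;

// Reverse lookup: returns the CID whose entry maps to `unicode`.
// Returns 0 when no entry matches, which corresponds to the "0000" CIDs
// that end up in the content stream and render as tofu.
uint32_t ReverseLookupCid(const ToUnicodeSketch& to_unicode, uint32_t unicode) {
  for (const auto& entry : to_unicode) {
    if (entry.second == unicode)
      return entry.first;
  }
  return 0;
}
```

With a map that contains an entry for U+8FD9 but none for U+884C, looking up U+8FD9 returns its CID, while looking up U+884C returns 0.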

bugdroid via monorail

Nov 2, 2020, 1:20:29 PM
to pdfiu...@googlegroups.com

Comment #3 on issue 1608 by bugdroid: Embed CJK font and insert text in PDF, some characters don't render correctly.
https://bugs.chromium.org/p/pdfium/issues/detail?id=1608#c3

The following revision refers to this bug:
https://pdfium.googlesource.com/pdfium/+/8e88e3866bc3bde31ee9de6c7df388e844d85a03

commit 8e88e3866bc3bde31ee9de6c7df388e844d85a03
Author: Hui Yingst <ni...@chromium.org>
Date: Mon Nov 02 18:19:56 2020

Fix issue that some characters from the embedded font don't render.

Due to character composition, different unicodes could represent the
same character, which means they are mapped to the same CID. However the
current implementation for CID to unicode mapping is always 1:1, which
leads to the situation that certain characters cannot be recognized by
one of its valid unicodes during rendering.

- To fix the embedding process, this CL changes |to_unicode| in
LoadCompositeFont() to std::multimap so that when embedding a font to
a PDF file, all valid unicodes for the same CID won't overwrite each
other.

- To fix the rendering process, this CL replaces |m_Map| in
CPDF_ToUnicodeMap with |m_Multimap| of type std::multimap so that it
can store multiple entries with the same CID as keys.

- This CL also adds a matching embedder test which embeds a subset of
the NotoSansSC-Regular font into a PDF and tests the text rendering
result.

Bug: pdfium:1608
Change-Id: Ifadc2aa4df0de14e9d5a7c38da209f81769c0b3b
Reviewed-on: https://pdfium-review.googlesource.com/c/pdfium/+/75830
Reviewed-by: Daniel Hosseinian <dh...@chromium.org>
Commit-Queue: Hui Yingst <ni...@chromium.org>

[add] https://pdfium.googlesource.com/pdfium/+/8e88e3866bc3bde31ee9de6c7df388e844d85a03/testing/resources/fonts/third_party/NotoSansSC/NotoSansSC-Regular.subset.otf
[modify] https://pdfium.googlesource.com/pdfium/+/8e88e3866bc3bde31ee9de6c7df388e844d85a03/core/fpdfapi/font/cpdf_tounicodemap.cpp
[add] https://pdfium.googlesource.com/pdfium/+/8e88e3866bc3bde31ee9de6c7df388e844d85a03/testing/resources/fonts/third_party/NotoSansSC/README.pdfium
[modify] https://pdfium.googlesource.com/pdfium/+/8e88e3866bc3bde31ee9de6c7df388e844d85a03/core/fpdfapi/font/cpdf_tounicodemap.h
[modify] https://pdfium.googlesource.com/pdfium/+/8e88e3866bc3bde31ee9de6c7df388e844d85a03/fpdfsdk/fpdf_edit_embeddertest.cpp
[add] https://pdfium.googlesource.com/pdfium/+/8e88e3866bc3bde31ee9de6c7df388e844d85a03/testing/resources/fonts/third_party/NotoSansSC/LICENSE.txt
[modify] https://pdfium.googlesource.com/pdfium/+/8e88e3866bc3bde31ee9de6c7df388e844d85a03/fpdfsdk/fpdf_edittext.cpp
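
The 1:1 versus one-to-many behavior described in the commit message above can be sketched as follows. This is an illustration under assumptions, not the code from the CL: FontMapping, BuildOneToOne(), BuildOneToMany() and ReverseLookupCid() are hypothetical names, the CID value in the usage note is made up, and the pairing of U+884C with U+2F8F (the Kangxi radical form) is only assumed to share a glyph in this font.

```cpp
#include <cstdint>
#include <map>
#include <vector>

// One (Unicode, CID) pair as it might be read from the font's cmap.
struct FontMapping {
  uint32_t unicode;
  uint32_t cid;
};

// 1:1 map keyed by CID: a later entry for the same CID overwrites the earlier one.
std::map<uint32_t, uint32_t> BuildOneToOne(const std::vector<FontMapping>& mappings) {
  std::map<uint32_t, uint32_t> to_unicode;
  for (const FontMapping& m : mappings)
    to_unicode[m.cid] = m.unicode;
  return to_unicode;
}

// Multimap keyed by CID: every Unicode value for the same CID is kept.
std::multimap<uint32_t, uint32_t> BuildOneToMany(const std::vector<FontMapping>& mappings) {
  std::multimap<uint32_t, uint32_t> to_unicode;
  for (const FontMapping& m : mappings)
    to_unicode.emplace(m.cid, m.unicode);
  return to_unicode;
}

// Reverse lookup that works for either container; returns 0 if not found.
template <typename Map>
uint32_t ReverseLookupCid(const Map& to_unicode, uint32_t unicode) {
  for (const auto& entry : to_unicode) {
    if (entry.second == unicode)
      return entry.first;
  }
  return 0;
}
```

Given the mappings {0x884C, 42} and {0x2F8F, 42}, the 1:1 map keeps only U+2F8F for CID 42, so a reverse lookup of U+884C yields 0. The multimap keeps both entries, so both code points resolve to CID 42.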

bugdroid via monorail

Nov 4, 2020, 2:17:52 PM
to pdfiu...@googlegroups.com

Comment #4 on issue 1608 by bugdroid: Embed CJK font and insert text in PDF, some characters don't render correctly.
https://bugs.chromium.org/p/pdfium/issues/detail?id=1608#c4


The following revision refers to this bug:
https://pdfium.googlesource.com/pdfium/+/8532ef05056ce00eda2d73d2a8a35bac4b2973ad

commit 8532ef05056ce00eda2d73d2a8a35bac4b2973ad
Author: Hui Yingst <ni...@chromium.org>
Date: Wed Nov 04 19:17:17 2020

Move the Noto font file for testing to top-level third-party directory.

This CL moves the subset font file for NotoSansSC-Regular to PDFium's
top-level third-party directory, and changes the README.pdfium file to
comply with Chromium's guideline adding_to_third_party.md.

This CL also renames the directory to "NotoSansCJK/" so that more
Noto CJK fonts can be added in the same directory for increasing test
coverage in PDFium embedder tests.

Bug: pdfium:1608
Change-Id: I28d59dceaa63e66d9a537d9d1485b307b2223167
Reviewed-on: https://pdfium-review.googlesource.com/c/pdfium/+/75930
Commit-Queue: Hui Yingst <ni...@chromium.org>
Reviewed-by: dsinclair <dsin...@chromium.org>

[delete] https://pdfium.googlesource.com/pdfium/+/d2b76754bcbea8a64617eeea42a390779c70bf19/testing/resources/fonts/third_party/NotoSansSC/README.pdfium
[rename] https://pdfium.googlesource.com/pdfium/+/8532ef05056ce00eda2d73d2a8a35bac4b2973ad/third_party/NotoSansCJK/NotoSansSC-Regular.subset.otf
[modify] https://pdfium.googlesource.com/pdfium/+/8532ef05056ce00eda2d73d2a8a35bac4b2973ad/testing/utils/path_service.cpp
[modify] https://pdfium.googlesource.com/pdfium/+/8532ef05056ce00eda2d73d2a8a35bac4b2973ad/fpdfsdk/fpdf_edit_embeddertest.cpp
[add] https://pdfium.googlesource.com/pdfium/+/8532ef05056ce00eda2d73d2a8a35bac4b2973ad/third_party/NotoSansCJK/README.pdfium
[modify] https://pdfium.googlesource.com/pdfium/+/8532ef05056ce00eda2d73d2a8a35bac4b2973ad/testing/utils/path_service.h
[rename] https://pdfium.googlesource.com/pdfium/+/8532ef05056ce00eda2d73d2a8a35bac4b2973ad/third_party/NotoSansCJK/LICENSE

n… via monorail

Nov 4, 2020, 7:01:49 PM
to pdfiu...@googlegroups.com
Updates:
Status: Fixed

Comment #5 on issue 1608 by ni...@chromium.org: Embed CJK font and insert text in PDF, some characters don't render correctly.
https://bugs.chromium.org/p/pdfium/issues/detail?id=1608#c5


(No comment was entered for this change.)
