Issue 583 in pdfium: toUnicode map ignored

halcan… via monorail

Aug 29, 2016, 10:47:57 AM
to pdfiu...@googlegroups.com
Status: New
Owner: ----
Labels: Type-Defect Priority-Medium

New issue 583 by halc...@google.com: toUnicode map ignored
https://bugs.chromium.org/p/pdfium/issues/detail?id=583

The attached PDF draws one glyph, glyph 0x01:

BT
/F0 18 Tf
1 0 0 -1 16 32 Tm
<01> Tj
ET

The ToUnicode map is:

/CIDInit /ProcSet findresource begin
12 dict begin
begincmap
/CIDSystemInfo
<< /Registry (Adobe)
/Ordering (UCS)
/Supplement 0
>> def
/CMapName /Adobe-Identity-UCS def
/CMapType 2 def
1 begincodespacerange
<0001> <0001>
endcodespacerange
1 beginbfchar
<0001> <0000>
endbfchar
endcmap
CMapName currentdict /CMap defineresource pop
end
end
endstream

This should map glyph 0x01 to U+0000.

However, pdfium extracts U+0001:

$ pdfium_test --txt skbug_5606_b.pdf
$ hexdump -e '8/4 "%04X " "\n"' skbug_5606_b.pdf.0.txt
FEFF 0001
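
For reference, the expected lookup can be sketched in a few lines of Python. This is only an illustration of how the bfchar entry above reads, not PDFium's parser: `parse_bfchar` is a hypothetical helper, it handles `beginbfchar` sections only, and it keys the table by the integer value of the source code, so the one-byte `<01>` in the content stream and the two-byte `<0001>` in the CMap land on the same key.

```python
import re

def parse_bfchar(cmap_text):
    """Return {source code: Unicode string} for every beginbfchar section."""
    mapping = {}
    for section in re.findall(r"beginbfchar(.*?)endbfchar", cmap_text, re.S):
        for src, dst in re.findall(r"<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]*)>", section):
            # The destination is a run of UTF-16BE code units written as hex.
            mapping[int(src, 16)] = bytes.fromhex(dst).decode("utf-16-be")
    return mapping

# The relevant part of the ToUnicode map from this report.
CMAP = """
1 begincodespacerange
<0001> <0001>
endcodespacerange
1 beginbfchar
<0001> <0000>
endbfchar
"""

to_unicode = parse_bfchar(CMAP)
# Code point that glyph 0x01 should extract to.
expected_code_point = ord(to_unicode[0x01])
```

Under these assumptions `expected_code_point` is 0, i.e. U+0000, rather than the U+0001 in the dump above.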


Attachments:
skbug_5606_b.pdf 2.7 KB


n… via monorail

Sep 6, 2016, 11:30:01 AM
to pdfiu...@googlegroups.com

Comment #1 on issue 583 by n...@chromium.org: toUnicode map ignored
https://bugs.chromium.org/p/pdfium/issues/detail?id=583#c1

The map is being ignored because it maps to 0000, which is NULL, the Unicode value we use for unsupported charcodes. We try to avoid it, since this value means we will not render a glyph.

halcan… via monorail

Sep 12, 2016, 4:46:30 PM
to pdfiu...@googlegroups.com

Comment #2 on issue 583 by halc...@chromium.org: toUnicode map ignored
https://bugs.chromium.org/p/pdfium/issues/detail?id=583#c2

That seems wrong. Isn't U+0000 a valid Unicode code point?

n… via monorail

Sep 15, 2016, 2:29:44 PM
to pdfiu...@googlegroups.com
Updates:
Status: Started

Comment #3 on issue 583 by n...@chromium.org: toUnicode map ignored
https://bugs.chromium.org/p/pdfium/issues/detail?id=583#c3

Can I add your PDF to the repo?

halcan… via monorail

Sep 15, 2016, 2:53:38 PM
to pdfiu...@googlegroups.com

Comment #4 on issue 583 by halc...@chromium.org: toUnicode map ignored
https://bugs.chromium.org/p/pdfium/issues/detail?id=583#c4

Of course.

bugdro… via monorail

Sep 15, 2016, 4:28:26 PM
to pdfiu...@googlegroups.com

Comment #5 on issue 583 by bugd...@chromium.org: toUnicode map ignored
https://bugs.chromium.org/p/pdfium/issues/detail?id=583#c5

The following revision refers to this bug:
https://pdfium.googlesource.com/pdfium.git/+/84be3a3cfec5107aac9a58ea00b58b733d393c7d

commit 84be3a3cfec5107aac9a58ea00b58b733d393c7d
Author: npm
Date: Thu Sep 15 20:27:21 2016

Use ToUnicode mapping even when unicode is 0.

CPDF_Font::UnicodeFromCharcode returns 0 only if ToUnicode map maps the
charcode to 0. CPDF_SimpleFont::UnicodeFromCharcode and CPDF_CID_Font::
UnicodeFromCharCode return 0 only if the call to CPDF_Font returns 0.
In other cases, these methods return an empty string. So when
processing text, a 0 return from the method should not be replaced
with the charcode.

BUG=pdfium:583

Review-Url: https://codereview.chromium.org/2342073002

[modify] https://crrev.com/84be3a3cfec5107aac9a58ea00b58b733d393c7d/core/fpdftext/cpdf_textpage.cpp
[modify] https://crrev.com/84be3a3cfec5107aac9a58ea00b58b733d393c7d/fpdfsdk/fpdftext_embeddertest.cpp
[add] https://crrev.com/84be3a3cfec5107aac9a58ea00b58b733d393c7d/testing/resources/bug_583.pdf
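
The behavior change described in the commit message can be illustrated with a small Python sketch. It is not the PDFium code: both function names are hypothetical, the ToUnicode map is modeled as a plain dict from charcode to code point, and the fallback to the charcode for entries missing from the map is an assumption kept only to make the contrast visible.

```python
def extract_before_fix(charcode, to_unicode):
    """Treat a mapped value of 0 the same as 'no mapping' (the reported behavior)."""
    unicode_value = to_unicode.get(charcode)
    if not unicode_value:  # None and 0 both end up here
        return charcode
    return unicode_value

def extract_after_fix(charcode, to_unicode):
    """Keep a mapped value of 0; fall back only when there is no entry at all."""
    unicode_value = to_unicode.get(charcode)
    if unicode_value is None:  # assumption: missing entries still fall back
        return charcode
    return unicode_value

# The map from this issue: charcode 0x01 -> U+0000.
bug_583_map = {0x01: 0x0000}

before = extract_before_fix(0x01, bug_583_map)
after = extract_after_fix(0x01, bug_583_map)
```

Here `before` is 0x0001, matching the hexdump in the report, while `after` is 0x0000, which is what the ToUnicode map asks for.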

n… via monorail

Sep 15, 2016, 4:30:36 PM
to pdfiu...@googlegroups.com
Updates:
Owner: n...@chromium.org
Status: Fixed

Comment #6 on issue 583 by n...@chromium.org: toUnicode map ignored
https://bugs.chromium.org/p/pdfium/issues/detail?id=583#c6

(No comment was entered for this change.)