Comment #4 on issue 1010 by bugd...@chromium.org: Regarding invalid result of text search
https://bugs.chromium.org/p/pdfium/issues/detail?id=1010#c4
The following revision refers to this bug:
https://pdfium.googlesource.com/pdfium/+/886f932aeeb4c0ed3bb6ccb6ba4da45f9fd29a6f
commit 886f932aeeb4c0ed3bb6ccb6ba4da45f9fd29a6f
Author: Ryan Harrison <rhar...@chromium.org>
Date: Fri Feb 16 20:02:50 2018
Correct mapping text to characters for characters missing from font
When parsing text streams, an internal character list is generated
that contains all the characters in the stream. Additionally, a text
string is generated that is exposed via the public API. This string
contains all of the printing (i.e. non-control) characters. For
characters that are printable but have no Unicode value in the font
of the stream, the character 0xFFFE is used in the text to indicate a
missing character. This is a non-printing character used to indicate
a non-Unicode value.
The internal character list gets a Unicode value of 0x0 when there
isn't a glyph in the font for the character, and the original
character code is preserved. This means that when generating the
mapping between the text string and the character list, the code
mistakenly assumed that the unprintable character was not present in
the text string. I have changed the check in the mapping generation
code to correctly account for this. Additional investigation is needed
to determine if inserting 0xFFFE in the text is the correct behaviour.
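The mapping described above can be illustrated with a minimal sketch.
This is not the PDFium code: CharEntry, TextMapping, IsMissingFromFont,
IsPrintable and BuildTextMapping are hypothetical names, and the rule
that a control character is any Unicode value below 0x20 is an
assumption made only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for one entry of the internal character list.
struct CharEntry {
  uint32_t char_code;  // original character code from the stream
  char32_t unicode;    // 0x0 when the font has no mapping for the code
};

// Placeholder written to the text for a character missing from the font.
constexpr char32_t kMissingChar = 0xFFFE;

// Hypothetical result: the exposed text plus, for each text position,
// the index of the matching entry in the character list.
struct TextMapping {
  std::u32string text;
  std::vector<std::size_t> text_to_char;
};

// Assumption: a missing character keeps its code but has Unicode 0x0.
bool IsMissingFromFont(const CharEntry& c) {
  return c.unicode == 0 && c.char_code != 0;
}

// Assumption: anything at or above 0x20 counts as printable.
bool IsPrintable(const CharEntry& c) {
  return c.unicode >= 0x20;
}

TextMapping BuildTextMapping(const std::vector<CharEntry>& chars) {
  TextMapping result;
  for (std::size_t i = 0; i < chars.size(); ++i) {
    const CharEntry& c = chars[i];
    if (IsMissingFromFont(c)) {
      // The placeholder occupies a text position, so it must be mapped.
      result.text.push_back(kMissingChar);
    } else if (IsPrintable(c)) {
      result.text.push_back(c.unicode);
    } else {
      continue;  // control character: not part of the text
    }
    result.text_to_char.push_back(i);
  }
  // Every text position has exactly one character list index.
  assert(result.text.size() == result.text_to_char.size());
  return result;
}
```

In this sketch, a list holding A, a character missing from the font, a
line feed and B yields a text of three entries, and text index 2 maps
back to list index 3. A check that treated a Unicode value of 0x0 as
absent from the text would leave every later index off by one, which is
consistent with the offset find highlights this commit addresses.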
This patch resolves an issue where the find highlights in Chrome for a
PDF would be offset when there are unprintable characters in a stream.
BUG=pdfium:1010
Change-Id: I7547c46c5645e039a4b5138f2ce1137fa31990a5
Reviewed-on: https://pdfium-review.googlesource.com/27051
Reviewed-by: Henrique Nakashima <hnaka...@chromium.org>
Commit-Queue: Ryan Harrison <rhar...@chromium.org>
[modify] https://crrev.com/886f932aeeb4c0ed3bb6ccb6ba4da45f9fd29a6f/core/fpdftext/cpdf_textpage.cpp