Issue 1010 in pdfium: Regarding invalid result of text search


j.rechar… via monorail

Feb 9, 2018, 11:22:08 PM
to pdfiu...@googlegroups.com
Status: New
Owner: ----
Labels: Type-Defect Priority-Medium

New issue 1010 by j.rechar...@gmail.com: Regarding invalid result of text search
https://bugs.chromium.org/p/pdfium/issues/detail?id=1010

What steps will reproduce the problem?
1. Open the URL below in the Chrome browser

URL =>

http://www.taekmin.co.kr/inobbs/down/down.php?filename=/data/sbb_001/2/164.pdf&realname=17%C1%FD_02_%C0%CC%BB%F3%C8%C6.pdf

2. Search for the string 'war' using Ctrl+F


What is the expected output? What do you see instead?

I expected the viewer to highlight exactly the text string 'war'.

However, it highlights an unrelated text string.

I have attached a screenshot.

When I copy the selected text and paste it, it shows an unrelated text string.

What version of the product are you using? On what operating system?

Chrome 64.0.3282.140 (64 bit) (cohort: 64_140_win)
a06bc1d5e8e285c70078802de990c1719ccc75e8-refs/branch-heads/3282@{#631}
OS Windows
JavaScript V8 6.4.388.41

Please provide any additional information below.



Attachments:
HightlightedOnOddString.JPG 53.4 KB

dsincl… via monorail

Feb 12, 2018, 9:39:13 AM
to pdfiu...@googlegroups.com
Updates:
Owner: rhar...@chromium.org

Comment #1 on issue 1010 by dsin...@chromium.org: Regarding invalid result of text search
https://bugs.chromium.org/p/pdfium/issues/detail?id=1010#c1

(No comment was entered for this change.)

rharri… via monorail

Feb 12, 2018, 11:11:09 AM
to pdfiu...@googlegroups.com
Updates:
Status: Accepted

Comment #2 on issue 1010 by rhar...@chromium.org: Regarding invalid result of text search
https://bugs.chromium.org/p/pdfium/issues/detail?id=1010#c2

Hmm... I can reproduce the misplaced highlight, but when I copy and paste I get 'war'. This is likely an issue with how Unicode characters are being counted in the UI code vs the PDFium API.

rharri… via monorail

Feb 14, 2018, 2:58:34 PM
to pdfiu...@googlegroups.com
Updates:
Status: Started

Comment #3 on issue 1010 by rhar...@chromium.org: Regarding invalid result of text search
https://bugs.chromium.org/p/pdfium/issues/detail?id=1010#c3


(No comment was entered for this change.)

bugdro… via monorail

Feb 16, 2018, 3:03:43 PM
to pdfiu...@googlegroups.com

Comment #4 on issue 1010 by bugd...@chromium.org: Regarding invalid result of text search
https://bugs.chromium.org/p/pdfium/issues/detail?id=1010#c4

The following revision refers to this bug:
https://pdfium.googlesource.com/pdfium/+/886f932aeeb4c0ed3bb6ccb6ba4da45f9fd29a6f

commit 886f932aeeb4c0ed3bb6ccb6ba4da45f9fd29a6f
Author: Ryan Harrison <rhar...@chromium.org>
Date: Fri Feb 16 20:02:50 2018

Correct mapping text to characters for characters missing from font

When parsing text streams there is an internal character list that is
generated of all the characters in the stream. Additionally a text
string is generated that is exposed via the public API. This string
will have all of the printing, i.e. non-control, characters in it. For
characters that are printable but missing from the font of the stream,
the character 0xFFFE is used in the text to indicate a missing
character. This is a non-printing character used to indicate
non-Unicode.

The internal character list gets a Unicode value 0x0 when there isn't
a glyph in the font for it and the original character code is
preserved. This means that when generating the mapping between text
string and character list, the code mistakenly assumed that the
unprintable character was not present in the text string. I have
changed the check in the mapping generation code to correctly account
for this. Additional investigation is needed to determine if inserting
0xFFFE in the text is the correct behaviour.

This patch resolves an issue where the find highlights in Chrome for a
PDF would be offset when there are unprintable characters in a stream.

BUG=pdfium:1010

Change-Id: I7547c46c5645e039a4b5138f2ce1137fa31990a5
Reviewed-on: https://pdfium-review.googlesource.com/27051
Reviewed-by: Henrique Nakashima <hnaka...@chromium.org>
Commit-Queue: Ryan Harrison <rhar...@chromium.org>

[modify] https://crrev.com/886f932aeeb4c0ed3bb6ccb6ba4da45f9fd29a6f/core/fpdftext/cpdf_textpage.cpp
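
For readers following along, the offset described in the commit message can be illustrated with a small sketch. This is not the PDFium code (which is C++ in cpdf_textpage.cpp); it is a simplified Python model written for this thread. The names CharInfo, build_text, text_to_char_index, find_in_chars and SAMPLE are all hypothetical, and the rule used to decide what counts as a control character is an assumption. The only details taken from the commit message are that a missing glyph gets Unicode 0x0 in the character list and 0xFFFE in the text string, and that the mapping has to count those entries.

```python
from dataclasses import dataclass

# Placeholder that, per the commit message, appears in the text string
# for a character that has no glyph in the font.
MISSING_GLYPH = "\ufffe"


@dataclass
class CharInfo:
    """Hypothetical stand-in for one entry of the internal character list."""
    unicode: int    # 0x0 when the font has no glyph for the character
    char_code: int  # original character code, always preserved


def is_control(cp: int) -> bool:
    # Assumption for this sketch: C0 control codes are the non-printing ones.
    return 0 < cp < 0x20


def build_text(chars):
    """Build the text string: printable characters plus 0xFFFE placeholders."""
    out = []
    for c in chars:
        if c.unicode == 0:
            out.append(MISSING_GLYPH)
        elif not is_control(c.unicode):
            out.append(chr(c.unicode))
    return "".join(out)


def text_to_char_index(chars, count_missing):
    """Map each text-string index to an index in the character list.

    count_missing=False models the behaviour described as the bug:
    entries with Unicode 0x0 are treated as absent from the text string.
    count_missing=True models the corrected check.
    """
    mapping = []
    for i, c in enumerate(chars):
        if c.unicode == 0:
            if count_missing:
                mapping.append(i)
        elif not is_control(c.unicode):
            mapping.append(i)
    return mapping


def find_in_chars(chars, needle, count_missing):
    """Return the character-list index where a text search hit starts."""
    pos = build_text(chars).find(needle)
    if pos < 0:
        return None
    return text_to_char_index(chars, count_missing)[pos]


# 'x', two characters missing from the font, a carriage return, then "wars".
SAMPLE = [
    CharInfo(ord("x"), 0x78),
    CharInfo(0, 0x81),
    CharInfo(0, 0x82),
    CharInfo(0x0D, 0x0D),
    CharInfo(ord("w"), 0x77),
    CharInfo(ord("a"), 0x61),
    CharInfo(ord("r"), 0x72),
    CharInfo(ord("s"), 0x73),
]

if __name__ == "__main__":
    print(find_in_chars(SAMPLE, "war", count_missing=True))   # 4, the 'w'
    print(find_in_chars(SAMPLE, "war", count_missing=False))  # 6, the 'r'
```

In this model the search itself succeeds in both cases, because it runs on the text string. Only the translation back to the character list drifts, by one position per missing-glyph character before the match. That would be consistent with a misplaced highlight, under the assumptions above.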

rharri… via monorail

Feb 19, 2018, 10:24:55 AM
to pdfiu...@googlegroups.com
Updates:
Status: Fixed

Comment #5 on issue 1010 by rhar...@chromium.org: Regarding invalid result of text search
https://bugs.chromium.org/p/pdfium/issues/detail?id=1010#c5


(No comment was entered for this change.)
