Regarding the bug where the latest version of Chrome does not support text extraction from some Chinese PDFs


autice

Feb 13, 2025, 12:05:06 PM
to Chromium-discuss, pdf...@googlegroups.com
I just upgraded my Chrome browser to the latest version: 133.0.6943.99 (Official Build) (64-bit) (cohort: Stable). I noticed that the Chinese PDF text selection feature, which worked normally in older Chrome versions, now has a defect. While PDF pages display correctly, copied and pasted Chinese text becomes garbled. This issue doesn't occur in older Chrome versions or in the latest Microsoft Edge (version 133.0.3065.59), which uses Chromium kernel version approximately 93.0.3065.59 and maintains normal Chinese copy-paste functionality consistent with older Chrome versions. Additionally, when using the latest version of PDFium, I encountered errors in obtaining character Unicode values. Therefore, I strongly suspect this functional defect is caused by the latest PDFium version's handling of Chinese Unicode character retrieval.
2025-02-14 005851.png
2025-02-14 005950.png
3.pdf

Lei Zhang

Feb 13, 2025, 12:07:22 PM
to autice, Chromium-discuss, pdf...@googlegroups.com
Thanks for the notification. Does Chrome Canary still have this problem?


Lei Zhang

Feb 13, 2025, 12:22:02 PM
to autice, Chromium-discuss, pdf...@googlegroups.com
I tested the following Google Chrome versions:

132.0.6834.197: Good
133.0.6943.99: Bad
135.0.7015.1: Good

I believe this is https://crbug.com/394891352, which got fixed by https://pdfium-review.googlesource.com/128630. I can cherry-pick the fix to earlier release branches.

Inside this PDF sample, the /ToUnicode data has: 811 beginbfchar, which technically violates the spec.
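
For anyone who wants to check their own files for the same pattern, here is a minimal Python sketch that scans an already-decompressed /ToUnicode CMap stream for oversized blocks. It assumes the limit in question is the 100-entries-per-block rule from Adobe's CMap documentation; the function name and the sample CMap text are made up for illustration and are not taken from 3.pdf.

```python
import re

# Assumption: at most 100 entries are allowed per bfchar/bfrange block.
MAX_ENTRIES_PER_BLOCK = 100

# Matches headers such as "811 beginbfchar" or "12 beginbfrange".
BLOCK_HEADER = re.compile(r"(\d+)\s+begin(bfchar|bfrange)\b")


def oversized_blocks(cmap_text, limit=MAX_ENTRIES_PER_BLOCK):
    """Return (declared_count, kind) for each block declaring more than `limit` entries."""
    return [
        (int(count), kind)
        for count, kind in BLOCK_HEADER.findall(cmap_text)
        if int(count) > limit
    ]


# Hypothetical, heavily truncated CMap text for illustration only.
sample = """\
begincmap
1 begincodespacerange
<0000> <FFFF>
endcodespacerange
811 beginbfchar
<0001> <4E2D>
<0002> <6587>
endbfchar
endcmap
"""

print(oversized_blocks(sample))
```

The sketch only reads the declared count in each block header; it does not count the entries that follow or parse the mappings themselves.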


autice

Feb 13, 2025, 12:56:36 PM
to Chromium-discuss, Lei Zhang, Chromium-discuss, pdf...@googlegroups.com, autice
Yes, this PDF sample was produced by an early tool version, but previous Chrome versions were compatible with such PDFs, and later versions should maintain that compatibility.
The latest version I can upgrade to is 133.0.6943.
I found this problem because I was using the latest version of the PDFium library. It has the same problem, but versions before October 2024 do not.
The problem seems to be present as of chromium/6844, whereas chromium/6721 is fine.

