Issue 1601 in pdfium: Extracted text from the table, cell values get merge


vikas… via monorail

Oct 22, 2020, 6:02:54 PM
to pdfiu...@googlegroups.com
Status: Unconfirmed
Owner: ----
Labels: Type-Defect Priority-Medium

New issue 1601 by vikas...@gmail.com: Extracted text from the table, cell values get merge
https://bugs.chromium.org/p/pdfium/issues/detail?id=1601

Hi Team,

In the attached document, the cell values in the text extracted from the table get concatenated. I am using the FPDFText_GetText() method to get the text values.


Current output:

Name FirstNumber SecondNumber ThirdNumber lastName
Person 899955 55 Name
Person 55581 69 69 Name

Expected output:

Name FirstNumber SecondNumber ThirdNumber lastName
Person 8999 55 55 Name
Person 55581 69 69 Name

Kindly check this issue and let me know how to resolve it.

Please let me know if you have any concerns.

Regards,
Vikas

Attachments:
test.pdf 45.5 KB


n… via monorail

Oct 22, 2020, 7:21:01 PM
to pdfiu...@googlegroups.com
Updates:
Status: Available

Comment #1 on issue 1601 by ni...@chromium.org: Extracted text from the table, cell values get merge
https://bugs.chromium.org/p/pdfium/issues/detail?id=1601#c1

Acrobat inserts "\n" between cells when extracting the whole row of text. Extracted result:
"Person
8999
55
55
Name".

Okular extracts the text and puts a space or "\n " between every two columns. Extracted result:

"Person 8999
55 55 Name"

Confirmed that the Chrome PDF viewer extracts "Person 899955 55 Name"; it does not consistently insert spaces as dividers between different text objects (each wrapped by BT/ET operators).
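
Until the extraction itself is fixed, one possible client-side workaround is to rebuild the row from per-character positions and insert a divider wherever the horizontal gap between neighbouring glyphs is large. The sketch below is only an illustration of that idea, not PDFium's behaviour or a proposed patch: the Glyph struct, join_with_gaps() and the threshold value are hypothetical names and numbers. In a real program the boxes would presumably be filled from FPDFText_GetUnicode() and FPDFText_GetCharBox(); here they are plain data so the logic can be tried without PDFium.

```c
#include <stddef.h>

/* Hypothetical glyph record: one character plus the left and right edges
 * of its bounding box, in page units. Not a PDFium type. */
typedef struct {
    char ch;
    double left;
    double right;
} Glyph;

/* Copies the glyph characters into out, inserting a single space wherever
 * the horizontal gap between two consecutive glyphs exceeds gap_threshold
 * and neither neighbour is already a space. Assumes the glyphs belong to
 * one text row and are ordered left to right. out_size includes room for
 * the terminating NUL; output is truncated if the buffer is too small.
 * Returns the number of characters written, excluding the NUL. */
size_t join_with_gaps(const Glyph *glyphs, size_t count,
                      double gap_threshold, char *out, size_t out_size)
{
    size_t n = 0;

    if (out_size == 0) {
        return 0;
    }
    for (size_t i = 0; i < count; ++i) {
        if (i > 0) {
            double gap = glyphs[i].left - glyphs[i - 1].right;
            if (gap > gap_threshold &&
                glyphs[i].ch != ' ' && glyphs[i - 1].ch != ' ') {
                if (n + 1 >= out_size) {
                    break;          /* no room for the divider */
                }
                out[n++] = ' ';
            }
        }
        if (n + 1 >= out_size) {
            break;                  /* no room for the character */
        }
        out[n++] = glyphs[i].ch;
    }
    out[n] = '\0';
    return n;
}
```

With made-up boxes for the two merged cells (the digits of "8999" touching each other, then "55" starting 40 units further right) and a threshold of 2.5 units, the function yields "8999 55". The threshold is an assumption; a real document would likely need it scaled to the font size.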