Issue 1599 in pdfium: Add a new public API to get text from a text object without reference to text page

aperr… via monorail

Oct 21, 2020, 5:59:14 PM
to pdfiu...@googlegroups.com
Status: Unconfirmed
Owner: ----
Labels: Type-Defect Priority-Medium

New issue 1599 by aperr...@gmail.com: Add a new public API to get text from a text object without reference to text page
https://bugs.chromium.org/p/pdfium/issues/detail?id=1599

It would be great to be able to get the text from a text object (FPDF_PAGEOBJECT) without reference to a text page (FPDF_TEXTPAGE), in the same way that FPDFText_SetText can set the text of a text object. This feature could be used to extract text, for editing purposes, from a text object that was added to a stamp annotation with FPDFAnnot_AppendObject.

dh… via monorail

Oct 23, 2020, 10:07:16 PM
to pdfiu...@googlegroups.com

Comment #1 on issue 1599 by dh...@chromium.org: Add a new public API to get text from a text object without reference to text page
https://bugs.chromium.org/p/pdfium/issues/detail?id=1599#c1

Seems like a reasonable request.

While the current API might not be perfect, is there any reason why the current API doesn't work for you?

aperr… via monorail

Oct 25, 2020, 1:34:39 PM
to pdfiu...@googlegroups.com

Comment #2 on issue 1599 by aperr...@gmail.com: Add a new public API to get text from a text object without reference to text page
https://bugs.chromium.org/p/pdfium/issues/detail?id=1599#c2

I couldn't find a function that extracts the text from a text object owned by another object (for example, one added to a stamp annotation by calling FPDFPageObj_NewTextObj or FPDFPageObj_CreateTextObj, then FPDFText_SetText, then FPDFAnnot_AppendObject).

FPDFTextObj_GetText and FPDFText_GetText both work on the text of a page as returned by FPDFText_LoadPage, which apparently does not extract text contained in sub-objects. I also tried passing NULL for the "text_page: FPDF_TEXTPAGE" parameters, without success.

Possible solutions:
- allow FPDFText_LoadPage to read text contained in sub-objects (objects that have an owner)
- add a new function that takes a text object (FPDF_PAGEOBJECT) directly, as FPDFText_SetText does
- adapt FPDFTextObj_GetText so that it can take a text object (FPDF_PAGEOBJECT) directly, without reference to the page text returned by FPDFText_LoadPage

m… via monorail

Oct 25, 2020, 1:46:18 PM
to pdfiu...@googlegroups.com

Comment #3 on issue 1599 by m...@asger-p.dk: Add a new public API to get text from a text object without reference to text page
https://bugs.chromium.org/p/pdfium/issues/detail?id=1599#c3

I would also very much like to be able to extract text from annotations.