Hi all, I’m using FPDFText_GetUnicode() to extract text from a PDF.
I’m finding that this also extracts hidden text.
Is there a way to find out if text is hidden, so that I can discard hidden text?
I tried FPDFText_GetTextRenderMode(), but that doesn’t work here.
Thanks for any hints!
The document is probably a proof that still includes its recent editing history. I don't have the source file, only the PDF. The extracted text contains repeated fragments and paragraphs, some of them from adjacent pages.
Thank you. OK, I can determine whether clipping paths have been applied; see the code below. But how does that help? How do I determine which text objects are invisible? And do I then need to reconstruct a Text_Page so that I can keep using FPDFText_GetUnicode() or FPDFText_GetText()?
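One possible way to frame the "which text objects are invisible" question, as a rough sketch and not something confirmed in this thread: treat a text object as hidden when its render mode is 3 (invisible, per the PDF spec) or when its bounding box does not overlap the visible area at all. The `Rect` type and both helper functions are hypothetical names made up for illustration. In PDFium the bounds could presumably be filled from FPDFPageObj_GetBounds() and the render mode from FPDFTextObj_GetTextRenderMode(); how to turn a clip path into a single visible rectangle is left open here and is an assumption of the sketch.

```c
#include <stdbool.h>

/* Hypothetical rectangle in PDF page coordinates (origin at bottom-left). */
typedef struct {
    float left, bottom, right, top;
} Rect;

/* Text rendering mode 3 means "neither fill nor stroke", i.e. invisible. */
enum { RENDER_MODE_INVISIBLE = 3 };

/* True when the two rectangles share a region of non-zero area. */
static bool rects_overlap(Rect a, Rect b) {
    return a.left < b.right && b.left < a.right &&
           a.bottom < b.top && b.bottom < a.top;
}

/* Heuristic only: flags text that is drawn in invisible mode or that lies
 * entirely outside the visible area (page box or clip rectangle).
 * It does not detect text covered by other objects or drawn in the
 * background colour. */
static bool text_looks_hidden(Rect text_bounds, Rect visible_area,
                              int render_mode) {
    if (render_mode == RENDER_MODE_INVISIBLE) {
        return true;
    }
    return !rects_overlap(text_bounds, visible_area);
}
```

With something like this, each text object found while walking the page objects could be tested and skipped when `text_looks_hidden()` returns true. It does not settle whether a Text_Page has to be rebuilt afterwards.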
int nobjects = (int)FPDFPage_CountObjects(Pdf_Page);