How to remove Tagged content from a PDF?


Ryan

Nov 29, 2019, 5:32:49 PM
to PDFTron PDFNet SDK
Question:

We are having issues with a third-party tool that fails to handle Tagged PDF files. How can we remove the Tagged data so that the other tool works properly?

Answer:

There are 3 ways to remove the Tagged data, each involving a progressively larger modification of the PDF.

1) Set the MarkInfo/Marked boolean to false, or just delete the MarkInfo entry from the document catalog (as we do in this case, since the default is false anyway).

Obj root = pdfDoc.GetRoot();
root.Erase("MarkInfo");


2) Do (1) above, and then also delete the Logical Structure.

Obj root = pdfDoc.GetRoot();
root.Erase("MarkInfo");
root.Erase("StructTreeRoot");


3) Do (1) and (2) above, and then use the ElementEdit sample code to read each page's content and write it back, but do NOT write back elements of the following types.

e_marked_content_begin
e_marked_content_point
e_marked_content_end
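
The filtering rule in (3) can be sketched without the SDK. In the minimal C++ sketch below, ElementType, PageElement, IsMarkedContent and StripMarkedContent are hypothetical stand-ins for illustration, not PDFNet API; only the three marked-content type names come from the list above. In a real read/rewrite loop, the same check would presumably decide whether an element is handed to the writer or skipped.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the SDK's element type enumeration. Only the
// three e_marked_content_* names are taken from the post; the rest are
// placeholders for "everything else on the page".
enum ElementType {
    e_path,
    e_text,
    e_image,
    e_marked_content_begin,
    e_marked_content_point,
    e_marked_content_end
};

// Hypothetical stand-in for one element read from a page's content.
struct PageElement {
    ElementType type;
    std::string label;
};

// Returns true for the element types that should not be written back.
bool IsMarkedContent(ElementType type) {
    switch (type) {
        case e_marked_content_begin:
        case e_marked_content_point:
        case e_marked_content_end:
            return true;
        default:
            return false;
    }
}

// Copies every element except the marked-content ones, preserving order.
std::vector<PageElement> StripMarkedContent(const std::vector<PageElement>& in) {
    std::vector<PageElement> out;
    for (const PageElement& element : in) {
        if (!IsMarkedContent(element.type)) {
            out.push_back(element);
        }
    }
    return out;
}
```

Only the marked-content operators themselves are dropped here; the text, path and image elements they enclosed are kept in their original order, so the visible page content should be unchanged.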



Full code available upon request.