Reduce XFDF file size


Mysochenko Yuriy

Feb 10, 2016, 12:30:51 PM
to PDFTron PDFNet SDK
Hello.

I save annotations to an XFDF file:

PDFDoc doc = mPdfView.getDoc();
FDFDoc fdf = doc.fdfExtract(PDFDoc.e_annots_only);
fdf.saveAsXFDF(annotationsPath);

This works, but when the PDF document is very large (1000 pages, for example), the XFDF file contains unnecessary information for each page:

<link page="2" rect="36.000000,657.000000,558.000000,673.000000">
  <OnActivation>
    <Action Trigger="U">
      <GoTo>
        <Dest>
          <XYZ Zoom="" Top="727" Left="92" Page="16">
          </XYZ>
        </Dest>
      </GoTo>
    </Action>
  </OnActivation>
</link>

I read the XFDF spec, but didn't understand the role of these lines.

As a result, the XFDF file is very large, and when I try to load annotations from this file I get OutOfMemoryError.

Is there a way to reduce the file size using the PDFNet SDK?

Ryan

Feb 11, 2016, 8:26:57 PM
to PDFTron PDFNet SDK
This XFDF data describes a link annotation on the page that, when clicked, takes the user to another page. In this case it goes from page 3 to page 17 (XFDF uses zero-based page numbering).

You could preprocess the PDF and remove the link annotations.
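If changing the PDF is not an option, a related approach is to strip the `<link>` elements from the exported XFDF before it is stored. The following is only a minimal sketch of that alternative, not PDFNet functionality: it uses the standard JAXP DOM API, the class name `XfdfLinkStripper` is hypothetical, and it assumes the XFDF uses unprefixed element names as in the snippet above. Because DOM loads the whole file into memory, a streaming parser may be a better fit for very large files.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper (not part of PDFNet): removes every <link> element
// from an XFDF string and returns the remaining XML.
final class XfdfLinkStripper {

    static String stripLinks(String xfdf) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xfdf)));

            // Assumes unprefixed element names, e.g. <link ...>.
            NodeList links = doc.getElementsByTagName("link");

            // The NodeList is live, so walk it backwards while removing nodes.
            for (int i = links.getLength() - 1; i >= 0; i--) {
                Node link = links.item(i);
                link.getParentNode().removeChild(link);
            }

            // Serialize the modified document back to a string.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            // Wrap checked parser/transformer exceptions for simpler call sites.
            throw new IllegalArgumentException("Could not process XFDF", e);
        }
    }
}
```

Calling `XfdfLinkStripper.stripLinks(xfdfText)` on the exported text leaves highlight, ink, and other non-link annotations in place.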

It would take a lot of annotations for a 32-bit process to run out of memory, but it is possible.

Is it possible for you to keep the annotations in FDF format (which is binary and much smaller)?

Mysochenko Yuriy

Feb 12, 2016, 4:33:08 PM
to PDFTron PDFNet SDK
Thanks for the response.

I did some measurements:
pdf - 6.5 MB
fdf - 1.0 MB
xfdf - 3.8 MB
xfdf without 'link' annotations - 0.03 MB

FDF looks better, but the file is still too big.

So I decided to write the XFDF manually (or use my own format):

for (PageIterator itr = doc.getPageIterator(); itr.hasNext(); ) {
    Page page = (Page) (itr.next());
    int num_annots = page.getNumAnnots();
    for (int i = 0; i < num_annots; ++i) {
        Annot annot = page.getAnnot(i);
        if (!annot.isValid()) {
            continue;
        }
        // Read the annotation subtype from the underlying SDF dictionary.
        Obj sdf = annot.getSDFObj();
        String subtype = sdf.get("Subtype").value().getName();
        // Skip link annotations; everything else should be exported.
        if (!subtype.equals("Link")) {
            // TODO convert Annot to xfdf
        }
    }
}

Is there a utility class that can help me convert an Annot to XFDF like this?

<highlight color="#FFFF00" opacity="1" creationdate="D:20160212092604Z00'00'" flags="print" date="D:20160212092604Z00'00'" page="0"
    coords="256.035004,524.870002,374.325004,524.870002,256.035004,498.530002,374.325004,498.530002"
    rect="256.035004,498.530002,374.325004,524.870002" title="">
</highlight>

I didn't find documentation that describes how to extract the required fields from the Annot class for each annotation type (I only need highlight, strikeout, underline, ink, and text).
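For illustration, the serialization half of that TODO could look roughly like the sketch below. It only builds a `<highlight>` element with the same attributes as the sample above from values that have already been read from the annotation; how to read them from `Annot` is not shown, since that is the open question. The class `HighlightXfdfWriter` and its method names are hypothetical, `flags="print"` is hard-coded to mirror the sample, the same date is written to both `creationdate` and `date`, and whether an importer needs further attributes is not confirmed here.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

// Hypothetical helper (not part of PDFNet): formats already-extracted
// highlight values as an XFDF <highlight> element.
final class HighlightXfdfWriter {

    // Escape characters that may not appear raw in a double-quoted XML attribute.
    static String escapeAttr(String value) {
        return value.replace("&", "&amp;")
                    .replace("<", "&lt;")
                    .replace(">", "&gt;")
                    .replace("\"", "&quot;");
    }

    // Join numbers as "x1,y1,x2,y2,..." with six decimals, as in the sample element.
    static String joinNumbers(double[] values) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < values.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(String.format(Locale.ROOT, "%.6f", values[i]));
        }
        return sb.toString();
    }

    // page is zero-based; quadCoords holds 8 numbers per quad; rect holds x1,y1,x2,y2.
    static String highlight(int page, String colorHex, double opacity, String pdfDate,
                            double[] quadCoords, double[] rect, String title) {
        // Writes 1.0 as "1" and 0.5 as "0.5", independent of the device locale.
        DecimalFormat plain =
                new DecimalFormat("0.######", DecimalFormatSymbols.getInstance(Locale.ROOT));
        return "<highlight"
                + " color=\"" + escapeAttr(colorHex) + "\""
                + " opacity=\"" + plain.format(opacity) + "\""
                + " creationdate=\"" + escapeAttr(pdfDate) + "\""
                + " flags=\"print\""   // assumption: mirrors the sample element
                + " date=\"" + escapeAttr(pdfDate) + "\""
                + " page=\"" + page + "\""
                + " coords=\"" + joinNumbers(quadCoords) + "\""
                + " rect=\"" + joinNumbers(rect) + "\""
                + " title=\"" + escapeAttr(title) + "\""
                + ">\n</highlight>";
    }
}
```

Strikeout and underline elements in XFDF carry the same kind of `coords` and `rect` attributes, so the same helpers could probably be reused for them; ink and text annotations need different content.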


On Friday, February 12, 2016 at 03:26:57 UTC+2, Ryan wrote:

Ryan

Feb 12, 2016, 7:33:38 PM
to PDFTron PDFNet SDK
This sample code shows how to parse annotations, in particular 