How can I extract file attachments from a PDF document?


Support

Sep 13, 2007, 8:43:27 PM
to PDFTron PDFNet SDK
Q:

How can I extract file attachments from a PDF document?

Apparently, file extraction has to do something with the FileSpec
class, but I couldn't relate it with PDFDoc.

----

A:
The code used to traverse low-level objects may look as follows
(most of it was taken from the JBIG2 sample project):

In C#:

SDFDoc cos_doc = doc.GetSDFDoc();
int num_objs = cos_doc.XRefSize();

// Object 0 is always free (head of the free list), so start at 1.
for (int i=1; i<num_objs; ++i) {
    Obj obj = cos_doc.GetObj(i);
    if (obj!=null && !obj.IsFree() && obj.IsDict()) {
        // Process only file specification dictionaries (/Type /Filespec).
        // /Type is required whenever an EF (embedded file) entry is present.
        DictIterator itr = obj.Find("Type");
        if (itr == obj.DictEnd() || itr.Value().GetName() != "Filespec")
            continue;
        FileSpec file_spec = new FileSpec(obj);
        ...
    }
}

To extract files from 'File Attachment Annotations', you could use the
following code (similar to the code in the Annotation sample project):

PageIterator end = doc.PageEnd();
for (PageIterator itr = doc.PageBegin(); itr!=end; itr.Next()) {
    Page page = itr.Current();
    int num_annots = page.GetNumAnnots();
    for (int i=0; i<num_annots; ++i) {
        Annot annot = page.GetAnnot(i);
        if (annot.GetType() == Annot.Type.e_FileAttachment) {
            FileSpec file_spec = annot.GetFileAttachment();
            ...
        }
    }
}

Embedded file streams can also be associated with the document as a
whole through the EmbeddedFiles entry in the PDF document's name
dictionary. The associated name tree maps name strings to file
specifications that refer to embedded file streams through their 'EF'
entries.
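The name tree itself is a small data structure: each node has either a Kids array (an intermediate node) or a flat Names array laid out as key1 value1 key2 value2 ... (a leaf). PDFNet's NameTree and its iterator take care of walking it for you, so the sketch below is only meant to illustrate that layout. It assumes a hypothetical NameTreeNode class (not a PDFNet type) with plain strings standing in for the file specification dictionaries, and it ignores the /Limits entries that real intermediate and leaf nodes carry.

```csharp
using System.Collections.Generic;

// Hypothetical in-memory stand-in for a PDF name tree node (NOT a PDFNet type).
// A node has either Kids (intermediate node) or a Names array laid out as
// [key1, value1, key2, value2, ...] (leaf node). /Limits is omitted here.
public class NameTreeNode
{
    public List<NameTreeNode> Kids = new List<NameTreeNode>();
    public List<object> Names = new List<object>();
}

public static class NameTreeWalker
{
    // Depth-first walk that yields every (key, value) pair in tree order,
    // which for a well-formed tree is ascending key order.
    public static IEnumerable<KeyValuePair<string, object>> Walk(NameTreeNode node)
    {
        // Leaf part: consume the Names array two items at a time.
        for (int i = 0; i + 1 < node.Names.Count; i += 2)
        {
            string key = (string)node.Names[i];
            yield return new KeyValuePair<string, object>(key, node.Names[i + 1]);
        }

        // Intermediate part: recurse into each child in order.
        foreach (NameTreeNode kid in node.Kids)
        {
            foreach (KeyValuePair<string, object> entry in Walk(kid))
                yield return entry;
        }
    }
}
```

For example, a root with two kids whose Names arrays are ["a.txt", specA] and ["b.txt", specB, "c.txt", specC] yields a.txt, b.txt and c.txt in that order.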

With PDFNet, you can traverse the 'EmbeddedFiles' name tree using
pdftron.SDF.NameTree as follows:

PDFDoc doc = new PDFDoc("mypdf.pdf");
doc.InitSecurityHandler();

NameTree file_map = SDF.NameTree.Find(doc, "EmbeddedFiles");
if (!file_map.IsValid()) return;

// Traverse all entries in the NameTree...
SDF.NameTreeIterator end = file_map.End();
SDF.NameTreeIterator i = file_map.Begin();
for (; i!=end; ++i) {
    String key = i.Key().GetStr();
    Obj value = i.Value();
    FileSpec file_spec = new FileSpec(value);
    ...
}

Ryan

Nov 29, 2016, 7:00:57 PM
to PDFTron PDFNet SDK
The code from the last part of the previous post is out of date. Below is the current way to parse file attachments.

NameTree file_map = NameTree.Find(doc, "EmbeddedFiles");
if (!file_map.IsValid()) return;

// Traverse all entries in the NameTree...
NameTreeIterator i = file_map.GetIterator();
for (; i.HasNext(); i.Next())
{
    String key = i.Key().GetAsPDFText();
    Obj value = i.Value();
    FileSpec file_spec = new FileSpec(value);
    Console.WriteLine(String.Format("{0} {1}", key, file_spec.GetFilePath()));
}
