How do I extract file attachments from PDF?

Support

Nov 16, 2007, 6:47:08 PM
to PDFTron PDFNet SDK
Q: How do I extract file attachments from PDF?

--------
A:
To extract the data associated with 'File Attachment Annotations', you
could use the following code (similar to the code in the Annotation
sample project):

PageIterator end = doc.PageEnd();
for (PageIterator itr = doc.PageBegin(); itr != end; itr.Next()) {
    Page page = itr.Current();
    int num_annots = page.GetNumAnnots();
    for (int i = 0; i < num_annots; ++i) {
        Annot annot = page.GetAnnot(i);
        if (annot.GetType() == Annot.Type.e_FileAttachment) {
            FileSpec file_spec = annot.GetFileAttachment();
            Filter stm = file_spec.GetFileData();
            if (stm != null) {  // the file spec may not embed any data
                FilterReader reader = new FilterReader(stm);
                // use file_spec.GetFilePath() to get the filename...
                StdFile out_file = new StdFile("out.dat",
                    StdFile.OpenMode.e_write_mode);
                FilterWriter writer = new FilterWriter(out_file);
                writer.WriteFilter(reader);  // copy the attachment bytes
                writer.Flush();
                out_file.Close();
            }
        }
    }
}

Embedded file streams can also be associated with the document as a
whole through the EmbeddedFiles entry in the PDF document's name
dictionary. The associated name tree maps name strings to file
specifications that refer to embedded file streams through their 'EF'
entries.
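
To make that structure concrete: a name tree is stored as nodes whose 'Names' array holds sorted key/value pairs and whose 'Kids' array points to child nodes (real intermediate and leaf nodes also carry a 'Limits' array, omitted here). The minimal sketch below models this with plain Java maps and lists so it runs without PDFNet. The class and helper names (NameTreeWalk, leaf, kidsNode) are hypothetical, and the string values only stand in for the file specification dictionaries a real PDF stores. In PDFNet the NameTree iterator handles this traversal; the sketch just shows what it visits.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of a PDF name tree: each node is a Map that may hold
// "Names" (a flat [key1, value1, key2, value2, ...] list, in leaf/root nodes)
// or "Kids" (a list of child nodes, in intermediate/root nodes).
public class NameTreeWalk {

    // Builds a leaf node from alternating key/value arguments.
    static Map<String, Object> leaf(Object... keysAndValues) {
        Map<String, Object> node = new LinkedHashMap<>();
        node.put("Names", List.of(keysAndValues));
        return node;
    }

    // Builds an intermediate (or root) node from its child nodes.
    static Map<String, Object> kidsNode(Object... kids) {
        Map<String, Object> node = new LinkedHashMap<>();
        node.put("Kids", List.of(kids));
        return node;
    }

    // Depth-first, left-to-right walk. Because name-tree keys are kept in
    // sorted order across the leaves, entries come out in key order.
    @SuppressWarnings("unchecked")
    static void walk(Map<String, Object> node, Map<String, Object> out) {
        Object names = node.get("Names");
        if (names instanceof List) {
            List<Object> pairs = (List<Object>) names;
            for (int i = 0; i + 1 < pairs.size(); i += 2) {
                out.put((String) pairs.get(i), pairs.get(i + 1));
            }
        }
        Object kids = node.get("Kids");
        if (kids instanceof List) {
            for (Object kid : (List<Object>) kids) {
                walk((Map<String, Object>) kid, out);
            }
        }
    }

    // Flattens the whole tree into an insertion-ordered key -> value map.
    static Map<String, Object> flatten(Map<String, Object> root) {
        Map<String, Object> out = new LinkedHashMap<>();
        walk(root, out);
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> root = kidsNode(
            leaf("invoice.xml", "FileSpec(invoice.xml)",
                 "logo.jpg", "FileSpec(logo.jpg)"),
            leaf("report.pdf", "FileSpec(report.pdf)"));
        for (Map.Entry<String, Object> e : flatten(root).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

Running main prints the three entries in key order (invoice.xml, logo.jpg, report.pdf), regardless of which leaf each one lives in.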

Using PDFNet, you can traverse the 'EmbeddedFiles' name tree using
pdftron.SDF.NameTree as follows:

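PDFDoc doc = new PDFDoc("mypdf.pdf");
doc.InitSecurityHandler();

// Look up the EmbeddedFiles name tree in the document's name dictionary
SDF.NameTree file_map = SDF.NameTree.Find(doc.GetSDFDoc(), "EmbeddedFiles");
if (!file_map.IsValid()) return;

// Traverse all entries in the NameTree...
SDF.NameTreeIterator end = file_map.End();
SDF.NameTreeIterator i = file_map.Begin();
for (int n = 0; i != end; i.Next(), ++n) {
    String key = i.Key().GetAsPDFText();  // the entry's name string
    Obj value = i.Value();                // the entry's file specification
    FileSpec file_spec = new FileSpec(value);
    // Same extraction as above: copy the embedded stream to a file
    Filter stm = file_spec.GetFileData();
    if (stm != null) {
        FilterReader reader = new FilterReader(stm);
        StdFile out_file = new StdFile("attachment_" + n + ".dat",
            StdFile.OpenMode.e_write_mode);
        FilterWriter writer = new FilterWriter(out_file);
        writer.WriteFilter(reader);
        writer.Flush();
        out_file.Close();
    }
}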

Also, please see the following thread:

http://groups.google.com/group/pdfnet-sdk/browse_thread/thread/21da509149023400

Support

Apr 22, 2013, 5:27:02 PM
to pdfne...@googlegroups.com
Q:

We have some PDFs that contain file attachments (e.g. an embedded JPEG file). In Adobe Reader, we can double-click the link to open the JPG file in Windows Photo Viewer. However, in our PDF viewer (built with PDFNet) and in your PDFViewer sample application, double-clicking the link only brings up a small, empty "Sticky Note" window. Is this kind of internal or external file link in a PDF document supported by PDFNet? Could we implement something with the PDFNet SDK so that the file link works?

------------------

A: 

You can use the following code as a starting point for extracting file attachments when the user double-clicks inside a PDFViewCtrl-derived control (WinForms):

protected override void OnMouseDoubleClick(System.Windows.Forms.MouseEventArgs e)
{
    // base.OnMouseDoubleClick(e);  // First process the event in the base class

    int page_num = GetPageNumberFromScreenPt(e.X, e.Y);
    if (page_num < 1) return;

    // Find the click point in the page coordinate system... is there a file attachment annotation at this point?
    double x = e.X, y = e.Y;
    ConvScreenPtToPagePt(ref x, ref y, page_num);

    Page page = GetDoc().GetPage(page_num);
    int annot_num = page.GetNumAnnots();
    for (int i = 0; i < annot_num; ++i)
    {
        Annot annot = page.GetAnnot(i);
        // Process only file attachment annotations...
        if (annot.IsValid() == false ||
            annot.GetType() != Annot.Type.e_FileAttachment) continue;

        Rect box = annot.GetRect();
        if (box.Contains(x, y)) {
            // Extract the file attachment ...
            // See https://groups.google.com/d/msg/pdfnet-sdk/gA8o_eKVG7c/kc0BsgEhif0J
            pdftron.PDF.Annots.FileAttachment fileAttachment = new pdftron.PDF.Annots.FileAttachment(annot);
            FileSpec file_spec = fileAttachment.GetFileSpec();
            using (Filter stm = file_spec.GetFileData()) {
                if (stm != null) {
                    FilterReader reader = new FilterReader(stm);
                    // use file_spec.GetFilePath() to get the filename...
                    using (StdFile out_file = new StdFile("c:/1.jpg", StdFile.OpenMode.e_write_mode)) {
                        FilterWriter writer = new pdftron.Filters.FilterWriter(out_file);
                        writer.WriteFilter(reader);
                        writer.Flush();
                    }

                    // Launch the attachment in an external viewer ... ?
                    try {
                        System.Diagnostics.Process.Start("c:/1.jpg");
                    }
                    catch (System.ComponentModel.Win32Exception noBrowser) {
                        if (noBrowser.ErrorCode == -2147467259)
                            MessageBox.Show(noBrowser.Message);
                    }
                    catch (System.Exception other) {
                        MessageBox.Show(other.Message);
                    }
                }
            }
        }
    }
}

Please keep in mind that extracting and executing file attachments from a PDF is potentially dangerous, since embedded files may also contain executables, viruses, etc.
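
One way to reduce that risk is to treat the name from file_spec.GetFilePath() as untrusted: keep only its last path component, and refuse to launch files whose extension looks executable. The minimal sketch below is written in Java so it runs without PDFNet; it is not part of the PDFNet API. The class name, helper names, and extension blocklist are hypothetical, and a blocklist like this is an illustration, not a complete defence.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical helpers for an attachment name taken from a PDF before
// writing or launching the file. The extension list is illustrative only.
public class AttachmentName {

    static final Set<String> RISKY_EXTENSIONS = Set.of(
        "exe", "com", "bat", "cmd", "scr", "msi", "ps1", "vbs", "js", "jar", "lnk");

    // Keeps only the last path component, so names such as "..\\..\\x.exe"
    // or "C:x.exe" cannot point outside the chosen output directory.
    static String safeBaseName(String embeddedPath) {
        String name = embeddedPath.replace('\\', '/').replace(':', '/');
        name = name.substring(name.lastIndexOf('/') + 1);
        // Win32 file APIs strip trailing dots and spaces, so "run.bat."
        // would be created as "run.bat"; strip them here as well.
        int end = name.length();
        while (end > 0 && (name.charAt(end - 1) == '.' || name.charAt(end - 1) == ' ')) {
            end--;
        }
        name = name.substring(0, end);
        return name.isEmpty() ? "attachment.dat" : name;
    }

    // True if the file's extension is on the (illustrative) blocklist.
    static boolean looksExecutable(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0) return false;
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return RISKY_EXTENSIONS.contains(ext);
    }

    public static void main(String[] args) {
        String[] samples = { "photo.JPG", "..\\..\\Windows\\evil.exe", "C:run.BAT. ", ".." };
        for (String s : samples) {
            String name = safeBaseName(s);
            System.out.println(s + " -> " + name + (looksExecutable(name) ? " (blocked)" : ""));
        }
    }
}
```

In the double-click handler above, the idea would be to build the output path from the sanitized base name instead of a fixed name, and to skip Process.Start when the extension is flagged.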

Daniel Lutz

Nov 5, 2019, 6:10:16 PM
to PDFTron PDFNet SDK
Hello support,

The StdFile class no longer seems to exist. So what is the best way today to save a file attachment to disk?

Best regards
Daniel

Ryan

Nov 5, 2019, 8:16:20 PM
to PDFTron PDFNet SDK
Yes, the StdFile class has been removed. Instead, the Filter class now has a WriteToFile method:

Filter stm = file_spec.GetFileData();
if (stm != null)
    stm.WriteToFile(path, false);  // false: overwrite rather than append
