How can I extract File Attachments from PDF?

Support

Dec 19, 2008, 9:06:06 PM
to PDFTron PDFNet SDK
Q: I have a PDF file that has Word docs attached to it. I need to
extract the Word docs from the file. What steps, and which of your
libraries, would you recommend?

-----
A: You may want to search for "file attachment" in the PDFNet Knowledge
Base. I found a couple of relevant articles:

http://groups.google.com/group/pdfnet-sdk/browse_thread/thread/8a77b2a324cbdf4/d2c0ed4d3f62f17d
http://groups.google.com/group/pdfnet-sdk/browse_thread/thread/800f28fde2951bb7/fd892101b201cd91

Support

Dec 19, 2008, 9:11:45 PM
to PDFTron PDFNet SDK
Q: I have been playing with this for a few days and have not gotten
it to work, but I do have a better understanding now. Can you give me
the underlying strategy for extracting file attachments from a PDF
file? It seems there is a collection of objects to iterate through to
find each file attachment and then extract it. (Is that true?) Do you
have any sample code in VB.NET or C#?

-----
A:
The following sample code can be used to extract file attachments that
are embedded within a PDF document or are attached on specific PDF
pages.

// In VB.NET (C#/Java/C/C++ is essentially the same).

Imports System

Imports PDFTRON
Imports PDFTRON.Common
Imports PDFTRON.SDF
Imports PDFTRON.PDF

Module Module1
    ' Relative paths to the folders containing the test files and the output.
    Dim input_path As String = "../../../TestFiles/"
    Dim output_path As String = "../../../TestFiles/Output/"

    Sub Main()
        PDFNet.Initialize()
        Try
            Dim doc As PDFDoc = New PDFDoc(input_path + "FileAttachment.pdf")
            doc.InitSecurityHandler()

            ' Extract file attachments associated with a document.
            ' These are listed in the document's "EmbeddedFiles" name tree.
            Dim names As NameTree = NameTree.Find(doc.GetSDFDoc(), "EmbeddedFiles")
            If names.IsValid() Then
                Dim itr As NameTreeIterator = names.GetIterator()
                While itr.HasNext()
                    Dim fs As FileSpec = New FileSpec(itr.Value())
                    fs.Export(output_path + fs.GetFilePath())
                    itr.Next()
                End While
            End If

            ' Extract all file attachments associated with pages.
            ' These are stored as file attachment annotations on each page.
            Dim pgitr As PageIterator = doc.GetPageIterator()
            While pgitr.HasNext()
                Dim pg As Page = pgitr.Current()
                Dim num_annots As Integer = pg.GetNumAnnots() - 1
                Dim i As Integer = 0
                For i = 0 To num_annots
                    Dim ann As Annot = pg.GetAnnot(i)
                    ' AndAlso skips the type check when the annotation is invalid.
                    If ann.IsValid() AndAlso ann.GetType() = Annot.Type.e_FileAttachment Then
                        Dim fs As FileSpec = ann.GetFileAttachment()
                        fs.Export(output_path + fs.GetFilePath())
                    End If
                Next

                pgitr.Next()
            End While

            doc.Close()
        Catch ex As PDFNetException
            Console.WriteLine(ex.Message)
        Catch ex As Exception
            MsgBox(ex.Message)
        End Try

        PDFNet.Terminate()
    End Sub
End Module
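
One detail the sample leaves open: fs.GetFilePath() returns the path recorded in the PDF's file specification, and that path may include folder components or characters that are not valid in a local file name, in which case appending it directly to output_path could fail or write outside the output folder. The following is a minimal sketch of a helper that reduces such a path to a bare file name first. AttachmentPaths and SafeOutputPath are hypothetical names and not part of PDFNet, and the fallback name "attachment.bin" is an arbitrary choice.

```vbnet
Imports System
Imports System.IO

Module AttachmentPaths
    ' Hypothetical helper (not part of PDFNet): maps the path stored in a
    ' file specification to a location directly inside outputDir.
    Function SafeOutputPath(ByVal outputDir As String, ByVal attachmentPath As String) As String
        ' Keep only the text after the last "/" or "\" so that stored
        ' folders such as "C:\docs\" or "../" are dropped on any platform.
        Dim fileName As String = If(attachmentPath, "")
        Dim cut As Integer = fileName.LastIndexOfAny(New Char() {"/"c, "\"c})
        If cut >= 0 Then
            fileName = fileName.Substring(cut + 1)
        End If

        ' Replace characters that the local file system does not allow.
        For Each ch As Char In Path.GetInvalidFileNameChars()
            fileName = fileName.Replace(ch, "_"c)
        Next

        ' Fall back to a placeholder if only dots and spaces (or nothing) remain.
        If fileName.Trim("."c, " "c).Length = 0 Then
            fileName = "attachment.bin"
        End If

        Return Path.Combine(outputDir, fileName)
    End Function

    Sub Main()
        ' Stand-alone demonstration with made-up attachment paths.
        Console.WriteLine(SafeOutputPath("Output", "C:\docs\report.doc"))
        Console.WriteLine(SafeOutputPath("Output", "../notes/readme.txt"))
    End Sub
End Module
```

In the sample, the call would then read fs.Export(SafeOutputPath(output_path, fs.GetFilePath())). Note that this sketch does not handle two attachments that share the same file name; the second export would overwrite the first.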