Q: I have been playing with this for a few days and have not gotten
it to work, but I do have a better understanding now. Can you give me
the underlying strategy for extracting file attachments from a PDF
file? It seems there is a collection of objects to iterate through,
find the file attachments in, and then extract them. (Is that true?)
Do you have any sample code in VB.NET or C#?
-----
A:
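Yes, that is essentially the strategy, but attachments can live in two
places, so you need to check both. Document-level attachments are stored
in the EmbeddedFiles name tree (in the Names dictionary of the document
catalog): iterate over that tree and export each file specification.
Page-level attachments are stored as FileAttachment annotations: iterate
over the pages, then over each page's annotations, and export the file
specification of every annotation whose type is FileAttachment.

The following sample code extracts both kinds of attachment. It is
written in VB.NET; the C#, Java, C, and C++ versions are essentially
the same.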
Imports System
Imports PDFTRON
Imports PDFTRON.Common
Imports PDFTRON.SDF
Imports PDFTRON.PDF

Module Module1
    ' Relative paths to the folders containing the input and output files.
    Dim input_path As String = "../../../TestFiles/"
    Dim output_path As String = "../../../TestFiles/Output/"

    Sub Main()
        PDFNet.Initialize()
        Try
            Dim doc As PDFDoc = New PDFDoc(input_path + "FileAttachment.pdf")
            doc.InitSecurityHandler()

            ' 1) Extract file attachments associated with the document
            '    (entries in the EmbeddedFiles name tree).
            Dim names As NameTree = NameTree.Find(doc.GetSDFDoc(), "EmbeddedFiles")
            If names.IsValid() Then
                Dim itr As NameTreeIterator = names.GetIterator()
                While itr.HasNext()
                    ' Each value in the name tree is a file specification.
                    Dim fs As FileSpec = New FileSpec(itr.Value())
                    ' Keep only the file name, in case the stored path contains folders.
                    fs.Export(System.IO.Path.Combine(output_path, System.IO.Path.GetFileName(fs.GetFilePath())))
                    itr.Next()
                End While
            End If

            ' 2) Extract file attachments associated with pages
            '    (FileAttachment annotations).
            Dim pgitr As PageIterator = doc.GetPageIterator()
            While pgitr.HasNext()
                Dim pg As Page = pgitr.Current()
                For i As Integer = 0 To pg.GetNumAnnots() - 1
                    Dim ann As Annot = pg.GetAnnot(i)
                    If ann.IsValid() AndAlso ann.GetType() = Annot.Type.e_FileAttachment Then
                        Dim fs As FileSpec = ann.GetFileAttachment()
                        fs.Export(System.IO.Path.Combine(output_path, System.IO.Path.GetFileName(fs.GetFilePath())))
                    End If
                Next
                pgitr.Next()
            End While

            doc.Close()
        Catch ex As PDFNetException
            Console.WriteLine(ex.Message)
        Catch ex As Exception
            Console.WriteLine(ex.Message)
        End Try
        PDFNet.Terminate()
    End Sub
End Module