Extract doc info and other metadata properties from PDF documents


Support

Dec 21, 2012, 3:57:32 PM
to pdfne...@googlegroups.com

Q:

We are using PDFNet SDK in a .NET application to check the properties of PDF documents. We have found several documents whose document properties appear empty when we check them in Adobe Acrobat (we are using Acrobat 9 and 10) but are not empty when we check them with our application.

 

In our code, we try to get the properties of the PDF document with the following code (doc is a PDFDoc object representing our document):

PDFDocInfo info = doc.GetDocInfo();
ourCustomObject.MetaData.Author = info.GetAuthor();
ourCustomObject.MetaData.Keywords = info.GetKeywords();
ourCustomObject.MetaData.Subject = info.GetSubject();
ourCustomObject.MetaData.Title = info.GetTitle();

 

Could you give us more information on how to get the same information through PDFNet as is shown in the document properties in Adobe Acrobat?

 
--------
A: 
 

The most likely cause is that the file contains both PDF doc info (the Info dictionary) and an XMP metadata stream. A file can have one, the other, or both, and when both are present they are not always in sync. For more info please see:

  https://groups.google.com/d/topic/pdfnet-sdk/Jm04ped89ig/discussion

 

Btw, if required you can also extract the XMP metadata stream as XML as follows (this is VB.NET, but you could easily translate it to whatever language you need):

 

Private Function ExtractXMLMetadata() As XElement
    Const bufferSize As Integer = 256

    ' The XMP packet, if present, is the stream stored under "Metadata" in the document root.
    Dim xmpStream As pdftron.SDF.Obj = Me.p_doc.GetRoot().FindObj("Metadata")
    If xmpStream Is Nothing Then Return Nothing

    Dim oStream As pdftron.Filters.Filter = xmpStream.GetDecodedStream()
    Dim oReader As New pdftron.Filters.FilterReader(oStream)

    ' Collect the raw bytes and decode them once at the end. Decoding each chunk
    ' separately would append unused trailing zero bytes from the buffer and could
    ' corrupt a multi-byte UTF-8 character split across two reads.
    Dim bytes As New System.IO.MemoryStream()
    Dim buffer(bufferSize - 1) As Byte
    Dim bytesRead As Integer = oReader.Read(buffer)
    While bytesRead > 0
        bytes.Write(buffer, 0, bytesRead)
        bytesRead = oReader.Read(buffer)
    End While

    Dim finalStr As String = System.Text.Encoding.UTF8.GetString(bytes.ToArray())
    If finalStr = "" Then Return Nothing

    Try
        Return XElement.Parse(finalStr)
    Catch ex As System.Xml.XmlException
        ' The stream is not well-formed XML; treat it as "no usable metadata".
        Return Nothing
    End Try
End Function
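
Once the XMP packet is available as an XElement, the fields that correspond to the doc info entries can be read from it and compared with the values returned by PDFDocInfo. The following is only a minimal sketch: the module name XmpFields and its function names are made up for illustration and are not part of PDFNet. It assumes the usual XMP layout, where Title, Author and Subject are stored as dc:title, dc:creator and dc:description (each holding rdf:li items) and Keywords as pdf:Keywords (written either as an element or as an attribute of rdf:Description). It uses only System.Xml.Linq.

```vbnet
Imports System.Linq
Imports System.Xml.Linq

' Hypothetical helper module (not part of PDFNet) for reading common XMP fields.
Public Module XmpFields
    ' Standard XMP namespace URIs.
    Private ReadOnly Rdf As XNamespace = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    Private ReadOnly Dc As XNamespace = "http://purl.org/dc/elements/1.1/"
    Private ReadOnly PdfNs As XNamespace = "http://ns.adobe.com/pdf/1.3/"

    ' Text of the first rdf:li under the named property, or Nothing if absent.
    Private Function FirstListItem(xmp As XElement, propName As XName) As String
        Dim li As XElement = xmp.Descendants(propName).Descendants(Rdf + "li").FirstOrDefault()
        Return If(li Is Nothing, Nothing, li.Value)
    End Function

    ' A simple property may appear as a child element or as an attribute
    ' of rdf:Description, so both forms are checked.
    Private Function SimpleProperty(xmp As XElement, propName As XName) As String
        Dim el As XElement = xmp.Descendants(propName).FirstOrDefault()
        If el IsNot Nothing Then Return el.Value
        Dim attr As XAttribute =
            xmp.DescendantsAndSelf(Rdf + "Description").Attributes(propName).FirstOrDefault()
        Return If(attr Is Nothing, Nothing, attr.Value)
    End Function

    Public Function XmpTitle(xmp As XElement) As String
        Return FirstListItem(xmp, Dc + "title")
    End Function

    ' Only the first dc:creator entry is returned; a file may list several.
    Public Function XmpAuthor(xmp As XElement) As String
        Return FirstListItem(xmp, Dc + "creator")
    End Function

    Public Function XmpSubject(xmp As XElement) As String
        Return FirstListItem(xmp, Dc + "description")
    End Function

    Public Function XmpKeywords(xmp As XElement) As String
        Return SimpleProperty(xmp, PdfNs + "Keywords")
    End Function
End Module
```

For example, if XmpFields.XmpTitle(ExtractXMLMetadata()) returns Nothing or an empty string while info.GetTitle() returns a value, the file is probably one of those where the doc info and the XMP stream are out of sync, which may be why another viewer shows the property as empty.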
