Editing PDF XMP metadata.


Support

Jan 6, 2011, 6:30:02 PM
to PDFTron PDFNet SDK
Q: I’m looking for some information about editing advanced metadata in
a PDF.

I have a PDF with advanced XMP metadata, and I would like to add, modify,
or delete it from a VB.NET application.
I use PDFNet SDK 5.0.2.0 with .NET Framework 4.

I tried this code:

To write the metadata XML node:
Private Sub SaveMetadata(ByVal xMetadata As System.Xml.XmlElement)
    ' Serialize the XMP element to UTF-8 bytes.
    Dim xMetadataStr As String = xMetadata.OuterXml
    Dim metadataByte As Byte() = System.Text.Encoding.UTF8.GetBytes(xMetadataStr)

    ' Wrap the bytes in a new metadata stream object.
    Dim xmp_stm As pdftron.SDF.Obj = Me.p_pdfDoc.doc.CreateIndirectStream(metadataByte)
    xmp_stm.PutName("Subtype", "XML")
    xmp_stm.PutName("Type", "Metadata")

    ' Point the document catalog at the new stream.
    Me.p_pdfDoc.doc.GetRoot().Erase("Metadata")
    Me.p_pdfDoc.doc.GetRoot().Put("Metadata", xmp_stm)

    ' Combine the save flags with a bitwise Or.
    Me.p_pdfDoc.doc.Save("test.pdf",
        pdftron.SDF.SDFDoc.SaveOptions.e_linearized Or
        pdftron.SDF.SDFDoc.SaveOptions.e_remove_unused)
End Sub

To extract/read the metadata XML node:
Private Function ExtractMetaData() As XElement
    Dim bufferSize As Integer = 256
    Dim xMeta As XElement = Nothing

    Dim finalStr As String = ""
    Dim xmpStream As pdftron.SDF.Obj = Me.p_doc.GetRoot().FindObj("Metadata")
    If (xmpStream IsNot Nothing) Then
        Dim oStream As pdftron.Filters.Filter = xmpStream.GetDecodedStream()
        Dim oReader As New pdftron.Filters.FilterReader(oStream)

        ' Collect the raw bytes first and decode them once, so that the unused
        ' tail of the buffer and multi-byte characters split across two reads
        ' cannot corrupt the text.
        Using ms As New System.IO.MemoryStream()
            Dim buffer As Byte() = New Byte(bufferSize - 1) {}
            Dim bytesRead As Integer = oReader.Read(buffer)
            While (bytesRead > 0)
                ms.Write(buffer, 0, bytesRead)
                bytesRead = oReader.Read(buffer)
            End While
            finalStr = System.Text.Encoding.UTF8.GetString(ms.ToArray())
        End Using
    End If

    If (finalStr <> "") Then
        finalStr &= vbCrLf
        Try
            Dim xDoc As New System.Xml.XmlDocument()
            xDoc.LoadXml(finalStr)
            xMeta = XElement.Parse(xDoc.OuterXml)
        Catch ex As System.Xml.XmlException
            ' Malformed XMP: leave xMeta as Nothing so the caller can detect it.
        End Try
    End If
    Return xMeta
End Function


My problem is this:
When I use a PDF with advanced metadata, the metadata node is stored
at the end of the document like this (the PDF is opened in a plain text
editor):

1683 0 obj
<</Length 3587/Subtype/XML/Type/Metadata>>stream
<?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?> <x:xmpmeta
xmlns:x="adobe:ns:meta/" x:xmptk="Adobe XMP Core 4.2.1-c043 52.372728,
2009/01/18-15:08:04 ">
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<rdf:Description rdf:about=""
xmlns:xmp="http://ns.adobe.com/xap/1.0/">
<xmp:ModifyDate>2010-12-22T16:25:45+01:00</xmp:ModifyDate>
<xmp:CreateDate>2010-06-10T16:03:48+02:00</xmp:CreateDate>
<xmp:MetadataDate>2010-12-22T16:25:45+01:00</xmp:MetadataDate>
</rdf:Description>
<rdf:Description rdf:about=""
xmlns:dc="http://purl.org/dc/elements/1.1/">
<dc:format>application/pdf</dc:format>
<dc:title>
<rdf:Alt>
<rdf:li xml:lang="x-default">tetete</rdf:li>
</rdf:Alt>
</dc:title>
<dc:creator>
<rdf:Bag/>
</dc:creator>
</rdf:Description>
<rdf:Description rdf:about=""
xmlns:xmpMM="http://ns.adobe.com/xap/1.0/mm/">
<xmpMM:DocumentID>uuid:c4978727-4b6e-5b49-9040-ed94427de250</xmpMM:DocumentID>
<xmpMM:InstanceID>uuid:992fd930-caeb-40e7-a6dc-5de01c7906ba</xmpMM:InstanceID>
</rdf:Description>
<rdf:Description rdf:about=""
xmlns:pdfx="http://ns.adobe.com/pdfx/1.3/">
<pdfx:clientSiteHref>www.bar.com</pdfx:clientSiteHref>
<pdfx:clientSiteTitle>bar</pdfx:clientSiteTitle>
</rdf:Description>
</rdf:RDF>
</x:xmpmeta>
<?xpacket end="w"?>
endstream
endobj


My method extracts the metadata node. I then adjust the pdfx region (in
VB.NET/WinForms) with my own metadata, like this:
<rdf:Description rdf:about="" xmlns:pdfx="http://ns.adobe.com/pdfx/1.3/">
<pdfx:clientSiteHref>www.tata.com</pdfx:clientSiteHref>
<pdfx:clientSiteTitle>tata</pdfx:clientSiteTitle>
<pdfx:pod>good</pdfx:pod>
</rdf:Description>

I replace the old XML node with the new one, to get a node like this:
<x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="Adobe XMP Core 4.2.1-c043
52.372728, 2009/01/18-15:08:04 ">
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<rdf:Description rdf:about=""
xmlns:xmp="http://ns.adobe.com/xap/1.0/">
<xmp:ModifyDate>2010-12-22T16:25:45+01:00</xmp:ModifyDate>
<xmp:CreateDate>2010-06-10T16:03:48+02:00</xmp:CreateDate>
<xmp:MetadataDate>2010-12-22T16:25:45+01:00</xmp:MetadataDate>
</rdf:Description>
<rdf:Description rdf:about=""
xmlns:dc="http://purl.org/dc/elements/1.1/">
<dc:format>application/pdf</dc:format>
<dc:title>
<rdf:Alt>
<rdf:li xml:lang="x-default">tetete</rdf:li>
</rdf:Alt>
</dc:title>
<dc:creator>
<rdf:Bag/>
</dc:creator>
</rdf:Description>
<rdf:Description rdf:about=""
xmlns:xmpMM="http://ns.adobe.com/xap/1.0/mm/">
<xmpMM:DocumentID>uuid:c4978727-4b6e-5b49-9040-ed94427de250</xmpMM:DocumentID>
<xmpMM:InstanceID>uuid:992fd930-caeb-40e7-a6dc-5de01c7906ba</xmpMM:InstanceID>
</rdf:Description>
<rdf:Description rdf:about=""
xmlns:pdfx="http://ns.adobe.com/pdfx/1.3/">
<pdfx:clientSiteHref>www.bar.com</pdfx:clientSiteHref>
<pdfx:clientSiteTitle>bar</pdfx:clientSiteTitle>
<pdfx:pod>good</pdfx:pod>
</rdf:Description>
</rdf:RDF>
</x:xmpmeta>
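
The node replacement described above could also be done property by property. The following is only a rough sketch using LINQ to XML, which is a substitute technique and not necessarily how the original application does it. The module name PdfxEditSketch, the helper SetPdfxProperty, and the tiny stand-in packet in Main are all hypothetical; only the namespace URIs are taken from the packet shown above.

```vbnet
Imports System
Imports System.Linq
Imports System.Xml.Linq

Module PdfxEditSketch
    ' Namespace URIs as they appear in the XMP packet above.
    Private ReadOnly Rdf As XNamespace = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    Private ReadOnly Pdfx As XNamespace = "http://ns.adobe.com/pdfx/1.3/"

    ' Hypothetical helper: sets (or adds) one pdfx property inside the packet.
    Sub SetPdfxProperty(ByVal xmpMeta As XElement, ByVal name As String, ByVal value As String)
        ' Find the rdf:Description block that already holds pdfx properties.
        Dim desc As XElement = xmpMeta.Descendants(Rdf + "Description") _
            .FirstOrDefault(Function(d) d.Elements().Any(Function(e) e.Name.Namespace = Pdfx))

        If desc Is Nothing Then
            ' No pdfx block yet: create one under rdf:RDF (assumed to exist).
            desc = New XElement(Rdf + "Description",
                                New XAttribute(Rdf + "about", ""),
                                New XAttribute(XNamespace.Xmlns + "pdfx", Pdfx.NamespaceName))
            xmpMeta.Element(Rdf + "RDF").Add(desc)
        End If

        ' SetElementValue replaces an existing child or appends a new one.
        desc.SetElementValue(Pdfx + name, value)
    End Sub

    Sub Main()
        ' Tiny stand-in packet; a real one would come from the PDF.
        Dim xmp As XElement = XElement.Parse(
            "<x:xmpmeta xmlns:x=""adobe:ns:meta/"">" &
            "<rdf:RDF xmlns:rdf=""http://www.w3.org/1999/02/22-rdf-syntax-ns#"">" &
            "<rdf:Description rdf:about="""" xmlns:pdfx=""http://ns.adobe.com/pdfx/1.3/"">" &
            "<pdfx:clientSiteTitle>bar</pdfx:clientSiteTitle>" &
            "</rdf:Description></rdf:RDF></x:xmpmeta>")

        SetPdfxProperty(xmp, "clientSiteTitle", "tata")  ' replaces the existing value
        SetPdfxProperty(xmp, "pod", "good")              ' appends a new property
        Console.WriteLine(xmp)
    End Sub
End Module
```

Editing individual properties this way leaves the other rdf:Description blocks untouched, which may be easier to reason about than swapping the whole node.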

I then use my writing method, and the resulting PDF contains two pieces
of metadata: one object with the modified metadata XML node, and one
object with the old metadata, in a format like this:
4747 0 obj
<</CreationDate (D:20100610160348+02'00')/ModDate (D:20101222162545+01'00')
/Title (Rapport 2009)/clientSiteHref (www.foobar.com)/clientSiteTitle (toto)>>
endobj

If I open the new PDF with my application, I get the correct new
metadata XML node. But when I open the new PDF with Adobe Acrobat Pro,
it shows the old metadata.

-----------------------
A: PDF documents can store metadata in two places: the document
information dictionary (the Info entry in the trailer) and an XMP stream
(the Metadata entry in the document catalog). Your code replaces only the
XMP stream, so the old document info dictionary is still in the file.
Did you try erasing it? For example: doc.GetTrailer().Erase("Info");
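
Putting the suggestion together with the writing code from the question, a minimal sketch might look like the following. It is not a definitive implementation. It assumes the document is a pdftron.PDF.PDFDoc and that the new XMP packet is already available as a string. The module name, the ReplaceXmp helper, and the file names in.pdf, new.xmp, and out.pdf are hypothetical.

```vbnet
Imports System.Text
Imports pdftron
Imports pdftron.PDF
Imports pdftron.SDF

Module XmpReplaceSketch
    ' Hypothetical helper: replaces the XMP stream and drops the old Info dictionary.
    Sub ReplaceXmp(ByVal doc As PDFDoc, ByVal xmpPacket As String)
        Dim bytes As Byte() = Encoding.UTF8.GetBytes(xmpPacket)

        ' New metadata stream, built the same way as in the question's SaveMetadata.
        Dim xmpStm As Obj = doc.CreateIndirectStream(bytes)
        xmpStm.PutName("Type", "Metadata")
        xmpStm.PutName("Subtype", "XML")
        doc.GetRoot().Put("Metadata", xmpStm)

        ' Remove the trailer's reference to the document information dictionary
        ' so that it can no longer disagree with the XMP packet.
        doc.GetTrailer().Erase("Info")
    End Sub

    Sub Main()
        ' Assumption: demo-mode initialization; some SDK versions expect a license key here.
        PDFNet.Initialize()

        Using doc As New PDFDoc("in.pdf")                ' hypothetical input path
            doc.InitSecurityHandler()
            ReplaceXmp(doc, System.IO.File.ReadAllText("new.xmp")) ' hypothetical XMP file

            ' e_remove_unused asks the SDK to drop objects that are no longer referenced.
            doc.Save("out.pdf", SDFDoc.SaveOptions.e_remove_unused)
        End Using
    End Sub
End Module
```

If the Info dictionary needs to stay in the file, an alternative would be to update its entries so that they match the XMP packet (for example through doc.GetDocInfo()) instead of erasing it.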

ponsharan

Jun 12, 2014, 8:24:55 AM
to pdfne...@googlegroups.com
Hi,

I am developing a tool to import XMP information into a PDF file. I used the following line, which throws an UNKNOWN EXCEPTION error while compiling:

Dim xmp_stm As pdftron.SDF.Obj = oInputDoc.CreateIndirectStream(reader)

So I changed the code slightly, like this:

Dim xmp_stm As pdftron.SDF.Obj = oInputDoc.CreateIndirectStream(reader, New pdftron.Filters.FlateEncode(Nothing))

The issue is still not resolved. Could you please tell me what is wrong with this line?

Best regards
Sharan

Aaron

Jun 12, 2014, 2:28:42 PM
to pdfne...@googlegroups.com
Hello Sharan,

Thank you for letting us know that you're seeing this behaviour. So that we're on the same page, could you forward a more complete code sample to sup...@pdftron.com? Thank you for your help.