On 9/7/2012 1:13 AM, Sunil Agrawal wrote:
> Does anyone know how can I parse the XMP contained within the Pdf and
> exposed by PdfJs in 'metadata' object?
PDF.js uses the XMP metadata to extract and display the PDF document
title. Once the XMP chunk is extracted, the data is parsed in
https://github.com/mozilla/pdf.js/blob/master/src/metadata.js
> I'm looking for a JS library that
> would ease the XMP parsing pain (I am quite new to XMP, hence don't know
> all the namespace involved and such).
XMP data is just an XML document, which can be parsed in browsers using

    var parser = new DOMParser();
    meta = parser.parseFromString(meta, 'application/xml');

and then queried using regular DOM methods. That requires knowing the
DOM API (see more at
http://www.w3.org/DOM/DOMTR).
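
As a rough illustration of querying the parsed document, here is a
minimal sketch that reads the Dublin Core title. The function name
getXmpTitle and its optional parser argument are made up for this
example and are not part of PDF.js. It assumes the common XMP layout
where dc:title holds an rdf:Alt list of rdf:li items.

```javascript
// Namespace URIs used by XMP for Dublin Core and RDF elements.
var DC_NS = 'http://purl.org/dc/elements/1.1/';
var RDF_NS = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#';

// Hypothetical helper: returns the first dc:title value or null.
// `parser` is optional and defaults to the browser's DOMParser.
function getXmpTitle(xmpString, parser) {
  parser = parser || new DOMParser();
  var doc = parser.parseFromString(xmpString, 'application/xml');

  // Look up elements by namespace URI, so the prefix used in the
  // file ("dc:", "rdf:") does not matter.
  var titles = doc.getElementsByTagNameNS(DC_NS, 'title');
  if (titles.length === 0) {
    return null;
  }

  // Assumption: the title is wrapped in rdf:Alt/rdf:li; fall back
  // to the text of dc:title itself when no rdf:li is present.
  var items = titles[0].getElementsByTagNameNS(RDF_NS, 'li');
  var node = items.length > 0 ? items[0] : titles[0];
  return node.textContent.trim();
}
```

In a browser, getXmpTitle(xmpString) should be enough. Other XMP
properties can be read the same way once you know their namespace URI.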
An XMP chunk in any file can be found by scanning its binary data using
the rules defined in
http://partners.adobe.com/public/developer/en/xmp/sdk/XMPspecification.pdf#page=34
without knowing the file structure. In the case of PDF.js, we know the
structure and where to find the XMP chunk, so there is no need to scan
the whole file.
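
For the scanning approach, a simplified sketch could look like the
following. It is not what PDF.js does, and it only covers part of the
rules in the specification: it assumes the packet is stored
uncompressed, in an 8-bit encoding (UTF-8), and is wrapped in the usual
<?xpacket begin=...?> and <?xpacket end=...?> processing instructions.
The function names are invented for this example, and TextDecoder is
assumed to be available.

```javascript
// Convert an ASCII string to an array of byte values.
function asciiBytes(str) {
  var out = [];
  for (var i = 0; i < str.length; i++) {
    out.push(str.charCodeAt(i));
  }
  return out;
}

// Index of the first occurrence of `pattern` in `bytes` at or after
// position `from`, or -1 when it does not occur.
function indexOfBytes(bytes, pattern, from) {
  outer: for (var i = from; i <= bytes.length - pattern.length; i++) {
    for (var j = 0; j < pattern.length; j++) {
      if (bytes[i + j] !== pattern[j]) {
        continue outer;
      }
    }
    return i;
  }
  return -1;
}

var HEADER = asciiBytes('<?xpacket begin=');
var TRAILER = asciiBytes('<?xpacket end=');
var PI_CLOSE = asciiBytes('?>');

// Hypothetical helper: takes a Uint8Array with the raw file data and
// returns the text between the packet header and trailer, or null.
function findXmpPacket(bytes) {
  var headerStart = indexOfBytes(bytes, HEADER, 0);
  if (headerStart < 0) {
    return null;
  }
  // The packet body starts right after the "?>" closing the header.
  var headerEnd = indexOfBytes(bytes, PI_CLOSE, headerStart);
  if (headerEnd < 0) {
    return null;
  }
  var bodyStart = headerEnd + PI_CLOSE.length;
  var trailerStart = indexOfBytes(bytes, TRAILER, bodyStart);
  if (trailerStart < 0) {
    return null;
  }
  // Assumption: the packet is UTF-8 encoded.
  return new TextDecoder('utf-8').decode(
    bytes.subarray(bodyStart, trailerStart));
}
```

The returned string can then be handed to DOMParser as described above.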
Thanks,
Yury