
XMP parsing in Javascript


Sunil Agrawal

Sep 7, 2012, 2:13:01 AM
to dev-p...@lists.mozilla.org
Hi,

Does anyone know how I can parse the XMP contained within the PDF and
exposed by PDF.js in the 'metadata' object? I'm looking for a JS library that
would ease the XMP parsing pain (I am quite new to XMP, hence don't know
all the namespaces involved and such).

Thanks, Sunil

Yury Delendik

Sep 7, 2012, 8:42:20 AM
to Sunil Agrawal
On 9/7/2012 1:13 AM, Sunil Agrawal wrote:
> Does anyone know how I can parse the XMP contained within the PDF and
> exposed by PDF.js in the 'metadata' object?

PDF.js uses the XMP metadata to extract and display the PDF document
title. Once the XMP chunk is extracted, the data is parsed at
https://github.com/mozilla/pdf.js/blob/master/src/metadata.js .

> I'm looking for a JS library that
> would ease the XMP parsing pain (I am quite new to XMP, hence don't know
> all the namespaces involved and such).

XMP data is just an XML document, which can be parsed in browsers using

var parser = new DOMParser();
// meta holds the XMP string; keep the parsed Document in its own variable
var doc = parser.parseFromString(meta, 'application/xml');

and then queried using regular DOM methods. That requires knowing the
DOM API (see more at http://www.w3.org/DOM/DOMTR).
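
As a rough illustration of querying the parsed document with namespace-aware DOM methods, here is a minimal sketch that reads the Dublin Core title. The helper names parseXmp and getXmpTitle and the sample packet are made up for this example and are not part of PDF.js; it also assumes the title is stored the usual XMP way, as a dc:title element holding rdf:li language alternatives.

```javascript
// Namespace URIs used by XMP for RDF and Dublin Core properties.
var RDF_NS = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#';
var DC_NS = 'http://purl.org/dc/elements/1.1/';

// Hypothetical helper: parse an XMP string into an XML Document (browser only).
function parseXmp(xmlString) {
  var doc = new DOMParser().parseFromString(xmlString, 'application/xml');
  // Browsers report XML syntax errors through a <parsererror> element.
  if (doc.getElementsByTagName('parsererror').length > 0) {
    throw new Error('XMP is not well-formed XML');
  }
  return doc;
}

// Hypothetical helper: return the first dc:title value, or null if absent.
function getXmpTitle(doc) {
  var titles = doc.getElementsByTagNameNS(DC_NS, 'title');
  if (titles.length === 0) {
    return null;
  }
  // dc:title normally wraps an rdf:Alt list; take its first rdf:li entry,
  // falling back to the text of dc:title itself.
  var items = titles[0].getElementsByTagNameNS(RDF_NS, 'li');
  var node = items.length > 0 ? items[0] : titles[0];
  return node.textContent.trim();
}

// Usage, guarded so the snippet only runs where DOMParser exists.
if (typeof DOMParser !== 'undefined') {
  var sample =
    '<x:xmpmeta xmlns:x="adobe:ns:meta/">' +
    '<rdf:RDF xmlns:rdf="' + RDF_NS + '">' +
    '<rdf:Description rdf:about="" xmlns:dc="' + DC_NS + '">' +
    '<dc:title><rdf:Alt>' +
    '<rdf:li xml:lang="x-default">Example title</rdf:li>' +
    '</rdf:Alt></dc:title>' +
    '</rdf:Description></rdf:RDF></x:xmpmeta>';
  console.log(getXmpTitle(parseXmp(sample)));
}
```

Matching on namespace URIs rather than on prefixes such as "dc:" is deliberate: the prefix is chosen by whoever wrote the packet, while the URI is fixed.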

An XMP chunk in any file can be found by scanning its binary data using
the rules defined in
http://partners.adobe.com/public/developer/en/xmp/sdk/XMPspecification.pdf#page=34
without knowing the file structure. In the PDF.js case, we know the
structure and know where to find the XMP chunk without scanning the
whole file.
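
For the scanning approach, a minimal sketch might look like the following. It assumes the packet is stored uncompressed and UTF-8 encoded, and it only looks for the `<?xpacket begin=` and `<?xpacket end=` processing instructions that wrap an XMP packet; other encodings permitted by the specification are not handled. The function names are invented for this example, and it relies on TextDecoder being available.

```javascript
// Convert an ASCII string to an array of byte values.
function asciiBytes(str) {
  var out = [];
  for (var i = 0; i < str.length; i++) {
    out.push(str.charCodeAt(i));
  }
  return out;
}

// Naive byte-sequence search; returns the index of needle at or after
// position from, or -1 if it does not occur.
function indexOfBytes(haystack, needle, from) {
  outer:
  for (var i = from; i <= haystack.length - needle.length; i++) {
    for (var j = 0; j < needle.length; j++) {
      if (haystack[i + j] !== needle[j]) {
        continue outer;
      }
    }
    return i;
  }
  return -1;
}

// Hypothetical helper: return the first XMP packet found in a Uint8Array
// as a string, or null if no complete packet is present.
function findXmpPacket(bytes) {
  var start = indexOfBytes(bytes, asciiBytes('<?xpacket begin='), 0);
  if (start < 0) {
    return null;
  }
  var trailer = indexOfBytes(bytes, asciiBytes('<?xpacket end='), start);
  if (trailer < 0) {
    return null;
  }
  // The packet ends where the trailing processing instruction closes.
  var close = indexOfBytes(bytes, asciiBytes('?>'), trailer);
  if (close < 0) {
    return null;
  }
  // Assumption: the packet is UTF-8; UTF-16/32 packets need other markers.
  return new TextDecoder('utf-8').decode(bytes.subarray(start, close + 2));
}
```

The returned string includes the xpacket wrapper instructions, which an XML parser accepts as ordinary processing instructions, so it can be handed to DOMParser directly.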

Thanks,
Yury