Generating IDs for PDF files opened by PDF.js

Apurva Jalit

Aug 4, 2015, 8:43:30 AM
to dev-p...@lists.mozilla.org
Hello folks,
I am working on a browser extension that internally uses PDF.js to open PDF
documents in the browser. For a certain use case I need to get the ID of
the PDF opened by PDF.js. I see this ID/URN, or identifier string, for
the PDF printed in the console (when using the PDF viewer extension to open a
PDF via PDF.js). For example, for the page
http://www.ifets.info/journals/10_4/9.pdf I can see the string
"30eb7fcbb6756e461176fbbd0ceba7b9" in the console.

How is this ID generated? Is there a standard algorithm for it?

Thanks,
Apurva Jalit

ydel...@mozilla.com

Aug 4, 2015, 9:05:24 AM
to mozilla-d...@lists.mozilla.org
On Tuesday, August 4, 2015 at 7:43:30 AM UTC-5, Apurva Jalit wrote:
> For a certain use case I need to get the ID of
> the pdf opened by pdf js. I see this ID/URN or the identifier string for
> the pdf printed in the console(when using pdf viewer extension to open a
> pdf via pdfjs).

> How is this ID generated? Is there a standard algorithm for it?

Hello,

A PDF file contains an ID field that is supposed to be somewhat unique; see section 14.4, "File Identifiers", of the spec [1]. If the ID field is absent or set to zeros, we calculate the MD5 of the first 1024 bytes of the PDF file [2].

Thanks.

[1] http://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#page=559
[2] https://github.com/mozilla/pdf.js/blob/master/src/core/core.js#L512
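
For anyone who wants to reproduce the string outside the viewer, the rule described above can be sketched in Python. This is only an illustration of that description, not the PDF.js source: the function name `fingerprint` and its parameters are made up for the example, and it assumes the first ID string from the trailer has already been extracted as raw bytes (or is `None` if the file has no ID).

```python
import hashlib
from typing import Optional

# Number of leading bytes hashed in the fallback case, per the reply above.
FINGERPRINT_FIRST_BYTES = 1024


def fingerprint(pdf_bytes: bytes, first_id: Optional[bytes] = None) -> str:
    """Return a hex fingerprint for a PDF (hypothetical helper).

    first_id is the first string of the trailer's ID array as raw bytes,
    or None when the file has no ID entry.
    """
    # Use the stored ID when it exists and is not all zero bytes.
    if first_id and any(first_id):
        return first_id.hex()
    # Otherwise fall back to the MD5 of the first 1024 bytes of the file.
    return hashlib.md5(pdf_bytes[:FINGERPRINT_FIRST_BYTES]).hexdigest()
```

Either branch produces a lowercase hex string; a 16-byte ID or an MD5 digest both come out as 32 hex characters, the same shape as the value printed in the console.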

Randall Leeds

Aug 4, 2015, 12:12:33 PM
to Apurva Jalit, dev-p...@lists.mozilla.org
Yes, it's in the PDF spec, which is a very long document so I can't dig up
the link right now.

It's two parts. It's a hash of some content, if I recall correctly, and
this is repeated. When the document is edited, one half is kept the same
and the other is recomputed. In this way, it's half fixed, half
version-related.

I don't think PDF.js calculates these. It just reads what's stored in the
PDF.
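
To look at the two stored parts directly, a rough Python sketch along these lines can pull them out of the raw file. It is deliberately naive and rests on several assumptions: it only handles IDs written as two hex strings (`/ID [<...> <...>]`), it will not find IDs stored as literal strings or inside compressed cross-reference streams, and the helper name `read_file_id` is invented for the example. A real PDF parser is the reliable route.

```python
import re
from typing import Optional, Tuple

# Matches only the hex-string form of the ID array: /ID [<hex> <hex>]
_ID_PATTERN = re.compile(
    rb"/ID\s*\[\s*<([0-9A-Fa-f\s]*)>\s*<([0-9A-Fa-f\s]*)>\s*\]"
)


def read_file_id(pdf_bytes: bytes) -> Optional[Tuple[str, str]]:
    """Return the two ID strings as lowercase hex, or None if not found."""
    matches = _ID_PATTERN.findall(pdf_bytes)
    if not matches:
        return None
    # Assumption: in an incrementally updated file the newest trailer
    # is appended last, so take the final match.
    first, second = matches[-1]

    def clean(hex_bytes: bytes) -> str:
        # Hex strings may contain whitespace; strip it and normalise case.
        return b"".join(hex_bytes.split()).decode("ascii").lower()

    return clean(first), clean(second)
```

For a file that has been edited since it was created, the two returned values would be expected to differ, with the first being the half that stays fixed.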
