Embedding metadata in PDFs

Bryan Bishop

Jan 15, 2013, 11:43:21 AM

to science-libe...@googlegroups.com, Bryan Bishop

"We present a novel approach to retrieve metadata to scholarly papers
stored locally as PDF files. A fingerprint is produced from the PDF
fulltext to query an online metadata repository. The returned results
are matched back to identify the correct metadata entry. These
metadata can then be stored in the PDF itself, indexed for a desktop
search engine, and collected in a user's or community's bibliography.
We think this hitherto missing link but with our tool now available
data eases the organization of scholarly papers, and increases
accessibility to one's collected academic content."
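
The fingerprint-and-match idea in the abstract can be sketched roughly as follows. This is not pdfmeat's actual code: the snippet selection, the names `fingerprint` and `best_match`, and the 0.6 threshold are all assumptions made for illustration, and the candidate records stand in for whatever an online repository would return.

```python
import re
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase and reduce every run of non-alphanumerics to one space."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def fingerprint(fulltext, n_words=8, n_snippets=3):
    """Take a few fixed-length word runs from the start of the text.

    Assumption: the first lines of a paper (title, authors, abstract)
    are distinctive enough to serve as search phrases.
    """
    words = normalize(fulltext).split()
    return [
        " ".join(words[i:i + n_words])
        for i in range(0, n_words * n_snippets, n_words)
        if words[i:i + n_words]
    ]


def best_match(fulltext, candidates, threshold=0.6):
    """Return the candidate whose title best matches the head of the text.

    Each candidate is a dict with a "title" key (hypothetical record shape).
    Returns None when no candidate reaches the threshold.
    """
    head = normalize(fulltext)
    best, best_score = None, 0.0
    for cand in candidates:
        title = normalize(cand["title"])
        # Compare the title against an equally long prefix of the text.
        score = SequenceMatcher(None, title, head[:len(title)]).ratio()
        if score > best_score:
            best, best_score = cand, score
    return best if best_score >= threshold else None
```

The snippets from `fingerprint` would be the search phrases, and `best_match` is the "matched back" step: it only accepts a returned record if its title resembles the beginning of the local fulltext.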

http://code.google.com/p/pdfmeat/

paper:
http://dbs.uni-leipzig.de/publication/title/retrieving_metadata_for_your_local_scholarly_papers

https://pdfmeat.googlecode.com/files/pdfmeat.py

It also needs a file called "bibtex2pdfmetadata.pl". I think an
enterprising programmer could polish this project up; for one thing,
the source code is in the "downloads" section instead of in a version
control system (whaaat).
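
Going by its name alone, "bibtex2pdfmetadata.pl" presumably turns a BibTeX record into PDF metadata. A minimal sketch of that kind of step, under loud assumptions: the regex parser only handles simple one-line `field = {value}` or `field = "value"` entries, the field mapping and function names are invented for illustration, and actually writing the result into a file would need a PDF library, which is left out here.

```python
import re

# BibTeX field -> key in the PDF document information dictionary.
# This mapping is an assumption, not taken from bibtex2pdfmetadata.pl.
FIELD_TO_INFO_KEY = {
    "title": "/Title",
    "author": "/Author",
    "abstract": "/Subject",
    "keywords": "/Keywords",
}

# One field per line: name = {value} or name = "value", optional trailing comma.
FIELD_RE = re.compile(
    r'(\w+)\s*=\s*(?:\{(.*?)\}|"(.*?)")\s*,?\s*$', re.MULTILINE
)


def parse_bibtex_fields(entry):
    """Extract simple one-line fields from a single BibTeX entry."""
    fields = {}
    for name, braced, quoted in FIELD_RE.findall(entry):
        fields[name.lower()] = (braced or quoted).strip()
    return fields


def to_pdf_info(fields):
    """Keep only the fields that have a PDF Info counterpart."""
    return {
        pdf_key: fields[bib_name]
        for bib_name, pdf_key in FIELD_TO_INFO_KEY.items()
        if bib_name in fields
    }
```

Fields with no Info counterpart (year, journal, and so on) are dropped by `to_pdf_info`; a real tool might store them as custom keys or XMP instead.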

- Bryan
http://heybryan.org/
1 512 203 0507

Nathan McCorkle

Jan 15, 2013, 10:41:56 PM

to science-libe...@googlegroups.com, Bryan Bishop

Have you used this, Bryan? Could it be used to index all the PDF files I already have?

Bryan Bishop

Jan 15, 2013, 10:50:14 PM

to Nathan McCorkle, Bryan Bishop, science-libe...@googlegroups.com

On Tue, Jan 15, 2013 at 9:41 PM, Nathan McCorkle wrote:
> Have you used this Bryan? Could this be used to index all the PDF files I
> already have?

No, I haven't used it. It seems to just send text snippets to Google
Scholar and then use the BibTeX records that Google Scholar
provides. I think Zotero also does something similar to this.

http://www.zotero.org/support/pdf_fulltext_indexing

Bryan Bishop

Jan 15, 2013, 10:52:04 PM

to Nathan McCorkle, Bryan Bishop, science-libe...@googlegroups.com

On Tue, Jan 15, 2013 at 9:50 PM, Bryan Bishop <kan...@gmail.com> wrote:
> http://www.zotero.org/support/pdf_fulltext_indexing

Oops, I meant to link to this:

http://www.zotero.org/support/retrieve_pdf_metadata

"""
Zotero can take PDFs of scholarly papers and query the Google Scholar
database for matches. The most straight-forward way it does this is by
matching up an embedded Digital Object Identifier (DOI), but that's
far from necessary. If Zotero finds the PDF in Google Scholar, it
creates a new library item for the paper, downloads the bibliographic
metadata from Google Scholar, and attaches the original PDF to the new item. Begin by
dragging your existing PDFs into your Zotero library (currently broken
on linux) or use the “Store Copy of File” option from the add new item
menu (green plus sign). Once they appear in the middle column, select
the ones for which you wish to retrieve metadata. Right click on them
and select “Retrieve Metadata for PDF”. If Zotero was able to find a
match on Google Scholar, you should be all set. With this feature,
there should be no major hurdles to switching to Zotero and taking
full advantage of all its powerful search, indexing, organizational
and citation features.
"""
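
The DOI shortcut mentioned in the quote can be approximated with a regular expression run over the text extracted from the first page or two of a PDF. The pattern below is a common heuristic rather than a formal DOI grammar, it is not Zotero's implementation, and `find_doi` is a hypothetical helper name.

```python
import re

# Heuristic: "10.", a 4-9 digit registrant code, "/", then a non-space suffix.
DOI_RE = re.compile(r'\b10\.\d{4,9}/[^\s"<>]+')


def find_doi(text):
    """Return the first DOI-looking string in text, or None if there is none."""
    match = DOI_RE.search(text)
    if match is None:
        return None
    # Trailing punctuation usually belongs to the sentence, not the DOI.
    return match.group(0).rstrip(".,;)")
```

When `find_doi` returns None, a tool would have to fall back to searching by title or text snippets, which is the less reliable path the quote alludes to.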
