Downloading intra-arXiv references


CrazyCarl

May 6, 2012, 11:37:07 AM
to arXiv api
I'm looking for a list of citations between arxiv.org papers covering
the past 5-10 years.

I'm building a hive-plot visualization of how the different branches
of arxiv.org papers intermingle. My hypothesis is that, because arXiv
is open access, there will be substantial cross-pollination of ideas,
which should make for a stunning visual.

I've written plenty of crawlers, but arxiv.org doesn't have the
bandwidth to handle that kind of traffic. Does the API happen to have
particular methods or keywords for returning *ONLY* the references
listed in a paper?

... it would be absolutely amazing if it did.

Thanks



Thorsten

May 9, 2012, 2:16:07 PM
to arxi...@googlegroups.com


arXiv does not extract or curate references, so there is no method (API or otherwise) to get at just those. However, for the hep* community INSPIRE does an excellent job of reference curation, and you can use INSPIRE's RESTful API to collect those references (be gentle: a single thread, reasonable throttling, etc.), e.g. via the MARCXML output format: http://inspirehep.net/record/1112851?of=xm
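As a rough sketch of the polite, single-threaded crawl described above, the Python below fetches records one at a time with a pause between requests and pulls reference report numbers out of the MARCXML. It assumes that references sit in MARC datafield 999 with indicators C and 5, that arXiv identifiers appear in subfield r, that the document uses the standard MARC21 slim namespace, and that the record URL pattern from the post still resolves. These are assumptions about INSPIRE's output, not something confirmed here. The names fetch_records and reference_report_numbers are hypothetical, and the 2-second delay is an arbitrary placeholder.

```python
import time
import urllib.request
import xml.etree.ElementTree as ET

# Assumed: the MARCXML uses the standard MARC21 "slim" namespace.
MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}


def reference_report_numbers(marcxml: str) -> list[str]:
    """Collect subfield $r of every 999C5 field (assumed INSPIRE reference layout)."""
    root = ET.fromstring(marcxml)
    found = []
    for field in root.iter("{http://www.loc.gov/MARC21/slim}datafield"):
        # Skip anything that is not a 999 field with indicators C and 5.
        if (field.get("tag"), field.get("ind1"), field.get("ind2")) != ("999", "C", "5"):
            continue
        for sub in field.findall("marc:subfield[@code='r']", MARC_NS):
            if sub.text:
                found.append(sub.text.strip())
    return found


def fetch_records(record_ids, delay_seconds=2.0):
    """Hypothetical helper: fetch records sequentially (single thread) with throttling."""
    for rec_id in record_ids:
        # URL pattern copied from the post; it may have changed since.
        url = f"http://inspirehep.net/record/{rec_id}?of=xm"
        with urllib.request.urlopen(url, timeout=30) as resp:
            yield rec_id, resp.read().decode("utf-8")
        time.sleep(delay_seconds)  # pause before the next request
```

Feeding each fetched record through reference_report_numbers and keeping only entries that look like arXiv identifiers would give citing-to-cited pairs within arXiv, which is the edge list a hive plot needs.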

If you want to do a full-scale analysis, you can get the full content of arXiv as PDFs from Amazon S3 (see http://arxiv.org/help/bulk_data_s3) and then run your own reference extraction, or use e.g. Invenio's refextract (http://invenio-software.org/repo/invenio/tree/modules/bibedit/lib/refextract.py), which is part of the Invenio suite of modules.
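refextract does considerably more than this, but as a much cruder stand-in for "run your own reference extraction", a regular expression can pull arXiv identifiers out of reference text. This sketch assumes you have already converted the PDFs to plain text with some external tool; it is a swapped-in regex technique, not refextract's API. It matches new-style identifiers (YYMM.NNNN or YYMM.NNNNN) and old-style ones (archive[.SUBJ]/YYMMNNN), strips version suffixes, and deduplicates. The function name cited_arxiv_ids is hypothetical.

```python
import re

# New-style IDs, e.g. 1107.0001v2, and old-style IDs, e.g. hep-th/9802150
# or math.AG/0309136. Word boundaries avoid matching inside longer numbers.
ARXIV_ID = re.compile(
    r"\b(\d{4}\.\d{4,5}(?:v\d+)?"
    r"|[a-z]+(?:-[a-z]+)*(?:\.[A-Z]{2})?/\d{7}(?:v\d+)?)\b"
)


def cited_arxiv_ids(reference_text: str) -> list[str]:
    """Return unique arXiv IDs in order of first appearance, version suffix removed."""
    seen = {}  # dict preserves insertion order, acting as an ordered set
    for match in ARXIV_ID.finditer(reference_text):
        base = re.sub(r"v\d+$", "", match.group(1))  # drop trailing vN
        seen.setdefault(base, None)
    return list(seen)
```

This will miss references that cite only the journal version of a paper, which is exactly the gap that curated sources like INSPIRE fill.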

Cheers
T.