On Nov 19, 2022, at 1:42 PM, Chinar Dankhara <chinar...@gmail.com> wrote:
> Hi, I am currently trying to take in arXiv links, fetch the results using the arXiv API, and extract DOIs. It seems that only DOIs resolving to external resources are returned, and sometimes not the paper's own DOI. Is there a way to get the paper DOI from the arXiv API?
I have/had almost the exact same problem, and I found a work-around (sort of):
1. download a snapshot of the Arxiv metadata as a JSON stream as provided by
Kaggle [1]
2. after looking very closely at the JSON and the URLs it provides, figure out
that links to the actual PDF files take the shape of
https://arxiv.org/pdf/<identifier>
3. read the documentation, and learn that bulk downloading ought to be rooted
at
https://export.arxiv.org
4. loop through the JSON file, extracting the bibliographic data as well as the identifier
5. munge the identifier into a downloadable PDF URL, for example:
https://export.arxiv.org/pdf/0704.0001
6. download the desired PDF file
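
For what it is worth, steps 4 through 6 might be sketched in Python along these lines. This is only a rough sketch, not the script I actually ran: it assumes the snapshot is one JSON object per line, and that each record has "id", "title", and "doi" keys (check those names against your copy of the file). The function names are made up for illustration.

```python
import json
import urllib.request

# Root for bulk downloading, per step 3.
EXPORT_ROOT = "https://export.arxiv.org"


def pdf_url(identifier):
    """Munge an arXiv identifier into a downloadable PDF URL (step 5)."""
    return f"{EXPORT_ROOT}/pdf/{identifier}"


def iter_records(lines):
    """Yield (identifier, title, doi) from a stream of JSON lines (step 4).

    The key names "id", "title", and "doi" are assumptions about the
    snapshot; "doi" may be missing or null.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        yield record["id"], record.get("title", ""), record.get("doi")


def download_pdf(identifier, destination):
    """Fetch one PDF and save it to a local file (step 6)."""
    with urllib.request.urlopen(pdf_url(identifier)) as response:
        with open(destination, "wb") as handle:
            handle.write(response.read())


# A single made-up record standing in for the real snapshot file.
sample = ['{"id": "0704.0001", "title": "Example title", "doi": null}']
for identifier, title, doi in iter_records(sample):
    print(identifier, pdf_url(identifier))
```

To run it against the real file, pass an open file handle to iter_records instead of the sample list, call download_pdf for each identifier you want, and pause between requests so as not to hammer the server.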
...Hmmm. Well, this does not answer the question asked; this solution does not return DOIs to PDFs, but it does get the PDFs.
[1] JSON stream -
http://bit.ly/3tKyLxO
--
Eric Lease Morgan
Navari Family Center for Digital Scholarship
Hesburgh Libraries
University of Notre Dame
574/631-8604
https://cds.library.nd.edu