Arxiv to DOI resolution

Chinar Dankhara

Nov 19, 2022, 1:48:32 PM

to arXiv API

Hi, I am currently trying to take in arXiv links, fetch the results using the arXiv API, and extract DOIs. It seems that the only DOIs returned are those resolving to external resources, and they are sometimes not the paper's own DOI. Is there a way to get the paper's DOI from the arXiv API?

Jake Weiskoff

Nov 19, 2022, 2:25:02 PM

to arxi...@googlegroups.com

Hi Chinar,

The API does not currently provide the DOIs supplied by arXiv. I
imagine this will be included in any future version of our API, but
the timeline for that is unclear. In the meantime, you should be able
to harvest these manually using the export nodes.
See:

https://arxiv.org/help/bulk_data

for more information.

Regards,
-Jake

On Sat, Nov 19, 2022 at 12:48 PM Chinar Dankhara <chinar...@gmail.com> wrote:
>
> Hi, I am currently trying to take in Arxiv links and fetch the results using Arxiv API and extract DOIs. It seems that only DOIs resolving to external resources are returned and sometimes not the paper DOI. Is there a way to get paper DOI from Arxiv API?
>

> --
> You received this message because you are subscribed to the Google Groups "arXiv API" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to arxiv-api+...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/arxiv-api/4ba90424-af05-40fa-af46-481b31e99ef0n%40googlegroups.com.

Eric Lease Morgan

Nov 19, 2022, 2:57:42 PM

to arxi...@googlegroups.com

On Nov 19, 2022, at 1:42 PM, Chinar Dankhara <chinar...@gmail.com> wrote:

> Hi, I am currently trying to take in Arxiv links and fetch the results using Arxiv API and extract DOIs. It seems that only DOIs resolving to external resources are returned and sometimes not the paper DOI. Is there a way to get paper DOI from Arxiv API?

I have/had almost the exact same problem, and I found a work-around (sort of):

1. download a snapshot of the Arxiv metadata as a JSON stream as provided by
Kaggle [1]

2. after looking very closely at the JSON and the URLs it provides, figure out
that links to the actual PDF files take the shape of
https://arxiv.org/pdf/<identifier>

3. read the documentation, and learn that bulk downloading ought to be rooted
at https://export.arxiv.org

4. loop through the JSON file, extracting bibliographic metadata as well as the identifier

5. munge the identifier into a downloadable PDF URL, for example:
https://export.arxiv.org/pdf/0704.0001

6. download the desired PDF file

...Hmmm. Well, this does not answer the question asked; this solution does not return DOIs to PDFs, but it does get the PDFs.

[1] JSON stream - http://bit.ly/3tKyLxO

--
Eric Lease Morgan
Navari Family Center for Digital Scholarship
Hesburgh Libraries
University of Notre Dame

574/631-8604
https://cds.library.nd.edu
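The workaround above can be sketched in Python. This is a minimal sketch, not Eric's actual code: it assumes the Kaggle snapshot is a JSON-lines file with one record per line carrying `"id"` and `"title"` keys, and the function name `pdf_urls` is hypothetical. It covers steps 4 and 5 (extracting the identifier and building the `export.arxiv.org` PDF URL); the download in step 6 is left out.

```python
import json
from typing import Iterable, Iterator, Tuple

# Bulk access is rooted at export.arxiv.org (step 3 above)
EXPORT_PDF_BASE = "https://export.arxiv.org/pdf/"


def pdf_urls(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (identifier, title, pdf_url) for each record in a JSON-lines snapshot.

    Assumption: one JSON object per line with "id" and "title" keys.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        record = json.loads(line)
        identifier = record["id"]
        # Titles may contain embedded line breaks; collapse all whitespace runs
        title = " ".join(record.get("title", "").split())
        yield identifier, title, EXPORT_PDF_BASE + identifier
```

For example, pass an open file handle for the snapshot, `pdf_urls(open(path, encoding="utf-8"))`, where `path` is wherever the download was saved. Reading line by line keeps memory use flat, which matters because the snapshot is large.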

Jake Weiskoff

Nov 19, 2022, 3:04:53 PM

to arxi...@googlegroups.com

Actually, now that I'm thinking about it, you don't need the API to provide the DOI for the arXiv-resolving link. All of them have the same structure:

10.48550/arXiv.[arXiv-id]

so if you substitute the arXiv ID returned by the API for [arXiv-id] in that construction, you'll have a working DOI. For example:

https://doi.org/10.48550/arXiv.2211.00230

Best,

-Jake
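The construction Jake describes is plain string concatenation. A minimal Python sketch follows; the function names are hypothetical, and stripping a trailing version suffix (such as `v2`) is an assumption based on the example above, which uses the unversioned ID.

```python
import re

ARXIV_DOI_PREFIX = "10.48550/arXiv."
# Assumption: a trailing version suffix such as "v2" is not part of the DOI
VERSION_SUFFIX = re.compile(r"v\d+$")


def arxiv_doi(arxiv_id: str) -> str:
    """Build the arXiv-issued DOI for an identifier (version suffix dropped)."""
    bare_id = VERSION_SUFFIX.sub("", arxiv_id.strip())
    return ARXIV_DOI_PREFIX + bare_id


def doi_link(doi: str) -> str:
    """Turn a DOI into a resolvable https://doi.org/ link."""
    return "https://doi.org/" + doi
```

For example, `doi_link(arxiv_doi("2211.00230"))` returns `https://doi.org/10.48550/arXiv.2211.00230`, the link from the example above.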

Chinar Dankhara

Nov 22, 2022, 12:47:38 AM

to arXiv API

Hey Jake,

I did not know there was a relationship between the DOI and the arXiv ID construction. Do you have more details or open code on how to make this happen?

Jake Weiskoff

Nov 22, 2022, 8:55:47 AM

to arxi...@googlegroups.com

Hi Chinar,

arXiv announced this construction method as part of the DOI announcement, here:

https://blog.arxiv.org/2022/02/17/new-arxiv-articles-are-now-automatically-assigned-dois/

If you are using our search API, the arXiv ID is part of the URL within the <id> element:

https://arxiv.org/help/api/user-manual#title_id_published_updated

and follows the form explained here:

https://arxiv.org/help/arxiv_identifier

There's not currently a code sample that provides these details.

Best,

-Jake
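Since there is no official code sample, here is an illustrative Python sketch (not from arXiv) that reads an API Atom feed, pulls the arXiv ID out of each entry's `<id>` URL, and applies the `10.48550/arXiv.` construction. Assumptions: the `<id>` URL has the form `http://arxiv.org/abs/<arXiv-id>` with an optional `vN` version suffix that is dropped, and the helper names `fetch_feed` and `dois_from_feed` are hypothetical. The feed's separate `<arxiv:doi>` element, when present, holds the external DOI mentioned at the start of this thread, so it is deliberately ignored here.

```python
import re
import xml.etree.ElementTree as ET
from typing import List, Tuple, Union
from urllib.parse import urlencode
from urllib.request import urlopen

ATOM = "{http://www.w3.org/2005/Atom}"
# Captures the identifier after "/abs/", dropping an optional "vN" suffix,
# e.g. http://arxiv.org/abs/2211.00230v1 -> 2211.00230 (assumed URL shape)
ABS_ID = re.compile(r"/abs/(.+?)(?:v\d+)?$")


def fetch_feed(arxiv_ids: List[str]) -> bytes:
    """Query the arXiv API by id_list and return the raw Atom XML."""
    query = urlencode({"id_list": ",".join(arxiv_ids)})
    with urlopen(f"http://export.arxiv.org/api/query?{query}") as resp:
        return resp.read()


def dois_from_feed(feed_xml: Union[str, bytes]) -> List[Tuple[str, str]]:
    """Return (arXiv ID, arXiv-issued DOI) pairs for each <entry> in the feed."""
    root = ET.fromstring(feed_xml)
    pairs = []
    for entry in root.findall(f"{ATOM}entry"):
        id_url = (entry.findtext(f"{ATOM}id") or "").strip()
        match = ABS_ID.search(id_url)
        if match is None:
            continue  # not an /abs/ URL (e.g. an error entry); skip it
        arxiv_id = match.group(1)
        pairs.append((arxiv_id, "10.48550/arXiv." + arxiv_id))
    return pairs
```

Usage would look like `dois_from_feed(fetch_feed(["2211.00230"]))`, which should yield the DOI from the earlier example. Keeping the network call separate from the parsing makes the parser easy to test against a saved feed.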
