Verifying if DOIs exist - using OpenAlex?


Krugs.de

Feb 25, 2025, 8:48:45 AM
to OpenAlex Community
Hi
I want to verify whether a large number of DOIs exist (i.e. resolve). I was trying to use the DOI.org API, but as I can only supply one DOI at a time, this becomes a bombardment of API calls, and I want neither to be blocked nor to wait exceedingly long (we are talking about tens of thousands of DOIs). Therefore I am thinking about using OpenAlex for that.

Everything is fine - I can make calls to retrieve one field per DOI and put multiple DOIs in a search. But is there a more efficient way of achieving this?
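
As a rough sketch of the batched lookup described above (not an official client): the OpenAlex /works endpoint accepts several values for one filter joined with "|", and select=doi keeps each response small. The batch size of 50, the helper names, and the injectable fetch function are assumptions made for illustration, and a False only means OpenAlex has no record for that DOI.

```python
import json
import urllib.parse
import urllib.request

BATCH_SIZE = 50  # assumed cap on OR-ed values per filter; check the current OpenAlex docs


def normalize(doi):
    """Lower-case a DOI and strip a leading resolver or 'doi:' prefix."""
    doi = doi.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi


def batch_url(dois):
    """Build one /works request that returns only the doi field of each match."""
    query = urllib.parse.urlencode({
        "filter": "doi:" + "|".join(dois),
        "select": "doi",
        "per-page": "200",
    })
    return "https://api.openalex.org/works?" + query


def fetch_json(url):
    """Default fetcher: one GET request, decoded as JSON."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


def known_to_openalex(dois, fetch=fetch_json):
    """Map each normalized DOI to True/False depending on whether OpenAlex lists it."""
    wanted = [normalize(d) for d in dois]
    found = set()
    for start in range(0, len(wanted), BATCH_SIZE):
        chunk = wanted[start:start + BATCH_SIZE]
        for work in fetch(batch_url(chunk))["results"]:
            if work.get("doi"):
                found.add(normalize(work["doi"]))
    return {d: d in found for d in wanted}
```

DOIs that contain a comma or a "|" would need extra care with this filter syntax, which the sketch does not attempt.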

I assume there is not (yet?) a way of providing a list of DOIs and getting back a list of 0s and 1s indicating whether each corresponding DOI exists? (Maybe with the new GUI?)

Any suggestions?

Thanks,

Rainer

---
Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation Biology, UCT), Dipl. Phys. (Germany)

Orcid ID: 0000-0002-7490-0066

Department of Geography
University of Zürich
Winterthurerstrasse 190
8075 Zürich
Switzerland

Cell: +41 (0)78 630 66 57
email: Raine...@uzh.ch
Rai...@krugs.de

Sol Lederman

Feb 25, 2025, 9:10:55 AM
to Krugs.de, OpenAlex Community
I've queried doi.org for many tens of thousands of DOIs, one at a time, just to verify their existence, and I was never blocked. Yes, it takes time to run those queries, but my impression is that doi.org's set of distributed servers is designed for such a load.

You should consider that there are multiple DOI registration agencies and OpenAlex doesn't have metadata for every DOI from every agency. If you're looking to verify academic paper DOIs from Crossref you're probably in good shape, but I don't believe OpenAlex has all DOIs from DataCite, for example. And I imagine that OpenAlex doesn't collect metadata from some of the smaller registration agencies. So, if accuracy is important, I would go with doi.org.

Sol


Krugs.de

Feb 25, 2025, 10:00:12 AM
to Sol Lederman, OpenAlex Community
Hi Sol

That was my concern as well, that OpenAlex does not have all DOIs and doi.org has more. 

Did you run the queries with a waiting period in between? Because I am at the moment.

Also, did you run these API calls sequentially or in parallel?

And finally, which one is the best to use? I am using / was using


Because I assumed it would be the one returning the smallest amount of data and requiring the smallest amount of processing.

Thanks for the thumbs up,

Rainer


Samuel Mok

Feb 25, 2025, 10:00:44 AM
to Sol Lederman, Krugs.de, OpenAlex Community
Also, this is the core function of the doi.org resolver: hundreds of thousands of users entering a doi.org URL in their browser and being sent to the page linked to it.
If you don't want to use the API you could also just send an HTTP request to the DOI and see which redirect URL comes back, e.g. like so:


[Image: image.png]

I'm certain they won't mind you visiting doi.org urls, even en-masse :)
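
A minimal sketch of that redirect check, not a reconstruction of the screenshot: it sends a HEAD request to https://doi.org/<doi>, refuses to follow the redirect, and returns the Location header if there is one. It assumes doi.org answers HEAD requests the same way as GET, and the function and class names are made up for illustration.

```python
import urllib.error
import urllib.parse
import urllib.request


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stop urllib from following redirects so the 3xx reply itself can be inspected."""

    def redirect_request(self, req, fp, code, msg, headers):
        return None  # urllib then raises HTTPError carrying the 3xx status and headers


def doi_url(doi):
    """Resolver URL for a DOI, percent-encoding everything except '/'."""
    return "https://doi.org/" + urllib.parse.quote(doi, safe="/")


def classify(status, location):
    """A 3xx reply with a Location header means the DOI resolves; anything else does not."""
    if 300 <= status < 400 and location:
        return location
    return None


def resolve_target(doi, opener=None):
    """Return the URL the DOI redirects to, or None if doi.org does not redirect it."""
    opener = opener or urllib.request.build_opener(NoRedirect)
    request = urllib.request.Request(doi_url(doi), method="HEAD")
    try:
        with opener.open(request, timeout=30) as response:
            return classify(response.status, response.headers.get("Location"))
    except urllib.error.HTTPError as err:
        return classify(err.code, err.headers.get("Location"))
```

Because the redirect is never followed, the publisher's site is not contacted at all; only doi.org sees the request.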

Sol Lederman

Feb 25, 2025, 10:23:07 AM
to Krugs.de, OpenAlex Community
Hi Rainer,

I fetched over 2 million DOI handle records using the same handles API URL form you are using. I had no pause between queries. I used only a single sequential process, and I would not recommend multiple processes. I don't recall how long the process took. It might have been a week. I didn't get blocked, nor did I run into any other problems.
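
For readers who want to try the same thing, here is a hedged sketch of a single sequential pass. The thread does not quote the URL, so the endpoint below (the doi.org handle REST API at /api/handles/) is an assumption, as are the function names. In that API's JSON replies, responseCode 1 means the handle was found and 100 means it was not.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

HANDLE_API = "https://doi.org/api/handles/"  # assumed endpoint; not quoted in the thread


def handle_exists(payload):
    """responseCode 1 = handle found; other codes (e.g. 100, not found) count as missing."""
    return payload.get("responseCode") == 1


def fetch_handle(doi):
    """Fetch the handle record for one DOI and return the decoded JSON body."""
    url = HANDLE_API + urllib.parse.quote(doi, safe="/")
    try:
        with urllib.request.urlopen(url, timeout=30) as response:
            return json.load(response)
    except urllib.error.HTTPError as err:
        # A missing handle is reported as an HTTP error that still carries a JSON body.
        try:
            return json.load(err)
        except ValueError:
            return {"responseCode": None}


def check_all(dois, fetch=fetch_handle):
    """One process, one request at a time, no pause between requests."""
    return {doi: handle_exists(fetch(doi)) for doi in dois}
```

Passing a different fetch function makes it easy to add logging, retries, or a delay later without touching the loop.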

Sol

Krugs.de

Feb 25, 2025, 11:06:19 AM
to Sol Lederman, OpenAlex Community
Thanks a lot everybody - There is an advantage to doing it DOI by DOI - it is easier to react to malformed DOIs (return FALSE). So I will go with doi.org, which seems to take about 40 minutes for 10200 DOIs.
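
A small sketch of the "malformed DOI returns FALSE" idea, under stated assumptions: the regular expression is a commonly used heuristic for modern DOIs rather than an official grammar, and lookup stands for whatever per-DOI check is used (both names are hypothetical). The helper at the end just restates the timing above: 40 minutes for 10200 DOIs is roughly a quarter of a second per DOI.

```python
import re

# Heuristic shape check: "10.", a 4-9 digit registrant code, "/", then a non-blank suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def doi_exists(doi, lookup):
    """Return False for malformed input without calling the resolver at all."""
    if not isinstance(doi, str):
        return False
    doi = doi.strip()
    if not DOI_PATTERN.match(doi):
        return False
    return bool(lookup(doi))


def seconds_per_doi(minutes, count):
    """Average wall-clock time per DOI for a finished run."""
    return minutes * 60 / count
```

Network errors raised by lookup are deliberately left to propagate, so a timeout is not silently recorded as a non-existent DOI.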

Cheers and thanks,

Rainer

