API missing DOIs that are found in simple query tool and manual search

Jade Yonehiro

May 17, 2022, 12:37:37 PM
to Unpaywall discussion
Hi there,

TL;DR: The Unpaywall API is throwing errors for a large number of DOIs, returning no data for them, and not recording the associated error messages. When I investigated further using the simple query tool, I found that at least 80% of the rejected DOIs are in Unpaywall.
I'm wondering whether there are limits on the number of DOIs that can be fed into the API at one time that could be causing errors, and whether anyone else is having trouble getting the API to record error messages. Any help or pointers would be greatly appreciated!

Detailed:
I am working on a large-scale pull of data from Unpaywall, but I am getting a lot of rejected DOIs (n=9,442). For some reason the error messages are not being recorded, which makes the issue difficult to troubleshoot, so I did some further investigating.
Given this high number of missed DOIs, I queried Unpaywall using both the simple query tool and the manual single-item search to double-check whether these DOIs were in Unpaywall's system.
In the simple query tool, I found data for 7,584 of the 9,442 missing DOIs. Investigating the remaining 1,858 missing DOIs, I found some patterns in which journals' DOIs were being rejected. I plugged a small selection of these DOIs into the manual single-item search and was able to find several of them. Large numbers of DOIs from journals like Journal of High Energy Physics, the Lancet journals, and Journal of the American Medical Association are being rejected even though the single-item search finds them.

Are there limits on the number of DOIs that can be fed into the API at one time that could be causing these errors? And is anyone else having trouble getting the API to record error messages? Any ideas or pointers would be greatly appreciated!

Casey M

Jun 17, 2022, 9:55:08 AM
to Unpaywall discussion
Hello! My name is Casey and I'm with Unpaywall now and can help with this.

There should not be a difference between the API results and the simple query tool. Are you calling the API like this: https://api.unpaywall.org/v2/10.1016/j.jorganchem.2020.121307?email=sup...@unpaywall.org ? Maybe you can post a DOI where you had trouble getting a result or an error message so we can troubleshoot it? The use limit is fairly high so you should not be running into that.
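For anyone who wants to check exactly which URL their script requests, here is a minimal sketch of building the call shown above. It is written in Python rather than R, and the function name `build_unpaywall_url` and the address `you@example.com` are placeholders for illustration, not part of any Unpaywall client. It assumes the DOI goes into the URL path as in the example, with characters such as parentheses percent-encoded.

```python
from urllib.parse import quote, urlencode

# Base of the v2 endpoint, taken from the example URL in this post.
API_BASE = "https://api.unpaywall.org/v2/"


def build_unpaywall_url(doi: str, email: str) -> str:
    """Return the lookup URL for a single DOI (hypothetical helper)."""
    # Stray whitespace around a DOI is an easy way to get a failed lookup.
    cleaned = doi.strip()
    # Keep "/" because the DOI forms part of the URL path; other reserved
    # characters, such as the parentheses in some DOIs, are percent-encoded.
    path = quote(cleaned, safe="/")
    return f"{API_BASE}{path}?{urlencode({'email': email})}"
```

Printing the result of `build_unpaywall_url("10.1016/s1470-2045(20)30539-8", "you@example.com")` and pasting it into a browser is a quick way to see what the API returns for one DOI outside of any package.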

Jade Yonehiro

Jun 29, 2022, 11:34:26 AM
to Unpaywall discussion
Hi Casey,

I am using R to search for DOIs. Below is my code for feeding in a list of DOIs and pulling their info from the API:

[Image attachment: Rsnippet.PNG]

I have a list of 1,858 DOIs that did not return data using R or the simple query tool. Below is a sample of DOIs from that list that did return data when searched via the manual single-DOI search:
  • 10.1002/jac5.1362
  • 10.1002/jac5.1412
  • 10.1002/jper.19-0441
  • 10.1016/s1470-2045(20)30539-8
  • 10.1017/s0022377821000970
  • 10.1029/2020gc009449
  • 10.1109/asap52443.2021.00009
  • 10.1109/dac18074.2021.9586194

Casey M

Jun 30, 2022, 3:19:03 PM
to Unpaywall discussion
Oh I see. I'm not sure what's happening. I ran some of the DOIs you listed with the roadoi package in R (similar to your code) and did not receive any errors.

When I run the DOIs you listed I can see them in the simple query tool and in the single DOI search. Maybe you are running into some network errors and they are not being reported by the roadoi package? If possible, you could divide your data into smaller chunks and try it that way.
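As a rough illustration of the chunking idea, here is a minimal sketch in Python (not the R code used in this thread). The helper name `chunked` and the batch size are made up for the example; nothing here is specific to Unpaywall or roadoi.

```python
from typing import Iterator, List, Sequence


def chunked(dois: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive batches of at most `size` DOIs (hypothetical helper)."""
    if size < 1:
        raise ValueError("size must be at least 1")
    # Step through the list in strides of `size`; the last batch may be shorter.
    for start in range(0, len(dois), size):
        yield list(dois[start:start + size])
```

Each batch can then be run and saved separately, for example `for batch in chunked(all_dois, 500): ...`, so a failure only costs that one batch and points to a smaller set of DOIs to inspect.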

Casey



Jade Yonehiro

Jul 18, 2022, 11:57:01 AM
to Unpaywall discussion
Hi Casey,

So I split up my doi list and reran the API calls. I was able to get info for all but the attached set of dois. I tried splitting this large list into even smaller lists (as few as 1,000 dois at a time) and separately re-ran the calls to no avail.
Even when the call runs successfully (i.e., no timeouts) it writes a JSON file that is completely empty:
[Image attachment: brokenJSON.PNG]
Any ideas on what may be happening?
I was able to successfully retrieve data using the same code for the other 40,632 dois (pulling 10,158 dois at a time) I was looking for.

Some other troubleshooting I tried:
- connecting directly to ethernet to ensure internet speed and connection were not the issue (speeds while connected were 910.94 Mbps download, 928.13 Mbps upload)
- running on different days to ensure I wasn't surpassing the 100,000 daily request limit
- running on a different computer in case it was a hardware issue or my IP address was being blocked for making too many requests
[Attachment: BrokenUPWDois]

Casey M

Jul 18, 2022, 6:31:07 PM
to Jade Yonehiro, Unpaywall discussion, Jason Priem
Hi Jade,

This is puzzling! So if you run 3 or 4 of those DOIs do you receive no response? I took a sample from the document and am able to see the results in R:

[Image attachment: Screen Shot 2022-07-18 at 5.26.10 PM.png]

Maybe we can start with a very small group and troubleshoot from there? Also, what is the "D1" portion for in the DOI list? Is that being fed into the script?

Thanks,
Casey

Jade Yonehiro

Jul 19, 2022, 10:08:42 AM
to Unpaywall discussion
Hi Casey,

I will give that a shot! I'm going to see if I can write a loop that feeds in one DOI at a time and identifies where the break is happening.
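A minimal sketch of what such a one-at-a-time loop could look like, written in Python rather than R and not taken from the script in this thread. The names `fetch_record` and `fetch_one_by_one` are hypothetical, and the 30-second timeout is an arbitrary assumption. The point is that every failure, including an empty response, is recorded next to the DOI that caused it instead of being lost.

```python
import json
import urllib.request
from typing import Callable, Dict, Iterable, List, Tuple
from urllib.parse import quote, urlencode


def fetch_record(doi: str, email: str) -> dict:
    """Fetch one DOI from the v2 endpoint; raises on HTTP or network errors."""
    url = f"https://api.unpaywall.org/v2/{quote(doi, safe='/')}?{urlencode({'email': email})}"
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


def fetch_one_by_one(
    dois: Iterable[str], fetch: Callable[[str], dict]
) -> Tuple[Dict[str, dict], List[Tuple[str, str]]]:
    """Call `fetch` once per DOI and keep a (doi, reason) entry for every failure."""
    results: Dict[str, dict] = {}
    failures: List[Tuple[str, str]] = []
    for doi in dois:
        try:
            record = fetch(doi)
        except Exception as exc:
            # Record the error type and message rather than dropping them.
            failures.append((doi, f"{type(exc).__name__}: {exc}"))
            continue
        if not record:
            # A call that "succeeds" but returns nothing is also worth flagging.
            failures.append((doi, "empty response"))
            continue
        results[doi] = record
    return results, failures
```

It could be called as `results, failures = fetch_one_by_one(dois, lambda d: fetch_record(d, "you@example.com"))`, where the email address is a placeholder. Writing `failures` to a file afterwards gives a list of exactly which DOIs broke and why.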

The D1 is just one of the subsets of the D list from when I tried breaking the ~10,000 results into smaller sets of ~3k and ~1.5k.
