retrieve a huge amount of data given scholar names in R or Python


Shuning Ge

Jan 17, 2022, 1:44:00 PM
to ORCID API Users
Hi ORCID API users,

My name is Shuning Ge, and I'm a data scientist at UPenn. I'm currently working on a project that requires basic education and employment information for all political scientists. I think the ORCID API is a good tool, but I'm having trouble implementing it.

I only have a list of scholar names (over 20k), without knowing their ORCID iDs. I'm wondering if there's any method that I could use to retrieve data given names instead of IDs. Also, I noticed that there's a rate limit. Is there any possible way to work around it?

I mostly work with R or Python. Any help on the code is greatly appreciated!! 

Chun Ly

Jan 19, 2022, 9:57:52 AM
to ORCID API Users
This is not a direct answer to your question, but for such an analysis, you might consider using the public data file that ORCID releases annually. See: https://support.orcid.org/hc/en-us/articles/360006897394-How-do-I-get-the-public-data-file-

Heather Piwowar

Jan 19, 2022, 10:50:08 AM
to ORCID API Users
Hi Shuning, 

Going from names to ORCIDs is hard -- so many people have the same name and then it is hard to know which ORCID to use. Using the fact that you know they are in Political Science can help.

You might want to try using the OpenAlex dataset for this (https://openalex.org, just came out a few weeks ago) because we've done some disambiguation to make it easier.

Three ways I can think of to get at the data you are looking for:

1. If you have papers for the people you are looking for, call the API using the DOI -- the authors will be there along with their ORCID when we know it.

2. Query the API for authors tagged with "Political Science" (concept "C17744445", see http://api.openalex.org/concepts/C17744445) using the API call https://api.openalex.org/authors?filter=x_concepts.id:C17744445 and then keep the authors whose names match those on your list (they will have their ORCID listed when we know it).

3. Download all "authors" and then filter it offline for the authors you want -- it is in JSON which makes it a bit easier to work with in my experience than the ORCID annual data dump.  https://docs.openalex.org/download-snapshot/snapshot-data-format
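A rough Python sketch of options 2 and 3, for illustration only. The URL and concept filter are the ones quoted above. Everything else is an assumption rather than documented OpenAlex behavior: the helper names (`normalize`, `iter_authors`, `iter_snapshot_file`, `match_authors`) are made up, the paging parameters (`per-page`, `cursor`, `meta.next_cursor`) and the record fields (`id`, `display_name`, `orcid`) should be checked against the current OpenAlex docs, and the snapshot reader assumes gzip-compressed files with one JSON record per line. Matching is exact on a normalized name, so common names will return several candidates that still need manual disambiguation.

```python
import gzip
import json
import unicodedata
import urllib.parse
import urllib.request

API_URL = "https://api.openalex.org/authors"
CONCEPT_FILTER = "x_concepts.id:C17744445"  # "Political Science", from the post above


def normalize(name):
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    cleaned = "".join(c if c.isalnum() else " " for c in no_accents)
    return " ".join(cleaned.lower().split())


def iter_authors(filter_expr=CONCEPT_FILTER, per_page=200):
    """Option 2: page through the API (assumed cursor paging), yielding author records."""
    cursor = "*"
    while cursor:
        params = {"filter": filter_expr, "per-page": per_page, "cursor": cursor}
        url = API_URL + "?" + urllib.parse.urlencode(params)
        with urllib.request.urlopen(url) as resp:
            page = json.load(resp)
        yield from page.get("results", [])
        # Assumed: the last page reports no next cursor, which ends the loop.
        cursor = page.get("meta", {}).get("next_cursor")


def iter_snapshot_file(path):
    """Option 3: read one snapshot file (assumed gzip, one JSON record per line)."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                yield json.loads(line)


def match_authors(wanted_names, authors):
    """Map each wanted name to every author record whose normalized name is equal."""
    wanted = {normalize(n): n for n in wanted_names}
    matches = {n: [] for n in wanted_names}
    for author in authors:
        key = normalize(author.get("display_name") or "")
        if key in wanted:
            matches[wanted[key]].append(
                {"openalex_id": author.get("id"), "orcid": author.get("orcid")}
            )
    return matches
```

Usage would be something like `match_authors(my_names, iter_authors())` for the API route, or `match_authors(my_names, iter_snapshot_file(path))` per downloaded file. Names that come back with more than one candidate, or with `orcid` set to `None`, are the ones to check by hand.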

Needless to say we at OpenAlex are huge fans of ORCID data and API and snapshot!  We couldn't include their data in OpenAlex without all their hard work and CC0 data :)  But fyi these are some other query paths into the data in case they help.

Heather

---

Heather Piwowar, cofounder, OurResearch: tools to make scholarly research more open, connected, and reusable. Follow at @researchremix, @OurResearch_org, @unpaywall, @unsub_org, and @OpenAlex_org

Shuning Ge

Feb 1, 2022, 10:45:16 AM
to ORCID API Users
Thank you so much for your reply! I'll for sure take a look! The file looks really huge (many G's). But it looks promising!

Shuning Ge

Feb 1, 2022, 10:48:32 AM
to ORCID API Users
Dear Heather,

Thank you so much for such great instructions and introduction on this amazing resource! I really appreciate it! I will carefully look at each approach you suggested, and test them if possible. I'll keep you updated on how things work out!

Wish you a wonderful day! Shuning