Bugs in OpenAlex data


Johan K

Aug 7, 2022, 2:35:08 PM
to OpenAlex users
Hi OpenAlex team,

I'm working with OpenAlex as part of my Master's thesis, exploring the included objects, their level of detail, and their data quality.
While doing so, I found two issues that might interest you, in case you aren't already aware of them.

First, page numbers in works:
Some items include non-numeric characters in their page numbers, for example: first page: "[ 7 \"]"
I noticed this when my parser raised exceptions: escaped characters caused problems once in a while, and the double quotes confused my functions.
Maybe systematically filtering out all non-integer values could be an option for these attributes?
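Until the data is cleaned upstream, a consumer could normalize these values on their own side. The snippet below is a minimal sketch, not OpenAlex's method: it assumes the page value arrives as a string (or None) and uses a lenient heuristic that keeps the first run of ASCII digits. The function name `parse_page` is hypothetical.

```python
import re
from typing import Optional

# First run of ASCII digits; [0-9] avoids matching other Unicode digit characters.
_DIGITS = re.compile(r"[0-9]+")


def parse_page(raw: Optional[str]) -> Optional[int]:
    """Hypothetical helper: extract an integer page number from a messy string.

    Lenient heuristic: '[ 7 "]' becomes 7. Values without any digits
    (e.g. roman numerals such as 'xii') and missing values return None.
    """
    if raw is None:
        return None
    match = _DIGITS.search(str(raw))
    return int(match.group()) if match else None


if __name__ == "__main__":
    print(parse_page('[ 7 "]'))  # 7
    print(parse_page("xii"))     # None
```

Note that taking the first digit run also turns something like "e1234" into 1234, so keeping the raw string alongside the parsed value may be safer than overwriting it.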
 
Second, institution image URLs:
The image and thumbnail URLs pointing to Wikimedia contain additional '%' characters and cannot be reached.

Example (faulty) link from OpenAlex: 
Correct link: 

Example: https://api.openalex.org/institutions/I166928557
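One possible explanation for stray '%' characters, which this report does not confirm, is that the URL was percent-encoded twice, so that "%20" shows up as "%2520". Under that assumption, a client-side check could look like the sketch below. Both function names are hypothetical, and the rewrite would wrongly alter a URL that legitimately encodes a literal '%' as "%25".

```python
import re

# "%25" is an encoded '%'; followed by two hex digits it hints at double encoding.
_DOUBLE_ENCODED = re.compile(r"%25([0-9A-Fa-f]{2})")


def looks_double_encoded(url: str) -> bool:
    """Hypothetical check: True if the URL contains e.g. '%2520' instead of '%20'."""
    return _DOUBLE_ENCODED.search(url) is not None


def undo_double_encoding(url: str) -> str:
    """Hypothetical fix: turn each '%25XX' back into '%XX', leaving the rest as-is."""
    return _DOUBLE_ENCODED.sub(r"%\1", url)


if __name__ == "__main__":
    print(undo_double_encoding("https://example.org/Foo%2520Bar.png"))
```

Rewriting only the "%25XX" sequences, rather than calling a full decoder, leaves any correctly single-encoded parts of the URL untouched.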


Hope this helps. Keep up the good work!

Best regards
Johan 

 

Casey Meyer

Aug 8, 2022, 9:17:33 AM
to Johan K, OpenAlex users
Hi Johan,

Great feedback! We've added these to our tracker so we can get them cleaned up.

Thanks,
Casey

--
You received this message because you are subscribed to the Google Groups "OpenAlex users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to openalex-user...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/openalex-users/fc704d9f-013c-48b3-a80b-206424410a7dn%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


--
Casey Meyer
Developer - OpenAlex, Unpaywall
OurResearch
We build tools to make scholarly research more open, connected, and reusable, for everyone.