Bugs in OpenAlex data


Johan K

Aug 7, 2022, 2:35:08 PM
To: OpenAlex users
Hi OpenAlex team,

I'm working with OpenAlex as part of my Master's thesis and am exploring the included objects as well as their level of detail and data quality. 
Along the way, I found two issues that might interest you, in case you're not already aware of them.

First, page numbers in works:
Some items have non-numeric characters in their page numbers, for example first page: "[ 7 \"]"
I noticed this when my parser raised exceptions. The escaped characters caused problems every once in a while, and the double quotes confused my functions.
Maybe systematically filtering out all non-integer values could be an option for these attributes?
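A minimal sketch of that filtering idea on the consumer side, assuming page values arrive as strings like the one above. `parse_page` and the set of stripped characters are hypothetical, not part of OpenAlex or any client library:

```python
import re
from typing import Optional, Union

# Assumed debris around page numbers, based on values like '[ 7 "]':
# spaces, square brackets, double/single quotes and backslashes.
_DEBRIS = ' []"\'\\'

def parse_page(value: Union[str, int, None]) -> Optional[int]:
    """Return the page as an int, or None if it is not a plain integer."""
    if value is None:
        return None
    if isinstance(value, int):
        return value
    cleaned = value.strip(_DEBRIS)
    # Only ASCII digits are accepted; roman numerals, article numbers
    # like 'e1001', or ranges are filtered out rather than guessed at.
    if re.fullmatch(r"[0-9]+", cleaned):
        return int(cleaned)
    return None

for raw in ['[ 7 "]', "123", "xii", None]:
    print(repr(raw), "->", parse_page(raw))
```

Returning None instead of raising keeps a bulk parse running; whether non-numeric pages should be dropped or kept as raw strings depends on the analysis.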
 
Second, institution image URLs:
The image and thumbnail URLs pointing to Wikimedia contain extra '%' characters and cannot be reached.

Example (faulty) link from OpenAlex: 
Correct link: 

Example: https://api.openalex.org/institutions/I166928557
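If the extra '%' characters come from the URL being percent-encoded twice (an assumption; the thread does not say so), a sketch of a client-side workaround is to collapse one level of encoding, turning e.g. `%2520` back into `%20`. `undo_double_encoding` and the example URL are hypothetical:

```python
import re

# Assumption: '%' itself was re-encoded as '%25', so '%20' became '%2520'.
# Only '%25' followed by two hex digits is touched; other escapes are kept.
_DOUBLE_ENCODED = re.compile(r"%25([0-9A-Fa-f]{2})")

def undo_double_encoding(url: str) -> str:
    """Collapse one level of double percent-encoding in a URL."""
    return _DOUBLE_ENCODED.sub(r"%\1", url)

print(undo_double_encoding("https://example.org/Foo%2520Bar.png"))
```

Unlike running `urllib.parse.unquote` on the whole URL, this leaves correctly encoded sequences such as `%C3%A9` alone. It would, however, wrongly rewrite a file name that genuinely contains a literal '%' followed by two hex characters.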


Hope it helps, keep up the good work!

Best regards
Johan 

 

Casey Meyer

Aug 8, 2022, 9:17:33 AM
To: Johan K, OpenAlex users
Hi Johan,

Great feedback! We've added these to our tracker so we can get them cleaned up.

Thanks,
Casey


--
Casey Meyer
Developer - OpenAlex, Unpaywall
OurResearch: We build tools to make scholarly research more open, connected, and reusable, for everyone.