inconsistent keywords data from the API and the web UI


Nicandro Bovenzi

Mar 28, 2025, 1:18:21 PM
to OpenAlex Community
Hi,
Looking at the keywords associated with documents, something seems off. Many keywords that one would expect to be very common have unrealistically few hits. A few examples: "Artificial Intelligence" has a single article, "Oceanography" and "Physiology" are empty, "Graphene" has 2 entries, and "Engineering" has 7.
Moreover, there seem to be many documents (with abstracts) that have no keywords associated at all, such as this one: https://openalex.org/workspage=1&zoom=w4389308279.

I checked whether any announcement had recently been made regarding keywords but couldn't find anything. What is going on?

Thanks! 
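
For anyone who wants to reproduce counts like the ones above against the API rather than the web UI, here is a minimal sketch. It assumes the works endpoint accepts a `keywords.id` filter with a short keyword ID (e.g. `graphene`) and that the response carries the total under `meta.count`; the function names and the example IDs are illustrative only, not taken from this thread.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_BASE = "https://api.openalex.org"


def keyword_works_url(keyword_id: str, per_page: int = 1) -> str:
    """Build a /works URL filtered to one keyword.

    Assumption: `keywords.id` is a valid works filter and `keyword_id`
    is the short keyword ID (lowercase, hyphenated).
    """
    query = urlencode({"filter": f"keywords.id:{keyword_id}", "per-page": per_page})
    return f"{API_BASE}/works?{query}"


def works_count(response: dict) -> int:
    """Read the total number of matching works from a list response.

    Assumption: the total is reported under meta.count.
    """
    return response["meta"]["count"]


def fetch_json(url: str) -> dict:
    """Fetch a URL and decode the JSON body (performs a network request)."""
    with urlopen(url) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Hypothetical keyword IDs matching the examples mentioned in the post.
    for keyword_id in ["artificial-intelligence", "graphene", "engineering"]:
        url = keyword_works_url(keyword_id)
        print(keyword_id, works_count(fetch_json(url)))
```

Running the same loop against an older snapshot's counts would make any regression easy to quantify.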

Samuel Mok

Mar 28, 2025, 3:21:03 PM
to Nicandro Bovenzi, OpenAlex Community
Keywords are a new addition and a work in progress, see here and here for more information on them. 

Note that these are distinct from the old, deprecated Concepts (which are still included when pulling items from the API) and the newer Topics system (which replaced Concepts and forms the basis for the keywords).


Nicandro Bovenzi

Mar 31, 2025, 10:46:36 AM
to OpenAlex Community
Thanks for your message, Samuel.
I am well aware of the background behind the development of keywords and the deprecation of Concepts.
However, it seems to me that the current keyword-to-document assignment is surprisingly poor and incomplete.
This is striking when comparing the current data with the October 2024 snapshot.


Eck, N.J.P. van (Nees Jan)

Mar 31, 2025, 11:08:52 AM
to Nicandro Bovenzi, OpenAlex Community

Hi Nicandro, just to clarify, are you saying that the keyword-to-document assignments were much better in the October 2024 snapshot, but noticeably worse in the current data? If so, it might be worth bringing this up with the OpenAlex team to figure out what’s causing the change.

Nicandro Bovenzi

Mar 31, 2025, 11:22:42 AM
to OpenAlex Community
Hi Nees,
indeed! I've submitted a support ticket on the matter.
Thanks and kind regards,

Nicandro

Kyle Demes

Mar 31, 2025, 11:47:08 AM
to Nicandro Bovenzi, OpenAlex Community
Hi folks,
A quick note on this. tl;dr: we are aware of the issue, and it's high on our priority list to fix as soon as we finish the guts code rewrite that has been all-consuming.

Originally, we implemented keywords from the CWTS concept taxonomy because they worked great for their purposes and seemed like a good place to start! But we weren't able to get the desired functionality in OpenAlex production: there were too few keywords per topic to describe the diversity within a topic, yet the results were over-factored when trying to use all possible keywords, and there was also large user demand for keywords to be orthogonal to the topic classification. We then tried to bring in the end-node concepts in addition to those as a remedy. That produced results more closely aligned with user expectations, but having two separate lists and classification pipelines for keywords ended up causing lots of issues. The goal is to have only a single list of keywords and a single way of classifying in production. We are close to that fix, but unfortunately it's been stuck in the backlog. We are almost done with the projects that have been consuming our time, and just today we hired a new data engineer! So we're hoping to get this resolved soon.

Thanks for your patience,
kyle

Nicandro Bovenzi

Apr 1, 2025, 3:47:18 AM
to OpenAlex Community
Thanks a lot for the clarification, Kyle!