Thanks to the code you supplied, I've done a bit of coding and mapped all the translations and aliases to the OpenAlex concepts. I then matched them against the keywords field in ORCID, which researchers use to describe themselves.
Some discoveries:
- There are 3m+ aliases and translations for the ~65k concepts.
- Education is the top self-reported keyword concept in the ORCID registry after mapping to aliases and translations. Before the mapping, the top keyword was the English-language string "Machine Learning".
- Researchers have used 41 different ways to refer to 'data mining' in the ORCID registry
- 790k ORCID records are linked to at least one OpenAlex concept via the keywords field
- There are some popular concepts in the ORCID registry that don't match well to OpenAlex concepts. For example, "Hydrology" in ORCID does not match the "Hydrology (agriculture)" concept in OpenAlex/Wikidata.
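To make the matching step concrete, here is a minimal sketch of how the counts above could be produced. It is not the code I actually ran: the sample rows, the concept IDs (`C1`, `C2`), the ORCID iDs and the function names are all made up for illustration, and it only does a case-insensitive exact match.

```python
from collections import Counter, defaultdict

# Hypothetical sample data: (concept_id, alias_or_translation) pairs.
alias_rows = [
    ("C1", "Machine learning"),
    ("C1", "apprentissage automatique"),
    ("C2", "Data mining"),
    ("C2", "knowledge discovery in databases"),
]

# Hypothetical ORCID records: record id -> self-reported keywords.
orcid_keywords = {
    "0000-0001": ["machine learning", "Data Mining"],
    "0000-0002": ["apprentissage automatique"],
    "0000-0003": ["hydrology"],
}


def build_alias_index(rows):
    """Map each case-folded alias to the set of concepts that use it."""
    index = defaultdict(set)
    for concept_id, alias in rows:
        index[alias.strip().casefold()].add(concept_id)
    return index


def count_concepts(keywords_by_record, index):
    """Count records per concept and records linked to at least one concept."""
    counts = Counter()
    linked = 0
    for keywords in keywords_by_record.values():
        concepts = set()
        for keyword in keywords:
            concepts |= index.get(keyword.strip().casefold(), set())
        if concepts:
            linked += 1
        # Each record contributes at most once per concept.
        counts.update(concepts)
    return counts, linked


index = build_alias_index(alias_rows)
counts, linked = count_concepts(orcid_keywords, index)
print(counts.most_common(), linked)
```

In this toy data, "hydrology" stays unmatched because no alias has that exact string, which is the same kind of miss as the "Hydrology (agriculture)" case.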
A couple of things I need to work on to get more useful insights and accurate numbers:
- The same alias/translation can appear in multiple Wikidata concepts. For example, the Hindi translation of "Informal Education" in Wikidata is "education", which causes me issues.
- A surprisingly small number of keywords in ORCID (~80K) match the 3m+ translations and aliases. A lot of keywords (~1m) in ORCID do not map to any of them. This could be something to do with the way I'm matching non-Latin characters in my relational database.
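Both issues can be probed outside the database with a small sketch like the one below. It assumes (and this is only a guess at the cause, not a diagnosis) that some misses come from Unicode variants of the same string, such as full-width or decomposed characters. The `normalize` and `ambiguous_aliases` helpers and the `Q_...` IDs are hypothetical names for illustration.

```python
import unicodedata
from collections import defaultdict


def normalize(text):
    """Fold Unicode variants, case and whitespace into one comparable form."""
    # NFKC maps compatibility forms (full-width letters, ideographic space)
    # to their plain equivalents and composes base letter + combining accent.
    text = unicodedata.normalize("NFKC", text)
    text = text.casefold()
    # casefold can yield a non-normalized string, so normalize once more.
    text = unicodedata.normalize("NFKC", text)
    return " ".join(text.split())


def ambiguous_aliases(rows):
    """Return normalized aliases that point to more than one concept."""
    by_alias = defaultdict(set)
    for concept_id, alias in rows:
        by_alias[normalize(alias)].add(concept_id)
    return {alias: ids for alias, ids in by_alias.items() if len(ids) > 1}


# Hypothetical rows mimicking the "education" collision described above.
rows = [
    ("Q_education", "Education"),
    ("Q_informal_education", "education"),
    ("Q_data_mining", "Data mining"),
]

print(normalize("Ｄａｔａ　Mining"))
print(ambiguous_aliases(rows))
```

Applying the same `normalize` to both the alias table and the ORCID keywords before loading them would take the database collation out of the picture, and the ambiguous aliases could then be excluded or resolved by hand.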
This is a first exploratory step, but it's very helpful to start to make sense of the various ways researchers self-describe.
And thanks for the mention of Throwdown. Filming it was a lot of fun. I can't wait to watch the new series!