Affiliation country metadata


Eric Jeangirard

Feb 23, 2023, 10:11:21 AM
to openale...@googlegroups.com
Hello

We noted a number of cases where the country_code metadata of institutions was missing or incorrect, despite the presence of a raw_affiliation_string.

We believe that many use cases relying on country affiliation could be built on OpenAlex, but the quality of this metadata still needs to improve before those use cases can be fully reliable. For example, every country seeking to steer its public research policy needs a reliable database to list and analyze its scientific output. Countries, as well as international organizations, also want to compare research output across countries. Understanding the international mobility of researchers is another area of interest.

I suspect your roadmap is already more than full, but let me stress the importance of this metadata, especially for national and international entities.

I have published the results of an experiment in this repo: https://github.com/dataesr/openalex-affiliation-country. We used a direct approach, matching raw_affiliation_string ---> country.

We split the resulting data into two files:
- missing_country_code.csv: cases where OpenAlex does not provide a country_code
e.g. "KU, Leuven, Leuven, Belgium" from https://openalex.org/W3085273257 has no country_code in OpenAlex (it should be 'BE')

- mismatch_country_code.csv: cases where the OpenAlex country_code MAY be inaccurate. Again, these detections were made automatically and certainly contain errors; this is NOT a gold-standard dataset, but we believe the vast majority of the cases raised here are worth examining.
e.g. "ANDRA, Ci2A, Soulaines-Dhuys, France" from https://openalex.org/W2802150657 is matched by OpenAlex to "Australian National Drag Racing Association" (country_code AU), whereas it should be matched to country 'FR'.
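To make the idea concrete, here is a minimal Python sketch of a direct raw_affiliation_string -> country matcher and of the split into "missing" and "mismatch" cases. It is only an illustration under stated assumptions: the actual method in the dataesr repo is not described in this post, and the tiny alias table, the "scan comma-separated parts from the end" heuristic, and the names `match_country` and `triage` are all hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical alias table mapping country names to ISO 3166-1 alpha-2 codes.
# A real matcher would need a much larger table (native names, abbreviations, cities).
COUNTRY_ALIASES = {
    "belgium": "BE",
    "france": "FR",
    "australia": "AU",
    "germany": "DE",
    "usa": "US",
    "united states": "US",
}


def match_country(raw_affiliation: str) -> Optional[str]:
    """Guess a country code by scanning comma-separated parts from the end,
    since the country usually comes last in an affiliation string."""
    parts = [p.strip().lower().rstrip(".") for p in raw_affiliation.split(",")]
    for part in reversed(parts):
        if part in COUNTRY_ALIASES:
            return COUNTRY_ALIASES[part]
    return None  # the heuristic has no opinion


def triage(records: List[Tuple[str, str, Optional[str]]]):
    """Split (work_id, raw_affiliation, openalex_country_code) records into
    'missing' and 'mismatch' buckets, mirroring the two CSV files."""
    missing, mismatch = [], []
    for work_id, raw, oa_code in records:
        predicted = match_country(raw)
        if predicted is None:
            continue  # no prediction, so nothing to compare against
        if not oa_code:
            missing.append((work_id, raw, predicted))
        elif oa_code.upper() != predicted:
            mismatch.append((work_id, raw, oa_code, predicted))
    return missing, mismatch


if __name__ == "__main__":
    records = [
        ("https://openalex.org/W3085273257", "KU, Leuven, Leuven, Belgium", None),
        ("https://openalex.org/W2802150657", "ANDRA, Ci2A, Soulaines-Dhuys, France", "AU"),
    ]
    missing, mismatch = triage(records)
    print(missing)
    print(mismatch)
```

Run on the two examples above, the first record lands in the "missing" bucket with a predicted 'BE' and the second in the "mismatch" bucket (OpenAlex 'AU' vs predicted 'FR'). A heuristic this simple will produce false positives, for instance on institution names that contain a country name, which is one reason such output is not a gold-standard dataset.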

This email is not a complaint but rather a contribution to highlight the importance of this metadata. In that spirit, we hope the sample analysis provided in the GitHub repo will be useful.

Regards

Eric Jeangirard

Data scientist




Casey Meyer

Mar 1, 2023, 3:45:00 PM
to OpenAlex users
Hi Eric,

The GitHub repo and examples you sent are very helpful. Your post comes at a great time because we are putting a lot of effort into this problem. It's good to see where there is some "low-hanging fruit" we can improve on. We are tackling this by 1) gathering more affiliation metadata, 2) improving the matching algorithms, and 3) testing the data more closely to reveal issues. Hopefully you will see continual improvement moving forward.

Thanks,
Casey
