Hello
We noted a number of cases where the country_code metadata of institutions was missing or incorrect, despite the presence of a raw_affiliation_string.
We believe that many use cases based on country affiliation can be built on OpenAlex, but the quality of this metadata still needs to improve before those use cases can be fully reliable. For example, any country seeking to steer its public research policy needs a reliable database for listing and analyzing its scientific output. Countries and international organizations also want to compare output from one country to another. Understanding the international mobility of researchers is another matter of interest.
I suspect that your roadmap is already more than full, but let me stress the importance of this metadata, especially for national-level and international-level entities.
We have separated the resulting data into two files:
- missing_country_code.csv: cases where OpenAlex does not provide a country_code
  e.g. "KU, Leuven, Leuven, Belgium" from https://openalex.org/W3085273257 has no country_code in OpenAlex (should be 'BE')
- mismatch_country_code.csv: cases where OpenAlex MAY not provide an accurate country_code. These detections were made automatically and certainly contain errors. It is NOT a golden dataset, but we believe that the vast majority of the cases raised here are of interest.
  e.g. "ANDRA, Ci2A, Soulaines-Dhuys, France" from https://openalex.org/W2802150657 is matched by OpenAlex to "Australian National Drag Racing Association", country_code AU, whereas it should be matched to country 'FR'.
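To make the two categories concrete, here is a minimal Python sketch of the kind of check involved. It is not the method used to build the two files: the function names, the tiny lookup table, and the rule "the last comma-separated token of the affiliation string is the country" are all assumptions chosen for illustration, and a real detection would need a far more robust matcher.

```python
# Hypothetical lookup table: only the two countries from the examples above.
COUNTRY_NAME_TO_CODE = {"belgium": "BE", "france": "FR"}


def guess_country(raw_affiliation):
    """Guess a two-letter country code from the last comma-separated token.

    Assumption for illustration only: the country name comes last in the
    raw affiliation string. Returns None when the token is not in the table.
    """
    last_token = raw_affiliation.split(",")[-1].strip().lower()
    return COUNTRY_NAME_TO_CODE.get(last_token)


def classify(raw_affiliation, openalex_country_code):
    """Return 'missing', 'mismatch', 'ok', or 'unknown' for one affiliation."""
    expected = guess_country(raw_affiliation)
    if expected is None:
        return "unknown"  # the naive rule cannot decide
    if not openalex_country_code:
        return "missing"  # OpenAlex provides no country_code
    if openalex_country_code.upper() != expected:
        return "mismatch"  # OpenAlex provides a different country_code
    return "ok"


# The two examples quoted above, with the country_code values reported in this email.
examples = [
    ("KU, Leuven, Leuven, Belgium", None),
    ("ANDRA, Ci2A, Soulaines-Dhuys, France", "AU"),
]
results = [classify(raw, code) for raw, code in examples]
print(results)
```

With these two inputs, the first affiliation falls into the "missing" category and the second into the "mismatch" category, mirroring the split between the two CSV files.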
This email is not a complaint but rather a contribution to highlight the importance of this metadata. In that spirit, we hope that the sample analysis provided in the GitHub repo will be useful.
Regards
Eric Jeangirard
Data scientist