Statistics on author disambiguation

Peter Lombaers

Mar 10, 2023, 4:34:07 AM
to OpenAlex users
Dear OpenAlex team,

Great work on bringing the author count down so substantially. It is a lot closer to what you would expect the real number of authors to be.

I was wondering if you have statistics on how accurate the result is. There will be many cases where multiple author identifiers are correctly merged into the same author, but naturally there will also be more cases where a work gets associated with the wrong author. Do you have any idea how often this happens? Was there a percentage you were aiming for? That would be very helpful when communicating with users.

I don't know if this helps, but I have a couple of cases where a work gets associated with the wrong author (with a very similar name):


Correct author: The correct author is a thyroid surgeon I could not find an identifier for; the associated author is a political scientist.

Correct author: The correct author is a surgeon I could not find an identifier for; the associated author is a geographer.

Best,
Peter Lombaers

Angelo Salatino

Mar 11, 2023, 5:03:16 AM
to Peter Lombaers, OpenAlex users
Hi OpenAlex team,

I am using this thread to also highlight something odd with my author profile.

I was checking https://api.openalex.org/works/https://doi.org/10.1162/qss_a_00162, a work on which I am the second author.

In the previous snapshot (before author merging), my ID was https://openalex.org/A2972862327 (whose numeric part was also my MAG ID), with the display name "Angelo Antonio Salatino".
In the recent snapshot, my ID became https://openalex.org/A3203666745, which has:
  • display name: "Salatino, Angelo A., Thiviyan Thanapalasingam, Andrea Mannocci, Francesco Osborne" (several co-authors blended into one name)
  • 344 works (far more than my actual number of publications so far)
I hope this feedback helps you fix your pipeline.

All the best
Angelo


Angelo Antonio Salatino [salatino.org] [about.me/angelosalatino]



Casey Meyer

Mar 15, 2023, 11:10:36 AM
to OpenAlex users
Hi Peter and Angelo,

Thanks for the feedback! The short answer is no, we do not have statistics available right now. But it's definitely something we want to do and this thread helped get the discussion going.

We've been using the new sample parameter to gather random records when testing features. We can manually annotate the expected result, then measure how close the software gets to it. We want to do that for the author name disambiguation software soon. That should give us better statistics to work with, not only to share publicly with the group, but more importantly to keep improving the feature.
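As a rough illustration of that workflow (a sketch only, not the actual evaluation code), the snippet below builds a request URL for a random sample of works using the `sample` and `seed` query parameters, and scores predicted author IDs against manual annotations. The function names, the (work ID, author position) keys, and the placeholder IDs are all assumptions made up for this example.

```python
from urllib.parse import urlencode

OPENALEX_WORKS = "https://api.openalex.org/works"


def sample_url(n, seed):
    """Build a URL asking for n random works; a fixed seed keeps the sample repeatable."""
    return f"{OPENALEX_WORKS}?{urlencode({'sample': n, 'seed': seed})}"


def assignment_accuracy(predicted, annotated):
    """Fraction of annotated authorship slots where the predicted author ID is correct.

    Both arguments map a hypothetical (work_id, author_position) key to an author ID.
    Only slots that a human has annotated are scored.
    """
    checked = [key for key in annotated if key in predicted]
    if not checked:
        raise ValueError("no annotated slots overlap with the predictions")
    correct = sum(predicted[key] == annotated[key] for key in checked)
    return correct / len(checked)


if __name__ == "__main__":
    print(sample_url(20, 42))

    # Placeholder IDs, not real OpenAlex records.
    predicted = {("W1", 0): "A1", ("W1", 1): "A2", ("W2", 0): "A9"}
    annotated = {("W1", 0): "A1", ("W1", 1): "A2", ("W2", 0): "A3"}
    print(assignment_accuracy(predicted, annotated))
```

Scoring per authorship slot rather than per author keeps the number easy to explain to users: it is simply the share of checked work-to-author links that were right.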

We'll look at the specific examples you sent, so thanks for sending those!

Thanks,
Casey
