Significant errors in author information


Jason Augustyn

May 18, 2026, 8:55:48 AM
to OpenAlex Community
A member of my team has surfaced issues with the author information on several papers he spot-checked for a project. The issues appear to affect both the bulk snapshot data and the OpenAlex API. For example, this paper:


Lists "Marion Pang" as the second author in the snapshot, while the API and OpenAlex website list "Miaosen Pang". The journal website lists Miaosen Pang:


However, there appear to be deeper disambiguation issues. The OpenAlex record for this author appears to conflate them with at least one other author who does biomedical research. In fact, the ORCID provided in the OpenAlex data points to a researcher who works in proteomics, not metallic alloys.


I am concerned because we identified several such examples with very limited manual exploration, making me think there may be systemic issues with author data. We rely on this to be accurate for our primary use case, and at this point my team doesn't trust the data.

Can someone from OpenAlex comment on this issue? 

Jay Pfaffman

May 18, 2026, 3:02:54 PM
to Jason Augustyn, OpenAlex Community
I too have noticed a surprising number of author records that combine multiple people.

My advisor, Daniel L Schwartz, who's the Dean of the Stanford Graduate School of Education, is in https://openalex.org/authors/A5051808588 , which has another Dan Schwartz included in his publications and institutions. The record has gotten better, though! It used to have another name of another co-author (whose name does not include Dan, Daniel, or Schwartz!). I suspect things are made more difficult by my Dan not having an ORCID iD, though I don't know if that was part of how things get connected. Right, that record has an ORCID iD that's not his.

You have to scroll pretty far down the UI page to find articles not by him, but sorting by date, the other one has a lot of work.

I am concerned because we identified several such examples with very limited manual exploration, making me think there may be systemic issues with author data. We rely on this to be accurate for our primary use case, and at this point my team doesn't trust the data.

If your primary use case has to do with author records, then your team is right not to trust the data. I think you'll need to find another database or work out your own disambiguation plan in the short term. 


For researchers: A complete rewrite of author name disambiguation ships by end of Q1. This has always been the hardest problem in bibliometrics. With today’s AI, we think we can build the most accurate system ever made. 

I'm not convinced today's AI can solve the problem, though, as I said above, I do see that improvements have been made.

I've got a somewhat silly citation graph tool here: https://www.refrunner.com/authors/A5051808588/graph/incoming -- you can see that the most-cited articles are education/psychology and most of the green less-referenced ones are in medicine. Looks like he is also a Daniel L Schwartz. :-(
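
The field split described above can be checked roughly in code. This is a minimal sketch, not an OpenAlex feature: it assumes each work is a dict shaped like an OpenAlex work record with a `primary_topic.field.display_name` entry, and the helper names, the 20% threshold, and the sample data are all made up for illustration.

```python
from collections import Counter


def field_counts(works):
    """Count works per top-level field, read from each work's primary_topic."""
    counts = Counter()
    for work in works:
        topic = work.get("primary_topic") or {}
        field = (topic.get("field") or {}).get("display_name", "unknown")
        counts[field] += 1
    return counts


def looks_conflated(works, min_share=0.2):
    """Heuristic: True if a second field holds at least min_share of the works."""
    counts = field_counts(works)
    total = sum(counts.values())
    ranked = counts.most_common(2)
    return len(ranked) == 2 and ranked[1][1] / total >= min_share


def make_work(field_name):
    """Build a tiny stand-in for a work record (hypothetical sample data)."""
    return {"primary_topic": {"field": {"display_name": field_name}}}


# Hypothetical sample: three psychology works and two medicine works.
sample_works = [make_work("Psychology")] * 3 + [make_work("Medicine")] * 2

print(field_counts(sample_works))
print(looks_conflated(sample_works))
```

A genuinely interdisciplinary researcher would also trip this check, so it is only good for picking records to inspect by hand.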




Ricardo Rodrigues Lucca

May 19, 2026, 4:28:05 PM
to OpenAlex Community
I have some cases like that too. For example:

author on snapshot:  Thaynara Silva de Oliveira (https://openalex.org/authors/a5063807970)
correct author:  Tatiana Silva de Oliveira

author on snapshot:  Keury Carolaine Pereira da Silva (https://openalex.org/authors/a5075286824)
correct author:  Karin Cristina Da Silva

These authors share the same ID, but they are different people. The API appears to show it correctly, but clicking on the author's name takes you to the wrong author.

Jay Pfaffman

May 19, 2026, 5:33:34 PM
to Ricardo Rodrigues Lucca, OpenAlex Community
These authors share the same ID, but they are different people. The API appears to show it correctly, but clicking on the author's name takes you to the wrong author.

https://api.openalex.org/a5075286824 does list all of these, so the authors are conflated.

[
"K. C. Silva",
"Karin Cristina Da Silva",
"Keury Carolaine Pereira da Silva"
],

You can see in https://openalex.org/authors/a5063807970 that the paper is listed, so it's that both authors share an author ID.

I suspect the other is the same. When there are fewer than 10 references, it's not that hard to fix, but when both authors each have hundreds of papers, it's another story.
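
A name list like the one above can be screened automatically for this kind of conflation. The sketch below is only a rough heuristic, not how OpenAlex disambiguates: it assumes the list comes from the author record's `display_name_alternatives` field, and the helper names are made up for illustration. Two names are treated as compatible when one first given-name token is a prefix of the other, so "K." matches "Karin" and "Dan" matches "Daniel".

```python
from itertools import combinations


def first_token(name):
    """Lower-cased first given-name token, with trailing periods removed."""
    parts = name.split()
    return parts[0].rstrip(".").lower() if parts else ""


def given_names_conflict(a, b):
    """True if neither first token is a prefix of the other."""
    x, y = first_token(a), first_token(b)
    if not x or not y:
        return False
    return not (x.startswith(y) or y.startswith(x))


def conflicting_name_pairs(author):
    """Return pairs of alternative names whose given names cannot match."""
    names = author.get("display_name_alternatives") or []
    return [(a, b) for a, b in combinations(names, 2) if given_names_conflict(a, b)]


# The three names quoted in this thread for author a5075286824.
record = {
    "display_name_alternatives": [
        "K. C. Silva",
        "Karin Cristina Da Silva",
        "Keury Carolaine Pereira da Silva",
    ]
}

print(conflicting_name_pairs(record))
```

Here the only flagged pair is "Karin Cristina Da Silva" and "Keury Carolaine Pereira da Silva". The check misses two different people who share a first name, such as the two Daniel L Schwartz records, so it can only catch the more obvious merges.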

Ricardo Lucca

May 20, 2026, 9:13:20 AM
to Jay Pfaffman, OpenAlex Community
Yes, and I don't know why it only happens on the first author. I have a few other cases.