--
You received this message because you are subscribed to the Google Groups "OpenAlex Community" group.
To unsubscribe from this group and stop receiving emails from it, send an email to openalex-commun...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/openalex-community/a8bed8ee-ea23-42cb-9613-3c08c2e01621n%40googlegroups.com.
Hi,
As far as I can see, the problem originates from OpenAlex assuming that the cited paper has a DOI: it has automatically assigned a DOI (found on an unreliable preprint server) to the cited paper and fetches the metadata for that DOI from Crossref. In reality, the citing article never mentioned any preprint or DOI; it correctly cited the original publication, which is a proceedings paper from 2017.
This is the citing article: https://doi.org/10.1038/s41467-020-17591-w (https://www.nature.com/articles/s41467-020-17591-w)
In this article reference 47 is the following: Vaswani, A., et al. Attention is all you need. In Advances in Neural Information Processing Systems, 5998–6008 (2017). (Without DOI).
The Crossref metadata of the citing article does not give a DOI for this reference either (https://api.crossref.org/works/10.1038/s41467-020-17591-w):
{"key": "17591_CR47",
"unstructured": "Vaswani, A., et al. Attention is all you need. In Advances in Neural Information Processing Systems, 5998–6008 (2017)."}
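The check on the reference metadata above can be sketched in a few lines of Python. This is only an illustration, assuming the record has already been downloaded and parsed, and that a matched reference would carry a "DOI" field in Crossref's reference entries; the function name references_without_doi is my own and not part of any library.

```python
def references_without_doi(work):
    """Return the reference entries of a Crossref work record that carry no DOI.

    `work` is assumed to be the "message" object of a Crossref /works response,
    already parsed into a dict.
    """
    return [ref for ref in work.get("reference", []) if not ref.get("DOI")]


# Abbreviated record containing only the reference quoted above.
work = {
    "DOI": "10.1038/s41467-020-17591-w",
    "reference": [
        {
            "key": "17591_CR47",
            "unstructured": "Vaswani, A., et al. Attention is all you need. "
                            "In Advances in Neural Information Processing "
                            "Systems, 5998–6008 (2017).",
        },
    ],
}

# Print the key of every reference for which the publisher deposited no DOI.
for ref in references_without_doi(work):
    print(ref["key"])
```

Any DOI that a downstream index attaches to an entry returned by this function was inferred by that index and not supplied by the citing article.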
Advances in Neural Information Processing Systems is a proceedings series (https://proceedings.neurips.cc/), where the original cited paper is still available: https://papers.nips.cc/paper_files/paper/2017/hash/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html
According to Google Scholar, around 70 versions of this work have been uploaded to various repositories and websites, and the paper has received more than 200,000 citations (https://scholar.google.com/scholar?cluster=2960712678066186980&hl=en&as_sdt=0,5). This is one of the seminal Google "AI" papers about neural networks.
Currently OpenAlex finds fewer than 7,000 citations to this paper (https://openalex.org/works/w2626778328) and incorrectly treats it as a preprint from 2025. So OpenAlex is obviously missing tens of thousands of citations to the original paper.
According to Crossref the DOI which is erroneously assigned to this paper in OpenAlex (from the unreliable preprint server) has never received any citations: https://api.crossref.org/works/10.65215/ne77pf66
This is an interesting example because it is a paper with an extremely high citation count that is not a traditional journal article and does not have a DOI. There are probably many other examples like this in OpenAlex, for instance highly cited book chapters from 2025 (https://openalex.org/works?page=1&filter=type:types/book-chapter,publication_year:2025). I checked some of the most cited book chapters from 2025, and most of them have hundreds or even thousands of citations from previous years (sometimes 10-20 years back). Obviously, these were not citing the 2025 version.
It would probably be possible to detect such cases automatically: if a publication receives a large number of citations from publications that were published before it, something is probably wrong.
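A minimal sketch of such a check, assuming the publication year of the cited work and the publication years of its citing works are already available. The function names and both thresholds (min_citations, max_earlier_share) are arbitrary choices for illustration, not anything OpenAlex uses, and the numbers in the example are invented.

```python
def share_of_earlier_citations(cited_year, citing_years):
    """Fraction of citing works published before the cited work's recorded year."""
    if not citing_years:
        return 0.0
    earlier = sum(1 for year in citing_years if year < cited_year)
    return earlier / len(citing_years)


def looks_misdated(cited_year, citing_years,
                   min_citations=100, max_earlier_share=0.05):
    """Flag a work whose citations mostly predate its recorded publication year.

    Both thresholds are hypothetical and would need tuning on real data.
    """
    if len(citing_years) < min_citations:
        return False
    return share_of_earlier_citations(cited_year, citing_years) > max_earlier_share


# Invented example: a work recorded as published in 2025 but cited since 2018.
citing_years = [2018] * 60 + [2020] * 50 + [2025] * 10
print(looks_misdated(2025, citing_years))
print(looks_misdated(2017, citing_years))
```

A small share of earlier citations can be legitimate (for example, citations to an accepted manuscript), which is why the sketch uses a share threshold and does not flag every single earlier citation.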
Gabor Schubert
Stockholm University