Older works referencing newer works


Purna Srivatsa

Dec 17, 2025, 4:46:47 AM
to OpenAlex Community
Hi,
I'm not sure if this is a bug or an expected pattern.

But this paper, published in 2020, has in its references (the "referenced_works" field) a paper (https://api.openalex.org/w2626778328) from 2025.

Could someone please let me know if this is intentional or a bug?

Thanks,
Purna
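A quick way to find such cases for a given work is to compare the year of each entry in `referenced_works` with the citing work's own `publication_year`. The sketch below is a minimal illustration, not an official tool: it assumes the OpenAlex single-work endpoint (https://api.openalex.org/works/<id>) and those two field names, and the helper names are hypothetical. The years in the offline example are made up.

```python
import json
import urllib.request
from typing import Dict, List, Optional

OPENALEX_WORKS = "https://api.openalex.org/works/"


def fetch_work(work_id: str) -> dict:
    """Fetch one work record from the OpenAlex API (requires network access)."""
    # Accept bare IDs ("W2626778328") as well as full URLs ("https://openalex.org/W2626778328").
    short_id = work_id.rsplit("/", 1)[-1]
    with urllib.request.urlopen(OPENALEX_WORKS + short_id) as resp:
        return json.load(resp)


def newer_references(citing_year: int, ref_years: Dict[str, Optional[int]]) -> List[str]:
    """Return the IDs of referenced works dated after the citing work (hypothetical helper)."""
    return sorted(
        ref_id for ref_id, year in ref_years.items()
        if year is not None and year > citing_year
    )


def check_work(work_id: str) -> List[str]:
    """Flag references of `work_id` whose publication year is later than the work's own."""
    work = fetch_work(work_id)
    ref_years = {ref: fetch_work(ref).get("publication_year") for ref in work["referenced_works"]}
    return newer_references(work["publication_year"], ref_years)


# Offline example with made-up years; no API call is made here.
print(newer_references(2020, {"W_hypothetical_2017": 2017, "W2626778328": 2025}))
# prints ['W2626778328']
```

Fetching every reference one at a time is slow for long reference lists, so this is only meant for spot checks.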

Gabor Schubert

Dec 17, 2025, 8:49:54 AM
to OpenAlex Community
Hi Purna,

This happens because the cited reference (or at least one version of it) was published before 2025 (in this case in 2017: https://en.wikipedia.org/wiki/Attention_Is_All_You_Need), but for some reason the OpenAlex metadata lists 2025 as its publication year. It seems to be some kind of bug.

Best regards,
Gabor Schubert

Samuel Mok

Dec 17, 2025, 9:54:01 AM
to Gabor Schubert, OpenAlex Community
The issue here is that the referenced paper's entry is indeed cluttered with wrong entries, mostly from https://langtaosha.org.cn/, a Chinese preprint server. All of the DOIs pointing there for this article are dead: probably people uploaded the original paper without permission or something similar, which apparently has happened quite a few times.

Anyway, LangTaoSha registered the DOIs with metadata, which you can view in the Crossref API. They use the upload date as the publication date, set the article type to 'posted-content' instead of 'preprint', etc. OpenAlex, assuming the data in Crossref is valid, ingested this data into its systems, leading to the error you see here.
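The year that ends up in downstream databases comes from the date fields registered with Crossref. The sketch below reads those fields from a Crossref works record (https://api.crossref.org/works/<DOI>); it assumes the record's `message` object stores dates under keys such as `issued` and `posted` in Crossref's `date-parts` form. The function names are hypothetical, and the sample record is invented for illustration rather than copied from the DOIs discussed here.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional


def fetch_crossref(doi: str) -> dict:
    """Return the 'message' object of a Crossref works record (requires network access)."""
    url = "https://api.crossref.org/works/" + urllib.parse.quote(doi)
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)["message"]


def crossref_year(message: dict, field: str = "issued") -> Optional[int]:
    """Return the year from a Crossref date field such as 'issued' or 'posted', if present."""
    parts = message.get(field, {}).get("date-parts", [[None]])
    return parts[0][0] if parts and parts[0] else None


# Invented record shaped like a preprint whose upload date was registered as its date.
sample = {
    "type": "posted-content",
    "posted": {"date-parts": [[2025, 6, 1]]},
    "issued": {"date-parts": [[2025, 6, 1]]},
}
print(crossref_year(sample), crossref_year(sample, "posted"))  # prints 2025 2025
```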

It seems to me that LangTaoSha is not using Crossref's DOI registry according to its terms, and should probably be (temporarily?) banned from registering more DOIs. For this specific item alone, they registered 8 DOIs, all of which resolve to a dead page, and along the way they have added a lot of junk to the metadata environment.

Cheers,
Samuel


Bianca Kramer

Dec 17, 2025, 11:31:17 AM
to Samuel Mok, Gabor Schubert, OpenAlex Community
Hi Samuel, 

Thanks for investigating, as always! Just one thing: while I agree there appear to be many issues with metadata registration for/by this preprint server, registering preprints with type "posted-content" (and subtype "preprint") is expected and in line with the documentation (also see here and here). Other preprint servers (including bioRxiv) follow this practice as well.
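Following the convention described here, a preprint record in Crossref can be recognized by its type and subtype together. A minimal sketch, assuming the `type` and `subtype` keys of a Crossref works `message` object; the helper name and the sample records (including their DOIs) are hypothetical.

```python
def is_crossref_preprint(message: dict) -> bool:
    """True if a Crossref record uses the preprint convention:
    type 'posted-content' with subtype 'preprint'."""
    return message.get("type") == "posted-content" and message.get("subtype") == "preprint"


# Hypothetical records, not real Crossref output.
records = [
    {"DOI": "10.0000/example-preprint", "type": "posted-content", "subtype": "preprint"},
    {"DOI": "10.0000/example-article", "type": "journal-article"},
]
print([r["DOI"] for r in records if is_crossref_preprint(r)])
# prints ['10.0000/example-preprint']
```

Checking `type` alone is not enough, since posted content can also carry other subtypes.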

kind regards, 
Bianca 

On Wed 17 Dec 2025 at 15:54, Samuel Mok <sam...@gmail.com> wrote:

Gabor Schubert

3:31 AM
to OpenAlex Community

Hi,

As far as I can see, the origin of the problem is that OpenAlex assumes the cited paper has a DOI and fetches the Crossref metadata for the DOI (registered by an unreliable preprint server) that it has automatically assigned to the cited paper. In reality, the citing article never mentions any preprint or DOI; it correctly cites the original publication, which is a proceedings paper from 2017.

This is the citing article: https://doi.org/10.1038/s41467-020-17591-w (https://www.nature.com/articles/s41467-020-17591-w)

In this article reference 47 is the following: Vaswani, A., et al. Attention is all you need. In Advances in Neural Information Processing Systems, 5998–6008 (2017). (Without DOI).

The Crossref metadata of the citing article does not give a DOI for this reference either (https://api.crossref.org/works/10.1038/s41467-020-17591-w):

{"key": "17591_CR47",
 "unstructured": "Vaswani, A., et al. Attention is all you need. In Advances in Neural Information Processing Systems, 5998–6008 (2017)."}
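To confirm this for all references of an article at once, one can list the entries of the Crossref `reference` array that carry no `DOI` key. This is a sketch under the assumption that each entry is an object with keys like `key`, `unstructured` and, when a DOI is known, `DOI`; the helper name is hypothetical and the sample mirrors the entry quoted above.

```python
from typing import List


def references_without_doi(message: dict) -> List[dict]:
    """Return reference entries of a Crossref works record that have no 'DOI' key."""
    return [ref for ref in message.get("reference", []) if "DOI" not in ref]


# Sample 'message' built from the reference entry quoted above.
message = {
    "reference": [
        {
            "key": "17591_CR47",
            "unstructured": "Vaswani, A., et al. Attention is all you need. In Advances in "
                            "Neural Information Processing Systems, 5998–6008 (2017).",
        }
    ]
}
print([ref["key"] for ref in references_without_doi(message)])  # prints ['17591_CR47']
```

Any DOI later linked to such an entry was not supplied by the citing publisher, so it must come from some downstream matching step.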

Advances in Neural Information Processing Systems is a proceedings series (https://proceedings.neurips.cc/), where the original cited paper is still available: https://papers.nips.cc/paper_files/paper/2017/hash/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html

According to Google Scholar, around 70 different versions of this work have been uploaded to various repositories and websites, and the paper has received more than 200,000 citations (https://scholar.google.com/scholar?cluster=2960712678066186980&hl=en&as_sdt=0,5). It is one of Google's seminal "AI" papers on neural networks.

Currently OpenAlex finds fewer than 7,000 citations to this paper (https://openalex.org/works/w2626778328) and incorrectly assumes that it is a preprint from 2025. So OpenAlex obviously misses tens of thousands of citations to the original paper.

According to Crossref, the DOI erroneously assigned to this paper in OpenAlex (from the unreliable preprint server) has never received any citations: https://api.crossref.org/works/10.65215/ne77pf66

This is an interesting example because it is a paper with an extremely high citation count that is not a traditional journal article and does not have a DOI. There are probably many other examples like this in OpenAlex, for instance highly cited book chapters from 2025: https://openalex.org/works?page=1&filter=type:types/book-chapter,publication_year:2025. I checked some of the most cited book chapters from 2025, and most of them have hundreds or even thousands of citations from previous years (sometimes 10-20 years back). Obviously, these were not citing the 2025 version.
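The same list can be pulled from the OpenAlex API rather than the web interface. A minimal sketch, assuming the API's `filter`, `sort` and `per-page` parameters and the `type:book-chapter` and `publication_year:2025` filter values; the function name is hypothetical.

```python
from urllib.parse import urlencode


def top_cited_url(work_type: str, year: int, per_page: int = 25) -> str:
    """Build an OpenAlex API URL listing the most-cited works of one type and year."""
    params = {
        "filter": f"type:{work_type},publication_year:{year}",
        "sort": "cited_by_count:desc",
        "per-page": per_page,
    }
    # Keep ':' and ',' unescaped so the URL stays readable.
    return "https://api.openalex.org/works?" + urlencode(params, safe=":,")


print(top_cited_url("book-chapter", 2025))
# prints https://api.openalex.org/works?filter=type:book-chapter,publication_year:2025&sort=cited_by_count:desc&per-page=25
```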

It would probably be possible to check this automatically: if a publication receives a high number of citations from publications that were published before it, then something is probably wrong.
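Such a check can be sketched with the per-year citation counts OpenAlex exposes. The sketch assumes each work record carries `publication_year` and a `counts_by_year` list of objects with `year` and `cited_by_count` (which, as far as I know, only covers recent years); the function names and both thresholds are hypothetical choices, and the sample record is invented.

```python
def early_citation_share(work: dict) -> float:
    """Fraction of yearly citations that come from years before the work's publication year."""
    pub_year = work.get("publication_year")
    counts = work.get("counts_by_year", [])
    total = sum(c["cited_by_count"] for c in counts)
    if pub_year is None or total == 0:
        return 0.0
    early = sum(c["cited_by_count"] for c in counts if c["year"] < pub_year)
    return early / total


def looks_misdated(work: dict, min_citations: int = 50, max_early_share: float = 0.2) -> bool:
    """Flag works with enough citations where too many citing works predate the work itself."""
    total = sum(c["cited_by_count"] for c in work.get("counts_by_year", []))
    return total >= min_citations and early_citation_share(work) > max_early_share


# Invented record: dated 2025 but mostly cited in 2023 and 2024.
suspect = {
    "publication_year": 2025,
    "counts_by_year": [
        {"year": 2025, "cited_by_count": 40},
        {"year": 2024, "cited_by_count": 900},
        {"year": 2023, "cited_by_count": 800},
    ],
}
print(round(early_citation_share(suspect), 3), looks_misdated(suspect))  # prints 0.977 True
```

A small early share can be legitimate (for example, citations to a preprint posted before the version of record), which is why the sketch uses a threshold rather than flagging any early citation.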

Gabor Schubert
Stockholm University
