Crossref test prefixes in OpenAlex...


Poppy Nicolette

Feb 24, 2025, 3:15:22 PM
to OpenAlex Community
Hi all,
I searched in the archives, but did not see a prior discussion - my apologies if this has been covered before.

Crossref has a few prefixes that are used internally for testing purposes. You can see lists of them through their API here and here
test_prefixes = ["10.18810",
"10.5555",
"10.88888",
"10.30444",
"10.30446",
"10.30447",
"10.30448",
"10.30449",
"10.50505",
"10.13003",
"10.30443"]

10.5555 is the classic example, as it's typically used in their documentation. As such, it seems to find its way into automated processes. However, it's impossible to register a DOI with one of these prefixes - they would be rejected at the time of metadata submission to Crossref.
There are many of these in OpenAlex, as found with this simple search: https://openalex.org/works?page=1&filter=doi_starts_with%3A10.5555
Note that the first one is for an article on LDA with over 27k citations.
Wouldn't it make sense to remove/screen these on ingest, as they are not valid? It's not a big deal to run a function to remove them for my own purposes; I just thought it would benefit the community at large if this were part of the ingest cleaning in OpenAlex.
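For anyone who wants to screen these locally in the meantime, here is a minimal sketch of the kind of function mentioned above. The names `TEST_PREFIXES`, `doi_prefix`, `has_test_prefix`, and `drop_test_dois` are made up for illustration, and the prefix list is simply the one quoted in this post, so adjust it to taste. Matching is done on the whole prefix (the part before the first "/") rather than on a string prefix, so 10.55550 is not confused with 10.5555.

```python
# Prefixes copied from the list in this post; edit the set as needed.
TEST_PREFIXES = frozenset({
    "10.18810", "10.5555", "10.88888", "10.30444", "10.30446", "10.30447",
    "10.30448", "10.30449", "10.50505", "10.13003", "10.30443",
})

# Common ways a DOI is written as a link; stripped before comparison.
_RESOLVER_HEADS = (
    "https://doi.org/", "http://doi.org/",
    "https://dx.doi.org/", "http://dx.doi.org/", "doi:",
)


def doi_prefix(doi: str) -> str:
    """Return the prefix of a DOI, i.e. everything before the first '/'."""
    doi = doi.strip().lower()
    for head in _RESOLVER_HEADS:
        if doi.startswith(head):
            doi = doi[len(head):]
            break
    return doi.split("/", 1)[0]


def has_test_prefix(doi: str, prefixes=TEST_PREFIXES) -> bool:
    """True if the DOI's prefix is exactly one of the listed prefixes."""
    return doi_prefix(doi) in prefixes


def drop_test_dois(dois, prefixes=TEST_PREFIXES):
    """Return the DOIs whose prefix is not in the listed prefixes."""
    return [d for d in dois if not has_test_prefix(d, prefixes)]
```

Passing a smaller set as `prefixes` keeps any of the listed prefixes you would rather not drop.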
Thanks!
Poppy

Jason Priem

Feb 25, 2025, 4:09:54 AM
to Poppy Nicolette, OpenAlex Community
Thanks Poppy. That's great feedback. And super interesting! I hadn't known about that test DOI. We're doing a pretty big update in April, and I'll try to get this addressed in that update, or soon thereafter.
j


--
Jason Priem, CEO
OurResearch: We make software to help open science.
follow at @jasonpriem and @OurResearch_org

Euan Adie

Feb 25, 2025, 4:22:41 AM
to Poppy Nicolette, OpenAlex Community
This is super interesting (thanks Poppy)! Is it only from a certain
year forward? Can see there's e.g.

https://pubmed.ncbi.nlm.nih.gov/17604528/

at 10.5555/afhs.2007.7.1.55

though this doesn't resolve with dx.doi.org !

At Overton we've got a couple of hundred DOIs with the 10.5555 prefix
from reference sections in policy docs, in theory those DOIs came from
the Crossref bibliometric matching endpoint but could also be authors
typing in DOIs wrong or something. Would be keen to figure out where
they're leaking out into the ecosystem, just out of interest apart
from anything else.

Best,

Euan

Gabor Schubert

Feb 25, 2025, 4:43:38 AM
to OpenAlex Community
Hi,

It seems that the LDA article has the 10.5555 DOI on the ACM journal homepage: https://dl.acm.org/doi/10.5555/944919.944937
Although the DOI does not resolve to the article: https://doi.org/10.5555/944919.944937
And this DOI is not available via the Crossref API either: https://api.crossref.org/works/10.5555/944919.944937

ACM only hosts the content of the journal "The Journal of Machine Learning Research" and the original article is found at the journal homepage at JMLR.org: https://www.jmlr.org/papers/volume3/blei03a/blei03a.pdf , and OpenAlex correctly links to this source from the article page: https://openalex.org/works/w1880262756

It is possible that OpenAlex fetches the DOI from ACM and mixes the metadata from different sources, and in this case the DOI is not valid. The question is why ACM uses this test DOI?

Gabor Schubert
Stockholm University

Poppy Nicolette

Feb 26, 2025, 8:51:59 AM
to OpenAlex Community
Hi all on this thread,
I just wanted to provide an update from the support folks at Crossref.

10.13003 is used for Crossref’s own publications and includes slide decks, presentations, etc. So, DOIs with this prefix are legitimate works.

10.18810 is the prefix for the Linked Clinical Trials registry, and the DOIs under it are legitimate identifiers, similar to the Funder Registry identifiers on 10.13039. But these are also not publications.

I ran a quick query to see how many are in Crossref under these prefixes, if it's helpful for anyone:

10.18810 = 29
10.5555 = 317771
10.88888 = 24
10.30444 = 0
10.30446-9 = 0
10.50505 = 4346
10.13003 = 54
10.30443 = 0

-poppy

Poppy Nicolette

Feb 26, 2025, 9:43:57 AM
to OpenAlex Community
Hi Gabor,
Regarding the question on ACM - as a publisher, they can create and publish a DOI anywhere, but it's not 'real' until it's been deposited in Crossref (or some other registration agency). Chances are they tried to deposit this under 10.5555 and it was rejected by Crossref's submission checks. Given the volume they publish, someone just didn't follow up on that failure.

-poppy

Poppy Nicolette

Feb 26, 2025, 9:52:53 AM
to OpenAlex Community
Hi Euan,
Here's what Shayn at Crossref support had to say on those in 10.5555:
"10.5555 in particular has been used both for testing and as a holding place for DOIs that were created inadvertently and aren’t being actively stewarded. We also overwrite their metadata with placeholder data, but that wasn’t always true in the past, so you might occasionally find real metadata on 10.5555-owned records. 10.88888 and 10.50505 are used strictly for testing."

But, Crossref did confirm that anything on 10.5555 is not a 'real' publication and was used for testing purposes only.

Regarding the African Health Sciences article you linked - they could have created that DOI with the 10.5555 prefix through some automated process, then published it on their website. I'm unfamiliar with how PubMed ingests material, but in this case, that DOI did not come from Crossref. I did check whether it was under another registration agency by adding the agency suffix to the API query: https://api.crossref.org/works/10.5555/afhs.2007.7.1.55/agency - but nothing comes up.

In this case, you could reach out to African Health Sciences and ask them to update it with a real DOI, or reach out to sup...@crossref.org and report the DOI. They will reach out to the publisher to correct the issue.

-poppy

Poppy Nicolette

Feb 26, 2025, 9:56:35 AM
to OpenAlex Community
Hi Jason,

Just wanted to ping back on this so you see the notes from Shayn: 10.13003 is valid for publications and 10.18810 is for clinical trial registries. There is 1 work_id in OpenAlex under the 10.18810 prefix, so it's up to you whether it's excluded: https://openalex.org/works?page=1&filter=doi_starts_with%3A10.18810

Glad this was helpful!
-poppy

Gabor Schubert

Feb 26, 2025, 1:21:42 PM
to OpenAlex Community
Hi Poppy,

I still don't really understand how these 10.5555 DOIs ended up in OpenAlex if they are not registered/deposited in Crossref. I suspect that they are harvested from other sources: PubMed, journal homepages, etc.

I found more than 70k publications in OpenAlex with 10.5555 DOIs: https://openalex.org/works?page=1&filter=doi_starts_with%3A10.5555%2F
Most of them seem to be Computer science related, so it is possible the invalid DOIs are coming from ACM, but I'm not sure how.

I checked Web of Science, Scopus and Dimensions (free version) for 10.5555/* DOIs and I found fewer than 10 in WoS, ca. 200 in Scopus, and ca. 1200 in Dimensions. I checked some of them and none of them resolve; many of them are in OpenAlex as well.

Gabor

Poppy Nicolette

Feb 27, 2025, 10:44:10 AM
to OpenAlex Community
Hi Gabor,
They are in Crossref, but I'd guess they all have URLs that go to defunct DOI or DOI not found pages. That's an interesting question and I'll try to dig into that a bit. They get used for testing purposes and may contain 'real' metadata, but they are not supposed to resolve to functional URLs.
I'll leave it to an OpenAlex person to discuss their source, but it's likely from the data dumps that Crossref provides. I'd say OpenAlex is doing a pretty good job cleaning, as the Crossref API for 10.5555 shows a lot more! https://api.crossref.org/works?filter=prefix%3A10.5555&rows=0

-poppy

Gabor Schubert

Feb 27, 2025, 12:23:33 PM
to OpenAlex Community

Hi Poppy,

I am not sure that the ACM DOIs are in Crossref. If you change the Crossref API link to https://api.crossref.org/works?filter=prefix%3A10.5555&rows=200&select=DOI, it shows 200 DOIs, but only very few actually have the 10.5555 prefix, which is quite interesting in itself. I looked up one of them (10.5555/cggxqwjdpx) with the Crossref API and it works fine: https://api.crossref.org/works?filter=doi:10.5555/cggxqwjdpx, and it resolves to a Crossref blog page. But when I look for 10.5555 ACM DOIs with the Crossref API, for example the LDA article https://dl.acm.org/doi/10.5555/944919.944937, it gives 0 results in Crossref: https://api.crossref.org/works?filter=doi:10.5555/944919.944937. It is possible that the Crossref data dump still includes this, but it is weird that it is not shown in the API.
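For anyone who wants to repeat these lookups in bulk, here is a rough sketch that builds the same Crossref API URLs used in this thread (single work, agency check, and per-prefix count). The function names are made up for illustration. `is_in_crossref` assumes that the single-work route answers with HTTP 404 for an unknown DOI, which matches the "resource not found" responses reported in this thread but should be verified before relying on it.

```python
from urllib.error import HTTPError
from urllib.parse import quote, urlencode
from urllib.request import urlopen

CROSSREF_API = "https://api.crossref.org"


def work_url(doi: str) -> str:
    """URL of the single-work route, e.g. .../works/10.5555/944919.944937."""
    return f"{CROSSREF_API}/works/{quote(doi, safe='/')}"


def agency_url(doi: str) -> str:
    """URL of the registration-agency lookup for a DOI."""
    return work_url(doi) + "/agency"


def prefix_count_url(prefix: str) -> str:
    """URL of a works query filtered by prefix, returning only the count."""
    query = urlencode({"filter": f"prefix:{prefix}", "rows": 0})
    return f"{CROSSREF_API}/works?{query}"


def is_in_crossref(doi: str, timeout: float = 10.0) -> bool:
    """Fetch the single-work route; treat HTTP 404 as 'not in Crossref'.

    Assumption: unknown DOIs produce a 404 on this route.
    """
    try:
        with urlopen(work_url(doi), timeout=timeout) as resp:
            return resp.status == 200
    except HTTPError as err:
        if err.code == 404:
            return False
        raise
```

Only `is_in_crossref` touches the network; the URL builders can be checked offline against the links quoted above.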

 Gabor

Gabor Schubert

Feb 27, 2025, 5:56:41 PM
to OpenAlex Community
Hi Poppy,

I extracted all the 10.5555/ DOIs from OpenAlex. I found 76541, which are included in the attached text file. The simple search https://openalex.org/works?page=1&filter=doi_starts_with%3A10.5555 gives somewhat more, because there are some unrelated DOIs that begin with 10.5555*, like 10.55550, 10.55551, etc. As far as I can see, the majority (more than 68000) of these 76541 are in the style of ACM: "10.5555/some_numbers.some_numbers", from 10.5555/100296.100297 to 10.5555/99868.99882. I checked some of them at random, and all of them give a DOI not found error with dx.doi.org and a resource not found error in the Crossref API. But all of them work at the ACM homepage with this syntax: https://dl.acm.org/doi/10.5555/100296.100297 . Many of these are publications in older ACM conference proceedings.
A large chunk of the rest of the 10.5555 DOIs (more than 6000) are in the format 10.5555/uri:pii:1079613498900900, which looks like some kind of Elsevier code. Here are all of them: https://openalex.org/works?page=1&filter=doi_starts_with%3A10.5555%2Furi . They give DOI not found and resource not found errors at Crossref, but the OpenAlex records do lead to some older Elsevier-hosted publications, like this one: https://openalex.org/works?page=1&filter=doi%3A10.5555%2Furi%3Apii%3As1079613498900900
The rest of the DOIs (ca. 2000-3000) are a mix of other things. Some of them actually work, pointing to Crossref test materials and giving hits in the Crossref API, like this one: https://doi.org/10.5555/usavgdkxmz
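The grouping described above can be sketched roughly as follows. This is only an illustration, not the exact procedure used for the counts in this post: the two regular expressions are guesses at the "ACM style" and "uri:pii" shapes based on the examples given, and the category labels are made up.

```python
import re
from collections import Counter

# Guessed patterns, based only on the example DOIs quoted in this post.
ACM_STYLE = re.compile(r"^10\.5555/\d+\.\d+$")   # e.g. 10.5555/100296.100297
PII_STYLE = re.compile(r"^10\.5555/uri:pii:")    # e.g. 10.5555/uri:pii:...


def classify(doi: str) -> str:
    """Put a DOI into one of four rough buckets."""
    doi = doi.strip().lower()
    # Require the slash so that 10.55550, 10.55551, etc. are not counted.
    if not doi.startswith("10.5555/"):
        return "other-prefix"
    if ACM_STYLE.match(doi):
        return "acm-style"
    if PII_STYLE.match(doi):
        return "uri-pii"
    return "other"


def summarize(dois) -> Counter:
    """Count how many DOIs fall into each bucket."""
    return Counter(classify(d) for d in dois)
```

Assuming the attached text file holds one DOI per line, feeding its lines to `summarize` would give the per-bucket totals.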

Gabor

all-5555-dois-from-openalex.txt

Poppy Nicolette

Mar 25, 2025, 10:40:49 AM
to OpenAlex Community
Hi Gabor,
I'm coming back around to this now that the semester is close to wrapping up. This is awesome. Is it ok if I share this with folks at Crossref? Obviously both ACM and Elsevier deposit metadata with them and this may be useful for cleanup if needed. As mentioned previously, these are very possibly just exposed test prefixes where Crossref copied existing records and then replaced the DOI with the 10.5555.

Re: DataCite
I did get a reply back from Mary at DataCite. They do have test prefixes, but they are not exposed.

Warm regards,
Poppy

Gabor Schubert

Mar 25, 2025, 3:38:36 PM
to OpenAlex Community
Hi Poppy,

Sure, you can share this with the Crossref folks. Although I have a feeling that, for instance, the tens of thousands of ACM DOIs are not really connected to Crossref. It is a problem created by ACM alone: they use an internal nomenclature system for their webpages which has a similar syntax to Crossref DOIs, but they are not actual DOIs. And later the algorithms at OpenAlex interpret these webpage names as DOIs and assign them to the publications. I'm not sure that ACM has ever tried to register these at Crossref or any other DOI registration agency. When I checked these 10.5555 ACM pages, I never saw any mention of DOIs; the 10.5555 names only exist in the webpage URLs. It would probably be interesting to reach out to ACM and ask them as well.

Best regards,
Gabor
