Dataset Versions Not Connected?

Elisabeth Shook

Mar 5, 2026, 5:42:52 PM

to OpenAlex Community

Hello everyone,

Long story short, I am interested in the citations associated with ICPSR datasets, but I have discovered that the versions of a dataset do not appear to be connected to each other. For example, 10.3886/icpsr01335 (W4230215047) and 10.3886/icpsr01335.v1 (W4239955004) have nothing in their metadata indicating a relationship. I have checked a handful of other examples and found the same issue. What's more, each work has a different number of "cited by" works, even though they share the same referenced works.

Is there really nothing in the metadata, apart from sharing the same canonical DOI, that connects dataset versions? I can work around this, but doesn't it artificially inflate the number of datasets?
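One possible workaround is to group records by DOI after stripping a version suffix. The sketch below assumes, based only on the two ICPSR examples above, that a versioned ICPSR DOI is the base DOI plus `.v` and a number; the helper names `base_doi` and `group_versions` are hypothetical, and other repositories use different patterns.

```python
import re
from collections import defaultdict

# Assumption: a version suffix looks like ".v1", ".v2", ... at the very end of the DOI.
_VERSION_SUFFIX = re.compile(r"\.v\d+$")
_DOI_PREFIXES = ("https://doi.org/", "http://doi.org/", "doi:")


def base_doi(doi: str) -> str:
    """Lower-case a DOI, drop a resolver prefix, and strip a trailing .vN suffix."""
    doi = doi.strip().lower()
    for prefix in _DOI_PREFIXES:
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
            break
    return _VERSION_SUFFIX.sub("", doi)


def group_versions(dois):
    """Map each base DOI to the list of original DOIs that share it."""
    groups = defaultdict(list)
    for doi in dois:
        groups[base_doi(doi)].append(doi)
    return dict(groups)


groups = group_versions([
    "10.3886/icpsr01335",
    "https://doi.org/10.3886/ICPSR01335.v1",
])
print(groups)  # one group, keyed by '10.3886/icpsr01335', holding both DOIs
```

Summing "cited by" counts per group is then straightforward, although a citing work that cites more than one version would be counted more than once.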

Thanks so much for any help!

Elisabeth Shook

Rainer M Krug

Mar 6, 2026, 3:03:22 AM

to Elisabeth Shook, OpenAlex Community

You are raising an important point, which also applies to Zenodo deposits in general. Deposits with a version history have two kinds of DOI: a concept DOI, which always resolves to the newest version, and an individual (versioned) DOI for each version. This makes it difficult to tell which records are versions of the same deposit.

As an example:

https://openalex.org/works?search.title_and_abstract=openalexPro&page=1&sort=relevance_score:desc

https://doi.org/10.5281/zenodo.17453180 is the concept DOI, which appears nowhere in the result of the API call (e.g. https://api.openalex.org/w7125710957).

It would be ideal if OpenAlex also provided the concept ID, although I do not know how widely other repositories use it.
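To check whether a particular DOI (such as a concept DOI) appears anywhere in an OpenAlex work record, one can search every string in the parsed JSON instead of guessing which field might hold it. This is a minimal sketch assuming the record has already been fetched and parsed into a dict; the sample record below is a made-up fragment loosely shaped like a work record (`doi`, `ids`, `locations`), not a real API response, and the function names are hypothetical.

```python
from typing import Any

DOI_PREFIXES = ("https://doi.org/", "http://doi.org/", "doi:")


def normalize_doi(value: str) -> str:
    """Lower-case a string and drop a leading DOI resolver prefix, if any."""
    v = value.strip().lower()
    for prefix in DOI_PREFIXES:
        if v.startswith(prefix):
            return v[len(prefix):]
    return v


def contains_doi(record: Any, doi: str) -> bool:
    """Recursively check whether any string value in a parsed JSON record equals the DOI.

    Exact (normalized) matching is used rather than substring matching, so that
    e.g. "10.1234/example" does not match "10.1234/example.v2".
    """
    target = normalize_doi(doi)
    if isinstance(record, str):
        return normalize_doi(record) == target
    if isinstance(record, dict):
        return any(contains_doi(v, doi) for v in record.values())
    if isinstance(record, list):
        return any(contains_doi(v, doi) for v in record)
    return False  # numbers, booleans, None


# Made-up illustrative record; not real OpenAlex data.
sample_work = {
    "id": "https://openalex.org/W0000000000",
    "doi": "https://doi.org/10.1234/example.v2",
    "ids": {"doi": "https://doi.org/10.1234/example.v2"},
    "locations": [{"landing_page_url": "https://doi.org/10.1234/example.v2"}],
}

print(contains_doi(sample_work, "10.1234/example.v2"))  # True
print(contains_doi(sample_work, "10.1234/example"))     # False
```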

Cheers

Rainer

Kevin Leonard

Mar 6, 2026, 4:17:53 AM

to OpenAlex Community

Hi All,

Indeed, the dataset coverage in OpenAlex is still at times a bit muddy, but as Rainer mentions, I think this is often not the fault of OpenAlex but of upstream sources.

For example, searching for datasets from authors affiliated with my institution, I found this dataset: 10.48527/kiper0 (OpenAlex, repository page), which consists of 6 data files and 1 README file. Because of how this particular Dataverse is configured, each file gets its own unique PID (created by appending a few characters to the end of the dataset-level PID). As a result, these file-level PIDs are also ingested into OpenAlex and treated as fully fledged datasets (for example: 10.48527/kiper0/vsmrgo [OpenAlex, repository page]). What should be a single dataset entry in DataCite (and in OpenAlex and other downstream targets) ends up appearing as 8 separate entries. So your dataset counts may be artificially inflated by more than just versioning.
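As a rough client-side heuristic, file-level PIDs of this kind could be folded back into their dataset-level PID when the parent also appears in the result set. This sketch assumes, as in the example above, that a file PID is the dataset PID plus one extra `/`-separated segment; the function name is hypothetical, it handles only one level of nesting, and it misses children whose parent is absent from the list, so the DataCite relationships described below are the more reliable signal.

```python
def collapse_file_pids(dois):
    """Fold DOIs of the form <parent>/<suffix> into <parent> when the parent is also present.

    Assumption: file-level PIDs are formed by appending one "/suffix" segment
    to the dataset-level PID (as in the Dataverse example above).
    """
    normalized = {d.strip().lower() for d in dois}
    datasets = set()
    for doi in normalized:
        head, _, _ = doi.rpartition("/")
        # Require a "/" in head so a bare prefix like "10.48527" is never treated as a parent.
        if "/" in head and head in normalized:
            datasets.add(head)
        else:
            datasets.add(doi)
    return datasets


print(sorted(collapse_file_pids([
    "10.48527/KIPER0",
    "10.48527/kiper0/vsmrgo",
    "10.1234/example",  # unrelated placeholder DOI
])))  # ['10.1234/example', '10.48527/kiper0']
```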

I will say that the DataCite entry for the above dataset-level PID contains HasPart relatedIdentifiers for its 7 child datafile records (and, for completeness, the children have IsPartOf relatedIdentifiers pointing to their parent [example]). Similarly, for the record that Rainer linked, the DataCite JSON uses the IsVersionOf relatedIdentifier to link the versions of the dataset to one another. So a sensible first step would seem to be for OpenAlex to preserve some of these relatedIdentifier fields, to make it easier to identify versions and parent-child relationships downstream.
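Until such fields are available in OpenAlex, the relationships can be read directly from DataCite. The sketch below assumes the DataCite REST API response shape, where `data.attributes.relatedIdentifiers` is a list of objects with `relatedIdentifier`, `relatedIdentifierType`, and `relationType`; the function names are hypothetical, and the sample record is made up for illustration rather than taken from a real DOI.

```python
import json
import urllib.request
from collections import defaultdict

# Relation types relevant to versions and parent/child (dataset/file) structure.
WANTED = frozenset({"HasPart", "IsPartOf", "HasVersion", "IsVersionOf"})


def relations_by_type(datacite_json: dict) -> dict:
    """Group relatedIdentifiers in a DataCite REST API response by relationType."""
    attrs = datacite_json.get("data", {}).get("attributes", {})
    grouped = defaultdict(list)
    for rel in attrs.get("relatedIdentifiers") or []:
        rel_type = rel.get("relationType")
        if rel_type in WANTED:
            grouped[rel_type].append(rel.get("relatedIdentifier"))
    return dict(grouped)


def fetch_datacite(doi: str) -> dict:
    """Fetch the DataCite record for a DOI (requires network access)."""
    url = f"https://api.datacite.org/dois/{doi}"
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


# Made-up record for illustration only.
sample_record = {"data": {"attributes": {"relatedIdentifiers": [
    {"relatedIdentifier": "10.1234/parent", "relatedIdentifierType": "DOI",
     "relationType": "IsPartOf"},
    {"relatedIdentifier": "10.1234/other", "relatedIdentifierType": "DOI",
     "relationType": "References"},
]}}}

print(relations_by_type(sample_record))  # {'IsPartOf': ['10.1234/parent']}
```

With network access, `relations_by_type(fetch_datacite("10.48527/kiper0")).get("HasPart", [])` should, if the record is as described above, list the dataset's file-level PIDs.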