Why are the MDC download counts from Dataverse API/database different than MDC download counts from DataCite API/Event Data?


Julian Gautier

12:01 PM
to Dataverse Users Community
Hi everyone,

Why are the MDC download counts from Dataverse API/database different than MDC download counts from DataCite API/Event Data?

Here's more context and examples:

As I've been learning what users think about the Make Data Count usage metrics, I've been collecting MDC download counts for datasets and collections in Dataverse repositories, mostly from Harvard Dataverse but also from other Dataverse repositories that collect and report MDC counts.

Until yesterday, I've been doing this by only using the DataCite API to retrieve information about each DOI. For example, for the dataset at https://doi.org/10.7910/DVN/WS9OUR, the API call https://api.datacite.org/dois/10.7910/DVN/WS9OUR returns 31 as the total MDC downloadCount (as of today) and then breaks that count down by month and year:

[Image: Screenshot 2026-01-13 at 10.50.37 AM.png]
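For anyone who wants to repeat this check without reading the JSON by eye, here is a minimal Python sketch. It assumes the DOI record exposes the monthly breakdown under `data.attributes.downloadsOverTime` as entries with `yearMonth` and `total` keys; those field names are my understanding of the response and should be verified against a real call. The helper names and the sample payload are made up for illustration.

```python
import json
from urllib.request import urlopen


def monthly_downloads(doi_record: dict) -> dict:
    """Map 'YYYY-MM' to the download total found in a DataCite DOI record."""
    attributes = doi_record.get("data", {}).get("attributes", {})
    # Assumed field names: downloadsOverTime, yearMonth, total.
    return {
        entry["yearMonth"]: int(entry["total"])
        for entry in attributes.get("downloadsOverTime", [])
    }


def fetch_doi_record(doi: str) -> dict:
    """Fetch the JSON record for one DOI (this performs a network request)."""
    with urlopen(f"https://api.datacite.org/dois/{doi}") as response:
        return json.load(response)


# Made-up sample shaped like the assumed response; these are not real counts.
sample = {
    "data": {
        "attributes": {
            "downloadCount": 5,
            "downloadsOverTime": [
                {"yearMonth": "2025-01", "total": 2},
                {"yearMonth": "2025-03", "total": 3},
            ],
        }
    }
}
by_month = monthly_downloads(sample)
```

With network access, `monthly_downloads(fetch_doi_record("10.7910/DVN/WS9OUR"))` should give the same breakdown as the screenshot, if the assumed field names hold.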

DataCite's docs for this API say that those counts are "pulled from Event Data", which I assume is the same as what we've been calling Make Data Count counts. These are also the same counts shown in DataCite commons, such as https://commons.datacite.org/doi.org/10.7910/DVN/WS9OUR.

When I instead use the Dataverse API or query the Dataverse database where Dataverse records those MDC counts, I get different counts.


From the Dataverse database's datasetmetrics table, I can also see these counts broken down by month so that I'm able to see for which months the counts are different between the two sources.
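A rough sketch of that monthly breakdown, run against an in-memory SQLite stand-in rather than a real Dataverse database. The column names (`dataset_id`, `monthyear`, `countrycode`, `downloadstotalregular`, `downloadstotalmachine`) are assumptions about the datasetmetrics table and should be checked against your installation's schema. The rows are invented. Whether machine downloads belong in the sum depends on which figure you are comparing against, so treat that choice as illustrative too.

```python
import sqlite3

# In-memory stand-in for the datasetmetrics table (assumed, simplified columns).
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE datasetmetrics (
           dataset_id INTEGER,
           monthyear TEXT,
           countrycode TEXT,
           downloadstotalregular INTEGER,
           downloadstotalmachine INTEGER
       )"""
)
# Invented rows: one row per dataset, month, and country.
conn.executemany(
    "INSERT INTO datasetmetrics VALUES (?, ?, ?, ?, ?)",
    [
        (1, "2025-01", "us", 2, 0),
        (1, "2025-01", "de", 1, 0),
        (1, "2025-02", "us", 4, 1),
    ],
)

# Sum downloads across countries for each month of one dataset.
MONTHLY_DOWNLOADS_SQL = """
    SELECT monthyear,
           SUM(COALESCE(downloadstotalregular, 0)
               + COALESCE(downloadstotalmachine, 0)) AS downloads
    FROM datasetmetrics
    WHERE dataset_id = ?
    GROUP BY monthyear
    ORDER BY monthyear
"""
dataverse_by_month = dict(conn.execute(MONTHLY_DOWNLOADS_SQL, (1,)).fetchall())
```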

The differences in the examples I've given are relatively small - 31 versus 35. I suppose they could be explained by caching or other timing issues, especially when the counts are from more recent months?

But for other datasets, the differences get bigger, like:

From the datasets I've checked so far, the counts from the Dataverse API have been greater than the counts from the DataCite API.

Lastly, I haven't compared the view counts, but I or others can if folks think that might help with troubleshooting.

Thanks in advance for any insights you can provide :)

Julian Gautier (he/him)
Product Research Specialist, IQSS
Interested in helping test Dataverse? Sign up for user experience research

James Myers

1:03 PM
to dataverse...@googlegroups.com

FWIW: My initial guess would be that the reporting to DataCite failed for some months. As you said, for the dataset you cite, the DataCite API shows views and downloads per month, and you can see there are gaps in that list (months where no views/downloads occurred). I'd suggest looking at the DataCite reports API to see if there are reports for the months that are missing from the dataset-level info (for QDR, the query is https://api.test.datacite.org/reports?platform=QDR&created-by=QDR). If not, then those missing reports could contain the missing counts. Alternatively, you should be able to get the total download counts for each month for that dataset from the datasetmetrics table and see if all the missing counts are from months not shown in the DataCite API.
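The month-by-month check described above could be sketched like this in Python. It takes two dictionaries of monthly download totals, one per source, and reports which months are absent from DataCite and whether those months alone account for the gap. The function name and the monthly figures are hypothetical; only the 35 and 31 totals mirror the example in this thread.

```python
def compare_monthly(dataverse: dict, datacite: dict) -> dict:
    """Compare monthly download totals from the two sources."""
    # Months Dataverse recorded that DataCite does not list at all.
    missing = {m: c for m, c in dataverse.items() if m not in datacite}
    # Months present in both sources but with different totals.
    mismatched = {
        m: (dataverse[m], datacite[m])
        for m in dataverse
        if m in datacite and dataverse[m] != datacite[m]
    }
    gap = sum(dataverse.values()) - sum(datacite.values())
    return {
        "total_gap": gap,
        "missing_months": missing,
        "mismatched_months": mismatched,
        # True when the missing months alone add up to the gap.
        "gap_explained_by_missing": sum(missing.values()) == gap,
    }


# Invented breakdown whose totals are 35 (Dataverse) and 31 (DataCite).
dataverse = {"2025-01": 10, "2025-02": 4, "2025-03": 21}
datacite = {"2025-01": 10, "2025-03": 21}
result = compare_monthly(dataverse, datacite)
```

If `gap_explained_by_missing` comes back false, or `mismatched_months` is non-empty, missing monthly reports would not be the whole story for that dataset.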

 

-- Jim

 

From: dataverse...@googlegroups.com <dataverse...@googlegroups.com> On Behalf Of Julian Gautier
Sent: Tuesday, January 13, 2026 12:01 PM
To: Dataverse Users Community <dataverse...@googlegroups.com>
Subject: [Dataverse-Users] Why are the MDC download counts from Dataverse API/database different than MDC download counts from DataCite API/Event Data?

 


Julian Gautier

3:24 PM
to Dataverse Users Community
Thanks, Jim. I think your reply answered a few questions I could've made more explicit earlier:
  • It's fair to think that ideally the MDC download counts from Dataverse (API and database) and the download counts from DataCite (API and Event Data) should be the same. And it's fair to think that the downloadCounts from DataCite's API, which are "pulled from Event Data", are the same as what we've been calling Make Data Count counts. I'd been wondering if I've misunderstood the intention and the different terms being used.
  • The differences between the counts from both sources are great enough to suspect that it's not only a matter of something expected and temporary, like caching, that should correct itself in a reasonable amount of time.
I'll look more closely at some datasets' monthly counts from each source and check out those reports you mentioned. I see the ones for Harvard Dataverse at https://api.datacite.org/reports?platform=Harvard+Dataverse&created-by=Harvard+Dataverse&page[size]=1000.
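A small Python sketch for working with that reports query. The first helper rebuilds the URL above for any platform name. The second lists the months in a range that have no report. Both helper names are hypothetical, and pulling each report's month out of the API response is left out on purpose, because the response layout hasn't been verified here. The sample months are invented.

```python
from urllib.parse import urlencode


def reports_url(platform: str, page_size: int = 1000) -> str:
    """Build the reports query for one platform, as in the URL above."""
    query = urlencode(
        {"platform": platform, "created-by": platform, "page[size]": page_size},
        safe="[]",  # keep the square brackets in page[size] unescaped
    )
    return f"https://api.datacite.org/reports?{query}"


def months_without_reports(report_months: set, first: str, last: str) -> list:
    """List 'YYYY-MM' months from first to last (inclusive) with no report."""
    year, month = map(int, first.split("-"))
    last_year, last_month = map(int, last.split("-"))
    missing = []
    while (year, month) <= (last_year, last_month):
        key = f"{year:04d}-{month:02d}"
        if key not in report_months:
            missing.append(key)
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return missing


url = reports_url("Harvard Dataverse")
# Invented set of months that have reports, for illustration only.
gaps = months_without_reports({"2024-11", "2025-01"}, "2024-11", "2025-02")
```

Months returned by `months_without_reports` are the ones to compare against the months where the dataset-level counts from the two sources disagree.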

And I'll ping Steve Winship. Ceilyn reminded me that he also worked on how Dataverse sends this info to DataCite.