
FWIW: My initial guess would be that the reporting to DataCite failed for some months. As you said, for the dataset you cite, the DataCite API shows views and downloads per month, and you can see there are gaps in that list (months where no views/downloads were recorded). I’d suggest looking at the DataCite reports API to see if there are reports for the months that are missing from the dataset-level info (for QDR the query is https://api.test.datacite.org/reports?platform=QDR&created-by=QDR). If there are none, then those missing reports could account for the missing counts. Alternatively, you should be able to get the total download counts for each month for that dataset from the datasetmetrics table and see if all the missing counts are from months not shown in the DataCite API.
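A rough sketch of that month-by-month check, in case it helps. It assumes you have already pulled the monthly download totals for one dataset from the datasetmetrics table and from the DataCite API into two dicts keyed by "YYYY-MM". The function name and the sample numbers are made up for illustration.

```python
def months_missing_from_datacite(dataverse_monthly, datacite_monthly):
    """Return (missing_months, missing_count, remaining_gap).

    Both arguments map "YYYY-MM" strings to download counts. They are
    hypothetical inputs, e.g. built from the datasetmetrics table and
    from the per-month breakdown in the DataCite API.
    """
    # Months with counts in Dataverse but no entry at all in DataCite.
    missing_months = sorted(
        month
        for month, count in dataverse_monthly.items()
        if count > 0 and month not in datacite_monthly
    )
    missing_count = sum(dataverse_monthly[m] for m in missing_months)
    total_gap = sum(dataverse_monthly.values()) - sum(datacite_monthly.values())
    # Part of the gap that the missing months do not explain.
    remaining_gap = total_gap - missing_count
    return missing_months, missing_count, remaining_gap


# Made-up numbers, for illustration only.
dataverse = {"2025-01": 10, "2025-02": 4, "2025-03": 12, "2025-04": 9}
datacite = {"2025-01": 10, "2025-03": 12, "2025-04": 9}
months, count, rest = months_missing_from_datacite(dataverse, datacite)
print(months, count, rest)
```

If `remaining_gap` comes out at zero, the whole difference sits in months that DataCite never received; anything left over needs another explanation.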
-- Jim
From: dataverse...@googlegroups.com <dataverse...@googlegroups.com>
On Behalf Of Julian Gautier
Sent: Tuesday, January 13, 2026 12:01 PM
To: Dataverse Users Community <dataverse...@googlegroups.com>
Subject: [Dataverse-Users] Why are the MDC download counts from Dataverse API/database different than MDC download counts from DataCite API/Event Data?
Hi everyone,
Why are the MDC download counts from Dataverse API/database different than MDC download counts from DataCite API/Event Data?
Here's more context and examples:
As I've been learning what users think about the Make Data Count usage metrics, I've been collecting these MDC download counts from datasets and collections in Dataverse repositories, mostly from Harvard Dataverse but also from other Dataverse repositories that are collecting and reporting MDC counts.
Until yesterday, I've been doing this by only using the DataCite API to retrieve information about each DOI. For example, for the dataset at https://doi.org/10.7910/DVN/WS9OUR, the API call https://api.datacite.org/dois/10.7910/DVN/WS9OUR returns 31 as the total MDC downloadCount (as of today) and then breaks that count down by month and year:

DataCite's docs for this API say that those counts are "pulled from Event Data", which I assume is the same as what we've been calling Make Data Count counts. These are also the same counts shown in DataCite commons, such as https://commons.datacite.org/doi.org/10.7910/DVN/WS9OUR.
When I instead use the Dataverse API or query the Dataverse database where Dataverse records those MDC counts, I get different counts.
For example, for the same example dataset at https://doi.org/10.7910/DVN/WS9OUR, the Dataverse API call https://dataverse.harvard.edu/api/datasets/:persistentId/makeDataCount/downloadsTotal?persistentId=doi:10.7910/DVN/WS9OUR returns 35 (as of today).
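For anyone who wants to repeat this comparison, here is a minimal sketch of the two calls above. The response layouts are assumptions: that DataCite puts the counts under data.attributes as downloadCount and downloadsOverTime (a list of yearMonth/total entries), and that Dataverse wraps downloadsTotal in its usual status/data envelope. The function names are made up.

```python
import json
import urllib.request

DATACITE_API = "https://api.datacite.org/dois/"
DATAVERSE_API = (
    "https://dataverse.harvard.edu/api/datasets/:persistentId/"
    "makeDataCount/downloadsTotal?persistentId=doi:"
)


def fetch_json(url):
    """GET a URL and decode the JSON body (no retries or error handling)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


def datacite_downloads(payload):
    """Return (total, per-month dict) from a DataCite DOI response.

    Assumed layout: data.attributes.downloadCount and
    data.attributes.downloadsOverTime = [{"yearMonth": ..., "total": ...}].
    """
    attrs = payload["data"]["attributes"]
    monthly = {
        entry["yearMonth"]: entry["total"]
        for entry in attrs.get("downloadsOverTime", [])
    }
    return attrs.get("downloadCount", 0), monthly


def dataverse_downloads_total(payload):
    """Return the total from a Dataverse downloadsTotal response.

    Assumed layout: {"status": "OK", "data": {"downloadsTotal": N}}.
    """
    return payload["data"]["downloadsTotal"]


def compare(doi):
    """Fetch both totals for one DOI; return (dataverse, datacite, gap)."""
    datacite_total, _monthly = datacite_downloads(fetch_json(DATACITE_API + doi))
    dataverse_total = dataverse_downloads_total(fetch_json(DATAVERSE_API + doi))
    return dataverse_total, datacite_total, dataverse_total - datacite_total
```

Calling `compare("10.7910/DVN/WS9OUR")` would return both totals and their difference for the example dataset.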
From the Dataverse database's datasetmetrics table, I can also see these counts broken down by month so that I'm able to see for which months the counts are different between the two sources.
The differences in the examples I've given are relatively small - 31 versus 35. I suppose they could be explained by caching or other timing issues, especially when the counts are from more recent months?
But for other datasets, the differences get bigger, like:
From the datasets I've checked so far, the counts from the Dataverse API have been greater than the counts from the DataCite API.
Lastly, I haven't compared the view counts, but I or others can if folks think that might help with troubleshooting.
Thanks in advance for any insights you can provide :)
Julian Gautier (he/him)
Product Research Specialist, IQSS
Interested in helping test Dataverse? Sign up for user experience research
--
You received this message because you are subscribed to the Google Groups "Dataverse Users Community" group.
To unsubscribe from this group and stop receiving emails from it, send an email to
dataverse-commu...@googlegroups.com.
To view this discussion visit
https://groups.google.com/d/msgid/dataverse-community/099b8865-d76e-44b0-acf7-e51dfb5114ecn%40googlegroups.com.
FWIW: One part of the difference appears to be that DataCite is reporting unique views whereas we’re reporting total views. Both are derived quantities that sum lower-level entries (as named in our database): viewsuniqueregular and viewsuniquemachine for unique views, or viewstotalregular and viewstotalmachine for total views, across all country codes (and then summed over all months). (I think both types of counting already remove ‘double-clicks’, i.e. requests for the same URL within 30 seconds.)
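As a sketch of how those derived quantities are put together: the four view columns are the ones named above, while the monthyear and countrycode keys, the function name, and the sample rows are assumptions for illustration.

```python
def summarize_views(rows):
    """Sum per-month, per-country rows into unique and total view counts."""
    unique = sum(r["viewsuniqueregular"] + r["viewsuniquemachine"] for r in rows)
    total = sum(r["viewstotalregular"] + r["viewstotalmachine"] for r in rows)
    return {"viewsunique": unique, "viewstotal": total}


# Made-up rows for one dataset: one row per month and country code.
rows = [
    {"monthyear": "2025-01", "countrycode": "us",
     "viewsuniqueregular": 3, "viewsuniquemachine": 1,
     "viewstotalregular": 7, "viewstotalmachine": 4},
    {"monthyear": "2025-01", "countrycode": "de",
     "viewsuniqueregular": 2, "viewsuniquemachine": 0,
     "viewstotalregular": 2, "viewstotalmachine": 0},
    {"monthyear": "2025-02", "countrycode": "us",
     "viewsuniqueregular": 1, "viewsuniquemachine": 2,
     "viewstotalregular": 5, "viewstotalmachine": 9},
]
summary = summarize_views(rows)
print(summary)
```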
Since unique means “Multiple activities qualifying for the metric type in question representing the same dataset and occurring in the same user-sessions MUST be counted as only one “unique” activity for that dataset.”, I think that would mean one user downloading 1000 datafiles in a dataset would count as 1 unique download and 1000 total downloads (I haven’t fully verified that), which could account for the big differences on big datasets. It’s been too long for me to recall what was discussed when we chose what to display, but I wouldn’t be surprised if there was concern that unique views would mean big datasets would be underrepresented (e.g. fewer counts than if you simply spread the files across multiple datasets).
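A simplified illustration of that unique-versus-total distinction. It treats a session as an opaque ID and ignores the double-click filter and the rest of the real session rules, so it is a sketch of the idea and not how the counts are actually processed. The function name and event layout are made up.

```python
def count_downloads(events):
    """Count total and unique downloads per dataset.

    events: iterable of (session_id, dataset_id, file_id) tuples.
    Total counts every event; unique counts each session once per dataset.
    """
    totals = {}
    sessions = {}
    for session_id, dataset_id, _file_id in events:
        totals[dataset_id] = totals.get(dataset_id, 0) + 1
        sessions.setdefault(dataset_id, set()).add(session_id)
    return {
        dataset_id: {"total": totals[dataset_id], "unique": len(sessions[dataset_id])}
        for dataset_id in totals
    }


# One user session downloading 1000 files from a single big dataset.
one_big = count_downloads(("s1", "big", f) for f in range(1000))

# The same 1000 files spread across 10 datasets of 100 files each.
spread = count_downloads(("s1", f"part{f // 100}", f) for f in range(1000))
spread_unique = sum(counts["unique"] for counts in spread.values())
print(one_big, spread_unique)
```

In this toy case the single big dataset gets 1 unique download, while the same files spread over 10 datasets get 10 unique downloads in aggregate.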
-- Jim
From: dataverse...@googlegroups.com <dataverse...@googlegroups.com> On Behalf Of Julian Gautier
Sent: Friday, January 16, 2026 4:03 PM
To: Dataverse Users Community <dataverse...@googlegroups.com>
I may have been mistaken that DataCite is using total unique counts. Instead, it may be showing regular unique counts (total = regular + machine):
Looking at the first two datasets listed for Syracuse/QDR, the db has
6107 R + 5006 M = 11113 total
6578 R + 1433 M = 8011 total
DataCite, on the other hand, has 5895 and 6417 respectively for those two. Those numbers a) look close to the regular counts (slightly below; I haven’t checked whether we had reporting problems in some months since 2019-10-01, when we started MDC, but I wouldn’t be too surprised), and b) leave differences from our totals whose sizes mirror the machine counts (i.e. while the regular counts differ by <10%, the machine counts are ~3x different for these two datasets).
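The arithmetic behind that comparison, as a quick check using only the numbers quoted above (the function name is made up):

```python
def compare_counts(regular, machine, datacite):
    """Compare a DataCite count with the regular and total counts in the db."""
    total = regular + machine
    return {
        "total": total,
        # Fraction by which DataCite falls below the regular count.
        "below_regular": (regular - datacite) / regular,
        # Fraction by which DataCite falls below regular + machine.
        "below_total": (total - datacite) / total,
    }


first = compare_counts(regular=6107, machine=5006, datacite=5895)
second = compare_counts(regular=6578, machine=1433, datacite=6417)
for result in (first, second):
    print(
        result["total"],
        f"{result['below_regular']:.1%}",
        f"{result['below_total']:.1%}",
    )
```

DataCite comes out about 3.5% and 2.4% below the regular counts, but about 47% and 20% below the totals, which fits the idea that the machine counts are what is missing.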
FWIW: There has been another discussion about machine counts, with DataCite not picking them up with their new browser script (and possibly not reporting them in the API), and several of us pointing out that we have tools/scripts (PyDataverse, etc.) in use that represent real scientific use but don’t get counted as regular (browser-based) activity.
-- Jim
From: dataverse...@googlegroups.com <dataverse...@googlegroups.com>
On Behalf Of Julian Gautier
Sent: Tuesday, January 20, 2026 10:05 AM
To: Dataverse Users Community <dataverse...@googlegroups.com>
Subject: Re: [Dataverse-Users] Why are the MDC download counts from Dataverse API/database different than MDC download counts from DataCite API/Event Data?
Thanks Jim! I think that's worth a lot! I updated the Google Sheet to show each dataset's unique counts in Dataverse databases, too, and to show the differences between those unique counts and DataCite's counts. The differences are much smaller, which I hope supports the idea that the folks at DataCite decided to report unique counts in their API and on DataCite Commons pages.
