Metrics API files/monthly vs. number of files in UI


Laura Huisintveld

Jan 23, 2025, 4:52:20 AM
to Dataverse Users Community
Hello all,

I have a question about this API endpoint: /api/info/metrics/files/monthly
The Guide says of this API: 'monthly cumulative timeseries from first date of first entry to now' and 'released means only currently released dataset versions (not unpublished or DEACCESSIONED versions)'.

If I retrieve metrics from this endpoint for DataverseNL, I get a CSV as the result, with this last entry: 2025-01: 545,035

However, if I look up the total number of files in the UI (by going to the root and using the facets to filter on published files only), I get a result of 220,553.

I know now that the download statistics shown in the UI are an approximation, but the number of search results is probably not estimated?

How can this difference be explained?

Best wishes, Laura
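
For anyone who wants to reproduce the comparison above, here is a minimal sketch of reading the last entry from the monthly CSV. It assumes the response has a header row `date,count`; that header, the helper name `last_monthly_count`, and the sample rows are assumptions for illustration, not taken from this thread.

```python
import csv
import io


def last_monthly_count(csv_text):
    """Return (month, count) from the last row of a monthly metrics CSV.

    Assumes a header row with the columns 'date' and 'count'.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    last = rows[-1]
    return last["date"], int(last["count"])


# Made-up sample standing in for the body returned by
# /api/info/metrics/files/monthly; only the final value matches the post.
sample = "date,count\n2024-12,540000\n2025-01,545035\n"

month, count = last_monthly_count(sample)
print(month, count)
```

The number printed here can then be set against the count shown by the published-files facet in the UI.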

James Myers

Jan 23, 2025, 8:33:32 AM
to dataverse...@googlegroups.com

Laura,

 

Great catch! I’m not completely sure yet, but from looking at the code this morning, I think the underlying query gets the number of files in the dataset versions published in each month. It is then incorrect to simply add these up to get the total over time, because when a new version of a dataset is published in a later month, the files it shares with the earlier version are counted twice. Please submit an issue. FWIW, the /api/info/metrics/files endpoint looks like it does give the correct all-time total (or at least it does not have this bug).

 

-- Jim
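
The double counting described above can be illustrated with a toy sketch. This is not the actual Dataverse query or its fix; the data, the tuple layout, and both function names are hypothetical and only show why summing per-version file counts overshoots.

```python
from collections import defaultdict

# Hypothetical toy data: (month published, dataset, version number, files).
published_versions = [
    ("2024-11", "dataset-A", 1, {"a.csv", "b.csv"}),
    ("2024-12", "dataset-A", 2, {"a.csv", "b.csv", "c.csv"}),
    ("2025-01", "dataset-B", 1, {"x.tab"}),
]


def naive_cumulative(versions):
    """Running sum of the file counts of every version published per month."""
    per_month = defaultdict(int)
    for month, _dataset, _version, files in versions:
        per_month[month] += len(files)
    total, series = 0, {}
    for month in sorted(per_month):
        total += per_month[month]
        series[month] = total
    return series


def latest_version_cumulative(versions):
    """Per month, count files only in each dataset's latest version so far."""
    series = {}
    for month in sorted({m for m, *_ in versions}):
        latest = {}
        for m, dataset, version, files in versions:
            # "YYYY-MM" strings compare correctly in chronological order.
            if m <= month and (dataset not in latest or version > latest[dataset][0]):
                latest[dataset] = (version, files)
        series[month] = sum(len(files) for _version, files in latest.values())
    return series


print(naive_cumulative(published_versions))
print(latest_version_cumulative(published_versions))
```

In this toy data, a.csv and b.csv appear in both versions of dataset-A, so the naive running sum reaches 6 by 2025-01 while only 4 files exist in the latest versions.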


qqm...@hotmail.com

Jan 24, 2025, 4:39:07 PM
to Dataverse Users Community
I believe #11189 should be a fix for this issue. Note that I think the /files endpoint had a bug that resulted in an under-count as well. That is also hopefully fixed in the PR. The queries for this are complex - help from anyone in reviewing them or testing the fixes would be greatly appreciated.

Laura Huisintveld

Jan 27, 2025, 9:43:49 AM
to Dataverse Users Community
Thank you, Jim, for creating the PR!
