Retrieving file download metrics at dataset level

Philipp at UiT

May 31, 2022, 7:01:30 AM
to Dataverse Users Community
One of our data depositors is asking whether there is a way to retrieve file download metrics at file level. The idea is to display the metrics on a GitHub page related to the dataset, preferably in the form of a badge/shield; cf. https://github.com/badges/shields.

For internal purposes, I usually retrieve such metrics by querying the database, but the depositor is looking for an easier way, possibly through the API. I see that the Metrics API allows retrieving file downloads at collection level, but I couldn't find examples at dataset level.

Thanks for any advice!
Best, Philipp

Philip Durbin

Jun 2, 2022, 9:41:08 AM
to dataverse...@googlegroups.com
Hi Philipp,

I can't seem to find exactly what you want (passing in a dataset DOI and getting a download count per file) but there is a new-ish API endpoint to get counts of all file downloads like this:

$ curl -s https://data.qdr.syr.edu/api/info/metrics/filedownloads | head
id,pid,count
2891,"doi:10.5064/F6CE5MRF/3VVO7G",759
2880,"doi:10.5064/F6CE5MRF/AYIZ06",730
2881,"doi:10.5064/F6CE5MRF/YQMBNX",685
2893,"doi:10.5064/F6CE5MRF/LM8RFL",672
2879,"doi:10.5064/F6CE5MRF/M6NUCT",358
2681,"doi:10.5064/F6Z60KZB/0NR0VZ",329
24047,"doi:10.5064/F6LBYMQO/3GX4Y9",235
24118,"doi:10.5064/F6LBYMQO/EMRGLA",204
2885,"doi:10.5064/F6CE5MRF/JMZPKR",197

I'm using QDR's server because this doesn't seem to work on the demo server. I'll also attach the "Downloads per DataFile (top 100)" graph from https://data.qdr.syr.edu/metrics_5ef2ae2be4b/ which visualizes these download counts.

Note that it's also possible to pass in the parent dataverse collection like this:

$ curl -s "https://data.qdr.syr.edu/api/info/metrics/filedownloads?parentAlias=NIRI" | head
id,pid,count
6635,"doi:10.5064/F6LHMHJR/875QHI",55
4382,"doi:10.5064/F6LHMHJR/QH0ND8",37
4380,"doi:10.5064/F6LHMHJR/VGB0CW",36
4383,"doi:10.5064/F6LHMHJR/Q0LKWJ",33
4490,"doi:10.5064/F6LHMHJR/3Z4THJ",29
4494,"doi:10.5064/F6LHMHJR/IQFOII",27
4492,"doi:10.5064/F6LHMHJR/RR8ZVU",26
4495,"doi:10.5064/F6LHMHJR/XEX32O",25
4491,"doi:10.5064/F6LHMHJR/AIIB3C",23

So that might help if the dataset you're interested in happens to be in its own collection.

You're very welcome to open an issue about the feature you want, of course.

Thanks,

Phil



Screen Shot 2022-06-02 at 9.33.48 AM.png

Philipp at UiT

Jun 3, 2022, 1:40:22 AM
to Dataverse Users Community
Thanks, Phil!

I've been able to retrieve the sum of all file download counts for a specific dataset using your curl command combined with some more commands:

curl -s "https://dataverse.no/api/info/metrics/filedownloads?parentAlias=ntnu" | grep -F "10.18710/TLA01U" | awk -F, '{sum+=$3} END {print sum}'
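A variant of the pipeline above that matches only on the PID column and skips the CSV header could look like the sketch below. It assumes file PIDs have the form doi:<dataset DOI>/<file suffix>, as in the sample output earlier in this thread; the function name sum_dataset_downloads is made up for illustration and is not part of Dataverse.

```shell
# Sum the download counts of all files whose PID belongs to one dataset.
# Reads the CSV returned by /api/info/metrics/filedownloads on stdin.
# Usage: sum_dataset_downloads DATASET_DOI   (e.g. 10.18710/TLA01U)
# Assumption: file PIDs look like doi:<dataset DOI>/<file suffix>.
sum_dataset_downloads() {
  awk -F, -v prefix="doi:$1/" '
    NR == 1 { next }                    # skip the id,pid,count header
    {
      pid = $2
      gsub(/"/, "", pid)                # drop the quotes around the PID
      if (index(pid, prefix) == 1)      # fixed-string prefix match, no regex
        sum += $3
    }
    END { print sum + 0 }               # prints 0 when nothing matched
  '
}
```

It would be used as: curl -s "https://dataverse.no/api/info/metrics/filedownloads?parentAlias=ntnu" | sum_dataset_downloads 10.18710/TLA01U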

What I don't know, though, is how to display this on the GitHub page related to the dataset, preferably in the form of a badge/shield; cf. https://github.com/badges/shields. Any ideas?

Best, Philipp

P.S.: The strange thing is that the curl command only works if I specify the sub-collection (?parentAlias=ntnu). Maybe that was the issue on the demo server as well? Why does it work on QDR without specifying the sub-collection?

Philip Durbin

Jun 8, 2022, 5:10:08 PM
to dataverse...@googlegroups.com
The attached badge.html works on my machine. The DOI is hard-coded, as is your server. Please give it a shot.

badge.html

Philipp at UiT

Jun 10, 2022, 1:24:32 AM
to Dataverse Users Community
Thanks a lot, Phil! This works like a dream!

One thing I realized, though, is that there seems to be a delay before download numbers are updated when retrieved through the API. I downloaded a file from the dataset yesterday, so the number of downloads increased from 289 to 290, and 290 was immediately displayed on the dataset landing page. But when I retrieved the number through the API, it was still 289. Only today did the updated number show up via the API.

Philip Durbin

Jun 13, 2022, 2:03:21 PM
to dataverse...@googlegroups.com
Hi Philipp,

I'm glad it worked for you. :)

The reason you aren't seeing the latest download count is that the Metrics API caches results for a time. You can lower this setting using :MetricsCacheTimeoutMinutes if you like: https://guides.dataverse.org/en/5.10.1/installation/config.html#metricscachetimeoutminutes
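For reference, database settings like this one are normally changed through the admin settings API on the server itself. The sketch below follows the pattern shown in the installation guide linked above; the base URL http://localhost:8080 and the helper name set_metrics_cache_timeout are assumptions for illustration, and the admin API is usually reachable only from the server's own host.

```shell
# Set :MetricsCacheTimeoutMinutes through the Dataverse admin settings API.
# Usage: set_metrics_cache_timeout MINUTES [BASE_URL]
# Assumption: BASE_URL defaults to http://localhost:8080, where the admin
# API is typically exposed; adjust for your installation.
set_metrics_cache_timeout() {
  curl -X PUT -d "$1" \
    "${2:-http://localhost:8080}/api/admin/settings/:MetricsCacheTimeoutMinutes"
}
```

For example, set_metrics_cache_timeout 60 would ask the server to keep cached metrics for an hour; the value 60 is only an example, not a recommendation.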

While I'm glad that bit of JavaScript worked, another way I've seen this implemented is for the image/badge/shield to be generated server-side. For example, for Stack Overflow, I can get a badge of my score/karma/points/whatever by putting my id in a URL like this: https://stackexchange.com/users/flair/10330.png

Thanks,

Phil

Philipp at UiT

Jun 18, 2022, 7:02:05 AM
to Dataverse Users Community
Thanks, Phil! This is also what we heard back from the depositor who asked for this feature: it would be nice if users could simply have an image/badge/shield generated server-side, as you outlined. Is this something that would need to be provided by the Dataverse software or by another service, e.g., https://github.com/badges/shields/tree/master/services?

Best, Philipp

Philip Durbin

Jun 23, 2022, 12:15:10 PM
to dataverse...@googlegroups.com
The badges could be created and hosted by a different, non-Dataverse server. However, practically speaking, it would probably be nicer to have this built into Dataverse. Please feel free to create an issue about this. In the meantime, someone could probably build a little service on the side without much trouble.
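One way such a side service might work, as a rough sketch only: Shields.io offers an "endpoint badge" that renders a badge from a small JSON document served at a URL you control (https://img.shields.io/endpoint?url=<URL of the JSON>). A scheduled job could fetch the download count and publish that JSON. The helper name badge_json, the label "downloads", and the color are illustrative choices, not anything Dataverse or this thread defines.

```shell
# Print a Shields.io endpoint-badge JSON document for a download count.
# Usage: badge_json COUNT
# Assumption: label and color are arbitrary illustrative values.
badge_json() {
  case "$1" in
    ''|*[!0-9]*)
      echo "badge_json: count must be a non-negative integer" >&2
      return 1
      ;;
  esac
  printf '{"schemaVersion": 1, "label": "downloads", "message": "%s", "color": "blue"}\n' "$1"
}
```

The output of badge_json 290 could be written to a publicly served file, and the dataset's GitHub README would then embed the img.shields.io/endpoint URL pointing at that file. Because the JSON is regenerated on a schedule, the badge would lag behind the landing page in the same way the cached Metrics API does.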
