Download count doesn't show correctly on the front page


CRISTIAN BENITES C.

May 16, 2024, 12:03:09 PM
to Dataverse Users Community
Hello,
We are having trouble with the download count on the front page. The download count for each dataset is correct, but sometimes the total number of downloads doesn't add up and just shows 10. By "sometimes" I mean that it apparently fixes itself after some time; I don't know what could be happening.
We haven't configured Make Data Count, and our Dataverse installation is version 6.0.

James Myers

May 16, 2024, 12:25:33 PM
to dataverse...@googlegroups.com

FWIW: The front page uses an approximation which works well at larger counts but is somewhat choppy with low numbers as it relies on things the database does periodically to update table statistics. Without that, when you have many counts, just computing that download count number starts to slow the page display.

 

There was an improvement in PR#8840 that makes the updates occur more frequently for small counts, and for version 6.1, the estimate is not used until the count is fairly large so you wouldn’t see it when the count is only 10. There are more details in the PR above.

 

If I recall, the original estimate could be off by a few percent (and more when the count is <100), and the improvement dropped the error to <1% (maybe less), with the page now showing exact counts for low counts (hundreds?). At high counts, if you are watching closely, you might see the count go up one by one for a while and then adjust up or down by a few as the estimate adapts.

 

-- Jim



CRISTIAN BENITES C.

May 21, 2024, 9:11:18 AM
to Dataverse Users Community
Thanks Jim,
It seems that's the case: now that there are more dataset downloads, the count is showing correctly.

Bethany Seeger

May 14, 2025, 9:12:52 AM
to Dataverse Users Community
Hi Jim,

We are on 5.14 and are experiencing something like this now. If I download a dataset, I see the number on the dataset page go up, but the number on the front page remains static. It hasn't changed in a few weeks, despite downloads (it's at 4833, in case that is helpful to know).

Can you describe in more detail how this number is computed? I'd like to dig into it a little, but don't know where to start.

In fact, it seems like the number was higher a while ago and is now lower (I can't quite prove this to myself, but it's a collective memory some in our group have). Where could I look in the database to check whether the number is even remotely accurate?

Also, does this play into the DataCite setup at all? We are not currently using Make Data Count.

Thanks!
Bethany



James Myers

May 14, 2025, 9:40:33 AM
to dataverse...@googlegroups.com

Bethany,

The comments in the PR linked below (PR#8840) describe the issue and how to investigate it. In short, the front page uses an estimate that is much faster to compute than counting the downloads themselves and, in 5.14, is probably correct within ~1%, but won’t update with every count. The API does not use the estimate, but is cached up to 7 days by default. Currently calling https://dataverse.yale.edu/api/info/metrics/downloads gives 4836 so the estimate looks to be close to the actual. (You can lower the caching time using https://guides.dataverse.org/en/latest/installation/config.html#metricscachetimeoutminutes .)

 

-- Jim

Bethany Seeger

May 14, 2025, 3:40:56 PM
to Dataverse Users Community
Thanks, Jim.  When I ran the command to look at the count in the db, it actually returns something different than either of those counts: 

dvndb=> select COUNT(id) from guestbookresponse;
 count
-------
  4863
(1 row)

dvndb=> SELECT ((reltuples / relpages) * (pg_relation_size('public.guestbookresponse') / current_setting('block_size')::int))::bigint FROM   pg_class WHERE  oid = 'public.guestbookresponse'::regclass;
 int8
------
 4833

https://dataverse.yale.edu/api/info/metrics/downloads returns:  {"status":"OK","data":{"count":4836}}
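
To put those three numbers side by side, here is a small Python sketch (not part of Dataverse; the variable and function names are made up for illustration) that computes how far the front-page estimate and the cached API value are from the exact count:

```python
# Numbers quoted in this thread on May 14, 2025.
exact_count = 4863          # SELECT COUNT(id) FROM guestbookresponse
front_page_estimate = 4833  # statistics-based estimate from pg_class
api_count = 4836            # /api/info/metrics/downloads (cached value)


def relative_error(value, exact):
    """Fraction by which `value` differs from the exact count."""
    return abs(exact - value) / exact


estimate_error = relative_error(front_page_estimate, exact_count)
api_error = relative_error(api_count, exact_count)

print(f"front page is off by {exact_count - front_page_estimate} rows ({estimate_error:.2%})")
print(f"API is off by {exact_count - api_count} rows ({api_error:.2%})")
```

Both values come out under 1% of the exact count, which matches the "within ~1%" accuracy mentioned for 5.14.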

Our autovacuum_analyze_scale_factor is 0.01.

I'll check in with our devops team to see if they have any thoughts on how to optimize this.  

If we wanted it to calculate the exact value each time (the first query above), would we just remove the autovacuum_analyze_scale_factor setting from the guestbookresponse table?

Best,
Bethany

James Myers

May 14, 2025, 4:10:54 PM
to dataverse...@googlegroups.com

FWIW: All of that seems consistent with the design as of 5.14. If I recall, a scale factor of 0.01 improves the estimate for every 1% + 50 rows of growth in that table. (That doesn’t mean the count only changes every ~100 downloads; it means that your ~30 count discrepancy should drop back to 0 every ~100 downloads when you have ~5000 total. You should still be seeing the number change every few downloads.) Similarly, if those 30 downloads could have been in the last seven days, the API call could be off by that much as well.
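
As a rough check of that arithmetic, here is a Python sketch of the trigger point PostgreSQL documents for autovacuum's ANALYZE: the base threshold plus the scale factor times the table's row count. The function name is hypothetical, the 50-row threshold and 0.1 scale factor are PostgreSQL's stock defaults, and 4833 is the row estimate quoted earlier in the thread.

```python
def analyze_trigger_rows(table_rows, scale_factor, base_threshold=50):
    """Approximate number of changed rows before autovacuum re-analyzes a table.

    Mirrors the documented PostgreSQL rule:
    analyze threshold = base threshold + scale factor * number of tuples.
    """
    return base_threshold + scale_factor * table_rows


rows = 4833  # estimated size of guestbookresponse quoted in this thread

with_site_setting = analyze_trigger_rows(rows, scale_factor=0.01)
with_pg_default = analyze_trigger_rows(rows, scale_factor=0.1)

print(f"scale factor 0.01: statistics refresh after about {with_site_setting:.0f} new rows")
print(f"scale factor 0.1:  statistics refresh after about {with_pg_default:.0f} new rows")
```

With a scale factor of 0.01 the refresh point lands just under 100 new rows at this table size, versus several hundred with the stock 0.1, so lowering the scale factor (or the base threshold) tightens how far the estimate can drift.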

 

For a one-time fix, you can run the vacuum command manually – it’s in those PR comments (vacuum (VERBOSE, ANALYZE) guestbookresponse; ). You could potentially call this via a cron job as well.

 

If you want a better estimate, you can make autovacuum_analyze_scale_factor smaller (rather than deleting it or making it bigger; the default is 0.1). I think the default of 50 rows is also something that could be lowered, but I haven’t looked that up.

 

Another option would be to get to v6.1 – we added a lower limit for when the estimating starts and do a direct calculation below that value.

 

To switch completely to a direct table count until you get to 6.1, you could change the estimateGuestBookResponseTableSize() function in Postgres. In 6.1, it is set via the following script: https://github.com/IQSS/dataverse/blob/50239a082e38d30efa8c9bd6cc3b76980dbba3f3/src/main/resources/db/migration/V6.0.0.3__10095-guestbook-at-request2.sql That script is appropriate for 6.1, after the guestbook-at-request functionality was added, so you can’t use it as is. You could, however, use just the line at https://github.com/IQSS/dataverse/blob/50239a082e38d30efa8c9bd6cc3b76980dbba3f3/src/main/resources/db/migration/V6.0.0.3__10095-guestbook-at-request2.sql#L29 in the begin block and remove the first select (which is the estimate).

 

With this approach, Dataverse will still overwrite your custom method when you update to v6.1, so you’d have to change it back after that if you wanted to keep your version. (In general, customizing the database is risky because we automate the changes needed to upgrade and we can’t tell if you’ve made changes.)
