FWIW: The front page uses an approximation which works well at larger counts but is somewhat choppy with low numbers, as it relies on things the database does periodically to update table statistics. Without the approximation, once there are many downloads, just computing the download count starts to slow the page display.
There was an improvement in PR#8840 that makes the updates occur more frequently for small counts, and for version 6.1, the estimate is not used until the count is fairly large so you wouldn’t see it when the count is only 10. There are more details in the PR above.
If I recall, the original estimate could be off by a few percent (and more when the count is <100), and the improvement dropped the error to <1% (maybe less), with the page now showing exact counts for low counts (hundreds?). At high counts, if you are watching closely, you might see the count go up one by one for a while and then adjust up or down by a few as the estimate adapts.
-- Jim
--
You received this message because you are subscribed to the Google Groups "Dataverse Users Community" group.
To unsubscribe from this group and stop receiving emails from it, send an email to
dataverse-commu...@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/dataverse-community/27bff915-a9ac-45e3-93a8-b9ede598ac44n%40googlegroups.com.
Bethany,
The comments in the PR linked below (PR#8840) describe the issue and how to investigate it. In short, the front page uses an estimate that is much faster to compute than counting the downloads themselves and, in 5.14, is probably correct within ~1%, but won’t update with every count. The API does not use the estimate, but its result is cached for up to 7 days by default. Currently calling https://dataverse.yale.edu/api/info/metrics/downloads gives 4836, so the estimate looks to be close to the actual count. (You can lower the caching time using https://guides.dataverse.org/en/latest/installation/config.html#metricscachetimeoutminutes.)
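If it helps to compare the front-page number with the API value, here is a minimal Python sketch of reading that endpoint. The URL is the one quoted above; the response shape (a JSON object with "status" and "data.count") is an assumption on my part, so check it against what your installation actually returns. The function names are just illustrative.

```python
import json
from urllib.request import urlopen

# URL quoted in the message above; substitute your own installation's host.
METRICS_URL = "https://dataverse.yale.edu/api/info/metrics/downloads"


def parse_download_count(body: str) -> int:
    """Extract the download count from a metrics response body.

    Assumes (not confirmed here) a payload shaped like
    {"status": "OK", "data": {"count": 4836}}.
    """
    payload = json.loads(body)
    if payload.get("status") != "OK":
        raise ValueError(f"unexpected status: {payload.get('status')!r}")
    return int(payload["data"]["count"])


def fetch_download_count(url: str = METRICS_URL, timeout: float = 10.0) -> int:
    """Fetch the (possibly cached) download count from the metrics API."""
    with urlopen(url, timeout=timeout) as response:
        return parse_download_count(response.read().decode("utf-8"))
```

Calling fetch_download_count() returns whatever the API currently reports, which may lag by up to the cache timeout mentioned above.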
-- Jim
FWIW: All of that seems consistent with the design as of 5.14 – if I recall, a scale factor of 0.01 means the estimate is refreshed after every 1% + 50 rows of growth in that table. (That doesn’t mean the count only changes every ~100 downloads; it means that your ~30 count discrepancy should drop back to 0 every ~100 downloads when you have ~5000 total. You should still be seeing the number change every few downloads.) Similarly, if those 30 downloads could have been in the last seven days, the API call could be off by that much as well.
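To make the "1% + 50 rows" arithmetic concrete, here is a small Python sketch of the standard PostgreSQL rule for when autovacuum re-analyzes a table (base threshold plus scale factor times the row count). The function name is mine, and treating one table row as one download is a simplification.

```python
def rows_between_analyzes(table_rows: int, scale_factor: float,
                          base_threshold: int = 50) -> float:
    """Rows that must change before autovacuum re-analyzes the table.

    Follows PostgreSQL's rule: base threshold + scale factor * row count.
    """
    return base_threshold + scale_factor * table_rows


# Compare the PostgreSQL default (0.1) with the 0.01 discussed above,
# for a table of roughly 5000 rows.
for scale in (0.1, 0.01):
    print(scale, rows_between_analyzes(5000, scale))
```

With ~5000 rows, a scale factor of 0.01 gives about 100 rows between refreshes, which is where the "every ~100 downloads" figure comes from; the 0.1 default would give about 550.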
For a one-time fix, you can run the vacuum command manually – it’s in those PR comments (vacuum (VERBOSE, ANALYZE) guestbookresponse; ). You could potentially call this via a cron job as well.
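For the cron idea, a rough Python sketch that runs the same vacuum command through psql is below. The database name and user (dvndb, dvnapp) are placeholders I am assuming, not something confirmed for your installation, and the script assumes psql can authenticate without a prompt (e.g. via a .pgpass file).

```python
import subprocess
from typing import List

# The statement from the PR comments, as quoted above.
VACUUM_SQL = "VACUUM (VERBOSE, ANALYZE) guestbookresponse;"


def build_psql_command(dbname: str = "dvndb", user: str = "dvnapp") -> List[str]:
    """Build the psql invocation; dbname and user are assumed placeholders."""
    return ["psql", "-U", user, "-d", dbname, "-c", VACUUM_SQL]


def run_vacuum(dbname: str = "dvndb", user: str = "dvnapp") -> None:
    """Run the vacuum once; raises CalledProcessError if psql fails."""
    subprocess.run(build_psql_command(dbname, user), check=True)
```

A script that calls run_vacuum() could then be scheduled from cron at whatever interval keeps the estimate as fresh as you need.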
If you want a better estimate, you can make the autovacuum scale factor smaller (not delete it or make it bigger – the default was 0.1). I think the default threshold of 50 rows is also something that could be lowered, but I haven’t looked that up.
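As a sketch of what lowering those settings could look like, the Python helper below just builds a per-table ALTER TABLE statement. I am assuming the relevant PostgreSQL storage parameters are autovacuum_analyze_scale_factor and autovacuum_analyze_threshold (their defaults, 0.1 and 50, match the numbers above); the values in the example are illustrative, not recommendations, and the helper name is made up.

```python
def autovacuum_settings_sql(scale_factor: float, threshold: int,
                            table: str = "guestbookresponse") -> str:
    """Build an ALTER TABLE statement setting per-table analyze parameters.

    Assumes the settings in question are PostgreSQL's
    autovacuum_analyze_scale_factor and autovacuum_analyze_threshold.
    """
    if scale_factor < 0:
        raise ValueError("scale_factor must be non-negative")
    if threshold < 0:
        raise ValueError("threshold must be non-negative")
    return (
        f"ALTER TABLE {table} SET ("
        f"autovacuum_analyze_scale_factor = {scale_factor}, "
        f"autovacuum_analyze_threshold = {threshold});"
    )


# Illustrative values only: refresh after 0.5% + 10 rows of growth.
print(autovacuum_settings_sql(0.005, 10))
```

The printed statement would be run in psql against the Dataverse database; smaller values mean more frequent analyze runs on that table.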
Another option would be to get to v6.1 – we added a lower limit for when the estimating starts and do a direct calculation below that value.
A complete switch to a direct table count, until you get to 6.1, would be to change the estimateGuestBookResponseTableSize() function in Postgres. In 6.1, it is set via the following script: https://github.com/IQSS/dataverse/blob/50239a082e38d30efa8c9bd6cc3b76980dbba3f3/src/main/resources/db/migration/V6.0.0.3__10095-guestbook-at-request2.sql That script is appropriate for 6.1, after the guestbook-at-request functionality was added, so you can’t use it as is, but you could just use the line at https://github.com/IQSS/dataverse/blob/50239a082e38d30efa8c9bd6cc3b76980dbba3f3/src/main/resources/db/migration/V6.0.0.3__10095-guestbook-at-request2.sql#L29 in the begin block and remove the first select (which is the estimate).
With this approach, Dataverse will still overwrite your custom method when you update to v6.1, so you’d have to change it back after that if you wanted to keep your version. (In general, customizing the database is risky because we automate the changes needed to upgrade and we can’t tell if you’ve made changes.)