Percentage in Datasets by Most Common Subject metric plot


Michel Bamouni

May 14, 2019, 7:35:48 AM
to Dataverse Users Community
Hi,

I installed dataverse-metrics and successfully got the six plots.
On the Harvard Dataverse metrics page, the plot "Datasets by Most Common Subject" shows a percentage, like this: Total: 26.5k (25% of 105k). In my metrics, I only have the total and not the percentage. The dataverse-metrics installation guide doesn't mention this percentage anywhere. How can I get the percentage in parentheses, as in the Harvard metrics, for the plot "Datasets by Most Common Subject"?

I have attached my plot and the Harvard plot.

Best regards,

Michel

Philip Durbin

May 14, 2019, 8:15:03 AM
to dataverse...@googlegroups.com
Hi! First, let's not call the installation of dataverse-metrics at https://dataverse.org/metrics "Harvard Dataverse Metrics" because it includes metrics from a dozen installations of Dataverse (and we'd like to add more once everyone upgrades to 4.9 or higher). :)

Let's call that installation "community metrics" or something. :)

If you click "show only this frame" in Firefox you can see that the installation is hosted by UNC: https://dataversemetrics.odum.unc.edu/dataverse-metrics/

I bring this up because from there you can download the config file from https://dataversemetrics.odum.unc.edu/dataverse-metrics/config.json

Let's take a look at the blacklist mentioned at https://github.com/IQSS/dataverse-metrics/tree/v0.2.3#configuration


{
  "datasets/bySubject": [
    "Not specified",
    "Other"
  ]
}

It is the presence of this blacklist that leads to "Total: 26.5k (25% of 105k)" being shown. It means that without the blacklist in place, 105k datasets would be shown. 26.5k datasets are not blacklisted. 78.5k datasets are not shown in the treemap (they are blacklisted) because they are either "Not specified" or "Other".
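To make the arithmetic concrete, here is a minimal Python sketch of how a "Total: X (N% of Y)" label could be derived from per-subject counts and a blacklist. This is not the code from plots.js; the function name `summarize` and the "All other subjects" bucket (set to the rounded 26.5k figure from above) are assumptions made for illustration.

```python
def summarize(counts, blacklist):
    """Return (shown, total, percent) for a subject -> count mapping."""
    total = sum(counts.values())
    # Only subjects that are not blacklisted end up in the plot.
    shown = sum(n for subject, n in counts.items() if subject not in blacklist)
    percent = round(100 * shown / total) if total else 0
    return shown, total, percent


# "All other subjects" is a hypothetical bucket standing in for every
# subject that is not blacklisted; 26500 is the rounded figure quoted above.
counts = {"Not specified": 77893, "Other": 934, "All other subjects": 26500}
blacklist = {"Not specified", "Other"}

shown, total, percent = summarize(counts, blacklist)
print(f"Total: {shown} ({percent}% of {total})")
```

With an empty blacklist, shown and total are the same number, which is consistent with a plot that displays only the total.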

We can get the exact numbers like this:

Not specified    77893
Other    934

"datasets-bySubject.tsv" appears in https://github.com/IQSS/dataverse-metrics/blob/v0.2.3/plots.js#L131 and you can scroll around in that file to find the names of the tsv files for the other 5 plots.
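If you want to pull the blacklisted counts out of your own copy of datasets-bySubject.tsv, something like the following Python sketch might help. The file layout is an assumption (one subject per row, then a tab, then a count), the "Social Sciences" row is made-up sample data, and `blacklisted_counts` is a hypothetical helper, not part of dataverse-metrics.

```python
import csv
import io

# Made-up sample in an assumed layout: subject, tab, count.
# Only the first two rows use numbers quoted in this thread.
SAMPLE_TSV = "Not specified\t77893\nOther\t934\nSocial Sciences\t12000\n"


def blacklisted_counts(tsv_text, blacklist):
    """Return {subject: count} for the rows whose subject is blacklisted."""
    rows = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    return {
        row[0]: int(row[1])
        for row in rows
        if len(row) >= 2 and row[0] in blacklist
    }


hidden = blacklisted_counts(SAMPLE_TSV, {"Not specified", "Other"})
for subject, count in hidden.items():
    print(f"{subject}\t{count}")
print("hidden total:", sum(hidden.values()))
```

To run it against a real file, read the file contents into a string and pass that in place of SAMPLE_TSV.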

I hope this makes sense. Please keep the questions coming!

Thanks,

Phil


Michel Bamouni

May 15, 2019, 3:27:24 AM
to Dataverse Users Community
Hi Phil,

The answer is clear. I will add the subject blacklist to my dataverse-metrics config file.
I also note that https://dataverse.org/metrics covers many Dataverse installations and not only Harvard's.

Best regards

Michel