top 25 cited datasets in your Dataverse installation

Philip Durbin

Jan 26, 2024, 12:36:41 PM
to dataverse...@googlegroups.com
I don't know if this is common knowledge or not, but I was just chatting with Kelly Stathis from DataCite, and if your installation of Dataverse uses DataCite (most do), you can pretty easily get a list of the 25 most cited datasets.

To back up a bit, if you go to https://commons.datacite.org/repositories/x3oc4vr which is Harvard Dataverse's landing page in DataCite Commons, you'll see 2578 citations. Great! But which datasets have been cited? I know you can set up Make Data Count for this, but below is a quick way to check the top 25.

The main thing you need to know is your installation's client-id (Harvard Dataverse example below). I'm not 100% sure where to find this but I assume it's the value of dataverse.pid.datacite.username in domain.xml. You might also be able to find it in the list of DataCite clients at https://support.datacite.org/reference/get_clients

Anyway, once you have your client-id, here's how you can get the 25 most cited datasets:

curl 'https://api.datacite.org/dois/?client-id=gdcc.harvard-dv&sort=-citation-count' | jq '.data[].attributes | "\(.citationCount) \(.url)"' -r

You can also paginate, of course, to get more than 25. See https://support.datacite.org/docs/api-get-lists
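
For anyone who wants more than the first page, here is a rough sketch of pagination. It assumes the page[size] and page[number] query parameters described in the DataCite docs linked above. The function names (top_cited_url, top_cited_page) are made up for illustration. The -g flag stops curl from treating the square brackets in the URL as a glob pattern.

```shell
# Build a DataCite DOIs URL for one page of results, most cited first.
# Arguments: client id, page number (default 1), page size (default 25).
# Assumption: page[size] and page[number] are the pagination parameters.
top_cited_url() {
  local client_id="$1"
  local page_number="${2:-1}"
  local page_size="${3:-25}"
  printf 'https://api.datacite.org/dois/?client-id=%s&sort=-citation-count&page[size]=%s&page[number]=%s\n' \
    "$client_id" "$page_size" "$page_number"
}

# Fetch one page and print "<citation count> <url>" lines (needs curl and jq).
top_cited_page() {
  curl -s -g "$(top_cited_url "$@")" |
    jq -r '.data[].attributes | "\(.citationCount) \(.url)"'
}
```

With these defined, something like `top_cited_page gdcc.harvard-dv 2 100` should print the second page of 100 results.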

Here are the top 25 cited datasets for Harvard Dataverse, as of this writing:

144 https://dataverse.harvard.edu/citation?persistentId=doi:10.7910/DVN/OHHUKH
34 https://dataverse.harvard.edu/citation?persistentId=doi:10.7910/DVN/VOQCHQ
32 https://dataverse.harvard.edu/citation?persistentId=doi:10.7910/DVN/28075
26 https://dataverse.harvard.edu/citation?persistentId=doi:10.7910/DVN/LM4OWF
25 https://dataverse.harvard.edu/citation?persistentId=doi:10.7910/DVN/GDF6Z0
25 https://dataverse.harvard.edu/citation?persistentId=doi:10.7910/DVN/PRFF8V
18 https://dataverse.harvard.edu/citation?persistentId=doi:10.7910/DVN/4JQRCL
16 https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/ZTPW0Y
16 https://dataverse.harvard.edu/citation?persistentId=doi:10.7910/DVN/LEJUQZ
14 https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/PE8TWP
13 https://dataverse.harvard.edu/citation?persistentId=doi:10.7910/DVN/HTTWYL
11 https://dataverse.harvard.edu/citation?persistentId=doi:10.7910/DVN/28468
10 https://dataverse.harvard.edu/citation?persistentId=doi:10.7910/DVN/ZSBZ7K
9 https://dataverse.harvard.edu/citation?persistentId=doi:10.7910/DVN/DBW86T
9 https://dataverse.harvard.edu/citation?persistentId=doi:10.7910/DVN/LZHMG3
9 https://dataverse.harvard.edu/citation?persistentId=doi:10.7910/DVN/TDOAPG
8 https://dataverse.harvard.edu/citation?persistentId=doi:10.7910/DVN/42MVDX
8 https://dataverse.harvard.edu/citation?persistentId=doi:10.7910/DVN/WMGTNS
8 https://dataverse.harvard.edu/citation?persistentId=doi:10.7910/DVN/TTMZ08
8 https://dataverse.harvard.edu/citation?persistentId=doi:10.7910/DVN/3WZFK9
8 https://dataverse.harvard.edu/citation?persistentId=doi:10.7910/DVN/26147
7 https://dataverse.harvard.edu/citation?persistentId=doi:10.7910/DVN/NRR7MB
7 https://dataverse.harvard.edu/citation?persistentId=doi:10.7910/DVN/E9N6PH
7 https://dataverse.harvard.edu/citation?persistentId=doi:10.7910/DVN/AMRXJA
7 https://dataverse.harvard.edu/citation?persistentId=doi:10.7910/DVN/25833

Have fun!

Phil

Philip Durbin

Mar 1, 2024, 9:40:03 AM
to dataverse...@googlegroups.com
Just a quick follow up...

In the example I used Harvard Dataverse's client id (gdcc.harvard-dv):

curl 'https://api.datacite.org/dois/?client-id=gdcc.harvard-dv&sort=-citation-count' | jq '.data[].attributes | "\(.citationCount) \(.url)"' -r

I checked with Kelly Stathis at DataCite and she explained that you can easily figure out the client id for your Dataverse installation by looking it up via the DOI authority/prefix. For example, UNC Dataverse has 10.15139 as the authority. You can look up the client id for 10.15139 by going to https://api.datacite.org/prefixes/10.15139 to discover that it is "gdcc.odum-dv". Then you can plug in that client id as before:

curl 'https://api.datacite.org/dois/?client-id=gdcc.odum-dv&sort=-citation-count' | jq '.data[].attributes | "\(.citationCount) \(.url)"' -r

This tells us their top cited dataset is https://doi.org/10.15139/S3/11900
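
The prefix lookup can be scripted too. This is only a sketch: the jq path below is my assumption about where the client id sits in the prefixes response (under relationships), so check it against the actual JSON for your prefix. The function names are hypothetical.

```shell
# Build the DataCite API URL for a DOI prefix (authority), e.g. 10.15139.
prefix_url() {
  printf 'https://api.datacite.org/prefixes/%s\n' "$1"
}

# Read a prefixes response on stdin and print the client id(s) it lists.
# Assumption: the ids live under .data.relationships.clients.data[].id
client_ids_from_json() {
  jq -r '.data.relationships.clients.data[].id'
}

# Look up the client id(s) for a prefix (needs curl, jq and network access).
client_ids_for_prefix() {
  curl -s "$(prefix_url "$1")" | client_ids_from_json
}
```

If the assumed path is right, `client_ids_for_prefix 10.15139` should print the client id to plug into the dois query above.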

Pretty interesting, no? I hope you get a chance to try it with your Dataverse installation.

Long live data citation!

Phil
