api question: get a list of all dataverses?

Frank Andreas Sposito

Dec 18, 2017, 8:10:00 AM
to Dataverse Users Community
Hi Everyone,

Please forgive me if an answer to this question exists elsewhere. I scoured both the threads on this site and the internet itself but was unable to find a solution.

Our Dataverse installation is now live and we are beginning the process of migrating about 700 old Nesstar DDI2.5 records into our new platform (data.aussda.at). To do this we need to write scripts that leverage the various Dataverse APIs. The problem we're having is finding clear documentation of the complete universe of API calls that exist. I've read the API guides thoroughly, but have been unable to determine, for example, whether it's possible to get a list of all dataverses associated with an installation. At the moment we are making a standard HTTP request through Glassfish and parsing the results, which is basically a hack. Is there any other way to get a complete list of dataverses (so that we can collect their ids and subsequently upload new metadata to them via SWORD depending on their contents)? More broadly, are there any shadow API guides that fill in some of the gaps in the official guides and documentation?

Thanks very much in advance for the help. Frank

Philip Durbin

Dec 18, 2017, 8:43:44 AM
to dataverse...@googlegroups.com
Hi Frank,

The API Guide needs some work.

To get a list of *published* dataverses, the easiest way is probably to use the Search API with type=dataverse like https://demo.dataverse.org/api/search?q=*&type=dataverse and then iterate through the results as described at http://guides.dataverse.org/en/4.8.4/api/search.html#iteration
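A minimal sketch of that iteration in Python, assuming the Search API accepts `start` and `per_page` parameters and returns JSON with `data.total_count` and `data.items`, as the linked guide describes. The function names and the injected `fetch_page` callable are illustrative only, not part of Dataverse:

```python
import json
import urllib.parse
import urllib.request


def make_search_fetcher(base_url):
    """Return a function that fetches one page of published dataverses."""
    def fetch_page(start, per_page):
        query = urllib.parse.urlencode(
            {"q": "*", "type": "dataverse", "start": start, "per_page": per_page}
        )
        with urllib.request.urlopen(f"{base_url}/api/search?{query}") as resp:
            return json.load(resp)
    return fetch_page


def iter_published_dataverses(fetch_page, per_page=10):
    """Yield every search hit, advancing `start` until total_count is reached."""
    start = 0
    while True:
        data = fetch_page(start, per_page)["data"]
        items = data["items"]
        yield from items
        start += len(items)
        # Stop when all results are consumed or a page comes back empty.
        if not items or start >= data["total_count"]:
            break
```

Something like `for dv in iter_published_dataverses(make_search_fetcher("https://demo.dataverse.org")): print(dv.get("name"))` would then print each published dataverse; the fetcher is passed in separately so the paging logic can be tried out without a network connection.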

To get a list of all dataverses (published and unpublished), I would suggest using a superuser API token against the SWORD "Service Document" API: http://guides.dataverse.org/en/4.8.4/api/sword.html#retrieve-sword-service-document . This endpoint shows users which dataverses they have permission to deposit into, which is all dataverses for a superuser. Given the task you're working on (depositing data), this is probably your best bet.
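A rough Python sketch of reading the service document, assuming the endpoint path and Basic-auth convention (API token as username, empty password) shown in the SWORD guide, and assuming each dataverse appears as an AtomPub `collection` element whose `href` ends in the dataverse alias. The function names are made up for illustration:

```python
import base64
import urllib.request
import xml.etree.ElementTree as ET

APP_NS = "{http://www.w3.org/2007/app}"
ATOM_NS = "{http://www.w3.org/2005/Atom}"


def fetch_service_document(base_url, api_token):
    """Download the service document; the API token is the Basic-auth username."""
    # Assumed path, taken from the SWORD guide; check it against your version.
    url = f"{base_url}/dvn/api/data-deposit/v1.1/swordv2/service-document"
    credentials = base64.b64encode(f"{api_token}:".encode()).decode()
    request = urllib.request.Request(
        url, headers={"Authorization": f"Basic {credentials}"}
    )
    with urllib.request.urlopen(request) as resp:
        return resp.read()


def parse_collections(service_xml):
    """Return (alias, title, href) for each collection in the service document."""
    root = ET.fromstring(service_xml)
    results = []
    for collection in root.iter(f"{APP_NS}collection"):
        href = collection.get("href", "")
        # The alias is assumed to be the last path segment of the href.
        alias = href.rstrip("/").rsplit("/", 1)[-1]
        title = collection.findtext(f"{ATOM_NS}title", default="")
        results.append((alias, title, href))
    return results
```

With a superuser token, `parse_collections(fetch_service_document(base_url, token))` should list every dataverse along with the collection URL to deposit into.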

Finally, there's the "contents" API. Performance of this endpoint hasn't been great and rather than a flat list it will only show you direct children. That means if you have a deep hierarchy of dataverses, you'll need to make many API calls to collect them all. You're welcome to give it a shot: http://guides.dataverse.org/en/4.8.4/api/native-api.html#dataverses
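Collecting the whole hierarchy through the contents API could look roughly like this in Python. This is a sketch under assumptions: that `/api/dataverses/$id/contents` returns JSON whose `data` is a list of children carrying `type` and `id` fields, that the token goes in an `X-Dataverse-key` header, and that `:root` identifies the root dataverse. The helper names are hypothetical:

```python
import json
import urllib.request


def make_contents_fetcher(base_url, api_token):
    """Return a function that lists the direct children of one dataverse."""
    def fetch_contents(dataverse_id):
        request = urllib.request.Request(
            f"{base_url}/api/dataverses/{dataverse_id}/contents",
            headers={"X-Dataverse-key": api_token},
        )
        with urllib.request.urlopen(request) as resp:
            return json.load(resp)["data"]
    return fetch_contents


def walk_dataverses(fetch_contents, root_id=":root"):
    """Yield every dataverse below root_id, one API call per dataverse visited."""
    pending = [root_id]
    while pending:
        current = pending.pop()
        for child in fetch_contents(current):
            # Datasets are skipped; only dataverses are reported and descended into.
            if child.get("type") == "dataverse":
                yield child
                pending.append(child["id"])
```

Note that this makes one request per dataverse in the tree, so on a large installation it will be slow for the reasons mentioned above.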

I'd be remiss if I didn't mention that there's a Python app you can install on the side called "miniverse" that can use a read-only user to your Dataverse database to pull out metrics and such. I just clicked "Other links" and then "List of published Dataverses (.xlsx)" at https://services.dataverse.harvard.edu for example. On Dataverse community calls we've talked now and then about the future of miniverse now that the original developer has moved on. You can find the code at https://github.com/IQSS/miniverse

Oh, also there's a new effort to write useful database queries at https://github.com/IQSS/dataverse/issues/4169 if you don't mind querying the database directly.

Honestly, given how long this email is getting I'd probably be halfway done writing a new API endpoint that simply lists all dataverses for a superuser. You could help define what the output would be. If you're so inclined, please open a new GitHub issue at https://github.com/IQSS/dataverse/issues and specify how you'd like it to work. Please also open issues about confusing parts of the API Guide. For a while I've been thinking we should have a page in there that's more task-based. "I want a list of all dataverses." Stuff like that. We're happy to get ideas from the community.

Thanks!

Phil

--
You received this message because you are subscribed to the Google Groups "Dataverse Users Community" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dataverse-community+unsub...@googlegroups.com.
To post to this group, send email to dataverse-community@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/dataverse-community/fa74ceff-939d-48dd-bda5-2d9a0d305b5f%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Frank Andreas Sposito

Dec 18, 2017, 9:56:54 AM
to Dataverse Users Community
Hi Phil,
Thanks very much, this is just what I was hoping for. The contents API is actually perfect. Not sure how I missed it (and please forgive me for not looking more closely before asking). The database queries document is truly an unexpected and wonderful bonus. We've been thinking about working this way for quite a while, but were not sure how to proceed.

But since we're on the subject, a related question. Our systems admin installed Dataverse so that Postgres can only be accessed from localhost and (I think) by the glassfish account. Are there guidelines in the Dataverse user community on how to securely broaden this so that, for example, I could hit the database from a client application on the computer in my office (which is part of the same local network)? The admin is wisely concerned about security exposure. But I think if there is a best practice recommendation he would open it up. Any suggestions?

Thanks again for your help. We've read a lot of your posts, and would have been lost without them. Best, Frank

Philip Durbin

Dec 18, 2017, 10:22:08 AM
to dataverse...@googlegroups.com
Well, I'm a believer in the principle of least privilege* so I'd definitely create a read-only database user. Obviously, it's more secure to *not* open up ports to clients and to keep the firewall in place. Everyone has different security needs so I'm happy to defer to others on what their experiences have been.

I'm glad you're finding the Google doc with database queries useful. Everyone should feel free to leave comments or request access to ask for a query or provide an answer to someone else. Oh, another resource you might enjoy if you're doing database queries is this set of SchemaSpy entity relationship diagrams: http://phoenix.dataverse.org/schemaspy/latest/relationships.html

I'm happy to hear that posts on this mailing list have been useful. If anyone has an interest in improving the documentation, we provide some tips at http://guides.dataverse.org/en/4.8.4/developers/documentation.html and you can find some "help wanted" issues at https://github.com/IQSS/dataverse/labels/Help%20Wanted%3A%20Documentation

Thanks,

Phil

benjami...@gmail.com

Dec 15, 2020, 6:45:19 AM
to Dataverse Users Community
Wow, that ER diagram of Dataverse's database rocks, Philip! Thanks for posting it.
