Extract Metadata from Dataverse Repository


S Patil

Mar 21, 2022, 9:29:47 AM
to Dataverse Users Community
Hi,
I would like to extract the metadata from the prototype dataverse repository (https://portal.odissei.nl/).
I have checked the Dataverse user guide, but I could not find a way to extract the complete metadata from this repository. Could someone please help?

This is much needed as a starting point for my master's project.

Thanks,
Suraj Patil

Philip Durbin

Mar 21, 2022, 10:58:36 AM
to dataverse...@googlegroups.com
Hi Suraj,

Downloading all metadata via API is probably the best approach. Let me give you a link for version 5.6 of Dataverse since that's what you seem to be running: https://guides.dataverse.org/en/5.6/api/getting-started.html#downloading-metadata

The most complete of the metadata formats is the "native" JSON format. From a dataset landing page, you can find this by clicking "Metadata" then "Export Metadata" then "JSON" to arrive at a URL like this: https://portal.odissei.nl/dataset.xhtml?persistentId=doi:10.17026/dans-xnf-34mm
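
For scripting, the same export can be requested through the API rather than the landing page. Below is a minimal Python sketch, assuming the `/api/datasets/export` endpoint described in the guide linked above and the exporter name `dataverse_json` for the native JSON format. The function names are illustrative only, and nothing here has been checked against the ODISSEI installation.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Installation discussed in this thread.
BASE_URL = "https://portal.odissei.nl"


def export_url(persistent_id, exporter="dataverse_json", base_url=BASE_URL):
    """Build the metadata export URL for one dataset.

    Assumes the /api/datasets/export endpoint and the 'dataverse_json'
    exporter name from the Dataverse API guide.
    """
    query = urlencode({"exporter": exporter, "persistentId": persistent_id})
    return f"{base_url}/api/datasets/export?{query}"


def fetch_export(persistent_id, exporter="dataverse_json"):
    """Download and parse the JSON export of a single published dataset."""
    with urlopen(export_url(persistent_id, exporter)) as resp:
        return json.load(resp)
```

Calling `fetch_export("doi:10.17026/dans-xnf-34mm")` should return the dataset's metadata as a Python dict, provided the dataset is published and the installation allows anonymous API access.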

If you're ok with non-complete metadata, another approach would be to use some of the community-contributed reporting tools at https://guides.dataverse.org/en/5.6/admin/reporting-tools-and-queries.html

I hope this helps. Please keep the questions coming.

Thanks,

Phil


Julian Gautier

Mar 21, 2022, 11:10:23 AM
to Dataverse Users Community
Hi Suraj Patil,

To add to what Phil wrote:

We've heard from users who need to be able to do this, especially without having to use APIs and scripting languages, but official support for this hasn't been built into the Dataverse software yet. In addition to what Phil wrote, there are some other things that might help in the meantime:
  • If you have access to the database of that prototype Dataverse repository, or if you can get in touch with the administrators of the repository, the database could be queried for the metadata you need. That would give you access to all of the repository's metadata.
  • If you need just the dataset-level metadata (such as dataset title and author), if you work on a Mac computer, and if you'd like to explore the metadata in CSV files, I can share an application I've been working on that extracts dataset metadata into CSV files. For now it works only on Mac computers and gets only dataset metadata, so it doesn't get metadata for collections (like the name and creator metadata of the DANS EASY metadata [test] collection) or metadata for files within each dataset (although I see that none of this repository's published datasets have files). And if you want the metadata of datasets that have not been published, you'll need to provide the application with the API Token of an account on that prototype repository that has permission to view those unpublished datasets.
Julian

Julian Gautier
Product Research Specialist, IQSS

S Patil

Mar 23, 2022, 4:24:37 AM
to Dataverse Users Community
Hi Phil,

Thanks for your help!

I had a look at the guide you shared, and I can see that it only explains how to get the metadata for a single DOI. However, my aim is to retrieve the complete metadata from portal.odissei.nl.

The attached example curl command is from the user guide; it shows how to get the metadata for a particular DOI. If there is a curl command that would get me the complete metadata, it would be a great help.

I am sorry, I am new to curl. If you have any source that gives the curl code to extract the complete metadata, it would serve my purpose.

Thanks again!
Suraj Patil
[Attachment: CURL-Dataverse User Guide.PNG]

S Patil

Mar 23, 2022, 4:52:15 AM
to Dataverse Users Community
Hi Julian,

This is indeed very helpful! Thanks a lot!

For option 1, I am checking internally at the university (the prototype, portal.odissei.nl, was a project by one of last year's students) to see if someone has access to the database to query the metadata.

I am eager to know more about option 2. Could you please share the tool/application that can be used on a Mac to get the metadata? The main items of interest for my project are the author name and the title, so I believe this will be a great help!

Also, could you please let me know whether this tool is the outcome of a research paper that I can cite?

Thanks again for all your help!

Regards,
Suraj Patil

Julian Gautier

Mar 23, 2022, 9:42:22 AM
to Dataverse Users Community
Hi Suraj Patil,

The Dataverse guides don't have a single curl command that returns all of the metadata of all datasets in a Dataverse repository. I think you would have to write a script that uses several API endpoints, including the one in your screenshot, to get the metadata export file for each dataset. Each export file contains the metadata for one dataset, so for the "ODISSEI prototype" repository you would have 2,806 JSON files or 2,806 XML files, depending on which export format you specify. You would then need to query all of those files, depending on what you need to do with the metadata.
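
A rough Python sketch of the enumeration step of such a script, under assumptions: it pages through the Dataverse Search API (`/api/search` with `type=dataset`) and collects each dataset's persistent ID from the `global_id` field, after which each ID can be passed to the export endpoint. The helper names are made up for illustration, only published datasets are returned without an API token, and this has not been run against the ODISSEI installation.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://portal.odissei.nl"  # installation from this thread


def search_url(start, per_page=100, base_url=BASE_URL):
    """Build one page of a Search API query that matches every dataset."""
    query = urlencode(
        {"q": "*", "type": "dataset", "start": start, "per_page": per_page}
    )
    return f"{base_url}/api/search?{query}"


def get_json(url):
    """Fetch a URL and parse the JSON response body."""
    with urlopen(url) as resp:
        return json.load(resp)


def iter_dataset_pids(fetch=get_json, per_page=100):
    """Yield the persistent ID of every dataset the search returns.

    `fetch` is injectable so the paging logic can be tried offline.
    """
    start = 0
    while True:
        data = fetch(search_url(start, per_page))["data"]
        items = data["items"]
        for item in items:
            yield item["global_id"]
        start += len(items)
        # Stop on an empty page or once every reported result was seen.
        if not items or start >= data["total_count"]:
            break
```

Looping over `iter_dataset_pids()` and requesting the export for each ID would produce the one-file-per-dataset collection described above. Adding a short pause between requests is a sensible courtesy to the server.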

If you email me at julian...@g.harvard.edu I can send you the tool, which basically uses Python to create a graphical user interface that uses the Dataverse APIs to write dataset metadata to CSV files. The tool isn't the outcome of a research paper. I'd suggest citing the tool as software, but I'm testing it now and I haven't released it widely, yet. I'm happy to discuss more over email if you're still interested.
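
The following is not the tool mentioned above, only a minimal Python sketch of the same general idea for the two fields of interest (title and author names). It assumes the native JSON layout in which the citation block sits under `datasetVersion` (export format) or `latestVersion` (API response), with fields identified by `typeName`. The sample record and function names are invented for illustration.

```python
import csv
import io

# Invented sample shaped like a native JSON export (assumed layout).
SAMPLE = {
    "datasetVersion": {
        "metadataBlocks": {
            "citation": {
                "fields": [
                    {"typeName": "title", "value": "Sample dataset"},
                    {
                        "typeName": "author",
                        "value": [
                            {"authorName": {"value": "Doe, Jane"}},
                            {"authorName": {"value": "Roe, Richard"}},
                        ],
                    },
                ]
            }
        }
    }
}


def citation_fields(dataset_json):
    """Return the citation block's field values keyed by typeName."""
    version = (
        dataset_json.get("datasetVersion")
        or dataset_json.get("latestVersion")
        or {}
    )
    block = version.get("metadataBlocks", {}).get("citation", {})
    return {f["typeName"]: f["value"] for f in block.get("fields", [])}


def title_and_authors(dataset_json):
    """Extract the title string and the list of author names."""
    fields = citation_fields(dataset_json)
    authors = [
        a["authorName"]["value"]
        for a in fields.get("author", [])
        if "authorName" in a
    ]
    return fields.get("title", ""), authors


def write_csv(rows, out):
    """Write (persistent_id, title, authors) rows to a file-like object."""
    writer = csv.writer(out)
    writer.writerow(["persistent_id", "title", "authors"])
    for pid, title, authors in rows:
        writer.writerow([pid, title, "; ".join(authors)])


# Offline demonstration using the invented sample record.
buffer = io.StringIO()
write_csv([("doi:10.1234/EXAMPLE", *title_and_authors(SAMPLE))], buffer)
print(buffer.getvalue())
```

Joining multiple authors with "; " keeps one row per dataset. A separate authors table keyed by persistent ID would be the alternative if per-author analysis is needed.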

Thanks!
Julian

Sherry Lake

Mar 23, 2022, 2:58:59 PM
to Dataverse Users Community

Hello Suraj,

You could ask the admin for https://portal.odissei.nl/ to set up an OAI-PMH server.

The admin can set it up to include all the local, published datasets by specifying the Unique Identifier authority registered to your Dataverse installation, for example: for my Dataverse repository I set the "Definition Query":  dsPersistentId:"doi:10/18013/"

Once the OAI-PMH query has been exported, you can then capture metadata in dc or ddi format (XML output) with these browser commands (from UVa Dataverse repository):


You can also run this command:

which generates an XML file with the individual dataset metadata query commands (using directApiCall) to get a specific dataset's and its files' metadata in JSON format.

--
Sherry Lake
University of Virginia - LibraData 

Stefan Kasberger

Apr 27, 2022, 6:39:59 AM
to Dataverse Users Community
Hi Suraj,

I use pyDataverse (I am the developer of it) for this. You can find some helpful code snippets in the docs to get your data harvesting started.

Greetz, Stefan

paul....@ubc.ca

May 11, 2022, 5:54:32 PM
to Dataverse Users Community
Another possible solution would be to use the OAI feed from the Dataverse installation. The metadata from the OAI feed would not be as complete as Dataverse JSON, but it would certainly be easier to harvest, possibly using something like pyoai.
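
A minimal sketch of such a harvest, using only the Python standard library instead of pyoai. It assumes the installation's OAI-PMH server is enabled and reachable at `/oai` (not verified for portal.odissei.nl) and that the `oai_dc` metadata format is offered. The function names are illustrative.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint; only works if the admin has enabled the OAI server.
OAI_URL = "https://portal.odissei.nl/oai"
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def parse_page(xml_text):
    """Parse one ListRecords response into (records, resumption_token)."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        records.append(
            {
                "identifier": rec.findtext(
                    "oai:header/oai:identifier", default="", namespaces=NS
                ),
                "titles": [e.text for e in rec.iterfind(".//dc:title", NS)],
                "creators": [e.text for e in rec.iterfind(".//dc:creator", NS)],
            }
        )
    token = root.findtext(".//oai:resumptionToken", default="", namespaces=NS)
    return records, (token or "").strip()


def http_fetch(url):
    """Return the raw response body for a URL."""
    with urlopen(url) as resp:
        return resp.read()


def harvest(fetch=http_fetch, metadata_prefix="oai_dc"):
    """Yield every record, following resumption tokens until exhausted."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        records, token = parse_page(fetch(f"{OAI_URL}?{urlencode(params)}"))
        yield from records
        if not token:
            break
        # Per OAI-PMH, follow-up requests carry only the resumption token.
        params = {"verb": "ListRecords", "resumptionToken": token}
```

Iterating over `harvest()` would give one dict per record with its identifier, titles, and creators, which covers the title and author fields mentioned earlier in the thread.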

Paul