DataVerse as interactive data catalogue?

Michiel Meulendijk

Mar 15, 2017, 9:48:23 AM
to Dataverse Users Community
I work at an academic hospital in the Netherlands, and am looking into open portal options for our research data. Ideally, we would like to facilitate the whole process from data request to publication.

Our main requirement is making operational hospital data (mainly patient data from a number of sources) available to researchers. They should be able to browse through them, view metadata (including e.g. ownership), visualize datasets, build data queries to include in their research proposals, etc.

Now, when the data is provided for research and results in a publication, it needs to remain traceable. As we are working with patient data, however, storing it in a public repository is out of the question. I would hope that storing a dataset in our local repository would maintain it for future reference.

I am not sure if DataVerse is ideal for the kind of interactive data browsing and querying we are envisioning. However, as it is built for academic purposes, I expect it to better fit into the academic workflow than other packages such as CKAN.

I would love to hear some perspectives on this.

Philip Durbin

Mar 15, 2017, 10:21:21 AM
to dataverse...@googlegroups.com
Others should chime in but Dataverse is designed to be a public repository. We assume you'll want a DOI (or Handle) for your dataset and that you want to make available to services such as DataCite at least some basic metadata about the dataset such as title, author, description, etc.

Especially because you're talking about patient data, which is confidential, Dataverse is currently a poor fit for hosting it out on the internet. Even the support for "sensitive data" we plan to work on for Dataverse 5 is only rated as level 3 by Harvard's classification. See the roadmap at http://dataverse.org/goals-roadmap-and-releases and the definition of level 3 at http://policy.security.harvard.edu/view-data-security-level

All that said, even today you could consider using Dataverse to make your data more discoverable if you can store the data itself elsewhere and "harvest" the metadata from that system into Dataverse via OAI-PMH.
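
To make the harvesting idea concrete, here is a minimal sketch of what an OAI-PMH client does: it sends a ListRecords request and reads the Dublin Core metadata out of the response. This is only an illustration under stated assumptions, not how Dataverse implements harvesting. The endpoint https://hospital.example.org/oai, the record identifier, and the helper names list_records_url and parse_records are all hypothetical, and the response is a hand-written sample rather than real output.

```python
import urllib.parse
import xml.etree.ElementTree as ET

# Namespaces defined by the OAI-PMH 2.0 and Dublin Core specifications.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"
DC_NS = "{http://purl.org/dc/elements/1.1/}"


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build a ListRecords request URL for an OAI-PMH endpoint."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urllib.parse.urlencode(params)


def parse_records(xml_text):
    """Extract identifier, titles, and creators from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iter(OAI_NS + "record"):
        identifier = record.findtext(f"{OAI_NS}header/{OAI_NS}identifier")
        titles = [el.text for el in record.iter(DC_NS + "title")]
        creators = [el.text for el in record.iter(DC_NS + "creator")]
        records.append(
            {"identifier": identifier, "titles": titles, "creators": creators}
        )
    return records


# Hand-written sample response (hypothetical data): only descriptive
# metadata is exposed, the confidential files stay in the local system.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:hospital.example.org:dataset-1</identifier>
        <datestamp>2017-03-15</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example cohort (metadata only)</dc:title>
          <dc:creator>Example Department</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(list_records_url("https://hospital.example.org/oai"))
    for rec in parse_records(SAMPLE_RESPONSE):
        print(rec["identifier"], rec["titles"], rec["creators"])
```

In practice you would not write this client yourself. The point is only that the local system exposes an OAI-PMH endpoint with descriptive metadata, and the harvesting side reads titles, creators, and identifiers from it without ever touching the data files.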

For data that isn't confidential, Dataverse does offer interactive browsing of tabular files using TwoRavens: http://guides.dataverse.org/en/4.6.1/user/data-exploration . And there's a tool under development called "Data Explorer"* that we'd like to support in the future.

Dataverse does track who changes a dataset over time, which helps with your need for traceability.

I hope this helps. I hope others jump in with more thoughts.

Also, please note that there are two installations of Dataverse in the Netherlands:

- https://dataverse.nl
- https://datasets.socialhistory.org

Thanks for your interest in Dataverse! Please keep the questions coming. You're also welcome to call in if you'd like: http://dataverse.org/community-calls

Phil

* See the long thread at https://groups.google.com/d/msg/dataverse-community/Nc8tX0s8lo8/EPTCua8pBgAJ


--
You received this message because you are subscribed to the Google Groups "Dataverse Users Community" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dataverse-community+unsub...@googlegroups.com.
To post to this group, send email to dataverse-community@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/dataverse-community/8551eeb5-e9ec-429f-aaf6-b0384de2d765%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



Mercè Crosas

Mar 15, 2017, 4:53:40 PM
to dataverse...@googlegroups.com
I think Dataverse could be a good fit. It would provide a way to manage and share the patient data to be reused for research - with additional documentation (if desired), metadata, versioning and terms of use support, and a DOI for each dataset to uniquely reference it. Most patient data in the U.S. falls under HIPAA regulation. To expand on Phil's comment, in Dataverse 5, Dataverse will integrate with DataTags to support this type of sensitive data (including many cases of HIPAA-type data).

Even though you want to keep the datasets in the repository private for obvious reasons, you could still make the citation information for each dataset, as well as other descriptive metadata, public. Others would then know that the dataset exists even though they can't access it, and you would be able to link the dataset to the publication that used the data. Dataverse supports linking a dataset with one (or more) publications, which helps integrate the research process and outputs - in fact, this is a common use of Dataverse repositories.

Merce


----------
Mercè Crosas, Ph.D., Chief Data Science and Technology Officer, IQSS, Harvard University
@mercecrosas mercecrosas.com


marion.w...@dans.knaw.nl

Mar 21, 2017, 6:54:17 AM
to Dataverse Users Community, michielm...@gmail.com
Dear Michiel,

DANS (an institute of KNAW and NWO) runs a Dataverse service for the Netherlands: DataverseNL.
See our website: https://dans.knaw.nl/en/about/services/archiving-and-reusing-data/DataverseNL?set_language=en

If you are interested, we could make an appointment to discuss all the possibilities of Dataverse for your needs.

Best, Marion Wittenberg
