register the data without uploading its copy to Dataverse?


Yury Katkov

Jun 17, 2014, 11:55:01 AM
to dataverse...@googlegroups.com
Hi everyone! 

I have petabytes of data in millions of files, all related to my study, that I want to release. Let's say all of this data is stored on a special FTP server inside the organization. I want to publish the data in Dataverse, but I don't want to upload all those petabytes to Dataverse and thereby duplicate them. Is it possible for Dataverse to just organize access to the data without storing it?

Cheers,
Yury

Eleni Castro

Jun 17, 2014, 3:35:39 PM
to dataverse...@googlegroups.com
Hi Yury,

Thanks for your interest in Dataverse! We have had requests like this in the past, and considering the size of your dataset it does make complete sense that you would not be able to load everything onto Dataverse. We would very much like to work with you to find a solution for making your data easily accessible via Dataverse.

Would it be possible to get more information from you on the guarantee that the data will be accessible in the long term from your FTP server inside your organization? Based on our membership with DataCite (we mint DOIs from them), the Joint Declaration of Data Citation Principles, and general data preservation best practices, we kindly ask that anyone linking out to their data from Dataverse rather than depositing the files directly into Dataverse should ensure that we can at least point to a persistently accessible location for the data on their organization's server, and whenever possible provide further details on how other users can get access to the data.

Please let me know if this would be possible, and we can work together directly to ensure that the information about how to access the data, and the description of the data (documentation/readme files), is made clear to our users.

Best regards,
Eleni


-- 

Eleni Castro

Research Coordinator, Data Acquisition and Archiving, Data Science

IQSS, Harvard University

http://www.iq.harvard.edu/people/eleni-castro 


We're redesigning Dataverse and want your feedback! Please check out our Beta Site.



--
You received this message because you are subscribed to the Google Groups "Dataverse Users Community" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dataverse-commu...@googlegroups.com.
To post to this group, send email to dataverse...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/dataverse-community/79ada9f1-1b14-4438-b49a-3b1ab0719044%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Yury Katkov

Jun 17, 2014, 6:50:58 PM
to dataverse...@googlegroups.com
Hello! 

I should clarify that I didn't mean the Harvard Dataverse installation in particular, but rather the ability of the Dataverse software to work with URLs to the files, without requiring us to upload the files themselves to Dataverse storage.

-----
Yury Katkov


Eleni Castro

Jun 18, 2014, 10:01:40 AM
to dataverse...@googlegroups.com
Hi Yury,

Apologies for misunderstanding.

The Dataverse software does support clickable URLs in its HTML-supported fields, which have a blue background (see screenshot below). One place where you could document and link out to your data in the Dataverse study is the "Description" field.
Inline image 1

I would also recommend filling out the Data Set Availability information as shown in the screenshot below, so your users know more about where and how they can access the data if it's not available in your Dataverse itself:
Inline image 2

Hope this information is helpful, and please let us know if you need any additional assistance.

Cheers,

Mercè Crosas

Jun 18, 2014, 11:22:04 AM
to dataverse...@googlegroups.com
Yury and others on the list,

Being able to register and catalog datasets in Dataverse while keeping flexibility over where the data storage resides is an important feature for the future of the Dataverse software. Some groups are already working on a proof of concept for a Dataverse architecture that will allow this type of configuration: our collaborators at Odum (University of North Carolina) are working on an abstraction layer to support different types of storage. In their case, they are integrating Dataverse with iRODS for their data storage through this abstraction layer. An extension of this architecture would let you publish a dataset in Dataverse but have all the data for that dataset stored in a location of your choice, other than the default Network File System. This architecture uses ModeShape.
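To make the idea of a storage abstraction layer concrete, here is a minimal sketch in Python. It is not Dataverse's actual design or API: the names `StorageBackend`, `LocalFileSystemBackend`, `RemoteUrlBackend`, and `DatasetRecord`, as well as the example FTP URL and paths, are hypothetical and chosen only for illustration. The sketch assumes that a dataset record keeps its metadata and a list of file keys, and asks a pluggable backend where each file actually lives.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Dict, List


class StorageBackend(ABC):
    """Hypothetical interface: maps a file key to the place it can be fetched from."""

    @abstractmethod
    def locate(self, key: str) -> str:
        """Return an access location (URL) for the given file key."""


class LocalFileSystemBackend(StorageBackend):
    """Files live on the repository's own file system (the default case)."""

    def __init__(self, root: str) -> None:
        self.root = root.rstrip("/")

    def locate(self, key: str) -> str:
        return f"file://{self.root}/{key}"


class RemoteUrlBackend(StorageBackend):
    """Files stay on an external server; only their URLs are registered."""

    def __init__(self, base_url: str) -> None:
        self.base_url = base_url.rstrip("/")

    def locate(self, key: str) -> str:
        return f"{self.base_url}/{key}"


@dataclass
class DatasetRecord:
    """Catalog entry: metadata plus file keys, independent of where bytes are stored."""

    title: str
    backend: StorageBackend
    file_keys: List[str] = field(default_factory=list)

    def file_locations(self) -> Dict[str, str]:
        # The record never touches the bytes; it only resolves locations.
        return {key: self.backend.locate(key) for key in self.file_keys}


# Hypothetical example: the same catalog logic over two different backends.
remote = DatasetRecord(
    title="Example study (data kept on the organization's FTP server)",
    backend=RemoteUrlBackend("ftp://ftp.example.org/study/"),
    file_keys=["raw/part-0001.dat", "README.txt"],
)
local = DatasetRecord(
    title="Example study (data uploaded to the repository)",
    backend=LocalFileSystemBackend("/data/"),
    file_keys=["README.txt"],
)

for key, location in remote.file_locations().items():
    print(key, "->", location)
```

The point of the design is that the catalog code stays the same whichever backend is plugged in, so registering petabytes that remain on an external server would only require a backend that knows how to produce their locations.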

In fact, there is a talk about this today at our iRODS-Dataverse meeting. We'll share the slides for this talk on this Dataverse list. Other groups using Dataverse (the Dutch Dataverses and Scholars Portal in Canada) are also interested in this type of functionality. I encourage you to collaborate with us to define the requirements and to give us feedback on the direction we are taking with this.

Merce


Mercè Crosas, Ph.D.
Director of Data Science, IQSS
Harvard University

