Feature Request Discussion: Access (from/to) Cloud Storage services (e.g., Dropbox, GitHub)

August Muench

Dec 5, 2013, 11:17:23 AM
to dataverse...@googlegroups.com
I'd like to kick off another end user feature request discussion (previous threads have included user interface changes, access to data files and metadata models/templates).

In this one I'd like to discuss access to/from cloud storage services in Dataverse.

The specific use case I had in mind was one where scientists/researchers use a cloud storage service as a "workbench" for collaborative (also: real time, versioned/unversioned, prepublication) data analysis. 

The types of interactions that I might envision between these services/workbenches and Dataverse include:

  1. push/freeze: from the cloud service to a study.
  2. pull/unfreeze: from Dataverse to a workbench

I'm happy to describe these interactions in more detail for any of the other 100 Dataverse "users" on this forum who don't get what I'm talking about. I think it more likely that a subset (of non-Dataverse developers) know exactly what I'm talking about and will hopefully jump in to flesh this out.

Cheers,

 - Gus

PS: As a reminder, I'm coming at this from the "astronomer's" perspective. Some astronomer developers have started building such services. Here is a list:

Philip Durbin

Dec 5, 2013, 11:26:36 AM
to dataverse...@googlegroups.com
I'm reminded of this blog post, especially the phrase "version of record":

Citing Bytes - Adventures in Data Citation: Frozen Datasets are
Useful, So are Active ones -
http://citingbytes.blogspot.com/2013/11/frozen-datasets-are-useful-so-are.html

Gus, you seem to be saying that SciDrive is where the "active" dataset
would be stored and Dataverse is where the "frozen" dataset would be
stored.

Phil
> --
> You received this message because you are subscribed to the Google Groups
> "Dataverse Users Community" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to dataverse-commu...@googlegroups.com.
> For more options, visit https://groups.google.com/groups/opt_out.



--
Philip Durbin
Software Developer for http://thedata.org
http://www.iq.harvard.edu/people/philip-durbin

Merce Crosas

Dec 6, 2013, 11:39:26 AM
to dataverse...@googlegroups.com
Gus,

As you know, I'm fully supportive of this idea. A note of caution, though: it would be good to formally capture the changes in a dataset when a new version of the dataset is pushed back to Dataverse from the workbench. This could take the form of a note for version 2, but we might be losing some of the intermediate steps that happened in the workbench, which could be useful for the provenance trail.

Still, your suggestion seems a good compromise between the researcher workflow and the data publishing workflow.

Merce


--
Mercè Crosas, Ph.D.
Director of Data Science, IQSS
Harvard University



Stephen Marks

Dec 6, 2013, 3:17:50 PM
to dataverse...@googlegroups.com
I like the idea of this functionality as well. I'm always supportive of meeting the researchers "where they are", process-wise, and with DV's versioning you could almost provide a CVS-like model for datasets. Maintaining provenance is definitely a concern. I wonder if there would be some way to use the UNF algorithm to generate a kind of 'diff' for the data, which could then be used as a default comment for the commit back to DV.
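
To make the 'diff' idea a bit more concrete, here is a rough Python sketch under some loud assumptions. It does not implement UNF: `fingerprint` is a hypothetical stand-in that just takes a SHA-256 of the raw file bytes, whereas a real UNF would be computed on the data values themselves. The function names and the `{filename: bytes}` snapshot format are also made up for illustration and are not part of any Dataverse API.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    # ASSUMPTION: plain SHA-256 of the raw bytes, used only as a
    # stand-in for a per-file fingerprint. This is NOT the UNF algorithm.
    return hashlib.sha256(content).hexdigest()


def diff_versions(old: dict, new: dict) -> dict:
    """Compare two hypothetical {filename: bytes} snapshots by fingerprint."""
    old_fp = {name: fingerprint(data) for name, data in old.items()}
    new_fp = {name: fingerprint(data) for name, data in new.items()}
    return {
        "added": sorted(new_fp.keys() - old_fp.keys()),
        "removed": sorted(old_fp.keys() - new_fp.keys()),
        "changed": sorted(
            name
            for name in old_fp.keys() & new_fp.keys()
            if old_fp[name] != new_fp[name]
        ),
    }


def default_comment(diff: dict) -> str:
    """Turn the diff into a one-line default version note."""
    parts = [f"{kind}: {', '.join(names)}" for kind, names in diff.items() if names]
    return "; ".join(parts) if parts else "no changes"


# Two made-up snapshots: the workbench copy (v2) versus the frozen copy (v1).
v1 = {"catalog.csv": b"id,mag\n1,12.3\n", "notes.txt": b"draft"}
v2 = {"catalog.csv": b"id,mag\n1,12.4\n", "readme.txt": b"hello"}
print(default_comment(diff_versions(v1, v2)))
```

A comment built this way only says which files differ between the two versions. It says nothing about the intermediate steps taken in the workbench, so it would complement a provenance trail rather than replace one.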

Really, I'm just making stuff up on a Friday afternoon. =)

s


