How sustainable is the Dataverse system? Are there plans to move to a decentralised system?

mrdav...@gmail.com

Feb 23, 2016, 4:53:11 AM

to Dataverse Users Community

Hi all,

I'm an epidemiologist in the UK, and I recently discovered the Dataverse system. I think it's a very impressive and positive development. I'd like to use such systems in my future work, and potentially to encourage their use to aid replication and avoid duplication in the cohort studies I work on (eg, for syntax sharing; dataset sharing could be problematic given guidelines on access). However, I wanted to ask about the project's funding and long-term plans, since I understand that data can be housed at Harvard.

I've also recently become aware of the potential strength of decentralised systems to enable data sharing. These have been used for financial purposes (eg, cryptocurrency), with developments in publishing relevant to science (eg, nanopublications https://www.w3.org/wiki/images/c/c0/HCLSIG$$SWANSIOC$$Actions$$RhetoricalStructure$$meetings$$20100215$cwa-anatomy-nanopub-v3.pdf). It seems that a decentralised system could be a very promising means of sharing syntax and possibly also data: given the lack of central infrastructure needed, it could be sustainable in the long term. For example, the MaidSafe network (due for release this year) could be used (http://maidsafe.net). I wonder what you think.

Best wishes

David

---

David Bann PhD

Lecturer, Co-Investigator of the 1958 British birth cohort study (NCDS) and Research Officer

Centre for Longitudinal Studies

UCL Institute of Education

0207 911 5426 | www.cls.ioe.ac.uk

Philip Durbin

Feb 23, 2016, 8:19:44 AM

to dataverse...@googlegroups.com

There's a bit about funding sources at http://dataverse.org/about

Data *can* be housed at Harvard, but to be clear, Dataverse is open source, and the map at http://dataverse.org shows some installations around the world.

We often talk of "harvesting" as a solution to distribute the *metadata* of datasets in a decentralized manner across Dataverse installations. The files themselves are not harvested, however. This feature is being tracked at https://github.com/IQSS/dataverse/issues/813

For files themselves perhaps this "Support for Duplication of Data Collections Across Repositories" is a good one to leave comments on: https://github.com/IQSS/dataverse/issues/2025 . I found it by searching for "LOCKSS" which stands for "Lots of Copies Keep Stuff Safe": http://www.lockss.org

MaidSafe seems interesting from a quick look. I can't tell if it's open source or not.

Decentralization is a somewhat hard problem of course. Git users tend to laugh that git is a distributed version control system and yet everyone freaks out when GitHub is down. This "Git Distributed Human Interface" doc at https://docs.google.com/a/simple.com/document/d/1ZD5zkT7yEkuCJUyexVbjNzhNKdfdmNfUPYAKoWhae7c/edit#heading=h.1qr6auawov6l talks about ways to avoid this problem but I find it complicated.

Anyway, welcome to the Dataverse community, David! Thanks for sharing your thoughts. I hope this helps a little at least. :)

Phil

Philip Durbin
Software Developer for http://dataverse.org
http://www.iq.harvard.edu/people/philip-durbin
