Apologies if this is a FAQ, but I'm not seeing answers in
https://osf.io/faq/ or in an all-too-brief scan of
https://groups.google.com/forum/#!forum/openscienceframework supplemented by a close reading of the thread that contains
Matus Fri, 17 Jan 2014 09:28:08 -0800 (PST)
> I have several gigabytes of eyetracking data. Would you host these?
What size datasets can OSF (or Linode, or any of the other repositories with which you integrate) host? How, and at what cost? Why I ask:
I'm a grad student seeking to manage pre-publication data for my work on an Eulerian atmospheric model. Eulerian ~= FEM: one divides the 3D atmosphere into boxes (aka voxels) within which matter and energy dis/appear and between which matter and energy are transported. The simulation on which I'm currently working (with a federal agency who are my only real support) involves 50+ chemical species, the usual meteorological variables, and land-use and agricultural variables over 299 "rows" (aka divisions of the geographical domain in the y/lat direction), 459 "columns" (x/lon), 24 vertical layers (soon to be 35), and hourly timesteps over a year-long run (plus 10-day spinup).
Floats add up :-) Many of my inputs are hundreds of megabytes (though many are smaller--emissions and surface interactions are 2D and often have much longer timesteps) for each of 376 days. Accordingly, most of our data never leaves our clusters. Unfortunately, most also never gets metadata, or any other exposure to the world for, e.g., replication. (An activity to which, as a former professional coder, I am more accustomed to referring as "testing" :-)
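For a rough sense of scale, here is a back-of-envelope sketch of that arithmetic. The grid dimensions and run length are the ones quoted above; everything else is an assumption made for illustration: 4-byte floats, every one of 50 species stored as a full 3D field at every hourly step, no compression, and no file-format overhead. The function and constant names are hypothetical.

```python
# Back-of-envelope size estimate for gridded model data.
# Grid dimensions and run length come from the post; single-precision
# floats, full hourly 3D storage for every species, and all names
# below are assumptions for illustration only.

ROWS, COLS, LAYERS = 299, 459, 24
SPECIES = 50          # "50+" in the post, so this is a lower bound
HOURS_PER_DAY = 24
DAYS = 376            # as quoted in the post
BYTES_PER_FLOAT = 4   # assumption: single precision


def cells_per_field(rows=ROWS, cols=COLS, layers=LAYERS):
    """Number of grid boxes in one 3D field at one timestep."""
    return rows * cols * layers


def bytes_per_species_day(layers=LAYERS, bytes_per_float=BYTES_PER_FLOAT):
    """Bytes for one species, one day of hourly 3D fields."""
    return cells_per_field(layers=layers) * HOURS_PER_DAY * bytes_per_float


def bytes_per_run(species=SPECIES, days=DAYS, layers=LAYERS):
    """Bytes for all species over the whole run."""
    return bytes_per_species_day(layers=layers) * species * days


print(f"cells per 3D field:   {cells_per_field():,}")
print(f"one species, one day: {bytes_per_species_day() / 1e6:.1f} MB")
print(f"all species, full run: {bytes_per_run() / 1e12:.2f} TB")
```

Under those assumptions this gives about 316 MB per species per day, which is at least consistent with "hundreds of megabytes", and roughly 5.9 TB for 50 species over the whole run. Calling `bytes_per_run(layers=35)` gives the corresponding figure for the planned 35-layer grid.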
For that matter, just *finding* data--in house!--is often a major PITA. E.g., the only reason I'm writing this now (and not earlier--I was privileged to meet Jeff and Josh in Nov 2013 and meant to follow up then) is that I'm blocked from continuing to debug my run until my boss returns from her travels and gets some files off her desktop :-( I have for a while (though only intermittently) been looking @ data repositories (e.g. KNB
http://knb.ecoinformatics.org/m/
) but "still haven't found what I'm looking for" (to paraphrase some Irish guy), and am hampered by
* having money ~= 0.
* having pubs==0. As listizens have noted, there are some great services for reposing data for published papers, which helps me not.
* time pressure
Aesthetically, I would also very much like to have data and code (and, ultimately, an executable paper--e.g., knitr document or IPython notebook) in the same conceptual space, which means I value ease of integration with
https://bitbucket.org/tlroche/profile/repositories
(on which they have graciously hosted my smallest datasets and some figures). From what I see @
https://osf.io/getting-started/ youse have put good thought into your UI, but some of us (at least Matus and me) have a major interest in storage, about which details seem to be lacking. (Esp. the details that begin with '$' and end with 'B' :-)
TIA, Tom Roche <Tom_...@pobox.com>