Comparison: Git, Github, OSF, Databrary and other project management options


matus.s...@privatdemail.net

Jan 17, 2014, 12:28:08 PM
to openscienc...@googlegroups.com
My problem with the GitHub add-on is now resolved, but I have an additional question, so I decided to make it a separate topic.

I'm using Git and Github to manage my projects. I'm satisfied so far, but I'm also open to other suggestions.

Could you compare OSF to Github and other project management options? What are the benefits, what are the drawbacks?

I'm using GitHub and have had only a brief glance at OSF. I see that OSF allows unlimited private repositories for free, while on GitHub you have to pay for private repositories.

But OSF lacks git functionality, so I can't just clone my repository to OSF and then pull/push from it, correct? Somewhere on this list I read that OSF uses git for version control under the hood, so this should be possible, if not now then in the future.

OSF also offers preregistration snapshots, but I guess this is just git's release functionality.

Any other benefits of OSF? How about large datasets? I have several gigabytes of eyetracking data. Would you host these?

What are other options for project management? I saw the Databrary project (http://www.databrary.com/). Their focus is narrower (developmental psychology data), but we also have some infant eyetracking data, so where should we go with those?

These options also provide additional functionality for rendering certain data formats. GitHub can display and edit code, or in fact any text file (with syntax highlighting), but it doesn't handle binaries well. Databrary offers some video editing and video coding software. Does OSF offer any file editing functionality?

Anyway, why do you prefer OSF?

Best,

Matus

sheila miguez

Jan 17, 2014, 12:41:34 PM
to openscienc...@googlegroups.com
Good questions. Dataverse is another project to look at, and you can see that the OSF group is working on integrating with it: https://github.com/CenterForOpenScience/openscienceframework.org/issues/112

I noticed that because I am also looking at using Dataverse for a project of mine, <http://researchcompendia.org>.

This is something I think about, so I have a wiki page where I plopped a lot of links to services that provide API documentation, <https://github.com/researchcompendia/researchcompendia/wiki/brainstorming-external-services#wiki-archiving>

I haven't played with http://www.datadryad.org/ yet, but they sound cool because they let you perform Solr searches against data sets.

Dataverse allows some queries that include some statistical analysis, which is also cool.




--
You received this message because you are subscribed to the Google Groups "Open Science Framework" group.
To unsubscribe from this group and stop receiving emails from it, send an email to openscienceframe...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.



--
sheila

Brian Nosek

Jan 17, 2014, 12:42:40 PM
to Open Science Framework
GitHub is excellent. We use it extensively at COS (https://github.com/CenterForOpenScience).

Considering the features where GitHub and OSF overlap, GitHub does them far better and with more user control; OSF is deliberately simplified on those aspects of the interface. The purpose of pursuing the GitHub add-on is to expose its great features to OSF users. [As you point out, there is more that can be done to enrich this in the future.]

The OSF model is to connect services so that the user has maximum flexibility to organize their workflow with the tools and services best suited to their needs.  For example, some of the next add-on releases will be data repositories like Amazon S3 and Dataverse (we've been talking to Databrary about this too).  So, in your OSF project, you could add a component of whichever repository fits best for your data needs, and then link that data to other services that get incorporated (e.g., analytic and visualization tools).

Finally, OSF will be an open infrastructure so that add-ons and services can be linked together by the community as there is interest and expertise available to get it done.  

In short, I don't see OSF as a replacement for other tools, but a means of helping users put those tools together to advance their research progress.

Brian




Hilmar Lapp

Jan 17, 2014, 1:31:20 PM
to openscienc...@googlegroups.com

On Jan 17, 2014, at 12:41 PM, sheila miguez wrote:

> I haven't played with http://www.datadryad.org/ yet but I think they sound cool because they let you perform solr searches against data sets.

Note that the datadryad.org repository wouldn't be very suitable to use for managing data pre-publication. That's because data to be submitted have to be associated with a publication, or (depending on the journal) a manuscript in review.

Having said that, the repository software is open source and is available from https://github.com/datadryad/dryad-repo. You could stand up your own instance of it and use different rules for what data gets accepted.

-hilmar

-- 
Hilmar Lapp -:- lappland.io

Tom Roche

Jan 20, 2014, 11:34:20 PM
to openscienc...@googlegroups.com

Apologies if this is a FAQ, but I'm not seeing answers in https://osf.io/faq/ or in an all-too-brief scan of https://groups.google.com/forum/#!forum/openscienceframework, supplemented by a close reading of this thread, in which I found:

Matus Fri, 17 Jan 2014 09:28:08 -0800 (PST)
> I have several gigabytes of eyetracking data. Would you host these?

What size datasets can OSF (or Linode, or other repositories with which you integrate) host? How, and at what cost? Why I ask:

I'm a grad student seeking to manage pre-publication data for my work on an Eulerian atmospheric model. Eulerian ~= FEM: one divides the 3D atmosphere into boxes (aka voxels) within which matter and energy dis/appear and between which matter and energy are transported. The simulation on which I'm currently working (with a federal agency who are my only real support) involves 50+ chemical species, the usual meteorological variables, and land-use and agricultural variables over 299 "rows" (aka division of the geographical domain in the x/lat direction), 459 "columns" (y/lon), 24 vertical layers (soon to be 35), and hourly timesteps over a year-long run (plus 10-day spinup).

Floats add up :-) Many of my inputs are hundreds of megabytes (though many are smaller--emissions and surface interactions are 2D and often have much longer timesteps) for each of 376 days. Accordingly, most of our data never leaves our clusters. Unfortunately, most also never gets metadata, or any other exposure to the world for, e.g., replication. (An activity to which, as a former professional coder, I am more accustomed to referring as "testing" :-)
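To put some numbers on "floats add up", here is a rough sketch of the 3D output volume implied by the grid described above. This is a back-of-envelope estimate only: the 4-byte float size and the assumption that all 50 species are written at every hourly timestep are mine, and real model output will vary with file format, compression, and which variables are actually written.

```python
# Back-of-envelope sizing for the grid described above: 299 rows x 459 columns
# x 24 layers, 50 chemical species, hourly timesteps over a year-long run
# (plus 10-day spinup, ~376 days). Assumes uncompressed 4-byte floats.
rows, cols, layers = 299, 459, 24
species = 50
bytes_per_float = 4

cells = rows * cols * layers              # 3,293,784 grid cells
one_field = cells * bytes_per_float       # one 3D variable, one timestep
per_hour = one_field * species            # all species, one hourly step
per_run = per_hour * 24 * 376             # full run, 3D species fields only

print(f"one 3D field:  {one_field / 2**20:6.1f} MiB")   # ~12.6 MiB
print(f"hourly step:   {per_hour / 2**20:6.1f} MiB")    # ~628 MiB
print(f"full run (3D): {per_run / 2**40:6.1f} TiB")     # ~5.4 TiB
```

Even before counting the 2D emissions and surface fields, that is several terabytes per run, which is consistent with the "hundreds of megabytes per day-file" above and with why the data rarely leaves the cluster.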

For that matter, just *finding* data--in house!--is often a major PITA. E.g., the only reason why I'm writing this now (and not earlier--I was privileged to meet Jeff and Josh in Nov 2013 and meant to follow up then) is that I'm blocked from continuing to debug my run until my boss returns from her travels and gets some files off her desktop :-( I have for a while (though only intermittently) been looking @ data repositories (e.g. KNB

http://knb.ecoinformatics.org/m/

) but "still haven't found what I'm looking for" (to paraphrase some Irish guy), and am hampered by

* having money ~= 0.

* having pubs==0. As listizens have noted, there are some great services for reposing data for published papers, which helps me not.

* time pressure

Aesthetically, I would also very much like to have data and code (and, ultimately, an executable paper--e.g., knitr document or IPython notebook) in the same conceptual space, which means I value ease of integration of

https://bitbucket.org/tlroche/profile/repositories

(on which they have graciously hosted my smallest datasets and some figures). From what I see @ https://osf.io/getting-started/ youse have put good thought into your UI, but some of us (at least Matus and I) have a major interest in storage, about which details seem to be lacking. (Esp the details that begin with '$' and end with 'B' :-)

TIA, Tom Roche <Tom_...@pobox.com>

Tim Bates

Jan 21, 2014, 6:08:34 AM
to openscienc...@googlegroups.com
> Matus Fri, 17 Jan 2014 09:28:08 -0800 (PST)
>> I have several gigabytes of eyetracking data. Would you host these?

Sounds like a job for Amazon Web Services or Google's parallel compute cloud. I don't think they donate space for free, but perhaps they'd donate some r/w and compute time to science/OSF?

t

http://aws.amazon.com/s3/

https://cloud.google.com/products/compute-engine/


Jeffrey Spies

Jan 21, 2014, 9:30:07 AM
to openscienc...@googlegroups.com
Good to hear from you again, Tom.

Tim is right, and we're rolling out an S3 add-on shortly as well as
talking to Amazon about storage possibilities. I have conversations
arranged with a few other large providers as well. We also have a
dataverse add-on in the works, and they'd be able to host projects
with those requirements. I'm not sure what Figshare's limits are, but
they're add-on is also being developed and will rollout
shortly--probably around the same time as S3.

Jeff.

sheila miguez

Jan 21, 2014, 11:12:21 AM
to openscienc...@googlegroups.com

On Tue, Jan 21, 2014 at 8:30 AM, Jeffrey Spies <je...@cos.io> wrote:
> with those requirements. I'm not sure what Figshare's limits are, but

I think they are 1GB for free.


--
sheila