I want to propose a data release policy for Biostar.
As you all know, one of my goals is to make running Biostar really
easy. The downside is that we could end up with lots of identical
clones that have no purpose other than running ads on existing content.
So right now I am thinking that the full datadumps should be released
on, say, a six-month delay.
Administrators would have access to daily datadumps. Anyone else who
wants to run their own analyses could request an up-to-date datadump
from the admins.
These would be full PostgreSQL datadumps (without the OpenID
components) that can be loaded with a single command in about 5
minutes, like so:
./biostar.sh pgimport
The user emails are included in the datadump. We could work on
anonymizing them if people feel strongly about that. Of course, once
they are anonymized, some functionality (email notifications etc.)
could not be implemented anymore.
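
If we do go the anonymization route, one possible approach is to
replace each address with a keyed hash before the dump is published.
The snippet below is only a sketch of that idea, not something Biostar
does today: the function name anonymize_email, the secret key and the
example.invalid placeholder domain are all assumptions made up for
illustration.

```python
import hashlib
import hmac


def anonymize_email(email: str, secret: bytes) -> str:
    """Return a stable placeholder address for the given email.

    Hypothetical helper: the name, the key handling and the
    placeholder domain are illustrative assumptions.
    """
    # Normalize so "A@b.org " and "a@b.org" map to the same placeholder.
    normalized = email.strip().lower().encode("utf-8")
    # A keyed hash (HMAC) makes it hard to confirm a guessed address
    # without knowing the secret, unlike a plain unsalted hash.
    digest = hmac.new(secret, normalized, hashlib.sha256).hexdigest()
    # ".invalid" is a reserved top-level domain, so no mail can be
    # delivered to the placeholder by accident.
    return f"user-{digest[:16]}@example.invalid"
```

The same address always maps to the same placeholder, so per-user
analyses would still work on the public dump, while notifications
would only be possible on the original, non-anonymized database.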
Feel free to comment,
best,
Istvan
--
Istvan Albert
Associate Professor, Bioinformatics
Pennsylvania State University
http://www.personal.psu.edu/iua1/