data release policy

Istvan Albert

Apr 4, 2012, 11:40:18 AM
to biostar...@googlegroups.com
Hello Everyone,

I want to propose a data release policy for Biostar.

As you all know, one of my goals is to make running Biostar really easy.
The downside is that we could end up with lots of identical clones
that have no purpose other than running ads on existing content.
So right now I am thinking that the full datadumps should be released
on, say, a six-month delay.

Administrators would have access to daily datadumps. Anyone else who
wants to run their own analyses could request an up-to-date
datadump from the admins.

These would be full postgresql datadumps (without the OpenID
components) that can be loaded via a single command in about 5 minutes
like so:

./biostar.sh pgimport

The user emails are included in the datadump. We could work on
anonymizing them if people feel strongly about that. Of course, once
anonymized, some functionality (email notifications etc.) could not
be implemented anymore.
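
To make the anonymizing idea concrete, here is a minimal sketch of one
possible approach: replacing each email with a keyed hash before export.
This is not how Biostar does it. The function name, the secret key and
the "@anonymized.invalid" suffix are all assumptions made up for
illustration.

```python
import hashlib
import hmac


def pseudonymize_email(email: str, secret_key: bytes) -> str:
    """Return a stable token that stands in for an email address.

    Hypothetical helper: the same email and key always give the same
    token, so rows can still be joined on it, but the address cannot
    be read back from the dump without the key.
    """
    # Normalize so that "A@B.org" and "a@b.org " map to the same token.
    normalized = email.strip().lower().encode("utf-8")
    digest = hmac.new(secret_key, normalized, hashlib.sha256).hexdigest()
    # Keep a short prefix and use the reserved ".invalid" TLD so the
    # result can never be mistaken for a deliverable address.
    return digest[:16] + "@anonymized.invalid"
```

A token like this keeps users distinguishable in the dump, but mail can
no longer be sent to it, which is the loss of email notifications
mentioned above.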

Feel free to comment,

best,

Istvan


--
Istvan Albert
Associate Professor, Bioinformatics
Pennsylvania State University
http://www.personal.psu.edu/iua1/

Bio X2Y

May 5, 2012, 4:27:12 PM
to biostar...@googlegroups.com
Hi Istvan,

What personal information is included in the data dumps?

I feel that the dumps should not disclose any information that isn't currently publicly displayed on the site.

For example, the following details should not be disclosed:

(a) e-mail address (unless explicitly provided by the user for their profile page)
(b) IP addresses
(c) login dates/times (other than the single login needed for the "Last seen" functionality)

I don't believe users have given their permission for these details to be disclosed.
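
As a rough sketch of what I mean, the public dump could be built from an
allow-list of fields, with the email included only when the user has
opted in. All field names here (display_name, show_email, ip_address and
so on) are hypothetical and not taken from the actual Biostar schema.

```python
# Hypothetical allow-list: only fields already shown publicly on the site.
PUBLIC_FIELDS = {"id", "display_name", "reputation", "last_seen"}


def to_public_record(user: dict) -> dict:
    """Return a copy of a user record that is safe for a public dump."""
    # Anything not on the allow-list (IP addresses, login history,
    # OpenID data, ...) is dropped by default.
    public = {k: v for k, v in user.items() if k in PUBLIC_FIELDS}
    # The email is kept only if the user chose to show it on their profile.
    if user.get("show_email"):
        public["email"] = user.get("email")
    return public
```

An allow-list is safer than a block-list here, because a column added to
the database later stays private until someone decides to publish it.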

Thanks.

Istvan Albert

May 5, 2012, 7:17:53 PM
to biostar...@googlegroups.com
On Sat, May 5, 2012 at 4:27 PM, Bio X2Y <bio...@gmail.com> wrote:

> What personal information is included in the data dumps?

The admin dump is a full backup of the database with all information
so that the site can be recreated if necessary.

I guess the public dump will probably need to be anonymized to some extent.

best,

Istvan.