cost of biological databases (vs wikis)?

James Cheney

Apr 15, 2011, 1:53:39 PM

to BioWiki

Hi,

Does anyone on the list happen to know concrete/citable figures about
how much is spent on building or maintaining biological databases
(either worldwide or by particular countries or organizations)? I
have heard figures quoted of $25+ million for the NIH in the US, or
"billions" in various places, without links to sources.

Sorry if this is off topic. I imagine one motivation for using wikis
for biological databases is to cut down this cost. I'd also be very
interested in any figures about the relative costs of wiki vs. other
development models.

Thanks for any pointers,
--James

--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

barend mons

Apr 15, 2011, 2:15:57 PM

to James Cheney, BioWiki

There is a report from ELIXIR on this issue (about a year old). I will
try to find it, no guarantee.
It must be somewhere on their website: http://www.elixir-europe.org/page.php

**************************************
Dr. Barend Mons
Scientific Director
Support and external relations
Netherlands Bioinformatics Centre (NBIC)
http://www.nbic.nl
and Biosemantics Group
Leiden University Medical Centre
http://www.biosemantics.org
Mobile: +31-624879779
E-mail: Baren...@nbic.nl
Phone: +31 (0)24 36 19 500
Fax: +31 (0)24 89 01 798

Mail: Netherlands Bioinformatics Centre
260 NBIC
P.O. Box 9101
6500 HB Nijmegen

Visiting address:
LUMC building 2, Einthovenweg 20
2333 ZC Leiden, The Netherlands

> --
> BioWiki mailing list:
> biow...@googlegroups.com
>
> Subscription & archives:
> http://groups.google.com/group/biowiki-l
>
> Unsubscribe:
> biowiki-l+...@googlegroups.com

barend mons

Apr 15, 2011, 2:24:50 PM

to James Cheney, BioWiki

In fact, it is the report on WP2 (under Reports). Hope this helps.

Dan Bolser

Apr 15, 2011, 3:11:15 PM

to James Cheney, BioWiki

I don't think this is off topic!

I know the budget of Wikipedia is on the order of a few million
dollars a year, which obviously pales into insignificance when you
start talking about 'billions'... I would imagine that you can source
this particular figure from the Wikimedia Foundation.

TBH, I think the real advantage of wiki vs. 'conventional biological
database' isn't one of cost. Both are very cheap to maintain compared
to the cost of 'real' biological research. The difference is that wiki
databases are 'owned' by the community in some sense (in principle at
least), whereas traditional biological databases are 'owned' by
particular groups, departments, institutions or universities.

My favourite example is the BIND database of protein-protein
interaction, which essentially died when the funding ran out. i.e.
Group X fails to get funding for project y, database z dies. I think
wiki makes this less likely.

HTH,
Dan.

barend mons

Apr 15, 2011, 3:38:37 PM

to Dan Bolser, James Cheney, BioWiki

Hear, hear!!!

Sriram Kosuri

Apr 15, 2011, 3:44:39 PM

to barend mons, Dan Bolser, James Cheney, BioWiki

It might be worth sending an email to NCBI, as they recently made the decision to stop the Short Read and Trace Archives ( http://www.ncbi.nlm.nih.gov/sra ).

Sri

barend mons

Apr 15, 2011, 3:58:28 PM

to Sriram Kosuri, Dan Bolser, James Cheney, BioWiki

What an unbelievable coincidence: Atul Butte just showed this very screenshot at the Sage Bionetworks meeting (where I am sitting now).

So, do we need to convince them to continue funding, or to go the wiki way?

Or to 'publish data' (see attached)?

CM-ng0411_web.pdf

ED-ng.800-WEB.pdf

Mike Cariaso

Apr 15, 2011, 4:32:50 PM

to BioWiki

Yes, wikis, like all software, can have excellent return on investment.
But I think the SRA shutdown may be unfairly viewed as evidence of
a budget crunch. I think it is instead evidence of a responsibly spent
budget.

SRA was useful at the dawn of NGS. Among other things, it was a common
collection of real data from various emerging formats. It was
cited in
http://nar.oxfordjournals.org/content/early/2009/12/16/nar.gkp1137.full
to show how the FASTQ format had been allowed to diverge. My
impression was that the SRA would also capture the raw data from the
Heliscope, Polonator, Nanopore, Zs, ... and whatever came next.

But for the major platforms it has just become too cheap to produce the
data; storage costs have become the new bottleneck. Many labs don't
permanently archive all of their own primary data. Providing a hot
backup of every run of every sequencer, everywhere, forever is expensive
and of limited value. They seem to be interested in keeping the
existing content online, but stopping new submissions. Seems pretty
reasonable to me.

I'd like to see them also accept at least 1M sequences from any new
platforms, but I think it is right to close submissions from the
well-established platforms.

--
Mike Cariaso
http://www.cariaso.com

Andrew Su

Apr 15, 2011, 4:45:43 PM

to James Cheney, BioWiki

I recently queried the NIH CRISP database for "model organism
database" and seem to remember coming up with a number in that range.
I could get you the firm number/results next week.

Andrew

Evelo Chris (BIGCAT)

Apr 15, 2011, 5:55:01 PM

to Andrew Su, James Cheney, BioWiki

Well I think I can make some kind of estimate for WikiPathways.

On our side the investment so far was about 1.5 full PhD students, meaning 6 years of PhD salary. Counting salary, taxes and overhead, that is around 75.000 euro/year, or 450.000 total. In the US, at least half of that, and probably about the same, was invested. So that brings the total so far to between 700K and 1M. That is not counting investments by Google through GSoC and people from other groups contributing parts of the code (and not at all counting time invested by people to produce pathways).
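
For anyone who wants to redo this back-of-the-envelope estimate with their own numbers, here is a minimal Python sketch of the arithmetic above. The function and variable names are made up for illustration, and it assumes the US contribution can simply be expressed as a fraction (0.5 to 1.0) of the European amount, with no currency conversion, which is a simplification.

```python
def estimate_total_cost(person_years, cost_per_year,
                        partner_share_low=0.5, partner_share_high=1.0):
    """Return (own_cost, total_low, total_high) for a two-site project.

    The partner site's investment is assumed to lie between
    partner_share_low and partner_share_high times the own-site cost.
    """
    own_cost = person_years * cost_per_year
    total_low = own_cost * (1 + partner_share_low)
    total_high = own_cost * (1 + partner_share_high)
    return own_cost, total_low, total_high


# Figures quoted in the post: 6 PhD-years at roughly 75,000 euro per year.
own_cost, total_low, total_high = estimate_total_cost(6, 75_000)
print(f"own site: {own_cost:,.0f}")
print(f"combined: {total_low:,.0f} to {total_high:,.0f}")
```

With the quoted inputs the combined figure comes out somewhat below the rounded "700K to 1M" range, which is consistent with that range being a loose estimate rather than an exact sum.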

Best wishes, Chris

James Cheney

Apr 16, 2011, 1:52:09 PM

to Andrew Su, BioWiki

Thanks, I'd really appreciate that since I don't have access to that
data. Also, I'll summarize any responses sent to me off-list, as well
as anything I find offline.

The context of this question is that I am trying to find quantitative
evidence (ideally, citable) of the significant costs attached to
biological data and annotation that could be reduced by developing
well-targeted general purpose tools (including wikis, among other
possibilities). I've been finding that many fellow computer
scientists whose work might bear on the problem do not know the
conventional wisdom about the cost of biological data, and I have only
heard secondhand figures, making it hard to persuade others that this
is an important problem.

As other responses have pointed out, there are many different kinds of
data/databases, some of which (e.g. high-throughput sequencing data)
are no longer cost-effective to store given that the data can be
regenerated on demand more cheaply. I'm most interested in those for
which this is not the case (e.g. curated databases containing
community or expert annotations).

--James
