[tt] NYT: How to Share Scientific Data

Eugen Leitl

Aug 14, 2013, 5:31:00 AM
to science-libe...@googlegroups.com
----- Forwarded message from Frank Forman <che...@panix.com> -----

Date: Tue, 13 Aug 2013 21:20:06 +0000 (GMT)
From: Frank Forman <che...@panix.com>
To: Transhuman Tech <t...@postbiota.org>
Subject: [tt] NYT: How to Share Scientific Data

How to Share Scientific Data


Stewart Brand, the founder of the Whole Earth Catalog and a Silicon
Valley muse, once said that information wanted to be free and
expensive, simultaneously. That paradox is increasingly haunting the
world of modern science.

A deluge of digital data from scientific research has spawned a
controversy over who should have access to it, how it can be stored
and who will pay to do so.

The matter was the subject of discussion after the journal Science
published a paper on Thursday by Francine Berman, a computer
scientist at Rensselaer Polytechnic Institute who is a leader of a
group that focuses on research data, and Vinton Cerf, the vice
president of Google.

The paper calls for a different culture around scientific data based
on acknowledging that costs must be shared. It also explores
economic models that would involve the support of various scientific
communities: public, private and academic.

In an interview, Dr. Cerf said storing and sharing digital
information was becoming a "crucial issue" for both public and
private institutions.

The debate is likely to accelerate next week when federal agencies
are expected to file proposals for how they would "support increased
public access to the results of research funded by the federal
government." The plans were requested by John P. Holdren, director
of the federal Office of Science and Technology Policy, in a
memorandum in February.

But Mr. Holdren also directed that plans be carried out using
"resources within the existing agency budget."

That is likely to be a formidable challenge. While the cost of data
storage is falling rapidly, the amount of information created by
data-based science is immense. In addition, many agencies have
complicated arrangements providing favorable access to corporations
that then resell federal data. The agencies also must overcome the
hurdle of developing systems that will make the data accessible.

Still, the federal guidelines underscore the importance of digital
information in scientific research and the growing urgency to
resolve the problems.

"Data is the new currency for research," said Alan Blatecky, the
director of advanced cyberinfrastructure at the National Science
Foundation. "The question is how do you address the cost issues,
because there is no new money."

Dr. Berman and Dr. Cerf argued in their paper that private
companies, as well as academic and corporate laboratories, must be
willing to invest in new computer data centers and storage systems
so that crucial research data is not irretrievably lost.

"There is no economic 'magic bullet' that does not require someone,
somewhere, to pay," they wrote.

Dr. Berman is the chairwoman of the United States branch of the
Research Data Alliance, an organization of academic, government and
corporate researchers attempting to build new systems to store the
digital data sets being created, and to develop new software
techniques for integrating different kinds of data and making it
accessible. "Publicly accessible data requires a stable home and
someone to pay the mortgage," she said in an interview.

Google initially promised to host large data sets for scientists for
free, and then killed the program in 2008 after just a year, for
unspecified business reasons.

It may have been that the company was taken aback by the size of
scientific research data sets. For example, the Obama
administration's proposal to eventually capture the activity of just
one million neurons in the human brain (the human brain has about 85
to 100 billion neurons) for a year would require about 3 petabytes
of data storage, or almost one third the amount generated by the
Large Hadron Collider during the same period.
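
The figures in the paragraph above can be checked with a little arithmetic. The sketch below is only a back-of-the-envelope illustration, not anything from the article or the paper: it assumes a decimal petabyte (10**15 bytes), a 365-day year, and, most speculatively, that storage would scale linearly with the number of neurons recorded. The function names are made up for this example.

```python
# Back-of-the-envelope check of the storage figures quoted above.
# Assumptions (not from the article): 1 petabyte = 10**15 bytes,
# 1 year = 365 days, and storage grows linearly with neuron count.

PETABYTE = 10**15
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def bytes_per_neuron_per_second(total_petabytes, neurons):
    """Average data rate implied for a single neuron over one year."""
    return total_petabytes * PETABYTE / neurons / SECONDS_PER_YEAR


def scaled_petabytes(total_petabytes, neurons, target_neurons):
    """Linearly scale a yearly storage figure to a different neuron count."""
    return total_petabytes * target_neurons / neurons


# Quoted figures: 3 petabytes per year for one million neurons.
rate = bytes_per_neuron_per_second(3, 1_000_000)

# Quoted range for the whole brain: 85 to 100 billion neurons.
whole_brain_low = scaled_petabytes(3, 1_000_000, 85_000_000_000)
whole_brain_high = scaled_petabytes(3, 1_000_000, 100_000_000_000)

print(f"{rate:.0f} bytes per neuron per second")
print(f"{whole_brain_low:,.0f} to {whole_brain_high:,.0f} petabytes per year")
```

Under those assumptions, the quoted 3 petabytes works out to roughly 95 bytes per neuron per second, and recording every neuron at the same rate would take on the order of 255,000 to 300,000 petabytes a year.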

Dr. Berman said she was heartened to see a growing international
recognition of the scope of the problem. The Research Data Alliance,
begun last August with an international telephone conference of just
eight researchers, now has more than 750 academic, corporate and
government scientists and information technology specialists in 50

In their paper, she and Dr. Cerf argue that coping with the
explosion of data would require a cultural shift on the part of not
just the government and corporate institutions, but also individual

"The casual approach for many scientists has been to 'stick it on my
disk drive and make it available to anyone who wants to use it,' "
Dr. Cerf said.

They argued that the costs need not be prohibitive. "If you want to
download a song from iTunes, it's not free, but it doesn't break the
bank," Dr. Berman said.

Even those who feel that information should be free and open
acknowledge that easy availability of data from
government-subsidized projects gives an unfair and unnecessary
advantage to private firms.

And some scientists argue that there would be advantages to charging
for data. "Paying a small fee for downloads in the aggregate would
also act as an incentive for providing the needed infrastructure,"
said Bernardo A. Huberman, a physicist at Hewlett-Packard

In his memorandum, Dr. Holdren told the federal agencies to delay
the release of research papers for a year; the reasons were not
explained. That has angered activists who favor immediate and broad
availability of publicly financed research.

"In scientific fields, a year is a very long time," said Carl
Malamud, the founder of Public.Resource.Org, a nonprofit group that
attempts to make government information freely available online.
Meanwhile, he said, corporations could sell the information. "It's a
sop to the special interests that publish this stuff."

Dr. Berman said there were models that could provide ideas for the
new infrastructures needed to store the data and make it accessible.
One is the Protein Data Bank, a database of biological molecules,
that is heavily used by the life sciences community and is
publicly supported.

That data is freely available. However, she also pointed to the
social science database Longitudinal Study of American Youth, which
is maintained by the Inter-University Consortium for Political and
Social Research at the University of Michigan. Users are charged a
subscription fee.

----- End forwarded message -----
Eugen* Leitl <a href="http://leitl.org">leitl</a> http://leitl.org
ICBM: 48.07100, 11.36820 http://ativel.com http://postbiota.org
AC894EC5: 38A5 5F46 A4FF 59B8 336B 47EE F46E 3489 AC89 4EC5